Statistician I (#167)

mansfiem · June 27, 2008, 5:59pm

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

The three rules of Ruby Quiz 2:

1. Please do not post any solutions or spoiler discussion for this
quiz until 48 hours have passed from the time on this message.

2. Support Ruby Quiz 2 by submitting ideas as often as you can! (A
permanent, new website is in the works for Ruby Quiz 2. Until then,
please visit the temporary website at

http://splatbang.com/rubyquiz/.)

3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem
helps everyone on Ruby Talk follow the discussion. Please reply to
the original quiz message, if you can.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Statistician I (#167)

This week begins a three-part quiz, the final goal being to provide a
little library for parsing and analyzing line-based data. Hopefully,
each portion of the larger problem is interesting enough on its own,
without being too difficult to attempt. The first part, this week’s
quiz, will focus on the pattern matching.

Let’s look at a bit of example input:

You wound Perl for 15 points of Readability damage.
You wound Perl with Metaprogramming for 23 points of Usability damage.
Your mighty blow defeated Perl.
C++ walks into the arena.
C++ wounds you with Compiled Code for 37 points of Speed damage.
You wound C++ for 52 points of Usability damage.

Okay, it’s silly, but it is similar to a much larger data file I’ll
provide at the end for testing.

You should definitely note the repetitiveness: just the sort of thing
that we can automate. In fact, I’ve examined the input above and
created three rules (a.k.a. patterns) that match (most of) the data:

[The ]<name> wounds you[ with <attack>] for <amount> point[s] of <kind>[ damage].
You wound[ the] <name>[ with <attack>] for <amount> point[s] of <kind>[ damage].
Your mighty blow defeated[ the] <name>.

There are a few guidelines about these rules:

Text contained within square brackets is optional.
A word contained in angle brackets represents a field; not a literal
match, but data to be remembered.
Fields are valid within optional portions.
You may assume that both the rules and the input lines are stripped
of excess whitespace on both ends.
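
As a rough illustration of these guidelines, here is a minimal sketch of turning one rule into a regular expression. It is not taken from the quiz or from any submission: the helper name compile_rule is hypothetical, and the field names in the sample rule (name, attack, amount, kind) are assumed for illustration. Literal text is escaped so that characters such as the final period match only themselves.

```ruby
# Hypothetical helper (not part of the quiz): compile one rule into a Regexp.
def compile_rule(rule)
  # Split the rule into "[", "]", "<field>" and runs of literal text.
  source = rule.scan(/\[|\]|<[^>]+>|[^\[\]<]+/).map do |token|
    case token
    when "["        then "(?:"               # open an optional group
    when "]"        then ")?"                # close it and mark it optional
    when /\A<.+>\z/ then "(.+?)"             # a field: capture it lazily
    else                 Regexp.escape(token) # literal text, matched as-is
    end
  end.join
  Regexp.new("\\A#{source}\\z")
end

# Sample rule; the field names are assumptions made for this sketch.
rule = "[The ]<name> wounds you[ with <attack>] for <amount> point[s] of <kind>[ damage]."
m = compile_rule(rule).match("C++ wounds you with Compiled Code for 37 points of Speed damage.")
p m.captures  # => ["C++", "Compiled Code", "37", "Speed"]
```

A field inside an optional portion that does not appear in the input simply comes back as nil in the captures.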

Assuming the rules are in rules.txt and the input is in data.txt,
running your Ruby script as such:

> ruby reporter.rb rules.txt data.txt

Should generate the following output:

Rule 1: Perl, 15, Readability
Rule 1: Perl, Metaprogramming, 23, Usability
Rule 2: Perl
# No Match
Rule 0: C++, Compiled Code, 37, Speed
Rule 1: C++, 52, Usability

Unmatched input:
C++ walks into the arena.

Each line of the output corresponds to a line of the input; it indicates
which rule was matched (zero-based index) and outputs the matched
fields’ values. Any line of the input that could not be matched to one
of the rules should output a “No Match” comment, with all the unmatched
input records printed in the “Unmatched input” section at the end (so
the author of the rules can extend them appropriately).
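
The reporting described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the three patterns are written by hand to stand in for compiled rules, and the names RULES and report are hypothetical.

```ruby
# Hand-written stand-ins for the three example rules (an assumption of
# this sketch; a real solution would build these from rules.txt).
RULES = [
  /\A(?:The )?(.+?) wounds you(?: with (.+?))? for (.+?) points? of (.+?)(?: damage)?\.\z/,
  /\AYou wound(?: the)? (.+?)(?: with (.+?))? for (.+?) points? of (.+?)(?: damage)?\.\z/,
  /\AYour mighty blow defeated(?: the)? (.+?)\.\z/
]

# Returns the report as an array of output lines.
def report(lines, rules = RULES)
  unmatched = []
  out = lines.map do |line|
    index = rules.index { |re| re.match(line) }   # first matching rule wins
    if index
      # Drop nil captures left by optional portions that were absent.
      fields = rules[index].match(line).captures.compact
      "Rule #{index}: #{fields.join(', ')}"
    else
      unmatched << line
      "# No Match"
    end
  end
  out + ["", "Unmatched input:"] + unmatched
end

puts report([
  "You wound Perl for 15 points of Readability damage.",
  "C++ walks into the arena."
])
```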

One thing you should keep in mind while working on this week’s quiz is
that you want to be flexible; followup quizzes will require that you
modify things a bit.

For testing, I am providing two larger datasets: combat logs taken from
Lord of the Rings Online gameplay. There is data for a Guardian and a
Hunter; unzip before use. Both use the same ruleset:

[The ]<name> wounds you[ with <attack>] for <amount> point[s] of <kind>[ damage].
You are wounded for <amount> point[s] of <kind> damage.
You wound[ the] <name>[ with <attack>] for <amount> point[s] of <kind>[ damage].
You reflect <amount> point[s] of <kind> damage to[ the] <name>.
You succumb to your wounds.
Your mighty blow defeated[ the] <name>.

mansfiem · June 29, 2008, 8:47pm

Here’s my own submission for this problem. Once you wrap your head
around a few bits of the regular expression, it’s pretty simple to
understand.

class Rule
  attr_reader :fields

  def initialize(str)
    patt = str.gsub(/\[(.+?)\]/, '(?:\1)?').gsub(/<(.+?)>/, '(.+?)')
    @pattern = Regexp.new('^' + patt + '$')
    @fields = nil
  end

  def match(str)
    if md = @pattern.match(str)
      @fields = md.captures
    else
      @fields = nil
    end
  end
end

rules = []
File.open(ARGV[0]).each do |line|
  line.strip!
  next if line.empty?
  rules << Rule.new(line)
end

unknown = []
File.open(ARGV[1]).each do |line|
  line.strip!
  if line.empty?
    puts
    next
  end

  if rule = rules.find { |rule| rule.match(line) }
    indx, data = rules.index(rule), rule.fields.reject { |f| f.nil? }
    puts "Rule #{indx}: #{data.join(', ')}"
  else
    unknown << line
    puts "# No match"
  end
end

puts "\nUnmatched input:"
puts unknown.join("\n")

mansfiem · June 29, 2008, 10:21pm

Matthew M. wrote:

def initialize(str)
  patt = str.gsub(/\[(.+?)\]/, '(?:\1)?').gsub(/<(.+?)>/, '(.+?)')
  @pattern = Regexp.new('^' + patt + '$')
  @fields = nil
end

Doesn’t the rule string need to be regexp-escaped somehow if it’s
going to be passed directly to Regexp.new?

I fear a rule with something like “You run away[ from <name>] (you
coward)” would break this approach.
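
To make the concern concrete, here is a small sketch comparing the unescaped translation with one that escapes literal text first. The method names naive and escaped are hypothetical, and the <name> field in the sample rule is assumed for illustration; this is not code from any submission.

```ruby
# Translation without escaping, in the style of the submission quoted above.
def naive(rule)
  patt = rule.gsub(/\[(.+?)\]/, '(?:\1)?').gsub(/<(.+?)>/, '(.+?)')
  Regexp.new('^' + patt + '$')
end

# Escape everything first, then turn the (now escaped) brackets back
# into optional groups and the fields into captures.
def escaped(rule)
  patt = Regexp.escape(rule)
               .gsub(/\\\[(.+?)\\\]/, '(?:\1)?')
               .gsub(/<(.+?)>/, '(.+?)')
  Regexp.new('^' + patt + '$')
end

rule = "You run away[ from <name>] (you coward)"
line = "You run away from Perl (you coward)"

p naive(rule).match(line)              # => nil: "(you coward)" became a group
p escaped(rule).match(line).captures   # => ["Perl"]

# The unescaped final period also matches any character:
dot_rule = "Your mighty blow defeated[ the] <name>."
p naive(dot_rule).match("Your mighty blow defeated Perl!").captures  # => ["Perl"]
p escaped(dot_rule).match("Your mighty blow defeated Perl!")         # => nil
```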

Matthew R.

mansfiem · June 30, 2008, 2:40am

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Statistician I (#167)

My first quiz; it’s very rough, but it works most of the time.

I’m probably (re)implementing a very limited form of regular
expression, but in the process of making this I discovered several
ways it could fail. In the test cases, it’s just the case noted in the
comments.

Here is the code: http://pastie.org/224463
And the rules that catch most of the samples: http://pastie.org/224464

Lucas.

mansfiem · June 30, 2008, 3:23am

On Jun 29, 3:19 pm, Matthew R. Jacobs wrote:

I fear a rule with something like “You run away[ from <name>] (you
coward)” would break this approach.

Perhaps… My solution is likely not safe from all input sets. While I
hadn’t considered literal parentheses as part of the rule set, I
should have at the least considered the period (match any char).

For the current purposes, it is sufficient if your solution supports
the provided example ruleset, though any additional work towards
escaping parts/preventing breakage is certainly acceptable.

mansfiem · June 30, 2008, 3:23am

Here is my submission. I hope it’s flexible enough for the followup
quizzes. I sensed there might be a need to access the fields of a match
by name, which is why I added the RuleMatch#fields method. It returns a
hash that allows code like

puts Rule.match(line).fields['amount'] # prints the value of the field

This method isn’t used in the current code however. But who knows, it
might come in handy later on.
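
For readers without access to the pastie, access by field name could look roughly like the sketch below. This is not the submission’s code: the class NamedRule and its fields method are hypothetical, and the field names in the sample rule are assumed for illustration.

```ruby
# Hypothetical sketch of looking up matched fields by name.
class NamedRule
  def initialize(rule)
    # Remember the field names in the order they appear in the rule.
    @names = rule.scan(/<([^>]+)>/).flatten
    source = Regexp.escape(rule)
                   .gsub(/\\\[(.+?)\\\]/, '(?:\1)?')  # [text] becomes optional
                   .gsub(/<[^>]+>/, '(.+?)')          # <field> becomes a capture
    @pattern = Regexp.new("\\A#{source}\\z")
  end

  # Returns a Hash of field name => value, or nil if the line does not match.
  def fields(line)
    md = @pattern.match(line)
    md && @names.zip(md.captures).to_h
  end
end

rule = NamedRule.new("You wound[ the] <name>[ with <attack>] for <amount> point[s] of <kind>[ damage].")
puts rule.fields("You wound Perl with Metaprogramming for 23 points of Usability damage.")['amount']
# prints 23
```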

You can find my submission at http://www.pastie.org/224480

Matthias

mansfiem · June 30, 2008, 8:05am

On Jun 30, 7:48 am, ThoML wrote:

Statistician I (#167)

Here is my solution: http://www.pastie.org/224576

So I thought I could use pastie for a change so that I could still
make minor modifications and don’t have to repost the code. But …
wrong URL. Sorry, let’s hope this is the right one:

http://www.pastie.org/224585

Regards,
Thomas.

mansfiem · June 30, 2008, 7:50am

Statistician I (#167)

Here is my solution:
http://www.pastie.org/224576

It’s for ruby19 only.

mansfiem · June 30, 2008, 9:12am

Hi,

This is my try at this quiz. I thought it would be cool to store the
field “names” too, for each match. I also added a verbose output to
show the field name and the value. As the goal was to be flexible too,
I made some classes to encapsulate everything, to prepare for the
future:

class Match
  attr_accessor :captures, :mappings, :rule

  def initialize captures, mappings, rule
    @captures = captures
    @mappings = mappings
    @rule = rule
  end

  def to_s verbose=false
    s = "Rule #{@rule.id}: "
    if verbose
      @rule.names.each_with_index {|n,i| s << "[#{n} => #{@mappings[n]}]" if @captures[i]}
      s
    else
      s + "#{@captures.compact.join(",")}"
    end
  end
end

class Rule
  attr_accessor :names, :id

  # Translate rules to regexps, specifying if the first captured group
  # has to be remembered
  RULE_MAPPINGS = {
    "[" => ["(?:", false],
    "]" => [")?", false],
    /<(.*?)>/ => ["(.*?)", true],
  }

  def initialize id, line
    @id = id
    @names = []
    escaped = escape(line)
    reg = RULE_MAPPINGS.inject(escaped) do |line, (tag, value)|
      replace, remember = *value
      line.gsub(tag) do |m|
        @names << $1 if remember
        replace
      end
    end
    @reg = Regexp.new(reg)
  end

  def escape line
    # From the mappings, change the regexp sensitive chars with
    # non-sensitive ones so that we can Regexp.escape the line, then
    # sub them back.
    # Placeholder tokens are assumed here; any strings that
    # Regexp.escape leaves untouched will do.
    escaped = line.gsub("[", "__LB__").gsub("]", "__RB__")
    escaped = Regexp.escape(escaped)
    escaped.gsub("__RB__", "]").gsub("__LB__", "[")
  end

  def match data
    m = @reg.match data
    return nil unless m
    map = Hash[*@names.zip(m.captures).flatten]
    Match.new m.captures, map, self
  end
end

class RuleSet
  def initialize file
    @rules = []
    File.open(file) do |f|
      f.each_with_index {|line, i| @rules << Rule.new(i, line.chomp)}
    end
    p @rules
  end

  def apply data
    match = nil
    @rules.find {|r| match = r.match data}
    match
  end
end

rules_file = ARGV[0] || "rules.txt"
data_file = ARGV[1] || "data.txt"

rule_set = RuleSet.new rules_file

matches = nil
unmatched = []
File.open(data_file) do |f|
  matches = f.map do |line|
    m = rule_set.apply line.chomp
    unmatched << line unless m
    m
  end
end

matches.each do |m|
  if m
    puts m
  else
    puts "#No match"
  end
end

unless unmatched.empty?
  puts "Unmatched input: "
  puts unmatched
end

#~ puts "Verbose output:"
#~ matches.each do |m|
#~   if m
#~     puts (m.to_s(true))
#~   else
#~     puts "#No match"
#~   end
#~ end

mansfiem · June 30, 2008, 9:16am

On Mon, Jun 30, 2008 at 3:19 AM, Matthew M. wrote:

I should have at the least considered the period (match any char).

For the current purposes, it is sufficient if your solution supports
the provided example ruleset, though any additional work towards
escaping parts/preventing breakage is certainly acceptable.

I had to escape the string in order to make my solution work due to
the final dot…

Jesus.

mansfiem · July 2, 2008, 12:50am

Added some lines to show the unmatched input.

http://www.pastie.org/225875

mansfiem · June 30, 2008, 4:11pm

Here is my solution to this week’s quiz. It’s also my first RubyQuiz.

http://www.pastie.org/224754

mansfiem · July 3, 2008, 10:12am

Here’s my solution: (http://pastie.org/226949)

class Parse
  def initialize(rules)
    @rules = create_rules(rules)
  end

  # Read the rules and transform them into regexps
  def create_rules(rules)
    rules.each_line.collect do |r|
      vars = []
      r = Regexp.escape(r.chomp).gsub("\\[", "[").gsub("\\]", "]").
            gsub(/\[([^\]]+)\]/, '(?:\1)?')
      r.gsub!(/<([^>]+)>/) do vars << $1; '(.*?)' end
      [Regexp.new(r), vars]
    end
  end

  # Parse the given data against the rules created
  def parse(data)
    @match = []; @exceptions = []
    data.each_line do |l|
      mdata = nil
      @rules.each_with_index { |(r, d), i| break if !((mdata = [i, r.match(l)]) == [i, nil]) }
      if !mdata[1].nil?
        @match << ["Rule #{mdata[0]+1}:", *mdata[1].to_a[1..-1]]
      else
        @match << ["# No Match"]; @exceptions << l
      end
    end
    self
  end

  # Print results
  def to_s
    "#{@match.collect{|m| m.join(" ")}.join("\n")}" +
      (@exceptions.empty? ? "" : "\n\nUnmatched input:\n#{@exceptions.join("")}")
  end
end

# Example of usage
puts "#{Parse.new(File.read("rules.txt")).parse(File.read("guardian.txt"))}"

On Tue, Jul 1, 2008 at 10:47 PM, [email protected] <