Code Review: re

tfpt review “/shelveset:re;REDMOND\sborde”
Comment :
Adds tests in library\regexp\match_specs.rb and
language\regexp_specs.rb
Fixes the issues found by it. \c support had a typo had checked for \C
instead. Added support for predefined character classes like [:alpha:].
I created a new class called RegexpTransformer for this to convert from
Ruby regexp to CLR regexp pattern. Its state-driven and so can be
extended if we need to do more complex analysis if needed. There are a
few cases where we might need to do this in the future, and also if we
want to give better error messages for bad regexps.
Added Debug-only command line option called -compileRegexps to check
perf impact of compiling Regexps to IL. It gives a 50%-300% improvement
in throughput. Have not measured startup impact. The command line option
will let us play with it easily.
Added -ruby19 command line option to RunRSpec

There are few more known issues with regexps that I will get to next.

RegexpSpecs.syntax_error, and the similar test in
language/regexp_specs.rb is testing the parser, not regexps.

For the others, you are obscuring the code too much. Methods in the
fixture files shouldn’t hide the behavior, methods in the fixture files
should be a convenient way to get at data.

it “returns nil if the object is nil” do
/\w+/.send(@method, nil).should == nil
end

it ‘supports escape characters’ do
RegexpSpecs.match(@method, /\t/, “\t”, [“\t”]) # horizontal tab
#…
End

I know what the first one is doing, and if I want to see how
Regexp#match, or the other shared methods, behave I can see that easily.
RegexpSpecs.match is obscuring a lot of the behavior in the fixture
files. In addition, since :=~ and :match have slightly different
behaviors, they shouldn’t be using a shared spec. One way to reduce the
code in a case like this is to use fixtures to define character classes
(like RegexpSpecs.blanks), then you could do loops in each spec file to
spec the behaviors in one line.

So the it ‘supports escape characters’ do line becomes:

In classes.rb:
#…
def self.escape_characters
%w{\t \v \n \r \f \a \e}
end

In match_spec.rb under describe “Regexp#match”:
it “supports escape characters” do
RegexpSpecs.escape_characters.each do |char|
char.send(@method, /#{char}/).to_a.should == [char]
end
end

In match_spec.rb under describe “Regexp#=~ on a successful match”:
it “supports escape characters” do
RegexpSpecs.escape_characters.each do |char|
(/#{char}/ =~ char).should == 0
end
end

Similar simplifications can be done for the others.

The idea is that each spec tests a facet of behavior of a method. If you
are trying to combine two facets (via case statements in this case),
then you really have two specs.

You can see my discussion with Brian F. and Evan P. about this
here: #rubyspec on 2009-02-06 — irc logs at whitequark.org.

JD

Test review feedback has been incorporated.

tfpt review “/shelveset:re2;REDMOND\sborde”

I still need a code review…

Looks good.

Tomas