Handling of regexp objects that aren't referenced by variables, arrays, tables or objects

Hi,

first of all, I have to say I’m relatively inexperienced with Ruby and
also new to regular expressions. This causes me some problems:

I’m parsing text files and am using a lot of regexps for this.
Initially I was doing something like this:

file.each_line { |line|
  if line =~ /^pattern[a]/
    process_pattern_a(line)
  elsif line =~ /pat+e(rn)? b\s$/
    process_pattern_b(line)

  # some more elsifs

  end
}

But this was really, really slow. My suspicion is that the regexp
objects are recreated and thrown away for every iteration. Storing
all patterns in a table and referencing them like

file.each_line { |line|
  if line =~ $line_patterns["pattern a"]
    process_pattern_a(line)
  elsif line =~ $line_patterns["pattern b"]
    process_pattern_b(line)

  # some more elsifs

  end
}

made things tremendously faster, but I’m not really keen on storing
every regular expression that occurs somewhere in my program in this
table or as a variable. This splits up code that I would like to keep
in one place and can create variable clutter.[*]

Is it the case that such “anonymous” objects like regexps (maybe also
strings?) are re-created whenever the code snippet they are defined in
is executed? If so, is there a convenient way of preventing this? Is
this only the case for regexps or also for strings and other objects?
(Why is it the case at all - I can’t make any sense of it?) I would
like to learn how I can write Ruby code that is reasonably efficient
in this regard because the impact on execution time in the described
situation was so immense. (I’m currently using Ruby 1.9.1.)

Thanks!
Thomas W.

[*] I could perhaps also store the regexps and the functions to be
executed in a table, with the regexps as keys and the functions as
values, and iterate through it until a matching regexp key is found,
so that the function stored as its value can be executed. But
this is only possible in situations similar to the one described.
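
For readers who want to see the footnote's idea spelled out, here is a minimal sketch. It assumes Ruby 1.9 or later (hashes keep insertion order, so patterns are tried in the order written). The names HANDLERS and dispatch, the lambda bodies, and the exact second pattern are made up for illustration and stand in for process_pattern_a and process_pattern_b.

```ruby
# Hypothetical table: regexp keys, callable values.
HANDLERS = {
  /^pattern[a]/      => ->(line) { "a: #{line.chomp}" },
  /pat+e(rn)? b\s*$/ => ->(line) { "b: #{line.chomp}" }
}

# Try each pattern in insertion order and call the first matching handler.
# Returns nil when nothing matches.
def dispatch(line, handlers)
  handlers.each do |pattern, handler|
    return handler.call(line) if pattern =~ line
  end
  nil
end

puts dispatch("patterna\n", HANDLERS)
puts dispatch("patte b\n", HANDLERS)
```

The regexps are built once when the hash is defined, so the loop only performs matches. The trade-off is the one the footnote already names: it fits only when every branch has the same shape.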

Is it the case that such “anonymous” objects like regexps (maybe also
strings?) are re-created whenever the code snippet they are defined in
is executed? If so, is there a convenient way of preventing this? Is
this only the case for regexps or also for strings and other objects?
(Why is it the case at all - I can’t make any sense of it?) I would
like to learn how I can write Ruby code that is reasonably efficient
in this regard because the impact on execution time in the described
situation was so immense. (I’m currently using Ruby 1.9.1.)

Yes, a new object is indeed created every time an anonymous object
literal is evaluated. The only core object I know of for which this is not
true is the symbol, which is basically an immutable string. There may be
others I’m not aware of though. I suppose your code shows that there
just might be a need for the symbol equivalent of a regexp.

Is this OK? It still uses variables, though :(

file.each_line { |line|
  if line =~ (a ||= $line_patterns["pattern a"])
    process_pattern_a(line)
  elsif line =~ (b ||= $line_patterns["pattern b"])
    process_pattern_b(line)

  # some more elsifs

  end
}

On 9/27/09, Ehsanul H. [email protected] wrote:

Yes, a new object is indeed created every time an anonymous object
literal is evaluated. The only core object I know of for which this is
not true is the symbol, which is basically an immutable string. There
may be others I’m not aware of though. I suppose your code shows that
there just might be a need for the symbol equivalent of a regexp.

Actually, I believe that regexp literals are created only once even if
they’re executed multiple times. The exception to this would be when
you use #{} within a regexp… that forces ruby to not only create a
new object each time the regexp literal is executed, it has to
recompile the regexp each time… and that is really slow. You can
bypass this behavior by using the o regexp option, but that only works
right if the value of the inclusion (what’s inside #{}) is guaranteed
to be the same on each execution.

Thomas, are you using #{} within your regexps? If so, you should try
sticking an o on the end of each one; that will probably solve your
performance problem. For instance, write
  x =~ /foo#{bar}/o
instead of
  x =~ /foo#{bar}/
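
The difference described above can be checked by comparing object identity. This is a small sketch, assuming MRI (Ruby 1.9 or later); the three method names are hypothetical wrappers whose only purpose is to evaluate the same literal more than once.

```ruby
# Hypothetical helpers; each wraps one regexp literal so it can be evaluated repeatedly.
def plain_literal
  /foo\d+/        # no interpolation
end

def interpolated(bar)
  /foo#{bar}/     # interpolation, rebuilt on every evaluation
end

def interpolated_once(bar)
  /foo#{bar}/o    # interpolation performed only on the first evaluation
end

puts plain_literal.equal?(plain_literal)                    # true: same object each time
puts interpolated("x").equal?(interpolated("x"))            # false: a new Regexp per call
puts interpolated_once("x").equal?(interpolated_once("y"))  # true: first result is reused
puts interpolated_once("z").source                          # foox: later values are ignored
```

The last line shows the caveat mentioned above: with the o option the value of bar from the first evaluation sticks, so it is only safe when that value never changes.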

On 27 Sep., 20:04, Caleb C. [email protected] wrote:

On 9/27/09, Ehsanul H. [email protected] wrote:

Yes, a new object is indeed created every time an anonymous object
literal is evaluated. The only core object I know of for which this is not true is the
symbol, which is basically an immutable string.

I think that’s not quite what I meant. Of course, if I define the
same regular expression twice at different places, there would be two
regexp objects.

Actually, I believe that regexp literals are created only once even if
they’re executed multiple times. The exception to this would be when
you use #{} within a regexp… that forces ruby to not only create a
new object each time the regexp literal is executed, it has to
recompile the regexp each time… and that is really slow. You can
bypass this behavior by using the o regexp option, but that only works
right if the value of the inclusion (what’s inside #{}) is guaranteed
to be the same on each execution.

Thanks so much! Your suspicion was right, I am indeed using #{} in
some of the regular expressions, and the o option does fix the issue.
And with your explanation, it is now obvious to me why the expressions
would otherwise be recompiled in every iteration.

Now my code is already a bit shorter :)!

Thomas W.

On Sep 27, 2009, at 11:50 AM, ThomasW wrote:

This example is perfect for Ruby’s case statement:

file.each_line { |line|
  case line
  when /^pattern[a]/o
    process_pattern_a(line)
  when /pat+e(rn)? b\s$/o
    process_pattern_b(line)

  # more when clauses

  else
    # handle no match
  end
}

Gary W.

Thairuby ->a, b {a + b} wrote:

Is this OK? It still uses variables, though :(

file.each_line { |line|
  if line =~ (a ||= $line_patterns["pattern a"])
    process_pattern_a(line)
  elsif line =~ (b ||= $line_patterns["pattern b"])
    process_pattern_b(line)

  # some more elsifs

  end
}

I mistyped. It should be:

file.each_line { |line|
  if line =~ (a ||= /^pattern[a]/)
    process_pattern_a(line)
  elsif line =~ (b ||= /pat+e(rn)? b\s$/)
    process_pattern_b(line)

  # some more elsifs

  end
}

Is there an o option for strings? :)

On 27 Sep., 21:57, Gary W. [email protected] wrote:

Gary W.

Thanks for that tip. I wasn’t aware that this also works with regexp
matches. It’s great that it does! By the way, is there anything
substantially different from an elsif chain, except for being slightly
less typing?

Thomas W.

On Sep 27, 2009, at 4:25 PM, ThomasW wrote:

Thanks for that tip. I wasn’t aware that this also works with regexp
matches. It’s great that it does! By the way, is there anything
substantially different from an elsif chain, except for being slightly
less typing?

The semantics are the same in this case but I think the
case statement highlights the fact that you are doing a
sequence of matches against a single object, whereas the
standard if/then/else is a more general construct.

Gary W.
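
To make the equivalence concrete: case/when calls === on each when value, and Regexp#=== performs a match against a string. The sketch below uses hypothetical method names and return symbols, and an illustrative second pattern, to show both forms giving the same answers.

```ruby
# case/when form: each `when` regexp is tested with Regexp#===.
def classify_with_case(line)
  case line
  when /^pattern[a]/      then :a
  when /pat+e(rn)? b\s*$/ then :b
  else :none
  end
end

# The same checks written out as an if/elsif chain.
def classify_with_elsif(line)
  if /^pattern[a]/ === line
    :a
  elsif /pat+e(rn)? b\s*$/ === line
    :b
  else
    :none
  end
end

["patterna\n", "patte b\n", "something else\n"].each do |line|
  puts "#{line.inspect}: #{classify_with_case(line)} / #{classify_with_elsif(line)}"
end
```

Because === is just a method, case also works with ranges, classes and anything else that defines it, which is part of why it reads as "match one object against several candidates".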

On Sun, Sep 27, 2009 at 3:20 PM, Thairuby ->a, b {a + b} <
[email protected]> wrote:

Is there an o option for strings? :)

Posted via http://www.ruby-forum.com/.

Unfortunately, I don’t think this does anything, because a and b are
declared within the block, so while the scope is the same, the extent is
not. Essentially, a and b are no longer bound after each iteration of
the loop, so upon entering the next iteration they do not retain their
previously assigned values.

This can be illustrated:

"patterna\npatte b".each_line do |line|

  p line

  puts "defined?(a) => #{defined?(a).inspect}"
  puts "defined?(b) => #{defined?(b).inspect}"

  if line =~ (a ||= /^pattern[a]/)
  elsif line =~ (b ||= /pat+e(rn)? b\s$/)
  else
  end

  puts "defined?(a) => #{defined?(a).inspect}"
  puts "defined?(b) => #{defined?(b).inspect}", ''

end
__END__

Which has the following output:

"patterna\n"
defined?(a) => nil
defined?(b) => nil
defined?(a) => "local-variable(in-block)"
defined?(b) => "local-variable(in-block)"

"patte b"
defined?(a) => nil
defined?(b) => nil
defined?(a) => "local-variable(in-block)"
defined?(b) => "local-variable(in-block)"

You can see that a and b were defined after the if statement for
“patterna”, but were no longer defined before the if statement for
“patte b”.
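
A complementary check, since defined? on a line above the first assignment also reflects where the parser first saw the variable: count how often the right-hand side of ||= actually runs. This is only a sketch; the two method names are hypothetical, and the second variant declares the variable outside the block purely for contrast.

```ruby
# Hypothetical helper names; both count how often the right-hand side of ||= runs.
def evals_with_block_local(lines)
  evals = 0
  lines.each do |line|
    # `a` first appears inside the block, so every iteration starts with a fresh, nil `a`.
    a ||= (evals += 1; /^pattern[a]/)
    line =~ a
  end
  evals
end

def evals_with_outer_variable(lines)
  evals = 0
  a = nil  # declared outside the block, so all iterations share this one variable
  lines.each do |line|
    a ||= (evals += 1; /^pattern[a]/)
    line =~ a
  end
  evals
end

lines = ["patterna\n", "patte b"]
puts evals_with_block_local(lines)     # 2: the assignment runs on every iteration
puts evals_with_outer_variable(lines)  # 1: the value survives between iterations
```

So ||= only caches across iterations when the variable lives outside the block, which brings back the variable the original question was trying to avoid.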

Oh, I forgot about the scope of local variables :(
Thank you very much for your explanation.

On 28 Sep., 03:09, “Thairuby ->a, b {a + b}” [email protected]
wrote:

Oh, I forgot about the scope of local variables :(
Thank you very much for your explanation.


Thairuby, thanks anyway for your effort :).