Regular Expression Conundrums

http://pastie.org/4643789

Why, in Case 2, do I have to escape the asterisk twice to keep the
program from thinking /\ is a substarter? Thank you all for your help! I
really appreciate this.

On 1 September 2012 14:02, Stan S. wrote:

http://pastie.org/4643789

Why, in Case 2, do I have to escape the asterisk twice to keep the
program from thinking /\ is a substarter? Thank you all for your help! I
really appreciate this.

Remember that the parameter you’re passing in is a string.

In double-quoted string literals, a backslash that precedes a non-special
character (such as an asterisk) is essentially ignored, so "/\*/" is
the same as "/*/" (i.e. three characters: slash, star, slash).

Regexp.new "/*/"  # => /\/*\//
Regexp.new "/\*/" # => /\/*\//

That resulting Regexp pattern is an escaped forward slash, a star, and
an escaped forward slash; so it will match zero or more
forward slashes followed by another forward slash. That’s the same as
%r{/*/} or %r{/+}.
Thus the numbers you get back are actually just the positions of every
sequence of slashes in the string. It doesn’t think "/\" is a
substarter, just "/".
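
A quick way to check this yourself is sketched below. The sample text is made up for illustration; it is not taken from the pastie.

```ruby
# In a double-quoted literal, a backslash before a non-special character
# such as * is dropped, so both literals hold the same three characters.
plain   = "/*/"
escaped = "/\*/"

puts plain == escaped     # true
puts escaped.length       # 3

# As a pattern this reads "zero or more slashes, then a slash",
# so it matches every run of slashes.
pattern = Regexp.new(escaped)
sample  = "a/b//c/*/d"    # hypothetical sample text
runs    = sample.scan(pattern)
p runs                    # ["/", "//", "/", "/"]
```

Note that the literal "/*/" in the sample is never matched as a unit; only its two slashes are, separately.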

Putting a second backslash in the string literal ("/\\*/") means you have
four characters in your string: slash, (escaped) backslash, star, slash;
which, interpreted as a Regexp, means: a slash followed by an (escaped)
asterisk, followed by a slash.
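
As a small sketch of the double-backslash case (again with a made-up sample string, not the data from the pastie):

```ruby
literal = "/\\*/"           # four characters: / \ * /
puts literal.length         # 4
puts literal                # /\*/

# The backslash now reaches the regexp engine, so * is a literal star.
pattern = Regexp.new(literal)
sample  = "a/b//c/*/d"      # hypothetical sample text
matches = sample.scan(pattern)
p matches                   # ["/*/"]
p sample.index(pattern)     # 6
```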

You could use Regexp.quote (an alias of Regexp.escape) to take away the
special meaning of the star in your string; e.g.

[line 20]
regex = Regexp.new( Regexp.quote regex_string )

and/or you could allow the regex_string parameter to be a Regexp object.
That way you could call, for example:

substarters = this_room_data.all_occurances "/*/"     # no need to escape regex characters
substarters = this_room_data.all_occurances %r{/\*/}  # no need to escape string characters

No double-escaping; twice as much win.
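
Here is a rough sketch of how both suggestions might fit together. The method body is an assumption, since the pastie code isn't reproduced in this thread: it is written as a standalone method taking the text as its first argument, and it returns the start index of every match. The sample text is made up too.

```ruby
# Hypothetical stand-in for the method in the pastie: returns the start
# index of every match of `pattern` in `text`.
def all_occurances(text, pattern)
  # Strings are matched literally; Regexp objects are used as given.
  regex = pattern.is_a?(Regexp) ? pattern : Regexp.new(Regexp.quote(pattern))
  positions = []
  text.scan(regex) { positions << Regexp.last_match.begin(0) }
  positions
end

room = "intro /*/ middle /*/ end"   # hypothetical sample text
p all_occurances(room, "/*/")       # [6, 17]
p all_occurances(room, %r{/\*/})    # [6, 17]
```

Either call style finds the literal "/*/" markers, and neither needs a doubled backslash.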


Matthew K., B.Sc (CompSci) (Hons)
http://matthew.kerwin.net.au/
ABN: 59-013-727-651

“You’ll never find a programming language that frees
you from the burden of clarifying your ideas.” - xkcd

Stan S. wrote in post #1074140:

http://pastie.org/4643789

Why, in Case 2, do I have to escape the asterisk twice to keep the
program from thinking /\ is a substarter? Thank you all for your help! I
really appreciate this.

The string literal '\\' or "\\" is a single backslash.

"\\".size
=> 1
'\\'.size
=> 1
puts "\\"
\
=> nil

The reason is that backslash-X sequences in string literals have
special meanings, so a literal backslash is written backslash-backslash.

In a single-quoted string, only \\ and \' are interpreted specially; any
other sequences are passed through as-is. In a double-quoted string
there are lots more, e.g. \n for newline, \xNN for hex byte.
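
A few one-liners that illustrate the difference between the two quoting styles (a minimal sketch; the variable name is only there for illustration):

```ruby
# Each pair is [expression result, what it demonstrates].
checks = [
  ['\\'.length,   "single-quoted \\\\ is one backslash"],
  ['\''.length,   "single-quoted \\' is one quote character"],
  ['\n'.length,   "single-quoted \\n stays as backslash plus n"],
  ["\n".length,   "double-quoted \\n is one newline character"],
  ["\x41",        "double-quoted \\x41 is the hex byte for A"],
  ['\*' == "\\*", "single quotes keep the backslash before *"],
  ["\*" == "*",   "double quotes drop the backslash before *"]
]

checks.each { |result, note| puts "#{result.inspect}  # #{note}" }
```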

Thank you both very much! This really helped!