Ruby Regex

Hello,

I have a string, a = "&0&1".

I need to pass this value with the ampersand escaped to another command
in my program.

So I tried something like this:

irb(main):038:0> a.gsub(/&/,"\\&")
=> "&0&1"

But if I replace the & with some other character, it works as
expected:

irb(main):049:0> a.gsub(/&/,"\\g")
=> "\\g0\\g1"

Can anyone explain why this happens and how to go about escaping
the &?

Thanks.
Sriram V…

Sriram V. wrote:

So I tried something like this:

irb(main):038:0> a.gsub(/&/,"\\&")
=> "&0&1"

That’s because & has a special meaning in a replacement string (“the
matched string”). Either use a block to provide the replacement value
(which doesn’t do the backslash replacement), or put two backslashes in
the replacement string (written as four in a double-quoted literal).

irb(main):002:0> a.gsub(/&/) { "\\&" }
=> "\\&0\\&1"
irb(main):003:0> a.gsub(/&/, "\\\\&")
=> "\\&0\\&1"
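A minimal sketch of the two fixes side by side, assuming the same string as in the question. The variable names with_block and with_string are only illustrative. puts is used instead of irb's inspect output so that each printed backslash is a real character in the result:

```ruby
a = "&0&1"

# Block form: the block's return value is inserted as-is,
# with no backslash processing by gsub.
with_block = a.gsub(/&/) { "\\&" }

# String form: gsub itself wants an escaped backslash (\\), and the
# double-quoted literal needs each of those backslashes escaped again.
with_string = a.gsub(/&/, "\\\\&")

puts with_block    # prints \&0\&1
puts with_string   # prints \&0\&1
```

Both results are six characters long; the doubled backslashes that irb shows are just how inspect displays a single backslash.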

2010/1/13 Brian C.:

irb(main):038:0> a.gsub(/&/,"\\&")
=> "&0&1"

That’s because & has a special meaning in a replacement string

But it’s only special when preceded by a backslash, which is special
in replacement strings.

irb(main):002:0> "123".gsub(/\d/, '<&>')
=> "<&><&><&>"
irb(main):003:0> "123".gsub(/\d/, '<\\&>')
=> "<1><2><3>"
irb(main):004:0> "123".gsub(/\d/, '<\\\\&>')
=> "<\\&><\\&><\\&>"

Another example - backslash with group index:

irb(main):011:0> "abc".gsub(/\w(.)\w/, '<1>')
=> "<1>"
irb(main):012:0> "abc".gsub(/\w(.)\w/, '<\\1>')
=> "<b>"
irb(main):013:0> "abc".gsub(/\w(.)\w/, '<\\\\1>')
=> "<\\1>"

So what you basically do here is escape the escape so it loses
its special meaning in the replacement string. :-)
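As a rough sketch of "escaping the escape" in code: the helper below doubles every backslash in a string so that gsub inserts it literally. The name literal_replacement is hypothetical, not part of Ruby; it is only here to illustrate the idea.

```ruby
# Hypothetical helper: double every backslash so that gsub's own
# escape pass turns each pair back into a single literal backslash.
def literal_replacement(text)
  # The block form is used so this inner gsub does no backslash
  # processing of its own on the replacement.
  text.gsub("\\") { "\\\\" }
end

raw = "\\&"   # the two characters \ and &

# Unescaped, gsub reads \& as "the matched text".
puts "&0&1".gsub(/&/, raw)                        # prints &0&1

# Escaped, the backslash and the & are inserted literally.
puts "&0&1".gsub(/&/, literal_replacement(raw))   # prints \&0\&1
```

Passing a block to gsub, as suggested earlier in the thread, avoids the need for such a helper altogether.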

Kind regards

robert

String literals have a one-pass escaping at parse time, so that

"foo\\bar\nbaz"

is an encoded way to express

foo\bar
baz

And the result of that ordinary pass is what gsub receives.

Then, at runtime, gsub inspects its argument and looks in turn for
occurrences of \1, \& and friends. That is gsub’s contract, and has no
relationship with string literal parsing.

You need double escaping for \1 and friends to get through both passes,
one related to literals and the other related to how gsub works.
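The two passes can be observed separately. This is only an illustrative sketch; the variable names are made up for the example:

```ruby
# Pass 1 (parser): a literal with four backslashes becomes a
# three-character string: backslash, backslash, "1".
replacement = "\\\\1"
puts replacement.length   # prints 3
puts replacement          # prints \\1

# Pass 2 (gsub): \\ is read as one literal backslash, so the "1"
# that follows is plain text rather than a group reference.
literal_result = "abc".gsub(/\w(.)\w/, replacement)
puts literal_result       # prints \1

# With only one level of escaping, gsub receives \1 and
# substitutes the first capture group.
group_result = "abc".gsub(/\w(.)\w/, "\\1")
puts group_result         # prints b
```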

On Jan 13, 2010, at 10:56 AM, Marnen Laibow-Koser wrote:

…and Ruby’s stupid backslash handling strikes again. This is a
completely brain-dead way to do it, and is one of the few things I
really hate about Ruby.

Is this really a Ruby snafu? It seems like it would be inherent in
any sort of character escape sequence, of which there are many
examples that have nothing at all to do with Ruby.

Any pointers to alternative encoding schemes that avoid this problem?

Gary W.

Robert K. wrote:

So what you basically do here is escape the escape so it loses
its special meaning in the replacement string. :-)

…and Ruby’s stupid backslash handling strikes again. This is a
completely brain-dead way to do it, and is one of the few things I
really hate about Ruby.

Best,
--
Marnen Laibow-Koser
http://www.marnen.org

Gary W. wrote:

On Jan 13, 2010, at 10:56 AM, Marnen Laibow-Koser wrote:

…and Ruby’s stupid backslash handling strikes again. This is a
completely brain-dead way to do it, and is one of the few things I
really hate about Ruby.

Is this really a Ruby snafu?

Yes. The problem is that Ruby “helpfully” does another level of
escaping, so that "\&" is equivalent to "&", whereas it should simply
take the escape at face value and consider it equivalent to the two
characters \ and &.

For real fun, try concatenating two strings, the first of which ends in
a backslash. It’s insane.

It seems like it would be inherent in
any sort of character escape sequence, of which there are many
examples that have nothing at all to do with Ruby.

But Ruby has its own special brand of idiocy here. Even Perl and PHP
get this right.

Any pointers to alternative encoding schemes that avoid this problem?

It has nothing to do with encoding. It’s a question of a particular
point of stupidity in Ruby’s parser and/or String class.

Gary W.

Best,
--
Marnen Laibow-Koser
http://www.marnen.org

Marnen Laibow-Koser wrote:

The problem is that Ruby “helpfully” does another level of
escaping

It’s necessary because it lets you use sequences like \n in
double-quoted strings, and \" if you want a double-quote, and \#{expr}
if you want literally #{expr} rather than interpolation.

so that "\&" is equivalent to "&", whereas it should simply
take the escape at face value and consider it equivalent to the two
characters \ and &.

Which is what happens in single-quoted strings. But you can’t put any
control character sequences like \n in those.

For real fun, try concatenating two strings, the first of which ends in
a backslash. It’s insane.

a = "abc\\" # that’s a string ending with one backslash
b = "def"
c = a + b

Looks OK to me.

But Ruby has its own special brand of idiocy here. Even Perl and PHP
get this right.

Perl is exactly the same.

#!/usr/bin/perl
print "abc\\\n";

This prints abc\ followed by a newline, the same as Ruby would.

It was probably unfortunate that gsub uses sequences like \1 on the
substitution side, though. But that’s what Perl does too:

#!/usr/bin/perl
$_ = "ab&de\n";
s/&/\&/;
print;

That prints ab&de, which is the problem the OP was grappling with.

Xavier N. wrote:

String literals have a one-pass escaping at parse time, so that

"foo\\bar\nbaz"

is an encoded way to express

foo\bar
baz

And the result of that ordinary pass is what gsub receives.

Then, at runtime, gsub inspects its argument and looks in turn for
occurrences of \1, \& and friends. That is gsub’s contract, and has no
relationship with string literal parsing.

You need double escaping for \1 and friends to get through both passes,
one related to literals and the other related to how gsub works.

Yes, I see that now. I wasn’t aware that gsub did an extra parsing
step. With that in mind, doubling backslashes makes sense.

Best,

Marnen Laibow-Koser
http://www.marnen.org

Brian C. wrote:

Marnen Laibow-Koser wrote:

The problem is that Ruby “helpfully” does another level of
escaping

It’s necessary because it lets you use sequences like \n in
double-quoted strings, and \" if you want a double-quote, and \#{expr}
if you want literally #{expr} rather than interpolation.

so that "\&" is equivalent to "&", whereas it should simply
take the escape at face value and consider it equivalent to the two
characters \ and &.

Which is what happens in single-quoted strings. But you can’t put any
control character sequences like \n in those.

I know that.

For real fun, try concatenating two strings, the first of which ends in
a backslash. It’s insane.

a = "abc\\" # that’s a string ending with one backslash
b = "def"
c = a + b

Looks OK to me.

And to me too, when I just now tried it. I did run into a problem
with this at one point, but I can’t now reproduce it. Perhaps it was
actually a gsub issue.

I’m glad to know that Ruby’s backslash handling is not as weird as I’d
thought. Thanks for the correction.

Best,

Marnen Laibow-Koser
http://www.marnen.org

On Wed, Jan 13, 2010 at 6:49 PM, Marnen Laibow-Koser wrote:

It has nothing to do with encoding. It’s a question of a particular
point of stupidity in Ruby’s parser and/or String class.

I don’t understand your point. The backslash is a special character in
string literals. If you want to include one you need to escape it.
That’s pretty normal.

What’s your complaint about parsing? This gotcha is related to gsub’s
contract, not to the rules for string literals themselves.

Thanks Brian, Robert and Xavier for your explanation. It was very
helpful.

Regards,
Sriram V…

2010/1/13 Marnen Laibow-Koser:

I’m glad to know that Ruby’s backslash handling is not as weird as I’d
thought. Thanks for the correction.

:-)

For me there is actually something weird about Ruby’s escape handling,
but it’s something else: in some circumstances Ruby allows you to
omit a backslash, which is meant to be convenient (I believe) but
leads to a certain inconsistency:

irb(main):014:0> '\1' # this might be seen as surprising
=> "\\1"
irb(main):015:0> '\\1'
=> "\\1"

We can get a single backslash by just using one, but if we need more
backslashes we need to escape:

irb(main):027:0> '\\1'
=> "\\1"
irb(main):028:0> '\\\1'
=> "\\\\1"
irb(main):029:0> '\\\\1'
=> "\\\\1"
irb(main):030:0> '\\\\\1'
=> "\\\\\\1"
irb(main):031:0> '\\\\\\1'
=> "\\\\\\1"

For double-quoted strings we always need to use two backslashes, at
least when a digit follows:

irb(main):016:0> "\1"
=> "\x01"
irb(main):017:0> "\\1"
=> "\\1"

and

irb(main):018:0> '\n' # this might be seen as surprising
=> "\\n"
irb(main):019:0> '\\n'
=> "\\n"

but

irb(main):020:0> "\n" # a single newline
=> "\n"
irb(main):021:0> "\\n" # backslash and n
=> "\\n"

Bottom line for me: I do not exploit the ‘\1’ case and have made it a
habit to use two backslashes whenever I need a literal backslash in a
string.
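A short sketch that checks these cases by comparing strings and lengths rather than reading inspect output, which itself adds backslashes. The variable names are only for illustration:

```ruby
# Single quotes: only \\ and \' are escapes; any other backslash is kept.
single_short = '\1'    # backslash + "1"
single_long  = '\\1'   # also backslash + "1"
puts single_short == single_long   # prints true
puts '\n'.length                   # prints 2 (backslash + "n")

# Double quotes: a backslash always starts an escape sequence.
puts "\n".length                   # prints 1 (a newline)
puts "\\n".length                  # prints 2 (backslash + "n")
p "\1".bytes                       # prints [1] (octal escape, character code 1)
puts "\\1" == single_short         # prints true
```

Writing two backslashes in both kinds of literal, as described above, gives the same string either way.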

Kind regards

robert

Robert K. wrote:

For me there is actually something weird about Ruby’s escape handling,
but it’s something else: in some circumstances Ruby allows you to
omit a backslash, which is meant to be convenient (I believe) but
leads to a certain inconsistency:

irb(main):014:0> '\1' # this might be seen as surprising
=> "\\1"

I think the principle is “single quoting does the absolute minimum
amount of dequoting”.

However it has to support a way to get a single-quote within a
single-quoted string, and they chose \'. As a consequence, it has to
support \\ to get a single backslash within a single-quoted string.

The question then is, should any other sequence like \1 raise an error,
or return literal \ and 1 ?

The alternative would have been to use two single quotes where you want
a single quote within a string:

'It''s that time of day'

I quite like that, but arguably it’s just confusing in a different way.

2010/1/14 Brian C.:

I think the principle is “single quoting does the absolute minimum
amount of dequoting”.

Hmm, I never thought of it that way. I’m not sure I like this principle
though.

However it has to support a way to get a single-quote within a
single-quoted string, and they chose \'. As a consequence, it has to
support \\ to get a single backslash within a single-quoted string.

The question then is, should any other sequence like \1 raise an error,
or return literal \ and 1 ?

I opt for raising a syntax error. I know, this is unlikely to happen
anytime soon if only because of the large base of code that is
potentially affected. With what I have seen over the past years, the
number of backslashes needed for proper quoting (especially for #gsub
and friends) has caused much confusion. I believe that could be
avoided by disallowing the ‘\1’.

The alternative would have been to use two single quotes where you want
a single quote within a string:

'It''s that time of day'

I quite like that, but arguably it’s just confusing in a different way.

I like the quoting approach better. :-)

Kind regards

robert