I need to pass this value with the ampersand escaped to another command
in my program.
So I tried something like this:
irb(main):038:0> a.gsub(/&/,"\&")
=> "&0&1"
That’s because \& has a special meaning in a replacement string (“the
matched string”). Either use a block to provide the replacement value
(the block’s result is not scanned for backslash sequences), or put two
backslashes in the replacement string.
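A minimal sketch of both suggestions, assuming the input is the string &0&1 seen in the irb output above and that the goal is a backslash in front of every ampersand (the variable names are only for illustration):

```ruby
a = "&0&1"   # assumed input, taken from the irb output above

# Block form: the block's return value is inserted as-is,
# so one literal backslash (written "\\") is enough.
with_block = a.gsub(/&/) { |m| "\\" + m }

# String form: the replacement must contain two backslashes,
# which takes four in a double-quoted literal.
with_string = a.gsub(/&/, "\\\\&")

puts with_block    # prints \&0\&1
puts with_string   # prints \&0\&1
```

The block form is usually easier to read, because only the string-literal pass applies to it.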
String literals have a one-pass escaping at parse time, so that
"foo\\bar\nbaz"
is an encoded way to express
foo\bar
baz
And the result of that ordinary pass is what gsub receives.
Then, at runtime, gsub inspects its argument and looks in turn for
occurrences of \1, \& and friends. That is gsub’s contract, and it has no
relationship with the parsing of string literals.
You need double escaping for \1 and friends to get through both passes:
one related to literals, the other related to how gsub works.
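The two passes can be watched separately. This is only a sketch with made-up variable names, not anything from the original posts:

```ruby
# Pass 1 (parser): four typed backslashes become two in the string.
replacement = "\\\\&"
puts replacement.length   # prints 3 (two backslashes and an ampersand)
puts replacement          # prints \\&

# Pass 2 (gsub): \\ is read as one literal backslash, then a plain &.
escaped = "a&b".gsub(/&/, replacement)
puts escaped              # prints a\&b

# With only two typed backslashes, gsub receives \& ("the matched string"),
# so the text comes back unchanged.
unchanged = "a&b".gsub(/&/, "\\&")
puts unchanged            # prints a&b
```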
On Jan 13, 2010, at 10:56 AM, Marnen Laibow-Koser wrote:
…and Ruby’s stupid backslash handling strikes again. This is a
completely brain-dead way to do it, and is one of the few things I
really hate about Ruby.
Is this really a Ruby snafu? It seems like it would be inherent in
any sort of character escape sequence, of which there are many
examples that have nothing at all to do with Ruby.
Any pointers to alternative encoding schemes that avoid this problem?
So what you basically do here is escape the escape, so that it loses
its special meaning in the replacement string.
On Jan 13, 2010, at 10:56 AM, Marnen Laibow-Koser wrote:
…and Ruby’s stupid backslash handling strikes again. This is a
completely brain-dead way to do it, and is one of the few things I
really hate about Ruby.
Is this really a Ruby snafu?
Yes. The problem is that Ruby “helpfully” does another level of
escaping, so that “\&” is equivalent to “&”, whereas it should simply
take the escape at face value and consider it equivalent to the two
characters \ and &.
For real fun, try concatenating two strings, the first of which ends in
a backslash. It’s insane.
It seems like it would be inherent in
any sort of character escape sequence, of which there are many
examples that have nothing at all to do with Ruby.
But Ruby has its own special brand of idiocy here. Even Perl and PHP
get this right.
Any pointers to alternative encoding schemes that avoid this problem?
It has nothing to do with encoding. It’s a question of a particular
point of stupidity in Ruby’s parser and/or String class.
The problem is that Ruby “helpfully” does another level of
escaping
It’s necessary because it lets you use sequences like \n in
double-quoted strings, and \" if you want a double-quote, and \#{expr}
if you want literally #{expr} rather than interpolation.
so that “\&” is equivalent to “&”, whereas it should simply
take the escape at face value and consider it equivalent to the two
characters \ and &.
Which is what happens in single-quoted strings. But you can’t put any
control character sequences like \n in those.
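For comparison, a short sketch of how the two quoting styles treat the same characters (the variable names are illustrative only):

```ruby
double = "a\nb"    # \n becomes a single newline character
single = 'a\nb'    # backslash and n stay as two separate characters

puts double.length   # prints 3
puts single.length   # prints 4

# A sequence with no special meaning, such as \&, loses its backslash
# in double quotes but is kept verbatim in single quotes.
amp_double = "\&"
amp_single = '\&'
puts amp_double.length   # prints 1
puts amp_single.length   # prints 2

# Escaping the # suppresses interpolation.
no_interp = "\#{expr}"
puts no_interp           # prints #{expr}
```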
For real fun, try concatenating two strings, the first of which ends in
a backslash. It’s insane.
a = "abc\\" # that's a string ending with one backslash
b = "def"
c = a + b
Looks OK to me.
But Ruby has its own special brand of idiocy here. Even Perl and PHP
get this right.
Perl is exactly the same.
#!/usr/bin/perl
print "abc\\\n";
This prints abc\ - same as Ruby would.
It was probably unfortunate that gsub uses sequences like \1 and the
like in the substitution side though. But that’s what perl does:
#!/usr/bin/perl
$_ = "ab&de\n";
s/&/\&/;
print;
That prints ab&de, which is the problem the OP was grappling with.
String literals have a one-pass escaping at parse time, so that
"foo\\bar\nbaz"
is an encoded way to express
foo\bar
baz
And the result of that ordinary pass is what gsub receives.
Then, at runtime, gsub inspects its argument and looks in turn for
occurrences of \1, \& and friends. That is gsub’s contract, and it has no
relationship with the parsing of string literals.
You need double escaping for \1 and friends to get through both passes:
one related to literals, the other related to how gsub works.
Yes, I see that now. I wasn’t aware that gsub did an extra parsing
step. With that in mind, doubling backslashes makes sense.
The problem is that Ruby “helpfully” does another level of
escaping
It’s necessary because it lets you use sequences like \n in
double-quoted strings, and \" if you want a double-quote, and \#{expr}
if you want literally #{expr} rather than interpolation.
so that “\&” is equivalent to “&”, whereas it should simply
take the escape at face value and consider it equivalent to the two
characters \ and &.
Which is what happens in single-quoted strings. But you can’t put any
control character sequences like \n in those.
I know that.
For real fun, try concatenating two strings, the first of which ends in
a backslash. It’s insane.
a = "abc\\" # that's a string ending with one backslash
b = "def"
c = a + b
Looks OK to me.
And to me too, when I just now tried it. I did run into a problem
with this at one point, but I can’t now reproduce it. Perhaps it was
actually a gsub issue.
I’m glad to know that Ruby’s backslash handling is not as weird as I’d
thought. Thanks for the correction.
On Wed, Jan 13, 2010 at 6:49 PM, Marnen Laibow-Koser wrote:
It has nothing to do with encoding. It’s a question of a particular
point of stupidity in Ruby’s parser and/or String class.
I don’t understand your point. The backslash is a special character in
string literals. If you want to include one you need to escape it.
That’s pretty normal.
What’s your complaint about parsing? This gotcha is related to gsub’s
contract, not to the rules for string literals themselves.
I’m glad to know that Ruby’s backslash handling is not as weird as I’d
thought. Thanks for the correction.
For me there is actually something weird about Ruby’s escape handling
but it’s something else: in some circumstances Ruby allows you to omit a backslash which is meant to be convenient (I believe) but
which leads to a certain inconsistency:
irb(main):014:0> '\1' # this might be seen as surprising
=> "\\1"
irb(main):015:0> '\\1'
=> "\\1"
We can get a single backslash by just using one, but if we need more
backslashes we need to escape:
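A small sketch of that behaviour in single-quoted literals (the variable names are made up for the example):

```ruby
one_a = '\1'      # a backslash not followed by \ or ' is kept as-is
one_b = '\\1'     # \\ collapses to one backslash
two   = '\\\\1'   # two backslashes in the string need four typed

puts one_a == one_b   # prints true
puts one_a.length     # prints 2
puts two.length       # prints 3
p one_a               # prints "\\1"
```

So one typed backslash and two typed backslashes give the same string, which is the inconsistency described above.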
For me there is actually something weird about Ruby’s escape handling
but it’s something else: in some circumstances Ruby allows you to omit a backslash which is meant to be convenient (I believe) but
which leads to a certain inconsistency:
irb(main):014:0> '\1' # this might be seen as surprising
=> "\\1"
I think the principle is “single quoting does the absolute minimum
amount of dequoting”.
However it has to support a way to get a single-quote within a
single-quoted string, and they chose \'. As a consequence, it has to
support \\ to get a single backslash within a single-quoted string.
The question then is, should any other sequence like \1 raise an error,
or return literal \ and 1 ?
The alternative would have been to use two single quotes where you want
a single quote within a string:
'It''s that time of day'
I quite like that, but arguably it’s just confusing in a different way.
I think the principle is “single quoting does the absolute minimum
amount of dequoting”.
Hmm, I never thought of it that way. I’m not sure I like this principle
though.
However it has to support a way to get a single-quote within a
single-quoted string, and they chose \'. As a consequence, it has to
support \\ to get a single backslash within a single-quoted string.
The question then is, should any other sequence like \1 raise an error,
or return literal \ and 1 ?
I opt for raising a syntax error. I know, this is unlikely to happen
anytime soon if only because of the large base of code that is
potentially affected. With what I have seen over the past years, the
number of backslashes needed for proper quoting (especially for #gsub
and friends) has caused much confusion. I believe that could be
avoided by disallowing the ‘\1’.
The alternative would have been to use two single quotes where you want
a single quote within a string:
'It''s that time of day'
I quite like that, but arguably it’s just confusing in a different way.
I like the quoting approach better.
Kind regards
robert