Sub/gsub bug?

Is this a bug?

irb(main):032:0> "matt's test".sub(/'/, "\\'")
=> "matts tests test"

It’s trying to replace a single quote with an escaped quote.

Thanks in advance,
Matt

Matt Gregory wrote:

Is this a bug?

irb(main):032:0> "matt's test".sub(/'/, "\\'")
=> "matts tests test"

It’s trying to replace a single quote with an escaped quote.

Thanks in advance,
Matt

irb(main):002:0> puts "matt's test".sub(/'/, "\\\\'")
matt\'s test
=> nil

The gsub replacement string is a crazy thing when it comes to escaping,
because remember you can put backreferences in it. Any double quoted
string needs 2 backslashes to represent a literal backslash, but
remember in this context, a literal backslash represents the
backreference syntax, so you need to escape it with another “literal
backslash” of 2 backslashes.

At least, I think that’s how I understand it :P
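
For anyone following along, here is a minimal sketch of the two escaping layers described above: first the Ruby string literal, then the replacement parser inside sub/gsub. The variable names are only illustrative.

```ruby
# Layer 1: the double-quoted literal. Layer 2: the sub/gsub replacement parser.
text = "matt's test"

# The literal "\\'" is the 2-character string \' ;
# the replacement parser reads \' as "the text after the match".
post_match = text.sub(/'/, "\\'")

# The literal "\\\\'" is the 3-character string \\' ;
# the replacement parser reads \\ as one literal backslash, then a plain quote.
escaped = text.sub(/'/, "\\\\'")

puts "\\'".length     # 2
puts "\\\\'".length   # 3
puts post_match       # matts tests test
puts escaped          # matt\'s test
```

Counting the characters in each literal first usually makes it clear what the replacement parser actually receives.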

J. Cooper wrote:

The gsub replacement string is a crazy thing when it comes to escaping,
because remember you can put backreferences in it. Any double quoted
string needs 2 backslashes to represent a literal backslash, but
remember in this context, a literal backslash represents the
backreference syntax, so you need to escape it with another “literal
backslash” of 2 backslashes.

At least, I think that’s how I understand it :P

Yeah, I get all that, but I didn’t think that \' was a backreference.
I thought it was limited to numbers and the ampersand. But now that I
think about it, Perl has a $' variable that contains the portion of
the string after the match, so I guess that’s where the idea came
from. I’ve never seen that in backreference form before.

irb(main):005:0> "matt's test".gsub(/'/, "\\`")
=> "mattmatts test"
irb(main):006:0> "matt's test".gsub(/'/, "\\&")
=> "matt's test"
irb(main):007:0> "matt's test".gsub(/'/, "\\'")
=> "matts tests test"

Yay!
Matt
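
The three results above line up with what MatchData reports as pre_match and post_match. A small sketch, assuming a single match in the string (the variable names are made up for illustration):

```ruby
text = "matt's test"
m = text.match(/'/)   # MatchData for the single quote

pre   = text.sub(/'/, "\\`")   # \` inserts the text before the match
whole = text.sub(/'/, "\\&")   # \& inserts the match itself
post  = text.sub(/'/, "\\'")   # \' inserts the text after the match

# Each result is pre_match + <inserted text> + post_match.
puts pre   == m.pre_match + m.pre_match  + m.post_match
puts whole == m.pre_match + m[0]         + m.post_match
puts post  == m.pre_match + m.post_match + m.post_match
```

All three comparisons should print true, which is the same behaviour as Perl's $`, $& and $' variables.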

Phlip wrote:

Until someone dissects the source and finds (or adds) a better fix, here’s
a workaround:

Thanks for the reply, Phlip. As it turns out, when I was researching
this problem I found out that you can use replaceable parameters in
DBI queries instead of worrying about escaping quotes, so that’s a
relief. I was just kind of curious why gsub() wasn’t working the way
I thought it would.

irb(main):012:0> "matt's test".sub(/'/){"\\'"}
=> "matt\\'s test"

That surprised me too, because I thought that sub’s block form also
interpolated \n group inserters, but I suppose that blocks have room for
true $1 match references. \1 is often available in some regular expressions
that cram all the searchers and replacers into one huge line, so a $1 is
naturally not available inside its own source expression. (I also don’t
happen to know if Ruby supports such super-regices!)

Yeah that’s strange that they would write two replacement string
parsers, but I guess there’s probably a good reason for it.

Super-regices? You mean like s/xxx/yyy/g? I haven’t seen that
anywhere in my Ruby travels, but I’m kind of new.

Matt

On 18.05.2008 06:33, Phlip wrote:

Matt wrote:

Is this a bug?

irb(main):032:0> "matt's test".sub(/'/, "\\'")
=> "matts tests test"

It’s trying to replace a single quote with an escaped quote.

To understand the bug, try this:

There is no bug.

=> "matt's test"

The " forces us to escape the \ as \\ to get a literal \.

And a literal \ in a replacement string is a meta character. To get a
literal \ in a replacement, you need \\ in the string and consequently
\\\\ in a double quoted string.

So your \\' indeed goes in as a literal \'. But the group replacer still
sees a \, and snarfs it, expecting a number after it. I would call that a
bug (essentially because I might personally be capable of correctly parsing
a \1!)

No, this is not a bug and it is not the “group replacer”. The
replacement string is parsed by the regexp engine and since the
backslash does not escape a character that has meta capabilities (such
as & and 1…9) it is silently discarded.

Until someone dissects the source and finds (or adds) a better fix, here’s
a workaround:

irb(main):012:0> "matt's test".sub(/'/){"\\'"}
=> "matt\\'s test"

No, the proper way to do it is this:

irb(main):002:0> "matt's test".sub(/'/, "\\\\'")
=> "matt\\'s test"
irb(main):003:0> puts "matt's test".sub(/'/, "\\\\'")
matt\'s test
=> nil

irb(main):006:0> "matt's test".sub(/'/, '\\\\\'')
=> "matt\\'s test"
irb(main):007:0> puts "matt's test".sub(/'/, '\\\\\'')
matt\'s test
=> nil

Blocks are only needed if you need to do some calculations based on the
match, e.g.

irb(main):008:0> "There is 1 number in this string".gsub(/\d+/) {|m| m.to_i * 34}
=> "There is 34 number in this string"

That surprised me too, because I thought that sub’s block form also
interpolated \n group inserters, but I suppose that blocks have room for
true $1 match references.

Yes.
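
A short sketch of that difference, using a made-up name string: inside the block, $1 and $2 refer to the current match, while a backslash sequence returned from the block is inserted as plain text.

```ruby
text = "John Smith"

# String form: \1 and \2 are expanded by the replacement parser.
via_string = text.sub(/(\w+) (\w+)/, '\2, \1')

# Block form: $1 and $2 are already set for the current match.
via_block = text.sub(/(\w+) (\w+)/) { "#{$2}, #{$1}" }

# Block form: the return value is inserted as-is, so \2 and \1 are not expanded.
not_expanded = text.sub(/(\w+) (\w+)/) { '\2, \1' }

puts via_string     # Smith, John
puts via_block      # Smith, John
puts not_expanded   # \2, \1
```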

\1 is often available in some regular expressions
that cram all the searchers and replacers into one huge line, so a $1 is
naturally not available inside its own source expression. (I also don’t
happen to know if Ruby supports such super-regices!)

I am not sure what you mean here. The reason $1 cannot be used in
regular expressions and replacement strings is that it is a variable
that gets its value before the regexp is created (of course you can use
it but it cannot refer to matching of the regexp you put it into).
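
To illustrate the point about $1 getting its value too early, here is a small sketch. The earlier match against "zzz" is only there to give $1 a stale value; the names stale and fresh are illustrative.

```ruby
# An earlier, unrelated match leaves $1 set to "z".
"zzz" =~ /(z)/

# "[#{$1}]" is interpolated before gsub is even called, so it uses the old $1.
stale = "aba abc".gsub(/(.)(b)\1/, "[#{$1}]")

# "[\\1]" is handed to gsub as the text [\1] and expanded for each match.
fresh = "aba abc".gsub(/(.)(b)\1/, "[\\1]")

puts stale   # [z] abc
puts fresh   # [a] abc
```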

You can use groups inside the regular expression as well as in the
replacement string if this is what you mean.

irb(main):017:0> "aba abc".gsub(/(.)(b)\1/, "[\\1]<\\2>[\\1]")
=> "[a]<b>[a] abc"

Kind regards

robert

On 18.05.2008 13:38, Robert K. wrote:

To understand the bug, try this:
Now try it with “double quotes”:
So your \\' indeed goes in as a literal \'. But the group replacer
still sees a \, and snarfs it, expecting a number after it. I would
call that a bug (essentially because I might personally be capable of
correctly parsing a \1!)

No, this is not a bug and it is not the “group replacer”. The
replacement string is parsed by the regexp engine and since the
backslash does not escape a character that has meta capabilities (such
as & and 1…9) it is silently discarded.
^^^^^^^^^^^^^^^^^^

This was wrong of course as Matt demonstrated

irb(main):019:0> "abc".gsub(/b/, "<\\'>")
=> "a<c>c"

But the backslash is just retained if it does not appear with a meta
capable character:

irb(main):020:0> "abc".gsub(/b/, "<\\a>")
=> "a<\\a>c"
irb(main):021:0> puts "abc".gsub(/b/, "<\\a>")
a<\a>c
=> nil
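
Putting the cases from this thread side by side, as a rough sketch rather than a full list of what the replacement parser accepts:

```ruby
kept    = "abc".gsub(/b/, "<\\a>")     # \a has no special meaning, so the backslash stays
post    = "abc".gsub(/b/, "<\\'>")     # \' expands to the text after the match ("c")
whole   = "abc".gsub(/b/, "<\\0>")     # \0 expands to the whole match ("b")
literal = "abc".gsub(/b/, "<\\\\'>")   # \\ yields one literal backslash, then a plain '

puts kept      # a<\a>c
puts post      # a<c>c
puts whole     # a<b>c
puts literal   # a<\'>c
```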

Sorry, for the added confusion.

Kind regards

robert