String#gsub escaping special characters


#1

Ugh, just got bitten by trying to replace ' w/ \' in a string (backslash
apostrophe).

Turns out that \' is a regex interpolator, just like \1, \2, so
"a'b'c'd".gsub("'","\\'") did not work, nor did it with the 2nd param as
'\\\''.
The magic incantation from trial and error is:
"\\\\'"

Yuck!
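
For anyone following along, a minimal sketch of the behaviour described above, using only the standard String#gsub. In a replacement string, \' expands to the text after the match, so a lone backslash-apostrophe never comes out literally; doubling the backslash that gsub sees does the job. The variable names are just for illustration.

```ruby
s = "a'b'c'd"

# "\\'" is the two characters \ and ' once Ruby has parsed the literal.
# gsub reads \' as "the text after the match", so each apostrophe is
# replaced by the remainder of the string.
post_match = s.gsub("'", "\\'")

# "\\\\'" is the three characters \ \ ' after parsing.
# gsub reads \\ as one literal backslash, followed by a plain apostrophe.
escaped = s.gsub("'", "\\\\'")

puts post_match
puts escaped
```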

Online documentation could certainly be improved.

-Gary


#2

On Tue, Feb 24, 2009 at 8:31 AM, Gary Y. removed_email_address@domain.invalid
wrote:

> The magic incantation from trial and error is:
> "\\\\'"

that should be "\\\\'" or '\\\\\''.

i feel dirty now.


#3

> Turns out that \' is a regex interpolator, just like \1, \2, so
> "a'b'c'd".gsub("'","\\'") did not work, nor did it with the 2nd param as
> '\\\''.

The backslash in the string is first interpreted by Ruby and then as a
regexp substitution pattern. Thus "\\x" becomes \x as substitution
pattern but that really is just x then because there is no special
substitution for \x. In order to replace x with \x, the substitution
has to be \\x but since this is a string parsed by Ruby before it gets
there you have to escape those backslashes and make it "\\\\x".
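
The two parsing stages are easier to see by looking at what each literal actually contains before gsub gets it. A small sketch (variable names are made up for the example):

```ruby
# Stage 1: the Ruby parser turns each literal into plain characters.
one  = "\\'"     # double-quoted: \\ -> \   result is 2 chars: \ '
two  = '\\\''    # single-quoted: \\ -> \ and \' -> '   also 2 chars: \ '
four = "\\\\'"   # double-quoted: \\ -> \ twice   result is 3 chars: \ \ '

[one, two, four].each { |r| puts "#{r.length} chars: #{r}" }

# Stage 2: gsub reads whatever backslashes are left as its own escapes.
# Only the 3-character form gives it an escaped backslash (\\) to turn
# into a single literal backslash.
puts "a'b".gsub("'", four)
```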

It really isn’t that surprising but I agree that it would be nice to
have a special string syntax that disables any special handling of
backslashes so that you could write %X{\'}. I don’t think such a
syntax exists, does it?


Leo

The end is here -->


#4

> The backslash in the string is first interpreted by Ruby and then as a
> regexp substitution pattern. Thus "\\x" becomes \x as substitution
> pattern but that really is just x then because there is no special
> substitution for \x. In order to replace x with \x, the substitution
> has to be \\x but since this is a string parsed by Ruby before it gets
> there you have to escape those backslashes and make it "\\\\x".

Or more likely, I was thinking of another language, which probably
explains my faulty explanation. You’re right. Note to self: Never post
without testing what you are posting. Sorry.
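
For the record, a quick check of the \x case. As far as I can tell, gsub leaves a backslash alone when the character after it has no special meaning in a replacement, so the doubling only matters for sequences such as \', \&, \1 and \\ itself. This is a sketch on a toy string, not a statement about every Ruby version.

```ruby
# "\\x" is the two characters \ and x by the time gsub sees it.
# \x is not a special replacement sequence, so it is copied through.
single = "axb".gsub("x", "\\x")

# "\\\\x" is the three characters \ \ x; gsub collapses \\ to one backslash.
doubled = "axb".gsub("x", "\\\\x")

puts single
puts doubled
puts single == doubled
```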


Leo



#5

On Tue, Feb 24, 2009 at 10:34 PM, Leo removed_email_address@domain.invalid wrote:

> It really isn’t that surprising but I agree that it would be nice to
> have a special string syntax that disables any special handling of
> backslashes so that you could write %X{\'}. I don’t think such a
> syntax exists, does it?

We can use %q{} to eliminate one of the backslashes:
$: irb #(edited)
01> s = "a'b"
02> puts s.sub( "'", %q{\\\'} )
a\'b

That at least gets it down to one backslash per escape character :)
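
To see why the %q form needs one backslash fewer, it may help to compare the parsed strings directly. A small sketch (names are illustrative): inside %q{} only \\ and an escaped delimiter are processed, so \' passes through untouched.

```ruby
raw    = %q{\\\'}   # \\ -> \ and \' stays as \'   result: \ \ '
quoted = "\\\\'"    # \\ -> \ twice, then '        result: \ \ '

puts raw.length        # both literals parse to the same three characters
puts raw == quoted

s = "a'b"
puts s.sub("'", raw)   # sub sees \\ followed by ' and emits \'
```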

Cheers,
lasitha


#6

Gary Y. removed_email_address@domain.invalid wrote:

> i feel dirty now.

This comes up a lot, including in a post of mine where I tripped over
much the same thing. The real solution is: except in very simple cases,
don’t use gsub(/regex/, "string"); use the block form instead.

m.
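
A minimal sketch of the block form, on the same toy string used earlier in the thread. The block's return value is inserted as-is, so the only escaping left is the ordinary string-literal kind. The second pattern is a made-up example just to show the match being passed in.

```ruby
s = "a'b'c'd"

# The block's return value is used verbatim: \' and \1 are not expanded,
# so "\\'" (the two characters \ and ') is exactly what gets inserted.
with_block = s.gsub("'") { "\\'" }
puts with_block

# The matched text is passed to the block, so one block can escape
# several different characters (hypothetical pattern for illustration).
prefixed = s.gsub(/['"]/) { |m| "\\" + m }
puts prefixed
```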