Backslash sequences \1 \2 in regexes (backreferences)

Is this behavior documented anywhere:

puts "fred:smith".gsub(/(\w+):(\w+)/, '\2, \1')

--output:--
smith, fred

puts "abc".gsub(/a(b)(c)/, "a\2\1")

--output:--
a

The double quotes surrounding the replacement string cause the backslash
sequences to stop working. With single quotes the backslash sequences
work. I can’t find anything in pickaxe2 about that. My understanding
was that double quotes allowed for more substitutions than single
quotes. This appears to be a case where double quotes allow fewer
substitutions than single quotes.

On 1-Nov-07, at 9:47 PM, 7stud -- wrote:

substitutions than single quotes.

Posted via http://www.ruby-forum.com/.

Inside double quotes, \1 and \2 are escape sequences, so Ruby converts
each one to a single character before gsub ever sees the string.

ratdog:~ mike$ ruby -e 'puts "abc".gsub(/a(b)(c)/, "a\2\1")' | od -c
0000000 a 002 001 \n
0000004

ratdog:~ mike$ irb
irb(main):001:0> 'a\1\2'.length
=> 5
irb(main):002:0> "a\1\2".length
=> 3
irb(main):003:0> "a\2\1"
=> "a\002\001"

The \2 and \1 are each converted into a single character inside the
double quotes.
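A minimal sketch of the same point, using a hypothetical helper named swap (not part of the original posts). It compares the bytes of the two literals and what gsub produces from each:

```ruby
# Single quotes keep the backslash, so gsub receives the two-character
# sequence \2 and treats it as a backreference.
SINGLE = 'a\2\1'

# Double quotes turn \2 and \1 into the characters with codes 2 and 1
# while the literal is parsed, before gsub is ever called.
DOUBLE = "a\2\1"

# Hypothetical helper for illustration only.
def swap(replacement)
  "abc".gsub(/a(b)(c)/, replacement)
end

p SINGLE.bytes        # the backslash is byte 92
p DOUBLE.bytes        # only three bytes remain
p swap(SINGLE)        # backreferences are expanded
p swap(DOUBLE).bytes  # control characters are inserted literally
```

Looking at bytes rather than printed text avoids the problem that the characters with codes 1 and 2 are invisible on a terminal.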

Table 22.2 in The Basic Types says \nnn goes to Octal nnn, and here
you see 8 (not a valid octal digit) doesn’t get treated the same way
as 1 and 2:

irb(main):004:0> "a\2\1\8"
=> "a\002\0018"
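The octal rule can be checked directly. This is a small sketch; the constant names are made up for illustration:

```ruby
# "\nnn" with one to three octal digits names the character with that code.
OCTAL_A   = "\101"  # octal 101 is decimal 65, the letter A
OCTAL_ONE = "\1"    # a single octal digit works too

# 8 is not an octal digit, so the backslash is dropped and the 8 is kept.
NOT_OCTAL = "\8"

p OCTAL_A
p OCTAL_ONE.bytes
p "\002" == "\2"    # leading zeros do not change the character
p NOT_OCTAL
```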

Hope this helps,

Mike

Mike S. [email protected]
http://www.stok.ca/~mike/

The “`Stok’ disclaimers” apply.

On Nov 1, 7:47 pm, [email protected] wrote:

Is this behavior documented anywhere:

Yes. In many Ruby books, in at least one Ruby FAQ, and many, many
times on the ruby mailing list/forum/newsgroup.

Mike S. wrote:

Table 22.2 in The Basic Types says \nnn goes to Octal nnn,

Ah. So, \1 and \2 are interpreted as octal character codes. I was
using the following puts statement to debug:

puts "abc".gsub(/a(b)(c)/, "a\2\1") + "<---"

--output:--
a<---

I should have been using:

p "abc".gsub(/a(b)(c)/, "a\2\1")

--output:--
"a\002\001"

Since the ASCII codes 1 and 2 represent non-printable characters, puts
gave me no visible output for them.
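A sketch of that debugging approach. The constant RESULT is a name chosen here, and the exact escaped form that p prints can differ between Ruby versions, so the byte values are the more reliable check:

```ruby
# The replacement is double-quoted on purpose, to reproduce the problem.
RESULT = "abc".gsub(/a(b)(c)/, "a\2\1") + "<---"

puts RESULT     # the two control characters are written but not visible
p RESULT        # inspect form shows them as escapes
p RESULT.bytes  # numeric codes, independent of how inspect formats them
```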

My question stemmed from this passage about gsub() in pickaxe2 on p.
613:

“If a string is used as the replacement, special variables from the
match (such as $& and $1) cannot be substituted into it, as the
substitution into the string occurs before the pattern match starts.
However, the sequences \1, \2 and so on may be used to interpolate
successive groups in the match.”

That makes it sound like \1 and \2 can be freely used in the replacement
string. There is no mention of the fact that single quotes are required
to keep them from being interpreted as chars written in octal. That
description is very misleading.
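The distinction the pickaxe passage draws can be sketched as follows. The method names are hypothetical; the block form of gsub is standard Ruby:

```ruby
# String form with interpolation: "#{$2}, #{$1}" is built before gsub
# runs. Inside this fresh method no match has happened yet, so $1 and
# $2 are nil and interpolate as empty strings.
def swap_with_interpolation(text)
  text.gsub(/(\w+):(\w+)/, "#{$2}, #{$1}")
end

# Block form: the block runs once per match, after $1 and $2 are set.
def swap_with_block(text)
  text.gsub(/(\w+):(\w+)/) { "#{$2}, #{$1}" }
end

p swap_with_interpolation("fred:smith")
p swap_with_block("fred:smith")
```

The block form sidesteps the quoting question entirely, because no backslash sequences are involved.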

On Nov 2, 2007, at 12:59 AM, 7stud -- wrote:

That makes it sound like \1 and \2 can be freely used in the replacement
string. There is no mention of the fact that single quotes are required
to keep them from being interpreted as chars written in octal. That
description is very misleading.

No, it’s not. That single quotes are required has nothing to do with
gsub. It’s something you should know from your understanding of how
the Ruby interpreter handles double-quoted strings. As Mike S. said,
the string literal is converted to "a\002\001" long before gsub is
called.

Regards, Morton

Morton G. wrote:

No, it’s not. That single quotes are required has nothing to do with
gsub. It’s something you should know from your understanding of how
the Ruby interpreter handles double-quoted strings. As Mike S. said,
the string literal is converted to "a\002\001" long before gsub is
called.

You should simply use “double-quote-double-quote”

irb(main):001:0> puts "fred:smith".gsub(/(\w+):(\w+)/, '\2, \1')
smith, fred

Wolfgang Nádasi-Donner
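
If double quotes are needed for the replacement (for example, to interpolate other values), doubling each backslash is one way to keep the backreferences intact. This is a sketch under that assumption, not necessarily what the post above intended, and the method name is hypothetical:

```ruby
# "\\2" in double quotes is a literal backslash followed by 2, which is
# exactly what '\2' is in single quotes.
def swap_double_quoted(text)
  text.gsub(/(\w+):(\w+)/, "\\2, \\1")
end

p "\\2, \\1" == '\2, \1'
p swap_double_quoted("fred:smith")
```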