Regex backreference weirdness

It all started with trying to convert some strings with underscores in
them to camel case… and me thinking of regexes as sed regexes.

It looks like gsub saves back references, but only after the whole
method exits, so it can’t use what it found. Is this right?

Below is what I tried, which leads me up to the question of… what
would the right way be of doing this?

irb>"camel_case".gsub(/_(.)/,$1.upcase)
NoMethodError: undefined method `upcase' for nil:NilClass
from (irb):1
#Which made me say Huh? It looks like a valid back reference to me…
#So I tried
irb>"camel_case"=~/_(.)/
irb>puts $1
=>"c"
#ok really weird…
irb>"camel_case".gsub(/_(.)/,$1.upcase)
=>"camelCase"
#Right, I'll buy that since $1 is still hanging around
irb>"camel_face".gsub(/_(.)/,$1.upcase)
=>"camelCace"
irb>"camel_face".gsub(/_(.)/,$1.upcase)
=>"camelFace"
#Ha ha! gsub DOES save a backreference… so why isn't this working?! :(

On 7/19/07, Kyle S. wrote:

irb>"camel_case".gsub(/_(.)/,$1.upcase)
#Right, I'll buy that since $1 is still hanging around
irb>"camel_face".gsub(/_(.)/,$1.upcase)
=>"camelCace"
irb>"camel_face".gsub(/_(.)/,$1.upcase)
=>"camelFace"
#Ha ha! gsub DOES save a backreference… so why isn't this working?! :(

Because you’re not using the right form:

EITHER
string.gsub(regexp, '\1')
OR
string.gsub(regexp) { |match| $1 }
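
Applied to the camel-case problem, the two forms might look like the sketch below. The bracketed replacement is only there for illustration; the point is that the string form can reinsert the captured text but not transform it, so the upcasing has to go in the block form.

```ruby
# String form: gsub itself expands '\1', once per match.
# It can reinsert the captured text but cannot call methods on it.
wrapped = "camel_case".gsub(/_(.)/, '[\1]')

# Block form: the block runs once per match, after $1 has been set,
# so ordinary Ruby (here, upcase) can be applied to the capture.
camel = "camel_case".gsub(/_(.)/) { $1.upcase }

puts wrapped  # camel[c]ase
puts camel    # camelCase
```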

On Jul 19, 2007, at 6:23 PM, Kyle S. wrote:

Below is what I tried, which leads me up to the question of… what
would the right way be of doing this?

irb>"camel_case".gsub(/_(.)/,$1.upcase)
NoMethodError: undefined method `upcase' for nil:NilClass
from (irb):1

I think, in this case, you will have to use a block. For example:

"camel_case".gsub(/_(.)/) { $1.upcase } # => "camelCase"

or

"camel_case".gsub(/_./) { |m| m[1, 1].upcase } # => "camelCase"

Regards, Morton

Instead of reinventing the wheel, you could always use the camelcase
converter that Rails has and pull out what you need:

http://api.rubyonrails.com/classes/ActiveSupport/CoreExtensions/String/Inflections.html

Regards

Mikel

2007/7/20, Kyle S.:

It all started with trying to convert some strings with underscores in
them to camel case… and me thinking of regexes as sed regexes.

It looks like gsub saves back references, but only after the whole
method exits, so it can’t use what it found. Is this right?

Below is what I tried, which leads me up to the question of… what
would the right way be of doing this?

irb>"camel_case".gsub(/_(.)/,$1.upcase)

You need to be aware that $1.upcase is evaluated before the method
call, so it can never do calculations based on match state. You want
to use the block form instead, where the block is invoked once per
match. For example, you can do

irb(main):005:0> "camel_case".gsub(/(?:\A|_)(.)/) {|m| $1.capitalize }
=> "CamelCase"
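
To make the evaluation order concrete, here is a small sketch. The input "x_y" and the variable names are only illustrative; the idea is that the non-block call behaves exactly like passing a string that was computed beforehand.

```ruby
# Prime $1 with an unrelated match so that it is not nil.
"x_y" =~ /_(.)/            # $1 is now "y"

# Arguments are evaluated before gsub runs, so $1.upcase here
# uses the stale "y" from the match above.
stale = "camel_case".gsub(/_(.)/, $1.upcase)

# Equivalent: compute the replacement first, then pass a fixed string.
"x_y" =~ /_(.)/            # reset $1 to "y" (the gsub above changed it)
replacement = $1.upcase    # "Y"
fixed = "camel_case".gsub(/_(.)/, replacement)

# The block is called once per match, with $1 set for that match.
fresh = "camel_case".gsub(/_(.)/) { $1.upcase }

puts stale  # camelYase
puts fixed  # camelYase
puts fresh  # camelCase
```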

NoMethodError: undefined method `upcase' for nil:NilClass
from (irb):1
#Which made me say Huh? It looks like a valid back reference to me…

No, with the non block form you need to use \1, \2 etc. as has been
mentioned already.

irb>"camel_face".gsub(/_(.)/,$1.upcase)
=>"camelFace"
#Ha ha! gsub DOES save a backreference… so why isn't this working?! :(

You’re still working on the value of $1 from the last invocation.
Proper backreferencing in the non block form looks like this:

irb(main):010:0> "camel_case".gsub /[cde]/, '<\&>'
=> "<c>am<e>l_<c>as<e>"
irb(main):011:0> "camel_case".gsub /c(.)/, '<\1>'
=> "<a>mel_<a>se"
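
One pitfall with the non-block form is worth a quick sketch: the backslash sequence only reaches gsub intact from a single-quoted string, or from a double-quoted one with the backslash doubled. In a double-quoted string, "\1" is an octal escape for the character with code 1, not a backreference.

```ruby
single  = "hello".gsub(/([aeiou])/, '<\1>')    # backreference
doubled = "hello".gsub(/([aeiou])/, "<\\1>")   # same thing, backslash escaped
octal   = "hello".gsub(/([aeiou])/, "<\1>")    # "\1" is the byte 0x01 here

puts single                  # h<e>ll<o>
puts doubled                 # h<e>ll<o>
puts octal.include?("\x01")  # true
```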

Regards

robert

I completely and utterly forgot about the block form of gsub.
Perfect. Thanks everyone!

But it does make me wonder, for the non block form, when you use the
\1 variable, I can see how to use it inside of other strings, but how
would you go about running other methods on it? In this case upcase.
Or is there no way?

As far as re-inventing the wheel, it’s important to know the hows and
whys, even if you don’t end up implementing it yourself :)

–Kyle

From: Kyle S.

#Ha ha! gsub DOES save a backreference… so why isn't this working?! :(

I think the behaviour is documented.

root@pc4all:~# qri string#gsub
------------------------------------------------------------ String#gsub
str.gsub(pattern, replacement) => new_str
str.gsub(pattern) {|match| block } => new_str

 Returns a copy of str with all occurrences of pattern replaced
 with either replacement or the value of the block. The pattern
 will typically be a Regexp; if it is a String then no regular
 expression metacharacters will be interpreted (that is /\d/ will
 match a digit, but '\d' will match a backslash followed by a 'd').

 If a string is used as the replacement, special variables from the
 match (such as $& and $1) cannot be substituted into it, as
 substitution into the string occurs before the pattern match
 starts. However, the sequences \1, \2, and so on may be used to
 interpolate successive groups in the match.

 In the block form, the current match string is passed in as a
 parameter, and variables such as $1, $2, $`, $&, and $' will be
 set appropriately. The value returned by the block will be
 substituted for the match on each call.

 The result inherits any tainting in the original string or any
 supplied replacement string.

    "hello".gsub(/[aeiou]/, '*')              #=> "h*ll*"
    "hello".gsub(/([aeiou])/, '<\1>')         #=> "h<e>ll<o>"
    "hello".gsub(/./) {|s| s[0].to_s + ' '}   #=> "104 101 108 108 111 "

root@pc4all:~#

kind regards -botp
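
As a rough illustration of the block-form paragraph in that documentation, the sketch below records what the block sees on each call. The array name seen is just for the example.

```ruby
seen = []
result = "camel_case".gsub(/_(.)/) do |match|
  # match is the whole matched text; $1, $` and $' describe this match.
  seen << [match, $1, $`, $']
  $1.upcase
end

p seen       # [["_c", "c", "camel", "ase"]]
puts result  # camelCase
```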

On 20.07.2007 15:45, Kyle S. wrote:

But it does make me wonder, for the non block form, when you use the
\1 variable, I can see how to use it inside of other strings, but how
would you go about running other methods on it? In this case upcase.
Or is there no way?

Precisely.

robert
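
To put that answer in code: the \1 in a replacement string is just text that gsub expands later, so calling a method on it acts on the literal backslash and digit, not on the captured character. A minimal sketch, where camelize is a hypothetical helper name chosen for this example (it is not the Rails method mentioned earlier):

```ruby
# '\1'.upcase runs before gsub and upcases the two characters \ and 1,
# which leaves them unchanged, so the capture is reinserted as-is.
unchanged = "camel_case".gsub(/_(.)/, '\1'.upcase)

# Hypothetical helper: the block form is where a capture can be transformed.
def camelize(str)
  str.gsub(/_(.)/) { Regexp.last_match(1).upcase }
end

puts unchanged                   # camelcase
puts camelize("camel_case")      # camelCase
puts camelize("some_long_name")  # someLongName
```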