String.gsub with regex and block

luislavena · April 7, 2011, 5:25pm

Probably a stupid question, but is there a way to use :gsub replacement
without $0 $1 $2 $3 (and without “\0\1\2\3”)?

I would prefer something like:

"John S.".gsub /(.+)\s(.+)/ do |name, family|
  p [name, family]

  # instead of this
  p [$1, $2]
end

axyd80 · April 7, 2011, 5:43pm

On Fri, 2011-04-08 at 00:25 +0900, Alexey P. wrote:

"John S.".gsub /(.+)\s(.+)/ do |name, family|
  p [name, family]

  # instead of this
  p [$1, $2]
end

is it a requirement that you use gsub?

irb(main):008:0> name, family = "John Smith".split
=> ["John", "Smith"]
irb(main):009:0> p [name, family]
["John", "Smith"]
=> nil

axyd80 · April 7, 2011, 5:48pm

Alexey P. wrote in post #991484:

Probably a stupid question, but is there a way to use :gsub replacement
without $0 $1 $2 $3 (and without “\0\1\2\3”)?

There is also $~ (Regexp.last_match); $1/$2/etc are just a facade.

I would prefer something like:

"John S.".gsub /(.+)\s(.+)/ do |name, family|
  p [name, family]

  # instead of this
  p [$1, $2]
end

"John S.".gsub /(.+)\s(.+)/ do
  name, family = $~.captures
  p [name, family]
end

Not pretty, but you can wrap it up in your own method:

class String
  def gsubcap(*arg)
    gsub(*arg) { yield $~.captures }
  end
end

"John S.".gsubcap /(.+)\s(.+)/ do |name, family|
  p [name, family]
end
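
To see the wrapper perform an actual replacement, here is a minimal sketch that reuses the gsubcap method from this post. The sample string "John Smith", the \w+ pattern, and the "family, name" output format are assumptions chosen for illustration, not part of the original question.

```ruby
# Reopens String to add the gsubcap helper described in the post above.
class String
  def gsubcap(*args)
    # $~ is the MatchData of the current match; captures is the array of groups.
    # Yielding the array lets a block with several parameters receive one group each.
    gsub(*args) { yield $~.captures }
  end
end

result = "John Smith".gsubcap(/(\w+)\s(\w+)/) do |name, family|
  "#{family}, #{name}"   # the block's return value becomes the replacement
end

puts result  # prints "Smith, John"
```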

axyd80 · April 7, 2011, 5:57pm

Or if you are a ruby 1.9 user, you could use named capture groups
instead. I’m not sure they make the regexp itself any clearer in this
case though:

"John S.".gsub /(?<name>.+)\s(?<family>.+)/ do
  p [$~[:name], $~[:family]]
end
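
As a rough sketch of how named groups look when the block actually returns a replacement: the helper name abbreviate_name, the \w+ pattern, and the sample input are all hypothetical, and the example assumes Ruby 1.9 or later.

```ruby
# Hypothetical helper: abbreviates the first name using named capture groups.
def abbreviate_name(full_name)
  full_name.gsub(/(?<name>\w+)\s(?<family>\w+)/) do
    m = Regexp.last_match            # the MatchData for the current match (same as $~)
    "#{m[:name][0]}. #{m[:family]}"  # named groups are looked up by symbol
  end
end

puts abbreviate_name("John Smith")  # prints "J. Smith"
```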

axyd80 · April 7, 2011, 8:10pm

Alexey P. wrote in post #991484:

Probably a stupid question, but is there a way to use :gsub replacement
without $0 $1 $2 $3 (and without “\0\1\2\3”)?

Where are you replacing anything?

I would prefer something like:

"John S.".gsub /(.+)\s(.+)/ do |name, family|
  p [name, family]

  # instead of this
  p [$1, $2]
end

"John Smith".scan(/\S+/) do |match|
  puts match
end

--output:--
John
Smith
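
When no replacement is needed, scan comes close to the block syntax asked for in the first post: if the pattern contains capture groups, scan yields the groups of each match rather than the whole match. A small sketch, where the helper name name_pairs and the \w+ pattern are assumptions made for illustration:

```ruby
# Hypothetical helper: collects [name, family] pairs found in the text.
def name_pairs(text)
  pairs = []
  # With groups in the pattern, scan yields an array of captures per match,
  # which Ruby spreads across the block parameters.
  text.scan(/(\w+)\s(\w+)/) do |name, family|
    pairs << [name, family]
  end
  pairs
end

p name_pairs("John Smith")  # prints [["John", "Smith"]]
```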

axyd80 · April 7, 2011, 8:22pm

Brian C. wrote in post #991490:

Alexey P. wrote in post #991484:

Probably a stupid question, but is there a way to use :gsub replacement
without $0 $1 $2 $3 (and without “\0\1\2\3”)?

There is also $~ (Regexp.last_match); $1/$2/etc are just a facade.

I would prefer something like:

"John S.".gsub /(.+)\s(.+)/ do |name, family|
  p [name, family]

  # instead of this
  p [$1, $2]
end

"John S.".gsub /(.+)\s(.+)/ do
  name, family = $~.captures
  p [name, family]
end

And if you want to avoid writing code in Perl:

str = "John S."
pattern = /(.+)\s(.+)/

result = str.gsub(pattern) do
  md_obj = Regexp.last_match
  first_name, last_name = md_obj[1], md_obj[2]

  p first_name, last_name
end

Or to avoid any indexing at all, you could do this:

str = "John Smith"
pattern = /(.+)\s(.+)/

result = str.gsub(pattern) do |match|
  first_name, last_name = match.split
  p first_name, last_name

  "some replacement"
end

puts result

--output:--
"John"
"Smith"
some replacement

axyd80 · April 7, 2011, 8:41pm

On Thu, Apr 7, 2011 at 8:10 PM, 7stud wrote:

"John Smith".scan(/\S+/) do |match|
  puts match
end

irb(main):001:0> "John;Smith".scan /\S+/ do |match|
irb(main):002:1* puts match
irb(main):003:1> end
John;Smith
=> "John;Smith"

Oops.

Better:

irb(main):04:0> "John;Smith".scan /\w+/ do |match|
irb(main):05:1* puts match
irb(main):06:1> end
John
Smith

The code still makes assumptions about the data, though: it assumes the
data is uniform, so that the name is always exactly the first n parts
and never n+1 or n-1.

–
Phillip G.


axyd80 · April 7, 2011, 7:13pm

On Fri, Apr 08, 2011 at 12:36:23AM +0900, Reid T. wrote:

=> ["John", "Smith"]
irb(main):009:0> p [name, family]
["John", "Smith"]
=> nil

This solution is not very generalizable. It only works as presented for
cases where all the stuff you want to discard looks the same. I wouldn’t
want to have to deal with this kind of thing for a particularly complex
case:

str = 'John S.: A Good Man -- A Good Husband.  RIP (1976)'

# each step peels one field off the front and keeps the rest for the next split
first, remainder = str.split(' ', 2)
last, non_name = remainder.split(': ', 2)
desc1, the_rest = non_name.split(' -- ', 2)
desc2, deceased = the_rest.split('.  ', 2)

. . . so, is there some way to use more descriptive variable names than
the default $1, $2, et cetera, for captures from within a regex? I’m not
aware of any, but I too would find that agreeable.
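
One possible answer, sketched under assumptions: in Ruby 1.9 and later, when a regex literal with named groups stands on the left-hand side of =~, a successful match assigns each group to a local variable of the same name. The method name parse_epitaph and the pattern below are hypothetical and tailored to the sample string from this post; real data would likely need a more forgiving pattern.

```ruby
# Hypothetical parser for strings shaped like the sample in this post.
def parse_epitaph(text)
  # The regex literal must be on the left of =~ for the named groups
  # to become local variables (first, last, desc1, desc2, year).
  if /\A
       (?<first>\S+)\s+(?<last>[^:]+):\s+  # first and last name, up to the colon
       (?<desc1>.+?)\s+--\s+               # first description
       (?<desc2>.+?)\.\s+                  # second description, up to the period
       RIP\s+\((?<year>\d{4})\)            # year in parentheses
     \z/x =~ text
    { first: first, last: last, desc1: desc1, desc2: desc2, year: year }
  end
end

p parse_epitaph('John S.: A Good Man -- A Good Husband.  RIP (1976)')
```

The method returns nil when the text does not fit the pattern, so callers can tell a failed parse from a successful one.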

axyd80 · April 7, 2011, 10:57pm

Alexey P. wrote in post #991484:

I would prefer something like:

"John S.".gsub /(.+)\s(.+)/ do |name, family|
  p [name, family]

  # instead of this
  p [$1, $2]
end

i also wonder if gsub is necessary… there’s no replacement here as
far as i can tell. my oversimplified monkey brain comes up with this:

irb(main):001:0> str = "John Smith - Minister of Funny Walks."
=> "John Smith - Minister of Funny Walks."
irb(main):002:0> arr = str.split(" ")
=> ["John", "Smith", "-", "Minister", "of", "Funny", "Walks."]
irb(main):003:0> name, family = arr[0], arr[1]
=> ["John", "Smith"]
irb(main):004:0> puts name
John
=> nil
irb(main):005:0> puts family
Smith
=> nil

this also makes some assumptions about the data, of course…

-j

axyd80 · April 8, 2011, 9:19am

Chad P. wrote in post #991517:

. . . so, is there some way to use more descriptive variable names than
the default $1, $2, et cetera, for captures from within a regex? I’m not
aware of any, but I too would find that agreeable.

Yes, ruby 1.9 has named capture groups. I posted an example earlier in
this thread.

axyd80 · April 8, 2011, 7:11am

On Apr 7, 11:57pm, jake kaiden wrote:

irb(main):001:0> str = "John Smith - Minister of Funny Walks."
=> "John Smith - Minister of Funny Walks."
irb(main):002:0> arr = str.split(" ")
=> ["John", "Smith", "-", "Minister", "of", "Funny", "Walks."]
irb(main):003:0> name, family = arr[0], arr[1]

name, family, = arr

axyd80 · April 8, 2011, 2:31pm

Thanks for the advice. I didn’t know that $~ gives access to the
captures as an array; adding a new ‘substitute’ method is probably the
best solution.

This solution is not very generalizable. It only works as presented for
cases where all the stuff you want to discard looks the same.

No, it’s no less generalizable than the $X stuff; use splats if you have
different matches.

"John S.".substitute{|*tokens| …}

And yes, the sample I provided was unclear: there was actually no
replacement in it. It should have been something like this:

"John S.".gsub /(.+)\s(.+)/ do |name, family|
  "#{name[0..0]}. #{family}"
end

Here’s the complete solution:

class String
  def substitute(*args)
    gsub(*args){yield Regexp.last_match.captures}
  end

  def substitute!(*args)
    gsub!(*args){yield Regexp.last_match.captures}
  end
end
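
A short usage sketch of the two methods, to show what they do with several matches and with in-place modification. The sample strings and the \w+ pattern are assumptions for illustration; the class body repeats the definitions above so the snippet runs on its own.

```ruby
class String
  # Like gsub, but yields the capture groups instead of the whole match.
  def substitute(*args)
    gsub(*args) { yield Regexp.last_match.captures }
  end

  # In-place variant; like gsub!, it returns nil when nothing matched.
  def substitute!(*args)
    gsub!(*args) { yield Regexp.last_match.captures }
  end
end

# gsub is global, so the block runs once per match.
names = "John Smith, Jane Doe"
puts names.substitute(/(\w+)\s(\w+)/) { |name, family| "#{name[0..0]}. #{family}" }
# prints "J. Smith, J. Doe"

person = "John Smith".dup   # dup so the in-place call works on an unfrozen copy
person.substitute!(/(\w+)\s(\w+)/) { |name, family| "#{family}, #{name}" }
puts person
# prints "Smith, John"
```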

Thanks for the help!