irb(main):001:0> "John;Smith".scan /\S+/ do |match|
irb(main):002:1* puts match
irb(main):003:1> end
John;Smith
=> "John;Smith"
Oops.
Better:
irb(main):04:0> "John;Smith".scan /\w+/ do |match|
irb(main):05:1* puts match
irb(main):06:1> end
John
Smith
The code still makes assumptions about the data, though: it assumes the
input is uniform, in that exactly the first n parts make up the name, and
not n+1 or n-1.
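To make that assumption concrete, here is a small sketch comparing the two patterns without a block, where scan simply returns an array. The variable names and the hyphenated sample name are made up for illustration.

```ruby
line = "John;Smith"

# \S+ matches runs of non-whitespace, so the semicolon stays inside the match.
non_space = line.scan(/\S+/)

# \w+ matches runs of letters, digits and underscores, so ';' acts as a separator.
word_runs = line.scan(/\w+/)

p non_space   # ["John;Smith"]
p word_runs   # ["John", "Smith"]

# The uniformity assumption breaks as soon as a name contains a non-word
# character: a hyphenated first name yields three parts instead of two.
hyphenated = "Mary-Ann;Smith".scan(/\w+/)
p hyphenated  # ["Mary", "Ann", "Smith"]
```

If the separator is known to be a semicolon, splitting on it explicitly with split(";") avoids guessing which characters belong to a name.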
–
Phillip G.
Though the folk I have met,
(Ah, how soon!) they forget
When I’ve moved on to some other place,
There may be one or two,
When I’ve played and passed through,
Who’ll remember my song or my face.
This solution is not very generalizable. It only works as presented for
cases where all the stuff you want to discard looks the same. I wouldn't
want to have to deal with this kind of thing for a particularly complex
case:
str = 'John S.: A Good Man -- A Good Husband. RIP (1976)'
# split with a limit of 2 returns the first piece plus the untouched rest
first, remainder = str.split(' ', 2)
last, non_name = remainder.split(': ', 2)
desc1, the_rest = non_name.split(' -- ', 2)
desc2, deceased = the_rest.split('. ', 2)
. . . so, is there some way to use more descriptive variable names than
the default $1, $2, et cetera, for captures from within a regex? I'm not
aware of any, but I too would find that agreeable.
"John S.".gsub /(.+)\s(.+)/ do |name, family|
  p [name, family]
  # instead of this
  p [$1, $2]
end
I also wonder if gsub is necessary... there's no replacement here as
far as I can tell. My oversimplified monkey brain comes up with this:
irb(main):001:0> str = "John S. - Minister of Funny Walks."
=> "John S. - Minister of Funny Walks."
irb(main):002:0> arr = str.split(" ")
=> ["John", "S.", "-", "Minister", "of", "Funny", "Walks."]
irb(main):003:0> name, family = arr[0], arr[1]
=> ["John", "S."]
irb(main):004:0> puts name
John
=> nil
irb(main):005:0> puts family
S.
=> nil
this also makes some assumptions about the data, of course…
. . . so, is there some way to use more descriptive variable names than
the default $1, $2, et cetera, for captures from within a regex? I'm not
aware of any, but I too would find that agreeable.
Yes, Ruby 1.9 has named capture groups. I posted an example earlier in
this thread.
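For readers who missed the earlier post, here is a minimal sketch of named captures on Ruby 1.9 or later. This is not the example from earlier in the thread; the group names name and family are simply borrowed from the gsub snippet above.

```ruby
str = "John S."

# (?<label>...) names a group; the MatchData can then be indexed by that label.
m = /(?<name>.+)\s(?<family>.+)/.match(str)
p [m[:name], m[:family]] if m   # ["John", "S."]

# With a regex literal on the left-hand side of =~, Ruby also assigns the
# named groups to local variables of the same name.
if /(?<name>.+)\s(?<family>.+)/ =~ str
  p [name, family]              # ["John", "S."]
end

# Inside a gsub block the names are reachable through $~ (the last MatchData).
short = str.gsub(/(?<name>.+)\s(?<family>.+)/) do
  "#{$~[:name][0]}. #{$~[:family]}"
end
puts short                      # J. S.
```

The local-variable form only works when the regex is a literal without interpolation and sits on the left of =~.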
Thanks for the advice. I didn't know that $~ holds the results of the
last match; adding a new 'substitute' method is probably the best solution.
This solution is not very generalizable. It only works as presented for
cases where all the stuff you want to discard looks the same.
No, it's no less generalizable than the $1, $2 approach; use a splat if
you have a varying number of captures.
"John S.".substitute{|*tokens| …}
And yes, the provided sample was unclear: there was actually no
replacement. Maybe it should be something like this:
"John S.".gsub /(.+)\s(.+)/ do |name, family|
  "#{name[0..0]}. #{family}"
end
Here’s the complete solution:
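class String
  # Like gsub with a block, but the block receives the capture groups.
  # The captures are splatted so both |name, family| and |*tokens| work.
  def substitute(*args)
    gsub(*args) { yield(*Regexp.last_match.captures) }
  end

  # In-place variant; like gsub!, returns nil when nothing was replaced.
  def substitute!(*args)
    gsub!(*args) { yield(*Regexp.last_match.captures) }
  end
end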
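A quick usage sketch for the above, with the method repeated so the snippet runs on its own. Note one assumption of mine: the captures are splatted into yield here, so that a |*tokens| block receives the captures directly rather than a single nested array.

```ruby
class String
  # Same idea as the solution above: hand the captures to the block.
  def substitute(*args)
    gsub(*args) { yield(*Regexp.last_match.captures) }
  end
end

# Descriptive block parameters instead of $1 and $2.
short = "John S.".substitute(/(.+)\s(.+)/) do |name, family|
  "#{name[0..0]}. #{family}"
end
puts short   # J. S.

# A splat collects however many groups the pattern happens to have.
joined = "John S.".substitute(/(.+)\s(.+)/) { |*tokens| tokens.join("|") }
puts joined  # John|S.
```

Reopening String affects every string in the program, so in a larger code base a refinement or a plain helper method may be the safer choice.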
Thanks for help!