Someone can correct me if I’m wrong about this, but since a regex matches
from left to right, your expression is complete at the end of the first
match. I don’t think it parses the whole string into subsequently matching
groupings.
message[/(#\w+).*(#\w+)/, 2] would give you “#Ram”, since you’d be telling
it to expect a second identifier (a second "#text" group), but that may not
be the functionality you’re looking for.
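To make the difference concrete, here is a small sketch using the message from this thread. The variable names are only illustrative.

```ruby
message = '#bat with some #Ram'

# One capturing group: only index 1 exists, however many tags the string has.
one_group = /(#\w+)/
first_tag = message[one_group, 1]
no_such   = message[one_group, 2]   # there is no second group in this regex

# Two capturing groups in a single match: index 2 is now available.
two_groups = /(#\w+).*(#\w+)/
second_tag = message[two_groups, 2]

p first_tag, no_such, second_tag
```

The second regex only works when the string really contains two tags; with a single tag the whole match fails and both indexes give nil.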
I’m guessing that Rubular is checking for all the matches across the
string, kind of like what String#scan does:
2.0.0p195 :001 > message = '#bat with some #Ram'
=> "#bat with some #Ram"
2.0.0p195 :002 > message.scan(/(#\w+)/)
=> [["#bat"], ["#Ram"]]
In your example, Rubular reports 2 matches, and within each match a
single group. If you check in Rubular what Andrew and Joel proposed,
you will see just one match with 2 captured groups:
The regex /[aeiou](.)\1/ matches the substring “ell”. Specifically,
[aeiou] matches the “e”, the dot matches the “l” and \1 matches the second
“l”. Using 0 as the second argument to String#[] selects the whole match,
just like when you don’t supply a second argument at all. Using 1 selects
the contents matched by the first capturing group, which is “(.)”. Since
that matched “l”, that’s what you get. Using 2 selects the second capturing
group, but the regex /[aeiou](.)\1/ only contains one capturing group, so
you get nil.
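A quick way to check this in irb, assuming the string being matched is "hello" (the thread only mentions the matched substring “ell”) and the pattern is [aeiou](.)\1:

```ruby
word = 'hello'             # assumed input; any string containing "ell" would do
re   = /[aeiou](.)\1/      # one capturing group plus a backreference to it

whole   = word[re, 0]      # index 0 is the whole match
default = word[re]         # same as index 0
group1  = word[re, 1]      # what (.) captured
group2  = word[re, 2]      # no such group in this regex

p whole, default, group1, group2
```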
Note that the concept of a capturing group has nothing to do with how often
the regex can be matched in a given string. It’s solely a property of the
regex. Specifically, a capturing group is a part of the regex that’s
enclosed in parentheses and does not start with “?:”, “?=” or similar
modifiers that make a group non-capturing.
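The following sketch shows how the number of captures follows from the regex alone. The sample strings and variable names are made up for illustration.

```ruby
text = 'foobar'

capturing     = /(foo)(bar)/     # two capturing groups
non_capturing = /(?:foo)(bar)/   # (?:...) groups without capturing
lookahead     = /foo(?=bar)/     # (?=...) is a lookahead and captures nothing

caps_two  = capturing.match(text).captures
caps_one  = non_capturing.match(text).captures
caps_none = lookahead.match(text).captures

# Three possible matches in the string, but still only one group in the regex.
tags_caps = /(#\w+)/.match('#a #b #c').captures

p caps_two, caps_one, caps_none, tags_caps
```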
When I wrote message[/(#\w+)/], I had the impression that all the match
groups would be created, and that by passing 1, 2, 3 as the second argument
I could access the content of the respective match. That is not the case, as
I now understand from
https://www.ruby-forum.com/topic/4422155?reply_to=1134864#1134556.
But yes, String#scan is enough for this purpose, as each match is a separate
entry in the array. So if I want the first match I can call, say, ar[0], for
the second ar[1], and so on…
Can the regular expression below be written in another way to get the same
output?
You should start by matching only once to avoid unnecessary work.
(arup~>~)$ pry --simple-prompt
s = "315 Kw (422 Engine power (HP))"
=> "315 Kw (422 Engine power (HP))"
s[/(\d+)[^0-9]*(\d+)/,2]
=> “422”
s[/(\d+)[^0-9]*(\d+)/,1]
=> “315”
irb(main):001:0> s = "315 Kw (422 Engine power (HP))"
=> "315 Kw (422 Engine power (HP))"
irb(main):002:0> /(\d+)\D+(\d+)/ =~ s
=> 0
irb(main):003:0> kw = Integer($1)
=> 315
irb(main):004:0> hp = Integer($2)
=> 422
In this case you can also use String#scan:
irb(main):005:0> kw, hp = s.scan(/\d+/).map {|m| Integer(m)}
=> [315, 422]
irb(main):006:0> kw
=> 315
irb(main):007:0> hp
=> 422
The downside is that you do not have good control over the match. It’s
probably better to do something like
irb(main):008:0> /(\d+)\s*kw\s*\(\s*(\d+)/i =~ s
=> 0
irb(main):009:0> kw = Integer($1)
=> 315
irb(main):010:0> hp = Integer($2)
=> 422
That gives you a bit more confidence that the string looks the way you
expect. Of course you can extend that even more by adding anchors and a
pattern for the trailing portion.
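As a rough sketch of what such a stricter version might look like: the helper name parse_power and the constant STRICT_POWER are hypothetical, and the trailing pattern assumes the input always has exactly the "(NNN Engine power (HP))" shape of the example string.

```ruby
# Hypothetical helper, not from the thread: anchors at both ends (\A, \z)
# and a pattern for the trailing "Engine power (HP))" part.
STRICT_POWER = /\A(\d+)\s*kw\s*\(\s*(\d+)\s+engine\s+power\s*\(hp\)\s*\)\z/i

def parse_power(str)
  m = STRICT_POWER.match(str)
  return nil unless m              # the string does not have the expected shape
  [Integer(m[1]), Integer(m[2])]   # [kw, hp]
end

parsed   = parse_power('315 Kw (422 Engine power (HP))')
rejected = parse_power('315 Kw and 422 HP')

p parsed, rejected
```

Returning nil for unexpected input is just one choice; raising an error would be equally reasonable if a malformed string should never occur.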
You can also use named captures:
irb(main):011:0> kw = hp = nil
=> nil
irb(main):012:0> /(?<kw>\d+)\s*kw\s*\(\s*(?<hp>\d+)/i =~ s
=> 0
irb(main):013:0> kw
=> "315"
irb(main):014:0> hp
=> "422"
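Note that Ruby only assigns named captures to local variables when a regex literal is on the left-hand side of =~. If the regex is stored in a variable, or you prefer not to rely on that, the MatchData object gives the same values by name. A small sketch, assuming the group names kw and hp:

```ruby
s = '315 Kw (422 Engine power (HP))'

# Keeping the regex in a variable means no local variables are assigned,
# so the captures are read from the MatchData instead.
power_re = /(?<kw>\d+)\s*kw\s*\(\s*(?<hp>\d+)/i
m = power_re.match(s)

kw = Integer(m[:kw])   # named captures are strings, so convert explicitly
hp = Integer(m[:hp])
group_names = m.names

p kw, hp, group_names
```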