String#[] confusions

Why am I not getting the second capture from the string?

irb(main):001:0> message = '#bat with some #Ram'
=> "#bat with some #Ram"
irb(main):004:0> message[/(#\w+)/,2]
=> nil
irb(main):005:0> message[/(#\w+)/,1]
=> "#bat"
irb(main):006:0>

Why does message[/(#\w+)/,2] return nil?

Rubular: (#\w+)

You’re only looking for 1 match group. If you use 2 match groups, you
can look for the second one:

message[/(#\w+).*(#\w+)/,2]
=> "#Ram"

Someone can correct me if I’m wrong about this, but since a regex matches from left to right, your expression is complete at the end of the first match. I don’t think it parses the whole string into subsequently matching groupings.

message[/(#\w+).*(#\w+)/, 2] would give you "#Ram", since you’d be telling it to expect a second identifier ("#text"), but that may not be the functionality you’re looking for.

Andrew
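
For anyone reading along, here is a small sketch of what a single match exposes. It assumes String#[] with a regexp behaves like one call to Regexp#match, which is how the outputs in this thread read; the variable names are only for illustration.

```ruby
message = '#bat with some #Ram'

# One match attempt, scanning left to right; it stops at the first hit.
md = /(#\w+)/.match(message)

p md[0]        # => "#bat"   (the whole match)
p md[1]        # => "#bat"   (first capture group)
p md[2]        # => nil      (the pattern has only one group)
p md.captures  # => ["#bat"]

# With two groups in the pattern, index 2 has something to return.
p message[/(#\w+).*(#\w+)/, 2]  # => "#Ram"
```

The index always counts groups in the pattern, not matches in the string.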

Joel P. wrote in post #1134542:

You’re only looking for 1 match group. If you use 2 match groups, you
can look for the second one:

message[/(#\w+).*(#\w+)/,2]
=> "#Ram"

But in Rubular, for (#\w+), I can see the matches listed as 1 and 2.
Why doesn’t String#[] work that way? I am still confused.

Jesús Gabriel y Galán wrote in post #1134549:

On Mon, Jan 27, 2014 at 3:47 PM, Arup R. [email protected]
wrote:



I’m guessing that Rubular is checking for all the matches across the
string, kind of like what String#scan does:

2.0.0p195 :001 > message = '#bat with some #Ram'
=> "#bat with some #Ram"
2.0.0p195 :002 > message.scan(/(#\w+)/)
=> [["#bat"], ["#Ram"]]

Thank you.

Can you explain the 3 outputs below?

a = "hello there"

a[/[aeiou](.)\1/, 0] #=> "ell"
a[/[aeiou](.)\1/, 1] #=> "l"
a[/[aeiou](.)\1/, 2] #=> nil

I hope that will help me understand the actual use case of taking out
captures by number.

On Mon, Jan 27, 2014 at 3:47 PM, Arup R. [email protected]
wrote:



I’m guessing that Rubular is checking for all the matches across the
string, kind of like what String#scan does:

2.0.0p195 :001 > message = '#bat with some #Ram'
=> "#bat with some #Ram"
2.0.0p195 :002 > message.scan(/(#\w+)/)
=> [["#bat"], ["#Ram"]]

In your example, Rubular reports 2 matches, and within each match a
single group. If you check in Rubular what Andrew and Joel proposed,
you will see just one match with 2 captured groups:

Jesus.
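
The difference between "2 matches with one group each" and "one match with 2 groups" can also be seen with String#scan alone. A minimal sketch, using the same string as above (the variable names are made up for the example):

```ruby
message = '#bat with some #Ram'

# One group in the pattern: scan finds two matches, each with one capture.
per_match = message.scan(/(#\w+)/)
p per_match  # => [["#bat"], ["#Ram"]]

# Two groups in the pattern: scan finds a single match with two captures.
one_match = message.scan(/(#\w+).*(#\w+)/)
p one_match  # => [["#bat", "#Ram"]]
```

The outer array has one entry per match, the inner arrays one entry per group.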

On 27.01.2014 16:38, Arup R. wrote:

Can you explain the 3 outputs below?

a = "hello there"

a[/[aeiou](.)\1/, 0] #=> "ell"
a[/[aeiou](.)\1/, 1] #=> "l"
a[/[aeiou](.)\1/, 2] #=> nil

The regex /[aeiou](.)\1/ matches the substring “ell”. Specifically, [aeiou] matches the “e”, the dot matches the “l” and \1 matches the second “l”.

Using 0 as the second argument to String#[] selects the whole match, just like when you don’t supply a second argument at all. Using 1 selects the contents matched by the first capturing group, which is “(.)”. Since that matched “l”, that’s what you get. Using 2 selects the second capturing group, but the regex /[aeiou](.)\1/ only contains one capturing group, so you get nil.

Note that the concept of a capturing group has nothing to do with how often the regex can be matched in a given string. It’s solely a property of the regex. Specifically, a capturing group is a part of the regex that’s enclosed in parentheses and does not start with “?:”, “?=” or similar modifiers that make a group non-capturing.
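
As a quick illustration of the last point, here is a sketch with a non-capturing group added to the same pattern. The extra (?:...) is only there for the example and is not part of the original question.

```ruby
a = 'hello there'

# (?:...) groups without capturing, so this pattern still has exactly
# one capturing group, and \1 still refers to (.).
re = /(?:[aeiou])(.)\1/

md = re.match(a)
p md[0]        # => "ell"
p md.captures  # => ["l"]
p md.size      # => 2  (whole match plus one group)

p a[re, 1]     # => "l"
p a[re, 2]     # => nil
```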

Sebastian H. wrote in post #1134556:

On 27.01.2014 16:38, Arup R. wrote:

Can you explain the 3 outputs below?

a = "hello there"

a[/[aeiou](.)\1/, 0] #=> "ell"
a[/[aeiou](.)\1/, 1] #=> "l"
a[/[aeiou](.)\1/, 2] #=> nil

The regex /[aeiou](.)\1/ matches the substring “ell”. Specifically, [aeiou] matches the “e”, the dot matches the “l” and \1 matches the second “l”.

Thank you very much! I fully get it now.

On Mon, Jan 27, 2014 at 9:06 AM, Jesús Gabriel y Galán <
[email protected]> wrote:

Why not then String#[] doesn’t work that way. Still I am in a confusion.
=> [["#bat"], ["#Ram"]]

Not exactly sure why you’d want the subgrouping with scan, as it’s creating a nested array here.

[6] pry(main)> message.scan(/#\w+/) # => ["#bat", "#Ram"]

On Thu, Jan 30, 2014 at 12:24 AM, tamouse pontiki
[email protected] wrote:

can look for the second one:
I’m guessing that Rubular is checking for all the matches across the

[6] pry(main)> message.scan(/#\w+/) # => [“#bat”, “#Ram”]

You are right, I just copy-pasted the original Regexp.

Jesus.

tamouse m. wrote in post #1134864:

On Mon, Jan 27, 2014 at 9:06 AM, Jesús Gabriel y Galán <
[email protected]> wrote:

Why not then String#[] doesn’t work that way. Still I am in a confusion.
=> [[“#bat”], [“#Ram”]]

Not exactly sure why you’d want the subgrouping with scan, as it’s creating a nested array here.

[6] pry(main)> message.scan(/#\w+/) # => ["#bat", "#Ram"]

I had a wrong idea about String#[].

I thought that with message[/(#\w+)/] a match group would be created for every match in the string, and that by passing 1, 2, 3 as the second argument I could access the content of each of them. That is not the case, as I understood from
https://www.ruby-forum.com/topic/4422155?reply_to=1134864#1134556.

But yes, String#scan is enough for this purpose, as each match will be a separate entry in the array. So if I want the first match I can call, say, ar[0], for the second ar[1], and so on.

On Wed, Feb 5, 2014 at 9:59 PM, Arup R. [email protected]
wrote:

Can the regular expression below be written in another way to get
the same output?

You should start by matching only once to avoid unnecessary work.

(arup~>~)$ pry --simple-prompt

s = "315 Kw (422 Engine power (HP))"
=> "315 Kw (422 Engine power (HP))"
s[/(\d+)[^0-9]*(\d+)/,2]
=> "422"
s[/(\d+)[^0-9]*(\d+)/,1]
=> "315"

irb(main):001:0> s = "315 Kw (422 Engine power (HP))"
=> "315 Kw (422 Engine power (HP))"
irb(main):002:0> /(\d+)\D+(\d+)/ =~ s
=> 0
irb(main):003:0> kw = Integer($1)
=> 315
irb(main):004:0> hp = Integer($2)
=> 422

In this case you can also use String#scan

irb(main):005:0> kw, hp = s.scan(/\d+/).map {|m| Integer(m)}
=> [315, 422]
irb(main):006:0> kw
=> 315
irb(main):007:0> hp
=> 422

The downside is that you do not have good control over the match. It’s
probably better to do something like

irb(main):008:0> /(\d+)\s*kw\s*\(\s*(\d+)/i =~ s
=> 0
irb(main):009:0> kw = Integer($1)
=> 315
irb(main):010:0> hp = Integer($2)
=> 422

That gives you a bit more confidence that the string looks the way you
expect. Of course you can extend that even more by adding anchors and
a pattern for the trailing portion.
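
A sketch of what such an extension might look like. The trailing text "Engine power (HP))" is taken from the sample string only; whether real input always ends that way is an assumption made for this example.

```ruby
s = '315 Kw (422 Engine power (HP))'

# \A and \z anchor the pattern to the start and end of the string.
# The trailing part is spelled out literally (assumed fixed for this sketch).
re = /\A(\d+)\s*kw\s*\(\s*(\d+)\s+engine power \(hp\)\)\z/i

if (md = re.match(s))
  kw = Integer(md[1])
  hp = Integer(md[2])
end

p [kw, hp]  # => [315, 422]
```

If the string does not have the expected shape, the match fails and kw and hp stay nil, which makes unexpected input easy to spot.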

You can also use named captures:

irb(main):011:0> kw = hp = nil
=> nil
irb(main):012:0> /(?<kw>\d+)\s*kw\s*\(\s*(?<hp>\d+)/i =~ s
=> 0
irb(main):013:0> kw
=> "315"
irb(main):014:0> hp
=> "422"

Kind regards

robert
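
The named groups can also be read from a MatchData object instead of local variables. A minimal sketch, assuming the group names kw and hp; note that MatchData#named_captures needs Ruby 2.4 or later, which is newer than the versions used in this thread.

```ruby
s = '315 Kw (422 Engine power (HP))'

md = /(?<kw>\d+)\s*kw\s*\(\s*(?<hp>\d+)/i.match(s)

p md[:kw]  # => "315"
p md[:hp]  # => "422"

# All named groups at once, as a hash keyed by group name.
values = md.named_captures
p values.keys  # => ["kw", "hp"]
```

This form works when the regexp is stored in a variable, where the automatic local-variable assignment of `=~` does not apply.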

Can the regular expression below be written in another way to get
the same output?

(arup~>~)$ pry --simple-prompt

s = "315 Kw (422 Engine power (HP))"
=> "315 Kw (422 Engine power (HP))"

s[/(\d+)[^0-9]*(\d+)/,2]
=> "422"

s[/(\d+)[^0-9]*(\d+)/,1]
=> "315"

@Robert - Thanks for mentioning all these possibilities. A good learning
for me.