Forum: Ruby Multiple matching with ()*

Alessandro Re (Guest)
on 2007-07-31 17:35
(Received via mailing list)
Hi there!
I'm Alessandro from Italy and I started using Ruby a few days ago,
so... Hello, Community! :)

Well, I was trying to match a pattern multiple times. I tried both
normal match() and scan(), but I can't get the desired result.

The subject string is something like:
"beg1a2bend" or "beg1a2b3c4dend"
More generally, it should match /^beg(\d\w)*end$/: always a beginning and
an ending pattern, with an unspecified number of central patterns in between.
The problem is that the central pattern must be extracted every
time it's encountered.
For example, trying with
"x1A2B3C4Dz".scan /^(x)(\d\w)*(z)$/
returns
[["x", "4D", "z"]]
while I need something like
[["x", "1A", "2B", "3C", "4D", "z"]]

Why does ()* capture just the last one? How can I get everything that the
()* matches?

Probably I'm doing something wrong, but I can't work out where :\

Thanks!
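
A minimal sketch of the behaviour asked about above, assuming only Ruby's standard Regexp and MatchData API. A numbered group has a single slot, so every repetition of (\d\w)* overwrites the previous capture and only the last one is reported. The variable names are illustrative and not from the thread.

```ruby
# Each pass through (\d\w)* overwrites group 2, so only "4D" survives.
m = /^(x)(\d\w)*(z)$/.match("x1A2B3C4Dz")
last_only = m.captures

# Wrapping the repetition in one outer group keeps the whole run as a
# single string, which can then be split up in a second step.
m2 = /^(x)((?:\d\w)*)(z)$/.match("x1A2B3C4Dz")
whole_run = m2.captures

p last_only  # ["x", "4D", "z"]
p whole_run  # ["x", "1A2B3C4D", "z"]
```
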
Jano S. (Guest)
on 2007-07-31 17:58
(Received via mailing list)
On 7/31/07, Alessandro Re <removed_email_address@domain.invalid> wrote:
> ending pattern, and an unspecified number of central patterns.
>
> Probably I'm doing something wrong, but I can't work out where :\

Try:

if "x1A2B3C4Dz" =~ /^(x)((?:\d\w)*)(z)$/
    a, b = $1, $3 # save these before the nested scan overwrites $1..$9
    return [a] + $2.scan(/\d\w/) + [b]
end

I don't know if it's possible to do it in one pass though; maybe you
could use split as well...
Take care when doing nested searches, as they will overwrite $1..$9
(that's why I used a and b).

J.
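
The two-step idea above can also be sketched without the $1..$9 globals, assuming a small helper method. The name framed_parts is hypothetical, and \A and \z are used in place of ^ and $ so that only the whole string can match. It returns nil when the string does not fit the pattern.

```ruby
# Hypothetical helper: validate the frame first, then split the middle.
def framed_parts(str)
  m = /\A(x)((?:\d\w)*)(z)\z/.match(str)
  return nil unless m            # either the whole pattern matches or nothing does

  # m[2] holds the complete run of digit+word pairs as one string.
  [m[1], *m[2].scan(/\d\w/), m[3]]
end

p framed_parts("x1A2B3C4Dz")    # ["x", "1A", "2B", "3C", "4D", "z"]
p framed_parts("x1A2B3C4Dz1a")  # nil
```

Working from the MatchData object rather than the globals means the nested scan cannot clobber the outer captures.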
Harry K. (Guest)
on 2007-07-31 18:02
(Received via mailing list)
On 7/31/07, Alessandro Re <removed_email_address@domain.invalid> wrote:
> For example, trying with
> "x1A2B3C4Dz".scan /^(x)(\d\w)*(z)$/
> returns
> [["x", "4D", "z"]]
> while I need something like
> [["x", "1A", "2B", "3C", "4D", "z"]]
>
Hi,

Try this.

str = "x1A2B3C4Dz"
p str.scan(/\d?\w/)    #>["x", "1A", "2B", "3C", "4D", "z"]

Harry
Alessandro Re (Guest)
on 2007-07-31 18:13
(Received via mailing list)
Hmm, well, to me this seems like ordinary regex processing (I mean, it *should*
require only one instruction, since this pattern can be read with just
one regex, even if Ruby doesn't allow it... but that would be really
bad).
Anyway, if the work is split up there are different ways to do it. Thanks
for your suggestion.
But if Ruby makes it possible with one call, I'd prefer to use that.

Thanks!
Robert K. (Guest)
on 2007-07-31 18:57
(Received via mailing list)
2007/7/31, Alessandro Re <removed_email_address@domain.invalid>:
> Hmm, well, to me this seems like ordinary regex processing (I mean, it *should*
> require only one instruction, since this pattern can be read with just
> one regex, even if Ruby doesn't allow it... but that would be really
> bad).
> Anyway, if the work is split up there are different ways to do it. Thanks
> for your suggestion.
> But if Ruby makes it possible with one call, I'd prefer to use that.

irb(main):006:0> s="x1A2B3C4Dz"
=> "x1A2B3C4Dz"
irb(main):007:0> s.scan /x(\d\w)*z/
=> [["4D"]]
irb(main):008:0> s.scan /x((?:\d\w)*?)z/
=> [["1A2B3C4D"]]
irb(main):009:0> s.scan(/x((?:\d\w)*?)z/).map {|a| a[0].scan(/\d\w/)}
=> [["1A", "2B", "3C", "4D"]]

Kind regards

robert
Alessandro Re (Guest)
on 2007-07-31 19:22
(Received via mailing list)
Thanks, this is an interesting solution!
botp (Guest)
on 2007-07-31 19:24
(Received via mailing list)
On 7/31/07, Alessandro Re <removed_email_address@domain.invalid> wrote:
> Hmm, well, to me this seems like ordinary regex processing (I mean, it *should*
> require only one instruction, since this pattern can be read with just
> one regex, even if Ruby doesn't allow it... but that would be really bad).

Seems like you have a pattern within a pattern.
It may be easier to unwrap the outer pattern first, then work on the inner
pattern. Something like:

irb(main):096:0> "lol1a2vasd".scan(/lol(.+)asd/).to_s.scan(/\d\w/)
=> ["1a", "2v"]
irb(main):097:0> "beg1a2vend".scan(/beg(.+)end/).to_s.scan(/\d\w/)
=> ["1a", "2v"]
irb(main):098:0> "beg1a2vendxbeg3c4dend".scan(/beg(.+)end/).to_s.scan(/\d\w/)
=> ["1a", "2v", "3c", "4d"]

Is that ok?
Kind regards -botp
Alessandro Re (Guest)
on 2007-07-31 19:30
(Received via mailing list)
Thanks, but I need to either match the whole pattern or match nothing at all.
"lol1a2vasd".scan(/\d?\w/) => ["l", "o", "l", "1a", "2v", "a", "s", "d"]
whereas I need to be sure that the string begins with a regex "x" and
ends with a regex "z".

(Of course, x, 1, a, 2, b, 3, c stand for regexes, not just literal chars.)

Thanks, your help is appreciated :)
Wolfgang N. (Guest)
on 2007-08-01 01:41
Alessandro Re wrote:
> For example, trying with
> "x1A2B3C4Dz".scan /^(x)(\d\w)*(z)$/
> returns
> [["x", "4D", "z"]]
> while I need something like
> [["x", "1A", "2B", "3C", "4D", "z"]]

Does this go more in the direction you wanted?

irb(main):001:0> "x1A2B3C4Dz".scan /(?:^(?:x)|\G)(\d\w)(?=(?:\d\w)*(?:z)$)/
=> [["1A"], ["2B"], ["3C"], ["4D"]]

???

Wolfgang Nádasi-Donner
Harry K. (Guest)
on 2007-08-01 03:50
(Received via mailing list)
On 7/31/07, Alessandro Re <removed_email_address@domain.invalid> wrote:
> whereas I need to be sure that the string begins with a regex "x" and
> ends with a regex "z".
>
> (of course, x 1 a 2 b 3 c should be regexes not just chars)
>
Sorry, I misunderstood what you wanted.
Is this more like it?

str = "lol1a2vasd"
m = /^(\w{3})(.*)(\w{3})$/.match(str).captures
m[1] = m[1].scan(/\d\w/)
p m.flatten #>  ["lol","1a","2v","asd"]

Harry
Robert K. (Guest)
on 2007-08-02 02:08
(Received via mailing list)
On 31.07.2007 17:18, Alessandro Re wrote:
>>> But if Ruby makes it possible with one call, I'd prefer to use that.
>> irb(main):006:0> s="x1A2B3C4Dz"
>> => "x1A2B3C4Dz"
>> irb(main):007:0> s.scan /x(\d\w)*z/
>> => [["4D"]]
>> irb(main):008:0> s.scan /x((?:\d\w)*?)z/
>> => [["1A2B3C4D"]]
>> irb(main):009:0> s.scan(/x((?:\d\w)*?)z/).map {|a| a[0].scan(/\d\w/)}
>> => [["1A", "2B", "3C", "4D"]]

Pay special attention to my use of the reluctant quantifier, which is
mandatory if your input contains multiple begin/end pairs.

Kind regards

  robert


PS: please do not top post.
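
A small sketch of the greedy versus reluctant difference mentioned above, reusing botp's two-section sample string. The variable names are illustrative. The difference appears to matter mainly when the inner pattern is loose enough to swallow the closing delimiter, as .+ is. With the tighter (?:\d\w)* group both forms happen to agree on this input.

```ruby
s = "beg1a2vendxbeg3c4dend"

# Greedy .+ runs to the last "end", merging both sections into one match.
greedy = s.scan(/beg(.+)end/)

# Reluctant .+? stops at the first "end", giving one match per section.
reluctant = s.scan(/beg(.+?)end/)

# Second level: split each section into its digit+word pairs.
pairs = reluctant.map { |groups| groups[0].scan(/\d\w/) }

# With the tighter inner group the quantifier choice makes no difference here.
tight_greedy    = s.scan(/beg((?:\d\w)*)end/)
tight_reluctant = s.scan(/beg((?:\d\w)*?)end/)

p greedy     # [["1a2vendxbeg3c4d"]]
p reluctant  # [["1a2v"], ["3c4d"]]
p pairs      # [["1a", "2v"], ["3c", "4d"]]
```
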
Alessandro Re (Guest)
on 2007-08-02 13:25
(Received via mailing list)
On 8/1/07, Harry K. <removed_email_address@domain.invalid> wrote:
> m = /^(\w{3})(.*)(\w{3})$/.match(str).captures
> m[1] = m[1].scan(/\d\w/)
> p m.flatten #>  ["lol","1a","2v","asd"]
>
> Harry

Yep, that's it.
I solved it using two instructions as you did: first matching the outer words,
then the middle ones, but I still think that one regex would have been
nicer :)

Thanks guys
Wolfgang N. (Guest)
on 2007-08-02 14:19
Alessandro Re wrote:
> ...but I still think that one regex would have been nicer :)

I don't think that this will be "nice"...

irb(main):001:0> "x1A2B3C4Dz".scan(/(?:\G|^(?:x))(x|\d\w|z)(?=(?:\d\w)*(?:z|)$)/)
=> [["x"], ["1A"], ["2B"], ["3C"], ["4D"], ["z"]]

..., and I didn't test it against malformed lines, but after a "flatten" it
ends up with the required result.

Wolfgang Nádasi-Donner
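
Since the pattern above was posted untested against malformed lines, here is a hedged sketch of such a check. TOKENS is the regex from the post. FRAME and checked_tokens are hypothetical names added for illustration. As far as I can tell, the \G alternative and the optional (?:z|) let the pattern accept input that lacks the leading x or the trailing z, so a separate whole-string check is used as a guard.

```ruby
# The single-regex tokenizer from the post above.
TOKENS = /(?:\G|^(?:x))(x|\d\w|z)(?=(?:\d\w)*(?:z|)$)/

# Hypothetical whole-string check, added as a guard.
FRAME = /\Ax(?:\d\w)*z\z/

def checked_tokens(str)
  return [] unless str =~ FRAME   # match everything or nothing
  str.scan(TOKENS).flatten
end

p "1A2Bz".scan(TOKENS).flatten    # ["1A", "2B", "z"]  (no leading x, still matched)
p "x1A2B".scan(TOKENS).flatten    # ["x", "1A", "2B"]  (no trailing z, still matched)
p checked_tokens("1A2Bz")         # []
p checked_tokens("x1A2B3C4Dz")    # ["x", "1A", "2B", "3C", "4D", "z"]
```

The guard is a second regex, so this is no longer a one-regex solution.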
Alessandro Re (Guest)
on 2007-08-04 14:13
(Received via mailing list)
On 8/2/07, Wolfgang Nádasi-donner <removed_email_address@domain.invalid> wrote:
> irb(main):001:0> "x1A2B3C4Dz".scan(/(?:\G|^(?:x))(x|\d\w|z)(?=(?:\d\w)*(?:z|)$)/)
> => [["x"], ["1A"], ["2B"], ["3C"], ["4D"], ["z"]]

Wonderful :)
Thanks!
Robert K. (Guest)
on 2007-08-06 10:33
(Received via mailing list)
2007/8/4, Alessandro Re <removed_email_address@domain.invalid>:
> On 8/2/07, Wolfgang Nádasi-donner <removed_email_address@domain.invalid> wrote:
> > irb(main):001:0> "x1A2B3C4Dz".scan(/(?:\G|^(?:x))(x|\d\w|z)(?=(?:\d\w)*(?:z|)$)/)
> > => [["x"], ["1A"], ["2B"], ["3C"], ["4D"], ["z"]]
>
> Wonderful :)
> Thanks!

But this does not seem to work with strings that contain multiple
sections:

irb(main):002:0> "x1A2B3C4Dz1a".scan(/(?:\G|^(?:x))(x|\d\w|z)(?=(?:\d\w)*(?:z|)$)/)
=> []

So it's not suited to a one-regex approach, and we still need two levels of
regex. If that's the case, then we have seen simpler solutions for that.
(Btw, one reason why it's so awkward is that there is no lookbehind in
Ruby 1.8, but this will change.)

Kind regards

robert
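
For comparison, a different technique from the ones posted in this thread: StringScanner from Ruby's standard strscan library walks the string once and collects each piece as it goes, so no capture gets overwritten. This is only a sketch. The method name tokenize_framed is hypothetical, and the literal x, \d\w and z stand in for whatever sub-patterns are really needed.

```ruby
require "strscan"

# Hypothetical helper: one left-to-right pass, nil unless the whole string fits.
def tokenize_framed(str)
  ss = StringScanner.new(str)

  head = ss.scan(/x/)             # opening pattern, anchored at the scan pointer
  return nil unless head

  parts = [head]
  while (pair = ss.scan(/\d\w/))  # zero or more central patterns
    parts << pair
  end

  tail = ss.scan(/z/)             # closing pattern
  return nil unless tail && ss.eos?

  parts << tail
end

p tokenize_framed("x1A2B3C4Dz")    # ["x", "1A", "2B", "3C", "4D", "z"]
p tokenize_framed("x1A2B3C4Dz1a")  # nil
```

Unlike a regex, this loop never backtracks, so it only behaves like /^x(\d\w)*z$/ while the central pattern cannot also consume the closing one.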
Wolfgang N. (Guest)
on 2007-08-06 10:51
Robert K. wrote:
> (Btw, one reason why it's so awkward is that there is no lookbehind in
> Ruby 1.8 - but this will change.)

I am waiting for this Christmas gift too...

Wolfgang Nádasi-Donner
Harry K. (Guest)
on 2007-09-26 01:09
(Received via mailing list)
On 7/31/07, Alessandro Re <removed_email_address@domain.invalid> wrote:
> Thanks, but I need to either match the whole pattern or match nothing at all.
> "lol1a2vasd".scan(/\d?\w/) => ["l", "o", "l", "1a", "2v", "a", "s", "d"]
> whereas I need to be sure that the string begins with a regex "x" and
> ends with a regex "z".

str = "lol1a2vasd"
p str.scan(/\d\w|\w{3}/)

Harry