Forum: Ruby Re: repeated regular expressions -- Need addition to Matchda

Garance A Drosehn (Guest)
on 2005-12-16 03:20
(Received via mailing list)
On 12/15/05, William James <w_a_x_man@yahoo.com> wrote:
> >
> >     destword = ["bill ", "bob"]
> >    ["copy", "apple pear plum peach ", "peach ", "after", "bill bob", "bob"]
>   dest_word = md.captures.last.split
>   p src_food, dest_word
> }

This does happen to solve my specific example, but...

...that split only works because "you know" what the repeating
pattern is.  While it does not explicitly repeat the original regex,
that split does the job only because the repeating pattern does
not include blanks.  Don't look at the *specific* pattern that I
am repeating, but try to imagine *any* repeating pattern at
that point in the example.  Right now, can we come up with a
solution where I can replace that pattern with *anything* I want
to repeat, and the solution still works?

Currently Ruby saves only the *last* copy of however many
things a repeated group matched.  It does seem to me that it
should save *all* copies of what it matched -- somewhere.
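
A minimal check of that behaviour, using the pattern and input from this thread. The constant name COPY_RE is made up for illustration; the snippet only shows what MatchData#captures returns today for repeated groups.

```ruby
# The pattern and input discussed in this thread.
COPY_RE = /^(copy|duplicate) \s+ (\w+\s+)+ (before|after) \s+ (\w+\s*)+ $/x

md = COPY_RE.match("copy apple pear plum peach after bill bob")

# Each repeated group reports only its final iteration.
p md.captures
# => ["copy", "peach ", "after", "bob"]
```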

For instance, let's say that MatchData included another
method called "repeated", and that method returns an array.
This array has the same number of elements as captures
does.  If the pattern-segment for captures[0] is NOT a repeating
pattern, then repeated[0] returns nil.  If captures[1] is tied to a
pattern-segment that does (possibly) repeat, then repeated[1]
returns an array of strings, one element for each time that
pattern-segment was found.

Eg:

 /^(copy|duplicate) \s+ (\w+\s+)+ (before|after) \s+ (\w+\s*)+ $/x

used to match against the string:
    "copy apple pear plum peach after bill bob"

$~.captures[0] == "copy"
$~.repeated[0] == nil
$~.captures[1] == "peach "
$~.repeated[1] == ["apple ", "pear ", "plum ", "peach "]
$~.captures[2] == "after"
$~.repeated[2] == nil
$~.captures[3] == "bob"
$~.repeated[3] == ["bill ", "bob"]

Note that I wouldn't even need to add the extra '()' around
'(\w+\s+)+' if ruby provided something like this.
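
The return shape proposed here can be approximated in plain Ruby. This is only a sketch under stated assumptions: `repeated_for`, its `pieces` argument, and the constant names are hypothetical and not part of MatchData. It also assumes each repeating segment has been wrapped as ((?: piece )+), so that the capture holds the whole run instead of just the last repetition.

```ruby
# Hypothetical helper, not part of Ruby: mimics the proposed "repeated" array.
# `pieces` maps a capture index to the Regexp for ONE repetition of that group.
def repeated_for(md, pieces)
  md.captures.each_with_index.map do |text, i|
    piece = pieces[i]
    # Groups without a registered piece (or that did not match) report nil.
    piece && text ? text.scan(piece) : nil
  end
end

WORD_THEN_SPACE  = /\w+\s+/
WORD_MAYBE_SPACE = /\w+\s*/

# Same structure as the example above, with each repeating run captured whole.
REPEAT_RE = /^(copy|duplicate) \s+
             ((?: #{WORD_THEN_SPACE} )+)
             (before|after) \s+
             ((?: #{WORD_MAYBE_SPACE} )+) $/x

md = REPEAT_RE.match("copy apple pear plum peach after bill bob")
p repeated_for(md, { 1 => WORD_THEN_SPACE, 3 => WORD_MAYBE_SPACE })
```

Unlike the proposal, this still requires the caller to name the repeating piece twice, once in the pattern and once in the `pieces` hash.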

Of course, the next question is why not just make captures[1]
be the array of "things" which were repeatedly matched, instead
of only holding the last instance of that repeated pattern.  That
would work fine, IMO, although I guess it might break some
people's existing scripts.
William James (Guest)
on 2005-12-16 07:54
(Received via mailing list)
Garance A Drosehn wrote:

> returns an array of strings, one element for each time that
> $~.repeated[0] == nil
> Of course, the next question is why not just make captures[1]
> be the array of "things" which were repeatedly matched, instead
> of only holding the last-instance of that repeated pattern.  That
> would work fine, IMO, although I guess it might break the scripts
> of some people.

Looks like a logical and natural extension.  Until it's added, perhaps
something like this would suffice:

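pat1 = /\w+\s+/   # one source item: a word plus trailing whitespace
pat2 = /\w+\s*/   # one destination item: trailing whitespace optional
DATA.each {|line|
  line.chomp!
  md =
    /^(?:copy|duplicate) \s+
      ((?: #{ pat1 } )+)
      (?:after|before) \s+
      ((?: #{ pat2 } )+) $
    /x.match( line )
  next unless md   # skip lines that do not match
  p md.captures
  # Split each captured run back into its individual repetitions.
  src_food = md.captures.first.scan( pat1 )
  dest_word = md.captures.last.scan( pat2 )
  p src_food, dest_word
}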

__END__
copy apple pear plum peach after bill bob
duplicate tomato before joe alice alfred tommy jane
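
To illustrate that the scan approach does not depend on the repeated pattern being free of blanks, here is a sketch with a different repeating piece: a double-quoted phrase. The names `split_run`, QUOTED and QUOTED_LINE_RE, and the sample input, are made up for illustration. The \G anchor is an addition to the code above; it pins each match to the end of the previous one so that no text in the run is skipped.

```ruby
# Hypothetical helper: split a captured run into its individual repetitions.
def split_run(run, piece)
  parts = []
  # Regexp.last_match(0) is the whole match even if `piece` has its own groups.
  run.scan(/\G(?:#{piece})/) { parts << Regexp.last_match(0) }
  parts
end

# One repetition is a double-quoted phrase, which may contain blanks.
QUOTED = /"[^"]*"\s*/

QUOTED_LINE_RE = /^(?:copy|duplicate) \s+
                  ((?: #{QUOTED} )+)
                  (?:after|before) \s+
                  ((?: #{QUOTED} )+) $/x

md = QUOTED_LINE_RE.match('copy "granny smith" "bosc pear" after "mary ann" "bob"')
p split_run(md.captures.first, QUOTED)
p split_run(md.captures.last, QUOTED)
```

One caveat: the scan re-matches the run independently of the original match, so for a piece that can divide the same text in more than one way, the split may not coincide with the repetitions the regex engine actually made.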