Forum: Ruby Can't get subgroup of regex to repeat with +... what the ?

Jon (Guest)
on 2007-05-16 22:26
I'm trying to match these kinds of malformed XML tags. I'm beginning
to question my sanity, so I'm posting here.

Example Strings:
=====
A=" <orderMsg biz=0>"
B=" <orderMsg type=7 size=0>"
C=" <orderMsg type=7 size=0 biz=1>"
=====


I've come up with this regex:
=====
/<(\w+?)(?:\s(\w+)=(\w+))+>/
=====


But when matching string B from above:
=====
md=/<(\w+?)(?:\s(\w+)=(\w+))+>/.match(B)
=====


It will do this:
======
md[0]=<orderMsg type=7 size=0>
md[1]=orderMsg
md[2]=size
md[3]=0
nil
nil
nil
nil
=======


Why isn't the final + sign making the pattern "(?:\s(\w+)=(\w+))"
repeat?

As an exercise... /<(\w+?)(?:\s(\w+)=(\w+))(?:\s(\w+)=(\w+))>/ DOES
match String B from above. What the heck???
Wolfgang Nádasi-Donner (wonado)
on 2007-05-16 22:57
Jon wrote:
> B=" <orderMsg type=7 size=0>"
> ...
> /<(\w+?)(?:\s(\w+)=(\w+))+>/
> ...
> md[0]=<orderMsg type=7 size=0>
> md[1]=orderMsg
> md[2]=size
> md[3]=0

It is correct. "(?:\s(\w+)=(\w+))+" matches two times; the last match is
with "size" and "0". The groups are overwritten each time the "+"
repeats the group.
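
To see the overwriting directly, here is a minimal sketch using string B from the original post (the lowercase variable name b is just for illustration). However many times the "+" repeats the outer group, the MatchData still has exactly three captures, and the last two hold the text from the final repetition:

```ruby
# A capture group inside a repeated group keeps only the text
# matched during the last repetition.
b = " <orderMsg type=7 size=0>"
md = /<(\w+?)(?:\s(\w+)=(\w+))+>/.match(b)

p md[0]        # => "<orderMsg type=7 size=0>"
p md.captures  # => ["orderMsg", "size", "0"]
```

The "type" and "7" pair was matched on the first pass through the group, but it was replaced when the group matched again.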

Wolfgang Nádasi-Donner
Jon Fi (exabrial)
on 2007-05-16 22:59
Wolfgang Nádasi-donner wrote:
> Jon wrote:
>> B=" <orderMsg type=7 size=0>"
>> ...
>> /<(\w+?)(?:\s(\w+)=(\w+))+>/
>> ...
>> md[0]=<orderMsg type=7 size=0>
>> md[1]=orderMsg
>> md[2]=size
>> md[3]=0
>
> It is correct. "(?:\s(\w+)=(\w+))+" matches two times, the last match is
> with "size" and "0". The groups will be overwritten each time the "+"
> will repeat the group.
>
> Wolfgang Nádasi-Donner

Ah ok. So how can I get it to repeat without overwriting the existing
values for the group? Or is there a better way to do this?
Wolfgang Nádasi-Donner (wonado)
on 2007-05-16 23:34
Jon Fi wrote:
> Ah ok. So how can I get it to repeat without overwriting the existing
> values for the group? Or is there a better way to do this?

I would do it something like this:

========== code ==========
texts = [ "<orderMsg biz=0>",
          "<orderMsg type=7 size=0>",
          "<orderMsg type=7 size=0 biz=1>"]

texts.each do |txt|
  if (md=txt.match(/<(\w+?)((?:\s\w+=\w+)+)>/))
    puts "\nkey '#{md[1]}' found"
    md[2].scan(/\s(\w+)=(\w+)/) do |k, v|
      puts "  parameter '#{k}' has value '#{v}'"
    end
  else
    puts "+++ no match for '#{txt}'"
  end
end
========= result =========
key 'orderMsg' found
  parameter 'biz' has value '0'

key 'orderMsg' found
  parameter 'type' has value '7'
  parameter 'size' has value '0'

key 'orderMsg' found
  parameter 'type' has value '7'
  parameter 'size' has value '0'
  parameter 'biz' has value '1'
========== end ===========
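
If the pairs are needed as data rather than printed, the same two-stage idea could be wrapped up roughly like this. This is only a sketch: parse_tag is a made-up helper name, and it assumes Ruby 2.1 or later for Array#to_h. Note that a Hash keeps only the last value if an attribute name appears twice.

```ruby
# parse_tag is a hypothetical helper name, not part of any library.
# Stage 1 captures the tag name and the whole attribute run as one string;
# stage 2 scans that string for each name=value pair.
def parse_tag(txt)
  md = txt.match(/<(\w+?)((?:\s\w+=\w+)+)>/)
  return nil unless md

  # scan with two groups returns an array of [name, value] pairs
  attrs = md[2].scan(/\s(\w+)=(\w+)/).to_h
  [md[1], attrs]
end

tag, attrs = parse_tag("<orderMsg type=7 size=0 biz=1>")
puts tag
attrs.each { |k, v| puts "  #{k} = #{v}" }
```

The method returns nil when the text contains no matching tag, so the caller should check for that before destructuring.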

Wolfgang Nádasi-Donner
Harry Kakueki (Guest)
on 2007-05-17 03:38
(Received via mailing list)
On 5/17/07, Jon <exabrial@gmail.com> wrote:
> /<(\w+?)(?:\s(\w+)=(\w+))+>/
> repeat?
>
> As an exercise... /<(\w+?)(?:\s(\w+)=(\w+))(?:\s(\w+)=(\w+))>/ DOES
> match String B from above. What the heck???
>
Hi,

Unless you really want to write one regular expression for it all, you
could do something like this.

Split on spaces, then on '='. Then process however you want.

r = B.strip.split(/\s/)
p r
r[1..-1].each {|f| p f.split("=")}

Harry


--

A Look into Japanese Ruby List in English
http://www.kakueki.com/
Harry Kakueki (Guest)
on 2007-05-17 06:22
(Received via mailing list)
On 5/17/07, Harry Kakueki <list.push@gmail.com> wrote:
> could do something like this.
>
> Split on spaces, then on '=' . Then process however you want.
>
> r = B.strip.split(/\s/)
> p r
> r[1..-1].each {|f| p f.split("=")}
>
> Harry
>

Sorry for the double post.
This is a little cleaner and easier, I think.

C.strip.delete("<>").split(/\s/).each {|f| p f.split("=")}

Harry


Robert Klemme (Guest)
on 2007-05-18 09:55
(Received via mailing list)
On 16.05.2007 22:59, Jon Fi wrote:
>> It is correct. "(?:\s(\w+)=(\w+))+" matches two times, the last match is
>> with "size" and "0". The groups will be overwritten each time the "+"
>> will repeat the group.
>>
>> Wolfgang Nádasi-Donner
>
> Ah ok. So how can I get it to repeat without overwriting the existing
> values for the group?

You can't.

> Or is there a better way to do this?

Probably. I am not sure what you are up to, but you can use a two-stage
approach like this:

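texts = [
  " <orderMsg biz=0>",
  " <orderMsg type=7 size=0>",
  " <orderMsg type=7 size=0 biz=1>",
]

texts.each do |t|
  p t
  # Stage 1: capture the tag name and the whole attribute run as one string.
  md = /<([^\s>]+)((?:\s+\w+=\d+)*)/.match(t)

  if md
    tag = md[1]
    attrs = md[2]

    puts tag

    # Stage 2: scan the attribute string for each name=value pair.
    attrs.scan(/(\w+)=(\d+)/) do |m|
      print m[0], "=>", m[1], "\n"
    end
  end
end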

Kind regards

  robert
Harry Kakueki (Guest)
on 2007-05-18 13:22
(Received via mailing list)
On 5/17/07, Jon Fi <exabrial@gmail.com> wrote:
>
> Ah ok. So how can I get it to repeat without overwriting the existing
> values for the group? Or is there a better way to do this?
>

If you want to use regular expressions, try 'scan'.

c=" <orderMsg type=7 size=0 biz=1>"
c.scan(/\w+=?\w+/).each {|f| p f.split("=")}

Modify the regular expression as necessary.
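
As one possible modification (a sketch, not the only way): because the "=" is optional in the pattern above, the tag name is matched as well and comes out as a one-element array. Requiring the "=" and capturing both sides makes scan return only the name/value pairs. The variable name pairs is just for illustration.

```ruby
c = " <orderMsg type=7 size=0 biz=1>"

# With "=?" optional, the tag name is matched too.
p c.scan(/\w+=?\w+/).map { |f| f.split("=") }
# => [["orderMsg"], ["type", "7"], ["size", "0"], ["biz", "1"]]

# Requiring "=" and capturing each side yields only the pairs.
pairs = c.scan(/(\w+)=(\w+)/)
p pairs
# => [["type", "7"], ["size", "0"], ["biz", "1"]]
```

When the pattern contains groups, scan returns an array of the captures for each match, so no split is needed.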

Harry



Jon Fi (exabrial)
on 2007-05-18 18:10
Harry Kakueki wrote:
> On 5/17/07, Jon Fi <exabrial@gmail.com> wrote:

> If you want to use regular expressions, try 'scan'.
>
> c=" <orderMsg type=7 size=0 biz=1>"
> c.scan(/\w+=?\w+/).each {|f| p f.split("=")}
>
> Modify the regular expression as necessary.
>
> Harry

Brilliant. Exactly what I was looking for.