Can't get a regex subgroup to repeat with +... what the?


#1

I’m trying to match these kinds of malformed XML tags. I’m beginning
to question my sanity, so I’m posting here.

Example Strings:

A=" "
B=" "
C=" "

I’ve come up with this regex:

/<(\w+?)(?:\s(\w+)=(\w+))+>/

But when matching string B from above:

md=/<(\w+?)(?:\s(\w+)=(\w+))+>/.match(B)

It will do this:

md[0]=
md[1]=orderMsg
md[2]=size
md[3]=0
nil
nil
nil
nil

Why isn’t the final + sign making the pattern “(?:\s(\w+)=(\w+))”
repeat?

As an exercise… /<(\w+?)(?:\s(\w+)=(\w+))(?:\s(\w+)=(\w+))>/ DOES
match String B from above. What the heck???


#2

Jon wrote:

B=" "

/<(\w+?)(?:\s(\w+)=(\w+))+>/

md[0]=
md[1]=orderMsg
md[2]=size
md[3]=0

It is correct. “(?:\s(\w+)=(\w+))+” matches two times, and the last iteration
matches “size” and “0”. The groups are overwritten each time the “+”
repeats the group, so only the values from the last repetition remain.
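A quick way to see the overwriting in Ruby is to compare `MatchData#captures` for the repeated pattern and for the unrolled one from the question. This is only a sketch: the original example strings did not survive, so the input `<orderMsg type=7 size=0>` is an assumption based on the output quoted in this thread.

```ruby
# Hypothetical input, reconstructed from the output quoted in this thread.
b = "<orderMsg type=7 size=0>"

# One set of groups inside a repeated (+) group: each repetition overwrites
# groups 2 and 3, so only the last key/value pair survives.
repeated = /<(\w+?)(?:\s(\w+)=(\w+))+>/.match(b)
p repeated.captures   # => ["orderMsg", "size", "0"]

# Unrolling the group gives every attribute its own numbered groups,
# which is why the "exercise" pattern sees both pairs.
unrolled = /<(\w+?)(?:\s(\w+)=(\w+))(?:\s(\w+)=(\w+))>/.match(b)
p unrolled.captures   # => ["orderMsg", "type", "7", "size", "0"]
```

The unrolled version only works for a fixed number of attributes, which is why the replies below switch to a two-stage approach.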

Wolfgang Nádasi-Donner


#3

Wolfgang Nádasi-donner wrote:

It is correct. “(?:\s(\w+)=(\w+))+” matches two times, the last match is
with “size” and “0”. The groups will be overwritten each time the “+”
will repeat the group.

Wolfgang Nádasi-Donner

Ah ok. So how can I get it to repeat without overwriting the existing
values for the group? Or is there a better way to do this?


#4

Jon Fi wrote:

Ah ok. So how can I get it to repeat without overwriting the existing
values for the group? Or is there a better way to do this?

I would do it something like this:

========== code ==========
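texts = [ "",
          "",
          ""]

texts.each do |txt|
  # Stage 1: capture the tag name and the whole run of attributes as one string
  if (md = txt.match(/<(\w+?)((?:\s\w+=\w+)+)>/))
    puts "\nkey '#{md[1]}' found"
    # Stage 2: scan the attribute run; each match yields one key/value pair
    md[2].scan(/\s(\w+)=(\w+)/) do |k, v|
      puts " parameter '#{k}' has value '#{v}'"
    end
  else
    puts "+++ no match for '#{txt}'"
  end
end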
========= result =========
key 'orderMsg' found
parameter 'biz' has value '0'

key 'orderMsg' found
parameter 'type' has value '7'
parameter 'size' has value '0'

key 'orderMsg' found
parameter 'type' has value '7'
parameter 'size' has value '0'
parameter 'biz' has value '1'
========== end ===========
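If you need the values afterwards rather than printed, the same two-stage idea can return them. The helper name `parse_tag` and the input strings are hypothetical; this is a sketch built on the regexes above, not Wolfgang's code.

```ruby
# Hypothetical helper: returns [tag_name, {attribute => value}] or nil.
def parse_tag(txt)
  # Stage 1: tag name plus the whole run of " key=value" attributes
  md = txt.match(/<(\w+?)((?:\s\w+=\w+)+)>/)
  return nil unless md

  # Stage 2: with two groups, scan returns [[key, value], ...]; to_h builds a Hash
  attrs = md[2].scan(/\s(\w+)=(\w+)/).to_h
  [md[1], attrs]
end

p parse_tag("<orderMsg type=7 size=0 biz=1>")
# returns ["orderMsg", { "type" => "7", "size" => "0", "biz" => "1" }]
p parse_tag("no tag here")   # => nil
```

Returning `nil` for a non-matching string plays the role of the `+++ no match` branch in the loop above.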

Wolfgang Nádasi-Donner


#5

On 5/17/07, Jon removed_email_address@domain.invalid wrote:

/<(\w+?)(?:\s(\w+)=(\w+))+>/
repeat?

As an exercise… /<(\w+?)(?:\s(\w+)=(\w+))(?:\s(\w+)=(\w+))>/ DOES
match String B from above. What the heck???


Posted via http://www.ruby-forum.com/.

Hi,

Unless you really want to write one regular expression for it all, you
could do something like this.

Split on spaces, then on ‘=’. Then process however you want.

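r = B.strip.split(/\s/)              # split the tag on whitespace
p r
# 1...-1 skips the first element (the tag name) and, being exclusive, the last one
r[1...-1].each {|f| p f.split("=")}  # split each field on "="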

Harry

A Look into Japanese Ruby List in English
http://www.kakueki.com/


#6

On 16.05.2007 22:59, Jon Fi wrote:

It is correct. “(?:\s(\w+)=(\w+))+” matches two times, the last match is
with “size” and “0”. The groups will be overwritten each time the “+”
will repeat the group.

Wolfgang Nádasi-Donner

Ah ok. So how can I get it to repeat without overwriting the existing
values for the group?

You can’t.

Or is there a better way to do this?

Probably. I am not sure what you are up to, but you can use a two-stage
approach like this:

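texts = [
  " ",
  " ",
  " ",
]

texts.each do |t|
  p t
  # Stage 1: tag name, then every " name=digits" attribute as one string
  md = /<([^\s>]+)((?:\s+\w+=\d+)*)/.match t

  if md
    tag = md[1]
    attrs = md[2]

    puts tag

    # Stage 2: pull each name/value pair out of the attribute string
    attrs.scan(/(\w+)=(\d+)/) do |m|
      print m[0], "=>", m[1], "\n"
    end
  end
end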

Kind regards

robert


#7

On 5/17/07, Jon Fi removed_email_address@domain.invalid wrote:

Ah ok. So how can I get it to repeat without overwriting the existing
values for the group? Or is there a better way to do this?

If you want to use regular expressions, try ‘scan’.

c=" "
c.scan(/\w+=?\w+/).each {|f| p f.split("=")}

Modify the regular expression as necessary.
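When the pattern given to `scan` contains capture groups, `scan` returns the groups of each match instead of the whole match, so the `split("=")` step can move into the pattern itself. A sketch with a hypothetical input string:

```ruby
c = "<orderMsg type=7 size=0 biz=1>"   # hypothetical input

# With groups, scan yields [key, value] pairs directly; the tag name is
# skipped because no "=" follows it.
pairs = c.scan(/(\w+)=(\w+)/)
p pairs   # => [["type", "7"], ["size", "0"], ["biz", "1"]]

# The pattern above, for comparison: "=?" makes the "=" optional, so the
# tag name also matches and comes back as a one-element array (["orderMsg"]).
c.scan(/\w+=?\w+/).each { |f| p f.split("=") }
```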

Harry



#8

Harry K. wrote:

On 5/17/07, Jon Fi removed_email_address@domain.invalid wrote:

If you want to use regular expressions, try ‘scan’.

c=" "
c.scan(/\w+=?\w+/).each {|f| p f.split("=")}

Modify the regular expression as necessary.

Harry

Brilliant. Exactly what I was looking for.


#9

On 5/17/07, Harry K. removed_email_address@domain.invalid wrote:

could do something like this.

Split on spaces, then on ‘=’ . Then process however you want.

r = B.strip.split(/\s/)
p r
r[1...-1].each {|f| p f.split("=")}

Harry

Sorry for the double post.
This is a little cleaner and easier, I think.

C.strip.delete("<>").split(/\s/).each {|f| p f.split("=")}
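Ruby's splat assignment can separate the tag name from the attributes in the same one-liner. A sketch, assuming a hypothetical input string and that every field after the tag name has the form key=value (`to_h` raises an error otherwise):

```ruby
c = "<orderMsg type=7 size=0 biz=1>"   # hypothetical input

# Drop the angle brackets, split on whitespace, and peel off the tag name.
name, *fields = c.strip.delete("<>").split

# split("=", 2) keeps any later "=" inside the value; to_h builds the Hash.
attrs = fields.map { |f| f.split("=", 2) }.to_h

p name    # => "orderMsg"
p attrs
```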

Harry
