Forum: Ruby Regex Black Magic... how to stop matching if char?

Jon (Guest)
on 2007-03-30 19:34
I'm trying to translate a strange derivative of XML into valid XML. Here
is an example line:

<SUBEVENTSTATUS 1:2><OPERATIONNAME></OPERATIONNAME>gofast<OPERATIONSTATUS>stopped</OPERATIONSTATUS><TARGETOBJECTNAME>name</TARGETOBJECTNAME><TARGETOBJECTVALUE>val</TARGETOBJECTVALUE></SUBEVENTSTATUS 1:1><SUBEVENTSTATUS 2:2><......and on

REXML pukes on the <SUBEVENTSTATUS 1:2> tag... which it should. There
should be some kind of attribute declaration instead. I want to
translate it to something like this: <SUBEVENTSTATUS no="1" of="2">

I'm trying to make a regex to detect the funny tags. Here is what I have
so far:

xml_fix=/<(\S+)\s+(\d+):(\d+)>/

This is great, but it will match this:

<Request><code_set_list 1:2>

instead of just this:

<code_set_list 1:2>

...because there is no guaranteed whitespace between tags. Basically, I
need to stop matching if a ">" is found. I've never had to deal with
anything quite like this in my regex experience. Any help or thoughts of
a better way to do things is much appreciated!
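
To see the overmatch concretely, here is a minimal sketch. The constant name XML_FIX and the sample string are only for illustration; the pattern itself is the one posted above.

```ruby
# The original pattern: \S+ matches any non-whitespace character,
# including ">" and "<", so it can run across a tag boundary.
XML_FIX = /<(\S+)\s+(\d+):(\d+)>/

line = '<Request><code_set_list 1:2>'
m = XML_FIX.match(line)

puts m[0]  # <Request><code_set_list 1:2>
puts m[1]  # Request><code_set_list
```

The first capture swallows the end of one tag and the start of the next, which is exactly the behaviour described.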
Robert K. (Guest)
on 2007-03-30 19:40
(Received via mailing list)
On 30.03.2007 17:34, Jon wrote:
>
>
> <code_set_list 1:2>
>
> ..because there is no guaranteed whitespace between tags. Basically, I
> need to stop matching if a ">" is found. I've never had to deal with
> anything quite like this in my regex experience. Any help or thoughts of
> a better way to do things is much appreciated!

I can think of several solutions:

/<([^>\s]+)\s+(\d+):(\d+)>/

Or even a two phased approach

/<[^>]+>/

and then with the match
/(\d+):(\d+)>\z/

HTH

  robert
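
Both suggestions can be tried side by side. A minimal sketch, where the constant names and the sample line are illustrative and not from the post:

```ruby
# Single pattern: the tag name may contain neither ">" nor whitespace.
ONE_STEP = /<([^>\s]+)\s+(\d+):(\d+)>/

# Two-phase variant: first isolate each tag, then inspect its tail.
ANY_TAG  = /<[^>]+>/
TAG_TAIL = /(\d+):(\d+)>\z/

line = '<Request><code_set_list 1:2>'

m = ONE_STEP.match(line)
puts m[0]     # <code_set_list 1:2>
p m.captures  # ["code_set_list", "1", "2"]

# scan returns every tag; select keeps those ending in "digits:digits>".
funny = line.scan(ANY_TAG).select { |tag| tag =~ TAG_TAIL }
p funny       # ["<code_set_list 1:2>"]
```

Note that the second-phase pattern on its own does not require whitespace before the numbers, so it is a little looser than the single pattern.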
Jon F. (Guest)
on 2007-03-30 19:43
Robert K. wrote:
> On 30.03.2007 17:34, Jon wrote:
>>
>>
>> <code_set_list 1:2>
>>
>> ..because there is no guaranteed whitespace between tags. Basically, I
>> need to stop matching if a ">" is found. I've never had to deal with
>> anything quite like this in my regex experience. Any help or thoughts of
>> a better way to do things is much appreciated!
>
> I can think of several solutions:
>
> /<([^>\s]+)\s+(\d+):(\d+)>/
>
> Or even a two phased approach
>
> /<[^>]+>/
>
> and then with the match
> /(\d+):(\d+)>\z/
>
> HTH
>
>   robert


Awesome, and thank you! But for my benefit, could you explain why that
works? I thought ^ meant start of line?
F. Senault (Guest)
on 2007-03-30 19:46
(Received via mailing list)
On 30 March at 17:34, Jon wrote:

> ..because there is no guaranteed whitespace between tags. Basically, I
> need to stop matching if a ">" is found. I've never had to deal with
> anything quite like this in my regex experience. Any help or thoughts of
> a better way to do things is much appreciated!

I'd simply use /<[^>]+\s+(\d+):(\d+)>/ (untested, but you get my
drift)...

Fred
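
A quick check of this pattern, as a sketch (FRED_FIX is an illustrative name; the pattern is the one posted):

```ruby
# [^>]+ cannot cross a ">", so the match has to start at the second tag.
FRED_FIX = /<[^>]+\s+(\d+):(\d+)>/

m = FRED_FIX.match('<Request><code_set_list 1:2>')
puts m[0]     # <code_set_list 1:2>
p m.captures  # ["1", "2"]
```

The match is limited to the funny tag, but the tag name is not captured, so a rewrite would need an extra group around the [^>]+ part.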
Rob B. (Guest)
on 2007-03-30 20:18
(Received via mailing list)
On Mar 30, 2007, at 11:43 AM, Jon Fi wrote:

> awesome, and thank you! but for my benefit, could you explain why that
> works? I thought ^ was line start?

Within a character class, a leading ^ inverts the selection, so [^>]
matches any character that's NOT a '>'.
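
The two meanings of ^ can be compared directly. A small sketch with a made-up sample string:

```ruby
text = "a>b\n>c"

# Outside brackets, ^ anchors at the start of each line.
p text.scan(/^./)     # ["a", ">"]

# Inside brackets, a leading ^ negates the set: any character except ">".
# A negated class also matches the newline.
p text.scan(/[^>]+/)  # ["a", "b\n", "c"]
```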

My solution is:

  .gsub(/<([^>]*?\b\s+)(\d+):(\d+)>/, '<\1no="\2" of="\3">')

-Rob

Rob B.    http://agileconsultingllc.com
removed_email_address@domain.invalid
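
Run against a line that also contains a numbered closing tag (the original example has </SUBEVENTSTATUS 1:1>), the substitution above rewrites closing tags too, and a closing tag with attributes is not well-formed XML. The sketch below shows this, plus a hypothetical variant that drops the counter from closing tags. The names ROB_FIX, rob_fix, and fix_tags are made up for illustration, and discarding the closing-tag counter is an assumption about the desired output, not something the thread settles.

```ruby
# Rob's substitution as posted.
ROB_FIX = /<([^>]*?\b\s+)(\d+):(\d+)>/

def rob_fix(xml)
  xml.gsub(ROB_FIX, '<\1no="\2" of="\3">')
end

# Hypothetical variant (NOT from the thread): capture an optional "/"
# separately, add attributes to opening tags, strip the counter from
# closing tags.
def fix_tags(xml)
  xml.gsub(%r{<(/?)([^>\s/]+)\s+(\d+):(\d+)>}) do
    closing, name, no, of = Regexp.last_match.captures
    closing.empty? ? %(<#{name} no="#{no}" of="#{of}">) : "</#{name}>"
  end
end

sample = '<Request><code_set_list 1:2>x</code_set_list 1:2></Request>'
puts rob_fix(sample)
# <Request><code_set_list no="1" of="2">x</code_set_list no="1" of="2"></Request>
puts fix_tags(sample)
# <Request><code_set_list no="1" of="2">x</code_set_list></Request>
```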
Brian C. (Guest)
on 2007-03-31 12:33
(Received via mailing list)
On Sat, Mar 31, 2007 at 12:34:25AM +0900, Jon wrote:
>
> xml_fix=/<(\S+)\s+(\d+):(\d+)>/
>
> This is great, but it will match this:
>
> <Request><code_set_list 1:2>
>
> instead of just this:
>
> <code_set_list 1:2>

Try (\w+) instead of (\S+)
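
A quick sketch of this suggestion (WORD_FIX is an illustrative name). One caveat: \w covers only letters, digits, and underscore, so tag names containing a hyphen, period, or colon would be missed.

```ruby
# \w+ cannot match ">" or "<", so the name stays inside one tag.
WORD_FIX = /<(\w+)\s+(\d+):(\d+)>/

m = WORD_FIX.match('<Request><code_set_list 1:2>')
p m.captures                              # ["code_set_list", "1", "2"]

# A hyphenated name is not matched at all.
p WORD_FIX.match?('<code-set-list 1:2>')  # false
```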