Regex Black Magic... how to stop matching if char?

jon · March 30, 2007, 5:34pm

I’m trying to translate a strange derivative of xml into valid xml. Here
is an example line:

<SUBEVENTSTATUS 1:2>gofaststoppednameval</SUBEVENTSTATUS 1:1><SUBEVENTSTATUS 2:2><…and on

REXML pukes on the <SUBEVENTSTATUS 1:2> tag… which it should. There
should be some kind of attribute declaration instead. I want to
translate it to something like this:

I’m trying to make a regex to detect the funny tags. Here is what I have
so far:

xml_fix=/<(\S+)\s+(\d+):(\d+)>/

This is great, but it will match this:

<code_set_list 1:2>

instead of just this:

<code_set_list 1:2>

…because there is no guaranteed whitespace between tags. Basically, I
need to stop matching if a “>” is found. I’ve never had to deal with
anything quite like this in my regex experience. Any help or thoughts of
a better way to do things is much appreciated!
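
To make the over-match concrete, here is a minimal Ruby sketch of the problem. The input string, including the leading `<foo>` tag, is an invented example, since the sample in the post lost its surrounding markup.

```ruby
# The pattern from the post: \S+ matches any non-whitespace character,
# and that includes ">" and "<".
XML_FIX = /<(\S+)\s+(\d+):(\d+)>/

# Hypothetical input: two tags with no whitespace between them.
ADJACENT_TAGS = "<foo><code_set_list 1:2>"

GREEDY_MATCH = XML_FIX.match(ADJACENT_TAGS)

puts GREEDY_MATCH[0]  # prints: <foo><code_set_list 1:2>
puts GREEDY_MATCH[1]  # prints: foo><code_set_list
```

The tag-name group swallows the whole of the first tag because nothing in `\S+` tells it to stop at ">".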

jon · March 30, 2007, 5:40pm

On 30.03.2007 17:34, Jon wrote:

<code_set_list 1:2>

…because there is no guaranteed whitespace between tags. Basically, I
need to stop matching if a “>” is found. I’ve never had to deal with
anything quite like this in my regex experience. Any help or thoughts of
a better way to do things is much appreciated!

I can think of several solutions:

/<([^>\s]+)\s+(\d+):(\d+)>/

Or even a two phased approach

/<[^>]+>/

and then with the match
/(\d+):(\d+)>\z/

HTH

robert
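
Both suggestions can be tried side by side. This is only a sketch: the input string and the helper name `funny_tags` are made up for illustration.

```ruby
# Single pattern: the tag name may contain neither ">" nor whitespace.
ONE_PASS = /<([^>\s]+)\s+(\d+):(\d+)>/

# Two phases: first grab each complete tag, then test it for an "n:m>" tail.
ANY_TAG = /<[^>]+>/
TAIL    = /(\d+):(\d+)>\z/

# Hypothetical input: two tags with no whitespace between them.
SAMPLE = "<foo><code_set_list 1:2>"

# Hypothetical helper: returns only the tags that end in "n:m>".
def funny_tags(text)
  text.scan(ANY_TAG).select { |tag| tag =~ TAIL }
end

puts ONE_PASS.match(SAMPLE)[1]  # prints: code_set_list
p funny_tags(SAMPLE)            # prints: ["<code_set_list 1:2>"]
```

The two-phase version is a little longer, but each regex stays simple enough to read at a glance.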

jon · March 30, 2007, 5:43pm

Robert K. wrote:

On 30.03.2007 17:34, Jon wrote:

<code_set_list 1:2>

…because there is no guaranteed whitespace between tags. Basically, I
need to stop matching if a “>” is found. I’ve never had to deal with
anything quite like this in my regex experience. Any help or thoughts of
a better way to do things is much appreciated!

I can think of several solutions:

/<([^>\s]+)\s+(\d+):(\d+)>/

Or even a two phased approach

/<[^>]+>/

and then with the match
/(\d+):(\d+)>\z/

HTH

robert

awesome, and thank you! but for my benefit, could you explain why that
works? I thought ^ was line start?

jon · March 30, 2007, 5:46pm

On March 30 at 17:34, Jon wrote:

…because there is no guaranteed whitespace between tags. Basically, I
need to stop matching if a “>” is found. I’ve never had to deal with
anything quite like this in my regex experience. Any help or thoughts of
a better way to do things is much appreciated!

I’d simply use /<[^>]+\s+(\d+):(\d+)>/ (untested, but you get my
drift)…

Fred
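
Since the pattern above was posted untested, here is a quick check of it in Ruby. The input string is an invented example. Note that this variant has no group for the tag name, so only the two numbers are captured.

```ruby
# [^>]+ can never run past the end of a tag, so the match has to
# start at the "<" of the tag that actually carries the numbers.
NO_NAME_GROUP = /<[^>]+\s+(\d+):(\d+)>/

# Hypothetical input: two tags with no whitespace between them.
FRED_MATCH = NO_NAME_GROUP.match("<foo><code_set_list 1:2>")

puts FRED_MATCH[0]  # prints: <code_set_list 1:2>
puts FRED_MATCH[1]  # prints: 1
puts FRED_MATCH[2]  # prints: 2
```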

jon · March 30, 2007, 6:18pm

On Mar 30, 2007, at 11:43 AM, Jon Fi wrote:

awesome, and thank you! but for my benefit, could you explain why that
works? I thought ^ was line start?

Within a character class, a leading ^ inverts the selection, so [^>] matches
any character that's NOT a '>'.
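
The two meanings of ^ can be compared directly in Ruby. The strings here are arbitrary examples.

```ruby
# Outside brackets, ^ anchors the match to the start of a line.
STARTS_WITH_A = /^a/

# As the first character inside brackets, ^ negates the class:
# [^>]+ is "one or more characters that are not >".
NOT_GT = /[^>]+/

p("abc"  =~ STARTS_WITH_A)  # prints: 0
p("xabc" =~ STARTS_WITH_A)  # prints: nil
p("name>rest"[NOT_GT])      # prints: "name"
```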

My solution is: .gsub(/<([^>]*?\b\s+)(\d+):(\d+)>/, '<\1no="\2" of="\3">')

-Rob

Rob B. http://agileconsultingllc.com
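
Here is a sketch of that substitution applied to the tags from the original question. The wrapper name `fix_tags` is made up for illustration. The replacement string is single-quoted so that `\1`, `\2` and `\3` reach gsub as backreferences.

```ruby
# Hypothetical wrapper around the gsub call above.
def fix_tags(xml)
  xml.gsub(/<([^>]*?\b\s+)(\d+):(\d+)>/, '<\1no="\2" of="\3">')
end

puts fix_tags("<SUBEVENTSTATUS 1:2>")
# prints: <SUBEVENTSTATUS no="1" of="2">

puts fix_tags("</SUBEVENTSTATUS 1:1>")
# prints: </SUBEVENTSTATUS no="1" of="1">
```

One thing to watch: as the second call shows, closing tags are rewritten too, and XML end tags cannot carry attributes, so the closing tags would likely need separate handling before a parser accepts the result.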

jon · March 31, 2007, 10:33am

On Sat, Mar 31, 2007 at 12:34:25AM +0900, Jon wrote:

xml_fix=/<(\S+)\s+(\d+):(\d+)>/

This is great, but it will match this:

<code_set_list 1:2>

instead of just this:

<code_set_list 1:2>

Try (\w+) instead of (\S+)
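
A quick Ruby check of that suggestion, using invented input strings. Since \w covers only letters, digits and underscore, it cannot cross a ">" or "<", but it also leaves out some characters that may appear in tag names.

```ruby
WORD_NAME = /<(\w+)\s+(\d+):(\d+)>/

# Hypothetical input: two tags with no whitespace between them.
puts WORD_NAME.match("<foo><code_set_list 1:2>")[1]  # prints: code_set_list

# "/" is not a word character, so closing tags are skipped entirely.
p WORD_NAME.match("</SUBEVENTSTATUS 1:1>")  # prints: nil

# Neither is "-", so a hyphenated tag name is missed as well.
p WORD_NAME.match("<code-set 1:2>")  # prints: nil
```

Whether skipping closing tags is a bug or a feature depends on how they are supposed to be translated.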