Negative lookahead in Regexp question

Axel_E · June 16, 2007, 5:23pm

Dear all,

I have strings like this one:

" In North Carolina "

and I’d like to match the part in between the brackets using
a regexp with negative lookahead that excludes a substring
(in this case, rather than just single characters),
but I can’t get it right…

Thanks for your help,

Axel

Axel_E · June 16, 2007, 6:17pm

Can you provide a few examples of what you’re looking for? I’m not
sure what you’re asking but it doesn’t sound too bad.

Byron

Axel_E · June 16, 2007, 8:41pm

Dear Byron,

thank you for responding.
I am working on an analysis of so-called chunked text,
i.e., an analysis of words in a sentence, that
classifies words as nouns / verbs / adjectives etc.

A typical sentence with chunking tags thus looks like this:

" The physical descriptions of places in
North Carolina , in so far as
they are specific at all
, owe a little to memories
of my childhood , although I
've also borrowed indiscriminately from
other people 's childhood memories as well
."

Originally, I wanted to use Regexps with negative lookahead to split the
original sentence into groups. I’ve now dropped that in favor of repeated
Array.splits, but I think I could still use knowing how to search for a
substring using negative lookahead, i.e., as in my example:

regexp = /…/    # <= searched for, such that:
string = " In North Carolina "
ref = regexp.match(string)
p ref[1]    # => "In North Carolina"

Thank you for any help!

Best regards,

Axel

Axel_E · June 16, 2007, 9:18pm

Wouldn’t it be better to just use an XML parser?

Axel_E · June 16, 2007, 9:21pm

Aureliano,

no - since the tags are not XML tags, and since
I wanted to know about negative lookahead
for regexps …

Best regards,

Axel

Axel_E · June 16, 2007, 10:50pm

Dear Byron,

Does that help?

Yes, very much! Thank you for your time!

Best regards,

Axel

Axel_E · June 16, 2007, 9:55pm

On Jun 16, 2:40 pm, “Axel E.” wrote:

regexp = /…/    # <= searched for, such that:
string = " In North Carolina "
ref = regexp.match(string)
p ref[1]    # => "In North Carolina"

This will work pretty well (works for the above):
/<\w+>(.*?)<\/\w+>/

The only fancy thing there is making the .* non-greedy by appending
a ?, giving .*?. This means it will take the shortest possible match
instead of the longest. Note that the / in the closing tag has to be
escaped as \/ inside a /…/ regexp literal.
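
To make the greedy versus non-greedy difference concrete, here is a minimal sketch. The sample string and its tag names (NP, VP) are made up for illustration and are not taken from the original data.

```ruby
# Hypothetical sample; the tag names are assumptions for illustration.
str = "<NP>The cat</NP> <VP>sat</VP>"

greedy = /<\w+>(.*)<\/\w+>/     # .* grabs as much as possible
lazy   = /<\w+>(.*?)<\/\w+>/    # .*? stops at the first closing tag

greedy_match = greedy.match(str)[1]
lazy_match   = lazy.match(str)[1]

p greedy_match  # => "The cat</NP> <VP>sat"
p lazy_match    # => "The cat"
```

The greedy version runs through to the last closing tag in the string, which is rarely what you want when a sentence contains several chunks.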

But it will not work as I think you would want on a string with nested
clauses. If you want to include internal clauses, you need to make sure
that the close tag matches the open tag. The side effect is that you
need another submatch (capture group) within the regex, so the inner
text moves from group 1 to group 2.

So consider:
/<(\w+)>(.*?)<\/\1>/

Example:
irb(main):033:0> str = "<NC>In North Carolina adsf </NC>"
=> "<NC>In North Carolina adsf </NC>"
irb(main):034:0> re = /<(\w+)>(.*?)<\/\1>/
=> /<(\w+)>(.*?)<\/\1>/
irb(main):035:0> re.match(str)[1]
=> "NC"
irb(main):036:0> re.match(str)[2]
=> "In North Carolina adsf "
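
Since the question was specifically about negative lookahead that excludes a whole substring, here is a rough sketch of that approach as an alternative to .*?. The input string and the tag names (PP, NP, ADVP) are assumptions made up for illustration; the real chunk tags may differ.

```ruby
# Hypothetical chunk-tagged input; tag names are assumptions for illustration.
str = "<PP>In <NP>North Carolina</NP></PP> , <ADVP>in so far</ADVP>"

# (?!<\/PP>) is a negative lookahead: it succeeds only where the next
# characters are NOT "</PP>", and it consumes nothing itself.
# (?:(?!<\/PP>).)* therefore advances one character at a time and
# stops right before the closing tag begins.
re = /<PP>((?:(?!<\/PP>).)*)<\/PP>/

inner = re.match(str)[1]
p inner  # => "In <NP>North Carolina</NP>"

# Same idea for any tag name, reusing the captured name in the lookahead.
generic = /<(\w+)>((?:(?!<\/\1>).)*)<\/\1>/
m = generic.match(str)
tag_name, tag_body = m[1], m[2]
p tag_name  # => "PP"
p tag_body  # => "In <NP>North Carolina</NP>"
```

For a single match this gives the same result as the non-greedy version. Keep in mind that . does not match newlines unless the regexp uses the /m flag, so multi-line sentences would need /m.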

Does that help?