Dear all,

I have strings like this one:

"<NC> In North Carolina </NC>"

and I'd like to match the part between the tags with a regexp that uses negative lookahead to exclude a whole substring (</NC> in this case) rather than just single characters, but I can't get it right...

Thanks for your help,
Axel
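The difference between excluding single characters and excluding a substring can be sketched in Ruby, whose regexps support negative lookahead with (?!...). This is a minimal sketch, not code from the thread; the second test string with an <ADJ> tag is a hypothetical example.

```ruby
# A character class can only exclude single characters: [^<]* stops at the
# first "<" of any kind, so it cannot step over other tags.
char_class = /<NC>([^<]*)<\/NC>/

# A "tempered" token excludes a whole substring: at each position the
# negative lookahead (?!<\/NC>) checks that "</NC>" does not start here
# before "." consumes one character. (Without /m, "." skips newlines.)
tempered = /<NC>((?:(?!<\/NC>).)*)<\/NC>/

s = "<NC> In North Carolina </NC>"
p char_class.match(s)[1]  # => " In North Carolina "
p tempered.match(s)[1]    # => " In North Carolina "

# Hypothetical chunk with another tag inside it:
t = "<NC> the <ADJ> big </ADJ> state </NC>"
p char_class.match(t)     # => nil, because [^<]* cannot cross "<ADJ>"
p tempered.match(t)[1]    # => " the <ADJ> big </ADJ> state "
```

Both patterns agree on the plain example; only the lookahead version survives inner tags.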
on 2007-06-16 17:23
on 2007-06-16 18:17
On Jun 16, 11:22 am, "Axel Etzold" <AEtz...@gmx.de> wrote:
> Thanks for your help,
> Axel

Can you provide a few examples of what you're looking for? I'm not sure what you're asking, but it doesn't sound too bad.

- Byron
on 2007-06-16 20:41
Dear Byron,

thank you for responding. I am working on an analysis of so-called chunked text, i.e., an analysis of the words in a sentence that classifies them as nouns / verbs / adjectives etc. A typical sentence with chunking tags thus looks like this:

"<NC> The physical descriptions </NC> <PC> of <NC> places </NC> <PC> in <NC> North Carolina </NC> </PC> , <PC> in </PC> <ADV> so far </ADV> as <NC> they </NC> <VC> are </VC> <NC> specific </NC> <PC> at <NC> all </NC> </PC> , <VC> owe </VC> <NC> a little </NC> <PC> to <NC> memories </NC> </PC> <PC> of <NC> my childhood </NC>, although <NC> I </NC> <VC> 've also borrowed </VC> <ADV> indiscriminately </ADV> <PC> from <NC> other people 's childhood memories </NC> </PC> <PC> as </PC> <ADV> well </ADV> ."

Originally, I wanted to use regexps with negative lookahead to split the original sentence into groups. I've now dropped that in favor of repeated Array.splits, but I think I could use knowing how to search for a substring using negative lookahead, i.e., in my example:

regexp=/.../   <= what I'm searching for, such that:
string="<NC> In North Carolina </NC>"
ref=regexp.match(string)
p ref[1] => "In North Carolina"

Thank you for any help!

Best regards,
Axel
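Applied to the example above, a pattern along these lines should give the requested result. It is a sketch under assumptions: the variable names follow the pseudocode above, the pattern is one possible choice rather than the only one, and the sentence passed to scan is a shortened excerpt of the tagged sentence.

```ruby
# The lookahead excludes "</NC>" together with any whitespace before it,
# so the capture ends on the last non-space character of the chunk.
# The \s* after <NC> keeps the leading space out of the capture.
regexp = /<NC>\s*((?:(?!\s*<\/NC>).)*)\s*<\/NC>/

string = "<NC> In North Carolina </NC>"
ref = regexp.match(string)
p ref[1]  # => "In North Carolina"

# String#scan applies the same pattern to every noun chunk in a sentence;
# with one capture group it returns one-element arrays, hence flatten.
sentence = "<NC> The physical descriptions </NC> <PC> of <NC> places </NC> " \
           "<PC> in <NC> North Carolina </NC> </PC>"
p sentence.scan(regexp).flatten
# => ["The physical descriptions", "places", "North Carolina"]
```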
on 2007-06-16 21:18
Wouldn't it be better to just use an XML parser?

On 6/16/07, Axel Etzold <AEtzold@gmx.de> wrote:
> Thank you for any help!
> Best regards,
> Axel
on 2007-06-16 21:21
Aureliano, no, since the tags are not XML tags, and since I wanted to learn about negative lookahead in regexps...

Best regards,
Axel
on 2007-06-16 21:55
On Jun 16, 2:40 pm, "Axel Etzold" <AEtz...@gmx.de> wrote:
> regexp=/.../ <= searched for, such that:
> string="<NC> In North Carolina </NC>"
> ref=regexp.match(string)
> p ref[1] => "In North Carolina"

This will work pretty well (it works for the example above):

/<\w+>(.*?)<\/\w+>/

The only fancy thing there is making the .* non-greedy by writing .*? instead. This means it will take the shortest possible match instead of the longest. But it will not work the way I think you want on a string with nested clauses, because it stops at the first close tag it finds. If you want to include internal clauses, you need to make sure that the close tag matches the open tag. The side effect is that you need another sub-match (capture group) within the regex, which the close tag then refers back to with \1. So consider:

/<(\w+)>(.*?)<\/\1>/

Example:

irb(main):033:0> str = "<NC>In North Carolina <FOO>adsf</FOO> </NC>"
=> "<NC>In North Carolina <FOO>adsf</FOO> </NC>"
irb(main):034:0> re = /<(\w+)>(.*?)<\/\1>/
=> /<(\w+)>(.*?)<\/\1>/
irb(main):035:0> re.match(str)[1]
=> "NC"
irb(main):036:0> re.match(str)[2]
=> "In North Carolina <FOO>adsf</FOO> "

Does that help?
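The difference between greedy, non-greedy, and backreferenced patterns can be checked with a short script. The lazy and paired patterns and str come from the post; the greedy pattern and the strings two and nested are hypothetical additions for comparison.

```ruby
str = "<NC>In North Carolina <FOO>adsf</FOO> </NC>"

greedy = /<\w+>(.*)<\/\w+>/    # .*  takes the longest possible match
lazy   = /<\w+>(.*?)<\/\w+>/   # .*? takes the shortest possible match
paired = /<(\w+)>(.*?)<\/\1>/  # \1 forces the close tag to repeat the open tag

p greedy.match(str)[1]  # => "In North Carolina <FOO>adsf</FOO> "
p lazy.match(str)[1]    # => "In North Carolina <FOO>adsf"
p paired.match(str)[2]  # => "In North Carolina <FOO>adsf</FOO> "

# With two sibling chunks, greedy runs on to the last close tag.
two = "<NC> a </NC> <NC> b </NC>"
p greedy.match(two)[1]  # => " a </NC> <NC> b "
p paired.match(two)[2]  # => " a "

# Limitation: with a tag nested inside a tag of the same name, the lazy
# backreference stops at the first matching close tag, not the balanced one.
nested = "<PC> at <PC> in </PC> all </PC>"
p paired.match(nested)[2]  # => " at <PC> in "
```

Same-name nesting like nested would need something beyond this pattern, such as a small stack-based tokenizer over the tags.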
on 2007-06-16 22:50
Dear Byron,
> Does that help?
Yes, very much! Thank you for your time!
Best regards,
Axel