Forum: Ruby regex: get the first match

Trochalakis C. (Guest)
on 2007-06-10 14:52
(Received via mailing list)
Hello!

I want to parse a tagged string like this: "<i>this is</i><i>my
string</i>"

I am doing:

>> "<i>this is</i><i>my string</i>".scan(/<i>(.*)<\/i>/)
=> [["this is</i><i>my string"]]

What I want is a regex that will return the *first* segment that
matches.
In the above case -> [["this is", "my string"]]

Is there any way to do this?

Thanks!
Robert D. (Guest)
on 2007-06-10 16:09
(Received via mailing list)
On 6/10/07, Trochalakis C. <removed_email_address@domain.invalid> wrote:
> What I want is a regex that will return the *first* segment that
> matches.
> In the above case -> [["this is", "my string"]]
>
> Is there any way to do this?
>
> Thanks!
>
>
>
This is a FAQ, and yes, I will give the solution ;)
Regexps are greedy by default: they consume as many chars as
possible. There are some possibilities (not tested):

(1) use non-greedy matches
"<i>this is</i><i>my string</i>".scan(/<i>(.*?)<\/i>/)
(2) use less general expressions
"<i>this is</i><i>my string</i>".scan(/<i>(.[^<]*)<\/i>/)
(3) Combine both ;)
"<i>this is</i><i>my string</i>".scan(/<i>(.[^<]*?)<\/i>/)

HTH
Robert

P.S.
This *really* is a FAQ though
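
Since the three options above were posted untested, here is a small sketch that runs them against the string from the question. The constant names (SAMPLE, GREEDY, LAZY, NARROW, BOTH) are only labels chosen for this illustration.

```ruby
# Sample string from the original question.
SAMPLE = "<i>this is</i><i>my string</i>"

# Greedy: .* runs to the end of the string and backtracks only as far
# as the LAST </i>, so both segments end up in one capture.
GREEDY = SAMPLE.scan(/<i>(.*)<\/i>/)

# (1) Non-greedy: .*? stops at the first </i> it can reach.
LAZY = SAMPLE.scan(/<i>(.*?)<\/i>/)

# (2) Less general: after the first char, the content may not contain "<".
NARROW = SAMPLE.scan(/<i>(.[^<]*)<\/i>/)

# (3) Both combined.
BOTH = SAMPLE.scan(/<i>(.[^<]*?)<\/i>/)

p GREEDY  # => [["this is</i><i>my string"]]
p LAZY    # => [["this is"], ["my string"]]
p NARROW  # => [["this is"], ["my string"]]
p BOTH    # => [["this is"], ["my string"]]
```

On this particular input all three options give the same result; they only differ on edge cases such as empty or malformed tags.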
Logan C. (Guest)
on 2007-06-10 16:22
(Received via mailing list)
On 6/10/07, Robert D. <removed_email_address@domain.invalid> wrote:
> > => [["this is</i><i>my string"]]
> >
> This is a FAQ, and yes, I will give the solution ;)
> Regexps are greedy by default: they consume as many chars as
> possible. There are some possibilities (not tested):
>
> (1) use non-greedy matches
> "<i>this is</i><i>my string</i>".scan(/<i>(.*?)<\/i>/)
> (2) use less general expressions
> "<i>this is</i><i>my string</i>".scan(/<i>(.[^<]*)<\/i>/)
> (3) Combine both ;)
> "<i>this is</i><i>my string</i>".scan(/<i>(.[^<]*?)<\/i>/)


Unless you want to match strings like <i><foo</i>, it would be simpler
to just use [^<]*, and not .[^<]*. Also, .[^<]* will not match <i></i>.
If the intent was to make the regexp not match that, a better regexp
would be [^<]+

HTH
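
To make the edge cases above concrete, here is a minimal sketch comparing the three character-class variants on the two inputs mentioned. The constant names are made up for this example.

```ruby
# The three candidate patterns, differing only in the capture group.
DOT_THEN_CLASS = /<i>(.[^<]*)<\/i>/  # "." accepts any char, even "<"
CLASS_STAR     = /<i>([^<]*)<\/i>/   # zero or more non-"<" chars
CLASS_PLUS     = /<i>([^<]+)<\/i>/   # one or more non-"<" chars

# Content that starts with a stray "<": only the leading "." lets it through.
p "<i><foo</i>".scan(DOT_THEN_CLASS)  # => [["<foo"]]
p "<i><foo</i>".scan(CLASS_STAR)      # => []
p "<i><foo</i>".scan(CLASS_PLUS)      # => []

# Empty tag: only the "*" variant matches, capturing an empty string.
p "<i></i>".scan(DOT_THEN_CLASS)      # => []
p "<i></i>".scan(CLASS_STAR)          # => [[""]]
p "<i></i>".scan(CLASS_PLUS)          # => []
```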
GrzechG (Guest)
on 2007-06-10 16:26
(Received via mailing list)
> in the above case -> [["this is", "my string"]]
The solution is:

"<i>this is</i><i>my string</i>".scan(/<i>(.*?)<\/i>/)
=> [["this is"], ["my string"]]

By default a quantifier matches as much as it possibly can.
Adding the '?' character after it makes it match as little as possible.
With (.*?) instead of (.*), the </i><i> part of the string is no longer
swallowed into a single result.
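
Two small variations that may be handy, sketched with the same non-greedy pattern: flattening the scan result into a plain list, and pulling out only the first segment with String#[]. The constant names here are just illustrative.

```ruby
TAGGED = "<i>this is</i><i>my string</i>"
ITALIC = /<i>(.*?)<\/i>/

# scan returns one array per match; flatten gives a flat list of segments.
ALL_SEGMENTS = TAGGED.scan(ITALIC).flatten

# String#[] with a regexp and a capture index returns the first match only,
# or nil when nothing matches.
FIRST_SEGMENT = TAGGED[ITALIC, 1]
NO_SEGMENT = "plain text"[ITALIC, 1]

p ALL_SEGMENTS   # => ["this is", "my string"]
p FIRST_SEGMENT  # => "this is"
p NO_SEGMENT     # => nil
```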

Regards,
Grzegorz Golebiowski
Robert D. (Guest)
on 2007-06-10 16:46
(Received via mailing list)
On 6/10/07, Logan C. <removed_email_address@domain.invalid> wrote:
> > > >> "<i>this is</i><i>my string</i>".scan(/<i>(.*)<\/i>/)
> > >
> > "<i>this is</i><i>my string</i>".scan(/<i>(.[^<]*?)<\/i>/)
>
>
> Unless you want to match strings like <i><foo</i>, it would be simpler to
> just use [^<]*, and not .[^<]*. Also, .[^<]* will not match <i></i>. If the
> intent was to make the regexp not match that, a better regexp would be [^<]+
Thanks for correcting my typos.
Robert
Trochalakis C. (Guest)
on 2007-06-10 21:20
(Received via mailing list)
On Jun 10, 3:22 pm, GrzechG <removed_email_address@domain.invalid> wrote:
> > in the above case -> [["this is", "my string"]]
>
> Regards,
> Grzegorz Golebiowski

Thanks Grzegorz, nice trick!
Robert D. (Guest)
on 2007-06-10 21:31
(Received via mailing list)
On 6/10/07, Trochalakis C. <removed_email_address@domain.invalid> wrote:
> > > matches.
> > into one result.
> >
> > Regards,
> > Grzegorz Golebiowski
>
> Thanks Grzegorz, nice trick!
>
You are welcome ;)
Robert