Forum: Ruby regexp - how to match this?

Nanyang Z. (Guest)
on 2007-04-09 17:28
what kind of pattern will match the part of sentence before a <span>
tag?

for instance:
for this sentence:
This forum is connected to a mailing list that is read by <span
class="wow">thousands</span> of people.

it'll match:
This forum is connected to a mailing list that is read by
Rick D. (Guest)
on 2007-04-09 18:13
(Received via mailing list)
On 4/9/07, Nanyang Z. <removed_email_address@domain.invalid> wrote:
> what kind of pattern will match the part of sentence before a <span>
> tag?
>
> for instance:
> for this sentence:
> This forum is connected to a mailing list that is read by <span
> class="wow">thousands</span> of people.
>
> it'll match:
> This forum is connected to a mailing list that is read by

/^.*?(?=<span)/

This is a little loose since it treats anything starting with "<span"
as a span tag.

Breaking it down:

^ - start of line (in Ruby, ^ matches at the start of every line; use \A
to anchor at the start of the string)

.*? - 0 or more characters, non-greedy. A greedy .* would match
everything up to the LAST "<span" in the string instead of the first,
which is what I suspect you really want.

(?=<span) - This is a zero-length lookahead, this means that "<span"
must occur just after what has been matched, but it will not be part
of the match itself.
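
A minimal Ruby sketch of the pattern above, assuming the example sentence from the question (the variable names are only for illustration). It also contrasts ^ with \A: in Ruby, ^ matches at the start of any line, so the two anchors can give different results on multi-line input.

```ruby
# Example sentence from the question; the line break after "<span" is kept.
sentence = "This forum is connected to a mailing list that is read by <span\nclass=\"wow\">thousands</span> of people."

# Lazy match from the start of the string up to (not including) "<span".
before_span = sentence[/\A.*?(?=<span)/]
# before_span is "This forum is connected to a mailing list that is read by "

# Hypothetical two-line input to show how the anchors differ.
two_lines = "first line\nsecond <span>x</span>"

with_caret  = two_lines[/^.*?(?=<span)/]    # "second " (^ also matches after the newline)
with_anchor = two_lines[/\A.*?(?=<span)/]   # nil ("." does not cross the newline)
with_m_flag = two_lines[/\A.*?(?=<span)/m]  # "first line\nsecond " (/m lets "." match newlines)
```

Note that the match keeps the trailing space before the tag; calling rstrip on the result removes it if that matters.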

HTH

--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/
Robert K. (Guest)
on 2007-04-09 18:26
(Received via mailing list)
On 09.04.2007 15:28, Nanyang Z. wrote:
> what kind of pattern will match the part of sentence before a <span>
> tag?
>
> for instance:
> for this sentence:
> This forum is connected to a mailing list that is read by <span
> class="wow">thousands</span> of people.
>
> it'll match:
> This forum is connected to a mailing list that is read by

One way to do it:

irb(main):022:0* s='This forum is connected to a mailing list that is read by <span
irb(main):023:0' class="wow">thousands</span> of people.'
=> "This forum is connected to a mailing list that is read by <span\nclass=\"wow\">thousands</span> of people."
irb(main):024:0> s[/\A(.*?)<span/, 1]
=> "This forum is connected to a mailing list that is read by "

  robert
Nanyang Z. (Guest)
on 2007-04-09 18:37
Rick Denatale wrote:

> /^.*?(?=<span)/

thanks.

BTW, what is “Duck Typing”?
Rick D. (Guest)
on 2007-04-09 19:02
(Received via mailing list)
On 4/9/07, Nanyang Z. <removed_email_address@domain.invalid> wrote:
> Rick Denatale wrote:
>
> > /^.*?(?=<span)/
>
> thanks.
>
> BTW, what is "Duck Typing"?

Well, here's some of what *I've* written on the subject:
http://talklikeaduck.denhaven2.com/articles/tag/ducks

I'd suggest looking at them starting with the oldest one (they are in
reverse chronological order).

--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/
John J. (Guest)
on 2007-04-09 19:09
(Received via mailing list)
It comes from the old saying,
"If it walks like a duck and talks like a duck, then it is a duck."
It means treating something as a duck if it behaves like a duck: what
matters is the methods an object responds to, not its class.
It is part of the principle of least surprise [to Matz].
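
A small sketch of the idea in Ruby. The classes and the method below are hypothetical and exist only for illustration: the caller never checks the class of its argument, only that it can do what is asked of it.

```ruby
# Two unrelated classes (hypothetical) that happen to share a method name.
class Duck
  def speak
    "Quack"
  end
end

class Robot
  def speak
    "Beep"
  end
end

# No class check here: anything that responds to #speak is acceptable.
def make_it_speak(thing)
  thing.speak
end

sounds = [Duck.new, Robot.new].map { |thing| make_it_speak(thing) }
# sounds is ["Quack", "Beep"]
```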
Phillip G. (Guest)
on 2007-04-09 20:04
(Received via mailing list)
Nanyang Z. wrote:

> BTW, what is “Duck Typing”?

PickAxe 2nd Edition (and probably the freely available 1st Edition) have
a nice, interesting and very readable chapter covering that.

In a nutshell: What the others have already said.

--
Phillip "CynicalRyan" Gawlowski
http://cynicalryan.110mb.com/

Rule of Open-Source Programming #6:

The user is always right unless proven otherwise by the developer.
Nanyang Z. (Guest)
on 2007-04-10 17:00
Rick Denatale wrote:

> (?=<span) - This is a zero-length lookahead, this means that "<span"
> must occur just after what has been matched, but it will not be part
> of the match itself.

so ?= makes pattern lookAHEAD. How to make pattern lookBEHIND?

for instance:

example sentence:
This forum is connected to a mailing list that is read by <span
class="wow">thousands</span> of people.

question:
how to make a Regexp to match the words that follow the </span> tag?

a /<\/span>.*/ will include the tag, which isn't what I want.
Gavin K. (Guest)
on 2007-04-10 17:51
(Received via mailing list)
On Apr 10, 7:00 am, Nanyang Z. <removed_email_address@domain.invalid> wrote:
> so ?= makes pattern lookAHEAD. How to make pattern lookBEHIND?

http://phrogz.net/ProgrammingRuby/language.html#extensions

Zero-width positive and negative lookaheads are supported in Ruby's
regexp engine in 1.8. Zero-width lookbehind assertions are not
supported by the current regexp engine. (However, they are supported
by Oniguruma, the regexp engine used in 1.9 and future builds of
Ruby.)
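
For readers on Ruby 1.9 or later, a lookbehind version might look like the sketch below. It assumes the same example string used elsewhere in this thread, and the variable names are only illustrative.

```ruby
# Requires Ruby 1.9 or later; the 1.8 engine rejects the (?<=...) syntax.
str = 'is read by <span class="wow">thousands</span> of people.'

# (?<=<\/span>) asserts that "</span>" sits immediately before the match
# without making it part of the match.
after_span = str[/(?<=<\/span>).+/]
# after_span is " of people."
```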

> example sentence:
> This forum is connected to a mailing list that is read by <span
> class="wow">thousands</span> of people.
>
> question:
> how to make a Regexp to match the words that follow the </span> tag?

Just because you consume them doesn't mean you have to use them. Use
parentheses to save parts of text extracted by your regular
expression.

irb(main):001:0> str = 'is read by <span class="wow">thousands</span> of people.'
=> "is read by <span class=\"wow\">thousands</span> of people."

irb(main):002:0> str[ /<\/span>(.+)/, 1 ]
=> " of people."

irb(main):003:0> %r{</span>(.+)}.match( str ).to_a
=> ["</span> of people.", " of people."]
Nanyang Z. (Guest)
on 2007-04-10 18:39
Gavin K. wrote:
> Just because you consume them doesn't mean you have to use them. Use
> parentheses to save parts of text extracted by your regular
> expression.

I'm trying to write one method (taking one regexp as input) that can
extract any part of a given string.

But now it seems very hard to do this job with a single fixed method.
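
One possible shape for such a method, offered only as a sketch: the helper name extract and its behaviour are assumptions, not something proposed in this thread. It returns the first capture group when the pattern defines one and the whole match otherwise, so the caller decides what to keep by where the parentheses go.

```ruby
# Hypothetical helper: returns capture group 1 when the pattern defines a
# group, otherwise the whole match; returns nil when nothing matches.
def extract(text, pattern)
  match = pattern.match(text)
  return nil unless match

  match.captures.empty? ? match[0] : match[1]
end

sentence = 'This forum is connected to a mailing list that is read by ' \
           '<span class="wow">thousands</span> of people.'

before = extract(sentence, /\A(.*?)<span/)   # text before the opening tag
after  = extract(sentence, /<\/span>(.+)/)   # text after the closing tag
word   = extract(sentence, /thousands/)      # no group, so the whole match
```

This works on Ruby 1.8 as well, since it relies on capture groups rather than lookbehind.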