Regexp - how to match this?


#1

What kind of pattern will match the part of a sentence before a <span>
tag?

For instance, for this sentence:
This forum is connected to a mailing list that is read by <span class="wow">thousands</span> of people.

it'll match:
This forum is connected to a mailing list that is read by


#2

On 4/9/07, Nanyang Z. removed_email_address@domain.invalid wrote:

What kind of pattern will match the part of a sentence before a <span>
tag?

For instance, for this sentence:
This forum is connected to a mailing list that is read by <span class="wow">thousands</span> of people.

it'll match:
This forum is connected to a mailing list that is read by

/^.*?(?=<span)/

This is a little loose since it treats anything starting with “<span”
as a span tag.

Breaking it down:

^ - start of a line (in Ruby, ^ matches at the start of every line; use
\A to anchor at the start of the string)

.*? - 0 or more characters, non-greedy; otherwise this would match
everything up to the LAST “<span” in the string, instead of the first,
which is what I suspect you really want.

(?=<span) - This is a zero-length lookahead: “<span”
must occur just after what has been matched, but it will not be part
of the match itself.
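
To see the difference between the non-greedy and greedy forms, here is a small sketch. The input string, including its second span, is made up for illustration and is not from the original question.

```ruby
# Sample input; the class name and the second span are assumptions for this demo.
s = 'read by <span class="wow">thousands</span> of <span>people</span>.'

lazy   = s[/^.*?(?=<span)/]  # non-greedy: stops before the FIRST "<span"
greedy = s[/^.*(?=<span)/]   # greedy: runs up to the LAST "<span"

puts lazy.inspect    # "read by "
puts greedy.inspect  # "read by <span class=\"wow\">thousands</span> of "
```

In both cases the lookahead keeps "<span" itself out of the returned text.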

HTH


Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/


#3

On 09.04.2007 15:28, Nanyang Z. wrote:

What kind of pattern will match the part of a sentence before a <span>
tag?

For instance, for this sentence:
This forum is connected to a mailing list that is read by <span class="wow">thousands</span> of people.

it'll match:
This forum is connected to a mailing list that is read by

One way to do it:

irb(main):022:0* s='This forum is connected to a mailing list that is read by <span
irb(main):023:0' class="wow">thousands of people.'
=> "This forum is connected to a mailing list that is read by <span\nclass=\"wow\">thousands of people."
irb(main):024:0> s[/\A(.*?)<span/, 1]
=> "This forum is connected to a mailing list that is read by "

robert


#4

Rick Denatale wrote:

/^.*?(?=<span)/

thanks.

BTW, what is “Duck Typing”?


#5

Using the old saying,
“If it walks like a duck and talks like a duck, then it is a duck.”
It means deciding something is a duck if it seems to be a duck.
Part of the principle of least surprise [to Matz]
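
In Ruby terms, this usually means code cares about which methods an object responds to, not which class it belongs to. A minimal sketch, with hypothetical class and method names chosen only for illustration:

```ruby
# Two hypothetical classes with no common ancestor besides Object.
class Duck
  def quack
    "Quack!"
  end
end

class Robot
  def quack
    "Beep-quack!"
  end
end

# No class check here: anything that responds to #quack is accepted.
def make_it_quack(thing)
  thing.quack
end

results = [Duck.new, Robot.new].map { |t| make_it_quack(t) }
puts results.inspect  # ["Quack!", "Beep-quack!"]
```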


#6

On 4/9/07, Nanyang Z. removed_email_address@domain.invalid wrote:

Rick Denatale wrote:

/^.*?(?=<span)/

thanks.

BTW, what is “Duck Typing”?

Well, here’s some of what I’ve written on the subject:
http://talklikeaduck.denhaven2.com/articles/tag/ducks

I’d suggest looking at them starting with the oldest one (they are in
reverse chronological order).


Rick DeNatale



#7

Nanyang Z. wrote:

BTW, what is “Duck Typing”?

The PickAxe 2nd Edition (and probably the freely available 1st Edition) has
a nice, interesting and very readable chapter covering that.

In a nutshell: what the others have already said.


Phillip “CynicalRyan” Gawlowski
http://cynicalryan.110mb.com/

Rule of Open-Source Programming #6:

The user is always right unless proven otherwise by the developer.


#8

Rick Denatale wrote:

(?=<span) - This is a zero-length lookahead: “<span”
must occur just after what has been matched, but it will not be part
of the match itself.

So ?= makes the pattern look AHEAD. How do I make a pattern look BEHIND?

For instance:

example sentence:
This forum is connected to a mailing list that is read by <span class="wow">thousands</span> of people.

question:
How do I make a Regexp that matches the words following the </span> tag?

A pattern like /<\/span>.*/ will include the tag, which isn't what I want.


#9

Gavin K. wrote:

Just because you consume them doesn’t mean you have to use them. Use
parentheses to save parts of the text extracted by your regular
expression.

I’m trying to code one method (with one regexp as input) to extract any part
of a given string.

But now it seems very hard to do this job with a single fixed method.
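
One possible shape for such a method, as a rough sketch only: the helper name `extract` and its "first capture group, else whole match" rule are assumptions, not something proposed in this thread.

```ruby
# Hypothetical helper: returns the first capture group if it matched,
# otherwise the whole match; returns nil when the pattern does not match.
def extract(str, pattern)
  m = pattern.match(str)
  return nil unless m
  m[1] || m[0]
end

sentence = 'is read by <span class="wow">thousands</span> of people.'

before = extract(sentence, /\A(.*?)<span/)                # "is read by "
after  = extract(sentence, %r{</span>(.+)})               # " of people."
inside = extract(sentence, %r{<span[^>]*>(.*?)</span>})   # "thousands"
```

The caller decides which part to keep by placing the parentheses, so the method itself stays fixed.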


#10

On Apr 10, 7:00 am, Nanyang Z. removed_email_address@domain.invalid wrote:

so ?= makes pattern lookAHEAD. How to make pattern lookBEHIND?

http://phrogz.net/ProgrammingRuby/language.html#extensions

Zero-width positive and negative lookaheads are supported in Ruby’s
regexp engine in 1.8. Zero-width lookbehind assertions are not
supported by the current regexp engine. (However, they are supported
by Oniguruma, the regexp engine used in 1.9 and future builds of
Ruby.)
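
For readers on an Oniguruma-based Ruby (1.9 or later), a lookbehind version might look like the sketch below. It assumes such an interpreter and will not parse on 1.8.

```ruby
# Requires a Ruby whose regexp engine supports (?<=...), i.e. 1.9 or later.
str = 'is read by <span class="wow">thousands</span> of people.'

# (?<=<\/span>) asserts that "</span>" comes just before the match
# without making it part of the match.
tail = str[/(?<=<\/span>).+/]
puts tail.inspect  # " of people."
```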

example sentence:
This forum is connected to a mailing list that is read by <span class="wow">thousands</span> of people.

question:
How do I make a Regexp that matches the words following the </span> tag?

Just because you consume them doesn’t mean you have to use them. Use
parentheses to save parts of the text extracted by your regular
expression.

irb(main):001:0> str = 'is read by <span class="wow">thousands</span> of people.'
=> "is read by <span class=\"wow\">thousands</span> of people."

irb(main):002:0> str[ /<\/span>(.+)/, 1 ]
=> " of people."

irb(main):003:0> %r{</span>(.+)}.match( str ).to_a
=> ["</span> of people.", " of people."]