Ruby regex lookahead/behind

Who of you here frequently use lookahead/lookbehind
in Ruby regular expressions?

I’m thinking hard about purpose/usage of these, and would
like some more examples…

If you consider it too far off-topic, you may email me.

Cheers,
Hal

My habits are better now, but if I’m lazy and know I only need to remove
small amounts of HTML/XML tags in parsing docs, I will use them to grab
what’s inside the tags. However, that’s really the only time I use them
and really I shouldn’t be quite so lazy and should parse them with
nokogiri and xpath instead.

-Wayne

Who of you here frequently use lookahead/lookbehind
in Ruby regular expressions?

I’m thinking hard about purpose/usage of these, and would
like some more examples…

IMHO a very strange question… The purpose of /look(ahead|behind)/ is to
match a regexp only when some pattern comes before or after it,
without “including” that subpattern in the match. It is used when you
have to manipulate strings.

For example, suppose you have some Ruby code as text:

txt = <<-RUBY
class Foo
  def bar
    encode_json("...")
  end

  # ...

  def encode_json obj
    JSON.generate obj, :quirks_mode => true
  end
RUBY

Now you want to replace encode_json with JSON.dump. If you gsub
with /encode_json/ you’ll also get def JSON.dump obj, which is something
you don’t want, so you gsub it like this instead:

txt.gsub /(?<!def )encode_json/, "JSON.dump"
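
To see the difference on a cut-down sample, a sketch like the following should do. The `sample` text and the variable names are made up for illustration; only the two patterns come from the explanation above.

```ruby
# Hypothetical sample, shortened from the text above.
sample = <<~RUBY
  def bar
    encode_json("...")
  end

  def encode_json obj
    JSON.generate obj
  end
RUBY

# Plain substitution also rewrites the method definition.
naive = sample.gsub(/encode_json/, "JSON.dump")

# (?<!def ) is a negative lookbehind: match only when "def " does not
# come immediately before encode_json.
guarded = sample.gsub(/(?<!def )encode_json/, "JSON.dump")

puts guarded
```

The lookbehind is not part of the match, so only the method name itself is replaced and the text before it is left alone.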


Sincerely yours,
Aleksey V. Zapparov A.K.A. ixti
FSF Member #7118
Mobile Phone: +34 677 990 688
Homepage: http://ixti.net/

*Origin: Happy Hacking!

On Sep 27, 2013, at 3:02 PM, Hal F. wrote:

I have to admit, I’ve only found the need for them exactly twice in the
past couple decades, and I can’t even remember where, exactly (but they
were the perfect answer). I always forget about them and find something
else to solve my immediate problem. Maybe I should use them more.

I’ve used them for things like string matching:

/"(.*?)(?<!\\)"/

Note: that pattern doesn’t allow you to escape backslashes, but it’s a
quick example, and the sort of thing I’ve used in the past.
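
A rough sketch of how such a pattern behaves, including the limitation just mentioned. The constant name and the sample strings are invented for the example.

```ruby
# The closing quote must not be preceded by a backslash.
QUOTED = /"(.*?)(?<!\\)"/

# Single-quoted Ruby strings keep \" as a backslash followed by a quote.
line = 'say "he said \"hi\" ok" end'
inner = line[QUOTED, 1]
puts inner

# The limitation: an escaped backslash right before the closing quote is
# mistaken for an escaped quote, so the pattern finds no match here.
tricky = '"a\\\\"'
no_match = tricky[QUOTED, 1]
p no_match
```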

I guess one thing I’m wondering is:

The lookarounds are basically “don’t-consume” matches
as I see it… Is there always a “do consume” match
associated with it?

To put it more clearly: Is it ever valid to use a lookaround
“by itself”?

I freely confess I am far from expert with regular expressions…
in the more complex cases, I usually write code rather than
use a regex. (“More complex” being a relative term, of course.)

Of course, there are times/places where a regex is not the
right tool. But it’s my desire that, when they are the right tool,
I will use them more often.

Hal

On Sat, Sep 28, 2013 at 11:01 AM, Robert K.

+1 on robertk’s “user-defined” anchors. Use case: very handy for
validating passwords or emails, e.g.
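
As an illustration of the password case, here is a minimal sketch. The policy itself (8+ characters, one digit, one lowercase, one uppercase) and the method name are assumptions made up for the example, not something from this thread.

```ruby
# Hypothetical policy: at least 8 characters, with at least one digit,
# one lowercase letter and one uppercase letter.
# Each (?=...) is checked from the start of the string and consumes nothing,
# so the three conditions can be satisfied in any order.
PASSWORD = /\A(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}\z/

def acceptable_password?(candidate)
  PASSWORD.match?(candidate)
end

puts acceptable_password?("Passw0rdX")
puts acceptable_password?("password1")
```

Stacking lookaheads at one position like this is a way to express "all of these must hold" without caring about where each piece occurs.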

On Fri, Sep 27, 2013 at 10:02 PM, Hal F. wrote:

Who of you here frequently use lookahead/lookbehind
in Ruby regular expressions?

I’m thinking hard about purpose/usage of these, and would
like some more examples…

I use them on a regular (sic!) basis. You can think of them as user
defined anchors, i.e. beyond ^, $, \A, \b and the like.
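
A small sketch of the “user defined anchor” idea. The sample text and variable names are invented for illustration.

```ruby
text = "total: $42 for 17 items, tax: $3"

# Digits only when a dollar sign comes immediately before them.
prices = text.scan(/(?<=\$)\d+/)

# Words only when a colon comes immediately after them.
labels = text.scan(/\w+(?=:)/)

p prices
p labels
```

In both cases the dollar sign or colon pins down where a match may sit, but it is not part of what is returned.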

Kind regards

robert

On Sun, Sep 29, 2013 at 10:59 AM, Robert K. wrote:

Ahh, I had been trying to think of a way to do that. :) Thank you for
that trick.

I usually recommend “Mastering Regular Expressions” - even though it’s
not an introductory book.

Yes, I also recommend that book. ;) I have had it for years, but much
of it does not “stick to my brain.”

If you want to see the matching process at
work you can use http://weitz.de/regex-coach/ (Windows program but
works with WINE on Linux). That helps understanding the matching
process - you can even single step.

That sounds cool. I’ve sometimes wished there was an Onigmo patch
to allow that sort of thing.

Imagine regular expressions debuggable at runtime… or maybe the real
experts would cringe at that thought. :)

Anyway: About lookarounds. As I see it, there are four basic cases,
arising from two basic questions: 1) Is the nonconsuming match before or
after the consuming one? and 2) is the nonconsuming match positive or
negative?

I read an article that (sort of) implied that there might be eight cases,
but I think I have convinced myself this was a notational issue.

As this relates to Regexador – I am thinking of introducing three new
keywords (find, with, without) so that lookarounds would work this way:

find X with Y       # /(?=XY)X/   - pos lookahead
find X without Y    # /(?!XY)X/   - neg lookahead
with X find Y       # /(?<=X)Y/   - pos lookbehind
without X find Y    # /(?<!X)Y/   - neg lookbehind

But there are some slight subtleties I am working through here.
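
For reference, here is a sketch of the four cases in plain Ruby, with X = foo and Y = bar. It uses the more common X(?=Y) and X(?!Y) spellings for the lookaheads, which is an assumption about the intended meaning rather than the exact notation in the table; the sample string is made up.

```ruby
s = "foobar foobaz bazbar"

# Uppercase whatever each pattern actually consumes.
pos_ahead  = s.gsub(/foo(?=bar)/, "FOO")   # foo followed by bar
neg_ahead  = s.gsub(/foo(?!bar)/, "FOO")   # foo not followed by bar
pos_behind = s.gsub(/(?<=foo)bar/, "BAR")  # bar preceded by foo
neg_behind = s.gsub(/(?<!foo)bar/, "BAR")  # bar not preceded by foo

puts pos_ahead, neg_ahead, pos_behind, neg_behind
```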

For example, I have read that most engines require a lookbehind to be
a fixed-length expression (with .NET and ABA being exceptions – and I
don’t even know what ABA is).

I’ve confirmed that Ruby 2.0 doesn’t allow variable-length lookbehinds.
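
One way to check this on whatever Ruby is at hand is to compile both kinds of pattern and look at what comes back. This is only a sketch: `try_compile` is a made-up helper, and the outcome for the second pattern may depend on the Ruby version, so no result is claimed for it here.

```ruby
# Returns the compiled Regexp, or the RegexpError raised while compiling.
def try_compile(source)
  Regexp.new(source)
rescue RegexpError => e
  e
end

fixed    = try_compile('(?<=ab)c')  # lookbehind of fixed length 2
variable = try_compile('(?<=a+)c')  # lookbehind of unbounded length

puts "fixed:    #{fixed.inspect}"
puts "variable: #{variable.inspect}"
```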

Hal

On Sat, Sep 28, 2013 at 8:38 PM, Hal F. wrote:

I guess one thing I’m wondering is:

The lookarounds are basically “don’t-consume” matches
as I see it… Is there always a “do consume” match
associated with it?

To put it more clearly: Is it ever valid to use a lookaround
“by itself”?

It never occurred to me to try that. There is really no point in
doing it because if there is no consuming match you do not match
anything. Still, Oniguruma allows you to do it:

irb(main):001:0> "foo".scan /(?=\w)/
=> ["", "", ""]
irb(main):002:0> "foo".scan /(?=\w)/ do puts $` end

f
fo
=> "foo"
irb(main):003:0> "foo".scan /(?=\w)/ do puts $`.length end
0
1
2
=> "foo"

As you can see, with a trick you even get to know the matching
positions. :)
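
The same position trick can be written without `$`` by asking the match data for its offset. The second half of this sketch shows one case where a pattern made only of lookarounds seems to earn its keep, namely zero-width substitution; the digit-grouping example is an illustration added here, not something from the session above.

```ruby
# Collect the offset of every position where the lookahead succeeds.
positions = []
"foo".scan(/(?=\w)/) { positions << Regexp.last_match.begin(0) }
p positions

# Zero-width substitution: insert a comma at each position that has a
# digit before it and a multiple of three digits after it, up to the end.
grouped = "1234567".gsub(/(?<=\d)(?=(\d{3})+\z)/, ",")
puts grouped
```

Nothing is consumed in the second pattern, so gsub inserts the comma between characters instead of replacing any of them.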

I freely confess I am far from expert with regular expressions…
in the more complex cases, I usually write code rather than
use a regex. (“More complex” being a relative term, of course.)

Of course, there are times/places where a regex is not the
right tool. But it’s my desire that, when they are the right tool,
I will use them more often.

I think you are on the right track. :) Not using regular expressions
when they are useful can make your code more complicated and even
slower. I personally like the power of regular expressions. But I do
admit that it took me a while to get there. :)

I usually recommend “Mastering Regular Expressions” - even though it’s
not an introductory book. If you want to see the matching process at
work you can use http://weitz.de/regex-coach/ (Windows program but
works with WINE on Linux). That helps understanding the matching
process - you can even single step.

Kind regards

robert