# Regular expression question

Why does:
puts /\w+? \w+?/.match "Ein kleiner Satz"
return “Ein k” and not “n k”?
My understanding was that \w represents any alphanumeric character, +
means that it occurs at least once, and the ? following the plus makes it
non-greedy.
I would have thought this would return “n k”, although that is obviously
not correct.

Hi –

On Mon, 28 Sep 2009, Jim B. wrote:

> Why does:
> puts /\w+? \w+?/.match "Ein kleiner Satz"
> return “Ein k” and not “n k”?
> My understanding was that \w represents any alphanumeric character, +
> means that it occurs at least once, and the ? following the plus makes it
> non-greedy.
> I would have thought this would return “n k”, although that is obviously
> not correct.

A small misunderstanding.

\w+ means “one or more from the set (alphanumeric + underscore)”.
The pattern will start trying to match at the beginning of the string,
where it finds \w+, then the space, then more \w+. It grabs all of Ein
because it hasn’t found a space yet (but it also hasn’t found any
reason to stop proceeding along the string, since “E”, “Ei”, and “Ein”
are all \w+). It only grabs the k in kleiner because that satisfies
the pattern in non-greedy fashion.
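
The walk-through above can be checked with a small Ruby sketch. It is only an illustration: the variable names (`str`, `re`, `anchored`, `candidates`) are made up for this example. It first runs the original match, then tries the same pattern anchored at every starting offset, to show that “n k” is a possible match but not the leftmost one.

```ruby
str = "Ein kleiner Satz"
re  = /\w+? \w+?/

# The ordinary, unanchored match: the engine reports the leftmost success.
m = re.match(str)
m[0]        # => "Ein k"
m.begin(0)  # => 0

# Hypothetical helper for illustration: anchor the same pattern with \A
# and try it on the substring starting at each offset in turn.
anchored   = /\A\w+? \w+?/
candidates = (0...str.length)
               .map { |i| [i, str[i..-1][anchored]] }
               .select { |_, text| text }

# The first few starting offsets that could match at all:
candidates.first(3)  # => [[0, "Ein k"], [1, "in k"], [2, "n k"]]
```

Offset 2 would indeed give “n k”, but offset 0 already succeeds, so the engine never gets that far.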

In this particular case, the first + doesn’t matter, because the
pattern will keep going until it finds a space, so normal rules of
(non-)greediness don’t apply.

David

Thank you very much for your answer.
I was unaware that the reg ex would start from the point where it
encounters the first character satisfying \w.
Regular expressions - sometimes I hate them more than I love them.
Thanks very much for your help.

Hi –

On Mon, 28 Sep 2009, Jim B. wrote:

> Thank you very much for your answer.
> I was unaware that the reg ex would start from the point where it
> encounters the first character satisfying \w.

The pattern will always try to match as far to the left in the string
as possible. Greediness or lack thereof pertains only to how far right
it goes.
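
A minimal Ruby sketch of that point, using the sentence from the original question (the variable names are only for illustration): the greedy and non-greedy versions of the pattern start at the same place and differ only in where they end.

```ruby
str = "Ein kleiner Satz"

greedy = /\w+ \w+/.match(str)    # both quantifiers greedy
lazy   = /\w+? \w+?/.match(str)  # both quantifiers non-greedy

greedy[0]        # => "Ein kleiner"
lazy[0]          # => "Ein k"

# Same left edge in both cases...
greedy.begin(0)  # => 0
lazy.begin(0)    # => 0

# ...but a different right edge.
greedy.end(0)    # => 11
lazy.end(0)      # => 5
```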

Here are some further examples that illustrate some of this:

Note the difference between these two:

/x?!/.match("xxxx!")[0]
=> "x!"

/!/.match("xxxx!")[0]
=> "!"

You might think that in the first one, since x? means zero-or-one of
‘x’, it would settle for zero and just match “!”. But it never gets
that far; it progresses strictly from left to right, reaches the
imaginary position between the third and fourth x’s, and asks itself:
am I now in a position to match zero-or-one of ‘x’ followed by exactly
one ‘!’? The answer is yes, and the match consumes two characters.
Since the match has succeeded, the process is over; the engine never
has to know that if it had advanced one more character, it would also
have found a match for the same pattern.
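
To see where each of those two matches actually sits in the string, one can inspect the MatchData objects. This is just a sketch with made-up variable names (`with_x`, `bare`), relying on the standard `MatchData#begin` and `MatchData#pre_match` methods.

```ruby
str = "xxxx!"

with_x = /x?!/.match(str)
bare   = /!/.match(str)

with_x[0]         # => "x!"
with_x.begin(0)   # => 3   (between the third and fourth x)
with_x.pre_match  # => "xxx"

bare[0]           # => "!"
bare.begin(0)     # => 4   (one character further right)
bare.pre_match    # => "xxxx"
```

The first pattern succeeds one position earlier than the second, which is why the engine never considers the shorter “!” on its own.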

Then there’s this:

/x!?/.match("xxxx!")[0]
=> "x"

In this case, the regex engine starts at the beginning of the string and
asks: can I match exactly one ‘x’, followed by zero-or-one of ‘!’? The
answer is yes, and the match only requires one character. Since that one
character fits the pattern, the matching process stops immediately.
You might say that /!?/ is the non-greedy version of /!/ (and I
imagine people have, though I don’t know where), but greediness is
always about left-to-right progress through the string.

(I am paraphrasing the brain of the regex engine; some of this may or
may not be optimized away in any given implementation.)
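
As a rough check of the /x!?/ example, the sketch below looks at where that match begins and compares the plain `?` quantifier with its non-greedy form `??` on a shorter string. The string "x!" and the variable names are assumptions added for illustration; they are not from the examples above.

```ruby
first = /x!?/.match("xxxx!")
first[0]        # => "x"
first.begin(0)  # => 0   (the very first x already satisfies the pattern)

# When a "!" directly follows the x, the ordinary (greedy) ? takes it...
greedy_opt = /x!?/.match("x!")[0]   # => "x!"

# ...while the non-greedy ?? prefers to match zero of them.
lazy_opt   = /x!??/.match("x!")[0]  # => "x"
```

So `?` on its own is still a greedy quantifier: it takes the optional character whenever that character is available at the current position.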

David

Thanks for the comprehensive explanation. That helps me greatly.
I am currently reading a Ruby book (in German, just to make things a bit
simpler) and am on the regex chapter. It’s hardcore!
Thanks again.

On 09/27/2009 08:45 PM, Jim B. wrote:

> Thanks for the comprehensive explanation. That helps me greatly.
> I am currently reading a ruby book (in German, just to make things a bit
> simpler) and am on the reg ex chapter. It’s hardcore!
> Thanks again.

There is also de.comp.lang.ruby, by the way.

Btw, David you wrote

> In this particular case, the first + doesn’t matter, because the
> pattern will keep going until it finds a space, so normal rules of
> (non-)greediness don’t apply.

I believe you meant “the first ? doesn’t matter”: leaving out the
“+” would indeed make a difference, whereas the non-greedy modifier
does not change what is matched here.
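
The distinction between dropping the first ? and dropping the first + can be tried out directly. A small sketch, with illustrative variable names and the sentence from the original question:

```ruby
str = "Ein kleiner Satz"

original   = /\w+? \w+?/.match(str)[0]  # => "Ein k"

# Dropping the first ? (greedy instead of non-greedy): same result,
# because the first group has to run up to the space either way.
without_q  = /\w+ \w+?/.match(str)[0]   # => "Ein k"

# Dropping the first + (exactly one word character before the space):
# now the match can only start at the "n".
without_pl = /\w \w+?/.match(str)[0]    # => "n k"
```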

Kind regards

robert

Hi –

On Mon, 28 Sep 2009, Robert K. wrote:

> > In this particular case, the first + doesn’t matter, because the
> > pattern will keep going until it finds a space, so normal rules of
> > (non-)greediness don’t apply.
>
> I believe you meant “the first ? doesn’t matter” - as leaving out the “+”
> would indeed make a difference but it’s the non greediness addition which
> does not make a difference to matching.

Yes – I did indeed mean ?, not +. Thanks.

David

Hi –

On Mon, 28 Sep 2009, Jim B. wrote:

> Thanks for the comprehensive explanation. That helps me greatly.
> I am currently reading a ruby book (in German, just to make things a bit
> simpler) and am on the reg ex chapter. It’s hardcore!

Oh, your English is plenty good to read Ruby books – like
The Well-Grounded Rubyist, just to choose a random example.

David

David A. Black wrote:

> Hi –
>
> On Mon, 28 Sep 2009, Jim B. wrote:
>
> > Thanks for the comprehensive explanation. That helps me greatly.
> > I am currently reading a ruby book (in German, just to make things a bit
> > simpler) and am on the reg ex chapter. It’s hardcore!
>
> Oh, your English is plenty good to read Ruby books – like
> The Well-Grounded Rubyist, just to choose a random example

David, your random generator has picked a good book :-p

Cheers,
Mohit.
9/29/2009 | 1:35 AM.

> Oh, your English is plenty good to read Ruby books – like
> The Well-Grounded Rubyist, just to choose a random example

I must admit, I’m English and just happen to be living in Germany, hence
the good standard of English.
But seriously, I have bookmarked the page you sent me and I will check
it out as I am always on the lookout for good books about Ruby.