Questions about * + and ? in Regex

Hi,

I have some questions about the correct meaning of *, + and ? in regular
expressions, and I would appreciate some clarification.
I have an example (derived from Programming Ruby, 2nd Edition) and I
don’t understand why it gives these results. Here is the code:

def show_regexp(a, re)
  if a =~ re
    puts "#{$`}<<#{$&}>>#{$'}"
  else
    puts "no match"
  end
end

show_regexp('Example1', /\s*/)
show_regexp('Example2', /\s.*/)
show_regexp('Example3 ', /\s.?/) # Space at the end of string
show_regexp('Example4 ', /\s.+/) # Space at the end of string
show_regexp('Example5 ', /\s.*/) # Space at the end of string

output gives:

<<>>Example1
no match
Example3<< >>
no match
Example5<< >>

If I understand correctly:
* means - match zero or more occurrences of preceding expression.
+ means - match 1 or more occurrences of preceding expression.
? means - match 0 or 1 occurrence of preceding expression.

Why does Example2 give “no match”? I understand this as find “0 or more
occurrences” of (a space followed by any character).
Why does Example4 give “no match”? I understand this as find “1 or more
occurrences” of (a space followed by any character).
I am assuming that the null character can be matched by a .
Am I correct?

Best Regards

On Dec 30, 2007 11:25 PM, Carlos O. [email protected] wrote:


A dot (.) can only match an actual character. Example 2 fails because
it’s looking not for “0 or more occurrences of (a space followed by
any character)”, but “a space followed by 0 or more characters”. The *
only applies to whatever immediately precedes it, not the whole
expression… unless the expression’s enclosed in parentheses. A regex
for “0 or more occurrences of (a space followed by any character)”
would be /(\s.)*/. In that case, the * applies to the parenthesized
group of whitespace and dot.
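
To make the scoping of * concrete, here is a minimal sketch. The helper name matched_text is my own invention for illustration and is not from the original post; it just returns the matched text, or nil when nothing matches.

```ruby
# Hypothetical helper (not from the thread): returns the matched text,
# or nil when the regex does not match at all.
def matched_text(str, re)
  m = re.match(str)
  m && m[0]
end

# Here * binds only to the dot, so one whitespace character is mandatory.
p matched_text('Example2', /\s.*/)    # nil, there is no whitespace at all
p matched_text('Example 2', /\s.*/)   # " 2"

# Here * binds to the whole group, so zero repetitions are allowed and
# the regex matches the empty string at the very start.
p matched_text('Example2', /(\s.)*/)  # ""
p matched_text(' a b', /(\s.)*/)      # " a b"
```

Note that an empty match still counts as a match, which is why the grouped version never reports failure.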

Example 4 fails because the only space isn’t followed by anything at
all.
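
A quick way to check this, sketched with strings of my own choosing (only 'Example4 ' comes from the original post):

```ruby
# /\s.+/ needs a whitespace character AND at least one character after it.
p('Example4 ' =~ /\s.+/)    # nil: nothing follows the trailing space
p('Example4 x' =~ /\s.+/)   # 8: the space at index 8 is followed by "x"

# With ? or * the dot is allowed to match zero characters,
# so the lone trailing space is enough for a match.
p('Example4 '[/\s.?/])      # " "
p('Example4 '[/\s.*/])      # " "
```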

HTH,
Chris

P.S. I strongly recommend Jeffrey Friedl’s Mastering Regular
Expressions.

Thanks a lot Chris, now I think I’ve got it. However, I still have a
doubt about how to interpret this:

show_regexp('hi hi hihihi hi hi', /\s.*?\s/)

Overall, my confusion arises when two special characters appear together…

This last one would be:
-Match a space
-Followed by 0 or more characters
-Followed by … <= Here is my doubt
-Ending with a space.

Again, I would appreciate your help on this.

Regards
Carlos


On Dec 31, 1:24 am, Carlos O. [email protected] wrote:

-Followed by … <= Here is my doubt


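Normally “*” is “greedy”, i.e., it matches as many characters as it can
while still letting the rest of the pattern match. When it’s followed by
“?” it becomes “lazy” (non-greedy) and matches as few characters as
possible, so the match stops at the first place where the rest of the
pattern can succeed.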

"Hello world, from ruby".match(/.*?\s+/)[0]
=> "Hello "

"Hello world, from ruby".match(/.*\s+/)[0]
=> "Hello world, from "
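
Applied to the string from the question, the same comparison can be sketched as follows. The helper mark_match is a name I made up for this illustration; it mirrors the show_regexp output format but returns the string instead of printing it.

```ruby
# Hypothetical helper (not from the thread): marks the matched part
# with << >> in the same style as show_regexp.
def mark_match(str, re)
  m = str.match(re)
  m ? "#{m.pre_match}<<#{m[0]}>>#{m.post_match}" : "no match"
end

# Lazy: .*? takes as few characters as possible, so the match ends
# at the first whitespace after the opening one.
puts mark_match('hi hi hihihi hi hi', /\s.*?\s/)
# hi<< hi >>hihihi hi hi

# Greedy: .* takes as many characters as possible, so the match ends
# at the last whitespace in the string.
puts mark_match('hi hi hihihi hi hi', /\s.*\s/)
# hi<< hi hihihi hi >>hi
```

So in /\s.*?\s/ the ? is not a separate “0 or 1” step; it modifies the * before it.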

Regards,
Jordan