Forum: Ruby
Explain this Ruby regex

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- and Ruby-related community platforms.
unknown (Guest)
on 2008-10-03 17:40
(Received via mailing list)
Can someone explain this regex ...

"one two".scan(/\w*/).length

returns 4. I can see it matching the two words and the space; what else
is it matching? Is there a null terminator? I thought Ruby strings
were not null-terminated.
Ben Bleything (Guest)
on 2008-10-03 17:46
(Received via mailing list)
On Sat, Oct 04, 2008, renton.dan@gmail.com wrote:
> "one two".scan(/\w*/).length
>
> returns 4. I can see it matching the 2 words and the space, what else
> is it matching on? Is there a null terminator, I thought Ruby strings
> were not null termed.

Try replacing #length with #inspect and looking at what scan
returns. You'll find that it's returning two empty strings as well. I
suspect what you really want is \w+...
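A minimal Ruby sketch of Ben's suggestion: print the array that scan returns instead of only its length, and compare \w* with \w+. The expected output for \w* matches the irb sessions quoted later in this thread; the variable names are just for illustration.

```ruby
# \w* = zero or more word characters; \w+ = one or more.
star = "one two".scan(/\w*/)
plus = "one two".scan(/\w+/)

p star          # prints ["one", "", "two", ""]
p star.length   # prints 4
p plus          # prints ["one", "two"]
p plus.length   # prints 2
```

With \w+ every match must consume at least one word character, so the empty strings disappear.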

Ben
unknown (Guest)
on 2008-10-03 19:56
(Received via mailing list)
On Oct 3, 11:44 am, Ben Bleything <b...@bleything.net> wrote:
>
> Ben

Yeah, you're right: \w+ will pull out the words, which is what I want
anyway. But I'm trying to understand what \w* is doing.
irb(main):015:0> "one two".scan(/\w*/).inspect
=> "[\"one\", \"\", \"two\", \"\"]"

My question is: what is the last \"\" (the trailing empty string), and where does it come from?
Patrick He (Guest)
on 2008-10-03 20:38
(Received via mailing list)
\w* does not match the space between "one" and "two". It matches
"one", <empty string after "one">, "two", <empty string after "two">.

Here are some other examples:

    irb(main):004:0> "one".scan(/^\w*/)
    => ["one"]
    irb(main):005:0> "one".scan(/\w*$/)
    => ["one", ""]
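To see exactly where each match sits, here is a small sketch that records the start and end offset of every match. It relies on the block form of String#scan, where Regexp.last_match holds the MatchData for the current match; match_offsets is a hypothetical helper name, not part of Ruby.

```ruby
# Hypothetical helper: collect [start, end] offsets for every match scan finds.
def match_offsets(str, re)
  offsets = []
  # In the block form of scan, Regexp.last_match is the current MatchData.
  str.scan(re) { offsets << Regexp.last_match.offset(0) }
  offsets
end

p match_offsets("one two", /\w*/)  # prints [[0, 3], [3, 3], [4, 7], [7, 7]]
p match_offsets("one", /\w*$/)     # prints [[0, 3], [3, 3]]
p match_offsets("one", /^\w*/)     # prints [[0, 3]]
```

An empty match has equal start and end offsets. The first empty string sits at offset 3, just before the space; after an empty match, scan steps forward one character so it cannot loop forever, which is why the space itself never appears in any match. The last empty string is the match at offset 7, the very end of the string. With ^ anchoring the pattern, no match is possible at offset 3, so only "one" is returned.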


--
Patrick
Patrick Doyle (Guest)
on 2008-10-03 21:29
(Received via mailing list)
The key idea here is that "*" means "match zero or more of" whereas "+"
means "match one or more of". So, when you match \w* against "one two",
there are zero or more instances of a word character (3, in fact: 'o', 'n',
and 'e'), so that produces one result. Following that result, there are
zero matches of a word character, but since you asked for "zero or more of",
you get that empty-string result. Lather, rinse, repeat for the "two" part.
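The "zero or more" point can be seen without scan at all. In this sketch (the string " two" is chosen for illustration), \w* succeeds at position 0 by consuming nothing, so the leftmost match is the empty string, while \w+ has to move on to the first word character.

```ruby
text = " two"  # leading space, then a word

p text[/\w*/]     # prints "" (empty match at position 0)
p(/\w*/ =~ text)  # prints 0
p text[/\w+/]     # prints "two"
p(/\w+/ =~ text)  # prints 1
```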

FWIW, instead of looking at the result with #inspect, I found it more
informative to look at the result returned from #scan by itself, e.g.

irb> "one two".scan(/\w*/)
=> ["one", "", "two", ""]

--wpd
Brian Candler (candlerb)
on 2008-10-03 21:47
> FWIW, instead of looking at the result with #inspect, I found it more
> informative to look at the result returned from #scan by itself, e.g.
>
> irb> "one two".scan(/\w*/)
> => ["one", "", "two", ""]

irb displays the expression value using "inspect", so you are using
inspect even though you didn't ask for it :-)
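This also explains the extra backslashes in the earlier `=> "[\"one\", \"\", ...]"` output. Calling #inspect yourself returns a String, and irb then inspects that String, escaping its quotes. A sketch, using p and puts to stand in for irb's echo:

```ruby
result = "one two".scan(/\w*/)

# p calls #inspect, which is what irb does when it echoes a value.
p result              # prints ["one", "", "two", ""]

# Calling #inspect explicitly gives back a plain String...
shown = result.inspect
puts shown.class      # prints String

# ...and inspecting that String again escapes the quotes inside it.
puts shown.inspect    # prints "[\"one\", \"\", \"two\", \"\"]"
```

The backslashes are escape characters added by the second #inspect, not something the regex matched.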
Robert Klemme (Guest)
on 2008-10-05 18:21
(Received via mailing list)
On 03.10.2008 18:44, Patrick Doyle wrote:
> The key idea here is that "*" means "match zero or more of" whereas "+"
> means "match one or more of".  So, when you match \w* against "one two",
> there are zero or more instances of a word character (3, in fact, 'o', 'n',
> and 'e'), so that produces one result.  Following that result, there are
> zero matches of a word character, but since you asked for "zero or more of",
> you get that empty string result.  Later, rinse, repeat for the "two" part.

It boils down to this statement: a subexpression with "*" potentially
matches an _empty string anywhere_ in a string.
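Robert's point is easiest to see with a pattern that can never match a real character in the string. In this sketch (the string "abc" and the pattern /x*/ are illustrative choices), x* still matches an empty string at every position, including the end:

```ruby
# "abc" contains no "x", yet x* matches the empty string at offsets 0, 1, 2 and 3.
empties = "abc".scan(/x*/)
p empties   # prints ["", "", "", ""]

# gsub replaces each of those empty matches, so "-" lands between every character.
dashed = "abc".gsub(/x*/, "-")
p dashed    # prints "-a-b-c-"
```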

Kind regards

  robert