Problem in 2 code

amiref · July 31, 2010, 11:47pm

Hi
I don't understand the results of these two code snippets.
Could you please explain what each one does?

1 -
x = "This is a test".match( /(\w+)(\w+)/ )
puts x[0]
puts x[1]
puts x[2]

2 -
x = "This is a test".match( /(\w+) (\w+)/ )
puts x[0]
puts x[1]
puts x[2]

amiref · August 1, 2010, 6:14am

Amir E. wrote:

Hi
I don't understand the results of these two code snippets.
Could you please explain what each one does?

1 -
x = "This is a test".match( /(\w+)(\w+)/ )
puts x[0]
puts x[1]
puts x[2]

2 -
x = "This is a test".match( /(\w+) (\w+)/ )
puts x[0]
puts x[1]
puts x[2]

/(\w+)(\w+)/ is a Regexp. If regular expressions are new to you, it is
worth reading up on them first, since that should clear up most of the
confusion. \w+ means one or more word characters, so /(\w+)(\w+)/ needs
at least 2 word characters in a row. The match method returns the first
place in the string that matches, which here is “This”. As for what
x[1] and x[2] print, I could be wrong, but I believe they are the parts
of the string matched by each parenthesized group of the Regexp. It
would be best to have someone else confirm that.

The second snippet is the same thing except there’s a space between
the two \w+. This means at least one word character, followed by a
space, followed by at least one word character. That is why it returns
the match “This is” instead of just “This”.
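To see the two patterns side by side, here is a small sketch you could paste into irb or run as a script. The variable names (text, joined, spaced) are made up for illustration only.

```ruby
text = "This is a test"

# No space between the groups: both groups have to fit inside one word.
joined = text.match(/(\w+)(\w+)/)
puts joined[0]   # prints: This
puts joined[1]   # prints: Thi
puts joined[2]   # prints: s

# A literal space between the groups: each group lands in its own word.
spaced = text.match(/(\w+) (\w+)/)
puts spaced[0]   # prints: This is
puts spaced[1]   # prints: This
puts spaced[2]   # prints: is
```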

I hope this was helpful.

amiref · August 1, 2010, 9:19am

Thanks for the answer, but I still have a problem:
what does the first snippet do?
Why does “puts x[0]” print “This”, “puts x[1]” print “Thi”,
and “puts x[2]” print “s”?

amiref · August 1, 2010, 10:14am

Hello Amir,

On 01.08.2010 09:19, Amir E. wrote:

Why does “puts x[0]” print “This”, “puts x[1]” print “Thi”,
and “puts x[2]” print “s”?

The first entry (x[0]) is always the complete match, that is, the text
matched by the whole regular expression. The rest are the individual
sub-matches, one per parenthesized group, if there are any.

One also has to know that, by default, in most implementations the
quantifiers in a regular expression are “greedy”, which means they try
to match as many characters as possible.
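As a rough sketch of how the result object is laid out (the variable names here are only for illustration), MatchData can be indexed like an array, and it also offers captures and to_a if you want the groups in one go:

```ruby
m = "This is a test".match(/(\w+) (\w+)/)

whole  = m[0]        # the complete match
groups = m.captures  # only the parenthesized groups
all    = m.to_a      # complete match followed by the groups

puts whole           # prints: This is
p groups             # prints: ["This", "is"]
p all                # prints: ["This is", "This", "is"]

# When nothing matches, match returns nil instead of a MatchData object,
# so indexing the result without checking would raise NoMethodError.
nothing = "????".match(/(\w+) (\w+)/)
p nothing            # prints: nil
```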

So, given your first example:

"This is a test".match( /(\w+)(\w+)/ )

\w - match a single “word” character

\w+ - match one or more “word” characters

Now, since everything is greedy by default, the first \w+ tries to match
as much as possible. But the second \w+ has to match at least one
character too, so the first \w+ takes everything up to the last
character of the word and leaves that one for the second \w+. That is
why x[1] is “Thi” and x[2] is “s”.
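A quick way to convince yourself of this is to try words of different lengths. This is only an illustrative sketch with made-up variable names:

```ruby
# The first group grabs as much as it can, but must leave one
# character so that the second group can still match.
long = "This".match(/(\w+)(\w+)/)
puts long[1]    # prints: Thi
puts long[2]    # prints: s

# A one-letter word cannot satisfy both groups, so the regex engine
# skips "a" and finds its first match at "test".
short = "a test".match(/(\w+)(\w+)/)
puts short[0]   # prints: test
puts short[1]   # prints: tes
puts short[2]   # prints: t

# With only one word character in the whole string there is no match.
single = "a".match(/(\w+)(\w+)/)
p single        # prints: nil
```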

There’s a special character ? which, placed after a quantifier, tells it
to be non-greedy. Try this example:

"This is a test".match( /(\w+?)(\w+)/ )

The same idea with digits:

irb(main):006:0> "1234".match(/(\d+?)(\d+)/)
=> #<MatchData "1234" 1:"1" 2:"234">

The +? means “match as few as possible”, so \d+? only matches the
first “1” and leaves all the rest to the second \d+ .
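Applied to the string from the original question, the non-greedy version could be checked like this (a sketch only, the variable names are invented for the example):

```ruby
text = "This is a test"

# Lazy first group: takes as little as possible (one character),
# and the greedy second group takes the rest of the word.
lazy_first = text.match(/(\w+?)(\w+)/)
puts lazy_first[0]   # prints: This
puts lazy_first[1]   # prints: T
puts lazy_first[2]   # prints: his

# Both groups lazy: each one settles for a single character,
# so the complete match shrinks as well.
lazy_both = text.match(/(\w+?)(\w+?)/)
puts lazy_both[0]    # prints: Th
puts lazy_both[1]    # prints: T
puts lazy_both[2]    # prints: h
```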

In your case it’s debatable whether this regex really makes sense
though; at a first glance it doesn’t look like a generally useful case
and really looks very specific.

HTH