Regular expression match

luislavena · November 21, 2011, 11:02pm

Hi,

Can anyone explain to me this regular expression, please:
str[/\A([a-z]\w*)/, 1]

What is the use of the 1 in this expression?
Regards,

rubix01 · November 21, 2011, 11:28pm

Hello,

This syntax returns the part of the string matched by the supplied
regexp. The number selects which capture group to return, with 0 meaning
the whole match. For example:

"hello world"[/(\w+)(\s)(\w+)/, 0] #=> "hello world"
"hello world"[/(\w+)(\s)(\w+)/, 1] #=> "hello"
"hello world"[/(\w+)(\s)(\w+)/, 2] #=> " "
"hello world"[/(\w+)(\s)(\w+)/, 3] #=> "world"
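
A small sketch of the edge cases, in case it helps: as far as I know, String#[] returns nil both when the regexp does not match at all and when the requested group number does not exist. The helper name `capture` is made up for this illustration and is not part of Ruby.

```ruby
# Hypothetical helper (not a Ruby built-in): wraps String#[] called with
# a regexp and a capture group index.
def capture(str, index)
  str[/(\w+)(\s)(\w+)/, index]
end

p capture("hello world", 1)  # first group: "hello"
p capture("hello world", 4)  # nil, the regexp only has three groups
p capture("hello", 1)        # nil, the regexp does not match at all
```

So a nil result on its own does not tell you whether the match failed or the group index was wrong.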

rubix01 · November 21, 2011, 11:31pm

On Nov 21, 2011, at 11:02 PM, rubix Rubix wrote:

Hi,

Can anyone explain to me this regular expression, please:
str[/\A([a-z]\w*)/, 1]

Check out the documentation:

ri String#[]

When given a regular expression and a fixnum, String#[] matches the
string against the regular expression and returns the capturing group
indicated by the fixnum. That is to say, your example matches a word
beginning with a lowercase letter at the beginning of the string. Since
\A is an anchor and therefore not part of the matched string, your
expression is, if I am not mistaken, actually equivalent to just

str[/\A[a-z]\w*/]

If there is a particular reason for using the capturing group over the
entire expression there you should probably add a comment to document
it!
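
A quick way to check that claim, as a sketch only: the two method names below are made up for the comparison, and the sample strings are arbitrary.

```ruby
# Hypothetical helper names, chosen only for this comparison.
def with_group(str)
  str[/\A([a-z]\w*)/, 1]   # the original form: return capture group 1
end

def without_group(str)
  str[/\A[a-z]\w*/]        # the shorter form: return the whole match
end

# Print both results side by side for a few sample inputs.
["foo_bar baz", "Foo bar", "x1 y2", ""].each do |s|
  printf "%p: %p / %p\n", s, with_group(s), without_group(s)
end
```

Both forms should agree on every input here, including the ones where nothing matches and nil comes back.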

Sylvester

rubix01 · November 21, 2011, 11:54pm

Thank you for your answer.
I am using this code to parse a text: I search for the first word, check
whether it is one of the words in an array of strings, and then advance
an index by the size of the word to parse the rest of the text.
It works well, but when I try it with Unicode characters, I get the
letters of the first word and not the whole first word:
str[/[\u0627-\u064A]+/,1]

best regards,

rubix01 · November 22, 2011, 9:14am

On Mon, Nov 21, 2011 at 11:55 PM, rubix Rubix wrote:

I am using this code to parse a text: I search for the first word, check
whether it is one of the words in an array of strings, and then advance
an index by the size of the word to parse the rest of the text.

Sounds like you rather want String#scan.

It works well, but when I try it with Unicode characters, I get the
letters of the first word and not the whole first word:
str[/[\u0627-\u064A]+/,1]

Maybe your range is missing something. I’d rather use specific
character classes for this anyway:

str.scan(/\p{Word}+/) do |word|
  printf "Found word: %p\n", word
end

http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt
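
One more thing that may be worth checking: the posted expression has no capturing group, so asking for group 1 should give nil rather than the word. Below is a rough sketch of both variants next to the \p{Word} approach. The sample text and the helper names `first_word` and `all_words` are assumptions made up for this example.

```ruby
# Sample text is an assumption: two Arabic words, written with \u escapes
# so the snippet does not depend on the source file encoding.
SAMPLE = "\u0645\u0631\u062D\u0628\u0627 \u0628\u0627\u0644\u0639\u0627\u0644\u0645"

# Hypothetical helper: first run of word characters, or nil if there is none.
def first_word(str)
  str[/\p{Word}+/]
end

# Hypothetical helper: every run of word characters, in order.
def all_words(str)
  str.scan(/\p{Word}+/)
end

p SAMPLE[/[\u0627-\u064A]+/, 1]    # no capturing group, so group 1 is nil
p SAMPLE[/([\u0627-\u064A]+)/, 1]  # with a group: the whole first word
p first_word(SAMPLE)               # same word via \p{Word}
p all_words(SAMPLE).size           # number of words found by scan
```

With scan there is no index to maintain by hand, which is the main reason to prefer it for walking through the rest of the text.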

Kind regards

robert