Ruby Regular Expression
1. How can I extract the words that are immediately followed by a symbol
such as a punctuation mark?
For example:
str = 'Now, I am Ruby programmer. here, how to list the word which is exist with symbol like punctuation, comma, or exclamation.'
The output should be:
=> ['Now,', 'programmer.', 'here,', 'punctuation,', 'comma,', 'exclamation.']
2. From that result, how can I remove the symbol from each word in the
array? The result should be:
=> ['Now', 'programmer', 'here', 'punctuation', 'comma', 'exclamation']
–
Posted via http://www.ruby-forum.com/.
This will work, although there may be a cleaner way than calling match
inside the collect.
Just change the punctuation list in the scan to match what you need.
arr = str.scan(/\w+[,.!]/)
arr.collect! { |element|
/\w+/.match(element).to_s
}
p arr
Michael
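As a minimal sketch of the same two-step idea applied to the sample string from the question (the variable names with_symbols and without_symbols are only illustrative, and map with String#[] is swapped in for collect! with match):

```ruby
str = 'Now, I am Ruby programmer. here, how to list the word which is exist with symbol like punctuation, comma, or exclamation.'

# Step 1: words immediately followed by one of the listed symbols.
# Extend the [,.!] list to cover whatever symbols you need.
with_symbols = str.scan(/\w+[,.!]/)

# Step 2: keep only the word part of each element.
without_symbols = with_symbols.map { |word| word[/\w+/] }

p with_symbols
p without_symbols
```

Using map instead of collect! leaves the first array intact, so both results asked for in the question are available.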
You can avoid the collect by using a match group with scan:
str.scan(/(\w+)[[:punct:]]/).flatten
Dear Selvag R.,
Building on Joel P.'s idea…
str.scan(/(\w+)[[:punct:]]/).flatten
It can also be written with a lookahead, which needs no capture group:
str.scan(/\w+(?=[[:punct:]])/)
Abinoam Jr.
Thanks for all the replies. All of your code works well, and some of it
makes me want to learn more about Ruby. Can you explain this code so
that I can understand it?
I understood those words,
but specifically I was hoping for an explanation of [[:punct:]]. In
':punct:',
- it is enclosed in double square brackets. Why?
- there are two colons. Is there a reason for that?
Look up “POSIX bracket expressions”. I’m not 100% sure, but I think the
reason for the outer pair of brackets is that something like [:punct:] is
a named set of characters, and therefore belongs inside a bracket
expression, just like the characters in [aeiou]. The colons are part of
the name's syntax.
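A small sketch of why the outer brackets matter: because [:punct:] is just one member of a bracket expression, it can be combined with other members or negated. The constant names and the sample string here are made up for illustration.

```ruby
# [:punct:] on its own inside one pair of brackets.
PUNCT_ONLY     = /[[:punct:]]/
# Two named sets sharing the same outer brackets.
PUNCT_OR_DIGIT = /[[:punct:][:digit:]]/
# The usual ^ negation applies to the outer bracket expression.
NOT_PUNCT      = /[^[:punct:]]/

sample = 'a1,b!'
punct_only     = sample.scan(PUNCT_ONLY)
punct_or_digit = sample.scan(PUNCT_OR_DIGIT)
not_punct      = sample.scan(NOT_PUNCT)

p punct_only      # [",", "!"]
p punct_or_digit  # ["1", ",", "!"]
p not_punct       # ["a", "1", "b"]
```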
I’ll have a go at explaining, although I’m no expert on Regexp.
str.scan(/(\w+)[[:punct:]]/).flatten
“str” - Take the String object
“.scan(” - Execute the method “scan” which is available to String.
“/” - Opens the Regexp literal. A Regular Expression matches a pattern
using special characters.
“(\w+)” - The parentheses indicate a “capture” group. This means that
you can match a specific subsection of the pattern and extract it. The
“\w” means any character in a word, like a letter, number, or
underscore. The “+” means 1 or more.
“[[:punct:]]” - Matches a single punctuation character. According to a
quick Googling this covers characters such as
- [ ] ; ' , . / ! @ # % & * ( ) _ { } : " ?
Because this is outside the capture group, it is excluded from the result.
“/” - Closes the Regexp.
“)” - Closes the “scan” argument.
“.flatten” - Turns the nested Array returned by using scan with captures
into a simple Array.
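To make the last step concrete, here is a short sketch of what scan returns with a capture group before and after flatten, using the sample string from the question (nested and words are illustrative names):

```ruby
str = 'Now, I am Ruby programmer. here, how to list the word which is exist with symbol like punctuation, comma, or exclamation.'

# With a capture group, scan returns one inner Array per match,
# holding only the captured text (the punctuation is left out).
nested = str.scan(/(\w+)[[:punct:]]/)

# flatten collapses the one-element inner Arrays into a single Array.
words = nested.flatten

p nested.first(2)  # [["Now"], ["programmer"]]
p words
```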
str.scan(/\w+(?=[[:punct:]])/)
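The difference with this one is that “(?=” starts a positive lookahead:
the punctuation must follow the word, but it is not consumed as part of
the match, so it is excluded from the output and no capture group or
flatten is needed.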
According to this:
http://www.tutorialspoint.com/ruby/ruby_regular_expressions.htm
(?= re) Specifies position using a pattern. Doesn’t have a range.
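A brief sketch of the lookahead behaviour, assuming a short made-up input: the matched text stops before the punctuation, which stays in the unmatched remainder of the string.

```ruby
# The lookahead only checks that punctuation follows the word.
LOOKAHEAD = /\w+(?=[[:punct:]])/

m = 'Now,'.match(LOOKAHEAD)
matched   = m[0]          # text consumed by the match
remainder = m.post_match  # text left after the match

# scan returns a flat Array because there is no capture group.
found = 'Now, here.'.scan(LOOKAHEAD)

p matched    # "Now"
p remainder  # ","
p found      # ["Now", "here"]
```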
One place to start is
http://www.ruby-doc.org/core-2.1.0/Regexp.html#class-Regexp-label-Character+Classes
Hope this helps,
Mike
On Jan 28, 2014, at 7:38 AM, Selvag R. wrote:
I understood those words,
but specifically I was hoping for an explanation of [[:punct:]]. In
':punct:',
- it is enclosed in double square brackets. Why?
- there are two colons. Is there a reason for that?
–
Mike S.
http://www.stok.ca/~mike/
The “`Stok’ disclaimers” apply.