Regexp in Ruby

Ruby Regular Expression

1. How can I extract the words that are immediately followed by a symbol
such as a punctuation mark?
For example:

str = 'Now, I am Ruby programmer. here, how to list the word which is
exist with symbol like punctuation, comma, or
exclamation.'

The output should be:
=> ['Now,', 'programmer.', 'here,', 'punctuation,', 'comma,', 'exclamation.']

2. From that result, how can I remove the symbol from each word and keep
the words in an array? The result should be:
=> ['Now', 'programmer', 'here', 'punctuation', 'comma', 'exclamation']

Posted via http://www.ruby-forum.com/.

This will work. There may be a cleaner way than the match inside the
collect.
Just change the punctuation list in the scan to match what you need.

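# Grab each word together with the punctuation mark that follows it.
arr = str.scan(/\w+[,.!]/)
# Replace each element with just its word characters.
arr.collect! { |element|
  /\w+/.match(element).to_s
}
p arr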
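
As a quick check, here is a minimal sketch that runs both steps against the string from the question. The string is joined onto one logical line, and the variable names `with_punct` and `words` are only illustrative.

```ruby
str = "Now, I am Ruby programmer. here, how to list the word which is " \
      "exist with symbol like punctuation, comma, or exclamation."

# Step 1: words immediately followed by one of the listed punctuation marks.
with_punct = str.scan(/\w+[,.!]/)
p with_punct
# => ["Now,", "programmer.", "here,", "punctuation,", "comma,", "exclamation."]

# Step 2: keep only the word characters of each element.
# String#[] with a Regexp returns the first matching substring.
words = with_punct.map { |w| w[/\w+/] }
p words
# => ["Now", "programmer", "here", "punctuation", "comma", "exclamation"]
```

Using `map` instead of `collect!` leaves the first array intact, which is handy if both results are needed.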

Michael

You can avoid the collect by using a match group with scan:

str.scan(/(\w+)[[:punct:]]/).flatten

Dear Selvag R.,

Building on Joel P.'s idea…

str.scan(/(\w+)[[:punct:]]/).flatten

It can also be written like this:

str.scan(/\w+(?=[[:punct:]])/)
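
A small sketch comparing the two forms side by side, on a made-up sample string (not the one from the question). On input like this they should return the same words.

```ruby
sample = "Wait! Is this Ruby? Yes, it is."

# Capture-group form: scan returns one sub-array per match, so flatten it.
captured = sample.scan(/(\w+)[[:punct:]]/).flatten

# Lookahead form: the punctuation is checked but never part of the match.
lookahead = sample.scan(/\w+(?=[[:punct:]])/)

p captured               # => ["Wait", "Ruby", "Yes", "is"]
p captured == lookahead  # => true
```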

Abinoam Jr.

Thanks to all for the replies. All of your code works well, and some of
it makes me want to learn more about Ruby. Can you explain this code so
that I can understand it?

Hi,

  1. /\w+[^\w\s]/
  2. /\w+[\p{P}]/

Try rubular.com for any regex problem.
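
The two patterns above are not quite interchangeable. A short sketch on a made-up string shows the difference: `[^\w\s]` accepts any character that is neither a word character nor whitespace, while `\p{P}` accepts only characters in the Unicode punctuation categories, so a symbol such as `+` matches the first but not the second.

```ruby
text = "cost+ tax, total."

# Any non-word, non-space character may follow the word.
any_symbol = text.scan(/\w+[^\w\s]/)
p any_symbol   # => ["cost+", "tax,", "total."]

# Only Unicode punctuation may follow the word ("+" is a math symbol).
punct_only = text.scan(/\w+[\p{P}]/)
p punct_only   # => ["tax,", "total."]
```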

I understood those parts,
but I was specifically hoping for an explanation of [[:punct:]]. In
':punct:',

  1. it is enclosed in double square brackets. Why is that?
  2. there are two colons. Is there a reason for them?

Look up “POSIX bracket expressions”. I’m not 100% sure, but I think the
reason for the outer pair of brackets is that something like [:punct:]
names a set of characters, and therefore belongs inside a bracket
expression, just like the characters in [aeiuo].
I’ll have a go at explaining, although I’m no expert on Regexp.

str.scan(/(\w+)[[:punct:]]/).flatten

“str” - Take the String object

“.scan(” - Execute the method “scan” which is available to String.

“/” - Opens a Regexp literal. A Regular Expression describes a pattern
using special characters.

“(\w+)” - The parentheses indicate a “capture” group. This means that
you can match a specific subsection of the pattern and extract it. The
“\w” means any character in a word, like a letter, number, or
underscore. The “+” means 1 or more.

“[[:punct:]]” - Matches a single punctuation character. According to a
quick Googling this includes - [ ] ; ' , . / ! @ # % & * ( ) _ { } : " ?
Because this is outside the capture group, it is left out of the result.

“/” - Closes the Regexp.

“)” - Closes the “scan” argument.

“.flatten” - Turns the nested Array returned by using scan with captures
into a simple Array.
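
To make the “nested Array” part concrete, here is a short sketch on a made-up string. When the pattern contains capture groups, `scan` returns one sub-array per match holding only the captured text. The second pattern, with two groups, is an illustrative variation and not something from the replies above.

```ruby
sample = "Now, here. done!"

# One capture group: each match becomes a one-element sub-array.
nested = sample.scan(/(\w+)[[:punct:]]/)
p nested           # => [["Now"], ["here"], ["done"]]
p nested.flatten   # => ["Now", "here", "done"]

# Two capture groups: each sub-array holds the word and its punctuation.
pairs = sample.scan(/(\w+)([[:punct:]])/)
p pairs            # => [["Now", ","], ["here", "."], ["done", "!"]]
```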


str.scan(/\w+(?=[[:punct:]])/)

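The difference with this one is that “(?=” starts a lookahead: it checks
that punctuation follows the word without making the punctuation part of
the match, so no capture group or flatten is needed.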

According to this:

(?= re) Specifies position using a pattern. Doesn’t have a range.
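
One way to see what “specifies position” means is to compare what is left over after a match. In this sketch (made-up input), the lookahead version leaves the comma unconsumed, while the plain version consumes it.

```ruby
# Lookahead: the comma must be there, but it is not part of the match.
look = "Now, here".match(/\w+(?=[[:punct:]])/)
p look[0]           # => "Now"
p look.post_match   # => ", here"

# Without the lookahead, the comma is consumed as part of the match.
plain = "Now, here".match(/\w+[[:punct:]]/)
p plain[0]          # => "Now,"
p plain.post_match  # => " here"
```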

One place to start is

Hope this helps,

Mike

On Jan 28, 2014, at 7:38 AM, Selvag R. wrote:

I understood those parts,
but I was specifically hoping for an explanation of [[:punct:]]. In
':punct:',

  1. it is enclosed in double square brackets. Why is that?
  2. there are two colons. Is there a reason for them?



Mike S.
http://www.stok.ca/~mike/

The “`Stok’ disclaimers” apply.