Re: Extracting vowels and consonants using regular expression

Thanks Robert, Corey, Philip, Stephano, et al., for all of the great
suggestions. However, they all seem to ignore the conditional nature of
‘y’ as a vowel. I would like the regex to treat ‘y’ as a vowel when there
is no other vowel immediately before or after it. The string I used
initially was drawn from the following Perl code that accomplishes this
(found at http://www.perlmonks.org/?node_id=592867):

my @vowels = ( /[aeiou]|(?<![aeiou])y(?![aeiou])/gi );

The “(?<!..)” is a “zero-width negative-look-behind assertion”.

The “(?!..)” is a “zero-width negative-look-ahead assertion”.

Together, they match the condition of treating “y as a vowel only if
there is no other vowel before or after it.”
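
For readers who want to see the two assertions in action, here is a minimal sketch. It assumes Ruby 1.9 or later, where the built-in engine accepts look-behind (it will not load on 1.8), and the names VOWEL_RE and vowels_of are made up for illustration.

```ruby
# Assumption: Ruby 1.9+ (built-in look-behind support).
# VOWEL_RE and vowels_of are illustrative names, not from the thread.
VOWEL_RE = /[aeiou]|(?<![aeiou])y(?![aeiou])/i

# Return the plain vowels, plus any "y" with no vowel directly
# before it (look-behind) and none directly after it (look-ahead).
def vowels_of(word)
  word.scan(VOWEL_RE)
end

p vowels_of("rhythm")  # => ["y"]       (y sits between h and t)
p vowels_of("yellow")  # => ["e", "o"]  (y is followed by e)
p vowels_of("happy")   # => ["a", "y"]  (y follows p and ends the word)
p vowels_of("day")     # => ["a"]       (y is preceded by a)
```

Both assertions are zero-width, so the neighbouring letters are only inspected, never consumed, and each match is a single character.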

This was my attempt at converting the Perl fragment to Ruby syntax:

scan(/[aeiou]|(?![aeiou])y(?![aeiou])/i)

I have since discovered that Ruby 1.8 lacks regex look-behind assertions,
which explains why the code was failing. As a result, I have fallen back
to the following, which currently ignores the ‘y’:

class String
  def vowels
    scan(/[aeiou]/i)
  end

  def consonants
    scan(/[^aeiou]/i)
  end
end

Any ideas how to modify this to include the conditional treatment of “y
as a vowel only if there is no other vowel before or after it”?

(i.e., is there a way to simulate the Perl “zero-width
negative-look-behind” and “zero-width negative-look-ahead” assertions
for ‘y’ in Ruby 1.8?)

On 2/2/08 4:17 PM, in article [email protected], “Robert Dober”

Interesting. I guess you could post-process a bit on the two sets that
you get back. A regular expression that can handle the y would be good,
I guess.
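
One possible reading of the post-processing idea, sketched below, is to walk the characters and look at each letter's neighbours by hand. It uses no look-behind at all, so it should also run on Ruby 1.8, though that is an assumption rather than something tested in this thread. The names PLAIN_VOWELS, plain_vowel?, vowel_at? and split_letters are hypothetical.

```ruby
# Hypothetical helpers; nothing here comes from a library.
PLAIN_VOWELS = "aeiou"

# True for a, e, i, o, u in either case; false for nil or anything else.
def plain_vowel?(ch)
  !ch.nil? && PLAIN_VOWELS.include?(ch.downcase)
end

# A letter counts as a vowel if it is a plain vowel, or if it is a "y"
# with no plain vowel directly before or after it.
def vowel_at?(chars, i)
  return true if plain_vowel?(chars[i])
  return false unless chars[i].downcase == "y"
  before = i > 0 ? chars[i - 1] : nil
  after  = chars[i + 1]   # nil when "y" is the last character
  !plain_vowel?(before) && !plain_vowel?(after)
end

# Returns [vowels, consonants]; non-letters are skipped but still count
# as neighbours, so "a-y" does not treat "a" as adjacent to "y".
def split_letters(str)
  chars = str.scan(/./m)
  vowels, consonants = [], []
  chars.each_with_index do |ch, i|
    next unless ch =~ /[a-z]/i
    (vowel_at?(chars, i) ? vowels : consonants) << ch
  end
  [vowels, consonants]
end

p split_letters("rhythm")  # => [["y"], ["r", "h", "t", "h", "m"]]
p split_letters("yellow")  # => [["e", "o"], ["y", "l", "l", "w"]]
```

Classifying each letter exactly once also guarantees that the vowel and consonant sets never overlap, which two independent regexes do not.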

Donovan D. wrote:

they all seem to ignore the conditional nature of ‘y’ as a vowel.

What about diphthongs? Technically a diphthong is one vowel
made up of two letters. The rules vary by language; Spanish
even has triphthongs, e.g. in Raoul. A diphthong/triphthong
occurs wherever a succession of vowel symbols doesn’t contain
a syllable break.

Perhaps this would be a good Ruby Q.?

Clifford H…

On Feb 2, 5:41 pm, Clifford H. [email protected] wrote:

Perhaps this would be a good Ruby Q.?

Clifford H…

Good idea; this is made for Ruby Q… How does one submit a problem?

On Feb 2, 4:36 pm, Donovan D. [email protected] wrote:


Robert,

Oniguruma is the default regex library for Ruby 1.9. It includes
look-behind assertions (woohoo!), and it turns out that there is a gem
available, so you don’t have to upgrade to Ruby 1.9 or monkey around
with creating a custom Ruby 1.8.x build.

The gem relies upon a C library that can be downloaded from here:
http://www.geocities.jp/kosako3/oniguruma/

After installing the library, installing the oniguruma gem is a breeze
using:

sudo gem install -r oniguruma

However, my progress has come to a screeching halt as I am now
receiving the following error:

** Starting Mongrel listening at 0.0.0.0:3000
** Starting Rails with development environment…
Exiting
/usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `gem_original_require': ./lib/string_extensions.rb:4: undefined (?...) sequence: /[aeiou]|(?<![aeiou])y(?![aeiou])/ (SyntaxError)
./lib/string_extensions.rb:8: undefined (?...) sequence: /![aeiou]|(?<=[aeiou])y(?=[aeiou])/
from /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `require'

It seems to be complaining about the look-behind and look-ahead
assertions in the following code fragment (which oniguruma is supposed
to support):

class String
  def vowels
    scan(/[aeiou]|(?<![aeiou])y(?![aeiou])/i)
  end

  def consonants
    scan(/![aeiou]|(?<=[aeiou])y(?=[aeiou])/i)
  end
end

According to the Oniguruma reference (oniguruma/doc/RE.txt), the
look-behind and look-ahead syntax that I am using appears to be correct
(ref section 7. Extended groups).

This suggests either:

A. Ruby may be using the default regexp library instead of the
oniguruma regexp library,

B. The oniguruma regexp library is not accessible via the ‘scan’
method, or

C. Something else entirely

… hmmm …

Any suggestions?
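
A likely explanation, consistent with the fix reported further down the thread, is option A: a /.../ literal is compiled by the interpreter's built-in engine when the file is loaded, and the gem is reached through its own Oniguruma::ORegexp class instead. One way to check what the built-in engine accepts is to compile the pattern from a string at run time, where a failure can be rescued. The method name builtin_lookbehind? is hypothetical, and the behaviour on 1.8 is an assumption.

```ruby
# Probe the interpreter's built-in regex engine for look-behind support.
# Building the pattern from a string defers compilation to run time, so an
# unsupported construct raises RegexpError instead of a load-time SyntaxError.
def builtin_lookbehind?
  Regexp.new('(?<![aeiou])y')
  true
rescue RegexpError
  false
end

puts "Ruby #{RUBY_VERSION}: built-in look-behind supported? #{builtin_lookbehind?}"
```

If this prints false, any regex literal containing (?<!...) or (?<=...) will stop the file from loading, regardless of which gems are installed.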

On Feb 3, 11:42 pm, Dondi [email protected] wrote:


Thanks for all the help, everyone. The problem was solved with help
from pullmonkey on Rails Forum! Here is the solution:

Objective:

  1. Extract vowels and consonants from a string
  2. Handle the conditional treatment of ‘y’ as a vowel under the
    following circumstances:
    • y is a vowel if it is surrounded by consonants
    • y is a consonant if it is adjacent to a vowel

Here is the code that works:

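require 'rubygems'
require 'oniguruma'

# Plain vowels, plus "y" when no vowel is directly before or after it.
def vowels(name_str)
  reg = Oniguruma::ORegexp.new('[aeiou]|(?<![aeiou])y(?![aeiou])')
  reg.match_all(name_str).to_s.scan(/./)
end

# Unconditional consonants, plus "y" when a vowel is directly before or
# after it. Note: the character class below does not include "z".
def consonants(name_str)
  reg = Oniguruma::ORegexp.new('[bcdfghjklmnpqrstvwx]|(?<=[aeiou])y|y(?=[aeiou])')
  reg.match_all(name_str).to_s.scan(/./)
end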

(Note, the .scan(/./) can be eliminated to return an array)

The major problem was getting the code to accurately treat “y” as a
consonant. The key to solving this problem was to:

  1. define unconditional consonants explicitly (i.e.,
    [bcdfghjklmnpqrstvwx]) rather than as [^aeiou], which automatically
    includes “y” and so overrides any conditional treatment of “y” that
    follows

  2. define the conditional “y” assertions independently, i.e.,
    “|(?<=[aeiou])y|y(?=[aeiou])” rather than “|(?<=[aeiou])y(?=[aeiou])”,
    which only matches “y” preceded AND followed by a vowel, not preceded
    OR followed by a vowel
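
The two points above can be checked with a short sketch. It assumes Ruby 1.9 or later and uses the built-in Regexp in place of Oniguruma::ORegexp, so it is an illustration rather than the code from the solution. The constant names are made up, and "z" is added to the explicit consonant class here, which the class quoted above leaves out.

```ruby
# Assumption: Ruby 1.9+ (built-in look-behind). All names are illustrative.

# Point 2: one combined assertion versus two independent alternatives.
Y_BETWEEN_VOWELS = /(?<=[aeiou])y(?=[aeiou])/i    # vowel before AND after
Y_NEXT_TO_VOWEL  = /(?<=[aeiou])y|y(?=[aeiou])/i  # vowel before OR after

p "yellow".scan(Y_BETWEEN_VOWELS)  # => []
p "yellow".scan(Y_NEXT_TO_VOWEL)   # => ["y"]
p "mayor".scan(Y_BETWEEN_VOWELS)   # => ["y"]

# Point 1: an explicit consonant class versus a negated vowel class.
EXPLICIT_CLASS_RE = /[bcdfghjklmnpqrstvwxz]|(?<=[aeiou])y|y(?=[aeiou])/i
NEGATED_CLASS_RE  = /[^aeiou]|(?<=[aeiou])y|y(?=[aeiou])/i

p "rhythm".scan(EXPLICIT_CLASS_RE)  # => ["r", "h", "t", "h", "m"]
p "rhythm".scan(NEGATED_CLASS_RE)   # => ["r", "h", "y", "t", "h", "m"]
```

In the negated version the first alternative already accepts every "y", so the conditional alternatives after the "|" are never consulted for it.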

HTH.