I am trying to build a regex to extract vowels and consonants from a
string. So far, I am able to extract the basic a-e-i-o-u sequence
using the following extension to the String class:
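class String
  # every a, e, i, o, u (case-insensitive)
  def vowels
    scan(/[aeiou]/i)
  end

  # every consonant letter, y included; a bare [^aeiou] would also
  # return spaces and punctuation
  def consonants
    scan(/[b-df-hj-np-tv-z]/i)
  end
end
Examples (scan returns an array, joined here for readability):
"Mary had a little lamb".vowels.join
=> "aaaiea"
"Mary had a little lamb".consonants.join
=> "Mryhdlttllmb"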
However, the regex does not accommodate the conditional treatment of
‘y’ as a vowel when there is no other vowel immediately before or
after it. Handled properly, the previous examples would return ayaaiea
(vowels) and Mrhdlttllmb (consonants).
According to this post (http://www.perlmonks.org/?node_id=592867),
this could be accommodated in Perl using “zero-width negative
look-behind” and “zero-width negative look-ahead” assertions as follows:
my @vowels = ( /[aeiou]|(?<![aeiou])y(?![aeiou])/gi );
Here “(?<!...)” is a zero-width negative look-behind assertion and
“(?!...)” is a zero-width negative look-ahead assertion, so the “y”
only matches when neither the character before it nor the one after
it is a vowel.
I have since discovered that Ruby 1.8's regex engine lacks look-behind
assertions (it does support look-ahead), so one can't simply translate
this code fragment to Ruby regex syntax.
So, the question is: how can I accomplish the same result in Ruby
(a-e-i-o-u plus the conditional treatment of ‘y’ as a vowel when there
is no other vowel immediately before or after it)? Any thoughts are
appreciated.
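Since Ruby 1.8 lacks look-behind, one way to get this result without any regex assertions is to inspect each letter's neighbors directly. This is a hypothetical plain-Ruby alternative, not the Perl approach from the linked post; the names VOWELS, vowel? and split_letters are made up for illustration. It assumes ASCII input and treats any non-letter (space, punctuation, start or end of string) as "not a vowel", which mirrors what the negative look-around does.

```ruby
# Hypothetical helpers: classify letters by looking at their neighbours
# directly, so no look-behind is needed (runs on Ruby 1.8 and later).
VOWELS = "aeiou"

# true only for a, e, i, o, u (either case); nil and non-letters are false
def vowel?(ch)
  !ch.nil? && VOWELS.include?(ch.downcase)
end

# Returns [vowels, consonants] as two arrays of single characters.
def split_letters(str)
  chars = str.split(//)               # one element per character (ASCII assumed)
  vowels = []
  consonants = []
  chars.each_with_index do |ch, i|
    next unless ch =~ /[a-z]/i        # skip spaces, digits, punctuation
    prev_ch = i > 0 ? chars[i - 1] : nil
    next_ch = chars[i + 1]            # nil past the end of the string
    if vowel?(ch)
      vowels << ch
    elsif ch.downcase == "y" && !vowel?(prev_ch) && !vowel?(next_ch)
      vowels << ch                    # a y with no vowel on either side
    else
      consonants << ch                # every other letter, incl. y next to a vowel
    end
  end
  [vowels, consonants]
end

v, c = split_letters("Mary had a little lamb")
puts "vowels:     #{v.join}"
puts "consonants: #{c.join}"
```

For "Mary had a little lamb" this yields the vowels a, y, a, a, i, e, a and the consonants M, r, h, d, l, t, t, l, l, m, b; in "yes" the y is followed by an e, so it lands with the consonants.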
Thanks for the heads-up on the gem, Jeremy. I was neither bold enough
to attempt a custom Ruby build nor ready to jump to Ruby 1.9 just yet,
so this is a perfect alternative.
I installed the gem and the C library and placed require 'oniguruma'
in application.rb, but I am receiving a “MissingSourceFile (no such
file to load -- oniguruma)” error when the application loads. Here's
an excerpt from the server output:
Machine:appdir User$ script/server
=> Booting Mongrel (use 'script/server webrick' to force WEBrick)
=> Rails application starting on http://0.0.0.0:3000
=> Call with -d to detach
=> Ctrl-C to shutdown server
** Starting Mongrel listening at 0.0.0.0:3000
** Starting Rails with development environment...
** Rails loaded.
** Loading any Rails specific GemPlugins
** Signals ready. TERM => stop. USR2 => restart. INT => stop (no restart).
** Rails signals registered. HUP => reload (without restart). It might not work well.
** Mongrel available at 0.0.0.0:3000
** Use CTRL-C to stop.

MissingSourceFile (no such file to load -- oniguruma):
    /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `gem_original_require'
    /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `require'
    /usr/local/lib/ruby/gems/1.8/gems/activesupport-2.0.2/lib/active_support/dependencies.rb:496:in `require'
    /usr/local/lib/ruby/gems/1.8/gems/activesupport-2.0.2/lib/active_support/dependencies.rb:342:in `new_constants_in'
    /usr/local/lib/ruby/gems/1.8/gems/activesupport-2.0.2/lib/active_support/dependencies.rb:496:in `require'
    /app/controllers/application.rb:9
Line 9 of application.rb contains require 'oniguruma'.
I followed the standard install process for the onig-5.9.1 package:
1. cd to the directory containing the package's source code and type sudo ./configure
2. Type make to compile the package.
3. Type make install to install the programs, data files and documentation.
I am running Rails 2.0.2, and Ruby is installed under /usr/local on my
system (consistent with the default Oniguruma install locations of
/usr/local/bin and /usr/local/man), so I am at a loss to explain the
error. Any thoughts?
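One hedged guess: the onig install under /usr/local only provides the C library, while require 'oniguruma' loads the Ruby binding from the gem, so the gem has to be installed for the same interpreter that runs script/server. A small script along these lines, run with that same ruby binary, shows which interpreter and gem paths are in play and whether the library loads. The helper name loadable? is hypothetical; Gem.ruby and Gem.path are standard RubyGems calls.

```ruby
require 'rubygems'

# Hypothetical helper: true if `lib` can be required by *this* interpreter.
# Rails' MissingSourceFile is a subclass of LoadError, so it is caught too.
def loadable?(lib)
  require lib
  true
rescue LoadError
  false
end

puts "Interpreter: #{Gem.ruby} (Ruby #{RUBY_VERSION})"
puts "Gem paths:   #{Gem.path.join(', ')}"
puts "oniguruma loadable? #{loadable?('oniguruma')}"
```

If the interpreter path printed here differs from the one Mongrel runs under, or the gem's directory is not in the listed gem paths, that would explain the error.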
Thanks for all the help, everyone. The problem was solved with help
from pullmonkey on Rails Forum. Here is the solution:
Objective:
- Extract vowels and consonants from a string.
- Handle the conditional treatment of ‘y’ as follows:
  - y is a vowel if neither neighboring character is a vowel (e.g., it sits between consonants)
  - y is a consonant if it is adjacent to a vowel
Here is the code that works:
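require 'rubygems'
require 'oniguruma'

# Vowels: a-e-i-o-u, plus a "y" with no vowel immediately before or after it.
# Both patterns are case-sensitive as written.
def vowels(name_str)
  reg = Oniguruma::ORegexp.new('[aeiou]|(?<![aeiou])y(?![aeiou])')
  # Join all matches into one string (Ruby 1.8's Array#to_s concatenates
  # the elements), then split it back into single characters.
  reg.match_all(name_str).to_s.scan(/./)
end
# Consonants: every consonant letter except "y", plus a "y" with a vowel
# immediately before it OR immediately after it.
def consonants(name_str)
  reg = Oniguruma::ORegexp.new('[bcdfghjklmnpqrstvwxz]|(?<=[aeiou])y|y(?=[aeiou])')
  reg.match_all(name_str).to_s.scan(/./)
end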
(Note: dropping the trailing .scan(/./) returns the matches joined into a single string instead of an array of characters.)
The major problem was getting the code to treat “y” accurately as a
consonant. The keys to solving this problem were:
- Define the unconditional consonants explicitly (i.e.,
  [bcdfghjklmnpqrstvwxz]), not as [^aeiou]. [^aeiou] already matches
  every “y”, so “y” always ends up a consonant and the conditional “y”
  alternatives that follow never change the result.
- Define the conditional “y” assertions as separate alternatives, i.e.,
  “|(?<=[aeiou])y|y(?=[aeiou])”, not “|(?<=[aeiou])y(?=[aeiou])”, which
  only matches a “y” preceded AND followed by a vowel rather than one
  preceded OR followed by a vowel.
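For comparison, Ruby 1.9 and later ship with a regex engine (Oniguruma, later Onigmo) that supports look-behind natively, so the same two patterns can go straight into the original String extension without the gem. A minimal sketch, assuming Ruby 1.9+; the /i flag is added so that capitalized letters are matched as well, which the gem-based version above does not do.

```ruby
# Assumes Ruby 1.9+: the built-in engine understands (?<=...) and (?<!...).
class String
  # a-e-i-o-u, or a "y" with no vowel immediately before or after it
  def vowels
    scan(/[aeiou]|(?<![aeiou])y(?![aeiou])/i)
  end

  # consonant letters except "y", or a "y" with a vowel before OR after it
  def consonants
    scan(/[bcdfghjklmnpqrstvwxz]|(?<=[aeiou])y|y(?=[aeiou])/i)
  end
end

puts "Mary had a little lamb".vowels.join
puts "Mary had a little lamb".consonants.join
```

Here "Mary had a little lamb".vowels.join returns "ayaaiea" and .consonants.join returns "Mrhdlttllmb" (the capital M is kept because scan returns the text as written). In "Yay" both y's come out as consonants, since each one is next to an a.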
OK, I solved the “MissingSourceFile” error by reinstalling the gem. Now
I am receiving the following error:
** Starting Mongrel listening at 0.0.0.0:3000
** Starting Rails with development environment...
Exiting
/usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `gem_original_require': ./lib/string_extensions.rb:4: undefined (?...) sequence: /[aeiou]|(?<![aeiou])y(?![aeiou])/ (SyntaxError)
./lib/string_extensions.rb:8: undefined (?...) sequence: /![aeiou]|(?<=[aeiou])y(?=[aeiou])/
    from /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `require'
It seems to be complaining about the look-behind assertions (Ruby 1.8
does support look-ahead) in the following code fragment:
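class String
  # Both patterns use look-behind, which Ruby 1.8's built-in engine
  # rejects at parse time ("undefined (?...) sequence").
  def vowels
    scan(/[aeiou]|(?<![aeiou])y(?![aeiou])/i)
  end
  # Note: "![aeiou]" matches a literal "!" followed by a vowel;
  # [^aeiou] was likely intended.
  def consonants
    scan(/![aeiou]|(?<=[aeiou])y(?=[aeiou])/i)
  end
end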
According to the Oniguruma regex syntax reference (doc/RE.txt), the
look-behind and look-ahead syntax appears to be correct (see section 7,
Extended groups). This suggests one of the following:
A. Ruby may be using the default regexp library instead of the Oniguruma regexp library,
B. the Oniguruma regexp library is not accessible via the scan method, or
C. something else entirely.
… hmmm …