Regexp to exclude substrings

I’m trying to resolve a very simple problem, but I’m encountering some
issues.

Input: “this is a string where you can find some words”
Output: [this, string, where, can, find, some, words]

So I’m trying to eliminate ' is ', ' a ' and ' you ' from the string
(note the spaces, ‘this’ shouldn’t be eliminated just because it
includes ‘is’).

I tried the following (and many others), but it doesn’t work:
my_string = input_string.gsub!(/\s+(is|a|you)\s+/i, ' ').split

Any hints?

On 25-May-06, at 1:24 AM, vincent wrote:

I tried the following (and many others), but it doesn’t work:
my_string = input_string.gsub!(/\s+(is|a|you)\s+/i, ' ').split

my_string = input_string.gsub!(/\s+\b(is|a|you)\b\s+/i, ' ').split
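
One thing worth checking with patterns of this shape: the \s+ on each side consumes the surrounding spaces, so when two unwanted words are adjacent (as in "is a"), the space that the second one needs on its left has already been used up by the first match. The following is only a sketch of that effect and of one possible way around it, assuming plain space-separated input; the variable names are illustrative. It uses gsub rather than gsub!, because gsub! returns nil when nothing was replaced and the chained .split would then fail.

```ruby
input_string = "this is a string where you can find some words"

# Whitespace is part of the match: once " is " has been replaced, the
# scan resumes at "a string ...", and "a" has no leading space left.
consuming = input_string.gsub(/\s+(is|a|you)\s+/i, ' ').split

# \b matches a position, not a character, so adjacent words do not
# compete for the same space. The leftover spaces are dropped by split.
boundaries = input_string.gsub(/\b(is|a|you)\b/i, '').split

p consuming   # "a" is still in the list
p boundaries  # "is", "a" and "you" are all gone
```

Note that \b also sees a boundary next to hyphens and apostrophes, so this may not be what you want for text containing words like "a-priori".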


Jeremy T.

“The proof is the proof that the proof has been proven and that’s the
proof!” - Jean Chrétien

vincent writes:

I tried the following (and many others), but it doesn’t work:
my_string = input_string.gsub!(/\s+(is|a|you)\s+/i, ' ').split

Any hints?

Well, since you want an array of the words back, why don’t you do
something like

my_string.split.reject { |word| word =~/^(is|a|you)$/ }
or
my_string.split.reject { |word| %w(is a you).include? word }

Seems simpler to me than to replace the words in the string and then
split it into an array.
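
A slightly fuller sketch of the same split-and-reject idea, in case it helps. The method name content_words and the STOP_WORDS constant are made up for illustration, and the downcase call is an assumption that matching should ignore case, as the /i flag in the original attempt suggests.

```ruby
# Hypothetical names, chosen only for this example.
STOP_WORDS = %w[is a you].freeze

# Split on whitespace, then drop every word found in the stop list.
# Comparing the downcased word makes the check case-insensitive.
def content_words(text, stop_words = STOP_WORDS)
  text.split.reject { |word| stop_words.include?(word.downcase) }
end

p content_words("this is a string where you can find some words")
p content_words("Is this A test")
```

Because whole words are compared, "this" is kept even though it contains "is".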

Tom

2006/5/25, vincent:

I’m trying to resolve a very simple problem, but I’m encountering some
issues.

Input: “this is a string where you can find some words”
Output: [this, string, where, can, find, some, words]

irb(main):001:0> "this is a string where you can find some words".scan(/\w+/).reject {|w| case w; when "is", "a", "you" then true else false end }
=> ["this", "string", "where", "can", "find", "some", "words"]

So I’m trying to eliminate ' is ', ' a ' and ' you ' from the string
(note the spaces, ‘this’ shouldn’t be eliminated just because it
includes ‘is’).

I tried the following (and many others), but it doesn’t work:
my_string = input_string.gsub!(/\s+(is|a|you)\s+/i, ' ').split

Anchoring at word boundaries helps:

irb(main):006:0> "this is a string where you can find some words".gsub(/\b(is|a|you)\s+/, ' ').split(/\s+/)
=> ["this", "string", "where", "can", "find", "some", "words"]

This one is probably a bit more efficient:

irb(main):011:0> require 'enumerator'
=> true
irb(main):012:0> "this is a string where you can find some words".to_enum(:scan, /\w+/).select {|w| /^(is|a|you)$/ !~ w }
=> ["this", "string", "where", "can", "find", "some", "words"]
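
For comparison, here is a small sketch of how the split-based and scan-based variants differ once the input contains punctuation. The sample sentence is made up for illustration. On Ruby 1.9 and later Enumerator is part of the core, so the sketch leaves out the require.

```ruby
text = "this is a string, where you can find some words."
stop = /\A(is|a|you)\z/

# split only cuts on whitespace, so punctuation stays attached.
by_split = text.split.reject { |w| w =~ stop }

# scan(/\w+/) returns runs of word characters only.
by_scan = text.scan(/\w+/).reject { |w| w =~ stop }

# Same tokens as by_scan, but filtered while scanning, without
# building the intermediate array of all matches first.
by_enum = text.to_enum(:scan, /\w+/).reject { |w| w =~ stop }

p by_split
p by_scan
p by_enum
```

Which behaviour is preferable depends on whether "string," and "string" should count as the same word.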

There’s a multitude of other solutions, as the number of answers
indicates. :-)

Kind regards

robert