Forum: Ruby Regexp to exclude substrings

vincent (Guest)
on 2006-05-25 07:24
I'm trying to resolve a very simple problem, but I'm encountering some
issues.

Input: "this is a string where you can find some words"
Output: [this, string, where, can, find, some, words]

So I'm trying to eliminate ' is ', ' a ' and ' you ' from the string
(note the spaces, 'this' shouldn't be eliminated just because it
includes 'is').

I tried the following (and many others), but it doesn't work:
my_string = input_string.gsub!(/\s+(is|a|you)\s+/i,' ').split

Any hints?
Jeremy Tregunna (Guest)
on 2006-05-25 07:42
(Received via mailing list)
On 25-May-06, at 1:24 AM, vincent wrote:

> I tried the following (and many others), but it doesn't work:
> my_string = input_string.gsub!(/\s+(is|a|you)\s+/i,

my_string = input_string.gsub(/\b(is|a|you)\b/i, ' ').split

--
Jeremy Tregunna
jtregunna@blurgle.ca


"The proof is the proof that the proof has been proven and that's the
proof!" - Jean Chrétien
Tom Rauchenwald (Guest)
on 2006-05-25 07:57
(Received via mailing list)
vincent <dontspam@dontspam.com> writes:

> I tried the following (and many others), but it doesn't work:
> my_string = input_string.gsub!(/\s+(is|a|you)\s+/i,' ').split
>
> Any hints?

Well, since you want an array of the words back, why don't you do
something like

my_string.split.reject { |word| word =~/^(is|a|you)$/ }
or
my_string.split.reject { |word| %w(is a you).include? word }

Seems simpler to me than to replace the words in the string and then
split it into an array.
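
A minimal sketch of the same idea wrapped in a method, in case the filter should ignore case as the /i flag in the original pattern suggests. The names STOP_WORDS and remove_stop_words are made up for illustration; only the three words come from the question.

```ruby
# Hypothetical constant and method names; the word list is from the question.
STOP_WORDS = %w[is a you].freeze

def remove_stop_words(text, stop_words = STOP_WORDS)
  # String#split with no argument splits on runs of whitespace.
  # Downcasing each word makes the comparison case-insensitive.
  text.split.reject { |word| stop_words.include?(word.downcase) }
end

p remove_stop_words("this is a string where you can find some words")
# => ["this", "string", "where", "can", "find", "some", "words"]
```

Because whole words are compared, "this" is kept even though it contains "is".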

Tom
Robert Klemme (Guest)
on 2006-05-26 12:36
(Received via mailing list)
2006/5/25, vincent <dontspam@dontspam.com>:
> I'm trying to resolve a very simple problem, but I'm encountering some
> issues.
>
> Input: "this is a string where you can find some words"
> Output: [this, string, where, can, find, some, words]

irb(main):001:0> "this is a string where you can find some words".scan(/\w+/).reject {|w| case w; when "is", "a", "you" then true else false end }
=> ["this", "string", "where", "can", "find", "some", "words"]

> So I'm trying to eliminate ' is ', ' a ' and ' you ' from the string
> (note the spaces, 'this' shouldn't be eliminated just because it
> includes 'is').
>
> I tried the following (and many others), but it doesn't work:
> my_string = input_string.gsub!(/\s+(is|a|you)\s+/i,' ').split

Anchoring at word boundaries helps:

irb(main):006:0> "this is a string where you can find some words".gsub(/\b(is|a|you)\s+/, ' ').split /\s+/
=> ["this", "string", "where", "can", "find", "some", "words"]
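
For anyone wondering why the pattern from the question misses "a": a rough sketch of what seems to be going on, using the sample input from the question. The variable names are only for illustration.

```ruby
input = "this is a string where you can find some words"

# Pattern from the question: each match consumes the whitespace on both sides.
# After " is " has matched, scanning resumes at "a", which no longer has a
# leading space available, so "a" is skipped.
original = input.gsub(/\s+(is|a|you)\s+/i, ' ')
p original
# => "this a string where can find some words"

# \b is zero-width, so neighbouring matches do not compete for the same spaces.
anchored = input.gsub(/\b(is|a|you)\b/i, ' ')
p anchored.split
# => ["this", "string", "where", "can", "find", "some", "words"]

# Separate pitfall: gsub! returns nil when nothing was replaced, so chaining
# .split onto it raises NoMethodError for input without any of the words.
untouched = "no stop words here".dup.gsub!(/\b(is|a|you)\b/i, ' ')
p untouched
# => nil
```

Using the non-bang gsub avoids the nil case, since it always returns a string.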

This one is probably a bit more efficient:

irb(main):011:0> require 'enumerator'
=> true
irb(main):012:0> "this is a string where you can find some words".to_enum(:scan, /\w+/).select {|w| /^(is|a|you)$/ !~ w }
=> ["this", "string", "where", "can", "find", "some", "words"]
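
On current Rubies the same approach should work without the require, since Enumerator has been part of the core since 1.9. A small sketch, assuming Ruby 2.4 or later for String#match?; whether it is actually faster than the other variants is not measured here.

```ruby
text = "this is a string where you can find some words"

# to_enum(:scan, /\w+/) builds an Enumerator that yields each word as
# String#scan finds it; reject then drops the exact-match stop words.
# \A and \z anchor the whole word, so "this" is not affected by "is".
words = text.to_enum(:scan, /\w+/).reject { |w| w.match?(/\A(?:is|a|you)\z/) }

p words
# => ["this", "string", "where", "can", "find", "some", "words"]
```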

There's a multitude of other solutions as the number of answers
indicates. :-)

Kind regards

robert