Forum: Ruby string contains one of these???

mikkel (Guest)
on 2006-02-27 14:35
Imagine,

I have

leagues=%w{ 1D 2D U16 U19 LR RR JNL NL}

for a given string, say "some stuff NL is chunky", I want to determine which
of the matches it contains...

now, the hard way (more code, less thought) would be to iterate the
array and do a =~ on it...but is there a simpler way ???

thanks in advance
James Gray (bbazzarrakk)
on 2006-02-27 14:44
(Received via mailing list)
On Feb 27, 2006, at 7:35 AM, mikkel wrote:

> now, the hard way (more code, less thought) would be to iterate the
> array and do a ~= on it...but is there a simpler way ???

The hard way isn't too hard and doesn't require but a line of code:

 >> leagues=%w{ 1D 2D U16 U19 LR RR JNL NL}
=> ["1D", "2D", "U16", "U19", "LR", "RR", "JNL", "NL"]
 >> str="some stuff NL is chunky"
=> "some stuff NL is chunky"
 >> leagues.find_all { |league| str.include? league }
=> ["NL"]

Hope that helps.
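
One caveat with this approach, sketched here under assumptions: String#include? tests for substrings, so a code that is embedded in a longer one also counts as a hit. The input string below and the word-boundary variant are made up for illustration and are not part of the original reply.

```ruby
leagues = %w{ 1D 2D U16 U19 LR RR JNL NL }

# Hypothetical input in which "NL" only occurs inside "JNL".
str = "some stuff JNL is chunky"

# Substring test: "NL" is reported too, because "JNL" contains it.
substring_hits = leagues.find_all { |league| str.include?(league) }

# Variant (an assumption about what is wanted): match whole words only.
word_hits = leagues.find_all { |league| str =~ /\b#{Regexp.escape(league)}\b/ }

p substring_hits # ["JNL", "NL"]
p word_hits      # ["JNL"]
```

Whether the substring behaviour is a bug or a feature depends on how the league codes appear in the real data.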

James Edward Gray II
Robert Klemme (Guest)
on 2006-02-27 14:44
(Received via mailing list)
2006/2/27, mikkel <mikkel@helenius.dk>:
> array and do a ~= on it...but is there a simpler way ???
>> leagues=%w{ 1D 2D U16 U19 LR RR JNL NL}
=> ["1D", "2D", "U16", "U19", "LR", "RR", "JNL", "NL"]
>> "some stuff NL is chunky".scan( Regexp.new( leagues.join('|') ) )
=> ["NL"]

Kind regards

robert
Daniel Harple (Guest)
on 2006-02-27 14:44
(Received via mailing list)
On Feb 27, 2006, at 2:35 PM, mikkel wrote:

> now, the hard way (more code, less thought) would be to iterate the
> array and do a ~= on it...but is there a simpler way ???

How about:

leagues = %w{1D 2D U16 U19 LR RR JNL NL}
words   = "some stuff NL is chunky".split
leagues.select { |m| words.include?(m) } # => ["NL"]

-- Daniel
Christian Neukirchen (Guest)
on 2006-02-27 17:05
(Received via mailing list)
"Robert Klemme" <shortcutter@googlemail.com> writes:

>> now, the hard way (more code, less thought) would be to iterate the
>> array and do a ~= on it...but is there a simpler way ???
>
>>> leagues=%w{ 1D 2D U16 U19 LR RR JNL NL}
> => ["1D", "2D", "U16", "U19", "LR", "RR", "JNL", "NL"]
>>> "some stuff NL is chunky".scan( Regexp.new( leagues.join('|') ) )
> => ["NL"]

irb(main):002:0>  "some stuff NL is chunky".scan Regexp.union(*leagues)
=> ["NL"]
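
For anyone comparing this with the join('|') version: Regexp.union escapes each string before joining, which matters as soon as a code contains a regexp metacharacter. A minimal sketch; the codes list with "U.16" and the sample text are invented for illustration.

```ruby
# Hypothetical league codes; "U.16" contains a regexp metacharacter.
codes = %w{ U.16 NL }

joined = Regexp.new(codes.join('|'))  # "." stays a wildcard
union  = Regexp.union(*codes)         # "." is escaped to "\."

text = "U316 and U.16 played NL"

p joined.source      # "U.16|NL"
p union.source       # "U\\.16|NL"
p text.scan(joined)  # ["U316", "U.16", "NL"]
p text.scan(union)   # ["U.16", "NL"]
```

With the plain alphanumeric codes from the original question the two patterns behave the same.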
Robert Klemme (Guest)
on 2006-02-27 18:56
(Received via mailing list)
Christian Neukirchen wrote:
>>> which of the matches it contains...
> Regexp.union(*leagues) => ["NL"]
Even better!  Didn't know about that method.  Learn something new every
day.  Thanks!

    robert
Mikkel Bruun (Guest)
on 2006-02-27 22:08
(Received via mailing list)
amazing...

thanks a bunch everybody...


On Tuesday, February 28, 2006, at 2:53 AM, Robert Klemme wrote:
>>>> for a given string, say "some stuff NL is chunky" i want determine
>> irb(main):002:0>  "some stuff NL is chunky".scan
>> Regexp.union(*leagues) => ["NL"]
>
>Even better!  Didn't know about that method.  Learn something new every
>day.  Thanks!
>
>    robert
>
>


Mikkel Bruun

www.strongside.dk    - Football Portal(DK)
nflfeed.helenius.org - Football News(DK)
ting.minline.dk      - Buy Old Stuff!(DK)
hitesh.jasani@gmail.com (Guest)
on 2006-02-28 06:15
(Received via mailing list)
You've got a bunch of great answers already, but here's another option.

leagues = %w(1D 2D U16 U19 LR RR JNL NL)
words = "some stuff NL is chunky"

irb(main):008:0> words.split & leagues
=> ["NL"]
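
Two details worth sketching, under assumptions: String#split with no arguments splits on whitespace only, so punctuation stays attached to the words, and Array#& returns unique elements in the order of the array on its left. The sample sentence and the scan(/\w+/) tokenizer are made up for illustration.

```ruby
leagues = %w(1D 2D U16 U19 LR RR JNL NL)

# Hypothetical sentence with punctuation and a repeated code.
words = "NL, then LR, then NL."

# Whitespace splitting leaves "NL," "LR," and "NL." which match nothing.
p(words.split & leagues)   # []

# Assumed alternative tokenizer: keep only runs of word characters.
tokens = words.scan(/\w+/)

p(tokens & leagues)        # ["NL", "LR"]  (unique, in token order)
p(leagues & tokens)        # ["LR", "NL"]  (unique, in leagues order)
```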


- Hitesh
http://www.jasani.org/
Johan Veenstra (Guest)
on 2006-02-28 10:43
(Received via mailing list)
And the winner is ...
hitesh.jasani@gmail.com (Guest)
on 2006-02-28 14:36
(Received via mailing list)
Actually if you flip it around as 'leagues & words.split' it turns out
to have some significant performance advantages in many cases.  See
http://www.jasani.org/articles/2006/02/28/adding-t...
for more details.
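
The article link is truncated here, so the following is only a sketch of how one might time the two orderings oneself. The data sizes and the helper name time_it are assumptions, and the numbers will vary with Ruby version and input.

```ruby
# Synthetic data (assumed sizes): few leagues, many words.
leagues = %w(1D 2D U16 U19 LR RR JNL NL)
words   = (["some", "stuff", "NL", "is", "chunky"] * 20_000).join(" ")

# Hypothetical helper: run the block and return [result, elapsed seconds].
def time_it
  start  = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

tokens = words.split
a, secs_a = time_it { tokens & leagues }
b, secs_b = time_it { leagues & tokens }

puts "tokens & leagues: #{a.inspect} in #{secs_a}s"
puts "leagues & tokens: #{b.inspect} in #{secs_b}s"
```

A single run like this is noisy; repeating the measurement and varying both sizes gives a fairer picture.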

- Hitesh
http://www.jasani.org/
Jeffrey Schwab (Guest)
on 2006-02-28 17:28
(Received via mailing list)
hitesh.jasani@gmail.com wrote:
> Actually if you flip it around as 'leagues & words.split' it turns out
> to have some significant performance advantages in many cases.  See
> http://www.jasani.org/articles/2006/02/28/adding-t...
> for more details.
>
> - Hitesh
> http://www.jasani.org/

Is it possible that link is incorrect?

"Firefox can't establish a connection to the server at www.jasani.org."

I'm dying to learn about this now. :)
hitesh.jasani@gmail.com (Guest)
on 2006-03-01 02:59
(Received via mailing list)
Jeffrey, the link should be working for you now.  My hosting provider
had a number of servers go belly up earlier in the day.
Christian Neukirchen (Guest)
on 2006-03-01 03:26
(Received via mailing list)
"hitesh.jasani@gmail.com" <hitesh.jasani@gmail.com> writes:

> Actually if you flip it around as 'leagues & words.split' it turns out
> to have some significant performance advantages in many cases.  See
> http://www.jasani.org/articles/2006/02/28/adding-t...
> for more details.

You'd better cache those Regexps.
Also, test with longer "words"---they are more likely to grow than the
number of leagues.
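
A minimal sketch of what caching could look like, assuming the benchmark rebuilt the pattern on every call (the article's code is not shown in this thread, so that is a guess). The names league_re, uncached and cached are hypothetical.

```ruby
leagues = %w{ 1D 2D U16 U19 LR RR JNL NL }

# Built once, up front.
league_re = Regexp.union(*leagues)

# Rebuilds the pattern on every call.
uncached = ->(str) { str.scan(Regexp.union(*leagues)) }

# Reuses the pattern that was built once.
cached = ->(str) { str.scan(league_re) }

p uncached.call("some stuff NL is chunky") # ["NL"]
p cached.call("some stuff NL is chunky")   # ["NL"]
```

Both return the same matches; the difference only shows up in timing when the call is repeated many times.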
hitesh.jasani@gmail.com (Guest)
on 2006-03-01 06:03
(Received via mailing list)
Good comments, Christian.  I thought I'd just hack a set of tests and
post some quick results, but all I think I did was prove that I'm not
supposed to be coding first thing in the morning.

Yes, once you fix that bug in the code by caching the Regexes, they
perform very impressively.  In fact, for the modified tests I just ran,
they beat out every other solution in every case except for extremely
large league sizes (> 18,000 elements) where they wouldn't run at all.
But I'm loath to draw any conclusions from the data just yet.  (Burned
once, twice shy?)

There appears to be at least one other bug in the code.  One astute,
anonymous person pointed out that the six solutions will not return the
same results for the generated datasets and that preprocessing of input
data could help improve performance even more.  I think it all depends
on how one defines the problem as to whether the generated data is
valid input or undefined requirements now being levied on the code.
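
To make the point about differing results concrete, here is a sketch that runs four of the approaches from this thread on one made-up input. Which six solutions the article tested is not shown here, so this selection and the input string are assumptions.

```ruby
leagues = %w{ 1D 2D U16 U19 LR RR JNL NL }

# Hypothetical input with a repeated code.
str = "NL beat JNL and NL"

results = {
  "find_all/include?" => leagues.find_all { |l| str.include?(l) },
  "scan/union"        => str.scan(Regexp.union(*leagues)),
  "select/split"      => leagues.select { |l| str.split.include?(l) },
  "split & leagues"   => str.split & leagues,
}

results.each { |name, hits| puts "#{name}: #{hits.inspect}" }
# find_all/include?: ["JNL", "NL"]
# scan/union: ["NL", "JNL", "NL"]
# select/split: ["JNL", "NL"]
# split & leagues: ["NL", "JNL"]
```

The approaches disagree on ordering and on whether repeats are reported, so any comparison has to decide first which of these counts as the right answer.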

Mmmm.... I hope JEG II is watching this thread as I'm wondering if
there isn't a rubyquiz in here somewhere.

- Hitesh
http://www.jasani.org/
James Gray (bbazzarrakk)
on 2006-03-01 15:21
(Received via mailing list)
On Feb 28, 2006, at 11:03 PM, hitesh.jasani@gmail.com wrote:

> Mmmm.... I hope JEG II is watching this thread as I'm wondering if
> there isn't a rubyquiz in here somewhere.

If you can think of a good way to spin it:

suggestion@rubyquiz.com

James Edward Gray II
Jeffrey Schwab (Guest)
on 2006-03-01 17:08
(Received via mailing list)
hitesh.jasani@gmail.com wrote:
> Actually if you flip it around as 'leagues & words.split' it turns out
> to have some significant performance advantages in many cases.  See
> http://www.jasani.org/articles/2006/02/28/adding-t...
> for more details.

I am surprised by the scan failures ("could not continue test").  Do you
know what causes the error?
Christian Neukirchen (Guest)
on 2006-03-01 17:53
(Received via mailing list)
Jeffrey Schwab <jeff@schwabcenter.com> writes:

> hitesh.jasani@gmail.com wrote:
>> Actually if you flip it around as 'leagues & words.split' it turns out
>> to have some significant performance advantages in many cases.  See
>> http://www.jasani.org/articles/2006/02/28/adding-t...
>> for more details.
>
> I am surprised by the scan failures ("could not continue test").  Do
> you know what causes the error?

irb(main):002:0> Regexp.new "x"*600_000
RegexpError: regular expression too big: /xxxxx...
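
For league lists large enough to hit this limit, one way around it is to skip the regexp and look tokens up in a Set. This is a sketch under assumptions, not something proposed in the thread: the generated codes and the list size are arbitrary, and it matches whole whitespace-separated words only.

```ruby
require 'set'

# Hypothetical large list of codes: "L0", "L1", ... "L19999".
leagues = (0...20_000).map { |i| "L#{i}" }
lookup  = leagues.to_set

str = "some stuff L19999 is chunky L42"

# Tokenize once, then keep the tokens that are known codes.
hits = str.split.select { |word| lookup.include?(word) }

p hits # ["L19999", "L42"]
```

Whether a given Ruby version still rejects a pattern of that size is not something this thread establishes; the lookup above simply avoids the question.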