String contains one of these?

mikkel · February 27, 2006, 2:35pm

Imagine,

I have

leagues=%w{ 1D 2D U16 U19 LR RR JNL NL}

For a given string, say “some stuff NL is chunky”, I want to determine which
of them it contains.

Now, the hard way (more code, less thought) would be to iterate over the
array and do a =~ on each element, but is there a simpler way?

Thanks in advance.
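For comparison, here is roughly what the "hard way" from the question looks like in Ruby, assuming the intent is a plain substring test for each entry. Ruby's match operator is spelled =~. Wrapping each entry in Regexp.escape is an addition of ours, not part of the post; it only matters if an entry ever contains a regex metacharacter.

```ruby
leagues = %w{1D 2D U16 U19 LR RR JNL NL}
str = "some stuff NL is chunky"

# Loop over the array and test each entry with =~.
# Regexp.escape (our assumption) keeps entries literal.
found = []
leagues.each do |league|
  found << league if str =~ /#{Regexp.escape(league)}/
end

p found # => ["NL"]
```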

mikkel · February 27, 2006, 2:44pm

On Feb 27, 2006, at 7:35 AM, mikkel wrote:

now, the hard way (more code, less thought) would be to iterate over the
array and do a =~ on each element, but is there a simpler way?

The hard way isn’t too hard and requires only a line of code:

leagues=%w{ 1D 2D U16 U19 LR RR JNL NL}
=> ["1D", "2D", "U16", "U19", "LR", "RR", "JNL", "NL"]

str="some stuff NL is chunky"
=> "some stuff NL is chunky"

leagues.find_all { |league| str.include? league }
=> ["NL"]
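One thing worth knowing about the include? version: String#include? tests for a substring, so when one league code is contained in another (NL inside JNL), both are reported. A small sketch with a made-up input string, contrasted with a whole-word comparison:

```ruby
leagues = %w{1D 2D U16 U19 LR RR JNL NL}

# String#include? does substring matching, so "JNL" also contains "NL".
with_substrings = leagues.find_all { |league| "a JNL fixture".include?(league) }
p with_substrings # => ["JNL", "NL"]

# Comparing against whole words avoids the overlap.
words = "a JNL fixture".split
whole_words = leagues.find_all { |league| words.include?(league) }
p whole_words # => ["JNL"]
```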

Hope that helps.

James Edward G. II

mikkel · February 27, 2006, 2:44pm

2006/2/27, mikkel wrote:

array and do a =~ on each element, but is there a simpler way?

leagues=%w{ 1D 2D U16 U19 LR RR JNL NL}
=> ["1D", "2D", "U16", "U19", "LR", "RR", "JNL", "NL"]
"some stuff NL is chunky".scan( Regexp.new( leagues.join('|') ) )
=> ["NL"]
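Note that String#scan returns one entry per match, in the order the matches appear in the string, so a league mentioned twice shows up twice; adding uniq gives just the leagues found. The sample text below is invented for illustration:

```ruby
leagues = %w{1D 2D U16 U19 LR RR JNL NL}
pattern = Regexp.new(leagues.join('|'))

text = "JNL and NL and NL again"
# The "NL" inside "JNL" is consumed by the JNL match, so it is not counted.
p text.scan(pattern)      # => ["JNL", "NL", "NL"]
p text.scan(pattern).uniq # => ["JNL", "NL"]
```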

Kind regards

robert

mikkel · February 27, 2006, 2:44pm

On Feb 27, 2006, at 2:35 PM, mikkel wrote:

now, the hard way (more code, less thought) would be to iterate over the
array and do a =~ on each element, but is there a simpler way?

How about:

leagues = %w{1D 2D U16 U19 LR RR JNL NL}
words = "some stuff NL is chunky".split
leagues.select { |m| words.include?(m) } # => ["NL"]

– Daniel
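The split-based version compares whole words, which sidesteps the JNL/NL overlap, but String#split with no argument splits on whitespace only, so punctuation stays attached to the words. A sketch, assuming punctuation should be ignored, that extracts runs of word characters with scan(/\w+/) instead (the sample sentence is invented):

```ruby
leagues = %w{1D 2D U16 U19 LR RR JNL NL}
text = "Tonight: NL, then U19."

# split keeps punctuation: ["Tonight:", "NL,", "then", "U19."]
split_words = text.split
by_split = leagues.select { |m| split_words.include?(m) }
p by_split # => []

# scan(/\w+/) keeps only runs of letters, digits and underscores.
scan_words = text.scan(/\w+/)
by_scan = leagues.select { |m| scan_words.include?(m) }
p by_scan # => ["U19", "NL"]
```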

mikkel · February 27, 2006, 6:56pm

Christian N. wrote:

which of the matches it contains…
irb(main):002:0> "some stuff NL is chunky".scan Regexp.union(*leagues)
=> ["NL"]
Even better! Didn’t know about that method. Learn something new every
day. Thanks!

robert

mikkel · February 27, 2006, 10:08pm

Amazing…

Thanks a bunch, everybody!

On Tuesday, February 28, 2006, at 2:53 AM, Robert K. wrote:

for a given string, say “some stuff NL is chunky” i want determine
irb(main):002:0> "some stuff NL is chunky".scan Regexp.union(*leagues)
=> ["NL"]

Even better! Didn’t know about that method. Learn something new every
day. Thanks!

robert

Mikkel B.

www.strongside.dk - Football Portal(DK)
nflfeed.helenius.org - Football News(DK)
ting.minline.dk - Buy Old Stuff!(DK)

mikkel · February 27, 2006, 5:05pm

“Robert K.” writes:

now, the hard way (more code, less thought) would be to iterate over the
array and do a =~ on each element, but is there a simpler way?

leagues=%w{ 1D 2D U16 U19 LR RR JNL NL}
=> ["1D", "2D", "U16", "U19", "LR", "RR", "JNL", "NL"]
"some stuff NL is chunky".scan( Regexp.new( leagues.join('|') ) )
=> ["NL"]

irb(main):002:0> "some stuff NL is chunky".scan Regexp.union(*leagues)
=> ["NL"]
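Besides being shorter, Regexp.union escapes each string it is given, while joining with '|' and calling Regexp.new treats every entry as a regex fragment. None of the league codes in this thread contain metacharacters, so both behave the same here; the list below is hypothetical and exists only to show where they diverge:

```ruby
# Hypothetical list: one entry contains ".", a regex metacharacter.
names = %w{U.S. NL}

joined = Regexp.new(names.join('|')) # "." here matches any character
union  = Regexp.union(*names)        # each entry is escaped literally

text = "UXSY NL"
p text.scan(joined) # => ["UXSY", "NL"]
p text.scan(union)  # => ["NL"]
```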

mikkel · February 28, 2006, 10:43am

And the winner is …

mikkel · February 28, 2006, 6:15am

You’ve got a bunch of great answers already, but here’s another option.

leagues = %w(1D 2D U16 U19 LR RR JNL NL)
words = "some stuff NL is chunky"

irb(main):008:0> words.split & leagues
=> ["NL"]
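Array#& returns the elements common to both arrays, with duplicates removed and in the order they appear in the left-hand array, so swapping the operands can change the order of the result but not its contents. A small sketch with an invented input string:

```ruby
leagues = %w(1D 2D U16 U19 LR RR JNL NL)
words = "NL beat U19 and NL lost"

# Order follows the left operand; the repeated "NL" appears once.
by_words   = words.split & leagues
by_leagues = leagues & words.split

p by_words   # => ["NL", "U19"]
p by_leagues # => ["U19", "NL"]
```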

Hitesh
http://www.jasani.org/

mikkel · February 28, 2006, 5:28pm

Hitesh wrote:

Actually if you flip it around as ‘leagues & words.split’ it turns out
to have some significant performance advantages in many cases. See
http://www.jasani.org/articles/2006/02/28/adding-the-science-back-to-computer-science
for more details.

Hitesh
http://www.jasani.org/

Is it possible that the link is incorrect?

“Firefox can’t establish a connection to the server at www.jasani.org.”

I’m dying to learn about this now.

mikkel · February 28, 2006, 2:36pm

Actually if you flip it around as ‘leagues & words.split’ it turns out
to have some significant performance advantages in many cases. See
http://www.jasani.org/articles/2006/02/28/adding-the-science-back-to-computer-science
for more details.

Hitesh
http://www.jasani.org/
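The linked article's numbers are not reproduced here. If you want to check the claim on your own data, a rough sketch using the standard Benchmark library follows; the synthetic league and word lists, their sizes and the repetition count are arbitrary assumptions, and the timings will vary by machine and Ruby version:

```ruby
require 'benchmark'

# Synthetic, hypothetical data: 1,000 league codes and a 5,002-word
# string that contains two of them.
leagues = (1..1_000).map { |i| "L#{i}" }
words   = (1..5_000).map { |i| "w#{i}" }.join(' ') + ' L42 L7'

left = right = nil
Benchmark.bm(22) do |x|
  x.report('words.split & leagues') { 200.times { left  = words.split & leagues } }
  x.report('leagues & words.split') { 200.times { right = leagues & words.split } }
end

# Same contents; only the order follows the left-hand operand.
p left  # => ["L42", "L7"]
p right # => ["L7", "L42"]
```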

mikkel · March 1, 2006, 2:59am

Jeffrey, the link should be working for you now. My hosting provider
had a number of servers go belly up earlier in the day.

mikkel · March 1, 2006, 3:26am

Hitesh writes:

Actually if you flip it around as ‘leagues & words.split’ it turns out
to have some significant performance advantages in many cases. See
http://www.jasani.org/articles/2006/02/28/adding-the-science-back-to-computer-science
for more details.

You’d better cache those Regexps.
Also, test with longer “words”—they are more likely to grow than the
number of leagues.
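"Cache those Regexps" presumably means building the union pattern once rather than on every call. A minimal sketch of the difference; the method and constant names are hypothetical:

```ruby
LEAGUES = %w{1D 2D U16 U19 LR RR JNL NL}.freeze

# Builds a new Regexp on every call.
def leagues_in_uncached(str)
  str.scan(Regexp.union(*LEAGUES)).uniq
end

# Built once when the file loads, then reused by every call.
LEAGUE_RE = Regexp.union(*LEAGUES)

def leagues_in_cached(str)
  str.scan(LEAGUE_RE).uniq
end

p leagues_in_cached("some stuff NL is chunky") # => ["NL"]
```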

mikkel · March 1, 2006, 3:21pm

On Feb 28, 2006, at 11:03 PM, Hitesh wrote:

Mmmm… I hope JEG II is watching this thread as I’m wondering if
there isn’t a rubyquiz in here somewhere.

If you can think of a good way to spin it:

[email protected]

James Edward G. II

mikkel · March 1, 2006, 5:08pm

Hitesh wrote:

Actually if you flip it around as ‘leagues & words.split’ it turns out
to have some significant performance advantages in many cases. See
http://www.jasani.org/articles/2006/02/28/adding-the-science-back-to-computer-science
for more details.

I am surprised by the scan failures (“could not continue test”). Do you
know what causes the error?

mikkel · March 1, 2006, 6:03am

Good comments, Christian. I thought I’d just hack together a set of tests and
post some quick results, but all I think I did was prove that I’m not
supposed to be coding first thing in the morning.

Yes, once you fix that bug in the code by caching the Regexes, they
perform very impressively. In fact, for the modified tests I just ran,
they beat out every other solution in every case except for extremely
large league sizes (> 18,000 elements) where they wouldn’t run at all.
But I’m loath to draw any conclusions from the data just yet. (Burned
once, twice shy?)

There appears to be at least one other bug in the code. One astute,
anonymous person pointed out that the six solutions will not return the
same results for the generated datasets and that preprocessing of input
data could help improve performance even more. I think whether the
generated data counts as valid input, or instead levies new, previously
undefined requirements on the code, depends on how one defines the problem.

Mmmm… I hope JEG II is watching this thread as I’m wondering if
there isn’t a rubyquiz in here somewhere.

Hitesh
http://www.jasani.org/
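The post doesn't say what preprocessing was suggested. One common form, offered here as an assumption, is turning the league list into a Set once, so each word lookup is a hash probe instead of a linear scan of the array. This uses whole-word semantics, so it agrees with the split-based answers rather than the include?/scan ones; the helper name is hypothetical:

```ruby
require 'set'

# Preprocessed once: constant-time membership tests.
LEAGUE_SET = Set.new(%w{1D 2D U16 U19 LR RR JNL NL}).freeze

# Hypothetical helper with whole-word semantics, like the split answers.
def leagues_in(str)
  str.split.select { |word| LEAGUE_SET.include?(word) }.uniq
end

p leagues_in("NL vs JNL then NL again") # => ["NL", "JNL"]
```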

mikkel · March 1, 2006, 5:53pm

Jeffrey S. writes:

Hitesh wrote:

Actually if you flip it around as ‘leagues & words.split’ it turns out
to have some significant performance advantages in many cases. See
http://www.jasani.org/articles/2006/02/28/adding-the-science-back-to-computer-science
for more details.

I am surprised by the scan failures (“could not continue test”). Do
you know what causes the error?

irb(main):002:0> Regexp.new “x”*600_000
RegexpError: regular expression too big: /xxxxx…
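Exactly when Regexp.new raises "regular expression too big" depends on the regex engine in the Ruby version you run, so treat the limit as version-specific. A defensive sketch, with a hypothetical helper name, that tries the single union pattern first and falls back to per-league substring checks if compilation fails. The fallback returns leagues in list order and reports overlapping codes such as JNL and NL both, so its output can differ slightly from scan's:

```ruby
# Hypothetical helper: one union regexp when possible, plain substring
# checks when the pattern cannot be compiled.
def leagues_in(str, leagues)
  pattern = Regexp.union(*leagues)
  str.scan(pattern).uniq
rescue RegexpError
  # Fallback: substring semantics, results in the order of leagues.
  leagues.select { |league| str.include?(league) }
end

leagues = %w{1D 2D U16 U19 LR RR JNL NL}
p leagues_in("some stuff NL is chunky", leagues) # => ["NL"]
```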