String contains one of these?


#1

Imagine I have:

leagues=%w{ 1D 2D U16 U19 LR RR JNL NL}

For a given string, say “some stuff NL is chunky”, I want to determine which
of the matches it contains…

Now, the hard way (more code, less thought) would be to iterate over the
array and do a =~ on each element… but is there a simpler way?

thanks in advance


#2

On Feb 27, 2006, at 7:35 AM, mikkel wrote:

now, the hard way (more code, less thought) would be to iterate the
array and do a =~ on it…but is there a simpler way ???

The hard way isn’t too hard and doesn’t require but a line of code:

leagues=%w{ 1D 2D U16 U19 LR RR JNL NL}
=> ["1D", "2D", "U16", "U19", "LR", "RR", "JNL", "NL"]

str="some stuff NL is chunky"
=> "some stuff NL is chunky"

leagues.find_all { |league| str.include? league }
=> ["NL"]

Hope that helps.

James Edward G. II
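
One thing worth checking with the include? approach is that it tests for substrings, so "NL" is also reported for any string that contains "JNL". The sketch below compares it with a whole-word test; the sample sentence is made up for illustration, and the \b variant is only one possible way to tighten the match.

```ruby
leagues = %w{1D 2D U16 U19 LR RR JNL NL}
str = "the JNL final is on Sunday"   # made-up sample input

# Substring test: "NL" is reported because it occurs inside "JNL".
substring_hits = leagues.find_all { |league| str.include?(league) }

# Whole-word test: \b anchors stop "NL" from matching inside "JNL".
word_hits = leagues.find_all { |league| str =~ /\b#{Regexp.escape(league)}\b/ }

p substring_hits  # => ["JNL", "NL"]
p word_hits       # => ["JNL"]
```

Which behaviour is right depends on whether league codes can legitimately appear inside other words in the input.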


#3

2006/2/27, mikkel removed_email_address@domain.invalid:

array and do a =~ on it…but is there a simpler way ???

leagues=%w{ 1D 2D U16 U19 LR RR JNL NL}
=> ["1D", "2D", "U16", "U19", "LR", "RR", "JNL", "NL"]

"some stuff NL is chunky".scan( Regexp.new( leagues.join('|') ) )
=> ["NL"]

Kind regards

robert
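
Joining the raw strings with '|' works for the codes in this thread because none of them contains a regexp metacharacter. If a code ever did, it would need escaping first. A minimal sketch, assuming a made-up code "A.L" that is not part of the original list:

```ruby
# "A.L" is a hypothetical league code, used only to show the metacharacter problem.
leagues = %w{1D 2D A.L NL}
str = "some stuff AXL is chunky"

# Without escaping, the "." in "A.L" matches any character.
unescaped = Regexp.new(leagues.join('|'))

# Regexp.escape turns each code into a literal pattern before joining.
escaped = Regexp.new(leagues.map { |league| Regexp.escape(league) }.join('|'))

p str.scan(unescaped)  # => ["AXL"]
p str.scan(escaped)    # => []
```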


#4

On Feb 27, 2006, at 2:35 PM, mikkel wrote:

now, the hard way (more code, less thought) would be to iterate the
array and do a =~ on it…but is there a simpler way ???

How about:

leagues = %w{1D 2D U16 U19 LR RR JNL NL}
words = "some stuff NL is chunky".split
leagues.select { |m| words.include?(m) } # => ["NL"]

– Daniel
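
Splitting on whitespace gives whole-word matches, but a code followed by punctuation ("NL," or "U19.") stays glued to that punctuation and is missed. A small sketch of one possible workaround, tokenizing with scan(/\w+/) instead of split; the sample sentence is made up:

```ruby
leagues = %w{1D 2D U16 U19 LR RR JNL NL}
str = "some stuff (NL), then U19."   # made-up sample input with punctuation

# Whitespace split keeps "(NL)," and "U19." as tokens, so nothing matches.
by_split = leagues.select { |m| str.split.include?(m) }

# scan(/\w+/) extracts runs of word characters and drops the punctuation.
words = str.scan(/\w+/)
by_scan = leagues.select { |m| words.include?(m) }

p by_split  # => []
p by_scan   # => ["U19", "NL"]
```

The results come back in the order of the leagues array, not in the order the codes appear in the string.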


#5

Christian N. wrote:

which of the matches it contains…
irb(main):002:0> "some stuff NL is chunky".scan Regexp.union(*leagues)
=> ["NL"]
Even better! Didn’t know about that method. Learn something new every
day. Thanks!

robert

#6

amazing…

thanks a bunch everybody…

On Tuesday, February 28, 2006, at 2:53 AM, Robert K. wrote:

for a given string, say “some stuff NL is chunky” i want determine
irb(main):002:0> "some stuff NL is chunky".scan Regexp.union(*leagues)
=> ["NL"]

Even better! Didn’t know about that method. Learn something new every
day. Thanks!

robert

Mikkel B.



#7

“Robert K.” removed_email_address@domain.invalid writes:

now, the hard way (more code, less thought) would be to iterate the
array and do a =~ on it…but is there a simpler way ???

leagues=%w{ 1D 2D U16 U19 LR RR JNL NL}
=> ["1D", "2D", "U16", "U19", "LR", "RR", "JNL", "NL"]

"some stuff NL is chunky".scan( Regexp.new( leagues.join('|') ) )
=> ["NL"]

irb(main):002:0> "some stuff NL is chunky".scan Regexp.union(*leagues)
=> ["NL"]
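
A detail to keep in mind with any alternation, whether built by hand or with Regexp.union: alternatives are tried in list order, so a code that is a prefix of another can shadow it. The original list has no such pair; the sketch below assumes a made-up code "U1" to show the effect and one possible fix (longest codes first).

```ruby
# "U1" is a hypothetical code added only to create a prefix clash with "U16".
leagues = %w{U1 U16 NL}
str = "the U16 side plays NL rules"

in_list_order = Regexp.union(*leagues)
longest_first = Regexp.union(*leagues.sort_by { |league| -league.length })

p str.scan(in_list_order)  # => ["U1", "NL"]
p str.scan(longest_first)  # => ["U16", "NL"]
```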


#8

And the winner is …


#9

You’ve got a bunch of great answers already, but here’s another option.

leagues = %w(1D 2D U16 U19 LR RR JNL NL)
words = "some stuff NL is chunky"

irb(main):008:0> words.split & leagues
=> ["NL"]
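
Two properties of Array#& are easy to overlook here: the result contains each match only once, and its order follows the receiver (the array on the left). A short illustration with a made-up sentence:

```ruby
leagues = %w(1D 2D U16 U19 LR RR JNL NL)
words = "NL beat U19 and then NL again".split   # made-up sample input

# Order follows the left-hand array; the repeated "NL" appears only once.
text_order   = words & leagues
league_order = leagues & words

p text_order    # => ["NL", "U19"]
p league_order  # => ["U19", "NL"]
```

So the two operand orders return the same set of codes, but not necessarily in the same sequence.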


#10

removed_email_address@domain.invalid wrote:

Actually if you flip it around as ‘leagues & words.split’ it turns out
to have some significant performance advantages in many cases. See
http://www.jasani.org/articles/2006/02/28/adding-the-science-back-to-computer-science
for more details.

Is it possible that link is incorrect?

“Firefox can’t establish a connection to the server at www.jasani.org.”

I’m dying to learn about this now. :)


#11

Actually if you flip it around as ‘leagues & words.split’ it turns out
to have some significant performance advantages in many cases. See
http://www.jasani.org/articles/2006/02/28/adding-the-science-back-to-computer-science
for more details.


#12

Jeffrey, the link should be working for you now. My hosting provider
had a number of servers go belly up earlier in the day.


#13

“removed_email_address@domain.invalid” removed_email_address@domain.invalid writes:

Actually if you flip it around as ‘leagues & words.split’ it turns out
to have some significant performance advantages in many cases. See
http://www.jasani.org/articles/2006/02/28/adding-the-science-back-to-computer-science
for more details.

You’d better cache those Regexps.
Also, test with longer “words”—they are more likely to grow than the
number of leagues.
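
A rough sketch of what caching the Regexp might look like in a test loop: build it once outside the loop instead of once per input line. The generated lines and the loop size are arbitrary choices for illustration, and this is not the benchmark code from the linked article; timings will vary by machine and Ruby version.

```ruby
leagues = %w{1D 2D U16 U19 LR RR JNL NL}
lines = Array.new(1_000) { |i| "line #{i} some stuff NL is chunky" }

# Small helper that returns [result, elapsed_seconds] for a block.
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

# Rebuilds the Regexp for every line.
rebuilt, rebuilt_time = timed { lines.map { |l| l.scan(Regexp.union(*leagues)) } }

# Builds the Regexp once and reuses it.
league_re = Regexp.union(*leagues)
cached, cached_time = timed { lines.map { |l| l.scan(league_re) } }

puts "rebuilt: #{rebuilt_time}s, cached: #{cached_time}s"
```

Both variants return the same matches; only the amount of repeated setup work differs.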


#14

On Feb 28, 2006, at 11:03 PM, removed_email_address@domain.invalid wrote:

Mmmm… I hope JEG II is watching this thread as I’m wondering if
there isn’t a rubyquiz in here somewhere.

If you can think of a good way to spin it:

removed_email_address@domain.invalid

James Edward G. II


#15

removed_email_address@domain.invalid wrote:

Actually if you flip it around as ‘leagues & words.split’ it turns out
to have some significant performance advantages in many cases. See
http://www.jasani.org/articles/2006/02/28/adding-the-science-back-to-computer-science
for more details.

I am surprised by the scan failures (“could not continue test”). Do you
know what causes the error?


#16

Good comments Christian. I thought I’d just hack a set of tests and
post some quick results, but all I think I did was prove that I’m not
supposed to be coding first thing in the morning.

Yes, once you fix that bug in the code by caching the Regexes, they
perform very impressively. In fact, for the modified tests I just ran,
they beat out every other solution in every case except for extremely
large league sizes (> 18,000 elements) where they wouldn’t run at all.
But I’m loath to draw any conclusions from the data just yet. (Burned
once, twice shy?)

There appears to be at least one other bug in the code. One astute,
anonymous person pointed out that the six solutions will not return the
same results for the generated datasets and that preprocessing of input
data could help improve performance even more. I think it all depends
on how one defines the problem as to whether the generated data is
valid input or undefined requirements now being levied on the code.

Mmmm… I hope JEG II is watching this thread as I’m wondering if
there isn’t a rubyquiz in here somewhere.


#17

Jeffrey S. removed_email_address@domain.invalid writes:

removed_email_address@domain.invalid wrote:

Actually if you flip it around as ‘leagues & words.split’ it turns out
to have some significant performance advantages in many cases. See
http://www.jasani.org/articles/2006/02/28/adding-the-science-back-to-computer-science
for more details.

I am surprised by the scan failures (“could not continue test”). Do
you know what causes the error?

irb(main):002:0> Regexp.new "x"*600_000
RegexpError: regular expression too big: /xxxxx…
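
The exact size limit depends on the Ruby version and its regexp engine, so the error above may not reproduce everywhere. If a very large list does hit it, one possible workaround is to split the list into several smaller Regexps and scan with each. This is only a sketch: the helper name, the chunk size of 1,000, and the generated codes are all made up for illustration, not measured values.

```ruby
# Hypothetical helper: one Regexp per slice of the code list.
# chunk_size is an arbitrary illustrative value, not a known engine limit.
def build_chunked_regexps(codes, chunk_size = 1_000)
  codes.each_slice(chunk_size).map { |slice| Regexp.union(slice) }
end

# Made-up large list: 20,000 generated codes plus one real one.
big_list = (1..20_000).map { |i| "L#{i}X" } + %w{NL}
str = "some stuff NL is chunky"

regexps = build_chunked_regexps(big_list)      # built once, reusable
hits = regexps.flat_map { |re| str.scan(re) }  # scan with each chunk in turn

p hits  # => ["NL"]
```

Note that matches come back grouped by chunk rather than in the order they appear in the string.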