Forum: Ruby Boyer-Moore string search algorithm in ruby

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
(Guest)
on 2007-07-22 07:25
(Received via mailing list)
I came across this post:
http://www.arnebrasseur.net/2007/02/26/boyer-moore...

Does anyone know if this is already part of Ruby open source library?
I did not find anything on Google. TIA.
Nobuyoshi N. (Guest)
on 2007-07-22 13:22
(Received via mailing list)
Hi,

At Sun, 22 Jul 2007 12:25:01 +0900,
removed_email_address@domain.invalid wrote in [ruby-talk:261195]:
> Does anyone know if this is already part of Ruby open source library?
> I did not find anything on Google. TIA.

String#index uses Karp-Rabin algorithm, not Boyer-Moore.
Yukihiro M. (Guest)
on 2007-07-22 16:30
(Received via mailing list)
Hi,

In message "Re: Boyer-Moore string search algorithm in ruby"
    on Sun, 22 Jul 2007 18:21:18 +0900, "Nobuyoshi N."
<removed_email_address@domain.invalid> writes:

|removed_email_address@domain.invalid wrote in [ruby-talk:261195]:
|> Does anyone know if this is already part of Ruby open source library?
|> I did not find anything on Google. TIA.
|
|String#index uses Karp-Rabin algorithm, not Boyer-Moore.

1.8 regex engine uses modified Boyer-Moore search algorithm.

              matz.
Peter S. (Guest)
on 2007-07-22 22:06
(Received via mailing list)
In message
<removed_email_address@domain.invalid>,
"Nobuyoshi N." wr
ites:
>At Sun, 22 Jul 2007 12:25:01 +0900,
>removed_email_address@domain.invalid wrote in [ruby-talk:261195]:
>> Does anyone know if this is already part of Ruby open source library?
>> I did not find anything on Google. TIA.

>String#index uses Karp-Rabin algorithm, not Boyer-Moore.

That strikes me as secondary; presumably, the goal is to implement it in
Ruby
code, not to determine what algorithm the language internals use, no?

-s
John J. (Guest)
on 2007-07-23 07:04
(Received via mailing list)
On Jul 22, 2007, at 1:04 PM, Peter S. wrote:

>> String#index uses Karp-Rabin algorithm, not Boyer-Moore.
>
> That strikes me as secondary; presumably, the goal is to implement
> it in Ruby
> code, not to determine what algorithm the language internals use, no?
>
> -s
>
Perhaps if you look into the code for one of those functions, you
might find that it is done in Ruby, you might not.
Peter S. (Guest)
on 2007-07-23 07:08
(Received via mailing list)
In message <removed_email_address@domain.invalid>, John J.
writes:
>Perhaps if you look into the code for one of those functions, you
>might find that it is done in Ruby, you might not.

True.

Normally, when someone asks for an algorithm implemented in a language,
a library-style implementation will be a poor fit, as most library
implementations are tuned for speed rather than readability -- perhaps
less true in Ruby, I admit.

-s
Nate M. (Guest)
on 2010-10-27 18:16
Hey all, I realize this thread is a few years old, but I just wrote a
Ruby library for doing Boyer-Moore string searching which you can find
here:
http://www.xcombinator.com/2010/10/27/boyer-moore-...
.

It's basically a port of the current wikipedia c-code, but it also
supports searching token arrays and the tokens can be regular
expressions. For instance:

Usage: BoyerMoore.search(haystack, needle)   # returns index of needle
or nil

Examples:

    BoyerMoore.search("ANPANMAN", "ANP")   # => 0
    BoyerMoore.search("ANPANMAN", "ANPXX") # => nil
    BoyerMoore.search(["<b>", "hi", "</b>"], ["hi"])         # => 1
    BoyerMoore.search(["bam", "foo", "bar"], ["foo", "bar"]) # => 1
    BoyerMoore.search(["bam", "bar", "baz"], ["foo"])        # => nil
    BoyerMoore.search(["Sing", "99", "Luftballon"], [/\d+/]) == 1
    BoyerMoore.search(["Nate M.", "5 Pine Street", "Los A.",
"CA", "90210"], [/^\w{2}$/, /^\d{5}$/]) == 3

The regexp matching is fairly slow, so be wary of using it.

Hope this helps someone!

Nate
This topic is locked and can not be replied to.