Forum: Ruby compare two strings

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- and Ruby-related community platforms.
Clint Pidlubny (Guest)
on 2006-04-05 00:28
(Received via mailing list)
Hello,

What is the best approach to searching a string for another string?

For instance, I have:

url1 = 'http://www.url.com'
url2 = 'http://www.url.com/page'

If url1 is contained in url2, as above, I'd like to declare it a
match. I suspect this can be done with a regular expression, but my
experience with them is limited.

The other problem is that I won't be looking for just one url1: I
have an entire database table of them to compare against an entire
database table of url2 values.

Any thoughts on approaching this problem are appreciated.

Thanks
Clint
unknown (Guest)
on 2006-04-05 02:39
(Received via mailing list)
Hi --

On Wed, 5 Apr 2006, Clint Pidlubny wrote:

> match. I'm sure this happens using a regular expression, but my
> experience is limited with them.
>
> The other problem is that I'm not going to be looking for just one
> url1, but I have an entire database table full of those to compare to
> an entire database table of url2.
>
> Any thoughts on approaching this problem are appreciated.

It's not a complete answer, but in case it helps: String has an
include? method:

   url2.include?(url1) #=> true
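A small sketch of what String#include? does, using the two URLs from the question plus two made-up URLs for illustration. It is a plain substring test, so a match anywhere in the receiver counts. If only matches at the start should count, String#start_with? (Ruby 1.8.7 and later) is the stricter check.

```ruby
url1 = 'http://www.url.com'
url2 = 'http://www.url.com/page'

# Hypothetical extra URLs, for illustration only.
unrelated  = 'http://www.example.org/page'
redirector = 'http://www.example.org/?to=http://www.url.com'

# include? looks for url1 anywhere in the receiver; no regexp is
# involved, so characters like "." need no escaping.
p url2.include?(url1)        # true
p unrelated.include?(url1)   # false
p redirector.include?(url1)  # true, even though the host differs

# start_with? only accepts url1 as a prefix.
p url2.start_with?(url1)        # true
p redirector.start_with?(url1)  # false
```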


David

--
David A. Black (dblack@wobblini.net)
Ruby Power and Light, LLC (http://www.rubypowerandlight.com)

"Ruby for Rails" chapters now available
from Manning Early Access Program! http://www.manning.com/books/black
Clint Pidlubny (Guest)
on 2006-04-05 04:12
(Received via mailing list)
On 4/4/06, dblack@wobblini.net <dblack@wobblini.net> wrote:
> > url1 = 'http://www.url.com'
> > Any thoughts on approaching this problem are appreciated.

I can't think of why that wouldn't work. Thank you.

Clint
zdennis (Guest)
on 2006-04-05 08:49
(Received via mailing list)
dblack@wobblini.net wrote:
>> url1 = 'http://www.url.com'
>> Any thoughts on approaching this problem are appreciated.
>
> It's not a complete answer, but in case it helps: String has an
> include? method:
>
>   url2.include?(url1) => true

Using String#include? is much faster than regexp matching. Here are some
benchmarks. I didn't test this with Oniguruma though, but I su

-- START CODE --
require 'benchmark'

url = "http://www.url.com/"
url2 = "http://www.url.com/page"

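# Report 1: plain substring search with String#include?
# Report 2: interpolate url into a new Regexp, then match with =~
Benchmark.bm{ |x|
  x.report{ 100000.times { url2.include?( url ) } }
  x.report{ 100000.times { url2 =~ /#{url}/ } }
}
-- END CODE --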


Benchmark Windows ruby 1.8.4 (2005-12-24) [i386-mswin32]
C:\source\projects\ruby\strings>ruby temp.rb
      user     system      total        real
  0.080000   0.000000   0.080000 (  0.080000)
  1.722000   0.130000   1.852000 (  1.873000)


Benchmark Linux ruby 1.8.4 (2005-12-24) [i686-linux]
zdennis@lima:~$ ruby-1.8.4 temp.rb
      user     system      total        real
  0.100000   0.000000   0.100000 (  0.119403)
  1.570000   0.040000   1.610000 (  1.760446)


Benchmark Linux ruby 1.8.3 (2005-06-23) [i486-linux]
zdennis@lima:~$ ruby temp.rb
      user     system      total        real
  0.160000   0.030000   0.190000 (  0.209436)
  1.720000   0.080000   1.800000 (  2.021754)


Benchmark Linux ruby 1.8.2 (2005-04-11) [i386-linux]
zdennis@jboss:~$ ruby temp.rb
      user     system      total        real
  0.000000   0.000000   0.000000 (  0.246239)
  0.000000   0.000000   0.000000 (  1.401049)


Zach
Clint Pidlubny (Guest)
on 2006-04-06 01:24
(Received via mailing list)
>         x.report{ 100000.times { url2.include?( url ) } }
>
>       user     system      total        real
>
> Zach

Excellent info Zach. Very relevant for me. I'll have thousands of
links to do this with.
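For the table-against-table case from the original question, here is a minimal sketch. It assumes both tables have already been read into plain Ruby arrays of strings. The names `base_urls` and `page_urls` and the data are hypothetical. It is a straightforward nested loop, so the number of include? calls is the product of the two table sizes. With thousands of rows in each table that means millions of calls, which is where the per-call speed from the benchmarks above starts to matter.

```ruby
# Hypothetical stand-ins for the two database tables.
base_urls = ['http://www.url.com', 'http://www.other.net']
page_urls = ['http://www.url.com/page',
             'http://www.url.com/about',
             'http://www.unrelated.org/']

# For each page URL, collect every base URL it contains.
# Pages with no match are left out of the result.
matches = {}
page_urls.each do |page|
  hits = base_urls.select { |base| page.include?(base) }
  matches[page] = hits unless hits.empty?
end

matches.each { |page, bases| puts "#{page} matches #{bases.join(', ')}" }
```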

Thanks again,
Clint
Dominik Bathon (Guest)
on 2006-04-06 02:45
(Received via mailing list)
Hi,

On Thu, 06 Apr 2006 01:23:31 +0200, Clint Pidlubny
<clint.pidlubny@gmail.com> wrote:

> Excellent info Zach. Very relevant for me. I'll have thousands of
> links to do this with.

$ cat str_inc_bench.rb
require 'benchmark'

url = "http://www.url.com/"
url2 = "http://www.url.com/page"
urlrx = /#{url}/

Benchmark.bm{ |x|
  x.report{ 100000.times { url2.include?( url ) } }
  x.report{ 100000.times { url2 =~ urlrx } }
  x.report{ 100000.times { url2 =~ /#{url}/ } }
}
$ ruby -v str_inc_bench.rb
ruby 1.8.4 (2005-12-24) [i686-linux]
       user     system      total        real
   0.070000   0.000000   0.070000 (  0.071435)
   0.130000   0.000000   0.130000 (  0.130016)
   1.130000   0.020000   1.150000 (  1.182629)

So regular expression matching itself is not that much slower than
String#include? (0.13 s versus 0.07 s user time here).
What makes "url2 =~ /#{url}/" slow is the creation of so many Regexp
objects, one per iteration.

I just wanted to point that out.
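This sketch builds the pattern once, outside the loop, as Dominik describes. It also adds Regexp.escape, which the benchmarks above skip. Interpolating url as-is lets each "." match any character, so a different URL can slip through. The second URL below is made up to show this.

```ruby
url  = 'http://www.url.com/'
# The second URL is hypothetical: "X" where url has a ".".
urls = ['http://www.url.com/page', 'http://wwwXurl.com/page']

# Each Regexp is built once here, then reused for every match below.
loose  = /#{url}/                   # "." means "any character" here
strict = /#{Regexp.escape(url)}/    # "." is escaped, so it only matches "."

loose_hits  = urls.map { |u| !!(u =~ loose) }
strict_hits = urls.map { |u| !!(u =~ strict) }

p loose_hits   # [true, true]
p strict_hits  # [true, false]
```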

Dominik
Brian Mitchell (Guest)
on 2006-04-06 03:15
(Received via mailing list)
On 4/5/06, Dominik Bathon <dbatml@gmx.de> wrote:
> So, regular expression matching itself is not that much slower than
> String#include?.
> What makes "url2 =~ /#{url}/" slow is the creation of so many Regexp
> objects.

Ruby has some very subtle optimizations for Regexps too:

# ruby 1.8.4 (2006-03-20) [powerpc-darwin8.5.0]
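GC.disable  # keep the collector from freeing Regexps between counts
# each_object with a block returns the number of objects it yielded
n = ObjectSpace.each_object(Regexp){}
def foo; /abc/ end
# Note I didn't call foo.
ObjectSpace.each_object(Regexp){} - n #=> 1
1000.times {foo}
# Still 1: all 1000 calls returned the same Regexp object.
ObjectSpace.each_object(Regexp){} - n #=> 1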

It is always nice to see simple optimizations like this.
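The same effect can be seen by checking object identity directly. In this sketch the method names are made up for illustration. A plain literal is reused on every call. An interpolated literal builds a new Regexp on every evaluation, which is the cost Dominik measured. The /o flag sits in between: it interpolates only the first time the literal is evaluated, so later calls reuse that first Regexp even when the argument changes. That is fast, but easy to get wrong.

```ruby
# A literal without interpolation is built once, when the code is
# compiled, so every call returns the very same object.
def literal;         /abc/;   end
# With interpolation, a new Regexp is built on every evaluation.
def interpolated(s); /#{s}/;  end
# /o interpolates only on the first evaluation and reuses that Regexp.
def once(s);         /#{s}/o; end

p literal.equal?(literal)                          # true
p interpolated('abc').equal?(interpolated('abc'))  # false
p once('abc').equal?(once('xyz'))                  # true
p once('xyz').source                               # "abc"
```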

Brian.