Compare two strings

pachl · April 5, 2006, 12:28am

Hello,

What is the best approach to searching a string for another string?

For instance, I have:

url1 = 'http://www.url.com'
url2 = 'http://www.url.com/page'

If part of url1 is in url2, like above, I’d like to declare it a
match. I’m sure this happens using a regular expression, but my
experience is limited with them.

The other problem is that I’m not going to be looking for just one
url1, but I have an entire database table full of those to compare to
an entire database table of url2.

Any thoughts on approaching this problem are appreciated.

Thanks
Clint

pachl · April 5, 2006, 4:12am

On 4/4/06, [email protected] wrote:

url1 = 'http://www.url.com'
Any thoughts on approaching this problem are appreciated.
David A. Black ([email protected])
Ruby Power and Light, LLC (http://www.rubypowerandlight.com)

“Ruby for Rails” chapters now available
from Manning Early Access Program! Ruby for Rails

I can’t think of why that wouldn’t work. Thank you.

Clint

pachl · April 5, 2006, 2:39am

Hi –

On Wed, 5 Apr 2006, Clint P. wrote:

match. I’m sure this happens using a regular expression, but my
experience is limited with them.

The other problem is that I’m not going to be looking for just one
url1, but I have an entire database table full of those to compare to
an entire database table of url2.

Any thoughts on approaching this problem are appreciated.

It’s not a complete answer, but in case it helps: String has an
include? method:

url2.include?(url1) => true
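
A minimal sketch of how that check could extend to two lists, assuming the rows from both tables have already been loaded into plain arrays of strings. The names `bases` and `pages` and their contents are hypothetical stand-ins for the two tables, and `start_with?` assumes a Ruby newer than the 1.8.4 used elsewhere in this thread:

```ruby
url1 = 'http://www.url.com'
url2 = 'http://www.url.com/page'

# Substring test: true if url1 appears anywhere inside url2.
substring_match = url2.include?(url1)

# Stricter variant: url1 must appear at the very start of url2.
prefix_match = url2.start_with?(url1)

# Hypothetical stand-ins for the two database tables.
bases = ['http://www.url.com', 'http://www.example.org']
pages = ['http://www.url.com/page', 'http://www.other.net/x']

# Keep every page that contains at least one of the base URLs.
matched_pages = pages.select do |page|
  bases.any? { |base| page.include?(base) }
end
```

This compares every page against every base, so the work grows with the product of the two table sizes; for a few thousand rows that is probably acceptable.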

David

–
David A. Black ([email protected])
Ruby Power and Light, LLC (http://www.rubypowerandlight.com)

pachl · April 6, 2006, 1:24am

    x.report{ 100000.times { url2.include?( url ) } }

  user     system      total        real

Zach

Excellent info Zach. Very relevant for me. I’ll have thousands of
links to do this with.

Thanks again,
Clint

pachl · April 5, 2006, 8:49am

[email protected] wrote:

url1 = 'http://www.url.com'
Any thoughts on approaching this problem are appreciated.

It’s not a complete answer, but in case it helps: String has an
include? method:

url2.include?(url1) => true

Using String#include? is much faster than regexp matching. Here are some
benchmarks. I didn’t test this with Oniguruma though, but I su

-- START CODE --
require 'benchmark'

url = "http://www.url.com/"
url2 = "http://www.url.com/page"

Benchmark.bm{ |x|
  x.report{ 100000.times { url2.include?( url ) } }
  x.report{ 100000.times { url2 =~ /#{url}/ } }
}
-- END CODE --

Benchmark Windows ruby 1.8.4 (2005-12-24) [i386-mswin32]
C:\source\projects\ruby\strings>ruby temp.rb
      user     system      total        real
  0.080000   0.000000   0.080000 (  0.080000)
  1.722000   0.130000   1.852000 (  1.873000)

Benchmark Linux ruby 1.8.4 (2005-12-24) [i686-linux]
zdennis@lima:~$ ruby-1.8.4 temp.rb
      user     system      total        real
  0.100000   0.000000   0.100000 (  0.119403)
  1.570000   0.040000   1.610000 (  1.760446)

Benchmark Linux ruby 1.8.3 (2005-06-23) [i486-linux]
zdennis@lima:~$ ruby temp.rb
      user     system      total        real
  0.160000   0.030000   0.190000 (  0.209436)
  1.720000   0.080000   1.800000 (  2.021754)

Benchmark Linux ruby 1.8.2 (2005-04-11) [i386-linux]
zdennis@jboss:~$ ruby temp.rb
      user     system      total        real
  0.000000   0.000000   0.000000 (  0.246239)
  0.000000   0.000000   0.000000 (  1.401049)
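
One caveat about the interpolated pattern in the benchmark: a string interpolated into a regexp is treated as pattern source, so each "." in the URL matches any character. The sketch below shows the difference, using `Regexp.escape` from the core library; the test string `lookalike` is a made-up example, not data from this thread:

```ruby
url = "http://www.url.com/"

# Made-up URL that differs from url only where url has a dot.
lookalike = "http://wwwXurl.com/page"

# Unescaped interpolation: "." matches any character, so this matches.
loose = !!(lookalike =~ /#{url}/)

# Escaped interpolation: dots are matched literally, so this does not.
strict = !!(lookalike =~ /#{Regexp.escape(url)}/)

# String#include? compares literally and needs no escaping.
plain = lookalike.include?(url)
```

Here `loose` is true while `strict` and `plain` are both false, which is another reason to prefer `include?` when no pattern features are needed.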

Zach

pachl · April 6, 2006, 2:45am

Hi,

On Thu, 06 Apr 2006 01:23:31 +0200, Clint P.
[email protected] wrote:

Excellent info Zach. Very relevant for me. I’ll have thousands of
links to do this with.

$ cat str_inc_bench.rb
require 'benchmark'

url = "http://www.url.com/"
url2 = "http://www.url.com/page"
urlrx = /#{url}/

Benchmark.bm{ |x|
  x.report{ 100000.times { url2.include?( url ) } }
  x.report{ 100000.times { url2 =~ urlrx } }
  x.report{ 100000.times { url2 =~ /#{url}/ } }
}
$ ruby -v str_inc_bench.rb
ruby 1.8.4 (2005-12-24) [i686-linux]
      user     system      total        real
  0.070000   0.000000   0.070000 (  0.071435)
  0.130000   0.000000   0.130000 (  0.130016)
  1.130000   0.020000   1.150000 (  1.182629)

So, regular expression matching itself is not that much slower than
String#include?.
What makes “url2 =~ /#{url}/” slow is the creation of so many Regexp
objects.

I just wanted to point that out.
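
Following the same idea of building the Regexp only once, here is a rough sketch for checking many URLs at a time. It assumes both tables have been read into arrays; `bases` and `pages` and their contents are hypothetical. `Regexp.union` is a core method that escapes string arguments and joins them into a single alternation:

```ruby
# Hypothetical stand-ins for the two database tables.
bases = ['http://www.url.com/', 'http://www.example.org/']
pages = [
  'http://www.url.com/page',
  'http://wwwXurl.com/page',
  'http://www.example.org/a/b'
]

# Built once, outside the loop; string arguments are escaped for us.
any_base = Regexp.union(bases)

# Each page is then tested against the single precompiled pattern.
matched = pages.select { |page| page =~ any_base }
```

With these inputs `matched` holds the first and third page. Whether this beats a nested loop over `include?` for a given data set would need to be measured.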

Dominik

pachl · April 6, 2006, 3:15am

On 4/5/06, Dominik B. [email protected] wrote:

So, regular expression matching itself is not that much slower than
String#include?.
What makes “url2 =~ /#{url}/” slow is the creation of so many Regexp
objects.

Ruby has some very subtle optimizations for Regexps too:

ruby 1.8.4 (2006-03-20) [powerpc-darwin8.5.0]

GC.disable
n = ObjectSpace.each_object(Regexp){}
def foo; /abc/ end

Note I didn’t call foo.

ObjectSpace.each_object(Regexp){} - n #=> 1
1000.times {foo}
ObjectSpace.each_object(Regexp){} - n #=> 1

It is always nice to see simple optimizations like this.
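
Another way to observe the same behaviour is to compare object identity, sketched below. The method names are made up for illustration; the claim is only that a plain literal hands back one object on every call, while an interpolated literal builds a new one each time:

```ruby
# A plain literal: the Regexp object is created once and reused.
def literal_rx
  /abc/
end

# An interpolated literal: a new Regexp is built on every call.
def interpolated_rx(str)
  /#{str}/
end

same_literal = literal_rx.equal?(literal_rx)
same_interpolated = interpolated_rx('abc').equal?(interpolated_rx('abc'))
```

`same_literal` comes out true and `same_interpolated` false, which lines up with the benchmark result earlier in the thread.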

Brian.