Forum: Ruby compare two strings

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- and Ruby-related community platforms.
Clint P. (Guest)
on 2006-04-05 02:28
(Received via mailing list)
Hello,

What is the best approach to searching a string for another string?

For instance, I have:

url1 = 'http://www.url.com'
url2 = 'http://www.url.com/page'

If part of url1 is in url2, like above, I'd like to declare it a
match. I'm sure this happens using a regular expression, but my
experience is limited with them.

The other problem is that I'm not going to be looking for just one
url1, but I have an entire database table full of those to compare to
an entire database table of url2.

Any thoughts on approaching this problem are appreciated.

Thanks
Clint
unknown (Guest)
on 2006-04-05 04:39
(Received via mailing list)
Hi --

On Wed, 5 Apr 2006, Clint P. wrote:

> match. I'm sure this happens using a regular expression, but my
> experience is limited with them.
>
> The other problem is that I'm not going to be looking for just one
> url1, but I have an entire database table full of those to compare to
> an entire database table of url2.
>
> Any thoughts on approaching this problem are appreciated.

It's not a complete answer, but in case it helps: String has an
include? method:

   url2.include?(url1) => true
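
For the table-against-table case, a minimal sketch might look like the following. It assumes both tables have already been loaded into plain arrays; the names bases, candidates and matches_for are hypothetical and not part of any library.

```ruby
# Hypothetical data standing in for the two database tables.
bases      = ['http://www.url.com', 'http://www.example.org']
candidates = ['http://www.url.com/page', 'http://www.other.net/x']

# For each candidate URL, collect every base URL it contains.
# (matches_for is an illustrative helper, not a built-in.)
def matches_for(bases, candidates)
  result = {}
  candidates.each do |candidate|
    found = bases.select { |base| candidate.include?(base) }
    result[candidate] = found unless found.empty?
  end
  result
end

matches = matches_for(bases, candidates)
matches.each do |candidate, found|
  puts "#{candidate} matches #{found.join(', ')}"
end
```

Note that include? matches anywhere in the string. If the shorter URL must appear at the beginning, String#start_with? may be the better fit.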


David

--
David A. Black (removed_email_address@domain.invalid)
Ruby Power and Light, LLC (http://www.rubypowerandlight.com)

"Ruby for Rails" chapters now available
from Manning Early Access Program! http://www.manning.com/books/black
Clint P. (Guest)
on 2006-04-05 06:12
(Received via mailing list)
On 4/4/06, removed_email_address@domain.invalid 
<removed_email_address@domain.invalid> wrote:
> > url1 = 'http://www.url.com'
> > Any thoughts on approaching this problem are appreciated.

I can't think of why that wouldn't work. Thank you.

Clint
zdennis (Guest)
on 2006-04-05 10:49
(Received via mailing list)
removed_email_address@domain.invalid wrote:
>> url1 = 'http://www.url.com'
>> Any thoughts on approaching this problem are appreciated.
>
> It's not a complete answer, but in case it helps: String has an
> include? method:
>
>   url2.include?(url1) => true

Using String#include? is much faster than regexp matching. Here are some
benchmarks. I didn't test this with Oniguruma though, but I su

-- START CODE --
require 'benchmark'

url = "http://www.url.com/"
url2 = "http://www.url.com/page"

Benchmark.bm{ |x|
	x.report{ 100000.times { url2.include?( url ) } }
	x.report{ 100000.times { url2 =~ /#{url}/ } }
}
-- END CODE ---


Benchmark Windows ruby 1.8.4 (2005-12-24) [i386-mswin32]
C:\source\projects\ruby\strings>ruby temp.rb
      user     system      total        real
  0.080000   0.000000   0.080000 (  0.080000)
  1.722000   0.130000   1.852000 (  1.873000)


Benchmark Linux ruby 1.8.4 (2005-12-24) [i686-linux]
zdennis@lima:~$ ruby-1.8.4 temp.rb
      user     system      total        real
  0.100000   0.000000   0.100000 (  0.119403)
  1.570000   0.040000   1.610000 (  1.760446)


Benchmark Linux ruby 1.8.3 (2005-06-23) [i486-linux]
zdennis@lima:~$ ruby temp.rb
      user     system      total        real
  0.160000   0.030000   0.190000 (  0.209436)
  1.720000   0.080000   1.800000 (  2.021754)


Benchmark Linux ruby 1.8.2 (2005-04-11) [i386-linux]
zdennis@jboss:~$ ruby temp.rb
      user     system      total        real
  0.000000   0.000000   0.000000 (  0.246239)
  0.000000   0.000000   0.000000 (  1.401049)


Zach
Clint P. (Guest)
on 2006-04-06 03:24
(Received via mailing list)
>         x.report{ 100000.times { url2.include?( url ) } }
>
>       user     system      total        real
>
> Zach

Excellent info Zach. Very relevant for me. I'll have thousands of
links to do this with.

Thanks again,
Clint
Dominik B. (Guest)
on 2006-04-06 04:45
(Received via mailing list)
Hi,

On Thu, 06 Apr 2006 01:23:31 +0200, Clint P.
<removed_email_address@domain.invalid> wrote:

> Excellent info Zach. Very relevant for me. I'll have thousands of
> links to do this with.

$ cat str_inc_bench.rb
require 'benchmark'

url = "http://www.url.com/"
url2 = "http://www.url.com/page"
urlrx = /#{url}/

Benchmark.bm{ |x|
x.report{ 100000.times { url2.include?( url ) } }
x.report{ 100000.times { url2 =~ urlrx } }
x.report{ 100000.times { url2 =~ /#{url}/ } }
}
$ ruby -v str_inc_bench.rb
ruby 1.8.4 (2005-12-24) [i686-linux]
       user     system      total        real
   0.070000   0.000000   0.070000 (  0.071435)
   0.130000   0.000000   0.130000 (  0.130016)
   1.130000   0.020000   1.150000 (  1.182629)

So, regular expression matching itself is not that much slower than
String#include?.
What makes "url2 =~ /#{url}/" slow is the creation of so many Regexp
objects.
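
One way to avoid that cost when the patterns come from data is to build the Regexp once, outside the loop. A rough sketch, assuming the search strings are held in an array (bases, pattern, urls and hits are made-up names). Regexp.union quotes plain strings, so the dots in the URLs are matched literally, which the interpolated /#{url}/ in the benchmark does not do.

```ruby
# Hypothetical list of base URLs to look for.
bases = ['http://www.url.com/', 'http://www.example.org/']

# Build a single Regexp once; Regexp.union escapes plain strings itself.
pattern = Regexp.union(bases)

# The second URL differs only where the first has a dot, so it must not match.
urls = ['http://www.url.com/page', 'http://wwwXurl.com/page']
hits = urls.select { |u| u =~ pattern }
puts hits.inspect
```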

I just wanted to point that out.

Dominik
Brian M. (Guest)
on 2006-04-06 05:15
(Received via mailing list)
On 4/5/06, Dominik B. <removed_email_address@domain.invalid> wrote:
> So, regular expression matching itself is not that much slower than
> String#include?.
> What makes "url2 =~ /#{url}/" slow is the creation of so many Regexp
> objects.

Ruby has some very subtle optimizations for Regexps too:

# ruby 1.8.4 (2006-03-20) [powerpc-darwin8.5.0]
GC.disable
n = ObjectSpace.each_object(Regexp){}
def foo; /abc/ end
# Note I didn't call foo.
ObjectSpace.each_object(Regexp){} - n #=> 1
1000.times {foo}
ObjectSpace.each_object(Regexp){} - n #=> 1

It is always nice to see simple optimizations like this.
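
The same effect can be seen through object identity. A small sketch, assuming a reasonably recent Ruby; the method names literal, interpolated and interpolated_once are made up for illustration.

```ruby
# Plain literal: the Regexp object is built once and reused on every call.
def literal; /abc/ end

# Interpolation: a new Regexp object is built on each call.
def interpolated(s); /#{s}/ end

# The /o flag: interpolation happens only the first time the literal runs.
def interpolated_once(s); /#{s}/o end

same_literal = literal.equal?(literal)
same_interp  = interpolated('abc').equal?(interpolated('abc'))
same_once    = interpolated_once('abc').equal?(interpolated_once('abc'))
puts [same_literal, same_interp, same_once].inspect
```

This lines up with Dominik's benchmark above: the interpolated form pays for a fresh Regexp on every iteration, while the precompiled one does not.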

Brian.