Forum: Ruby search reg-exp for exact match

John Butler (johnnybutler7)
on 2008-11-20 13:53
Hi,

I have a regular expression
/\b2003|\2004|\2005|\2006|\2007|\2008|\2009\b/
and I want to check whether various years are present.

"2003" =~  /\b2003|\2004|\2005|\2006|\2007|\2008|\2009\b/
returns 0 as expected

"2010" =~  /\b2003|\2004|\2005|\2006|\2007|\2008|\2009\b/
returns nil as expected

But I want only exact matches, so when I search for "2003 - 2008" I want
nil returned, as there is no exact match for that particular string.  I
thought the \b would give me this, but it doesn't.

"2003 - 2008" =~  /\b2003|\2004|\2005|\2006|\2007|\2008|\2009\b/
returns 0, but I want nil returned.

Can anyone help?

Jb
Jesús Gabriel y Galán (Guest)
on 2008-11-20 14:21
(Received via mailing list)
On Thu, Nov 20, 2008 at 1:49 PM, John Butler <johnnybutler7@gmail.com>
wrote:
> returns nil as expected
>
> But I want only exact matches, so when I search for "2003 - 2008" I want
> nil returned, as there is no exact match for that particular string.  I
> thought the \b would give me this, but it doesn't.
>
> "2003 - 2008" =~  /\b2003|\2004|\2005|\2006|\2007|\2008|\2009\b/
> returns 0, but I want nil returned.
>
> Can anyone help?

To do exactly what you are asking for, you can anchor the regexp
to the beginning and end of the string:

irb(main):013:0> re = /\A(2003|2004|2005|2006|2007|2008|2009)\Z/
=> /\A(2003|2004|2005|2006|2007|2008|2009)\Z/
irb(main):014:0> "2003" =~ re
=> 0
irb(main):015:0> "2003 - 2008" =~ re
=> nil
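
A short sketch of why \b alone did not help, assuming the intended pattern was a plain alternation of the years (the constant names are only illustrative and do not come from the thread). A word boundary is satisfied by the space after "2003", so a year embedded in a longer string still matches, whereas \A and \z tie the match to the whole string:

```ruby
# Hypothetical constant names, used only for this illustration.

# \b only needs a non-word character (or the string edge) next to the year,
# so this still matches inside a longer string.
WORD_BOUNDED = /\b(?:2003|2004|2005|2006|2007|2008|2009)\b/

# \A and \z pin the match to the start and end of the whole string.
ANCHORED = /\A(?:2003|2004|2005|2006|2007|2008|2009)\z/

p("2003 - 2008" =~ WORD_BOUNDED) # prints 0
p("2003 - 2008" =~ ANCHORED)     # prints nil
p("2003" =~ ANCHORED)            # prints 0
```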

In this case you don't need the \b anymore. BTW, you had typos there
because you had \2 instead of \b.
Anyway, if you want exact matches of strings you don't need regexps:

irb(main):018:0> years = (2003..2009).map {|x| x.to_s}
=> ["2003", "2004", "2005", "2006", "2007", "2008", "2009"]
irb(main):020:0> years.include? "2003"
=> true
irb(main):021:0> years.include? "2003 - 2008"
=> false

If you have many numbers and many lookups, a Set should be better,
performance-wise.
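
A minimal sketch of the Set variant, assuming the same list of years as above (VALID_YEARS and valid_year? are made-up names for illustration). A Set does a hash lookup for each membership test, while Array#include? scans the array:

```ruby
require "set"

# Hypothetical names; the years are the ones used in this thread.
VALID_YEARS = (2003..2009).map(&:to_s).to_set.freeze

# True only when the whole string is one of the stored years.
def valid_year?(str)
  VALID_YEARS.include?(str)
end

p valid_year?("2003")        # prints true
p valid_year?("2003 - 2008") # prints false
```

For seven entries the difference is unlikely to be measurable; it starts to matter with large lists and many lookups.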
Now, if we are talking about ranges of years we can do even better:

irb(main):022:0> min_year = 2003
=> 2003
irb(main):023:0> max_year = 2009
=> 2009
irb(main):024:0> year_to_test = "2003".to_i
=> 2003
irb(main):025:0> min_year <= year_to_test and year_to_test <= max_year
=> true
irb(main):026:0> year_to_test = "2008".to_i
=> 2008
irb(main):027:0> min_year <= year_to_test and year_to_test <= max_year
=> true
irb(main):028:0> year_to_test = "2010".to_i
=> 2010
irb(main):029:0> min_year <= year_to_test and year_to_test <= max_year
=> false
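
One caveat for the original question: String#to_i parses the leading digits and ignores whatever follows, so "2003 - 2008".to_i is 2003 and would pass the range test. A hedged sketch that checks the shape of the string first (year_in_range?, MIN_YEAR and MAX_YEAR are illustrative names; String#match? assumes Ruby 2.4 or later):

```ruby
MIN_YEAR = 2003
MAX_YEAR = 2009

# Accept only strings that consist of exactly four digits,
# then compare numerically with the inclusive bounds.
def year_in_range?(str)
  return false unless str.match?(/\A\d{4}\z/)

  str.to_i.between?(MIN_YEAR, MAX_YEAR)
end

p "2003 - 2008".to_i            # prints 2003
p year_in_range?("2003 - 2008") # prints false
p year_in_range?("2008")        # prints true
p year_in_range?("2010")        # prints false
```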


Hope this helps,

Jesus.
Robert Klemme (Guest)
on 2008-11-22 11:55
(Received via mailing list)
On 20.11.2008 14:17, Jesús Gabriel y Galán wrote:
>> "2010" =~  /\b2003|\2004|\2005|\2006|\2007|\2008|\2009\b/
>
> To do exactly what you are asking for, you can anchor the regexp
> to the beginning and end of the string:
>
> irb(main):013:0> re = /\A(2003|2004|2005|2006|2007|2008|2009)\Z/
> => /\A(2003|2004|2005|2006|2007|2008|2009)\Z/
> irb(main):014:0> "2003" =~ re
> => 0
> irb(main):015:0> "2003 - 2008" =~ re
> => nil

I'd rather use /\A200[3-9]\z/.
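
The post does not say why, so the following is only one reading of it. The character class [3-9] replaces the seven-way alternation, and \z differs from \Z in that \Z also matches just before a trailing newline. A small sketch with illustrative constant names:

```ruby
# Hypothetical names for the two variants being compared.
WITH_CAPITAL_Z = /\A200[3-9]\Z/ # allows one trailing newline
WITH_SMALL_Z   = /\A200[3-9]\z/ # must end exactly at the end of the string

p("2003\n" =~ WITH_CAPITAL_Z) # prints 0
p("2003\n" =~ WITH_SMALL_Z)   # prints nil
p("2009" =~ WITH_SMALL_Z)     # prints 0
p("2010" =~ WITH_SMALL_Z)     # prints nil
```

If the input comes from gets or a file and has not been chomped, the choice between the two decides whether "2003\n" counts as an exact match.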

> In this case you don't need the \b anymore. BTW, you had typos there
> because you had \2 instead of \b.
> Anyway, if you want exact matches of strings you don't need regexps:
>
> irb(main):018:0> years = (2003..2009).map {|x| x.to_s}
> => ["2003", "2004", "2005", "2006", "2007", "2008", "2009"]
> irb(main):020:0> years.include? "2003"
> => true
> irb(main):021:0> years.include? "2003 - 2008"
> => false

Or

irb(main):001:0> s="2005"
=> "2005"
irb(main):002:0> (2003..2009) === s[/\A\d{4}\z/].to_i
=> true
irb(main):003:0> s="2010"
=> "2010"
irb(main):004:0> (2003..2009) === s[/\A\d{4}\z/].to_i
=> false
irb(main):006:0> (2003..2009).include? s[/\A\d{4}\z/].to_i
=> false

Note, this works because 0 (= nil.to_i) is not part of the range!
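
To make the intermediate steps visible, here is a sketch of the same idea split into parts (exact_year and YEAR_RANGE are illustrative names). String#[] with a regexp returns the matched text or nil, and nil.to_i is 0:

```ruby
YEAR_RANGE = (2003..2009)

# Returns the string itself when it is exactly four digits, otherwise nil.
def exact_year(str)
  str[/\A\d{4}\z/]
end

p exact_year("2005")                          # prints "2005"
p exact_year("2003 - 2008")                   # prints nil
p exact_year("2003 - 2008").to_i              # prints 0
p YEAR_RANGE === exact_year("2003 - 2008").to_i # prints false

# With a range that contains 0 the trick breaks down:
p((0..2009) === exact_year("2003 - 2008").to_i) # prints true
```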

> If you have many numbers and many lookups, a Set should be better,
> performance-wise.

You can even use a bit set:

irb(main):007:0> t = (2003..2009).inject(0) {|mask,y| mask | 1 << y}
=>
116650078639864259662055853239652489576667478532211432368528502061497852157464823887836603809757037023714110007321126217782227286423686421672874625786531963635756068971637276480699799614611885589371789821904502024698121311064730577770474098457113815634439476503092997189887743679313284635928742849521858004245675611528209841692017556564840683843349732924435866760173931843810360262352061792429448169450281904579322760817054128336138506834410834183565543664844525391283837108127106791786643268532096672079466512393065631776802367002142967381057920196424747178242497261636008255151052901022379808767413846016
irb(main):008:0> t[s.to_i]
=> 0
irb(main):009:0> s="2005"
=> "2005"
irb(main):010:0> t[s.to_i]
=> 1
irb(main):011:0>
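
The session above relies on Integer#[] returning the bit at a given position, with bit y set for each year y, which is why the mask is such a large number. A possible variation, not from the thread, subtracts a base year so the mask stays small (BASE_YEAR, YEAR_MASK and year_bit_set? are made-up names):

```ruby
BASE_YEAR = 2003

# Bit 0 stands for 2003, bit 6 for 2009, so the mask is 0b1111111.
YEAR_MASK = (2003..2009).inject(0) { |mask, y| mask | (1 << (y - BASE_YEAR)) }

# True when the bit for the given integer year is set in the mask.
def year_bit_set?(year)
  offset = year - BASE_YEAR
  offset >= 0 && YEAR_MASK[offset] == 1
end

p YEAR_MASK           # prints 127
p year_bit_set?(2005) # prints true
p year_bit_set?(2010) # prints false
```

As with the range test, the string still has to be validated before calling to_i if only exact matches should count.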

There are many ways... :-)

> Now, if we are talking about ranges of years we can do even better:

... or use the range test (as above) directly.

Kind regards

  robert