Search reg-exp for exact match


#1

Hi,

I have a regular expression
/\b2003|\2004|\2005|\2006|\2007|\2008|\2009\b/
and I want to check whether various years are present.

"2003" =~ /\b2003|\2004|\2005|\2006|\2007|\2008|\2009\b/
returns 0 as expected

"2010" =~ /\b2003|\2004|\2005|\2006|\2007|\2008|\2009\b/
returns nil as expected

But I want only exact matches, so when I search for "2003 - 2008" I want
nil returned, as there is no exact match for that particular string. I
thought the \b would give me this, but it doesn't.

"2003 - 2008" =~ /\b2003|\2004|\2005|\2006|\2007|\2008|\2009\b/
returns 0, but I want nil returned.

Can anyone help?

Jb


#2

On Thu, Nov 20, 2008 at 1:49 PM, John B. removed_email_address@domain.invalid wrote:

> returns nil as expected
>
> But I want only exact matches, so when I search for "2003 - 2008" I want
> nil returned, as there is no exact match for that particular string. I
> thought the \b would give me this, but it doesn't.
>
> "2003 - 2008" =~ /\b2003|\2004|\2005|\2006|\2007|\2008|\2009\b/
> returns 0, but I want nil returned.
>
> Can anyone help?

To do exactly what you are asking for, you can anchor the regexp
to the beginning and end of the string:

irb(main):013:0> re = /\A(2003|2004|2005|2006|2007|2008|2009)\Z/
=> /\A(2003|2004|2005|2006|2007|2008|2009)\Z/
irb(main):014:0> "2003" =~ re
=> 0
irb(main):015:0> "2003 - 2008" =~ re
=> nil

In this case you don’t need the \b anymore. BTW, you had typos there
because you had \2 instead of \b.
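For what it's worth, fixing the \2 typos alone would not have been enough. Alternation binds more loosely than anything else in a regexp, so in the original pattern the leading \b applied only to 2003 and the trailing one only to 2009. And even with the alternatives grouped, \b is a word boundary, not a string boundary: it matches between "2003" and the following space. A small Ruby sketch comparing the two (the constant names are only for illustration):

```ruby
# Grouped alternatives between word boundaries: \b matches at any
# transition between a word character and a non-word character.
WORD_BOUNDED = /\b(?:2003|2004|2005|2006|2007|2008|2009)\b/

# \A and \z only match at the very start and very end of the string.
ANCHORED = /\A(?:2003|2004|2005|2006|2007|2008|2009)\z/

p("2003 - 2008" =~ WORD_BOUNDED) # => 0   ("2003" is followed by a space, a word boundary)
p("2003 - 2008" =~ ANCHORED)     # => nil (the string continues after "2003")
p("2003" =~ ANCHORED)            # => 0
p("2010" =~ ANCHORED)            # => nil
```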
Anyway, if you want exact matches of strings, you don’t need regexps:

irb(main):018:0> years = (2003..2009).map {|x| x.to_s}
=> ["2003", "2004", "2005", "2006", "2007", "2008", "2009"]
irb(main):020:0> years.include? "2003"
=> true
irb(main):021:0> years.include? "2003 - 2008"
=> false

If you have many numbers and many lookups, a Set should be better,
performance-wise.
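A minimal sketch of the Set variant, assuming the same years 2003 to 2009 as above. Set#include? is a hash lookup, so its cost stays roughly constant as the list of years grows, whereas Array#include? scans the array. VALID_YEARS and valid_year? are hypothetical names:

```ruby
require 'set'

# Build the lookup table once. Set#include? is a hash lookup, while
# Array#include? compares against each element in turn.
VALID_YEARS = Set.new((2003..2009).map(&:to_s))

# Hypothetical helper: true only if s is exactly one of the listed years.
def valid_year?(s)
  VALID_YEARS.include?(s)
end

p valid_year?("2003")        # => true
p valid_year?("2003 - 2008") # => false
p valid_year?("2010")        # => false
```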
Now, if we are talking about ranges of years, we can do even better:

irb(main):022:0> min_year = 2003
=> 2003
irb(main):023:0> max_year = 2009
=> 2009
irb(main):024:0> year_to_test = "2003".to_i
=> 2003
irb(main):025:0> min_year <= year_to_test and year_to_test <= max_year
=> true
irb(main):026:0> year_to_test = "2008".to_i
=> 2008
irb(main):027:0> min_year <= year_to_test and year_to_test <= max_year
=> true
irb(main):028:0> year_to_test = "2010".to_i
=> 2010
irb(main):029:0> min_year <= year_to_test and year_to_test <= max_year
=> false
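The two comparisons can also be written with Comparable#between?, which is inclusive at both ends. One caveat when the input is a string: String#to_i stops at the first non-digit, so "2003 - 2008".to_i is 2003 and would pass the range check. A sketch that first insists on exactly four digits; year_in_range? and the constants are hypothetical names:

```ruby
MIN_YEAR = 2003
MAX_YEAR = 2009

# Hypothetical helper: accepts only strings made of exactly four digits
# whose value lies between MIN_YEAR and MAX_YEAR (inclusive).
def year_in_range?(s)
  # Without this check, "2003 - 2008".to_i would return 2003 and pass.
  return false unless s.match?(/\A\d{4}\z/)
  # Equivalent alternative: (MIN_YEAR..MAX_YEAR).cover?(s.to_i)
  s.to_i.between?(MIN_YEAR, MAX_YEAR)
end

p year_in_range?("2008")        # => true
p year_in_range?("2010")        # => false
p year_in_range?("2003 - 2008") # => false
```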

Hope this helps,

Jesus.


#3

On 20.11.2008 14:17, Jesús Gabriel y Galán wrote:


> To do exactly what you are asking for, you can anchor the regexp
> to the beginning and end of the string:
>
> irb(main):013:0> re = /\A(2003|2004|2005|2006|2007|2008|2009)\Z/
> => /\A(2003|2004|2005|2006|2007|2008|2009)\Z/
> irb(main):014:0> "2003" =~ re
> => 0
> irb(main):015:0> "2003 - 2008" =~ re
> => nil

I’d rather use /\A200[3-9]\z/.
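One practical difference between \z and the \Z used above: in Ruby, \Z also matches just before a trailing newline, while \z matches only at the very end of the string. That matters for input such as lines read with gets, which keep their "\n". A quick comparison (LOOSE and STRICT are just illustrative names):

```ruby
LOOSE  = /\A200[3-9]\Z/ # \Z: end of string, or just before a final "\n"
STRICT = /\A200[3-9]\z/ # \z: absolute end of string only

p("2005" =~ LOOSE)    # => 0
p("2005\n" =~ LOOSE)  # => 0   (trailing newline tolerated)
p("2005\n" =~ STRICT) # => nil
p("2005" =~ STRICT)   # => 0
p("2010" =~ STRICT)   # => nil
```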

> In this case you don’t need the \b anymore. BTW, you had typos there
> because you had \2 instead of \b.
> Anyway, if you want exact matches of strings, you don’t need regexps:
>
> irb(main):018:0> years = (2003..2009).map {|x| x.to_s}
> => ["2003", "2004", "2005", "2006", "2007", "2008", "2009"]
> irb(main):020:0> years.include? "2003"
> => true
> irb(main):021:0> years.include? "2003 - 2008"
> => false

Or

irb(main):001:0> s="2005"
=> "2005"
irb(main):002:0> (2003..2009) === s[/\A\d{4}\z/].to_i
=> true
irb(main):003:0> s="2010"
=> "2010"
irb(main):004:0> (2003..2009) === s[/\A\d{4}\z/].to_i
=> false
irb(main):006:0> (2003..2009).include? s[/\A\d{4}\z/].to_i
=> false

Note: this works because 0 (= nil.to_i) is not part of the range!
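To see why the extraction step matters: s[/\A\d{4}\z/] returns nil for anything that is not exactly four digits, and nil.to_i is 0, which lies outside the range. Without the extraction, String#to_i parses only the leading digits, so "2003 - 2008".to_i is 2003 and the test would wrongly succeed. A short sketch (YEARS is an illustrative constant):

```ruby
YEARS = 2003..2009

s = "2003 - 2008"
p(s[/\A\d{4}\z/])                # => nil, s is not exactly four digits
p(s[/\A\d{4}\z/].to_i)           # => 0, because nil.to_i == 0
p(YEARS === s[/\A\d{4}\z/].to_i) # => false, 0 is outside the range
p(YEARS === s.to_i)              # => true, "2003 - 2008".to_i == 2003
```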

> If you have many numbers and many lookups, a Set should be better,
> performance-wise.

You can even use a bit set:

irb(main):007:0> t = (2003..2009).inject(0) {|mask,y| mask | 1 << y}
=>
116650078639864259662055853239652489576667478532211432368528502061497852157464823887836603809757037023714110007321126217782227286423686421672874625786531963635756068971637276480699799614611885589371789821904502024698121311064730577770474098457113815634439476503092997189887743679313284635928742849521858004245675611528209841692017556564840683843349732924435866760173931843810360262352061792429448169450281904579322760817054128336138506834410834183565543664844525391283837108127106791786643268532096672079466512393065631776802367002142967381057920196424747178242497261636008255151052901022379808767413846016
irb(main):008:0> t[s.to_i]
=> 0
irb(main):009:0> s="2005"
=> "2005"
irb(main):010:0> t[s.to_i]
=> 1
irb(main):011:0>

There are many ways… :-)
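In the bit set above, bit y of t is 1 exactly when y is one of the years, and Integer#[] reads a single bit back as 1 or 0. The irb session feeds s.to_i straight in, though, so "2003 - 2008" would test bit 2003 and report a match. A sketch that combines the mask with the same four-digit extraction used in the range test (YEAR_MASK and year_bit_set? are hypothetical names):

```ruby
# Bit y of the mask is 1 exactly when y is one of the years 2003..2009.
YEAR_MASK = (2003..2009).inject(0) { |mask, y| mask | (1 << y) }

# Hypothetical helper: extract exactly four digits first (nil otherwise).
# nil.to_i is 0, and bit 0 of the mask is never set.
def year_bit_set?(s)
  YEAR_MASK[s[/\A\d{4}\z/].to_i] == 1
end

p(YEAR_MASK[2005])               # => 1
p(YEAR_MASK["2003 - 2008".to_i]) # => 1, plain to_i sees 2003
p(year_bit_set?("2003 - 2008"))  # => false
p(year_bit_set?("2009"))         # => true
p(year_bit_set?("2010"))         # => false
```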

> Now, if we are talking about ranges of years, we can do even better:

… or use the range test (as above) directly.

Kind regards

robert