Regexp question for doing "and" searches


#1

Just can’t quite figure this one out.

Given a search phrase like “one two three”, I want to search a list of
text strings for ones that contain ALL of those words, but not
necessarily in that order.

The hard part for me is the “not necessarily in that order”.

Using ‘\b(one|two|three)\b’ will match if at least one of them occurs in
any order, but I need all of them to match and be in any order.

two one three => match
one three => does not match

Any ideas? Is this possible with regular expressions?

Thanks!
Jeff


#2

On Sat, 4 Feb 2006 06:38:12 +0900, Jeff C. removed_email_address@domain.invalid
wrote:

Just can’t quite figure this one out.

Given a search phrase like “one two three”, I want to search a list of
text strings for ones that contain ALL of those words, but not
necessarily in that order.

You can use look-aheads to build a re that looks like:

/^(?=.*\bone\b)(?=.*\btwo\b)(?=.*\bthree\b)/

ensuring a match only if all look-ahead assertions pass.

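phrase = "one two three"
# one look-ahead per word; join them so the array interpolates as a single string
re = %r/^#{phrase.split.map { |s| "(?=.*\\b#{s}\\b)" }.join}/
while DATA.gets
  print if ~re   # ~re matches the regexp against $_, the line just read
end
__END__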
one two three YUP
one three two YUP
zone two three NOPE
three and two and one YUP

regards,
andrew
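
For anyone reading this later, here is a minimal sketch of the same look-ahead idea wrapped in a method. The name all_words_regexp is made up for illustration (it is not from Andrew's post), and the Regexp.escape call is an added assumption so that words containing regexp metacharacters are treated literally.

```ruby
# Hypothetical helper: build one look-ahead assertion per search word.
# Every look-ahead starts at the beginning of the line, so word order is irrelevant.
def all_words_regexp(phrase)
  lookaheads = phrase.split.map do |word|
    # "\\b" inside a double-quoted string yields a literal \b (word boundary)
    "(?=.*\\b#{Regexp.escape(word)}\\b)"
  end
  Regexp.new("^" + lookaheads.join)
end

re = all_words_regexp("one two three")

lines = [
  "one two three",
  "one three two",
  "zone two three",
  "three and two and one",
  "one three"
]

# Keep only the lines that contain all three words
matches = lines.select { |line| line =~ re }
puts matches
```

Note that the \b anchors only behave as expected for words that start and end with a word character, so a search term like "c++" would need different handling.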


#3

Jeff C. wrote:

two one three => match
one three => does not match

Seems to work:

s = "foo two bar three zap one"
rx = /^(?=.*one)(?=.*two).*three/m

p (rx =~ s)

s = "one three"
p (rx =~ s)

Works like this too:
rx = /^(?=.*one)(?=.*two)(?=.*three)/m
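
Two details of the patterns above are worth checking for yourself: without \b they also accept substrings (so "zone" counts as "one"), and the /m flag lets . match newlines, so the words may be spread over several lines. A small sketch; the variable names and test strings are made up for illustration.

```ruby
# Without word boundaries: any substring counts
loose  = /^(?=.*one)(?=.*two)(?=.*three)/m
# With word boundaries: only whole words count
strict = /^(?=.*\bone\b)(?=.*\btwo\b)(?=.*\bthree\b)/m
# Same as strict, but without /m, so . stops at a newline
strict_single_line = /^(?=.*\bone\b)(?=.*\btwo\b)(?=.*\bthree\b)/

s = "zone two three"
loose_hit  = loose.match?(s)    # "one" is found inside "zone"
strict_hit = strict.match?(s)   # no standalone "one" in the string

multi = "one\ntwo\nthree"
multi_hit_m    = strict.match?(multi)             # look-aheads can cross newlines
multi_hit_no_m = strict_single_line.match?(multi) # each line is checked on its own

p loose_hit, strict_hit, multi_hit_m, multi_hit_no_m
```

Which behaviour you want depends on whether each string in your list is a single line and whether partial-word hits are acceptable.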


#4

Andrew J. wrote:

You can use look-aheads to build a re that looks like:

/^(?=.*\bone\b)(?=.*\btwo\b)(?=.*\bthree\b)/

ensuring a match only if all look-ahead assertions pass.

Awesome! I’ll give it a try.

Thanks a lot.
Jeff


#5

On Sat, 4 Feb 2006, Jeff C. wrote:

two one three => match
one three => does not match

Any ideas? Is this possible with regular expressions?

Thanks!
Jeff

stupid - but works

 harp:~ > cat a.rb
 strings = <<-txt
   two one three
   one three
   one foo two bar  three
   foo two bar one three
   two
 txt

 atoms = '\b(?:one|two|three)\b'
 re = %r/(?:#{ atoms }).*(?:#{ atoms }).*(?:#{ atoms })/

 require "yaml" and strings.each{|string| y string => (string =~ re ? true : false)}

 harp:~ > ruby a.rb
 ---
 "  two one three\n": true
 ---
 "  one three\n": false
 ---
 "  one foo two bar  three\n": true
 ---
 "  foo two bar one three\n": true
 ---
 "  two\n": false

hth.

-a
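
One caveat with the three-alternations pattern: it counts occurrences, not distinct words, so a string that repeats one word three times also passes. The sketch below shows that, plus a possible alternative without a combined regexp that checks each word separately. The contains_all lambda and the words array are illustrative names, not from the post above.

```ruby
atoms = '\b(?:one|two|three)\b'
re = %r/(?:#{atoms}).*(?:#{atoms}).*(?:#{atoms})/

# Three hits of the same word satisfy the pattern
repeated_hit = !!("one one one" =~ re)

# Alternative sketch: test every word on its own and require all to be present
words = %w[one two three]
contains_all = lambda do |string|
  words.all? { |w| string =~ /\b#{Regexp.escape(w)}\b/ }
end

p repeated_hit
p contains_all.call("one one one")
p contains_all.call("foo two bar one three")
```

The per-word check runs one small match per word instead of a single combined regexp, which is usually easier to read and to extend with more search terms.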