Reg Ex

animate123 · September 7, 2007, 12:58am

I’m searching through the .com’s for any domain name with Cisco in
it.
I’m using a reg exp on a file, obtained from Verisign, that lists all
the domain names with a .com extension, reading it line by line.
Here is my reg exp:

file.each { |line| print line if line =~ /(C|c)isco|CISCO/ }

but I’m getting results like SanFrancisco and Francisco.

Does anyone know how I can modify my reg exp to not include certain
keywords like SanFrancisco and Francisco?

-Chuck

animate123 · September 7, 2007, 1:14am

On Sep 7, 2007, at 12:58 AM, Charles P. wrote:

Does anyone know how I can modify my reg exp to not include certain
keywords like SanFrancisco and Francisco?

You need to assert word boundaries:

/\b((C|c)isco|CISCO)\b/

– fxn
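
A quick way to see what the boundaries change, as a rough sketch. The sample lines are made up for illustration and are not taken from the Verisign file:

```ruby
# Hypothetical sample lines; the real zone file will look different.
samples = ["cisco.com", "www.Cisco.com", "SanFrancisco.com", "Francisco.com"]

pattern = /\b((C|c)isco|CISCO)\b/

# \b needs a word/non-word transition on each side, so the "cisco"
# buried inside "Francisco" is skipped.
matched = samples.select { |line| line =~ pattern }
puts matched
```

Only the first two samples should come through, because in the other two the letters before "cisco" are word characters.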

animate123 · September 7, 2007, 1:11am

Hi

I don’t know if you need to check the caps, but your regex could be
/cisco/i
(the i is for case insensitive)
To “ban” certain words, you could use a lookaround expression (I don’t
know if Ruby supports it), but I personally find it cleaner to check
programmatically against a list, and add the “exceptions” to that list.

Hope it helps

Diego.
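
Both ideas can be sketched in a few lines. The names `exceptions`, `cisco_line` and `lookbehind` are made up for this example, and the lookbehind form assumes Ruby 1.9 or later (as far as I know the older 1.8 engine only handles lookahead):

```ruby
# Hypothetical exception list; extend it as new false positives turn up.
exceptions = %w[francisco]

# True when the line still contains "cisco" after every exception word
# has been blanked out with a space.
cisco_line = lambda do |line|
  stripped = exceptions.inject(line.downcase) { |text, word| text.gsub(word, " ") }
  stripped.include?("cisco")
end

# Lookbehind variant: "cisco" not directly preceded by "fran".
lookbehind = /(?<!fran)cisco/i

puts cisco_line.call("ciscosystems.com")
puts cisco_line.call("SanFrancisco.com")
```

The list version is easier to extend: a new false positive means one more entry in `exceptions` rather than another change to the pattern.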

animate123 · September 7, 2007, 10:42am

The boundaries could be a good idea, but if he needs URLs like
ciscosystems.com, the final \b will prevent a match. Keep that in mind.
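
To make the difference concrete, here is a small comparison of a boundary on both sides against a leading boundary only. The domain names are invented for the example:

```ruby
# Made-up domain names for illustration.
names = ["cisco.com", "ciscosystems.com", "cisco-systems.com", "sanfrancisco.com"]

# Boundary on both sides: "cisco" must be followed by a non-word character.
both_sides = names.select { |n| n =~ /\bcisco\b/i }

# Leading boundary only: anything may follow "cisco".
leading_only = names.select { |n| n =~ /\bcisco/i }

puts "both sides:   #{both_sides.inspect}"
puts "leading only: #{leading_only.inspect}"
```

The hyphen and the dot count as non-word characters, so cisco-systems.com passes either way; only ciscosystems.com depends on dropping the trailing \b.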

animate123 · September 7, 2007, 11:54am

2007/9/7, Diego S.:

How about:
/\b([Cc]isco|CISCO)\b/

or even

/\bcisco\b/i

I’d probably use a second rx like

file.each { |line| print line if line =~ /\bcisco/i && line !~ /sanfrancisco/i }

Kind regards

robert

animate123 · September 7, 2007, 2:06am

On Fri, 7 Sep 2007, Charles P. wrote:

file.each { |line| print line if line =~ /(C|c)isco|CISCO/ }
…
Does anyone know how I can modify my reg exp to not include
certain keywords like SanFrancisco and Francisco?

How about:
/\b([Cc]isco|CISCO)\b/

or even

/\bcisco\b/i

?

animate123 · September 7, 2007, 6:26pm

Posted by Charles P. (chuckdawit) on 07.09.2007 00:58
I’m querying through the .com’s looking for any www name with Cisco in it.
I’m using a reg exp that is reading a file line by line obtained by
Verisign that has all the domain names with a .com extension.
Here is my reg exp:

file.each { |line| print line if line =~ /(C|c)isco|CISCO/ }

but I’m getting results like SanFrancisco and Francisco.

Does anyone know how I can modify my reg exp to not include certain
keywords like SanFrancisco and Francisco?

-Chuck

Maybe also try something like:
print line if line =~ /^(?!.*(?:SanFrancisco|Francisco)).*\Bcisco/i

Cheers

j.k.
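
For anyone unsure how a negative lookahead behaves here, a simplified sketch follows. The pattern is my own reduced version, not the one posted above, and it rejects the whole line whenever "francisco" appears anywhere in it:

```ruby
# The lookahead runs at the start of the line and fails the match if
# "francisco" appears anywhere; otherwise the rest looks for "cisco".
# "sanfrancisco" needs no separate alternative: it already contains "francisco".
pattern = /^(?!.*francisco).*cisco/i

# Made-up sample lines.
lines = ["ciscosystems.com", "SanFrancisco.com", "mycisco.com"]
kept = lines.select { |line| line =~ pattern }
puts kept
```

One consequence of this design: a line that contains both a standalone cisco and francisco is dropped as well, which may or may not be what you want.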

animate123 · September 7, 2007, 8:27pm

On Sep 6, 4:58 pm, Charles P. wrote:

Does anyone know how I can modify my reg exp to not include certain
keywords like SanFrancisco and Francisco?

Instead of modifying the regex, how about simply working on the data
until it’s right?

ACCEPTABLE = [ /francisco/i, /scisco/i ]
matches = file.readlines.select{ |line| line =~ /[Cc]isco|CISCO/ }
ACCEPTABLE.each{ |re| matches.delete_if{ |line| line =~ re } }
puts matches

animate123 · September 11, 2007, 11:42pm

Gavin K. wrote:

On Sep 6, 4:58 pm, Charles P. wrote:

Does anyone know how I can modify my reg exp to not include certain
keywords like SanFrancisco and Francisco?

Instead of modifying the regex, how about simply working on the data
until it’s right?

ACCEPTABLE = [ /francisco/i, /scisco/i ]
matches = file.readlines.select{ |line| line =~ /[Cc]isco|CISCO/ }
ACCEPTABLE.each{ |re| matches.delete_if{ |line| line =~ re } }
puts matches

If I want to print the line to a new file I use this:
dnsfile.each { |line| newfile.puts line if line =~ /\b((C|c)isco|CISCO)\b/ }

What can I do if I want to print it to the new file and to the
screen at the same time?
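
One possible approach, as a sketch only: write each matching line to every output you care about. The method name `copy_matches` and its parameters are hypothetical, and StringIO stands in for the real files so the example runs on its own:

```ruby
require "stringio"

# Write each matching line to every IO object in `outputs`.
# `outputs` is a hypothetical parameter; pass e.g. [newfile, $stdout].
def copy_matches(input, outputs, pattern)
  input.each_line do |line|
    next unless line =~ pattern
    outputs.each { |out| out.puts line }
  end
end

# Usage sketch with in-memory IO standing in for dnsfile and newfile.
dnsfile = StringIO.new("cisco.com\nsanfrancisco.com\n")
newfile = StringIO.new
copy_matches(dnsfile, [newfile, $stdout], /\bcisco\b/i)
```

With real files you would pass the opened File objects instead; `$stdout` is the screen. Inside the original block, calling `newfile.puts line` and `puts line` one after the other does the same thing.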