Regular expression NOT operator

Regex is something I’ve managed to ignore for some time, only learning
bits when needed.

I’m trying to do code highlighting, so I’ve been using regexes to find
parts of the code, and I’ve run into an issue with comments.

A comment can contain control statements like if, as well as constants.

Things like if and a constant get picked up by my regex. I’m trying to
put a not operator in my regex but can’t seem to get it to work.

So for constants I search with /([A-Z].*?\b)/.
How can I add “when there’s no # on the left”?

I’ve been trying to use ?! as the not operator. (Is that even a not
operator?)
This is what I’ve been trying: /([A-Z].*?\b)(?!)/

Any help would be greatly appreciated.

thanks
Phil.

Hi –

On Mon, 7 Sep 2009, Phil Cooper-king wrote:

a not operator in my regex but can’t seem to get it to work.

So for constants I search with /([A-Z].*?\b)/.
How can I add “when there’s no # on the left”?

I’ve been trying to use ?! as the not operator. (Is that even a not
operator?)
This is what I’ve been trying: /([A-Z].*?\b)(?!)/

There’s negative look-behind in Oniguruma, but you’re going to run
into some difficulties anyway, I suspect. For example:

x = 3
puts "This has no comments, and x is #{x}" if x < 5
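
To make the difficulty concrete, here is a rough sketch (not from the original message; the pattern, variable names and sample strings are made up for illustration). A negative look-behind such as (?<!#) only inspects the character immediately before the match, and a # inside a string literal starts interpolation rather than a comment, so cutting a line at its first # can throw away real code.

```ruby
# Hypothetical pattern: an upper-case word NOT immediately preceded by '#'.
not_after_hash = /(?<!#)\b[A-Z]\w*/

# The look-behind only sees the single character before the match.
direct = not_after_hash.match('#Foo')   # no match: 'F' comes right after '#'
spaced = not_after_hash.match('# Foo')  # matches "Foo": a space precedes 'F'

# Single quotes keep #{x} literal, so this holds a Ruby source line as text.
line = 'puts "x is #{x}" if x < LIMIT'

# Naively treating everything from the first '#' onward as a comment
# drops the rest of the statement, including LIMIT.
naive_code = line.sub(/#.*/, '')

p direct
p spaced[0]
p naive_code
```

So a look-behind alone probably won't answer "is there a # anywhere to the left, and is it really a comment"; that needs some awareness of string literals.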

My advice would be to keep it (relatively) straightforward by doing
something like this as you scan the lines of text:

comment_re = /^\s*#/

if comment_re.match(line)
  # treat line as a comment
else
  # line is not a comment
end
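
A runnable version of that idea might look like the sketch below. The method name, the sample lines and the constant pattern (an upper-case letter followed by word characters) are assumptions for illustration, not part of the advice above.

```ruby
# Same pattern as comment_re above: optional whitespace, then '#'.
COMMENT_RE  = /^\s*#/
# Assumed constant pattern: an upper-case letter, then word characters.
CONSTANT_RE = /\b[A-Z]\w*\b/

# Return the constant-like words found on lines that are not
# whole-line comments. (Hypothetical helper name.)
def constants_outside_comments(lines)
  lines.reject { |line| COMMENT_RE.match(line) }
       .flat_map { |line| line.scan(CONSTANT_RE) }
end

sample = [
  '# Set LIMIT before use',
  'LIMIT = 5',
  '  # Another Comment',
  'puts "ok" if x < LIMIT'
]

p constants_outside_comments(sample)
```

This only skips lines that are entirely comments. A trailing comment such as x = 1 # Note is still scanned, so Note would be reported; handling that properly means tracking string literals as well.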

David


David A. Black / Ruby Power and Light, LLC / http://www.rubypal.com

Thanks David A. I’ll give Oniguruma a look, but I think I’ll go with
your suggestion.

Thanks G_F_ for pointing out my mistake.

Phil Cooper-king wrote:
[…]

so for constants I search /([A-Z].*?\b)/

This regex is not going to match only constants. It matches any
upper-case letter, wherever it appears, followed by a non-greedy
wildcard and then a word boundary.

A constant has to begin with an upper-case letter, possibly followed by
mixed-case letters, numbers and underscores (“_”).

The following lines show the problem: the first two use your regex, and
the last two show a fix.

/([A-Z].*?\b)/ =~ 'noT a constant'  # => 2
/([A-Z].*?\b)/ =~ 'a Constant'  # => 2

/\b([A-Z]\w*\b)/ =~ 'noT a constant.'    # => nil
/\b([A-Z]\w*\b)/ =~ 'a Constant.'    # => 2

The # => at the end of each line shows the index where the match
occurred. The first pair shows a non-constant producing a false positive.
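
As a further illustration (the sample strings are made up, and the capture group is dropped so that String#scan returns plain strings), the corrected pattern can collect every constant-like word in a line, including ones with digits and underscores:

```ruby
constant_re = /\b[A-Z]\w*\b/

# Upper-case letters in the middle of a word (the T in noT) are skipped,
# because there is no word boundary in front of them.
found = 'Foo = Bar::BAZ_2 + noT'.scan(constant_re)
none  = 'noT a constant.'.scan(constant_re)

p found
p none
```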

Regexes are extremely powerful, but you have to think through what can
go wrong with them. When you are searching, you can easily get false
positives. If you are searching and replacing, you can destroy content.

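Also, “?!” is not a NOT operator; it’s a negative look-ahead. In
/X(?!Y)/, the match succeeds only where X matches and Y does not match
immediately after it. With nothing inside the parentheses, as in your
/([A-Z].*?\b)(?!)/, the look-ahead can never succeed, so the whole regex
never matches.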

http://www.ruby-doc.org/docs/ProgrammingRuby/html/language.html#UN
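
A small sketch of how negative look-ahead behaves, using made-up sample strings (foo and bar are placeholders, not anything from the highlighter):

```ruby
# X(?!Y): match X only where Y does not follow immediately.
no_bar_after = /foo(?!bar)/

followed     = no_bar_after =~ 'foobar'      # foo is followed by bar
not_followed = no_bar_after =~ 'foobaz'      # foo is followed by baz
second_foo   = no_bar_after =~ 'foobar foo'  # only the second foo qualifies

# An empty look-ahead (?!) asserts "the empty pattern does not match here",
# which is never true, so the regex from the question cannot match anything.
empty_lookahead = /([A-Z].*?\b)(?!)/ =~ 'a Constant'

p followed
p not_followed
p second_foo
p empty_lookahead
```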