Small regexp question


#1

Hi all,

I am writing some refactoring code for a C++ project.

I need to change:

 class MyClass
 {
  ...
 }

to:

 class IMP_EXP MyClass
 {
  ...
 }

The pattern I used to find a class definition line is:

line =~ /^\s*class\s+(\w+)/

But I want to exclude forward class declarations ( class MyClass; )

So I changed my pattern to:

line =~ /^\s*class\s+(\w+)\s*[^;]/    --> don't match if line ends with “;”

But it doesn’t work… Why?

I ended up using: if line !~ /;/ and line =~ /^\s*class\s+(\w+)/

Hints?

Any help will be appreciated,
Best regards,

Francis


#2

removed_email_address@domain.invalid wrote:

But I want to exclude forward class declarations ( class MyClass; )

So I changed my pattern to:

line =~ /^\s*class\s+(\w+)\s*[^;]/    --> don't match if line ends with “;”

But it doesn’t work… Why?

I ended up using: if line !~ /;/ and line =~ /^\s*class\s+(\w+)/

How does it not match? Can you show several lines of an irb session
demonstrating matches and non-matches?

Pistos


#3

irb(main):001:0> line = "class MyClass;"
=> "class MyClass;"
irb(main):002:0> line =~ /^\s*class\s+(\w+)\s*[^;]/
=> 0
irb(main):003:0> puts $1
MyClas
=> nil


#4

removed_email_address@domain.invalid wrote:

irb(main):001:0> line = "class MyClass;"
=> "class MyClass;"
irb(main):002:0> line =~ /^\s*class\s+(\w+)\s*[^;]/
=> 0
irb(main):003:0> puts $1
MyClas
=> nil

Try putting a $ on the end of your regexp. What’s happening is that
your \w+ is matching up to “MyClas”, and the final “s” is matching
/[^;]/.

irb(main):029:0> line = "class MyClass;"
=> "class MyClass;"
irb(main):030:0> r = /^\s*class\s+(\w+)\s*[^;]/
=> /^\s*class\s+(\w+)\s*[^;]/
irb(main):031:0> line =~ r
=> 0
irb(main):037:0> l = Regexp.last_match
=> #<MatchData:0x40685ddc>
irb(main):038:0> puts l
class MyClass
=> nil
irb(main):033:0> r = /^\s*class\s+(\w+)\s*[^;]$/
=> /^\s*class\s+(\w+)\s*[^;]$/
irb(main):034:0> line =~ r
=> nil


#5

removed_email_address@domain.invalid wrote:

Hints?

Any help will be appreciated,
Best regards,

Francis

puts "ok" if "class MyClass;".match(/^\s*class\s+(\w+)\s*[^;]/)

=> "ok"

it works for me.

The problem may be somewhere else.

Antonin.


#6

On Mar 10, 2006, at 10:08 AM, Pistos C. wrote:

/^\s*class\s+(\w+)\s*[^;]/

This would match:
"class A ;" for instance

\s* can match the empty string which is followed by a space which is
not a semi-colon, so hey it matches!
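
To make this concrete, here is a small sketch (the variable names are only illustrative) that prints what the pattern actually consumes for two lines that were supposed to be rejected:

```ruby
pattern = /^\s*class\s+(\w+)\s*[^;]/

["class A ;", "class MyClass;"].each do |line|
  m = pattern.match(line)
  # m[0] is the whole matched text, m[1] is the captured class name
  puts "#{line.inspect}: matched #{m[0].inspect}, name #{m[1].inspect}"
end
# prints:
# "class A ;": matched "class A ", name "A"
# "class MyClass;": matched "class MyClass", name "MyClas"
```

In the first line \s* gives up its space so that [^;] can take it; in the second \w+ gives up the final "s" for the same reason.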


#7

removed_email_address@domain.invalid wrote:

Hints?

Any help will be appreciated,
Best regards,

Francis

/^\s*class\s+(\w+)\s*$/


#8

On Mar 10, 2006, at 11:58, removed_email_address@domain.invalid wrote:

But it doesn’t work… Why?

I don’t know exactly in what sense it does not work, but negations in
regexps are tricky.

A regexp engine always tries to match. If in a first attempt \w+
matches the whole class name and then the rest does not match, then
the regexp engine backtracks and happens to find a “shorter class
name” whose remaining characters are not semicolons, so it still
matches.

class Foo; (\w+ -> "Foo", fails, backtrack)
         ^
class Foo; (\w+ -> "Fo", no whitespace, "o" is not a semicolon, matched)
        ^

A solution is to add an anchor for end of string. Another one is to
prevent \w+ from backtracking, that is known as “atomic grouping”:

(?>\w+) # grab word characters and do not backtrack

In addition, the idiomatic way to say “and at this point I don’t want
this to happen” is to use a negative look-ahead assertion. All in all
we get this:

/^\s*class\s+(?>\w+)(?!\s*;)/

– fxn
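
The pattern above can be tried out with a short sketch. Note that the atomic group does not capture, so an extra pair of parentheses is added here (an assumption of this sketch, not part of the original pattern) to keep the class name available. The second pattern, look-ahead without the atomic group, is included only for comparison:

```ruby
# Atomic group wrapped in a capturing group so the class name is kept.
atomic = /^\s*class\s+((?>\w+))(?!\s*;)/
# Look-ahead alone: \w+ may still backtrack to a shorter "class name".
plain  = /^\s*class\s+(\w+)(?!\s*;)/

lines = ["class MyClass", "class MyClass {", "class MyClass;", "class MyClass ;"]
lines.each do |line|
  a = atomic.match(line)
  b = plain.match(line)
  puts "#{line.inspect}: atomic=#{(a && a[1]).inspect} plain=#{(b && b[1]).inspect}"
end
# prints:
# "class MyClass": atomic="MyClass" plain="MyClass"
# "class MyClass {": atomic="MyClass" plain="MyClass"
# "class MyClass;": atomic=nil plain="MyClas"
# "class MyClass ;": atomic=nil plain="MyClas"
```

Without the atomic group the look-ahead is satisfied after "MyClas", because the next character is "s" rather than a semicolon, so both pieces are needed together.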


#9

removed_email_address@domain.invalid wrote:

But I want to exclude forward class declarations ( class MyClass; )

So I changed my pattern to:

line =~ /^\s*class\s+(\w+)\s*[^;]/    --> don't match if line ends with “;”

But it doesn’t work… Why?

Because the match simply stops before the “;”.

line = 'class Foo;'
=> "class Foo;"

line[/^\s*class\s+(\w+)\s*[^;]/]
=> "class Foo"

If you want to make sure there is no “;” between the class name and the
end of the line you need to anchor the RX at the end:

line = 'class Foo;'
=> "class Foo;"

line[/^\s*class\s+(\w+)[^;]*$/]
=> nil

line = 'class Foo'
=> "class Foo"

line[/^\s*class\s+(\w+)[^;]*$/]
=> "class Foo"

Kind regards

robert


#10

{
 ...
}

Don’t you want to be looking for

class xxxx {

Where you can have at least one white space between class and xxxx, and
any amount of white space between xxxx and { (I think none is allowable
too)? Any white space includes new lines too, as the following are all
valid class declarations:

class
AClass
{

class AClass {

class
AClass {

and I think even…
class
AClass{

? :) Or do you want to get the job done, rather than getting a perfect
solution? :)

I was playing with regexps yesterday, and wanted to have a pattern match
over multiple lines, but couldn’t see how that is done (a friend wanted
a simple way of stripping out C comments, and they can span multiple
lines, of course). Could someone give me a hint on that?

Cheers,
Benjohn
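
On the multi-line question, a rough sketch, assuming the whole file has been read into one string instead of being processed line by line (the sample source and variable names are made up for illustration). In Ruby the /m flag lets "." match newlines, and \s already matches newlines without any flag:

```ruby
source = <<~CPP
  /* a comment that
     spans two lines */
  class
  AClass
  {
  };
  class Other;
CPP

# /m makes "." match newlines; ".*?" is non-greedy, so each comment
# stops at its own closing "*/" instead of running on to the last one.
without_comments = source.gsub(%r{/\*.*?\*/}m, "")

# \s matches newlines too, so "class", the name and "{" may sit on
# separate lines. Requiring "{" skips the forward declaration.
names = without_comments.scan(/\bclass\s+(\w+)\s*\{/).flatten
puts names.inspect
# prints: ["AClass"]
```

This is deliberately naive: it does not handle // comments, or comment markers that appear inside string literals.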


#11

On Fri, 2006-03-10 at 19:58 +0900, removed_email_address@domain.invalid wrote:

But I want to exclude forward class declarations ( class MyClass; )

So I changed my pattern to:

line =~ /^\s*class\s+(\w+)\s*[^;]/    --> don't match if line ends with “;”

But it doesn’t work… Why?

Your regexp is trying to match:

+ zero or more spaces
+ the word 'class'
+ one or more spaces
+ one or more word characters (captured)
+ zero or more spaces
+ any single character except ';'

By the time you get to that ‘;’ there likely won’t be any input left, so
no character to be something except ‘;’. You could do it with lookahead,
but it’s probably easier to do:

"class MyClass;" =~ /class\s+(\w+)[^;]*$/
# => nil

"class MyClass" =~ /class\s+(\w+)[^;]*$/
# => 0

"class MyClass {" =~ /class\s+(\w+)[^;]*$/
# => 0

"class MyClass { /* etc */ }" =~ /class\s+(\w+)[^;]*$/
# => 0

There are probably still things this will miss though. For example,
strange class names could well result in a failure to match…
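
Tying this back to the original refactoring task, here is a minimal sketch of how the "no semicolon before the end of the line" idea might be used to insert the macro. The helper name add_export_macro and the constant CLASS_DEFINITION are hypothetical. The [^;]*$ check is put inside a look-ahead so that only the part up to the class name is replaced:

```ruby
# Hypothetical helper; IMP_EXP is the macro name from the original question.
CLASS_DEFINITION = /^(\s*class\s+)(\w+)(?=[^;]*$)/

def add_export_macro(line, macro = "IMP_EXP")
  # $1 is the leading "class " part, $2 the class name.
  line.sub(CLASS_DEFINITION) { "#{$1}#{macro} #{$2}" }
end

puts add_export_macro("class MyClass")      # prints: class IMP_EXP MyClass
puts add_export_macro("  class MyClass {")  # prints:   class IMP_EXP MyClass {
puts add_export_macro("class MyClass;")     # prints: class MyClass;
```

As noted above, this still misses some cases: a definition written on one line with a semicolon in its body is left untouched, and a line that already carries the macro would get it a second time.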