Newbie regex question about \w


#1

I’m trying to parse ruby files to find all the class definitions in the
file. For each line in the file, I thought I could use the following to
pull out the class name:

\bclass\b(\w+)\b

so then $1 would give me the class name.

But it doesn’t work:

irb(main):001:0> line = "class Article < ActiveRecord::Base"
=> "class Article < ActiveRecord::Base"
irb(main):002:0> line =~ /\bclass\b(\w+)\b/
=> nil

I think I narrowed down the problem to my use of \w, but I can’t
understand why.

For extra credit, anybody know how I can make sure I can ignore comments
and quoted strings? I want to make sure I ignore these things:

if option_exists # handle class options

as well as

puts "You are in a class by yourself"

But those are advanced… if I can just get the first one working I’ll
be grateful!

Thanks,
Jeff


#2

On Jan 14, 2006, at 4:14 PM, Jeff C. wrote:

I’m trying to parse ruby files to find all the class definitions in the
file. For each line in the file, I thought I could use the following to
pull out the class name:

\bclass\b(\w+)\b

so then $1 would give me the class name.

You’re close, you just forgot to allow for some space between class
and the name. A boundary is a zero-width assertion, so it’s not enough:

\bclass\s+(\w+)\b
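
A quick way to see the difference is to run both patterns against the same line. This is only a sketch; the helper name class_name_from is made up for illustration and does not come from the thread.

```ruby
# Hypothetical helper (not from the thread): returns the captured class
# name, or nil when the pattern does not match the line.
def class_name_from(line, pattern)
  match = pattern.match(line)
  match && match[1]
end

line = "class Article < ActiveRecord::Base"

# \b only asserts a position, so nothing consumes the space between
# "class" and "Article", and the match fails.
original = /\bclass\b(\w+)\b/

# \s+ consumes the whitespace, so (\w+) can start at "Article".
fixed = /\bclass\s+(\w+)\b/

p class_name_from(line, original)  # prints nil
p class_name_from(line, fixed)     # prints "Article"
```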

Hope that helps.

James Edward G. II


#3

James G. wrote:

You’re close, you just forgot to allow for some space between class
and the name. A boundary is a zero-width assertion, so it’s not enough:

Awesome. Thanks a lot.

Jeff


#4

On Jan 14, 2006, at 5:14 PM, Jeff C. wrote:

But it doesn’t work:

Your \w is right. \b doesn’t work the way you think it does, though.
It is zero-width, so it doesn’t consume any characters, i.e.:

    <-- \b is just before the 'c'
c
l
a
s
s   <-- \b is between the 's' and the space
    <-- the space itself doesn't match \w
A
r
t
i
c
l
e
.
.
.

So what you really want is:

line =~ /\bclass\s+(\w+)/

irb(main):007:0> line =~ /\bclass\s+(\w+)/
=> 0
irb(main):008:0> $1
=> "Article"

As for the other questions, comments aren’t SO hard:

/#.*$/

unless of course you want to handle strings, in which case you have to
worry about # inside of strings. I’m not even going to begin to try
to create a regex to match quoted strings; that’s all sorts of
difficult, especially with heredocs and such. I would take a look at
rdoc and see if you can’t manipulate it to get a list of classes for
you.
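
To make the caveat concrete, here is a small sketch of the naive comment stripper and the kind of line that trips it up. The method name strip_comment and the sample string with "#42" are invented for illustration.

```ruby
# Naive comment stripper: drops everything from the first '#'
# to the end of the line, with no awareness of string literals.
def strip_comment(line)
  line.sub(/#.*$/, "")
end

# Works for an ordinary trailing comment.
p strip_comment("if option_exists # handle class options")
# prints "if option_exists "

# Goes wrong when '#' appears inside a quoted string: the rest of
# the string literal is cut off along with the "comment".
p strip_comment('puts "Issue #42 is closed"')
# prints "puts \"Issue "
```

String interpolation ("#{...}") hits the same problem, which is why a real parser is the safer route for anything beyond a quick scan.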


#5

Hi –

On Sun, 15 Jan 2006, Jeff C. wrote:

But those are advanced… if I can just get the first one working I’ll
be grateful!

You might try this:

/^\s*class\s+(\w+)/

which will only match “class …” at the beginning of a line or with
only spaces to its left. It’s certainly not impossible to get false
positives or negatives this way, but in the normal course of a
normally-written Ruby program file it should be close to 100%.

Don’t forget, though, that you might get “::” in a class name, like
this:

module M
end

class M::C
end

and just going for \w+ will give you the module name, not the class
name.
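
One possible way to pick up the "::" case, offered as a sketch rather than a complete answer: extend the capture so it accepts a constant name followed by any number of ::Name segments. The constant CLASS_DEF and the sample source below are assumptions made for the example.

```ruby
# Assumed pattern: a capitalized constant name, optionally followed by
# further ::Name segments. Requiring a capital letter also skips
# "class << self".
CLASS_DEF = /^\s*class\s+([A-Z]\w*(?:::[A-Z]\w*)*)/

source = <<~RUBY
  module M
  end

  class M::C
  end

  class Article < ActiveRecord::Base
  end
RUBY

# String#[] with a regex and a group index returns that capture or nil.
names = source.each_line.map { |line| line[CLASS_DEF, 1] }.compact
p names  # prints ["M::C", "Article"]
```

If only the innermost name is wanted, "M::C".split("::").last gives "C".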

David


David A. Black
removed_email_address@domain.invalid

“Ruby for Rails”, from Manning Publications, coming April 2006!


#6

[…]
especially with heredocs and such. I would take a look at rdoc and see
if you can’t manipulate it to get a list of classes for you.

Maybe this is about regexps and I’m totally off, but what about:


before = Object.constants
require 'sqlite3' # put your file here
after = Object.constants

p(after - before)

output:

["NKF", "Deprecated", "SQLite3", "Base64", "Kconv", "ParseDate"]

cheers

Simon


#7

On Jan 14, 2006, at 5:49 PM, Simon Kröger wrote:

Simon

Clever!

before = Object.constants
require 'file'
after = Object.constants
files_classes = (after - before).select { |x| Class === Object.const_get(x) }
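
For anyone who wants to try this without a file to hand, here is a self-contained sketch of the same idea. The eval call stands in for the require, and the names Article and Helpers are invented for illustration. Note that on Ruby 1.9 and later Object.constants returns symbols rather than the strings shown earlier in the thread.

```ruby
# Snapshot the top-level constants before loading anything.
before = Object.constants

# Stand-in for `require 'file'` so the sketch runs on its own;
# Article and Helpers are made-up names.
eval <<~RUBY, TOPLEVEL_BINDING
  class Article; end
  module Helpers; end
RUBY

after = Object.constants

# Keep only the new constants that are classes; the module is filtered out.
new_classes = (after - before).select { |name| Object.const_get(name).is_a?(Class) }
p new_classes  # prints [:Article]
```

Unlike the regex approach, this executes the file, so it is only suitable for code you are happy to load.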