I can't believe it ate the whole thing!

At work, to dispel some of the tedium of writing yet another layer of C
on top of twenty years of rotting but seemingly irreplaceable legacy C
code, I’ve taken to writing something I call “coding standards lint” in
ruby.

It’s a little script to run on a C file to catch it in the act of
violating some of the wacky coding standards that I’m being paid far
too little to obey.

Some of them are pretty simple even for a ruby nuby like me. Just
single line stuff with Regexp’s a’plenty.

Today, I decided to tackle something that doesn’t live all on one line
like the easier things I’ve tested for in the past. For instance, today
I was checking to see if for, do, and while loops over a certain length
have a closing comment at the end that says something like “// this is
the end of the while(pigsFly) block”.

In order to do this, obviously you have to look at several lines of the
file, not just one at a time. After staring at the pickaxe book for an
hour or so, the best idea I was able to come up with was to read the
whole file into an array using File#readlines. After that, I used
Array#each_with_index to zip through the array looking for the end
of the block (being careful to watch out for inner blocks as well).
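For the curious, the depth-tracking pass came out roughly like the sketch below. This is a minimal illustration, not the real checker: the 20-line threshold, the loop/comment patterns, and the demo input are all made up for the example.

```ruby
# Sketch of the closing-comment check: read the whole file with
# File#readlines, then walk it with each_with_index, tracking brace
# depth so inner blocks aren't mistaken for the end of an outer loop.
MAX_BLOCK_LINES = 20  # illustrative threshold, not the real standard

def check_closing_comments(lines)
  warnings = []
  depth = 0
  open_loops = []  # stack of [depth_at_open, line_number, header_text]

  lines.each_with_index do |line, idx|
    if line =~ /\b(for|while|do)\b/ && line.include?("{")
      open_loops << [depth, idx + 1, line.strip]
    end
    depth += line.count("{") - line.count("}")
    if line.include?("}") && !open_loops.empty? && depth <= open_loops.last[0]
      _, start, header = open_loops.pop
      length = idx + 1 - start
      if length > MAX_BLOCK_LINES && line !~ %r{\}\s*//}
        warnings << "line #{idx + 1}: #{length}-line block opened at " \
                    "line #{start} (#{header}) has no closing comment"
      end
    end
  end
  warnings
end

# Demo on an inline string standing in for a C file.
demo = "while (pigsFly) {\n" + "  launch();\n" * 25 + "}\n"
puts check_closing_comments(demo.lines)
```

Having the index at hand also means every warning carries the line number for the error message, which matters (see below).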

I got it to work, but I couldn’t help feeling guilty for sucking the
whole bloody file into memory first. Is there something obvious that I
could have done instead that would allow me to look at the file the same
way but leave it on disk instead of in memory?

(BTW, having line numbers to use in error statements is key)

Also, I kind of wonder if my concern is outdated. The typical C file
might be on the order of 100k bytes, and the machine has a Gig of memory
in it. Am I applying a twenty year old concern to a modern programming
problem?

thanks,
jp

On 5/12/06, Jeff P. [email protected] wrote:

> single line stuff with Regexp’s a’plenty.
> whole file into an array using File#readlines. After that, I used
> Also, I kind of wonder if my concern is outdated. The typical C file

I learned to program on a machine with 4K of RAM; it took me years
to rid myself of RAM guilt :) Learn to enjoy pulling whole files into
memory when appropriate (and it is in this case).

pth

p.s. You may also find http://cast.rubyforge.org/ useful as you tackle
more complex rules

Quoting [email protected], on Sat, May 13, 2006 at 11:06:27AM +0900:

the best idea I was able to come up with was to read the
whole file into an array using File#readlines. After that, I used
Array#for_each_with_index to zip through the array looking for the end
of the block (being careful to watch out for inner blocks as well).

whole bloody file into memory first. Is there something obvious that I
could have done instead that would allow me to look at the file the same
way but leave it on disk instead of in memory?

IO mixes in Enumerable, so it has a #each_with_index method that would
use IO buffering:

ensemble:~ % ruby -e '$stdin.each_with_index {|i,l| p i; p l; }' < /etc/passwd
"nobody:*:-2:-2:Unprivileged User:/dev/null:/dev/null\n"
0
"root:*:0:0:System A.:/var/root:/bin/tcsh\n"
1
"daemon:*:1:1:System Services:/var/root:/dev/null\n"
2
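The same buffering works per-file: File.foreach reads one line at a time, and with_index keeps the line numbers you wanted for error messages. A sketch, where the trailing-whitespace rule and the throwaway demo file are just stand-ins:

```ruby
require 'tempfile'

# Stream a file line by line via IO buffering; with_index(1) supplies
# 1-based line numbers without pulling the whole file into memory.
def trailing_ws_warnings(path)
  warnings = []
  File.foreach(path).with_index(1) do |line, lineno|
    warnings << "#{lineno}: trailing whitespace" if line =~ /[ \t]+\r?\n/
  end
  warnings
end

# Demo against a throwaway file so the sketch runs as-is.
Tempfile.create(%w[demo .c]) do |f|
  f.puts "int x; "   # trailing space -> flagged
  f.puts "int y;"
  f.flush
  puts trailing_ws_warnings(f.path)
end
```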

Cheers,
Sam

It’s not ruby, but look at CRM114. It’s like grep on steroids and
allows you to define begin and end blocks. Maybe you could shell out
to it.

On Sat, May 13, 2006 at 11:06:27AM +0900, Jeff P. wrote:

> I got it to work, but I couldn’t help feeling guilty for sucking the
> whole bloody file into memory first. Is there something obvious that I
> could have done instead that would allow me to look at the file the same
> way but leave it on disk instead of in memory?

I’ve done exactly this for a hand-rolled ActionScript parser I wrote.

After having originally gone to painful lengths to make the code
recognise multi-line tokens while reading the input one line at a time,
I found that converting it to use IO.read made the code both simpler and
faster. It won’t scale well for huge input files, but who cares?
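For example, with the whole source in one string, a multi-line token like a C block comment becomes a single regex match instead of a line-stitching exercise. A toy sketch, not the actual parser:

```ruby
# With the file contents in one string (what IO.read gives you),
# the /m flag lets "." span newlines, so a multi-line block comment
# is one match.
src = <<~C
  int x; /* a comment
            spanning two lines */
  int y; /* single line */
C

comments = src.scan(%r{/\*.*?\*/}m)
comments.each { |c| puts c.inspect }
```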

Maybe StringScanner will work with ruby-mmap, giving the best of both
worlds? (Won’t work under Windows, though, I expect.)
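Even without mmap, StringScanner over a string from IO.read already makes tokenizing pleasant. A sketch with a made-up, toy token set:

```ruby
require 'strscan'

# Toy tokenizer: StringScanner walks an in-memory string, pulling one
# token at a time; anything unrecognized is skipped character-wise.
scanner = StringScanner.new("while (pigsFly) { fly(); }")
tokens = []
until scanner.eos?
  if (tok = scanner.scan(/\w+|[{}();]/))
    tokens << tok
  else
    scanner.getch  # skip whitespace and anything else
  end
end
p tokens
# => ["while", "(", "pigsFly", ")", "{", "fly", "(", ")", ";", "}"]
```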

dave

> Also, I kind of wonder if my concern is outdated. The typical C file
> might be on the order of 100k bytes, and the machine has a Gig of memory
> in it. Am I applying a twenty year old concern to a modern programming
> problem?

Interesting application.

When coding, I try to focus more on the design of the program and less
on the machine execution. More often than not it keeps my programs tidy
and readable. When performance is an issue, then I profile and optimize.
Not before.

Premature optimization is expensive. Focus on the interface instead.

All the best
Jon Egil S.