First script seems slow - What's a better way to write this?


#1

I’ve inherited a Tcl script from previous co-op students, and it’s a
little messy, so I wanted to clean it up. I wanted to learn Ruby anyway,
so I wrote a Ruby script that searches my .tcl file and outputs a list of
all the procedures and variables, sorted by number of uses
(I’m mainly interested in the unused ones).

The script seems really slow, though (~10 seconds for a 3000-line file).
Is that Ruby, or is it just my implementation? I don’t care that this
script takes 10 seconds, but I’d like to learn how to write better Ruby
code. Here’s my script:

def generateTokenList(readFile, token, prefix)
  names = Hash.new
  str = ""
  File.open(readFile, 'r').each do |line|
    if line[token] and not line['#']
      name = line.split[1]
      names[name] = 0 if not names.key?(name)
    end
  end

  names.each do |key, value|
    i = 0
    i = -1 if token == 'proc '
    File.open(readFile, 'r').each do |line|
      i = i + 1 if line[prefix + key] and not line['#']
    end
    names[key] = i
  end

  names = names.sort { |a,b| a[1] <=> b[1] }

  names.each { |pair| str << pair[0] + "\tuses: " + pair[1].to_s + "\n" }

  return str
end

if ARGV[0] == nil or ARGV[1] == nil
  puts "\nUsage: ruby ProcList.rb inputfilepath outputfilename"
  exit(0)
end

writeFile = File.new(ARGV[1], 'w')
writeFile << "Procedures:\n"
writeFile << generateTokenList(ARGV[0], 'proc ', '')
writeFile << "\n\nVariables:\n"
writeFile << generateTokenList(ARGV[0], 'set ', 36.chr)
writeFile << "Updated: " + File.mtime(ARGV[0]).to_s


#2

As a side issue, there is a tool for generating cross-references in Tcl
called zdoc (http://www.oklin.com/zdoc/) that might be a better starting
point if all you really want to do is get to grips with the existing
code. However, I have never used it, not being a particularly good Tcl
programmer myself. There is also frink (http://wiki.tcl.tk/2611), which
reformats your source code to make it easier to read; I have used that one.

I realise that this does nothing for your Ruby but perhaps it will help
you get onto something more interesting :slight_smile:


#3

Thanks for the link. However, I forgot to mention that the reason I’m
doing this myself is because no currently available tools like that work
with my code, as it contains commands specific to the program it extends
and gives errors telling me that they are invalid command names.

Basically, I’m just trying to figure out if it’s my fault the script is
slow, or if Ruby just isn’t very efficient.

I realise that this does nothing for your Ruby but perhaps it will help
you get onto something more interesting :slight_smile:

This is more interesting :wink:

On a side note, so far I like Ruby better than tcl.


#4

Charlotte wrote:

you get onto something more interesting :slight_smile:

This is more interesting :wink:

On a side note, so far I like Ruby better than tcl.

Hi there,

I’m afraid that I can’t take the time right now to really pore over the
script you’ve posted, but it doesn’t look unreasonable to me. Of
course, I’m not terribly clever so take that with a grain of salt. :wink:

It’s true that blistering speed is not listed as one of the current Ruby
interpreter’s features and you may be seeing an example of that. You
can probably get a better view of the situation by running your script
with the profiling library enabled.

Try running it like this:

ruby -rprofile ProcList.rb inputfilepath outputfilename

It will take even longer, but you’ll end up with a report showing you
where in your script you are spending the most time. Maybe it will
reveal a few hot spots that you can speed up a bit.
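
If the profile library is not available on your Ruby, a coarser view can be had by timing each phase by hand with the standard Benchmark module. The sketch below is only an illustration: `collect_names` and `count_uses` are hypothetical stand-ins for the two loops in the posted script, and the sample lines are made up.

```ruby
require 'benchmark'

# Hypothetical stand-in for the first loop: find the names defined with "proc".
def collect_names(lines)
  lines.map { |l| l.split[1] if l.start_with?('proc ') }.compact
end

# Hypothetical stand-in for the second loop: count the lines mentioning each name.
def count_uses(lines, names)
  names.map { |n| [n, lines.count { |l| l.include?(n) }] }
end

# Made-up sample input, held in memory so that only the logic is timed.
lines = ["proc foo {} {", "  foo", "}", "proc bar {} {", "}"]

names = nil
counts = nil
t_collect = Benchmark.realtime { names = collect_names(lines) }
t_count   = Benchmark.realtime { counts = count_uses(lines, names) }

puts format("collect: %.6fs  count: %.6fs", t_collect, t_count)
```

Timing the phases separately like this should show which loop dominates before you start rewriting anything.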

Good luck, and don’t hesitate to continue posting problems here. There
are a lot of awfully smart people on this list that are often willing to
help out in situations like yours.

Regards,
Matthew D.


#5

On Thu, 13 Apr 2006, Charlotte wrote:

This is more interesting :wink:

On a side note, so far I like Ruby better than tcl.

how does this do (untested) :

harp:~ > cat a.rb

require 'yaml'
class TclIndex
  def initialize arg
    @procs, @vars = {}, {}
    parse arg
  end
  def parse a
    read = lambda{|io| io.readlines.map!{|l| l.gsub %r/#.*$/, ''}}
    lines = a.respond_to?('readlines') ? read[a] : open(a){|f| read[f]}
    lines.each do |line|
      case line
        when %r/^ \s* proc \s+ (\w+)/iox
          @procs[$1] = -1
        when %r/^ \s* set \s+ (\w+)/iox
          @vars[$1] = 0
      end
      @procs.keys.each{|k| @procs[k] += 1 if line[%r/\b#{ k }\b/]}
      @vars.keys.each{|k| @vars[k] += 1 if line[%r/\b#{ k }\b|\$#{ k }\b/]}
    end
  end
  def report o
    o << {
      'procs' => @procs.to_a.sort_by{|ab| ab.last}.map{|ab| Hash[*ab]},
      'vars' => @vars.to_a.sort_by{|ab| ab.last}.map{|ab| Hash[*ab]},
    }.to_yaml
  end
end

abort "Usage: [inputfilepath = stdin] [outputfilename = stdout]" if
  ARGV.delete('help') or ARGV.delete('--help')

i = ARGV.shift || STDIN
o = ARGV.shift || STDOUT

idx = TclIndex.new i
idx.report o

regards.

-a


#6

Just a quick note - precompiling the regular expressions might help
here, too.

Regards,

Dan


#7

On Wednesday, 12 April 2006 18:29, Charlotte wrote:

Basically, I’m just trying to figure out if it’s my fault the script is
slow, or if Ruby just isn’t very efficient.

Well, I hate to mention the giant purple squid in the middle of the
kitchen, but Ruby is… shall we say… not really speedy. It should still
outperform Tcl, but not much else, if the programming language
benchmarks are to be trusted. (Which they aren’t, but hey.)

Then again, YARV looks surprisingly vital for what was mere vaporware
only a few years ago, so there’s a chance of a Blazing Fast (well, not
really) Ruby yet.

David V.


#8

On 4/12/06, Daniel B. removed_email_address@domain.invalid wrote:

Just a quick note - precompiling the regular expressions might help here, too.

How do you do that in Ruby?


#9

On Thu, 13 Apr 2006, Daniel B. wrote:

Just a quick note - precompiling the regular expressions might help here,
too.

very good point! :

 harp:~ > cat /usr/share/tcl8.3/*tcl |wc -l
    3533

 harp:~ > time cat /usr/share/tcl8.3/*tcl |ruby a.rb >/dev/null

 real    0m0.848s
 user    0m0.810s
 sys     0m0.020s

this is down from 3 sec!

harp:~ > cat a.rb

require 'yaml'
class TclIndex
  def initialize arg
    @procs, @vars = {}, {}
    parse arg
  end
  def parse a
    read = lambda{|io| io.readlines.map!{|l| l.gsub %r/#.*$/, ''}}
    lines = a.respond_to?('readlines') ? read[a] : open(a){|f| read[f]}
    proc_re = Hash.new{|h,k| h[k] = %r/\b#{ k }\b/}
    var_re = Hash.new{|h,k| h[k] = %r/\b#{ k }\b|\$#{ k }\b/}
    lines.each do |line|
      case line
        when %r/^ \s* proc \s+ (\w+)/iox
          @procs[$1] = -1
        when %r/^ \s* set \s+ (\w+)/iox
          @vars[$1] = 0
      end
      @procs.keys.each{|k| @procs[k] += 1 if line[proc_re[k]]}
      @vars.keys.each{|k| @vars[k] += 1 if line[var_re[k]]}
    end
  end
  def report o
    o << {
      'procs' => @procs.to_a.sort_by{|ab| ab.last}.map{|ab| Hash[*ab]},
      'vars' => @vars.to_a.sort_by{|ab| ab.last}.map{|ab| Hash[*ab]},
    }.to_yaml
  end
end

abort "Usage: [inputfilepath = stdin] [outputfilename = stdout]" if
  ARGV.delete('help') or ARGV.delete('--help')

i = ARGV.shift || STDIN
o = ARGV.shift || STDOUT

idx = TclIndex.new i
idx.report o

regards.

-a


#10

On Thu, 13 Apr 2006, Mark V. wrote:

On 4/12/06, Daniel B. removed_email_address@domain.invalid wrote:

Just a quick note - precompiling the regular expressions might help here, too.

How do you do that in Ruby?

  /re/o
      ^
      ^
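
A note on what the `o` flag does, as far as I understand it: it makes Ruby perform the `#{}` interpolation in that regex literal only once and reuse the result afterwards. That saves recompiling, but it also means later values of the interpolated variable are ignored. A small sketch (the method names and the sample line are made up for illustration):

```ruby
# With /o the interpolation happens on the first evaluation only.
def index_with_o(word, line)
  line =~ /\b#{word}\b/o
end

# Without /o the pattern is rebuilt from word on every call.
def index_without_o(word, line)
  line =~ /\b#{word}\b/
end

line = "set alpha beta"

a = index_with_o("alpha", line)    # compiles /\balpha\b/ and keeps it
b = index_with_o("beta", line)     # still matches against /\balpha\b/
c = index_without_o("beta", line)  # matches against /\bbeta\b/

puts [a, b, c].inspect
```

So `/o` only suits a pattern whose interpolated parts never change. When the pattern depends on a changing key, as with the per-name regexes in the script, caching one Regexp per key in a Hash is the safer route.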

-a


#11

On Apr 12, 2006, at 2:19 PM, Mark V. wrote:

On 4/12/06, Daniel B. removed_email_address@domain.invalid wrote:

Just a quick note - precompiling the regular expressions might
help here, too.

How do you do that in Ruby?

You use this idiom:

class SomeClass
  def initialize
    @some_re = /some_re/
  end
  def some_method
    # do stuff with @some_re
  end
end


#12

2006/4/12, Logan C. removed_email_address@domain.invalid:

You use this idiom:

class SomeClass
  def initialize
    @some_re = /some_re/
  end
  def some_method
    # do stuff with @some_re
  end
end

Actually, Ruby is quite smart in some cases… Try the following:

@re = /^\w+-\w+$/ # Some random expression
def foo(str)
  str =~ @re
end

def bar(str)
  str =~ /^\w+-\w+$/
end

def qux(str)
  str =~ Regexp.new('^\w+-\w+$')
end

require 'benchmark'
include Benchmark

bm(16) do |test|
  test.report("foo") do
    1_000_000.times {foo("abc-xyz")}
  end
  test.report("bar") do
    1_000_000.times {bar("abc-xyz")}
  end
  test.report("qux") do
    1_000_000.times {qux("abc-xyz")}
  end
end

I get something like this on 1.8 cvs:

                      user     system      total        real
foo               4.920000   0.080000   5.000000 (  5.581873)
bar               4.610000   0.060000   4.670000 (  5.457461)
qux              15.280000   0.280000  15.560000 ( 17.514639)

So ruby actually shares a single compiled Regexp object in bar’s case
(as can also be proven by counting Regexp’s in ObjectSpace with the GC
disabled).
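
The sharing can also be checked directly by comparing object identity instead of counting objects. This is a small sketch of what I would expect on the standard interpreter (MRI); the method names are made up, and other implementations may behave differently:

```ruby
# A plain literal: the same Regexp object should come back on every call.
def literal_re
  /^\w+-\w+$/
end

# Regexp.new builds (and compiles) a new object on every call.
def fresh_re
  Regexp.new('^\w+-\w+$')
end

# A literal with interpolation (and no /o) is also rebuilt on every call.
def interpolated_re(sep)
  /^\w+#{sep}\w+$/
end

same_literal = literal_re.equal?(literal_re)
same_fresh   = fresh_re.equal?(fresh_re)
same_interp  = interpolated_re('-').equal?(interpolated_re('-'))

puts "literal shared:      #{same_literal}"
puts "Regexp.new shared:   #{same_fresh}"
puts "interpolated shared: #{same_interp}"
```

This fits the timings above: `bar` costs about the same as `foo` because nothing is recompiled, while `qux` pays for a compilation on each of its million calls.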

Brian.


#13

Brian M. wrote:

I get something like this on 1.8 cvs:

                      user     system      total        real
foo               4.920000   0.080000   5.000000 (  5.581873)
bar               4.610000   0.060000   4.670000 (  5.457461)
qux              15.280000   0.280000  15.560000 ( 17.514639)

Just for fun, I tried it too… I get

                      user     system      total        real
foo               5.141000   0.000000   5.141000 (  5.157000)
bar               4.765000   0.032000   4.797000 (  4.812000)
qux              22.219000   1.593000  23.812000 ( 23.906000)

Hmm… I know for a fact that my (work) computer is messed up, but still -
it’s a 3.0GHz HT P4 with 1GB RAM, running Windows XP with as much
disabled as I can to try and convince the thing to run quickly.

Also, I tried the YAML version - at first it told me that I couldn’t
modify a frozen string. I read somewhere about this happening if you
try to modify an ARGV value, so I changed

i = ARGV.shift || STDIN
o = ARGV.shift || STDOUT

to

i = File.open(ARGV[0], 'r')
o = File.new(ARGV[1], 'w')

I don’t know if that has an effect on the speed or not, but it worked.
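
A likely explanation for the error, judging only from the posted script: when an output filename is given, `o` is the String taken from ARGV, so `o << ...` tries to append to that string rather than write to a file. The sketch below reproduces the failure with an explicitly frozen string and shows one way to accept either a path or an IO. `writable_sink` is a hypothetical helper, not part of the script.

```ruby
require 'stringio'

# Hypothetical helper: return something that can be written to,
# whether we were handed an IO-like object or a file path.
def writable_sink(target)
  return target if target.respond_to?(:write)  # STDOUT, File, StringIO, ...
  File.open(target.to_s, 'w')                  # otherwise treat it as a path
end

# Appending to a frozen string raises; this stands in for the ARGV value.
frozen_name = 'out.yml'.freeze
error_class = nil
begin
  frozen_name << "data"
rescue StandardError => e
  error_class = e.class
end

# An IO-like object passes straight through and accepts <<.
sink = StringIO.new
writable_sink(sink) << "procs: []\n"

puts "appending to the frozen string raised #{error_class}"
puts sink.string
```

Opening the files explicitly, as in the change above, avoids the problem for the same reason: `o` is then a File, and `<<` writes to it.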

Speed results:
My script: 450 seconds
YAML script: 156 seconds

My bottleneck definitely seems to be iterating through each line in the
input file:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 70.25   316.08    316.08      327   966.61  1369.08  IO#each
 15.07   383.88     67.79   991826     0.07     0.07  String#[]
 14.15   447.56     63.69   984426     0.06     0.06  String#+
  0.11   448.05      0.49        2   243.00   406.50  Hash#sort
… etc.

Hmm… seems like it would be worthwhile to learn all that
lambda/map/#&@^#* gibberish. Thanks!


#14

On Thu, 2006-04-13 at 04:24 +0900, Charlotte wrote:

                      user     system      total        real
foo               5.141000   0.000000   5.141000 (  5.157000)
bar               4.765000   0.032000   4.797000 (  4.812000)
qux              22.219000   1.593000  23.812000 ( 23.906000)

Hmm… I know for a fact that my (work) computer is messed up, but still -
it’s a 3.0GHz HT P4 with 1GB RAM, running Windows XP with as much
disabled as I can to try and convince the thing to run quickly.

Wow, on my paltry 1.7Ghz P4 I get:

                      user     system      total        real
foo               3.990000   0.040000   4.030000 (  4.097883)
bar               3.700000   0.020000   3.720000 (  3.778370)
qux              13.640000   0.130000  13.770000 ( 13.914830)

from ruby 1.8.4 (2005-12-24) [i686-linux]. Interestingly (though
probably of no concern given developmental status), performance was
significantly worse with 1.9 (especially for qux - around 23 seconds)
and Oniguruma.


#15

Hi,

In message “Re: First script seems slow - What’s a better way to write
t”
on Thu, 13 Apr 2006 04:48:12 +0900, Ross B.
removed_email_address@domain.invalid writes:

|from ruby 1.8.4 (2005-12-24) [i686-linux]. Interestingly (though
|probably of no concern given developmental status), performance was
|significantly worse with 1.9 (especially for qux - around 23 seconds)
|and Oniguruma.

That is because qux creates a lot of Regexp objects, and Oniguruma (the
1.9 regex engine) takes a little longer for pattern compilation and
optimization than the old 1.8 regex engine.

						matz.

#16

Ross B. wrote:

Wow, on my paltry 1.7Ghz P4 I get:

                      user     system      total        real
foo               3.990000   0.040000   4.030000 (  4.097883)
bar               3.700000   0.020000   3.720000 (  3.778370)
qux              13.640000   0.130000  13.770000 ( 13.914830)

from ruby 1.8.4 (2005-12-24) [i686-linux]. Interestingly (though
probably of no concern given developmental status), performance was
significantly worse with 1.9 (especially for qux - around 23 seconds)
and Oniguruma.

Could the operating system have an effect on the speed?


#17

On Thu, 13 Apr 2006, Charlotte wrote:

Hmm… seems like it would be worthwhile to learn all that
lambda/map/#&@^#* gibberish. Thanks!

in fact my approach is slowed by those features. what makes it a bit
faster is that it makes one pass through the file, does io in bulk, and
pre-compiles all regexes. those are the keys.
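
For illustration, here is a stripped-down sketch of the one-pass idea. It uses a different technique from the script above: instead of testing every known name against every line, it tallies every word once and looks the names up afterwards. `tally_names` and the sample lines are made up, and the comment stripping and `proc`/`set` matching are crude simplifications of Tcl syntax.

```ruby
DEF_RE  = /^\s*(proc|set)\s+(\w+)/  # a definition: "proc name" or "set name"
WORD_RE = /\w+/                     # any identifier-like token

# One pass over the lines: remember which names are defined and
# count every occurrence of every word (definition lines included).
def tally_names(lines)
  defined = { 'proc' => {}, 'set' => {} }
  seen = Hash.new(0)
  lines.each do |line|
    code = line.sub(/#.*$/, '')  # crude comment stripping
    if (m = DEF_RE.match(code))
      defined[m[1]][m[2]] = true
    end
    code.scan(WORD_RE) { |w| seen[w] += 1 }
  end
  defined.each_with_object({}) do |(kind, names), report|
    report[kind] = names.keys.map { |n| [n, seen[n]] }
                        .sort_by { |name, n| [n, name] }
  end
end

lines = [
  "proc used {} { return 1 }",
  "proc unused {} { return 2 }",
  "set x [used]   # call it",
  "puts $x",
]

result = tally_names(lines)
puts result.inspect
```

A count of 1 here means the name appears only on its own definition line. Because whole words are tallied, `unused` is not counted as a use of `used`, which plain substring matching would do.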

regards.

-a


#18

Charlotte wrote:

probably of no concern given developmental status), performance was
significantly worse with 1.9 (especially for qux - around 23 seconds)
and Oniguruma.

Could the operating system have an effect on the speed?

Without opening or closing a single program/window, I benchmarked
this on native Windows, colinux-woody and Cygwin.


C:\temp>ruby -v foobarqux.rb
ruby 1.8.4 (2005-12-24) [i386-mswin32]
                      user     system      total        real
foo               5.359000   0.062000   5.421000 (  5.875000)
bar               4.969000   0.063000   5.032000 (  5.391000)
qux              18.750000   1.484000  20.234000 ( 22.578000)

colinux:~# ruby -v foobarqux.rb
ruby 1.8.4 (2005-12-24) [i486-linux]
                      user     system      total        real
foo               1.360000   2.510000   3.870000 (  3.868129)
bar               1.160000   2.300000   3.460000 (  3.460166)
qux               2.020000  13.190000  15.210000 ( 15.210472)

Simon@XPS /cygdrive/c/temp
$ ruby -v foobarqux.rb
ruby 1.8.4 (2005-12-24) [i386-cygwin]
                      user     system      total        real
foo               3.656000   0.000000   3.656000 (  3.973000)
bar               3.422000   0.000000   3.422000 (  3.709000)
qux              12.813000   0.000000  12.813000 ( 13.991000)

Well, I’m puzzled. I thought native windows should be the
fastest one on a windows machine.

cheers

Simon


#19

On Fri, 14 Apr 2006, Simon Kröger wrote:

Well, I’m puzzled. I thought native windows should be the fastest one on a
windows machine.

but why? it’s windows!?

:wink:

-a