First script seems slow - What's a better way to write this?

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- and Ruby-related community platforms.
Charlotte (Guest)
on 2006-04-12 16:52
I've inherited a Tcl script from previous co-op students, and it's a
little messy, so I wanted to clean it up.  I wanted to learn Ruby anyway,
so I made a Ruby script to search my .tcl file and output a list of all
the procedures and variables, sorted by number of uses
(I'm mainly interested in the unused ones).

The script seems really slow, though (~10 seconds for a 3000-line file).
Is that Ruby, or is it just my implementation?  I don't care that this
script takes 10 seconds, but I'd like to learn how to write better Ruby
code.  Here's my script:

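def generateTokenList(readFile, token, prefix)
	names = Hash.new
	str = ""
	# Pass 1: take the second word of every line containing the token
	# ('proc ' or 'set '), skipping any line that contains a '#'.
	File.open(readFile, 'r').each do |line|
		if line[token] and not line['#']
			name = line.split[1]
			names[name] = 0 if not names.key?(name)
		end
	end

	# Pass 2: for every name, re-open and re-read the whole file, counting
	# the lines that mention it. Procs start at -1 so the definition line
	# itself is not counted as a use.
	names.each do |key, value|
		i = 0
		i = -1 if token == 'proc '
		File.open(readFile, 'r').each do |line|
			i = i + 1 if line[prefix + key] and not line['#']
		end
		names[key] = i
	end

	# Sort by use count, least-used first.
	names = names.sort { |a, b| a[1] <=> b[1] }

	names.each { |pair| str << pair[0] + "\tuses: " + pair[1].to_s + "\n" }

	return str
end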

if ARGV[0] == nil or ARGV[1] == nil
	puts "\nUsage: ruby ProcList.rb inputfilepath outputfilename"
	exit(0)
end

writeFile = File.new(ARGV[1], 'w')
writeFile << "Procedures:\n"
writeFile << generateTokenList(ARGV[0], 'proc ', '')
writeFile << "\n\nVariables:\n"
writeFile << generateTokenList(ARGV[0], 'set ', 36.chr)
writeFile << "Updated: " + File.mtime(ARGV[0]).to_s
Peter Hickman (Guest)
on 2006-04-12 17:39
(Received via mailing list)
As a side issue, there is a tool to generate cross references in Tcl
called zdoc (http://www.oklin.com/zdoc/) that might be a better starting
point if all you really want to do is get to grips with the existing
code. However, I have never used it, not being a particularly good Tcl
programmer myself. There is also frink (http://wiki.tcl.tk/2611), which
reformats your source code to make it easier to read; that one I have used.

I realise that this does nothing for your Ruby but perhaps it will help
you get onto something more interesting :)
Charlotte (Guest)
on 2006-04-12 18:29
Thanks for the link.  However, I forgot to mention that the reason I'm
doing this myself is that none of the currently available tools like that
work with my code: it contains commands specific to the program it
extends, and the tools give errors saying those are invalid command names.

Basically, I'm just trying to figure out if it's my fault the script is
slow, or if Ruby just isn't very efficient.

>I realise that this does nothing for your Ruby but perhaps it will help
>you get onto something more interesting :)

This is more interesting ;)

On a side note, so far I like Ruby better than tcl.
Matthew Desmarais (Guest)
on 2006-04-12 18:53
(Received via mailing list)
Charlotte wrote:
>> you get onto something more interesting :)
>>
>
> This is more interesting ;)
>
> On a side note, so far I like Ruby better than tcl.
>
Hi there,

I'm afraid that I can't take the time right now to really pore over the
script you've posted, but it doesn't look unreasonable to me.  Of
course, I'm not terribly clever so take that with a grain of salt. ;-)

It's true that blistering speed is not listed as one of the current Ruby
interpreter's features and you may be seeing an example of that.  You
can probably get a better view of the situation by running your script
with the profiling library enabled.

Try running it like this:

ruby -rprofile ProcList.rb inputfilepath outputfilename

It will take even longer, but you'll end up with a report showing you
where in your script you are spending the most time.  Maybe it will
reveal a few hot spots that you can speed up a bit.
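The profiler covers the whole script. For comparing two versions of a single step, the standard `benchmark` library can also time a block directly. A minimal sketch, where the line and pattern being matched are placeholders rather than anything from the script:

```ruby
require 'benchmark'

# Placeholder workload: match one word-boundary pattern 100,000 times.
line = "proc foo {} { return 1 }"

# Benchmark.realtime runs the block once and returns the elapsed
# wall-clock time in seconds as a Float.
elapsed = Benchmark.realtime do
  100_000.times { line =~ /\bfoo\b/ }
end

puts format("100,000 matches took %.3f s", elapsed)
```

Swap the block body for the step you suspect, run it before and after a change, and compare the two numbers.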

Good luck, and don't hesitate to continue posting problems here.  There
are a lot of awfully smart people on this list that are often willing to
help out in situations like yours.

Regards,
Matthew Desmarais
unknown (Guest)
on 2006-04-12 19:03
(Received via mailing list)
On Thu, 13 Apr 2006, Charlotte wrote:

>
> This is more interesting ;)
>
> On a side note, so far I like Ruby better than tcl.

how does this do (untested) :

   harp:~ > cat a.rb

   require 'yaml'
   class TclIndex
     def initialize arg
       @procs, @vars = {}, {}
       parse arg
     end
     def parse a
       read = lambda{|io| io.readlines.map!{|l| l.gsub %r/#.*$/, ''}}
       lines = a.respond_to?('readlines') ? read[a] : open(a){|f|read[f]}
       lines.each do |line|
         case line
           when %r/^ \s* proc \s+ (\w+)/iox
             @procs[$1] = -1
           when %r/^ \s* set \s+ (\w+)/iox
             @vars[$1] = 0
         end
         @procs.keys.each{|k| @procs[k] += 1 if line[%r/\b#{ k }\b/]}
         @vars.keys.each{|k| @vars[k] += 1 if line[%r/\b#{ k }\b|\$#{ k }\b/]}
       end
     end
     def report o
       o << {
         'procs' => @procs.to_a.sort_by{|ab| ab.last}.map{|ab| Hash[*ab]},
         'vars' => @vars.to_a.sort_by{|ab| ab.last}.map{|ab| Hash[*ab]},
       }.to_yaml
     end
   end


   abort "Usage: [inputfilepath = stdin] [outputfilename = stdout]" if
     ARGV.delete('help') or ARGV.delete('--help')

   i = ARGV.shift || STDIN
   o = ARGV.shift || STDOUT

   idx = TclIndex.new i
   idx.report o

regards.

-a
David Vallner (Guest)
on 2006-04-12 19:03
(Received via mailing list)
On Wednesday, 12 April 2006 18:29, Charlotte wrote:
> Basically, I'm just trying to figure out if it's my fault the script is
> slow, or if Ruby just isn't very efficient.
>

Well, I hate to mention the giant purple squid in the middle of the
kitchen, but Ruby is... shall we say... not really speedy. It should still
outperform TCL, but not much else, if the programming language benchmarks
are to be trusted. (Which they aren't, but hey.)

Then again, YARV looks surprisingly vital for what was mere vaporware only
a few years ago, so there's a chance of a Blazing Fast (well, not really)
Ruby yet.

David Vallner
Daniel Berger (Guest)
on 2006-04-12 19:09
(Received via mailing list)
ara.t.howard@noaa.gov wrote:
>>> I realise that this does nothing for your Ruby but perhaps it will help
>   require 'yaml'
>           when %r/^ \s* proc \s+ (\w+)/iox
>       o << {
>   i = ARGV.shift || STDIN
>   o = ARGV.shift || STDOUT
>
>   idx = TclIndex.new i
>   idx.report o
>
> regards.
>
> -a

Just a quick note - precompiling the regular expressions might help
here, too.

Regards,

Dan
unknown (Guest)
on 2006-04-12 19:30
(Received via mailing list)
On Thu, 13 Apr 2006, Daniel Berger wrote:

> Just a quick note - precompiling the regular expressions might help here,
> too.

very good point! :


     harp:~ > cat /usr/share/tcl8.3/*tcl |wc -l
        3533

     harp:~ > time cat /usr/share/tcl8.3/*tcl |ruby a.rb >/dev/null

     real    0m0.848s
     user    0m0.810s
     sys     0m0.020s


this is down from 3 sec!


   harp:~ > cat a.rb

   require 'yaml'
   class TclIndex
     def initialize arg
       @procs, @vars = {}, {}
       parse arg
     end
     def parse a
       read = lambda{|io| io.readlines.map!{|l| l.gsub %r/#.*$/, ''}}
       lines = a.respond_to?('readlines') ? read[a] : open(a){|f|read[f]}
       proc_re = Hash.new{|h,k| h[k] = %r/\b#{ k }\b/}
       var_re = Hash.new{|h,k| h[k] = %r/\b#{ k }\b|\$#{ k }\b/}
       lines.each do |line|
         case line
           when %r/^ \s* proc \s+ (\w+)/iox
             @procs[$1] = -1
           when %r/^ \s* set \s+ (\w+)/iox
             @vars[$1] = 0
         end
         @procs.keys.each{|k| @procs[k] += 1 if line[proc_re[k]]}
         @vars.keys.each{|k| @vars[k] += 1 if line[var_re[k]]}
       end
     end
     def report o
       o << {
         'procs' => @procs.to_a.sort_by{|ab| ab.last}.map{|ab| Hash[*ab]},
         'vars' => @vars.to_a.sort_by{|ab| ab.last}.map{|ab| Hash[*ab]},
       }.to_yaml
     end
   end


   abort "Usage: [inputfilepath = stdin] [outputfilename = stdout]" if
     ARGV.delete('help') or ARGV.delete('--help')

   i = ARGV.shift || STDIN
   o = ARGV.shift || STDOUT

   idx = TclIndex.new i
   idx.report o



regards.

-a
Mark Volkmann (Guest)
on 2006-04-12 20:20
(Received via mailing list)
On 4/12/06, Daniel Berger <Daniel.Berger@qwest.com> wrote:
>
> Just a quick note - precompiling the regular expressions might help here, too.

How do you do that in Ruby?
Logan Capaldo (Guest)
on 2006-04-12 20:26
(Received via mailing list)
On Apr 12, 2006, at 2:19 PM, Mark Volkmann wrote:

> On 4/12/06, Daniel Berger <Daniel.Berger@qwest.com> wrote:
>>
>> Just a quick note - precompiling the regular expressions might
>> help here, too.
>
> How do you do that in Ruby?
>

You use this idiom:

class SomeClass
   def initialize
     @some_re = /some_re/
   end
   def some_method
      # do stuff with @some_re
   end
end
unknown (Guest)
on 2006-04-12 20:50
(Received via mailing list)
On Thu, 13 Apr 2006, Mark Volkmann wrote:

> On 4/12/06, Daniel Berger <Daniel.Berger@qwest.com> wrote:
>>
>> Just a quick note - precompiling the regular expressions might help here, too.
>
> How do you do that in Ruby?

   /re/o
       ^
       ^
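The `o` flag tells Ruby to perform any `#{...}` interpolation in that literal only once, the first time the line runs, and to reuse the resulting Regexp from then on. The flip side is that `/o` cannot be used for a pattern that has to change per name, such as `%r/\b#{ k }\b/` in the loop of the script above; that is why that version caches one Regexp per name in a Hash instead. A small demonstration (the method names are made up for illustration):

```ruby
# Hypothetical helpers, for illustration only.
def word_re_once(name)
  /\b#{name}\b/o   # interpolated and compiled only on the first call
end

def word_re_each(name)
  /\b#{name}\b/    # interpolated and compiled on every call
end

p word_re_once("foo").source   # => "\\bfoo\\b"
p word_re_once("bar").source   # => "\\bfoo\\b"  (still the first name)
p word_re_each("bar").source   # => "\\bbar\\b"
```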

-a
Brian Mitchell (Guest)
on 2006-04-12 20:57
(Received via mailing list)
2006/4/12, Logan Capaldo <logancapaldo@gmail.com>:
> You use this idiom:
>
> class SomeClass
>    def initialize
>      @some_re = /some_re/
>    end
>    def some_method
>       # do stuff with @some_re
>    end
> end

Actually, Ruby is quite smart in some cases. Try the following:

@re = /^\w+-\w+$/ # Some random expression
def foo(str)
  str =~ @re
end

def bar(str)
  str =~ /^\w+-\w+$/
end

def qux(str)
  # Single quotes keep the backslashes; builds a new Regexp on every call.
  str =~ Regexp.new('^\w+-\w+$')
end

require 'benchmark'
include Benchmark

bm(16) do |test|
  test.report("foo") do
    1_000_000.times {foo("abc-xyz")}
  end
  test.report("bar") do
    1_000_000.times {bar("abc-xyz")}
  end
  test.report("qux") do
    1_000_000.times {qux("abc-xyz")}
  end
end

I get something like this on 1.8 cvs:

                      user     system      total        real
foo               4.920000   0.080000   5.000000 (  5.581873)
bar               4.610000   0.060000   4.670000 (  5.457461)
qux              15.280000   0.280000  15.560000 ( 17.514639)

So Ruby actually shares a single compiled Regexp object in bar's case
(as can also be proven by counting Regexps in ObjectSpace with the GC
disabled).
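This can also be checked without ObjectSpace: a regexp literal with no interpolation is built once, when the code is compiled, so every evaluation returns the very same object, while `Regexp.new` builds a fresh one on each call. A small sketch with made-up method names:

```ruby
def literal_re
  /^\w+-\w+$/                  # built once, reused on every evaluation
end

def runtime_re
  Regexp.new('^\w+-\w+$')      # compiled again on every call
end

p literal_re.equal?(literal_re)            # => true  (same object)
p runtime_re.equal?(runtime_re)            # => false (new object each call)
p runtime_re.source == literal_re.source   # => true  (same pattern)
```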

Brian.
Charlotte (Guest)
on 2006-04-12 21:23
Brian Mitchell wrote:

> I get something like this on 1.8 cvs:
>
>                       user     system      total        real
> foo               4.920000   0.080000   5.000000 (  5.581873)
> bar               4.610000   0.060000   4.670000 (  5.457461)
> qux              15.280000   0.280000  15.560000 ( 17.514639)

Just for fun, I tried it too... I get

>                      user     system      total        real
>foo               5.141000   0.000000   5.141000 (  5.157000)
>bar               4.765000   0.032000   4.797000 (  4.812000)
>qux              22.219000   1.593000  23.812000 ( 23.906000)

Hmm... I know for a fact that my (work) computer is messed up, but still:
it's a 3.0 GHz HT P4 with 1 GB RAM, running Windows XP with as much
disabled as I can manage, to try and convince the thing to run quickly.

Also, I tried the YAML version - at first it told me that I couldn't
modify a frozen string.  I read somewhere about this happening if you
try to modify an ARGV value, so I changed

>   i = ARGV.shift || STDIN
>   o = ARGV.shift || STDOUT
to
>   i = File.open(ARGV[0], 'r')
>   o = File.new(ARGV[1], 'w')

I don't know if that has an effect on the speed or not, but it worked.
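The change most likely helped because `report` writes with `o << ...to_yaml`. When an output filename is given, `o` is that filename String taken from ARGV, so `<<` tries to append the YAML to the String itself instead of writing to a file, and command-line argument strings are frozen, which produces the error. A minimal sketch of turning an optional path into something writable (the helper name `with_output` is hypothetical):

```ruby
# Hypothetical helper: yields an IO to write to. With no path it yields
# STDOUT; otherwise it opens the file and closes it when the block returns.
def with_output(path)
  if path
    File.open(path, 'w') { |io| yield io }
  else
    yield STDOUT
  end
end

# IO#<< writes to the stream; String#<< would append to the filename instead.
with_output(ARGV.shift) { |out| out << "procs: []\n" }
```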

Speed results:
    My script: 450 seconds
    YAML script: 156 seconds

My bottleneck definitely seems to be iterating through each line in the
input file:

 %   cumulative   self              self     total
time   seconds   seconds    calls  ms/call  ms/call  name
70.25   316.08    316.08      327   966.61  1369.08  IO#each
15.07   383.88     67.79   991826     0.07     0.07  String#[]
14.15   447.56     63.69   984426     0.06     0.06  String#+
0.11    448.05      0.49        2   243.00   406.50  Hash#sort
... etc.

Hmm... seems like it would be worthwhile to learn all that
lambda/map/#&@^#* gibberish.  Thanks!
Ross Bamford (Guest)
on 2006-04-12 21:49
(Received via mailing list)
On Thu, 2006-04-13 at 04:24 +0900, Charlotte wrote:
>
> >                      user     system      total        real
> >foo               5.141000   0.000000   5.141000 (  5.157000)
> >bar               4.765000   0.032000   4.797000 (  4.812000)
> >qux              22.219000   1.593000  23.812000 ( 23.906000)
>
> Hmm... I know for a fact that my (work) computer is messed up, but still
> - it's 3.0Ghz HT P4 with 1Gb RAM.  Running Windows XP, as much disabled
> as I can to try and convince the thing to run quickly.

Wow, on my paltry 1.7 GHz P4 I get:

                      user     system      total        real
foo               3.990000   0.040000   4.030000 (  4.097883)
bar               3.700000   0.020000   3.720000 (  3.778370)
qux              13.640000   0.130000  13.770000 ( 13.914830)

from ruby 1.8.4 (2005-12-24) [i686-linux]. Interestingly (though
probably of no concern given developmental status), performance was
significantly worse with 1.9 (especially for qux - around 23 seconds)
and Oniguruma.
unknown (Guest)
on 2006-04-12 22:31
(Received via mailing list)
On Thu, 13 Apr 2006, Charlotte wrote:

> Hmm... seems like it would be worthwhile to learn all that
> lambda/map/#&@^#* gibberish.  Thanks!

in fact my approach is slowed by those features.  what makes it a bit
faster is that it makes one pass through the file, does io in bulk, and
pre-compiles all regexes.  those are the keys.
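Applied to the original `generateTokenList`, two of those keys might look roughly like the sketch below: the file is read into memory once, comments are stripped once, and one Regexp is compiled per name instead of re-opening the file for every name. It is not a single pass like the YAML script (each name still scans the in-memory lines), and it assumes definitions look like `proc name` / `set name` with plain `\w+` names and that variable uses are written `$name`; the method name `usage_report` is made up.

```ruby
# Sketch only: usage_report is a hypothetical name, and the patterns assume
# "proc name ..." / "set name ..." definitions with plain \w+ names.
def usage_report(lines)
  # Strip comments once, instead of re-checking '#' for every name.
  code = lines.map { |l| l.sub(/#.*/, '') }

  procs = code.map { |l| l[/^\s*proc\s+(\w+)/, 1] }.compact.uniq
  vars  = code.map { |l| l[/^\s*set\s+(\w+)/, 1] }.compact.uniq

  proc_counts = procs.map do |name|
    re = /\b#{name}\b/   # compiled once per name, reused for every line
    # -1 so the "proc name" definition line is not counted as a use
    [name, code.count { |l| l =~ re } - 1]
  end

  var_counts = vars.map do |name|
    re = /\$#{name}\b/   # Tcl variable reads are written $name
    [name, code.count { |l| l =~ re }]
  end

  # Least-used first, like the original script's sort.
  [proc_counts.sort_by { |_, n| n }, var_counts.sort_by { |_, n| n }]
end

# Usage: in a real script the lines would come from File.readlines(ARGV[0]).
sample = <<~'TCL'.lines
  proc greet {who} { puts "hi $who" }
  proc unused {} { return 1 }
  set name world
  set spare 0
  greet $name ;# call it
TCL

procs, vars = usage_report(sample)
p procs  # => [["unused", 0], ["greet", 1]]
p vars   # => [["spare", 0], ["name", 1]]
```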

regards.

-a
Yukihiro Matsumoto (Guest)
on 2006-04-13 17:28
(Received via mailing list)
Hi,

In message "Re: First script seems slow - What's a better way to write
t"
    on Thu, 13 Apr 2006 04:48:12 +0900, Ross Bamford
<rossrt@roscopeco.co.uk> writes:

|from ruby 1.8.4 (2005-12-24) [i686-linux]. Interestingly (though
|probably of no concern given developmental status), performance was
|significantly worse with 1.9 (especially for qux - around 23 seconds)
|and Oniguruma.

That's because qux creates a lot of Regexp objects, and Oniguruma (the 1.9
regex engine) takes a little longer for pattern compilation and
optimization than the old 1.8 regex engine.

							matz.
Charlotte (Guest)
on 2006-04-13 20:01
Ross Bamford wrote:

> Wow, on my paltry 1.7Ghz P4 I get:
>
>                       user     system      total        real
> foo               3.990000   0.040000   4.030000 (  4.097883)
> bar               3.700000   0.020000   3.720000 (  3.778370)
> qux              13.640000   0.130000  13.770000 ( 13.914830)
>
> from ruby 1.8.4 (2005-12-24) [i686-linux]. Interestingly (though
> probably of no concern given developmental status), performance was
> significantly worse with 1.9 (especially for qux - around 23 seconds)
> and Oniguruma.

Could the operating system have an effect on the speed?
Simon Kröger (Guest)
on 2006-04-13 23:08
(Received via mailing list)
Charlotte wrote:
>> probably of no concern given developmental status), performance was
>> significantly worse with 1.9 (especially for qux - around 23 seconds)
>> and Oniguruma.
>
> Could the operating system have an effect on the speed?

Without opening or closing a single program or window, I benchmarked
this on native Windows, colinux-woody and Cygwin.

-------------------------------------------------------------
C:\temp>ruby -v foobarqux.rb
ruby 1.8.4 (2005-12-24) [i386-mswin32]
                      user     system      total        real
foo               5.359000   0.062000   5.421000 (  5.875000)
bar               4.969000   0.063000   5.032000 (  5.391000)
qux              18.750000   1.484000  20.234000 ( 22.578000)
-------------------------------------------------------------
colinux:~# ruby -v foobarqux.rb
ruby 1.8.4 (2005-12-24) [i486-linux]
                      user     system      total        real
foo               1.360000   2.510000   3.870000 (  3.868129)
bar               1.160000   2.300000   3.460000 (  3.460166)
qux               2.020000  13.190000  15.210000 ( 15.210472)
-------------------------------------------------------------
Simon@XPS /cygdrive/c/temp
$ ruby -v foobarqux.rb
ruby 1.8.4 (2005-12-24) [i386-cygwin]
                      user     system      total        real
foo               3.656000   0.000000   3.656000 (  3.973000)
bar               3.422000   0.000000   3.422000 (  3.709000)
qux              12.813000   0.000000  12.813000 ( 13.991000)
-------------------------------------------------------------

Well, I'm puzzled. I thought native Windows would be the
fastest one on a Windows machine.

cheers

Simon
unknown (Guest)
on 2006-04-13 23:23
(Received via mailing list)
On Fri, 14 Apr 2006, Simon Kröger wrote:

> Well, I'm puzzled. I thought native windows should be the fastest one on a
> windows machine.

but why?  it's windows!?

;-)

-a