Runtime disparity - Same program in Perl and Ruby

Hi all (this is going to comp.lang.ruby and comp.lang.perl.misc),

The other day I wrote a basic program in Perl, and the following day I
rewrote it in Ruby. I’m curious about the differences in runtime of
the two versions, though.

Let me start by describing the program (I’ll append full code for both
to the end): it reads in a list of alphanumeric codes from file
(format is [\w\d\S]+_\d{3}, but they’re separated by a comma in the
file), then creates a hash with those codes as keys and empty arrays
as values. After the hash is built, the program traverses through a
given directory and its subdirectories (using File::Find in Perl and
Find.find in Ruby) and checks each file against the hash of codes
(with a few regexps and conditions to prevent lots of unnecessary
looping), adding it to the array for a code if the code is found in
the filename. Finally, it writes the contents of the hash to a .csv
file in the format CODE,PATH for each match.

Now, if it were the case that Ruby or Perl were simply -slower- than
the other, I wouldn’t be bothering you folks. But here’s where it gets
a little unusual: the number of elements in the code list has a
noticeable impact on the run time of the Ruby version, but far less on
the Perl version. I ran each one a few times with code lists of
various sizes, and they both print start/stop timestamps at the end,
so I collected the data:

Entries | Ruby (s) | Perl (s)
4       | 153      | 291
64      | 133      | 258
256     | 222      | 253
512     | 327      | 248
1024    | 562      | 353
1500    | 683      | 363

Ruby runs faster for low numbers of entries, as you can see, but once
you get up to 1500, Ruby’s time has more than tripled while Perl’s
time has gone up about a fifth.
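
For reference, the growth factors implied by those timings can be worked out directly. The short Ruby sketch below only divides the 1500-entry time by the 4-entry time for each language; the helper name growth is made up for illustration, and the numbers are copied from the timings above.

```ruby
# Timings copied from the measurements above: entries => seconds.
ruby_times = { 4 => 153, 64 => 133, 256 => 222, 512 => 327, 1024 => 562, 1500 => 683 }
perl_times = { 4 => 291, 64 => 258, 256 => 253, 512 => 248, 1024 => 353, 1500 => 363 }

# Hypothetical helper: ratio of the 1500-entry run to the 4-entry run.
def growth(times)
  (times[1500].to_f / times[4]).round(2)
end

puts "Ruby grew by a factor of #{growth(ruby_times)}"
puts "Perl grew by a factor of #{growth(perl_times)}"
```

Measured this way, the Ruby run time grows by a factor of roughly 4.5 and the Perl run time by roughly 1.25 between the smallest and the largest code list.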

I’ve looked over the code for both versions several times, and I don’t
see any significant differences. The only important feature the Ruby
version lacks is the sort() before writing the file.

I’d really appreciate any insight into why Ruby’s runtime grows so
readily and Perl’s does not.

Code of both versions follows.

Thanks,
Andrew Fallows

Perl:

use File::Find;
use strict;
use warnings;
my $code;
my $type;
my %filecodes = ();
my $start_time = "Started: " . localtime();
$| = 1; #Enables flush on print.
$\ = "\n"; #Automatic newlines on print
open(ITEM_LIST, "(path)") or die "Error";

# This loop builds a hash whose keys are the codes/types from file
# and whose values are references to empty arrays
while(my $item = <ITEM_LIST>)
{
    $item =~ s/,/_/;
    $item =~ s/\n//g;
    print $item;
    my @files = ();
    $filecodes{$item} = \@files;
}
print "Hash built";

# Uses File::Find to iterate over the entire subdirectory
find(\&file_seek, "(path)");

# The searching portion: gets each location from File::Find, then compares it
# to all the targets. If there is a match, prints a message and adds that file
# to the related array.
sub file_seek
{
    my $file = $_;

    # Kicks out if the file in question is not of the necessary format
    if(!(-f $file) || !($file =~ /^[\d\w\S]+_\d{3}/)){ return; }

    foreach my $target (keys(%filecodes))
    {
        # If the file name contains the code sought
        if($file =~ /$target/)
        {
            print "found $file in $File::Find::dir";

            # Jumps out if the list for this code already contains this file.
            for (0..@{$filecodes{$target}})
            {
                if(defined(${$filecodes{$target}}[$_])
                && $File::Find::name eq ${$filecodes{$target}}[$_]) {return; }
            }
            push(@{$filecodes{$target}}, $File::Find::name);
        }
    }
}

# After the whole directory has been searched, prints each key and all
# values found for it.
open(RESULTS, "> (path)") or die "Error 2";
foreach my $target ( sort(keys( %filecodes )))
{
    my @results = @{$filecodes{$target}};
    if(@results == 0) { push(@results, "NO FILES FOUND") }
    print $target;
    foreach (@results)
    {
        print RESULTS "$target,$_";
        print "\t$_";
    }
}
close RESULTS;
print $start_time;
print "Ended: " . localtime();

Ruby:

class FileSearcher
  $\ = "\n"
  in_file = File.open( "(path)","r")
  start_time = Time.now
  filecodes = Hash.new

  # This loop reads all the item codes in from file and then
  # adds them to a hash, each linked to its own empty array
  while item = in_file.gets
    item = item.gsub(',','_')
    item = item.gsub("\n","")
    files = Array.new
    files.push("empty");
    filecodes[item]= files
  end
  in_file.close

  # The searching portion: looks at each file/location, then compares it
  # to all the targets. If there is a match, prints a message and adds
  # that file to the related array.
  require "Find"
  require 'ftools'
  Find.find("(path)") do |file|
    if !(FileTest.file?(file)) || !(File.basename(file) =~ /^[\d\w\S]+_\d{3}/)
      next
    else
      filecodes.each_key do |target|
        if(file =~ /#{target}/)
          puts "found " + target + " at " + file
          $stdout.flush
          fail = 0
          for i in 0...filecodes[target].size-1 do
            if(filecodes[target][i] != "empty" &&
               File.basename(file) == File.basename(filecodes[target][i]))
              fail = 1
              break
            end
          end
          if fail == 0
            if filecodes[target][0] == "empty"
              filecodes[target][0] = file
            else
              filecodes[target].push(file)
            end
          end
        end
      end
    end
  end

  # After the whole directory has been searched, prints each key and all
  # values found for it to a file called Ruby_results.csv.
  target_file = File.open("(path)","w")
  filecodes.each_key do |target|
    results = filecodes[target]
    if results[0] == "empty"
      results[0] = "NO FILES FOUND"
    end
    puts target
    for i in 0...(results.size-1)
      target_file.puts target + "," + results[i]
    end
  end
  target_file.close
  end_time = Time.now
  puts "Started: " + start_time.to_s
  puts "Ended: " + end_time.to_s
end

Thanks for the reply, John. There are a number of good tips in your
reply for making my code more “Perl”-y. I don’t think many (if any)
will actually change the way the program runs, though, will they? A
lot of the things I did work, but are styled more like Java, the
language I use most. For example, I know I can just use $_ in sub
file_seek, but I prefer to give my vars names that make more sense at
a glance. But I’ll keep all of your advice in mind.

Thanks again,
Andrew

Kaldrenon wrote:

Hi all (this is going to comp.lang.ruby and comp.lang.perl.misc),

The other day I wrote a basic program in Perl,

Did you write it in basic or in Perl? :-)

and the following day I
rewrote it in Ruby. I’m curious about the differences in runtime of
the two versions, though.

Let me start by describing the program (I’ll append full code for both
to the end): it reads in a list of alphanumeric codes from file
(format is [\w\d\S]+_\d{3},

The character class \d is a subset of \w, and both are subsets of \S, so
your expression could be simplified to:

\S+_\d{3}
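
That equivalence is easy to spot-check. The Ruby sketch below runs the original and the simplified character class (both anchored with ^, as in the posted code) over a few sample names and reports whether they always agree. The sample strings are made up for illustration and are not taken from the real code list.

```ruby
ORIGINAL   = /^[\w\d\S]+_\d{3}/
SIMPLIFIED = /^\S+_\d{3}/

# Made-up sample names, for illustration only.
samples = ["ABC12_001.txt", "x-y.z_123", "no code here", "_999", "A B_123"]

# true only if both patterns accept or reject every sample identically
agree = samples.all? { |s| !!(s =~ ORIGINAL) == !!(s =~ SIMPLIFIED) }
puts agree
```

A handful of samples is not a proof, of course; the argument that \d and \w are already contained in \S is what actually justifies the simplification.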

Now, if it were the case that Ruby or Perl were simply -slower- than
[...]
readily and Perl’s does not.

Did you compare the output of the Perl and Ruby versions to see if there
were any differences?

my %filecodes = ();
my $start_time = "Started: " . localtime();
$| = 1; #Enables flush on print.
$\ = "\n"; #Automatic newlines on print
open(ITEM_LIST, "(path)") or die "Error";

You should include the $! (or $^E) variable in the error message so you
know why it failed.

# This loop builds a hash whose keys are the codes/types from file
# and whose values are references to empty arrays

while(my $item = <ITEM_LIST>)
{
$item =~ s/,/_/;
$item =~ s/\n//g;

That is usually done with chomp:

     chomp $item;

print $item;
my @files = ();
$filecodes{$item} = \@files;

You don’t need to create an array, just assign an anonymous array:

$filecodes{$item} = [];

# to the related array.

sub file_seek
{
my $file = $_;

# Kicks out if the file in question is not of the necessary format

if(!(-f $file) || !($file =~ /^[\d\w\S]+_\d{3}/)){ return; }

Using $_ instead of the copy in $file:

     return if !-f || !/^\S+_\d{3}/;

foreach my $target (keys(%filecodes))
{
# If the file name contains the code sought
if($file =~ /$target/)

Because $target may contain some regular expression meta-characters you
should quotemeta it:

  if ( $file =~ /\Q$target/ )

Or use the index function:

  if ( 0 <= index $file, $target )
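
The same concern applies to the Ruby version, where /#{target}/ also interpolates the code as a pattern. A rough sketch of the two Ruby counterparts, Regexp.escape and String#include?, using a made-up code that contains a metacharacter (both sample strings are assumptions for illustration):

```ruby
target = "A.C_001"            # hypothetical code; "." is a regexp metacharacter
file   = "ABC_001_report.txt" # hypothetical file name

unescaped = !!(file =~ /#{target}/)                 # "." matches any character
escaped   = !!(file =~ /#{Regexp.escape(target)}/)  # "." must match literally
literal   = file.include?(target)                   # plain substring test, no regexp

puts [unescaped, escaped, literal].inspect
```

Only the unescaped pattern reports a match here, which is the kind of false positive that escaping or a plain substring test avoids.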
{
  print "found $file in $File::Find::dir";

  # Jumps out if the list for this code already contains this file.
  for (0..@{$filecodes{$target}})

You have an off-by-one error:

    for (0..$#{$filecodes{$target}})
  {
    if(defined(${$filecodes{$target}}[$_])
    && $File::Find::name eq ${$filecodes{$target}}[$_]) {return; }

${$filecodes{$target}}[$_] can be written more simply as
$filecodes{$target}[$_].

But you don’t really need to use an array index:

    for ( @{$filecodes{$target}} )
    {
        return if defined() && $File::Find::name eq $_;

(Or you could use a Hash of Hashes.)

  }
  push(@{$filecodes{$target}}, $File::Find::name);
}

}
}

# After the whole directory has been searched, prints each key and all
# values found for it.

open(RESULTS, "> (path)") or die "Error 2";

You should include the $! (or $^E) variable in the error message so you
know why it failed.

foreach my$target ( sort(keys( %filecodes )))
{
my @results = @{$filecodes{$target}};

Do you really need to make a copy of the array?

if(@results == 0) { push(@results, "NO FILES FOUND") }

If the array is empty you can just assign to it:

     @results = 'NO FILES FOUND' unless @results;

print $target;
foreach (@results)
{
print RESULTS "$target,$_";
print "\t$_";
}
}
close RESULTS;
print $start_time;
print "Ended: " . localtime();

John

On Thu, 14 Jun 2007 19:11:18 GMT, "John W. Krahn" wrote:

Using $_ instead of the copy in $file:

    return if !-f || !/^\S+_\d{3}/;

Or (IMHO more clearly):

return unless -f and /^\S+_\d{3}/;

Michele

Kaldrenon wrote:

Hi all (this is going to comp.lang.ruby and comp.lang.perl.misc),

The other day I wrote a basic program in Perl, and the following day I
rewrote it in Ruby. I’m curious about the differences in runtime of
the two versions, though.

It would help to have a view into the input file and to know what your
program should do. I’m sure there has to be another way to code your
problem, and I’m pretty sure that constructs like the following can and
should be avoided:

Find.find("(path)") do |file|
  if !(FileTest.file?(file)) || !(File.basename(file) =~ /^[\d\w\S]+_\d{3}/)
    next
  else
    filecodes.each_key do |target|
      if(file =~ /#{target}/)
        puts "found " + target + " at " + file
        $stdout.flush
        fail = 0
        for i in 0...filecodes[target].size-1 do
          if(filecodes[target][i] != "empty" &&
             File.basename(file) == File.basename(filecodes[target][i]))
            fail = 1
            break
          end
        end
        if fail == 0
          if filecodes[target][0] == "empty"
            filecodes[target][0] = file
          else
            filecodes[target].push(file)
          end
        end
      end
    end
  end
end

This is both difficult to read and error prone.
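
One possible way to restructure that loop, offered as a sketch rather than a drop-in replacement: give each code a genuinely empty array instead of the "empty" placeholder, test with String#include? rather than an interpolated regexp, and let Array#any? do the duplicate check. The method name record_match and the sample codes and paths are made up for illustration. The sketch keeps the original's rule of treating two files with the same basename as duplicates.

```ruby
# Adds path to the list of every code it contains, skipping files whose
# basename is already recorded for that code (same rule as the original loop).
def record_match(filecodes, path)
  name = File.basename(path)
  filecodes.each do |target, files|
    next unless path.include?(target)
    next if files.any? { |seen| File.basename(seen) == name }
    files << path
  end
end

# Hypothetical codes and paths, for illustration only.
filecodes = { "ABC_001" => [], "XYZ_002" => [] }
paths = ["/data/a/ABC_001_v1.txt", "/data/b/ABC_001_v1.txt", "/data/a/XYZ_002.log"]
paths.each { |path| record_match(filecodes, path) }

p filecodes
```

Inside the Find.find block this would be called as record_match(filecodes, file) after the FileTest.file? check, and the output step would then treat an empty array as "NO FILES FOUND". Whether this changes the run time at all would need to be measured.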

On Thu, 14 Jun 2007 20:15:16 -0000, Kaldrenon wrote:

reply for making my code more “Perl”-y. I don’t think many (if any)
will actually change the way the program runs, though, will they? A

Just try.

lot of the things I did work, but are styled more like Java, the
language I use most. For example, I know I can just use $_ in sub
file_seek, but I prefer to give my vars names that make more sense at
a glance. But I’ll keep all of your advice in mind.

$_ is a pronoun and it makes sense in short enough phrases. If you
have a C<for> loop with a two- or three-line block (or even a C<for>
modifier) then use it. If it’s 100 lines long (probably not a good
idea on its own) then use an explicit name.

Michele