Forum: Ruby optimizing for speed - Array#each

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- and Ruby-related community platforms.
Mike Steiner (Guest)
on 2007-05-24 00:01
(Received via mailing list)
I ran the profiler on my program and it said that 51% of the time is
spent in Array#each. Is there any way to speed this up (like using
Array#each_index perhaps)?

And is there a quicker way to do a .gsub if I'm using just strings
and not a regexp?

By the way, the program compares 2 databases and determines which
records have been inserted and deleted, and each database has about
15,000 records to compare.

Mike Steiner
Stefan Rusterholz (apeiros)
on 2007-05-24 00:15
Mike Steiner wrote:
> I ran the profiler on my program and it said that 51% of the time is
> spent in Array#each. Is there any way to speed this up (like using
> Array#each_index perhaps)?
>
> And is there a quicker way to do a .gsub if I'm using just strings
> and not a regexp?
>
> By the way, the program compares 2 databases and determines which
> records have been inserted and deleted, and each database has about
> 15,000 records to compare.
>
> Mike Steiner

No. I don't think you can speed up Array#each itself at all. What's
far more promising is improving your algorithm so that it runs fewer
loops. You most likely have one or even several O(n^x) algorithms
with x >= 2 in your app, perhaps without being aware of it. E.g. if
you have an #each with a #select inside the loop, you are already in
O(n^2) territory.
The database comparison can probably best be handled with hash tables
of the primary keys.
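For illustration, the hash-table idea might look something like this
(the record data, variable names, and the assumption of a usable
primary key are all made up for the sketch):

```ruby
# Two snapshots keyed by primary key. Building each hash is O(n) and
# key lookups are O(1), so the whole diff stays roughly linear.
old_records = { 1 => "alice", 2 => "bob",   3 => "carol" }
new_records = { 2 => "bob",   3 => "carole", 4 => "dave" }

inserted = new_records.keys - old_records.keys   # keys only in the new DB
deleted  = old_records.keys - new_records.keys   # keys only in the old DB

p inserted   # => [4]
p deleted    # => [1]
```

(Array#- itself uses a hash internally, so the key diff avoids the
O(n^2) nested-loop trap.)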

Regards
Stefan
Tim Pease (Guest)
on 2007-05-24 00:32
(Received via mailing list)
On 5/23/07, Stefan Rusterholz <apeiros@gmx.net> wrote:
>
> No. I don't think you can speed up Array#each itself at all. What's
> far more promising is improving your algorithm so that it runs fewer
> loops. You most likely have one or even several O(n^x) algorithms
> with x >= 2 in your app, perhaps without being aware of it. E.g. if
> you have an #each with a #select inside the loop, you are already in
> O(n^2) territory.
> The database comparison can probably best be handled with hash tables
> of the primary keys.
>

I was going to suggest something even more primitive -- query each
database table in some sorted fashion and dump it as a CSV file. Use
diff to compare CSV files. Obviously this won't work for tables with
blobs, but it might be a fun experiment.

Failing that, how can the database do the work for you?  Let it find
the records you're interested in (it will do it faster).  Put
modification timestamps in your tables to reduce the number of records
you need to search.  Or use triggers to log all transactions to a
separate table so you can run those same transactions on the other
database.
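A toy version of the sorted-dump idea, with in-memory arrays standing
in for the actual query results (names and data are invented):

```ruby
# Dump each "table" as sorted comma-joined lines, then diff the dumps
# the way the diff tool would.
db_a = [[1, "alice"], [3, "carol"], [2, "bob"]]
db_b = [[2, "bob"], [4, "dave"], [1, "alice"]]

dump_a = db_a.map { |row| row.join(",") }.sort
dump_b = db_b.map { |row| row.join(",") }.sort

deleted  = dump_a - dump_b   # rows present only in A
inserted = dump_b - dump_a   # rows present only in B

p deleted    # => ["3,carol"]
p inserted   # => ["4,dave"]
```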

Just some thoughts.

Blessings,
TwP
Robert Klemme (Guest)
on 2007-05-24 10:41
(Received via mailing list)
On 24.05.2007 00:15, Stefan Rusterholz wrote:
> Mike Steiner wrote:
>> I ran the profiler on my program and it said that 51% of the time
>> is spent in Array#each. Is there any way to speed this up (like
>> using Array#each_index perhaps)?

Note that the profiler output is often misleading, because the time
for #each also includes all the time spent in the block - and while
#each is invoked only once, the block is invoked n times.

>> And is there a quicker way to do a .gsub if I'm using just strings and
>> not a regexp?

Quicker than what?  Are you using the block form of gsub?  Are you using
gsub! or gsub?  Can you show some code?

>> By the way, the program compares 2 databases and determines which
>> records have been inserted and deleted, and each database has about
>> 15,000 records to compare.

15,000 does not sound like much.  What is the runtime of the program
without the profiler?

And another idea: it could be that you are using inefficient methods
to test for existence.  Finding something in an Array is always O(n)
unless you have additional constraints; e.g. if the array content is
sorted you can use binary search, which is O(log n).  It seems you
want to be doing set operations - so why don't you try Set, for which
membership tests are far more efficient (O(1), because it uses
hashing internally)?
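For example (the key range here just mirrors the 15,000-record case):

```ruby
require 'set'

keys = (1..15_000).to_a

# Array#include? scans every element: O(n) per test.
# Set#include? hashes the element:    O(1) per test.
key_set = keys.to_set

p key_set.include?(14_999)   # => true
p key_set.include?(15_001)   # => false
```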

Kind regards

  robert
Mike Steiner (Guest)
on 2007-05-24 17:15
(Received via mailing list)
One of the things I'm doing is a lot of File.join's, and I have to do
a gsub( /\// , "\\" ) after each one (because I'm running under
Windows).

I can't use sets because I have duplicate records. There is no unique
record key, so for each record (which I use as a key in a hash), I
have an array listing the files it appears in (which can include the
same file more than once).

Thanks for the help!

Mike Steiner
Ryan Davis (Guest)
on 2007-05-24 20:05
(Received via mailing list)
On May 23, 2007, at 14:59 , Mike Steiner wrote:

> I ran the profiler on my program and it said that 51% of the time
> is spent in Array#each. Is there any way to speed this up (like
> using Array#each_index perhaps)?

The time spent on Array#each itself is minimal and can't really be
sped up much. What you're really seeing is the time spent in the
block's code. If you have a bunch of each calls all over, you can
refactor each one to call out to a method:

   ary.each { ... } => def x; ...; end; ary.each { x }

That way you can get better accounting on which block is taking up
the most time.
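Spelled out with made-up names, that transformation looks like:

```ruby
# Before: [1, 2, 3].each { |item| results << item * 2 }
# After: the block's work lives in a named method, so the profiler
# reports its time under "process" instead of lumping everything
# into Array#each.
def process(item)
  item * 2
end

results = []
[1, 2, 3].each { |item| results << process(item) }

p results   # => [2, 4, 6]
```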

> And is there a quicker way to do a .gsub if I'm using just strings
> and not a regexp?

Not by much. I will say (based on a subsequent email of yours on this
thread) that you shouldn't have to be doing these gsubs just because
you're on windows. IIRC, ruby should open files with either form of
path separator.
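And if the replacement really is wanted (say, to hand the path to a
non-Ruby Windows tool), a cheaper route for a single-character
substitution is String#tr, which skips the regexp engine entirely.
A sketch, not a benchmark:

```ruby
path = "some/dir/file.txt"   # the kind of string File.join produces

# tr maps characters one-for-one; no regexp machinery involved.
win_path = path.tr("/", "\\")

p win_path   # prints "some\\dir\\file.txt", i.e. some\dir\file.txt
```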

> By the way, the program compares 2 databases and determines which
> records have been inserted and deleted, and each database has about
> 15,000 records to compare.

15k records to compare isn't that much, honestly.

One thing I would suggest is for you to use an alternative profiler.
shugo's prof or my zenprofile give much more accurate profiles where
lots and lots of method calls are involved. The beauty of the default
profiler is how small and easy it is to write. The pain is that it is
biased towards the number of calls, not the time spent.
Stefan Mahlitz (Guest)
on 2007-05-24 22:25
(Received via mailing list)
Mike Steiner wrote:
> One of the things I'm doing is a lot of File.join's, and I have to
> do a gsub( /\// , "\\" ) after each one (because I'm running under
> Windows).

Are you using the strings generated by File.join outside of Ruby?

If not, you can simply use them without the gsub.

Stefan
Robert Klemme (Guest)
on 2007-05-24 23:03
(Received via mailing list)
2007/5/24, Mike Steiner <mikejaysteiner@gmail.com>:
> One of the things I'm doing is a lot of File.join's, and I have to do a gsub
> ( /\// , "\\" ) after each one (because I'm running under Windows).
>
> I can't use sets because I have duplicate records. There is no unique record
> key, so for each record (which I use as a key in a hash), I have an array
> listing the files it appears in (which can include the same file more than
> once).

Can you be a bit more specific on what exactly you are doing?

robert
John Carter (johncarter)
on 2007-05-25 00:35
(Received via mailing list)
On Thu, 24 May 2007, Mike Steiner wrote:

> I ran the profiler on my program and it said that 51% of the time is spent
> in Array#each. Is there any way to speed this up (like using
> Array#each_index perhaps)?

Mostly what the profiler is telling you is "don't do that".

Which sometimes isn't very helpful.

Rather than asking "is there something faster than Array#each?", ask
"is there a way I can rewrite my algorithm so that I don't call
Array#each so much?"

I tend not to look at those entries in the profiler output, and
instead skip downwards until I find the first method I wrote, and
optimize that.

> And is there a quicker way to do a .gsub if I'm using just strings and not a
> regexp?

I don't know exactly what you're doing, but String#[]= does a lot
more than I would naively think.

------------------------------------------------------------- String#[]=
      str[fixnum] = fixnum
      str[fixnum] = new_str
      str[fixnum, fixnum] = new_str
      str[range] = aString
      str[regexp] = new_str
      str[regexp, fixnum] = new_str
      str[other_str] = new_str
------------------------------------------------------------------------
      Element Assignment---Replaces some or all of the content of _str_.
      The portion of the string affected is determined using the same
      criteria as +String#[]+. If the replacement string is not the same
      length as the text it is replacing, the string will be adjusted
      accordingly. If the regular expression or string used as the index
      doesn't match a position in the string, +IndexError+ is raised. If
      the regular expression form is used, the optional second +Fixnum+
      allows you to specify which portion of the match to replace
      (effectively using the +MatchData+ indexing rules). The forms that
      take a +Fixnum+ will raise an +IndexError+ if the value is out of
      range; the +Range+ form will raise a +RangeError+, and the +Regexp+
      and +String+ forms will silently ignore the assignment.


> By the way, the program compares 2 databases and determines which records
> have been inserted and deleted, and each database has about 15,000 records
> to compare.

Have a look at
  http://rubygarden.org:3000/Ruby/page/show/RubyOptimization
for more info.

If you are still stuck, and it doesn't reveal any confidential
details, send the top 50 lines profiler output to the group and ask
for suggestions.


John Carter                             Phone : (64)(3) 358 6639
Tait Electronics                        Fax   : (64)(3) 359 4632
PO Box 1645 Christchurch                Email : john.carter@tait.co.nz
New Zealand