Forum: Ruby Way to many calls to Dir#[]

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
Tj H. (Guest)
on 2008-12-09 20:58
I cannot seem to debug this issue... my entire library does not call
Dir#[] a single time, however when I used ruby-prof I get:

 17.26      0.14     0.14     0.00     0.00      594  <Class::Dir>#[]

unnecessarily slowing things down. I have profiled all the other
libraries I am requiring, and none of them produce this same number of
calls, so I suppose it can only be something I am calling within, but
this seems very weird.... any ideas?
Brian C. (Guest)
on 2008-12-09 22:21
Tj Holowaychuk wrote:
> I cannot seem to debug this issue... my entire library does not call
> Dir#[] a single time, however when I used ruby-prof I get:
>
>  17.26      0.14     0.14     0.00     0.00      594  <Class::Dir>#[]
>
> unnecessarily slowing things down. I have profiled all the other
> libraries I am requiring, and none of them produce this same number of
> calls, so I suppose it can only be something I am calling within, but
> this seems very weird.... any ideas?

There is probably some elegant debug or tracing way to do this, but
personally I'd just stick this at the top of my program:

class <<Dir
  alias :old_square :[]
  def [](*args)
    STDERR.puts "Calling Dir[#{args.inspect}] from #{caller.inspect}"
    old_square(*args)
  end
end
Tj H. (Guest)
on 2008-12-09 23:07
Thanks! that is helping, its all from rubygems
'rubygems/gem_path_searcher.rb' file, just seems very excessive. My gem
is 'commander' and it searches for mine in a good 25 locations or so,
then moves on to looking for Win32API  and termios in at least 200
directories :S, neither of which I am using, the only library I am using
is highline, which does not require those either... just seems odd
unless im completely retarded and something is brutally wrong with my
requires

$:.unshift File.dirname(__FILE__)

require 'rubygems'
require 'highline/import'
require 'commander/version'
require 'commander/user_interaction'
require 'commander/fileutils'
require 'commander/core_ext'
require 'commander/runner'
require 'commander/command'
require 'commander/help_formatters'
require 'commander/import'
Tj H. (Guest)
on 2008-12-09 23:51
its in highline!! if you put this in any file it calls Dir[] like nuts

require 'rubygems'
require 'highline/import'

BUT for anyone else who might run across this, the following cuts this
down to a little over 100 calls to Dir#[], where as the previous is 600+

require 'rubygems'
require 'highline'
require 'highline/import'
Tj H. (Guest)
on 2008-12-09 23:59
nvm i fucked that up.. it seems like any required gem is doing that :S
James G. (Guest)
on 2008-12-10 00:05
(Received via mailing list)
On Dec 9, 2008, at 3:44 PM, Tj Holowaychuk wrote:

> its in highline!!

I disagree.  :)

$ grep -c 'Dir' `ruby -e 'puts Dir["lib/**/*.rb"]'`
lib/highline/color_scheme.rb:0
lib/highline/import.rb:0
lib/highline/menu.rb:0
lib/highline/question.rb:1
lib/highline/system_extensions.rb:0
lib/highline.rb:0

If you look into that single usage of Dir[], you will see that you
must construct a HighLine::Question object and call selection() on it
to trigger that code.  This does not happen when HighLine loads.

> require 'highline'
> require 'highline/import'

I assume that's a RubyGems issue with how it figures out to load
highline/import.  I recommend asking your question on their mailing
list.

James Edward G. II
Brian C. (Guest)
on 2008-12-10 00:21
Tj Holowaychuk wrote:
> Thanks! that is helping, its all from rubygems
> 'rubygems/gem_path_searcher.rb' file, just seems very excessive.

Startup with rubygems can be pretty slow. Of course, once your app is
running it should be fine; you can turn on profiling later in the code.

> My gem
> is 'commander' and it searches for mine in a good 25 locations or so

Sounds reasonable - I'd expect /usr/lib/ruby/gems/1.8/gems/*/lib

> then moves on to looking for Win32API  and termios in at least 200
> directories :S

Not sure why so many directories there; you'd have to analyse that list
yourself.

> neither of which I am using, the only library I am using
> is highline, which does not require those either...

Clearly, something is requiring them. But I'm not a Windows user, so I
can't help you pin this down further. But since highline is something to
do with terminal interaction, I'd certainly expect it to try to load
termios, and perhaps Win32API on a Windows box.
Tj H. (Guest)
on 2008-12-10 00:35
Im not using windows either haha. just took a look at the highline
source looks like it conditionally supports termios so your right there.
I see your point though, if its a long running process I guess 0.35 of a
second is not much, just seemed really sketchy to me being new to Ruby
Brian C. (Guest)
on 2008-12-10 11:11
Tj Holowaychuk wrote:
> Im not using windows either haha. just took a look at the highline
> source looks like it conditionally supports termios so your right there.

And win32 too:

$ pwd
/usr/lib/ruby/gems/1.8/gems/highline-1.5.0
$ find . -type f | xargs grep -i win32
./lib/highline/system_extensions.rb:      require "Win32API"
# See if we're on Windows.
./lib/highline/system_extensions.rb:      CHARACTER_MODE = "Win32API"
# For Debugging purposes only.
./lib/highline/system_extensions.rb:        Win32API.new("crtdll",
"_getch", [ ], "L").Call
./lib/highline/system_extensions.rb:        m_GetStdHandle
= Win32API.new( 'kernel32',
./lib/highline/system_extensions.rb:        m_GetConsoleScreenBufferInfo
= Win32API.new(
./test/tc_highline.rb:if HighLine::CHARACTER_MODE == "Win32API"
./test/tc_highline.rb:    assert(%w[Win32API termios
stty].include?(HighLine::CHARACTER_MODE))

However there is only a single require of Win32API, and if it fails a
LoadError is rescued which should cause it to install termios methods
instead. It shouldn't attempt to load Win32API again after that.

> I see your point though, if its a long running process I guess 0.35 of a
> second is not much, just seemed really sketchy to me being new to Ruby

Now I know you're not on Windows, let me look a bit more. Here's what I
get (P4 2.8GHz, 1.8.6p111, rubygems 1.2.0), giving the second run times
for each test:

$ time for i in $(seq 1 10); do ruby -rubygems -e0; done

real  0m0.646s
user  0m0.568s
sys  0m0.060s

$ time for i in $(seq 1 10); do ruby -rubygems -e'require "termios"';
done

real  0m0.707s
user  0m0.612s
sys  0m0.072s

$ time for i in $(seq 1 10); do ruby -rubygems -e'require "highline"';
done

real  0m2.680s
user  0m2.380s
sys  0m0.244s

So there is a substantial startup overhead in including highline, which
for me is about 0.2s more than just loading ruby + rubygems + termios.

But when I profile this, I find that Dir[] is a very long way down the
list.

$ ruby -rubygems -rprofile -e'require "highline"'
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 23.23     1.05      1.05      248     4.23    23.79  Array#each
  7.08     1.37      0.32     5522     0.06     0.09  Kernel.===
  6.86     1.68      0.31      502     0.62     0.62  Regexp#===
  4.20     1.87      0.19     6987     0.03     0.03  Module#===
  3.32     2.02      0.15     5501     0.03     0.03  Kernel.==
  2.88     2.15      0.13     1664     0.08     0.10  Kernel.dup
  2.88     2.28      0.13      104     1.25    31.73
Gem::Specification#initialize
  2.43     2.39      0.11      303     0.36     0.83
Gem::Version#normalize
  2.43     2.50      0.11       25     4.40    47.60
Kernel.gem_original_require
  2.21     2.60      0.10        1   100.00   140.00  Array#sort
  1.99     2.69      0.09      104     0.87     0.87  IO#read
  1.99     2.78      0.09      209     0.43     0.53
Gem::Specification#full_name
... skip lots ...
  0.44     4.25      0.02      144     0.14     0.14  Dir#[]
...

(Whilst it's possible that Dir#[] is using a lot more *real* time than
*CPU* time - and there was a post about fixing profile for this a week
or two back - you can see from above that the total real time is very
close to user+sys time in this case. That's what I'd expect when the
directories are in the VFS cache, that is, no actual disk I/O is done
when searching through directories)

Sorting by total ms/call, I get:

$ env LC_ALL=C sort -rn -k6 ert | head -20
  0.00     4.52      0.00        1     0.00  4520.00  #toplevel
  0.00     4.52      0.00        1     0.00  4090.00
Gem::GemPathSearcher#initialize
  0.00     4.52      0.00        1     0.00  3880.00
Gem::GemPathSearcher#init_gemspecs
  0.00     4.52      0.00        1     0.00  3740.00
Gem::SourceIndex#from_installed_gems
  0.00     4.52      0.00        1     0.00  3620.00
Gem::SourceIndex#refresh!
  0.00     4.52      0.00        1     0.00  3620.00
Gem::SourceIndex#load_gems_in
  0.00     4.52      0.00        1     0.00  3620.00
Gem::SourceIndex#from_gems_in
  0.00     4.52      0.00        1     0.00  3620.00  Array#reverse_each
  0.00     4.52      0.00        2     0.00  2045.00  Mutex#synchronize
  0.00     4.52      0.00        2     0.00  2045.00  Gem.searcher
  0.00     4.52      0.00        2     0.00  1870.00  Gem.source_index
  0.22     4.33      0.01       24     0.42   225.42  Kernel.require
  2.21     2.60      0.10        1   100.00   140.00  Array#sort
  0.00     4.52      0.00        1     0.00   120.00
Gem::SourceIndex#installed_spec_directories
  0.00     4.52      0.00        1     0.00   120.00  Gem.set_paths
  0.00     4.52      0.00        1     0.00   120.00  Gem.path
  0.00     4.52      0.00        1     0.00   110.00  Gem.set_home
  0.00     4.52      0.00        1     0.00   110.00  Gem.dir
  0.00     4.52      0.00        2     0.00    60.00
Gem.ensure_gem_subdirectories
  0.00     4.52      0.00        2     0.00    50.00
Gem::GemPathSearcher#find
...

So there's a lot of rubygems chomping going on. Optimising that may be
difficult, I'm afraid. I tried reading the rubygems source once, and
gave up :-)

Notice that whilst there are 24 calls to Kernel.require, there is only 1
call to Gem::GemPathSearcher#initialize accounting for most of the work.
Perhaps highline forces rubygems to do some dependency calculations,
whereas an itsy-bitsy standalone gem like termios doesn't.

Indeed, when loading only termios, this doesn't seem to be done:

$ ruby -rubygems -rprofile -e'require "termios"' 2>&1 | env LC_ALL=C
sort -rn -k6 | head
  0.00     0.09      0.00        1     0.00    90.00  #toplevel
 55.56     0.05      0.05        1    50.00    80.00  Array#each
  0.00     0.09      0.00        1     0.00    80.00
Enumerable.each_with_index
 11.11     0.09      0.01        2     5.00    50.00
Kernel.gem_original_require
  0.00     0.09      0.00        2     0.00    50.00  Kernel.require
 33.33     0.08      0.03      514     0.06     0.06  Hash#[]=
 time   seconds   seconds    calls  ms/call  ms/call  name
  0.00     0.09      0.00      257     0.00     0.00  Integer#to_int
  0.00     0.09      0.00      257     0.00     0.00  Array#pack
  0.00     0.09      0.00       64     0.00     0.00  String#eql?

Regards,

Brian.
Tj H. (Guest)
on 2008-12-10 19:04
Thanks for investigating, I thought it was for sure some strange
dependency loop or something I was creating, but even with my tiny file
with those two requires it is still being odd.. I posted on rubyforge to
see what the gem guys think, might be a rubygems 1.3.x thing
This topic is locked and can not be replied to.