Forum: Ruby ANN: zozo 1.0.0 Released

Posted by Jeremy Evans (jeremyevans)
on 2010-07-01 21:54
= What?

zozo is a tool that makes it easy to reduce the memory footprint of your
applications by having them not load rubygems/bundler at runtime:

  $ unicorn -c unicorn.conf -D
  $ ps ux
  USER       PID %CPU %MEM   VSZ   RSS TT  STAT  STARTED       TIME COMMAND
  jeremy   18226  0.0  0.5 17196  4496 ??  S      1:34PM    0:00.01 ruby: unicorn master -c unicorn.conf -D (ruby)
  jeremy    8473 31.3  3.3 27180 30172 ??  S      1:34PM    0:00.62 ruby: unicorn worker[0] -c unicorn.conf -D (ruby)

  $ zozo -R config.ru unicorn
  $ ruby -I lib bin/unicorn -c unicorn.conf -D
  $ ps ux
  USER       PID %CPU %MEM   VSZ   RSS TT  STAT  STARTED       TIME COMMAND
  jeremy   17561  0.0  0.4  5548  3908 ??  S      1:35PM    0:00.01 ruby: unicorn master -c unicorn.conf -D (ruby)
  jeremy   22626  4.2  2.0 15016 17904 ??  S      1:35PM    0:00.25 ruby: unicorn worker[0] -c unicorn.conf -D (ruby)

As you can see, the memory footprint is reduced dramatically:

  master process:
    VSZ: 5548/17196  => 68% reduction
    RSS: 3908/4496   => 13% reduction
  worker process:
    VSZ: 15016/27180 => 45% reduction
    RSS: 17904/30172 => 41% reduction

That's a major difference, as a 41% reduction in memory footprint means
you can host 68% more workers in the same amount of memory.  It also
makes your applications faster. They will start faster because zozo
loads all necessary library files into a single directory tree.  They
will run faster as there will be fewer ruby objects to check every time
the garbage collector is run.
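
The percentages above follow directly from the ps figures. A minimal Ruby sketch of the arithmetic (the helper names reduction_pct and extra_capacity_pct are made up for this illustration):

```ruby
# Percentage by which `after` is smaller than `before`.
def reduction_pct(before, after)
  (1 - after.to_f / before) * 100
end

# Percentage of additional processes that fit in the same amount of
# memory when each one shrinks from `before` to `after`.
def extra_capacity_pct(before, after)
  (before.to_f / after - 1) * 100
end

# Worker RSS figures (in KB) taken from the ps output above.
printf("worker RSS reduction: %.1f%%\n", reduction_pct(30172, 17904))
printf("extra workers:        %.1f%%\n", extra_capacity_pct(30172, 17904))
```

The second helper is just the inverse ratio: if each worker needs only a fraction f of the memory it used to, 1/f workers fit where one did before.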

= Why?

Rubygems is a fine package distribution system, but it is not very
efficient from a runtime memory standpoint.  If your application uses
rubygems in production, every time it starts, rubygems needs to figure
out which packages to load.  zozo makes it so that this calculation is
only done once, and the result is cached into a local directory.

= How?

zozo works by starting ruby and checking the current load path.  It then
requires all of the command line arguments given, and checks the load
path again.  Any new entries in the load path are checked and their
contents are loaded into a local directory (lib by default).  By
default, zozo uses symlinks, but it can use hard links (-H) or make
copies (-c) via a command line option.
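
The load path comparison described above can be sketched in a few lines of Ruby. This is only an illustration of the idea under simplifying assumptions, not zozo's actual code: the method names are hypothetical, only symlinks are handled, and the separate treatment of /bin entries is left out.

```ruby
require "fileutils"
require "tmpdir"

# Load path entries present in `after` but not in `before`.
def new_load_path_entries(before, after)
  after - before
end

# Symlink every file under `src_dir` into `dest_dir`, keeping the
# relative directory layout, so a single -I entry can replace many.
def link_tree(src_dir, dest_dir)
  Dir.glob("**/*", base: src_dir).sort.each do |rel|
    src  = File.join(src_dir, rel)
    dest = File.join(dest_dir, rel)
    if File.directory?(src)
      FileUtils.mkdir_p(dest)
    elsif !File.exist?(dest) && !File.symlink?(dest)
      FileUtils.mkdir_p(File.dirname(dest))
      File.symlink(File.expand_path(src), dest)
    end
  end
end

# Snapshot the load path, require the requested libraries, then link
# the contents of every newly added directory into `dest_dir`.
def cache_libraries(libraries, dest_dir = "lib")
  before = $LOAD_PATH.dup
  libraries.each { |name| require name }
  new_load_path_entries(before, $LOAD_PATH).each do |dir|
    link_tree(dir, dest_dir) if File.directory?(dir)
  end
end
```

Calling cache_libraries(%w[sinatra sequel]) would, under these assumptions, populate ./lib with links to whatever directories rubygems added to the load path for those libraries.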

In addition, new entries in the load path that end in /bin are loaded
into a separate local directory (bin by default).  This allows you to
run them without loading rubygems via:

  ruby -I lib bin/$program

zozo adds replacement rubygems.rb, ubygems.rb, and bundler.rb files to
the lib directory it creates, so it works transparently if your program
requires rubygems and/or bundler.  If you run your program without
adding the lib directory zozo creates to the load path, rubygems/bundler
will be used as before.  If you run your program with the lib directory
zozo creates in the load path, then rubygems/bundler will not be loaded,
and it won't need to be because all other libraries your program uses
will already be in the load path.

= Where?

http://github.com/jeremyevans/zozo

= Who?

Jeremy Evans / code@jeremyevans.net

= When?

Now:

  sudo gem install zozo

= Does not work with Rails!

The replacement rubygems.rb and bundler.rb files only do the bare
minimum.  The rubygems.rb file adds Kernel#gem, and the bundler.rb file
adds Bundler.setup, both of which are defined to do nothing and return
nil.  No other features are mocked out.  This means that frameworks that
rely on introspecting the running Gem/Bundler configuration (notably
Rails) will not work.
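
To make the scope of the replacements concrete, here is a minimal sketch of what such stand-ins could look like, based only on the description above. The files actually shipped with zozo may differ, and in real use each definition would live in its own file (lib/rubygems.rb and lib/bundler.rb) rather than in one script.

```ruby
# Sketch of a stand-in rubygems.rb: Kernel#gem accepts any arguments,
# does nothing, and returns nil.
module Kernel
  def gem(*_args)
    nil
  end
  private :gem
end

# Sketch of a stand-in bundler.rb: Bundler.setup is likewise a no-op.
module Bundler
  def self.setup(*_args)
    nil
  end
end
```

Anything beyond these two calls (for example asking Gem for loaded specifications) has nothing to answer it, which is why code that introspects the gem environment fails.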

This is probably fixable, and I'll accept patches that allow zozo to
work with Rails, but I don't plan on working on the issue myself.  As
Rails uses a substantial amount of memory by itself, it benefits less
from zozo than more memory friendly frameworks such as Sinatra.
Posted by John Barnette (Guest)
on 2010-07-02 01:46
(Received via mailing list)
On Jul 1, 2010, at 12:54 PM, Jeremy Evans wrote:
> Rubygems is a fine package distribution system, but it is not very
> efficient from a runtime memory standpoint.  If your application uses
> rubygems in production, every time it starts, rubygems needs to figure
> out which packages to load.  zozo makes it so that this calculation is
> only done once, and the result is cached into a local directory.

I appreciate the work you've done here, but I'd also be delighted to 
hear some comments or patches to help improve RubyGems' memory 
footprint. Did you know we're up on GitHub now? If you notice any 
particularly stupid/wasteful memory stuff in RG I'd love to hear about 
it.

    http://github.com/rubygems


~ j.
Posted by Jeremy Evans (jeremyevans)
on 2010-07-02 06:34
John Barnette wrote:
> On Jul 1, 2010, at 12:54 PM, Jeremy Evans wrote:
>> Rubygems is a fine package distribution system, but it is not very
>> efficient from a runtime memory standpoint.  If your application uses
>> rubygems in production, every time it starts, rubygems needs to figure
>> out which packages to load.  zozo makes it so that this calculation is
>> only done once, and the result is cached into a local directory.
> 
> I appreciate the work you've done here, but I'd also be delighted to 
> hear some comments or patches to help improve RubyGems' memory 
> footprint. Did you know we're up on GitHub now? If you notice any 
> particularly stupid/wasteful memory stuff in RG I'd love to hear about 
> it.
> 
>     http://github.com/rubygems

I'm sorry if I implied that rubygems is wasteful with memory.  By "not 
very efficient" I meant that it uses a lot of memory compared to other 
lightweight libraries such as sequel, sinatra, and unicorn.  There 
probably is a good reason for rubygems' memory use.

If it's possible to save the ~10MB per process by doing the rubygems 
calculation once and caching the result, I definitely think it's worth 
it, especially if 10MB is a good portion of the process's memory 
footprint.

In terms of analyzing rubygems' memory use, I'd probably start with tmm1 
and ice799's memprof: http://github.com/ice799/memprof.  I haven't 
actually used it, but I've seen the presentations and I'm pretty sure it 
could tell you where rubygems is using memory.  If I had to guess, it 
has mostly to do with how much code rubygems is loading, even without 
doing anything:

$ ruby -e "system('ps ux | fgrep ruby')"
jeremy    4207  0.0  0.1  1064  2700 p8  S+     9:13PM    0:00.01 ruby -e syste
$ ruby -rubygems -e "system('ps ux | fgrep ruby')"
jeremy    5489  0.0  0.3  9200 11300 p8  S+     9:13PM    0:00.15 ruby -rubygems -e system('ps ux | fgrep ruby')

For comparison, here is how much Sequel adds:

$ ruby -I lib -r sequel -e "system('ps ux | fgrep ruby')"
jeremy    2488  0.0  0.3  6952  8952 pe  S+     9:25PM    0:00.12 ruby -I lib -r sequel -e system('ps ux | fgrep ruby')
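
The same kind of comparison can be made from inside a single Ruby process. A rough sketch, assuming a Unix-like system (it reads /proc on Linux and falls back to ps elsewhere); rss_kb and rss_growth_kb are hypothetical helper names, and the numbers will vary with Ruby version and platform:

```ruby
# Resident set size of a process in kilobytes.
# Reads /proc on Linux and falls back to ps(1) on other Unix systems.
def rss_kb(pid = Process.pid)
  status = "/proc/#{pid}/status"
  if File.readable?(status)
    File.read(status)[/^VmRSS:\s+(\d+)/, 1].to_i
  else
    `ps -o rss= -p #{pid}`.to_i
  end
end

# How much RSS changes (in KB) when a library is required.
def rss_growth_kb(library)
  before = rss_kb
  require library
  rss_kb - before
end

puts "RSS now:   #{rss_kb} KB"
puts "json adds: #{rss_growth_kb('json')} KB"
```

The result is only a rough indicator: a library that is already loaded adds nothing, and garbage collection between the two samples can shift the figure.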

When you consider that rubygems' codebase is larger than Sequel's (9648 
LOC for rubygems and 5548 for Sequel), it's not surprising that rubygems 
takes more memory.  If code size is truly the reason, the only thing you 
can do is try to reduce the amount of code you load at once, if 
possible.  Sequel does this by not loading adapters, connection pools, 
plugins, and extensions that aren't being used.  Rubygems might be able 
to do something similar, by only loading code necessary for the purpose 
(i.e. only load the code for installing gems when the user uses gem 
install).  That may cause some backwards compatibility issues, though.
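
One way to defer loading code until it is needed is Ruby's built-in autoload. The sketch below is only an illustration: MiniGems and Installer are invented names, the "rarely used" file is written to a temporary directory so the example is self-contained, and neither Sequel nor rubygems necessarily works this way (plain require calls made on demand achieve the same effect).

```ruby
require "tmpdir"

# Write the rarely needed code to its own file in a temporary directory.
LAZY_DIR = Dir.mktmpdir("lazy_demo")
INSTALLER_PATH = File.join(LAZY_DIR, "installer.rb")
File.write(INSTALLER_PATH, <<~RUBY)
  module MiniGems
    class Installer
      def install(name)
        "installing \#{name}"
      end
    end
  end
RUBY

module MiniGems
  # Register the constant; the file is not read until first reference.
  autoload :Installer, INSTALLER_PATH

  # autoload? returns the registered path while the file is still
  # unloaded, and nil once the constant has been loaded.
  def self.installer_loaded?
    autoload?(:Installer).nil?
  end
end

LOADED_BEFORE = MiniGems.installer_loaded?   # file not read yet
MESSAGE       = MiniGems::Installer.new.install("sequel")
LOADED_AFTER  = MiniGems.installer_loaded?   # first reference loaded it
puts LOADED_BEFORE, MESSAGE, LOADED_AFTER
```

A process that never touches MiniGems::Installer never pays for parsing that file, which is the effect described above for unused adapters and plugins.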

Jeremy
Posted by Roger Pack (Guest)
on 2010-07-03 00:21
(Received via mailing list)
> zozo is a tool that makes it easy to reduce the memory footprint of your
> applications by having them not load rubygems/bundler at runtime:

Fascinating.

I've been working on a rubygems replacement as well:

http://github.com/rdp/faster_rubygems

Mine replaces loading of full rubygems (+specs) with loading a cache
file listing known lib files.  Zozo looks most excellent, and you'd
think you could rip out the guts of Rails' gem loading and it would
work fine with Rails, though that might be hard :)

The only drawback I see to zozo is that it doesn't appear to catch gem
updates.  But it would work splendidly for those ok with those
restrictions, like servers :)

-r
Posted by Roger Pack (Guest)
on 2010-07-03 00:38
(Received via mailing list)
> I appreciate the work you've done here, but I'd also be delighted to hear some comments or patches to help improve RubyGems' memory footprint. Did you know we're up on GitHub now? If you notice any particularly stupid/wasteful memory stuff in RG I'd love to hear about it.

I do have some thoughts on that, as I've been experimenting lately
with speeding up rubygems.

The first thing that comes to mind is that currently rubygems always
loads *full* rubygems when all you typically need is its require
capabilities:

ex:

>> $LOADED_FEATURES.grep /rubyg/
=> ["c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/defaults.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/exceptions.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/version.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/requirement.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/dependency.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/gem_path_searcher.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/user_interaction.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/platform.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/specification.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/source_index.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/builder.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/config_file.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/command.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/command_manager.rb",
"c:/Ruby19/lib/ruby/gems/1.9.1/gems/gemcutter-0.5.0/lib/rubygems/commands/migrate.rb",
"c:/Ruby19/lib/ruby/gems/1.9.1/gems/gemcutter-0.5.0/lib/rubygems/commands/tumble.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/local_remote_options.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/remote_fetcher.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/gemcutter_utilities.rb",
"c:/Ruby19/lib/ruby/gems/1.9.1/gems/gemcutter-0.5.0/lib/rubygems/commands/webhook.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems/version_option.rb",
"c:/Ruby19/lib/ruby/gems/1.9.1/gems/gemcutter-0.5.0/lib/rubygems/commands/yank.rb",
"c:/Ruby19/lib/ruby/gems/1.9.1/gems/specific_install-0.2.3/lib/rubygems/commands/specific_install_command.rb",
"c:/Ruby19/lib/ruby/site_ruby/1.9.1/rubygems.rb"]

There's lots of unused stuff in there.

Suggestion: load a skeleton of files by default, until needed.

Beyond that, rubygems currently takes a relatively long time to load
all the spec files (especially rdoc-data gem), when in reality all it
needs to know is the gem version and which requireable files are in
the require_paths of each gem.  It can lazy load full specs when
necessary.

Suggestion: keep a cache file with all vital information in the root
of each gem path, ex:

.../gems/1.9.1/.rubygems_cache

A further optimization is to use marshal for the cache, to avoid
having to load YAML by default, etc.
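
As a rough illustration of the cache idea, the sketch below stores a small index with Marshal and answers "which gem provides this file?" without touching any gemspec. The index layout, helper names, and sample entries are assumptions made for this example; this is not the format faster_rubygems uses.

```ruby
require "tmpdir"

# Write the index (gem name => version and requireable files) to disk.
def write_cache(path, index)
  File.binwrite(path, Marshal.dump(index))
end

# Read the index back. Marshal.load should only be used on trusted files.
def read_cache(path)
  Marshal.load(File.binread(path))
end

# Name of the gem that provides a requireable file, or nil if none does.
def gem_for(index, feature)
  index.find { |_name, info| info[:files].include?(feature) }&.first
end

# Illustrative entries only; the versions and file lists are made up.
SAMPLE_INDEX = {
  "sequel"  => { version: "3.13.0", files: ["sequel", "sequel/core"] },
  "sinatra" => { version: "1.0",    files: ["sinatra", "sinatra/base"] },
}

Dir.mktmpdir do |dir|
  cache = File.join(dir, ".rubygems_cache")
  write_cache(cache, SAMPLE_INDEX)
  index = read_cache(cache)
  puts gem_for(index, "sinatra/base")
end
```

Such a cache would have to be rewritten whenever a gem is installed or removed, otherwise lookups go stale.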

I've been experimenting with this in faster_rubygems [1] and it
speeds up startup for jruby from 1.6s to 0.3s.

This avoids mounds of file stats and require's, and actually makes
ruby fast to start on windows.

I'd be happy to integrate something like this into normal rubygems, if
there's any interest...

Thanks!

-roger

[1] http://github.com/rdp/faster_rubygems
Posted by Jeremy Evans (jeremyevans)
on 2010-07-03 01:03
Roger Pack wrote:
>> zozo is a tool that makes it easy to reduce the memory footprint of your
>> applications by having them not load rubygems/bundler at runtime:
> 
> Fascinating.
> 
> I've been working on a rubygems replacement as well:
> 
> http://github.com/rdp/faster_rubygems
> 
> Mine replaces loading of full rubygems (+specs) with loading a cache
> file listing known lib files.  Zozo looks most excellent, and you'd
> think you could rip out the guts of Rails' gem loading and it would
> work fine with Rails, though that might be hard :)

I tried to get Rails to work with zozo for a few hours and gave up. 
Rails is pretty tied to rubygems, at least in 2.3.x.  It may be easier 
on Rails 3, but I haven't tried.

> The only drawback I see to zozo is that it doesn't appear to catch gem
> updates.  But it would work splendidly for those ok with those
> restrictions, like servers :)

That's correct.  My recommendation is that whenever the gems change, "rm 
-r lib bin" (or any custom lib and bin directory names), and then 
regenerate them with zozo.

Jeremy