Posix-spawn 0.3.0 -- first public release (codename, "tigers blood")

https://github.com/rtomayko/posix-spawn
$ gem install posix-spawn

tmm1 and I are pleased to announce the initial release of posix-spawn,
a small extension library that implements a subset of Ruby 1.9’s new
Process::spawn [1] in a way that takes advantage of fast process
spawning (IEEE Std 1003.1 posix_spawn(2) systems interfaces [2]) where
available and runs on all MRI Rubies >= 1.8.7.

  • Fast, constant time process spawning across a variety of platforms
  • A largish compatible subset of Ruby 1.9’s Process::spawn interface
    as well as 1.9 enhancements to Kernel#system, Kernel#`, etc. under
    Ruby >= 1.8.7.
  • High level and hopefully portable POSIX::Spawn::Child class for
    quick and dirty (but correct!) non-streaming IPC scenarios.

See the README for usage and graphs of benchmark results on Linux and
Darwin, or run them yourself:

$ uname -a
Linux aux1 2.6.26-2-xen-amd64 #1 SMP Thu Aug 20 2009 x86_64 GNU/Linux
$ ruby --version
ruby 1.8.7 (2008-08-11 patchlevel 72) [x86_64-linux]
$ gem install posix-spawn
$ posix-spawn-benchmark
benchmarking fork/exec vs. posix_spawn over 1000 runs at 100M res
                          user     system      total        real
fspawn (fork/exec):   0.080000  14.920000  38.040000 ( 39.029493)
pspawn (posix_spawn): 0.040000   0.010000   0.560000 (  0.939422)

Work on the library started when tmm1 found, through the use of his
brilliant rbtrace [3] program, a number of slow points in the GitHub
codebase where fork/exec is used heavily to spawn processes. In some
cases, a single fork() system call took >30ms, while in others it
took only ~1ms. Our test suite fork()'d especially slowly. Hmmm.

On Linux, fork(2) slows down as the parent process uses more memory
due to the need to copy page tables for COW. In many common uses of
fork(), where it is followed by one of the exec family of functions to
spawn child processes (Kernel#system, IO::popen, Process::spawn,
etc.), this overhead can be removed by using posix_spawn() or vfork()
instead.
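
To make the two call patterns concrete, here is a minimal sketch using
only Ruby's core API (Ruby 1.9 or later, on a platform with fork). The
method names run_with_fork_exec and run_with_spawn are made up for this
illustration. How Process.spawn creates the child internally is up to
the interpreter, so this shows the shape of the calls, not a guarantee
about which system call is used.

```ruby
# Classic fork/exec: the parent process is duplicated first, then the
# child replaces its image with the requested program.
def run_with_fork_exec(*argv)
  pid = fork { exec(*argv) }
  _, status = Process.wait2(pid)
  status.exitstatus
end

# Spawn-style: one call describes the child to run. The caller never
# executes Ruby code in a forked copy of itself.
def run_with_spawn(*argv)
  pid = Process.spawn(*argv)
  _, status = Process.wait2(pid)
  status.exitstatus
end

run_with_fork_exec("true")           # returns 0
run_with_spawn("sh", "-c", "exit 3") # returns 3
```

Both helpers return the child's exit status, so code written against
the second form can switch to a faster spawner without changing its
callers.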

After implementing a simple fast process spawner extension using
posix_spawn() and gaining some familiarity with the posix_spawn family
of C functions, we noticed that it could potentially be used to
implement a large subset of features provided by Ruby 1.9’s
Process::spawn.

We love Process::spawn.

We love Process::spawn so much in fact that over the past few months,
even before surfacing any of the issues with Linux fork() slowness, an
effort had been underway at GitHub to move two key libraries (Grit,
the Ruby interface to Git, and Albino, a Ruby wrapper around the
excellent Pygments syntax highlighter) to use Process::spawn
compatible method invocations (implemented with fork/exec under
Ruby 1.8.7) so that we could take advantage of Process::spawn under
Ruby 1.9.

Once we had a basic Process::spawn interface implemented on top of
posix_spawn(), we were able to take some higher level utility classes
from this work on the Grit and Albino projects and include them in
posix-spawn as a nice POSIX::Spawn::Child class. It is:

  • Simple, requiring little code for simple stream input and capture
  • Internally non-blocking (uses select(2)), so it handles all pipe
    hang cases due to exceeding PIPE_BUF limits on one or more streams
  • Potentially portable, due to the abstraction over lower-level
    process and stream management APIs
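
The non-blocking point is easiest to see in code. What follows is a
rough sketch of the select(2) technique written against Ruby's core
Process.spawn (1.9 or later). It is not the POSIX::Spawn::Child source,
and the method name capture and the 4096-byte read size are choices
made for this example only.

```ruby
# Feed `input` to a command and collect its stdout and stderr without
# hanging when any of the three pipes fills up.
def capture(input, *argv)
  in_r,  in_w  = IO.pipe
  out_r, out_w = IO.pipe
  err_r, err_w = IO.pipe

  pid = Process.spawn(*argv, :in => in_r, :out => out_w, :err => err_w)
  # The parent keeps only its own ends of the pipes.
  [in_r, out_w, err_w].each { |io| io.close }

  buffers = { out_r => String.new, err_r => String.new }
  pending = input.dup
  readers = [out_r, err_r]
  writers = [in_w]
  if pending.empty?
    in_w.close
    writers.clear
  end

  until readers.empty? && writers.empty?
    ready_r, ready_w = IO.select(readers, writers)

    ready_w.each do |io|
      begin
        written = io.write_nonblock(pending)
        pending = pending.byteslice(written..-1)
      rescue IO::WaitWritable
        # Pipe is full right now; select again.
      rescue Errno::EPIPE
        pending = ""  # Child closed its stdin early.
      end
      if pending.empty?
        io.close      # Signals EOF on the child's stdin.
        writers.delete(io)
      end
    end

    ready_r.each do |io|
      begin
        buffers[io] << io.read_nonblock(4096)
      rescue IO::WaitReadable
        # Nothing to read after all; select again.
      rescue EOFError
        io.close
        readers.delete(io)
      end
    end
  end

  _, status = Process.wait2(pid)
  [buffers[out_r], buffers[err_r], status]
end

out, err, status = capture("hello", "cat")
```

Writing all the input first and reading afterwards would hang as soon
as the child filled its output pipe while the parent was still blocked
writing; interleaving reads and writes through select avoids that.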

We hope to now remove large bodies of Ruby 1.8.7 spawn emulation code
and replace it with posix-spawn.

As the project continued to take shape, we noticed how much more
feature-rich the Kernel#system, IO.popen, etc. methods were in Ruby
1.9. Having been built on the foundation of the new Process::spawn,
they allow for setting up the child’s environment, redirecting
arbitrary fds, and all the other great stuff in Process::spawn. We
were able to write Ruby 1.8.7 compatible subset implementations of
those as well and put them under the POSIX::Spawn module.

Now, about that subset. As of this initial release, we were able to
implement the following arguments and options to spawn:

clearing environment variables:
[string]                  : redir w/ open(string, File::RDONLY)
[string, open_mode]       : redir w/ open(string, open_mode, 0644)
[string, open_mode, perm] : redir w/ open(string, open_mode, perm)
FD is one of the following:
  :in     : the fd 0, which is the standard input
  :out    : the fd 1, which is the standard output
  :err    : the fd 2, which is the standard error
  integer : the fd specified by the integer
  io      : the fd specified as io.fileno
current directory:
  :chdir => str
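
For readers who have not used these options, here is a short sketch of
how an env hash, two of the redirection forms and :chdir combine in a
single call. It uses Ruby 1.9's own Process.spawn so it runs without
the gem; that posix-spawn accepts the identical call is the goal stated
in this announcement, not something the sketch verifies. The method
name spawn_with_options, the GREETING variable and the out.log file
name are invented for the example.

```ruby
require "tmpdir"

# Run a shell command in `dir` with an extra environment variable,
# stdin taken from /dev/null and stdout written to a log file.
# Returns the contents of the log file.
def spawn_with_options(dir)
  log = File.join(dir, "out.log")
  pid = Process.spawn(
    { "GREETING" => "hello" },                   # env: name => val
    "sh", "-c", 'echo "$GREETING from $(pwd)"',  # cmdname, arg1, ...
    :chdir => dir,                               # current directory
    :in    => ["/dev/null"],                     # [string] form
    :out   => [log, File::WRONLY | File::CREAT | File::TRUNC, 0644]
  )                                              # [string, open_mode, perm]
  Process.wait(pid)
  File.read(log)
end

Dir.mktmpdir { |dir| puts spawn_with_options(dir) }
```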

We have NOT yet implemented these options:

redirection:
  value:
    [:child, FD] : redirect to the redirected fd
file descriptor inheritance: close non-redir non-standard fds > 3
  :close_others => false : inherit fds (default for system and exec)
  :close_others => true  : no inherit (default for spawn and popen)
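
As a reminder of what the [:child, FD] form does, here is a small
sketch against Ruby 1.9's built-in Process.spawn (not posix-spawn,
which does not support the form as of this release). The method name
merged_output is made up for the example.

```ruby
# Capture stdout and stderr of a command through a single pipe.
# :err => [:child, :out] points the child's stderr at whatever the
# child's stdout has been redirected to, which is the pipe here.
def merged_output(*argv)
  r, w = IO.pipe
  pid = Process.spawn(*argv, :out => w, :err => [:child, :out])
  w.close            # Parent must drop its write end to see EOF.
  data = r.read
  r.close
  Process.wait(pid)
  data
end

merged_output("sh", "-c", "echo out; echo err >&2")  # returns "out\nerr\n"
```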

We have ideas for some of these (:pgroup, :umask, [:child, FD]) and
may implement them in future releases; others, like :rlimit, are not
supported by posix_spawn() and have no clear implementation strategy
outside of falling back to fork/exec when detected.

[0] GitHub - rtomayko/posix-spawn: Ruby process spawning library
[1] module Process - RDoc Documentation
[2] posix_spawn
[3] GitHub - tmm1/rbtrace: like strace, but for ruby code

Ryan T.
Aman G.

Did you know about the “spoon” gem? It’s a very simple binding of
posix_spawn via FFI that works fine in JRuby too. It would sure be
nice if this could be an FFI solution, so it would work without a C
extension.

FWIW, GitHub - headius/spoon: A fork/exec replacement for FFI-capable implementations

On Fri, Mar 4, 2011 at 11:08 PM, Charles Oliver N. wrote:

> Did you know about the “spoon” gem? It’s a very simple binding of
> posix_spawn via FFI that works fine in JRuby too. It would sure be
> nice if this could be an FFI solution, so it would work without a C
> extension.

Yes. I experimented with using spoon for JRuby support in Grit some
time ago. Unfortunately, a large number of posix_spawn() features
require real OS file descriptors and there was no standard/supported
way of retrieving them for most standard Java stream types.

That issue aside, I’m not opposed to using FFI so long as the
performance profile is on par with the C extension. We certainly plan
on supporting JRuby in some way down the road. I doubt the
Process::spawn interface can be fully implemented without real fds,
but we’d like the higher level POSIX::Spawn::Child class to be JRuby
compatible.

Ryan

On Sat, Mar 5, 2011 at 12:07 AM, Ryan T. wrote:

> Yes. I experimented with using spoon for JRuby support in Grit some
> time ago. Unfortunately, a large number of posix_spawn() features
> require real OS file descriptors and there was no standard/supported
> way of retrieving them for most standard Java stream types.

Turns out I had some of that work lying around:

https://github.com/rtomayko/spoon/compare/spawn_file_actions

I was able to make posix_spawn_file_actions_adddup2() and
posix_spawn_file_actions_addclose() calls but running the test fails
with:

$ ruby lib/spoon.rb
_posix_spawn_file_actions_init => 0
/bin/echo: write: Bad file descriptor
read:

Under Ruby 1.9.2 with the ffi gem, I get:

$ ruby lib/spoon.rb
_posix_spawn_file_actions_init => 0
read: hello world

If you can describe a method for retrieving real fds under JRuby
(especially IO objects returned from open() and IO.pipe()), I don’t
see any reason why we couldn’t have an FFI based implementation with
only a little more work.

Ryan

On Sat, Mar 5, 2011 at 4:16 AM, Ryan T. wrote:

> I was able to make posix_spawn_file_actions_adddup2() and
> posix_spawn_file_actions_addclose() calls but running the test fails
> with:
>
> $ ruby lib/spoon.rb
> _posix_spawn_file_actions_init => 0
> /bin/echo: write: Bad file descriptor
> read:
>
> If you can describe a method for retrieving real fds under JRuby
> (especially IO objects returned from open() and IO.pipe()), I don’t
> see any reason why we couldn’t have an FFI based implementation with
> only a little more work.

As you discovered, in JRuby IO#fileno is simulated. This is mostly
because the JVM does not provide us a way to work with file
descriptors directly; everything is wrapped in a Channel or an
Input/OutputStream, and determining the real fd (if one exists) is
tricky.

However, this is a feature many people have asked for, since many
native calls need a real file descriptor. I’ll look into it and see if
there’s a way we can provide real file descriptor values somehow.

- Charlie