Ruby bounties--list of bounties

On Jan 25, 2010, at 1:12 PM, Mike D. wrote:

library is available and you must use that library, FFI is the only
But yes, at the end of the day, I believe writing stuff in a portable
choose one of these options:

  1. Support nearly everyone by maintaining two ports of your code: FFI for
    JRuby; C for MRI, Rubinius and MacRuby. Don’t support GAE.
  2. Support everyone by maintaining two ports of your code: JVM for JRuby and
    GAE; C for MRI, Rubinius and MacRuby.
  3. Maintain only a single port, FFI, and force everyone on MRI to take a
    performance hit of some kind. Oh, and don’t support Rubinius, MacRuby or
    GAE.
  4. Don’t support JRuby or GAE. Just write it in C.
  5. Don’t support MRI, Rubinius, or MacRuby. Just write it for the JVM.

FFI originated with rubinius, so I would wager that it will work once
the FFI APIs get synched up again. Also, MacRuby has FFI support on its
roadmap. That changes your picture a bit.

cr

On Mon, Jan 25, 2010 at 1:12 PM, Charles Oliver N.
[email protected] wrote:

platform. FFI > C bindings, but [platform-independent binary] > FFI.
system. You ought to know that already…would I be working on JRuby
if I believed any differently? :)

I agree with everything you’re saying, more or less.

However, none of that relates at all to what I think is the crux of the
issue, which is that everyone writing a non-pure-Ruby gem today is forced to
choose one of these options:

  1. Support nearly everyone by maintaining two ports of your code: FFI for
    JRuby; C for MRI, Rubinius and MacRuby. Don’t support GAE.
  2. Support everyone by maintaining two ports of your code: JVM for JRuby and
    GAE; C for MRI, Rubinius and MacRuby.
  3. Maintain only a single port, FFI, and force everyone on MRI to take a
    performance hit of some kind. Oh, and don’t support Rubinius, MacRuby or
    GAE.
  4. Don’t support JRuby or GAE. Just write it in C.
  5. Don’t support MRI, Rubinius, or MacRuby. Just write it for the JVM.

Complicated? Yes. I’ve summed it all up in a nice matrix here:

I personally think these choices all suck, and I refuse to paint a happy
face on any of them.

We chose option 1 for Nokogiri (you’re welcome, intarnets), but everyone
who’s writing a gem today has to make this decision for themselves.

My point is that any of these choices contains a tradeoff, and stating that
one in particular “hurts” people more than another is just disingenuous. I’d
rather help people understand the tradeoffs.

On Mon, Jan 25, 2010 at 2:17 PM, Chuck R. [email protected]
wrote:

libraries in Java; you could also use Scala or Fan or similar
I agree with everything you’re saying, more or less.
GAE; C for MRI, Rubinius and MacRuby.

  3. Maintain only a single port, FFI, and force everyone on MRI to take a
    performance hit of some kind. Oh, and don’t support Rubinius, MacRuby or
    GAE.
  4. Don’t support JRuby or GAE. Just write it in C.
  5. Don’t support MRI, Rubinius, or MacRuby. Just write it for the JVM.

FFI originated with rubinius, so I would wager that it will work once the
FFI APIs get synched up again. Also, MacRuby has FFI support on its roadmap.
That changes your picture a bit.

If you’re interested in helping out in standardizing the FFI specs, please
subscribe to the ruby-ffi list and offer to help out! We’re always looking
for extra hands, because the specs are not in good shape right now. So I’m
likely to take your wager. ;)

I stand by the chart as an accurate reflection of the options that
developers are forced to choose from today and for the likely near future.

On Tue, Jan 26, 2010 at 04:17:32AM +0900, Chuck R. wrote:

FFI is much better than writing any C code at all, due to the
languages, and it would be just as portable (albeit a bit larger due

GAE.
4) Don’t support JRuby or GAE. Just write it in C.
5) Don’t support MRI, Rubinius, or MacRuby. Just write it for the JVM.

FFI originated with rubinius, so I would wager that it will work once the FFI APIs get synched up again. Also, MacRuby has FFI support on its roadmap. That changes your picture a bit.

Rubinius implements enough of the MRI C API that it will run Nokogiri
today. MacRuby will follow suit, and I expect that to happen sooner
than it supports FFI (though this is conjecture). With minor tweaks to
your C code, you can have a native extension that runs on all three today.

On Jan 25, 2010, at 1:34 PM, Mike D. wrote:

likely to take your wager. ;)

I stand by the chart as an accurate reflection of the options that
developers are forced to choose from today and for the likely near future.

While it may be true that some C extensions work with rubinius and
MacRuby today, I’d say it doesn’t matter much in the long term.

For one, Rubinius does not support the entire MRI C API nor will it
ever. Extensions that directly access memory structures are not
supported. FFI is a better long-term choice for Rubinius.

MacRuby is months away from catching up to Rubinius, JRuby or IronRuby
for handling straight ruby code. I don’t mean to disparage MacRuby (it
will likely be my go-to-guy for future Cocoa apps) but it ain’t ready
for prime time for ruby code let alone hooking in C extensions. And
like Rubinius, it won’t support all of the MRI C API.

IronRuby does not support any C extensions though it’s on the roadmap. I
don’t know for certain how extensive their support will be, but I will
wager they’ll avoid supporting the same elements that Rubinius and
MacRuby are avoiding. :)

So for the likely near future (next 6 months), Rubinius is the only one
that might be able to run a random C extension (as long as it doesn’t
use unsafe direct access to memory structures).

I understand what you are saying, truly I do. But I disagree that it is
important to continue building extensions using the C API for the long
term. The best way to get FFI firmed up and ready for prime-time is to
port existing extensions to it.

cr

On 25 Jan 2010, at 19:12, Mike D. wrote:

Complicated? Yes. I’ve summed it all up in a nice matrix here:
all_these_choices_suck.txt · GitHub

I personally think these choices all suck, and I refuse to paint a happy
face on any of them.

I have to agree, which is why I mostly seem to end up describing how to
break things via dynamic loading - although I’ll admit it’s also a lot
of fun :)

Frankly though there is no general case solution which can satisfy all
of the needs of both the Java/Enterprise world and C hackers. Every time
we make the choice to use a third-party library written in anything
other than Ruby as a core dependency of our projects we tie ourselves to
a specific runtime environment as surely as if we were relying on some
custom assembler code and that’s just something to accept and move on.

It’s maddening, but it’s a fact that programmers the world over already
live with on a daily basis.

Last year I spent a fair chunk of time giving lightweight lectures about
Unix abuse from Ruby for those new to the hobby. Many of the techniques
I was keen to demonstrate either won’t work on other platforms or do so
unstably, but so what? If I’m writing for a Windows box I already know
that and I’ll design things differently.

The same principle applies to JRuby. It can run arbitrary C libraries
via FFI if they’re present on the target platform but if they’re not
it’s exactly as stymied as MRI or Rubinius or MacRuby would be in the
same situation. Runtime environment is more than just processor
architecture or operating system and not to take account of that in
deployed code is the fault of the programmer concerned not the team who
developed the runtime implementation.

Now I’ve often facetiously suggested in this list that all our code
should be developed in Ruby. The main reason I suggest that is that we
often rush to utilise code in other languages without considering its
real as opposed to perceived cost, not only in terms of development
effort and runtime performance but also of longterm maintenance.

Synthetic benchmarks tell us sweet FA about real world performance of
code, architecture being a much more significant consideration than the
proportion of raw MIPS a given language will deliver on a given
platform. The average netbook could happily run all of Teller’s fusion
bomb models along with the full telemetry analysis of all the Apollo
missions in the pauses between loading XKCD comics and binning junk mail
without the user being any the wiser.

But architecture is also the primary determinant of how maintainable a
given application will be and whether it’ll scale to suit future needs.

The main reason we’re not using Ruby for everything is that the
architecture of the reference implementation is a relatively poor match
for the underlying hardware on which our programs run and so a lot of
translation work is being handled automagically (and inefficiently).
Rather than wasting our time arguing over defects we can’t fix (such as
not all platforms having access to a given native library) we should be
fixing that core deficit and developing Ruby runtimes that unlock the
level of performance we want from our language. Then more and more
libraries will deliver high performance in pure Ruby and runtime library
issues should become irrelevant.

So far I see most of the work capable of delivering this (such as a
decent abstract Intermediate Language for peep-hole optimisation) coming
from the JRuby team. If the rest of us poured a fraction of the effort
into similar efforts for MRI and other implementations that’s expended
on making [FFI|DL|C] API wrappers of existing C libraries then Ruby may
stop being the slow relative of Python and start to compete as what it’s
fully capable of being - a systems language.

I have several long rants on this subject that I’ll spare anyone who’s
not stuck in a bar with me (and is willing to keep the beer flowing, you
know who you are lol) but at the very least Ruby needs: a parallelised
library implementation to seamlessly (i.e. without programmer
intervention) exploit multicore hardware and multithreaded operating
systems; ‘unsafe’ access to raw memory and kernel event mechanisms for
higher-performance data structures and IO; and a register-based and
JIT-friendly virtual machine so runtime code can be translated to
efficient machine code.

These are the basic architectural building blocks that would make the
need to rely on libraries in C, Java or any other language much rarer.

Ellie

Eleanor McHugh
Games With Brains
http://slides.games-with-brains.net

raise ArgumentError unless @reality.responds_to? :reason

On Jan 25, 2010, at 2:56 PM, Aaron P. wrote:

People keep saying that FFI is the better way to go, but as someone who
has to support both an FFI version and a C version, I can tell you the
support / development problems with FFI are much more difficult.

  1. I have no direct experience using FFI, so my opinions should carry
    the appropriate weight. I defer to your real-world experience.

  2. I’m not much of a bikeshedder.

I agree with Eleanor. Let’s fix the performance deficiencies in the
runtimes and write more code in ruby.

cr

On Tue, Jan 26, 2010 at 04:52:13AM +0900, Chuck R. wrote:

If you’re interested in helping out in standardizing the FFI specs, please
subscribe to the ruby-ffi list and offer to help out! We’re always looking
for extra hands, because the specs are not in good shape right now. So I’m
likely to take your wager. ;)

I stand by the chart as an accurate reflection of the options that
developers are forced to choose from today and for the likely near future.

While it may be true that some C extensions work with rubinius and MacRuby today, I’d say it doesn’t matter much in the long term.

For one, Rubinius does not support the entire MRI C API nor will it ever. Extensions that directly access memory structures are not supported. FFI is a better long-term choice for Rubinius.

It doesn’t need to support the entire API. It supports enough of the C
API to get nokogiri running, and believe me, we use a lot of the C
API. Why pay the FFI speed penalty when you can write C code that works
cross implementation?

MacRuby is months away from catching up to Rubinius, JRuby or IronRuby for handling straight ruby code. I don’t mean to disparage MacRuby (it will likely be my go-to-guy for future Cocoa apps) but it ain’t ready for prime time for ruby code let alone hooking in C extensions. And like Rubinius, it won’t support all of the MRI C API.

Again, it doesn’t need to support the entire C API.

IronRuby does not support any C extensions though it’s on the roadmap. I don’t know for certain how extensive their support will be, but I will wager they’ll avoid supporting the same elements that Rubinius and MacRuby are avoiding. :)

So for the likely near future (next 6 months), Rubinius is the only one that might be able to run a random C extension (as long as it doesn’t use unsafe direct access to memory structures).

I understand what you are saying, truly I do. But I disagree that it is important to continue building extensions using the C API for the long term. The best way to get FFI firmed up and ready for prime-time is to port existing extensions to it.

As I pointed out in an earlier email, dealing with FFI wrapped libraries is
error prone, difficult to debug (not just during development, but also when
helping people get things installed), doesn’t work cross implementation,
requires id2ref (the bane of Charlie’s existence. I’m sorry. :( ), etc. I
even have real world examples of all of the issues I pointed out.

Even if FFI were the cross implementation messiah it’s supposed to be,
our FFI applications will still not work on GAE or Android. Rubinius
has already proved that you can implement a subset of the C API and
get complex extensions to work. Why can’t we run with that? I think it
would be a better long term solution. We would get the same “cross
implementation” behavior as FFI, but not have to pay FFI’s runtime
conversion penalties. We also get the ability to do compile time checks
of C library functionality (i.e. check for #defines, function existence,
etc).

People keep saying that FFI is the better way to go, but as someone who
has to support both an FFI version and a C version, I can tell you the
support / development problems with FFI are much more difficult.

On Mon, Jan 25, 2010 at 8:12 PM, Mike D. [email protected]
wrote:

  3. Maintain only a single port, FFI, and force everyone on MRI to take a

We chose option 1 for Nokogiri (you’re welcome, intarnets), but everyone
who’s writing a gem today has to make this decision for themselves.

My point is that any of these choices contains a tradeoff, and stating that
one in particular “hurts” people more than another is just disingenuous. I’d
rather help people understand the tradeoffs.

Yeah, I agree all the choices have various levels of suck. Being a JVM
guy I’d love to just tell everyone to “write it in Java”, since
there’s practically no cross-platform challenges in that case (and
don’t anyone start telling me about how bad some Swing app is at
working across platforms; you’re digging in the wrong place and the
JVM has a stellar cross-platform record when it comes to plain old
libraries). But that obviously doesn’t solve the larger problem of
writing extensions or binding libraries in ways that all Ruby
implementations can support.

I’m nothing if I’m not pragmatic. I fully recognize that FFI is a real
pain in the ass to wire up for anything nontrivial, especially if you
have the issues Aaron talked about with struct layout and memory
management, and I sympathize. I’m also extremely grateful to all the
library authors who have swallowed that pill in order to support
JRuby. We’re ready and willing to find ways to support extension
writers better, be it through ffi-inliner, a safe C API subset, or
simply helping to find maintainers for JVM-based (i.e. no native code)
ports of key libraries.

And to Aaron: I do apologize for being so gruff about id2ref. We’d had
it disabled on master for several months without any reports of
trouble; Nokogiri just ended up being the first lucky customer.
Hopefully you’ve been able to find a better way, like maintaining your
own table or using the WeakHash that Evan mocked up. If not, I stand
ready to help find another solution.
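
As an illustration of the "maintain your own table" alternative to id2ref, here is a minimal pure-Ruby sketch. The `HandleTable` class and its method names are hypothetical and are not taken from Nokogiri or JRuby. The assumption is that the native side stores a small integer handle in its void pointer, and the Ruby side maps that handle back to the object explicitly instead of calling `ObjectSpace._id2ref`.

```ruby
# Hypothetical sketch: an explicit handle table used in place of
# ObjectSpace._id2ref. Not Nokogiri's or JRuby's actual code.
class HandleTable
  def initialize
    @objects = {}
    @next_handle = 1 # 0 is never handed out, so it can stand in for NULL
  end

  # Store obj and return an integer that fits in a void pointer.
  def register(obj)
    handle = @next_handle
    @next_handle += 1
    @objects[handle] = obj
    handle
  end

  # Look the object up again; raises KeyError for unknown or released handles.
  def fetch(handle)
    @objects.fetch(handle)
  end

  # Drop the strong reference so the object can be garbage collected.
  def release(handle)
    @objects.delete(handle)
  end

  def size
    @objects.size
  end
end

table = HandleTable.new
node = Object.new
handle = table.register(node)
table.fetch(handle).equal?(node) # the very same object comes back
table.release(handle)
```

The tradeoff in this sketch is that the table holds strong references, so every `register` needs a matching `release`; a weak-reference table would avoid that bookkeeping at the cost of depending on the implementation's weak-reference support.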

- Charlie

On Mon, Jan 25, 2010 at 7:53 PM, Aaron P.
[email protected] wrote:

References please.

Last I checked, it was just as easy to segv from an FFI library as a C
library. Plus with FFI you don’t get any benefits of compile time
checks. You can’t, for example, check for #define constants.

Code you don’t write can’t cause a segfault. FFI allows you to write
less C, and from my experience the more C code you write the more
likely you are to blow something up. FFI certainly doesn’t protect you
from other possible segfaults, like calling into libraries incorrectly
or defining bad struct sizes or mismanaging memory, but it is at least
less C code to write and maintain.

I will grant there’s a lot of up-front cost required (currently) that
may make it no easier than maintaining all that C code.

With FFI you must:

  1. Duplicate header files (see below for more problems)
  2. Understand struct layouts and the sizeof() for each member
  3. Do runtime checking of library features
  4. Worry about weak ref maps when using void pointers (see the id2ref
    problem in nokogiri)
  5. Pay a runtime conversion price from ruby data types to FFI types
  6. Educate users on LD_LIBRARY_PATH
  7. Worry about 32bit and 64bit issues (like Tony mentioned)

Yeah, I will admit there’s more hassle using FFI than there should be.
I don’t know how to address that, but projects like ffi-inliner seem
to be a step in the right direction. FFI-inliner basically allows you
to have some embedded C code in your FFI-consuming library that it
then compiles and links in via FFI. That allows you to get the
compile-time tooling you want for wrangling nontrivial structs while
still supporting any implementation that supports FFI. You lose the
ability to run on platforms without a compiler available (though it
does some wrangling with tcc, I believe), but it may be a good happy
medium. What do you think?

I don’t want to give the impression that you shouldn’t use C tooling
to call a C library, or even that nobody should ever write C code. I
just believe that writing C code that depends on MRI’s C API is a
dead end.

Unfortunately, none of the problems I’ve just listed off are
theoretical. I have personally run into every one of them and can
provide you with real world examples. FFI is awesome for certain,
confined, small, stable use cases. I use FFI, and I enjoy it. But
saying that it’s “the only logical choice” seems wrong.

I’ll restate it: using mechanisms for binding C libraries that don’t
depend on MRI’s C API is the only logical choice. FFI certainly isn’t
perfect, but it’s the best option for doing that right now.

I am curious what your experience has been, and why you haven’t run into the
same problems? How do other people overcome these issues?

We certainly have run into some of those issues, most notably when
trying to support “stat” calls from JRuby across all platforms. Our
only option has been to rewire the struct and call for each platform
we intend to run on. It sucks, I agree. But we support stat on all
those platforms out of a single JRuby distribution without a recompile
being necessary. That’s pretty cool.
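
A rough sketch of that "rewire it per platform, ship one distribution" approach, in plain Ruby: the platform is detected at load time and the matching binding is looked up in a table. This is not JRuby's actual code. `platform_key`, `STAT_BINDINGS`, and the placeholder symbols are hypothetical, and the real per-platform struct layouts are deliberately left out.

```ruby
require "rbconfig"

# Hypothetical sketch: choose a platform-specific binding when the library loads.
def platform_key(host_os = RbConfig::CONFIG["host_os"])
  case host_os
  when /linux/i              then :linux
  when /darwin/i             then :darwin
  when /mswin|mingw|cygwin/i then :windows
  when /bsd/i                then :bsd
  when /solaris|sunos/i      then :solaris
  else :unknown
  end
end

# Placeholder registry. In a real binding each value would describe that
# platform's struct layout and function signature; these symbols are stand-ins.
STAT_BINDINGS = {
  linux:   :linux_stat_binding,
  darwin:  :darwin_stat_binding,
  windows: :windows_stat_binding,
  bsd:     :bsd_stat_binding,
  solaris: :solaris_stat_binding
}.freeze

# Fail loudly on a platform nobody has wired up yet.
def stat_binding_for(key = platform_key)
  STAT_BINDINGS.fetch(key) do
    raise NotImplementedError, "no stat binding wired up for #{key}"
  end
end

puts "detected platform: #{platform_key}"
```

The cost is one hand-maintained entry per platform; the benefit is that the decision happens at run time, so nothing needs recompiling on the target machine.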

- Charlie

On Mon, Jan 25, 2010 at 5:36 AM, Eleanor McHugh <
[email protected]> wrote:

I implore Ruby developers to write in Pure Ruby and demand all these Ruby
Implementors solve their “performance” problems ;p

The problem with this attitude is that you eschew some great, robust
libraries that are already out there that solve complex problems. Parsing
XML is a bitch. Fortunately, there are already some great libraries to do
this. There’s the libxml2 library, which Nokogiri uses, and Java ships with
some great XML libraries too.

Will we ever see a pure Ruby library as robust and powerful as these (all
performance considerations aside)? REXML certainly isn’t there yet. Is it
really worth writing a library in pure Ruby when robust libraries already
exist that Ruby can tap into?

On Mon, Jan 25, 2010 at 9:56 PM, Aaron P.
[email protected] wrote:

On Tue, Jan 26, 2010 at 04:52:13AM +0900, Chuck R. wrote:

For one, Rubinius does not support the entire MRI C API nor will it ever. Extensions that directly access memory structures are not supported. FFI is a better long-term choice for Rubinius.

It doesn’t need to support the entire API. It supports enough of the C
API to get nokogiri running, and believe me, we use a lot of the C
API. Why pay the FFI speed penalty when you can write C code that works
cross implementation?

I’d like to understand how much of a speed penalty we actually pay
using FFI. It’s worth pointing out that Rubinius has had to implement
some pretty nasty (as in tricky, difficult, and potentially a lot
slower than MRI’s “raw” memory access) logic in order to support their
current subset of the MRI C API. They’ve chosen to try to support APIs
I would never dream of like RARRAY and other direct pointer access,
and in many cases they have to do it by copying around a lot more data
than MRI does. And that’s life, sucky though it is, if you want to
support enough of the C API to run real-world extensions right now.
I’m sure Evan can describe how they handle those APIs better than I
can.

I do believe there’s a subset of APIs that could be supported across
implementations without a major perf penalty if these points (and
probably others) were addressed:

  • No direct access to object internals without explicitly copying in
    and out yourself (i.e. you have to opt-in to the copying penalty)
  • Additional APIs to make object access and manipulation easier (like
    APIs for copying or doing bulk writes into array contents)
  • Additional APIs for lifecycle management (hard and weak references
    and functions for acquiring and releasing such references)
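
To make the three points above a little more tangible, here is a toy model written in Ruby rather than C. It is not any implementation's real API: `OpaqueArrayHandle` and its methods are hypothetical and only model hard references. The extension side never sees the live storage; it opts in to a copy, writes in bulk, and manages the handle's lifetime explicitly.

```ruby
# Hypothetical model of the "safe subset" idea; not a real C API.
class OpaqueArrayHandle
  def initialize(array)
    @array = array
    @refs = 1 # one hard reference, held by whoever created the handle
  end

  # Explicit, opt-in copy: the caller gets a snapshot, never the live storage.
  def copy_out
    live!
    @array.dup
  end

  # Bulk write in a single call instead of poking at internals one slot at a time.
  def bulk_write(start, values)
    live!
    @array[start, values.length] = values
    self
  end

  # Take an additional hard reference.
  def acquire
    live!
    @refs += 1
    self
  end

  # Give a reference back; the handle becomes unusable once the count hits zero.
  def release
    live!
    @refs -= 1
    @array = nil if @refs.zero?
    self
  end

  private

  def live!
    raise "handle already released" if @refs.zero?
  end
end

ruby_array = [1, 2, 3, 4]
handle = OpaqueArrayHandle.new(ruby_array)
snapshot = handle.copy_out    # a copy; mutating it leaves ruby_array alone
handle.bulk_write(1, [20, 30]) # ruby_array is now [1, 20, 30, 4]
handle.release
```

Because nothing outside the handle holds a raw pointer, a runtime with a moving garbage collector would be free to relocate the underlying array between calls, which is the property the bullets are after.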

I’d love to hear from the other implementers about what they think
they’d be able to support of the C API.

The example set by JNI might help us figure out the safe subset and
enhancements needed. JNI, for all its warts, does a very good job of
isolating native code from JVM internals. You can’t get direct
pointers to anything, you need to manage reference lifecycles
appropriately, you need to copy data in and out yourself if the object
accessor functions don’t do what you need. It’s not a pretty API,
granted, but in the 15 years the JVM has been mainstream that API has
changed very little.

Even if FFI were the cross implementation messiah it’s supposed to be,
our FFI applications will still not work on GAE or Android. Rubinius
has already proved that you can implement a subset of the C API and
get complex extensions to work. Why can’t we run with that? I think it
would be a better long term solution. We would get the same “cross
implementation” behavior as FFI, but not have to pay FFI’s runtime
conversion penalties. We also get the ability to do compile time checks
of C library functionality (i.e. check for #defines, function existence, etc).

I’ll say it again: The Rubinius folks have done an admirable job of
implementing the large subset that they do. And given the target
audience for Rubinius, they may not have any other choice. But there’s
some pretty large tradeoffs required to get that subset
working…tradeoffs that in some cases might make binding to the C API
a lot slower than using something like FFI. It has also required a
herculean effort to support that subset given the (good) design
choices Evan made (like having accurate GC that moves objects around
in memory). Expecting all implementations to put in that effort is
pretty close to absurdity; consider that JRuby only recently really
started to feel “compatible” enough that we don’t spend every day, all
day fixing Ruby core class bugs.

JRuby has had a continuous stream of about 3.5 bug reports per day,
every day, for over three years…and out of the 4500-some filed bugs,
we manage to keep our unresolved count around 500. That has required
fulltime effort from at least two of us (Tom Enebo and I) and
part-time help from dozens of contributors. The benefits of supporting
a C API subset just don’t warrant the effort we would personally have
to put in and the sacrifices that would result. We need help. :(

- Charlie

On 26 Jan 2010, at 08:17, Tony A. wrote:

this. There’s the libxml2 library, which Nokogiri uses, and Java ships with
some great XML libraries too.

Don’t get me wrong, I enjoy low-level munging as much as the next
hacker. But given the choice between scripting libraries written in C
and having Ruby performance comparable to C I’d take the latter every
time.

Will we ever see a pure Ruby library as robust and powerful as these (all
performance considerations aside)? REXML certainly isn’t there yet. Is it
really worth writing a library in pure Ruby when robust libraries already
exist that Ruby can tap into?

The same argument applies to anything new. Why replace something which
appears perfectly suited to a given task with a new, shiny, probably
flawed and ill-conceived alternative? Because that’s how we get better
tools than the ones we currently have and are able to tackle new tasks
that our existing understanding fails to even identify.

Ellie

Eleanor McHugh
Games With Brains
http://slides.games-with-brains.net

raise ArgumentError unless @reality.responds_to? :reason

On Tue, Jan 26, 2010 at 9:17 AM, Tony A. [email protected] wrote:

The problem with this attitude is that you eschew some great, robust
libraries that are already out there that solve complex problems. Parsing
XML is a bitch. Fortunately, there are already some great libraries to do
this. There’s the libxml2 library, which Nokogiri uses, and Java ships with
some great XML libraries too.

Will we ever see a pure Ruby library as robust and powerful as these (all
performance considerations aside)? REXML certainly isn’t there yet. Is it
really worth writing a library in pure Ruby when robust libraries already
exist that Ruby can tap into?

It's probably also worth pointing out that various folks in the Ruby community have continually panned anyone having any association with Java. Hell, at my first ever JRuby talk in San Diego, I was openly mocked by other presenters. And I still see other Java/JVM users get the same treatment on these lists, at conference talks, and in the hallway track. Apparently MINASWAN doesn't apply to folks using Java or the JVM. :(

Unfortunately, it’s exactly those Java folks that could help
accelerate Ruby adoption and help maintain Java/JVM versions of key
native libraries like Nokogiri or RMagick. If we did more to embrace
JVM users, rather than insulting them for using a different tool,
maybe extension writers would have more help supporting JRuby (and the
same goes for other managed runtimes like .NET/CLR).

The Ruby world shouldn’t be a “C hackers only” club. Native extensions
tend to make it so.

Rant aside…I really do want to make it easier to support JRuby,
regardless of whether folks need C or Java. Tell me what needs to be
done and help me find resources to do it :)

- Charlie