Ruby and science?

On Mon, Dec 20, 2010 at 3:34 PM, Ryan D. [email protected] wrote:

On Dec 20, 2010, at 11:51 , Charles Oliver N. wrote:

The JVM and the JDK APIs suck at process manipulation…not JRuby.

Oh come now. If the JVM sucks at something, JRuby sucks at it too. Don’t pass
the buck.

You couldn’t be more wrong. It’s understandable since you don’t
actually know anything about JRuby’s implementation.

Notice I specifically called out the JDK APIs. The JDK provides very
primitive APIs for dealing with processes, providing no way to share
stdio streams with child processes, no way to let child processes run
without pumping their IO streams, no way to get actual PIDs and send
signals to them, and so on. These are APIs that haven’t changed in
over a decade, designed to provide the lowest common denominator of
process management features across many platforms. They pretty much
suck.

JRuby, in its default mode, does all its process management using
these APIs. We use a few mostly-portable tricks to get real PIDs and
to make processes appear more detached than they are, but we don’t do
much more than that. However, using FFI, it’s trivial to route around
those cumbersome built-in APIs and get much more modern behavior. A
perfect example is my “spoon” gem, which uses FFI to bind the
posix_spawn function, which allows something no standard JDK API
can do: sharing stdio with child processes. We also ship with a set of
native bindings (across almost a dozen platforms) to POSIX functions
that have no equivalent in the JDK, ranging from process management
(waitpid, kill), to signals, to filesystem ‘stat’, and more.
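The following is a rough illustration of binding POSIX functions that have no JDK equivalent. It uses Fiddle, the foreign-function library bundled with MRI, instead of the ffi gem that JRuby and spoon actually rely on, so it runs without extra gems. It assumes a Unix-like system where `getpid` and `kill` are exported by the already-loaded C library, and that `pid_t` fits in a C `int`.

```ruby
require "fiddle"

# Look up symbols in the libraries already loaded into this process
# (libc on Linux and macOS); DEFAULT wraps RTLD_DEFAULT.
LIBC = Fiddle::Handle::DEFAULT

# pid_t getpid(void);   (assumes pid_t is a C int)
GETPID = Fiddle::Function.new(LIBC["getpid"], [], Fiddle::TYPE_INT)

# int kill(pid_t pid, int sig);   signal 0 only checks that the pid exists
KILL = Fiddle::Function.new(LIBC["kill"],
                            [Fiddle::TYPE_INT, Fiddle::TYPE_INT],
                            Fiddle::TYPE_INT)

native_pid = GETPID.call
alive = KILL.call(native_pid, 0).zero?
puts "pid #{native_pid} alive? #{alive}"
```

The ffi gem's `attach_function` plays the same role with a more declarative syntax. Either way, the binding lives in Ruby code rather than in a compiled extension.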

JRuby goes above and beyond the typical JVM-based language in
supporting the POSIX features in question, and the limitations of the
JVM and JDK are often not applicable to JRuby.

JRuby does the best job it can do cross-platform with the JDK APIs
provided for it. If you need to go outside those APIs, or if we “suck”
in how we utilize them, it’s a trivial matter to bind native C
process-management logic via FFI and use that. It won’t be as portable
as what we provide, but it will work.

If it were trivial, why aren’t you shipping it (or at least pointing to a
jruby-supported gem that does)? You’ve espoused FFI as the C-API silver bullet
time and again. I doubt it is that trivial, since FFI itself seems non-portable.

We don’t ship anything yet because exactly one person has reported
these issues, and we provided workarounds for almost every case with
just a few lines of FFI code. I’d love to work out a complete set of
native-behaving process management APIs (for users to opt into), but
there’s only so much we can do in a given cycle. Given limited
resources, we cater to the majority first. The majority of JRuby users
do not have these issues, and would prefer we work on Ruby 1.9
compatibility, user-reported bugs, and Java integration features.

Perhaps you’d like to help? I’d happily support you.

  • Charlie

Hi Michel,
I think you are completely correct in your criticism of ruby’s lack of
support for scientific computing, and complex graphing. I think the
problem is that python had a bit of a lead time (~5 yrs.) before ruby
became popular. During that period, many of the scientists switched
from perl to python and began developing the tools they needed. When
Ruby was popularized with rails, python was already established. So
most of the talent that could solve this problem for the ruby
community is already happily doing science with python and has no
reason to switch to a slightly more elegantly designed language (as
some would argue). I learned some R and use ruby to crunch the numbers
and R to plot them. I think it is a reasonable solution. If I did a
lot of data crunching/graphing, I might write a DSL that allowed R
plotting code to be made via a nicer ruby-like syntax, but so far I
haven’t been motivated enough. Alternatively, I could switch to
python; I think it would take a couple of weeks to get up to speed, as
the two are quite similar conceptually. I think it would be great if
we had a dedicated scientific community to build the tools in ruby,
but I don’t see it happening, because python has already filled the
niche.
Tim
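The R-plotting DSL Tim mentions could start very small. Below is a hypothetical sketch: the `RPlot` class and its methods are invented for illustration. Chained Ruby calls accumulate lines of R code, and the result is returned as a plain string.

```ruby
# Hypothetical mini-DSL that builds an R plotting script from Ruby calls.
class RPlot
  def initialize
    @lines = []
    @device_open = false
  end

  # Render a Ruby array as an R vector literal, e.g. [1, 2] -> "c(1, 2)".
  def self.vector(values)
    "c(#{values.join(', ')})"
  end

  def data(name, values)
    @lines << "#{name} <- #{RPlot.vector(values)}"
    self
  end

  # Open a PNG graphics device; to_r closes it again with dev.off().
  def png(file, width: 640, height: 480)
    @lines << %(png("#{file}", width = #{width}, height = #{height}))
    @device_open = true
    self
  end

  def plot(x, y, type: "p", main: nil)
    args = [x, y, %(type = "#{type}")]
    args << %(main = "#{main}") if main
    @lines << "plot(#{args.join(', ')})"
    self
  end

  def to_r
    lines = @lines.dup
    lines << "dev.off()" if @device_open
    lines.join("\n")
  end
end

script = RPlot.new
  .data(:x, [1, 2, 3])
  .data(:y, [1, 4, 9])
  .png("squares.png")
  .plot(:x, :y, type: "l", main: "Squares")
  .to_r
puts script
```

The generated string could then be written to a file and run with `Rscript`, or handed to an embedded R interpreter.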

On Dec 20, 2010, at 18:40 , Charles Oliver N. wrote:

On Mon, Dec 20, 2010 at 3:34 PM, Ryan D. [email protected] wrote:

On Dec 20, 2010, at 11:51 , Charles Oliver N. wrote:

The JVM and the JDK APIs suck at process manipulation…not JRuby.

Oh come now. If the JVM sucks at something, JRuby sucks at it too. Don’t pass
the buck.

You couldn’t be more wrong. It’s understandable since you don’t
actually know anything about JRuby’s implementation.

Nice Ad Hominem. It’s just that I don’t need to know anything about
jruby’s implementation to identify someone passing the buck when they do
it.

Or… are you claiming that JRuby doesn’t suck at process manipulation
and that JEG is wrong, or worse, a liar?

But JEG is right, it does suck. That’s not a terrible thing. Sometimes
your dogmatic “rah rah java/jruby” thing blinds you to simple truths:
jruby being built on the jvm gives it a lot of strengths, but as JEG
said (so succinctly), “like anything, there are tradeoffs and JRuby
sucks at other things”. That’s not the end of the world, but just
because you claim it isn’t so, doesn’t make it true.

On Tue, Dec 21, 2010 at 1:23 AM, Charles Oliver N. wrote:

What I’d really like to see are FFI-based wrappers around key science
and math libraries, rather than more blasted C extensions that can’t
be run concurrently and aren’t easily portable across impls. FFI works
incredibly well for these isolated libraries (as opposed to FFI for
kernel-level features, which can have many platform-specific
differences).

How do FFI wrappers handle concurrent running? (As in, how do they
differ from C extensions in that respect?)

martin

On Tue, Dec 21, 2010 at 3:58 AM, Martin DeMello
[email protected] wrote:

How do FFI wrappers handle concurrent running? (As in, how do they
differ from C extensions in that respect?)

At least in JRuby, FFI calls are not prevented from firing in
parallel. That requires you to ensure the library you’re using is
concurrency-safe (most mature C libraries are, usually by passing
state around), but it’s a far better situation than you have with C
extensions where concurrent mutation is often the norm.
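"Passing state around" is a property of the API's shape. A library whose state lives in caller-owned objects can be called from several threads at once. One that mutates hidden globals cannot. The plain-Ruby sketch below (no FFI involved; all names are hypothetical) contrasts the two shapes. Under MRI's GVL these threads do not literally run in parallel. The shape is what matters once calls can run in parallel, as FFI calls can in JRuby.

```ruby
# Style 1 (shown only for contrast, not called below): hidden global
# state, as in many C extensions. Two threads calling update at once
# would mix their bytes into the same accumulator.
module GlobalChecksum
  @value = 0
  class << self
    attr_reader :value

    def reset
      @value = 0
    end

    def update(bytes)
      bytes.each_byte { |b| @value = (@value + b) % 65_521 }
    end
  end
end

# Style 2: the caller owns the state, like C libraries that pass a
# context struct to every call (zlib's z_stream is one real example).
# Each thread gets its own context, so calls need no coordination.
class ChecksumContext
  MODULUS = 65_521
  attr_reader :value

  def initialize
    @value = 0
  end

  def update(bytes)
    bytes.each_byte { |b| @value = (@value + b) % MODULUS }
    self
  end
end

inputs = %w[alpha beta gamma delta]
threads = inputs.map do |s|
  Thread.new { ChecksumContext.new.update(s).value }
end
results = threads.map(&:value)
p results
```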

It might be possible in the future to have an API or contract by which
a C extension can be “promoted” to “threadsafe mode”, but at the
moment, all implementations of the C extension API prevent extension
code from running concurrently (and therefore prevent concurrent use
of the libraries they bind). This is a very large limitation for any
Ruby implementations that support thread-level concurrency, like
JRuby, and so we generally discourage C extension use if an
alternative is available.

It’s also worth noting that C extensions usually maintain global state
that prevents them from being used with multiple Ruby instances in a
single process. JRuby has always had the ability to spin up extra
JRuby instances inside a single JVM, but we can’t do this when loading
C extensions because they usually hold hard global references to
symbols, classes, etc. that are instance-specific. This also afflicts
MRI, and the “multi-VM” work required adding a special new extension
entry point whereby extensions “promise” they’re MVM-friendly.

So C extensions prevent not just concurrency but also MVM. FFI on the
other hand only requires that it can load the library. The bindings
happen in user code, so the same library can be loaded and called from
multiple JRuby instances.

  • Charlie

On Tue, Dec 21, 2010 at 3:35 AM, Ryan D. [email protected] wrote:

Nice Ad Hominem. It’s just that I don’t need to know anything about jruby’s
implementation to identify someone passing the buck when they do it.

No, you actually were wrong. We route around the JVM/JDK limitations
in dozens of ways, so when I say something’s a JVM/JDK problem but
JRuby does it better, I mean it.

Or… are you claiming that JRuby doesn’t suck at process manipulation and that
JEG is wrong, or worse, a liar?

By default, JRuby doesn’t provide the best process manipulation
support. But it’s not inherently sucky at it, and with a few tweaks it
can work just fine. JEG claimed we “suck at process management”
because of two reported bugs we didn’t immediately fix, since fixing
them would have required presenting platform-specific behavior and
using native calls rather than JVM/JDK APIs. He disagrees with that
decision. Others don’t. I was clarifying for folks who might be
confused by his statements.

But JEG is right, it does suck. That’s not a terrible thing. Sometimes your
dogmatic “rah rah java/jruby” thing blinds you to simple truths: jruby being built
on the jvm gives it a lot of strengths, but as JEG said (so succinctly), “like
anything, there are tradeoffs and JRuby sucks at other things”. That’s not the end
of the world, but just because you claim it isn’t so, doesn’t make it true.

JRuby does suck at some things, like startup time…which can make
some aspects of typical UNIX parallelization (like spawning a new
instance of yourself) suck pretty bad. But the process management APIs
are easy to improve, so it’s unfair to outright claim that “JRuby
sucks” at them, when we’ve simply taken the path that serves the most
users as well as possible. They’re just not easy to improve while
maintaining other non-sucky aspects of JRuby, like consistent
cross-platform behavior. If we did things the way JEG wanted, we’d
have other folks complaining “JRuby sucks” at cross-platform. So these
blanket statements are at best misleading.

And I’m not just claiming it’s true. I’ve put in 4 years of work (with
many others helping) making it true.

  • Charlie

On Dec 21, 2010, at 12:15 AM, timr [email protected] wrote:

If I did a
lot of data crunching/graphing, I might write a DSL that allowed R
plotting code to be made via a nicer ruby-like syntax, but so far I
haven’t been motivated enough.

Curious if you’ve used RinRuby and what you think of it?

Jose

Jose Hales-Garcia
UCLA Statistics

On Dec 21, 2010, at 7:12 AM, Charles Oliver N. wrote:

JEG claimed we “suck at process management” because of two reported bugs we
didn’t immediately fix

That’s not really why I said that.

I was once trying to use JRuby in a POSIX environment where I needed
Unixy process management. It had to interoperate with other processes
that expected these behaviors. I ran into at least two challenges and
filed the bugs I had seen. In your replies, you gave me some
workarounds and explained why things were purposely working this way.
At this point I realized JRuby was the wrong tool for the job and
switched.

My opinion is that Ruby grew up in Unix and it purposely exposes a lot
of Unixisms. I think that means exit!() should kill my process now and
exec() should replace my process. That’s what those methods mean to me.
When you give me reasons why that isn’t a good idea, it makes me feel
like JRuby has stopped trusting me to make the best decisions. That
feels like the Java way, but not the Ruby way, in my opinion.

However, these feelings of mine are not why I said JRuby sucks at
process management. I said it because your exec() doesn’t really exec()
on purpose. Your exec() isn’t a POSIX exec(). I don’t feel that’s
debatable. It’s just a fact.

So I’ll refine my statement a little: JRuby sucks at POSIX style
process management.

James Edward G. II

On Tue, Dec 21, 2010 at 9:13 AM, James Edward G. II
[email protected] wrote:

I was once trying to use JRuby in a POSIX environment where I needed Unixy
process management. It had to interoperate with other processes that expected
these behaviors. I ran into at least two challenges and filed the bugs I had
seen. In your replies, you gave me some workarounds and explained why things were
purposely working this way. At this point I realized JRuby was the wrong tool for
the job and switched.

I think the workarounds were quite acceptable. You did not. That’s
your prerogative.

My opinion is that Ruby grew up in Unix and it purposely exposes a lot of
Unixisms. I think that means exit!() should kill my process now and exec() should
replace my process. That’s what those methods mean to me. When you give me
reasons why that isn’t a good idea, it makes me feel like JRuby has stopped
trusting me to make the best decisions. That feels like the Java way, but not the
Ruby way, in my opinion.

Ruby needs to grow beyond being a Unix-only language.

However, these feelings of mine are not why I said JRuby sucks at process
management. I said it because your exec() doesn’t really exec() on purpose. Your
exec() isn’t a POSIX exec(). I don’t feel that’s debatable. It’s just a fact.

Ruby’s exec isn’t a POSIX exec when running on Windows. So it
obviously is debatable.

So I’ll refine my statement a little: JRuby sucks at POSIX style process
management.

How about “JRuby by default does not assume the user wants POSIX-style
process management, but does not prevent you from using POSIX-style
process management with a bit of extra work.”

I think it’s the “sucks” that I don’t like. Just saying something
“sucks” because you don’t agree with how it does things isn’t
informative or helpful.

In an effort to be more constructive, I present the two bugs in
question:

Kernel#select() prevents exit during a signal handler
http://jira.codehaus.org/browse/JRUBY-5079

JRuby does not do a hard process exit on calls to Kernel#exit because
usually the intent is just to stop that Ruby instance. Because JRuby
is often running in a shared JVM, doing a hard system exit would
frequently kill unrelated services, threads, and other JRuby
instances. We opted not to expose a hard system exit specifically for
this reason. The bug is that exit can’t cause other JVM threads
blocking on IO or Java calls to terminate.
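MRI's `Kernel#exit` is also "soft" in a way that makes the per-instance reading possible. It raises `SystemExit`, so `ensure` clauses run while the stack unwinds, and an embedding host could rescue the exception instead of letting the whole process die. `exit!` skips all of that. The sketch below only demonstrates the Ruby-level semantics. It makes no claim about how JRuby implements them internally.

```ruby
cleanup_log = []

status = begin
  begin
    exit 3                     # raises SystemExit; the process keeps running
  ensure
    cleanup_log << :ensure_ran # ensure clauses still run while unwinding
  end
rescue SystemExit => e
  # A host embedding several Ruby "instances" could stop here
  # instead of letting the whole process die.
  e.status
end

puts "caught exit status #{status}, cleanup: #{cleanup_log.inspect}"
```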

exec() would be more useful if it really exec()ed
http://jira.codehaus.org/browse/JRUBY-5082

Because Kernel#exec also nukes the calling process, we opted (for the
same reasons) to have exec not do a “real” system level exec call.
Instead, exec launches the process, waits for it to complete, and then
raises an internal exception to unroll the execution stack and
terminate the JVM. The bug is that the exec’ed process does not have
the same pid as the parent, since we launch rather than replace.
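The PID difference can be observed from MRI on a Unix-like system, where `Kernel#exec` is a real exec. The sketch below runs two small child Ruby programs. One calls `exec`. The other imitates the launch-wait-exit approach described above, using `Process.spawn` and `Process.wait`. Each prints a PID before and after its "exec". The child scripts and the `pids_from` helper are illustrative only.

```ruby
require "open3"
require "rbconfig"

RUBY = RbConfig.ruby

# Child program 1: a real exec. The replacement program keeps the PID.
REAL_EXEC = <<~'RUBY'
  print "#{$$} "
  $stdout.flush                     # buffered output would be lost on exec
  exec(ARGV[0], "-e", "print $$")
RUBY

# Child program 2: launch-and-wait emulation. The "replacement" is a new
# process, so it prints a different PID.
EMULATED_EXEC = <<~'RUBY'
  print "#{$$} "
  $stdout.flush
  pid = Process.spawn(ARGV[0], "-e", "print $$")
  Process.wait(pid)
  exit($?.exitstatus)
RUBY

# Runs a child program (passing the ruby path as ARGV[0]) and returns
# [pid before, pid after].
def pids_from(program)
  out, status = Open3.capture2(RUBY, "-e", program, RUBY)
  raise "child failed: #{status.inspect}" unless status.success?
  out.split.map(&:to_i)
end

real_before, real_after = pids_from(REAL_EXEC)
emu_before, emu_after = pids_from(EMULATED_EXEC)
puts "real exec:     #{real_before} -> #{real_after}"
puts "emulated exec: #{emu_before} -> #{emu_after}"
```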

The workaround for both bugs is fairly simple using FFI: bind the real
“exec” and “exit” system calls:
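Here is a minimal sketch of the "exit" half. It uses MRI's bundled Fiddle library in place of the ffi gem so that it runs without extra gems. It binds the C `_exit(2)` call, which terminates immediately without running Ruby's `at_exit` hooks or flushing Ruby's buffers. The code runs inside a child Ruby process so the effect can be observed. Binding `execv` works the same way in principle but needs a NULL-terminated `char *` array built by hand, so it is left out. This assumes a Unix-like system.

```ruby
require "open3"
require "rbconfig"

# Child program: bind _exit(2) from the C library via Fiddle and call it.
HARD_EXIT_CHILD = <<~'RUBY'
  require "fiddle"

  # void _exit(int status);  terminates at once, no Ruby-level cleanup
  hard_exit = Fiddle::Function.new(Fiddle::Handle::DEFAULT["_exit"],
                                   [Fiddle::TYPE_INT], Fiddle::TYPE_VOID)

  at_exit { print " at_exit hook ran" }
  print "before"
  $stdout.flush          # _exit does not flush Ruby's IO buffers
  hard_exit.call(7)
  print " never printed"
RUBY

out, status = Open3.capture2(RbConfig.ruby, "-e", HARD_EXIT_CHILD)
puts "output: #{out.inspect}, exit status: #{status.exitstatus}"
```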

We do not do this by default because of the potential for taking down
an entire server process (and because at least “exec” does not exist
on Windows). If you need this behavior, it’s easy to get it.

I would also love (with help) to roll this into a gem (or built-in
library) that overwrites exec, exit, and similar methods for the
purpose of improving POSIX behavior under JRuby. Almost everything
“POSIXy” that JRuby “sucks” at can be patched in this way (with the
most notable exception being “fork”).

You can’t please everyone all the time. It’s unfortunate that some
people think this means you “suck”.

  • Charlie

On Dec 21, 2010, at 9:45 AM, Charles Oliver N. wrote:

On Tue, Dec 21, 2010 at 9:13 AM, James Edward G. II
[email protected] wrote:

I was once trying to use JRuby in a POSIX environment where I needed Unixy
process management. It had to interoperate with other processes that expected
these behaviors. I ran into at least two challenges and filed the bugs I had
seen. In your replies, you gave me some workarounds and explained why things were
purposely working this way. At this point I realized JRuby was the wrong tool for
the job and switched.

I think the workarounds were quite acceptable.

They are. They do work. I didn’t mean to imply that they don’t. As
you noted, it’s not a cure all (we can’t get fork()), but you can get
most functions.

My opinion is that Ruby grew up in Unix and it purposely exposes a lot of
Unixisms. I think that means exit!() should kill my process now and exec() should
replace my process. That’s what those methods mean to me. When you give me
reasons why that isn’t a good idea, it makes me feel like JRuby has stopped
trusting me to make the best decisions. That feels like the Java way, but not the
Ruby way, in my opinion.

Ruby needs to grow beyond being a Unix-only language.

When Ruby is running on Unix and doesn’t have those abilities, we’ve
lost something in the translation.

However, these feelings of mine are not why I said JRuby sucks at process
management. I said it because your exec() doesn’t really exec() on purpose. Your
exec() isn’t a POSIX exec(). I don’t feel that’s debatable. It’s just a fact.

Ruby’s exec isn’t a POSIX exec when running on Windows. So it
obviously is debatable.

I wasn’t running on Windows when I found the bugs.

So I’ll refine my statement a little: JRuby sucks at POSIX style process
management.

How about “JRuby by default does not assume the user wants POSIX-style
process management, but does not prevent you from using POSIX-style
process management with a bit of extra work.”

Yeah, kind of shocking I didn’t just say that mouthful. :) OK, we’ll
go with your statement in the future. I’m fine with it.

I think it’s the “sucks” that I don’t like. Just saying something
“sucks” because you don’t agree with how it does things isn’t
informative or helpful.

I think it’s sad you have trouble with that word.

I suck at a LOT of things. I’ll make you a list sometime.

I’ve heard Matz explain things MRI sucks at before, though he probably
didn’t use that exact word.

If JRuby was perfect, we would all use JRuby all the time and all other
Ruby interpreters would die out. That hasn’t happened yet. :)

You can’t please everyone all the time. It’s unfortunate that some
people think this means you “suck”.

I never said you suck, Charlie. I don’t think anything like that.
You’re a hero of our community.

I also never said JRuby sucks. I said it sucks at some things.

I’ll try to remember to drop the word in the future though, knowing that
it bothers you.

We can still be friends as far as I’m concerned! :)

James Edward G. II

On Tue, Dec 21, 2010 at 10:29 AM, James Edward G. II
[email protected] wrote:

On Dec 21, 2010, at 9:45 AM, Charles Oliver N. wrote:

Ruby needs to grow beyond being a Unix-only language.

When Ruby is running on Unix and doesn’t have those abilities, we’ve lost
something in the translation.

This I agree with. I’d like to provide the opt-in POSIX versions as a
standard library, so folks on Unix running JRuby can just require
‘jruby/posix’ and everything flips.

I wasn’t running on Windows when I found the bugs.

JRuby distributes a single binary for all platforms; that’s why we
usually try to avoid making platform-specific logic when possible. We
certainly do it in areas, but generally we try to keep things feeling
the same across all platforms we support in a single distributable.
That’s a large reason why JRuby’s often the easiest way to get Ruby
running on a given platform (and the JVM authors have done most of the
cross-platform binaries for us at that level, of course).

MRI, on the other hand, can’t distribute a single binary, and so
there’s scads of platform-specific logic applied at build time. That
makes it possible to be more platform specific out of the box, but
also means things often vary across platforms in unexpected ways.
Tradeoffs.

If JRuby was perfect, we would all use JRuby all the time and all other Ruby
interpreters would die out. That hasn’t happened yet. :)

JRuby’s certainly not perfect, and I’ll be the first to admit when
it’s an inherent problem we can’t solve. But for cases like this,
where we actively made a decision to deviate, I feel it necessary to
point out why we made that decision so people don’t think it’s just
because “JRuby sucks” and walk away.

You can’t please everyone all the time. It’s unfortunate that some
people think this means you “suck”.

I never said you suck, Charlie. I don’t think anything like that. You’re a hero
of our community.

I also never said JRuby sucks. I said it sucks at some things.

I will also admit I tend to conflate “JRuby” and “me”. After working
on it for so many years, attacks on the project often feel like
attacks on me. I’ll try harder to keep the two separate in the future :)

We can still be friends as far as I’m concerned! :)

Of course :)

  • Charlie

On Dec 21, 2010, at 06:09 , Jose Hales-Garcia wrote:

On Dec 21, 2010, at 12:15 AM, timr [email protected] wrote:

If I did a
lot of data crunching/graphing, I might write a DSL that allowed R
plotting code to be made via a nicer ruby-like syntax, but so far I
haven’t been motivated enough.

Curious if you’ve used RinRuby and what you think of it?

I’ve had good success with rsruby. It isn’t the nicest or terribly
complete, but by embedding R it is a LOT faster (10x by my measurements)
than calling out to R and running R scripts. Usually I’m just doing:

RSRuby.instance.eval_R r_src

But there is API to do data conversion back/forth. I found it to be a
bit clunky and in some cases buggy, but for the most part I’m usually
just using R for visualization, not lots of data transfer.

On Sat, Dec 18, 2010 at 8:55 PM, James Edward G. II
[email protected] wrote:

Of course, like anything, there are tradeoffs and JRuby sucks at other
things, like manipulating processes in a POSIX environment. I don’t use it
in these scenarios and you know that I’ve filed bugs for the specific
problems I’ve run into (some of those have been partially addressed).

For what it’s worth, the APIs that Ruby provides for process management
and IPC suck and are very low level and hard to work with. It’s very
easy to run into deadlocks or other issues when you try to do anything
but the simplest tasks of process creation and IPC.

As an example, take what should be a relatively easy task: spawn a
process, let it execute, and return what the process wrote to STDOUT and
STDERR as two separate strings. Simple enough, right?

Wrong. The pitfalls of attempting to do this are documented in Ara
Howard’s systemu gem.

There are a few options to consider to prevent deadlocks: spawning
separate threads for STDOUT and STDERR which consume data from the pipes
as it is written, or using some form of I/O multiplexing to test the
readiness of the STDOUT and STDERR descriptors, then consuming data from
whichever one is ready.
I think the best way to go here, both in general and for supporting
platforms like JRuby, is to write use-case-specific libraries to handle
process management and IPC for specific tasks. systemu is one such
library (and I’ve added JRuby support).
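Ruby's standard library also ships a small task-specific helper of this kind. Since Ruby 1.9.2, `Open3.capture3` runs a command and returns stdout, stderr and the exit status separately. In MRI it drains both pipes with reader threads, the first option above. A minimal usage sketch:

```ruby
require "open3"
require "rbconfig"

child_code = 'puts "to stdout"; warn "to stderr"; exit 2'
out, err, status = Open3.capture3(RbConfig.ruby, "-e", child_code)

puts "stdout=#{out.inspect} stderr=#{err.inspect} exit=#{status.exitstatus}"
```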

On Tuesday, December 21, 2010 1:15:32 AM UTC-7, Tim Rand wrote:

the two are quite similar conceptually. I think it would be great if
we had a dedicated scientific community to build the tools in ruby,
but I don’t see it happening, because python has already filled the
niche.
Tim

mostly true. worth mentioning, however, is that scientific visualization
is having a bit of a crisis, largely because of two facts:

  • data is increasingly open and accessed via an api, not stored locally
    (kml, opendap/netcdf, various home spun http protocols, etc)

  • people want interactive data that can be shared

that’s why there is so much interest in svg/js/canvas libraries: one can
access open data with open protocols and produce interactive charts that
can be shared online using standards-ish tools (see
http://vis.stanford.edu/protovis/ for an example…)

there is new work to do; however, many toolsets are, unfortunately,
bound to the dom. great value can be made by programming toolsets which
work in http/svg and which use open data access standards, but which are
not bound to a browser-only runtime environment.

ruby is actually in an excellent position to fill this void. food for
thought.

On Saturday, December 18, 2010 11:27:48 am Charles Oliver N. wrote:

About the only unintuitive thing I ever found was implementing a Java
interface, and while it’s somewhat unintuitive, it’s still trivial:
If you have suggestions for how to improve it, we’d love to hear them :)

Not really.

Put another way: most Java constructs translate intuitively to Ruby,
once I conceptually gave up on even trying to use generics, but Java
can work with raw types, too, so that’s not that big a leap.

Nearly everything I want to do with JRuby and Java, I don’t have to go
to the docs; it just does what I’d expect:

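require 'java'
import java.util.AbstractCollection
import java.util.ArrayList

# A Ruby subclass of a Java abstract class. AbstractCollection only needs
# iterator() and size(); it derives the other read-only methods from them.
class MyCollection < AbstractCollection
  def initialize
    super
    @coll = ArrayList.new
    @coll << 1
    @coll << 2
    @coll << 3
  end

  def iterator
    @coll.iterator
  end

  def size
    @coll.size
  end
end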

Not only does that give me the stuff I wanted from AbstractCollection,
but Java “knows” it’s a working collection, so I can do this:

import java.util.Collections
c = Collections.unmodifiableCollection MyCollection.new
i = c.iterator
i.next # returns 1
i.remove # throws java.lang.UnsupportedOperationException

Basically, by knowing Java and Ruby, and looking at the Javadoc for
various Collections-related stuff, I was able to write the above without
knowing a single thing about JRuby other than that I have to
“require ‘java’” first.

So maybe it’s just me, but the first time I actually had to Google for
how JRuby does something was when it wasn’t immediately obvious that I
needed:

include Comparator

That’s about the closest you could map that onto Ruby, but unlike the
previous example, it’s not a perfect match, because Ruby doesn’t have
interfaces, or really anything like them. In Ruby, if I’m going to
completely implement Comparator, there’s no reason I should ever have to
include Comparator.
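The point about interfaces shows up in plain Ruby. Any object that defines `<=>` can be sorted by `Array#sort` without including any module or declaring anything, and a block can serve as a one-off comparator. The `Version` class below is a hypothetical example.

```ruby
# Hypothetical value class: it defines <=> but includes no module and
# declares no "interface".
class Version
  attr_reader :parts

  def initialize(text)
    @parts = text.split(".").map(&:to_i)
  end

  # Compare numerically, component by component (Array#<=> does the work).
  def <=>(other)
    parts <=> other.parts
  end

  def to_s
    parts.join(".")
  end
end

versions = %w[1.10.0 1.2.3 1.9.1].map { |v| Version.new(v) }

# Array#sort only needs each element to respond to <=>.
ascending = versions.sort.map(&:to_s)

# A block plays the role of an ad hoc Comparator.
descending = versions.sort { |a, b| b <=> a }.map(&:to_s)
p ascending, descending
```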

I don’t really see a good way around that, though. Plus, this stuff is
cool:

Or this may work too (I don’t remember PriorityQueue’s API):

pq = PriorityQueue.new(11) do |a, b|
  a.to_s <=> b.to_s
end

That does work.

So this isn’t a complaint, really. How could I complain when actually
learning about JRuby gives me even easier interfaces than I’d expect?

Oracle’s actions relating to Java have all been political. At the same
time people publish that they’re fighting with Apache or Google, they
are also getting IBM (GPL-haters) and Apple (not big OSS contributors)
to collaborate on the GPLed OpenJDK, and making concrete plans for
OpenJDK to continue beyond Java 8.

…at the same time as they’re also making things difficult for anyone who
wants to start an implementation of Java other than OpenJDK. The fact
that Ruby has MRI (1.8 and 1.9), JRuby, MacRuby, and others is a
strength. There really only is the one implementation of Java.

The fact that OpenJDK exists and is GPL’d helps me sleep better at
night, but the fact that Oracle is suing people over patents makes me
wonder how useful the GPL actually is in this case. It’s kind of like
x264 being GPL’d – doesn’t make H.264 any more open.

As far as using Java, nothing has changed for the worse in the past year.

Right, and I am using it (and JRuby).

It’s just the long-term stuff that scares me. I suppose it’s not
actually a change for the worse given Sun always had the ability to do
the stuff Oracle’s doing now, but every now and then, it looks as though
Oracle might actually be clueless (or evil?) enough to run Java into the
ground for a short-term profit.

I doubt it’s going to happen, but I do try to keep enough of my code
portable that a switch back to MRI would be possible, if incredibly
painful.

Hi All,

I know this is an old post… but I’m new to the subject. I don’t know
if people will read this or not. Anyhow, I’ve just implemented a gem
called MDArray, which is available on RubyGems and GitHub
(GitHub - rbotafogo/mdarray: Multidimensional array similar to NumPy and NArray). It implements a
multidimensional array in the spirit of NumPy. It is far from NumPy’s
performance and features, but I’m planning to keep adding features. I
think it should be functional for what it does, but it is still limited.
It targets JRuby, as it uses the Java-NetCDF library.
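The core idea behind a NumPy-style multidimensional array fits in a few lines: one flat buffer plus a shape, with row-major strides mapping an N-dimensional index to a flat offset. The sketch below is hypothetical and does not reflect MDArray's actual API or implementation.

```ruby
# Hypothetical sketch (not MDArray's API): an N-dimensional array stored
# as one flat Ruby Array in row-major (C) order, NumPy's default layout.
class TinyNDArray
  attr_reader :shape, :strides

  def initialize(shape, data)
    expected = shape.reduce(1, :*)
    raise ArgumentError, "need #{expected} elements" unless data.size == expected
    @shape = shape
    @data = data
    # strides[k] = product of the dimensions after axis k
    @strides = shape.each_index.map { |k| shape[(k + 1)..-1].reduce(1, :*) }
  end

  # Flat offset = sum over k of index[k] * strides[k]
  def [](*index)
    raise IndexError, "expected #{shape.size} indices" unless index.size == shape.size
    index.each_with_index do |i, k|
      raise IndexError, "index #{i} out of range for axis #{k}" unless (0...shape[k]).cover?(i)
    end
    @data[index.zip(@strides).sum { |i, s| i * s }]
  end
end

a = TinyNDArray.new([2, 3, 4], (0...24).to_a)
p a.strides
p a[1, 2, 3]
```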

Hope this can help someone… cheers,

Rodrigo