Ruby and science?

On Fri, Dec 17, 2010 at 6:39 PM, Kent R. Spillner [email protected]
wrote:

Howdy-

No, it is true (or was ~9 months ago, when I last tried to unsubscribe). The
automated list manager at [email protected] is broken; the web
interface, too.

Did you try to send an email with “help” in the subject to
ruby-talk-ctl@… ? I just did, and the list server was kind enough to
send me the help file.

So, try again, and see if it works, before working with data that’s 9
months old. :wink:


Phillip G.

Though the folk I have met,
(Ah, how soon!) they forget
When I’ve moved on to some other place,
There may be one or two,
When I’ve played and passed through,
Who’ll remember my song or my face.

Howdy-

You can’t; the Ruby mailing lists are broken and the mailing list owner is
unresponsive.

[SNIP]

That is not true. You can unsubscribe by sending your unsubscribe
email to [email protected] instead of
[email protected]. You can also unsubscribe from:

Mailing Lists

No, it is true (or was ~9 months ago, when I last tried to unsubscribe).
The automated list manager at [email protected] is broken; the
web interface, too.

When you send an unsubscribe request to [email protected] you
receive a confirmation message, but when you reply to that confirmation
message nothing ever happens. When you try to unsubscribe through the
web interface, you receive a confirmation email saying you’ve been
unsubscribed, but you’re not actually removed from the list.

And when I email ruby-talk-admin@ I never receive a response.

Best,
Kent

This thread reminded me of this nice blog post about the same topic -
http://allthingsprogress.com/posts/ruby-is-beautiful-but-im-moving-to-python

Unfortunately the author of that blog post seemed to end up switching
to Python. Maybe the author didn’t/doesn’t know also about the
possible libraries in Ruby.

Jarmo P.

IT does really matter - http://www.itreallymatters.net

Jarmo P. wrote in post #969261:

This thread reminded me of this nice blog post about the same topic -
http://allthingsprogress.com/posts/ruby-is-beautiful-but-im-moving-to-python

Unfortunately the author of that blog post seemed to end up switching
to Python. Maybe the author didn’t/doesn’t know also about the
possible libraries in Ruby.

@ryan : thanks for the Fortran tip

@jarmo : yes, it was exactly the same discussion! But ruby/gsl now
seems to solve a part of the problem (at least on Ubuntu, it does not
compile on Windows for the time being).

_md

On Thu, Dec 16, 2010 at 5:15 PM, Colin B.
[email protected] wrote:

  1. Integration with Java seems to be easy: I’m not a Java programmer, but
    I’ve found it easy to write Java code to do the number crunching, and mostly
    easy to integrate the “compiled” Java code with JRuby. (I say mostly because
    at the start I couldn’t find a way to compile the Java code in a way that
    would reliably work with JRuby, but that was essentially me not
    understanding how Java packages really worked. I still don’t understand how
    Java packages really work, but I’ve found a way to compile that reliably
    works for me with JRuby!) That’s a big plus because I definitely don’t
    understand at the moment how to compile C code and integrate that with MRI
    Ruby.

Apart from perhaps MacRuby calling ObjC and IronRuby calling .NET,
JRuby calling Java is by far the easiest way to pull in external
libraries. C extensions and FFI are nowhere near as easy, and never
will be.

I always recommend using JRuby to take advantage of Java/JVM libraries
over C extensions and FFI, but of course I’m a bit biased :slight_smile:

- Charlie

i worked in science using ruby for many years. it is true that sometimes
various libs (narray, ruby gsl, netcdf, opendap bindings, etc) require some
tweaking to compile and get running. my observation, during the decade or
so doing this work, is that this issue is not exclusive to ruby at all: if
one is going to be a serious scientific programmer then one needs to
understand all sorts of things ‘normal’ computer scientists don’t normally
consider any more. compiler flags, endianness, huge word sizes, specialized
hardware interfaces, dedicated super computer compiler tool chains, and
generally shoddy source control and distribution mechanisms (here - i wrote
this for f77 10 years ago…) dominate the scientific programming landscape.

my experience has been that this is as true for python as it is for fortran
as it is for ruby. understanding the black art of object files, linkers,
and compiler implementations is just a fact of life when one is pushing the
boundaries of what machines commonly do. it is no reason to give up on
ruby, or any other language for that matter. if you choose, say, R, then
maybe you’ll solve certain problems more quickly, but hit roadblocks in
others, like accessing data via some obscure ftp based protocol… needing
to understand compiler black art to make computers go fast i can live with,
but hacking makefiles to do common tasks i abhor - pick your poison.
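As a small illustration of the endianness point, here is a minimal Ruby sketch of writing and reading a binary record with an explicit byte order. The record layout (one 64-bit float followed by one 32-bit unsigned integer) and the method names are hypothetical; only `Array#pack` and `String#unpack` are standard Ruby.

```ruby
# Sketch only: the record layout below is a hypothetical example,
# not a format used by any library mentioned in this thread.

# Pack a record in big-endian order:
#   G = 64-bit float, big-endian; N = 32-bit unsigned integer, big-endian.
def encode_record(value, count)
  [value, count].pack('GN')
end

# Decode the record. The directives must match the writer's byte order,
# otherwise the numbers come back scrambled.
def decode_record(bytes)
  bytes.unpack('GN')
end

# Show the same 32-bit integer laid out in both byte orders
# (N = big-endian, V = little-endian).
def byte_orders(n)
  { big: [n].pack('N').bytes, little: [n].pack('V').bytes }
end

record = encode_record(1.5, 42)
p decode_record(record)
p byte_orders(1)
```

Stating the byte order in the pack directive, rather than relying on the native order of whatever machine runs the script, is what keeps data files portable between architectures.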

On Thu, Dec 16, 2010 at 9:39 PM, David M. [email protected]
wrote:

And JRuby is getting faster all the time. It’s not clear whether one will
necessarily beat the other.

We’ve always emphasized compatibility and bugfixes over performance,
and so while we handily beat 1.9.1 and earlier versions of Ruby a year
ago, these days it’s a bit of a toss-up with 1.9.2. Even in the JRuby
1.6 cycle, we got in a little perf work…but soon moved priorities
back to implementing remaining 1.9.2 features. One of these days we’ll
have caught up on all features, or I’ll just decide I need to spend my
time entirely on performance :slight_smile:

In general, though, if you find something that’s notably slower than
1.9.2, please file a bug. There are areas where we know we’re a bit
slower, but I’m sure there’s areas we have bugs keeping us slow.

In particular, I remember hearing discussions of a commandline flag in JRuby
which one could use to disallow altering methods on the core numeric types.
This would basically make Ruby math compile down to Java math. I imagine most
scientific applications wouldn’t care about altering the core numeric types,
while most scientific applications would care about fast math.

This would be the --fast flag. It used to help the performance of
small methods and math operations, but the bulk of its benefit is now
in JRuby master (1.6) by default:

~/projects/jruby ➔ …/jruby-1.5.2/bin/jruby bench/bench_tak.rb 4
      user     system      total        real
  2.601000   0.000000   2.601000 (  2.537000)
  1.805000   0.000000   1.805000 (  1.805000)
  1.790000   0.000000   1.790000 (  1.791000)
  1.807000   0.000000   1.807000 (  1.807000)

~/projects/jruby ➔ jruby -v bench/bench_tak.rb 4
jruby 1.6.0.dev (ruby 1.8.7 patchlevel 249) (2010-12-17 d2575a7) (Java HotSpot(TM) 64-Bit Server VM 1.6.0_22) [darwin-x86_64-java]
      user     system      total        real
  1.810000   0.000000   1.810000 (  1.742000)
  1.058000   0.000000   1.058000 (  1.058000)
  1.053000   0.000000   1.053000 (  1.053000)
  1.057000   0.000000   1.057000 (  1.057000)

The next “big thing” that probably won’t land in 1.6 is “dynopt”,
which performs more runtime optimization of code:

~/projects/jruby ➔ jruby -v -Xcompile.dynopt=true bench/bench_tak.rb 4
jruby 1.6.0.dev (ruby 1.8.7 patchlevel 249) (2010-12-17 d2575a7) (Java HotSpot(TM) 64-Bit Server VM 1.6.0_22) [darwin-x86_64-java]
      user     system      total        real
  0.912000   0.000000   0.912000 (  0.837000)
  0.518000   0.000000   0.518000 (  0.518000)
  0.516000   0.000000   0.516000 (  0.516000)
  0.517000   0.000000   0.517000 (  0.517000)

Both the 1.6 and the 1.6+dynopt results should consistently be faster
than 1.9 for small benchmarks.
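For readers who want to try a comparable measurement, here is a rough sketch of a "small benchmark" of this kind. The actual bench/bench_tak.rb is not shown in this thread, so the arguments (18, 12, 6), the method names, and the reading of the trailing `4` as an iteration count are all assumptions.

```ruby
require 'benchmark'

# Takeuchi function: deeply recursive and dominated by method calls and
# small-integer math, which is why it is sensitive to call overhead.
def tak(x, y, z)
  return z unless y < x
  tak(tak(x - 1, y, z), tak(y - 1, z, x), tak(z - 1, x, y))
end

# Run the same workload several times in one process so that any
# warm-up effect of the runtime shows up as a slower first row.
def run_tak_bench(iterations)
  Benchmark.bm do |bm|
    iterations.times { bm.report { tak(18, 12, 6) } }
  end
end

run_tak_bench(4)
```

Running the same script under different Ruby implementations gives tables of the same shape as the ones above, though absolute numbers will depend on the machine and the workload size chosen.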

For large benchmarks and real applications, performance almost always
comes down to the performance of core classes like String and Array.
At that point, it’s mostly a matter of figuring out where the core
classes don’t perform as well…and fixing them.

About the only unintuitive thing I ever found was implementing a Java
interface, and while it’s somewhat unintuitive, it’s still trivial:

If you have suggestions for how to improve it, we’d love to hear them :slight_smile:

# JRuby only: pull in the Java classes used below
require 'java'
java_import java.util.Comparator
java_import java.util.PriorityQueue

# singleton comparator
comp = Class.new {
  include Comparator
  def compare a,b
    a.to_s <=> b.to_s
  end
}.new

pq = PriorityQueue.new 11, comp

You can also do:

Comparator.impl do |name, a, b|
  # name is name of interface method, check it or not
  a.to_s <=> b.to_s
end

Or this may work too (I don’t remember PriorityQueue’s API):

pq = PriorityQueue.new(11) do |a, b|
  a.to_s <=> b.to_s
end

Oracle’s behavior lately is making me kind of iffy about the future of Java as
a platform, but JRuby is just made of awesome.

Oracle’s actions relating to Java have all been political. At the same
time people publish that they’re fighting with Apache or Google, they
are also getting IBM (GPL-haters) and Apple (not big OSS contributors)
to collaborate on the GPLed OpenJDK, and making concrete plans for
OpenJDK to continue beyond Java 8.

As far as using Java, nothing has changed for the worse in the past
year.

ruby-inline is very cool, but it’s still not quite as easy as being able to
write a Java class, pretend it’s a Ruby class, and have it work.

There’s also java_inline, an extension to ruby_inline I made that
allows you to write Java code inline like C code in ruby_inline:

https://github.com/jruby/java-inline

require 'java_inline'

class Foo
  inline :Java do |builder|
    builder.package "org.jruby.test"
    builder.java "
      public static int fib_java(int n) {
        if (n < 2) return n;

        return fib_java(n - 2) + fib_java(n - 1);
      }
      "
  end
end

Foo.new.fib_java(45)

Fun stuff.

- Charlie

Apart from perhaps MacRuby calling ObjC and IronRuby calling .NET,
JRuby calling Java is by far the easiest way to pull in external
libraries. C extensions and FFI are nowhere near as easy, and never
will be.

I always recommend using JRuby to take advantage of Java/JVM libraries
over C extensions and FFI, but of course I’m a bit biased :slight_smile:

of course JRuby is a fantastic tool for many use cases, but i’ve personally
found science to be perhaps the worst possible application of it. the
reasons are quite simple:

  • speed. when you need something to be big or fast in science, generally
    even c won’t cut it. fortran is still used in maybe 80% of big weather
    systems for a reason: the compilers are generally doing faster floating
    point ops than the equiv c compilers. one can bridge fortran -> c -> ruby
    quite easily (narray does this, gsl does this, etc) and it’s a place where
    JRuby actually makes the job much harder. Java, of course, isn’t even in
    the ballpark.

  • OS integration: the general approach to making ruby faster is to use
    parallelism. the best way is to run lots of processes. JRuby’s interface
    to the operating system level primitives for this (fork, et al) makes this
    really really hard, close to impossible, to deal with simply. Mmap is
    another great example of something you want at your finger tips in
    science… Interfaces to hardware boards connected to a research device,
    etc. I think any research based science makes getting close to the metal a
    requirement.

  • start up time. related to the above is the fact that science tends to
    lead to many small programs running very often. map reduce jobs, cron jobs,
    process pipe lines of related algorithms, toolkits made extensible via file
    based processing, tons of processing of stdin/stdout tend to be facts of
    life when algorithm writers produce systems as a side effect. it’s not
    pretty, but it is a fact i’ve seen repeated over and over.

i am definitely aware of some projects which make really heavy use of java
and there, JRuby sure would be an awesome tool but my personal experience is
that anything related to the JVM is a total non-starter. YMMV.

Perhaps it’s not as low-level and bare-metal as MRI, but it’s a better
experience for many, many cases. And that’s the Ruby way.

excellent points all charles - as usual!

i think my only warning is: if one is expecting to work with a pile of
legacy fortran and c (the only science environment i’ve ever worked in) then
just bite the bullet and learn compiler-fu. if you are lucky, maybe you can
work in a more enlightened environment…

for the record i am aware of a few large projects that are using huge java
code bases - i have simply never personally been lucky enough not to
step into piles of f77 code… ;-(

in the end i think the silos that exist in science, before all else, should
determine the tool set. just the nature of the beast.

cheers.

On Sat, Dec 18, 2010 at 1:00 PM, ara.t.howard [email protected]
wrote:

the ballpark.

If you need C or Fortran, you need C or Fortran. I won’t argue that.
Most people, however, don’t.

  • OS integration: the general approach to making ruby faster is to use
    parallelism. the best way is to run lots of processes. JRuby’s interface
    to the operating system level primitives for this (fork, et al) makes this
    really really hard, close to impossible, to deal with simply. Mmap is
    another great example of something you want at your finger tips in
    science… Interfaces to hardware boards connected to a research device,
    etc. I think any research based science makes getting close to the metal a
    requirement.

The general approach to making Ruby faster is to use a faster Ruby or
write better Ruby code. JRuby’s good for the former.

If you need to parallelize, processes are only one tool, and perhaps
the most blunt tool. In-process concurrency opens up many options that
are difficult or impossible with processes. So JRuby enables one set
of methodologies for concurrency while perhaps not supporting others
well. Trade-offs.

JRuby doesn’t support fork, but it supports memory-mapping (via NIO
memory-mapping, and again you don’t have to write or compile a line of
C). As for interfaces to hardware boards…if you need C, you need C.
I won’t argue that. Most people don’t.

  • start up time. related to the above is the fact that science tends to
    lead to many small programs running very often. map reduce jobs, cron jobs,
    process pipe lines of related algorithms, toolkits made extensible via file
    based processing, tons of processing of stdin/stdout tend to be facts of
    life when algorithm writers produce systems as a side effect. it’s not
    pretty, but it is a fact i’ve seen repeated over and over.

This is how you do parallel processing for your work. It’s not the
only way, and being able to pass whole in-memory object graphs over to
another thread is distinctly more elegant than having to marshal it
through a memory-mapped file or IO pipe.
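To make the "pass whole in-memory object graphs over to another thread" point concrete, here is a minimal sketch using only core Ruby. The `Sample` struct, the station name, and the method name are hypothetical; `Queue` and `Thread` are standard.

```ruby
# Hypothetical data type standing in for "an in-memory object graph".
Sample = Struct.new(:station, :readings)

# Hand an object to a worker thread through a Queue. Nothing is
# serialized: the worker receives a reference to the very same object.
def mean_in_worker(sample)
  inbox  = Queue.new
  outbox = Queue.new

  worker = Thread.new do
    received = inbox.pop   # blocks until the caller pushes something
    mean = received.readings.sum / received.readings.size.to_f
    outbox << [received.object_id, mean]
  end

  inbox << sample
  worker.join
  outbox.pop
end

sample = Sample.new('hypothetical-station', [1.0, 2.0, 3.0])
seen_id, mean = mean_in_worker(sample)
puts mean
puts seen_id == sample.object_id
```

Because the worker sees the same object rather than a copy, this is only safe as long as the two threads do not mutate it concurrently, which is the caveat raised elsewhere in this thread.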

i am definitely aware of some projects which make really heavy use of java
and there, JRuby sure would be an awesome tool but my personal experience is
that anything related to the JVM is a total non-starter. YMMV.

Java is not a requirement for someone to want JRuby. All that’s
required is wanting to avoid monkeying with native code, wanting a
really solid VM, and wanting to run concurrent threads in a robust
environment. You can do all that without ever touching a line of Java
code. Just because you don’t do Java for science doesn’t mean Java and
the JVM are bad options for science.

And in any case…it was based on my recommendations, after dealing
with and hearing from dozens of MRI users who have no end of problems
with native C extensions. With JRuby, you write it once, build it
once, and ship it. Perhaps it’s not quite as fast as C, perhaps it
doesn’t integrate with the OS as well…but it’s a hell of a lot less
painful to use. Perhaps you can’t fork, but you can use real
concurrent threads, which are almost certainly easier (provided you
don’t share mutable data, as with processes). Perhaps it’s not as
low-level and bare-metal as MRI, but it’s a better experience for
many, many cases. And that’s the Ruby way.

- Charlie

On Dec 18, 2010, at 6:24 PM, Charles Oliver N. wrote:

If you need to parallelize, processes are only one tool, and perhaps
the most blunt tool. In-process concurrency opens up many options that
are difficult or impossible with processes.

This is how you do parallel processing for your work. It’s not the
only way, and being able to pass whole in-memory object graphs over to
another thread is distinctly more elegant than having to marshal it
through a memory-mapped file or IO pipe.

Perhaps you can’t fork, but you can use real
concurrent threads, which are almost certainly easier (provided you
don’t share mutable data, as with processes).

Before I say this, I need to state that I love and use JRuby. The
reasons are that it completely rocks at some things, like Java
integration.

Of course, like anything, there are tradeoffs and JRuby sucks at other
things, like manipulating processes in a POSIX environment. I don’t use
it in these scenarios and you know that I’ve filed bugs for the specific
problems I’ve run into (some of those have been partially addressed).

All that said, I think you were pretty harsh on using processes for
concurrency in general. That “blunt tool” is pretty much the core of
the Unix operating system, which I think a lot of us are fond of. I
often find it easier to work with processes than threads myself, though
obviously some programmers think the other way.

On the contrary, threading is so challenging to get right that
“threading is hard” is a popular saying:

"threading is hard" - Google Search

It bugs me that people are so harsh on fork(). I avoided it like the
plague when I was a younger programmer because everyone had me convinced
it was evil. I’m now far more dangerous because I took the time to
learn it and understand it. I strongly recommend all programmers do the
same. (By the way, ara.t.howard taught me most of what I know about
processes, directly and indirectly!)

So JRuby is good at threads and not so good at processes, in my opinion.
Processes are also not at all evil. Judge not lest ye be judged. :wink:

James Edward G. II

@all, esp. @ara, @james, @charles

Thanks for this enlightening discussion, which clarifies the issues I was
- quite clumsily - addressing.

_md

I totally understand the desire to have these capabilities and the
elegance of ruby, but I think you’d find the science and engineering
community in the python world worth looking at in more detail.

Numpy, scipy, matplotlib, ipython, mayavi2 are some buzzwords to look up
and then decide for yourself.

Regards,
Ben R.


From: Michel D. [[email protected]]
Sent: Friday, December 17, 2010 12:56 AM
To: ruby-talk ML
Subject: Re: Ruby and science ?

Phillip G. wrote in post #969006:

Not quite, but have a look at ruby-toolbox.com (IIRC), which gives an
overview of what’s available for what. And there’s the Ruby
Application Archive, of course.

‘gsl’ was not in the toolbox, and (stupid me) I did not look in the RAA!
_md

Benjamin J. Racine wrote in post #969482:

I totally understand the desire to have these capabilities and the
elegance of ruby, but I think you’d find the science and engineering
community in the python world worth looking at in more detail.

Numpy, scipy, matplotlib, ipython, mayavi2 are some buzzwords to look up
and then decide for yourself.

Regards,
Ben R.

Benjamin, you are moving the knife in the wound (translated from French,
do you say that in English ?)
_md

Martin DeMello wrote in post #969546:

On Mon, Dec 20, 2010 at 1:44 PM, Michel D. [email protected]
wrote:

Benjamin, you are moving the knife in the wound (translated from French,
do you say that in English ?)

“Twisting the knife” in English

martin

In French it is “remuer le couteau dans la plaie”. Twisting is certainly
meaner than “remuer” :wink:

_md

On Sun, Dec 19, 2010 at 9:43 PM, Benjamin J. Racine
[email protected] wrote:

I totally understand the desire to have these capabilities and the elegance of
ruby, but I think you’d find the science and engineering community in the python
world worth looking at in more detail.

Numpy, scipy, matplotlib, ipython, mayavi2 are some buzzwords to look up and then
decide for yourself.

It depends on what kind of science you are trying to do in ruby.

I would like to point out that there are ruby bindings for Root.
http://root.cern.ch/root/HowtoRuby.html

http://root.cern.ch/ ( loosely think of it as the software behind the
LHC )

Andrew McElroy

On Mon, Dec 20, 2010 at 1:44 PM, Michel D. [email protected]
wrote:

Benjamin, you are moving the knife in the wound (translated from French,
do you say that in English ?)

“Twisting the knife” in English

martin

On Sun, Dec 19, 2010 at 9:43 PM, Benjamin J. Racine
[email protected] wrote:

I totally understand the desire to have these capabilities and the elegance of
ruby, but I think you’d find the science and engineering community in the python
world worth looking at in more detail.

Numpy, scipy, matplotlib, ipython, mayavi2 are some buzzwords to look up and then
decide for yourself.

What I’d really like to see are FFI-based wrappers around key science
and math libraries, rather than more blasted C extensions that can’t
be run concurrently and aren’t easily portable across impls. FFI works
incredibly well for these isolated libraries (as opposed to FFI for
kernel-level features, which can have many platform-specific
differences).

C extensions are the devil.

- Charlie

On Sat, Dec 18, 2010 at 9:55 PM, James Edward G. II
[email protected] wrote:

Of course, like anything, there are tradeoffs and JRuby sucks at other things,
like manipulating processes in a POSIX environment. I don’t use it in these
scenarios and you know that I’ve filed bugs for the specific problems I’ve run
into (some of those have been partially addressed).

The JVM and the JDK APIs suck at process manipulation…not JRuby.
JRuby does the best job it can do cross-platform with the JDK APIs
provided for it. If you need to go outside those APIs, or if we “suck”
in how we utilize them, it’s a trivial matter to bind native C
process-management logic via FFI and use that. It won’t be as portable
as what we provide, but it will work.
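As a rough idea of what binding native process functions looks like, here is a small sketch. Note the substitution: it uses Fiddle, the foreign-function library that ships with MRI, rather than the ffi gem discussed in this thread, so that it runs without installing anything. It assumes a POSIX system where `getpid` and `getppid` are resolvable from the running process; the constant and method names are made up for the example.

```ruby
require 'fiddle'

# Open a handle on the symbols already loaded into this process
# (libc among them on a typical POSIX system).
LIBC = Fiddle.dlopen(nil)

# Bind two libc functions by name: no arguments, int return value.
GETPID  = Fiddle::Function.new(LIBC['getpid'],  [], Fiddle::TYPE_INT)
GETPPID = Fiddle::Function.new(LIBC['getppid'], [], Fiddle::TYPE_INT)

# Thin Ruby wrappers around the native calls.
def native_pid
  GETPID.call
end

def native_ppid
  GETPPID.call
end

puts native_pid == Process.pid
```

The binding itself is short; the portability cost mentioned above comes from the fact that function names, argument types, and library locations can differ from platform to platform.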

Providing the excellent cross-platform experience the JVM provides
(and which JRuby provides by extension) means a lot of
platform-specific things are a bit cumbersome. Our direction has been
to provide the cross-platform experience and allow people to opt out
of portability through FFI if necessary. You may disagree with that
approach.

All that said, I think you were pretty harsh on using processes for concurrency
in general. That “blunt tool” is pretty much the core of the Unix operating
system, which I think a lot of us are fond of. I often find it easier to work
with processes than threads myself, though obviously some programmers think the
other way.

Processes for concurrency works great. The blunt tool I meant was how
you get those processes to coordinate. You basically have a handful of
cumbersome options:

  • Signals, which can’t communicate much data
  • Streams, pipes, files, shared memory, which can only carry byte[]
    data, requiring marshaling

With threads, it’s possible to communicate between concurrent
processes using normal OO constructs like queues, actors, and simple
method calls. You can emulate that with processes using one of the
above mechanisms, but it’s a leaky abstraction. On the other hand,
your queues, actors, and method calls across threads need to be
thread-safe. Tradeoffs.
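For comparison, here is a minimal sketch of the process-based route on MRI, where the result has to cross the process boundary as bytes. It assumes a POSIX system with a working `fork` (so not JRuby, as noted in this thread); the method name and the shape of the result hash are hypothetical.

```ruby
# Compute a mean in a child process and ship the result back over a pipe.
# The pipe carries only bytes, so the result is marshaled on one side
# and rebuilt on the other.
def mean_in_child(readings)
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    mean = readings.sum / readings.size.to_f
    writer.write(Marshal.dump({ mean: mean, pid: Process.pid }))
    writer.close
  end

  writer.close
  result = Marshal.load(reader.read)   # a rebuilt copy, not the child's object
  reader.close
  Process.wait(pid)
  result
end

result = mean_in_child([1.0, 2.0, 3.0])
puts result[:mean]
puts result[:pid] != Process.pid
```

The child gets its own copy of `readings` for free at fork time; it is the return trip that needs explicit serialization, and only values that `Marshal` can dump will survive it.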

JRuby is perfectly happy to work with a multi-process model, but you
may need to opt out of portability to get the lowest-level behaviors
of a typical UNIX environment. I personally have nothing against
processes. Threads are just easier, if you stay out of the danger
zones.

On the contrary, threading is so challenging to get right that “threading is
hard” is a popular saying:

"threading is hard" - Google Search

Threading is hard if you do it wrong. The problem is that it’s easy to
do it wrong.

Follow these rules and threading is a very nice, very clean, very easy
way to do concurrency:

  1. Don’t share data
  2. If you must share data, don’t share mutable data
  3. If you must share mutable data, guarantee ACID (atomicity,
    consistency, isolation, durability)
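
A minimal sketch of what these rules can look like in plain Ruby: each thread works on its own slice of the input, the only data read by every thread is frozen, and the single piece of shared mutable state is updated under a Mutex. The names are hypothetical, and a Mutex only covers the atomicity and isolation part of rule 3; it is not a full transactional mechanism.

```ruby
# Frozen configuration: safe to read from any thread because
# nobody can modify it.
CONFIG = { scale: 2 }.freeze

# Sum the scaled values using several threads.
def parallel_scaled_sum(values, workers: 4)
  total = 0
  lock  = Mutex.new
  slice_size = [(values.size / workers.to_f).ceil, 1].max

  threads = values.each_slice(slice_size).map do |slice|
    # Each thread is handed its own slice and never touches the others.
    Thread.new(slice) do |own|
      partial = own.sum { |v| v * CONFIG[:scale] }
      # The one shared, mutable variable is only updated under the lock.
      lock.synchronize { total += partial }
    end
  end

  threads.each(&:join)
  total
end

puts parallel_scaled_sum((1..100).to_a)
```

Nothing in Ruby enforces any of this; removing the `synchronize` call or the `freeze` would still run, which is the sense in which the rules are conventions rather than guarantees.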

Clojure is a perfect example of an environment that uses threads
heavily by defaulting to (2) and providing software transactional
memory for (3). Other than enforcing immutability, nothing Clojure
does for concurrency could not be done in Ruby. Anyone interested in
seeing concurrency done the Clojure way with JRuby can find many
examples online.

Threads “fail” in that none of these rules are enforced at any level.
They’re a very sharp tool with many dangerous paths. But I prefer
sharp tools.

It bugs me that people are so harsh on fork(). I avoided it like the plague
when I was a younger programmer because everyone had me convinced it was evil.
I’m now far more dangerous because I took the time to learn it and understand it.
I strongly recommend all programmers do the same. (By the way, ara.t.howard
taught me most of what I know about processes, directly and indirectly!)

I have no problem with fork. If JRuby could support fork on the JVM,
we would do so. We don’t only because all mainstream JVMs spin up
multiple threads, which are not carried along to forked child
processes (and even if they could be restarted, it’s a very
complicated transition that might defeat much of the benefit of
forking).
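
The underlying fork behaviour can be observed from MRI on a POSIX system, where only the thread that calls `fork` exists in the child. This sketch says nothing about JVM internals; it only illustrates why a runtime that depends on several background threads cannot simply be forked. The method name is made up for the example.

```ruby
# Start a background thread, fork, and ask the child whether it still
# considers that thread alive.
def thread_alive_after_fork?
  background = Thread.new { sleep }            # parked helper thread
  sleep 0.05 until background.status == 'sleep'

  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.write(background.alive? ? 'alive' : 'dead')
    writer.close
  end

  writer.close
  child_view = reader.read
  reader.close
  Process.wait(pid)

  parent_view = background.alive?
  background.kill
  { parent: parent_view, child: child_view }
end

p thread_alive_after_fork?
```

In the parent the helper thread keeps running; in the child it is gone, so anything that thread was responsible for would have to be restarted by hand.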

So JRuby is good at threads and not so good at processes, in my opinion.
Processes are also not at all evil. Judge not lest ye be judged. :wink:

It might be more correct to say that the JVM is good at threads and
not so good at processes, noting that JRuby makes it possible via FFI
to be nearly as good at processes as any POSIX application. We have
simply prioritized making JRuby work uniformly across platforms first,
while still providing the tools people need to opt out of portability
for lower-level behaviors and features.

- Charlie

On Dec 20, 2010, at 11:51 , Charles Oliver N. wrote:

The JVM and the JDK APIs suck at process manipulation…not JRuby.

Oh come now. If the JVM sucks at something, JRuby sucks at it too. Don’t
pass the buck.

JRuby does the best job it can do cross-platform with the JDK APIs
provided for it. If you need to go outside those APIs, or if we “suck”
in how we utilize them, it’s a trivial matter to bind native C
process-management logic via FFI and use that. It won’t be as portable
as what we provide, but it will work.

If it were trivial, why aren’t you shipping it (or at least pointing to
a jruby supported gem that does)? You’ve espoused FFI as the C-API
silver bullet time and again. I have doubts that it is that trivial as
FFI itself seems non-portable.