*Announcement*

MDArray version 0.5.3 has just been released. MDArray is a multidimensional array implemented for JRuby, inspired by NumPy (www.numpy.org) and Masahiro Tanaka's NArray (narray.rubyforge.org). MDArray stands on the shoulders of NetCDF-Java and Parallel Colt. At this point MDArray has libraries for mathematical, trigonometric and descriptive statistics methods.

The NetCDF-Java Library is a Java interface to NetCDF files <http://www.unidata.ucar.edu/software/netcdf/index.html>, as well as to many other types of scientific data formats. It is developed and distributed by Unidata (http://www.unidata.ucar.edu).

Parallel Colt (http://grepcode.com/snapshot/repo1.maven.org/maven...) is a multithreaded <http://en.wikipedia.org/wiki/Thread_%28computer_sc...> version of Colt <http://dsd.lbl.gov/~hoschek/colt/> (http://acs.lbl.gov/software/colt/). Colt provides a set of Open Source Libraries for High Performance Scientific and Technical Computing in Java. Scientific and technical computing is characterized by demanding problem sizes and a need for high performance at a reasonably small memory footprint.

*What's new:*

*Performance Improvement*

In previous versions, array operations were done by passing a Ruby Proc to a loop over all elements of the given arrays. For instance, adding two MDArrays was done by passing Proc.new { |a, b| a + b } and looping through all elements of the arrays. Procs are very flexible in Ruby; however, in my experience with MDArray, they are also very slow. In this version, when available, we pass a native Java method to the loop instead of a Proc. Available Java methods are those extracted from Parallel Colt and listed below. Note that Parallel Colt has native methods for the following types only: double, float, long and int. With this change, there was a performance improvement of over 90%, and MDArray operations are now close to native Java operations.
We expect (but do not yet have benchmarking data) that this brings MDArray performance close to similar solutions such as NArray, NMatrix and NumPy (please try it, and if this assertion is false, I'll be glad to change it in future announcements).

Methods not available in Parallel Colt but supported by Ruby, such as sinh, tanh, and add for the byte type, are still supported by MDArray. Again, to improve performance, instead of passing a Proc we now create a class as follows:

    class Add
      def self.apply(a, b)
        a + b
      end
    end

This change brought a performance improvement of over 60% for MDArray operations with Ruby methods.

*Experimental Lazy Evaluation*

Usual MDArray operations are done eagerly, i.e., if @a, @b, @c are three MDArrays then the following:

    @d = @a + @b + @c

will be evaluated as follows: first @a + @b is performed and stored in a temporary variable, then this temporary variable is added to @c. For large expressions, temporary variables can have a significant performance impact.

This version of MDArray introduces lazy evaluation of expressions. Thus, when in lazy mode:

    @lazy_d = @a + @b + @c

will not evaluate immediately. Rather, the expression is preprocessed and only executed when required. Since the whole expression is known at execution time, there is no need for temporary variables, as the whole expression is executed at once. To put MDArray in lazy mode we only need to set its mode to lazy with the command MDArray.lazy = true. All expressions after that are lazy by default. In lazy mode, MDArray resembles Numexpr; however, there is no need to write the expression as a string, and there is no compilation involved.

MDArray does not implement broadcasting rules as NumPy does. As a result, trying to operate on arrays of different shapes raises an exception. In lazy mode, this exception is raised only at evaluation time, so it is possible to have an invalid lazy array.
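As a rough illustration of the idea of deferring evaluation, here is a minimal plain-Ruby sketch. It does not use MDArray and is not how MDArray implements lazy mode; the class name LazySum is hypothetical, it only handles element-wise addition of Ruby Arrays, and it uses a [] method with no arguments to force evaluation.

```ruby
# Toy illustration of lazy element-wise expressions in plain Ruby.
# LazySum is a hypothetical name; this is not MDArray's implementation.
class LazySum
  def initialize(*operands)
    @operands = operands
  end

  # Adding to a lazy expression only extends the operand list;
  # nothing is computed at this point.
  def +(other)
    LazySum.new(*@operands, other)
  end

  # Evaluate on demand: one pass over all operands, no intermediate arrays.
  # Shape problems are only detected here, at evaluation time.
  def []
    size = @operands.first.size
    unless @operands.all? { |op| op.size == size }
      raise ArgumentError, "operands have different shapes"
    end
    Array.new(size) { |i| @operands.sum { |op| op[i] } }
  end
end

a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
c = [10, 10, 10, 10]

lazy_d = LazySum.new(a, b) + c   # nothing evaluated yet
d = lazy_d[]                     # [16, 18, 20, 22]

bad = LazySum.new(a, [1, 2])     # invalid, but no error until bad[] is called
```

Because the sketch keeps references to its operands rather than copies, evaluating lazy_d again after changing an element of a uses the new value.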
To evaluate a lazy array one should use the [] method as follows:

    @d = @lazy_d[]

@d is now a normal MDArray.

Lazy MDArrays are really lazy, so let's assume that @a = [1, 2, 3, 4] and @b = [5, 6, 7, 8]. Let's also have @l_c = @a + @b. Now doing @c = @l_c[] will evaluate @c to [6, 8, 10, 12]. Now, let's do @a[1] = 20 and then @d = @l_c[]. Now @d evaluates to [6, 26, 10, 12], as the new value of @a is used.

Lazy arrays can be evaluated inside expressions:

    @l_c = (@a + @b)[] + @c

In this example, @l_c is a lazy array, but (@a + @b) is evaluated when the [] method is called and then added to @c. If the value of @a or @b is changed now, the result of evaluating @l_c will not change, unlike in the previous example.

Finally, laziness is contagious. Let's assume that we have @l_c as above, a lazy array, and we do MDArray.lazy = false. From this point on in the code, operations will be done eagerly. Now doing @e = @d + @l_c makes @e a lazy array, as its construction involves a lazy array. One should be careful when mixing lazy and eager arrays in eager mode: in

    @c = @l_a + (@b + @c)

the parenthesized (@b + @c) is evaluated eagerly first and then added lazily to @l_a, giving a lazy array.

In this version, when only native Java methods (the Parallel Colt methods described below) are used in the expression, lazy evaluation ranged from around 40% *less* efficient than eager evaluation on one machine I tested to approximately the same performance on another. If the expression involves any Ruby method, evaluation of lazy expressions becomes much slower than eager evaluation. In order to improve performance, I believe that compilation of expressions will be necessary.

*MDArray and SciRuby:*

MDArray subscribes fully to the SciRuby Manifesto (http://sciruby.com/).
*Ruby <http://www.ruby-lang.org/> has for some time had no equivalent to the beautifully constructed NumPy <http://numpy.scipy.org/>, SciPy <http://www.scipy.org/>, and matplotlib <http://matplotlib.sourceforge.net/> libraries for Python <http://www.python.org/>.*

*We believe that the time for a Ruby science and visualization package has come. Sometimes when a solution of sugar and water becomes super-saturated, from it precipitates a pure, delicious, and diabetes-inducing crystal of sweetness, induced by no more than the tap of a finger. So is occurring now, we believe, with numeric and visualization libraries for Ruby.*

*MDArray main properties are:*

Homogeneous multidimensional array, a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers;
Easy calculation for large numerical multidimensional arrays;
Basic types are: boolean, byte, short, int, long, float, double, string, structure;
Based on JRuby, which allows importing Java libraries;
Operators: +, -, *, /, %, **, >, >=, etc.;
Functions: abs, ceil, floor, truncate, is_zero, square, cube, fourth;
Binary operators: &, |, ^, ~ (binary_ones_complement), <<, >>;
Ruby Math functions: acos, acosh, asin, asinh, atan, atan2, atanh, cbrt, cos, erf, exp, gamma, hypot, ldexp, log, log10, log2, sin, sinh, sqrt, tan, tanh, neg;
Boolean operations on boolean arrays: and, or, not;
Fast descriptive statistics from Parallel Colt (complete list found below);
Easy manipulation of arrays: reshape, reduce dimension, permute, section, slice, etc.;
Reading of two-dimensional arrays from CSV files (mainly for debugging and simple testing purposes);
StatList: a list that can grow/shrink and that can compute Parallel Colt descriptive statistics;
Experimental lazy evaluation (still slower than eager evaluation).
*Descriptive statistics methods imported from Parallel Colt:*

auto_correlation, correlation, covariance, durbin_watson, frequencies, geometric_mean, harmonic_mean, kurtosis, lag1, max, mean, mean_deviation, median, min, moment, moment3, moment4, pooled_mean, pooled_variance, product, quantile, quantile_inverse, rank_interpolated, rms, sample_covariance, sample_kurtosis, sample_kurtosis_standard_error, sample_skew, sample_skew_standard_error, sample_standard_deviation, sample_variance, sample_weighted_variance, skew, split, standard_deviation, standard_error, sum, sum_of_inversions, sum_of_logarithms, sum_of_powers, sum_of_power_deviations, sum_of_squares, sum_of_squared_deviations, trimmed_mean, variance, weighted_mean, weighted_rms, weighted_sums, winsorized_mean.

*Double and Float methods from Parallel Colt:*

acos, asin, atan, atan2, ceil, cos, exp, floor, greater, IEEEremainder, inv, less, lg, log, log2, rint, sin, sqrt, tan.

*Double, Float, Long and Int methods from Parallel Colt:*

abs, compare, div, divNeg, equals, isEqual (is_equal), isGreater (is_greater), isLess (is_less), max, min, minus, mod, mult, multNeg (mult_neg), multSquare (mult_square), neg, plus (add), plusAbs (plus_abs), pow (power), sign, square.

*Long and Int methods from Parallel Colt:*

and, dec, factorial, inc, not, or, shiftLeft (shift_left), shiftRightSigned (shift_right_signed), shiftRightUnsigned (shift_right_unsigned), xor.

*MDArray installation and download:*

Install JRuby, then run:

    jruby -S gem install mdarray

*MDArray Homepages:*

http://rubygems.org/gems/mdarray
https://github.com/rbotafogo/mdarray/wiki

*Contributors:*

Contributors are welcome.

*MDArray History:*

24/05/2013: Version 0.5.0 - Over 90% performance improvements for methods imported from Parallel Colt and over 40% performance improvements for all other methods (implemented in Ruby);
16/05/2013: Version 0.5.0 - All loops transferred to Java with over 50% performance improvements. Descriptive statistics from Parallel Colt;
19/04/2013: Version 0.4.3 - Fixes a simple, but fatal bug in 0.4.2. No new features;
17/04/2013: Version 0.4.2 - Adds simple statistics and boolean operators;
05/04/2013: Version 0.4.0 - Initial release.
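The performance notes in the announcement contrast two ways of handing an operation to a loop: a Proc, and a class that exposes an apply method. The plain-Ruby sketch below only shows the two calling styles side by side. The helper map_binary is a hypothetical name invented for this illustration; MDArray's real loop runs in Java and is not reproduced here, so this sketch says nothing about the reported speedups.

```ruby
# Two ways to hand an element-wise operation to a loop, in plain Ruby.
# map_binary is an illustrative helper, not part of MDArray.

# Style 1: the operation is a Proc.
add_proc = Proc.new { |a, b| a + b }

# Style 2: the operation is a class with a class-level apply method.
class Add
  def self.apply(a, b)
    a + b
  end
end

# The loop accepts either style: it calls apply when the operation
# provides it, and falls back to call (the Proc case) otherwise.
def map_binary(xs, ys, op)
  raise ArgumentError, "size mismatch" unless xs.size == ys.size
  xs.zip(ys).map do |x, y|
    op.respond_to?(:apply) ? op.apply(x, y) : op.call(x, y)
  end
end

via_proc  = map_binary([1, 2, 3], [4, 5, 6], add_proc)   # [5, 7, 9]
via_class = map_binary([1, 2, 3], [4, 5, 6], Add)        # [5, 7, 9]
```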

on 2013-06-24 21:42

on 2013-06-24 22:07

Wow, this is really excellent Rodrigo! Thank you for the extensive post...I just tweeted about MDArray to spread the word. The perf improvements sound excellent, and there's even room to grow; if we turned a few of your wrapper classes into JRuby extensions, we could eliminate almost all of the Ruby-to-Java overhead and bump the speed up even more. It's nice to have a path forward to go even faster, but MDArray is also an excellent demonstration of the power of JRuby's Java integration. I am looking forward to seeing what else comes out of SciRuby this summer. Does MDArray fall under the same family of projects, or is it mostly independent?

- Charlie

On Mon, Jun 24, 2013 at 12:39 PM, Rodrigo B.

on 2013-06-24 23:41

Charles,

Thanks! I'm really surprised at how fast MDArray can be and I'm thrilled with JRuby. JRuby really makes Ruby-to-Java integration easy and fast. I look forward to possible JRuby extensions that eliminate overhead. Moving forward, I think I'll try to compile lazy expressions in order to reap the benefits of eliminating temporaries. This is an independent project, mostly done in my free time. Keep up the good work with JRuby, it's a great project!

Cheers,
Rodrigo

On Mon, Jun 24, 2013 at 4:04 PM, Charles Oliver N.

on 2013-08-04 00:19

Hi, this looks pretty interesting for image/graphical processing and all sorts of other geometry stuff (numpy is quite sophisticated for this). I had a bit of a go at using MDArray with ruby-processing here http://learning-ruby-processing.blogspot.co.uk/201..., but I'm sure I can use it for more exciting stuff.

on 2013-08-05 20:10

Hi Martin,

Thanks for your post. I think that an interesting feature of MDArray is the ability to get a section from it. A section is a subarray with the same backing store. Since there is no data copying, the cost is very low. So, I'm thinking that you could do some animation with it. For instance, in your example, you could have another dimension time = 10. Here is a small example that fills all frames of the animation. My example data will probably not show anything nice though!

    require 'mdarray'

    def animation
      width = 5
      height = 6
      time = 10
      animation = MDArray.float([time, width, height])
      (0...time).each do |t|
        # use section to get only one time frame. The last argument to section is 'true',
        # which gets a section with reduction, i.e., eliminates dimensions of size 1.
        # frame is a 2 dimensional mdarray
        frame = animation.section([t, 0, 0], [1, width, height], true)
        # fill frame with values for each time frame of the animation
        (0...width).each do |w|
          (0...height).each do |h|
            frame[w, h] = t * w * h
          end
        end
      end
    end

More dimensions could be used, for instance, if you wanted to divide the screen in quarters. You could also have an animation with many characters:

    animation = MDArray.float([time, characters, width, height])

Every character would have its own index along the characters dimension. You could work on every character independently by getting its section:

    char1 = animation.section([t, 1, 0, 0], [1, 1, width, height], true)

or get all characters for a given time frame:

    frame1 = animation.section([t, 0, 0, 0], [1, characters, width, height])

I would love to see this with processing!

Cheers,
Rodrigo
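To make the "same backing store" point concrete without needing JRuby or the mdarray gem, here is a small plain-Ruby model of a view over shared storage. The classes Grid3 and Frame are hypothetical names invented for this sketch, and the row-major layout is an assumption of the sketch; it is not a description of how MDArray's section is implemented.

```ruby
# Toy model of a "section": a view that indexes into the parent's storage.
# Grid3 and Frame are hypothetical names, not MDArray classes.
class Grid3
  attr_reader :data, :shape

  def initialize(time, width, height)
    @shape = [time, width, height]
    @data = Array.new(time * width * height, 0.0)   # flat backing store
  end

  # Row-major offset of element (t, w, h) in the flat store.
  def offset(t, w, h)
    (t * @shape[1] + w) * @shape[2] + h
  end

  def [](t, w, h)
    @data[offset(t, w, h)]
  end

  # A 2-D view of one time frame; no data is copied.
  def frame(t)
    Frame.new(self, t)
  end
end

class Frame
  def initialize(grid, t)
    @grid = grid
    @t = t
  end

  def [](w, h)
    @grid.data[@grid.offset(@t, w, h)]
  end

  # Writes go straight to the parent's storage.
  def []=(w, h, value)
    @grid.data[@grid.offset(@t, w, h)] = value
  end
end

animation = Grid3.new(10, 5, 6)
frame = animation.frame(3)
frame[2, 4] = 24.0        # write through the view
animation[3, 2, 4]        # the parent sees 24.0
```

Creating the view allocates only a small object holding the parent and the frame index, which is why working frame by frame stays cheap regardless of the array size.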

on 2013-08-06 23:18

Before I saw your reply I came up with a revised version of my Conway's Game of Life using MDArray http://learning-ruby-processing.blogspot.co.uk/201.... I guess in this case, instead of having two MDArray instances, I could have used just one. Unfortunately I've come unstuck with my idea of image processing: to use gem libraries with ruby-processing I need to use an external jruby, whereas to get sketches with PImage (which loads images as pixel arrays) I seem to need the jruby-complete that is included with ruby-processing. I have had a bit of experience using numpy and pyprocessing, which I thought could be interesting.

on 2013-08-07 00:00

Martin,

Yes, you could use just one MDArray with one extra dimension for the buffer, although the code is quite clear the way it is. In this case I don't see much gain in it. I don't really understand the problem with having to load jruby-complete. If you cannot install gems, you could get the .rb files and the .jar from MDArray and work as if they were your own Ruby and jar files by just configuring the path properly. All files are available in the gem directory or can be downloaded directly from GitHub.

Rodrigo

on 2013-08-07 09:21

I think I might have solved the jruby-complete.jar vs external jruby issue, which is completely our fault. The jruby-complete.jar is in our classpath, whether using it directly by starting from java, or calling from jruby (not a good idea). If I remove the jruby-complete.jar I can now run sketches that require runtime libraries from that classpath. I will need a workaround for non-jruby installs (which can't access gems anyway) and for exported apps.