Forum: JRuby [ANN] Multidimensional Array - MDArray 0.5.4

B97f192e676f1a67e0ef18f022671406?d=identicon&s=25 Rodrigo Botafogo (Guest)
on 2013-08-07 21:37
(Received via mailing list)
Announcement
============

MDArray version 0.5.4 has Just been released. MDArray is a multi
dimensional array implemented
for JRuby inspired by NumPy (www.numpy.org) and Masahiro Tanakas Narray
(
narray.rubyforge.org).
MDArray stands on the shoulders of Java-NetCDF and Parallel Colt.  At
this
point MDArray has
libraries for mathematical, trigonometric and descriptive statistics
methods.

NetCDF-Java Library is a Java interface to NetCDF files, as well as to
many
other types of
scientific data formats.  It is developed and distributed by Unidata
(http://www.unidata.ucar.edu).

Parallel Colt (
http://grepcode.com/snapshot/repo1.maven.org/maven...
parallelcolt/0.10.0/) is a multithreaded version of Colt (
http://acs.lbl.gov/software/colt/).
Colt provides a set of Open Source Libraries for High Performance
Scientific and Technical
Computing in Java. Scientific and technical computing is characterized
by
demanding problem
sizes and a need for high performance at reasonably small memory
footprint.

For more information and (some) documentation please go to:
https://github.com/rbotafogo/mdarray/wiki

Whats new:
===========

NetCDF-3 File Support
---------------------

From Wikipedia, the free encyclopedia:

"NetCDF (Network Common Data Form) is a set of software libraries and
self-describing,
machine-independent data formats that support the creation, access, and
sharing of array-oriented
scientific data. The project homepage is hosted by the Unidata program
at
the University
Corporation for Atmospheric Research (UCAR). They are also the chief
source
of netCDF software,
standards development, updates, etc. The format is an open standard.
NetCDF
Classic and 64-bit
Offset Format are an international standard of the Open Geospatial
Consortium.

The project is actively supported by UCAR. Version 4.0 (released in
2008)
allows the use of the
HDF5 data file format. Version 4.1 (2010) adds support for C and Fortran
client access to
specified subsets of remote data via OPeNDAP.

The format was originally based on the conceptual model of the Common
Data
Format developed by
NASA, but has since diverged and is not compatible with it."

This version of MDArray implements NetCDF-3 file support only.  NetCDF-4
is
not yet supported.  At
the end of this announcement we show the MDArray implementation of the
NetCDF-3 file writing
from the tutorial at:
http://www.unidata.ucar.edu/software/netcdf-java/t...


MDArray and SciRuby:
====================

MDArray subscribes fully to the SciRuby Manifesto (http://sciruby.com/).

Ruby has for some time had no equivalent to the beautifully constructed
NumPy, SciPy, and
matplotlib libraries for Python.

We believe that the time for a Ruby science and visualization package
has
come. Sometimes
when a solution of sugar and water becomes super-saturated, from it
precipitates a pure,
delicious, and diabetes-inducing crystal of sweetness, induced by no
more
than the tap of a
finger. So is occurring now, we believe, with numeric and visualization
libraries for Ruby.

MDArray main properties are:
============================

 + Homogeneous multidimensional array, a table of elements (usually
numbers), all of the
     same type, indexed by a tuple of positive integers;
 + Easy calculation for large numerical multi dimensional arrays;
 + Basic types are: boolean, byte, short, int, long, float, double,
string,
structure;
 + Based on JRuby, which allows importing Java libraries;
 + Operator: +,-,*,/,%,**, >, >=, etc.;
 + Functions: abs, ceil, floor, truncate, is_zero, square, cube, fourth;
 + Binary Operators: &, |, ^, ~ (binary_ones_complement), <<, >>;
 + Ruby Math functions: acos, acosh, asin, asinh, atan, atan2, atanh,
cbrt,
cos, erf, exp,
     gamma, hypot, ldexp, log, log10, log2, sin, sinh, sqrt, tan, tanh,
neg;
 + Boolean operations on boolean arrays: and, or, not;
 + Fast descriptive statistics from Parallel Colt (complete list found
bellow);
 + Easy manipulation of arrays: reshape, reduce dimension, permute,
section, slice, etc.;
 + Support for reading and writing NetCDF-3 files;
 + Reading of two dimensional arrays from CSV files (mainly for
debugging
and simple testing
     purposes);
 + StatList: a list that can grow/shrink and that can compute Parallel
Colt
descriptive
     statistics;
 + Experimental lazy evaluation (still slower than eager evaluation).

Descriptive statistics methods imported from Parallel Colt:
===========================================================

  + auto_correlation, correlation, covariance, durbin_watson,
frequencies,
geometric_mean,
  + harmonic_mean, kurtosis, lag1, max, mean, mean_deviation, median,
min,
moment, moment3,
  + moment4, pooled_mean, pooled_variance, product, quantile,
quantile_inverse,
  + rank_interpolated, rms, sample_covariance, sample_kurtosis,
sample_kurtosis_standard_error,
  + sample_skew, sample_skew_standard_error, sample_standard_deviation,
sample_variance,
  + sample_weighted_variance, skew, split,  standard_deviation,
standard_error, sum,
  + sum_of_inversions, sum_of_logarithms, sum_of_powers,
sum_of_power_deviations,
  + sum_of_squares, sum_of_squared_deviations, trimmed_mean, variance,
weighted_mean,
  + weighted_rms, weighted_sums, winsorized_mean.

Double and Float methods from Parallel Colt:
============================================

  + acos, asin, atan, atan2, ceil, cos, exp, floor, greater,
IEEEremainder,
inv, less, lg,
  + log, log2, rint, sin, sqrt, tan.

Double, Float, Long and Int methods from Parallel Colt:
=======================================================

  + abs, compare, div, divNeg, equals, isEqual (is_equal), isGreater
(is_greater),
  + isles (is_less), max, min, minus, mod, mult, multNeg (mult_neg),
multSquare (mult_square),
  + neg, plus (add), plusAbs (plus_abs), pow (power), sign, square.

Long and Int methods from Parallel Colt
=======================================

  + and, dec, factorial, inc, not, or, shiftLeft (shift_left),
shiftRightSigned
      (shift_right_signed), shiftRightUnsigned (shift_right_unsigned),
xor.

MDArray installation and download:
==================================

  + Install Jruby
  + jruby S gem install mdarray

MDArray Homepages:
==================

  + http://rubygems.org/gems/mdarray
  + https://github.com/rbotafogo/mdarray/wiki

Contributors:
=============
Contributors are welcome.

MDArray History:
================

  + 07/08/2013: Version 0.5.4 - Support for reading and writing NetCDF-3
files
  + 24/06/2013: Version 0.5.3  Over 90% Performance improvements for
methods imported
      from Parallel Colt and over 40% performance improvements for all
other methods
      (implemented in Ruby);
  + 16/05/2013: Version 0.5.0 - All loops transferred to Java with over
50%
performance
      improvements.  Descriptive statistics from Parallel Colt;
  + 19/04/2013: Version 0.4.3 - Fixes a simple, but fatal bug in 0.4.2.
No
new features;
  + 17/04/2013: Version 0.4.2 - Adds simple statistics and boolean
operators;
  + 05/04/2013: Version 0.4.0  Initial release.

NetCDF-3 Writing with MDArray API
=================================

require 'mdarray'

class NetCDF

  attr_reader :dir, :filename, :max_strlen


#---------------------------------------------------------------------------------------
  #

#---------------------------------------------------------------------------------------

  def initialize
    @dir = "~/tmp"
    @filename1 = "testWriter"
    @filename2 = "testWriteRecord2"
    @max_strlen = 80
  end


#---------------------------------------------------------------------------------------
  # Define the NetCDF-3 file

#---------------------------------------------------------------------------------------

  def define_file

    # We pass the directory, filename, filetype and optionaly the
outside_scope.
    #
    # I'm implementing in cygwin, so the need for method cygpath that
converts the
    # directory name to a Windows name.  In another environment, just
pass
the directory
    # name.
    #
    # Inside a block we have another scope, so the block cannot access
any
variables, etc.
    # from the ouside scope. If we pass the outside scope, in this case
we
are passing self,
    # we can access variables in the outside scope by using
@outside_scope.<variable>.
    NetCDF.define(cygpath(@dir), @filename1, "netcdf3", self) do

      # add dimensions
      dimension "lat", 64
      dimension "lon", 128

      # add variables and attributes
      # add Variable double temperature(lat, lon)
      variable "temperature", "double", [@dim_lat, @dim_lon]
      variable_att @var_temperature, "units", "K"
      variable_att @var_temperature, "scale", [1, 2, 3]

      # add a string-value variable: char svar(80)
      # note that this is created as a scalar variable although in
NetCDF-3
there is no
      # string type and the string has to be represented as a char type.
      variable "svar", "string", [], {:max_strlen =>
@outside_scope.max_strlen}

      # add a 2D string-valued variable: char names(names, 80)
      dimension "names", 3
      variable "names", "string", [@dim_names], {:max_strlen =>
@outside_scope.max_strlen}

      # add a scalar variable
      variable "scalar", "double", []

      # add global attributes
      global_att "yo", "face"
      global_att "versionD", 1.2, "double"
      global_att "versionF", 1.2, "float"
      global_att "versionI", 1, "int"
      global_att "versionS", 2, "short"
      global_att "versionB", 3, "byte"

    end

  end


#---------------------------------------------------------------------------------------
  # write data on the above define file

#---------------------------------------------------------------------------------------

  def write_file

    NetCDF.write(cygpath(@dir), @filename1, self) do

      temperature = find_variable("temperature")
      shape = temperature.shape
      data = MDArray.fromfunction("double", shape) do |i, j|
        i * 1_000_000 + j * 1_000
      end
      write(temperature, data)

      svar = find_variable("svar")
      write_string(svar, "Two pairs of ladies stockings!")

      names = find_variable("names")
      # careful here with the shape of a string variable.  A string
variable has one
      # more dimension than it should as there is no string type in
NetCDF-3.  As such,
      # if we look as names' shape it has 2 dimensions, be we need to
create a one
      # dimension string array.
      data = MDArray.string([3], ["No pairs of ladies stockings!",
                                  "One pair of ladies stockings!",
                                  "Two pairs of ladies stockings!"])
      write_string(names, data)

      # write scalar data
      scalar = find_variable("scalar")
      write(scalar, 222.333 )

    end

  end


#---------------------------------------------------------------------------------------
  # Define a file for writing one record at a time

#---------------------------------------------------------------------------------------

  def define_one_at_time

    NetCDF.define(cygpath(@dir), @filename2, "netcdf3", self) do

      dimension "lat", 3
      dimension "lon", 4
      # zero sized dimension is an unlimited dimension
      dimension "time", 0

      variable "lat", "float", [@dim_lat]
      variable_att @var_lat, "units", "degree_north"

      variable "lon", "float", [@dim_lon]
      variable_att @var_lon, "units", "degree_east"

      variable "rh", "int", [@dim_time, @dim_lat, @dim_lon]
      variable_att @var_rh, "long_name", "relative humidity"
      variable_att @var_rh, "units", "percent"

      variable "T", "double", [@dim_time, @dim_lat, @dim_lon]
      variable_att @var_t, "long_name", "surface temperature"
      variable_att @var_t, "units", "degC"

      variable "time", "int", [@dim_time]
      variable_att @var_time, "units", "hours since 1990-01-01"

    end

  end


#---------------------------------------------------------------------------------------
  # Define a file for writing one record at a time

#---------------------------------------------------------------------------------------

  def write_one_at_time

    NetCDF.write(cygpath(@dir), @filename2, self) do

      lat = find_variable("lat")
      lon = find_variable("lon")

      # write non recored data to the variables
      write(lat, MDArray.float([3], [41, 40, 39]))
      write(lon, MDArray.float([4], [-109, -107, -105, -103]))

      # get record variables from file
      rh = find_variable("rh")
      time = find_variable("time")
      t = find_variable("T")

      # there is no method find_dimension for NetcdfFileWriter, so we
need
to get the
      # dimension from a variable.
      rh_shape = rh.shape
      dim_lat = rh_shape[1]
      dim_lon = rh_shape[2]

      (0...10).each do |time_idx|

        # fill rh_data array
        rh_data = MDArray.fromfunction("int", [dim_lat, dim_lon]) do
|lat,
lon|
          time_idx * lat * lon
        end
        # reshape rh_data so that it has the same shape as rh variable
        # Method reshape! reshapes the array in-place without data
copying.
        rh_data.reshape!([1, dim_lat, dim_lon])

        # fill temp_data array
        temp_data = MDArray.fromfunction("double", [dim_lat, dim_lon])
do
|lat, lon|
          time_idx * lat * lon / 3.14159
        end
        # reshape temp_data array so that it has the same shape as temp
variable.
        temp_data.reshape!([1, dim_lat, dim_lon])

        # write the variables
        write(time, MDArray.int([1], [time_idx * 12]), [time_idx])
        write(rh, rh_data, [time_idx, 0, 0])
        write(t, temp_data, [time_idx, 0, 0])

      end # End time_idx loop

    end

  end

end

netcdf = NetCDF.new
netcdf.define_file
netcdf.write_file
netcdf.define_one_at_time
netcdf.write_one_at_time
Please log in before posting. Registration is free and takes only a minute.
Existing account

NEW: Do you have a Google/GoogleMail, Yahoo or Facebook account? No registration required!
Log in with Google account | Log in with Yahoo account | Log in with Facebook account
No account? Register here.