SciCom 0.2.3.1

Announcement

SciCom version 0.2.3.1 has been released. SciCom (Scientific Computing)
for Ruby brings the power of R to the Ruby community. SciCom is based
on Renjin, a JVM-based interpreter for the R language for statistical
computing.

Home Pages

Contributors

Contributors are welcome!

Features

R on the JVM

Over the past two decades, the R language for statistical computing
has emerged as the de facto standard for analysts, statisticians, and
scientists. Today, a wide range of enterprises – from pharmaceuticals
to insurance – depend on R for key business uses. Renjin is a new
implementation of the R language and environment for the Java Virtual
Machine (JVM), whose goal is to enable transparent analysis of big
data sets and seamless integration with other enterprise systems such
as databases and application servers.

Renjin is still under development, but it is already being used in
production for a number of client projects, and supports most CRAN
packages, including some with C/Fortran dependencies.

SciCom and Renjin

SciCom integrates with Renjin and allows the use of R inside a Ruby
script. In a sense, SciCom is similar to other solutions such as
RinRuby, Rpy2, PipeR, etc. However, since SciCom and Renjin both
target the JVM, there is no need to integrate the two solutions and
no need to send data between Ruby and R, as it all resides in the
same JVM. Further, installation of SciCom does not require the
installation of GNU R; Renjin is the interpreter and comes with
SciCom. Finally, although SciCom provides a basic interface to Renjin
similar to RinRuby, a much tighter integration is also possible (see
examples below).

SciCom with Standard R Interface

SciCom allows R programmers to use R commands inside a Ruby script in
a way similar to RinRuby, by calling the eval method and passing it an
R script:

# Basic integration with R can always be done by calling eval and
# passing it a valid R expression.
R.eval("r.i = 10L")
R.eval("print(r.i)")

[1] 10

R.eval("vec = c(10, 20, 30, 40, 50)")
R.eval("print(vec)")

[1] 10 20 30 40 50

R.eval("print(vec[1])")

[1] 10

Programmers can also use here docs to integrate an R script inside a
Ruby script. The next example shows a model for predicting baseball
wins based on runs allowed and runs scored. The data comes from
Baseball-Reference.com.

R.eval <<EOF

# This dataset comes from Baseball-Reference.com.
baseball = read.csv("baseball.csv")
# str has a bug in Renjin
# str(data)

# Let's look at the data available for Moneyball.
moneyball = subset(baseball, Year < 2002)

# Let's see if we can predict the number of wins by looking at
# runs allowed (RA) and runs scored (RS). RD is the runs difference.
# We are making a linear model for predicting wins (W) based on RD.
moneyball$RD = moneyball$RS - moneyball$RA
WinsReg = lm(W ~ RD, data=moneyball)
print(summary(WinsReg))

EOF

The output of the program above is:

Call:
lm(data = moneyball, formula = W ~ RD)

Residuals:
    Min      1Q  Median      3Q     Max
-14,266  -2,651   0,123   2,936  11,657

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   80,881      0,131 616,675       <0 ***
RD             0,106      0,001  81,554       <0 ***

Signif. codes: 0 ‘***’ 0,001 ‘**’ 0,01 ‘*’ 0,05 ‘.’ 0,1 ‘ ’ 1

Residual standard error: 3,939 on 900 degrees of freedom
Multiple R-squared: 0,8808, Adjusted R-squared: 0,8807
F-statistic: 6.650,9926 on 1 and 900 DF, p-value: < 0
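
To make the fitted model concrete, the coefficients in the summary can be read as the line W = 80.881 + 0.106 * RD. The following plain-Ruby sketch (it does not use SciCom or R) evaluates that line for a few run differences. It assumes the decimal commas in the output are decimal points; the name predicted_wins and the sample RD values are made up for illustration.

```ruby
# Coefficients copied from the summary output above
# (assumption: decimal commas read as decimal points).
intercept = 80.881
slope     = 0.106

# Predicted wins for a given run difference under the fitted line.
# The lambda name is illustrative and not part of SciCom.
predicted_wins = ->(run_difference) { intercept + slope * run_difference }

# Illustrative run differences only; these are not taken from the data set.
[-100, 0, 100].each do |rd|
  puts format('RD = %4d -> predicted wins = %.3f', rd, predicted_wins.call(rd))
end
```

Under this reading, each additional run of difference is worth roughly a tenth of a win over a season.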

The SciCom “language”

SciCom also allows for implementing R scripts in a “language” that is
just like Ruby, so that the developer does not need to know that she
is actually writing an R script. All R methods are accessible through
an R namespace.

The next script is the same baseball model as in the R script above,
written in the SciCom “language”:

require 'scicom'

# This dataset comes from Baseball-Reference.com.
baseball = R.read__csv("baseball.csv")

# Let's look at the data available for Moneyball.
moneyball = baseball.subset(baseball.Year < 2002)

# Let's see if we can predict the number of wins by looking at
# runs allowed (RA) and runs scored (RS). RD is the runs difference.
# We are making a linear model for predicting wins (W) based on RD.
moneyball.RD = moneyball.RS - moneyball.RA
wins_reg = R.lm("W ~ RD", data: moneyball)
wins_reg.summary.pp

Below is an example of calculating the correlation matrix without
using the built-in functions. First this is done in an R script and
then using SciCom:

# Create a matrix and give it rownames and colnames
set.seed(42)
Xij <- matrix(sample(seq(0, 9), 40, replace = TRUE), ncol = 4)
rownames(Xij) <- paste("S", seq(1, dim(Xij)[1]), sep = "")
colnames(Xij) <- paste("V", seq(1, dim(Xij)[2]), sep = "")

# find the means of the columns
n <- dim(Xij)[1]
one <- rep(1, n)
X.means <- t(one) %*% Xij/n

# find the covariance of the matrix
X.diff <- Xij - one %*% X.means
X.cov <- t(X.diff) %*% X.diff/(n - 1)
round(X.cov, 2)

# find the correlation
sdi <- diag(1/sqrt(diag(X.cov)))
rownames(sdi) <- colnames(sdi) <- colnames(X.cov)
round(sdi, 2)
X.cor <- sdi %*% X.cov %*% sdi
rownames(X.cor) <- colnames(X.cor) <- colnames(X.cov)
round(X.cor, 2)

Now the same code using SciCom

require 'scicom'

# Create a matrix and give it rownames and colnames
R.set__seed(42)
xij = R.seq(0, 9).sample(40, replace: TRUE).matrix(ncol: 4)
xij.fassign(:rownames, R.paste("S", R.seq(1, xij.attr.dim[1]), sep: ""))
xij.fassign(:colnames, R.paste("V", R.seq(1, xij.attr.dim[2]), sep: ""))

# find the means of the columns
n = xij.dim[1]
one = R.rep(1, n)
x_means = one.t._ :*, xij / n

# find the covariance of the matrix
x_diff = xij - (one._ :*, x_means)
x_cov = (x_diff.t._ :*, x_diff / (n - 1)).round(2)

# find the correlation
sdi = (1 / x_cov.diag.sqrt).diag.round(2)
sdi.fassign(:rownames, x_cov.rownames)
sdi.fassign(:colnames, x_cov.colnames)
x_cor = ((sdi._ :*, x_cov)._ :*, sdi)
  .round(2)
  .fassign(:rownames, x_cov.rownames)
  .fassign(:colnames, x_cov.colnames)
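
For readers who want to follow the arithmetic without R or SciCom, the same three steps (column means, covariance, correlation) can be sketched in plain Ruby with nested arrays. This is only an illustration of the calculation: the 3 x 2 data matrix and all variable names are made up, and the code does not use or reflect the SciCom API.

```ruby
# Plain-Ruby sketch of the steps above (no SciCom or R involved).
# The small data matrix is made up for illustration.
rows = [
  [1.0, 2.0],
  [2.0, 4.0],
  [3.0, 7.0]
]

n = rows.size
cols = rows.transpose

# Column means (the role of t(one) %*% Xij / n in the R script).
means = cols.map { |c| c.sum / n }

# Deviations from the column means, stored column by column.
diffs = cols.each_with_index.map { |c, j| c.map { |v| v - means[j] } }

# Sample covariance matrix: t(X.diff) %*% X.diff / (n - 1).
cov = diffs.map do |a|
  diffs.map { |b| a.zip(b).sum { |x, y| x * y } / (n - 1) }
end

# Correlation: scale the covariance by 1 / sd on both sides,
# which is what sdi %*% X.cov %*% sdi does with a diagonal sdi.
sd = cov.each_index.map { |i| Math.sqrt(cov[i][i]) }
cor = cov.each_with_index.map do |row, i|
  row.each_with_index.map { |v, j| v / (sd[i] * sd[j]) }
end

cor.each { |row| puts row.map { |v| v.round(2) }.inspect }
```

Because sdi is diagonal, multiplying by it on both sides divides entry (i, j) of the covariance matrix by the product of the two standard deviations, which is why the sketch can do the scaling element by element.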

As another example, here is a SciCom script to print the number of
days in every month of 2005:

require 'scicom'

everyday = R.seq(from: R.as__Date('2005-1-1'),
                 to: R.as__Date('2005-12-31'), by: 'day')
cmonth = everyday.format('%b')
cmonth
  .factor(levels: cmonth.unique, ordered: TRUE)
  .table
  .pp
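
As a cross-check on what the table should contain, the same count can be sketched with Ruby's standard date library alone. This is an independent illustration, not SciCom code; the variable names are made up, and the English month abbreviations come from Date::ABBR_MONTHNAMES rather than from R's '%b' format, which depends on the locale.

```ruby
require 'date'

# Every day of 2005, playing the role of R.seq(from:, to:, by: 'day').
everyday = (Date.new(2005, 1, 1)..Date.new(2005, 12, 31)).to_a

# Count days per abbreviated month name. A Hash keeps insertion order,
# so the months stay in calendar order, like the ordered factor above.
days_per_month = Hash.new(0)
everyday.each { |d| days_per_month[Date::ABBR_MONTHNAMES[d.month]] += 1 }

days_per_month.each { |month, count| puts "#{month} #{count}" }
```

Since 2005 is not a leap year, February should show 28 days and the counts should add up to 365.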

As these examples show, R methods are accessed through the R
namespace in SciCom, so the R method ‘seq’ is called in SciCom as
‘R.seq’. R methods that are applied to objects can be called in two
ways: either through the R namespace, as in ‘R.factor’, or directly on
the object, as in ‘cmonth.factor’ above. This last example also shows
how SciCom allows method chaining, which R's function-call syntax does
not offer directly.
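
The examples also suggest a naming convention: R functions whose names contain a dot appear with a double underscore in Ruby (read.csv as R.read__csv, set.seed as R.set__seed, as.Date as R.as__Date). The sketch below illustrates how such a translation could work in plain Ruby. It is a hypothetical illustration inferred from the examples, not SciCom's implementation: the module RSketch and its helpers are made-up names, and it only builds the text of an R call instead of evaluating anything.

```ruby
# Hypothetical illustration only: this is NOT SciCom's implementation.
# It mimics the naming convention visible in the examples, where a double
# underscore in a Ruby method name stands for a dot in the R function name.
module RSketch
  # Turn a Ruby-style name such as :read__csv into the R name "read.csv".
  def self.r_name(ruby_name)
    ruby_name.to_s.gsub('__', '.')
  end

  # Render a Ruby value as an R literal (only the few cases used below).
  def self.r_literal(value)
    case value
    when String then "\"#{value}\""
    when true   then 'TRUE'
    when false  then 'FALSE'
    else value.to_s
    end
  end

  # Build the text of an R call from a Ruby-style call on the module.
  def self.method_missing(name, *args, **kwargs)
    positional = args.map { |a| r_literal(a) }
    named = kwargs.map { |k, v| "#{r_name(k)} = #{r_literal(v)}" }
    "#{r_name(name)}(#{(positional + named).join(', ')})"
  end

  def self.respond_to_missing?(_name, _include_private = false)
    true
  end
end

puts RSketch.read__csv('baseball.csv')   # read.csv("baseball.csv")
puts RSketch.set__seed(42)               # set.seed(42)
puts RSketch.seq(0, 9)                   # seq(0, 9)
```

A double underscore is a natural choice for this kind of mapping because a dot cannot appear inside a Ruby method name, while underscores can.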