Announcement

SciCom version 0.2.3.1 has been released. SciCom (Scientific Computing)
for Ruby brings the power of R to the Ruby community. SciCom is based
on Renjin, a JVM-based interpreter for the R language for statistical
computing.

Home Pages

- SciCom can be downloaded from http://rubygems.org/gems/scicom
- GitHub page: https://github.com/rbotafogo/scicom
- Wiki: https://github.com/rbotafogo/scicom/wiki
- Issues: https://github.com/rbotafogo/scicom/issues

Contributors

Contributors are welcome!

Features R on the JVM

Over the past two decades, the R language for statistical computing
has emerged as the de facto standard for analysts, statisticians, and
scientists. Today, a wide range of enterprises – from pharmaceuticals
to insurance – depend on R for key business uses. Renjin is a new
implementation of the R language and environment for the Java Virtual
Machine (JVM), whose goal is to enable transparent analysis of big
data sets and seamless integration with other enterprise systems such
as databases and application servers.

Renjin is still under development, but it is already being used in
production for a number of client projects, and supports most CRAN
packages, including some with C/Fortran dependencies.

SciCom and Renjin

SciCom integrates with Renjin and allows the use of R inside a Ruby
script. In a sense, SciCom is similar to other solutions such as
RinRuby, Rpy2, PipeR, etc. However, since SciCom and Renjin both
target the JVM, there is no need to integrate two separate systems
and no need to send data between Ruby and R, as everything resides in
the same JVM. Further, installing SciCom does not require installing
GNU R; Renjin is the interpreter and comes with SciCom. Finally,
although SciCom provides a basic interface to Renjin similar to
RinRuby, a much tighter integration is also possible (see examples
below).

SciCom with Standard R Interface

SciCom allows R programmers to use R commands inside a Ruby script in
a way similar to RinRuby, by calling the eval method and passing it an
R script:

    require 'scicom'

    # Basic integration with R can always be done by calling eval and
    # passing it a valid R expression.
    R.eval("r.i = 10L")
    R.eval("print(r.i)")
    # prints [1] 10

    R.eval("vec = c(10, 20, 30, 40, 50)")
    R.eval("print(vec)")
    # prints [1] 10 20 30 40 50

    R.eval("print(vec[1])")
    # prints [1] 10

Programmers can also use here docs to integrate an R script inside a
Ruby script. The next example shows a model for predicting baseball
wins based on runs allowed and runs scored. The data comes from
Baseball-Reference.com.

```
R.eval <<EOF
# This dataset comes from Baseball-Reference.com.
baseball = read.csv("baseball.csv")
# str has a bug in Renjin
# str(data)
# Let's look at the data available for Moneyball.
moneyball = subset(baseball, Year < 2002)
# Let's see if we can predict the number of wins, by looking at
# runs allowed (RA) and runs scored (RS). RD is the runs difference.
# We are making a linear model for predicting wins (W) based on RD
moneyball$RD = moneyball$RS - moneyball$RA
WinsReg = lm(W ~ RD, data=moneyball)
print(summary(WinsReg))
EOF
```

The output of the program above is:

    Call:
    lm(data = moneyball, formula = W ~ RD)

    Residuals:
        Min      1Q  Median      3Q     Max
    -14,266  -2,651   0,123   2,936  11,657

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   80,881      0,131 616,675       <0 ***
    RD             0,106      0,001  81,554       <0 ***

    Signif. codes:  0 '***' 0,001 '**' 0,01 '*' 0,05 '.' 0,1 ' ' 1

    Residual standard error: 3,939 on 900 degrees of freedom
    Multiple R-squared: 0,8808, Adjusted R-squared: 0,8807
    F-statistic: 6.650,9926 on 1 and 900 DF, p-value: < 0

The SciCom “language”

SciCom also allows for implementing R scripts in a “language” that is
just like Ruby, so that the developer does not need to know that she
is actually writing an R script. All R methods are accessible through
an R namespace.

The next script implements the same baseball model as the R script
above, this time in the SciCom “language”:

    require 'scicom'

    # This dataset comes from Baseball-Reference.com.
    baseball = R.read__csv("baseball.csv")

    # Let's look at the data available for Moneyball.
    moneyball = baseball.subset(baseball.Year < 2002)

    # Let's see if we can predict the number of wins, by looking at
    # runs allowed (RA) and runs scored (RS). RD is the runs difference.
    # We are making a linear model for predicting wins (W) based on RD
    moneyball.RD = moneyball.RS - moneyball.RA
    wins_reg = R.lm("W ~ RD", data: moneyball)
    wins_reg.summary.pp

Below is an example of calculating the correlation matrix without
using the built-in functions. First this is done in an R script and
then using SciCom:

    # Create a matrix and give it rownames and colnames
    set.seed(42)
    Xij <- matrix(sample(seq(0, 9), 40, replace = TRUE), ncol = 4)
    rownames(Xij) <- paste("S", seq(1, dim(Xij)[1]), sep = "")
    colnames(Xij) <- paste("V", seq(1, dim(Xij)[2]), sep = "")

    # find the means of the columns
    n <- dim(Xij)[1]
    one <- rep(1, n)
    X.means <- t(one) %*% Xij/n

    # find the covariance of the matrix
    X.diff <- Xij - one %*% X.means
    X.cov <- t(X.diff) %*% X.diff/(n - 1)
    round(X.cov, 2)

    # find the correlation
    sdi <- diag(1/sqrt(diag(X.cov)))
    rownames(sdi) <- colnames(sdi) <- colnames(X.cov)
    round(sdi, 2)
    X.cor <- sdi %*% X.cov %*% sdi
    rownames(X.cor) <- colnames(X.cor) <- colnames(X.cov)
    round(X.cor, 2)

Now the same code using SciCom:

    require 'scicom'

    # Create a matrix and give it rownames and colnames.
    # xij is assigned first so that its dimensions can be read when
    # building the row and column names.
    R.set__seed(42)
    xij = R.seq(0, 9).sample(40, replace: TRUE).matrix(ncol: 4)
    xij.fassign(:rownames, R.paste("S", R.seq(1, xij.attr.dim[1]), sep: ""))
    xij.fassign(:colnames, R.paste("V", R.seq(1, xij.attr.dim[2]), sep: ""))

    # find the means of the columns
    n = xij.dim[1]
    one = R.rep(1, n)
    x_means = one.t._ :*, xij/n

    # find the covariance of the matrix
    x_diff = xij - (one._ :*, x_means)
    x_cov = (x_diff.t._ :*, x_diff/(n - 1)).round(2)

    # find the correlation
    sdi = (1 / x_cov.diag.sqrt).diag.round(2)
    sdi.fassign(:rownames, x_cov.rownames)
    sdi.fassign(:colnames, x_cov.colnames)

    x_cor = ((sdi._ :*, x_cov)._ :*, sdi)
      .round(2)
      .fassign(:rownames, x_cov.rownames)
      .fassign(:colnames, x_cov.colnames)
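For readers who want to check the arithmetic without SciCom or R
installed, the same steps (column means, covariance, correlation) can
be sketched in plain Ruby. This is only an illustration: the helper
names column_means, covariance_matrix and correlation_matrix are
hypothetical and not part of SciCom, and the small data set is made up
for the example rather than taken from the scripts above.

```ruby
# Plain-Ruby sketch of the correlation matrix computation.
# Rows are observations, columns are variables.

# Mean of every column.
def column_means(rows)
  n = rows.size.to_f
  rows.transpose.map { |col| col.sum / n }
end

# Sample covariance matrix: t(X.diff) %*% X.diff / (n - 1).
def covariance_matrix(rows)
  means = column_means(rows)
  n = rows.size
  diffs = rows.map { |row| row.each_with_index.map { |x, j| x - means[j] } }
  cols = diffs.transpose
  cols.map do |a|
    cols.map { |b| a.zip(b).sum { |x, y| x * y } / (n - 1).to_f }
  end
end

# Correlation: each covariance divided by the two standard deviations.
def correlation_matrix(rows)
  cov = covariance_matrix(rows)
  sd = cov.each_index.map { |i| Math.sqrt(cov[i][i]) }
  cov.each_with_index.map do |row, i|
    row.each_with_index.map { |c, j| c / (sd[i] * sd[j]) }
  end
end

# Made-up data, for illustration only.
data = [[1, 2], [2, 4], [3, 6], [4, 9]]
correlation_matrix(data).each do |row|
  puts row.map { |v| v.round(2) }.inspect
end
```

Dividing by the product of the standard deviations is equivalent to
the sdi %*% X.cov %*% sdi product in the R script, since sdi is a
diagonal matrix of inverse standard deviations.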

As another example, here is a SciCom script to print the number of
days in every month of 2005:

    require 'scicom'

    everyday = R.seq(from: R.as__Date('2005-1-1'),
                     to: R.as__Date('2005-12-31'), by: 'day')
    cmonth = everyday.format('%b')

    cmonth
      .factor(levels: cmonth.unique, ordered: TRUE)
      .table
      .pp
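As a cross-check that does not depend on SciCom or Renjin, the same
count can be sketched with Ruby's standard Date class. The
days_per_month helper is a hypothetical name introduced for this
illustration only.

```ruby
require 'date'

# Count the days in each month of a year using only the standard library.
# Month labels come from strftime('%b'), the same format string passed
# to format in the SciCom script.
def days_per_month(year)
  counts = Hash.new(0)
  (Date.new(year, 1, 1)..Date.new(year, 12, 31)).each do |day|
    counts[day.strftime('%b')] += 1
  end
  counts
end

days_per_month(2005).each { |month, days| puts "#{month}: #{days}" }
```

Because Ruby hashes keep insertion order, the months are printed from
Jan to Dec, which plays the role of the ordered factor levels in the
SciCom version.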

As these examples show, R methods are accessed through the R namespace
in SciCom, so the R method ‘seq’ is called in SciCom as ‘R.seq’. R
methods that are applied to objects can be called in two ways: either
through the R namespace, as in ‘R.factor’, or directly on the object,
as in ‘cmonth.factor’ above. This last example also shows how SciCom
allows method chaining, which is not possible in an R script.
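Judging from the examples in this announcement (read.csv is called as
R.read__csv, set.seed as R.set__seed, as.Date as R.as__Date), a double
underscore in the Ruby method name appears to stand for a dot in the R
function name. The plain-Ruby sketch below only illustrates that
naming pattern; r_function_name is a hypothetical helper and is not
part of SciCom, nor a description of how SciCom resolves names
internally.

```ruby
# Hypothetical helper: map a Ruby-style method name to the R function
# name it appears to correspond to in the examples ("__" becomes ".").
def r_function_name(ruby_name)
  ruby_name.to_s.gsub("__", ".")
end

%i[read__csv set__seed as__Date seq].each do |name|
  puts "R.#{name} -> #{r_function_name(name)}"
end
```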