Hi,
I am doing some scientific work in Ruby. So far the data is stored in a
single massive file (just Marshalled or YAMLed) and then, once the run
is over, it is loaded back into memory for data analysis.
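Just to make it concrete, this is essentially what I do now (a minimal
sketch; the file name and data are made up):

  # Dump everything at the end of a run...
  data = { "params" => { "temp" => 0.5 },
           "samples" => [[1, -3.2], [2, -3.4]] } # stand-in for real results
  File.open("results.dat", "wb") { |f| Marshal.dump(data, f) }

  # ...then load the whole thing back into memory for analysis.
  data = File.open("results.dat", "rb") { |f| Marshal.load(f) }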
This not-very-elegant method has worked so far, but as I will need to
scale up to several GB, the time lost loading and slicing the data in
memory is becoming painful. Hence I thought I could take advantage of
the many databases that exist for Ruby. Alas… there are so many, and
I have no database experience, so I am not sure which would be best.
The data can be thought of as hierarchical tables: i.e. tables in the
SQL sense, with some entries corresponding to entire new tables and
the others being either strings or numbers (integers and floats). I
only have access to one computer, though with multiple processors, so
I have no need to distribute the data. Finally, I'd like speed, but
given the database size I can't keep all the information in memory.
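From my limited reading, I imagine the structure would map onto an SQL
database something like this (using SQLite via the sqlite3 gem purely
as an example; all table and column names here are invented):

  require "sqlite3"

  db = SQLite3::Database.new("results.db")
  # One row in "runs" owns an entire subtable of rows in "samples",
  # linked by a foreign key instead of literal nesting.
  db.execute_batch <<~SQL
    CREATE TABLE runs (
      id    INTEGER PRIMARY KEY,
      label TEXT,
      temp  REAL
    );
    CREATE TABLE samples (
      id     INTEGER PRIMARY KEY,
      run_id INTEGER REFERENCES runs(id),
      step   INTEGER,
      energy REAL
    );
  SQL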
The kind of operations I need to do involves getting slices of tables
(all the rows but only a subset of columns), often across all the
"subtables" I've got. I also need to add a few rows to each table
while the simulation is running (but not during analysis); however,
the rate is rather low, so I don't expect this to be much of a
constraint.
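If I understand SQL correctly, those operations would look roughly
like this (same invented schema as above):

  # A column slice: all rows of one subtable, only some columns.
  rows = db.execute("SELECT step, energy FROM samples WHERE run_id = ?", [1])

  # The occasional insert while the simulation runs.
  step, energy = 100, -3.5 # values produced by the simulation (made up here)
  db.execute("INSERT INTO samples (run_id, step, energy) VALUES (?, ?, ?)",
             [1, step, energy])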
Could people with more DB experience give a few suggestions? I've read
about MongoDB, CouchDB, MySQL and PostgreSQL… and to be honest I am
pretty lost, so I am sending this in the hope that some fellow Rubyist
can help a poor DB newbie.
Thank you all in advance.
Diego V.