Manipulating big hunks of data

Is there any kind of container-like thing in Ruby for the special case
of a large number of fixed-type data items? (As opposed to objects.)

Basically, imagine that I have, for instance, a huge matrix of data.
So far as I can tell, that really means a huge collection of individual
objects, each possibly allocated in its own chunk of memory, because
they all have to be, well, objects – they can’t just be raw data.
Obviously, using any given item in the matrix is convenient if they’re
all already objects, but the storage looks like it’d be ridiculously large.
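
For instance, something along these lines seems to come out to a million
individually boxed Float objects, plus the Array plumbing around them:

# each element is a full Ruby object, not just 8 raw bytes of double
matrix = Array.new(1_000) { Array.new(1_000) { rand } }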

Is this just not idiomatic in Ruby? Is there some base class or object
type I haven’t spotted yet which handles cases like this?

-s

Seebs wrote:

Is this just not idiomatic in Ruby? Is there some base class or object
type I haven’t spotted yet which handles cases like this?

Not sure how closely these fit your requirements, but a couple of
possibilities that come to mind are Guy Decoux’s mmap module and
Tokyo Cabinet’s array-of-fixed-length-elements database option:

http://1978th.net/tokyocabinet/
http://1978th.net/tokyocabinet/rubydoc/

Hope this helps,

Bill

On Jan 17, 2010, at 6:44 PM, Bill K. wrote:

…and Tokyo Cabinet’s array-of-fixed-length-elements database
option:

https://github.com/knu/ruby-mmap (Ruby bindings for Unix mmap(2), by Guy Decoux)

http://1978th.net/tokyocabinet/
http://1978th.net/tokyocabinet/rubydoc/

I wrote a bit about Tokyo Cabinet’s Fixed-length Database recently, in
case it helps:

http://blog.grayproductions.net/articles/tokyo_cabinets_keyvalue_database_types
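
The basic put/get flow through the Ruby bindings looks roughly like this
(an untested sketch from memory of the bindings’ docs; the file name and
record contents are made up):

require 'tokyocabinet'

fdb = TokyoCabinet::FDB.new
fdb.open("grid.tcf", TokyoCabinet::FDB::OWRITER | TokyoCabinet::FDB::OCREAT)

fdb.put(1, "first record")  # keys are record IDs; values go in fixed-width slots
fdb.put(2, "second record")
fdb.get(1)                  # => "first record"
fdb.close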

I kept thinking of the fantastic NArray library while reading the
initial message, but it’s just for in-memory work.

James Edward G. II

On 2010-01-18, James Edward G. II wrote:

I kept thinking of the fantastic NArray library while reading
the initial message, but it’s just for in-memory work.

That’d be fine in my case. I’m sort of messing with thoughts about
doing a roguelike game, and somewhere in there, there’s nearly always
a level grid, which is typically a fairly large array of something…
But it’s large enough, and regenerated/reused/etc. enough, that having
thousands upon thousands of objects created and destroyed when messing
with it feels inefficient to me.

Disclaimer: My sense of what kinds of tasks are “too inefficient” was
developed back when a 5MHz system was intended for time sharing among
many users.

But it’s nice to know that a solution for this problem exists – if I
need something like that, NArray would solve the cases I most often have
to deal with.
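
E.g., a level grid could presumably look something like this (untested,
going from the NArray docs; the dimensions and tile codes are made up):

require 'narray'

# an 80x25 grid of one-byte tile codes in a single C buffer,
# instead of 2,000 separate Ruby objects
FLOOR, WALL = 0, 1
level = NArray.byte(80, 25)

level[true, 0] = WALL   # true selects a whole dimension: fill the top row
level[0, true] = WALL   # fill the left column
level[10, 5] = FLOOR
level[10, 5]            # => 0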

-s

2010/1/17 Seebs:

Is this just not idiomatic in Ruby? Is there some base class or object
type I haven’t spotted yet which handles cases like this?

Since everything is an object in Ruby (well, almost), I am not sure
what to make of your distinction. What I often do in these situations
is this:

unifier = Hash.new { |h, k| h[k.freeze] = k }

# ... read or create lots of stuff ...

item = unifier[item]  # returns the one canonical instance for any equal value

That way you keep only one instance out of each set of equivalent
objects. Of course, this helps only if you have repetitive values.
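
A quick illustration of the effect (the values are made up):

a = unifier["floor"]
b = unifier["floor".dup]
a.equal?(b)  # => true, both point at the same frozen String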

Depending on your use case, another approach could be to store the data
in a single String (or several Strings) and use #pack and #unpack. I
guess this works best if your data is uniformly sized.
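
Something along those lines (untested; the sizes and offsets are only
illustrative):

# 10,000 doubles packed into one 80,000-byte String, 8 bytes each
values = Array.new(10_000) { rand }
packed = values.pack("d*")

# read or overwrite a single element by byte offset,
# without unpacking the whole thing
i = 42
x = packed[i * 8, 8].unpack("d").first
packed[i * 8, 8] = [x + 1.0].pack("d")

packed.unpack("d*")  # => back to an Array of 10,000 Floats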

Btw, what volume of data are we talking about?

Kind regards

robert

On 2010-01-18, Robert K. wrote:

Btw, what volume of data are we talking about?

Well, as an example, suppose I were doing number-crunching and wanted
to have a block of, say, twenty million doubles.

It looks like NArray is the right tool for the job – it can give me
array-like semantics on things which have the behavior of doubles, but
without me having to keep 20M objects wrapping my 20M double values.
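
Something like this, as far as I can tell (untested, from the NArray
docs):

require 'narray'

# twenty million doubles in one contiguous buffer, not 20M Float objects
block = NArray.float(20_000_000)

block[0] = 3.14159        # element access looks like an ordinary Array
block[0...100] = 1.0      # slices and scalar fills work
sum = (block * 2.0).sum   # element-wise arithmetic happens in C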

-s