Forum: Ruby Beyond YAML? (scaling)

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- and Ruby-related community platforms.
Bil Kleb (Guest)
on 2007-05-03 15:50
(Received via mailing list)
Hi,

I've been using YAML files to store hashes of numbers, e.g.,

  { ["O_Kc_01"] => [ 0.01232, 0.01212, 0.03222, ... ], ... }

This has worked wonderfully for portability and visibility
into the system as I've been creating it.

Recently, however, I've increased my problem size by orders
of magnitude in both the number of variables and the number
of associated values. The resulting YAML files are prohibitive:
10s of MBs big and requiring 10s of minutes to dump/load.

Where should I go from here?

Thanks,
James Gray (bbazzarrakk)
on 2007-05-03 16:04
(Received via mailing list)
On May 3, 2007, at 8:50 AM, Bil Kleb wrote:

> of magnitude in both the number of variables and the number
> of associated values. The resulting YAML files are prohibitive:
> 10s of MBs big and requiring 10s of minutes to dump/load.
>
> Where should I go from here?

Some random thoughts:

* If they are just super straightforward lists of numbers like this, a
trivial flat file scheme, say with one number per line, might get the
job done.
* XML can be pretty darn easy to output manually, and if you use
REXML's stream parser (not slurping everything into a DOM) you should
be able to read it reasonably quickly.
* If you are willing to sacrifice a little visibility, you can always
take the step up to a real database, even if it's just sqlite.  These
have varying degrees of portability as well.
* You might want to look at KirbyBase.  (It has a younger brother
Mongoose, but that uses binary output.)

Hope something in there helps.

James Edward Gray II
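
A minimal sketch of the flat-file idea, assuming the hash-of-arrays from
the original post (the layout of a tag line followed by one number per
line, and the helper names, are invented for illustration):

  # Write each tag followed by its samples, one number per line;
  # a blank line separates tags.  Read it back into a hash of arrays.
  def dump_flat(samples, path)
    File.open(path, "w") do |f|
      samples.each do |tag, values|
        f.puts tag
        values.each { |v| f.puts v }
        f.puts
      end
    end
  end

  def load_flat(path)
    samples = {}
    tag = nil
    File.foreach(path) do |line|
      line = line.chomp
      if line.empty?
        tag = nil                 # blank line ends the current tag's block
      elsif tag.nil?
        tag = line                # first line of a block is the tag name
        samples[tag] = []
      else
        samples[tag] << Float(line)
      end
    end
    samples
  end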
Jamey Cribbs (Guest)
on 2007-05-03 16:05
(Received via mailing list)
Bil Kleb wrote:
> of magnitude in both the number of variables and the number
> of associated values. The resulting YAML files are prohibitive:
> 10s of MBs big and requiring 10s of minutes to dump/load.
>
> Where should I go from here?

Hey, Bil.  If you don't mind a couple of shameless plugs, you might want
to try KirbyBase or Mongoose.

KirbyBase should be faster than YAML and it still stores the data in
plain text files, if that is important to you.

Mongoose is faster than KirbyBase, at the expense of the data not being
stored as plain text.

I don't know if either will be fast enough for you.

HTH,

Jamey Cribbs
Brian Candler (Guest)
on 2007-05-03 16:06
(Received via mailing list)
On Thu, May 03, 2007 at 10:50:06PM +0900, Bil Kleb wrote:
> 10s of MBs big and requiring 10s of minutes to dump/load.
>
> Where should I go from here?

Use a SQL database?

It all depends what sort of processing you're doing. If you're adding to a
dataset (rather than starting with an entirely fresh data set each time),
having a database makes sense. If you're doing searches across the data,
and/or if the data is larger than the available amount of RAM, then a
database makes sense. If you're only touching small subsets of the data at
any one time, then a database makes sense.

Put it another way, does your processing really require you to read the
entire collection of objects into RAM before you can perform any
processing?

If it does, and your serialisation needs are as simple as it appears above,
then maybe something like CSV would be better.

O_Kc_01,0.01232,0.01212,0.03222,...

If the source of the data is another Ruby program, then Marshal will be
much faster than YAML (but unfortunately binary).

You could consider using something like Madeleine:
http://madeleine.rubyforge.org/
This snapshots your object tree to disk (using Marshal by default I think,
but can also use YAML). You can then make incremental changes and
occasionally rewrite the snapshot.

B.
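
A sketch of that CSV layout, one row per parameter (plain split/join is
used instead of the csv library since the values contain no commas or
quoting; the helper names are invented):

  # One line per parameter: tag, then its samples, comma-separated.
  def dump_csv(samples, path)
    File.open(path, "w") do |f|
      samples.each { |tag, values| f.puts(([tag] + values).join(",")) }
    end
  end

  def load_csv(path)
    samples = {}
    File.foreach(path) do |line|
      tag, *values = line.chomp.split(",")
      samples[tag] = values.map { |v| Float(v) }
    end
    samples
  end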
unknown (Guest)
on 2007-05-03 16:12
(Received via mailing list)
On Thu, 3 May 2007, Bil Kleb wrote:

> I've been using YAML files to store hashes of numbers, e.g.,
>
> { ["O_Kc_01"] => [ 0.01232, 0.01212, 0.03222, ... ], ... }
>
> of magnitude in both the number of variables and the number
> of associated values. The resulting YAML files are prohibitive:
> 10s of MBs big and requiring 10s of minutes to dump/load.
>
> Where should I go from here?

I guess that depends on whether you need the files to be easily readable
or not.  If you don't, Marshal will be faster than YAML.


Kirk Haines
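
For reference, the Marshal round trip is only a couple of lines (binary
file mode matters on some platforms; the file name is arbitrary):

  # Dump and reload the hash with Marshal instead of YAML.
  samples = { "O_Kc_01" => [0.01232, 0.01212, 0.03222] }

  File.open("samples.marshal", "wb") { |f| Marshal.dump(samples, f) }
  reloaded = File.open("samples.marshal", "rb") { |f| Marshal.load(f) }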
Joel VanderWerf (Guest)
on 2007-09-25 23:00
(Received via mailing list)
Bil Kleb wrote:
> khaines@enigo.com wrote:
>>
>> I guess that depends on whether you need the files to be easily
>> readable or not.  If you don't, Marshal will be faster than YAML.
>
> At this point, I'm looking for an easy out that will
> reduce size and increase speed, and I'm willing to
> go binary if necessary.

What about mmap and pack/unpack, as ara does in

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/...

?
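
A sketch of the pack/unpack half of that suggestion, without the mmap
part (each array is written as raw native doubles, with a small index of
tag/length pairs kept alongside; that index convention is invented for
illustration):

  samples = { "F_x" => [1.23, 1.12, 0.92], "q_r" => [1.34e+9, 3.89e+8] }

  # Write each array as native doubles (8 bytes each on common platforms)
  # and record tag/length pairs so the file can be sliced apart on read.
  index = []
  File.open("samples.bin", "wb") do |f|
    samples.each do |tag, values|
      index << [tag, values.length]
      f.write(values.pack("d*"))
    end
  end
  File.open("samples.idx", "wb") { |f| Marshal.dump(index, f) }

  # Read it back.
  index    = File.open("samples.idx", "rb") { |f| Marshal.load(f) }
  reloaded = {}
  File.open("samples.bin", "rb") do |f|
    index.each do |tag, count|
      reloaded[tag] = f.read(count * 8).unpack("d*")
    end
  end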
Bil Kleb (Guest)
on 2007-09-25 23:02
(Received via mailing list)
William James wrote:
>
> Would making a copy of the hash use too much
> memory or time?

I don't know, but that's surely another way out of
the Marshal-hash-proc trap...

Thanks,
Bil Kleb (Guest)
on 2007-09-25 23:03
(Received via mailing list)
Bil Kleb wrote:
> migrating to Marshal seems to be the Simplest Thing
> That Could Possibly Work.

Well, maybe not so simple...

  `dump': can't dump hash with default proc (TypeError)

which seems to be due to the trick I learned from zenspider
and drbrain to quickly set up a hash of arrays:

  Hash.new{ |hash,key| hash[key]=[] }

Later,
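
One way around that error, sketched below, is to copy into a plain Hash
just before dumping; it is the same idea William James and Logan Capaldo
suggest later in the thread:

  samples = Hash.new { |hash, key| hash[key] = [] }
  samples["F_x"] << 1.23

  # Marshal.dump(samples)  # would raise TypeError: can't dump hash with default proc
  plain = {}
  samples.each { |key, values| plain[key] = values }
  data = Marshal.dump(plain)   # the copy has no default proc, so this works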
Jeremy Hinegardner (Guest)
on 2007-09-25 23:04
(Received via mailing list)
On Thu, May 03, 2007 at 11:45:05PM +0900, Bil Kleb wrote:
> khaines@enigo.com wrote:
> >
> >I guess that depends on whether you need the files to be easily readable
> >or not.  If you don't, Marshal will be faster than YAML.
>
> At this point, I'm looking for an easy out that will
> reduce size and increase speed, and I'm willing to
> go binary if necessary.

If you want to describe your data needs a bit, and what operations you
need to perform on it, I'll be happy to play around with a ruby/sqlite3
program and see what pops out.

Since there's no Ruby Quiz this weekend, we all need something to work
on :-).

enjoy,

-jeremy
Brian Candler (Guest)
on 2007-09-25 23:04
(Received via mailing list)
On Fri, May 04, 2007 at 01:25:05AM +0900, Bil Kleb wrote:
> Bil Kleb wrote:
> >
> > Hash.new{ |hash,key| hash[key]=[] }
>
> Is there a better way than,
>
>  samples[tag] = [] unless samples.has_key? tag
>  samples[tag] << sample

Not exactly identical but usually good enough:

samples[tag] ||= []
samples[tag] << sample

And you can probably combine:

(samples[tag] ||= []) << sample
Bil Kleb (Guest)
on 2007-09-25 23:04
(Received via mailing list)
Matt Lawrence wrote:
>
> You're building an Orion?  Please tell me it's not true!

No; I should have been more specific..."our Orion vehicle /design/". :)

Later,
John Joyce (Guest)
on 2007-09-25 23:04
(Received via mailing list)
On May 4, 2007, at 12:35 AM, Bil Kleb wrote:

> Brian Candler wrote:
>> Use a SQL database?
>
> I always suspect that I should be doing that more often,
> but as my experience with databases is rather limited
> and infrequent, I always shy away from those as James
> already knows.  Regardless, I should probably overcome
> my aggressive incompetence one day!

Don't be afraid of the database solution. In the long term, it is
much more scalable and will pay dividends immediately.
MySQL and PostgreSQL are both pretty fast and scalable, but if you
have a large data set, you certainly do need to plan a schema
carefully; it should be somewhat similar to your existing data
structures anyway.  The database APIs in Ruby are pretty simple.
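
For the hash-of-arrays in this thread, a sqlite3-ruby sketch of one
possible schema (the table layout and column names are invented for
illustration; each value is keyed by parameter name and case number):

  require 'sqlite3'   # sqlite3-ruby gem

  db = SQLite3::Database.new("samples.db")
  db.execute <<-SQL
    CREATE TABLE IF NOT EXISTS samples (
      tag      TEXT,
      case_idx INTEGER,
      value    REAL
    )
  SQL

  samples = { "F_x" => [1.23, 1.12], "q_r" => [1.34e+9, 3.89e+8] }

  # One row per (parameter, case) pair; a transaction keeps inserts fast.
  db.transaction do
    samples.each do |tag, values|
      values.each_with_index do |v, i|
        db.execute("INSERT INTO samples (tag, case_idx, value) VALUES (?, ?, ?)",
                   [tag, i, v])
      end
    end
  end

  # Pull one parameter's samples back out, in case order.
  f_x = db.execute("SELECT value FROM samples WHERE tag = ? ORDER BY case_idx",
                   ["F_x"]).flatten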
Bil Kleb (Guest)
on 2007-09-25 23:04
(Received via mailing list)
Robert Dober wrote:
>           user     system      total        real
> yaml   0.590000   0.000000   0.590000 (  0.754097)
> json   0.020000   0.000000   0.020000 (  0.018363)
>
> Looks promising, n'est-ce pas?

Neat.  Thanks for the data.  For now though, Marshal
is an adequate alternative.

Regards,
Jeremy Hinegardner (Guest)
on 2007-09-25 23:04
(Received via mailing list)
Okay, so I didn't get to it this weekend, but it is an interesting
project.  Can you explain a bit more about the data requirements?  I have
some questions inlined.

On Sat, May 05, 2007 at 11:20:05AM +0900, Bil Kleb wrote:
> sensitivity analysis[3] on some of the simulation codes used for our
>  1) Prepare a "sufficiently large" number of cases, each with random
>     variations of the input parameters per the tolerance DSL markup.
>     Save all these input variables and all their samples for step 5.

So the DSL generates a 'large' number of cases listing the input
parameters and their associated values.  Is the list of input parameters
static across all cases, or only across a set of cases?

That is, for a given "experiment" you have the same set of parameters,
just with a "large" number of different values to put in those
parameters.


>  2) Run all the cases.
>  3) Collect all the samples of all the outputs of interest.

I'm also assuming that the output(s) for a given "experiment" would be
consistent in their parameters?

>  4) Compute running history of the output statistics to see
>     if they have converged, i.e., the "sufficiently large"
>     guess was correct -- typically a wasteful number of around 3,000.
>     If not, start at step 1 again with a bigger number of cases.

So right now you are doing, say, for a given experiment f:

    inputs  : i1,i2,i3,...,in
    outputs : o1,o2,o3,...,om

run f(i1,i2,i3,...,in) -> [o1,...,om] where the values for i1,...,in are
"jiggled".  And you have around 3,000 different sets of inputs.

> 5 and 6.
[...]

> [5] The current system consists of 5 Ruby codes at ~40 lines each
> plus some equally tiny library routines.

Would you be willing to share these?  I'm not sure if what I'm assuming
about your problem is correct or not, but it intrigues me and I'd like
to fiddle with the general problem :-).  I'd be happy to talk offlist
too.

enjoy,

-jeremy
Bil Kleb (Guest)
on 2007-09-25 23:05
(Received via mailing list)
Jamey Cribbs wrote:
>
> Hey, Bil.

Hi.

> KirbyBase should be faster than YAML and it still stores the data in
> plain text files, if that is important to you.

Plain text will be too big -- I've got an n^2 problem.

> Mongoose is faster than KirbyBase, at the expense of the data not being
> stored as plain text.

Sounds intriguing, but where can I find some docs?  So far, I'm
coming up empty...

Regards,
Jeremy Hinegardner (Guest)
on 2007-09-25 23:05
(Received via mailing list)
On Mon, May 07, 2007 at 08:15:05PM +0900, Bil Kleb wrote:
> Here, the 2nd realization of the output, 'stag_pr', 108.02 corresponds
> to the 2nd case and is associated with the 2nd entries in the 'F_x'
> and 'q_r' arrays, 1.12 and 3.89e+8, respectively.

Yeah, that's what explains it best for me.


> >I'm not sure if what I'm assuming
> >about your problem is correct or not, but it intrigues me and I'd like
> >to fiddle with the general problem :-).
>
> The more I explain it, the more I learn about it; so thanks
> for the interest.

I haven't forgotten about this; I have a couple of ways to manage it
with SQLite, but how you want to deal with the data after running all the
cases could influence the direction.

You said you want to test for convergence after cases are run?
Basically you want to do some calculations using the inputs and outputs
after each case (or N cases) and save those calculations off to the
side until they reach some error/limit etc.?  For this do you want to
just record "After case N my running calculations (f, g, h) over the
cases run so far are x, y, z"?

That is something like:

    Case    Running calc f, Running calc g, Running calc h
       1        1.0         2.0             3.0
       10       2.0         4.0             9.0
       ....

Also, do you do any comparison between experiments?  That is, for one
scenario that you have a few thousand jiggled-input cases for, would you
do anything with those results in relation to some other scenario?  Or
are all scenarios/experiments isolated?

enjoy,

-jeremy
Matt Lawrence (Guest)
on 2007-09-25 23:06
(Received via mailing list)
On Sat, 5 May 2007, Bil Kleb wrote:

> I've created a small tolerance DSL, and coupled with the Monte Carlo
> Method[1] and the Pearson Correlation Coefficient[2], I'm performing
> sensitivity analysis[3] on some of the simulation codes used for our
> Orion vehicle[4].  In other words, jiggle the inputs, and see how
> sensitive the outputs are and which inputs are the most influential.

You're building an Orion?  Please tell me it's not true!

-- Matt
It's not what I know that counts.
It's what I can remember in time to use.
Bil Kleb (Guest)
on 2007-09-25 23:07
(Received via mailing list)
Jeremy Hinegardner wrote:
> So the DSL generates a 'large' number of cases listing the input
> parameters and their associated values.  Is the list of input parameters
> static across all cases, or only across a set of cases?

The input parameter names are static across all cases, but
for each case, the parameter value will vary randomly according
to the tolerance DSL, e.g., 1.05+/-0.2.  Currently, I have all
these as a hash of arrays, e.g.,

  { 'F_x' => [ 1.23, 1.12, 0.92, 1.01, ... ],
    'q_r' => [ 1.34e+9, 3.89e+8, 8.98e+8, 5.23e+9, ... ], ... }

where 1.23 is the sample for input parameter 'F_x' for the
first case, 1.12 is the sample for the second case, etc.,
and 1.34e+9 is the sample for input parameter 'q_r' for
the first case, and so forth.

> That is, for a given "experiment" you have the same set of parameters,
> just with a "large" number of different values to put in those
> parameters.

Yes, if I understand you correctly.

>>  2) Run all the cases.
>>  3) Collect all the samples of all the outputs of interest.
>
> I'm also assuming that the output(s) for a given "experiment" would be
> consistent in their parameters?

Yes, the output parameter hash has the same structure as the input
hash, although it typically has fewer parameters.  The number
and sequence of values (realizations) for each output parameter,
however, corresponds exactly to the array of samples for each input
parameter.  For example, the outputs hash may look like,

  { 'heating' => [ 75.23, 76.54, ... ],
    'stag_pr' => [ 102.13, 108.02, ... ], ... }

Here, the 2nd realization of the output, 'stag_pr', 108.02 corresponds
to the 2nd case and is associated with the 2nd entries in the 'F_x'
and 'q_r' arrays, 1.12 and 3.89e+8, respectively.

> So right now you are doing, say, for a given experiment f:
>
>     inputs  : i1,i2,i3,...,in
>     outputs : o1,o2,o3,...,om
>
> run f(i1,i2,i3,...,in) -> [o1,...,om] where the values for i1,...,in are
> "jiggled".  And you have around 3,000 different sets of inputs.

Yes, where 'inputs' and 'outputs' are vectors m and k long,
respectively; so you have a matrix of values, e.g.,

  input1  : i1_1, i1_2, i1_3, ..., i1_n
  input2  : i2_1, i2_2, i2_3, ..., i2_n
    .        .    .     .     .    .
    .        [ m x n matrix ]      .
    .        .    .     .     .    .
  inputm  : im_1, im_2, im_3, ..., im_n

  output1 : o1_1, o1_2, o1_3, ..., o1_n
  output2 : o2_1, o2_2, o2_3, ..., o2_n
    .        .    .     .     .    .
    .        [ k x n matrix ]      .
    .        .    .     .     .    .
  outputk : ok_1, ok_2, ok_3, ..., ok_n

> Would you be willing to share these?

/I/ am willing, but unfortunately I'm also mired in red tape.

> I'm not sure if what I'm assuming
> about your problem is correct or not, but it intrigues me and I'd like
> to fiddle with the general problem :-).

The more I explain it, the more I learn about it; so thanks
for the interest.

Regards,
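
Given that layout, the step-5 Pearson correlation between any input array
and any output array is only a few lines of plain Ruby.  A sketch (not
Bil's actual code; the 'stag_pr' array below is padded with invented
values so both arrays have the same length):

  # Pearson correlation coefficient between two equal-length sample arrays.
  def pearson(xs, ys)
    n      = xs.length.to_f
    mean_x = xs.inject(0.0) { |s, v| s + v } / n
    mean_y = ys.inject(0.0) { |s, v| s + v } / n
    cov = var_x = var_y = 0.0
    xs.each_index do |i|
      dx = xs[i] - mean_x
      dy = ys[i] - mean_y
      cov   += dx * dy
      var_x += dx * dx
      var_y += dy * dy
    end
    cov / Math.sqrt(var_x * var_y)
  end

  inputs  = { 'F_x'     => [1.23, 1.12, 0.92, 1.01] }
  outputs = { 'stag_pr' => [102.13, 108.02, 95.70, 99.40] }
  pearson(inputs['F_x'], outputs['stag_pr'])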
2ee1a7960cc761a6e92efb5000c0f2c9?d=identicon&s=25 William James (Guest)
on 2007-09-25 23:08
(Received via mailing list)
On May 3, 10:11 am, Bil Kleb <Bil.K...@NASA.gov> wrote:
> > Of the answers I've seen so far (thanks everyone!),
>   Hash.new{ |hash,key| hash[key]=[] }
>
> Later,
> --
> Bil Kleb
> http://fun3d.larc.nasa.gov

Would making a copy of the hash use too much
memory or time?

h=Hash.new{ |hash,key| hash[key]=[] }

h['foo'] << 44
h['foo'] << 88

h_copy = {}
h.each{|k,v| h_copy[k] = v}
p h_copy
Logan Capaldo (Guest)
on 2007-09-25 23:09
(Received via mailing list)
On 5/4/07, Bil Kleb <Bil.Kleb@nasa.gov> wrote:
> --
> Bil Kleb
> http://fun3d.larc.nasa.gov
>
>
I dunno how safe it is to rely on this behavior, but calling Hash#default=
strips the proc and makes the hash marshallable:

>> h = Hash.new { |h,k| h[k] = [] }
=> {}
>> h["a"]
=> []
>> h
=> {"a"=>[]}
>> h["b"] = 7
=> 7
>> h
=> {"a"=>[], "b"=>7}
>> h.default = nil
=> nil
>> h
=> {"a"=>[], "b"=>7}
>> Marshal.dump h
=> "\004\b{\a\"\006a[\000\"\006bi\f"

Of course this means you won't get your default proc back when you load
the hash and mutate it. However:

db = Marshal.load( from_disk )
delta = Hash.new { |h, k| h[k] = if db.has_key? k then db[k] else [] end }
delta["foo"] = "add a new value"
delta["bar"] << "update something in the original"
# ... more of the same ...
to_disk = Marshal.dump( db.merge!( delta ) )
Bil Kleb (Guest)
on 2007-09-25 23:10
(Received via mailing list)
Brian Candler wrote:
>
> Use a SQL database?

I always suspect that I should be doing that more often,
but as my experience with databases is rather limited
and infrequent, I always shy away from those as James
already knows.  Regardless, I should probably overcome
my aggressive incompetence one day!

> It all depends what sort of processing you're doing. If you're adding to a
> dataset (rather than starting with an entirely fresh data set each time),
> having a database makes sense.

At this point, I'm generating an entirely fresh data set
each time, but I can foresee a point where that will change
to an incremental model...

> Put it another way, does your processing really require you to read the
> entire collection of objects into RAM before you can perform any processing?

Yes, AFAIK, but I suppose there are algorithms that could
compute statistical correlations incrementally.

> You could consider using something like Madeleine:
> http://madeleine.rubyforge.org/
> This snapshots your object tree to disk (using Marshal by default I think,
> but can also use YAML). You can then make incremental changes and
> occasionally rewrite the snapshot.

Probably not a good fit as I won't change existing data,
only add new...

Thanks,
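
On the "compute statistical correlations incrementally" point: a sketch
of Welford-style running mean and variance, which can be updated one case
at a time instead of loading every sample into RAM first (the class and
method names are invented; the same pairwise-update idea extends to
covariance, and hence to running correlations):

  # Running mean and variance (Welford's algorithm): feed it one sample
  # at a time, no need to keep the whole array in memory.
  class RunningStats
    attr_reader :count, :mean

    def initialize
      @count = 0
      @mean  = 0.0
      @m2    = 0.0
    end

    def add(x)
      @count += 1
      delta   = x - @mean
      @mean  += delta / @count
      @m2    += delta * (x - @mean)
      self
    end

    def variance
      @count > 1 ? @m2 / (@count - 1) : 0.0
    end
  end

  stats = RunningStats.new
  [75.23, 76.54].each { |heating| stats.add(heating) }
  stats.mean      # running mean so far
  stats.variance  # running (sample) variance so far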
Jamey Cribbs (Guest)
on 2007-09-25 23:10
(Received via mailing list)
Bil Kleb wrote:
>
>> Mongoose is faster than KirbyBase, at the expense of the data not
>> being stored as plain text.
>
> Sounds intriguing, but where can I find some docs?  So far, I'm
> coming up empty...

Docs are light compared to KirbyBase.  If you download the
distribution, there is the README file, some pretty good examples in the
aptly named "examples" directory, and unit tests in the "tests"
directory.

HTH,

Jamey
Robert Dober (Guest)
on 2007-09-25 23:10
(Received via mailing list)
On 5/7/07, Bil Kleb <Bil.Kleb@nasa.gov> wrote:
Bil, maybe you want to have a look at JSON:
http://json.rubyforge.org/

I do not have time right now to benchmark the reading, but the writing
gives some spectacular results; look at this:

517/17 > cat test-out.rb && ruby test-out.rb
# vim: sts=2 sw=2 expandtab nu tw=0:

require 'yaml'
require 'rubygems'
require 'json'
require 'benchmark'

@hash = Hash[*(1..100).map{|l| "k_%03d" % l}.zip([*1..100]).flatten]

Benchmark.bmbm do |bench|
  bench.report( "yaml" ) { 50.times{ @hash.to_yaml } }
  bench.report( "json" ) { 50.times{ @hash.to_json } }
end

Rehearsal ----------------------------------------
yaml   0.630000   0.030000   0.660000 (  0.748123)
json   0.020000   0.000000   0.020000 (  0.079732)
------------------------------- total: 0.680000sec

           user     system      total        real
yaml   0.590000   0.000000   0.590000 (  0.754097)
json   0.020000   0.000000   0.020000 (  0.018363)

Looks promising, n'est-ce pas?

Maybe you want to investigate that a little bit more; JSON is of
course very readable.  Look, e.g., at this:

irb(main):002:0> require 'rubygems'
=> true
irb(main):003:0> require 'json'
=> true
irb(main):004:0> {:a => [*42..84]}.to_json
=>
"{\"a\":[42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84]}"

HTH
Robert
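
The read side that Robert skipped can be benchmarked the same way; a
sketch along the same lines (no claim about what numbers it produces):

  require 'yaml'
  require 'rubygems'
  require 'json'
  require 'benchmark'

  @hash = Hash[*(1..100).map{ |l| "k_%03d" % l }.zip([*1..100]).flatten]
  yaml_text = @hash.to_yaml
  json_text = @hash.to_json

  # Compare parsing the same data back out of each format.
  Benchmark.bmbm do |bench|
    bench.report("yaml") { 50.times { YAML.load(yaml_text) } }
    bench.report("json") { 50.times { JSON.parse(json_text) } }
  end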
Bil Kleb (Guest)
on 2007-09-25 23:10
(Received via mailing list)
Jamey Cribbs wrote:
>
> Docs are light compared to KirbyBase.  If you  download the
> distribution, there is the README file, some pretty good examples in the
> aptly named "examples" directory, and unit tests in the "tests" directory.

Roger, I was afraid you'd say that.  :)

Please throw those up on your Rubyforge webpage at some point?

Later,
Bil Kleb (Guest)
on 2007-09-25 23:10
(Received via mailing list)
Hi,

Jeremy Hinegardner wrote:
>
> If you want to describe your data needs a bit, and what operations you
> need to operate on it, I'll be happy to play around with an ruby/sqlite3
> program and see what pops out.

I've created a small tolerance DSL, and coupled with the Monte Carlo
Method[1] and the Pearson Correlation Coefficient[2], I'm performing
sensitivity analysis[3] on some of the simulation codes used for our
Orion vehicle[4].  In other words, jiggle the inputs, and see how
sensitive the outputs are and which inputs are the most influential.

The current system[5] works, and after the YAML->Marshal migration,
it scales well enough for now.  The trouble is the entire architecture
is wrong if I want to monitor the Monte Carlo statistics to see
if I can stop sampling, i.e., whether the statistics have converged.

The current system consists of the following steps:

  1) Prepare a "sufficiently large" number of cases, each with random
     variations of the input parameters per the tolerance DSL markup.
     Save all these input variables and all their samples for step 5.
  2) Run all the cases.
  3) Collect all the samples of all the outputs of interest.
  4) Compute running history of the output statistics to see
     if they have converged, i.e., the "sufficiently large"
     guess was correct -- typically a wasteful number of around 3,000.
     If not, start at step 1 again with a bigger number of cases.
  5) Compute normalized Pearson correlation coefficients for the
     outputs and see which inputs they are most sensitive to by
     using the data collected in steps 1 and 3.
  6) Lobby for experiments to nail down these "tall pole" uncertainties.

This system is plagued by the question of what counts as "sufficiently large".
The next generation system would do steps 1 through 3 in small
batches, and at the end of each batch, check for the statistical
convergence of step 4.  If convergence has been reached, shutdown
the Monte Carlo process, declare victory, and proceed with steps
5 and 6.

I'm thinking this more incremental approach, combined with my lack of
database experience, would make a perfect match for Mongoose[6]...

> Since there's no Ruby Quiz this weekend, we all need something to work
> on :-).

:)

Regards,
--
Bil Kleb
http://fun3d.larc.nasa.gov

[1] http://en.wikipedia.org/wiki/Monte_Carlo_method
[2] http://en.wikipedia.org/wiki/Pearson_correlation
[3] http://en.wikipedia.org/wiki/Sensitivity_analysis
[4] http://en.wikipedia.org/wiki/Crew_Exploration_Vehicle
[5] The current system consists of 5 Ruby codes at ~40 lines each
plus some equally tiny library routines.
[6] http://mongoose.rubyforge.org/
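
A toy sketch of the batch-until-converged loop described above, with a
stand-in "simulation" and a naive running-mean convergence test in place
of the real codes and statistics (the stag_pr formula and the q_r
tolerance are invented for illustration; 1.05+/-0.2 comes from the
tolerance DSL example earlier in the thread):

  # Uniform jiggle of a nominal value within +/- tol.
  def jiggle(nominal, tol)
    nominal + tol * (2.0 * rand - 1.0)
  end

  def mean(values)
    values.inject(0.0) { |s, v| s + v } / values.length
  end

  inputs  = Hash.new { |h, k| h[k] = [] }
  outputs = Hash.new { |h, k| h[k] = [] }
  batch_size    = 50
  previous_mean = nil

  loop do
    batch_size.times do
      f_x = jiggle(1.05, 0.2)                 # step 1: sample the tolerance DSL markup
      q_r = jiggle(1.0e+9, 2.0e+8)
      inputs['F_x'] << f_x
      inputs['q_r'] << q_r
      outputs['stag_pr'] << 100.0 * f_x + 1.0e-8 * q_r   # steps 2-3: stand-in simulation
    end
    m = mean(outputs['stag_pr'])              # step 4: has the running mean settled?
    break if previous_mean && (m - previous_mean).abs < 1.0e-3 * m.abs
    previous_mean = m
  end
  # steps 5 and 6: compute the Pearson coefficients, then lobby.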
Bil Kleb (Guest)
on 2007-09-25 23:10
(Received via mailing list)
Bil Kleb wrote:
>
>  Hash.new{ |hash,key| hash[key]=[] }

Is there a better way than,

  samples[tag] = [] unless samples.has_key? tag
  samples[tag] << sample

?

Anyway, apart from Marshal not having a convenient
#load_file method like YAML, the conversion was
very painless; it dropped file sizes considerably
and brought run times into the minutes category
instead of hours.

Thanks,
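
The missing convenience method is easy to bolt on; a sketch (defining
singleton methods directly on Marshal is just one option):

  # YAML has YAML.load_file; give Marshal the same convenience.
  def Marshal.load_file(path)
    File.open(path, "rb") { |f| Marshal.load(f) }
  end

  def Marshal.dump_file(obj, path)
    File.open(path, "wb") { |f| Marshal.dump(obj, f) }
  end

  Marshal.dump_file({ "O_Kc_01" => [0.01232, 0.01212] }, "samples.marshal")
  samples = Marshal.load_file("samples.marshal")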
Bil Kleb (Guest)
on 2007-09-25 23:11
(Received via mailing list)
khaines@enigo.com wrote:
>
> I guess that depends on whether you need the files to be easily readable
> or not.  If you don't, Marshal will be faster than YAML.

At this point, I'm looking for an easy out that will
reduce size and increase speed, and I'm willing to
go binary if necessary.

Of the answers I've seen so far (thanks everyone!),
migrating to Marshal seems to be the Simplest Thing
That Could Possibly Work.

Thanks,