YAML::dump slow

This is interesting and counterintuitive …

I am creating a YAML file with about 200,000 entries. The YAML file
“reflects” the data structure of a directory tree of a disk. All I
have is directory names and subdirectories and filenames in the (sub)
directories. I have 100,000 simulated directories and 100,000 simulated
files.

It takes about
10 seconds to create this data structure in memory
10 minutes to create the YAML string (i.e. YAML::dump)
1 second to write out the YAML string (25 megabytes)
4 seconds to read the YAML string (why so long?)
8 seconds to do the YAML::load

Anyone have any ideas how to speed up the creation of the YAML file or
why there is such an asymmetric amount of time for YAML::dump and
YAML::load?

Anyone have any ideas how to speed up the creation of the YAML file or
why there is such an asymmetric amount of time for YAML::dump and
YAML::load?

A couple ideas: do a custom yaml dump by hand, or use Marshal.
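To make those two ideas concrete, here is a minimal sketch. The tree shape (a Hash mapping each name to a nested Hash for a directory, or to nil for a file) and the helper name `emit_tree` are assumptions for illustration, not the original poster's structure. The hand-written emitter also assumes plain file names; names containing characters that Ruby's `inspect` escapes differently from YAML would need real quoting logic.

```ruby
require 'yaml'

# Hypothetical tree: name => nested Hash (directory) or nil (file).
tree = {
  'root' => {
    'docs'  => { 'readme.txt' => nil, 'notes.txt' => nil },
    'src'   => { 'lib' => { 'main.rb' => nil } },
    'empty' => {}
  }
}

# Idea 1: Marshal, Ruby's built-in binary serializer.
# The result is not human-readable and is Ruby-specific.
blob = Marshal.dump(tree)
restored = Marshal.load(blob)

# Idea 2: write the YAML text by hand, appending to one String
# instead of going through the generic YAML dumper.
def emit_tree(node, out = String.new, depth = 0)
  indent = '  ' * depth
  node.each do |name, child|
    if child.nil?
      out << "#{indent}#{name.inspect}:\n"      # file: key with a null value
    elsif child.empty?
      out << "#{indent}#{name.inspect}: {}\n"   # empty directory
    else
      out << "#{indent}#{name.inspect}:\n"      # directory: recurse one level deeper
      emit_tree(child, out, depth + 1)
    end
  end
  out
end

yaml_text = emit_tree(tree)
```

Reading `yaml_text` back with `YAML.load` should give the same Hash, which is an easy way to check a hand-written emitter against the library.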

On Sun, Aug 22, 2010 at 1:45 AM, Ralph S. [email protected]
wrote:

Anyone have any ideas how to speed up the creation of the YAML file or why there is such an asymmetric amount of time for YAML::dump and YAML::load?

If you’re on 1.9.2, you could try the new YAML library psych:
iota ~ % time ruby -rpsych -e 'Psych.load(Psych.dump(Dir["/sys/**/*"]))'
1.20s user 0.07s system 99% cpu 1.283 total

Of course, that’s just a simple Array, but would be interesting to see
how your data behaves.
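To see how nested data behaves rather than a flat Array, a small timing harness along these lines could be used. This is only a sketch: `build_tree` and the sizes are made up for illustration, and `YAML` here is whatever library the running Ruby loads under that name (Psych on current Rubies).

```ruby
require 'benchmark'
require 'yaml'

# Build a simulated tree: `dirs` directories, each holding `files` file names.
# The shape (Hash of name => Array of file names) is an assumption.
def build_tree(dirs, files)
  tree = {}
  dirs.times do |d|
    tree["dir#{d}"] = Array.new(files) { |f| "file#{d}_#{f}.txt" }
  end
  tree
end

tree = build_tree(1_000, 10)

# Time the dump and the load separately, since the thread is about
# the asymmetry between the two.
yaml_text = nil
dump_time = Benchmark.realtime { yaml_text = YAML.dump(tree) }

loaded = nil
load_time = Benchmark.realtime { loaded = YAML.load(yaml_text) }

puts format('dump: %.3fs  load: %.3fs  size: %d bytes',
            dump_time, load_time, yaml_text.bytesize)
```

Raising the arguments to `build_tree` toward the 200,000 entries mentioned above shows whether dump time grows faster than load time on a given Ruby.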

Michael F.:

If you’re on 1.9.2, you could try the new YAML library psych:
iota ~ % time ruby -rpsych -e 'Psych.load(Psych.dump(Dir["/sys/**/*"]))'
1.20s user 0.07s system 99% cpu 1.283 total

What exactly do I need to have Psych on an Ubuntu/rvm install?

chastell@devielle:~$ ruby -vrpsych -e ''
ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-linux]
<internal:lib/rubygems/custom_require>:29:in `require': no such file to load -- psych (LoadError)
        from <internal:lib/rubygems/custom_require>:29:in `require'
chastell@devielle:~$

(Note: I have libyaml-dev installed.)

— Piotr S.

Saturday, August 21, 2010, 4:56:43 PM, you wrote:

MF> If you’re on 1.9.2, you could try the new YAML library psych:
MF> iota ~ % time ruby -rpsych -e 'Psych.load(Psych.dump(Dir["/sys/**/*"]))'
MF> 1.20s user 0.07s system 99% cpu 1.283 total

MF> Of course, that’s just a simple Array, but would be interesting to see
MF> how your data behaves.

Michael,

I am running
ruby 1.8.7 (2010-01-10 patchlevel 249)

“YAML library psych” ??? What does that mean?

Benoit D.:

rvm install ruby-1.9.2-head -C --with-libyaml-dir=/opt/local

Unfortunately, both rvm-wrapped and vanilla Ruby 1.9.2-p0’s configure
report ‘configure: WARNING: unrecognized options: --with-libyaml-dir’.

In Ubuntu yaml.h lives in /usr/include; I ended up symlinking
it from /usr/local/include and I can now require ‘psych’.

— Piotr S.

On 22 August 2010 16:54, Piotr S. [email protected] wrote:

<internal:lib/rubygems/custom_require>:29:in `require': no such file to load -- psych (LoadError)
        from <internal:lib/rubygems/custom_require>:29:in `require'
chastell@devielle:~$

(Note: I have libyaml-dev installed.)

— Piotr S.

I also had trouble with rvm on OS X: my Ruby did not find psych.so.
I just had to reinstall Ruby after installing libyaml in a standard path
(/usr/local).

Before that I had libyaml installed via MacPorts, but then you have to
pass configure options like:
rvm install ruby-1.9.2-head -C --with-libyaml-dir=/opt/local

On Ubuntu, I believe libyaml is installed in a standard place, so you
just need to install Ruby again.

(The psych extension is simply not compiled if the build does not find
libyaml's yaml.h.)
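A quick way to check whether a given Ruby was built with the extension is to try the require and rescue the LoadError shown earlier in the thread. This is a small sketch; the messages are illustrative only.

```ruby
# Try to load psych; a LoadError means the extension was not built
# (or is not on the load path) for this Ruby.
psych_available =
  begin
    require 'psych'
    true
  rescue LoadError
    false
  end

if psych_available
  # Psych::LIBYAML_VERSION reports the libyaml version psych was built against.
  puts "psych loaded, libyaml #{Psych::LIBYAML_VERSION}"
else
  puts 'psych not available; was yaml.h found when this Ruby was compiled?'
end
```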

B.D.

On 23 August 2010 20:45, Piotr S. [email protected] wrote:

— Piotr S.

That is a serious problem; you should probably file a bug report
about it.

Ruby should obviously look in /usr for libraries and include files.

Benoit D.:

That is a serious problem; you should probably file a bug report about it.

Ruby should obviously look in /usr for libraries and include files.

I’m sorry, I spoke too soon; it turns out rvm was messing with
my experiments. Ruby 1.9.2-p0 does find /usr/include/yaml.h – but
I still wonder: should --with-libyaml-dir work or is it deprecated?

— Piotr S.