Library Metadata Storage Format

Hi,

I’ve been trying to decide the best way to store library/project
metadata (name, version, etc.). I use this data for a few different
tools, one of those being Rolls, which is an alternate require system
for Ruby. I’ve considered using straight Ruby, INI, YAML and multiple
files (one per property).

Ruby scripts are too limited for my purposes b/c they are impossible to
selectively edit with automated tools.

INI files would work well, and they are pretty easy to parse, but they
don’t seem to be the Ruby way --where YAML tends to rule the day.

YAML would seem to be the obvious choice. But I hesitate because Syck is
a fairly heavy dependency for something like Rolls where light weight is
a big advantage. Also automated manipulation of YAML files isn’t all
that optimal --round trip a YAML file and formatting can become fairly
distorted.

My last option, of per-property files, is appealing b/c it requires no
special parser library, and such files are very easy to manipulate. The
downside of course is that a dozen or so little files can seem a bit
unwieldy and can waste file system space (depending on block size).
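
To make the per-property idea concrete, here is a minimal sketch, assuming each property lives in its own plain-text file inside a single directory (say `meta/`). The directory layout and the helper names `load_metadata` and `write_property` are hypothetical, chosen only for illustration; this is not how Rolls actually does it.

```ruby
# Read every regular file in meta_dir into a hash of
# property name (the file name) => stripped file contents.
def load_metadata(meta_dir)
  data = {}
  Dir.glob(File.join(meta_dir, "*")).sort.each do |path|
    next unless File.file?(path)   # skip subdirectories
    data[File.basename(path)] = File.read(path).strip
  end
  data
end

# Overwrite one property without touching any of the others.
def write_property(meta_dir, name, value)
  File.write(File.join(meta_dir, name), "#{value}\n")
end
```

With something like this, an automated tool only ever rewrites the one small file it cares about, and no parser library is involved.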

So, as you can see I’m torn. Should I take the high road, and not worry
about YAML’s heft, or the low road and not worry about the
unconventional use of many small files, or is there a better road
altogether?

–7rans

On Thursday 24 July 2008 14:25:09 Thomas S. wrote:

YAML would seem to be the obvious choice. But I hesitate because Syck is
a fairly heavy dependency for something like Rolls where light weight is
a big advantage.

YAML is also in the standard library, so if you’re targeting anything
resembling a standard distribution of Ruby, it’ll be there.

Also automated manipulation of YAML files isn’t all
that optimal --round trip a YAML file and formatting can become fairly
distorted.

Because YAML is a serialization format. They make good config files, but
do you really need to support comments in the file? Your other proposal
doesn’t seem to allow for that, anyway…

The downside of course is that a dozen or so little files can seem a bit
unwieldy and can waste file system space (depending on block size).

My attitude is, do what’s convenient, and let the filesystem worry about
disk space. Some filesystems support concepts like “sub blocks” and “tail
packing” which can lead to quite efficient storage of small files.

Worry more about the usability of it. If you litter the project with
small files, is that going to be annoying for users? I know one of the
selling points of git over SVN is that git stores one .git folder at the
top of the checkout, whereas SVN stores a .svn folder in every directory
of the checkout.

Thanks for the response, David. Thinking through these issues all by
myself, I get a little stir crazy, so getting some feedback like this
really helps.

David M. wrote:

YAML is also in the standard library, so if you’re targeting anything
resembling a standard distribution of Ruby, it’ll be there.

Also automated manipulation of YAML files isn’t all
that optimal --round trip a YAML file and formatting can become fairly
distorted.

Because YAML is a serialization format. They make good config files, but
do you really need to support comments in the file? Your other proposal
doesn’t seem to allow for that, anyway…

Hmm… that’s true. That’s part of the issue really. To be more specific, I
want to automate version bumping. If I rewrite the whole metadata.yaml
file to update the version entry, you are right, bye bye comments.
Another option is to hack a regexp solution. It would probably work ok
most of the time. But that’s a hacky band-aid kind of fix.
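
For reference, the regexp approach might look something like the sketch below. It assumes a top-level `version:` entry holding a dotted numeric value (optionally quoted) and bumps only the last component; the method name `bump_version` is made up for illustration. It leaves every other line, comments included, untouched, but it knows nothing about YAML structure, so nested keys, block scalars or unusual quoting can fool it. That is exactly the "works most of the time" caveat.

```ruby
# Bump the last numeric component of the first top-level `version:` entry
# found in the given YAML text. Everything else, including comments,
# is returned unchanged. If no such entry exists, the text is returned as-is.
def bump_version(yaml_text)
  yaml_text.sub(/^(version:\s*["']?)(\d+(?:\.\d+)*)/) do
    prefix, version = $1, $2
    parts = version.split(".").map(&:to_i)
    parts[-1] += 1
    prefix + parts.join(".")
  end
end

text = "# project metadata\nname: rolls\nversion: 1.2.9  # keep me\n"
puts bump_version(text)
```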

The downside of course is that a dozen or so little files can seem a bit
unwieldy and can waste file system space (depending on block size).

My attitude is, do what’s convenient, and let the filesystem worry about
disk space. Some filesystems support concepts like “sub blocks” and “tail
packing” which can lead to quite efficient storage of small files.

Good point. Leave storage to the storage guys. It would only be a dozen
files or so, so we’re not talking a whole lot of space anyway.

Worry more about the usability of it. If you litter the project with
small files, is that going to be annoying for users? I know one of the
selling points of git over SVN is that git stores one .git folder at the
top of the checkout, whereas SVN stores a .svn folder in every directory
of the checkout.

Yea, I hate that about svn. This won’t be a problem here; the files
would be in one special directory. It can be annoying to edit them all,
at least for the first go-round, but after that they rarely change. The
other thing is tools that might want to scrape project info (a la
CPAN’s META.yml). I wonder if it would be too much trouble for this
use case to have to fetch multiple files (of course, I could always
generate an index file based on the separate files).
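
Generating that index could be as simple as the following sketch. It assumes the per-property files sit in one directory and that a single YAML file is an acceptable index for scrapers; the name `build_index` and the paths passed to it are hypothetical.

```ruby
require "yaml"

# Collect every regular file in meta_dir into a hash of
# property name => stripped contents, write that hash to index_path
# as YAML, and return the hash.
def build_index(meta_dir, index_path)
  data = {}
  Dir.glob(File.join(meta_dir, "*")).sort.each do |path|
    next unless File.file?(path)
    data[File.basename(path)] = File.read(path).strip
  end
  File.write(index_path, data.to_yaml)
  data
end
```

The separate files stay the source of truth; the index is regenerated whenever one of them changes, so nothing ever has to edit YAML in place.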

I came across another good reason to use separate files – say I use a
generator (e.g. RubiGen) to scaffold out a license. It would add the
LICENSE (or COPYING) file to my project, but it would also want to
update the metadata entry. That’s easy if there’s a separate
meta/license file. If there were a reliable way to update a YAML file in
a piecemeal fashion, then this wouldn’t be much of an issue; but without
that… well I guess I’m just not sure how comfortable I feel with a
“usually works” regexp hack.

T.
