A new project I’m starting on has a “database” consisting of many tens
of thousands of XML documents. They all conform to a common schema.
The project consists pretty much exclusively of searching and
presenting existing data - there’s no need (for the foreseeable future)
to be able to input or update XML documents in the database. Unlike
(say) blog data, where there’s typically a date, name and a bunch of
text, each document is fairly highly structured - there are quite a few
separate XML attributes and elements within each doc - and the format
of the XML doc closely relates to how I’ll want to present it.
I could walk through each document, parse it into its component
pieces, load the content into a relational database, then use
ActiveRecord to extract the content and “reassemble” it back into
XHTML for presentation purposes. However, it strikes me that there are
advantages in keeping it as a collection of XML docs; for example,
when it comes to presenting the data, I could just run an existing XML
doc against a set of CSS definitions and largely make the “views”
trivial. I could also potentially make use of XSL to produce
different report formats. Overall, converting the data from XML into
a relational format, then turning it back into XML/XHTML for
presentation purposes seems a bit dumb.
- has anyone tried using Rails with a set of XML documents as “the
database”? If it’s possible, what are the limitations?
- what have you used to index content in the XML docs, and what are
the pros and cons of the approach you used?
- is it possible to use ActiveRecord to search XML docs on one or more
key values? Is it possible to do pattern matches (i.e. something
equivalent to LIKE in SQL)?
- is the whole idea dumb, and should I just load the data into e.g.
Postgres and be done with it?
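
To make the questions above concrete, here is a rough sketch of the kind of
lookup I have in mind, done without ActiveRecord using REXML (which ships
with Ruby). The record/title elements, the region attribute, the file names
and the find_docs helper are all made up for illustration; the real schema
is different, and a real version would read files from disk rather than
from an in-memory hash.

```ruby
require "rexml/document"

# Hypothetical sample documents standing in for files on disk.
DOCS = {
  "a.xml" => '<record id="1" region="north"><title>Copper mining survey</title></record>',
  "b.xml" => '<record id="2" region="south"><title>Coastal erosion report</title></record>',
  "c.xml" => '<record id="3" region="north"><title>Mining safety review</title></record>'
}

# Return the names of the documents in which at least one element selected
# by +xpath+ has text matching +pattern+. The XPath predicate plays the role
# of a key lookup; the regular expression plays the role of SQL's LIKE.
def find_docs(docs, xpath, pattern)
  docs.select do |_name, xml|
    doc = REXML::Document.new(xml)
    REXML::XPath.match(doc, xpath).any? { |el| pattern.match?(el.text.to_s) }
  end.keys
end

# Key match on the region attribute plus a case-insensitive pattern match
# on the title text.
p find_docs(DOCS, "/record[@region='north']/title", /mining/i)
```

This parses every document on every query, which is presumably far too slow
for tens of thousands of files, hence the question about indexing.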
Thanks in advance for any suggestions. While I’m generally
comfortable with Rails, dealing with a database of XML docs is new for
me and I’m not quite sure how best to approach it.