Hello everyone,

A new project I'm starting on has a "database" consisting of many 10s of thousands of XML documents. They all conform to a common schema.

The project consists pretty much exclusively of searching and presenting existing data - there's no need (for the foreseeable future) to be able to input or update XML documents in the database.

Unlike (say) blog data, where there's typically a date, a name and a bunch of text, each document is fairly highly structured - there are quite a few separate XML attributes and entities within each doc - and the format of the XML doc closely relates to how I'll want to present it.

I *could* walk through each document, parse it into its component pieces, load the content into a relational database, then use ActiveRecord to extract the content and "reassemble" it back into XHTML for presentation purposes.

However, it strikes me that there are advantages in keeping it as a collection of XML docs; for example, when it comes to presenting the data, I could just run an existing XML doc against a set of CSS definitions and largely make the "views" trivial. I could also potentially make use of XSL to produce different report formats. Overall, converting the data from XML into a relational format, then turning it back into XML/XHTML for presentation purposes, seems a bit dumb.

Questions:
- has anyone tried using Rails with a set of XML documents as "the database"? If it's possible, what are the limitations?
- what have you used to index content in the XML docs, and what are the pros and cons of the approach you used?
- is it possible to use ActiveRecord to search XML docs on one or more key values? Is it possible to do pattern matches (i.e. something equivalent to LIKE in SQL)?
- is the whole idea dumb, and should I just load the data into e.g. Postgres and be done with it?

Thanks in advance for any suggestions. While I'm generally comfortable with Rails, dealing with a database of XML docs is new for me and I'm not quite sure how best to approach it.

Dave M.
on 2006-03-14 12:42
on 2006-03-14 13:24
On 3/14/06, David Mitchell <firstname.lastname@example.org> wrote:
> A new project I'm starting on has a "database" consisting of many 10s
> of thousands of XML documents. They all conform to a common schema.

If I were faced with this situation, I'd ditch ActiveRecord. Your models don't have to be based on it; I'd write a new model object that can interface with your XML files.

Good luck!
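For illustration, here is a rough sketch of what such a model object might look like. It is only one possible shape, not anything from an existing library: the class name `XmlRecord`, its methods, and the one-file-per-record layout (`<id>.xml` in a single directory) are all assumptions made up for the example. It uses REXML for parsing.

```ruby
require "rexml/document"

# Hypothetical read-only model backed by a directory of XML files.
# Assumes one record per file, named "<id>.xml" (an invented convention).
class XmlRecord
  class << self
    # Directory holding the XML documents; must be set before querying.
    attr_accessor :directory
  end

  attr_reader :path, :doc

  def initialize(path)
    @path = path
    @doc = REXML::Document.new(File.read(path))
  end

  # Load every document in the directory (parses each file on every call).
  def self.all
    Dir.glob(File.join(directory, "*.xml")).sort.map { |p| new(p) }
  end

  # Look up one document by its file name; returns nil if it is missing.
  def self.find(id)
    path = File.join(directory, "#{id}.xml")
    File.exist?(path) ? new(path) : nil
  end

  # Return the records in which the XPath expression selects at least one node.
  def self.find_all_by_xpath(xpath)
    all.select { |record| REXML::XPath.first(record.doc, xpath) }
  end

  # Text of the first element matched by the XPath expression, or nil.
  def text_at(xpath)
    node = REXML::XPath.first(doc, xpath)
    node && node.text
  end
end
```

A controller could then call something like `XmlRecord.find(params[:id])` much as it would with an ActiveRecord model. Note that `find_all_by_xpath` re-parses every file on each call, so with tens of thousands of documents it would need some kind of index or cache in front of it.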
on 2006-03-14 14:24
Josh on Rails wrote:
> On 3/14/06, David Mitchell <email@example.com> wrote:
>> A new project I'm starting on has a "database" consisting of many 10s
>> of thousands of XML documents. They all conform to a common schema.
>
> If I were faced with this situation, I'd ditch ActiveRecord. Your models
> don't have to base off of it; I'd write a new model object that can
> interface with your XML files.

I'd take a look at the schema, and see if it could easily be mapped to a database. If there's not much need for insertion, then conversion would be a one-time affair, and there's a lot to gain by doing it that way - not least in terms of not having to come up with a whole new query system.

I have a sneaking suspicion that acts_as_tree and polymorphic associations would be extremely handy in such a situation.
on 2006-03-14 14:38
> Thanks in advance for any suggestions. While I'm generally
> comfortable with Rails, dealing with a database of XML docs is new for
> me and I'm not quite sure how best to approach it.

Not dumb. I agree with Josh, all you need is a new model layer. As for getting LIKE functionality, perhaps you could use calls to grep?

-Derrick Spell
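As a rough sketch of the grep idea, the same effect as `grep -l` (list the files containing a match) can be had in plain Ruby without shelling out. The helper name `xml_files_matching` and the flat directory of `*.xml` files are assumptions for the example only.

```ruby
# Hypothetical helper: return the paths of XML files in `directory`
# that contain at least one line matching `pattern` (a Regexp).
def xml_files_matching(directory, pattern)
  Dir.glob(File.join(directory, "*.xml")).sort.select do |path|
    # Read line by line so large files are not loaded into memory at once.
    File.foreach(path).any? { |line| line.match?(pattern) }
  end
end

# Example call (the directory name is made up):
#   xml_files_matching("data/docs", /mitchell/i)
```

Like grep, this matches against the raw file text, so a pattern can also hit tag or attribute names, and a value that spans several lines will be missed. It is closer to a crude full-text scan than to a LIKE on one specific field.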
on 2006-03-14 20:20
Sounds like you could use a native XML database like eXist (http://exist.sourceforge.net/). I'm not sure how you'd interface it with Rails, but it runs as a server and accepts XQuery queries. I'd think you'd just parse the XML results of each query with REXML and have your models get their information from the resulting REXML documents. b
on 2006-03-14 20:26
How about importing the documents into an XML database such as eXist (http://wiki.exist-db.org)? It's a Java server, but it supports XML-RPC and REST, so talking to it from Ruby should be fairly easy, especially since you only want to read (people have built connectors for PHP, Zope, ColdFusion and Perl). Querying is done using XPath 2.0 and XQuery (with extensions), and it is pretty powerful. /Marcus
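A minimal sketch of the REST route from Ruby, using only the standard library. The server address, the `/exist/rest` prefix, the collection path and the helper names are assumptions for illustration; `_query` and `_howmany` are the query parameters eXist's REST interface uses for GET requests, but check them against the version you install.

```ruby
require "uri"
require "net/http"

# Hypothetical helper: build the GET URL for running an XQuery/XPath
# expression against a collection through eXist's REST interface.
def exist_query_uri(base_url, collection, xquery, howmany: 10)
  uri = URI("#{base_url}#{collection}")
  # encode_www_form percent-escapes the expression for use in a URL.
  uri.query = URI.encode_www_form("_query" => xquery, "_howmany" => howmany)
  uri
end

# Hypothetical helper: run the query and return the raw XML response body.
def exist_query(base_url, collection, xquery, howmany: 10)
  Net::HTTP.get(exist_query_uri(base_url, collection, xquery, howmany: howmany))
end

# Example call (assumed server address and collection name):
#   xml = exist_query("http://localhost:8080/exist/rest", "/db/docs",
#                     "//record[category='fiction']")
```

The returned string is XML, so it could be handed to REXML and wrapped by a model object.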
on 2006-03-14 21:15
I've got a similar situation going on with a legacy application I'm trying to port to Rails. The legacy app is backed by an Oracle database. Oracle supports XMLTYPE columns for storing XML content in the database. It also provides powerful querying capabilities based on XPath.

I've made a patch to the Rails Oracle connector that allows AR to return XMLTYPE columns as strings, so getting the data out is fairly easy. I'll be releasing my "as_xml" plugin in a few weeks. It takes a string return value and parses it into an XML document. This makes displaying it fairly easy as well.

The bit that's not so trivial is the searching. But if you are using Oracle, I'd recommend writing a stored procedure to do the actual query and adding an 'xmlfind' method to AR. You can then call the stored procedure using 'connection.execute()' and assemble the results.

It should be noted that Oracle is very pricey, so this is not a cheap solution. They have just started offering a free version of 10g, but it's limited to one machine and a gig of memory. Microsoft's SQL Server just started offering a similar feature, so that may be an option as well.

The big caveat here is that this is not the 'Rails Way'. It may solve your problem, but you will lose some elegance.

-wilig
on 2006-03-14 23:30
On Mar 14, 2006, at 6:39 AM, David Mitchell wrote:
> of the XML doc closely relates to how I'll want to present it.
> a relational format, then turning it back into XML/XHTML for
> presentation purposes seems a bit dumb.
>
> Questions:
> - has anyone tried using Rails with a set of XML documents as "the
> database"? If it's possible, what are the limitations?

I've been doing this for years; I don't remember precisely when I started, but it was just after the first SAX parsers became available. Initially in Java, now in Common Lisp and Ruby/RoR.

The biggest limitation is the impact of the number of files in a directory on filesystem performance... negligible/manageable on Linux and OS X, not so sure on Windows. In Java, I ended up using either Perst or JDBM rather than the filesystem directly. Berkeley DB would be a similar kind of option (make sure you use something transactional or you stand to lose everything).

One Java application we wrote generates about 600,000 XML documents per year (going from my fallible memory, but of that order), and including historical data there are about 6 years in there now.

> - what have you used to index content in the XML docs, and what are
> the pros and cons of the approach you used?

This was tricky. In Java I used an approach that combined indexes (implemented in Perst or JDBM) with text indexes using Lucene. I've not implemented indexing yet in the Ruby version of xampl, but that is coming fairly soon, since I am beginning to wish I had it in a project I'm working on now.

We did an experiment keeping the indexes in MySQL, but I wasn't particularly happy with it. I have a couple of ideas that might help. I'll be looking into ActiveRecord for indexing in the Ruby version of xampl.

> - is it possible to use ActiveRecord to search XML docs on one or more
> key values? Is it possible to do pattern matches (i.e. something
> equivalent to LIKE in SQL)?

Well, sure. If you've only got one or two key values then there are lots of options.
> - is the whole idea dumb, and should I just load the data into e.g.
> Postgres and be done with it?

No, it is not dumb. I don't know about Ruby, but in Java it didn't take a very complex XML document before the file system blew the DB away in performance (and nothing came close to Perst or JDBM).

The same guy that wrote Perst wrote a similar thing for dynamic languages, including Ruby. I've not looked at it because the last time I tried to compile it I couldn't (but that could have been me -- the guy that wrote this stuff is really quite good and his documentation is good... this is the same guy that wrote GOODS and a couple of the better-known main-memory database systems).

Cheers,
Bob

----
Bob Hutchison -- blogs at <http://www.recursive.ca/hutch/>
Recursive Design Inc. -- <http://www.recursive.ca/>
Raconteur -- <http://www.raconteur.info/>
xampl for Ruby -- <http://rubyforge.org/projects/xampl/>