Forum: Ruby on Rails "Database" as a collection of XML docs

David Mitchell (Guest)
on 2006-03-14 12:42
(Received via mailing list)
Hello everyone,

A new project I'm starting on has a "database" consisting of many tens
of thousands of XML documents.  They all conform to a common schema.
The project consists pretty much exclusively of searching and
presenting existing data - there's no need (for the foreseeable future)
to be able to input or update XML documents in the database.  Unlike
(say) blog data, where there's typically a date, name and a bunch of
text, each document is fairly highly structured - there are quite a few
separate XML attributes and elements within each doc - and the format
of the XML doc closely relates to how I'll want to present it.

I *could* walk through each document, parse it into its component
pieces, load the content into a relational database, then use
ActiveRecord to extract the content and "reassemble" it back into
XHTML for presentation purposes.  However, it strikes me that there are
advantages in keeping it as a collection of XML docs; for example,
when it comes to presenting the data, I could just run an existing XML
doc against a set of CSS definitions and largely make the "views"
trivial.  I could also potentially make use of XSL to produce
different report formats.  Overall, converting the data from XML into
a relational format, then turning it back into XML/XHTML for
presentation purposes seems a bit dumb.

Questions:
- has anyone tried using Rails with a set of XML documents as "the
database"?  If it's possible, what are the limitations?
- what have you used to index content in the XML docs, and what are
the pros and cons of the approach you used?
- is it possible to use ActiveRecord to search XML docs on one or more
key values?  Is it possible to do pattern matches (i.e. something
equivalent to LIKE in SQL)?
- is the whole idea dumb, and should I just load the data into e.g.
Postgres and be done with it?

Thanks in advance for any suggestions.  While I'm generally
comfortable with Rails, dealing with a database of XML docs is new for
me and I'm not quite sure how best to approach it.

Dave M.
Josh on Rails (Guest)
on 2006-03-14 13:24
(Received via mailing list)
On 3/14/06, David Mitchell <monch1962@gmail.com> wrote:
>
> A new project I'm starting on has a "database" consisting of many 10s
> of thousands of XML documents.  They all conform to a common schema.
>

If I were faced with this situation, I'd ditch ActiveRecord. Your models
don't have to base off of it; I'd write a new model object that can
interface with your XML files.
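A minimal sketch of such a non-ActiveRecord model, assuming the documents sit in a single directory and using REXML from Ruby's standard distribution. The class name `XmlRecord` and its `all`/`find_by`/`value_at` methods are hypothetical, not an existing Rails API:

```ruby
require "rexml/document"

# Hypothetical read-only model backed by a directory of XML files instead of
# a database table. Every query is a full scan; there is no index.
class XmlRecord
  attr_reader :path, :doc

  def initialize(path)
    @path = path
    @doc = REXML::Document.new(File.read(path))
  end

  # Load and parse every *.xml file in +dir+.
  def self.all(dir)
    Dir.glob(File.join(dir, "*.xml")).sort.map { |p| new(p) }
  end

  # Records whose value at +xpath+ (an element or an attribute) equals +value+.
  def self.find_by(dir, xpath, value)
    all(dir).select { |rec| rec.value_at(xpath) == value }
  end

  # Text of the first element, or value of the first attribute, matching +xpath+.
  def value_at(xpath)
    case (node = REXML::XPath.first(doc, xpath))
    when REXML::Attribute then node.value
    when REXML::Element then node.text
    end
  end
end
```

Parsing every file on every query is only workable for small sets; with tens of thousands of documents you would want to cache the parsed documents or build an index.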

Good luck!
Alex Young (Guest)
on 2006-03-14 14:24
(Received via mailing list)
Josh on Rails wrote:
> On 3/14/06, David Mitchell <monch1962@gmail.com> wrote:
>
>     A new project I'm starting on has a "database" consisting of many 10s
>     of thousands of XML documents.  They all conform to a common schema.
>
>
> If I were faced with this situation, I'd ditch ActiveRecord. Your models
> don't have to base off of it; I'd write a new model object that can
> interface with your XML files.
I'd take a look at the schema, and see if it could easily be mapped to a
database.  If there's not much need for insertion, then conversion would
be a one-time affair, and there's a lot to gain by doing it that way -
not least in terms of not having to come up with a whole new query
system.  I have a sneaking suspicion that acts_as_tree and polymorphic
associations would be extremely handy in such a situation.
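A rough sketch of that one-time conversion, assuming a generic adjacency-list table (one row per element, with a `parent_id` column of the kind acts_as_tree uses) rather than a mapping tailored to the real schema. The `flatten_xml` helper and the row layout are hypothetical:

```ruby
require "rexml/document"

# Walk an XML document depth-first and return one Hash per element, shaped
# like rows of an adjacency-list table (id, parent_id, name, text, attributes).
def flatten_xml(xml)
  rows = []
  walk = lambda do |element, parent_id|
    id = rows.size + 1
    text = element.texts.map(&:value).join.strip
    attributes = {}
    element.attributes.each_attribute { |attr| attributes[attr.expanded_name] = attr.value }
    rows << { id: id, parent_id: parent_id, name: element.name,
              text: text.empty? ? nil : text, attributes: attributes }
    element.elements.each { |child| walk.call(child, id) }
  end
  walk.call(REXML::Document.new(xml).root, nil)
  rows
end
```

The sketch assigns its own ids, so a real import would need to map them onto database-generated keys; whether attributes go into a serialized column or a separate table is left open.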
Derrick Spell (Guest)
on 2006-03-14 14:38
(Received via mailing list)
>
> Thanks in advance for any suggestions.  While I'm generally
> comfortable with Rails, dealing with a database of XML docs is new for
> me and I'm not quite sure how best to approach it.
>

Not dumb.  I agree with Josh, all you need is a new model layer.  As
for getting LIKE functionality, perhaps you could use calls to grep?
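If you'd rather not shell out to grep, a rough in-process equivalent is to scan each file's raw text with a Ruby regular expression. The `grep_xml` helper is hypothetical, and like grep it matches markup as well as content:

```ruby
# Paths of *.xml files under +dir+ (recursively) with at least one line
# matching +pattern+. Each file is read line by line and closed on return.
def grep_xml(dir, pattern)
  Dir.glob(File.join(dir, "**", "*.xml")).sort.select do |path|
    File.open(path) { |f| f.each_line.any? { |line| line.match?(pattern) } }
  end
end
```

Every query still reads every file, which is why the later replies move toward indexes or a real database.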

-Derrick Spell
Ben Munat (Guest)
on 2006-03-14 20:20
(Received via mailing list)
Sounds like you could use a native XML DB like eXist
(http://exist.sourceforge.net/). Not sure how you'd interface it with
Rails, but it does run as a server and takes XQuery queries. I'd think
you'd just parse the XML results from the query into REXML and have
your models get information from the REXML.
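Assuming the query returns its hits wrapped in a single result element (the wrapper's name is ignored here), mapping them onto plain Ruby objects with REXML might look like this. `Book`, `books_from_results`, and the `<title>`/`<author>` fields are hypothetical stand-ins for the real schema:

```ruby
require "rexml/document"

# Hypothetical value object for one query hit.
Book = Struct.new(:title, :author)

# Parse a result document and turn each <book> child of its root element
# into a Book, taking the text of the <title> and <author> children.
def books_from_results(xml)
  root = REXML::Document.new(xml).root
  root.elements.to_a.select { |el| el.name == "book" }.map do |el|
    Book.new(el.elements["title"]&.text, el.elements["author"]&.text)
  end
end
```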

b
Marcus Andersson (Guest)
on 2006-03-14 20:26
(Received via mailing list)
How about importing the documents into an XML database such as eXist
(http://wiki.exist-db.org)? It's a Java server, but it supports XML-RPC
and REST, so it should be fairly easy to talk to it from Ruby,
especially since you only want to read (people have built connectors for
PHP, Zope, ColdFusion and Perl). Querying is done using XPath 2.0 and
XQuery (with extensions), and it is pretty powerful.
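Since only reads are needed, talking to eXist over REST could be as small as an HTTP GET. This sketch assumes a server at `localhost:8080`, a collection at `/exist/rest/db/docs`, and the REST interface's `_query`/`_howmany` request parameters; check all of these against the eXist documentation for your version. The helper names are hypothetical:

```ruby
require "net/http"
require "uri"

# Assumed eXist REST endpoint; host, port, and collection path are placeholders.
EXIST_BASE = "http://localhost:8080/exist/rest/db/docs"

# Build a GET URI that sends an XQuery in the (assumed) _query parameter and
# caps the number of returned hits with _howmany.
def exist_query_uri(xquery, base: EXIST_BASE, how_many: 10)
  uri = URI(base)
  uri.query = URI.encode_www_form("_query" => xquery, "_howmany" => how_many.to_s)
  uri
end

# Perform the request and return the raw XML body.
def run_exist_query(xquery, **opts)
  response = Net::HTTP.get_response(exist_query_uri(xquery, **opts))
  response.value # raises unless the response is a 2xx success
  response.body
end
```

The body returned by `run_exist_query` could then be handed to REXML for the models to read from.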

/Marcus
William Groppe (Guest)
on 2006-03-14 21:15
(Received via mailing list)
I've got a similar situation going on with a legacy application I'm
trying to port to Rails.  The legacy app is backed by an Oracle
database.  Oracle supports XMLTYPE columns for storing XML content in
the database.  It also provides powerful querying capabilities based
on XPath.

I've made a patch to the Rails Oracle connector that allows AR to
return XMLTYPE columns as strings.  So getting the data out is fairly
easy.  I'll be releasing my "as_xml" plugin in a few weeks.  It takes
a string return value, and parses it into an XML Document.  This makes
displaying it fairly easy as well.
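The plugin itself isn't shown in the thread, so the following is only a guess at the general idea, not its API: a hypothetical `as_xml` class macro that defines a `<attribute>_xml` reader, which parses a string attribute into a REXML document and reuses the parsed result until the string changes:

```ruby
require "rexml/document"

# Hypothetical mixin: `as_xml :body` defines #body_xml, which parses the
# string returned by #body and caches the document per attribute.
module AsXml
  def as_xml(*attrs)
    attrs.each do |attr|
      define_method("#{attr}_xml") do
        raw = public_send(attr).to_s
        @_as_xml_cache ||= {}
        cached = @_as_xml_cache[attr]
        return cached[1] if cached && cached[0] == raw # string unchanged: reuse
        doc = REXML::Document.new(raw)
        @_as_xml_cache[attr] = [raw, doc]
        doc
      end
    end
  end
end
```

A model would `extend AsXml` and declare `as_xml :column_name`; the reader works on any object whose attribute returns an XML string, ActiveRecord or not.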

The bit that's not so trivial is the searching.  But if you are using
Oracle I'd recommend writing a stored procedure to do the actual
query, and add an 'xmlfind' method to AR.  You can then call the stored
procedure using 'connection.execute()' and assemble the results.
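One way such an `xmlfind` method could look, swapping the stored procedure for Oracle's `existsNode()` SQL function in a WHERE clause (a different technique from the one described above). It assumes the class provides `table_name` and a `connection` responding to `quote` and `select_all`, as ActiveRecord classes do; the module itself is hypothetical:

```ruby
# Hypothetical class-level finder: select rows whose XMLTYPE +column+ contains
# a node matching +xpath+, using Oracle's existsNode() instead of a stored
# procedure. The XPath is quoted; the column name is checked, then interpolated.
module XmlFind
  def xmlfind(column, xpath)
    raise ArgumentError, "bad column name" unless column.to_s.match?(/\A\w+\z/)
    sql = "SELECT * FROM #{table_name} " \
          "WHERE existsNode(#{column}, #{connection.quote(xpath)}) = 1"
    connection.select_all(sql)
  end
end
```

In a model this would be `extend XmlFind`, then something like `Document.xmlfind(:content, "/book[title='Ruby']")`, where `Document` and `content` are placeholder names.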

It should be noted that Oracle is very pricey, so this is not a cheap
solution.  They have just started offering a free version of 10g, but
it's limited to one machine and a gig of memory.

Microsoft's SQL Server just started offering a similar feature, so
that may be an option as well.

The big caveat here is, this is not the 'Rails Way'.  It may solve
your problem, but you will lose some elegance.

-wilig
Bob Hutchison (Guest)
on 2006-03-14 23:30
(Received via mailing list)
On Mar 14, 2006, at 6:39 AM, David Mitchell wrote:

> of the XML doc closely relates to how I'll want to present it.
> a relational format, then turning it back into XML/XHTML for
> presentation purposes seems a bit dumb.
>
> Questions:
> - has anyone tried using Rails with a set of XML documents as "the
> database"?  If it's possible, what are the limitations?

I've been doing this for years; I don't remember precisely, but since
just after the first SAX parsers were available. Initially in Java, now
in Common Lisp and Ruby/RoR. The biggest limitation is the impact of
the number of files in a directory on filesystem performance...
negligible/manageable on Linux and OS X, not so sure on Windows. In
Java, I ended up using either Perst or JDBM rather than the filesystem
directly. Berkeley DB would be a similar kind of option (make sure you
use something transactional or you stand to lose everything).
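One common workaround for the files-per-directory problem (not something described in this post) is to fan documents out into nested subdirectories keyed on a hash of the document id. A sketch, assuming ids are filesystem-safe strings; the helper names are hypothetical, and the write-then-rename only gives atomic replacement of a single file on a POSIX filesystem, not the transactional safety of Perst, JDBM or Berkeley DB:

```ruby
require "digest/md5"
require "fileutils"

# Map an id to root/ab/cd/<id>.xml, where "abcd" are the first four hex digits
# of MD5(id). Two levels of 256 buckets spread files across 65,536 directories.
def sharded_path(root, id)
  hex = Digest::MD5.hexdigest(id.to_s)
  File.join(root, hex[0, 2], hex[2, 2], "#{id}.xml")
end

# Write to a temporary file in the target directory, then rename it into place
# (rename is atomic on POSIX filesystems, so readers never see a partial file).
def store_document(root, id, xml)
  path = sharded_path(root, id)
  FileUtils.mkdir_p(File.dirname(path))
  tmp = "#{path}.tmp"
  File.write(tmp, xml)
  File.rename(tmp, path)
  path
end
```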

One Java application we wrote generates about 600,000 XML documents
per year (from fallible memory, but of that order), and including
historical data there are about 6 years' worth in there now.


> - what have you used to index content in the XML docs, and what are
> the pros and cons of the approach you used?

This was tricky. In Java I used an approach that used indexes
(implemented in Perst or JDBM) and text indexes using Lucene.

I've not implemented indexing yet in the Ruby version of xampl, but
that is coming fairly soon since I am beginning to wish I had it in a
project I'm working on now.

We did an experiment keeping the indexes in mysql but I wasn't
particularly happy. I have a couple of ideas that might help. I'll be
looking into ActiveRecord for indexing in the Ruby version of xampl.
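Perst, JDBM and Lucene don't translate directly, but the core idea of a text index fits in a few lines. This is a much simpler in-memory inverted index (word to document ids), not Bob's design; the class name and the tokenizing rule are assumptions, and attribute values are not indexed:

```ruby
require "rexml/document"
require "set"

# Minimal in-memory inverted index over the text content of XML documents:
# each lowercased word maps to the Set of document ids that contain it.
class XmlTextIndex
  def initialize
    @postings = Hash.new { |hash, word| hash[word] = Set.new }
  end

  # Index every text node of +xml+ under +id+.
  def add(id, xml)
    each_text(REXML::Document.new(xml).root) do |text|
      text.downcase.scan(/\w+/) { |word| @postings[word] << id }
    end
  end

  # Ids of documents containing every word of +query+ (AND semantics).
  # Returns a copy so callers cannot mutate the index.
  def search(query)
    words = query.downcase.scan(/\w+/)
    return Set.new if words.empty?
    words.map { |word| @postings.fetch(word, Set.new) }.reduce(:&).dup
  end

  private

  # Yield the string value of each text node below +element+, depth-first.
  def each_text(element, &block)
    element.texts.each { |text_node| block.call(text_node.value) }
    element.elements.each { |child| each_text(child, &block) }
  end
end
```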

> - is it possible to use ActiveRecord to search XML docs on one or more
> key values?  Is it possible to do pattern matches (i.e. something
> equivalent to LIKE in SQL)?

Well, sure. If you've only got one or two key values then there are
lots of options.
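For pattern matches on one or two extracted key values, a SQL LIKE pattern can be translated into a Ruby regular expression. A sketch; `like_to_regexp` is a hypothetical helper, and it ignores LIKE's ESCAPE clause:

```ruby
# Translate a SQL LIKE pattern into an anchored Regexp: "%" matches any run of
# characters (including none), "_" matches exactly one, everything else is
# literal. MULTILINE lets "." cross newlines, as "%" does in SQL.
def like_to_regexp(pattern, case_insensitive: false)
  body = pattern.each_char.map { |ch|
    case ch
    when "%" then ".*"
    when "_" then "."
    else Regexp.escape(ch)
    end
  }.join
  options = Regexp::MULTILINE
  options |= Regexp::IGNORECASE if case_insensitive
  Regexp.new("\\A#{body}\\z", options)
end
```

Applied to a list of extracted keys, e.g. `titles.grep(like_to_regexp("%rails%", case_insensitive: true))`, this behaves roughly like Postgres's ILIKE.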

> - is the whole idea dumb, and should I just load the data into e.g.
> Postgres and be done with it?

No, it is not dumb. I don't know about Ruby, but in Java it didn't
take a very complex XML document before the filesystem blew the DB
away in performance (and nothing came close to Perst or JDBM).

The same guy that wrote Perst wrote a similar thing for dynamic
languages including Ruby. I've not looked at it because the last time
I tried to compile it I couldn't (but that could have been me -- the
guy that wrote this stuff is really quite good and his documentation
is good... this is the same guy that wrote GOODS and a couple of the
better-known main-memory database systems).

Cheers,
Bob

----
Bob Hutchison                  -- blogs at <http://www.recursive.ca/hutch/>
Recursive Design Inc.          -- <http://www.recursive.ca/>
Raconteur                      -- <http://www.raconteur.info/>
xampl for Ruby                 -- <http://rubyforge.org/projects/xampl/>