Flexible data schema in a database?


#1

I figured I'd pose this to the list and see if anyone can build a
better mousetrap.

The problem:

  1. Store hierarchical data in a database, but without knowing the
    hierarchy beforehand.
  2. Allow the user to define their own hierarchy, and allow the system
    to support multiple hierarchies.

Call it a cataloging system: we have an object, and we want to add data
about that object to the system.

Now my thoughts on the problem:

It may be possible to achieve this with just 2 tables.

fields

id - primary key
parent_id - tree like structure
name - string for the name of the field
hierarchy_id - id of the hierarchy this field belongs to
repeatable - can a record contain more than one of these fields?

data

id - primary key
field_id - reference to the field to which this data belongs
hierarchy_id - id of the hierarchy this data belongs to
content - string for the data itself
object_id - the object that this data corresponds to

Now this is very simplistic but could certainly be expanded to support
many more options.
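To make the idea concrete, here is a minimal plain-Ruby sketch of the two-table approach, with no database involved. The column names follow the schema above; the sample field and data rows are invented for illustration, and `obj_id` stands in for `object_id` (which would shadow Ruby's built-in `Object#object_id`).

```ruby
# Two-table EAV sketch in plain Ruby. Column names follow the schema
# above; the sample rows are made up for illustration.
Field = Struct.new(:id, :parent_id, :name, :hierarchy_id, :repeatable)
Datum = Struct.new(:id, :field_id, :hierarchy_id, :content, :obj_id)

fields = [
  Field.new(1, nil, "physical",   1, false),
  Field.new(2, 1,   "dimensions", 1, false),
  Field.new(3, 2,   "height",     1, true),
]
data = [Datum.new(1, 3, 1, "30cm", 99)]

# Rebuild a field's full path by walking parent_id links up to the root.
def path(fields, field)
  names = []
  while field
    names.unshift(field.name)
    field = fields.find { |f| f.id == field.parent_id }
  end
  names.join("/")
end

height = fields.find { |f| f.name == "height" }
puts path(fields, height)                              # physical/dimensions/height
puts data.find { |d| d.field_id == height.id }.content # 30cm
```

In SQL the same walk would be repeated self-joins on parent_id; the point is just that parent_id gives you the tree and field_id ties each value back into it.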

Has anyone run into this problem? Have you devised a more elegant
solution? Have I just lost my mind, plain and simple?

-Will


#2

On 7/04/2006, at 10:15 AM, William G. wrote:

[snip]

Now this is very simplistic but could certainly be expanded to support
many more options.

Has anyone run into this problem? Have you devised a more elegant
solution? Have I just lost my mind, plain and simple?

There’s no really elegant solution; I’ve been pondering it for a
while. The solution you propose works, but it isn’t efficient from
either the Rails or the DB point of view: storing all values as
varchars removes any possibility of sensible indexing.

My current solution revolves around dynamically manipulating tables.
There’s slightly more RoR overhead with reloading schema details, but
not a huge amount. It looks somewhat like this:

  • Data fields

    Contains all options: field type, display type, names.

  • type_1234_data

    Table created, dropped and manipulated by after_create/
    after_update/after_delete callbacks on the DataField model.
    Contains normal columns.
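A hedged sketch of those table-manipulating callbacks, in plain Ruby so it runs without Rails: the `@@ddl` array stands in for the database connection, and `after_create`/`after_destroy` are invoked by hand here rather than by ActiveRecord. The column list and table naming are illustrative.

```ruby
# Plain-Ruby sketch of DDL-from-callbacks. @@ddl stands in for the DB
# connection; in Rails these would be after_create/after_destroy
# callbacks on the DataField model running the DDL via the connection.
class DataField
  @@ddl = []

  def self.ddl
    @@ddl
  end

  attr_reader :id

  def initialize(id, columns)
    @id, @columns = id, columns
    after_create
  end

  # Build the per-type table when a DataField is created...
  def after_create
    cols = @columns.map { |name, type| "#{name} #{type}" }.join(", ")
    @@ddl << "CREATE TABLE type_#{id}_data (id serial PRIMARY KEY, #{cols})"
  end

  # ...and drop it when the DataField goes away.
  def after_destroy
    @@ddl << "DROP TABLE type_#{id}_data"
  end
end

f = DataField.new(1234, name: "varchar(255)", height: "integer")
f.after_destroy
puts DataField.ddl  # the recorded CREATE and DROP statements
```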

On the RoR side

  • Override const_missing to check for type_\d+_data and generate a
    model on the fly.
  • Have model generated discard all column data on each request. Easy.
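The const_missing trick can be sketched like this in plain Ruby (so it runs without Rails loaded); in the real app the generated class would inherit from ActiveRecord::Base with its table_name pointing at type_<id>_data, whereas here a bare Class with a table_name singleton method stands in for it.

```ruby
# Sketch of generating models on the fly via const_missing. A constant
# like Type1234Data is mapped to the table type_1234_data; the bare
# Class stands in for an ActiveRecord::Base subclass.
module DynamicTypeModels
  def const_missing(name)
    if name.to_s =~ /\AType(\d+)Data\z/
      table = "type_#{$1}_data"
      klass = Class.new do
        define_singleton_method(:table_name) { table }
      end
      const_set(name, klass)  # cache it so const_missing fires only once
    else
      super
    end
  end
end

# Top-level constant lookups fall back to Object, so extending Object
# makes any unknown TypeNNNNData constant resolve through the hook.
Object.extend(DynamicTypeModels)

puts Type1234Data.table_name  # type_1234_data
```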

So the RoR overheads are

  • Dynamic models
  • Reloading table details on each request
  • Can’t use Rails’ automatic foreign keys, can’t do
    has_many :type_#{self.id}_data, but it can be worked around.

Advantages

  • Real table, easy to look up a row manually
  • Can index if one type searches on a certain column repeatedly

Disadvantages

  • Web user has to have access to change the DB

My current system is very Postgres-specific, but I’ll post it if
anyone’s interested.

Phillip H.
removed_email_address@domain.invalid
http://www.sitharus.com/


#3

There’s no really elegant solution; I’ve been pondering it for a
while. The solution you propose works, but it isn’t efficient from
either the Rails or the DB point of view: storing all values as
varchars removes any possibility of sensible indexing.

Well, technically you don’t need to store all the values as varchar.
In the field table you could have a data type specifier, then simply
add columns for the various data types to the data table.

ie:

data

id - primary key
field_id - reference to the field to which this data belongs
hierarchy_id - id of the hierarchy this data belongs to
string_content - string data
number_content - numeric data
date_content - date data
...
object_id - the object that this data corresponds to

It would be a sparse table, much like the aggregate table described in
the Agile book, but indexing should be workable.
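A small plain-Ruby sketch of that variant (the names data_type and COLUMN_FOR are illustrative, not from the thread): the field row carries a type, and reads go through it to the matching *_content column, so each type keeps a properly typed, indexable column.

```ruby
# Typed-column variant: the field definition says which *_content
# column of the data row holds the value. Names are illustrative.
TypedField = Struct.new(:id, :name, :data_type)
TypedDatum = Struct.new(:field_id, :string_content, :number_content,
                        :date_content)

COLUMN_FOR = {
  "string" => :string_content,
  "number" => :number_content,
  "date"   => :date_content,
}.freeze

# Read a value through the field definition; only one typed column
# per row is populated, the rest stay NULL (hence "sparse").
def read(field, datum)
  datum[COLUMN_FOR.fetch(field.data_type)]
end

weight = TypedField.new(1, "weight", "number")
row    = TypedDatum.new(1, nil, 12.5, nil)
puts read(weight, row)  # 12.5
```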

I’m leery of the dynamic table approach for one reason. I’ve seen a
couple of the schemas destined to be included in this system, and at
least 3 have 200+ fields. That’s a boatload of database tables. But
maybe there is some way to whittle that down to something more
reasonable.

-Will


#4

On 7/04/2006, at 1:01 PM, William G. wrote:

There’s no really elegant solution; I’ve been pondering it for a
while. The solution you propose works, but it isn’t efficient from
either the Rails or the DB point of view: storing all values as
varchars removes any possibility of sensible indexing.

Well, technically you don’t need to store all the values as varchar.
In the field table you could have a data type specifier, then simply
add columns for the various data types to the data table.

That’s a very good point; I hadn’t thought of that…

I’m leery of the dynamic table approach for one reason. I’ve seen a
couple of the schemas destined to be included in this system, and at
least 3 have 200+ fields. That’s a boatload of database tables. But
maybe there is some way to whittle that down to something more
reasonable.

The system I’m working on is essentially an address book, but with
customisable fields. A similar project I’m working on uses the
two-table approach you suggested first, and with 2.5 million rows
there’s no noticeable performance loss, though trying to order
searches numerically is an issue. My current system is somewhat of a
hack; I’ll figure out an easy way to describe it.


#5

I’d love to have a look Phil. The more options I can check out the
better.

-Will


#6

On 7/04/2006, at 12:53 PM, William G. wrote:

I’d love to have a look Phil. The more options I can check out the
better.

It got quite long, so I blogged it rather than emailing it to the
list: http://www.sitharus.com/articles/2006/04/07/dynamic-schemas-in-ruby-on-rails


#7

Thanks Phillip, very interesting indeed.