Modified Single Table Inheritance

Hello,

I am developing an application that will have 100’s of model classes
each derived from a single source class. Each model will have 4 similar
attributes but then have about a dozen unique ones each. I looked into
using STI but balked at creating tables with a 1000 columns. I then
looked at using inherits_from and creating an new table for each class
that holds the new class columns. Problem is, I’ll end up with 100’s of
tables, which also does not appeal to me.

My third solution, the one I am asking advice on, is this.

I create a single table with the 4 common columns and a dozen columns
with names such as float1, float2, float3, … , string1, string2, …
etc. Creating enough columns of each type to cover my largest subclass.
Then I use STI and for each class I map the generic columns to the nicer
names inside the subclasses. So in one class float1 may be miles/hour
while in another class it might be turkeys/hectare.

Does this sound like a reasonable approach?

Also, how would I map the generic names to nice names in my subclasses?
(I am a RoR noob)

And of course, does anyone see a much better way to do what I have
proposed?

Thank you,
Ryan G.

On Dec 26, 2006, at 14:51, Ryan G. wrote:

I am developing an application that will have 100’s of model classes
each derived from a single source class. Each model will have 4
similar
attributes but then have about a dozen unique ones each. I looked
into
using STI but balked at creating tables with a 1000 columns. I then
looked at using inherits_from and creating an new table for each class
that holds the new class columns. Problem is, I’ll end up with
100’s of
tables, which also does not appeal to me.

I think you want the Rails mailing list.


Eric H. - [email protected] - http://blog.segment7.net

I LIT YOUR GEM ON FIRE!

On Dec 26, 2006, at 5:51 PM, Ryan G. wrote:

Does this sound like a reasonable approach?

Also, how would I map the generic names to nice names in my
subclasses?
(I am a RoR noob)

And of course, does anyone see a much better way to do what I have
proposed?

Even though we’re not technically on the RoR list, I thought this was
still an interesting question for ORM, STI, OOP and probably some
other acronyms too :slight_smile:

I’d be curious to hear just what it is that you’re doing that needs
100s of classes, first. Sounds like you might be taking OOP a bit
too far for your purposes.

But to answer your question, I like the concept but it brings up fear
of bugs. Dropping meaningful column names will wreak havoc on any
cases where you have to write a query by hand. But everything is a
trade off. Personally, I’d try the 100s of tables route first and
see how it panned out.

Best of luck. Sounds like a tricky one.
-Mat

Ryan G. wrote:

Hello,

I am developing an application that will have 100’s of model classes
each derived from a single source class. Each model will have 4 similar
attributes but then have about a dozen unique ones each.

Why are they deriving from a single class anyway?

If it’s only for a few identical attributes without significant
functionality to go along, I’d say it’s a bad idea to use inheritance.
And with latent typing in Ruby, you Rarely If Ever need inheritance
anyway. Saves you the bother that is inheritance mapping at any rate.

Problem is, I’ll end up with 100’s of
tables, which also does not appeal to me.

I can’t understand this. It’s rather common that an ORM database schema
has a number of tables in the same order of magnitude as there are model
classes. My biased opinion is that data should be modelled on the
database level where you have some theorethical foundations about what
are “good shapes” for the data. It also doesn’t hurt to at least make
the DB schema somewhat document the data model - the data an application
gathers / stores is the most likely part of it to persist in the long
term. Of course, you’re probably in a better position to judge this.

Database schemas like this tend to make it into The Daily WTF with some
regularity.

(Unfortunately, they also make it into production systems.)

If there’s a chance someday, someone, somewhy might want to access your
data at a DB level (using a reporting tool maybe), there’d be no end to
the grief.

Or, if a bug in your code gets the database into an inconsistent state,
and you have to drop down to SQL to patch things up.

Avoid.

And of course, does anyone see a much better way to do what I have
proposed?

Use migrations to create the schema from scratch if you just want to
avoid writing a lot of SQL up front?

David V.

Ryan G. wrote:

I guess my problem with 100’s perhaps a 1000 tables is db performance.
I don’t have oracle, only mysql or postgres. Performance wise, wouldnt
it be better to have a few thousand rows in one table as opposed to a
few dozen spread across 100’s of tables?

The history of computing knows only one answer to that question:
benchmark. (And not even that’s much of an answer.)

However, I have very, very few doubts that separate tables would be
faster in a read-mostly scenario. The records for that model are
filtered at insert time instead of query time, and deciding between
tables is probably way faster than on any ad hoc type index column.

Generally, a well-designed schema will probably buy you more performance
in the long run than any sort of peephole hand-optimizations. You want
to minimize the number of round-trip hits on the database and the volume
of the datasets retrieved for that (often conflicting goals), and a
clear schema helps do just that.

David V.

Hello David,

Why are they deriving from a single class anyway?

All the objects are the same thing, so to speak. As a silly example of
what I am trying to do, let’s say that each user an array named fruit.
There are many types of fruit and each have their own unique attributes
(# of grapes, fuzziness index of peach, banana radius) but they are all
fruit. My program is just like this, except I have hundreds of fruit.
Perhaps even a thousand. Ideally I would like to be able to add more
fruit as the program evolves just by adding a new fruit subclass and a
new fruit view.

I can’t understand this. It’s rather common that an ORM database schema
has a number of tables in the same order of magnitude as there are model
classes. My biased opinion is that data should be modelled on the
database level where you have some theorethical foundations about what
are “good shapes” for the data. It also doesn’t hurt to at least make
the DB schema somewhat document the data model - the data an application
gathers / stores is the most likely part of it to persist in the long
term. Of course, you’re probably in a better position to judge this.

I guess my problem with 100’s perhaps a 1000 tables is db performance.
I don’t have oracle, only mysql or postgres. Performance wise, wouldnt
it be better to have a few thousand rows in one table as opposed to a
few dozen spread across 100’s of tables?

Database schemas like this tend to make it into The Daily WTF with some
regularity.

(Unfortunately, they also make it into production systems.)

If there’s a chance someday, someone, somewhy might want to access your
data at a DB level (using a reporting tool maybe), there’d be no end to
the grief.

Or, if a bug in your code gets the database into an inconsistent state,
and you have to drop down to SQL to patch things up.

Avoid.

I appreciate all these warnings and I understand what you are saying.
In this case I am client for the project so I’ll take my lumps as they
come.

Thanks for your response,
Ryan G.

On Dec 27, 2006, at 8:19 PM, Ryan G. wrote:

new fruit view.
Well, I’m guessing fruit is just an example. But I think you could
still group the fruits into subcategories or something to that effect
and end up with reasonable inheritance. For example, berries
(strawberry, raspberry) , pitted fruits (peach, concord grape), not
sure what you’d call a banana but I’d bet an encyclopedia would have
some info. But I think you get the idea. To get the specific kind
of fruit, you could just use a type field and get single table
inheritance on the subcategory.

It’ll depend on your data model. But I’m inclined to agree with the
rest of group that there are either relationships or duck-typing
advantages that you’re not utilizing.

And David has a point of schemas that include things like float1,
etc… do often show up on the daily WTF. That’s might be reason
enough to stay away.

Thinking another way, what about having a fruit table and a trait
table. The trait table could reference a large number of
trait_types. The trait table would reference a fruit, a trait_type
and a value. You could probably get away with just 3 or 4 value
columns in the trait table. Just a thought.
-Mat