Database design question: building features

taylorstrait · January 25, 2010, 7:28am

I am trying to decide between two different database architectures for
an application to manage properties. For capturing the property
features, I have two options: a single model/table with all the features
as fields, or a set of joined models - property, feature, feature data.
(see attached ERD diagram)

OPTION A
Properties
name
address
has_pool
has_doorman
has_etc. (a bunch of values, not all boolean)

=OR=

OPTION B
Properties
id
name
address

Features
id
title
featurable_type (polymorphic, in this case points to “property.id”)
data_type

Feature_Data
id
feature_id
featureable_id (polymorphic, in this case points to “property.id”)

My ingoing assumption is that we do not know all the features that can
be associated with a property, and as time goes by new features will be
requested by users and need to be added. Here is my question: is going
to the hassle of joining two extra models just to make the features a
bit more independent worth the difficulty of implementation versus just
keeping everything in the main property model?

Let’s assume that this application will be scaled to a high level. Are
there any performance issues associated with OPTION B that should be
considered? Is there another way to model this data that makes more
sense?

Thanks in advance.

Taylor

taylorstrait · January 25, 2010, 8:22am

Option B is the better choice. If a change is ever made to the
potential feature set, it is far more manageable to be able to
manipulate the data as opposed to having the alter the structure of
the DB itself. It’s logically a one-to-many relationship, so I’d
reflect that fully in your schema. Plus, because of the fields that
allow polymorphic behavior you’ll have the benefit of data
independence, in that features could potentially belong to other types
of models in the future.

taylorstrait · January 25, 2010, 2:31pm

On Jan 25, 1:28 am, Taylor S. [email protected] wrote:

has_pool

Option B is certainly more flexible, but I’m unclear on why you have
these associations set up as polymorphic. Are you planning on having
other things with features besides Property? Note that subclasses of
Property don’t count, as they’ll work with plain associations.

I’m also guessing there’s at least one typo above - Feature shouldn’t
actually be referring to Property, right? You’ve essentially got a
“decorated join table” situation, with FeatureData joining Property to
Feature.

–Matt J.

taylorstrait · January 25, 2010, 3:37pm

Seirie - thanks for the response!

Matt - you are right about the typos - featureable_type in Features
would be referring to “property” rather than “property.id” and
Feature_Data should have a field “value.” The ERD is correct. I set this
up as polymorphic since at least two more models will have Features:
Units and Rooms.

I broke out Features into two models (Features and Feature_Data) for two
reasons: I wanted to normalize the feature titles and wanted to have
data_type which could record whether the data is an integer, boolean,
string, etc. (since I assume I will have to store everything in
Feature_Data as a string and convert back using rails) - a real pain!

Thinking through it again, since I would have an “unlimited” number of
features available through a polymorphic association, maybe I should
make all the data boolean and rework as such: (is there a more graceful
way?)

Properties
id
yadda yadda

Properties_Features
property_id
feature_id

Features
id
featurable_type (polymorphic - property, unit, or room)
title
value (boolean, always true?)

taylorstrait · January 25, 2010, 3:56pm

Kristian - thanks for the reply. I think it would be easier, too! But…
I was wondering if I am violating some kind of best practice to put all
the fields in one model even if half of them will be empty most of the
time?

taylorstrait · January 25, 2010, 4:09pm

Taylor S. wrote:

Kristian - thanks for the reply. I think it would be easier, too! But…
I was wondering if I am violating some kind of best practice to put all
the fields in one model even if half of them will be empty most of the
time?

i think it can sometimes be a fault to over-engineer a solution beyond
the ‘fit for purpose’ threshold A textbook solution may be option B
and i think it looks the more interesting challenge but in reality you
know your system best. If option A is the best fit (and consider your
longer term plans as well here) then why not.

taylorstrait · January 25, 2010, 3:38pm

Do you really need the flexible and dynamic scheme of option 2?

As long as you only have a few settings I would stick to option 1. You
can always migrate your data and move to scheme 2 if required.

/K

2010/1/25 Matt J. [email protected]:

taylorstrait · January 25, 2010, 4:18pm

I would look into understanding how to normalize your databases.

The problems I have personally experienced with normalizing a database
is that oftentimes you can over-normalize a database to the point that
it creates havoc on response times.

I would read up on how to normalize your database up to 3NF for this
particular situation.

taylorstrait · January 25, 2010, 4:32pm

You could also check up document oriented and schema free databases
like couchdb and mongodb if you would like to see your design problem
from another perspective.

Good luck,
Kristian

2010/1/25 Alpha B. [email protected]:

taylorstrait · January 25, 2010, 4:28pm

Taylor S. wrote:

Kristian - thanks for the reply. I think it would be easier, too! But…
I was wondering if I am violating some kind of best practice to put all
the fields in one model even if half of them will be empty most of the
time?

They won’t be empty; they’ll be false. Big difference.

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

taylorstrait · January 26, 2010, 2:29am

Will Merrell wrote:

Taylor S. wrote:

I am trying to decide between two different database architectures for
an application to manage properties. For capturing the property
features, I have two options: a single model/table with all the features
as fields, or a set of joined models - property, feature, feature data.
(see attached ERD diagram)

You’re right that option A is pretty much a bad idea. I can’t tell you
how much time I have spent refactoring databases that were guaranteed
never to change.

That shouldn’t be a problem. Broadly speaking, it is better to refactor
a database tomorrow than to overdesign it today.

In that sense option B is much better.

However, there may be a solution between the two.

Consider the following table:

Features
id
name (the name of the feature: ‘pool’, ‘doorman’, or whatever)
featureable_type (‘property’, ‘unit’, or ‘room’)
featureable_id (property.id, unit.id, or room.id)

With this, you just add whatever feature you want. If the record exists
the property has the feature. The name field can be user enterable, or
you can fill it from a listbox if you want to control what the user can
enter. The list can even be editable by an admin if you want the control
and the extensibility.

That’s terrible! It defeats the point of having a database. Don’t ever
do that unless there’s absolutely no alternative.

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

taylorstrait · January 26, 2010, 3:15am

Marnen Laibow-Koser wrote:

You’re right that option A is pretty much a bad idea. I can’t tell you
how much time I have spent refactoring databases that were guaranteed
never to change.

That shouldn’t be a problem. Broadly speaking, it is better to refactor
a database tomorrow than to overdesign it today.

I’m certainly not in favor of over design, which is why I suggested
something in between. That said, I have rarely seen a case where wide
and shallow is the proper solution. The OPs problem looks like it needs
some kind of normalization.

featureable_type (‘property’, ‘unit’, or ‘room’)
do that unless there’s absolutely no alternative.

Could you say a little more about which part of this you find terrible.
I have used techniques like this for some situations, and have seen
others use it also. If I’m missing something I want to know. If I
misspoke, I want to clean it up.

– Will

taylorstrait · January 26, 2010, 2:22am

Taylor S. wrote:

I am trying to decide between two different database architectures for
an application to manage properties. For capturing the property
features, I have two options: a single model/table with all the features
as fields, or a set of joined models - property, feature, feature data.
(see attached ERD diagram)

You’re right that option A is pretty much a bad idea. I can’t tell you
how much time I have spent refactoring databases that were guaranteed
never to change. In that sense option B is much better.

However, there may be a solution between the two.

Consider the following table:

Features
id
name (the name of the feature: ‘pool’, ‘doorman’, or whatever)
featureable_type (‘property’, ‘unit’, or ‘room’)
featureable_id (property.id, unit.id, or room.id)

With this, you just add whatever feature you want. If the record exists
the property has the feature. The name field can be user enterable, or
you can fill it from a listbox if you want to control what the user can
enter. The list can even be editable by an admin if you want the control
and the extensibility.

taylorstrait · January 26, 2010, 4:39am

Will Merrell wrote:

Marnen Laibow-Koser wrote:

You’re right that option A is pretty much a bad idea. I can’t tell you
how much time I have spent refactoring databases that were guaranteed
never to change.

That shouldn’t be a problem. Broadly speaking, it is better to refactor
a database tomorrow than to overdesign it today.

I’m certainly not in favor of over design, which is why I suggested
something in between. That said, I have rarely seen a case where wide
and shallow is the proper solution.

Likewise. But the OP’s problem – lots of independent attributes – is
one such case.

The OPs problem looks like it needs
some kind of normalization.

Nope! The table is already well normalized, I think. If you disagree,
please tell me what normalization condition is being violated.

featureable_type (‘property’, ‘unit’, or ‘room’)
do that unless there’s absolutely no alternative.

Could you say a little more about which part of this you find terrible.
I have used techniques like this for some situations, and have seen
others use it also.

Oh, it gets used, all right – by people who either are dealing with
unusual situations or don’t know how to use a database properly.

If I’m missing something I want to know. If I
misspoke, I want to clean it up.

Then just avoid this pattern altogether. See discussion at
http://groups.google.com/group/rubyonrails-talk/browse_thread/thread/2ee6d1546a20409e?fwc=1
for more information.

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

taylorstrait · January 26, 2010, 5:22pm

Will Merrell wrote:

Marnen Laibow-Koser wrote:

and shallow is the proper solution.
Nope! The table is already well normalized, I think. If you disagree,
please tell me what normalization condition is being violated.

If the OP’s problem really is lots of truly independent attributes,
selected from a finite list, with a limited and known number of repeated
attributes, then yes, wide and flat, (the OP’s option A) is probably an
optimal solution.

That is what the OP said his problem is.

But, that is not what I read in his post. How many pools should he be
able to capture for a property? Can a property have six doormen? What
about the property with the helipad?

What about it? Those fields could be integers.

The problem with wide and flat is
that it requires a fixed problem domain.

The OP seems to think he has one.

Cleaver as I am, I can not
fully anticipate what crazy examples the real world of the users is
going to through at my system.

From what the OP said, I don’t see where you get the requirement of the
users specifying custom attributes. You are trying to solve a different
problem.

To be sure, he said it was his “ingoing assumption”, but I read that as
meaning he’s guessing.

As I read the OP’s description I saw that the list of features he needs
to capture is not fixed and that some features could occur multiple
times.

Nowhere does he say anything about features occurring multiple times.

To me that demands that features be objects in their own right

Relational databases don’t have objects.

and that the database design should capture the relationship between
properties (et al) and features. There are at least two tables here,
properties and features. The exact structure (and number) of these
tables is a second question.

Yes – if a feature could occur multiple times. But that is not the
OP’s use case.

Oh, it gets used, all right – by people who either are dealing with

Thank you for the link, I see what you’re driving at. I mostly agree.
Certainly in the case discussed in the thread you linked to the
key/value pattern is a bad idea, and should be avoided.

There were really two intents in my original post. The first, and most
important, is that there are other ways to solve the OP’s problem beyond
the two that he presented. His first seemed to me too simplistic and
headed for trouble (as discussed above)

Then it can be refactored. Always start with the simplest approach.

and the second seemed too
complex and also headed for trouble.

Actually, option B is better normalized than your idea. I think it’s
preferable.

I simply wanted to suggest that
there may be a third choice. Whether my suggestion is that better choice
is a separate issue.

I don’t think it is.

The second point i wanted to make is that the recording of a feature
need not be very complex. The presence or absence of a record may be
sufficient.

Or, for that matter, an integer in a field!

What the OP needs to record about a given feature probably
needs a good deal more thought and discussion, but the existence of the
feature can be very simple, and just naming it may be enough.

Right.

I really do agree that recording a feature as a collection of key/value
pairs is almost certainly a bad idea, and I did not intend to suggest
such a structure. If you heard that in what I posted then I did
misspeak.

Not a collection. Your proposal, though, was something very close to
the key/value schema.

I think what we both mean to say is that good database design requires a
good understanding the problem space. Duct tape and bailing wire works
for MacGuyver, but not for the rest of us.

– Will

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

taylorstrait · January 26, 2010, 2:32pm

Marnen Laibow-Koser wrote:

and shallow is the proper solution.
Nope! The table is already well normalized, I think. If you disagree,
please tell me what normalization condition is being violated.

If the OP’s problem really is lots of truly independent attributes,
selected from a finite list, with a limited and known number of repeated
attributes, then yes, wide and flat, (the OP’s option A) is probably an
optimal solution.

But, that is not what I read in his post. How many pools should he be
able to capture for a property? Can a property have six doormen? What
about the property with the helipad? The problem with wide and flat is
that it requires a fixed problem domain. Cleaver as I am, I can not
fully anticipate what crazy examples the real world of the users is
going to through at my system.

As I read the OP’s description I saw that the list of features he needs
to capture is not fixed and that some features could occur multiple
times. To me that demands that features be objects in their own right
and that the database design should capture the relationship between
properties (et al) and features. There are at least two tables here,
properties and features. The exact structure (and number) of these
tables is a second question.

Oh, it gets used, all right – by people who either are dealing with

Thank you for the link, I see what you’re driving at. I mostly agree.
Certainly in the case discussed in the thread you linked to the
key/value pattern is a bad idea, and should be avoided.

There were really two intents in my original post. The first, and most
important, is that there are other ways to solve the OP’s problem beyond
the two that he presented. His first seemed to me too simplistic and
headed for trouble (as discussed above) and the second seemed too
complex and also headed for trouble. I simply wanted to suggest that
there may be a third choice. Whether my suggestion is that better choice
is a separate issue.

The second point i wanted to make is that the recording of a feature
need not be very complex. The presence or absence of a record may be
sufficient. What the OP needs to record about a given feature probably
needs a good deal more thought and discussion, but the existence of the
feature can be very simple, and just naming it may be enough.

I really do agree that recording a feature as a collection of key/value
pairs is almost certainly a bad idea, and I did not intend to suggest
such a structure. If you heard that in what I posted then I did
misspeak.

I think what we both mean to say is that good database design requires a
good understanding the problem space. Duct tape and bailing wire works
for MacGuyver, but not for the rest of us.

– Will