Select lists - DB best practice

Hey guys,

I’m putting together an app with user profiles, and each user will be
able to update a section called ‘quick stats’.

This will involve lots of drop down lists with options like ‘hair
colour’, ‘height’, ‘Location’ etc etc.

The way I m thinking about doing it is to create new table for each
group of options like…


hair_colour
id
colour


and then in the profile table have…


profiles
id
user_id
hair_colour_id


It just semms like there must be a better way of doing this without
creating so many tables?

The way you are doing it IS best practise. These types of tables are
commonly called “codes” or “lookup” tables as they are just
code/description pairs. I usually prefix them with LU_ to distinguish
them from the main application tables.

The other way to do it would be to have a single table for all codes by
adding a “type” column that would differentiate the type of code but I
would advise against it.

-Paul

Paul:

Could you please expand on why it’s a best practice to use separate
lookup tables? Also, why do you advise against using a single table?

I’ve used a single table in the past and want to know why it’s sub-
optimal.

Thanks,

-Anthony

Anthony,

Normalization is the short answer. Lets say you need to add an
effective_date column to one of your lookup tables because the code is
only effective starting on 1/1/2007 for instance. If everything is
lumped in one table all of your codes would have to take on that new
attribute. This might not seem like a big deal but now your table
encapuslates two different data models. Eventually you end up with code
that says “if type = this do that”. Yes, you could just create a new
table at this point to model just the code that needs the
effective_date column and leave the existing table alone. But, that is
why it’s best practise to start that way.

However, what you have done in the past WILL work within the
limitations just outlined.

-Paul

Paul,

Thanks for the explanation-- and the long answer! I appreciate stuff
like this because it makes it easier to choose the correct way the
next time I’m confronted with choices.

I have been using single table inheritance (STI) for my lookups. When
something really different comes along-- you’re right-- I do have to
change to a separate table. The trade-off is that I don’t have to
repeat myself for the standard stuff: all my lookups have a display,
key value, effective_date range, and active flag.

With many people throwing around DRY as a good idea, do you think
that separate tables is always the best starting place? I think that
for tables that have a low chance of being structurally changed, STI
is a decent time saver. But, I’ve got to admit that my proverbial
short cuts don’t always pay dividends. What’s your feeling on STI?

Thanks,

-Anthony

Anthony,

I have not used STI before but it seems rails provides support in the
framework to overcome some of the negatives I mentioned.That is, it
provides separate models for each entity as if they each had their own
database table and yet stores all of the data in a single table. The
main issue with it from my perspective would be that even though you
can split it into separate models you cannot designate for instance
that columns A and B are only to be used by model A and column C only
with model B. However, for something like simple lookup tables it may
have it’s place. If your successfully using it in that regard I would
not change anything.

-Paul

What you are talking about is 3NF normalization. Its not always the
“best” solution. Upside is that you reduce the chance of invalid data.
The downside is that you start doing tons of joins in your queries which
slows things down.

I don’t even use STI. I use CONSTANTS to define select lists and make
sure to comment what the constant is in the migration definition so
others can look it up. You can avoid funky data input by performing
validation on the data relative to the constant before saving with:

validates_inclusion_of :hair_color, :in => HAIR_COLORS

I am not a DBA so my way is not necessarily the best way. But it works
and I haven’t ever had any data invalidity issues.