Database in RoR application


#1

I’m working on a new rails project that will have a very, very large
database. Initial insert will be in the hundreds of thousands of
records.

The database we are getting the information from is Microsoft SQL based,
and we have access to tab delimited update files daily.

We’re trying to write import scripts and use a well designed database on
our end. The data they are giving us is horribly designed. There are
dozens of fields in the database tables that hold multiple data, there
are several tables that are 90% similar and could be all consolidated
into one table, etc.

I just have a couple questions about normalization and the way we’re
designing the database.

First, as far as server resources go… would normalizing the following
cause higher or lower server loads:

The fields we are getting have multiple records inserted into each
field. For instance, a single field for an option might have “A, E, F,
X, Z” inserted which means that 5 different options are configured for
that selection. We want to have a lookup table in addition to a table
which explains what each option is, but I’m afriad it will add a lot of
computing time to the queries.

Second, the system is going to have a simple CMS, very similar to what
Shopify uses. We’ll allow the creation of both pages and blogs. It
appears that Shopify is using one database table for the pages
and blogs and it’s classifying a blog as a certain type of page,
one which can have it’s own entries (other pages). So all 3, blogs,
pages and blog posts, are all in the “pages” table.

Is this the way to go? It seems simple and elegant to me, but my
programmer thinks it’s confusing and that at the very least blog entries
should be separated into their own table.

I’m not sure what the best way to go is, so I’m open for advice.


#2

On 6/12/06, Brandon E. removed_email_address@domain.invalid wrote:

First, as far as server resources go… would normalizing the following
cause higher or lower server loads:

The job of an RDBMS is to join tables based upon queries. If you
intelligently craft your tables and indexes, you’ll be light years
ahead of pulling back raw data. :: shrug :: I’m having a rough time
defining what it would look like from the abstract description; but my
rule of thumb is at least third normal form. I like to stick to 3.5,
personally. Lookup tables cause very little performance hit in a
schema; remember RDBMS packages are optimized for this sort of thing;
it’s what they do. :slight_smile:

Second, the system is going to have a simple CMS, very similar to what
Shopify uses. We’ll allow the creation of both pages and blogs. It
appears that Shopify is using one database table for the pages
and blogs and it’s classifying a blog as a certain type of page,
one which can have it’s own entries (other pages). So all 3, blogs,
pages and blog posts, are all in the “pages” table.

I would say this depends on how tightly integrated blog pages are to
standard pages. :: shrug :: I haven’t studied shopify much other
than to drool at the pretty front pages and a couple of others, so I
can’t really comment on that. But again, I prefer to separate things
out as much as possible unless they are logically related. You don’t
want logic to have to parse and separate data that should have been in
two separate tables.

-Curtis


#3

“Brandon” == Brandon E. removed_email_address@domain.invalid writes:

I’m working on a new rails project that will have a very, very large
database. Initial insert will be in the hundreds of thousands of
records.

Generally speaking, that’s a pretty small database. You’re not really
going to seriously exercise a good database engine until you reach at
least several million new entries per day. So don’t worry about that
part. Your database will deal just fine.

First, as far as server resources go… would normalizing the following
cause higher or lower server loads:

Again speaking generally, normalizing your data will make it easier
for your database to do the job it was designed for. That is, it will
almost certainly improve your performance. At worst, you’ll have to
spend some time pondering exactly how to set up your indexes to best
match your queries.

	     Calle D. <removed_email_address@domain.invalid>
	 http://www.livejournal.com/users/cdybedahl/
 "Last week was a nightmare, never to be repeated - until this week"
			-- Tom, a.s.r