Hi Suppose one has a db to track bookmarks for multiple users. Let's assume that there are ten's of thousands of users, and that an average user may have 100 bookmarks. (Assume here that no two bookmarks are the same.) Is it a common practice to have a bookmark table for ALL users, thereby producing a table that has on order one million rows. Or, is it better to somehow group the users so the bookmark tables are smaller. If tables are split, is there a rule of thumb as to how many tables a db should have? Thanks -- Jim F.
on 2005-12-17 20:43
on 2005-12-17 21:43
One million records really isn't that big. Assuming 1K/record, that's only 1 GB. Heck, that might even fit completely in RAM. That said, here's some things to consider: Some old OSs have file system limits (4GB/32-bit) that preclude super-large tables, but that is largely a thing of the past unless you are working on a legacy system. Have to use indexes on large tables--but not too many indexes because too many indexes slow down inserts. Might consider moving seldom-used fields into a separate table if those fields are allowed to be NULL, making the bookmarks table as 'slim' as possible. Might consider changing all VARCHAR fields to CHAR fields since searches seem to go faster when the record size is fixed. Might consider moving all old records that haven't been accessed in X months to an 'archive' table that is only SELECTed if not found in main bookmarks table. Add an 'inactive' flag to each record and set it if record has been inactive after Y months so you can skip those records easily. Some databases (namely, Oracle) will allow you to spread a single table over several spindles, which speeds things up a lot. If you do a lot of sorts on the records it will probably be easier on the db to pull all the records off the db and sort on a client machine. Regardless, I wouldn't make many efficiency decisions right now until it becomes obvious that a single table isn't going to give you response times as fast as you require.
on 2005-12-18 00:07
My two cents worth. I used to work on a database with 3 million rows. The reason that dbs are so good at this is because they use indexes. Indexes search in a sort of binary way. Powers of 2 are really interesting numbers. For instance, what how the power rises and the number rises. 2^1 = 2 2^5 = 32 2^10 = 1000 2^20 = 1000000 2^21 = 2000,000 Thus when you double the records from 1M to 2M you add one twentieth in the time taken to find an indexed record. And as the number of records gets bigger, the increased workload per search is getting progressively smaller (the increase, that is). So, to a db 2M records is a joke and would be an indiscernible difference from 1M records. I doubt very much whether the cost of programming re more tables is worth it for the savings you would obtain on hardware to have the necessary extra power. My 3 million row db used to work like a charm. Lightening fast, brilliant searches, combining 5 and 6 tables at a time from a collection of 65 (mysql, by the way). I recommend deep breaths and much relaxation. Your db can handle the job. They really are incredible examples of fine programming and the extraordinary power of modern computers. bruce PS. 4M is 4x 1M but the total workload on the db to find any given indexed record increased from 20 to 22, an increase of 10%. Don't you find that awesome?