One huge table or separate tables? design issue

addis_a · October 12, 2014, 10:52am

Hi, everyone,

I’ve learned Rails(Web development) on my own for two months, and came
to a
situation like this:

There are two tables A and B in my database. A 'has_manyB. For every user input, There is an entry in table A. At the same time, I need to add a huge number(say 100,000) of entries in B. As table A grows, table B seems to grow uncontrollably. There must be performance issues in later data manipulation operations, or even 'overflow problems. I use Mysql.

My initial thought was to split table B, for example, for every entry in
table A, create a new table (on the fly) for it. The questions are:

Is my thought right?
If yes, then how to design the models and migrations to represent the
relationship correctly?
If not, what are good practices to solve this kind of problem?

I’d be very grateful if you could only give some hints, doc or good
articles about it.

–
Best Regards

Yongbiao_L · October 12, 2014, 11:06am

Hi,

Creating multiple tables is not the key to solve this problem. If your
rows
are higher than 10 million records, then you have to make sure you
archive
the older ( Unused ) rows or solve it through MASTER - SLAVE db
approach.

Ideally when you’re worry about performance issues with databases , it
depends on the READ/WRITE speed expected from the db. And how often is
the
data being stored READ ?

Cheers
Vivek

On Sun, Oct 12, 2014 at 7:02 AM, Yongbiao L. [email protected]

Yongbiao_L · October 12, 2014, 11:18am

On 12 October 2014 02:32, Yongbiao L. [email protected] wrote:

Hi, everyone,

I’ve learned Rails(Web development) on my own for two months, and came to a
situation like this:

There are two tables A and B in my database. A 'has_many B. For every user input, There is an entry in table A. At the same time, I need to add a huge number(say 100,000) of entries in B. As table A grows, table B seems to grow uncontrollably. There must be performance issues in later data manipulation operations, or even 'overflow problems. I use Mysql.

Are you sure your basic design is appropriate? Having 100,000
associated records per user seems very high. Are you able to explain
what is in those records in case an alternative can be suggested?

Colin

Yongbiao_L · October 12, 2014, 12:35pm

On Sun, Oct 12, 2014 at 5:17 PM, Colin L. [email protected] wrote:

Are you sure your basic design is appropriate? Having 100,000
associated records per user seems very high. Are you able to explain
what is in those records in case an alternative can be suggested?

Thank you Colin! I’m implementing a railway ticket-booking system just
for learn purpose, not for production.
Now I try to explain it:
Suppose a train has 20 stations along the way. It has 1000 seats to
sell. Table A store one train’s information in a row. I want table B
to store every seat’s information. I design table B as follow

train_id:
start_city_id:
end_city_id:
seat_id:
availability: boolean type
date:

For one train, the number of start_city_id and end_city_id’s
combinations is 19+18+…+1=190, the number of possible date is
20(people can book tickets in 20 days), and possible seat_id’s number
is 1000. The total number of rows for one train is
190100020=3,800,000. So huge…

Actually I have more tables in database which are omitted as they are
unrelated.

Maybe I should compute the state of any seat every time the system
issues a seat? (Some words on the state of a seat: A train goes by
city L, M, N, O and there is only one seat left. If a person has
booked M->N’s seat, then L->O is not available, but L->M, N->O is.)

–
Best Regards

Yongbiao_L · October 12, 2014, 11:48am

On Sunday, October 12, 2014 9:51:55 AM UTC+1, Yongbiao L. wrote:

Is my thought right?
If yes, then how to design the models and migrations to represent the
relationship correctly?
If not, what are good practices to solve this kind of problem?

I’ve had hundreds of millions of rows in one table without problems
before. The important think is not just the number of rows but your
access patterns. For example I currently do something vaguely similar
with some mongo data: we create one collection per day (and drop the
collection from a few days ago) because dropping a collection is very
fast compared to deleting several millions documents from a larger
collection.

Another real world example of this is new relic: at least at one point
they created one table per customer per hour (
New Relic Architecture - Collecting 20+ Billion Metrics a Day - High Scalability -).
There’s is a somewhat specialised case: high insert performance and
specialised read/aggregate and delete patterns pushed them that way.

This is an uncommon pattern. While it allows a form of sharding and
avoids the need for blocking migrations on very large tables it will
make everyday life more complicated. As far as I’m aware you’ll be
pretty much on your own if you go down the path.

Odds are you don’t need this. If you do, think really carefully about
your access patterns first

Fred

Yongbiao_L · October 12, 2014, 12:56pm

On 12 October 2014 11:34, Yongbiao L. [email protected] wrote:

to store every seat’s information. I design table B as follow
20(people can book tickets in 20 days), and possible seat_id’s number
is 1000. The total number of rows for one train is
190100020=3,800,000. So huge…

The first point is that you should only add rows as seats are booked,
use that fact that there is no row to indicate that there is no
booking.
Secondly, what is the date? Is that the date of the journey or the
date of the booking? I don’t understand why you have multiplied the
number of trains by 20 days. If it is the date of the journey then
consider having a Journeys table which is a particular train on a
particular day.
Thirdly I don’t understand why you have 20 factorial for the city
combinations. Is it not true that if a seat is booked from station 1
to station 3 then that implies a booking 1 to 2 and 2 to 3? In which
case you do not separate rows for bookings 1 to 2, 2 to 3, and 1 to 3.

Yongbiao_L · October 12, 2014, 1:13pm

On 12 October 2014 11:54, Colin L. [email protected] wrote:

sell. Table A store one train’s information in a row. I want table B
combinations is 19+18+…+1=190, the number of possible date is
consider having a Journeys table which is a particular train on a
particular day.
Thirdly I don’t understand why you have 20 factorial for the city
combinations. Is it not true that if a seat is booked from station 1
to station 3 then that implies a booking 1 to 2 and 2 to 3? In which
case you do not separate rows for bookings 1 to 2, 2 to 3, and 1 to 3.

In fact thinking further, I would probably turn the problem around and
have a bookings table, specifying the journey, the stations, and the
seat (or seats).

Colin

Yongbiao_L · October 12, 2014, 1:20pm

you can also run a task that archives past dates to a duplicate table
since
you’ll
rarely need them. maybe you can keep 1 month data in the table that
you’re
writing to. this way, you won’t have to worry about the exponential
growth
of
the table you’re always accessing.

On Sun, Oct 12, 2014 at 7:12 PM, Colin L. [email protected] wrote:

Suppose a train has 20 stations along the way. It has 1000 seats to
For one train, the number of start_city_id and end_city_id’s
number of trains by 20 days. If it is the date of the journey then

.
For more options, visit https://groups.google.com/d/optout.

–

Yongbiao_L · October 12, 2014, 1:36pm

On Sun, Oct 12, 2014 at 7:18 PM, Jim Ruther N. [email protected]
wrote:

you can also run a task that archives past dates to a duplicate table since
you’ll
rarely need them. maybe you can keep 1 month data in the table that you’re
writing to. this way, you won’t have to worry about the exponential growth
of
the table you’re always accessing.

Yeah, I’ve also plan to do this kind of thing. Thank you anyway;-)

–
Best Regards

Yongbiao_L · October 12, 2014, 1:29pm

On Sun, Oct 12, 2014 at 6:54 PM, Colin L. [email protected] wrote:

The first point is that you should only add rows as seats are booked,
use that fact that there is no row to indicate that there is no
booking.

Secondly, what is the date? Is that the date of the journey or the
date of the booking? I don’t understand why you have multiplied the
number of trains by 20 days. If it is the date of the journey then
consider having a Journeys table which is a particular train on a
particular day.

It’s the date of the journey. I once thought about this method, the
problem is I don’t know how to implement it in rails. In Rails
tutorials and guidelines, there is a model file in app/model directory
for every table. If I need to create new journey tables, what’s the
model files and migration files look like?(Or I don’t need them any
more?)

Thirdly I don’t understand why you have 20 factorial for the city
combinations. Is it not true that if a seat is booked from station 1
to station 3 then that implies a booking 1 to 2 and 2 to 3? In which
case you do not separate rows for bookings 1 to 2, 2 to 3, and 1 to 3.

According to your first and third point, It’s because my design is
different. In my design one booking might indicates more than one row.
A row just means the state of a seat. So, if a seat is booked from
station 1 to station 3, then rows representing 1->3, 1->2, 2->3, 1->4,
1->5… all need to change. I do it in this way because I have
another table representing the number of left seats, and I think it’s
easier to update this table in my way(in this case, the number of leaf
seats from 1->3, 1->2, 2->3… decreases by one).

Thank you for your suggestion! I’ll dig it deeper and reveal my
concerns.

–
Best Regards

Yongbiao_L · October 12, 2014, 1:42pm

On Sun, Oct 12, 2014 at 7:28 PM, Yongbiao L. [email protected]
wrote:

I do it in this way because I have
another table representing the number of left seats, and I think it’s
easier to update this table in my way(in this case, the number of leaf
seats from 1->3, 1->2, 2->3… decreases by one).

This computation process seems the same for both our table designs…
I’ll use your approach, having a bookings table, specifying the
journey, the stations, and the seat (or seats).

–
Best Regards

Yongbiao_L · October 15, 2014, 4:45pm

I feel sorry to bother you all again here, but I still have a problem:
How to deal with the daily created tables in rails? I googled around
and found nothing.

The daily created tables are named on the dates, their names are all
different. I know how to create tables in a periodically executed rake
task. However, I surely cannot define that many(infinite) model
classes to manipulate these tables. This question might seem too naive
for you, as I asked it in previous posts but got no answers. Hope
somebody could give some hints. Thanks again!

–
Best Regards

Yongbiao_L · October 12, 2014, 2:34pm

On 12 October 2014 12:28, Yongbiao L. [email protected] wrote:

case you do not separate rows for bookings 1 to 2, 2 to 3, and 1 to 3.

According to your first and third point, It’s because my design is
different. In my design one booking might indicates more than one row.
A row just means the state of a seat. So, if a seat is booked from
station 1 to station 3, then rows representing 1->3, 1->2, 2->3, 1->4,
1->5… all need to change. I do it in this way because I have
another table representing the number of left seats, and I think it’s
easier to update this table in my way(in this case, the number of leaf
seats from 1->3, 1->2, 2->3… decreases by one).

In that case you have redundant data in your database (as 1->3 implies
1->2 and 1->3), which is almost always not the best way to do it.

Colin

Yongbiao_L · October 15, 2014, 5:00pm

On Oct 15, 2014, at 8:44 AM, Yongbiao L. [email protected]
wrote:

How to deal with the daily created tables in rails?

Don’t do that. It’s an unnormalized, broken, bordering-on-insane design.
The date is an attribute of some entity that belongs in its own table,
to which your other data is related. Figure out that entity, design it,
set up the relationship appropriately between the two.

–
Scott R.
[email protected]
http://www.elevated-dev.com/
(303) 722-0567 voice

Yongbiao_L · October 15, 2014, 5:02pm

On 15 October 2014 15:44, Yongbiao L. [email protected] wrote:

I feel sorry to bother you all again here, but I still have a problem:
How to deal with the daily created tables in rails? I googled around
and found nothing.

The daily created tables are named on the dates, their names are all
different. I know how to create tables in a periodically executed rake
task. However, I surely cannot define that many(infinite) model
classes to manipulate these tables. This question might seem too naive
for you, as I asked it in previous posts but got no answers. Hope
somebody could give some hints. Thanks again!

I thought you decided you were not going to have those tables, that
you would have a bookings table where each row corresponded to a
booking and recorded the journey, start and finish station and seat
information.

Colin

Yongbiao_L · November 4, 2014, 7:57pm

In the Rails world, what Scott wrote is basically right (bordering on
insane). ActiveRecord really won’t place nice with that kind of a
design.

Outside of the Rails world, “Big Data” and Reporting people do things
like that (date stamping table names) for aggregate reporting. There are
times when those kinds of designs aren’t such a bad idea (like you have
a massive amount of data and you want to build a reporting engine). But
if that’s your case, Rails isn’t the right tool for you.

You could still write a Rails app, but then use some kind of reporting
tool to extract data from your Rails database and copy it into another
reporting database where you will do your reporting operations (and
probably use Crystal Reports or something to deal with that data).

On Oct 15, 2014, at 10:58 AM, Scott R. [email protected]
wrote:

(303) 722-0567 voice

Jason Fleetwood-Boldt
[email protected]

All material © Jason Fleetwood-Boldt 2014. Public conversations may be
turned into blog posts (original poster information will be made
anonymous). Email [email protected] with questions/concerns about
this.