Dynamic table names

Hi List!

May I tap into your combined common sense ?
I have a legacy logging app that needs to be modernized. Data is going to
be stored in 5 tables; one table will contain more than 200 million rows
and 2 others about 60 million rows each. MySQL will be used, unless someone
sees a major advantage in using something else (at the moment the data is
stored in a raw positioned file format, which I’m not keeping).

For performance reasons I’m thinking about storing the data for each
entity in a separate table. So instead of a single ‘logdata’ table with
>200 million rows, I’d create ‘logdata_001’, ‘logdata_002’,… tables,
where the row count of even the biggest entity’s logdata stays below 3 million.

Rails seems to have no problem with me setting the table name
(set_table_name) on each incoming request before accessing the database.
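Concretely, it would be something like this before each database hit (a sketch rather than my real code; entity_id here just stands for whatever identifies the entity):

Logdata.set_table_name("logdata_%03d" % entity_id)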

I have this gut feeling though I’m overlooking something nasty.
Anyone have any comments/suggestions ? Am I about to create my own
personal maintenance nightmare ?

Thanks in advance for your insights!

Piet.

I’m doing a similar thing [setting information about the database
on-the-fly] but I’m having to change a model’s table_name_prefix. The one
concern I have [though it hasn’t come up so far] is the possibility of two
requests being made at the same or nearly the same time and one mistakenly
using the other’s setup. I’m thinking of using transactions, perhaps. I
dunno, but I too am interested in whether anyone else has done on-the-fly
database configuring.

RSL

Nothing from one request is shared with another (it’s the basis of the
“shared nothing” architecture).*

If you use a before_filter on the controller you should be OK (this is
also how acts_as_authenticated works).
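For example, a rough sketch of such a filter (the controller, model and param names here are only placeholders, not anything from Piet’s app):

class LogsController < ApplicationController
  before_filter :set_log_table

  private

  # Runs at the start of every request handled by this controller,
  # so each request picks the table for its own entity.
  def set_log_table
    Logdata.set_table_name("logdata_%03d" % params[:entity_id].to_i)
  end
end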

bye
Luca

* with the exception of the session data, which is shared by all requests
  coming from the same browser session.

2007/1/23, Russell N. [email protected]:

Why not just have a table with >200m rows? As far as I’m aware, there are
no real performance advantages to splitting it up. The point of a database
is that it can just eat up data without causing any performance issues.

Although I can’t imagine what on earth you’d want 200m rows for! Modeling
the cells of a small rodent, maybe?

-Nathan

This is incredibly great news for me! Thanks so much Luca and thanks
Piet
for bringing up what I’m just now getting around to coding.

RSL

On 1/23/07, Piet H. [email protected] wrote:

Hi List!

May I tap into your combined common sense ?
I have a legacy logging app that needs to be modernized. Data is going to be
stored in 5 tables; one table will contain more than 200 million rows and 2
others about 60 million rows each. MySQL will be used, unless someone sees a
major advantage in using something else (at the moment the data is stored in
a raw positioned file format, which I’m not keeping).

I’d go with Postgres unless you have some really solid reasons for
staying with MySQL.

I suggest you set aside a couple of days to configure both DBs (Postgres
ships with very modest defaults – performance will suffer if you don’t
give it a bit more shared and working memory – and do turn on autovacuum),
load the data, run VACUUM ANALYZE, add some indexes, and then try to run
a few queries.

Try some fancy subqueries, for instance. IIRC, MySQL performance suffers
badly from not being able to use more than one index per table per query,
or something along those lines.

For performance reasons I’m thinking about storing the data for each entity
in a separate table. So instead of a single ‘logdata’ table with >200 million
rows, I’d create ‘logdata_001’, ‘logdata_002’,… tables, where the row count
of even the biggest entity’s logdata stays below 3 million.

Rails seems to have no problem with me setting the table name
(set_table_name) on each incoming request before accessing the database.
I have this gut feeling though I’m overlooking something nasty.
Anyone have any comments/suggestions ? Am I about to create my own personal
maintenance nightmare ?

Keep all the data in one table – lots of rows is what an RDBMS is made for.

Just make sure you add indexes on the FKs. Periodically clustering on
that index could be beneficial too, if you need to do a lot of seq
scans on single entities.
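A migration along these lines would cover the FK index (the table and column names are only guesses at the schema):

class AddEntityIndexToLogdata < ActiveRecord::Migration
  def self.up
    # entity_id is the assumed foreign key from logdata back to its entity
    add_index :logdata, :entity_id
  end

  def self.down
    remove_index :logdata, :entity_id
  end
end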

There’s no benefit to splitting the data across multiple tables and it
makes your app more complex. And don’t forget about what happens when
you’re asked to do something that spans all those entities, for
whatever reason.

Isak

FWIW, there can be significant performance gains from using partitioned
tables, assuming you have something to key on, such as a year or some other
value that can be logically segmented.

It appears that MySQL 5.1 includes partitioned tables, based on the
documentation here:
http://dev.mysql.com/doc/refman/5.1/en/create-table.html
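A rough sketch of what a partitioned logdata table could look like (untested; the columns and year ranges are invented, and note that MySQL 5.1 wants the partitioning column included in the primary key):

class CreatePartitionedLogdata < ActiveRecord::Migration
  def self.up
    # Raw DDL, since the Rails schema DSL has no notion of partitions.
    execute <<-SQL
      CREATE TABLE logdata (
        id        INT NOT NULL AUTO_INCREMENT,
        entity_id INT NOT NULL,
        logged_at DATETIME NOT NULL,
        message   TEXT,
        PRIMARY KEY (id, logged_at)
      )
      PARTITION BY RANGE (YEAR(logged_at)) (
        PARTITION p2005 VALUES LESS THAN (2006),
        PARTITION p2006 VALUES LESS THAN (2007),
        PARTITION pmax  VALUES LESS THAN MAXVALUE
      )
    SQL
  end

  def self.down
    drop_table :logdata
  end
end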

As for having 200m rows, it is common in data warehouses when you are
dealing with historical data over a 5 year or greater time frame.
Hopefully I’ll get the opportunity to speak about it at RailsConf 2007,
and if so come up and say hi. :)

V/r
Anthony E.

On Jan 23, 9:31 am, “Russell N.” [email protected] wrote:

I’m doing a similar thing [setting information about the database
on-the-fly] but I’m having to change a model’s table_name_prefix. The one
concern I have [though it hasn’t come up so far] is the possibility of two
requests being made at the same or nearly the same time and one mistakenly
using the other’s setup. I’m thinking of using transactions, perhaps. I
dunno, but I too am interested in whether anyone else has done on-the-fly
database configuring.

RSL

How are you all setting table_name or table_name_prefix dynamically? I
initially thought I could just set a session variable (since I want this
value to persist for a given user across multiple requests until it is
reset by the user) and then use it in my model:

class TestTable < ActiveRecord::Base
  self.table_name_prefix = session[:my_little_variable] + '_'
end

But this doesn’t work, since apparently session data is not directly
available in the models. Not every model I’m working with needs a table
prefix, so I don’t want to set this globally. How do I take a variable
selected by the user and cleanly use it as a table name prefix for the
models that need it?

On 1/27/07, Russell N. [email protected] wrote:

Hmm, I posted a reply using the Google G. interface and it’s not appearing
(even though my first reply to this post came through just fine). There are
lots of posts elsewhere about this annoying Google G. issue, so at the risk
of repeating myself, I’ll try sending an email reply to the group to see if
it goes through or disappears into the ether.

Anyway, first off, thanks for your reply. My particular situation is that I
have separate tables for individual US states, with the two letter state
abbreviation as the table prefix (e.g. ‘ny_things’, ‘ca_things’). For a
variety of reasons these tables can’t be merged into a single table. Rather
than define (essentially the same) models for every state, I’d rather just
have the user select a state, which will then set the state abbreviation in
the user’s session data. From then on, all tables needing a prefix will be
set dynamically according to the session data. Some tables do not need a
prefix at all, so yes, I ran into the same problem trying to set
‘table_name_prefix’ on an individual model and found it was being prefixed
to all table names.

So for my first stab at a solution, I’m using a before_filter in my
controllers which calls a private method in application.rb (so it can be
shared between several controllers); that method sets the table name for
those models needing it:

class ApplicationController < ActionController::Base
  def set_table_names
    Thing.table_name = "#{session[:state_abbreviation]}_things"
  end
end

Before this method is called, I call another before_filter method that
makes sure the session variable is set and valid. If not, it redirects the
user to select an appropriate value. This is working perfectly for now. I’m
using a regular expression to ensure that the state abbreviation is in fact
two and only two lower case letters before it is set in the session, so I
think I’m fairly safe with this approach.
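Roughly sketched together (the controller and method names here are made up, not my exact code), the two filters look like:

class ApplicationController < ActionController::Base
  before_filter :require_valid_state, :set_table_names

  private

  # Make sure the session holds exactly two lower case letters
  # before anything touches a table name.
  def require_valid_state
    unless session[:state_abbreviation].to_s =~ /\A[a-z]{2}\z/
      redirect_to :controller => 'states', :action => 'select'
      return false
    end
  end

  def set_table_names
    Thing.table_name = "#{session[:state_abbreviation]}_things"
  end
end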

As far as setting the table name within a model, I’ve always been able to
just do:

class ExampleModel < ActiveRecord::Base
  self.table_name = "some_crazy_table_name_not_conforming_to_conventions"
end

Cheers,
John-Scott

http://blog.portable-eggplant.org

I’ve come to the same result as you with setting the table_name via a
filter, but I still go through the Site model to change the child
table_names because, in my case, the child table names all use an attribute
of the site instance [Site#db_name, which in turn comes from Site#host].

As for the self.table_name methodology, I had a lot of trouble with that
and finally gave up, because it wants to set the name when the model loads,
and since I’m actually calling an attribute from another model to get the
string, the whole thing would collapse on me. I had the same problem with
custom finders until I came across a post somewhere that talked about using
single rather than double quotes. I really should have mentioned that
earlier too.
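A tiny illustration of the quoting point (my own example, not the actual code):

class Widget < ActiveRecord::Base
  # Double quotes interpolate right away, while the class is being loaded,
  # which is too early for anything that depends on a particular record:
  #   :finder_sql => "SELECT * FROM gadgets WHERE gadgets.widget_id = #{id}"
  #
  # Single quotes keep the #{} literal, so ActiveRecord interpolates it
  # against the actual record when the finder runs:
  has_many :gadgets,
    :finder_sql => 'SELECT * FROM gadgets WHERE gadgets.widget_id = #{id}'
end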

RSL

What are the criteria for which models need a new table name/prefix? If it
centers around another model then you can put a table_prefix attribute on
that model and call it thusly:

def self.set_tables
  OtherModel.set_table_name "#{table_prefix}_othermodels"
end

Don’t use table_name_prefix, as it sets it on ActiveRecord::Base itself and
not on the specific model as you might think. Lots of fun confusion on that
for me. Also, you can’t [or I couldn’t figure out how to] set the table_name
on the model itself. For me, the models that needed new table names
[databases actually] were all related to what site was being used, so my
Site model has the set_tables method. Call that method from a filter on the
appropriate controller actions and voila! On-the-fly table name changes. If
you have a relationship between the models, you might also need to tweak
your associations code like this:

class Whatever < ActiveRecord::Base
  has_many :othermodels, :finder_sql =>
    'SELECT #{table_name}_othermodels.*
     FROM #{table_name}_othermodels
     WHERE #{table_name}_othermodels.whatever_id = #{id}'
end

I hope that’s not too obtuse. It’s just that my code uses this functionality
to change databases and I don’t want to paste it and have you think it’s
merely for changing table names. However, if anyone is wondering: if you do
need to change the database, you don’t have to establish a new connection;
you can use Model.set_table_name “#{db_name}.table_name” instead. You can
use the same trick in the association to select across two databases.
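For example, roughly (just a sketch, defined on the Site instance here; db_name and the table names stand in for whatever the real record holds):

def set_tables
  # Point the child model at a table in another database on the same
  # MySQL server; the database.table form means no extra connection.
  OtherModel.set_table_name "#{db_name}.othermodels"
end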

God, I hope that wasn’t too obscure. Apologies in advance.

RSL