Migrating Data from Many Databases to One

I had some concerns about breaking Rails table relationships (:has_many,
:has_and_belongs_to_many, :belongs_to, :acts_as_*, etc) when moving
multiple
databases (using the same application) into one.

I seem to be stuck when trying to wrap my head around a good database
design
for what I would like to do or if I am just completely not thinking
straight. I guess this isn’t related specifically to Rails but to overal
DB
design.

So, for example I have two sites on two different databases - same
application. I want to consolidate and move Site B into Site A’s
database.
When I move it, the primary key column could get messed up and break
relationships with other tables. I think.

SITE A)

table “pages”
id, name, account_id

1,“Page One”, 55
2,“Page Two”, 55
3,“Page Three”, 55

SITE B)

table “pages”
id, name, account_id

1,“My First Page”, 453
2,“My Second Page”, 453
3,“My Third Page”, 453

When I import the rows from Site B to Site A, I get the following:

1,“Page One”, 55
2,“Page Two”, 55
3,“Page Three”, 55
4,“My First Page”, 453
5,“My Second Page”, 453
6,“My Third Page”, 453

The problem being the ids 4,5 and 6 which just pick up the
autoincrementing
column and if this pages table :has_many “files” then in that “files”
table’s “page_id” column will be pointing to the wrong row in the
“pages”
table.

Another thing to throw into the mix is that I build my URLs based on the
ID
column. Now, I heard on a podcast somewhere that it’s generally not a
good
idea to expose the IDs of your tables but I didn’t fully understand why
:slight_smile:
So, if site B had a public URL such as http://www.siteA.com/pages/view/1
and
the database moves to the new location than it won’t find “My First
Page”
… it will find Site A’s “Page One”.

Any best practices I am missing out on? Basically I am trying to
consolidate
my sites into using one database. But taking and restoring backups seems
to
be a nightmare.

Jeff

I came across setprimarykey today and I wonder if looking into
this
would help me out?

There must be some best practices for keeping the data movable without
relying on the primary keys for everything. I just don’t get it?

Jeff W. wrote:

I had some concerns about breaking Rails table relationships (:has_many,
:has_and_belongs_to_many, :belongs_to, :acts_as_*, etc) when moving
multiple
databases (using the same application) into one.

I have no direct experience of this myself, but an experienced colleague
explained to me that the preferred approach for handling data migration
is to use an intermediate database between the source databases and the
target databases, and to use scripts to:

  • extract data from the sources into the intermediate database

  • validate the data

  • transform data in the intermediate database

  • load the target database from the intermediate database.

The intermediate database may contain additional tables and columns to
help with transformation and validation - for example, you could keep
your original primary keys and foreign keys there (in non-key columns),
and the name of the source database, so that you could then create the
new foreign keys.

Everything should work from scripts, because migration steps must be
repeatable. This is necessary because:

  • validation may exposes data quality problems - these must be fixed in
    the source databases before rerunning extraction, validation and
    transformation;

  • data will continue to be updated in the source databases until the
    final migration and cutover is carried out (for large data migrations
    this is usually planned for a weekend of scheduled downtime).

[snip]

Another thing to throw into the mix is that I build my URLs based on the ID
column. Now, I heard on a podcast somewhere that it’s generally not a good
idea to expose the IDs of your tables but I didn’t fully understand why :slight_smile:
So, if site B had a public URL such as http://www.siteA.com/pages/view/1
and
the database moves to the new location than it won’t find “My First Page”
… it will find Site A’s “Page One”.

This is something that has been bothering me, too. When I’ve discussed
Hibernate’s use of synthetic primary keys (i.e. generated keys with no
business meaning) with the same colleague, who has done data modelling
for very large applications for over 25 years, he said that he had no
problem at all with synthetic keys, so long as they are never exposed
to users.

I have been naively wondering whether URLs might be an exception to this
(and meaning to discuss it with him). Your example demonstrates
conclusively that bookmarkable URLs, at least, should be treated as part
of the user interface and should be subject to the same rule. Thanks! I
shall need to bring this to the attention of a couple of projects I’m
working with.

If you want the old bookmarks to work with the new database you will
have to keep the old site and key information in the new database (or in
a separate mapping database) and redirect accordingly.

I hope that this helps, even if the answers are not what you were hoping
for.

regards

Justin

P.S. I shall probably be offline over the weekend.