Migrations Shortcomings

Migrations are supposed to be both a way to make your database
definition platform-agnostic (DB2, Oracle, MySQL, etc.) and a good way
to share database changes between team members.

While they work excellently for creating a versionable database
definition and for not tying you to one database, they have several
shortcomings in my eyes.

The idea is, when you make a change, it becomes a new migration. Other
team members just update from source control, now have a
024_add_something.rb, run rake migrate, and presto, their database
looks like every other developer’s.

The problem is, databases may change a lot. Picture a fictional
accounting system with:
001_add_users.rb,
002_add_invoices.rb,
003_add_payments.rb, etc

Now, during the course of development, the users table, for instance, may
change a lot. You may need to add new fields; maybe we changed the max
length for last names; now we want to track roles/user types; someone
stored home-phone as one field, but we want it broken out into area
code, exchange, etc. This could be a lot of little changes. If you add
a new migration for each one of these, it is MUCH harder than just
looking at one create_table call in one file to see the structure.
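
To make that concrete, here’s a rough sketch of the kind of churn I mean
(migration numbers and column names are made up):

class AddRoleToUsers < ActiveRecord::Migration
  def self.up
    add_column :users, :role, :string
  end

  def self.down
    remove_column :users, :role
  end
end

…then another little migration to change the last name limit, another to
split home-phone, and so on. Each one is tiny, but the definition of the
users table ends up spread across all of them instead of living in one
create_table call.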

What we end up doing is just modifying the original 001_add_users.rb. A
simple rake migrate VERSION=0, followed by rake migrate brings you up to
date EXCEPT that you lose all your data. Fine in development, but now
you cap deploy to your test box with data that matters, and this
doesn’t work.

What do people think are best practices for this?

I somewhat feel that migrations, along with code in them to transform
data when moving back/forward, make a lot more sense once the code has
reached a stable point, i.e. coinciding with actual releases.
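
For example, here’s a rough sketch of what I mean by a migration that
carries its own data transformation (the column names are made up, and it
assumes the User model is loadable inside the migration):

class SplitHomePhone < ActiveRecord::Migration
  def self.up
    add_column :users, :area_code,    :string
    add_column :users, :local_number, :string
    User.reset_column_information
    User.find(:all).each do |user|
      next if user.home_phone.blank?
      user.area_code    = user.home_phone[0, 3]
      user.local_number = user.home_phone[3..-1]
      user.save
    end
    remove_column :users, :home_phone
  end

  def self.down
    add_column :users, :home_phone, :string
    User.reset_column_information
    User.find(:all).each do |user|
      user.home_phone = "#{user.area_code}#{user.local_number}"
      user.save
    end
    remove_column :users, :area_code
    remove_column :users, :local_number
  end
end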

Perhaps it would be cool if running migrate generated intermediate SQL
that you could check out before creating the tables, to avoid having
one table def scattered through 8 migration files.


What do people think are best practices for this?

I think you’re going to shoot yourself in the foot if you keep doing
this, but that’s just me :) I kind of view what you are doing as going
back in history with CVS/Subversion and changing a file rather than
keeping the history intact and moving forward…

I don’t see why having lots of little migrations is a bad thing… also,
take a look at the following rake tasks:

rake db:schema:dump     # Create a db/schema.rb file that can be portably
                        # used against any DB supported by AR

rake db:structure:dump  # Dump the database structure to a SQL file

And at the annotate models plugin:
http://www.agilewebdevelopment.com/plugins/annotate_models

Which will spit out migration-like syntax as comments at the top of each
model based on the db at that time…
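
For a made-up users table, the comments it adds look roughly like this:

# == Schema Information
#
# Table name: users
#
#  id         :integer       not null, primary key
#  login      :string(255)
#  role       :string(255)
#  created_at :datetime
#
class User < ActiveRecord::Base
end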

Sean Hermany wrote:

What we end up doing is just modifying the original 001_add_users.rb. A
simple rake migrate VERSION=0, followed by rake migrate brings you up to
date EXCEPT that you lose all your data. Fine in development, but now
you cap deploy to your test box with data that matters, and this
doesn’t work.

You’ve just defeated the purpose of migrations. The ONLY time you should
ever go back and modify a migration is to fix transformation bugs. For
example, sometimes an old migration will depend on a model structure
that no longer exists or functions a different way. But the actual
effect of the migration should not change.

What do people think are best practices for this?

If you just need to look at the structure of the database, look at the
schema.rb file (assuming you have set
config.active_record.schema_format = :ruby).
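
For a made-up table, schema.rb looks something like the following. It is
the whole current schema expressed as one Ruby definition, regenerated
every time you migrate:

ActiveRecord::Schema.define(:version => 24) do
  create_table "users", :force => true do |t|
    t.column "login",      :string
    t.column "role",       :string
    t.column "created_at", :datetime
  end
end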

I somewhat feel that migrations, along with code in them to transform
data when moving back/forward make a lot more sense once the code has
reached a stable point, i.e. coinciding with actual releases.

It makes sense throughout the whole development process, especially if
you plan on distributing your application and people might be upgrading
from a fairly old release of your application.

Perhaps it would be cool if running migrate generated intermediate SQL
that you could check out before creating the tables, to avoid having
one table def scattered through 8 migration files.

Again, schema.rb.

-matthew

I totally agree - I do the same things as you.

Also, it’s annoying when I make a new migration (let’s say 005) and then
my partner makes one too (his will also be 005!). If you only submit
code every few days then it gets tiresome renaming them.

-Ben L.

Ben L. wrote:

I totally agree - I do the same things as you.

Also, it’s annoying when I make a new migration (let’s say 005) and then
my partner makes one too (his will also be 005!). If you only submit
code every few days then it gets tiresome renaming them.

That is a communication problem, not a migrations problem. You need to
coordinate schema updates just like you have to avoid editing the same
part of the same source file.

-matthew

On Oct 03, 2006, at 9:14 pm, Sean wrote:

Now, during the course of development, the users table, for instance, may
change a lot. You may need to add new fields; maybe we changed the max
length for last names; now we want to track roles/user types; someone
stored home-phone as one field, but we want it broken out into area
code, exchange, etc. This could be a lot of little changes. If you add
a new migration for each one of these, it is MUCH harder than just
looking at one create_table call in one file to see the structure.

Sean,

I pretty much second Philip’s opinion on this. You shouldn’t use the
migrations as a way to view the current state of the database. They
are really designed to allow you to move from one known database
state to another with minimal effort - especially if your application
is deployed to multiple locations. If I’m right, every time you
migrate the database, the rake migrate task dumps the current schema
(in either Ruby or SQL format).

What we end up doing is just modifying the original 001_add_users.rb. A
simple rake migrate VERSION=0, followed by rake migrate brings you up to
date EXCEPT that you lose all your data. Fine in development, but now
you cap deploy to your test box with data that matters, and this
doesn’t work.

What do people think are best practices for this?

I think if you really want to go down the route of “clean”
migrations, the best you can do is this (rough command sequence sketched
below):

  • during development, do rake migrate VERSION=0 as often as you want
    while writing the migrations
  • deploy your application at version N
  • for future development, do rake migrate VERSION=N until your next
    set of migrations is ready
  • deploy version M>N
  • etc…
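
In terms of commands, that’s roughly (version numbers invented):

rake migrate VERSION=0 && rake migrate   # wipe and rebuild while developing

# after deploying at, say, version 12:
rake migrate VERSION=12                  # wind back no further than the deployed schema
rake migrate                             # then forward again when the next batch is ready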

You can combine this with a continuous integration system. I haven’t
used DamageControl (correct me if that isn’t a Ruby CI tool!) but I
imagine it would work happily like that.

In the absence of a CI tool, I’d just relax, write as many migrations
as you need to, and let the rake voodoo take care of the rest!

Ashley

On 10/4/06, Matthew I. [email protected] wrote:

You’ve just defeated the purpose of migrations. The ONLY time you should
ever go back and modify a migration is to fix transformation bugs. For
example, sometimes an old migration will depend on a model structure
that no longer exists or functions a different way. But the actual
effect of the migration should not change.

I have had an issue with this regarding migrations in the past.

Example

01-Make a user migration

05-Modify user with a new migration

10-Add a column to user with validations in the model

11-other migrations that I don’t want to lose

Now if I want to roll back to version 5 in the database? My application
is dead.

The application has evolved and cannot roll back, except through the
use of svn or similar.

Am I missing something fundamental in the use of migrations? Because to
me it seems that the database and application are not completely
independent, where migrations seem to assume that they are.

Daniel ----- wrote:

Example

01-Make a user migration

05-Modify user with a new migration

10-Add a column to user with validations in the model

11-other migrations that I don’t want to lose

Now if I want to roll back to version 5 in the database? My application
is dead.

The application has evolved and cannot roll back, except through the
use of svn or similar.

Exactly, through svn or similar. What’s the confusion?

Am I missing something fundamental in the use of migrations? Because to
me it seems that the database and application are not completely
independent, where migrations seem to assume that they are.

No, they aren’t independent. I don’t see how migrations assume they are.

-matthew

Daniel:

The problem is just as bad or worse if you use Sean’s method of rolling
back to zero with every change. You then have absolutely no
record/history of those changes. If you roll back code through your cc
tool, without any history of the database changes, then you’re just as
stuck. You need to be tracking and rolling back both code and database
schema changes, and iterative migrations combined with cc tools give you
that.

c.

Daniel ----- wrote:

On 10/4/06, Matthew I. [email protected] wrote:

You’ve just defeated the purpose of migrations. The ONLY time you should
ever go back and modify a migration is to fix transformation bugs. For
example, sometimes an old migration will depend on a model structure
that no longer exists or functions a different way. But the actual
effect of the migration should not change.

I have had an issue with this regarding migrations in the past.

Example

01-Make a user migration

05-Modify user with a new migration

10-Add a column to user with validations in the model

11-other migrations that I don’t want to lose

Now if I want to roll back to version 5 in the database? My application
is dead.

The application has evolved and cannot roll back, except through the
use of svn or similar.

Am I missing something fundamental in the use of migrations? Because to
me it seems that the database and application are not completely
independent, where migrations seem to assume that they are.

Until the application is deployed for the first time, I always use a
single migration that I constantly edit and reload. I use a db:reload
task for this:

namespace :db do
  desc 'Wipe the database, migrate, and load fixtures'
  task :reload => :environment do
    raise "Will not run db:reload in production" if RAILS_ENV == 'production'

    connection = ActiveRecord::Base.connection
    database   = connection.instance_variable_get('@config')[:database]

    connection.drop_database(database)
    connection.create_database(database)
    connection.execute("use #{database}")

    Rake::Task['db:migrate'].invoke
    Rake::Task['db:fixtures:load'].invoke
  end
end

-Jonathan.

On 10/4/06, Matthew I. [email protected] wrote:

Exactly, through svn or similar. What’s the confusion?

Well I guess where I’m not sure is what is the point of being able to
roll back the schema independently of the app, when the application
can’t cope with it.

In order for a db to be rolled back, you must use the latest version of
your app’s (trunk) migration files to go back to a version that is
compatible with svn version xxx.

It’s not like you can check out svn version xxx and run the migration in
it to roll back the database, since the new migration down methods aren’t
there.

Am I missing something fundamental in the use of migrations? Because to
me it seems that the database and application are not completely
independent, where migrations seem to assume that they are.

No, they aren’t independent. I don’t see how migrations assume they are.

If I can roll back migrations without rolling back an app then I’d say
migrations assume that they are independent. Just my opinion though, and
I’d really like to understand this better.

I always use migrations, I’m just not sure how robust they are, or what
is the best way to use them robustly.

Cheers

It’s not like you can check out svn version xxx and run the migration in
it to roll back the database, since the new migration down methods aren’t
there.

Not exactly. If you wish to roll back from revision 9678 of your source
to 9542, you can look in source control and see what the most recent
migration revision was at the time of 9542’s checkin. You can then
migrate back down to that version, and THEN revert your codebase to
revision 9542, and you will be in sync.
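
Roughly, with made-up numbers:

svn ls -r 9542 db/migrate   # see which migrations existed as of r9542
rake migrate VERSION=37     # wind the schema down to the newest one listed
svn update -r 9542          # then wind the code back to match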

As far as looking at schema.rb for the overall view of your database, I
stupidly was not aware/thinking of this.

I still think that in development, the use of multiple little
migrations is a bad choice. Migration number collisions have happened
multiple times that I’ve seen, and I do not think it is a people
problem. If I want to create a new table (hence new migration), I
shouldn’t have to yell around the office, or circulate an email saying
“Did anyone create a new migration they didn’t check in? I’m making one
now, so svn ci so we don’t collide”.

Just because I make a new migration, I may not want to check it in
right away. What if I define a table, write the model code and a
controller, work with it a little, and decide I want to lay the table
out different? I haven’t checked in, I should be able to just modify
the migration. Write more code, test some more, change the table again,
etc. When I feel satisfied enough to check in, only then should I have
to.

The idea of “concurrent” version control is that I shouldn’t have to be
totally in sync with my team for everything. If I check in a conflict,
the tool (source control) will tell me that it can’t merge it, and then
I will use my brain to merge the files myself.

On that note, in the case of migration conflicts, if dev A checks in
012_add_videos.rb and dev B checks in 012_add_albums.rb, source control
is fine with it. If the migration is one file, the conflict will
usually be flagged during check in.

So far I’m much more convinced that a single migration in development
makes much more sense, and the versioning functionality of multiple
migrations makes a lot more sense in moving between stable releases. As
for all of the people that talk of “throwing away the history”, that’s
what source control is for - I can always look at previous revisions in
the history to see previous changes.

Daniel ----- wrote:

Well I guess where I’m not sure is what is the point of being able to
roll back the schema independently of the app, when the application
can’t cope with it.

Rolling back is not a common act, but there are cases where you might
want to do it. One practical use I have found for rolling back is to fix
a migration error. If I find that a migration didn’t transform the data
correctly, I’ll want to roll back, fix it, and migrate forward to the
latest schema. Although this may not be proper. Some may argue that you
should write a new migration for every change.

I’ll admit, sometimes I am lazy about implementing the self.down()
method. Most of the time you are migrating forward.
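
If a change really can’t be reversed, you can at least make that explicit
instead of leaving self.down empty. A sketch (table and column made up):

class DropLegacyPhone < ActiveRecord::Migration
  def self.up
    remove_column :users, :home_phone   # the old data is gone for good
  end

  def self.down
    # or just raise a plain RuntimeError if your Rails version lacks this constant
    raise ActiveRecord::IrreversibleMigration
  end
end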

In order for a db to be rolled back, you must use the latest version of
your app’s (trunk) migration files to go back to a version that is
compatible with svn version xxx.

It’s not like you can check out svn version xxx and run the migration in
it to roll back the database, since the new migration down methods aren’t
there.

Presumably if you are running the latest DB schema, you also have the
corresponding svn revision (probably HEAD) checked out already. So you
roll back the database first and then svn. If, for some reason, you have
just the database, you’ll have to check out HEAD first, then roll back the
schema, then roll back svn. Fortunately, this is not too common.

Am I missing something fundamental in the use of migrations? Because to
me it seems that the database and application are not completely
independent, where migrations seem to assume that they are.

No, they aren’t independent. I don’t see how migrations assume they are.

If I can roll back migrations without rolling back an app then I’d say
migrations assume that they are independent. Just my opinion though, and
I’d really like to understand this better.

In some ways they are dependent and in some ways they are independent.
Migrations don’t “assume” anything, only people do. ;)

It sounds like your real problem is that there is no tight integration
between source versioning and database versioning. I can’t think of any
way to make the integration better. Since the database usually doesn’t
reside within the source tree, there is no way for svn to be aware of
it other than through migrations that you create manually.

I always use migrations, I’m just not sure how robust they are, or what
is the best way to use them robustly.

Well, I don’t know how “robust” you’d want them to be without becoming
overly complicated.

-matthew

Jonathan V. wrote:

Until the application is deployed for the first time, I always use a
single migration that I constantly edit and reload. I use a db:reload
task for this:

But what is the point of that? Migrations are easy enough to write, and
you are going to have to start writing them eventually, so why not do it
from the very beginning?

I don’t know about you, but I like to keep my development database
populated. It makes it so much easier to see what is going on and how it
will look. It would be a pain to go back and repopulate it every time I
made some trivial change to the database schema. I guess I could
maintain some system of development fixtures, but that seems like far
more trouble than it is worth. Also, what about other developers? Maybe
they don’t want to have to reload their development database every time
you make a trivial change.

-matthew

Sean Hermany wrote:

I still think that in development, the use of multiple little
migrations is a bad choice. Migration number collisions have happened
multiple times that I’ve seen, and I do not think it is a people
problem. If I want to create a new table (hence new migration), I
shouldn’t have to yell around the office, or circulate an email saying
“Did anyone create a new migration they didn’t check in? I’m making one
now, so svn ci so we don’t collide”.

*shrug* Like I said, you have to make sure that people don’t keep
uncommitted changes to a source file you want to work on. Might as well
also ensure that they don’t keep migrations uncommitted. Migration
collisions are the least of your worries if developers are not committing
their changes regularly.

Try using http://www.campfirenow.com/ or IRC or whatever for your
development coordination. Mandate that anyone working on the codebase
log in to the chatroom so you can avoid yelling in the office or sending
email. Campfire has other development advantages such as sharing
screenshots. Works great for us. I’ve never once had a migration
collision.

It is a people problem.

Just because I make a new migration, I may not want to check it in
right away. What if I define a table, write the model code and a
controller, work with it a little, and decide I want to lay the table
out different? I haven’t checked in, I should be able to just modify
the migration. Write more code, test some more, change the table again,
etc. When I feel satisfied enough to check in, only then should I have
to.

You should probably have your data model planned out and coordinated a
little better. You should at LEAST let people know (see the chatroom
suggestion) that you are making such drastic changes. Again, this is a
people problem.

The idea of “concurrent” version control is that I shouldn’t have to be
totally in sync with my team for everything.

You don’t have to be totally in sync. Just for major changes like the
kind you mentioned above.

If I check in a conflict,
the tool (source control) will tell me that it can’t merge it, and then
I will use my brain to merge the files myself.

Indeed, it would be nice if svn was somehow aware of incremental
migrations. But it isn’t. So you either coordinate with other developers
or you lose a good portion of what migrations have to offer.

On that note, in the case of migration conflicts, if dev A checks in
012_add_videos.rb and dev B checks in 012_add_albums.rb, source control
is fine with it. If the migration is one file, the conflict will
usually be flagged during check in.

So far I’m much more convinced that a single migration in development
makes much more sense, and the versioning functionality of multiple
migrations makes a lot more sense in moving between stable releases. As
for all of the people that talk of “throwing away the history”, that’s
what source control is for - I can always look at previous revisions in
the history to see previous changes.

Small, incremental schema updates are a LOT easier to maintain and
troubleshoot. And I really don’t want to have to reload my development
database every time someone makes a trivial change to the schema. That
is far more trouble than what it takes to properly coordinate and plan
development.

Also, what if you are writing an app that is open source and you have
users who are checking out arbitrary versions and putting them into
production? Or what if you need to apply certain updates to your
production site that involve schema changes? If you have small,
incremental updates, this is no problem.

-matthew

On 10/5/06, Matthew I. [email protected] wrote:

In some ways they are dependent and in some ways they are independent.
Migrations don’t “assume” anything, only people do. ;)

I’ll pay that one ;)

thanx for the response.

Hi !

2006/10/3, Daniel N [email protected]:

01-Make a user migration

05-Modify user with a new migration

10-Add a column to user with validations in the model

11-other migrations that I don’t want to lose

What you need to do is to add models INSIDE your migration:

class MigrateEstimateAddressesToAddress < ActiveRecord::Migration
  def self.up
    Estimate.find(:all, :conditions => ['address_line1 IS NOT NULL AND LENGTH(address_line1) > 0']).each do |estimate|
      # (body elided in the original post: build an Address from the estimate's fields here)
    end

    Address.update_all("addressable_type = 'Estimate'",
                       "addressable_type LIKE 'MigrateEstimateAddressesToAddress%'")
  end

  def self.down
    Address.find(:all, :conditions => ["addressable_type = 'Estimate'"]).each do |address|
      # (body elided in the original post: copy the address back onto its estimate here)
    end

    Address.delete_all("addressable_type = 'Estimate'")
  end

  class Address < ActiveRecord::Base
    belongs_to :addressable, :polymorphic => true
  end

  class Estimate < ActiveRecord::Base
    has_one :address, :as => :addressable
  end
end

I found this trick from Toolman Tim here:
http://toolmantim.com/article/2006/2/23/migrating_with_models

This is a GREAT way to manage your models. Plus, you can create
models just for your migrational needs. Maybe you don’t need the full
validation model, or you don’t need the relations. It doesn’t matter,
this model is only used during migrations.

The only caveat is as described above. Polymorphic associations must
be manually corrected. STI doesn’t seem to have this problem.

I agree though that plain version numbers can’t cut it for ordering
migrations when multiple developers are involved. I thought of using UTC
timestamps instead, but it still doesn’t help, because I need to run
unrun migrations.

What we’d really need is a table that knew which migrations were
applied. A “patch” oriented way of working, if you will. Then, we
need dependencies between migrations… This is quickly becoming hard
to manage.

What I really want is this:

class MigrateEstimateAddressesToAddress < ActiveRecord::Migration
  depends_on :create_addresses, :create_estimates

  def self.up
  end

  def self.down
  end
end

The DB should have a table such as this one:

CREATE TABLE migrations (
  id INT PRIMARY KEY AUTO_INCREMENT,
  name VARCHAR(255) NOT NULL,
  applied_at DATETIME NOT NULL
)

For dependencies, we’d need another table, obviously. You get the drift.
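
A sketch of how that second table might be created (the names are just my
guess at what such a scheme would need):

class CreateMigrationDependencies < ActiveRecord::Migration
  def self.up
    create_table :migration_dependencies do |t|
      t.column :migration_name, :string, :null => false
      t.column :depends_on,     :string, :null => false
    end
  end

  def self.down
    drop_table :migration_dependencies
  end
end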

Hope that helps !

François Beausoleil
http://blog.teksol.info/
http://piston.rubyforge.org/

On 10/3/06, Matthew I. [email protected] wrote:

That is a communication problem, not a migrations problem. You need to
coordinate schema updates just like you have to avoid editing the same
part of the same source file.

Yes, we have to do this every time. It’s a “shout over the wall” kind
of thing. In practice, we walk around and get a census before starting
any migration.

To answer the original poster, I’d suggest some other way of
enumerating the migration numbers. This could be like annotating the
number with a userid:

007-edh_create_users.rb
007-mm_add_pay_date_to_payments.rb

This would alleviate the issue of duplicate names, at least. I realize
it doesn’t fix the problem of incompatible changes to the same table,
but that is really a design team issue.

In my ideal world, I’d make the migration naming thing hookable. Then
each team could devise their own coordinated enumeration scheme. A
shared DB, for instance, to which you insert a new record every time
you generate a new migration/model; the ID becomes the schema version
number. Combine that with one of the other posters’ rake tasks that does
an update, a migration, then a commit, all inside a transaction, making
the whole thing atomic.

Ed


We have four developers working on a large project with over 300 tables.

Typical work flow is to do a quick SVN update before adding any new
migrations. If there are new migrations added by other users I’ll do a
rake migrate and run the tests to make sure there are no conflicts with
my pending code changes.

If someone has already taken my pending migration numbers I’ll just
change my numbers prior to SVN check-in. Migration numbers are fluid for
everyone right until check-in. We’re currently at migration #72 and
there really haven’t been conflicts that slowed the process down.

Our productivity would plummet if we were constantly stepping down/up
from version 0.

It makes it very easy to edit migration files that have already been
loaded into the development database. It wipes the slate and starts
afresh. I’m never worried about the data in my development database as I
usually use the test fixtures as development data; it’s standard
practice for the projects I work on.

Without such a rake task, if I edit a migration that has already been
loaded into the development database it can be a pain to rerun it with
the changes. Often you can simply roll back to the previous migration
then migrate up again, but that won’t work if you’ve also altered the
downwards migration.

At least for me, it is much less work to just type rake db:reload

-Jonathan.