I’m looking for a way to avoid race conditions on complicated transactions. In some transactions I need to update multiple values in multiple tables and create new objects.
It looks like the safest way to do this is to set up a single process that would just implement these transactions and have the rest of the application call it for services.
Where can I find some information on how to set it up?
Create the transaction process
Communicate to it from the application processes (mongrel cluster)
Guarantee that this process is up after reboot
You can run it either through a UNIX cron (or Windows “at”) or
BackgrounDRb [1]. In both cases you’ll want to store state about its
execution somewhere external (i.e. the database) so that if the server
is rebooted during processing, it’ll restart work when the server goes
online again.
I’ve found a combination of BackgrounDRb workers (in your case: a single
one) and a separate BackgrounDRb “maintenance worker” that runs on a
schedule to be a great match. You can communicate with the workers from
within Rails using BackgrounDRb’s MiddleMan.
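To make that concrete, here is a rough sketch of what such a worker and the MiddleMan call might look like. This assumes the BackgrounDRb MetaWorker API; the worker name, method name and argument shape are invented for illustration:

class TransactionWorker < BackgrounDRb::MetaWorker
  set_worker_name :transaction_worker

  # All money movements funnel through this single worker process,
  # so they never race against each other. Persist progress to the
  # database so a reboot can resume unfinished work.
  def move_money(args)
    # ... perform the multi-table updates inside one DB transaction ...
  end
end

# From any Rails process in the mongrel cluster:
MiddleMan.worker(:transaction_worker).async_move_money(
  :arg => { :from => 1, :to => 2, :amount => 100 })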
Nick, that does seem to be more what I need to use. When I use a transaction, does that mean nothing else gets inserted into the database in the middle?
I’m changing account, balance, creating account lines and creating invoices. Each goes to a different table in the (same) database. If I just wrap it with a transaction, will that imply that all the read and write actions within the transaction happen without any other writes to the database by other Rails instances?
Amir Helzer wrote the following on 26.07.2007 20:29:
If I just wrap it with a transaction, will that imply that all the read and write actions within the transaction happen without any other writes to the database by other Rails instances?
No. You must use serialized transactions for that, and ActiveRecord doesn’t use them by default. Even with them you only get transaction semantics: either all of it succeeds (no incompatible concurrent writes) or it rolls back all changes (and you are responsible for doing it all again in the new context).
Sorry for misunderstanding – this is exactly the use case where you’d want to use transactions. Transactions guarantee the robustness that you seek by locking the database. It’s controlled by the database, not by Rails, so it’s guaranteed that other Rails instances won’t “interfere”.
If you’re worried about performance and have the rapid pace of transactions that you say you have, I’d recommend picking up a book about database management and weighing your options. Many simultaneous database transactions can congest the database, in which case you might still be better off with a batch run like the one I was referring to. But really, that’d be a pretty complex scenario you’d be running – don’t worry about it too much until you hit that wall.
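As a rough sketch of what wrapping the whole money movement in one transaction looks like (the model names here are assumptions based on Amir’s description, not code from the thread):

Account.transaction do
  from_account.balance -= amount
  from_account.save!
  to_account.balance += amount
  to_account.save!
  AccountLine.create!(:account => from_account, :balance => from_account.balance)
  AccountLine.create!(:account => to_account,   :balance => to_account.balance)
  # if anything above raises, every change is rolled back together
end

Note the caveat raised below, though: a plain transaction only guarantees all-or-nothing, not that other writers can’t interleave.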
No. You must use serialized transactions for that, ActiveRecord doesn’t use them by default.
That’s a truly interesting point you raise.
It should be easy to patch this into ActiveRecord. All it needs is a
little #supports_serializable_transactions? and #begin_db_transaction(*attr) love. I’ll add it to my todo list.
Of course, it seems that you could always do an ActiveRecord::Base.connection.execute(database_dependent_serializable_start_statement) yourself as a quick hack to start a serializable transaction.
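Spelled out for PostgreSQL, that quick hack would look something like this (a sketch, not code from the thread):

ActiveRecord::Base.transaction do
  # must run before any other statement in the transaction
  ActiveRecord::Base.connection.execute('SET TRANSACTION ISOLATION LEVEL SERIALIZABLE')
  # ... reads and writes that must not interleave with other writers ...
end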
Here’s some code to provide transactions with SERIALIZABLE isolation level for PostgreSQL:
class ActiveRecord::Base
  # This is a transaction in serialized mode
  def self.serialized_transaction(*objects, &block)
    if connection.is_a?(ActiveRecord::ConnectionAdapters::PostgreSQLAdapter)
      serial_sql = 'SET TRANSACTION ISOLATION LEVEL SERIALIZABLE'
      # pass a modified block to the original transaction method
      self.transaction(*objects) do
        # Serialize on this connection
        connection.execute(serial_sql)
        # Serialize on each object's connection, if any
        objects.each { |o| o.connection.execute(serial_sql) }
        block.call
      end
    else
      self.transaction(*objects) { block.call }
    end
  end
end
The problem is that this is far from enough. You either:
have to lock the affected rows in the transaction, as documented in ActiveRecord::Base.find (in this case serialized statements aren’t needed, but you’ll have to make sure that all affected rows are properly locked not only in the method you want to protect but in every other possible access to the same data), or
abstract these modifications and make sure you only issue statements that can be serialized by the database each time you touch the data.
For example:
Person.transaction do
  person = Person.find(1, :lock => true)
  person.visits += 1
  person.save!
end
is OK in itself, but if you have another Person(1) instance elsewhere which is fetched before this transaction and saved after it, the lock doesn’t matter: visits will be reset to its previous value.
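To illustrate the failure mode, here is a hypothetical interleaving (not code from the thread; recall that ActiveRecord of that era writes every column on save, changed or not):

stale = Person.find(1)    # fetched early: visits == 3
Person.transaction do     # the locked increment shown above
  fresh = Person.find(1, :lock => true)
  fresh.visits += 1
  fresh.save!             # visits == 4 in the database
end
stale.save!               # writes ALL attributes: visits == 3 again, increment lost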
Using serializable transactions gives better performance and leaves less room for errors:
class Person
  def self.inc_visits(person_id)
    serialized_transaction do
      update_all('visits = visits + 1', [ 'id = ?', person_id ])
    end
  end
end
The good thing with this code is that concurrent updates are supported: the right number of visits will be stored at the end of all concurrent Person.inc_visits calls, without explicit locking. You can update different parts of the same row without the database even having to hand you explicit locks (and the DB can optimize it).
But the same problem remains: if a separate instance of Person for the affected row is created and saved at the wrong moments, you are screwed.
My advice would be to separate data that must be updated like that into a separate model and only use serializable transactions (or locks, if your DB doesn’t support this type of transaction) on each and every access to this model.
If you don’t, you’ll have to remember that all updates to a model where you had to lock somewhere in your code must use locks too (:lock => true will then show up everywhere in your code…) or serializable transactions…
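A sketch of that advice, with an invented VisitCounter model holding just the contended data:

class VisitCounter < ActiveRecord::Base
  # every write to this table goes through class methods like this one,
  # so serializable isolation can't be accidentally bypassed elsewhere
  def self.bump(person_id)
    serialized_transaction do
      update_all('visits = visits + 1', [ 'person_id = ?', person_id ])
    end
  end
end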
Some more information on your schema/model might help us give you some
more pointers on the simplest way to accomplish what you are after.
My case is like this:
Different users have accounts. Each account has a balance.
There’s a table for transactions. A transaction has an amount, from_account_id and to_account_id.
A separate table holds account_lines. For each transaction, there’s an account line per account, showing the balance after the transaction.
A final table is for invoices. Invoice has_many transactions. When an invoice is paid, all its transactions are executed, balances are updated and account lines are created.
Some transactions are created by user requests and some by the payment gateway notifying of payment clearance.
There are some accounts that will hardly update concurrently (because they belong to users), while other accounts (like the system account that holds fees) will update very frequently.
That’s it, more or less.
As I understand it, a separate process for executing transactions would completely guarantee that things are always coherent. Overall, money movements in the system should be a small part of all DB queries (maybe 1%).
What do you think?
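For reference, that schema might look roughly like this in ActiveRecord terms (class names and associations are guesses from the description above; the transactions model is called MoneyTransaction here only to avoid confusion with database transactions):

class Account < ActiveRecord::Base
  belongs_to :user
  has_many :account_lines
end

class MoneyTransaction < ActiveRecord::Base
  belongs_to :invoice
  belongs_to :from_account, :class_name => 'Account', :foreign_key => 'from_account_id'
  belongs_to :to_account,   :class_name => 'Account', :foreign_key => 'to_account_id'
  has_many :account_lines   # one line per affected account, with the balance after
end

class AccountLine < ActiveRecord::Base
  belongs_to :account
end

class Invoice < ActiveRecord::Base
  has_many :money_transactions
end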
If you use serialized transactions you also have to be prepared to deal with conflicts and redo transactions.
You should, and probably can, find a way to design your model so that you only need to serialize access to a single table at most. I really can’t think of a well designed schema off the top of my head that requires rows in different tables to all be inserted together in isolation. SELECT FOR UPDATE is another nice tool, at least in PostgreSQL.
And you also need to know how your database works. PostgreSQL, and Oracle too from what I’ve heard, use multi-version concurrency control as opposed to explicit locking, which can change how you structure your queries and transactions depending on what database you use.
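In ActiveRecord, SELECT FOR UPDATE is what the :lock => true option shown earlier issues under the hood. As a sketch against the (assumed) Account model:

Account.transaction do
  account = Account.find(account_id, :lock => true)  # SELECT ... FOR UPDATE
  account.balance += amount
  account.save!
end  # the row lock is released when the transaction commits

Any other transaction doing the same locked find blocks until this one commits, so the balance update can’t be lost.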
If it’s just the accounts table that has the balance column, then
using select for update might be enough. Really depends on the
details of how you are updating everything. Here is a good place to
start:
Oops. It’s only like that in test mode. Development and production
show the transactions.
Is there anything I can set so that I can have it in test too?
Transactional fixtures sometimes interfere with transactions (they wrap each test in its own transaction, which can actually make your own transactions no-ops).
In your test classes you may want to try:
self.use_transactional_fixtures = false
You can put it in Test::Unit::TestCase in test/test_helper.rb if you
want it for all your test classes.
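For example (standard Rails test setup assumed):

# test/test_helper.rb -- applies to every test class
class Test::Unit::TestCase
  # let real transactions reach the database during tests
  self.use_transactional_fixtures = false
end

The trade-off is that tests no longer roll themselves back, so you have to clean up any rows they create.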
Thanks Lionel, Nick, Roderick and everyone who helped here.
I started with an idea that was overly complicated and now I think I got it right, so let me summarize:
I use transaction blocks to make sure that any failure rolls back both the withdrawals and the deposits (so money doesn’t vanish into thin air).
To make sure that concurrent operations from different processes don’t spoil data, I use optimistic locking (Agile Web Development with Rails, page 389).
The block of DB operations is wrapped in begin-rescue. In the rescue section, I reload all written-to objects. This is a must if you want to retry the operation, because otherwise the optimistic locking will just keep failing (the reload refreshes lock_version from the updated database).
The application retries failed operations.
This is how it all looks in my implementation now:
ok = false
attempts = 0
while !ok && (attempts < MAX)   # MAX == retry limit
  begin
    Account.transaction do
      move_money
      setup_stuff
      ok = true
    end
  rescue
    # a failed optimistic lock raises ActiveRecord::StaleObjectError;
    # reload so the next attempt sees fresh lock_versions
    reload_accounts
    attempts += 1
  end
end