[PATCH] Support for DB Clusters/Replication in ActiveRecord

Hello,

I am still dressing this patch up for prime time but I would like to
get
some feedback from the community on the approach. This currently does
pass
a full “rake”

I have done some extensive work on active record to allow for databases
to
be defined in terms of “connection pools” rather than simply one
database.
Most of the work was completed in connection_specification.rb and is
done at
this level to be compatible with any backend you choose. Also this
patch
maintains full compatibiltity with the current paradigm so no changes
are
necessary to current applications/documentation.

Defining a database pool is very easy and follows a common sense
convention. I will assume you are running with RAILS_ENV=development
for
this example. You first define your connections as you always have in
database.yml . Your are then able to able to define
development_read_pool: db1, db2, db3
development_write_pool: db4, db5, db6 (In general, the name is
RAILS_ENV_write_pool so you can test your clusters in development and
production with no config changes)

where each is a set of connection names that you have already defined.
ActiveRecord::Base.connection will then return connections from the
appropriate pool using round-robin when more than one connection is
available to it. This is handy if you hvae a high traffic website for
example and you want to load balance over several slave servers for
reading
while writing to the one master server consistently. The syntax for
.connection is as follow

ExampleModel.connection # default “compatibility behavior”
always
returns a write connection.
ExampleModel.connection(:read) # Return a connection from the read
pool
ExampleModel.connection (:write) # Return a connection from the write
pool

I have also changed a few of the functions in base.rb to utilize the
correction pool (for example, find_by_sql calls connection(:read)) This
again makes the patch seamless to end applications while allowing them
to
use the new functionality.

A patch aganist the lastest CVS is attached. Some rough notes on
implementation are also include below - there not 100% complete but give
a
good idea of what was changed.

Thanks,
Stephen Blackstone


Implementation:

  1. Added two arrays to act as pools for connections related to this
    connection.

@@read_connection_pool = {}
@@write_connection_pool = {}

Each is later defined as an array such that

@@read_connection_pool[name] represenets all of the read connections

available to the current
class. This allows us to stay fully backward compatiable with
the
old methodology where name
is @active_connection_name

  1. Define 2 variables to track index of last used connection. This is
    used
    when doing round-robin

    @@last_read_connection = 0
    @@last_write_connection = 0

  2. Define a function for appending connections to pools
    (append_spec_to_connection_pools spec).
    If the config has an attribute read_only == true then it is only
    entered
    into the read_connection_pool
    and vice versa for the write_connection_pool. @@defined_connections
    also is defined at the same time.

    @@write_connection_pool[spec.object_id] always equals
    @@defined_connections[spec.object_id]

  3. Define a function establish_connection_pools which looks for two
    variables to be
    set in the configuration:

    RAILS_ENV_read_connection_pool
    RAILS_ENV_write_conenction_pool

    Normally in Rails, this would setup in the YAML file. Each variable
    is
    a comma delimited
    list of connections to use, one for read the other for write.

  4. Define a function clear_connection_pool which clears the connection
    pools.

  5. Modify establish_connection
    a. When passed nil, call establish_connection_pools(RAILS_ENV)
    b. When passed a ConnectionSpecification, clear all connection
    pools as
    well as the
    active_connection_name

  6. Move code from establish connection to ConnectionSpecification
    constructor so an object can be made
    out of any spec thats passed to it. (Avoiding breakage of DRY
    principle)

  7. Remove settings @@defined_connections[name] in establish_connection.
    This is now done by
    calling

  8. Define two functions, round robin read and round robin write which
    returns AbstractAdapters from the various pools.

  9. Modify retreive connection so that it takes an id of an object that
    should be in @@defined_connections
    Remove it’s dependency on connection= as this will break things.

  10. Modify connection= to call establish_connection on
    ConnectionSpecification.

  11. Moidfy self.remove_connection to call clear_connection_pool

  12. Modify active_connection_name to check if a pool exists instead of
    defined_connections

On Jul 24, 2006, at 9:13 AM, Stephen wrote:

Defining a database pool is very easy and follows a common sense
convention. I will assume you are running with
RAILS_ENV=development for this example. You first define your
connections as you always have in database.yml . Your are then
able to able to define
development_read_pool: db1, db2, db3
development_write_pool: db4, db5, db6 (In general, the name is
RAILS_ENV_write_pool so you can test your clusters in development
and production with no config changes)

This is very cool, Stephen.

Couldn’t this be read_pool, and write_pool, within the current block
of YAML?


– Tom M.

Here is an example. I considered the merits of allow an attribute like
pools: p1,p2,p3 but I think this makes more sense
from a configuration standpoint.

It would look something like this;

development:
adapter: mysql
database: project_development
username: someuser
password: happyland
host: 10.0.0.1
port: 3306

development2:
adapter: mysql
database: project_development
username: root
password: somethingelse
host: 10.0.0.2
port: 3306

development_write_pool: development
development_read_pool: development,development2

Interesting. I’ll take a look this evening. Please let me know if
you find out any more.

Tom

Hi,

I’ve tried creating a new ticket for this issue, but I’m just getting
500 server errors.

There appears to be a problem with the ActiveRecord SqlServer adapter
that results in illegal SQL being generated…

ActiveRecord::StatementInvalid in ArticlesController#index
DBI::DatabaseError: 37000 (170) [Microsoft][ODBC SQL Server Driver][SQL
Server]Line 1: Incorrect syntax near ''.: SELECT * FROM (SELECT TOP 1
*
FROM (SELECT TOP 1 id FROM contents WHERE (published = 1 AND
contents.published_at < ‘2006-07-25 11:59:31’ AND blog_id = 1) AND (
(contents.[type] = ‘Article’ ) ) ORDER BY contents.published_at DESC )
AS
tmp1 ORDER BY contents.[published_at] ASC) AS tmp2 ORDER BY
contents.[published_at] DESC

This is due to messed up column quoting. I’ve tracked it down to the
‘add_limit_offset!’ method in sqlserver_adapter.rb, but I can’t decipher
enough of what is going on to suggest the fix just yet.

Full stack trace also included below.

Regards,
Stu

RAILS_ROOT: ./script/…/config/…
Application Trace | Framework Trace | Full Trace

#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/connection_ada
pters/abstract_adapter.rb:120:in
`log’

#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/connection_ada
pters/sqlserver_adapter.rb:485:in
`select’

#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/connection_ada
pters/sqlserver_adapter.rb:233:in
`select_all’

#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/associations.r
b:1166:in
`select_limited_ids_list’

#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/associations.r
b:1158:in
`add_limited_ids_condition!’

#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/associations.r
b:1148:in
`construct_finder_sql_with_included_associations’

#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/associations.r
b:1111:in
`select_all_rows’

#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/associations.r
b:975:in
`find_with_associations’

#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/associations.r
b:973:in
`find_with_associations’

#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/base.rb:923:in
`find_every’

#{RAILS_ROOT}/vendor/rails/activerecord/lib/active_record/base.rb:381:in
find' #{RAILS_ROOT}/app/controllers/articles_controller.rb:27:in index’

#{RAILS_ROOT}/vendor/rails/actionpack/lib/action_controller/base.rb:910:
in
`perform_action_without_filters’

#{RAILS_ROOT}/vendor/rails/actionpack/lib/action_controller/filters.rb:3
68:in
`perform_action_without_benchmark’

#{RAILS_ROOT}/vendor/rails/actionpack/lib/action_controller/benchmarking
.rb:69:in
perform_action_without_rescue' c:/ruby/lib/ruby/1.8/benchmark.rb:293:in measure’

#{RAILS_ROOT}/vendor/rails/actionpack/lib/action_controller/benchmarking
.rb:69:in
`perform_action_without_rescue’

#{RAILS_ROOT}/vendor/rails/actionpack/lib/action_controller/rescue.rb:82
:in
`perform_action’

#{RAILS_ROOT}/vendor/rails/actionpack/lib/action_controller/base.rb:381:
in
`process_without_filters’

#{RAILS_ROOT}/vendor/rails/actionpack/lib/action_controller/filters.rb:3
77:in
`process_without_session_management_support’

#{RAILS_ROOT}/vendor/rails/actionpack/lib/action_controller/session_mana
gement.rb:117:in
process' #{RAILS_ROOT}/vendor/rails/railties/lib/dispatcher.rb:38:in dispatch’
#{RAILS_ROOT}/vendor/rails/railties/lib/webrick_server.rb:115:in
handle_dispatch' #{RAILS_ROOT}/vendor/rails/railties/lib/webrick_server.rb:81:in service’
c:/ruby/lib/ruby/1.8/webrick/httpserver.rb:104:in service' c:/ruby/lib/ruby/1.8/webrick/httpserver.rb:65:in run’
c:/ruby/lib/ruby/1.8/webrick/server.rb:173:in start_thread' c:/ruby/lib/ruby/1.8/webrick/server.rb:162:in start_thread’
c:/ruby/lib/ruby/1.8/webrick/server.rb:95:in start' c:/ruby/lib/ruby/1.8/webrick/server.rb:92:in start’
c:/ruby/lib/ruby/1.8/webrick/server.rb:23:in start' c:/ruby/lib/ruby/1.8/webrick/server.rb:82:in start’
#{RAILS_ROOT}/vendor/rails/railties/lib/webrick_server.rb:67:in
dispatch' #{RAILS_ROOT}/vendor/rails/railties/lib/commands/servers/webrick.rb:59 c:/ruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in require’

#{RAILS_ROOT}/vendor/rails/activesupport/lib/active_support/dependencies
.rb:147:in
require' #{RAILS_ROOT}/vendor/rails/railties/lib/commands/server.rb:30 c:/ruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in require’
script/server:3
Request

Parameters: None

Show session dump

Response
Headers: {“cookie”=>[], “Cache-Control”=>“no-cache”}


Ticket URL: http://www.typosphere.org/trac/ticket/1040
typo http://typosphere.org/trac/
web2.0 weblogging engine

This Email may contain confidential and privileged information and is intended for the use of the addressee(s) only. If you are not the intended recipient please notify the sender and delete the Email from your system. It should not be transmitted to any other person without the consent of the sender. Additional important notifications regarding Email transmissions from and to members of Baring Asset Management can be accessed at http://www.barings.com/email/index.hcst

 

I think I’ve managed to fix the column quoting issue by making three
small changes to the add_limit_offset! method in the
sqlserver_adapter.rb file:

  def add_limit_offset!(sql, options)

    if options[:limit] and options[:offset]
      total_rows = @connection.select_all("SELECT count(*) as

TotalRows from (#{sql.gsub(/\bSELECT(\s+DISTINCT)?\b/i, “SELECT#{$1} TOP
1000000000”)}) tally")[0][:TotalRows].to_i
if (options[:limit] + options[:offset]) >= total_rows
options[:limit] = (total_rows - options[:offset] >= 0) ?
(total_rows - options[:offset]) : 0
end
sql.sub!(/^\s*SELECT(\s+DISTINCT)?/i, "SELECT * FROM (SELECT
TOP #{options[:limit]} * FROM (SELECT#{$1} TOP #{options[:limit] +
options[:offset]} “)
sql << “) AS tmp1”
if options[:order]
options[:order] = options[:order].split(’,’).map do |field|
parts = field.split(” ")
tc = parts[0]
if sql =~ /.[/ and tc =~ /./ # if column quoting used
in query

            ####-- ORIGINAL : tc.gsub!(/\./, '\\.\\[')
            ####-- ORIGINAL : tc << '\\]'

            tc.gsub!(/\./, '.[')		#NEW: -- do not escape

column quotes
tc << ‘]’ #NEW: – do not
escape column quotes

          end
          if sql =~ /#{tc} AS (t\d_r\d\d?)/
            parts[0] = $1

          ####-- ORIGINAL : elsif parts[0] =~ /\w+\.(\w+)/
          elsif parts[0] =~ /\w+\.([\[\]\w]+)/

#NEW: ensure that column quotes are picked up when matching to strip the
tablename

            parts[0] = $1
          end
          parts.join(' ')
        end.join(', ')
        sql << " ORDER BY

#{change_order_direction(options[:order])}) AS tmp2 ORDER BY
#{options[:order]}"
else
sql << " ) AS tmp2"
end
elsif sql !~ /^\sSELECT (@@|COUNT()/i
sql.sub!(/^\s
SELECT(\s+DISTINCT)?/i) do
“SELECT#{$1} TOP #{options[:limit]}”
end unless options[:limit].nil?
end
end

This has fixed the column quoting issue, but there are still some
problems with the SQL that ActiveRecord is generating. The original
issue of selecting by offset now works…

Article.find( :all,
  	:offset => 0,
      :limit => 1,
  	:order => "contents.published_at DESC",
      :conditions => ['published = ? AND contents.published_at <

? AND blog_id = ?’,true, Time.now, 1])

[#<Article:0x32acb34 @attributes={...  (snip)

And selecting with the :include tag works…

Article.find( :all,
      :order => "contents.published_at DESC",
  	:include => [:categories, :tags, :user, :blog],
      :conditions => ['published = ? AND contents.published_at <

? AND blog_id = ?’,true, Time.now, 1])

[#<Article:0x3848110 @user=#<User:0x38476fc @attributes={...

(snip)

But a find that combines both the :offset and :include will fail due to
bad SQL…

Article.find( :all,
:offset => 0,
:limit => 1,
:order => “contents.published_at DESC”,
:include => [:categories, :tags, :user, :blog],
:conditions => [‘published = ? AND contents.published_at < ? AND
blog_id = ?’,true, Time.now, 1])

ActiveRecord::StatementInvalid: DBI::DatabaseError: S0022 (207)
[Microsoft][ODBC SQL Server Driver][SQL Server]Invalid column name
‘published_at’.: SELECT * FROM (SELECT TOP 1 * FROM (SELECT TOP 1 id
FROM contents WHERE (published = 1 AND contents.published_at <
‘2006-07-26 12:24:01’ AND blog_id = 1) AND ( (contents.[type] =
‘Article’ ) ) ORDER BY contents.published_at DESC ) AS tmp1 ORDER BY
[published_at] ASC) AS tmp2 ORDER BY [published_at] DESC
from
./script/…/config/…/config/…/vendor/rails/activerecord/lib/active_rec
ord/connection_adapters/abstract_adapter.rb:120:in log' from ./script/../config/../config/../vendor/rails/activerecord/lib/active_rec ord/connection_adapters/sqlserver_adapter.rb:521:inselect’
from
./script/…/config/…/config/…/vendor/rails/activerecord/lib/active_rec
ord/connection_adapters/sqlserver_adapter.rb:257:in select_all' from ./script/../config/../config/../vendor/rails/activerecord/lib/active_rec ord/associations.rb:1166:inselect_limited_ids_list’
from
./script/…/config/…/config/…/vendor/rails/activerecord/lib/active_rec
ord/associations.rb:1158:in add_limited_ids_condition!' from ./script/../config/../config/../vendor/rails/activerecord/lib/active_rec ord/associations.rb:1148:inconstruct_finder_sql_with_included_associations’
from
./script/…/config/…/config/…/vendor/rails/activerecord/lib/active_rec
ord/associations.rb:1111:in select_all_rows' from ./script/../config/../config/../vendor/rails/activerecord/lib/active_rec ord/associations.rb:975:infind_with_associations’
from
./script/…/config/…/config/…/vendor/rails/activerecord/lib/active_rec
ord/associations.rb:973:in find_with_associations' from ./script/../config/../config/../vendor/rails/activerecord/lib/active_rec ord/base.rb:923:infind_every’
from
./script/…/config/…/config/…/vendor/rails/activerecord/lib/active_rec
ord/base.rb:381:in `find’
from (irb):104
from :0

This one is caused by just selecting the [id] in the inner select, and
not exposing the the [published_at] column to be sorted on in the outer
selects. Sadly I haven’t been able to get anywhere with debugging this
one. Also, please note that this is being run against the Typo 4
database, and the test are run via the script/console util. The issues
do also occur through the web frontend.

Regards,
Stu

Thanks for the extra info. I hate that limit/offset thing, cause of
all my sorrow.

Tom