Prepared statements (PostgreSQL)

Bashed this together after reading Ola B.'s post about his annoyance
with the MySQL-centric approach of ActiveRecord at

I decided to find out if it was really that bad, and I suppose it might
be. I spent some time trying to get the type information about columns
from ActiveRecord into the driver, but I couldn’t come up with a good
way to do it.

So I started over and wrote this query cache thingie into the
PostgreSQL driver. It’s a giant hack, ugly and buggy, but it passes all
ActiveRecord tests except one (where it results in two queries being
made, one for the prepare and one for the execute, instead of a single
select), and I managed to try it out on a site of mine in development
mode.

With prepared statements I got:
Completed in 1.59062 (0 reqs/sec) | Rendering: 1.31451 (82%) | DB:
0.25888 (16%) | 200 OK [http://192.168.0.46/characters]
Completed in 1.61538 (0 reqs/sec) | Rendering: 1.36034 (84%) | DB:
0.23845 (14%) | 200 OK [http://192.168.0.46/characters]

Without prepared statements:
Completed in 1.74804 (0 reqs/sec) | Rendering: 1.23182 (70%) | DB:
0.49718 (28%) | 200 OK [http://192.168.0.46/characters]
Completed in 1.67329 (0 reqs/sec) | Rendering: 1.21762 (72%) | DB:
0.43935 (26%) | 200 OK [http://192.168.0.46/characters]

So that’s a nice speedup, and while that might be a poor example
because of the high number of repeated selects in it, I think that
prepared statements should be supported in some way by ActiveRecord, as
soon as possible.

It’s a giant hack and I’m actually surprised it works at all. I’m not
proud of this code, but it told me what I wanted to know: whether
prepared statements really are that cool (turns out they are).
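For context on what the driver-level win actually is: PostgreSQL parses
and plans a prepared statement once, and every later EXECUTE only ships
the parameter values. A sketch of the SQL involved (the table, column,
and statement names here are just examples, not from the patch):

```ruby
# The SQL a normal ActiveRecord find would send on every single call:
plain = "SELECT * FROM developers WHERE name = 'Joel'"

# With prepared statements, the query is parsed and planned once ...
prepare = "PREPARE prep0(varchar) AS SELECT * FROM developers WHERE name = $1"

# ... and every later call only ships the parameter value:
execute = "EXECUTE prep0('Joel')"

puts prepare
puts execute
```

The 0.25s vs 0.49s DB times above come entirely from skipping the
parse/plan work on the repeated selects.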

Index: activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb
===================================================================
--- activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb (revision 5147)
+++ activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb (working copy)
@@ -1,4 +1,5 @@
 require 'active_record/connection_adapters/abstract_adapter'
+require 'csv'
 
 module ActiveRecord
   class Base
@@ -54,6 +55,8 @@
 
       def initialize(connection, logger, config = {})
        super(connection, logger)
+       @statements = {}
+       @prepared_statements = []
        @config = config
        @async = config[:allow_concurrency]
        configure_connection
@@ -141,8 +144,184 @@
        result.first if result
      end
 
+     def try_prepared_select(sql)
+       # split the query into the statement part (select a from x left join y ...)
+       # and the conditions part (where b = c and x.d = y.d)
+       return nil unless sql =~ /WHERE/
+       statement, conditions = sql.split("WHERE", 2)
+       return nil if conditions == nil
+       # try to find a "primary table", for simple queries where column
+       # names aren't fully qualified
+       #   select name from developers where name = 'Joel'
+       #                    ^-- this is the primary table
+       # vs
+       #   select developers.name from developers where developers.name = 'Joel'
+       # if we find columns without a table in the conditions, we prefix
+       # them with the primary table
+       # we don't always get a primary table, but when we don't, all
+       # columns are fully qualified
+       sql =~ /FROM ([\w\s,]+?)\s+(WHERE|LEFT|INNER|OUTER)/
+       primary_table = $1.strip if $1
+       # table aliases cause all kinds of trouble, since we need to scan
+       # the real table to find column types
+       #   select name from developers d where d.name = 'Joel'
+       #                              ^       ^-- we need to translate this
+       #                              |           into "developers.name" when
+       #                              |           trying to find out the type
+       #                              \-- d is an alias for developers
+       alias_table = {}
+       tables = []
+       if primary_table && (primary_table =~ /\w+\s+\w+/) # aliases
+         primary_table.split(/,/).each_with_index { |table_alias, i|
+           pri, ali = table_alias.strip.split(/\s+/)
+           pri.strip!
+           ali.strip!
+           primary_table = pri if i == 0
+           tables << pri
+           alias_table[ali] = pri
+         }
+       end
+       tables << primary_table if primary_table
+       tables = tables.sort.uniq
+       # next we need to normalize the conditions, since the following two
+       # statements are functionally the same and should use the same
+       # prepared statement
+       #   select name from developers d where d.name = 'Joel'
+       #   select name from developers d where d.name = 'Andreas'
+       # so, we should normalize the conditions into "name = $1" for the
+       # comparison and for the prepared statement
+       variables_values = []
+       conditions.split(/\s+(AND|OR|LIMIT|ORDER|OFFSET|GROUP|HAVING|FOR)\s+/i).each { |part|
+         variable, operand, value = part.strip.split(/\s*(=|>|>=|<>|<|<=|!=)\s*/, 3)
+         if variable and value
+           r1 = Regexp.new(/^[\(\)\s]*/)   # remove leading parentheses and spaces ...
+           r2 = Regexp.new(/[\(\)\s]*$/)   # ... and remove trailing parentheses and spaces
+           r3 = Regexp.new(/^\w+\(/)
+           variable.sub!(r1, '')
+           variable.sub!(r2, '')
+           variable.sub!(r3, '')
+           value.sub!(r1, '')
+           value.sub!(r2, '')
+           if value =~ /^('|"|\d)/
+             variables_values << [ variable, operand, value ]
+           end
+         end
+       }
+       return nil if variables_values.length == 0
+       # make the replacements and store the columns involved
+       variables = [ ]
+       aliased_columns = [ ]
+       normalized_conditions = conditions
+       variables_values.each_with_index { |v, i|
+         name, operand, value = v
+         variables << value
+         aliased_columns << name
+         normalized_conditions.sub!(/\s*#{operand}\s*#{value}/, "#{operand} $#{i+1}")
+       }
+       # keep track of how many times we've seen this statement
+       if @statements.has_key?(statement+normalized_conditions)
+         @statements[statement+normalized_conditions] += 1
+       else
+         @statements[statement+normalized_conditions] = 0
+       end
+       # now that we have the variables and normalized conditions, we can
+       # look for a prepared statement
+       if @prepared_statements.include?(statement+normalized_conditions)
+         # if we have one, go for it
+         number = @prepared_statements.index(statement+normalized_conditions)
+         return execute("EXECUTE prep#{number}(#{variables.join(', ')})")
+       # if we have some number of hits on this statement, we deem it
+       # worthy of being turned into a prepared statement
+       elsif @statements.fetch(statement+normalized_conditions, 0) > 3
+         columns = []
+         # here we want to match d.name and turn it into developers.name,
+         # and save it for the next step
+         aliased_columns.each { |c|
+           table, column = c.split(/\./)
+           table = alias_table[table] if alias_table[table]
+           if column
+             columns << "#{table}.#{column.strip.gsub(/\"/, "")}"
+           else
+             columns << "#{primary_table}.#{table.strip.gsub(/\"/, "")}"
+           end
+         }
+         # now we need to find the type of all the columns in the
+         # conditions. we should find out that developers.name is a
+         # char varying(255)
+         # the first step is to store an array of columns for every table
+         # in condition_columns[table] = [ column1, column2, ... ]
+         condition_columns = { }
+         columns.uniq.each { |c|
+           t, c = c.split("\.")
+           t.strip! if t
+           c.strip! if c
+           if condition_columns[t]
+             condition_columns[t] << c
+           else
+             condition_columns[t] = [ c ]
+           end
+         }
+         # then we go through every table and find the types for
+         # (hopefully) all the columns we care about
+         column_types = { }
+         condition_columns.each_key { |t|
+           column_definitions(t).collect do |name, type, default, notnull, typmod|
+             if condition_columns[t].include?(name)
+               column_types["#{t}.#{name}"] = type
+             end
+           end
+         }
+         # and now that we finally have the types of all the columns
+         # we're interested in ...
+         argument_types = []
+         columns.each { |c|
+           # actually sometimes we don't, and in that case we give up
+           return nil unless column_types.has_key?(c)
+           argument_types << column_types[c]
+         }
+         # ... but when we do, we can prepare a statement
+         execute("PREPARE prep#{@prepared_statements.length}(#{argument_types.join(', ')}) AS #{statement} WHERE #{normalized_conditions}")
+         @prepared_statements << (statement+normalized_conditions)
+         number = @prepared_statements.index(statement+normalized_conditions)
+         return execute("EXECUTE prep#{number}(#{variables.join(', ')})")
+       end
+       return nil
+     end
+
+     def try_prepared_insert(sql)
+       statement = sql.split("VALUES")[0]
+       values = sql.split("VALUES")[1]
+       statement =~ /INSERT INTO (\w+) \((.+)\)/i
+       table = $1
+       columns = $2
+       if @statements.fetch(statement, 0) > 3
+         value_array = CSV::parse_line(values).map { |e| e.to_s.strip }
+         column_array = CSV::parse_line(columns.gsub(/\"/, "")).map { |e| e.to_s.strip }
+         argument_types = {}
+         if tables.include?(table)
+           column_definitions(table).collect do |name, type, default, notnull, typmod|
+             if column_array.include?(name)
+               argument_types[name] = type
+               column_array.delete(name)
+             end
+           end
+           if column_array.length == 0  # we matched all columns
+             type_array = []
+             column_array = CSV::parse_line(columns.gsub(/\"/, "")).map { |e| e.to_s.strip }
+             column_array.each { |c| type_array << argument_types[c] }
+             unless @prepared_statements.include?(statement)
+               dollar_arguments = ((1 .. column_array.length).map { |i| '$' + "#{i}" }).join(", ")
+               execute("PREPARE prep#{@prepared_statements.length} (#{type_array.join(', ')}) AS INSERT INTO #{table}(#{column_array.join(', ')}) VALUES(#{dollar_arguments})")
+               @prepared_statements << statement
+             end
+             number = @prepared_statements.index(statement)
+             return execute("EXECUTE prep#{number}#{values}")
+           end
+         end
+       else
+         if @statements.has_key?(statement)
+           @statements[statement] += 1
+         else
+           @statements[statement] = 0
+         end
+       end
+       return nil
+     end
+
      def insert(sql, name = nil, pk = nil, id_value = nil, sequence_name = nil) #:nodoc:
-       execute(sql, name)
+       execute(sql, name) if try_prepared_insert(sql) == nil
        table = sql.split(" ", 4)[2]
        id_value || last_insert_id(table, sequence_name || default_sequence_name(table, pk))
      end
@@ -405,7 +584,8 @@
      end
 
      def select(sql, name = nil)
-       res = execute(sql, name)
+       res = try_prepared_select(sql)
+       res = execute(sql, name) if res == nil
        results = res.result
        rows = []
        if results.length > 0
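The heart of try_prepared_select is the normalization step: two queries
that differ only in their literal values have to map to the same cache
key. Here is that one step boiled down to a standalone helper (a
hypothetical, much cruder version than the regex dance in the patch):

```ruby
# Replace each quoted-string or bare-numeric literal in a WHERE clause
# with a numbered placeholder, collecting the literals as we go.
def normalize_where(conditions)
  values = []
  normalized = conditions.gsub(/('[^']*'|\b\d+\b)/) do |literal|
    values << literal
    "$#{values.size}"
  end
  [normalized, values]
end

a = normalize_where("name = 'Joel' AND id = 42")
b = normalize_where("name = 'Andreas' AND id = 7")

a[0]          # => "name = $1 AND id = $2"
a[0] == b[0]  # => true -- same key, so both hit the same prepared statement
a[1]          # => ["'Joel'", "42"]
```

The hard part the patch struggles with, and this sketch ignores
entirely, is recovering the column *types* for the PREPARE declaration,
which is exactly the information ActiveRecord throws away.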

Well, you sure have got an awesome proof of concept! Why don’t you take
it up with the maintainer of the PostgreSQL adapter or the core team? It
would be great to see more of this.

Regards,

Roderick

Roderick van Domburg wrote:

Well, you sure have got an awesome proof of concept! Why don’t you take
it up with the maintainer of the PostgreSQL adapter or the core team? It
would be great to see more of this.

Thank you! However…

If this were for the PostgreSQL adapter and only the PostgreSQL adapter,
that might be worth doing, but it’s more than that. Any database
supporting prepared statements could do this.
Let’s turn it into a wrapper around all the calls to select and insert
for all the databases that support it. That would be great, right?
Actually, not really, for several reasons.

  1. even if it was rewritten to do proper parsing of the SQL, it would
    still cause unnecessary overhead (possibly even more so if it did
    proper parsing)
  2. there’s no control over when the prepared statements are being made.
    Here a statement is prepared once it has been seen 3 times. In a
    production setting that would mean almost every statement ends up
    prepared, which isn’t optimal, and higher limits would still get just
    about every query prepared eventually.
  3. if this was solid it would still only be treating the symptoms and
    hiding the real problem

What real problem? ActiveRecord is, uhm… well, let’s just say it’s
not well structured for this kind of thing (I’d love to be proven wrong
on this) and would need to be refactored, and rewritten in large parts,
if this were to be done cleanly. SQL statements are built in several
parts, piece by piece, and no information is saved about what actually
goes into them.
That’s why we need to parse the SQL string to find things we should
already know. ActiveRecord made that SQL string, so ActiveRecord should
know what variables, tables and values are in there, and we should be
able to use that to do useful things like prepare/execute before it
hits the adapter.
Adapters are stupid, and they should be. I think ActiveRecord is stupid
because it’s throwing all this information away, and it shouldn’t be.

Ideally ActiveRecord would use a slightly more abstract format for
queries that then gets turned into SQL in one place… and while I
don’t know a lot about ActiveResource, standing on the outside and
looking in, it seems to me like it should be an ActiveRecord with
different storage.
A method like ‘delete’ shouldn’t build a SQL string, it should send a
delete message to the object’s storage object, whether it’s REST or SQL
or something else (file based storage could have some uses I bet).
The same goes for implementing caching for all “ActiveObjects”, which
should also be done above the SQL/REST stuff; there’s no way to do that
either from what I can tell.
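To make the “storage object” idea concrete, here is a toy sketch. Every
name in it is invented for illustration; nothing like this exists in
ActiveRecord:

```ruby
# A record delegates persistence verbs to a storage object instead of
# building SQL itself; swapping backends becomes a constructor argument.
class SqlStorage
  def delete(table, id)
    "DELETE FROM #{table} WHERE id = #{id.to_i}"  # would be executed, not returned
  end
end

class RestStorage
  def delete(table, id)
    "DELETE /#{table}/#{id}"  # would become an HTTP request
  end
end

class Record
  def initialize(table, id, storage)
    @table, @id, @storage = table, id, storage
  end

  # No SQL built here: just a delete message to whatever storage we have.
  def delete
    @storage.delete(@table, @id)
  end
end

Record.new("people", 1, SqlStorage.new).delete   # => "DELETE FROM people WHERE id = 1"
Record.new("people", 1, RestStorage.new).delete  # => "DELETE /people/1"
```

With this shape, a prepare/execute optimization (or caching) would live
in SqlStorage, where the tables, columns and values are still known as
data rather than as a finished SQL string.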

I bet I’m going to have to regret that last part. :slight_smile:
Anyways, the main focus is and should be that ActiveRecord isn’t very
well structured for implementing features like prepare/execute, and
until it is, I can’t see how it can be done well. And if we can’t do it
well, we might as well not do it at all (unless it’s for testing or
research, like this was).
If I’m wrong and there is a good way to implement it, it should be made
a top priority for 2.0 or 2.1, because this is more important than
providing even more helpers.
People like to call Ruby on Rails slow, and the defenders like to say
that most of the time is spent in the database anyway, yet here is a
huge database speedup that still isn’t done. Someone posted a couple of
weeks ago about how he had been called up by the DBA and told to shut
down his Rails application because it caused too much load on the
database, and 80% of the time was spent preparing statements.
I don’t think there’s any good excuse for this, and it’s time to do
something about it, but this here isn’t the way to go.

In my humble opinion. :wink:
Probably came across as a pompous bastard there, which wasn’t my
intention. I’d love to help work on a real solution to this problem,
but I don’t know where to start (and even if I did, I couldn’t do it
alone, and if I did it wouldn’t be accepted anyway). I’m trying to
show how useful prepared statements can be, to hopefully bring some
more attention to this.

On 9/21/06, Joel W. [email protected] wrote:

queries that then gets turned into SQL in one place… and while I
don’t know a lot about ActiveResource, standing on the outside and
looking in, it seems to me like it should be an ActiveRecord with
different storage.
A method like ‘delete’ shouldn’t build a SQL string, it should send a
delete message to the object’s storage object, whether it’s REST or SQL
or something else (file based storage could have some uses I bet).
Or implementing caching for all “ActiveObjects”, which should also be
done above the SQL/REST stuff, there’s no way to do that either from
what I can tell.

Joel, good input all around. Not to punt on this, but I fear that
refactoring will turn into a class-for-every-noun design spree/quagmire.
AR works really well for its common case; we don’t need to solve the
non-problems that some ORMs relish.

However, it’s getting creaky in spots and could use some love. A good
example is how [email protected] reworked eager associations:
http://dev.rubyonrails.org/ticket/3913

jeremy

You’ve got some good ideas here. I agree that we need to get prepared
statements in AR.

I just spent about 20 minutes working on this proof of concept. It’s
really ugly, but it shows a different approach to using PS.

Here’s how it works. First you extend AR::Base with the module below.
Then your class could look like:
class Person < ActiveRecord::Base
prepare :find_by_name
end

When the Person class is loaded, it will build a prepared statement
and execute it against the database. Then it defines find_by_name to
execute the PS instead of building a standard SQL query.

The main advantage that I can see is that the prepared statement is
explicit. Instead of trying to figure out if it should prepare a
statement after a certain number of queries, you can simply declare
that a particular method should use a prepared statement.

My implementation blows (it’s at the bottom of this post), but as I
said it was just a quick proof of concept. I’m not super well-versed
in AR, so I don’t know what exactly I can use. Ideally I’d like to
somehow get a string that represents the SQL that AR would generate on
its own, then use that to create the prepared statement.

I think it could be implemented to handle inserts easily.

There are some issues that need to be handled, like making sure not to
prepare an existing PS, or making sure one exists before executing it.
But those are of course things that can come later if this seems like
a good approach. Also I think some of this probably belongs in the
connector, though I’m not entirely sure what parts.

What do you think of this approach? Is it worth pursuing?

Pat

module PreparedPostgres
  def prepare(method_id)
    if match = /find_(by)_([_a-zA-Z]\w*)/.match(method_id.to_s)
      finder = determine_finder(match)
      attribute_names = extract_attribute_names_from_match(match)
      super unless all_attributes_exists?(attribute_names)

      cols = columns.select { |c| attribute_names.include?(c.name) }
      col_types = cols.collect { |c| ps_type c.type }
      conds = []
      cols.each { |c| conds << "#{c.name}=$#{conds.size + 1}" }
      prep = "PREPARE #{method_id.to_s} (#{col_types.join(", ")}) AS " +
             "SELECT * FROM #{table_name} WHERE #{conds.join(" AND ")};"
      connection.execute prep

      (class << self; self; end).instance_eval do
        define_method(method_id) do |*args|
          qargs = args.collect { |a| quote a }
          statement = "EXECUTE #{method_id.to_s}(#{qargs.join(", ")});"
          find_by_sql(statement).first
        end
      end
    end
  end

  private

  def ps_type(type)
    case type
    when :integer then :int
    when :string then :text
    end
  end
end
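To see what Pat’s module actually sends to PostgreSQL, the
string-building part can be exercised on its own. The model, table, and
column below are invented for illustration (a Person with a text `name`
column, per `prepare :find_by_name`):

```ruby
# Reproduce the PREPARE and EXECUTE strings the module would build.
method_id = :find_by_name
col_types = [:text]       # ps_type of a :string column
conds     = ["name=$1"]   # one condition per finder attribute
table_name = "people"

prep = "PREPARE #{method_id} (#{col_types.join(", ")}) AS " +
       "SELECT * FROM #{table_name} WHERE #{conds.join(" AND ")};"
# => "PREPARE find_by_name (text) AS SELECT * FROM people WHERE name=$1;"

qargs = ["'Joel'"]        # what quote would produce for "Joel"
statement = "EXECUTE #{method_id}(#{qargs.join(", ")});"
# => "EXECUTE find_by_name('Joel');"
```

So the class-load cost is one PREPARE round trip per declared finder,
and each call afterwards is a single short EXECUTE.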

I’m with you … but it’s about #10 on my list right now.

I am glad to see that in the last week we have gone from “we should do
something” to “here’s a proof of concept.”

Here’s a couple of considerations and implications:

  1. Only the database connector knows (or should be responsible for
    knowing) whether or not the underlying RDBMS supports prepared
    statements.

  2. ActiveRecord needs to be the one responsible for determining
    whether or not a prepared statement exists. Otherwise it will
    generate a lot of unneeded SQL strings that are subsequently
    ignored in favor of the prepared statements.

The implication is that the availability of prepared statements must be
exposed as a standardized (boolean) property of the RDBMS connector,
and then ActiveRecord can ignore it (current behavior) or use it later
as it matures.

If a mechanism for generating a “footprint” for a prepared statement
can be determined that is more efficient than generating an SQL string,
then we are in very good shape.

The “find_by_id” idiom is a good trivial case, reflecting 90% of the
SQL in my own Rails apps. However, it is incomplete.

A trivial footprint consists of the SQL from the “from” clause to the
end of the statement, prior to sanitize/populate parameters. The code
for this pre-exists, and as a String it provides a good key into a hash
map. Once the “from” and “where” clauses are computed, the actual
prepared statement could be obtained by something along these lines

# This is entirely pseudocode.  Any resemblance to Ruby is
# randomly intentional but entirely unreliable

footprint = ...  # existing "from ... where ..." code
if connector.supports_prepared_statement then
  stmt = stmt_map[footprint]
  # populate parameters ...
  ...
  # execute query
  result = stmt.execute
else
  # existing code
end

Unless specifically noted (like in find_by_sql), prepared statement
queries would return all columns from all addressed tables. find_by_sql
should use the full SQL as a footprint.
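The footprint-keyed map above can be sketched a little more concretely.
Everything here is invented for illustration (class names, the stub
connection), just a possible shape for a per-connection cache that
prepares lazily the first time a footprint is seen:

```ruby
# Map each query footprint (the "FROM ... WHERE ... $n" skeleton) to a
# prepared-statement name, issuing the PREPARE only on first sight.
class StatementMap
  def initialize(connection)
    @connection = connection
    @map = {}
  end

  def statement_for(footprint, select_clause)
    @map[footprint] ||= begin
      name = "prep#{@map.size}"
      @connection.execute("PREPARE #{name} AS #{select_clause} #{footprint}")
      name
    end
  end
end

# A stub connection that just records what it executes.
class FakeConnection
  attr_reader :log
  def initialize; @log = []; end
  def execute(sql); @log << sql; end
end

conn = FakeConnection.new
map = StatementMap.new(conn)
map.statement_for("FROM people WHERE name = $1", "SELECT *")  # => "prep0"
map.statement_for("FROM people WHERE name = $1", "SELECT *")  # cached, no new PREPARE
conn.log.size  # => 1
```

Because the footprint is computed before parameters are populated, the
second and later hits skip SQL-string generation for the conditions as
well as the database’s parse/plan step.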
