Optimising code

bartb · October 5, 2006, 10:49pm

I’m running a website that displays projects and it takes too long for
one
page to render. It shows all projects but that takes up to 1 second, a
bit
too much for a very much visited page. How can I improve the following
code
snippet without starting to cache?

<% for project in @projects %>
<% invalidproject = (controller.authenticate and
((Connection.count([“project_id = ? and kind = ‘p’”, project.id]) == 0)
or
(Connection.count([“project_id = ? and kind = ‘s’”, project.id]) == 0)))
%>

Where Project has_many :connections. Can I use eager loading there? This
count method generates way to much queries now…

Bart

bartb · October 5, 2006, 10:53pm

Have you looked at your logs? Eager loading seems like a good choice,
but it would be great to know whether you’re really executing a lot
of SQL prior to making such an optimization.

bartb · October 5, 2006, 11:01pm

Bart B. wrote:

Where Project has_many :connections. Can I use eager loading there? This
count method generates way to much queries now…

Also, make sure you use pagination. It will make a huge difference.

bartb · October 6, 2006, 4:20pm

Steven R. wrote:

Have you looked at your logs? Eager loading seems like a good choice,
but it would be great to know whether you’re really executing a lot
of SQL prior to making such an optimization.

I put the server in development mode and tried the page:
Completed in 0.52014 (1 reqs/sec) | Rendering: 0.18694 (35%) | DB:
0.24157
(46%)
That’s not fun, because it shows lots of queries like
SQL (0.000585) SELECT count(*) AS count_all FROM assignments WHERE
(project_id = 11)
There are around 100 projects there so I don’t like the idea of
hammering
the database like that…
Can I force a join somewhere to get around those queries?

Thanks for your help,
Bart

bartb · October 6, 2006, 12:10am

On Thursday 05 October 2006 22:40, Bart B. wrote:

I’m running a website that displays projects and it takes too long
for one page to render. It shows all projects but that takes up to 1
second, a bit too much for a very much visited page. How can I
improve the following code snippet without starting to cache?

<% for project in @projects %>
<% invalidproject = (controller.authenticate and
((Connection.count([“project_id = ? and kind = ‘p’”, project.id]) ==
0) or (Connection.count([“project_id = ? and kind = ‘s’”,
project.id]) == 0))) %>

First and foremost you can improve this code by pulling it out of the
view. At least the counting stuff should be pushed back into the
Project model class, the decision whether a project is valid or not
belongs exactly there. Regarding authentication, if it involves more
than a simple, single method call, hide it away in a helper method.

Where Project has_many :connections. Can I use eager loading there?
This count method generates way to much queries now…

When you say that you don’t want to use caching, I take it, you’re
talking of page or fragment caching. Look up counter_cache in the docs
for belongs_to, that’s a way of caching much better suited to your
problem. I would treat the question whether to use eager loading
separately. Eager loading is great if you know that you have to access
all the loaded objects eventually. Otherwise it is costly.

Michael

–
Michael S.
mailto:[email protected]
http://www.schuerig.de/michael/

bartb · October 6, 2006, 4:39pm

Michael S. wrote:

First and foremost you can improve this code by pulling it out of the
view. At least the counting stuff should be pushed back into the
Project model class, the decision whether a project is valid or not
belongs exactly there. Regarding authentication, if it involves more
than a simple, single method call, hide it away in a helper method.

Yes, I’d think you want a Project.invalid_projects() methods which would
produce a single hit to the database rather than a hit for each of the
100 projects.

bartb · October 6, 2006, 4:54pm

Michael S. wrote:

or (Connection.count([“project_id = ? and kind = ‘s’”,
project.id]) == 0))) %>

First and foremost you can improve this code by pulling it out of the
view. At least the counting stuff should be pushed back into the
Project model class, the decision whether a project is valid or not
belongs exactly there. Regarding authentication, if it involves more
than a simple, single method call, hide it away in a helper method.

The counting is put into the Project model class. But is it enough to
just
put this method there:
def invalid_for_students
return ((Connection.count([“project_id = ? and kind = ‘p’”, id])
== 0)
or (Connection.count([“project_id = ? and kind = ‘s’”, id]) == 0))
end
That doesn’t feel very optimized to me, altough I wonder wether it can
be
improved. (See below also.)

The authentication does not need more method calls as it works based on
sessions. I don’t a helper method would work here.

Where Project has_many :connections. Can I use eager loading there?
This count method generates way to much queries now…

When you say that you don’t want to use caching, I take it, you’re
talking of page or fragment caching. Look up counter_cache in the docs
for belongs_to, that’s a way of caching much better suited to your
problem. I would treat the question whether to use eager loading
separately. Eager loading is great if you know that you have to access
all the loaded objects eventually. Otherwise it is costly.

Yes I meant page and fragment caching. counter_cache seems nice but due
to
the nature of the associations it looks difficult. I have to count based
on
a condition, which does not seem possible with counter caching.

As mentioned in the other post, I now think the following series of
queries
is the problem:
SELECT count(*) AS count_all FROM assignments WHERE
(assignments.project_id
= 311)
But I wonder wether that can be improved. The only reference to
assignments
in the view is the method project.assignments.size so that triggers it.

I have some fairly complicated SQL queries to get to the list of
projects,
similar to this:
SELECT p.* FROM projects AS p WHERE p.kind like ‘%somekind%’ and
p.active
= ‘y’ and
(SELECT count() FROM connections WHERE project_id = p.id and kind =
‘p’) >
0 and
(SELECT count() FROM connections WHERE project_id = p.id and kind =
‘s’) >
0 and
(SELECT count() FROM assignments WHERE project_id = p.id and year < ?
) =
0 and
(SELECT count() FROM assignments WHERE project_id = p.id and year >= ?
) <=
p.number_of_students
ORDER BY title ASC

This is a fairly heavy query and it is repeated for another p.kind with
other years too. I did not write it and I do not like it, but it
expresses
the correct application logic. This seems to be the root cause, but how
can
I fix it?
I’m thinking about splitting the connection table into different kinds,
but
then there are more tables and models of course.
Which makes me wonder wether it’s not better splitting out the project
table. It’s based on a (currently) fixed number of kinds, altough I
think
the like p.kind part of the query should not be too heavy.

Thanks for your input, this is really interesting!

Bart

bartb · October 6, 2006, 4:58pm

Jamey C. wrote:

Also, make sure you use pagination. It will make a huge difference.

I know it would, but that’s not possible here. For political reasons I
can’t
use pagination.

Bart

bartb · October 6, 2006, 6:50pm

Hey,

okay, you can use eager loading to reduce the number of hits to the
database (there’s a ‘but’ coming at the end of this:

@projects = Project.find(:all, :conditions => [‘your_conditions = ?’,
:whatever],
:include => [:connections, :assignments])

@projects.each do |project|

get the count of ‘p’ kind connections

project.connections.inject(0) {|m,v| m += 1 if v.kind == ‘p’; m}

get the count of ‘s’ kind connections

project.connections.inject(0) {|m,v| m += 1 if v.kind == ‘p’; m}
end

So… because you’ve loaded all the records into memory as regular
ruby objects you can traverse them and do whatever calculations/tests
you need, without hitting the database again.

BUT (told you it was coming) - this may actually be slower than the
way you’re doing things now. If you have a massive query cache
(mysql) and these queries are executed repeatedly you will probably
find that eager loading with :include is much slower. You also need
to make sure you’ve correctly indexed all your join columns if you use
eager loading - otherwise it’s virtually guaranteed to be slower (even
without a big query cache to skew the results).

You seem to be missing one bit of the puzzle which will reduce your
hundreds-of-queries to a small handful - namely grouping your counts.
Consider this:

@projects = Project.find(:all, :conditions => ‘yadda’) # NO :include
HERE

@p_counts = Connection.count(:group => ‘project_id’,
:conditions => [‘project_id in (?) and kind = ?’,
@projects.collect(&:id), ‘p’])

@s_counts = Connection.count(:group => ‘project_id’,
:conditions => [‘project_id in (?) and kind = ?’,
@projects.collect(&:id), ‘s’])

That’s 3 queries, no eager loading, but everything you need to figure
out the counts for connections of kind p and s for the projects you’ve
loaded.

@p_counts and @s_counts are hashes keyed on project_id, where the
value is the counts. Note that if one of the projects has no p or s
connections it won’t have an entry in the hash. As such you need to
use to_i to coerce any nil lookups to be 0. See below:

@projects.each do |project|

any ‘p’ connections for this project?

@p_counts[project.id].to_i == 0

any ‘s’ connections for this project?

@s_counts[project.id].to_i == 0
end

Something to try out anyhow, and something that you can use to compare
against an eager loading solution. I’d be willing to bet a beer that
grouped counts will be faster than eager loading (at least 2 X as
fast).

Hope this helps,
Trevor

–
Trevor S.
http://somethinglearned.com

bartb · October 6, 2006, 8:55pm

On 10/6/06, Bart B. [email protected] wrote:

0 and
the correct application logic. This seems to be the root cause, but how
can
I fix it?

Don’t know which DB you use, but I would recommend
a) a good book about SQL
b) using your database’s performance analyzer (e.g. EXPLAIN PLAN).

There are only a few rules to speed up database access:

optimize your queries
optimize your queries
…

With Oracle, your query would most likely cause (assuming you have
indexes
only on the primary and foreign keys):

full table scan on projects(1)
- 2x index scan + table access on connections (for each project found
  in
  step (1))
- 2x index scan + table access on assignments (for each project found
  in
  step (1))

So, assuming projects returns 100 rows, this will run 200 times through
each
of the other tables (bad, when these countain a few million rows).

I would at first get rid of the connection scans (easy - a simple inner
join), and then try to speed up the remaining subqueries and the join
(perhaps by using indexes on connections.kind and assignments.year).

A quick shot to optimize this thing in Oracle (I’m sure this is not the
optimal query for - say - postgresql) would result in:

SELECT p.*
FROM connections c1,
connections c2,
projects p
WHERE c1.project_id = p.id
and c1.kind = ‘p’
and c2.project_id = p.id
and c2.kind = ‘s’
and not exists (
SELECT 1
FROM assignments
WHERE project_id = p.id
AND year < ?)
and p.number_of_students >= (
SELECT count(*) FROM assignments WHERE project_id = p.id and year

= ?
)
and p.kind like ‘%somekind%’
and p.active = ‘y’

Have fun!

Thomas

bartb · October 8, 2006, 10:23am

Michael S. wrote:

Judging by the shown code, I’d say the original poster has more pressing
needs than to optimize queries. Squeezing performance out of
ill-conceived code is the wrong approach. The problem appears to be one
of algorithm, not micro-optimization.

I know that the data modelling is not optimal. This is an inherited
project
and I don’t like the speed right now.
I’m strongly thinking of making separate tables for connections with
kind p
and kind s, that would remove a lot of “select from connections where
kind
= ‘s’” queries…

Bart

bartb · October 6, 2006, 10:11pm

On Friday 06 October 2006 20:27, T wenrich wrote:

Don’t know which DB you use, but I would recommend
a) a good book about SQL
b) using your database’s performance analyzer (e.g. EXPLAIN PLAN).

There are only a few rules to speed up database access:

optimize your queries

optimize your queries

…

Judging by the shown code, I’d say the original poster has more pressing
needs than to optimize queries. Squeezing performance out of
ill-conceived code is the wrong approach. The problem appears to be one
of algorithm, not micro-optimization.