Company.count taking over 3 seconds on a table of over 500K rows

Why would:

Company.count

take 3.5 seconds, consistently? If I pass conditions that limit the result
to about 5K rows, the count comes back in milliseconds. What is up with
that?


John K.
[email protected]

http://www.kopanas.com

http://www.soen.info

Hey John,

Company.count runs a "select count(*) from companies" (or some db-specific
variation), which by nature must scan every row of your table.

This is something you may be able to tune from your database's
perspective - try a Google search on "select count" and your database
name.

Cheers,
Jodi
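
As a rough sketch of the difference John describes (the exact count
signature varies a bit between Rails versions, and the city column here is
made up purely for the example):

# Unscoped count - roughly: SELECT count(*) AS count_all FROM companies
total = Company.count

# Count with conditions - roughly:
#   SELECT count(*) AS count_all FROM companies WHERE (city = 'Montreal')
# Only the matching rows are counted, which is presumably why John sees
# milliseconds when the conditions narrow things down to ~5K rows.
in_city = Company.count(:conditions => ["city = ?", "Montreal"])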

count a result set instead of the model.

comps = Company.find(:all)
comps.count

# best guess: you're executing multiple queries as part of count (2n+1)


Keynan, using that method, I think he'd be instantiating half a million
objects.

You'd be better off letting the database do the counting for you.

J

Just to chime in - I have a table with 412K entries, and the simple
“select count(*)…” takes 0.01 seconds. So it seems clear that
something else is going on here.

Hey,

How long does it take to run this query directly against the DB?

select count(*) as count_all from companies

I don’t have a table with 500K records handy but I do have one with
50K and Model.count is instantaneous for me.

Here’s a guess: is your Company model using STI? If it is, and you
haven’t indexed your inheritance column, then things might get slow
because the query is actually something like:

select count(*) as count_all from your_sti_table where type = 'Company'

HTH,
Trevor
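
If STI does turn out to be the issue, a minimal sketch of the fix,
assuming the table is companies and the inheritance column is the default
type:

class AddTypeIndexToCompanies < ActiveRecord::Migration
  def self.up
    # Lets "... WHERE type = 'Company'" be answered from an index
    # instead of a full table scan.
    add_index :companies, :type
  end

  def self.down
    remove_index :companies, :type
  end
end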


Responding to whoever said that would create an object for every record in
the table:

ActiveRecord does everything else automagically, so can't it find() a
cursor and then decline to populate it until we start calling .each?

I know that's probably a little too much magic. Nothing else around here
seems to value streaming over buffering. But cursors might find their
record counts almost as cheaply as a database COUNT(*) would too, right?


Phlip
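
For what it's worth, ActiveRecord of that vintage doesn't expose a cursor,
but the streaming half of that idea can be approximated by walking the
table in id-ordered batches, so only one batch of objects lives in memory
at a time. A sketch, assuming an integer primary key named id; the batch
size of 1000 is arbitrary:

batch_size = 1000
last_id = 0
loop do
  batch = Company.find(:all,
                       :conditions => ["id > ?", last_id],
                       :order      => "id",
                       :limit      => batch_size)
  break if batch.empty?
  batch.each { |company| }   # do the per-record work here
  last_id = batch.last.id
end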

The speed of SELECT COUNT(*) depends somewhat on your database. For
example, if you are running MySQL and your table is a MyISAM one, then the
row count is part of the table metadata, so COUNT(*) is pretty much
instant. Other databases may also have that property.

On an InnoDB table, MySQL has to run through an index (or, failing that,
the whole table) to find the row count; the point at which things become
slow probably corresponds to when MySQL doesn't have enough memory to read
the whole thing in one go, or something like that.

Fred
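
If an exact figure isn't strictly required, one MySQL-specific workaround
is to read InnoDB's own row estimate instead of running COUNT(*). A
sketch, assuming the table is named companies; note the Rows value is only
an approximation for InnoDB:

# SHOW TABLE STATUS returns one row of metadata per table; for InnoDB the
# "Rows" column is an estimate, but it comes back instantly regardless of
# table size.
status = Company.connection.select_one("SHOW TABLE STATUS LIKE 'companies'")
approx_rows = status["Rows"].to_i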

Very interesting… I am using InnoDB. Maybe that is why I am having
problems. Plus I probably don’t have enough RAM on my dev box.
Hmmm… thanks :)



John K.

Adam G. wrote:

try 'select count(id) from…' instead of 'select count(*)…'.

(I don’t know if someone asked this, but) if you have a billionty-one
extra tables all joined in with rampant has_many directives, would a
lowly .count() accidentally traverse them all?


Phlip
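
For what it's worth, a plain count shouldn't wander into the associations
on its own: roughly, the generated SQL only touches the companies table
unless you explicitly ask for a join (with :joins or, in Rails versions
that support it, :include). A sketch with a made-up has_many, purely for
illustration:

class Company < ActiveRecord::Base
  has_many :employees   # hypothetical association, for illustration only
end

# Despite the has_many above, this still issues roughly:
#   SELECT count(*) AS count_all FROM companies
Company.count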

Hi John,
try 'select count(id) from…' instead of 'select count(*)…'. I'm not sure
about MySQL, but other databases (Sybase and DB2) will read every column
of each row to do the count, which means a full table scan. If you only
scan the id column, it will be faster. If it is indexed, it will be a lot
faster.

good luck!
Adam
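
If you want to try Adam's suggestion from Rails rather than the MySQL
console, count_by_sql lets you hand ActiveRecord the exact statement (the
companies table name is assumed here):

# Counts only the id column; whether this actually beats COUNT(*) depends
# on the database, and on id being indexed (as the primary key it normally
# is).
Company.count_by_sql("SELECT COUNT(id) FROM companies")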