Size of array returned by named_scope with :group/:select distinct

Here’s the example:
$ ruby script/generate scaffold thing name:string
class Thing < ActiveRecord::Base
named_scope :distinct, :group => :name
end

:select => “DISTINCT things.*” would work in the same way.

Start up script/console:

Thing.new(:name => “a”).save!
=> true

Thing.new(:name => “a”).save!
=> true

Thing.all
=> [#<Thing id: 1, name: “a”, created_at: “2009-10-16 02:34:28”,
updated_at: “2009-10-16 02:34:28”>, #<Thing id: 2, name: “a”,
created_at: “2009-10-16 02:34:28”, updated_at: “2009-10-16 02:34:28”>]

Thing.distinct
=> [#<Thing id: 2, name: “a”, created_at: “2009-10-16 02:34:28”,
updated_at: “2009-10-16 02:34:28”>]

Thing.distinct.size
=> 2

Thing.distinct.all.size
=> 1

As you can see, Thing.distinct.size is mis-reporting the size of the
result. I can’t figure out what’s causing this, or why adding .all
fixes it. Running the query straight into the database predictably
returns just one record:

sqlite> SELECT * FROM “things” GROUP BY name;
id = 2
name = a
created_at = 2009-10-16 02:34:28
updated_at = 2009-10-16 02:34:28

Am I missing something obvious?

I just stumbled upon the same problem… From what I can tell,
ActiveRecord doesn’t add the GROUP BY clause when it’s doing a COUNT.
Take a look in your log. You should see two nearly identical queries,
but one will be a SELECT with the GROUP BY and one will be a COUNT
without the GROUP BY.

Something like this:

SELECT things.* FROM things GROUP BY name

and then

SELECT COUNT(*) AS count_all FROM things; # NO GROUP BY!

The thing is, GROUP BY with an aggregate function will return the
number of items in each group. So if you have 3 Things, “a”, “a”, and
“b”, adding a GROUP BY at the end of the COUNT query will give you
something like this:

±----------+
| count_all |
±----------+
| 2 | # <= a’s
| 1 | # <= b
±----------+

What you really want to know is the number of rows in that resultset,
since that’s the number of groups in your query. I’m not sure if
there’s a good way of doing that without running a subquery. Any
ideas?

Also, the reason why Thing.distinct.all.size works is because .all
converts the association proxy returned by the named scope into an
array. Rather than running an SQL query to determine the size of the
dataset, after calling .all it just returns the number of items in the
actual array of data. Since the GROUP is added properly to the actual
data fetch query, .all returns the proper number because that array
has the proper data in it.

This seems like a bug to me… Any AR hackers out there know why this
might be intended behavior, or should I work on a patch?

  • Brandon

That makes sense - I was thinking of what named_scope returns as an
Array, when really it’s ActiveRecord::NamedScope::Scope. I see now in
the source that it just delegates Array methods to find(:all), which
is why you can do things like Thing.distinct.each and why
Thing.distinct.to_s returns an Array’s string notation. I’m not sure
why it has to define a size method, that seems like something it could
just hand off to find(:all) too.

I’ll have to think about that subquery.

On Oct 16, 10:08 am, Brandon D. [email protected]

I found that calling named_scope.all.size resolves this issue. It’s not
perfect, but it does the job.