How to count a group of data

okkezSS · November 9, 2010, 1:26pm

Hello everyone,

Let’s say I have the following set of data in my database table:

Cars table

[Brand] [Type]
Mercedes CL63
BMW M3
Mercedes CL63
Audi RS4
BMW M3
BMW M3

I would like to know how many cartypes I have of a particular brand.
So in this case I would expect an output something like:

2x Mercedes CL63
3x BMW M3
1x Audi RS4

How can I generate such an output? I have really no clue how to start.
Suggestions, tips that can pull me in the right direction are welcome.

Thanks a bunch.

John

jngo · November 9, 2010, 2:37pm

John N. wrote in post #960298:

Hello everyone,

Let’s say I have the following set of data in my database table:

Cars table

[Brand] [Type]
Mercedes CL63
BMW M3
Mercedes CL63
Audi RS4
BMW M3
BMW M3

I would like to know how many cartypes I have of a particular brand.
So in this case I would expect an output something like:

2x Mercedes CL63
3x BMW M3
1x Audi RS4

How can I generate such an output? I have really no clue how to start.

You would do this by using SQL aggregate functions (in this case,
COUNT(*) ). If you don’t know about those, then your friendly local SQL
reference is your first stop.

Once you know about aggregate functions, look at the
ActiveRecord::Calculations module (at least in Rails 2; it might have
got moved in Rails 3), which abstracts these functions and provides a
nice AR interface.

Suggestions, tips that can pull me in the right direction are welcome.

My usual tip: learn SQL. You may not have to write much of it when
using Rails, but you need to know how SQL databases basically work. AR
exists to automate and abstract, not to prevent you from having to learn
SQL.

Thanks a bunch.

John

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

jngo · November 9, 2010, 3:27pm

Hi,

Why the data is redundant, ids there any relation behind that.

Mercedes CL63 why in two times is there any factor to distinguish in
between than type and Brand?

On Tue, Nov 9, 2010 at 5:56 PM, John N. [email protected] wrote:

Mercedes CL63
1x Audi RS4

–
Thanks & Regards,
MuraliDharaRao.T
+91-9642234646

jngo · November 9, 2010, 3:48pm

Murali Tirupati wrote in post #960320:

Hi,

Why the data is redundant, ids there any relation behind that.

Mercedes CL63 why in two times is there any factor to distinguish in
between than type and Brand?

Good catch. I was too focused on the proximate question.

More advice for the OP, then: your DB contains repetition. That’s
virtually always a bad thing. The process of removing this repetition
is called normalization. You should always (except for a few special
cases) normalize your schema to Third Normal Form (3NF) at a bare
minimum, and usually to 5NF.

Yes, I know I’m throwing around jargon. Read about DB normalization
(the Wikipedia articles are excellent) and apply it religiously.

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

jngo · November 9, 2010, 7:46pm

Michael P. wrote in post #960375:
[…]

On 9 November 2010 13:37, Marnen Laibow-Koser [email protected]
wrote:

How can I generate such an output? I have really no clue how to start.

You would do this by using SQL aggregate functions (in this case,
COUNT(*) ). If you don’t know about those, then your friendly local SQL
reference is your first stop.

In addition to using the SQL functions, Ruby and Rails both include
functions for grouping (and sort, if you wish) enumerable objects.
Have a look at:
Enumerable
module Enumerable - RDoc Documentation

The choice as to whether to do the same thing in code or in the DB is
down to you, and your evaluation of the efficiency/benefit one may
give you over the other (personal preference comes into it a lot too!

However, here’s a rule of thumb: never use Ruby for things like a
simple count of DB records. Let the DB do the things that it’s good at.
That includes mass operations on large amounts of data. Complex
calculations and application programming are generally better done in
the app layer, of course.

Doing a count of DB records on the Ruby side? I wouldn’t dignify that
with the term “personal preference”.

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

jngo · November 9, 2010, 7:28pm

On 9 November 2010 14:48, Marnen Laibow-Koser [email protected]
wrote:

Mercedes CL63 why in two times is there any factor to distinguish in
between than type and Brand?

Good catch. I was too focused on the proximate question.

I’d assume (and happy to be proved wrong if the OP wants to confirm)
that the “cars” table has other fields that identify different
instances of physical cars (say, for a vehicle dealership tracking its
sales), and there are fields other than just make and model in the
row (such as colour, engine size, fuel, etc)

On 9 November 2010 13:37, Marnen Laibow-Koser [email protected]
wrote:

How can I generate such an output? I have really no clue how to start.

You would do this by using SQL aggregate functions (in this case,
COUNT(*) ). If you don’t know about those, then your friendly local SQL
reference is your first stop.

In addition to using the SQL functions, Ruby and Rails both include
functions for grouping (and sort, if you wish) enumerable objects.
Have a look at:
http://api.rubyonrails.org/classes/Enumerable.html#method-i-group_by
http://ruby-doc.org/core/classes/Enumerable.html

The choice as to whether to do the same thing in code or in the DB is
down to you, and your evaluation of the efficiency/benefit one may
give you over the other (personal preference comes into it a lot too!

jngo · November 9, 2010, 8:14pm

Michael P. wrote in post #960385:

On 9 November 2010 18:46, Marnen Laibow-Koser [email protected]
wrote:

However, here’s a rule of thumb: never use Ruby for things like a
simple count of DB records. Let the DB do the things that it’s good at.

Doing a count of DB records on the Ruby side? I wouldn’t dignify that
with the term “personal preference”.

As a thought-experiment, how would you handle several counts in a row
on a rapidly changing table for some form of end-user report?
If I do one SQL query for a count by one set of criteria, and then do
another query for a similar query, but just grouped differently, I may
get different numbers because the underlying records in the table have
changed.

That’s what transactions are for – guaranteeing atomicity and
consistent state over multiple queries. This is a perfect use case for
them:

User.transaction do
mikes = User.count :conditions => {:name => ‘michael’}
active = User.count :conditions => {:active => true}
end

If I do the same thing in code; I do one DB query, and then use the
Ruby/Rails methods to manipulate that data how I need, and I can
always be assured that I am at least always operating on the set
same records for that request.

But you’re doing far too big a query. You’re asking for (let’s say)
thousands of records, which Ruby will then parse in memory.

If you do the count on the DB side, then the DB will be able to use
whatever indices and performance hacks are available to it to do the
counting more quickly – and yes, more maintainably – then your Ruby
application could.

This is the kind of situation that I have a “personal preference” to
do in code, not in the DB (although the DB may be better at it,
there’s a chance my numbers won’t be the same which will confuse
users).

Then your preference is, I think, poorly founded, as it does not take
the facts of the situation fully into account, and does not take full
advantage of the power of the DB.

Do you disagree with my reasoning? If so, why?

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

jngo · November 9, 2010, 8:06pm

On 9 November 2010 18:46, Marnen Laibow-Koser [email protected]
wrote:

However, here’s a rule of thumb: never use Ruby for things like a
simple count of DB records. Let the DB do the things that it’s good at.

Doing a count of DB records on the Ruby side? I wouldn’t dignify that
with the term “personal preference”.

As a thought-experiment, how would you handle several counts in a row
on a rapidly changing table for some form of end-user report?
If I do one SQL query for a count by one set of criteria, and then do
another query for a similar query, but just grouped differently, I may
get different numbers because the underlying records in the table have
changed.

If I do the same thing in code; I do one DB query, and then use the
Ruby/Rails methods to manipulate that data how I need, and I can
always be assured that I am at least always operating on the set
same records for that request.

This is the kind of situation that I have a “personal preference” to
do in code, not in the DB (although the DB may be better at it,
there’s a chance my numbers won’t be the same which will confuse
users).

jngo · November 9, 2010, 8:50pm

On 9 November 2010 19:14, Marnen Laibow-Koser [email protected]
wrote:

Do you disagree with my reasoning? If so, why?

Not at all… quite the opposite; I’ve just slapped myself on the
forehead. I can’t believe it’s never occurred to me to use
transactions for data-retrieval queries (use them for updating and
inserting all the time)

Regards,

jngo · November 10, 2010, 2:49pm

+1

jngo · November 10, 2010, 3:10pm

On 10 November 2010 13:48, pepe [email protected] wrote:

+1

Does that mean that you agree that Michael should slap himself on the
forehead?
Or possibly if the reader is Michael.

Colin

jngo · November 15, 2010, 5:15pm

+1

Does that mean that you agree that Michael should slap himself on the
forehead?
Or possibly if the reader is Michael.

Colin

First option. It also increases the count of slapped foreheads.

I haven’t tested what Marnen suggested yet, though. I’m sure it will
work but I just have to see for myself.

How to count a group of data

[Brand] [Type] Mercedes CL63 BMW M3 Mercedes CL63 Audi RS4 BMW M3 BMW M3

[Brand] [Type] Mercedes CL63 BMW M3 Mercedes CL63 Audi RS4 BMW M3 BMW M3

Best,

Best,

Best,

Best,

[Brand] [Type]
Mercedes CL63
BMW M3
Mercedes CL63
Audi RS4
BMW M3
BMW M3

[Brand] [Type]
Mercedes CL63
BMW M3
Mercedes CL63
Audi RS4
BMW M3
BMW M3