What is the most efficient way to split a table into 2 groups?

oldbug · September 28, 2009, 11:33pm

I have the following:

@lot = Lot.find(params[:id])

part_nums = Part.all(:conditions => [“id <> ?”, @lot.part.id])

I guess I should mention that

Lot :belongs_to => :part

I was looking at the log following the execution of these two
statements and I saw something like this:

Lot Load (0.4ms) SELECT * FROM “lots” WHERE (“lots”.“id” = 13)
Part Load (0.3ms) SELECT * FROM “parts” WHERE (“parts”.“id” = 2)
Part Load (0.9ms) SELECT * FROM “parts” WHERE (id <> 2)

It looked a bit silly to me – first I grab a record from the “parts”
table with an ID of 2, then I grab all the records from the parts
table whose ID is not 2.

I played around a little with :include clauses, thinking that I
should, at least, be able to fetch the record from the “lots” table
and the “parts” table simultaneously (with something like a joins
clause and a “lots.part_id = parts.id” WHERE clause), but didn’t get
anywhere with that.

I will end up leaving this the way it is (most likely), especially
since this isn’t the right phase of development to be worrying about
optimization, but I am curious how one might do this most efficiently.

Is it most efficient to grab 1 record where record.id = blah and then
all the (rest of the) records where record.id <> blah?

Is it more efficient to grab all the records at once and write some
ruby code to select the one record from the rest? If so, what would
that code look like? I don’t like this:

everything = Part.all
the_one = everything.select {|x| x.id == 2}
the_rest = everything.reject {|x| x.id != 2}

That’s going to iterate over all of the records. twice! in interpreted
code!

Any thoughts, ideas, or snide remarks?

–wpd

oldbug · September 28, 2009, 11:44pm

On Sep 28, 10:32 pm, Patrick D. [email protected] wrote:

I will end up leaving this the way it is (most likely), especially
since this isn’t the right phase of development to be worrying about
optimization, but I am curious how one might do this most efficiently.

Is it most efficient to grab 1 record where record.id = blah and then
all the (rest of the) records where record.id <> blah?

Is it more efficient to grab all the records at once and write some
ruby code to select the one record from the rest? If so, what would
that code look like? I don’t like this:

Try both, benchmark them - don’t take some random person on the
internet’s word for it!

everything = Part.all
the_one = everything.select {|x| x.id == 2}
the_rest = everything.reject {|x| x.id != 2}

That’s going to iterate over all of the records. twice! in interpreted code!

partition will do it in one

Fred

oldbug · September 29, 2009, 12:07am

Patrick D. wrote:

I have the following:

@lot = Lot.find(params[:id])

part_nums = Part.all(:conditions => [“id <> ?”, @lot.part.id])

I guess I should mention that

Lot :belongs_to => :part

I was looking at the log following the execution of these two
statements and I saw something like this:

Lot Load (0.4ms) SELECT * FROM “lots” WHERE (“lots”.“id” = 13)
Part Load (0.3ms) SELECT * FROM “parts” WHERE (“parts”.“id” = 2)
Part Load (0.9ms) SELECT * FROM “parts” WHERE (id <> 2)

It looked a bit silly to me – first I grab a record from the “parts”
table with an ID of 2, then I grab all the records from the parts
table whose ID is not 2.

I think what’s happening here is this: you’re calling @lot.part.id, so
Rails needs to load @lot.part, which accounts for the extra query. To
fix, use the :joins option on Lot.find, or simply call @lot.part_id.

I played around a little with :include clauses, thinking that I
should, at least, be able to fetch the record from the “lots” table
and the “parts” table simultaneously (with something like a joins
clause and a “lots.part_id = parts.id” WHERE clause), but didn’t get
anywhere with that.

:includes is useless here. :joins will do the trick.

I will end up leaving this the way it is (most likely), especially
since this isn’t the right phase of development to be worrying about
optimization, but I am curious how one might do this most efficiently.

Is it most efficient to grab 1 record where record.id = blah and then
all the (rest of the) records where record.id <> blah?

Is it more efficient to grab all the records at once and write some
ruby code to select the one record from the rest?

That’s what I think I’d do.

If so, what would
that code look like? I don’t like this:

everything = Part.all
the_one = everything.select {|x| x.id == 2}
the_rest = everything.reject {|x| x.id != 2}

That’s going to iterate over all of the records. twice! in interpreted
code!

So just iterate once, and test for the special value as you go. If you
can’t do that, then make 2 DB queries.

Any thoughts, ideas, or snide remarks?

–wpd

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

oldbug · September 29, 2009, 12:18am

Frederick C. wrote:
[…]

That’s going to iterate over all of the records. ï¿½twice! ï¿½in interpreted code!

partition will do it in one

Hey, that’s good to know!

If you need to do the partitioning in the DB, you could use ORDER BY
abs(id - :magic_id). That way the first record will be the magic one.

Fred

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

oldbug · September 29, 2009, 2:37pm

On Mon, Sep 28, 2009 at 6:18 PM, Marnen Laibow-Koser
[email protected] wrote:

Frederick C. wrote:
[…]

That’s going to iterate over all of the records. ï¿½twice! ï¿½in interpreted code!

partition will do it in one

Hey, that’s good to know!

If you need to do the partitioning in the DB, you could use ORDER BY
abs(id - :magic_id). Â That way the first record will be the magic one.

Oh that’s clever! I like that!

Thanks.

–wpd

oldbug · September 29, 2009, 2:37pm

On Mon, Sep 28, 2009 at 5:42 PM, Frederick C.
[email protected] wrote:

On Sep 28, 10:32 pm, Patrick D. [email protected] wrote:

partition will do it in one

Thanks Fred, that’s exactly what I was looking for.

Try both, benchmark them - don’t take some random person on the
internet’s word for it!

Ahh, but you are not “some random person on the internet”. You are an
active member of this community who regularly answers questions for
newbies and unfortunates who can be too lazy to look things up
themselves. As such, I have come to appreciate and respect your
answers, especially the ones that say, “the method for which you are
looking is #partition, but it’s up to you to determine if it makes
things go faster or not”

Thanks again for all the help.

–wpd

oldbug · September 29, 2009, 6:31pm

On Mon, Sep 28, 2009 at 6:07 PM, Marnen Laibow-Koser
[email protected] wrote:

I think what’s happening here is this: you’re calling @lot.part.id, so
Rails needs to load @lot.part, which accounts for the extra query. To
fix, use the :joins option on Lot.find, or simply call @lot.part_id.

I should let this be, but now I’m curious… what does :joins actually
do for me?

(in script/console)

l=Lot.find(13, :joins => :part)

produces (in the logfile)

SELECT “lots”.* FROM “lots” INNER JOIN “parts” ON “parts”.id =
“lots”.part_id WHERE (“lots”.“id” = 13)

and that looks like what I wanted. But when I follow that up with

l.part

I see in the logfile:

SELECT * from “parts” WHERE (“parts”.“id” = 2)

so it seems that the inner join was wasted. Is this a development vs.
production thing? (I know, I’m not supposed to worry about
optimization now, but I figure if I learn how to write generally more
optimal code by default, it will only help).

If I try:

l=Lot.find(13, :include => :part)

I see in the logfile:

SELECT * FROM “lots” WHERE (“lots”.“id” = 13)
SELECT * FROM “parts” WHERE (“parts”.id" = 2)

so neither the :joins nor the :include options seem to help, although
the :include option prevents the subsequent database access when I
finally access l.part

Again, this is not stopping me from anything, and I really should
leave the optimization phase for later, but I am curious about the
intended behavior of :include and :joins, and puzzled by the apparent
behavior.

–wpd