Speeding up ActiveRecord Querying on very large joins?

Marnen Laibow-Koser wrote:

Poppycock. You might not be willing to alter your production DB this
season, but you can (and probably should) set up a testing environment
to see what you can do.

I mentioned in paragraph two that I will be setting up a test
environment for it. But, again, I’m not going to alter my production DB
at this time. The season is just a couple of weeks away, and that
wouldn’t allow enough time to implement, test, and finish the change. I
won’t risk any issues.

Why?

I talked about table corruption previously. Matt simply said you could
restore from backup. But if I had 50,000 subscribers all expecting a
service, and a table this large (one that basically holds all of the
data the service provides) failed, there would be a critical outage.
Table corruption has a larger chance of occurring on larger tables.

I like to ensure whatever I provide works 100%. My current situation is
much improved (query times are really good right now) and, with caching,
it will only get better.

What I’m doing right now is not wrong; it can just be made more
efficient later on (as Matt pointed out).

At this point in time, we can agree to disagree on this topic.

Thanks.

Alpha B. wrote:
[…]

So, when I say we can agree to disagree, I’m referring to the “timing”
of the optimizations. I believe they should occur “after this season”,
with proper testing and implementation.

I don’t think there’s necessarily a disagreement there – I agree that
you should be careful about what you do to your production DB –
although I wouldn’t be surprised if you have a DB meltdown in mid-season
since your current schema is apparently so weird.

(And BTW, you might want to review the usage of quotation marks. :) )

I’m very systematic about how
I work on things.

Good.

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

In order to make the DB change work, I would have to perform the
following:

I would have to change how my parser pulls and uploads data.
I would have to change the querying for all MVCs in my project.
… and that’s just for starters …

That is a lot of work. I look at things from a priority perspective.

Right now, all of the priority mechanisms in my app work 100% and have
very good query times. Nothing is broken. Everything tests out 100%
from within internal tests. Everything tests out 100% from external
tests.

I finished up all my mailers and observers, and fully implemented my
payment gateway. However, I’m working on another segment for the payment
gateway, and perhaps another type of payment gateway plan (Paymo or Boku
for mobile payments) to cover those that don’t have credit cards or
PayPal accounts.

I still have a lot of extra things that need to get done before I’m
happy with my app (clubhouse pages, reporting cycles for upcoming bowl
games, etc.).

Because the DB situation is fine now (again, realizing that it can be
more optimized down the road), there’s no reason for me to reinvent and
add more potential problems to my app when I still have other things to
work on.

So, when I say we can agree to disagree, I’m referring to the “timing”
of the optimizations. I believe they should occur “after this season”,
with proper testing and implementation. I’m very systematic about how
I work on things.

Alpha B. wrote:
[…]

The largest join I have now is 7 tables on indexed results. Those
queries are only taking 3ms apiece.

OK, that sounds way better.

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

One more vote for PostgreSQL: it can use an index for sorting (an Index
Scan Backward instead of sorting all the data, as MySQL does), and it
makes better use of indexes, composite indexes, etc.

Also one vote for using EXPLAIN

Also one vote for using pastie.org %)

And to avoid a lot of joins, I use de-normalization.
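
If it helps, running EXPLAIN from a Rails console is just a matter of
prefixing the SQL. A minimal sketch, with a made-up table name
(substitute your own query):

    # Table and column names here are invented for illustration only.
    sql = "SELECT * FROM matchups WHERE week = 5 ORDER BY total_points DESC"

    # Works against both MySQL and PostgreSQL; each returns its plan as rows.
    ActiveRecord::Base.connection.select_all("EXPLAIN #{sql}").each do |row|
      puts row.inspect
    end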

Hi Marnen,

Right now, the query routines, which now have proper indexes and are
broken down into manageable pieces, take at most anywhere from:

18ms - 24ms

I don’t think that would create a DB meltdown, considering these numbers
are without caching enabled. I would imagine that once I have caching
turned on, the number of hits the database receives would be greatly
reduced.

What I’ve decided to do (after talking to Slicehost staff) is to prepare
for a larger slice upgrade (which would take approx. 30 minutes or less)
in the wee hours of the morning if I get hit by too much load. I’m also
seriously considering running a cache routine on all matchups each week
so that they are cached (a pretty good suggestion that was given here).
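
If I do go the weekly pre-cache route, it would probably look something
like this rough sketch of a rake task; Matchup and its for_week scope
are placeholders for my real models:

    # lib/tasks/matchup_cache.rake -- rough sketch only.
    namespace :matchups do
      desc "Pre-cache every matchup for the current week"
      task :warm_cache => :environment do
        week = Date.today.cweek   # or however the app decides the current week
        Matchup.for_week(week).each do |matchup|
          # Write each matchup's data into the Rails cache ahead of time,
          # so page requests hit the cache instead of the joined queries.
          Rails.cache.write(["matchup", week, matchup.id], matchup.attributes)
        end
      end
    end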

I think given this scenario I should be good to go for now. However,
I’m already looking at how to implement the larger table and test it
against a double year data load (just for simulation purposes).

My problem before was that my tables were not properly indexed and I had
two queries that used 13-table joins each. It was just not efficient.
The largest join I have now is 7 tables on indexed results. Those
queries are only taking 3ms apiece.
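
For what it’s worth, the indexing fixes were just ordinary migrations
along these lines; the table and column names below are stand-ins rather
than my actual schema:

    # Rough sketch of the kind of indexes that made the joins cheap.
    class AddJoinIndexes < ActiveRecord::Migration
      def self.up
        # Composite index covering the join key plus the per-week filter.
        add_index :player_stats, [:player_id, :week]
        add_index :matchups, :week
      end

      def self.down
        remove_index :player_stats, :column => [:player_id, :week]
        remove_index :matchups, :column => :week
      end
    end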

Well, I found a way to reach a middle ground.

I created a cache table that houses the exact fields for the data I pull
using VM. So, the methods that pull the data were moved to a task and I
can pre-populate the cache table with the data for that week.

So, my cache table will have 120 rows and 67 fields of data for each
week. When people use the VM matchups, they are only querying that
table: one table, no joins, and a lot smaller than 500+ fields.
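
In outline, the cache table side of it looks something like this. The
column names are trimmed way down here (the real table carries all 67
fields), and the weekly population is basically the kind of rake task
sketched earlier, writing flattened rows into this table instead of
Rails.cache:

    # Rough sketch with illustrative names only.
    class CreateMatchupCaches < ActiveRecord::Migration
      def self.up
        create_table :matchup_caches do |t|
          t.integer :week
          t.integer :home_team_id, :away_team_id
          t.integer :home_score, :away_score
          # ... the rest of the 67 flattened fields ...
        end
        add_index :matchup_caches, :week
      end

      def self.down
        drop_table :matchup_caches
      end
    end

    # The matchup page then reads from the one flat table, no joins:
    class MatchupCache < ActiveRecord::Base; end
    @matchups = MatchupCache.find(:all, :conditions => { :week => 12 })  # 12 = current week, for example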

Well, good news. I want to thank Matt first and then Marnen second for
their input on this.

My old setup, even with the low query times, was doing a full render of
my VM matchup page in 540-620ms from start to completion.

With the new single-table setup using the cache table, the full render
time of the page from start to completion is 220ms-240ms, which is
roughly a 2.5x improvement.

I just wanted to say thanks to you all for getting me to look at the
code. You were both right: it wasn’t hard to implement. Because the
test worked out so well, I will more than likely start working on
implementing the very large table for my stats views.

Thanks again!