Forum: Ruby on Rails find_in_batches + query_cache = bloat

Woody Peterson (woahdae)
on 2010-12-31 12:00
(Received via mailing list)
We've been using find_in_batches to reduce memory usage, and recently
noticed one of our more intensive background processes had a huge
memory footprint (600mb+) and was getting killed by our memory
monitor. We were unable to reproduce this in development, and after
investigation, the culprit turned out to be the query cache. Wrapping
the task in ActiveRecord::Base.uncached kept the task stable (~200mb),
and it's not hard to imagine why: looping over thousands of items while
eager loading many more likely keeps adding result sets to the cache,
which runs counter to the use case for find_in_batches.

So first of all, this is an FYI. Beyond that, all the ways in which we
use find_in_batches would be adversely affected by the query_cache;
sure, it *might* make a query faster, but it *definitely* will grow in
memory, as you are expected to use it across thousands of records.
Given that find_in_batches' use case is to reduce memory when
searching across thousands of records, should it not be default
behavior to disable query cache for find_in_batches operations?
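
To make the growth concrete, here is a minimal toy sketch in plain Ruby. It is not ActiveRecord's implementation: ToyConnection, select_batch and each_batch are hypothetical names invented for illustration, and the table is just an array of ids. The assumption being modelled is that each batch is fetched with a different SQL string (id greater than the last id seen), so a cache keyed by SQL never gets a hit and simply keeps every batch it has ever returned.

```ruby
# Toy model only: ToyConnection is hypothetical, NOT ActiveRecord's code.
class ToyConnection
  attr_reader :cache

  def initialize(rows)
    @rows = rows            # pretend table: a sorted array of integer ids
    @cache = {}             # SQL string => result rows
    @cache_enabled = true
  end

  # Run the block with caching switched off, then restore the old setting.
  def uncached
    old, @cache_enabled = @cache_enabled, false
    yield
  ensure
    @cache_enabled = old
  end

  # Fetch one batch of ids greater than last_id.
  def select_batch(last_id, limit)
    sql = "SELECT * FROM items WHERE id > #{last_id} ORDER BY id LIMIT #{limit}"
    return @cache[sql] ||= run(last_id, limit) if @cache_enabled
    run(last_id, limit)
  end

  private

  def run(last_id, limit)
    @rows.select { |id| id > last_id }.first(limit)
  end
end

# Hypothetical stand-in for batched iteration: walk the table by id.
def each_batch(conn, batch_size)
  last_id = 0
  loop do
    batch = conn.select_batch(last_id, batch_size)
    break if batch.empty?
    yield batch
    last_id = batch.last
  end
end

cached = ToyConnection.new((1..10_000).to_a)
each_batch(cached, 1_000) { |batch| batch.size }
cached_rows = cached.cache.values.sum(&:size)

uncached = ToyConnection.new((1..10_000).to_a)
uncached.uncached { each_batch(uncached, 1_000) { |batch| batch.size } }
uncached_rows = uncached.cache.values.sum(&:size)

puts "rows held by cache (cached):   #{cached_rows}"
puts "rows held by cache (uncached): #{uncached_rows}"
```

In this toy model the cached run ends up holding a reference to every row it ever fetched, while the run wrapped in uncached holds none, which is roughly the difference we saw between the two versions of our task.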
Frederick Cheung (Guest)
on 2010-12-31 12:08
(Received via mailing list)
On Dec 30, 11:00pm, Woody Peterson <woody.peter...@gmail.com> wrote:
> We've been using find_in_batches to reduce memory usage, and recently
> noticed one of our more intensive background processes had a huge
> memory footprint (600mb+) and was getting killed by our memory
> monitor. We were unable to reproduce this in development, and after
> investigation, the culprit is query_cache. Wrapping the task in
> ActiveRecord::Base#uncached kept the task stable (~200mb), and it's
> not hard to imagine why. Looping over thousands of items while eager
> loading many more likely grows the cache to huge amounts, which seems
> counter to the use case for find_in_batches.
>
Were you manually turning on query cache in your background processes?
(I was trying to think why I hadn't been bitten by this before and
remembered that the query cache is turned on via an around filter by
default, so doesn't affect scripts run by hand, daemon processes etc)

> So first of all, this is an FYI. Beyond that, all the ways in which we
> use find_in_batches would be adversely affected by the query_cache;
> sure, it *might* make a query faster, but it *definitely* will grow in
> memory, as you are expected to use it across thousands of records.
> Given that find_in_batches' use case is to reduce memory when
> searching across thousands of records, should it not be default
> behavior to disable query cache for find_in_batches operations?

Seems sensible. I'm not sure how tight the scope of the disabling
should be, i.e. should query caching also be forced off for the
contents of the block? The block might be doing lots of stuff that is
inherently pointless to cache, but equally it might not.

Fred