Find_in_batches + query_cache = bloat

We’ve been using find_in_batches to reduce memory usage, and recently
noticed one of our more intensive background processes had a huge
memory footprint (600mb+) and was getting killed by our memory
monitor. We were unable to reproduce this in development, and after
investigation, the culprit is query_cache. Wrapping the task in
ActiveRecord::Base#uncached kept the task stable (~200mb), and it’s
not hard to imagine why. Looping over thousands of items while eager
loading many more likely grows the cache to huge amounts, which seems
counter to the use case for find_in_batches.

So first of all, this is an FYI. Beyond that, all the ways in which we
use find_in_batches would be aversely affected by the query_cache;
sure, it might make a query faster, but it definitely will grow in
memory, as you are expected to use it across thousands of records.
Given that find_in_batches’ use case is to reduce memory when
searching across thousands of records, should it not be default
behavior to disable query cache for find_in_batches operations?

On Dec 30, 11:00pm, Woody P. [email protected] wrote:

We’ve been using find_in_batches to reduce memory usage, and recently
noticed one of our more intensive background processes had a huge
memory footprint (600mb+) and was getting killed by our memory
monitor. We were unable to reproduce this in development, and after
investigation, the culprit is query_cache. Wrapping the task in
ActiveRecord::Base#uncached kept the task stable (~200mb), and it’s
not hard to imagine why. Looping over thousands of items while eager
loading many more likely grows the cache to huge amounts, which seems
counter to the use case for find_in_batches.

Were you manually turning on query cache in your background processes?
(I was trying to think why I hadn’t been bitten by this before and
remembered that the query cache is turned on via an around filter by
default, so doesn’t affect scripts run by hand, daemon processes etc)

So first of all, this is an FYI. Beyond that, all the ways in which we
use find_in_batches would be aversely affected by the query_cache;
sure, it might make a query faster, but it definitely will grow in
memory, as you are expected to use it across thousands of records.
Given that find_in_batches’ use case is to reduce memory when
searching across thousands of records, should it not be default
behavior to disable query cache for find_in_batches operations?

Seems sensible. I’m not sure how tight the scope of your disabling
should be, ie should query caching be forced off for the contents of
the block? The block might also be doing lots of stuff that is
inherently pointless to cache, but equally it might not.

Fred