Working with Large Data Collection and Performance

Hi,

I am currently reimplementing my project’s data collection. The project
itself uses Rails, and we collect a lot of data from APIs and then store
it in the database. We process tens of thousands of data sets every
couple of minutes. The challenge is that each set of data is represented
by many associated models, each with unique constraints.
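To make the shape of the problem concrete, here is a rough sketch of the
kind of schema I mean (all model and column names here are made up for
illustration, not our real schema):

    class Reading < ActiveRecord::Base
      # unique index on :external_id
      has_many :measurements
      has_and_belongs_to_many :tags
    end

    class Measurement < ActiveRecord::Base
      # unique index on [:reading_id, :metric]
      belongs_to :reading
    end

    class Tag < ActiveRecord::Base
      # unique index on :name
      has_and_belongs_to_many :readings
    end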

To work around the unique constraints, I have been using
find_or_initialize_by on the unique indices and then updating, but this
results in many database calls per record. A simplified version of what
I am doing now is below.
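Roughly (again with hypothetical names, and simplified):

    incoming.each do |attrs|
      reading = Reading.find_or_initialize_by(external_id: attrs[:external_id])
      reading.update!(value: attrs[:value])        # 2-3 queries per record

      attrs[:measurements].each do |m|
        measurement = reading.measurements
                             .find_or_initialize_by(metric: m[:metric])
        measurement.update!(value: m[:value])      # and again per association
      end
    end

So every data set costs a SELECT plus an INSERT or UPDATE for the parent
record and for each associated row.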
I have also tried the upsert gem, but it has problems with hstore and
array data, and it does not support associations or callbacks, so I end
up juggling a lot of hashes and arrays to rebuild the associations
myself. I have also attempted hand-written Postgres queries that insert
a batch of HABTM associations in a single database commit, roughly like
the sketch below.
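The hand-written SQL looks something like this (table and column names
are hypothetical; the point is one multi-row INSERT per batch instead of
one INSERT per pair):

    conn = ActiveRecord::Base.connection
    values = pairs.map do |reading_id, tag_id|
      "(#{conn.quote(reading_id)}, #{conn.quote(tag_id)})"
    end

    ActiveRecord::Base.transaction do
      conn.execute(<<-SQL)
        INSERT INTO readings_tags (reading_id, tag_id)
        VALUES #{values.join(', ')}
      SQL
    end

It is fast, but I still have to handle duplicates in Ruby before
building the statement, and none of the model callbacks run.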

All of the attempts above seem to result in code that is not very
maintainable, even though some of them do reduce the number of database
calls. I would like to gather the Rails community's thoughts on how to
maximize performance, the Rails way, when upserting large amounts of
data.

Thanks,

On Thu, Jul 3, 2014 at 7:18 AM, Ian C. [email protected] wrote:

I am currently reimplementing my project’s data collection.

Why? What problem are you trying to solve?


Hassan S. ------------------------ [email protected]

twitter: @hassan