Working with Large Data Collection and Performance


I am currently reimplementing my project’s data collection. The project
itself uses Rails; we collect a lot of data from APIs and then store it
in the database. We are dealing with tens of thousands of data sets
processed every couple of minutes. The challenge is that each model has
many associations with unique constraints on them to represent a set of
data.

To work around the unique constraints, I have been selecting records by
their unique indices and then updating them, but this results in many
database queries. I have attempted to use the upsert gem, but it has
problems with hstore and array data, and it also does not support
associations and callbacks, so I end up juggling a lot of hashes and
arrays to rebuild the associations. I have also attempted to use a
manually written psql query to insert HABTM associations in a single
database commit for a whole batch of associations.
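For what it's worth, the batched HABTM insert I mean looks roughly like the sketch below: collapse a batch of id pairs into one multi-row INSERT instead of one statement per row. The join-table name `posts_tags` and its columns are hypothetical stand-ins for my actual schema; in the app the resulting string goes through `ActiveRecord::Base.connection.execute`.

```ruby
# Build a single multi-row INSERT for a batch of [left_id, right_id] pairs
# belonging to a HABTM join table. Integer() both casts and guards against
# injecting anything that is not a plain integer id.
def habtm_batch_insert_sql(pairs)
  values = pairs
    .map { |post_id, tag_id| "(#{Integer(post_id)}, #{Integer(tag_id)})" }
    .join(", ")
  "INSERT INTO posts_tags (post_id, tag_id) VALUES #{values};"
end

habtm_batch_insert_sql([[1, 10], [1, 11], [2, 10]])
# => "INSERT INTO posts_tags (post_id, tag_id) VALUES (1, 10), (1, 11), (2, 10);"
```

This does cut the round trips to one per batch, but as said above, it bypasses validations and callbacks entirely, which is part of why it feels unmaintainable.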

All of the attempts above seem to result in code that is not
maintainable, although some approaches do produce fewer database
queries. I want to gather thoughts from the Rails community about how to
improve performance, the Rails way, when dealing with large amounts of
data.
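One direction I have been considering is to stop looking records up one at a time: fetch all existing unique keys for a batch in a single query, then partition the incoming rows into updates and inserts in memory. Below is a minimal, framework-free sketch of that partitioning step; `external_id` is a hypothetical unique-key column, and `existing_keys` stands in for something like `Model.where(external_id: keys).pluck(:external_id)`.

```ruby
require "set"

# Split incoming API rows into [updates, inserts] by checking each row's
# unique key against the set of keys already present in the database.
def partition_rows(rows, existing_keys)
  known = existing_keys.to_set
  rows.partition { |row| known.include?(row[:external_id]) }
end

rows = [
  { external_id: "a", value: 1 },
  { external_id: "b", value: 2 },
]
updates, inserts = partition_rows(rows, ["a"])
# updates => [{ external_id: "a", value: 1 }]
# inserts => [{ external_id: "b", value: 2 }]
```

Each sub-batch could then be handled with one bulk statement apiece, but I am not sure this is any more "the Rails way" than my other attempts.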


On Thu, Jul 3, 2014 at 7:18 AM, Ian C. [email protected] wrote:

I am currently reimplementing my project’s data collection.

Why? What problem are you trying to solve?

Hassan S. ------------------------ [email protected]
twitter: @hassan
