Forum: Ruby on Rails What's the best way to run (many) complex queries in a rake task ?

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
40f8cf64adb3232dd1457587f9a13e17?d=identicon&s=25 Paulo Pereira (Guest)
on 2009-05-27 21:11
(Received via mailing list)
Hi everyone,

I'm currently working on a project which will execute a rake task from
time to time, and said taks involves a complex recommendation
algorithm that can be accomplished with a complex SQL query.

I may use a single one to do everything, or a lot of them, one per
item to be computed. The nature of the query isn't in cause here, it's
just

#################
######## ONE SINGLE QUERY
query "INSERT INTO deviations(pivot_id, deviant_id, value)
SELECT
  ratings.content_id as pivot_id,
  ratings2.content_id as deviant_id,
  AVG(ratings.rate - ratings2.rate) as value
FROM ratings
INNER JOIN ratings as ratings2
  ON ratings.viewer_id = ratings2.viewer_id
  AND ratings.viewer_type = ratings2.viewer_type
  AND ratings.content_id < ratings2.content_id
GROUP BY pivot_id, deviant_id"

ActiveRecord::Base.connection().execute(query)

######## MULTIPLE QUERIES
sql = ActiveRecord::Base.connection()
ids.each do |id|
  query = "INSERT INTO deviations(pivot_id, deviant_id, value)
  SELECT
    #{id.to_i} as pivot_id,
    ratings2.content_id as deviant_id,
    AVG(ratings.rate - ratings2.rate) as value
  FROM ratings
  INNER JOIN ratings as ratings2
    ON ratings.viewer_id = ratings2.viewer_id
    AND ratings.viewer_type = ratings2.viewer_type
    AND ratings.content_id = #{id.to_i}
    AND ratings.content_id < ratings2.content_id
  GROUP BY deviant_id"
  sql.execute(query)
end
#################

Just for curiosity sake, I have an index on [viewer_id, viewer_type,
content_id]

I teste performance, for the single query variant and the multiple
queries variant, and then changed MySQL so I can have more memory and
the temp tables aren't stored in the hard drive.
After changing MySQL settings, the single query variant improved,
(268sec vs 80sec) but the multiple queries variant took almost twice
the time. (190sec vs 390sec) What the...??

I checked process info, and it's just using one of the four available
CPUs.

What can I do to improve this performance?
How to use all the available CPUs?
Is it to create multiple threads inside the rake task, and then use a
connection per task with multiple queries??
Am I doing something fundamentally wrong here?

Thanks in advance for some tips, I'm really out of ideas, and this is
my master thesis that ends in two weeks :(

Cheers,

Paulo Pereira


THE NEW MYSQL OPTIONS:
[mysqld]
sort_buffer_size=2M
read_buffer_size=2M
join_buffer_size=2M

read_rnd_buffer_size=2M

max_heap_table_size=256M
tmp_table_size=256M

myisam_sort_buffer_size=64M

thread_cache=256

query_cache_type=1
query_cache_limit=1M
query_cache_size=32M
81b61875e41eaa58887543635d556fca?d=identicon&s=25 Frederick Cheung (Guest)
on 2009-05-27 21:59
(Received via mailing list)
On May 27, 8:11 pm, Paulo Pereira <paulo.z...@gmail.com> wrote:
> #################
>   AND ratings.content_id < ratings2.content_id
>     ratings2.content_id as deviant_id,
> #################
>
> Just for curiosity sake, I have an index on [viewer_id, viewer_type,
> content_id]
>
> I teste performance, for the single query variant and the multiple
> queries variant, and then changed MySQL so I can have more memory and
> the temp tables aren't stored in the hard drive.
> After changing MySQL settings, the single query variant improved,
> (268sec vs 80sec) but the multiple queries variant took almost twice
> the time. (190sec vs 390sec) What the...??

hard to say without knowing what you changed. Take a look at your
query plans (use explain) to see if your queries could be better. If
your tables are innodb you'll might also want to increase the innodb
buffer pool size
>
> I checked process info, and it's just using one of the four available
> CPUs.
>
> What can I do to improve this performance?
> How to use all the available CPUs?

> Is it to create multiple threads inside the rake task, and then use a
> connection per task with multiple queries??
> Am I doing something fundamentally wrong here?
>
Running multiple queries in parallel won't help if you are IO bound
(also unfortunately running a mysql query with the standard ruby mysql
gem will block all threads in the ruby interpreter).

Fred
29ebf90af6107d2eb39b587c7972639c?d=identicon&s=25 Mukund (Guest)
on 2009-05-28 10:08
(Received via mailing list)
Use the spawn plugin to fork new processes for each query once you
address the database optimizations.
67ecd92df094d1e1e571dd199cdd1aac?d=identicon&s=25 Wisccal Wisccal (wisccal)
on 2009-05-29 12:50
> Just for curiosity sake, I have an index on [viewer_id, viewer_type,
> content_id]

Maybe, you want to include the rate in the index as well? That way,
everything is in the index, and I would presume, MYSQL will not have to
access the table at all.
This topic is locked and can not be replied to.