Large database table issue

Hi guys

I have a problem with a fairly standard Rails / mySQL query. I must be
doing something very wrong. I was hoping somebody could shed some
light on this.

First model: article, has_many comments
Second model: comment, belongs_to article

The number of articles and comments can be big (one to two millions).
Every once in a while, I “clean” the DB and remove older articles and
comments:

Articles.delete_all([“a_date < ?”, xxxx]
Comments.delete_all(“comments.article_id not in (select id from
articles)”)

The first query is executed with no problems. But the second one hangs
and completely loads my database server.

Notes:

  • I am not using :dependent => :delete since I was thinking doing a
    destroy instead of delete on Articles would be too slow (my
    understanding is when you “destroy”, Rails loads the object first)
  • I have killed all other queries on the DB, just to make sure there
    are no locks somewhere.

Thanks!
Pierre

On 14 September 2011 12:16, PierreW [email protected] wrote:

The number of articles and comments can be big (one to two millions).
Every once in a while, I “clean” the DB and remove older articles and
comments:

Articles.delete_all([“a_date < ?”, xxxx]
Comments.delete_all(“comments.article_id not in (select id from
articles)”)

The first query is executed with no problems. But the second one hangs
and completely loads my database server.

Delete the comments first?
Comments.delete_all(“comments.article_id in (select id from articles
where a_date < ?)”, xxxx)

On Sep 14, 12:16pm, PierreW [email protected] wrote:

destroy instead of delete on Articles would be too slow (my
understanding is when you “destroy”, Rails loads the object first)

  • I have killed all other queries on the DB, just to make sure there
    are no locks somewhere.

Do you have an index on article_id? Id yo don’t this will be super
slow. A subselect that this is probably going to be slow - i’d try a
left join instead, i.e. something like

Comment.joins(“left join articles on articles.id =
article_id”).where(“articles.id is null”).delete_all

An index on article id is still advisable

Fred

Articles.delete_all([“a_date < ?”, xxxx]
Comments.delete_all(“comments.article_id not in (select id from
articles)”)

The first query is executed with no problems. But the second one hangs
and completely loads my database server.

Innodb or myisam table types? If it’s the latter, the table in question
is going to get locked while the delete happens. If you’re removing a
lot of rows, regardless of how you index it, it’s going to be slow. In
addition to the other advice, you might clean the table up more often
(so you’re removing fewer rows).

Another option if you’re willing to stick with mysql is to use their
??? (can’t remember the name) feature. It lets you create what appears
to be a normal table, but it actually splits it up into multiple tables
based on one of the columns – in your case… the date. So when you
remove the old entries you’re not touching the “latest table”. At least
if I’m remembering things right.

-philip

On Wed, Sep 14, 2011 at 11:03 AM, Philip H. [email protected]
wrote:

Another option if you’re willing to stick with mysql is to use their ??? (can’t
remember the name) feature.
It lets you create what appears to be a normal table, but it actually splits it
up into multiple tables based
on one of the columns – in your case… the date. So when you remove the old
entries you’re not touching
the “latest table”. At least if I’m remembering things right.

-philip

I think you’re talking about partitioning:

http://dev.mysql.com/doc/refman/5.1/en/partitioning.html

-J

Thanks a lot guys for your help.

I apologize I did not mention it, but I had the index on article_id
unfortunately.

Fred: I tried your LEFT JOIN method and indeed, when I EXPLAIN it the
second select is of type eq_ref (vs unique_subquery with my method) so
it should be faster. Yet, when I try it on a smaller test DB, the
difference seems only marginal I am afraid. It still goes through all
the comments rows it seems.

I just made a simple test directly in mySQL on my instance: “select
count(*) from comments” and even that one does not want to return! I
tried to restart mysql: same. I did not know this was even possible.

I realize it is now more a mySQL issue than a Rails one, but just in
case: have you guys ever been confronted to something like that?

I will try partitioning as well.

Thanks
Pierre

i don’t think partitioning is going to help you if count(*) didn’t even
work.

some basic questions for you on the comments table. is there a primary
key?
is there a separately indexed article_id? i’m sure you have them, but
better
to rule out the easy stuff first.

i was going to suggest using the exists() function rather than an outer
join, but again it doesn’t sound like your main issue. (you should learn
to
use the exists() function if you don’t know it. i’ve found it to work
better.)

for the order of operations you are trying to do, i agree with an
earlier
suggestion to delete comments first, then articles. also don’t use “not
in”. instead delete comments.* from comments where comments.article_id

@article"

(sorry if this shows up like 3 times, it is my first time posting to
this
group and it wasn’t working.)

On 14 Sep 2011, at 19:35, PierreW [email protected] wrote:

I just made a simple test directly in mySQL on my instance: “select
count(*) from comments” and even that one does not want to return! I
tried to restart mysql: same. I did not know this was even possible.

Count(*) isn’t magically fast in innodb - mysql has to do an index scan
(versus being able to just read some table metadata for some db types).

You might try using show innodb status to see what is going on (there’s
another thing you can do to see more detailed info about locks being
held, but I don’t remember off the top of my head.

Another option might be to delete the comments before you delete the
articles (so inner joining comments & articles with whatever condition
on articles you use to determine what to delete).

Fred

Guys,

Thanks a lot for all your help and input. Just in case it can help
someone else one day: it turned out that my table was corrupted. It
was a real surprise to me since I was still able to “use it” (via
joins) and mySQL would not “complain”.

I found out doing a “check table”: it did not return an error but it
was crashing mysqld every time.

Thanks
Pierre