How to cap table with active record

We want to use SQL/active record for logging but limit the table size
so that older messages disappear off of the table Some process has to
run periodically to do that.

Suppose I want to keep my table size to not much bigger than
50,000,000 rows or so. What is the easiest, most efficient way to
delete any extra rows that there may be ? This is an SQL/active record
problem I have not encountered before.

I would know in theory how to get all the records as an array by
calling MyLog.find(:all) and ordering it by date and then iterating
from where I want to chop off to the end and deleting each one, but
that may not be the most efficient or acceptable way to do that.

Jedrin wrote in post #1058661:

We want to use SQL/active record for logging but limit the table size
so that older messages disappear off of the table Some process has to
run periodically to do that.

Suppose I want to keep my table size to not much bigger than
50,000,000 rows or so. What is the easiest, most efficient way to
delete any extra rows that there may be ? This is an SQL/active record
problem I have not encountered before.

I don’t know the current state of using these with Rails and
ActiveRecord, but it sound to me like what you need is a Round-Robin
Database Storage Engine:

I would know in theory how to get all the records as an array by
calling MyLog.find(:all) and ordering it by date and then iterating
from where I want to chop off to the end and deleting each one, but
that may not be the most efficient or acceptable way to do that.

Using MyLog.find(:all) would be a really bad idea. Selecting all from a
database table that has the potential of containing more than a few
hundred records is almost never a good idea.

Besides that’s not the right way to count records in a table any. That’s
why we have SQL count. Rails support count through aggregates:
http://api.rubyonrails.org/classes/ActiveRecord/Calculations.html#method-i-count

You would also, certainly, not want to sort the results in memory as you
suggest. You would instead as the database engine to do that for you:

Here’s an approach that may work. Keep in mind this was put together
quickly so you’ll need to test it out for yourself.

MAX_TABLE_SIZE = 50000000
row_count = MyLog.count
delete_limit = (row_count > MAX_TABLE_SIZE) ? row_count - MAX_TABLE_SIZE
: 0
logs_to_delete = MyLog.order(‘created_at’).limit(delete_limit)
logs_to_delete.each do |log|
MyLog.delete(log)
end

Some example SQL the above would generate:
SELECT COUNT() FROM “my_logs” #returns 50000150
SELECT “my_logs”.
FROM “my_logs” ORDER BY created_at LIMIT 150

DELETE FROM “my_logs” WHERE “my_logs”.“id” = 1
DELETE FROM “my_logs” WHERE “my_logs”.“id” = 2
DELETE FROM “my_logs” WHERE “my_logs”.“id” = 3

DELETE FROM “my_logs” WHERE “my_logs”.“id” = 150

Make a background job to run that daily. Probably still not the most
efficient way to do it, but shouldn’t be too bad if run often enough.
I’m sure there’s a way to do this without calling separate delete
statements for each object, but I’ll leave that as an exercise for the
reader.

I sort of knew count() is what I might use even though I said
find(:all), I haven’t used count() in a long time and I forget the
syntax and all or how to set it up (it used to have some special set
up in the old rails), but it’s still sort of the same problem.

The round robin sounds good, but I doubt it exists in the DB that we
use (MS SQL Server) … So we have to do it our self. Someone has said
you can write a process that runs in SQL server itself. Not something
I am familiar with, but I guess one of the guys I work with can help
me out …

On 27 April 2012 16:23, Jedrin [email protected] wrote:

calling MyLog.find(:all) and ordering it by date and then iterating
from where I want to chop off to the end and deleting each one, but
that may not be the most efficient or acceptable way to do that.

If you have a reasonably consistent number of new records each day or
week or whatever then you could delete old ones by date rather than
count, so keeping six months worth of records for example. This would
be much easier as you could just find the records where created_at is
before a given date and delete them. Put an index on created_at
obviously.

Colin

If your strategy is to just keep the newest MAXNUMBER records you might
consider just adding an after_create method to your model.

Jedrin wrote in post #1058661:

We want to use SQL/active record for logging but limit the table size
so that older messages disappear off of the table Some process has to
run periodically to do that.

Here’s something I discovered after posting my reply, but certainly
worth considering.

Do you logging with MongoDB instead of a SQL database. MongoDB has
built-in support for high performance logging scenarios:

See the following for more: