Efficient way to prune a table

I have a table that retains strings to display in a select box. The
select box is supposed to display only the most recently added 10
Strings (there is a field, tstamp, which holds the timestamp of the
most recent change to a row).

So there’s never a reason for the table to hold more than 10 strings.
The issue – my question – is how do I keep only the most recent 10
records in the db, efficiently? This issue must be faced by many, and
many times over. Rather than try to reinvent the wheel here, can
anyone tell me how this sort of thing is typically handled? Thanks,
RVince

We have a similar problem, and the two options we saw were:

  1. cron job that ran either rails or sql command to prune the table
  2. a function that runs when something else happens (like login/logout)

We ended up choosing 2, just because it was easier to implement (didn’t
have to play around with cron jobs etc).

Cheers
Simon

Simon, Is there a way to do it all in one sql statement? May I ask how
you implement this in Ruby/.Rails, say, doing it when something occurs
(like login/logout?) Thanks Rvince

Hi,

SearchResults is the model that we are cleaning up, so we do the
following
on login

SearchResults.delete_all([‘updated_at < ?’, 1.week.ago])

Cheers
Simon

Ah…wonderful. Exacly what I was hoping. I was afraid I would have
to put out a series of straight sql, but once again, ruby/rails has a
better way. Thanks Simon!

Hmmm, I think that just throwing out records older than week wont work
for me – I need to keep, say, 5 or 10 of them. How can I specify
this? THanks -Rvince

RVince wrote:

Hmmm, I think that just throwing out records older than week wont work
for me – I need to keep, say, 5 or 10 of them. How can I specify
this? THanks -Rvince

Look into using a limit clause on your delete operation.

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

Yes, but I still need to pare down he table size periodically to the N
most recent records

I’m assuming you have an ActiveRecord model that is managing this
table. If so, you could handle the delete in that model, perhaps with
a callback (after_save). (Again, without knowing the details of the
application, this may or may not be a good approach performance wise
depending on how much activity is on the table.)

You could also just schedule a job to periodically clean it up and
then define a default scope on the model that limits it to the ten
most recent records. Assuming you have timestamp columns (updated_at
specifically), this is a trivial sort/limit.

RVince wrote:

Yes, but I still need to pare down he table size periodically to the N
most recent records

I know. That’s why you want to use delete with limit (and order).

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

Yes, I can see there is no easy, rails-way to do this. In effect, I
need to

  1. read in the X most recent records (Select * from Chanelnotes order
    by tstamp DESC limit 10)
  2. Delete the entire table (delete * in Channelnotes)
  3. Do an insert on the ten records I read in.

There’s no other way.

Doesn’t “ORDER BY tstamp” need to be “ORDER BY tstamp DESC”? The
latest 10 need to be captured, not the oldest 10.

On Aug 14, 10:19 pm, Marnen Laibow-Koser <rails-mailing-l…@andreas-

Marmen,

Is created_at the standard way of doing that (I alwasy put in certain
fields in my tables – a timestamp field, a crossref field – which is
a string that would be the index of the field in another db whose
structure might be trying to follow mine or vice versa).

Additionally, I’m not even sure how to perform this in my rails app. I
see an ActiveRecord.find_by_sql and count_by_sql but there is nothing
more generic whic hwould allow me to perform an SQL delete statement
out of rails – is there?

Thanks for your help on this guys. I had a hard time getting my head
around this and am most grateful to your help! -RVince.

RVince wrote:

Yes, I can see there is no easy, rails-way to do this. In effect, I
need to

  1. read in the X most recent records (Select * from Chanelnotes order
    by tstamp DESC limit 10)
  2. Delete the entire table (delete * in Channelnotes)
  3. Do an insert on the ten records I read in.

There’s no other way.

Sure there is. Find the IDs of the 10 most recent records, then delete
any record whose ID is not in that set. SQL would be something like
(from memory):

DELETE FROM channelnotes
WHERE id NOT IN
(SELECT id from channelnotes
ORDER BY tstamp
LIMIT 10)

See? You only touch the records you’re deleting. No reinsert
necessary.

BTW, why is your timestamp column called tstamp instead of created_at ?

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

On Aug 15, 3:00 pm, RVince [email protected] wrote:

Marmen,

Is created_at the standard way of doing that (I alwasy put in certain
fields in my tables – a timestamp field, a crossref field – which is
a string that would be the index of the field in another db whose
structure might be trying to follow mine or vice versa).

if your timestamp is called created_at or updated_at Rails will set it
for you

Additionally, I’m not even sure how to perform this in my rails app. I
see an ActiveRecord.find_by_sql and count_by_sql but there is nothing
more generic whic hwould allow me to perform an SQL delete statement
out of rails – is there?

There is delete_all but that expects merely a set of sql conditions.

ActiveRecord::Base.connection.execute “…” allows you to execute
arbitrary sql statements

ActiveRecord::Base.connection.delete is nearly identical to that (it
calls execute) but you may prefer the explicitness of calling a method
called delete, more importantly using the delete method will flush
rails’ sql cache.

Fred

The issue – my question – is how do I keep only the most recent 10
records in the db, efficiently?

For efficiency, might PStore or TokyoCabinet be more appropriate
choices? All those db accesses might add up.

Ron

g_f wrote:

Doesn’t “ORDER BY tstamp” need to be “ORDER BY tstamp DESC”? The
latest 10 need to be captured, not the oldest 10.

Yes, and that was a silly mistake on my part. The query still shows the
general pattern, though.

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

I’'ve taken this approach:

def self.prune_table
if self.count > 1000
youngest_victim = find(:first, :select => :id, :order => ‘created_at
desc’, :offset => 1000)
delete_all([“id <= ?”,youngest_victim.id])
end
end

My requirements are not particularly strict around the 1000. Just more
or
less around there. You could do the same with created_at I guess.

Any obvious issues with this approach?

Regards
Ivor

On Mon, Aug 17, 2009 at 4:52 PM, Marnen Laibow-Koser <