Finding duplicate data

Hello,
I have a database with the following fields:
id, created_at, updated_at, study_date, user_id, modality_id, count

In other words, a rather normal Rails database.
I had some issues getting ‘study_date’ to play nice, and have ended up
with some duplicate data: rows where
user_id, modality_id, count are the same, and
study_date is … Almost the same (an hour or two off, depending on time
zone adjustment).

My first step will be to bring study_date to have the same hour, so
there are four identical fields for the duplicate data.
Now - is there any ‘easy’ way to find and delete such duplicate data, or
should I just do something like this?

duplicate = []
Count.all.each do |count|
duplicate << count if Count.find(:first, :conditions => {:user_id =>
count.user_id (and so on) } )
end

Aldric G. wrote:

Hello,
I have a database with the following fields:
id, created_at, updated_at, study_date, user_id, modality_id, count

No, you’ve got a table with those fields.

In other words, a rather normal Rails database.
I had some issues getting ‘study_date’ to play nice, and have ended up
with some duplicate data: rows where
user_id, modality_id, count are the same, and
study_date is … Almost the same (an hour or two off, depending on time
zone adjustment).

My first step will be to bring study_date to have the same hour, so
there are four identical fields for the duplicate data.
Now - is there any ‘easy’ way to find and delete such duplicate data, or
should I just do something like this?

duplicate = []
Count.all.each do |count|
duplicate << count if Count.find(:first, :conditions => {:user_id =>
count.user_id (and so on) } )
end

Finding duplicates is always hard. You might be able to use SQL count
and group to help locate them, but it’s not going to be that easy.

Once you’ve found them, define a unique index in the DB so this can’t
happen again.

Best,
–Â
Marnen Laibow-Koser
http://www.marnen.org
[email protected]

Marnen Laibow-Koser wrote:

Aldric G. wrote:

Hello,
I have a database with the following fields:
id, created_at, updated_at, study_date, user_id, modality_id, count

No, you’ve got a table with those fields.

Indeed.

Finding duplicates is always hard. You might be able to use SQL count
and group to help locate them, but it’s not going to be that easy.

Once you’ve found them, define a unique index in the DB so this can’t
happen again.

“id” is not good enough? :wink: The problem was with my checking code,
which also checked the hour - so “2009-12-05 05:00:00” was different
than “2009-12-05 04:00:00” and I’d get a new field. I just finished a
migration to change that field from a datetime to a date and get rid of
the issue altogether.
If it’s just SQL then I can do it. I was wondering if Rails had
something fancy for duplicate data and couldn’t find much online.

Aldric G. wrote:

Marnen Laibow-Koser wrote:

Aldric G. wrote:

Hello,
I have a database with the following fields:
id, created_at, updated_at, study_date, user_id, modality_id, count

No, you’ve got a table with those fields.

Indeed.

Finding duplicates is always hard. You might be able to use SQL count
and group to help locate them, but it’s not going to be that easy.

Once you’ve found them, define a unique index in the DB so this can’t
happen again.

“id” is not good enough? :wink:

Not if you care about duplication of other data.

The problem was with my checking code,
which also checked the hour - so “2009-12-05 05:00:00” was different
than “2009-12-05 04:00:00” and I’d get a new field.

You mean a new record?

I just finished a
migration to change that field from a datetime to a date and get rid of
the issue altogether.

That would do it. Now, if you don’t want two records for the same date,
you need a unique index on the date field. Don’t trust the app for
this.

If it’s just SQL then I can do it. I was wondering if Rails had
something fancy for duplicate data and couldn’t find much online.

I’m not aware of anything, but that doesn’t mean there isn’t anything.

Best,
–Â
Marnen Laibow-Koser
http://www.marnen.org
[email protected]