Rescuing ActiveRecord::RecordNotUnique: clever or ugly?

luislavena · March 22, 2012, 8:35pm

I have an ActiveRecord that has several foreign keys, where the foreign
keys act as a single compound key that needs to be unique:

class CreateRelations < ActiveRecord::Migration
def change
create_table :relations do |t|
t.references :src, :null => false
t.references :dst, :null => false
end
add_index :relations, [:src_id, :dst_id], :unique => true
end
end

One obvious way to enforce uniqueness would be using first_or_create:

def self.create_relation(src, dst)
where(:src_id => src.id, :dst_id => dst.id).first_or_create!
end

Which works, but that always does a SELECT before creating the record.
Since I don’t expect users to attempt to create duplicates very often,
I’ve written a the method to catch RecordNotUnique errors, like this:

def self.create_relation(src, dst)
create!(:src => src, :dst => dst)
rescue ActiveRecord::RecordNotUnique
where(:src_id => src.id, :dst_id => dst.id)
end

So mine is a question of style: is this considered an abuse of rescue?
Or is how seasoned Rails programmer would do this?

ff

fearless_fool · March 23, 2012, 12:13am

On Thu, Mar 22, 2012 at 8:35 PM, Fearless F. [email protected]
wrote:

end
end

One obvious way to enforce uniqueness would be using first_or_create:

def self.create_relation(src, dst)
where(:src_id => src.id, :dst_id => dst.id).first_or_create!
end

Which works, but that always does a SELECT before creating the record.

A custom validation on the compound unique key would behave
similar, always do a SELECT (and suffer from the race condition
documented in
http://api.rubyonrails.org/classes/ActiveRecord/Validations/ClassMethods.html#method-i-validates_uniqueness_of
).

Or is how seasoned Rails programmer would do this?
If you want to be sure of the uniqueness, you need to do
this anyway (catch the exception thrown by the database
uniqueness validation to resolve the race condition).

Then the use of a validation of uniqueness is merely an
optimization to catch the non-uniqueness problem earlier.

And indeed, if the chance for double inserts is low, it may
be more efficient to not do the SELECT that is triggered
by a uniqueness validation, do an “optimistic” save and
fall back it the database tells you it fails.

Just my 2p.

Peter

fearless_fool · March 27, 2012, 8:08pm

Fearless F. wrote in post #1053062:

Peter V. wrote in post #1052874:

A custom validation on the compound unique key would behave
similar, always do a SELECT (and suffer from the race condition
documented in

http://api.rubyonrails.org/classes/ActiveRecord/Validations/ClassMethods.html#method-i-validates_uniqueness_of

).

I was aware of the race condition in the original post, which is why I
thought this might be a ‘clever’ implementation. To merge what @Peter
said, I guess you can summarize it as:

If the likelihood of duplicate entries is low, let the db raise an error
– this saves a SELECT call for non-duplicate entries:
def self.create_relation(src, dst)
  create!(:src_id => src.id, :dst_id => dst.id)
rescue ActiveRecord::RecordNotUnique
  where(:src_id => src.id, :dst_id => dst.id)
end
If the likelihood of duplicate entries is high, use first_or_create as a
first line of defense. This generates an extra SELECT, but (presumably)
that’s cheaper than frequent calls to raise/rescue[*].

A SQL query takes around 1 ms, possibly more if the db server is on a
different machine. That is way more expensive than rescuing an
exception.

Joe

Regardless, keep
the rescue clause to avoid race conditions:
def self.create_relation(src, dst)
  where(:src_id => src.id, :dst_id => dst.id).first_or_create!
rescue ActiveRecord::RecordNotUnique
  where(:src_id => src.id, :dst_id => dst.id)
end
[*] This begs for a benchmarking test to compare the cost of a SELECT vs
raise/rescue. I’ll put it on the list.

fearless_fool · March 24, 2012, 2:26pm

Peter V. wrote in post #1052874:

A custom validation on the compound unique key would behave
similar, always do a SELECT (and suffer from the race condition
documented in

http://api.rubyonrails.org/classes/ActiveRecord/Validations/ClassMethods.html#method-i-validates_uniqueness_of

).

I was aware of the race condition in the original post, which is why I
thought this might be a ‘clever’ implementation. To merge what @Peter
said, I guess you can summarize it as:

If the likelihood of duplicate entries is low, let the db raise an error
– this saves a SELECT call for non-duplicate entries:

def self.create_relation(src, dst)
  create!(:src_id => src.id, :dst_id => dst.id)
rescue ActiveRecord::RecordNotUnique
  where(:src_id => src.id, :dst_id => dst.id)
end

If the likelihood of duplicate entries is high, use first_or_create as a
first line of defense. This generates an extra SELECT, but (presumably)
that’s cheaper than frequent calls to raise/rescue[*]. Regardless, keep
the rescue clause to avoid race conditions:

def self.create_relation(src, dst)
  where(:src_id => src.id, :dst_id => dst.id).first_or_create!
rescue ActiveRecord::RecordNotUnique
  where(:src_id => src.id, :dst_id => dst.id)
end

[*] This begs for a benchmarking test to compare the cost of a SELECT vs
raise/rescue. I’ll put it on the list.

fearless_fool · April 3, 2012, 11:30pm

Let me redact that previous post…my only excuse is that I hadn’t had
my coffee yet!

If the time to make a DB query (let’s call it 1 Unit of time) swamps out
any amount of processing in Ruby, then:

Approach A (do a SELECT to check for existence of the record before
attempting an insert) will take 1 Unit if the record exists and 2 Units
if the record does not exist.

Approach B (always attempt the insert, let the DB signal an error,
recover by doing a SELECT) will take 2 Units if the record exists and 1
Unit if the record does not exist.

So I’d conclude that:

If the chance of a duplicate insert is less than 50%, you’re better off
with Approach B, otherwise you should choose Approach A.

Yes?

fearless_fool · April 4, 2012, 9:40am

On 3 April 2012 22:30, Fearless F. [email protected] wrote:

Approach B (always attempt the insert, let the DB signal an error,
recover by doing a SELECT) will take 2 Units if the record exists and 1
Unit if the record does not exist.

So I’d conclude that:

If the chance of a duplicate insert is less than 50%, you’re better off
with Approach B, otherwise you should choose Approach A.

I don’t know for certain but I imagine a db write takes much longer
than a db read, maybe 10 units for the write and one for the read. If
so then that changes the calculation somewhat. However there is also
the question of whether db access time is significant in the app
anyway. Bottlenecks in apps are usually not where you expect them to
be. I would do it the way that seems cleanest to you and only worry
about optimisation if throughput becomes an issue, in which case you
will have to do some profiling to find where the bottlenecks are.

Colin

fearless_fool · April 3, 2012, 7:13pm

Joe V. wrote in post #1053641:

A SQL query takes around 1 ms, possibly more if the db server is on a
different machine. That is way more expensive than rescuing an
exception.

Joe: Thanks for the useful data point. Just to make sure my
understanding is complete, since any db transaction is going to take 1ms
(more or less), you would ALWAYS let the db catch the duplicate and
avoid the extra SELECT:

def self.create_relation(src, dst)
  create!(:src_id => src.id, :dst_id => dst.id)
rescue ActiveRecord::RecordNotUnique
  where(:src_id => src.id, :dst_id => dst.id)
end

Yes?