Has_many seems funky when handling duplicate records

casper_the_ghost · January 21, 2008, 10:38pm

The following test fails on the last line as firm.clients.length == 2,
not 4:

def test_replace_with_duplicates
firm = Firm.find(:first)
c1, c2 = companies(:first_client),companies(:second_client)
assert_equal firm.clients, [c1, c2]
firm.clients = [c1, c1, c2, c2]
firm.save
firm.reload
assert_equal 4, firm.clients.length
end

Looking at the implementation of the “replace” method in
ActiveRecord::Associations::AssociationCollection I can see where this
behavior is coming from. Not what i was expecting really. Is this
expected?

casper_the_ghost · January 21, 2008, 10:42pm

Jeff,

I wonder if it would make a difference if you changed line 3 to be:

c1, c2 = Company.find(1), Company.find(2)

I know that I have seen a difference in the past between loading the
fixture from the db vs. loading with the fixture method.

–Tom

casper_the_ghost · January 21, 2008, 10:55pm

Hi Tom,

I gave the above a shot, but it gave the same results.

-J

casper_the_ghost · January 21, 2008, 11:37pm

Why would you want to associate the same client to the same firm
multiple
times anyway? The idea is that for a has_many you have unique records.
For
example if I had a post and it had a comment on it, if I did
@blog.comments= [
Comment.find(:first),Comment.find(:first)] it should only show ONE
comment.

casper_the_ghost · January 22, 2008, 2:01am

What is the SQL in the test.log telling you about the objects? Is it
actually creating two additional clients with the same attributes? If
not, I don’t see how it would be possible after the reload for the
firm to 4 clients?

casper_the_ghost · January 22, 2008, 1:31am

I agree with you in the case where a domain model expects a unique set
of associations, but what about situation that doesn’t? It’s a
similar difference between Arrays and Sets: Arrays can be unique or
have duplicates, but Sets must be unique. I would understand if it
was clear that has_many associations are treated like Sets, but
nothing describes them that way. And if they were, what would be the
point of the “:uniq” option on has_many and HABTM? It would be
redundant, right?

casper_the_ghost · January 22, 2008, 7:47pm

It is not creating two new clients with the same attributes. The
“replace” method in AssoctiationCollection, which is what is called
when you do something like model_instance.collection = [other_model1,
other_model2], looks like this:

  # Replace this collection with +other_array+
  # This will perform a diff and delete/add only records that have

changed.
def replace(other_array)
other_array.each { |val| raise_on_type_mismatch(val) }

    load_target
    other   = other_array.size < 100 ? other_array :

other_array.to_set
current = @target.size < 100 ? @target : @target.to_set

    @owner.transaction do
      delete(@target.select { |v| !other.include?(v) })
      concat(other_array.select { |v| !current.include?(v) })
    end
  end

Certainly if the size of the existing or supplied array is > 100, it’s
going to ignore duplicates, but also the delete and concat calls,
because they don’t check for duplicate instances of the same object,
will ignore them as well. That means that if @target and other_array
contain the same unique set of objects replace will do nothing. For
example, consider ModelA which has_many ModelB. Assume instance
mod_a.model_bs == [ mod_b ]. If that were the case, the following
line would have no effect:

mod_a.model_bs = [mod_b, mod_b, mod_b, mod_b, mod_b, mod_b, mod_b,
mod_b, mod_b]

Similarly, if mod_a.model_bs already was set to [mod_b, mod_b, mod_b,
mod_b, mod_b, mod_b, mod_b, mod_b, mod_b], and we did the following,
it would have no effect:

mod_a.model_bs = [mod_b]

What I’m suggesting is that if we created the has_many relationship
with :uniq => true I would expect this behavior. Instead it seems
that :uniq => true only has an effect on database reads.
“collection=” is documented in the API as, “replaces the collections
content by deleting and adding objects as appropriate”, which doesn’t
seem true if the current or new collection contain duplicates

casper_the_ghost · January 25, 2008, 12:56am

What I’m suggesting is that if we created the has_many relationship
with :uniq=> true I would expect this behavior. Instead it seems
that :uniq=> true only has an effect on database reads.
“collection=” is documented in the API as, “replaces the collections
content by deleting and adding objects as appropriate”, which doesn’t
seem true if the current or new collection contain duplicates

yes. if one says

class Users < ActiveRecord::Base
has_and_belongs_to_many :roles, :uniq => true
end

then, at minimum, rails should do something like

before_save do |record|
record.send(:roles).uniq!
end

under the hood for you. anything else is very non-POLS, to put it
mildly.

kind regards.

casper_the_ghost · January 22, 2008, 10:50pm

jeff b wrote:

The following test fails on the last line as firm.clients.length == 2,
not 4:

def test_replace_with_duplicates
firm = Firm.find(:first)
c1, c2 = companies(:first_client),companies(:second_client)
assert_equal firm.clients, [c1, c2]
firm.clients = [c1, c1, c2, c2]
firm.save
firm.reload
assert_equal 4, firm.clients.length
end

Looking at the implementation of the “replace” method in
ActiveRecord::Associations::AssociationCollection I can see where this
behavior is coming from. Not what i was expecting really. Is this
expected?

c1,c1 is the SAME db record, and it’s a primary key in your db… unless
you dup c1 and save with a different id, you will still only have one
model in your db with that id.

The only way to do the functionality that you desire is by adding in a
link table and exploring has and belongs to many relationships as the
link table can keep “duplicate” associations…

Forget about rails and look at it from the db side of things…

hth

ilan