Broken has_many

This has been killing me this weekend. Doing a has_many fetch is
accurate the first time only.

Here’s a simple example:

tp = TestParent.new(:name => "testing")
tp.save!

tc = TestChild.new(:name => "test1", :test_parent_id => tp.id)
tc.save!
tc = TestChild.new(:name => "test2", :test_parent_id => tp.id)
tc.save!
tc = TestChild.new(:name => "test3", :test_parent_id => tp.id)
tc.save!
tp.test_children.map {|x| x.name }.to_sentence # => "test1, test2, and test3" CORRECT

tc = TestChild.new(:name => "test4", :test_parent_id => tp.id)
tc.save!
tp.test_children.map {|x| x.name }.to_sentence # => "test1, test2, and test3" WRONG!

tp = TestParent.find(tp.id)
tp.test_children.map {|x| x.name }.to_sentence # => "test1, test2, test3, and test4" CORRECT

In my case, we were preparing an order of line items (the children of
the order), then fetching them to get a subtotal so we could add one
last line item (think tax or such), but then that last line item is lost
on the following receipt page, mailers, etc.

This is Rails 4.0.8, Pg 0.17.1, ruby 2.1.1p30 (2014-02-10 revision
44904) [x86_64-linux]. Models and migration are very, very basic in the
test above:

---- migration
class HasManyTest < ActiveRecord::Migration

  def self.up
    execute %{

      CREATE TABLE test_parents (
        id serial primary key,
        created_at timestamp,
        updated_at timestamp,
        name varchar(255)
      );

      CREATE TABLE test_children (
        id serial primary key,
        created_at timestamp,
        updated_at timestamp,
        name varchar(255),
        test_parent_id int4 references test_parents(id)
      );

    }
  end

  def self.down
    raise "No you can't!"
  end

end


class TestParent < ActiveRecord::Base

  has_many :test_children

end


class TestChild < ActiveRecord::Base

  belongs_to :test_parent

end

Is this a known problem? It’s hard to believe others haven’t
experienced this. :’) Or is this an obscure result of some sort of
caching I should be turning off?

Thanks!

Phil

I hate to reply to myself, but I narrowed it down to Rails caching
model association queries by DEFAULT. It can be worked around by
passing ‘true’, like this:

tc.name = "something different"

tc.save!

tp.test_children.map {|x| x.name }.to_sentence # => "test1, test2, test3, and test4" WRONG

tp.test_children(true).map {|x| x.name }.to_sentence # => "test1, test2, test3, and something different" CORRECT

Another workaround is to just stop using has_many and such in favor of
manual methods, à la:

def test_children
  return TestChild.where(["test_parent_id = ?", self.id])
end
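
(For anyone trying this, a rough sketch of the trade-off, using the test
models above - the generated reader memoizes the collection on tp, while
this manual method builds a fresh relation on every call:)

tp.test_children.map {|x| x.name }  # re-queries, so it always reflects the database
tp.test_children.map {|x| x.name }  # ...another query each call (modulo the per-request SQL query cache)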

Is there a way to turn this sort of caching off globally? (Other
caching
is fine, I don’t want to turn all caching off.)

BTW- It is a bit mind blowing that this is turned on by default.
Possible
data corruption shouldn’t ever be preferred by default over (possible)
speed gains. I’d still categorize this as a serious bug, at least as a
configuration default.

Thanks!

Phil

Thanks for the reply. It still seems to be a systemic integrity
problem, but the additional info on another workaround is welcome.

Phil

On Sun, Aug 17, 2014 at 8:57 PM, Phil [email protected] wrote:

Try tp.reload after you make a change in the collection the way you have
been. If you had been building your children entries as:

tp.test_children.create(name: "test1")

tp would be fully cognizant of all its children.
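
Roughly like this, using the model names from your example (untested, but
it's the idea):

tp = TestParent.create!(:name => "testing")

# Created through the association: the new child becomes part of tp's
# collection as part of the call, so no reload is needed.
tp.test_children.create!(:name => "test4")
tp.test_children.map {|x| x.name }   # includes "test4"

# Created directly: tp's cached collection doesn't know about it until you reload.
TestChild.create!(:name => "test5", :test_parent_id => tp.id)
tp.test_children.map {|x| x.name }   # still no "test5"
tp.reload
tp.test_children.map {|x| x.name }   # now includes "test5"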

On Monday, August 18, 2014 2:57:05 AM UTC+1, Phil wrote:

Not that I know of. As of Rails 4 (or is it 3.2?) the generated
association accessors are defined in a module (rather than directly on
the class), so it's easy enough to override them. You could do:

class TestParent
  has_many :test_children

  def test_children(force_reload = true)
    super(force_reload)
  end
end

To change it for all associations without having to do this on a
per-association basis would probably require some monkey patching inside
the ActiveRecord code - sounds brittle.

As tamouse says if you use build/create on the association then you
won’t
see this behaviour, although there can still be differences between the
array thus constructed in memory and the array you’d get if you had
selected from the database (for example if you have an order clause on
the
association, custom select etc.)

Rails has been like this at least since the 1.0 days - I can’t say I’ve
ever run into particular issues from it.

Fred

It’s not a systemic integrity problem, it is the way Rails has always
worked. Using tp.test_children.create() is not “another work-around”,
it
is the recommended way of adding children to a parent model that you
have
already instantiated and has been available for as long as I can
remember
(at least since rails 2.0.x).

Scenarios where you would actually need to re-query the database every
time you access a relation
are rare. If you really need to, you have
that
option, but in no way should that be a default.

Jim

On Sunday, 17 August 2014 21:57:05 UTC-4, Phil wrote:

BTW- It is a bit mind blowing that this is turned on by default. Possible
data corruption shouldn’t ever be preferred by default over (possible)
speed gains. I’d still categorize this as a serious bug, at least as a
configuration default.

The cleanest way to deal with this is to use the association. Instead
of:

tc = TestChild.new(:name => "test4", :test_parent_id => tp.id)

do:

tp.test_children.new(:name => "test4")

This will correctly reflect the new record when subsequently calling
tp.test_children.
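
Sketched out with the test models from the first post (a rough sketch, not
a tested snippet):

tp.test_children.map {|x| x.name }        # loads and caches the collection
tc = tp.test_children.new(:name => "test4")
tp.test_children.size                     # counts the unsaved "test4" too
tc.save!
tp.test_children.map {|x| x.name }        # includes "test4" without a reload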

–Matt J.

On Monday, August 18, 2014 1:57:08 AM UTC-7, Frederick C. wrote:

see this behaviour, although there can still be differences between the
array thus constructed in memory and the array you’d get if you had
selected from the database (for example if you have an order clause on the
association, custom select etc.)

Ah, yes, I see it now in
activerecord-4.0.8/lib/active_record/associations/collection_association.rb

OK, makes sense. I'd love to have that be a config option I could throw
in the environments config files to turn that off, if needed.

Rails has been like this at least since the 1.0 days - I can’t say I’ve
ever run into particular issues from it.

In my case it was rather nasty. It went something like this:

@order = Order.new …
… record line items…
@order.line_items… calculate subtotal to compute tax and record that as the final line item
run charge on credit card, render receipt page, send mailers, etc.

The result was that the customer got the receipt page/emails and was
charged without tax! Fulfillment and Accounting pulling up the order would
confusingly see the right total and tax.
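
In code it was roughly this shape (simplified; Order/LineItem and the
attribute names here are just placeholders):

# The pitfall - line items created without going through the association:
@order = Order.create!
LineItem.create!(:order_id => @order.id, :amount => 100)
subtotal = @order.line_items.map(&:amount).inject(0, :+)     # loads and caches line_items
LineItem.create!(:order_id => @order.id, :amount => subtotal * 0.08)  # the tax line
@order.line_items.map(&:amount).inject(0, :+)                # still 100: the cached list has no tax line
# ...and that stale list is what the receipt page and mailers then rendered.

# The fix in our case: either create the tax line through the association
#   @order.line_items.create!(:amount => subtotal * 0.08)
# or force a refresh before handing @order on:
#   @order.reload   (or @order.line_items(true))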

Thanks, I have a much better understanding of what’s going on now.

Phil

On Tuesday, August 19, 2014 6:16:26 AM UTC+5:30, Phil wrote:

Hi,

I disagree with you on your statement - “significant performance gains
of
caching associations like this is also rare/minimal”.
Almost all the apps that I've built had pages where I've used the same
association so many times for rendering the view, and disabling caching in
those cases would be an absolute performance disaster! Think of all the
other problems it can cause… eager loading of associations?
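
(Rough sketch of the kind of page I mean, using the TestParent/TestChild
models from earlier in the thread:)

tp = TestParent.includes(:test_children).first   # eager loads the children up front

# A typical view then touches the association several times:
tp.test_children.size
tp.test_children.map {|x| x.name }
tp.test_children.any? {|x| x.name.start_with?("test") }
# With association caching these all reuse the records loaded above;
# force-reloading on every access would turn each line into another query.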

Well, like Jim said, it's very rare that someone has to do what you're
doing, and even if they have to do it, Rails has a way of doing it properly :)

Thanks!

Steve

@stev4nity

(I apologize in advance for top posting, but trying to keep this thread
consistent.)

I would say it’s a definite problem for us. This project is an old
legacy
project that several people have been working on for 6+ years. When I’m
rewriting, say, the receipt page and mailer, how could I possibly know
in
advance that @order.line_items would give stale results without digging
through the details of the controller? If I have to do
@order.line_items(true) (and with every other has_many, etc.) everywhere
just to be sure, I'd rather simply have the feature turned off
globally.

While it can be ‘rare’ that somebody might run into this, I'd still
argue that the significant performance gains of caching associations like
this are also rare/minimal (would be interesting to test). I'd rather lean
towards data accuracy than speed, at least in my case.

I'd simply like a config option to turn this feature on/off. I'd
also suggest that it could even be dangerous to have it on by default if
things have to be done in particular ways to get around the potential
pitfalls, but I'd just be happy to have a config option to turn it off.

Thanks for the reply!

Phil

On Tuesday, August 19, 2014 1:49:14 AM UTC-7, Steve R.son wrote:

On Monday, August 18, 2014 5:25:58 AM UTC-7, Jim wrote:

association so many times for rendering the view and disabling caching in
those cases would be an absolute performance disaster! Think of all the
other problems it can cause… eager loading of associations?

Well, like Jim said, it's very rare that someone has to do what you're
doing, and even if they have to do it, Rails has a way of doing it properly :)

Thanks!

Thanks for the reply Steve!

I guess it's hard to say what the advantages are without testing…
Assuming you are already indexing all of your id and *_id database
columns and SQL query caching is on, I'm not sure what advantages
association caching is actually providing. A test might be to edit
collection_association.rb from the gem manually to set force_refresh = true
and do some benchmarking (which I'd like to do when I get some time).
Ironically, 37 Signals has some very good articles on how to do caching
properly without worrying about getting stale data; this is a basic
example:

How key-based cache expiration works – Signal v. Noise

I’m not sure what you mean by the ‘rails way’ in this case. Usually the
rails way means not having to worry about cache management by default.
In
my case I have a mailer that simply looks like this:

def customer_receipt(order)

In the view it does @order.line_items and they are sometimes wrong.
This
just feels like an ugly hack to try to make rails faster during the days
when ruby was really, really, slow (thankfully the latest since 2.0.0 is
MUCH faster).

Phil

I would recommend you start by getting all that domain logic out of the
Controllers and into context or composite objects. Check out
Data, context and interaction - Wikipedia or James
Coplien’s excellent book “Lean Architecture” in which he discusses DCI
at length.

This doesn’t actually solve the specific
Rails-doesn’t-reload-associations problem you’re having, and in fact
I’ve worked on an e-commerce codebase with the exact same problem you
describe. We had several @order.reload calls because of the way Rails is
implemented.

Moving to a DCI pattern doesn’t eliminate the Rails problem here, but it
encapsulates the business logic into a single context object whose
responsibility is to both do the operations and then tell the object to
reload correctly, producing the expected results. You can test the
context object independently of the controller, which is ideal.

I actually think there is significant performance gain from not
reloading associations in general, and in e-commerce specific apps you
simply need to know this about Rails and suck it up and do @order.reload
when you make certain operations. Once you realize this is just how
Rails works, and you move that domain logic into a place more
appropriate than the controller, it starts to bother you less that you
have to do it in the first place.
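
A rough sketch of what I mean (the class and attribute names are invented
for the example):

class FinalizeOrderTotals
  # Context object: owns the whole operation, including the reload, so the
  # controller, mailers and views never see a stale line_items collection.
  def initialize(order, tax_rate)
    @order    = order
    @tax_rate = tax_rate
  end

  def call
    subtotal = @order.line_items.map(&:amount).inject(0, :+)
    @order.line_items.create!(:name => "Tax", :amount => subtotal * @tax_rate)
    @order.reload
    @order
  end
end

# Controller code then just does:
FinalizeOrderTotals.new(@order, 0.08).call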

Just my two cents.

-Jason

On Tuesday, August 19, 2014 11:21:34 PM UTC-7, Jim wrote:

If your dataset is small and properly indexed, and you have very few
related items, chances are you will see very little change in performance
from reloading every time you access the order line items. If you have
several hundred line items, things may be a little different. If your
database is on a different machine than the web server, things may also be
a little different.

Yes, you are probably right(?).

(Unless you are suggesting that Rails should check the updated_at for
every ActiveRecord object every single time you access it. In that case,
good luck. I doubt you will ever find any ORM so poorly written.)

No no, I mean there’s already a layer of SQL caching that takes care of
repeated identical selects between updates and inserts. Is there not?
(I
think there is?) Association caching seems unneeded(?) or at least a
lot
less intelligent than the existing SQL query caching.
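
(To show what I mean by that layer, a rough sketch with the test models
from my first post; inside a controller action the query cache is on
automatically, elsewhere you can wrap it like this:)

ActiveRecord::Base.cache do
  TestChild.where(:test_parent_id => tp.id).to_a   # SELECT hits the database
  TestChild.where(:test_parent_id => tp.id).to_a   # identical SQL served from the query cache
  TestChild.create!(:name => "test5", :test_parent_id => tp.id)   # a write clears the query cache
  TestChild.where(:test_parent_id => tp.id).to_a   # fresh SELECT, sees "test5"

  tp.test_children.to_a   # loads and memoizes the association on tp
  TestChild.create!(:name => "test6", :test_parent_id => tp.id)
  tp.test_children.to_a   # the association cache is separate: still missing "test6"
end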

If your order line items are wrong, that is because somewhere in the code,
somebody made some incorrect assumptions and wrote some incorrect code,
examples of which have already been pointed out. Thankfully, the goal of
the framework is not to make sure that can never happen by default, with
the side effect that you may need to make drastic changes to the code if
you ever want to start caching. The framework does provide multiple ways
to create related many records in a way that updates the relation in the
parent object.

Before blaming ‘incorrect code’ or bad programmers, look back again at my
example. It's quite trivial and reproducible. It saves order line items
in what I would consider a completely sane and logical way, calls back the
line items to compute tax, adds the last line item (the tax), and then
throws the @order to payment processing, mailers, receipt page… where
totals are wrong and tax is missing. I understand that doing things
differently avoids the problem, and I have used those approaches to fix
this issue on my end, but I am still quite skeptical as to how a programmer
is supposed to know about these pitfalls.

In your case, you can either find that problem and fix it, or simply force
reload the items in the mailer you are working on. To complain about bad
code as a “Rails Problem” is not useful. Bad code can be written in any
language, using any framework.

Oh, absolutely. I’m not just venting to vent, I have a few (what I
think
are constructive) questions:

  • How can one turn off ‘association caching’ (for lack of a better name)
    without having to edit the .rb file in the gem? (The answer appears to
    be you can’t?)

  • If not, can we simply add a simple config switch (e.g.
    config.active_record… that goes in config/environments/*) to Rails to
    turn this feature off/on more easily in the future? (Not just for my
    sake, I mean as a general improvement.)

  • If this association caching was added as a hack to improve performance
    back in the day, is it really still needed? Perhaps it can be removed
    or turned off by default? Perhaps it was added before SQL caching and
    is not needed(???)

To be honest, I have written code that requires a reload where certain
functionality is invoked. I have a callback on deleting a related many
that modifies a value in the parent. However, that code really should be
refactored such that the children are not destroyed directly, but instead a
method on the parent should destroy the child.

It would be interesting if has_many and such could throw a callback on the
child model(s) so it knows to update automatically. Again, if this is
already handled by the SQL caching layer, perhaps it's all moot to even
have caching at the association layer. It makes sense that Rails can't
know that changes are being made to the database outside of the scope of
Rails (e.g. somebody on a SQL console), but it seems like it could be
cleaner about knowing what it is doing within itself.

My apologies if I sounded a bit frustrated before. Now that I
understand
what’s happening and how to avoid the problem, I am back in my zen
state.
;’) Thanks again for your previous thoughts and insight.

Phil

On Tuesday, August 19, 2014 9:01:36 PM UTC-4, Phil wrote:

I guess it’s hard to say what the advantages are without testing…

Assuming you are already indexing all of your id and *_id database columns
and SQL query caching is on, I’m not sure what advantages association
caching is actually providing. A test might be to edit
collection_association.rb from the gem manually to set force_refresh = true
and do some benchmarking (which I’d like to do when I get some time).

If your dataset is small and properly indexed, and you have very few
related items, chances are you will see very little change in
performance
from reloading every time you access the order line items. If you have
several hundred line items, things may be a little different. If your
database is on a different machine than the web server, things may also
be
a little different.
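
(If you want to put numbers on it, something as simple as this would do; a
rough sketch using the TestParent/TestChild models from the first post:)

require 'benchmark'

tp = TestParent.first

Benchmark.bm(10) do |x|
  x.report("cached")   { 1000.times { tp.test_children.map {|c| c.name } } }
  x.report("reloaded") { 1000.times { tp.test_children(true).map {|c| c.name } } }
end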

Ironically, 37 Signals has some very good articles on how to do caching
properly without worrying about getting stale data, this is a basic example:

How key-based cache expiration works – Signal v. Noise

Ironically, this has nothing to do with what you are having problems
with.
You will still have the “stale” object. (Unless you are suggesting
that
Rails should check the updated_at for every ActiveRecord object every
single time you access it. In that case, good luck. I doubt you will
ever
find any ORM so poorly written.)

I’m not sure what you mean by the ‘rails way’ in this case. Usually the

rails way means not having to worry about cache management by default.

I don’t worry about cache management by default. In addition, I don’t
have
to worry about how to start using the database caching, and when I might
need it.

In my case I have a mailer that simply looks like this:

def customer_receipt(order)

In the view it does @order.line_items and they are sometimes wrong. This
just feels like an ugly hack to try to make rails faster during the days
when ruby was really, really, slow (thankfully the latest since 2.0.0 is
MUCH faster).

If your order line items are wrong, that is because somewhere in the
code,
somebody made some incorrect assumptions and wrote some incorrect code,
examples of which have already been pointed out. Thankfully, the goal
of
the framework is not to make sure that can never happen by default,
with
the side effect that you may need to make drastic changes to the code if
you ever want to start caching. The framework does provide multiple
ways
to create related many records in a way that updates the relation in the
parent object.

In your case, you can either find that problem and fix it, or simply
force
reload the items in the mailer you are working on. To complain about
bad
code as a “Rails Problem” is not useful. Bad code can be written in any
language, using any framework.

To be honest, I have written code that requires a reload where certain
functionality is invoked. I have a callback on deleting a related many
that modifies a value in the parent. However, that code really should
be
refactored such that the children are not destroyed directly, but
instead a
method on the parent should destroy the child.
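
Something along these lines (the names are made up for the sketch):

class Order < ActiveRecord::Base
  has_many :line_items

  # Destroy the child through the parent, so the callback logic lives here
  # and the parent's loaded collection and cached total stay consistent.
  def remove_line_item(line_item)
    line_items.destroy(line_item)
    self.total = line_items.map(&:amount).inject(0, :+)
    save!
  end
end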

Jim

On Wednesday, August 20, 2014 11:37:03 PM UTC-4, Phil wrote:

Before blaming ‘incorrect code’ or bad programmers, look back again at my
example. It's quite trivial and reproducible. It saves order line items
in what I would consider a completely sane and logical way, calls back the
line items to compute tax, adds the last line item (the tax), and then
throws the @order to payment processing, mailers, receipt page… where
totals are wrong and tax is missing. I understand that doing things
differently avoids the problem, and I have used those approaches to fix
this issue on my end, but I am still quite skeptical as to how a programmer
is supposed to know about these pitfalls.

It sounds like you are thinking in terms of the database, while
ActiveRecord is an ORM. You need to think of it in terms of objects
which
happen to be persisted in a database. When you create a line item
record
in the database, your @order object does not know about it, so you have
to
tell it to reload. The only other ORM I’ve used is Apple’s CoreData,
and
the same thing applies. If you modify the database without using the
object in question, the object will not be updated until you reload it.

Jim