Forum: Ruby on Rails Broken has_many

6e672922c21a5298f2a666ecb1c11d7c?d=identicon&s=25 Philip Edelbrock (Guest)
on 2014-08-18 03:27
(Received via mailing list)
This has been killing me this weekend.  Doing a has_many fetch is
accurate the *first* time only.

Here's a simple example:

tp = TestParent.new(:name => "testing")
tp.save!

tc = TestChild.new(:name => "test1", :test_parent_id => tp.id)
tc.save!
tc = TestChild.new(:name => "test2", :test_parent_id => tp.id)
tc.save!
tc = TestChild.new(:name => "test3", :test_parent_id => tp.id)
tc.save!
tp.test_children.map {|x| x.name }.to_sentence # => "test1, test2, and
test3" *CORRECT*

tc = TestChild.new(:name => "test4", :test_parent_id => tp.id)
tc.save!
tp.test_children.map {|x| x.name }.to_sentence # => "test1, test2, and
test3"  *WRONG!*

tp = TestParent.find(tp.id)
tp.test_children.map {|x| x.name }.to_sentence # => "test1, test2,
test3, and test4" *CORRECT*


In my case, we were preparing an order of line items (the children of
the order) and then fetching the results to get a subtotal to add one
last line item (think tax or such) but then the last line item is lost
on the following receipt page, mailers, etc.

This is Rails 4.0.8, Pg 0.17.1, ruby 2.1.1p30 (2014-02-10 revision
44904) [x86_64-linux].  Models and migration are very, very basic in the
test above:

---- migration
class HasManyTest < ActiveRecord::Migration

def self.up
 execute %{

CREATE TABLE test_parents  (
        id serial primary key,
        created_at timestamp ,
        updated_at timestamp,
  name varchar(255)
);

CREATE TABLE test_children  (
        id serial primary key,
        created_at timestamp ,
        updated_at timestamp,
  name varchar(255),
  test_parent_id int4 references test_parents(id)
);

}
end

def self.down
  raise "No you can't!"
end

end
----

----
class TestParent < ActiveRecord::Base

 has_many       :test_children

end
----

----
class TestChild < ActiveRecord::Base

 belongs_to :test_parent

end
----

Is this a known problem?  It's hard to believe others haven't
experienced this? :')  Or is this an obscure result of some sort of
caching or such I should be turning off?

Thanks!


Phil
6e672922c21a5298f2a666ecb1c11d7c?d=identicon&s=25 Phil (Guest)
on 2014-08-18 03:58
(Received via mailing list)
I hate to reply to myself, but I narrowed it down to Rails caching by
DEFAULT of model queries.  It can be worked around by passing 'true',
like
this:


tc.name = "something different"

tc.save!

tp.test_children.map {|x| x.name }.to_sentence # => "test1, test2,
test3,
and test4" *WRONG*

tp.test_children(true).map {|x| x.name }.to_sentence # => "test1, test2,
test3, and something different" *CORRECT*


Another workaround is to just stop using has_many and such in favor of
manual functions, a la:

 def test_children

  return TestChild.where(["test_parent_id = ?", self.id])

 end


Is there a way to turn this sort of caching off globally?  (Other
caching
is fine, I don't want to turn all caching off.)


BTW- It is a bit mind blowing that this is turned on by default.
Possible
data corruption shouldn't ever be preferred by default over (possible)
speed gains.  I'd still categorize this as a serious bug, at least as a
configuration default.


Thanks!


Phil
Aa082c8b00a50928e5860dcd70bf2368?d=identicon&s=25 tamouse m. (tamouse_m)
on 2014-08-18 06:17
(Received via mailing list)
On Sun, Aug 17, 2014 at 8:57 PM, Phil <phil@edgedesign.us> wrote:

> tp.test_children.map {|x| x.name }.to_sentence # => "test1, test2, test3,
>
> data corruption shouldn't ever be preferred by default over (possible)
> speed gains.  I'd still categorize this as a serious bug, at least as a
> configuration default.
>
>

Try tp.reload after you make a change in the collection the way you have
been. If you had been building your children entries as:

tp.children.create(name: "test1")

tp would be fully cognizant of all it's children.
6e672922c21a5298f2a666ecb1c11d7c?d=identicon&s=25 Phil (Guest)
on 2014-08-18 06:30
(Received via mailing list)
Thanks for the reply.  It still seems to be a systemic integrity
problem.
 But the additional info on another work around is welcome.


Phil
81b61875e41eaa58887543635d556fca?d=identicon&s=25 Frederick Cheung (Guest)
on 2014-08-18 10:58
(Received via mailing list)
On Monday, August 18, 2014 2:57:05 AM UTC+1, Phil wrote:
>
>
> Not that I know of. As of rails 4 (or is it 3.2?) the generated
association accessors are defined in a module (rather than directly on
the
class so it's easy enough to override them so you could do

class TestParent
   has_many :test_children

   def test_childen(force_reload=true)
     super(force_reload)
   end
end

To change it for all associations without having to do this on a per
association basis would probably require some monkey patching inside the
activerecord code - sounds brittle.

As tamouse says if you use build/create on the association then you
won't
see this behaviour, although there can still be differences between the
array thus constructed in memory and the array you'd get if you had
selected from the database (for example if you have an order clause on
the
association, custom select  etc.)

Rails has been like this at least since the 1.0 days - I can't say I've
ever run into particular issues from it.

Fred
17c1d72be21a2a43be717c6dc1660c7b?d=identicon&s=25 Jim (Guest)
on 2014-08-18 14:27
(Received via mailing list)
It's not a systemic integrity problem, it is the way Rails has always
worked.  Using tp.test_children.create() is not "another work-around",
it
is the recommended way of adding children to a parent model that you
have
already instantiated and has been available for as long as I can
remember
(at least since rails 2.0.x).

Scenarios where you would actually need to re-query the database *every
time you access a relation* are rare.  If you really need to, you have
that
option, but in no way should that be a default.

Jim
6883e5ef03484d4fcef507d7b4f1d243?d=identicon&s=25 Matt Jones (Guest)
on 2014-08-18 17:09
(Received via mailing list)
On Sunday, 17 August 2014 21:57:05 UTC-4, Phil wrote:
>
>  def test_children
> BTW- It is a bit mind blowing that this is turned on by default.  Possible
> data corruption shouldn't ever be preferred by default over (possible)
> speed gains.  I'd still categorize this as a serious bug, at least as a
> configuration default.
>
>
>
The cleanest way to deal with this is to *use* the association. Instead
of:

tc = TestChild.new(:name => "test4", :test_parent_id => tp.id)

do:

tp.test_children.new(:name => "test4")

This will correctly reflect the new record when subsequently calling
`tp.test_children`.

--Matt Jones
6e672922c21a5298f2a666ecb1c11d7c?d=identicon&s=25 Phil (Guest)
on 2014-08-19 02:17
(Received via mailing list)
On Monday, August 18, 2014 1:57:08 AM UTC-7, Frederick Cheung wrote:
>>  Possible data corruption shouldn't ever be preferred by default over
>
> see this behaviour, although there can still be differences between the
> array thus constructed in memory and the array you'd get if you had
> selected from the database (for example if you have an order clause on the
> association, custom select  etc.)
>

Ah, yes, I see it now in
activerecord-4.0.8/lib/active_record/associations/collection_association.rb

OK, makes sense.  I've love to have that be a config option I could
throw
in the environments config files to turn that off, if needed.


> Rails has been like this at least since the 1.0 days - I can't say I've
> ever run into particular issues from it.
>

In my case it was rather nasty.  It went something like this:

@order = Order.new ...
... record line items..
@order.line_items... calculate subtotal to calculate tax and record that
as
the final line item
run charge on credit card, render receipt page, send mailers, etc.

The result was the customer got receipt page/emails and charged without
tax!  Fulfillment and Accounting pulling up the order would confusingly
get
the right total and tax.

Thanks, I have a much better understanding of what's going on now.


Phil
6e672922c21a5298f2a666ecb1c11d7c?d=identicon&s=25 Phil (Guest)
on 2014-08-19 02:47
(Received via mailing list)
(I apologize in advance for top posting, but trying to keep this thread
consistent.)

I would say it's a definite problem for us.  This project is an old
legacy
project that several people have been working on for 6+ years.  When I'm
rewriting, say, the receipt page and mailer, how could I possibly know
in
advance that @order.line_items would give stale results without digging
through the details of the controller?  If I have to do
@order.line_items(true) (and with every other has_many, etc.) everywhere
just to be sure, I'd rather just simply like to have the feature turned
off
globally.

While it can be 'rare' that somebody might run into this, I'd still
argue
that significant performance gains of caching associations like this is
also rare/minimal (would be interesting to test).  I'd rather lean
towards
data accuracy than speed, at least in my case.

I'd simply like a simple config option to turn on/off this feature.  I'd
also suggest that it could even be dangerous to have on by default if
things have to be done in particular ways to get around the potential
pit
falls, but I'd just be happy to have a config option to turn it off.

Thanks for the reply!


Phil
D2df321347fb07d4a159792a40bb1374?d=identicon&s=25 Steve Robinson (Guest)
on 2014-08-19 10:50
(Received via mailing list)
On Tuesday, August 19, 2014 6:16:26 AM UTC+5:30, Phil wrote:
> falls, but I'd just be happy to have a config option to turn it off.
>> is the recommended way of adding children to a parent model that you have
>>>>
Hi,

I disagree with you on your statement - "significant performance gains
of
caching associations like this is also rare/minimal".
Almost all the apps that I've built had pages where I've used the same
association so many times for rendering the view and disabling caching
in
those cases would be an absolute performance disaster! Think of all the
other problems it can cause... eager loading of associations?

Well like Jim said its very rare that someone has to do what you're
doing
and even if they have to do it, Rails has a way of doing it properly :)

Thanks!

Steve
--
@stev4nity
Dfc7587fd73f2efa19d6f1f9611b70ba?d=identicon&s=25 Jason Fb (jasonfb)
on 2014-08-19 16:41
(Received via mailing list)
I would recommend you start by getting all that domain logic out of the
Controllers and into context or composite objects. Check out
http://en.wikipedia.org/wiki/Data,_context_and_interaction or James
Coplien's excellent book "Lean Architecture" in which he discusses DCI
at length.

This doesn't actually solve the specific
Rails-doesn't-reload-associations problem you're having, and in fact
I've worked on an e-commerce codebase with the exact same problem you
describe. We had several @order.reload calls because of the way Rails is
implemented.

Moving to a DCI pattern doesn't eliminate the Rails problem here, but it
encapsulates the business logic into a single context object whose
responsibility is to both do the operations and then tell the object to
reload correctly, producing the expected results. You can test the
context object independently of the controller, which is ideal.

I actually think there is significant performance gain from not
reloading associations in general, and in e-commerce specific apps you
simply need to know this about Rails and suck it up and do @order.reload
when you make certain operations. Once you realize this is just how
Rails works, and you move that domain logic into a place more
appropriate than the controller, it starts to bother you less that you
have to do it in the first place.

Just my two

-Jason
6e672922c21a5298f2a666ecb1c11d7c?d=identicon&s=25 Phil (Guest)
on 2014-08-20 03:02
(Received via mailing list)
On Tuesday, August 19, 2014 1:49:14 AM UTC-7, Steve Robinson wrote:
>>
>> On Monday, August 18, 2014 5:25:58 AM UTC-7, Jim wrote:
>>>
> association so many times for rendering the view and disabling caching in
> those cases would be an absolute performance disaster! Think of all the
> other problems it can cause... eager loading of associations?
>
> Well like Jim said its very rare that someone has to do what you're doing
> and even if they have to do it, Rails has a way of doing it properly :)
>
> Thanks!
>


Thanks for the reply Steve!

I guess it's hard to say what the advantages are without testing...
Assuming you are already indexing all of your id and *_id database
columns
and SQL query caching is on, I'm not sure what advantages association
caching is actually providing.  A test might be to edit
collection_association.rb from the gem manually to set force_refresh =
true
and do some benchmarking (which I'd like to do when I get some time).
Ironically, 37 Signals has some very good articles on how to do caching
properly without worrying about getting stale data, this is a basic
example:

https://signalvnoise.com/posts/3113-how-key-based-...

I'm not sure what you mean by the 'rails way' in this case.  Usually the
rails way means not having to worry about cache management by default.
In
my case I have a mailer that simply looks like this:

def customer_receipt(order)

In the view it does @order.line_items and they are sometimes wrong.
This
just feels like an ugly hack to try to make rails faster during the days
when ruby was really, really, slow (thankfully the latest since 2.0.0 is
MUCH faster).


Phil
17c1d72be21a2a43be717c6dc1660c7b?d=identicon&s=25 Jim (Guest)
on 2014-08-20 08:23
(Received via mailing list)
On Tuesday, August 19, 2014 9:01:36 PM UTC-4, Phil wrote:

I guess it's hard to say what the advantages are without testing...
> Assuming you are already indexing all of your id and *_id database columns
> and SQL query caching is on, I'm not sure what advantages association
> caching is actually providing.  A test might be to edit
> collection_association.rb from the gem manually to set force_refresh = true
> and do some benchmarking (which I'd like to do when I get some time).
>

If your dataset is small and properly indexed, and you have very few
related items, chances are you will see very little change in
performance
from reloading every time you access the order line items.  If you have
several hundred line items, things may be a little different.  If your
database is on a different machine than the web server, things may also
be
a little different.


> Ironically, 37 Signals has some very good articles on how to do caching
> properly without worrying about getting stale data, this is a basic example:
>
> https://signalvnoise.com/posts/3113-how-key-based-...
>

Ironically, this has nothing to do with what you are having problems
with.
 You will still have the "stale" object.  (Unless you are suggesting
that
Rails should check the updated_at for every ActiveRecord object every
single time you access it. In that case, good luck. I doubt you will
ever
find any ORM so poorly written.)

I'm not sure what you mean by the 'rails way' in this case.  Usually the
> rails way means not having to worry about cache management by default.
>

I don't worry about cache management by default.  In addition, I don't
have
to worry about how to start using the database caching, and when I might
need it.


>   In my case I have a mailer that simply looks like this:
>
> def customer_receipt(order)
>
> In the view it does @order.line_items and they are sometimes wrong.  This
> just feels like an ugly hack to try to make rails faster during the days
> when ruby was really, really, slow (thankfully the latest since 2.0.0 is
> MUCH faster).
>

If your order line items are wrong, that is because somewhere in the
code,
somebody made some incorrect assumptions and wrote some incorrect code,
examples of which have already been pointed out.  Thankfully, the goal
of
the framework is *not* to make sure that can never happen by default,
with
the side effect that you may need to make drastic changes to the code if
you ever want to start caching.  The framework does provide multiple
ways
to create related many records in a way that updates the relation in the
parent object.

In your case, you can either find that problem and fix it, or simply
force
reload the items in the mailer you are working on.  To complain about
bad
code as a "Rails Problem" is not useful.  Bad code can be written in any
language, using any framework.

To be honest, I have written code that requires a reload where certain
functionality is invoked.  I have a callback on deleting a related many
that modifies a value in the parent.  However, that code really should
be
refactored such that the children are not destroyed directly, but
instead a
method on the parent should destroy the child.

Jim
6e672922c21a5298f2a666ecb1c11d7c?d=identicon&s=25 Phil (Guest)
on 2014-08-21 05:38
(Received via mailing list)
On Tuesday, August 19, 2014 11:21:34 PM UTC-7, Jim wrote:
>>
>
> If your dataset is small and properly indexed, and you have very few
> related items, chances are you will see very little change in performance
> from reloading every time you access the order line items.  If you have
> several hundred line items, things may be a little different.  If your
> database is on a different machine than the web server, things may also be
> a little different.
>

Yes, you are probably right(?).


> Rails should check the updated_at for every ActiveRecord object every
> single time you access it. In that case, good luck. I doubt you will ever
> find any ORM so poorly written.)
>

No no, I mean there's already a layer of SQL caching that takes care of
repeated identical selects between updates and inserts.  Is there not?
(I
think there is?)  Association caching seems unneeded(?) or at least a
lot
less intelligent than the existing SQL query caching.


>>   In my case I have a mailer that simply looks like this:
> somebody made some incorrect assumptions and wrote some incorrect code,
> examples of which have already been pointed out.  Thankfully, the goal of
> the framework is *not* to make sure that can never happen by default, with
> the side effect that you may need to make drastic changes to the code if
> you ever want to start caching.  The framework does provide multiple ways
> to create related many records in a way that updates the relation in the
> parent object.
>

Before blaming 'incorrect code' or bad programmers, look back again at
my
example.  It's quite trivial and reproducible.  Saves order line items,
in
what I would consider a completely sane and logical way, call back line
items to compute tax, add the last line item (the tax), and then throws
the
@order to payment processing, mailers, receipt page... where totals are
wrong and tax is missing.  I understand that doing things differently
avoid
the problem and I have used them to fix this issue on my end, but I
still
am quite skeptical as to how a programmer is supposed to know about
these
pitfalls.


> In your case, you can either find that problem and fix it, or simply force
> reload the items in the mailer you are working on.  To complain about bad
> code as a "Rails Problem" is not useful.  Bad code can be written in any
> language, using any framework.
>

Oh, absolutely.  I'm not just venting to vent, I have a few (what I
think
are constructive) questions:

- How can one turn off 'association caching' (for lack of a better name)
without having to edit the .rb file in the gem (the answer appears to be
you can't?)

- If not, can we simply add a simple config switch (e.g.
config.active_record... that goes in config/environments/*) to Rails to
turn this feature off/on easier in the future? (Not just for my sake, I
mean as a general improvement.)

- If this association caching was added as a hack to improve performance
back in the day, is it really still needed?  Perhaps it can be removed
or
turned off by default?  Perhaps it was added before SQL caching and it
is
not needed(???)


>
> To be honest, I have written code that requires a reload where certain
> functionality is invoked.  I have a callback on deleting a related many
> that modifies a value in the parent.  However, that code really should be
> refactored such that the children are not destroyed directly, but instead a
> method on the parent should destroy the child.
>
>
It would be interesting if has_many and such could throw a callback on
the
child model(s) to know to update automatically.  Again, if this is
already
handled by the SQL caching layer, perhaps it's all moot to even have
caching at the association layer.  It makes sense that Rails can't know
that changes are being make to the database outside of the scope of
Rails
(e.g. somebody on a SQL console), but it seems like it could be cleaner
about knowing what it is doing within its self.

My apologies if I sounded a bit frustrated before.  Now that I
understand
what's happening and how to avoid the problem, I am back in my zen
state.
;')  Thanks again for your previous thoughts and insight.


Phil
17c1d72be21a2a43be717c6dc1660c7b?d=identicon&s=25 Jim (Guest)
on 2014-08-21 06:51
(Received via mailing list)
On Wednesday, August 20, 2014 11:37:03 PM UTC-4, Phil wrote:


> Before blaming 'incorrect code' or bad programmers, look back again at my
> example.  It's quite trivial and reproducible.  Saves order line items, in
> what I would consider a completely sane and logical way, call back line
> items to compute tax, add the last line item (the tax), and then throws the
> @order to payment processing, mailers, receipt page... where totals are
> wrong and tax is missing.  I understand that doing things differently avoid
> the problem and I have used them to fix this issue on my end, but I still
> am quite skeptical as to how a programmer is supposed to know about these
> pitfalls.
>

It sounds like you are thinking in terms of the database, while
ActiveRecord is an ORM.  You need to think of it in terms of objects
which
happen to be persisted in a database.   When you create a line item
record
in the database, your @order object does not know about it, so you have
to
tell it to reload.  The only other ORM I've used is Apple's CoreData,
and
the same thing applies.  If you modify the database without using the
object in question, the object will not be updated until you reload it.

Jim
Please log in before posting. Registration is free and takes only a minute.
Existing account

NEW: Do you have a Google/GoogleMail, Yahoo or Facebook account? No registration required!
Log in with Google account | Log in with Yahoo account | Log in with Facebook account
No account? Register here.