Change Graph (or "Tracking ERB Substitutions?")

Hey all, I have a complicated problem on my hands and I was wondering
if you could help. Let me explain my situation:

At work we are developing a Rails system where the site is dynamic
for internal use but the pages on the live site are static. For
example, a user of our system edits Product data and can see the
changes immediately in development. When the user publishes the
changes, a static HTML page is generated and uploaded.

What I want to do is somehow keep a Graph of all the things that have
changed since the last publish. This way we only have to generate the
pages that have changed.

A little more background: We have Products and Categories. There is a
many-to-many relationship between them.

One thought I had was to keep track of when the User changed Products
and Categories and then flag all Pages that reference them. This
doesn’t seem like it would be too hard. Note that we only need to
keep track of changes at the Page level: if even one thing has
changed for a given URL, then we know that page is ‘dirty’ and needs
to be republished.

This approach seems pretty straightforward. Every time someone edits
a Product, you know to update 1) the Product’s page and 2) any
Category pages that reference it. (You know this because you know
which Categories the Product is in.)
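A minimal sketch of that bookkeeping in plain Ruby, assuming a hypothetical URL scheme (`/products/<sku>`, `/categories/<id>`) and using Structs as stand-ins for the real Product and Category models. It also assumes that product pages display their categories, so a category edit dirties its member products' pages too:

```ruby
require 'set'

# Hypothetical stand-ins for the ActiveRecord models; only the fields the
# dirty-tracking needs are modelled here.
Product  = Struct.new(:sku, :category_ids)
Category = Struct.new(:id, :product_skus)

# Accumulates the URLs of pages that must be regenerated at the next publish.
class DirtyPages
  def initialize
    @urls = Set.new
  end

  # An edited product dirties its own page and every category page listing it.
  def product_changed(product)
    @urls << "/products/#{product.sku}"
    product.category_ids.each { |id| @urls << "/categories/#{id}" }
  end

  # An edited category dirties its own page and, assuming product pages show
  # their categories, the pages of its member products.
  def category_changed(category)
    @urls << "/categories/#{category.id}"
    category.product_skus.each { |sku| @urls << "/products/#{sku}" }
  end

  # Sorted list of dirty URLs; call clear once they have been published.
  def urls
    @urls.to_a.sort
  end

  def clear
    @urls.clear
  end
end

dirty = DirtyPages.new
dirty.product_changed(Product.new("sku123", [1, 4]))
dirty.category_changed(Category.new(4, ["sku123", "sku456"]))
p dirty.urls
```

Using a Set means a page touched by several edits is still regenerated only once per publish.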

It seems like this would work; however, things get more complicated
when you factor in view/template changes. Let's say someone updates a
partial. How can we keep track of all pages that reference that
partial? Or say someone updates or reorganizes the templates: how
can you know which pages changed?

It seems like you would need to keep track of every point of ERB
substitution. For instance, for a Page (identified by the URL
http://site/products/sku123) you would need a list of all
substitutions in its template, then all the substitutions in those
templates, and so on.
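A coarser version of that graph might record only which partials each template renders (not every individual substitution) plus, at publish time, which URLs each top-level template produced. A sketch with hypothetical template names and hard-coded data: a changed partial is walked up the inverted include graph to every template that uses it, directly or indirectly, and those templates' pages are reported as dirty.

```ruby
require 'set'

# Hypothetical dependency data: which partials each template renders.
INCLUDES = {
  "products/show"           => ["products/_related_items", "shared/_header"],
  "categories/show"         => ["products/_summary", "shared/_header"],
  "products/_related_items" => ["products/_summary"],
  "products/_summary"       => [],
  "shared/_header"          => []
}

# Hypothetical record of which URLs each top-level template produced.
PAGES_BY_TEMPLATE = {
  "products/show"   => ["/products/sku123", "/products/sku456"],
  "categories/show" => ["/categories/1"]
}

# Invert the include graph so we can walk from a partial up to its users.
def reverse_graph(includes)
  rev = Hash.new { |h, k| h[k] = [] }
  includes.each do |template, partials|
    partials.each { |partial| rev[partial] << template }
  end
  rev
end

# Breadth-first walk: every template that directly or indirectly renders
# `changed`, plus `changed` itself.
def affected_templates(changed, includes)
  rev = reverse_graph(includes)
  seen = Set[changed]
  queue = [changed]
  until queue.empty?
    rev[queue.shift].each do |parent|
      queue << parent if seen.add?(parent)
    end
  end
  seen
end

def dirty_pages(changed, includes, pages_by_template)
  affected_templates(changed, includes)
    .flat_map { |t| pages_by_template.fetch(t, []) }
    .sort
end

p dirty_pages("products/_summary", INCLUDES, PAGES_BY_TEMPLATE)
```

How the `INCLUDES` data would be extracted from the real templates is left open, and it would still need rebuilding whenever templates change.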

My coworkers rightly say that mathematically this gets way too
complex and is actually inefficient. Also, this Graph of Pages and
points of substitution would have to be recreated every time a
template changed (say, on an SVN commit). They say that what we
should do is just spider the entire site, keep MD5 sums (or
whatever) of the Pages, and publish any Page whose sum has changed.
They think that this brute force will actually be faster than
keeping track of the massive change Graph.
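The comparison half of the brute-force approach is small. A sketch, assuming the freshly rendered HTML is already in a hash keyed by URL (how it is fetched, by spidering or otherwise, is left open) and the digests from the previous publish were kept in another hash:

```ruby
require 'digest/md5'

# Compares freshly rendered pages against the digests recorded at the last
# publish. Returns [urls_to_publish, digests_to_store_for_next_time].
def changed_pages(rendered, last_digests)
  current = rendered.transform_values { |html| Digest::MD5.hexdigest(html) }
  changed = current.select { |url, digest| last_digests[url] != digest }.keys
  [changed.sort, current]
end

last = {
  "/products/sku123" => Digest::MD5.hexdigest("<h1>Widget</h1>"),
  "/categories/1"    => Digest::MD5.hexdigest("<ul><li>Widget</li></ul>")
}
rendered = {
  "/products/sku123" => "<h1>Widget</h1>",             # unchanged
  "/categories/1"    => "<ul><li>Widget v2</li></ul>", # edited
  "/products/sku456" => "<h1>Gadget</h1>"              # new page
}

changed, digests_to_store = changed_pages(rendered, last)
p changed
```

Pages that disappeared since the last publish are not reported here; `last_digests.keys - current.keys` would list them. Note that every page still has to be rendered to be hashed, which is the cost the original poster is worried about.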

I still haven’t made up my mind. I really like the theoretical
efficiency of only generating the pages with information that has
changed. But I see what they are saying about the mathematical
complexity. My only issue with the brute-force approach is that we
have thousands of pages and it will take a while to spider the whole
site.

So I have a few questions for you ruby gurus:

  1. Is there a way to actually create this graph of substitutions?
    For example, given the following template, is there a way to come up
    with the list of its 2 substitutions?

<%= @foo.title %> <%= render(:partial => "related_items", :collection => @foo.related_items) %>

  2. Any ideas on a better way to approach this?
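For question 1, a rough regex-based sketch that lists the `<%= %>` output tags in a template string and picks out partial names passed as `:partial => "..."`. It is only an approximation: it ignores `<% %>` logic tags, assumes the partial name is a string literal, and would be fooled by a literal `%>` inside Ruby code. Resolving "related_items" to an actual file follows Rails' naming conventions and is not attempted. A sturdier route could be to parse the Ruby source that `ERB.new(template).src` produces (for example with Ripper), which is not shown here.

```ruby
# Matches <%= ... %> and <%= ... -%> output tags (possibly spanning lines).
SUBSTITUTION = /<%=(.*?)-?%>/m
# Captures the name from :partial => "name" or :partial => 'name'.
PARTIAL_NAME = /:partial\s*=>\s*["']([^"']+)["']/

# Returns one hash per output tag: the Ruby expression and, when the
# expression renders a partial, the partial's name (otherwise nil).
def substitutions(template)
  template.scan(SUBSTITUTION).map do |(expr)|
    expr = expr.strip
    { code: expr, partial: expr[PARTIAL_NAME, 1] }
  end
end

template = %q{<%= @foo.title %> <%= render(:partial => "related_items", :collection => @foo.related_items) %>}
subs = substitutions(template)
subs.each { |s| p s }
```

Applied recursively (reading each partial's template and scanning it in turn), this would yield the per-page list of substitutions described above.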

On 01.01.2007 18:20, Nate M. wrote:

> At work we are developing a rails system where the site is dynamic
> for internal use but the pages on the live site are static. For
> example, a user of our system edits Product data and they can see
> their changes immediately in development. When the user publishes
> their changes a static page html is generated and uploaded.

Why are live pages static? Is this necessary?

> It seems like this would work, however things get more complicated
> when you factor in view/template changes. Lets say someone updates a
> partial. How can we keep track of all pages that reference that
> partial? Or say someone updates a template or reorganizes them, how
> can you know if they changed?

Do you actually have to update related pages? I mean, the id and the
title probably do not change, and those would be the key bits you have
on that page. Or do you actually include more information from the
partial in other pages? In that case using iframes might be a solution:
then one page corresponds to exactly one item, and you can easily detect
whether you have to regenerate it. But that might require larger
changes to your application.

> 2. Any ideas on a better way to approach this?

If you use the staged approach only to control when changes become
visible, you could do the export at the DB level: set up a second Rails
installation with another DB and synchronize the two databases once per
day / week / whatever interval you need.

This won’t work, of course, if you use the staged approach with static
pages for performance reasons. In that case (though it’s a completely
different approach) you could just as well work with a reverse proxy.

Yet another idea: you could collect the change dates of all items
presented in a view and set the corresponding HTTP header
(Last-Modified) accordingly. That way clients should refresh only if
the content has changed since their last access. I am not sure, though,
how well this works in a Rails environment.
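A framework-independent sketch of that idea using Ruby's standard `time` library: the page's Last-Modified value is the newest `updated_at` among the items it shows, and a request carrying If-Modified-Since gets a 304 when nothing is newer. The `Item` struct is a hypothetical stand-in for real records, and wiring this into a Rails controller is not shown.

```ruby
require 'time'

# Hypothetical stand-in for the records a view displays.
Item = Struct.new(:name, :updated_at)

# Newest change among the items; HTTP dates have one-second resolution,
# so sub-second precision is dropped.
def last_modified(items)
  Time.at(items.map(&:updated_at).max.to_i).utc
end

# Parses an If-Modified-Since value; malformed headers are treated as absent.
def parse_http_date(value)
  value && Time.httpdate(value)
rescue ArgumentError
  nil
end

# Returns [status, headers]: 304 Not Modified if the client's copy is at
# least as new as the newest item, otherwise 200 with a Last-Modified header.
def conditional_response(items, if_modified_since = nil)
  modified = last_modified(items)
  headers = { "Last-Modified" => modified.httpdate }
  since = parse_http_date(if_modified_since)
  status = since && since >= modified ? 304 : 200
  [status, headers]
end

items = [
  Item.new("Widget", Time.utc(2007, 1, 1, 12, 0, 0)),
  Item.new("Gadget", Time.utc(2006, 12, 30, 9, 30, 0))
]
p conditional_response(items)
p conditional_response(items, "Mon, 01 Jan 2007 12:00:00 GMT")
```

This only saves bandwidth on clients that already hold a copy; the server still has to look up the items to compute the date.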

So, this is a rather unstructured list. I guess my brain is not yet
working properly this year. :-)

Kind regards

robert

> 2. Any ideas on a better way to approach this?

Why not just turn on page caching in production and have the publish
process clear out the cache, then pages will be regenerated on demand?

Unless I missed something in your requirements, this should work fine
and you don’t need to do anything other than manage the cache-cleaning
process.
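A sketch of that cache-cleaning step in plain Ruby, assuming cached pages are stored as `.html` files mirroring the URL path (roughly how Rails' page caching lays them out under `public/` by default). A dedicated cache directory is assumed: if the cache lives directly in `public/`, the sweep-everything branch would also delete hand-written static files such as error pages.

```ruby
require 'fileutils'
require 'tmpdir'

# Assumed layout: "/products/sku123" -> "<root>/products/sku123.html",
# and "/" -> "<root>/index.html".
def cached_file(cache_root, path)
  path = "/index" if path == "/"
  File.join(cache_root, "#{path}.html")
end

# Publish step: delete cached copies so the next request regenerates them.
# With a list of URL paths only those pages go; with nil, every cached page.
# Returns the files actually removed.
def expire_cache(cache_root, paths = nil)
  files =
    if paths
      paths.map { |path| cached_file(cache_root, path) }
    else
      Dir.glob(File.join(cache_root, "**", "*.html"))
    end
  files.select { |f| File.file?(f) }.each { |f| FileUtils.rm(f) }
end

# Demo against a throwaway directory standing in for the cache.
root = Dir.mktmpdir
begin
  FileUtils.mkdir_p(File.join(root, "products"))
  File.write(File.join(root, "products", "sku123.html"), "<h1>Widget</h1>")
  File.write(File.join(root, "index.html"), "<h1>Home</h1>")

  removed = expire_cache(root, ["/products/sku123", "/products/missing"])
  remaining = Dir.glob(File.join(root, "**", "*.html"))
  p removed.map { |f| f.delete_prefix(root) }
  p remaining.map { |f| f.delete_prefix(root) }
ensure
  FileUtils.remove_entry(root)
end
```

Passing the dirty URLs from the publish step expires only those pages; passing nil clears everything, which trades a slower first hit per page for not needing any change tracking at all.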