Hey all, I have a complicated problem on my hands and I was wondering
if you could help. Let me explain my situation:
At work we are developing a rails system where the site is dynamic
for internal use but the pages on the live site are static. For
example, a user of our system edits Product data and they can see
their changes immediately in development. When the user publishes
their changes, a static HTML page is generated and uploaded.
What I want to do is somehow keep a Graph of all the things that have
changed since the last publish. This way we only have to generate the
pages that have changed.
A little more background: We have Products and Categories. There is a
many-to-many relationship between them.
One thought I had was to keep track of when the User changed Products
and Categories and then flag all Pages that reference them. This
doesn’t seem like it would be too hard. Note that we only need to
keep track of changes at the Page level. If only one thing has
changed for a given URL then we know that page is ‘dirty’ and needs
to be republished.
This approach seems pretty straightforward. Every time someone edits
a Product you know to update 1) the Product’s page and 2) any
Category pages that reference it. (You know this because you know
what Categories the Product is in.)
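To make the flagging idea concrete, here is a rough sketch of what I have in mind. The Struct stand-ins for our models, the dirty_pages_for helper and the /categories/... URL scheme are all made up for illustration; only the /products/sku123 style URL matches our real site.

```ruby
require "set"

# Hypothetical in-memory stand-ins for the real ActiveRecord models.
Product  = Struct.new(:sku, :categories)
Category = Struct.new(:slug)

# Returns the set of page URLs that need republishing after a product edit:
# the product's own page plus every category page that lists it.
def dirty_pages_for(product)
  pages = Set.new
  pages << "/products/#{product.sku}"
  product.categories.each { |c| pages << "/categories/#{c.slug}" }
  pages
end

tools  = Category.new("tools")
garden = Category.new("garden")
spade  = Product.new("sku123", [tools, garden])

# Accumulate dirty pages between publishes. Because this is a Set,
# editing the same product twice does not add duplicates.
dirty = Set.new
dirty.merge(dirty_pages_for(spade))
dirty.merge(dirty_pages_for(spade))

puts dirty.to_a.sort
```

In the real app this would probably hang off a model callback and write to a table rather than an in-memory Set, but the bookkeeping is the same.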
It seems like this would work. However, things get more complicated
when you factor in view/template changes. Let's say someone updates a
partial. How can we keep track of all pages that reference that
partial? Or say someone updates a template or reorganizes several of
them: how can you know which pages changed as a result?
It seems like you would need to keep track of every point of ERB
substitution. For instance, for a Page (identified by the URL
http://site/products/sku123) you would need a list of all
substitutions in its template, then all the substitutions in the
templates those pull in, and so on.
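Assuming we could somehow build a map of which template renders which partials, finding everything affected by a changed partial is a reverse walk over that map. This is only a sketch: the RENDERS hash, the template names in it and the affected_by helper are invented for illustration, and building the map itself is the hard part.

```ruby
require "set"

# Hypothetical map: each template => the partials it renders directly.
RENDERS = {
  "products/show"           => ["shared/_header", "products/_related_items"],
  "categories/show"         => ["shared/_header", "products/_summary"],
  "products/_related_items" => ["products/_summary"],
  "products/_summary"       => [],
  "shared/_header"          => []
}

# Returns every template that includes `changed`, either directly or
# through a chain of other partials.
def affected_by(changed, renders)
  # Invert the edges: partial => templates that render it.
  included_by = Hash.new { |h, k| h[k] = [] }
  renders.each do |template, partials|
    partials.each { |partial| included_by[partial] << template }
  end

  # Breadth-first walk up the inverted graph.
  seen  = Set.new
  queue = [changed]
  until queue.empty?
    current = queue.shift
    included_by[current].each do |parent|
      # Set#add? returns nil if the parent was already visited.
      queue << parent if seen.add?(parent)
    end
  end
  seen
end

affected = affected_by("products/_summary", RENDERS)
puts affected.to_a.sort
```

Each template is visited at most once, so the walk itself is cheap. The expensive step would then be mapping the affected top-level templates back to concrete page URLs.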
My coworkers rightly say that mathematically this gets way too
complex and is actually inefficient. Also this Graph of Pages and
points of substitution would have to get recreated every time a
template was changed (say, on an SVN commit). They say that what we
should do is just spider the entire site and keep MD5 sums (or
whatever) of the Pages and if they have changed then we publish it.
They think that this brute-force approach will actually be faster than
keeping track of the massive change Graph.
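As I understand my coworkers' proposal, it boils down to something like the following. This is a minimal sketch: the changed_pages helper and the sample HTML are hypothetical, and in practice the rendered hash would come from spidering the dynamic site while the checksums would be stored from the last publish.

```ruby
require "digest"

# Returns the URLs whose freshly rendered HTML does not match the MD5
# checksum recorded at the last publish. Pages with no recorded checksum
# (new pages) count as changed.
def changed_pages(rendered, published_sums)
  rendered.select do |url, html|
    published_sums[url] != Digest::MD5.hexdigest(html)
  end.keys
end

# Checksums saved at the last publish (hypothetical sample data).
published_sums = {
  "/products/sku123"  => Digest::MD5.hexdigest("<h1>Spade</h1>"),
  "/categories/tools" => Digest::MD5.hexdigest("<ul><li>Spade</li></ul>")
}

# What the spider sees now: one edited page, one untouched, one new.
rendered = {
  "/products/sku123"  => "<h1>Garden Spade</h1>",
  "/categories/tools" => "<ul><li>Spade</li></ul>",
  "/products/sku456"  => "<h1>Rake</h1>"
}

to_publish = changed_pages(rendered, published_sums)
puts to_publish
```

Note that this still renders every page on every publish. It only saves the upload step, which is exactly my worry about the time it takes.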
I still haven’t made up my mind. I really like the theoretical
efficiency of only generating the pages with information that has
changed. But I see what they are saying about the mathematical
complexity. My only issue with the brute-force approach is that we
have thousands of pages and it will take a while to spider the whole
site.
So I have a few questions for you Ruby gurus:
- Is there a way to actually create this graph of substitutions?
E.g., given the following template, is there a way to come up with
the list of its 2 substitutions?
<%= @foo.title %> <%= render(:partial => "related_items", :collection => @foo.related_items) %>
- Any ideas on a better way to approach this?
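For the first question, the closest I have come is a naive regex scan over the template source, shown below. I am not claiming this is how ERB or Rails tracks anything internally. It only finds literal <%= ... %> tags and string-literal partial names, so a partial name built at runtime, or a render call inside a helper, would be missed.

```ruby
template = '<%= @foo.title %> <%= render(:partial => "related_items", :collection => @foo.related_items) %>'

# Grab the Ruby expression inside every <%= ... %> output tag.
substitutions = template.scan(/<%=\s*(.*?)\s*%>/m).flatten

# Of those, pick out partial names passed as :partial => "..." literals.
partials = substitutions.map { |s| s[/:partial\s*=>\s*["']([^"']+)["']/, 1] }.compact

puts substitutions.length
puts partials.inspect
```

Running the same scan over each partial it finds would give the nested list of substitutions, at least for the simple cases.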