Hey all,

I have a complicated problem on my hands and I was wondering if you could help. Let me explain my situation.

At work we are developing a Rails system where the site is dynamic for internal use but the pages on the live site are static. For example, a user of our system edits Product data and can see the changes immediately in development. When the user publishes the changes, a static HTML page is generated and uploaded.

What I want to do is somehow keep a graph of everything that has changed since the last publish. This way we only have to generate the pages that have changed.

A little more background: we have Products and Categories, with a many-to-many relationship between them.

One thought I had was to keep track of when the user changed Products and Categories and then flag all Pages that reference them. This doesn't seem like it would be too hard. Note that we only need to keep track of changes at the Page level: if even one thing has changed for a given URL, then we know that page is 'dirty' and needs to be republished.

This approach seems pretty straightforward. Every time someone edits a Product you know to update 1) the Product's page and 2) any Category pages that reference it. (You know this because you know which Categories the Product is in.)

It seems like this would work; however, things get more complicated when you factor in view/template changes. Let's say someone updates a partial. How can we keep track of all pages that reference that partial? Or say someone updates a template or reorganizes them, how can you know if they changed?

It seems like you would need to keep track of every point of ERB substitution. So for instance, for a Page (identified by the URL http://site/products/sku123) you would need a list of all substitutions in its template, then all the substitutions in those templates, and so on. My coworkers rightly say that mathematically this gets way too complex and is actually inefficient.
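To make the flagging idea concrete, here is a minimal plain-Ruby sketch (no Rails) of both cases: a Product edit dirtying its own page plus its Category pages, and a partial edit dirtying every page whose template includes it directly or transitively. The data in `PRODUCT_CATEGORIES` and `TEMPLATE_INCLUDES`, the URL scheme, and all method names are made up for illustration; a real application would load them from its database and its template tree.

```ruby
require "set"

# Hypothetical data: product SKU => names of the categories it belongs to.
PRODUCT_CATEGORIES = {
  "sku123" => ["tools", "garden"],
  "sku456" => ["tools"]
}

# Hypothetical template graph: template => partials it renders directly.
TEMPLATE_INCLUDES = {
  "products/show"           => ["products/_related_items", "shared/_header"],
  "categories/show"         => ["products/_summary", "shared/_header"],
  "products/_related_items" => ["products/_summary"]
}

# Hypothetical mapping from a top-level template to the URLs it produces.
# Partials produce no pages of their own, so they map to an empty list.
def pages_for_template(template)
  case template
  when "products/show"
    PRODUCT_CATEGORIES.keys.map { |sku| "/products/#{sku}" }
  when "categories/show"
    PRODUCT_CATEGORIES.values.flatten.uniq.map { |name| "/categories/#{name}" }
  else
    []
  end
end

# Data change: the product's own page plus every category page listing it.
def dirty_pages_for_product(sku)
  pages = Set.new(["/products/#{sku}"])
  PRODUCT_CATEGORIES.fetch(sku, []).each { |name| pages << "/categories/#{name}" }
  pages
end

# Template change: the changed template plus everything that includes it,
# directly or through other partials. Repeats until no new template is found.
def templates_affected_by(changed)
  affected = Set.new([changed])
  loop do
    added = TEMPLATE_INCLUDES.keys.select do |template|
      !affected.include?(template) &&
        TEMPLATE_INCLUDES[template].any? { |partial| affected.include?(partial) }
    end
    break if added.empty?
    affected.merge(added)
  end
  affected
end

# Pages that need republishing after a template or partial was edited.
def dirty_pages_for_template(changed)
  templates_affected_by(changed).flat_map { |t| pages_for_template(t) }.to_set
end
```

With this toy data, editing `products/_summary` dirties every product and category page, which illustrates the coworkers' point: a change to a widely shared partial degenerates into a full republish anyway.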
Also, this graph of Pages and points of substitution would have to be recreated every time a template was changed (say, on an SVN commit).

They say that what we should do is just spider the entire site and keep MD5 sums (or whatever) of the Pages, and if a page has changed then we publish it. They think that this brute-force approach will actually be faster than keeping track of the massive change graph.

I still haven't made up my mind. I really like the theoretical efficiency of only generating the pages with information that has changed. But I see what they are saying about the mathematical complexity. My only issue with the brute-force approach is that we have thousands of pages and it will take a while to spider the whole site.

So I have a few questions for you Ruby gurus:

1) Is there a way to actually create this graph of substitutions? E.g., given the following template, is there a way to come up with the list of the 2 substitutions?

    <p>
      <%= @foo.title %>
      <%= render(:partial => "related_items", :collection => @foo.related_items) %>
    </p>

2) Any ideas on a better way to approach this?
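For question 1, one possible starting point is a purely static scan of the template source with regular expressions. The sketch below is an assumption-laden illustration, not an ERB parser: the constant and method names are invented, and it only recognises `<%= ... %>` output tags and partial names written as string literals in `render(:partial => "...")`. It will miss partial names computed at runtime, rendering done inside helpers, and layouts.

```ruby
# Matches ERB output tags <%= ... %>; non-greedy, and allowed to span lines.
OUTPUT_TAG = /<%=(.*?)%>/m

# Captures the partial name from render(:partial => "name") or
# render :partial => 'name', when the name is a string literal.
PARTIAL_REF = /render\s*\(?\s*:partial\s*=>\s*["']([^"']+)["']/

# Returns the Ruby expression inside every output tag, in document order.
def substitutions(template_source)
  template_source.scan(OUTPUT_TAG).map { |(code)| code.strip }
end

# Returns the names of the partials referenced by those expressions.
def partials(template_source)
  substitutions(template_source).map { |code| code[PARTIAL_REF, 1] }.compact
end

# The template from the question, as a non-interpolating heredoc.
TEMPLATE = <<-'ERB'
<p>
  <%= @foo.title %>
  <%= render(:partial => "related_items", :collection => @foo.related_items) %>
</p>
ERB

puts substitutions(TEMPLATE)
puts partials(TEMPLATE)
```

Running `partials` over every template file and recording the results per file would give the edges of the substitution graph, which could then be rebuilt on each commit.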
on 2007-01-01 19:25
on 2007-01-01 20:25
On 01.01.2007 18:20, Nate M. wrote:
> At work we are developing a rails system where the site is dynamic
> for internal use but the pages on the live site are static. For
> example, a user of our system edits Product data and they can see
> their changes immediately in development. When the user publishes
> their changes a static page html is generated and uploaded.

Why are live pages static? Is this necessary?

> It seems like this would work, however things get more complicated
> when you factor in view/template changes. Lets say someone updates a
> partial. How can we keep track of all pages that reference that
> partial? Or say someone updates a template or reorganizes them, how
> can you know if they changed?

Do you actually *have* to update related pages? I mean, the id probably does not change and neither does the title - which would be the key bits you have on that page. Or do you actually include more information from the partial in other pages? In that case using IFRAMEs might be a solution; then one page corresponds to exactly one item and you can easily detect whether you have to generate it anew or not. But that might make larger changes to your application necessary.

> 2) Any ideas on a better way to approach this?

If you follow the staged approach only to control when changes become visible, you could do the export at the DB level: set up a second Rails installation with another DB and synchronize those databases once per day / week / whatever interval you need. This won't work, of course, if you use the staged approach with static pages for performance reasons. In that case (though it's a completely different approach) you could work with a reverse proxy instead.

Yet another idea: you could probably collect the change dates of all items presented in a view and set the corresponding HTTP header accordingly. That way clients should refresh only if the content has changed since their last access. I am not sure, though, how well this works in a Rails environment.
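The change-date idea can be sketched in plain Ruby, assuming the header meant here is `Last-Modified` and that the client sends the value back in `If-Modified-Since`. The `Item` struct and the `last_modified` and `fresh?` methods are hypothetical names; how the header is attached to the response in Rails is left out.

```ruby
require "time" # adds Time#httpdate and Time.httpdate

# Hypothetical stand-in for any record shown on a page.
Item = Struct.new(:name, :updated_at)

# The page is as new as the most recently changed item it displays.
def last_modified(items)
  items.map(&:updated_at).max
end

# True when the client's copy (dated by its If-Modified-Since header)
# is at least as new as the page content, so no refresh is needed.
# Compared at whole-second precision because HTTP dates carry no fractions.
def fresh?(items, if_modified_since)
  return false if if_modified_since.nil?
  last_modified(items).to_i <= Time.httpdate(if_modified_since).to_i
end

ITEMS = [
  Item.new("product",  Time.utc(2007, 1, 1, 12, 0, 0)),
  Item.new("category", Time.utc(2006, 12, 24, 8, 30, 0))
]

# Value to send as the Last-Modified response header.
HEADER = last_modified(ITEMS).httpdate
puts HEADER
```

A server that finds `fresh?` to be true could answer with 304 Not Modified instead of rendering the page again.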
So, this is a rather unstructured list. I guess my brain is not yet working properly this year. :-)

Kind regards

robert
on 2007-01-01 20:51
> 2) Any ideas on a better way to approach this?

Why not just turn on page caching in production and have the publish process clear out the cache, so that pages are regenerated on demand? Unless I missed something in your requirements, this should work fine, and you don't need to do anything other than manage the cache-cleaning process.
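As a rough sketch of the cache-cleaning step: Rails page caching (`caches_page` in a controller) writes each rendered response as an HTML file under the public directory, so clearing the cache comes down to deleting files. The plain-Ruby code below assumes a layout where the URL path `/products/sku123` is cached as `products/sku123.html` under some cache root; the method names and that mapping are illustrative, and inside a Rails app you would more likely call its own `expire_page`.

```ruby
require "fileutils"
require "tmpdir"

# Assumed layout: URL path "/products/sku123" => "<root>/products/sku123.html".
def cache_file(cache_root, url_path)
  relative = url_path.sub(%r{\A/}, "")
  File.join(cache_root, "#{relative}.html")
end

# Deletes the cached files for the given URL paths and returns the paths
# that actually had a cached file. Missing files are skipped silently.
def expire_pages(cache_root, url_paths)
  url_paths.select do |path|
    file = cache_file(cache_root, path)
    next false unless File.exist?(file)
    FileUtils.rm(file)
    true
  end
end

# Brute-force variant: drop every cached page so each one is regenerated
# the next time it is requested.
def expire_all(cache_root)
  Dir.glob(File.join(cache_root, "**", "*.html")).each { |file| FileUtils.rm(file) }
end
```

The publish process could call `expire_pages` with only the pages flagged as dirty, or `expire_all` when a template change makes the affected set hard to determine.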