HTML snapshots for crawlable ajax

dubstep · May 17, 2011, 3:08am

Hi,
There doesn’t seem to be any reference for taking HTML snapshots from
within a Rails server. I wonder how one could implement Google’s
crawlable AJAX spec
(Understand JavaScript SEO Basics | Google Search Central | Documentation | Google for Developers) on a Rails
application?

To summarize: I have a Rails application with a JavaScript front-end
that uses lots of AJAX. I need Google to index the AJAX content, hence
the need to implement the above spec. Now, I can send an AJAX request to
Rails for a link that the crawler requests; I need the Rails server to
respond with an HTML snapshot. Can this be handled by a single Rails
instance running on nginx? Or do we need to send the link to an HTMLUnit
headless browser to take a snapshot?

Has anyone done this for a Rails app?

mustafa_c · May 17, 2011, 7:50pm

On Monday, May 16, 2011 7:08:11 PM UTC-6, Ruby-Forum.com User wrote:

Hi,
There doesn’t seem to be any reference for taking HTML snapshots from
within a Rails server. I wonder how one could implement Google’s
crawlable AJAX spec
(Understand JavaScript SEO Basics | Google Search Central | Documentation | Google for Developers) on a Rails
application?

As always, there are several ways…

To summarize: I have a Rails application with a JavaScript front-end
that uses lots of AJAX. I need Google to index the AJAX content, hence
the need to implement the above spec. Now, I can send an AJAX request to
Rails for a link that the crawler requests; I need the Rails server to
respond with an HTML snapshot. Can this be handled by a single Rails
instance running on nginx? Or do we need to send the link to an HTMLUnit
headless browser to take a snapshot?

Does your “JavaScript front-end with lots of AJAX” create or render lots
of new HTML content? Or is your AJAX the kind that mostly manipulates
the DOM by getting new HTML document fragments via XHR requests?

As the Google docs on the subject mention, if it is the former case
then you may want to consider a server-side “browser” like HTMLUnit.
Otherwise, you might want to focus more on your actual Rails code. Even
within the framework of Rails conventions, there is so much latitude in
how sites implement AJAX applications that there are lots of possible
answers.

For example, I’ve got a Rails app (it’s an older Rails 2 app) that has a
fair amount of AJAX. I first developed it statically and used
“progressive enhancement” techniques to add AJAX functionality. The
result is that in many cases I have controller actions that, when
executed, may “return” (render) either a full HTML document or a
document fragment, depending on whether the request is an XHR. If I were
updating this site (quick and dirty) to support this Google spec, I’d
simply make those actions return a full HTML document when a request has
the special _escaped_fragment_ parameter.
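
A minimal sketch of that quick-and-dirty check, in plain Ruby so it runs outside Rails. The helper name `render_mode` and the constant are made up for illustration; in a controller the two arguments would presumably come from `request.xhr?` and `params`.

```ruby
# Hypothetical helper, not part of Rails. It only decides what to render;
# the actual rendering stays in the controller action.
ESCAPED_FRAGMENT_PARAM = "_escaped_fragment_"

def render_mode(xhr, params)
  # A crawler asking for a snapshot always gets the full document,
  # even if the request happens to look like an XHR.
  return :full_document if params.key?(ESCAPED_FRAGMENT_PARAM)

  # Otherwise keep the existing progressive-enhancement behaviour:
  # XHR requests get a fragment, everything else gets the full page.
  xhr ? :fragment : :full_document
end
```

An action could then branch on the result, rendering with the layout for `:full_document` and without it for `:fragment`.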

However, I can conceive of several different techniques (and have used
different ones to various degrees) that would require a different
approach.

This might be an area that would be good for some kind of Rails (and/or
Rack) gem built around a specific set of AJAX conventions and design
patterns, one that integrates with or is written solely to implement
this Google spec. If, indeed, such a beast doesn’t already exist. Such a
solution would still only work for those who want to, are willing to, or
already do adhere to the chosen conventions. But then, Rails users do
the same for web app dev in general.
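
To make the Rack idea a bit more concrete, here is a rough sketch of what the core of such a middleware might look like. It is not an existing gem: the class name `EscapedFragmentSnapshots` and the `snapshotter` callable are assumptions for illustration, and how the snapshot actually gets produced (Rails views, HTMLUnit, something else) is left entirely to that callable.

```ruby
require "uri"

# Hypothetical Rack-style middleware. It needs nothing from the rack gem
# itself: a middleware is just an object that responds to call(env).
class EscapedFragmentSnapshots
  PARAM = "_escaped_fragment_"

  # snapshotter: any callable taking (path, fragment) and returning HTML.
  def initialize(app, snapshotter)
    @app = app
    @snapshotter = snapshotter
  end

  def call(env)
    fragment = escaped_fragment(env["QUERY_STRING"].to_s)
    # Ordinary requests go straight through to the application.
    return @app.call(env) if fragment.nil?

    html = @snapshotter.call(env["PATH_INFO"].to_s, fragment)
    [200, { "content-type" => "text/html; charset=utf-8" }, [html]]
  end

  private

  # Returns the decoded parameter value, or nil when it is absent.
  def escaped_fragment(query)
    pair = URI.decode_www_form(query).assoc(PARAM)
    pair && pair[1]
  rescue ArgumentError
    # decode_www_form rejects non-ASCII input; treat that as "no snapshot".
    nil
  end
end
```

The conventions part would live in the snapshotter: it has to know how to turn a path plus fragment into the same state the JavaScript front-end would have built.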

Anyone else care to let their mind wander too?

mustafa_c · May 18, 2011, 2:13pm

Kendall G. wrote in post #999301:

On Monday, May 16, 2011 7:08:11 PM UTC-6, Ruby-Forum.com User wrote:

Does your “JavaScript front-end with lots of AJAX” create or render lots
of new HTML content? Or is your AJAX the kind that mostly manipulates
the DOM by getting new HTML document fragments via XHR requests?

It’s the former: a minified JavaScript web application that manages the
entire front-end.

As the Google docs on the subject mention, if it is the former case
then you may want to consider a server-side “browser” like HTMLUnit.
Otherwise, you might want to focus more on your actual Rails code. Even
within the framework of Rails conventions, there is so much latitude in
how sites implement AJAX applications that there are lots of possible
answers.

It’s the former, hence the need for HTMLUnit. I came across this
(http://tinyurl.com/6yxrch7), implementing HTMLUnit on GWT. I’m not an
expert in GWT, so I’m deferring it for now until I can find a better
solution.

For example, I’ve got a Rails app (it’s an older Rails 2 app) that has a
fair amount of AJAX. I first developed it statically and used
“progressive enhancement” techniques to add AJAX functionality. The
result is that in many cases I have controller actions that, when
executed, may “return” (render) either a full HTML document or a
document fragment, depending on whether the request is an XHR. If I were
updating this site (quick and dirty) to support this Google spec, I’d
simply make those actions return a full HTML document when a request has
the special _escaped_fragment_ parameter.

Mine is a Rails 2 app too. I return raw data to the client, where it
gets put into my custom templates. I reckon you mean RJS by “return a
full HTML document”, though it could get very complicated if I had to
rebuild the styles in a few Rails views.

However, I can conceive of several different techniques (and have used
different ones to various degrees) that would require a different
approach.

This might be an area that would be good for some kind of Rails (and/or
Rack) gem built around a specific set of AJAX conventions and design
patterns, one that integrates with or is written solely to implement
this Google spec. If, indeed, such a beast doesn’t already exist. Such a
solution would still only work for those who want to, are willing to, or
already do adhere to the chosen conventions. But then, Rails users do
the same for web app dev in general.

Anyone else care to let their mind wander too?

I’ve come across Crawljax, which seems to be mostly for testing. I
agree, it’d be great if there were a gem that forwarded
_escaped_fragment_ requests to a GAE app and returned the HTML snapshot.
It wouldn’t need to handle high traffic volume, since it would run only
for crawler requests. Without such a solution, folks like me will go
ahead and build their own GAE app, I imagine… unless there is another
way!
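
For what it’s worth, the forwarding part could be fairly small. Below is a rough sketch under loud assumptions: the snapshot service, its base URL, and its `url` query parameter are all hypothetical (no such GAE app exists as far as this thread knows), and the helper names are made up. It rebuilds the `#!` URL that the crawler translated into `_escaped_fragment_` and asks the remote service to render it.

```ruby
require "net/http"
require "uri"

# Rebuild the "pretty" URL the crawler started from: the value of
# _escaped_fragment_ is whatever followed "#!" in the original link.
def pretty_url(page_url, fragment)
  return page_url if fragment.empty?
  "#{page_url}#!#{fragment}"
end

# Build the request to the (hypothetical) snapshot service, passing the
# pretty URL in a "url" query parameter. The parameter name is an assumption.
def snapshot_request_uri(service_base, page_url, fragment)
  uri = URI.parse(service_base)
  uri.query = URI.encode_www_form("url" => pretty_url(page_url, fragment))
  uri
end

# Fetch the rendered HTML, or nil if the service did not answer with 2xx.
def fetch_snapshot(service_base, page_url, fragment)
  response = Net::HTTP.get_response(snapshot_request_uri(service_base, page_url, fragment))
  response.is_a?(Net::HTTPSuccess) ? response.body : nil
end
```

Since only crawler requests would hit `fetch_snapshot`, a blocking HTTP call is probably acceptable here, though caching the returned HTML would avoid re-rendering the same state on every crawl.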