Most of you already know this, but Google and Yahoo have been inviting
webmasters to register “site maps” for their sites for about a year now.
The idea is that it helps the search engine’s spiders to find all of
your site and update its knowledge of various pages at an appropriate
rate.
This came across my radar for the first time today. It seems the idea
is coming of age, and there is a move afoot to open-source the format
for these sitemaps. The major players are all on board now.
I’ve been looking for a good project that would be useful for the Rails
community. This might be it: create a gem that would generate these
sitemaps automatically.
The basic idea would be that you install this gem and it would (by some
still TBD method) discover all of a site’s pages (including those that
are dynamically created based on database tables, like product listings)
and create the sitemap.
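To make the idea a bit more concrete, here is a rough sketch in plain Ruby of the "create the sitemap" half. The discovery half is still TBD, so it is faked here with a hard-coded list. The names SitemapEntry, xml_escape and build_sitemap are made up for illustration; only the tag names and the namespace come from the published sitemaps.org 0.9 format.

```ruby
require "time"

# Hypothetical entry type; field names mirror the sitemaps.org 0.9 tags.
SitemapEntry = Struct.new(:loc, :lastmod, :changefreq)

XML_ESCAPES = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;",
  '"' => "&quot;", "'" => "&apos;"
}.freeze

# Escape the five characters that are special in XML text.
def xml_escape(text)
  text.to_s.gsub(/[&<>"']/) { |ch| XML_ESCAPES[ch] }
end

# Turn a list of entries into a sitemap XML string.
def build_sitemap(entries)
  lines = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
  entries.each do |entry|
    lines << "  <url>"
    lines << "    <loc>#{xml_escape(entry.loc)}</loc>"
    lines << "    <lastmod>#{entry.lastmod.utc.iso8601}</lastmod>" if entry.lastmod
    lines << "    <changefreq>#{entry.changefreq}</changefreq>" if entry.changefreq
    lines << "  </url>"
  end
  lines << "</urlset>"
  lines.join("\n") + "\n"
end

# Stand-in for discovery: one static page plus rows that would
# normally come from a products table (assumed data, not a real schema).
products = [{ id: 1, updated_at: Time.utc(2006, 11, 20) }]
entries = [SitemapEntry.new("http://example.com/", nil, "daily")]
entries += products.map do |row|
  SitemapEntry.new("http://example.com/products/#{row[:id]}", row[:updated_at], "weekly")
end
puts build_sitemap(entries)
```

In a real Rails app the product rows would come from the model layer and the URLs from the routes, which is exactly the part that still needs designing.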
This posting is just to test the waters and see if there is sufficient
interest in this and also to invite you all to provide suggestions on
how it might be done and what sort of configuration would be needed etc.
thanks,
jp
Jeff, this would be useful but I’m wondering if a gem is the right
distribution format. I think of gems as akin to libraries; methods I’ll
use again and again to do stuff. One needs to generate updated sitemaps
periodically, but I don’t think of a sitemap generator as a library I’d
call much. Could it be that a Rails-aware utility – rake task perhaps?
– might be a more appropriate form?
Or are you thinking about folks who want to write Web apps that do
things with sitemaps, in which case I’ve completely missed the point?
/afb
Hi Adam,
I understand what you’re saying, but I’m thinking of a more automated
site map updater that would take into account things being added to the
database that cause new pages to be available. You’re thinking of a
more static site I think, where the developer defines a set of pages and
that’s that. I’m thinking of a site where new blog entries or new
products added to the store, etc. would create new “pages” that the site
owner would want to have added to their site map automatically.
thanks,
jp
Jeff P. wrote:
I’m thinking of a site where new blog entries or new
products added to the store, etc. would create new “pages” that the site
owner would want to have added to their site map automatically.
That makes sense, but is that a realistic use case? I don’t know much
about how the search engines deal with (or intend to deal with)
Sitemaps, but I can imagine they might not be excited about getting a
new one every time someone posts an entry on a busy site.
But regardless, if you write the library then it would be very easy to
build a rake task (for my use case) or an after_filter (for yours) that
constructed the Sitemap based on the app structure and the db contents.
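A rough sketch of how one library call could serve both cases, and also keep a busy site from rewriting the file on every post: only regenerate when the existing file is older than some interval. The name refresh_sitemap and its min_interval and now parameters are hypothetical, and the block stands in for whatever actually builds the XML.

```ruby
require "tmpdir"

# Hypothetical library entry point. Rewrites the file at +path+ only if it
# is missing or at least +min_interval+ seconds old. The block must return
# the sitemap XML as a string. Returns true if the file was written.
def refresh_sitemap(path, min_interval: 3600, now: Time.now)
  if File.exist?(path) && (now - File.mtime(path)) < min_interval
    return false # still fresh enough, skip the rebuild
  end
  File.write(path, yield)
  true
end

# Usage sketch against a temporary directory.
Dir.mktmpdir do |dir|
  path = File.join(dir, "sitemap.xml")
  refresh_sitemap(path) { "<urlset/>" }                 # writes the file
  refresh_sitemap(path) { "<urlset/>" }                 # skipped, too recent
  refresh_sitemap(path, min_interval: 0) { "<urlset/>" } # forced rewrite
end
```

A rake task would presumably call this with min_interval: 0 to force a rebuild, while an after_filter could call it with a longer interval so that a burst of new entries results in at most one rewrite per interval.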
So back to the distribution method: isn’t this pretty tightly coupled to
Rails, which would suggest a Rails extension (like a plugin), rather
than a gem?
/afb
Anyway, I would use this. Here are just some random things that occur to
me that you may want to think about:
- Way of specifying which controllers and actions are included (or
  excluded). Could use something like the controller filter
  inclusion/exclusion attributes (:only, :except).
- Whether or not to iterate over the database items for a given action.
  What about a caching scheme so you don’t have to rewrite the entire
  thing each time? Could you point to an existing XML sitemap (or Sitemap
  object created from an existing XML file) when updating?
- Along those lines, if you are going to have to run individual actions
  for each db object (you would try not to do this, presumably, but see my
  Description bullet below), you might want to put in a way of manually
  throttling the number of calls this thing could execute. Incidentally,
  you could write an actual spider (or modify an existing spider) and go
  at things from the opposite direction. But since a spider would know
  nothing about routes or the database, it seems like kind of the opposite
  of the preferred architecture. Scratch that.
- How to recognize actions protected by ACL, for (possible) exclusion. I
  guess this could be done manually using the controls described in my
  first bullet point.
- I notice there’s no “Description” tag in the Sitemaps schema. That
  seems like something that could change, so you might want to have
  methods for figuring out what the description would be if it were
  there. Could be the page title generated by the view or something
  generated programmatically by passing a proc to one of your methods. (I’m
  increasingly convinced that anything that touches the view is Bad, for
  performance reasons.) Having a description field might make Sitemaps
  suitable for uses beyond just URL submission to search engines.
- I don’t know what the right memory storage model is here. You could
  create a REXML object and populate that, or do a tree hash, or something
  else. Anything where you don’t have to write your own to_yaml or to_xml
  methods would be nice.
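To illustrate the first and last bullets, here is a small sketch of an :only/:except style rule feeding a REXML document. InclusionRule and sitemap_document are invented names, not part of Rails or any existing plugin, and the action list is hard-coded where a real version would read it from the routes.

```ruby
require "rexml/document"

# Hypothetical rule deciding which actions of a controller go into the
# sitemap, modelled on the :only / :except options of controller filters.
class InclusionRule
  def initialize(only: nil, except: nil)
    @only = only && Array(only).map(&:to_sym)
    @except = Array(except).map(&:to_sym)
  end

  # An action is included unless it is excepted or falls outside :only.
  def include?(action)
    action = action.to_sym
    return false if @except.include?(action)
    @only.nil? || @only.include?(action)
  end
end

# Hold the sitemap as a REXML document so serialization comes for free
# via to_s, with no hand-written to_xml method.
def sitemap_document(urls)
  doc = REXML::Document.new
  doc << REXML::XMLDecl.new("1.0", "UTF-8")
  urlset = doc.add_element("urlset",
                           "xmlns" => "http://www.sitemaps.org/schemas/sitemap/0.9")
  urls.each do |url|
    urlset.add_element("url").add_element("loc").text = url
  end
  doc
end

# Usage sketch: skip the actions that make no sense in a sitemap.
rule = InclusionRule.new(except: [:edit, :destroy])
actions = %w[index show edit destroy]
urls = actions.select { |a| rule.include?(a) }
              .map { |a| "http://example.com/products/#{a}" }
puts sitemap_document(urls)
```

ACL-protected actions could be handled the same way by listing them under :except by hand. A description proc would fit as an extra argument to sitemap_document once there is a tag to put the result in.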
Good luck!
/afb
Adam B. wrote:
So back to the distribution method: isn’t this pretty tightly coupled to
Rails, which would suggest a Rails extension (like a plugin), rather
than a gem?
/afb
Yes, absolutely. My bad. That makes more sense.
Wish I could edit the title. From this point forward, this thread is
about a sitemaps plugin, not a gem.
thanks,
jp