Catching non-RSS/Atom feeds when using Feed-Normalizer

chunkyslink · November 24, 2008, 8:43am

A production app is using FeedNormalizer in an RSS daemon. The worker
opens a ‘Source’ (the app’s naming convention for an RSS feed), parses it
using FeedNormalizer, and processes the items in the feed:

feed = FeedNormalizer::FeedNormalizer.parse open(source.rss_url)

A beta tester added an ordinary web page URL rather than an RSS/Atom
feed URL, and the worker choked (the on-screen prompts have since been
cleared up, but that’s not enough). I see two options, but I don’t know
how to implement either:

Validate the URL and reject anything that isn’t RSS/Atom or one of the
other recognised feed formats that FeedNormalizer can parse.

Or, the better option from a usability perspective,

The RSS worker catches any errors when FeedNormalizer tries to open
the source.rss_url, and, if it isn’t an RSS URL, tries to find any RSS
feed associated with the ‘source.rss_url’ the user provided.
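
Both options depend on telling a feed apart from an ordinary page, and
that cannot be read off the URL alone, since feed URLs have no fixed
shape. A check on the fetched body is more dependable. The following is
a minimal sketch, not part of FeedNormalizer: looks_like_feed? and
FEED_ROOTS are hypothetical names, and the check assumes a feed's root
element is rss, feed (Atom) or rdf:RDF (RSS 1.0).

```ruby
# Root element names assumed to identify a feed (RSS 2.0, Atom, RSS 1.0).
FEED_ROOTS = %w[rss feed rdf:RDF].freeze

# Hypothetical helper: guesses whether a fetched body is a feed by looking
# at the name of its root element. It does not validate the feed.
def looks_like_feed?(body)
  text = body.to_s
  # Skip the XML declaration, processing instructions, comments and any
  # DOCTYPE, so that the next tag is the root element.
  prolog = /\A\s*(?:<\?.*?\?>\s*|<!--.*?-->\s*|<!DOCTYPE[^>]*>\s*)*/mi
  root = text.sub(prolog, '')[/\A<([A-Za-z_][\w:.-]*)/, 1]
  FEED_ROOTS.include?(root)
end
```

The worker could run a check like this on the fetched body before
handing it to FeedNormalizer, and mark the Source as invalid when it
returns false.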

I’m happy to go with the first option (as a stop-gap) if someone knows
of a regex which can be used to reject a non-feed URL (my limited
experience with RSS formats suggests this is unlikely). If you think I
should go with the second option, I can probably work out how to catch
the errors from FeedNormalizer, but it would be great to know if anyone
has a snippet for taking a URL and attempting to find a feed on it. Any
suggestions?
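
For the second option, pages that have a feed usually advertise it in
their HTML head with a link tag such as
<link rel="alternate" type="application/rss+xml" href="...">. The
following is a rough sketch of reading those tags, assuming the page
HTML has already been fetched. discover_feed_urls and FEED_TYPES are
hypothetical names, and the regex-based tag scan is a shortcut that a
real HTML parser would handle more robustly (it does not decode entities
such as &amp; in href values, for instance).

```ruby
require 'uri'

# MIME types assumed to mark a <link> tag as pointing at a feed.
FEED_TYPES = %w[application/rss+xml application/atom+xml].freeze

# Hypothetical helper: returns the absolute feed URLs that an HTML page
# advertises through <link rel="alternate" type="..." href="..."> tags.
def discover_feed_urls(html, page_url)
  html.to_s.scan(/<link\b[^>]*>/i).filter_map do |tag|
    # Collect the tag's quoted attributes into a hash with lowercase keys.
    attrs = tag.scan(/([\w-]+)\s*=\s*["']([^"']*)["']/)
               .to_h { |name, value| [name.downcase, value] }
    next unless attrs['rel'].to_s.downcase.split.include?('alternate')
    next unless FEED_TYPES.include?(attrs['type'].to_s.downcase.strip)
    next if attrs['href'].to_s.empty?

    # Resolve relative hrefs against the URL the user supplied.
    URI.join(page_url, attrs['href']).to_s
  end
end
```

The worker could call this when the supplied URL turns out not to be a
feed, then retry with the first URL returned. An empty array would mean
the page advertises no feed and the Source should be rejected.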

chunkyslink · November 27, 2008, 2:08pm

No thoughts on this one?