Parsing multiple RSS & Atom feed formats

I’m working on an RSS aggregator, and I’ve based the parser on a script
from this post:

http://www.superwick.com/archives/2007/06/09/rss-feed-parsing-in-ruby-on-rails/

But, being a complete newbie, I’ve found that this parser only works
for specifically formatted feeds. For example, some feeds will throw a
‘nil text’ error. I know that I could make this script handle ‘nil’
attributes, but I’m betting that someone out there has already found a
good solution for handling all Atom/RSS1.0/RSS2.0 formats.

I’ve scoured Google, but there only seem to be snippets or short posts
on how to handle individual feeds or one type of format. Would anyone be
able to enlighten me on this one? Are there any glaringly
well-documented snippets, gems, plugins, tutorials, or books which I’m
completely missing?

Philip H. wrote:

http://simple-rss.rubyforge.org/

Thanks Philip.

I’ve seen simple-rss, I just haven’t come across a definitive example of
how to pull down all elements of a feed, like in the tutorial link in
the original post. Am I right in presuming I can use simple-rss to
handle all feeds in place of the RSSParser defined in the aforementioned
link, so I just need to write one parser, rather than conditionals for
Atom/RSS?

I’m actually having problems installing the open-uri gem:

“could not find open-uri locally or in a repository”

I’ve tried the usual fixes of capital letters etc., but no luck…

Jonathan Stott wrote:

That’s because open-uri is part of the Ruby standard library, rather
than a gem. Just put require 'open-uri' in your script/application and
you’re good to go.
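
A minimal sketch of that, assuming a recent Ruby where open-uri provides
URI.open (older Rubies used a bare open(url) instead). The helper name
fetch_feed and the example URL are placeholders for illustration, not
anything from the tutorial:

```ruby
require 'open-uri' # standard library, so no `gem install` is needed

# Hypothetical helper: download the raw XML of a feed as a String.
# Assumes a recent Ruby, where open-uri provides URI.open.
def fetch_feed(url)
  URI.open(url) { |io| io.read }
end

# Placeholder URL, shown for illustration only:
# xml = fetch_feed('http://example.com/feed.xml')
```

The returned string can then be handed to whichever parser you settle on.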

SimpleRSS does take care of some of the difficulties of parsing multiple
feeds, and does take care of the atom/RSS differences. You still might
need some conditional or chained assignment to deal with things like the
publishing date. (Is it pubDate? dc:date? Or whatever
else …?)
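
A rough sketch of that kind of fallback, assuming each parsed item
behaves like a Hash keyed by symbol. The key names in DATE_KEYS, the
helper name published_at, and the sample items are all assumptions made
up for illustration; check what your parser actually exposes:

```ruby
require 'time' # adds Time.parse for the common RSS/Atom date formats

# Assumed candidate keys, in order of preference. Real feeds (and the
# parser you use) may expose other names.
DATE_KEYS = [:pubDate, :published, :updated, :dc_date].freeze

# Return the first date-like value found in the item as a Time, or nil.
# Equivalent to chaining item[:pubDate] || item[:published] || ...
def published_at(item)
  raw = DATE_KEYS.map { |key| item[key] }.compact.first
  return nil if raw.nil?
  raw.is_a?(Time) ? raw : Time.parse(raw.to_s)
rescue ArgumentError
  nil # a value was present but could not be parsed as a date
end

# Made-up sample items standing in for parsed feed entries.
rss_item  = { title: 'RSS post',  pubDate: 'Tue, 01 Jan 2008 12:00:00 GMT' }
atom_item = { title: 'Atom post', updated: '2008-01-01T12:00:00Z' }
bare_item = { title: 'No date at all' }

[rss_item, atom_item, bare_item].each do |item|
  puts "#{item[:title]}: #{published_at(item).inspect}"
end
```

Keeping the key list in one place means a new date field only needs one
extra entry rather than another conditional.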

hth
Jon

That helps, thanks Jon. I definitely needed some clarification on how
streamlined I could expect the parser to be, and I’ve only seen pubDate
so far, so that example was useful.

At the risk of sounding like a lazy fool, I’m quite surprised there isn’t
a one-stop chunk of Ruby code for parsing generic blog post feeds and
handling the differences in Atom/RSS (like publishing date) out there.
If I work it out I’ll be sure to share it.

I’ve also had really good luck with FeedTools.

Jamey

Feed normalizer might work for you.
http://code.google.com/p/feed-normalizer/

On Mar 23, 8:37 am, Neil C. wrote: