RSS/Atom feed consuming lib?

I have a customer (we build their intranet with Rails) that subscribes
to a number of news feeds. They want to make these feeds available on
their intranet, so we need to fetch, parse and publish these feeds
(like an internal web-based feed reader).

Are there any good Ruby libs for this that preferably support RSS 0.92,
2.0 and Atom (Atom is more of a nice-to-have than a need-to-have…) and
expose them through a nice common (for all formats) API?

I looked at RAA and RubyForge but didn’t find anything that really
piqued my interest (although I might have missed something).

/Marcus

On 10/18/06, Marcus B. wrote:

I looked at RAA and RubyForge but didn’t find anything that really
piqued my interest (although I might have missed something).

/Marcus

I was unable to find anything that really fit my needs either. I’m in
the process of hacking one together, but it’s still a ways from being
really useful. You can check out FeedTools[1]; it seems to have most of
the capabilities you’re looking for. I wasn’t able to use it for a few
reasons, but maybe it’ll be helpful to you.

[1] http://sporkmonger.com/projects/feedtools/

Marcus B. wrote:

Are there any good Ruby libs for this that preferably support RSS 0.92,
2.0 and Atom (Atom is more of a nice-to-have than a need-to-have…) and
expose them through a nice common (for all formats) API?

Yes, syndication and FeedTools should be two of the better
libraries.

HTH,
Jochen

Thanks for the tips! I’ve tried feedtools and it seems to work nicely :)

Out of curiosity: Why couldn’t you use feedtools?

/Marcus

Marcus B. wrote:

Are there any good Ruby libs for this that preferably support RSS 0.92,
2.0 and Atom (Atom is more of a nice-to-have than a need-to-have…) and
expose them through a nice common (for all formats) API?

If you’re planning to go through FeedBurner, you can check out the
plugin at
http://combustible.rubyforge.org/docs

It’s still very young, but perhaps you’ll find it useful.

Gustav



Marcus B. wrote:

piqued my interest (although I might have missed something).

/Marcus

You may be interested in feed-normalizer; something I pieced together to
wrap a few different Atom/RSS parsers. It outputs a normalized
object graph to represent a feed, regardless of the underlying feed
format.

It currently wraps the Ruby RSS parser and Lucas Carlson’s SimpleRSS,
but it can be easily extended to support more parsers. Patches welcome.

http://feed-normalizer.rubyforge.org/

Hope that helps.

Andy

On 10/18/06, Marcus B. wrote:

Thanks for the tips! I’ve tried feedtools and it seems to work nicely :)

Out of curiosity: Why couldn’t you use feedtools?

/Marcus

I’m going to be parsing a LOT of feeds, but only for a few specific
elements. A few quick tests showed feedtools would probably be too slow
for what I need. I hadn’t seen syndication, so I will definitely be
checking it out, as it seems close to what I need. I could end up using
feedtools for generating feeds, but when it comes to consuming them it
has a bit too much overhead for me.

I am also working on a performance app that requires feed parsing. The
two that I have tried are feedtools and syndication. First I tried
feedtools for RSS and Atom, but that was too slow, so I switched to
syndication for both RSS and Atom. I found syndication to break on a
high percentage of Atom sites, so in the end, I sent RSS to syndication
and Atom to feedtools and took the corresponding perf hit for Atom
feeds.
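A minimal sketch of the routing described above, for illustration only. It sniffs the root element to decide between RSS and Atom, then hands the document to one of two parsers. `RSS_PARSER` and `ATOM_PARSER` are hypothetical stand-ins (plain lambdas), not the real syndication or FeedTools calls, and the regex sniff is an assumption that can be fooled by tags inside comments.

```ruby
# Hypothetical stand-ins for the two libraries. Each takes raw XML and
# returns some parsed result; swap in the real parser calls as needed.
RSS_PARSER  = ->(xml) { { parser: :rss_parser,  bytes: xml.bytesize } }
ATOM_PARSER = ->(xml) { { parser: :atom_parser, bytes: xml.bytesize } }

# Cheap format sniff: take the name of the first element in the document
# (the XML declaration and doctype are skipped because "<" must be
# followed by a name character) and drop any namespace prefix.
def feed_format(xml)
  root = xml[/<([A-Za-z_][\w.:-]*)/, 1]
  case root.to_s.split(":").last.to_s.downcase
  when "rss", "rdf" then :rss   # RSS 0.9x/2.0, or RDF-based RSS 1.0
  when "feed"       then :atom
  else :unknown
  end
end

# Route the document to the parser chosen for its format.
def parse_feed(xml)
  case feed_format(xml)
  when :rss  then RSS_PARSER.call(xml)
  when :atom then ATOM_PARSER.call(xml)
  else raise ArgumentError, "unrecognized feed format"
  end
end

result = parse_feed('<?xml version="1.0"?><rss version="2.0"></rss>')
puts result[:parser]
```

Sniffing first keeps the slower parser off the common path, at the cost of maintaining two code paths for the parsed results.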

I find this approach to be decently robust, but not very elegant. I am
going through > 10k feeds a day of all varieties.

Can someone comment on the robustness of Ruby RSS Parser and Lucas
Carlson’s SimpleRSS? I am curious about Andy’s feed normalizer.

HTH,
Ray

Ray C. wrote:

I am also working on a performance app that requires feed parsing.

As previously mentioned, feed-normalizer aims to produce a ‘Feed’ object
that is independent of the underlying format. This means it will use
each parser (in a user-defined order) until it gets back a successful
parse and a usable object with which to interface.
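The try-each-parser idea can be sketched roughly as follows. This is not feed-normalizer's actual code or API; `STRICT`, `LIBERAL`, `PARSERS` and `parse_with_fallback` are hypothetical names, and the two lambdas are toy parsers that only pull out `<title>` text.

```ruby
# Toy "strict" parser (hypothetical): refuses anything that is not a
# complete <rss>...</rss> document.
STRICT = lambda do |xml|
  unless xml =~ /\A\s*<rss\b.*<\/rss>\s*\z/m
    raise ArgumentError, "not well-formed RSS"
  end
  { titles: xml.scan(/<title>(.*?)<\/title>/m).flatten }
end

# Toy "liberal" parser (hypothetical): grabs whatever titles it can find
# and returns nil when it finds none.
LIBERAL = lambda do |xml|
  titles = xml.scan(/<title>(.*?)<\/title>/m).flatten
  titles.empty? ? nil : { titles: titles }
end

# Hashes keep insertion order, so this is also the order of attempts:
# most strict first, most liberal last.
PARSERS = { strict: STRICT, liberal: LIBERAL }

# Try each parser in order; return [name, result] for the first one that
# neither raises nor returns nil, or nil if every parser fails.
def parse_with_fallback(xml, parsers)
  parsers.each do |name, parser|
    begin
      result = parser.call(xml)
      return [name, result] unless result.nil?
    rescue StandardError
      # This parser could not handle the document; fall through to the next.
    end
  end
  nil
end

truncated = "<rss><channel><title>News</title>"
name, feed = parse_with_fallback(truncated, PARSERS)
puts "#{name}: #{feed[:titles].inspect}"
```

Reordering the hash changes which parser gets the first attempt, so the same structure works for a fastest-first policy.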

What this also means is that the primary goal of feed-normalizer is to
produce the aforementioned Feed object graph. That might mean trying
three parsers before it gets a result, so performance isn’t really a
consideration.

Of course, you could change the order of parsing so that feed-normalizer
uses the fastest parser first, and so on. feed-normalizer currently uses
most strict to most liberal as its default order. Right now, this just
happens to be fastest parser first, too :)

The two that I have tried are feedtools and syndication. First I tried
feedtools for RSS and Atom, but that was too slow, so I switched to
syndication for both RSS and Atom. I found syndication to break on a
high percentage of Atom sites, so in the end, I sent RSS to syndication
and Atom to feedtools and took the corresponding perf hit for Atom
feeds.

In this case you could create a wrapper for feed-normalizer that
interfaces with both syndication and feedtools, and tell feed-normalizer
which one to use first. I assume you’ll probably encounter more RSS than
Atom.

I find this approach to be decently robust, but not very elegant. I am
going through > 10k feeds a day of all varieties.

Can someone comment on the robustness of Ruby RSS Parser and Lucas
Carlson’s SimpleRSS? I am curious about Andy’s feed normalizer.

I personally have found Ruby’s RSS library to be very good at handling
RSS feeds that aren’t broken :) What that means is the results should be
predictable, but the chance of a good parse may be lower.

SimpleRSS, on the other hand, is uber-liberal: if the feed even remotely
resembles an RSS or Atom document, you’ll probably get a pretty good
result back, though there are sometimes small errors.

Bob A. did an overview of both parsers, somewhere on sporkmonger.com.

Back to performance again; I did some rudimentary benchmarks[1] of both
Ruby’s RSS as well as SimpleRSS. I think the results of this benchmark
really make the point for SimpleRSS being a great ‘backup’ parser to
have when nothing else will parse an ill-formed feed.
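For anyone wanting to run a similar comparison, here is a rough timing sketch. It is not the benchmark referred to above; `time_parser` and the toy lambdas are hypothetical, and any callable that takes the raw XML can be dropped in instead.

```ruby
# Average wall-clock seconds per call for a parser, measured with the
# monotonic clock so system clock adjustments do not skew the result.
def time_parser(parser, xml, runs: 100)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  runs.times { parser.call(xml) }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  elapsed / runs
end

# Hypothetical parsers to compare; replace with real library calls.
candidates = {
  title_scan: ->(xml) { xml.scan(/<title>(.*?)<\/title>/m).flatten },
  link_scan:  ->(xml) { xml.scan(/<link>(.*?)<\/link>/m).flatten }
}

# Build a sample document with 50 items.
xml = "<rss><channel>" +
      "<item><title>t</title><link>l</link></item>" * 50 +
      "</channel></rss>"

candidates.each do |name, parser|
  puts format("%-12s %.6f s/parse", name, time_parser(parser, xml))
end
```

Timing against a sample of real feeds, including a few broken ones, is likely to be more telling than a synthetic document like this.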

And of course, I’m always looking for patches and new parser wrappers
for feed-normalizer.

HTH,
Ray

Hope that helps.

Andy

[1]