Ray C. wrote:
I am also working on a performance app that requires feed parsing.
As previously mentioned, feed-normalizer aims to produce a ‘Feed’ object
that is independent of the underlying format. This means it will use
each parser (in a user-defined order) until it gets back a successful
parse and a usable object with which to interface.
What this also means is that the primary goal of feed-normalizer is to
produce the aforementioned Feed object graph. This might mean hitting
three parsers before it gets that result. So performance isn’t really a
primary goal.
Of course, you could change the order of parsing so that feed-normalizer
uses the fastest parser first, and so on. feed-normalizer currently uses
most strict to most liberal as its default order. Right now, this just
happens to be fastest parser first, too.
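To make the "try each parser in order" idea concrete, here is a minimal sketch in plain Ruby. It is not feed-normalizer's actual code: FeedStub, STRICT_PARSER, LIBERAL_PARSER and parse_with_fallback are hypothetical names made up for illustration, and the regex "parsers" only stand in for real parser wrappers.

```ruby
# Hypothetical format-independent result object (an assumption, not the real Feed class).
FeedStub = Struct.new(:title, :parser)

# Stand-in for a strict parser: only accepts a complete <rss>...</rss> document.
STRICT_PARSER = lambda do |xml|
  m = xml.match(%r{\A\s*<rss[^>]*>.*<title>(.*?)</title>.*</rss>\s*\z}m)
  m && FeedStub.new(m[1], :strict)
end

# Stand-in for a liberal parser: grabs the first title it can find,
# even if the document is cut off.
LIBERAL_PARSER = lambda do |xml|
  m = xml.match(%r{<title>(.*?)(?:</title>|\z)}m)
  m && FeedStub.new(m[1].strip, :liberal)
end

# Try each parser in the given order and return the first usable result.
# A parser that raises is treated the same as one that returns nil.
def parse_with_fallback(xml, parsers)
  parsers.each do |parser|
    begin
      feed = parser.call(xml)
      return feed if feed
    rescue StandardError
      next
    end
  end
  nil
end

order  = [STRICT_PARSER, LIBERAL_PARSER] # strictest first; reorder to taste
good   = "<rss><channel><title>News</title></channel></rss>"
broken = "<rss><channel><title>News</title>"

puts parse_with_fallback(good, order).parser
puts parse_with_fallback(broken, order).parser
```

Changing the order is just a matter of passing the array in a different sequence, which is the knob described above.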
The two that I have tried are feedtools and syndication. First I tried
feedtools for RSS and Atom, but that was too slow, so I switched to
syndication for both RSS and Atom. I found syndication to break on a
high percentage of Atom sites, so in the end, I sent RSS to syndication
and Atom to feedtools and took the corresponding perf hit for Atom.
In this case you could create a wrapper for feed-normalizer that
interfaces both syndication and feedtools, and tell feed-normalizer
which one to use first. I assume you’ll probably encounter more RSS than
Atom.
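As a rough sketch of the routing part of such a wrapper (RSS to one backend, Atom to another), something like the following might do. Everything here is an assumption for illustration: detect_format is a crude regex sniff rather than how any of these libraries detect formats, and the two backends are hypothetical lambdas standing in for real calls into syndication and feedtools.

```ruby
# Crude sniffing of the root element. Illustrative only; a real wrapper
# would want something more careful than a regex.
def detect_format(xml)
  case xml
  when /<feed[\s>]/ then :atom
  when /<rss[\s>]/, /<rdf:RDF[\s>]/ then :rss
  else :unknown
  end
end

# Hypothetical backends keyed by format. In a real wrapper each lambda
# would call into the library you chose for that format.
BACKENDS = {
  rss:  ->(xml) { :fast_backend },     # e.g. the faster parser for RSS
  atom: ->(xml) { :tolerant_backend }  # e.g. the slower but sturdier parser for Atom
}.freeze

# Send the document to the backend registered for its format,
# or return nil when the format is not recognised.
def route(xml, backends = BACKENDS)
  backend = backends[detect_format(xml)]
  backend ? backend.call(xml) : nil
end

puts route('<rss version="2.0"><channel/></rss>')
puts route('<feed xmlns="http://www.w3.org/2005/Atom"/>')
```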
I find this approach to be decently robust, but not very elegant. I am
going through > 10k feeds a day of all varieties.
Can someone comment on the robustness of Ruby RSS Parser and Lucas
Carlson’s SimpleRSS? I am curious about Andy’s feed normalizer.
I personally have found Ruby’s RSS library to be very good at handling
RSS feeds that aren’t broken. What that means is the results should be
predictable, but the chance of a good parse may be lower.
SimpleRSS on the other hand is uber-liberal, and if the feed resembles
anywhere near an RSS or Atom document, you’ll probably get a pretty good
result back, but there are small errors sometimes.
Bob A. did an overview of both parsers, somewhere on sporkmonger.com.
Back to performance again; I did some rudimentary benchmarks of both
Ruby’s RSS as well as SimpleRSS. I think the results of this benchmark
really make the point for SimpleRSS being a great ‘backup’ parser to
have when nothing else will parse an ill-formed feed.
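If you want to run a similar rudimentary timing against your own feeds, a sketch along these lines could work. It is not the benchmark I ran: time_parser and rank_parsers are hypothetical helpers, and the two lambdas are placeholders you would swap for real parser calls.

```ruby
# Time `runs` calls of a parser (anything responding to call) on one document.
# Returns elapsed wall-clock seconds as a Float, using the monotonic clock.
def time_parser(parser, xml, runs = 100)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  runs.times { parser.call(xml) }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Time every named parser and return [name, seconds] pairs, fastest first.
def rank_parsers(parsers, xml, runs = 100)
  parsers.map { |name, parser| [name, time_parser(parser, xml, runs)] }
         .sort_by { |_, seconds| seconds }
end

# Placeholder parsers (assumptions); replace with real library calls.
candidates = {
  scan_titles: ->(xml) { xml.scan(%r{<title>(.*?)</title>}m) },
  count_tags:  ->(xml) { xml.count("<") }
}

sample = "<rss><channel><title>News</title></channel></rss>"
rank_parsers(candidates, sample).each do |name, seconds|
  puts format("%-12s %.6fs", name, seconds)
end
```

Numbers from a loop like this only mean much if the sample documents resemble the feeds you actually process.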
And of course, I’m always looking for patches and new parser wrappers.
Hope that helps.