On Thu, Jan 31, 2008 at 1:20 AM, Sunil K. <firstname.lastname@example.org> wrote:
> I am working on an RSS parser script. Here I have to parse thousands and
> thousands of RSS feeds every hour.
> I am looking for an optimized parser that can handle all these
> feeds. Please suggest an RSS parser you have come across.
Sounds like a case of premature optimization to me. If you intend to do
anything like stick the data parsed from the feeds into a database or
index, I think you'll quickly find that that will become the bottleneck,
rather than the feed processing itself.
My company went through something similar, with a performance-obsessed
former C++ programmer looking for the fastest feed parsing solution
available. He settled on building his own, highly procedural feed parser
around libxml-ruby after benchmarking several of the available solutions.
However, he soon discovered that updating the database and search
index was a far bigger bottleneck, one he spent the next several months
addressing. Feed parsing speed went completely by the wayside.
If you intend to do any sort of indexing of the feeds at all, you should
really focus on building a maintainable feed reader, as opposed to a fast
one. The database and/or search index is going to be your bottleneck
anyway, so don't let the desire for speed trump things like correctness
and code clarity. Feed processing is something that scales horizontally
with a queue and multiple feed reader processes, as opposed to databases
and indexes, which generally don't scale quite as well.
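The queue-plus-workers shape can be sketched in a few lines of Ruby. This is a minimal single-process illustration, with threads standing in for the separate feed reader processes; the feed URLs and the "parse" step are placeholders, not a real implementation:

```ruby
require 'thread'

# Placeholder feed list; in a real setup these would come from a
# database of subscriptions.
feed_urls = (1..20).map { |i| "http://example.com/feed#{i}.rss" }

queue   = Queue.new
results = Queue.new
feed_urls.each { |url| queue << url }

# Threads stand in for multiple feed reader processes. Adding more
# workers (or more machines pulling from a shared queue) scales the
# parsing side horizontally.
workers = 4.times.map do
  Thread.new do
    loop do
      url = queue.pop(true) rescue break  # non-blocking pop; stop when drained
      results << "parsed #{url}"          # stand-in for actual feed parsing
    end
  end
end
workers.each(&:join)

puts "processed #{results.size} feeds"
```

In a real deployment the queue would be an external broker rather than an in-process Queue, so readers on different machines can share the work while the database stays the single choke point.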
Given that, I would suggest looking at existing solutions like feedtools
or feedzirra before trying to write your own, and if you do, go with
libxml-ruby. It has a nice, clear, easy-to-use API and is relatively fast.
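For completeness, the parsing step itself is not much code either way. Here is a minimal sketch using Ruby's standard-library rss module (feedtools, feedzirra, and libxml-ruby each have their own APIs; the sample feed below is made up):

```ruby
require 'rss'

# A tiny hand-written RSS 2.0 document standing in for a fetched feed.
xml = <<~XML
  <?xml version="1.0"?>
  <rss version="2.0">
    <channel>
      <title>Example Feed</title>
      <link>http://example.com/</link>
      <description>demo</description>
      <item>
        <title>First post</title>
        <link>http://example.com/1</link>
      </item>
      <item>
        <title>Second post</title>
        <link>http://example.com/2</link>
      </item>
    </channel>
  </rss>
XML

# Second argument disables strict validation, which is usually what you
# want for real-world feeds.
feed = RSS::Parser.parse(xml, false)

puts feed.channel.title
feed.items.each { |item| puts item.title }
```

Whichever parser you pick, keeping this step behind a small wrapper makes it easy to swap implementations later without touching the queueing or indexing code.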