Is anybody out there working on a Ruby-based parser for the WARC (Web ARChive) file format that (relatively) recently replaced the ARC format? Or perhaps Ruby wrappers around WARC-parsing utilities?
on 2010-07-09 01:50
on 2013-10-24 18:28
Hello, I wrote a Ruby WARC parser a while ago and made it available on the intertubes today. There is no documentation yet, but it shouldn't be too hard to figure out how to use it by looking at the tests. I don't plan on working on it much, but I would be glad if you could contribute to it.
Source code: https://github.com/antoinerg/warc-ruby
As a gem: gem install warc
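
For anyone who wants a feel for what a WARC parser has to do before digging into the tests, here is a rough sketch in plain Ruby. It does not use the gem and is not its API: `WarcRecord` and `each_warc_record` are names made up for this example. It assumes uncompressed input, a `Content-Length` header spelled with exactly that capitalization (WARC field names are really case-insensitive), and no folded header lines. Each record is a version line, named header fields, a blank line, a content block of `Content-Length` bytes, and two trailing CRLFs.

```ruby
require "stringio"

# Hypothetical container for one record: version line, header fields, raw content.
WarcRecord = Struct.new(:version, :headers, :content)

# Hypothetical helper: yields each record found in an IO-like object.
# Assumes uncompressed, well-formed input.
def each_warc_record(io)
  return enum_for(:each_warc_record, io) unless block_given?

  until io.eof?
    line = io.gets("\r\n")
    break if line.nil?

    version = line.chomp("\r\n")
    next if version.empty? # skip the blank lines that separate records

    headers = {}
    while (line = io.gets("\r\n"))
      line = line.chomp("\r\n")
      break if line.empty? # a blank line ends the header section

      name, value = line.split(":", 2)
      headers[name.strip] = value.to_s.strip
    end

    # The content block is read by length, so it may contain any bytes.
    length = Integer(headers.fetch("Content-Length"))
    content = io.read(length)
    yield WarcRecord.new(version, headers, content)
  end
end

sample = "WARC/1.0\r\nWARC-Type: resource\r\nContent-Length: 5\r\n\r\nhello\r\n\r\n"
each_warc_record(StringIO.new(sample)) do |record|
  puts "#{record.headers['WARC-Type']}: #{record.content}"
end
```

Real archives are usually gzip-compressed, often one gzip member per record, so a practical parser would need to decompress first.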