Forum: Ferret [ANN] RDig - ferret-based website crawler/indexer

Jens K. (Guest)
on 2006-03-25 15:34
(Received via mailing list)

RDig is a small tool to build a Ferret index for the contents of a
website or intranet. It contains a simple HTTP crawler and some support
for extracting textual content from the fetched pages.
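The two parts mentioned here, an HTTP crawler and a text extractor, can be illustrated with a short Ruby sketch. It is hypothetical and is not RDig's actual code or API. The class name `TinyCrawler`, the injected `fetcher` callable, and the regex-based link and text extraction are all assumptions made for illustration. A real tool would use a proper HTML parser and add each page to a Ferret index instead of collecting results in a Hash.

```ruby
require 'uri'
require 'set'

# Hypothetical sketch of a crawler + text extractor; NOT RDig's actual code or API.
class TinyCrawler
  # start_url: where crawling begins; only pages on the same host are followed.
  # fetcher:   any callable that takes a URL string and returns HTML, or nil on failure
  #            (injected so the sketch can run without network access).
  def initialize(start_url, fetcher)
    @start = URI.parse(start_url)
    @fetcher = fetcher
  end

  # Breadth-first crawl; returns { url => plain_text } for up to `limit` pages.
  def crawl(limit = 100)
    queue = [@start.to_s]
    seen = Set.new(queue)
    results = {}
    until queue.empty? || results.size >= limit
      url = queue.shift
      html = @fetcher.call(url)
      next if html.nil?

      results[url] = extract_text(html)
      links(html, url).each do |link|
        next if seen.include?(link)

        seen << link
        queue << link
      end
    end
    results
  end

  private

  # Resolve <a href="..."> targets to absolute URLs, keeping same-host http(s) links only.
  def links(html, base)
    html.scan(/<a\s[^>]*href=["']([^"'#]+)/i).flatten.filter_map do |href|
      begin
        uri = URI.join(base, href)
      rescue URI::Error
        next # skip malformed hrefs
      end
      uri.to_s if uri.is_a?(URI::HTTP) && uri.host == @start.host
    end
  end

  # Rough text extraction: drop <script>/<style> blocks and all tags, squeeze whitespace.
  def extract_text(html)
    html.gsub(%r{<(script|style)[^>]*>.*?</\1>}mi, ' ')
        .gsub(/<[^>]+>/, ' ')
        .gsub(/\s+/, ' ')
        .strip
  end
end
```

Injecting the fetcher keeps the sketch testable offline. For real pages you could pass something like `->(u) { Net::HTTP.get(URI(u)) }` (after `require 'net/http'`, plus error handling), then index each `url => text` pair.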

I built this to implement a site-wide search for a recent project
that combined a Rails application with lots of static HTML files
generated by a CMS.

Any feedback is very welcome!

Rubyforge project page:

`gem install rdig` should work once the gem has reached the rubyforge


webit! Gesellschaft für neue Medien mbH
Dipl.-Wirtschaftsingenieur Jens Krämer 
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66
Jan P. (Guest)
on 2006-03-25 17:30
(Received via mailing list)
Hi, Jens,

Great stuff. I just installed it and ran a short test as described in the
README. It works as announced. Thanks for sharing this! The crawler has
problems with frames, but this is quite a common problem; I had to
point it at the main content frame.
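Frameset pages are a typical stumbling block for simple crawlers. They contain no body text of their own, and their content documents are referenced by the `src` attributes of `<frame>` or `<iframe>` elements, not by ordinary `<a href>` links. The Ruby sketch below shows how a crawler could discover those targets. It is not how RDig handles frames: `frame_sources` is a hypothetical helper, and a regex scan stands in for real HTML parsing.

```ruby
require 'uri'

# Hypothetical helper (not part of RDig): resolve the src attributes of
# <frame>/<iframe> elements against the frameset page's URL so a crawler
# can queue them like ordinary links.
def frame_sources(html, base_url)
  html.scan(/<i?frame\s[^>]*src=["']([^"']+)["']/i).flatten.map do |src|
    URI.join(base_url, src).to_s
  end
end
```

Queuing these URLs alongside ordinary links lets a crawler reach the content documents, so it does not index an empty frameset page.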

You probably know Nutch already, but here is a pointer anyway, in case you're
looking for some inspiration: Nutch is a great tool for web crawling. I've
used it and it worked great.

Best Regards
Jan P.