Forum: Ferret [ANN] RDig - ferret-based website crawler/indexer

Jens Kraemer (Guest)
on 2006-03-25 14:34
(Received via mailing list)
Hi!

RDig is a small tool to build a Ferret index for the contents of a
website or intranet. It contains a simple HTTP crawler and some support
for extracting textual content from the fetched pages.
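
To make the two pieces named above more concrete, here is a minimal sketch of what "extracting textual content" and collecting links to crawl could look like using only Ruby's standard library. This is not RDig's actual code: the helper names `extract_text` and `extract_links` are hypothetical, and the regex-based parsing is a deliberate simplification that a real crawler would likely replace with a proper HTML parser.

```ruby
require 'uri'
require 'cgi'

# Hypothetical helper (not part of RDig): reduce an HTML page to plain text
# suitable for indexing.
def extract_text(html)
  # Drop script and style blocks entirely, since their contents are not prose.
  body = html.gsub(/<(script|style)\b[^>]*>.*?<\/\1>/mi, ' ')
  # Replace every remaining tag with a space, then decode basic entities.
  text = CGI.unescapeHTML(body.gsub(/<[^>]+>/, ' '))
  # Collapse runs of whitespace into single spaces.
  text.gsub(/\s+/, ' ').strip
end

# Hypothetical helper (not part of RDig): collect absolute URLs from links
# and frame sources that stay on the same host as the page itself.
def extract_links(html, base_url)
  base = URI.parse(base_url)
  pattern = /<(?:a|frame|iframe)\b[^>]*?\b(?:href|src)\s*=\s*["']([^"']+)["']/i
  refs = html.scan(pattern).flatten
  urls = refs.map do |ref|
    begin
      base.merge(ref)   # resolve relative references against the page URL
    rescue URI::Error
      nil               # skip references that cannot be parsed
    end
  end
  urls.compact.select { |u| u.host == base.host }.map(&:to_s).uniq
end
```

A crawler built on these helpers would fetch a page, pass the result of `extract_text` to the indexer, and queue whatever `extract_links` returns for fetching.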

I built this to implement a site-wide search for a recent project
that combined a Rails application with lots of static HTML files
generated by a CMS.

Any feedback is very welcome!

Rubyforge project page: http://rubyforge.org/projects/rdig
RDocs: http://rdig.rubyforge.org/

`gem install rdig` should work once the gem has reached the rubyforge
mirrors.


Jens

--
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer@webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66
Jan Prill (Guest)
on 2006-03-25 16:30
(Received via mailing list)
Hi, Jens,

great stuff. I just installed it and ran a short test as described in the
readme. It works as announced. Thanks for sharing this! The crawler has
problems with frames, but that is quite a common problem. I had to
configure it to use the main content frame.

You probably know Nutch already, but here is a pointer anyway, in case
you're looking for some inspiration: http://lucene.apache.org/nutch/
Nutch is a great tool for web crawling. I've used it and it
worked great...

Best Regards
Jan Prill