Basic rdig setup

I’m developing locally on Windows and I have a remote dev box that runs
Linux. I’m trying to use RDig to index by crawling URLs only, not files.

Both use acts_as_ferret for an administrative search that works fine.

On the Windows machine, I get no errors, but get no results.

On the Linux machine, I get:

File Not Found Error occured at <except.c>:93 in xraise
Error occured in index.c:840 - sis_find_segments_file
couldn’t find segments file

On both machines I have run the indexer with no errors using:
rdig -c config/rdig_config.rb

Both machines have an index dir at the rails root that has two files,
segments and segments_0. Both files look like they have next to nothing
in them.

Both rdig_config.rb files look like:
cfg.crawler.start_urls = [ 'http://domain.tpl/' ]
cfg.crawler.include_hosts = [ 'domain.tpl/' ]
cfg.index.path = './rdig_index'
cfg.verbose = true
cfg.content_extraction = OpenStruct.new(
  :hpricot => OpenStruct.new(
    :title_tag_selector => 'title',
    :content_tag_selector => 'body'
  )
)

Both environment.rb files have:
require 'acts_as_ferret'
require 'rdig'
require 'rdig_config'

Finally, both have rdig and hpricot gems installed.

Any help would be great.

I’m just getting started with rdig and rails as well. I’m working off
Windows and when I do a crawl, I get 3 files - segments, segments_1 and
_0.cfs. The cfs file appears to contain the actual crawl.
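Going by the file listings in this thread, one quick way to tell an empty index from a populated one is to look for anything besides the segments files. A minimal sketch, assuming the index lives in ./rdig_index (adjust to whatever cfg.index.path points to); the helper name index_has_data? is made up for illustration and is not part of RDig or Ferret:

```ruby
# Hypothetical helper, not part of RDig or Ferret: reports whether an index
# directory holds anything besides the segments bookkeeping files.
def index_has_data?(index_dir)
  return false unless File.directory?(index_dir)

  files = Dir.entries(index_dir) - %w[. ..]
  files.any? { |name| !name.start_with?('segments') }
end

# './rdig_index' is an assumed path; use whatever cfg.index.path points to.
puts index_has_data?('./rdig_index') ? 'index has data files' : 'index looks empty'
```

A directory holding only segments and segments_0 would be reported as empty, while one that also has a _0.cfs file would not.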

After the crawl, you can do a query with rdig -c CONFIGFILE -q 'your
query' if you haven’t tried that already.

I’m having trouble getting the crawl into a variable. There seems to be
a problem with this statement:
search_results = RDig.searcher.search(params[:query])

Perhaps you can PM me and we can work together.

Actually there’s no problem with that statement. I just overlooked the
fact that a parameter needs to get passed to it.

Eggman Eggman wrote:

After the crawl, you can do a query with rdig -c CONFIGFILE -q 'your
query' if you haven’t tried that already.

Yeah, I’ve tried that and get no errors, but no results either. It
appears that the crawl didn’t actually index anything.

Is “cfg.verbose = true” supposed to show me errors during the crawl?

On Tue, Sep 18, 2007 at 05:27:28AM +0200, Jonathan Towell wrote:

Is “cfg.verbose = true” supposed to show me errors during the crawl?

Errors should be shown in any case; verbose adds stack traces to them
and makes rdig print the URI of each document it indexes. If your index
ends up with just the segment* files in it, RDig hasn’t indexed
anything.
I’d suspect the cause is your include_hosts config entry: don’t put a
‘/’ at the end of the hostname there.
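
To illustrate why the trailing slash would matter, here is a rough sketch. It assumes the host filter compares each crawled URL's host against the include_hosts entries by exact string match; that assumption is not checked against the RDig source, and host_allowed? is a hypothetical stand-in, not an RDig method. The host part of a parsed URI never contains a slash, so an entry like 'domain.tpl/' could never match under that assumption:

```ruby
require 'uri'

# Hypothetical stand-in for a host filter (not RDig's actual code):
# a URL passes only if its host exactly equals one of the allowed entries.
def host_allowed?(url, include_hosts)
  include_hosts.include?(URI.parse(url).host)
end

start_url = 'http://domain.tpl/'

puts URI.parse(start_url).host                  # the parsed host has no trailing slash
puts host_allowed?(start_url, ['domain.tpl/'])  # entry with a slash is rejected
puts host_allowed?(start_url, ['domain.tpl'])   # entry without a slash is accepted
```

With the slash removed from the config entry, the start URL's host and the allowed host are the same string.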

cheers,
Jens


Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database