Forum: Ferret
Ferret with Rdig - Indexing between tags

Posted by Sébastien Mizrahi (slum)
on 2008-08-01 09:38
Hi,

I'm using Ferret and RDig, and I'm trying, without success, to index only
the HTML that sits between two marker comments.
I just want to index data like this:
<!-- startToIndex -->
Here's my HTML code which I want to index
<!-- endToIndex -->

My current configuration is the following:

  cfg.content_extraction = OpenStruct.new(

    # HPRICOT configuration
    # hpricot is the html parsing lib used by RDig. See
    # http://code.whytheluckystiff.net/hpricot for usage information.
    # Any code blocks given for content selection will receive an
    # Hpricot instance containing the full page content when called.
    :hpricot      => OpenStruct.new(
      # css selector for the element containing the page title
      :title_tag_selector => 'title',
      # might also be a proc returning either an element or a string:
      # :title_tag_selector => lambda { |hpricot_doc| ... }
      :content_tag_selector => 'body'
      # might also be a proc returning either an element or a string:
      # :content_tag_selector => lambda { |hpricot_doc| ... }
    )
  )
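Since the comments in the configuration say that :content_tag_selector may also be a proc returning a string, one direction that might work is to pass a lambda that cuts out the text between the two marker comments. The following is only an untested sketch under assumptions: extract_marked_content and content_selector are hypothetical names, and it assumes that calling to_s on the Hpricot document yields the page HTML with the comments still in place. The tag stripping is deliberately crude.

```ruby
# Matches everything between the two marker comments from the post.
# The marker names are taken from the example above; adjust as needed.
MARKED_REGION = /<!--\s*startToIndex\s*-->(.*?)<!--\s*endToIndex\s*-->/m

# Hypothetical helper: returns the text of all marked regions in the
# given HTML string, with tags replaced by spaces and whitespace collapsed.
def extract_marked_content(html)
  regions = html.scan(MARKED_REGION).flatten
  text = regions.map { |region| region.gsub(/<[^>]+>/, ' ') }.join(' ')
  text.gsub(/\s+/, ' ').strip
end

# Hypothetical selector: assumes to_s on the object passed in returns
# the full page HTML, including comments.
content_selector = lambda do |hpricot_doc|
  extract_marked_content(hpricot_doc.to_s)
end
```

If that assumption holds, the lambda could be used in place of 'body', i.e. :content_tag_selector => content_selector. Pages without any markers would yield an empty string.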

Any help would be appreciated :)

Best regards,