Hi,
I'm using Ferret and RDig, and I'm trying to index only the HTML between
two comment markers, so far without success.
I just want to index content like this:
<!-- startToIndex -->
Here's my HTML code which I want to index
<!-- endToIndex -->
My current configuration is the following:
cfg.content_extraction = OpenStruct.new(
  # HPRICOT configuration
  # hpricot is the html parsing lib used by RDig. See
  # http://code.whytheluckystiff.net/hpricot for usage information.
  # Any code blocks given for content selection will receive an Hpricot instance
  # containing the full page content when called.
  :hpricot => OpenStruct.new(
    # css selector for the element containing the page title
    :title_tag_selector => 'title',
    # might also be a proc returning either an element or a string:
    # :title_tag_selector => lambda { |hpricot_doc| ... }
    :content_tag_selector => 'body'
    # might also be a proc returning either an element or a string:
    # :content_tag_selector => lambda { |hpricot_doc| ... }
  )
)
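Since the config comments say that :content_tag_selector may also be a proc returning a string, one direction that might work is to cut the marked region out of the page source myself. The following is only a sketch: extract_marked_content and MARKED_REGION are hypothetical names of my own, not part of RDig or Hpricot, and the wiring shown in the trailing comment assumes that the Hpricot document serializes back to HTML with to_html and keeps the comment markers intact.

```ruby
# Hypothetical helper (not part of RDig or Hpricot).
# Matches everything between the two comment markers; /m lets "." span
# newlines and /i makes the marker names case-insensitive.
MARKED_REGION = /<!--\s*startToIndex\s*-->(.*?)<!--\s*endToIndex\s*-->/mi

# Returns the text of all marked regions in +html+ as one string,
# with tags replaced by spaces and whitespace collapsed.
# Returns an empty string when no marked region is found.
def extract_marked_content(html)
  html.to_s.scan(MARKED_REGION).flatten
      .map { |chunk| chunk.gsub(/<[^>]+>/, ' ') }
      .join(' ')
      .gsub(/\s+/, ' ')
      .strip
end

# Possible wiring in the RDig config (assumption: to_html returns the
# page source including the comment markers):
#
#   :content_tag_selector => lambda { |hpricot_doc|
#     extract_marked_content(hpricot_doc.to_html)
#   }
```

The regex-based tag stripping is deliberately crude; it is meant to show the idea of selecting by comment markers, not to be a robust HTML-to-text conversion.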
Any help would be appreciated :)
Best regards,
on 2008-08-01 09:38