I’m using Nokogiri to extract content from web pages, and every now and
then I see this error thrown for certain pages, along with other
variants of NoMethodError. I’m trying to understand the nature of
the error so that I’m in a better position to fix it. The
relevant code and error message are pasted below.
nokogiri_helper.rb:
require 'nokogiri'

module NokogiriHelper
  def content_elements_for(node)
    nodes = []
    # We add element.parent because element is an
    # instance of Nokogiri::XML::Text
    # but we want the tag that contains the text.
    node.traverse do |element|
      nodes << element.parent if element.text? &&
        element.parent.name != 'a' && has_content?(element.text)
    end
    …
  end
  …
end
ERROR (ContentExtractor)
#<NoMethodError: undefined method `traverse' for nil:NilClass>
/opt/plugins/helpers/nokogiri_helper.rb:68:in `content_elements_for'
/opt/plugins/ContentExtractor.rb:66:in `translate_one'
/opt/plugins/Translator.rb:24:in `translate'
/opt/plugins/Translator.rb:21:in `each'
/opt/plugins/Translator.rb:21:in `translate'
/opt/plugins/Translator.rb:16:in `execute'
/opt/plugins/ContentExtractor.rb:53:in `execute'
/opt/plugins/BasePlugin.rb:201:in `call'
/opt/lib/filesystem_lock_provider.rb:83:in `lock'
/opt/plugins/BasePlugin.rb:200:in `call'
/opt/lib/feed.rb:251:in `run'
/opt/lib/feed.rb:248:in `each'
/opt/lib/feed.rb:248:in `run'
/opt/lib/feed.rb:243:in `each'
/opt/lib/feed.rb:243:in `run'
/opt/lib/schedule_runner.rb:159:in `create_feed_process'
/opt/lib/schedule_runner.rb:154:in `fork'
/opt/lib/schedule_runner.rb:154:in `create_feed_process'
/opt/lib/schedule_runner.rb:325:in `run_iteration'
/opt/lib/schedule_runner.rb:269:in `run'
/opt/lib/schedule_runner.rb:268:in `loop'
/opt/lib/schedule_runner.rb:268:in `run'
bin/run_feed:57
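
Reading the message literally, "undefined method `traverse' for nil:NilClass" means the receiver of traverse was nil. In other words, content_elements_for appears to have been called with nil as its node argument (from translate_one, going by the trace), rather than traverse itself misbehaving. Below is a minimal sketch of that situation and of a guard clause. It deliberately does not use Nokogiri: FakeNode and names_for are hypothetical stand-ins invented for illustration, and only the traverse method name mirrors the code above.

```ruby
# Hypothetical stand-in for a parsed node; this is NOT Nokogiri's class.
# It only mimics a traverse method that visits children first, then itself.
FakeNode = Struct.new(:name, :children) do
  def traverse(&block)
    children.each { |child| child.traverse(&block) }
    block.call(self)
  end
end

# Hypothetical helper shaped like content_elements_for, with a nil guard.
def names_for(node)
  return [] if node.nil? # nothing was found upstream, so nothing to collect

  names = []
  node.traverse { |element| names << element.name }
  names
end

tree = FakeNode.new('div', [FakeNode.new('p', [])])
collected = names_for(tree) # children are visited before their parent
empty = names_for(nil)      # the guard returns an empty list instead of raising

# Without the guard, calling traverse on nil raises NoMethodError,
# which is the same class of error as in the trace above.
unguarded_error = begin
  nil.traverse { |element| element }
  nil
rescue NoMethodError => e
  e
end
```

A guard like this only hides the symptom. Assuming the nil comes from a lookup that found nothing on certain pages, the more informative fix is probably at the call site, where the offending page can be logged or skipped.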