I’ve got a set of scripts that collect URLs from certain web pages. In the
translation stage, I’m trying to extract some content from each of those
pages, and I keep seeing the error below.
Can someone help me understand what’s happening here? I’m not expecting a
fix; I just want some insight into the nature of this issue, since we’ve
been seeing several similar errors on this project. Thanks in advance.
(projectx) Running translation stage
DEBUG [2010-12-28 12:57:01 EST] (PageContentExtractor#559021) Executing plugin input_files=48 (0477475be2aa9f8b79013eaf8e410f8d, etc)
ERROR [2010-12-28 12:57:01 EST] (projectx) Unexepected fatal error while processing page_1: private method `gsub' called for nil:NilClass
/usr/lib/ruby/1.8/uri/common.rb:289:in `escape'
~/sandbox/projectx/core_plugins/plugins/PageContentExtractor.rb:37:in `execute'
~/sandbox/projectx/core_plugins/plugins/PageContentExtractor.rb:36:in `each'
~/sandbox/projectx/core_plugins/plugins/PageContentExtractor.rb:36:in `execute'
~/sandbox/projectx/core_plugins/plugins/BasePlugin.rb:191:in `call'
~/sandbox/projectx/lib/filesystem_lock_provider.rb:66:in `lock'
~/sandbox/projectx/core_plugins/plugins/BasePlugin.rb:190:in `call'
~/sandbox/projectx/lib/feed.rb:216:in `run'
~/sandbox/projectx/lib/feed.rb:212:in `each'
~/sandbox/projectx/lib/feed.rb:212:in `run'
~/sandbox/projectx/lib/feed.rb:207:in `each'
~/sandbox/projectx/lib/feed.rb:207:in `run'
bin/_run_feeds:77
bin/_run_feeds:74:in `each'
bin/_run_feeds:74
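The top frame is URI.escape itself, and the rdebug session shows it handing its argument straight to gsub, so the message suggests the value coming from PageContentExtractor.rb:37 was nil. In Ruby 1.8, Kernel defines a private gsub (the one that works on $_), which is why the error says "private method `gsub'" rather than "undefined method". The sketch below reproduces the pattern under some assumptions: escape_sketch is a simplified, hypothetical stand-in for URI.escape (which was removed in Ruby 3.0), its safe-character set is assumed, and the state hash is made up. It also shows that a missing @state entry would fail earlier, at the hash lookup, so reaching escape at all hints at an entry that exists but has no 'link' value.

```ruby
# Hypothetical stand-in for Ruby 1.8's URI.escape (uri/common.rb). Like the
# original, it passes its first argument straight to gsub, so a nil argument
# fails inside escape rather than at the caller.
UNSAFE_SKETCH = /[^a-zA-Z0-9\-_.!~*'();\/?:@&=+$,]/ # assumed "safe" set

def escape_sketch(str, unsafe = UNSAFE_SKETCH)
  str.gsub(unsafe) do |us|
    # Percent-encode every byte of the unsafe match.
    us.unpack('C*').map { |byte| format('%%%02X', byte) }.join
  end
end

# Hypothetical @state contents: page_1 exists but has no 'link' key.
state = { 'page_1' => { 'title' => 'Home' } }

link = state['page_1']['link'] # nil, but no error yet
begin
  escape_sketch(link)
rescue NoMethodError => e
  puts "fails inside escape: #{e.class}"
end

begin
  state['page_2']['link'] # missing entry: fails before escape ever runs
rescue NoMethodError => e
  puts "fails at the lookup: #{e.class}"
end
```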
Stepping through with rdebug shows what happens when it trips up:
(rdb:1) step
projectx/core_plugins/plugins/PageContentExtractor.rb:37
host = @host_cache[filename] = URI(URI.escape(@state[filename]['link'])).host.downcase
(rdb:1) step
/usr/lib/ruby/1.8/uri/common.rb:285
unless unsafe.kind_of?(Regexp)
(rdb:1) step
/usr/lib/ruby/1.8/uri/common.rb:289
str.gsub(unsafe) do |us|
(rdb:1) step
projectx/core_plugins/plugins/BasePlugin.rb:197
@logger.context=prev_context
(rdb:1) step
projectx/core_plugins/plugins/BasePlugin.rb:198
@basedir = nil
(rdb:1) step
projectx/core_plugins/plugins/BasePlugin.rb:199
@lockdir = nil
(rdb:1) step
projectx/core_plugins/plugins/BasePlugin.rb:200
@state = nil
(rdb:1) step
projectx/core_plugins/plugins/BasePlugin.rb:201
@input_files = nil
(rdb:1) step
projectx/core_plugins/plugins/BasePlugin.rb:202
@permstate = nil
(rdb:1) step
projectx/core_plugins/plugins/BasePlugin.rb:203
@context_counters = nil
(rdb:1) step
projectx/core_plugins/plugins/BasePlugin.rb:204
@on_error = nil
(rdb:1) step
projectx/lib/feed.rb:221
msg = "Unexepected fatal error while running translation: #{@name}: #{e.message}"
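The jump from common.rb:289 straight to BasePlugin.rb:197 suggests the exception skipped the rest of the loop, ran a cleanup block (presumably an ensure clause) that resets the plugin's instance variables, and was then caught by a rescue in feed.rb that builds the "Unexepected fatal error" message. The sketch below models that flow under those assumptions; SketchPlugin, run_feed, and the trace symbols are hypothetical names, not the project's code.

```ruby
# Hypothetical model of the control flow seen in the rdebug session.
class SketchPlugin
  attr_reader :state, :trace

  def initialize
    @trace = []
    @state = { 'page_1' => {} } # no 'link' key, so the lookup yields nil
  end

  def call
    @trace << :execute
    # Stands in for URI.escape(@state[filename]['link']): gsub on nil raises.
    @state['page_1']['link'].gsub(/ /, '%20')
    @trace << :never_reached
  ensure
    # Mirrors BasePlugin.rb:197-204, which runs even though call raised.
    @trace << :ensure_cleanup
    @state = nil
  end
end

def run_feed(plugin)
  plugin.call
  'ok'
rescue => e
  # Mirrors feed.rb:221, where the error finally gets caught and reported.
  plugin.trace << :feed_rescue
  "fatal error while running translation: #{e.class}"
end

plugin = SketchPlugin.new
puts run_feed(plugin)
p plugin.trace # => [:execute, :ensure_cleanup, :feed_rescue]
```

One practical consequence, if the real code follows this shape: because @state is reset to nil during cleanup, inspecting it after the error will not show the offending entry, so a breakpoint on PageContentExtractor.rb:37 (before escape is called) is likely the better place to look.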
Here is the relevant snippet from PageContentExtractor.rb:
# Load all files and cache them for further processing.
input_filenames.each do |filename|
  host = @host_cache[filename] = URI(URI.escape(@state[filename]['link'])).host.downcase # line 37
  (@document_cache[host] ||= {})[filename] = Nokogiri::HTML(file_contents(filename))
end
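If the nil does come from an entry without a 'link' value, one option is to check before escaping. This is only a sketch: hosts_for is a hypothetical helper, and it calls URI.parse on the raw link instead of URI(URI.escape(...)) so that it runs on current Ruby versions, which no longer have URI.escape. It also guards against URI#host returning nil for relative links, which would otherwise fail one step later at downcase.

```ruby
require 'uri'

# Hypothetical helper: resolve the host for each input file, recording the
# files whose link is missing or unusable instead of letting nil reach escape.
def hosts_for(filenames, state)
  host_cache = {}
  skipped = []
  filenames.each do |filename|
    entry = state[filename]
    link = entry && entry['link']
    if link.nil? || link.strip.empty?
      skipped << filename # no @state entry, or no 'link' in it
      next
    end
    begin
      host = URI.parse(link.strip).host
    rescue URI::Error
      host = nil # malformed link that the original escape step might have fixed
    end
    if host.nil? # e.g. a relative link such as "/about"
      skipped << filename
      next
    end
    host_cache[filename] = host.downcase
  end
  [host_cache, skipped]
end
```

Inside the plugin it could be called as hosts_for(input_filenames, @state), with the skipped list logged so that the entries lacking a link can be traced back to the URL-collection stage.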