Yet another private method `gsub' called for nil:NilClass error

I’ve got a set of scripts that collect URLs from certain web pages and
I’m trying to extract some content from each of those pages (translation
stage). I keep seeing the error below.
Can someone help me understand what’s happening here? I’m certainly not
expecting a fix, I just want to get some insights into the nature of
this issue. We’ve been seeing several similar issues on this project
we’re working on. Thanks in advance.


(projectx) Running translation stage
DEBUG [2010-12-28 12:57:01 EST] (PageContentExtractor#559021) Executing
plugin input_files=48 (0477475be2aa9f8b79013eaf8e410f8d, etc)
ERROR [2010-12-28 12:57:01 EST] (projectx) Unexepected fatal error
while processing page_1: private method `gsub' called for nil:NilClass
/usr/lib/ruby/1.8/uri/common.rb:289:in `escape'
~/sandbox/projectx/core_plugins/plugins/PageContentExtractor.rb:37:in `execute'
~/sandbox/projectx/core_plugins/plugins/PageContentExtractor.rb:36:in `each'
~/sandbox/projectx/core_plugins/plugins/PageContentExtractor.rb:36:in `execute'
~/sandbox/projectx/core_plugins/plugins/BasePlugin.rb:191:in `call'
~/sandbox/projectx/lib/filesystem_lock_provider.rb:66:in `lock'
~/sandbox/projectx/core_plugins/plugins/BasePlugin.rb:190:in `call'
~/sandbox/projectx/lib/feed.rb:216:in `run'
~/sandbox/projectx/lib/feed.rb:212:in `each'
~/sandbox/projectx/lib/feed.rb:212:in `run'
~/sandbox/projectx/lib/feed.rb:207:in `each'
~/sandbox/projectx/lib/feed.rb:207:in `run'
bin/_run_feeds:77
bin/_run_feeds:74:in `each'
bin/_run_feeds:74

Going through each step with rdebug, we can get a view of what is
happening when it trips up:

(rdb:1) step
projectx/core_plugins/plugins/PageContentExtractor.rb:37
host = @host_cache[filename] =
URI(URI.escape(@state[filename]['link'])).host.downcase
(rdb:1) step
/usr/lib/ruby/1.8/uri/common.rb:285 unless unsafe.kind_of?(Regexp)
(rdb:1) step
/usr/lib/ruby/1.8/uri/common.rb:289 str.gsub(unsafe) do |us|
(rdb:1) step
projectx/core_plugins/plugins/BasePlugin.rb:197
@logger.context=prev_context
(rdb:1) step
projectx/core_plugins/plugins/BasePlugin.rb:198 @basedir = nil
(rdb:1) step
projectx/core_plugins/plugins/BasePlugin.rb:199 @lockdir = nil
(rdb:1) step
projectx/core_plugins/plugins/BasePlugin.rb:200 @state = nil
(rdb:1) step
projectx/core_plugins/plugins/BasePlugin.rb:201 @input_files = nil
(rdb:1) step
projectx/core_plugins/plugins/BasePlugin.rb:202 @permstate = nil
(rdb:1) step
projectx/core_plugins/plugins/BasePlugin.rb:203 @context_counters = nil
(rdb:1) step
projectx/core_plugins/plugins/BasePlugin.rb:204 @on_error = nil
(rdb:1) step
projectx/lib/feed.rb:221
msg = "Unexepected fatal error while running translation:
#{@name}: #{e.message}"

…code snippet from PageContentExtractor.rb:

# Loads all files and caches them for further processing.
input_filenames.each do |filename|
  host = @host_cache[filename] =
    URI(URI.escape(@state[filename]['link'])).host.downcase # line 37
  (@document_cache[host] ||= {})[filename] =
    Nokogiri::HTML(file_contents(filename))
end

On Wed, Dec 29, 2010 at 8:21 PM, Mr. Bill wrote:

> DEBUG [2010-12-28 12:57:01 EST] (PageContentExtractor#559021) Executing
> [...]
> ~/sandbox/projectx/core_plugins/plugins/BasePlugin.rb:191:in `call'
> [...]
> #{@name}: #{e.message}"
> end

Without looking too much into it, I would say that
@state[filename]['link'] is nil. You are passing that nil to
URI.escape, which calls gsub on it (uri/common.rb:289) and raises
the error. Can you print @state[filename]['link'] before calling
URI.escape?
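
As a rough sketch of that check (not the project's actual code): the
helper name host_for and the sample state hash below are made up for
illustration, and the real @state may be shaped differently. The idea is
to test the link for nil before parsing it, and to skip and log the entry
instead of letting the whole run abort. In Ruby 1.8, Kernel#gsub is a
private method present on every object, which is likely why the message
says "private method" rather than "undefined method". The sketch leaves
out URI.escape, since that method was removed in Ruby 3.0.

```ruby
require 'uri'

# Hypothetical helper: returns the lowercased host for a link,
# or nil when the link is missing, blank, or cannot be parsed.
def host_for(link)
  return nil if link.nil? || link.to_s.strip.empty?
  host = URI.parse(link.to_s.strip).host
  host && host.downcase
rescue URI::InvalidURIError
  nil
end

# Stand-in for @state (assumed shape). The second entry has no 'link',
# which is the situation suspected above.
state = {
  'page_0' => { 'link' => 'http://Example.COM/a/b' },
  'page_1' => {}
}

state.each do |filename, entry|
  host = host_for(entry['link'])
  if host.nil?
    # Report the offending entry and move on to the next file.
    warn "skipping #{filename}: no usable link (#{entry['link'].inspect})"
    next
  end
  puts "#{filename} -> #{host}"
end
```

Logging the filename together with the raw value should show which input
file is missing its link, and whether the key is absent or just empty.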

Jesus.

Update: we worked around the problem by using a different Ruby plugin
instead of PageContentExtractor.
Thank you for your time and attention to this.