Forum: Ruby on Rails rails2.3/ruby1.9: invalid byte sequence in utf-8 with blank?

buddycat (Guest)
on 2009-04-11 01:01
(Received via mailing list)
hi all,

anyone seen this controller argument error:

invalid byte sequence in utf-8

ror/vendor/rails/activesupport/lib/active_support/core_ext/blank.rb:50:in `=~'
ror/vendor/rails/activesupport/lib/active_support/core_ext/blank.rb:50:in `!~'
ror/vendor/rails/activesupport/lib/active_support/core_ext/blank.rb:50:in `blank?'
ror/vendor/rails/actionpack/lib/action_controller/response.rb:119:in `etag='
ror/vendor/rails/actionpack/lib/action_controller/response.rb:185:in `handle_conditional_get!'
ror/vendor/rails/actionpack/lib/action_controller/response.rb:143:in `prepare!'
ror/vendor/rails/actionpack/lib/action_controller/base.rb:531:in `send_response'
ror/vendor/rails/actionpack/lib/action_controller/base.rb:525:in `process'
ror/vendor/rails/actionpack/lib/action_controller/filters.rb:606:in `process_with_filters'
ror/vendor/rails/actionpack/lib/action_controller/base.rb:391:in `process'
ror/vendor/rails/actionpack/lib/action_controller/base.rb:386:in `call'
ror/vendor/rails/actionpack/lib/action_controller/routing/route_set.rb:433:in `call'

can't seem to figure out what exact string it is choking on, but it is
occurring when i call an index page without an id to get the total
list.

doc/list

as opposed to:

doc/list/23

where 23 is a category of documents.
buddycat (Guest)
on 2009-04-11 01:36
(Received via mailing list)
as an update, i added a rescue clause to blank.rb to find out what was
choking:

blank.rb:50
class String #:nodoc:
  def blank?
    self !~ /\S/
  rescue
    # temporary debugging aid: report class, encoding, validity and content
    raise "#{self.class} #{self.encoding.name} #{self.valid_encoding?} #{self}"
  end
end

i reran my page and checked the log. it turns out that the string it is
choking on is the html of my entire page. it says that the encoding is
utf-8 but self.valid_encoding? is false. the text of the page follows.

RuntimeError (String UTF-8 false
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
... the rest of my page

any ideas are appreciated...
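
for anyone hitting the same thing: a minimal sketch of how the offending
bytes could be pinned down instead of dumping the whole page. the helper
name invalid_byte_offsets and the sample string are made up for
illustration and are not part of rails; it only assumes ruby 1.9's
String#each_char and String#valid_encoding?.

```ruby
# Hypothetical helper (not part of Rails or ActiveSupport): return the byte
# offsets at which a string is not valid in its declared encoding.
def invalid_byte_offsets(str)
  offsets = []
  pos = 0
  str.each_char do |ch|
    # each_char yields broken bytes as separate chunks that fail valid_encoding?
    offsets << pos unless ch.valid_encoding?
    pos += ch.bytesize
  end
  offsets
end

# Made-up sample standing in for the rendered page: one stray byte tagged as UTF-8.
page = "<p>1999 \x96 2009</p>".dup.force_encoding("UTF-8")

bad = invalid_byte_offsets(page)
puts "valid: #{page.valid_encoding?}, bad offsets: #{bad.inspect}"
bad.each { |i| puts "byte #{i}: 0x%02X" % page.getbyte(i) }
```

printing the offset and the raw byte value makes it easier to guess which
legacy encoding the data really came from.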
Conrad T. (Guest)
on 2009-04-11 02:05
(Received via mailing list)
Can you provide the Ruby version, Rails version, and platform?
Also, do you have any file(s) that reproduce the error?  For example,
if the error is occurring in Ruby file(s) or ERB template(s), then
please copy-paste the relevant file(s) in your post.

-Conrad
buddycat (Guest)
on 2009-04-11 02:31
(Received via mailing list)
thanks conrad,

debian lenny
apache2-mpm-worker
dbd-odbc (0.2.4)
dbi (0.4.1)
deprecated (2.0.1)
fastthread (1.0.7)
json (1.1.4)
passenger (2.1.3)
pg (0.8.0)
rack (0.9.1)
rails-sqlserver-2000-2005-adapter (2.2.15)
rake (0.8.4)
ruby-1.9.1p0 compiled with pthreads, shared
rails2.3 branch in vendor/rails

vendor/rails/activesupport/lib/active_support/core_ext/blank.rb line 50
class Object
  # An object is blank if it's false, empty, or a whitespace string.
  # For example, "", "   ", +nil+, [], and {} are blank.
  #
  # This simplifies
  #
  #   if !address.nil? && !address.empty?
  #
  # to
  #
  #   if !address.blank?
  def blank?
    respond_to?(:empty?) ? empty? : !self
  end

  # An object is present if it's not blank.
  def present?
    !blank?
  end
end

class NilClass #:nodoc:
  def blank?
    true
  end
end

class FalseClass #:nodoc:
  def blank?
    true
  end
end

class TrueClass #:nodoc:
  def blank?
    false
  end
end

class Array #:nodoc:
  alias_method :blank?, :empty?
end

class Hash #:nodoc:
  alias_method :blank?, :empty?
end

class String #:nodoc:
  def blank?
    self !~ /\S/

    # i added this to see what string had bad encoding
    #   rescue
    #     raise "#{self.class} #{self.encoding.name} #{self.valid_encoding?} #{self}"
  end
end

class Numeric #:nodoc:
  def blank?
    false
  end
end
buddycat (Guest)
on 2009-04-11 14:22
(Received via mailing list)
one last update. i discovered that the problem is that i have windows
asp content from a legacy server that i use Net::HTTP to import into
my rails site. windows allows "high ascii" byte values (in this case
byte 150, which in windows-1252 is unicode 8211, the en dash; it
resembles a dash but isn't the ascii hyphen, it's the fancy longer
windows dash). anyway, someone entered data presumably on a windows box
and pasted some stuff into a text field. that text field got saved in
sql server fine. but when rails runs a regex over the page and hits
that byte, which isn't valid utf-8, my page crashes with the above error.
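
a minimal sketch that reproduces the failure outside rails, assuming
ruby 1.9 or later. the sample string is made up; the only thing taken
from the thread is the stray byte 150 (0x96) and the regex test from
blank.rb.

```ruby
# Byte 0x96 (decimal 150) is the en dash in Windows-1252, but on its own it
# is not a valid UTF-8 sequence.
mislabeled = "1999 \x96 2009".dup.force_encoding("UTF-8")

# Same check as in the rescue clause earlier in the thread.
puts mislabeled.valid_encoding?

error = nil
begin
  mislabeled !~ /\S/   # the same test String#blank? runs in blank.rb
rescue ArgumentError => e
  error = e
end
puts error ? "#{error.class}: #{error.message}" : "no error raised"
```

the regex engine refuses to scan a string whose bytes don't match its
declared encoding, which is why a single byte is enough to take the
whole response down.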

so, i am now investigating how to better sanitize my stuff, but i am
also thinking that it is a bit crazy that my entire rails stack can be
brought to its knees by one errant character. i would suggest that
this is not a "feature", represents a significant problem, and should
be made more bulletproof.

what i need, i guess, is some way to sanitize just the encoding, since
HTML::FullSanitizer strips all the html. my use case is that the
content is my own asp generated html, so i really don't want to have
to whitelist all of the tags.
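
one possible approach, sketched under the assumption that the legacy
pages are really windows-1252: relabel the bytes and transcode them to
utf-8 before rails ever sees them, leaving the html tags alone. to_utf8
is a hypothetical helper name, not a rails or ruby api; it relies only on
String#force_encoding, String#valid_encoding? and String#encode.

```ruby
# Hypothetical helper, not a Rails API: return a valid UTF-8 copy of +body+.
# Assumes the legacy source is Windows-1252; pass another name if it is not.
def to_utf8(body, assumed_source = "Windows-1252")
  utf8 = body.dup.force_encoding("UTF-8")
  return utf8 if utf8.valid_encoding?   # already valid UTF-8, leave untouched

  # Relabel the raw bytes with their real encoding, then transcode.
  # Bytes with no mapping in the source encoding become "?".
  body.dup.force_encoding(assumed_source).encode(
    "UTF-8", :invalid => :replace, :undef => :replace, :replace => "?"
  )
end

# Made-up sample standing in for a body fetched from the legacy server.
html = to_utf8("<p>1999 \x96 2009</p>")

puts html.valid_encoding?   # true
puts html !~ /\S/           # false: not blank, and the regex no longer raises
puts html.include?("<p>")   # true: markup is preserved
```

the valid_encoding? guard matters: content that is already utf-8 would be
mangled if it were blindly reinterpreted as windows-1252.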

also, why is it that in rails2.3 a string class variable doesn't have
ruby1.9 encoding methods exposed as they do in irb?