Just don't know what to do; any advice appreciated.
My DB is InnoDB with utf-8, and I have no problems when I get information from the db, display it in the browser, or save it.
Except in one case:
I'm saving text with an umlaut. This works fine if my browser's character encoding is set to utf-8.
But if I switch the encoding to Western (ISO-8859-1), then copy the string with the umlaut into the field and click save, I get the following error:

ActiveRecord::StatementInvalid
Mysql::Error: #22001 Data too long for column '_text' at row 1: UPDATE company_description SET creation_time = '2006-10-04 21:17:23', _text = 'Stefan Ha�', <…>

The POST request is sent with Content-Type: application/x-www-form-urlencoded, so it looks like the browser sends the request in Western encoding. The POST content differs between the case where the browser encoding is set to utf-8 and the case where it is set to Western.
That would be fine if it didn't cause this error. BTW, I don't get this error under Linux, only under Windows (probably because the Ruby version under Linux is 1.8.4 and under Windows 1.8.5?).
I tried everything I found about Unicode, and I tried to find a way to escape the string before saving it, but no success.
I am scratching my head here, but isn't there a way to have the form force a character encoding in the POST request? i.e. can't you explicitly state that the POST character data is utf-8? That way the user can mess around with their character encodings all they want, but whatever characters are submitted will be valid utf-8 characters (though possibly/probably not the characters they expected).
But how can I do that? i.e. how do I have the form force a character encoding in the POST request?
Dmitry, my gut feeling is that you have to enforce the POST encoding in the form at least, or otherwise detect when you have not received a utf-8 encoded POST data string.
I am at a loss as to how a latin-1 string ended up bigger than a UTF-8 one, but it's possible that you encountered some cut-and-paste artifacts. Try entering an umlaut using the character map (i.e. more naturally).
OK, I'll try this; anyway, I think it's bad that it's possible to pass such data to the application…
Well yes, but it's not Rails' fault. In fact anyone can pass any kind of information to any kind of web system. Your system has to be robust enough to handle it.
Even with your best efforts to ensure everything comes across as utf-8, users can still force it to be something that won't display properly, like latin-1, Shift-JIS or whatever. In those cases you have to detect that you have received an invalid encoding and either convert it to utf-8 or send back an error message.
I just thought that a particularly clever hacker might be able to exploit encoding confusion in multi-byte encoding systems to get around cross-site-scripting defences. It's just a thought, and I am thinking in general, not in a Rails context (which has some fairly serious XSS defences).
But how can I do that? i.e. how do I have the form force a character encoding in the POST request?

That's what the accept-charset attribute on the form element is for. However, if your page itself is explicitly UTF-8 (via the response headers, the meta element, or both), all forms that you POST or GET should automagically be in UTF-8 as well.
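For instance, here is a minimal sketch of such a form in an .rhtml view (the action and field names are just placeholders):

<!-- accept-charset asks the browser to encode the submitted data as
     UTF-8, whatever display encoding the user has switched to -->
<form action="/company_description/save" method="post" accept-charset="UTF-8">
  <input type="text" name="company_description[_text]" />
  <input type="submit" value="Save" />
</form>

The same attribute can be passed as an HTML option to the Rails form helpers. Be aware that some older browsers (notably Internet Explorer) are known to ignore accept-charset, so a server-side check is still worthwhile.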
So if it's possible to fake the encoding and headers, should I check that all the contents of the params hash are in utf-8, say by using a before_filter in application.rb? Wouldn't that affect the performance of the whole application? Maybe there is a better way to avoid such errors?
require 'kconv'

class ApplicationController < ActionController::Base
  before_filter :convert_request

  def convert_request
    convert_hash(params) #if request.post?
  end

  # Recursively convert every string in the params hash to UTF-8
  def convert_hash(hash)
    for k, v in hash
      case v
      when String then hash[k] = Kconv.toutf8(v).to_s
      when Array  then hash[k] = v.collect { |e| Kconv.toutf8(e).to_s }
      when Hash   then convert_hash(v)
      end
    end
  end
end
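One caveat with this: Kconv is a wrapper around NKF, and its encoding guessing is tuned for Japanese encodings (EUC-JP, Shift_JIS, ISO-2022-JP), so for short Western strings it may guess the source encoding wrong and mangle the umlauts instead of fixing them. Treat it as a sketch rather than a robust solution.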
Thanks.
I created the following filter to check incoming requests. Is there a better and faster way to do the same?

require 'iconv'

def convert_request
  convert_hash(params) #if request.post?
end

def convert_hash(hash)
  begin
    # Converting from utf-8 to utf-8 is effectively a validity check:
    # Iconv raises Iconv::Failure when the input is not valid UTF-8.
    for k, v in hash
      case v
      when String then hash[k] = Iconv.iconv('utf-8', 'utf-8', v).first
      when Array  then hash[k] = v.collect { |e| Iconv.iconv('utf-8', 'utf-8', e).first }
      when Hash   then convert_hash(v)
      end
    end
  rescue Iconv::Failure => iconv_exception
    # Keep whatever converted successfully and warn the user
    hash[k] = iconv_exception.success
    flash[:error] = 'Request was sent in an invalid encoding (not utf-8). Text was truncated.'
  end
end
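One lighter-weight possibility (just a sketch, and the helper name is my own invention): rather than converting every value, test whether it is already valid UTF-8. In Ruby 1.8, String#unpack('U*') raises ArgumentError on malformed UTF-8, so it doubles as a cheap validity check:

def valid_utf8?(string)
  string.unpack('U*')   # raises ArgumentError on malformed UTF-8
  true
rescue ArgumentError
  false
end

You could then convert or reject only the values that fail the check, instead of paying the Iconv cost for every parameter on every request.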
Hi, I'm having a related utf-8 problem I would like to share in this topic.
When I submit Swedish characters (such as åäö) in an Ajax call (:observe_field), the åäö characters get translated into weird characters, which causes PostgreSQL to raise the error:

PGError: ERROR: invalid byte sequence for encoding "UTF8": 0xf6f6f6f6

Anyone know a fix?
It's pretty clear that you are inserting high-bit latin-1 (?) characters into your UTF-8 database; I am assuming that the Scandinavian languages are covered by the latin-1 (ISO-8859-1) character set.
Are you (1) typing these characters into your source file, or are you (2) letting the user enter them directly from a form?
(1) I suspect you will need to escape the characters in your source file. I don't know enough about using non-ASCII characters in Ruby source to help you further.
(2) should work as long as you just let Rails pass them through and screen them for being valid utf-8. Make sure your browser knows the page is UTF-8 (you will need to do something in your Rails config to enforce this) and that any POSTs are encoding the data as UTF-8.
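If latin-1 does slip through despite that, the "detect and convert" approach could look something like this (a sketch only; it assumes the stray bytes really are latin-1, which the 0xf6 for "ö" suggests, and the helper name is made up):

require 'iconv'

def ensure_utf8(value)
  value.unpack('U*')    # already valid UTF-8? (raises if not)
  value
rescue ArgumentError
  # Not UTF-8 -- assume latin-1 and convert it
  Iconv.iconv('UTF-8', 'ISO-8859-1', value).first
end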
Hi, thanks for the reply. The answer is 2: the user is entering the text. I don't understand why Rails processes the characters correctly through normal POSTs, but not when I do the Ajax observe_field call. Anyhow, how do I specify to use UTF-8 in that Ajax call?
My database is UTF-8.
This is what I have right now. In ApplicationController:

before_filter :set_charset

# Sets the default character set to UTF-8
def set_charset
  if request.xhr?
    @headers["Content-Type"] = "text/javascript; charset=utf-8"
  else
    @headers["Content-Type"] = "text/html; charset=utf-8"
  end
end

At the bottom of environment.rb I have:

$KCODE = 'u'
require 'jcode'
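I have also seen advice to make sure the database connection itself talks UTF-8, though I am not sure whether it applies here. For PostgreSQL I believe it would be an encoding entry in database.yml, or something like this at the bottom of environment.rb:

# Ask PostgreSQL to exchange data with the client as UTF-8
ActiveRecord::Base.connection.execute("SET client_encoding = 'UTF8'")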