I swear I searched this forum and couldn't find the solution! Anyway, here's the problem: I have a PostgreSQL database with data encoded in UTF-8. I want to check this data, so I hoped to write a script that asserts the values stored there. The problem is that they don't match, because the strings I get back from the database are not UTF-8. I tried Iconv, but the resulting strings were even messier. I'm using the 'postgres' gem to connect to the database. Any solutions?
on 2007-06-26 22:53
on 2007-06-27 14:04
Dear mare,

does the information here:

http://groups.google.ca/group/rubyonrails-core/bro...

help? Otherwise, I'm not sure whether something like Mysql.escape(string) exists for Postgres and could help. Maybe you could post an example string showing what you want to achieve and what goes wrong...

Best regards,

Axel
on 2007-06-27 16:37
Axel Etzold wrote:
> Dear mare,
>
> does the information here:
>
> http://groups.google.ca/group/rubyonrails-core/bro...
>
> help ?

Unfortunately it doesn't, because Rails has its own configuration for this (I've only played with Rails and never dug deeper into it, so I'm not sure what that configuration looks like).

As for my problem, I'm trying to do this:

    db = PGconn.connect('localhost', 5432, '', '', 'isys', 'postgres', 'postgres')
    res = db.exec('select * from "ADDRESS"')
    puts res.getvalue(1, 1) == "Südstrasse"   # second row, second column (0-based)

and make the output "true". But I can't, because the value in res (second column of the second row) comes back in some other encoding: instead of Südstrasse ("u" with an umlaut, the two dots above the letter) I get SÃ¼dstrasse (the umlauted "u" is shown as two characters).

I've tried using:

    Iconv.new('UTF-8', 'ISO-8859-1').iconv(res.getvalue(1, 1))

but now each of the two characters that replaced the umlauted u is turned into another two characters, so instead of one umlauted u I now have 4 characters. I believe the problem lies in the 'postgres' gem and that it needs to be configured somehow, but I don't have the slightest idea how...
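The symptom described here (one "ü" turning into two characters, then four after an ISO-8859-1 to UTF-8 conversion) is consistent with UTF-8 bytes being read as a single-byte encoding such as ISO-8859-1. The sketch below reproduces that effect in memory. It is an illustration only: it uses the Ruby 1.9+ String API (`force_encoding`, `encode`) instead of the Iconv/Ruby 1.8 setup in this thread (Iconv was dropped from the standard library in Ruby 2.0), and it does not touch the database.

```ruby
# Sketch only: Ruby 1.9+ String API, not the Iconv call used in the thread.
original = "S\u00FCdstrasse" # "Südstrasse"; in UTF-8 the "ü" is bytes 0xC3 0xBC

# Mistake 1: treat the UTF-8 bytes as ISO-8859-1 and convert to UTF-8.
# Each byte of "ü" becomes its own character: 0xC3 -> "Ã", 0xBC -> "¼".
once = original.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

# Mistake 2: do it again (similar to running an ISO-8859-1 -> UTF-8
# conversion on text that is already UTF-8). Two characters become four.
twice = once.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

puts once                                                  # SÃ¼dstrasse
puts [original.length, once.length, twice.length].inspect  # [10, 11, 13]

# Undoing one level: convert back to ISO-8859-1, then relabel the bytes as UTF-8.
repaired = once.encode(Encoding::ISO_8859_1).force_encoding(Encoding::UTF_8)
puts repaired == original                                  # true
```

Reversing a bad conversion like this only helps when you know exactly how many times it happened; it is usually simpler to make sure it never happens at all.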
on 2007-06-27 17:10
Dear Marko,

I am pretty sure that nothing is wrong here - it's merely a question of the encoding settings you give your editor. I tried the following: open an editor, type the word "Südstrasse", and save it in UTF-8 encoding as "a.txt", so the "ü" is displayed correctly when the editor is set to UTF-8. Then run the following script, which is saved in ISO-8859-1 encoding (that matters for the "ü" in the "Südstrasse" literal):

    require "iconv"
    text = IO.readlines("a.txt").to_s
    p text
    # => "S\303\274dstrasse"
    result = Iconv.conv("ISO-8859-1", "UTF-8", text)
    p result
    # => "S\374dstrasse"
    compare_to = "Südstrasse"
    p compare_to == result
    # => true, because of the conversion made by Iconv
    p compare_to == text
    # => false, because no conversion was made

Best regards,

Axel
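On Ruby 1.9 or later, where every string carries its own encoding and Iconv is no longer in the standard library, the same experiment can be sketched with `String#encode`. This is a modern analogue, not the script above: the file read is replaced by an in-memory string, and the ISO-8859-1 literal is built from its raw bytes so the example does not depend on how this file is saved.

```ruby
# In-memory stand-in for the UTF-8 file "a.txt" (assumption: no file I/O here).
text = "S\u00FCdstrasse"
p text.bytes.first(3)    # [83, 195, 188]: "ü" is two bytes in UTF-8

# Same role as Iconv.conv("ISO-8859-1", "UTF-8", text).
result = text.encode(Encoding::ISO_8859_1)
p result.bytes.first(2)  # [83, 252]: "ü" is the single byte 0xFC in ISO-8859-1

# What the literal "Südstrasse" holds in a script saved as ISO-8859-1.
compare_to = String.new("S\xFCdstrasse", encoding: Encoding::ISO_8859_1)

p compare_to == result   # true: same bytes, same encoding
p compare_to == text     # false: same word, different bytes and encodings
```

The point carries over unchanged: two strings only compare equal when both sides agree on the encoding of the bytes.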
on 2007-06-27 17:28
On Jun 27, 2007, at 9:37, Marko Marjanovic wrote:

> Unfortunately it doesn't, because rails has some kind of configuration
> [...]
> column of the second row) is in some other encoding and instead of
> [...]
> into another two characters - instead of u with umlauts, I now have 4
> characters.
> I believe that the problem lies in 'postgres' gem and that it should be
> configured somehow, but I don't have the slightest idea how...

I think your issue is that your Ruby script doesn't know what encoding is coming back from the database. Try setting $KCODE = 'u' in your script.

    $ cat db_encoding_test.rb
    #!/usr/local/bin/ruby -w
    $KCODE = 'u'
    require 'rubygems'
    require 'postgres'
    require 'dbi'

    DBI.connect('dbi:Pg:test:localhost:54824', 'postgres') do |dbh|
      sth = dbh.prepare(<<-EOS)
        SELECT * FROM streets;
      EOS
      sth.execute
      sth.fetch do |row|
        p row
        p row.first == 'Südstrasse'
      end
    end
    $ ./db_encoding_test.rb
    ["Südstrasse"]
    true

Michael Glaesemann
grzm seespotcode net
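`$KCODE` only has an effect in Ruby 1.8; Ruby 1.9 and later ignore it (with a warning) because each string carries its own encoding. A rough modern counterpart of "telling Ruby what the database bytes are" is relabeling them with `force_encoding`. The sketch below assumes a hypothetical driver that hands values back as binary (ASCII-8BIT) strings; it does not use the postgres or DBI APIs.

```ruby
# Hypothetical driver output: the UTF-8 bytes of "Südstrasse", but labeled
# as binary (ASCII-8BIT), so Ruby does not know they are UTF-8.
raw_value = "S\xC3\xBCdstrasse".b

expected = "S\u00FCdstrasse" # "Südstrasse" as a UTF-8 string

p raw_value == expected      # false: same bytes, but the encodings are not comparable

# Tell Ruby what the bytes actually are; no bytes are changed.
value = raw_value.dup.force_encoding(Encoding::UTF_8)
p value.valid_encoding?      # true
p value == expected          # true
```

With a real driver, the cleaner fix is usually to configure the connection's client encoding so values arrive correctly labeled; how to do that depends on the driver, so check its documentation rather than relying on this sketch.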
on 2007-06-27 19:48
Thanks a lot, guys! It seems that the SciTE editor I used to write the script had messed up the encoding: when I checked, I found that it was set to UTF-8-Y or something like that, instead of UTF-8. I edited the script in jEdit, fixed its encoding, added $KCODE = 'u' just for extra safety :), and everything worked. Thanks again! Cheers
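The thread doesn't say exactly what the "UTF-8-Y" setting was. One way to check what an editor actually saved is to look at the script's raw bytes. The sketch below (Ruby 1.9+, using throwaway files from Tempfile) reports whether a file starts with a UTF-8 byte order mark, which some editors write in one of their UTF-8 modes, and whether its bytes are valid UTF-8. The helpers `utf8_report` and `report_for` are hypothetical names for illustration.

```ruby
require "tempfile"

UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: describe a file's raw bytes as UTF-8.
def utf8_report(path)
  bytes = File.binread(path) # raw bytes, no transcoding
  {
    bom: bytes.start_with?(UTF8_BOM),
    valid_utf8: bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
  }
end

# Hypothetical helper: write raw bytes to a throwaway file and report on them.
def report_for(raw_bytes)
  Tempfile.create(["script", ".rb"]) do |f|
    f.binmode
    f.write(raw_bytes)
    f.flush
    utf8_report(f.path)
  end
end

utf8_with_bom = UTF8_BOM + "p 'S\u00FCdstrasse'\n".b # UTF-8 with a BOM
latin1        = "p 'S\xFCdstrasse'\n".b              # the same line saved as ISO-8859-1

p report_for(utf8_with_bom) # BOM present, valid UTF-8
p report_for(latin1)        # no BOM, not valid UTF-8
```

Running `utf8_report` on the real script file (instead of the throwaway ones) shows which of these cases an editor produced.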