Ruby Forum Rails I18n > Legacy ISO-8859-1 MSSQL db and UTF8 rails app

Posted by Tom Bak (linus)
on 28.08.2007 10:55
Hi!
I have problem with legacy database in MSSQL encoding.

The new app in Rails and most of the database is Utf8, while
tables from legacy MSSQL are encoded in ISO-8859-1.

I am not sure how to approach this problem. I would imagine
creation of some kind of filter with Iconv.conv("UTF-8", "ISO-8859-1",
input)
for reading sql statements and the reverse for write methods.

Any suggestions how this could be done?

Cheers,
Tomasz Bak
Posted by Tom Bak (linus)
on 28.08.2007 12:15
Hi,
So far I have come come with poor solution:

class String
  require 'iconv'

  def iso1_to_utf8
    Iconv.conv("UTF-8", "ISO-8859-1", self) rescue self
  end

  def utf8_to_iso1
    Iconv.conv("ISO-8859-1", "UTF-8", self) rescue self
  end
end

class NCustomer < ActiveRecord::Base
...
  def after_initialize
    self.attributes.each do |k,v|
      if v.class.to_s == "String" and !v.empty?
        self[k] = v.iso1_to_utf8
      end
    end
  end

  def before_save
    self.attributes.each do |k,v|
      if v.class.to_s == "String" and !v.empty?
        self[k] = v.utf8_to_iso1
      end
    end
  end
...
end

Cheers,
Tomasz Bak
Posted by Roderick van Domburg (roderickvd)
on 28.08.2007 21:31
Tom Bak wrote:
> I have problem with legacy database in MSSQL encoding.
> 
> The new app in Rails and most of the database is Utf8, while
> tables from legacy MSSQL are encoded in ISO-8859-1.
> 
> I am not sure how to approach this problem. I would imagine
> creation of some kind of filter with Iconv.conv("UTF-8", "ISO-8859-1",
> input)
> for reading sql statements and the reverse for write methods.
> 
> Any suggestions how this could be done?

I think that everyone who's ever been cornered by this can attest that 
it can be a royal pain and that the options are all choosing between 
lesser evils.

Last time I found myself there I exported the database to a plain text 
SQL file and then ran recode [1] on it. It worked well enough but I 
still found myself scanning over the file to correct tiny bits here and 
there.

I'll admit that your solution has a ring to it -- being able to do it on 
the fly improves the time to deploy. I like it. If however speed or 
strict migration paths are an issue for you then recode may be an option 
to consider.

[1] http://www.gnu.org/software/recode/

--
Roderick van Domburg
http://www.nedforce.nl
Posted by neongrau __ (neongrau)
on 03.09.2007 11:02
i'm having a quite similar issue atm and i'm unsure what to do.

same thing, Legacy MSSQL Database with windows-1252 charset (so iso 
compatible) on the legacy tables, and the Rails app purely using UTF-8.

i had no problems with encodings while using the sqlserver driver in ado 
mode
and setting:
WIN32OLE.codepage = WIN32OLE::CP_UTF8

but unfortunately the ado driver is leaking memory so badly that the 
mongrels need to be restarted every 2 hours.

when i switch to odbc mode, the memory leak is gone but with ODBC the 
codepage setting doesn't work anymore, so i get messed up texts from 
those old tables.


so i have to choose between the 2 evils:

1. memory leaking app with proper encoding
or
2. stable app with garbled text
Posted by Tom Bak (linus)
on 03.09.2007 11:33
Attachment: recode_legacy_database.zip (2,9 KB)
I solved my problem with own plugin:

recode_legacy_database :from => "ISO-8859-1", :to => "UTF-8"

Which translates fields in models on read and before write to database.

Cheers,
Tomasz
Posted by neongrau __ (neongrau)
on 03.09.2007 12:09
Tom Bak wrote:
> I solved my problem with own plugin:
> 
> recode_legacy_database :from => "ISO-8859-1", :to => "UTF-8"
> 
> Which translates fields in models on read and before write to database.
> 
> Cheers,
> Tomasz

so this is for a one time conversion of the whole db?

i don't have the luxury to do that because the legacy tables still get 
accessed by the legacy app :(
Posted by Tom Bak (linus)
on 03.09.2007 12:15
neongrau __ wrote:
> Tom Bak wrote:
>> I solved my problem with own plugin:
>> 
>> recode_legacy_database :from => "ISO-8859-1", :to => "UTF-8"
>> 
>> Which translates fields in models on read and before write to database.
>> 
>> Cheers,
>> Tomasz
> 
> so this is for a one time conversion of the whole db?

Exacly the oposit. I probably should change naming.
I recodes db "on fly". Check the code.

Cheers,
Tomasz Bak
Posted by Julio Damasceno (xjulio)
on 05.11.2007 16:39
Tom Bak wrote:
> I solved my problem with own plugin:
> 
> recode_legacy_database :from => "ISO-8859-1", :to => "UTF-8"
> 
> Which translates fields in models on read and before write to database.
> 
> Cheers,
> Tomasz

Tom,

How is the usage of this plugin? I have to run this command manual? When 
I have to put this code?


Thanks,

Julio
Posted by Herman Jansen (herman)
on 18.12.2007 14:46
Julio Damasceno wrote:
> Tom Bak wrote:
>> I solved my problem with own plugin:
>> 
>> recode_legacy_database :from => "ISO-8859-1", :to => "UTF-8"
>> 
>> Which translates fields in models on read and before write to database.
>> 
>> Cheers,
>> Tomasz
> 
> Tom,
> 
> How is the usage of this plugin? I have to run this command manual? When 
> I have to put this code?
> 
> 
> Thanks,
> 
> Julio

I did it this way :
- Unzip the zip-file into you rails vendor/plugins directory
- Add the following line to environments.rb
ActiveRecord::Base.recode_legacy_database 
:from=>"ISO-8859-1",:to=>"UTF-8"