Forum: Rails I18n Legacy ISO-8859-1 MSSQL db and UTF8 rails app

Posted by Tomasz Bak (linus)
on 2007-08-28 10:55
Hi!
I have problem with legacy database in MSSQL encoding.

The new app in Rails and most of the database is Utf8, while
tables from legacy MSSQL are encoded in ISO-8859-1.

I am not sure how to approach this problem. I would imagine
creation of some kind of filter with Iconv.conv("UTF-8", "ISO-8859-1",
input)
for reading sql statements and the reverse for write methods.

Any suggestions how this could be done?

Cheers,
Tomasz Bak
Posted by Tomasz Bak (linus)
on 2007-08-28 12:15
Hi,
So far I have come come with poor solution:

class String
  require 'iconv'

  def iso1_to_utf8
    Iconv.conv("UTF-8", "ISO-8859-1", self) rescue self
  end

  def utf8_to_iso1
    Iconv.conv("ISO-8859-1", "UTF-8", self) rescue self
  end
end

class NCustomer < ActiveRecord::Base
...
  def after_initialize
    self.attributes.each do |k,v|
      if v.class.to_s == "String" and !v.empty?
        self[k] = v.iso1_to_utf8
      end
    end
  end

  def before_save
    self.attributes.each do |k,v|
      if v.class.to_s == "String" and !v.empty?
        self[k] = v.utf8_to_iso1
      end
    end
  end
...
end

Cheers,
Tomasz Bak
Posted by Roderick van Domburg (roderickvd)
on 2007-08-28 21:31
Tom Bak wrote:
> I have problem with legacy database in MSSQL encoding.
> 
> The new app in Rails and most of the database is Utf8, while
> tables from legacy MSSQL are encoded in ISO-8859-1.
> 
> I am not sure how to approach this problem. I would imagine
> creation of some kind of filter with Iconv.conv("UTF-8", "ISO-8859-1",
> input)
> for reading sql statements and the reverse for write methods.
> 
> Any suggestions how this could be done?

I think that everyone who's ever been cornered by this can attest that 
it can be a royal pain and that the options are all choosing between 
lesser evils.

Last time I found myself there I exported the database to a plain text 
SQL file and then ran recode [1] on it. It worked well enough but I 
still found myself scanning over the file to correct tiny bits here and 
there.

I'll admit that your solution has a ring to it -- being able to do it on 
the fly improves the time to deploy. I like it. If however speed or 
strict migration paths are an issue for you then recode may be an option 
to consider.

[1] http://www.gnu.org/software/recode/

--
Roderick van Domburg
http://www.nedforce.nl
Posted by neongrau __ (neongrau)
on 2007-09-03 11:02
i'm having a quite similar issue atm and i'm unsure what to do.

same thing, Legacy MSSQL Database with windows-1252 charset (so iso 
compatible) on the legacy tables, and the Rails app purely using UTF-8.

i had no problems with encodings while using the sqlserver driver in ado 
mode
and setting:
WIN32OLE.codepage = WIN32OLE::CP_UTF8

but unfortunately the ado driver is leaking memory so badly that the 
mongrels need to be restarted every 2 hours.

when i switch to odbc mode, the memory leak is gone but with ODBC the 
codepage setting doesn't work anymore, so i get messed up texts from 
those old tables.


so i have to choose between the 2 evils:

1. memory leaking app with proper encoding
or
2. stable app with garbled text
Posted by Tomasz Bak (linus)
on 2007-09-03 11:33
Attachment: recode_legacy_database.zip (2,94 KB)
I solved my problem with own plugin:

recode_legacy_database :from => "ISO-8859-1", :to => "UTF-8"

Which translates fields in models on read and before write to database.

Cheers,
Tomasz
Posted by neongrau __ (neongrau)
on 2007-09-03 12:09
Tom Bak wrote:
> I solved my problem with own plugin:
> 
> recode_legacy_database :from => "ISO-8859-1", :to => "UTF-8"
> 
> Which translates fields in models on read and before write to database.
> 
> Cheers,
> Tomasz

so this is for a one time conversion of the whole db?

i don't have the luxury to do that because the legacy tables still get 
accessed by the legacy app :(
Posted by Tomasz Bak (linus)
on 2007-09-03 12:15
neongrau __ wrote:
> Tom Bak wrote:
>> I solved my problem with own plugin:
>> 
>> recode_legacy_database :from => "ISO-8859-1", :to => "UTF-8"
>> 
>> Which translates fields in models on read and before write to database.
>> 
>> Cheers,
>> Tomasz
> 
> so this is for a one time conversion of the whole db?

Exacly the oposit. I probably should change naming.
I recodes db "on fly". Check the code.

Cheers,
Tomasz Bak
Posted by Julio Damasceno (xjulio)
on 2007-11-05 16:39
Tom Bak wrote:
> I solved my problem with own plugin:
> 
> recode_legacy_database :from => "ISO-8859-1", :to => "UTF-8"
> 
> Which translates fields in models on read and before write to database.
> 
> Cheers,
> Tomasz

Tom,

How is the usage of this plugin? I have to run this command manual? When 
I have to put this code?


Thanks,

Julio
Posted by Herman Jansen (herman)
on 2007-12-18 14:46
Julio Damasceno wrote:
> Tom Bak wrote:
>> I solved my problem with own plugin:
>> 
>> recode_legacy_database :from => "ISO-8859-1", :to => "UTF-8"
>> 
>> Which translates fields in models on read and before write to database.
>> 
>> Cheers,
>> Tomasz
> 
> Tom,
> 
> How is the usage of this plugin? I have to run this command manual? When 
> I have to put this code?
> 
> 
> Thanks,
> 
> Julio

I did it this way :
- Unzip the zip-file into you rails vendor/plugins directory
- Add the following line to environments.rb
ActiveRecord::Base.recode_legacy_database 
:from=>"ISO-8859-1",:to=>"UTF-8"
Posted by Daniel ... (wurst)
on 2008-11-17 14:42
Attachment: recode_legacy_database.rb (1,29 KB)
I had problems with blob fields beeing treated as Strings as well. I 
made a little change, so it is posible to exclude some fields from the 
conversion.

use in model like this:
recode_legacy_database :from => "ISO-8859-1", :to => "UTF-8", :exclude 
=> ["myblob1","myblob2"]
Posted by Tommaso Patrizi (tommasop)
on 2010-05-06 12:44
Tomasz Bak wrote:
> I solved my problem with own plugin:
> 
> recode_legacy_database :from => "ISO-8859-1", :to => "UTF-8"
> 
> Which translates fields in models on read and before write to database.
> 
> Cheers,
> Tomasz

Dear Tomasz I'll never be able to thank you enough for this recode 
plugin!

You are my saviour

Tommaso
Please log in before posting. Registration is free and takes only a minute.
Existing account (Switch to SSL-encrypted connection)
NEW: Do you have a Google/GoogleMail or Yahoo account? No registration required!
Log in with Google account | Log in with Yahoo account
No account? Register here.