Legacy ISO-8859-1 MSSQL db and UTF8 rails app

Hi!
I have problem with legacy database in MSSQL encoding.

The new app in Rails and most of the database is Utf8, while
tables from legacy MSSQL are encoded in ISO-8859-1.

I am not sure how to approach this problem. I would imagine
creation of some kind of filter with Iconv.conv(“UTF-8”, “ISO-8859-1”,
input)
for reading sql statements and the reverse for write methods.

Any suggestions how this could be done?

Cheers,
Tomasz Bak

Hi,
So far I have come come with poor solution:

class String
require ‘iconv’

def iso1_to_utf8
Iconv.conv(“UTF-8”, “ISO-8859-1”, self) rescue self
end

def utf8_to_iso1
Iconv.conv(“ISO-8859-1”, “UTF-8”, self) rescue self
end
end

class NCustomer < ActiveRecord::Base

def after_initialize
self.attributes.each do |k,v|
if v.class.to_s == “String” and !v.empty?
self[k] = v.iso1_to_utf8
end
end
end

def before_save
self.attributes.each do |k,v|
if v.class.to_s == “String” and !v.empty?
self[k] = v.utf8_to_iso1
end
end
end

end

Cheers,
Tomasz Bak

Tom Bak wrote:

I have problem with legacy database in MSSQL encoding.

The new app in Rails and most of the database is Utf8, while
tables from legacy MSSQL are encoded in ISO-8859-1.

I am not sure how to approach this problem. I would imagine
creation of some kind of filter with Iconv.conv(“UTF-8”, “ISO-8859-1”,
input)
for reading sql statements and the reverse for write methods.

Any suggestions how this could be done?

I think that everyone who’s ever been cornered by this can attest that
it can be a royal pain and that the options are all choosing between
lesser evils.

Last time I found myself there I exported the database to a plain text
SQL file and then ran recode [1] on it. It worked well enough but I
still found myself scanning over the file to correct tiny bits here and
there.

I’ll admit that your solution has a ring to it – being able to do it on
the fly improves the time to deploy. I like it. If however speed or
strict migration paths are an issue for you then recode may be an option
to consider.

[1] Recode


Roderick van Domburg

I solved my problem with own plugin:

recode_legacy_database :from => “ISO-8859-1”, :to => “UTF-8”

Which translates fields in models on read and before write to database.

Cheers,
Tomasz

i’m having a quite similar issue atm and i’m unsure what to do.

same thing, Legacy MSSQL Database with windows-1252 charset (so iso
compatible) on the legacy tables, and the Rails app purely using UTF-8.

i had no problems with encodings while using the sqlserver driver in ado
mode
and setting:
WIN32OLE.codepage = WIN32OLE::CP_UTF8

but unfortunately the ado driver is leaking memory so badly that the
mongrels need to be restarted every 2 hours.

when i switch to odbc mode, the memory leak is gone but with ODBC the
codepage setting doesn’t work anymore, so i get messed up texts from
those old tables.

so i have to choose between the 2 evils:

  1. memory leaking app with proper encoding
    or
  2. stable app with garbled text

Tom Bak wrote:

I solved my problem with own plugin:

recode_legacy_database :from => “ISO-8859-1”, :to => “UTF-8”

Which translates fields in models on read and before write to database.

Cheers,
Tomasz

so this is for a one time conversion of the whole db?

i don’t have the luxury to do that because the legacy tables still get
accessed by the legacy app :frowning:

neongrau __ wrote:

Tom Bak wrote:

I solved my problem with own plugin:

recode_legacy_database :from => “ISO-8859-1”, :to => “UTF-8”

Which translates fields in models on read and before write to database.

Cheers,
Tomasz

so this is for a one time conversion of the whole db?

Exacly the oposit. I probably should change naming.
I recodes db “on fly”. Check the code.

Cheers,
Tomasz Bak

Tom Bak wrote:

I solved my problem with own plugin:

recode_legacy_database :from => “ISO-8859-1”, :to => “UTF-8”

Which translates fields in models on read and before write to database.

Cheers,
Tomasz

Tom,

How is the usage of this plugin? I have to run this command manual? When
I have to put this code?

Thanks,

Julio

Julio Damasceno wrote:

Tom Bak wrote:

I solved my problem with own plugin:

recode_legacy_database :from => “ISO-8859-1”, :to => “UTF-8”

Which translates fields in models on read and before write to database.

Cheers,
Tomasz

Tom,

How is the usage of this plugin? I have to run this command manual? When
I have to put this code?

Thanks,

Julio

I did it this way :

  • Unzip the zip-file into you rails vendor/plugins directory
  • Add the following line to environments.rb
    ActiveRecord::Base.recode_legacy_database
    :from=>“ISO-8859-1”,:to=>“UTF-8”

I had problems with blob fields beeing treated as Strings as well. I
made a little change, so it is posible to exclude some fields from the
conversion.

use in model like this:
recode_legacy_database :from => “ISO-8859-1”, :to => “UTF-8”, :exclude
=> [“myblob1”,“myblob2”]

Tomasz Bak wrote:

I solved my problem with own plugin:

recode_legacy_database :from => “ISO-8859-1”, :to => “UTF-8”

Which translates fields in models on read and before write to database.

Cheers,
Tomasz

Dear Tomasz I’ll never be able to thank you enough for this recode
plugin!

You are my saviour

Tommaso