Hello,

I have a lot of XML and Java files which contain German umlauts and other non-ASCII characters. I want to read the files and convert them to UTF-8 using a Ruby script. I convert the strings with this code:

    def to_utf8(str)
      str.unpack('U*').map do |c|
        if c < 0x80
          c.chr
        else
          '( u%04X )' % c
        end
      end.join
    end

(taken from "The Ruby Way" by Hal Fulton).

Sometimes it works, sometimes I get this error: "malformed UTF-8 character".

I thought this might happen because the file is encoded in ISO-8859-1 (it was written with Eclipse set to ISO-8859-1 for text encoding).

How can I read a file with Ruby and specify that it is read with ISO-8859-1 encoding (similar to Java's BufferedReader, where I can specify the encoding)?

Any help welcome.

Best wishes,
Claus
on 2007-05-14 16:39
on 2007-05-14 16:49
Claus Hausberger wrote:
> str.unpack('U*').map do |c|

I'd be surprised if this was right - with that unpack format you're telling it that you expect the string to be UTF-8 already.

<snip>

> how can I read a file with Ruby and specify that it is read with
> ISO-8859-1 encoding (similar to Java's BufferedReader where I can
> specify the encoding).

Investigate Iconv in the standard library. It does what you need.
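To illustrate the point about 'U*', here is a minimal sketch of reading an ISO-8859-1 file and transcoding it to UTF-8 before unpacking. It assumes Ruby 1.9 or later, where the conversion is built into File.read and Iconv is no longer needed; on Ruby 1.8 the Iconv route suggested above would apply instead. The helper name read_latin1_as_utf8 and the sample bytes are made up for the example.

```ruby
require 'tempfile'

# Hypothetical helper (not from the thread): read a file stored as
# ISO-8859-1 and return its contents transcoded to UTF-8.
# Assumes Ruby 1.9+; the "external:internal" encoding pair tells Ruby
# how the bytes are stored and what encoding the returned string gets.
def read_latin1_as_utf8(path)
  File.read(path, encoding: 'ISO-8859-1:UTF-8')
end

# Sample data: "Gr", u-umlaut (0xFC), sharp s (0xDF), "e" as raw
# ISO-8859-1 bytes.
raw = "Gr\xFC\xDFe".b

# Unpacking the raw bytes with 'U*' fails, because 'U*' expects UTF-8.
begin
  raw.unpack('U*')
rescue ArgumentError => e
  puts "raw bytes: #{e.message}"
end

# After transcoding on read, 'U*' yields the Unicode code points.
Tempfile.create('latin1') do |f|
  f.binmode
  f.write(raw)
  f.flush
  text = read_latin1_as_utf8(f.path)
  puts text.encoding          # UTF-8
  p text.unpack('U*')         # [71, 114, 252, 223, 101]
end
```

Once the string really is UTF-8, the to_utf8 method from the original post should no longer hit the "malformed UTF-8 character" error on this kind of input.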
on 2007-05-14 16:50
On 14 May 2007, at 16:39, Claus Hausberger wrote:
> def to_utf8(str)
>
> any help welcome. best wishes
>
> Claus

Hello Claus,

you could use jcode...

    $KCODE = 'UTF8'
    require 'jcode'

Cheers,
Enrique Comba Riepenhausen