strube
November 29, 2007, 1:50pm
1
Can someone tell me what I’m getting wrong here with “iconv”?
I either get “IllegalSequence”, or “äöüß” are not encoded properly when
using Iconv.conv, while the output looks good using backticks. (Right now I get
“IllegalSequence” with the second source; with the first, the umlauts always come out wrong.)
require 'rss/1.0'
require 'rss/2.0'
require 'open-uri'
require 'iconv'
#source = "http://www.sueddeutsche.de/app/service/rss/alles/rss.xml"
source = "Panorama - WELT"
content = ""
open(source) { |s| content = s.read }
rss = RSS::Parser.parse(content, false)
rss.items.each do |item|
  converted = `'#{item.title}' | iconv -c -f ISO-8859-1 -t UTF8`
  puts(Iconv.conv('ISO-8859-1', 'UTF-8', item.title))
  puts " "
end
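For reference, a minimal sketch of why the two calls above can disagree. It assumes a current Ruby, where the Iconv library is no longer bundled, so String#encode stands in for it. Iconv.conv takes the target encoding first and the source encoding second, so Iconv.conv('ISO-8859-1', 'UTF-8', ...) converts from UTF-8 to ISO-8859-1, which is the opposite direction of iconv -f ISO-8859-1 -t UTF8. The sample title is an assumption, not data taken from the feed.

```ruby
title = "D\u00FCsseldorf"   # sample title (assumption), stored as UTF-8

# Same direction as Iconv.conv('ISO-8859-1', 'UTF-8', title):
# target encoding first, source encoding second.
latin1 = title.encode("ISO-8859-1", "UTF-8")
p latin1.bytes[1]           # the single Latin-1 byte that represents "ü"

# Same direction as the shell command `iconv -f ISO-8859-1 -t UTF8`.
round_trip = latin1.encode("UTF-8", "ISO-8859-1")
p round_trip == title

# Handing Latin-1 bytes to a conversion that expects UTF-8 input fails.
# This is the String#encode counterpart of Iconv::IllegalSequence.
error = begin
  latin1.dup.force_encoding("UTF-8").encode("ISO-8859-1", "UTF-8")
  nil
rescue Encoding::InvalidByteSequenceError => e
  e
end
p error.class
```

If a feed really is ISO-8859-1, declaring UTF-8 as the source would raise the error; if it is already UTF-8, converting it from ISO-8859-1 would garble the umlauts.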
strube
November 30, 2007, 1:05am
2
On Nov 29, 6:50 am, Marcus S. wrote:
content = ""
open(source) { |s| content = s.read }
rss = RSS::Parser.parse(content, false)
rss.items.each do |item|
  converted = `'#{item.title}' | iconv -c -f ISO-8859-1 -t UTF8`
  puts(Iconv.conv('ISO-8859-1', 'UTF-8', item.title)); puts " "
end
Not sure about the error, but I see two issues. First, this is an
error…
'#{item.title}' | iconv -c -f ISO-8859-1 -t UTF8
I think you meant to echo the value to the pipe…
echo -n '#{item.title}' | iconv -c -f ISO-8859-1 -t UTF8
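As a rough sketch of that fix from the Ruby side: the command string has to start with echo, and a title containing an apostrophe would break the hand-written single quotes, so this version escapes the title with Shellwords.escape from the standard library. The helper name iconv_command and the sample title are made up for illustration.

```ruby
require 'shellwords'

# Hypothetical helper: builds the shell command that echoes a title into iconv.
# Shellwords.escape quotes the title for the shell, so no manual '...' is needed.
def iconv_command(title, from: "ISO-8859-1", to: "UTF-8")
  "echo -n #{Shellwords.escape(title)} | iconv -c -f #{from} -t #{to}"
end

cmd = iconv_command("Rock'n'Roll")   # sample title (assumption)
puts cmd

# To actually run it (needs the iconv command-line tool on the PATH):
# converted = `#{cmd}`
```

The backtick call is left commented out because it depends on the iconv tool being installed.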
Second, ISO-8859-1 to UTF-8 doesn’t appear to be the proper conversion.
The following string…
Düsseldorf: Prominentengedrängel bei der Bambi-Verleihung
…is encoded as…
"D\303\203\302\274sseldorf: Prominentengedr\303\203\302\244ngel bei der Bambi-Verleihung"
…by iconv from the command prompt. But it should be…
"D\303\274sseldorf: Prominentengedr\303\244ngel bei der Bambi-Verleihung"
I’m not good with encodings and UTF-8, so I can’t tell you what the
problem is. I just know “umlaut u” should be 0xC3 0xBC (\303\274), but it’s
not doing that.
Regards,
Jordan
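The byte pattern quoted above matches what happens when text that is already UTF-8 is converted from ISO-8859-1 to UTF-8 a second time: each of the two bytes of "ü" (\303 and \274) is treated as a separate Latin-1 character and encoded again. The following is only a sketch of that effect using String#encode in a current Ruby, not the code from the thread. The helper octal_dump is hypothetical and exists just to print bytes the way they are shown above.

```ruby
# "ü" in UTF-8 is the two bytes 0xC3 0xBC (\303\274 in octal).
utf8 = "D\u00FCsseldorf"

# Mislabel the UTF-8 bytes as ISO-8859-1, then convert to UTF-8 again.
# Every original byte becomes its own character and is encoded a second time.
double = utf8.dup.force_encoding("ISO-8859-1").encode("UTF-8")

# Hypothetical helper: print non-ASCII bytes as octal escapes.
def octal_dump(str)
  str.bytes.map { |b| b < 0x80 ? b.chr : format("\\%03o", b) }.join
end

puts octal_dump(utf8)     # D\303\274sseldorf
puts octal_dump(double)   # D\303\203\302\274sseldorf

# Reversing the extra step restores the original string.
repaired = double.encode("ISO-8859-1").force_encoding("UTF-8")
puts repaired == utf8
```

If this is what is happening, the feed content is probably UTF-8 already and needs no ISO-8859-1 to UTF-8 conversion at all.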