Character encoding question

I have an HTML file which is encoded in UTF-8. The file contains the
following text:

It&#39;s a wonderful life

Now, character code 39 is the apostrophe in UTF-8. So suppose I got
the 39 out of the text using:

s="It&#39;s a wonderful life"

s.gsub(/&#(\d+);/, '\1')

The output is

It39s a wonderful life

So firstly, I am having trouble turning it into

It\39s a wonderful life

Secondly, I manually did this in test_utf8.rb:

puts "It\39s a wonderful life"

and ran it

ruby test_utf8.rb > utf8.txt

but when I open it in OpenOffice with the encoding set to UTF-8,
the output is

It#9s a wonderful life

So how do I correctly parse and convert HTML character references
to UTF-8 encoded characters, and then save the file?

Thanks.
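
A note on the "It#9s" result: in a double-quoted Ruby string, a backslash followed by digits is read as an octal escape, so "\39" is most likely being parsed as the control byte 3 followed by a literal "9". The sketch below (variable names are only for illustration) shows the bytes that end up in the string, and one way to build the apostrophe from its decimal code instead:

```ruby
# "\3" is parsed as an octal escape (byte value 3); "9" is not an octal
# digit, so it stays in the string as an ordinary character.
literal = "It\39s"
p literal.bytes

# Building the character from its decimal code point instead:
apostrophe = 39.chr(Encoding::UTF_8)
fixed = "It#{apostrophe}s a wonderful life"
puts fixed
```

That control byte is probably what OpenOffice is displaying as "#".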

s="It&#39;s a wonderful life"

I stumbled across this:

require 'cgi'
s=CGI.unescapeHTML("It&#39;s a wonderful life")
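
For the "then save file" part of the question, here is a minimal sketch, assuming the text is already valid UTF-8. The helper name save_decoded is made up for this example, and utf8.txt is the file name from the original post:

```ruby
require 'cgi'

# Hypothetical helper: decode the character references, then write the
# result to `path`, opening the file explicitly as UTF-8.
def save_decoded(html, path)
  text = CGI.unescapeHTML(html.encode('UTF-8'))
  File.open(path, 'w:UTF-8') { |f| f.puts text }
  text
end

save_decoded('It&#39;s a wonderful life', 'utf8.txt')
```

Opening the file with 'w:UTF-8' states the external encoding explicitly rather than relying on the locale default.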


David

try something like this:

require 'cgi'
s="UPPERCASE Russian Alphabet\n".encode('utf-8')
s+=CGI.unescapeHTML("&#1040;&#1041;&#1042;&#1043;".encode('utf-8'))
s+=CGI.unescapeHTML("&#1044;&#1045;&#1046;&#1047;".encode('utf-8'))
s+=CGI.unescapeHTML("&#1048;&#1049;&#1050;&#1051;".encode('utf-8'))
s+=CGI.unescapeHTML("&#1052;&#1053;&#1054;&#1055;".encode('utf-8'))
s+=CGI.unescapeHTML("&#1056;&#1057;&#1058;&#1059;".encode('utf-8'))
s+=CGI.unescapeHTML("&#1060;&#1061;&#1062;&#1063;".encode('utf-8'))
s+=CGI.unescapeHTML("&#1064;&#1065;&#1066;&#1067;".encode('utf-8'))
s+=CGI.unescapeHTML("&#1068;&#1069;&#1070;&#1071;".encode('utf-8'))
puts s
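
If you would rather stay with gsub as in the original attempt, the block form lets you turn each captured number into a character instead of pasting the digits back in. This is only a sketch: decode_numeric_refs is a made-up name, and it handles numeric references only (decimal and hex), not named ones such as &amp;amp;.

```ruby
# Hypothetical helper: replace numeric character references such as
# &#1040; or &#x410; with the UTF-8 character for that code point.
def decode_numeric_refs(text)
  text.gsub(/&#(?:x([0-9a-fA-F]+)|(\d+));/) do
    code = $1 ? $1.hex : $2.to_i   # hex form or decimal form
    [code].pack('U')               # code point -> UTF-8 string
  end
end

puts decode_numeric_refs('It&#39;s a wonderful life')
puts decode_numeric_refs('&#1040;&#1041;&#x412;')
```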

David
