Forum: Ruby Saving the web, charset problems and symbols problems

Sak .. (saknarede)
on 2009-01-30 06:08
Hi all!

I think a lot of Ruby scripts are written for web crawling, web scraping,
and many other web applications. I'm working with the web too: I am
trying to save the text of many different websites. At the moment I'm
trying to solve two problems:

1 - How to standardize the charset of web pages. There are a lot of
different charsets, and I think there must be a better solution than
checking the charset of every page and converting it to the proper
charset each time. (By the way, what is the best way to detect the
charset of a file? The file command is not very good, I think.)
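
One possible way to approach problem 1 is sketched below, using only the standard library of Ruby 2.1 or later. It is an illustration, not a definitive answer: the helper name to_utf8 and the candidate list are assumptions made for this example, and guessing a charset from the bytes alone is a heuristic. A charset declared in the HTTP Content-Type header or in a meta tag is usually more reliable, so the sketch tries that first when it is given.

```ruby
# Candidate encodings to try, in order. This list is an assumption for
# the example; adjust it to the sites being crawled.
CANDIDATES = %w[UTF-8 Windows-1252 ISO-8859-1].freeze

# Hypothetical helper: returns a UTF-8 copy of the raw bytes in +raw+.
# +declared+ is an optional charset name (for example, taken from the
# HTTP Content-Type header) that is tried before the candidates.
def to_utf8(raw, declared = nil)
  ([declared] + CANDIDATES).compact.each do |name|
    begin
      candidate = raw.dup.force_encoding(name)
      next unless candidate.valid_encoding?
      return candidate.encode("UTF-8")
    rescue ArgumentError, EncodingError
      next # unknown charset name or unconvertible bytes: try the next one
    end
  end
  # Last resort: treat the bytes as UTF-8 and replace the invalid ones.
  raw.dup.force_encoding("UTF-8").scrub("?")
end
```

With this, every page ends up in a single encoding (UTF-8) regardless of how it was served. Note that single-byte encodings such as ISO-8859-1 accept any byte sequence, so they only make sense at the end of the list.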

2 - How to convert HTML to plain text. I use Hpricot, but a lot of very
strange symbols still remain, like "€" or "”". Which is the most used
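
For problem 2, here is a minimal sketch in plain Ruby (standard library only, not Hpricot). Both helper names, html_to_text and repair_mojibake, are made up for this example. The tag stripping is deliberately crude and a real HTML parser will be more robust. The second helper rests on an assumption: that the strange symbols are UTF-8 bytes that were decoded as Windows-1252 somewhere along the way. If the cause is different, it will not help.

```ruby
require "cgi"

# Hypothetical helper: crude HTML-to-text conversion with regular
# expressions. Good enough for a sketch, not for arbitrary HTML.
def html_to_text(html)
  text = html.gsub(%r{<(script|style)\b.*?</\1>}mi, " ") # drop script/style bodies
  text = text.gsub(/<[^>]*>/, " ")                        # drop the remaining tags
  text = CGI.unescapeHTML(text)                           # decode &amp;, &lt;, &#8364;, ...
  text.gsub(/\s+/, " ").strip                             # collapse whitespace
end

# Hypothetical helper: undo "UTF-8 bytes decoded as Windows-1252"
# (the assumed cause of the strange symbols). Returns the input
# unchanged when the round trip does not produce valid UTF-8.
def repair_mojibake(text)
  repaired = text.encode("Windows-1252").force_encoding("UTF-8")
  repaired.valid_encoding? ? repaired : text
rescue EncodingError
  text
end
```

The repair step is a guess about what went wrong, so it is safer to fix the decoding at the point where the page is read than to rely on repairing the text afterwards.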

Thanks a lot