HTML entities in ClothRed: yay or nay?


#1

In preparation for release 0.3.0 (hopefully later today ;), I'm
wondering: should I translate HTML entities into a human-readable format?
My hunch is yes, as that's the point of Textile.

So, I’m at a bit of a loss: I’ve never, ever worked with character
encodings (I don’t even know how to check the encoding on Linux or
Windows).
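On the "how do I even check the encoding" part: in Ruby 1.9 and later, every String carries an encoding tag, and a few core methods show what you are dealing with. `Encoding.default_external` is the encoding Ruby assumes for data it reads from files and the terminal, which by default it derives from your environment. A minimal sketch; the sample bytes are made up for illustration:

```ruby
# Which encoding does Ruby assume for external data on this machine?
puts Encoding.default_external

# The same character, "é", as raw bytes in two different encodings.
latin1_bytes = "\xE9".b        # one byte in ISO-8859-1
utf8_bytes   = "\xC3\xA9".b    # two bytes in UTF-8

# Tag the raw bytes with the encoding they are really in.
latin1 = latin1_bytes.dup.force_encoding(Encoding::ISO_8859_1)
utf8   = utf8_bytes.dup.force_encoding(Encoding::UTF_8)

puts latin1.encoding          # prints ISO-8859-1
puts utf8.valid_encoding?     # prints true

# The single Latin-1 byte is not valid UTF-8 on its own.
puts latin1_bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?  # prints false

# Transcoding yields the same character in the other encoding.
puts latin1.encode(Encoding::UTF_8) == utf8  # prints true
```

The tag is only a claim about the bytes: `force_encoding` relabels without converting, while `encode` actually converts.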

So, my question is, how do I replace the HTML entities with ISO-8859-1
characters?

The trouble is that these character encodings don't seem to be derived
from UTF-8 or anything else that I could simply escape. Or can I?
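The point behind this question: an HTML character reference names a Unicode code point, whatever encoding the page itself uses, so `&#233;`, `&#xE9;` and `&eacute;` all mean U+00E9 (é). Decoding therefore means turning references into Unicode (UTF-8) text first and only then transcoding to ISO-8859-1. A minimal stdlib-only sketch under that assumption; the `NAMED` table is a deliberately tiny, hypothetical subset (a real converter needs the complete entity table), and `decode_entities_utf8` / `decode_entities_latin1` are illustrative names, not part of ClothRed:

```ruby
# Hypothetical, deliberately tiny subset of HTML's named entities,
# mapped to their Unicode code points. A real converter needs the full table.
NAMED = {
  "amp" => 0x26, "lt" => 0x3C, "gt" => 0x3E, "quot" => 0x22,
  "nbsp" => 0xA0, "copy" => 0xA9, "eacute" => 0xE9, "uuml" => 0xFC
}.freeze

# Matches &#decimal;, &#xhex; and &name; references.
ENTITY = /&(?:#(\d+)|#[xX]([0-9a-fA-F]+)|([A-Za-z][A-Za-z0-9]*));/

# Replace character references with UTF-8 characters.
# Unknown names and invalid code points are left untouched.
def decode_entities_utf8(html)
  html.gsub(ENTITY) do
    code = if $1 then $1.to_i
           elsif $2 then $2.to_i(16)
           else NAMED[$3]
           end
    valid = code && code <= 0x10FFFF && !(0xD800..0xDFFF).cover?(code)
    valid ? [code].pack("U") : $&
  end
end

# Decode, then transcode to ISO-8859-1. Characters Latin-1 cannot
# represent (e.g. the euro sign, U+20AC) become "?" instead of raising.
def decode_entities_latin1(html)
  decode_entities_utf8(html).encode(Encoding::ISO_8859_1, undef: :replace)
end
```

For example, `decode_entities_latin1("caf&eacute; &#8364;5")` yields "café ?5" as an ISO-8859-1 string: the euro sign has no Latin-1 equivalent, which is one argument for producing UTF-8 instead.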

Meanwhile, I’m digging through the RDoc documentation. Hopefully, I can
find something there.


Phillip “CynicalRyan” Gawlowski
http://cynicalryan.110mb.com/
http://clothred.rubyforge.org

Rule of Open-Source Programming #13:

Your first release can always be improved upon.


#2

It seems like you need HTMLEntities
(http://htmlentities.rubyforge.org/), which would add a dependency to
your distribution (but that beats duplicating work others have already
done).

I think that dependency could be made optional: ClothRed could raise
an exception when asked to decode HTML entities if it cannot find
that library. I don't know whether this is acceptable to you.
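The optional-dependency idea could look roughly like this in Ruby: try to load the gem once, and raise only when decoding is actually requested. The constant `HTML_ENTITIES_AVAILABLE`, the exception class `EntityDecodingUnavailable` and the method `decode_entities` are hypothetical names for illustration, not part of ClothRed; `HTMLEntities.new` and `#decode` are the htmlentities calls shown in the docs example below.

```ruby
# Hypothetical sketch: treat the htmlentities gem as an optional dependency.
begin
  require 'htmlentities'
  HTML_ENTITIES_AVAILABLE = true
rescue LoadError
  HTML_ENTITIES_AVAILABLE = false
end

# Hypothetical error raised when decoding is requested but the gem is missing.
class EntityDecodingUnavailable < StandardError; end

# Decode HTML entities in +html+, or raise if htmlentities is not installed.
def decode_entities(html)
  unless HTML_ENTITIES_AVAILABLE
    raise EntityDecodingUnavailable,
          "decoding HTML entities requires the htmlentities gem"
  end
  HTMLEntities.new.decode(html)
end
```

Everything else keeps working without the gem; only callers that ask for entity decoding see the error, with a message that tells them what to install.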

From the docs, code like

 require 'htmlentities'
 coder = HTMLEntities.new
 string = "&eacute;lan"
 coder.decode(string) # => "élan"

turns the HTML entities in your text into UTF-8 characters, if I
understood correctly.

Cheers,
Adriano F…


#3

All HTML can be coded in ASCII, as can XML and XHTML.
However, that is only the markup itself.
Do not neglect the character encoding.
Convert everything to UTF-8.
There is no reason to use anything else.
Visit:
http://www.unicode.org/charts/

http://www.unicode.org/
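In Ruby 1.9 and later, converting to UTF-8 means knowing (or assuming) which encoding the input bytes are in, tagging them with it, and transcoding. A minimal sketch; the helper name `to_utf8` is hypothetical, and the caller must supply the correct source encoding, since Ruby cannot reliably guess it from the bytes alone:

```ruby
# Hypothetical helper: reinterpret raw bytes as +source_encoding+ and
# transcode them to UTF-8. Bytes invalid in the source encoding, or
# characters without a mapping, become U+FFFD instead of raising.
def to_utf8(bytes, source_encoding)
  bytes.dup
       .force_encoding(source_encoding)
       .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

latin1_input = "caf\xE9".b                 # "café" as ISO-8859-1 bytes
text = to_utf8(latin1_input, Encoding::ISO_8859_1)
puts text.encoding                         # prints UTF-8
puts text.bytesize                         # prints 5 (é is two bytes in UTF-8)
```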