In preparation for release 0.3.0 (hopefully later today ;), I'm wondering: should I translate HTML entities into a human-readable format? My hunch is yes, as that's the point of Textile. But I'm at a bit of a loss: I've never, ever worked with character encodings (I don't even know how to check the encoding on Linux or Windows). So my question is: how do I replace the HTML entities with ISO-8859-1 characters? The trouble is that the character encodings don't seem to be based on UTF-8 or something else that I can just escape. Or can I? Meanwhile, I'm digging through the RDoc documentation. Hopefully I can find something there.
--
Phillip "CynicalRyan" Gawlowski
http://cynicalryan.110mb.com/
http://clothred.rubyforge.org
Rule of Open-Source Programming #13: Your first release can always be improved upon.
on 2007-04-12 14:47
on 2007-04-12 15:10
It seems like you need HTMLEntities (http://htmlentities.rubyforge.org/), which will add a dependency to your distribution (but that is better than repeating work others have already done). I think the dependency could be made optional: ClothRed could throw an exception when asked to decode HTML entities if it cannot find that module. I don't know if this is acceptable to you. From the docs, code like

    require 'htmlentities'
    coder = HTMLEntities.new
    string = "&eacute;lan"
    coder.decode(string) # => "élan"

turns your HTML with entities into UTF-8 characters, if I understood correctly.

Cheers,
Adriano Ferreira.
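The optional-dependency idea could look roughly like the sketch below. The names `EntityDecoding` and `MissingDependency` are made up for illustration and are not part of ClothRed; only `HTMLEntities.new` and `decode` come from the htmlentities docs quoted above.

```ruby
# Try to load the optional dependency and remember whether it worked.
begin
  require 'htmlentities'
  HTMLENTITIES_AVAILABLE = true
rescue LoadError
  HTMLENTITIES_AVAILABLE = false
end

# Hypothetical helper module (an assumption, not ClothRed's real API).
module EntityDecoding
  # Raised when decoding is requested but the library is not installed.
  class MissingDependency < StandardError; end

  def self.decode(html)
    unless HTMLENTITIES_AVAILABLE
      raise MissingDependency,
            "install the htmlentities library to decode HTML entities"
    end
    HTMLEntities.new.decode(html)
  end
end
```

Callers that never ask for entity decoding would not notice the missing library; those that do get a clear error instead of a NameError somewhere deeper in the code.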
on 2007-04-12 15:11
All HTML can be coded in ASCII, as can XML and XHTML. However, that covers only the markup itself. Do not forgo the encoding: convert everything to UTF-8. There is no reason to use anything else. Visit: http://www.unicode.org/charts/ http://www.unicode.org/
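As a rough illustration of converting between the two encodings, here is a minimal sketch. It assumes a Ruby with `String#encode` (1.9 or later); older versions would need a different route, such as the Iconv library. The sample bytes are invented for the example.

```ruby
# The word "élan" as ISO-8859-1 bytes: é is the single byte 0xE9.
latin1 = "\xE9lan".b.force_encoding(Encoding::ISO_8859_1)

# Transcode to UTF-8; é becomes the two bytes 0xC3 0xA9.
utf8 = latin1.encode(Encoding::UTF_8)

# Going back works as long as every character exists in ISO-8859-1.
back = utf8.encode(Encoding::ISO_8859_1)

puts utf8.encoding        # the encoding Ruby has tagged the string with
puts utf8.bytes.inspect   # the raw bytes after conversion
puts back.bytes.inspect   # the raw bytes after converting back
```

`String#encoding` is also one answer to "how do I check the encoding" for a string already inside Ruby, though it only reports what the string is tagged as, not what the bytes really are.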