Question about entities

1.) What is everyone’s preference on NCRs or character entities?
Textile 2 uses decimal NCRs, so a less-than character becomes &#60;
whereas RedCloth (3.04 and prior) used &lt;. What is your
preference? It gets tough because &#39; (a straight single quote)
doesn’t have a character entity equivalent.
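
To make the two styles concrete, here is a minimal Ruby sketch. The helper names escape_named and escape_ncr are made up for illustration and are not part of RedCloth or Textile; the character set handled is also an assumption.

```ruby
# Hypothetical helpers, not RedCloth API: two ways to escape the same text.
NAMED = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Named character entities where one is available in the table above;
# the straight single quote has no entry, so it falls back to a decimal NCR.
def escape_named(text)
  text.gsub(/[&<>"']/) { |ch| NAMED.fetch(ch) { format('&#%d;', ch.ord) } }
end

# Decimal numeric character references for every escaped character.
def escape_ncr(text)
  text.gsub(/[&<>"']/) { |ch| format('&#%d;', ch.ord) }
end

puts escape_named("a < b")
puts escape_ncr("a < b")
```

The NCR version needs no lookup table and treats every character the same way, which is the consistency argument; the named version is easier to read in raw output.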

2.) How do you feel about encoding characters like quotes in
blockcode and pre blocks? Textile 2 does it, but the old RedCloth
never did. Example:

This is some code, "isn't it".

under Textile 2 becomes

This is some code, &#34;isn&#39;t it&#34;.

Thanks!
Jason

References:
http://www.w3.org/International/questions/qa-escapes
http://textile.thresholdstate.com/

At 6:07 PM -0500 2/21/08, Jason G. wrote:

This is some code, "isn't it".

under Textile 2 becomes

This is some code, &#34;isn&#39;t it&#34;.

I prefer unicode character references rather than entities.

See:
http://rubyforge.org/pipermail/redcloth-upwards/2007-August/000161.html

I will start by referencing the parts in the parser that are specific
to HTML and are not covered by the ruby methods (“def p”, “def h1”,
etc) and also the parts that could need a 2 step parsing (1 build a
tree, 2 render) typically parts that need some knowledge of what is
coming further down before we can render them (footnotes, tables).

From this list we can discuss the best ways to enable multiple outputs
without losing speed.

Gaspard

2008/3/11, Jason G.:

This idea about hooks is a good one. I’d wished for it myself when
outputting HTML vs. XHTML (because while
works, it isn’t truly
valid HTML)

A patch to do this would be most welcome!

Jason

ENTITIES

I think HTML entities are more readable in case someone reads the raw
code, but as you mentioned, some characters have no entity and need
Unicode character references.

For my needs, it does not really matter. If consistency is the goal,
then we have to go for Unicode character references everywhere.

A much more important thing is that this entity escaping should be
optional. I wrote a “to_latex” grammar and escaping entities is
different in LaTeX. I proposed a patch that tried to make as few
changes to the overall design as possible
(http://code.whytheluckystiff.net/redcloth/ticket/35). But in essence,
I think there should be some universal hooks to do this kind of
escaping. I would propose the following hooks:

pre: before anything happens
escape: before raw text is written out (entity escaping in HTML, for example)
post: after parsing

It would then be easy to alter the grammar for HTML by writing:

  class << SuperRedCloth::HTML
    def escape(text)
      html_unicode_escape(text)
    end
  end

The method “html_entity_escape” would be the C function
“rb_str_cat_escaped”, “html_unicode_escape” could be another C
function so there is no speed loss.

The “pre” and “post” hooks could be used to extract custom tags and
put the parsed result back after parsing. The advantage of having them
“inside” RedCloth is that we can alter the extracted data during
parsing. This might seem mad, but it might be the only way to solve
footnotes or tables when producing LaTeX output without parsing the
whole text once more.
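
As a rough illustration of how the three hooks could fit together, here is a small Ruby sketch. Everything in it is an assumption made for the example: the HookedFormatter class, the <raw> custom tag, and the placeholder format do not exist in RedCloth, and the "parser" is reduced to a single escape call.

```ruby
# Hypothetical sketch of the pre / escape / post hooks; none of these
# class or method names exist in RedCloth.
class HookedFormatter
  def initialize
    @stash = []
  end

  # pre: runs before anything else; pulls out <raw>...</raw> spans and
  # leaves a placeholder token in their place.
  def pre(text)
    text.gsub(%r{<raw>(.*?)</raw>}m) do
      @stash << Regexp.last_match(1)
      "@@STASH#{@stash.size - 1}@@"
    end
  end

  # escape: runs on raw text before it is written out.
  def escape(text)
    text.gsub(/[&<>]/) { |ch| format('&#%d;', ch.ord) }
  end

  # post: runs after parsing; puts the stashed spans back untouched.
  def post(text)
    text.gsub(/@@STASH(\d+)@@/) { @stash[Regexp.last_match(1).to_i] }
  end

  # Stand-in for the real parser: here "parsing" is only escaping.
  def render(text)
    post(escape(pre(text)))
  end
end

puts HookedFormatter.new.render("a < b <raw><b>bold</b></raw>")
```

Because the stash lives on the formatter object, code running between pre and post could still inspect or change the extracted spans, which is the point of keeping the hooks inside the parser rather than wrapping it from outside.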

CODE ESCAPING

This seems bad to me. Code should stay as raw as possible. It
would be terrible to write code in ASCII and put it on a website, then
have a user copy the code and end up with UTF-8 data that
doesn’t compile just because of quotes that look pretty but have no
meaning in the language the code was written in.
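
One way to read "as raw as possible" is to escape only the characters that would otherwise be parsed as markup and to leave quotes exactly as typed. The sketch below assumes that reading; escape_code is a made-up name, not a RedCloth method.

```ruby
# Hypothetical helper, not RedCloth API: escape only the characters that
# would break the surrounding HTML, and leave quotes exactly as typed.
MARKUP_CHARS = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

def escape_code(source)
  # String#gsub with a Hash replaces each match by its value in the Hash.
  source.gsub(/[&<>]/, MARKUP_CHARS)
end

puts escape_code(%q{puts "isn't it" if a < b})
```

A browser decodes the three escaped characters back when the page is rendered, so copied code still matches what the author typed.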

Ok, that’s it.

Gaspard

2008/3/10, Stephen B.:

On Mar 11, 2008, at 5:24 PM, Gaspard B. wrote:

also the parts that could need a 2 step parsing (1 build a
tree, 2 render) typically parts that need some knowledge of what is
coming further down before we can render them (footnotes, tables).

That would be really handy. Right now I re-parse the document if any
link aliases were found and I assume that footnotes always come after
they’re referenced. Re-parsing isn’t that slow, but it is unnecessary.
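
A minimal sketch of the two-step idea, assuming Textile's fn1. footnote lines and [1] references: the first step collects every footnote wherever it appears, so the second step can render references without re-parsing. The Node struct and the build and render functions are illustrative names, not RedCloth internals.

```ruby
# Hypothetical two-step sketch (build a tree, then render); the node
# layout and method names are illustrative, not RedCloth internals.
Node = Struct.new(:type, :value)

# Step 1: build a flat list of nodes and collect footnote definitions,
# wherever they appear in the input.
def build(lines)
  notes = {}
  nodes = lines.map do |line|
    if (m = line.match(/\Afn(\d+)\. (.*)\z/))
      notes[m[1]] = m[2]
      Node.new(:footnote, m[1])
    else
      Node.new(:para, line)
    end
  end
  [nodes, notes]
end

# Step 2: render, now knowing which footnotes exist.
def render(nodes, notes)
  nodes.map do |node|
    case node.type
    when :footnote
      %(<p id="fn#{node.value}"><sup>#{node.value}</sup> #{notes[node.value]}</p>)
    else
      text = node.value.gsub(/\[(\d+)\]/) do
        n = Regexp.last_match(1)
        # Link only references whose footnote was actually defined.
        notes.key?(n) ? %(<sup><a href="#fn#{n}">#{n}</a></sup>) : "[#{n}]"
      end
      "<p>#{text}</p>"
    end
  end.join("\n")
end

nodes, notes = build(["fn1. A note", "See this[1] and that[2]"])
puts render(nodes, notes)
```

In this example the footnote is defined before it is referenced, which a single left-to-right pass that assumes the opposite order would not link.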

Jason