Forum: Redcloth RedCloth, JRuby and national characters

Claus Folke Brobak (cfbrobak)
on 2010-02-25 14:00
Am I doing something wrong or have I hit a bug in the JRuby version of
RedCloth?

RedCloth version 4.2.2

Program:
  require 'rubygems'
  require 'redcloth'
  str = 'blåbærgrød'
  puts 'String : ' + str
  puts 'HTML   : ' + RedCloth.new(str).to_html()

In "ruby 1.8.6 (2008-08-11 patchlevel 287) [i386-mswin32]" the output is

  String : blåbærgrød
  HTML   : <p>blåbærgrød</p>

In "jruby 1.4.0 (ruby 1.8.7 patchlevel 174) (2009-11-02 69fbfa3) (Java
HotSpot(TM) Client VM 1.6.0_17) [x86-java]" the output is

  String : blåbærgrød
  HTML   : <p>bl</p>

As you can see, when running in JRuby, parsing of the string seems to
have stopped at the first Danish national character, "å".

Claus
Claus Folke Brobak (cfbrobak)
on 2010-02-26 07:03
Claus Folke Brobak wrote:
> Am I doing something wrong or have I hit a bug in the JRuby version of
> RedCloth?
>
> RedCloth version 4.2.2
>

I should add that I am on Windows XP.

Claus
Georg M. (georg_m)
on 2010-10-18 11:12
Hi Claus,

I hit the same issue. Linux, Redcloth 4.2.3, jruby 1.5.1 (ruby 1.8.7
patchlevel 249).

My workaround was to html-escape non-standard characters before passing
them to redcloth. It works but my solution is a little bit fragile. Did
you find a proper solution?
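
A minimal sketch of this kind of workaround, assuming "html-escape" means replacing each non-ASCII character with a numeric character reference before the text reaches RedCloth. The helper name `escape_non_ascii` is hypothetical and not part of RedCloth, and the sketch assumes the input is valid UTF-8:

```ruby
# encoding: utf-8

# Hypothetical helper (not part of RedCloth): replace every non-ASCII
# character with a numeric character reference such as "&#229;".
# Assumes the input string is valid UTF-8.
def escape_non_ascii(text)
  text.unpack('U*').map { |cp| cp < 128 ? cp.chr : format('&#%d;', cp) }.join
end

escaped = escape_non_ascii('blåbærgrød')
puts escaped  # prints bl&#229;b&#230;rgr&#248;d
```

The escaped string contains only ASCII, so it could then be handed to `RedCloth.new(escaped).to_html`. This relies on RedCloth leaving existing character references untouched, which is part of what makes the approach fragile.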

Best, Georg
Marek Kowalski (Guest)
on 2010-10-18 14:02
(Received via mailing list)
Hi Georg,
I went the same way. Eventually I managed to more or less solve the
problem: I have a jar that handles UTF characters properly. Basically,
what needs to be done is to use the char (16-bit) datatype in the Ragel
code instead of byte (8-bit). You can take a look at my work here:
http://github.com/kowalski/redcloth
The problem is that when you run rspec on the new code, a number of
tests fail. The difference is extra whitespace added to the resulting
HTML. For me that does no harm, so I have had this jar running in
production for a few months now.
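
To illustrate why the element width matters, here is a small sketch in plain Ruby. It does not reproduce the RedCloth scanner; it only shows, assuming the input is UTF-8, that each Danish character occupies two bytes with values above 127, and that everything before the first such byte is exactly the "bl" seen in the truncated JRuby output:

```ruby
# encoding: utf-8

str = 'blåbærgrød'

bytes       = str.unpack('C*')  # unsigned 8-bit values of the UTF-8 encoding
code_points = str.unpack('U*')  # one integer per Unicode character

# "å" (U+00E5) is encoded in UTF-8 as the two bytes 0xC3 0xA5, so a
# byte-oriented scanner sees two values outside the ASCII range here.
first_high   = bytes.index { |b| b > 127 }
ascii_prefix = bytes[0, first_high].pack('C*')

puts "bytes:        #{bytes.length}"
puts "code points:  #{code_points.length}"
puts "ASCII prefix: #{ascii_prefix}"
```

The three non-ASCII characters account for the difference between the byte count and the character count. A scanner that works on 16-bit chars sees one element per character for this string.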

Cheers,
Marek Kowalski

Jason Garber (jgarber)
on 2010-11-12 11:20
(Received via mailing list)
Great work, Marek!  I pulled your work into a branch: jruby-mbc

The problem, as you pointed out, is extra whitespace.  I'm hoping you or
someone else can help me get it figured out so I can release it!

It seems to be just when there's HTML in the input. (At least that's all
I've found so far.)  When it's a standalone HTML tag (just a block tag
on a line), it puts two BRs after.  When it's an HTML block (start tag,
contents, end tag), it puts the BR inside the beginning of the next
block.  When just one newline ends the document, it puts a BR inside the
end of the last block; two newlines before EOF behave fine though.

>
> <div>test</div>
>
> Another p.


Results in:
> <p><br />
> Another p.<br />
> </p>


Weird, huh?  I'd greatly appreciate anyone who can help this Java dunce
(me).  Here's the fast way to get it checked out and set up:
> git clone git@github.com:jgarber/redcloth.git
> cd redcloth
> git checkout jruby-mbc
> rvm use jruby-1.5.3@redcloth  # assuming you're using rvm and you've done 'rvm install jruby'
> bundle
> rake compile

Thanks!
Jason
Marek Kowalski (Guest)
on 2010-11-12 14:59
(Received via mailing list)
Hey,
I'm glad I could contribute a little :) I spent a lot of time trying
to figure out the reason for the extra whitespace and failed. I came to
the conclusion that the newline coding for Ragel had to be updated so
that it matches the 32-bit characters. I know nothing about Ragel, so I
gave up. Nowadays I'm doing Python, so I cannot help any further.
Still, I guess the whitespace problem should be far easier to spot
and fix. Last but not least, it could simply be ignored. That is what I
did: I deployed the code as it is to the webapp I was working on.

Regards and good luck!
Marek Kowalski

Jason Garber (jgarber)
on 2010-11-27 14:07
(Received via mailing list)
I've posted this task to oDesk with a $100 budget.  Let's hope someone
takes the job!

http://www.odesk.com/jobs/JRuby-fix-for-RedCloth_%...
Marek Kowalski (Guest)
on 2010-12-03 01:35
(Received via mailing list)
Mhm, thanks for keeping me informed! I don't agree with the task
description though. As far as I remember, the problem was with the Ragel
code, not the html_esc function. It should be easy to figure out for
someone with Ragel experience. You might want to update the
description to attract people with the right profile. Well, we will see;
I hope someone figures it out.

Cheers!
MK
