Forum: Ruby Premature end of regular expression with non-ascii character

Nick S. (Guest)
on 2006-01-29 22:53
Hi,

I'm trying to get regular expressions to work with a string that
contains letters with accents. I have the following sentence:

De kiné weet één van hun patiënten te overtuigen om gekke dingen te
doen.

The regexp /patiënten/ matches the word patiënten. However, when I use the
regexp /kiné/, I get the error 'premature end of regular expression:
/kiné/ (SyntaxError)'. Can anybody tell me what is going on? Another
issue with the same sentence: when I use the regexp /\s/ to highlight
all the spaces, the space between 'kiné' and 'weet' is not highlighted as a
space. It seems like regular expressions can't handle non-ASCII
characters at the end of a string.

Kind regards,

Nick
Matthew S. (Guest)
on 2006-01-29 23:15
(Received via mailing list)
> issue with the same sentence is, when I use the regexp /\s/ to highlight
> all the spaces, the space between 'kiné weet' is not highlighted as a
> space. It seems like regular expressions can't handle non-ascii
> characters at the end of a string.


I believe this is a character encoding problem, which is fixed in 1.9
by the inclusion of a new regular expression engine, Oniguruma (which you
can also download and use in 1.8):

http://www.geocities.jp/kosako3/oniguruma/

Best of luck.
matt.
Logan C. (Guest)
on 2006-01-30 06:13
(Received via mailing list)
On Jan 29, 2006, at 3:53 PM, Nick S. wrote:

> regexp /kiné/, I get the error 'premature end of regular expression:

Are you using $KCODE="u" at the top of your script?
Nick S. (Guest)
on 2006-01-30 22:18
Thank you both very much for the suggestions. First off, I have
$KCODE="u" in config/environment.rb (Rails). I have also tried adding it
to the class, but the error remained.

Secondly, I looked at Oniguruma and I must say it looks promising.
Unfortunately for me and my Windows (Cygwin) machine, I have to compile
it into Ruby 1.8.2-1.8.4, and I can't get it to work. I can't get 1.8.2
to compile: I solve one error, only to hit another, and so on.
Hopeless. I managed to compile 1.8.4, but when I start Ruby I get an
error that a file is missing. I'm using the Windows one-click Ruby
installer, if anybody is wondering how on earth I managed to get Ruby
working :). I could use 1.9.0, because it includes Oniguruma. The only
problem is that I don't know if Rails works with it. I have
contacted the author of Oniguruma; maybe he can say conclusively
whether or not Oniguruma solves my problem. When I get a response I'll
post it here. In the meantime, if anybody has any other suggestions,
please let me know. Thanks.

Kind regards,

Nick
Dave B. (Guest)
on 2006-02-01 23:39
(Received via mailing list)
Nick S. asked:
> I'm trying to get regular expressions to work with a string that
> contains letters with accents. ...
>
> The regexp /patiënten/ matches the word patiënten. However when I do the
> regexp /kiné/, I get the error 'premature end of regular expression:
> /kiné/ (SyntaxError)'. Can anybody tell me what is going on?

You might avoid the syntax error by setting $KCODE = "u" at the start of
your program.

> Another
> issue with the same sentence is, when I use the regexp /\s/ to highlight
> all the spaces, the space between 'kiné weet' is not highlighted as a
> space. It seems like regular expressions can't handle non-ascii
> characters at the end of a string.

Ruby strings are made up of bytes, not characters. That's the cause of
the issues you're having. There are a couple of recent plugins for Ruby
to help improve the situation (see
http://redhanded.hobix.com/inspect/unicodeLibForRuby18.html), but they're
far from perfect.
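
To make the bytes-versus-characters point concrete, here is a minimal
sketch. It assumes Ruby 1.9 or later, where every String carries an
encoding (the 1.8 releases discussed in this thread do not work this way),
and the variable names are only for illustration. It also shows why a word
saved as Latin-1 but read as UTF-8 is a problem: the single byte 0xE9 for
"é" looks like the start of a longer UTF-8 character. Whether that is what
happens in the original poster's setup is an assumption, not something
confirmed here.

```ruby
# Assumption: Ruby 1.9 or later. Escapes are used so the encoding of this
# source file does not affect the result.
word = "kin\u00e9"                      # "kiné" as a UTF-8 string

puts word.length                        # characters: 4
puts word.bytesize                      # bytes in UTF-8: 5 ("é" takes two)
p word.bytes.to_a                       # [107, 105, 110, 195, 169]

# The same word as a Latin-1 (ISO-8859-1) editor would store it:
# "é" is the single byte 0xE9.
raw = [0x6B, 0x69, 0x6E, 0xE9].pack("C*")
latin1 = raw.dup.force_encoding("ISO-8859-1")
puts latin1.bytesize                    # 4

# Interpreted as UTF-8, the lone 0xE9 is an incomplete multibyte sequence.
misread = raw.dup.force_encoding("UTF-8")
puts misread.valid_encoding?            # false

# Transcoding from the real encoding gives a string the UTF-8 regexp matches.
fixed = latin1.encode("UTF-8")
puts fixed == word                      # true
puts !!(fixed =~ /kin\u00e9/)           # true
```

The design point is simply to keep the file encoding, the string encoding,
and the regexp encoding in agreement; which of them is off in a given setup
has to be checked case by case.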

I hope $KCODE can clear up most of your problems, though.

Cheers,
Dave
Dave B. (Guest)
on 2006-02-02 00:25
(Received via mailing list)
Nick S. wrote:
> Thank you both very much for the suggestions. First off I have
> $KCODE="u" in config/environment.rb (Rails). I have also tried to add it
> into the class. But the error remained.

I haven't had the issues you're talking about, because I'm only doing
apps in English, but here are a couple of places you might start to look
for solutions:

http://wiki.rubyonrails.com/rails/pages/HowToUseUn...

http://redhanded.hobix.com/inspect/unicodeLibForRuby18.html

> I could use 1.9.0 because this includes oniguruma. The only
> problem here is that I don't know if Rails works with it.

Don't. 1.9.0 isn't for production, really; it's an experimental version
which is growing some features that may become part of Ruby 2.0.

Cheers,
Dave