Premature end of regular expression with non-ascii character


#1

Hi,

I’m trying to get regular expressions to work with a string that
contains letters with accents. I have the following sentence:

De kiné weet één van hun patiënten te overtuigen om gekke dingen te
doen.

The regexp /patiënten/ matches the word patiënten. However, when I use the
regexp /kiné/, I get the error ‘premature end of regular expression:
/kiné/ (SyntaxError)’. Can anybody tell me what is going on? Another
issue with the same sentence: when I use the regexp /\s/ to highlight
all the spaces, the space between ‘kiné weet’ is not highlighted as a
space. It seems like regular expressions can’t handle non-ASCII
characters at the end of a string.

Kind regards,

Nick


#2

issue with the same sentence is, when I use the regexp /\s/ to highlight
all the spaces, the space between ‘kiné weet’ is not highlighted as a
space. It seems like regular expressions can’t handle non-ascii
characters at the end of a string.

I believe this is a character encoding problem, which is fixed in 1.9
by the inclusion of a new regular expression engine (which you can
also download and use in 1.8):

http://www.geocities.jp/kosako3/oniguruma/
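
Before swapping regexp engines, it may be worth checking how the accented letters are actually stored. The sketch below is only a diagnostic idea, not something anyone in this thread has run against the original file; `guess_storage` is a made-up helper name. It relies on `String#unpack`, where "C*" yields raw bytes and "U*" decodes UTF-8 and raises ArgumentError on malformed input. If the text turned out to be Latin-1 while Ruby expects UTF-8, that might explain the symptoms, since 0xE9 (é in Latin-1) looks like the first byte of a three-byte UTF-8 character. That is a guess, not a confirmed diagnosis.

```ruby
# Hypothetical helper: guess how the accented letters in +raw+ are stored.
# Returns :ascii, :utf8, or :single_byte (Latin-1, Windows-1252, ...).
def guess_storage(raw)
  bytes = raw.unpack("C*")               # one integer per byte
  return :ascii if bytes.all? { |b| b < 0x80 }

  begin
    raw.unpack("U*")                     # raises ArgumentError on malformed UTF-8
    :utf8
  rescue ArgumentError
    :single_byte
  end
end

# Hypothetical helper: hex dump of the bytes, handy for eyeballing a string.
def hex_dump(raw)
  raw.unpack("C*").map { |b| "%02X" % b }.join(" ")
end

puts guess_storage("kin\xC3\xA9")        # é written as two UTF-8 bytes
puts guess_storage("kin\xE9 weet")       # é written as one Latin-1 byte
puts hex_dump("kin\xC3\xA9")
```

If the result is :single_byte, re-saving the source file as UTF-8 (or setting the encoding to match the file) would be the first thing to try.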

Best of luck.
matt.


#3

On Jan 29, 2006, at 3:53 PM, Nick S. wrote:

regexp /kiné/, I get the error 'premature end of regular expression:



Are you using $KCODE = "u" at the top of your script?


#4

Thank you both very much for the suggestions. First off, I have
$KCODE = "u" in config/environment.rb (Rails). I have also tried adding it
to the class, but the error remained.

Secondly, I looked at oniguruma and I must say it looks promising.
Unfortunately for me and my Windows (Cygwin) machine, I have to compile
it into Ruby 1.8.2-1.8.4, and I can’t get it to work. I can’t get 1.8.2
to compile: you solve one error, then hit yet another, and so on.
Hopeless. I managed to compile 1.8.4, but when I open Ruby I get an
error that a file is missing. I’m using the Windows one-click Ruby
installer, if anybody is wondering how on earth I managed to get Ruby
working :). I could use 1.9.0 because it includes oniguruma. The only
problem is that I don’t know if Rails works with it. I have
contacted the author of oniguruma; maybe he can say conclusively
whether or not oniguruma solves my problem. When I get a response I’ll
post it here. In the meantime, if anybody has any other suggestions,
please let me know. Thanks.

Kind regards,

Nick


#5

Nick S. asked:

I’m trying to get regular expressions to work with a string that
contains letters with accents. …

The regexp /patiënten/ matches the word patiënten. However when I do the
regexp /kiné/, I get the error ‘premature end of regular expression:
/kiné/ (SyntaxError)’. Can anybody tell me what is going on?

You might avoid the syntax error by setting $KCODE = "u" at the start of
your program.
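
A rough sketch of that suggestion, untested against the Rails setup discussed here: on Ruby 1.8, $KCODE = "u" asks the interpreter to treat source text and regexps as UTF-8, and the /u flag does the same for a single regexp. It assumes the script file itself is saved as UTF-8. The method names are made up for illustration, and the version guard is there only because later Rubies ignore $KCODE.

```ruby
# Ruby 1.8 only: treat source text and regexps as UTF-8.
# Later versions ignore $KCODE, hence the guard.
$KCODE = "u" if RUBY_VERSION < "1.9"

# The sentence from the original question (file assumed to be UTF-8).
def sample_sentence
  "De kiné weet één van hun patiënten te overtuigen om gekke dingen te doen."
end

# Hypothetical helper: true if the word "kiné" occurs in the text.
def mentions_kine?(text)
  !(text =~ /kiné/u).nil?
end

# Hypothetical helper: number of whitespace characters in the text.
def count_spaces(text)
  text.scan(/\s/u).size
end

puts mentions_kine?(sample_sentence)
puts count_spaces(sample_sentence)
```

If the space after "kiné" is still missed with this in place, the file encoding is the next thing to suspect.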

Another issue with the same sentence is, when I use the regexp /\s/ to
highlight all the spaces, the space between ‘kiné weet’ is not
highlighted as a space. It seems like regular expressions can’t handle
non-ascii characters at the end of a string.

Ruby strings are made up of bytes, not characters. That’s the cause of
the issues you’re having. There are a couple of recent plugins for Ruby
to help improve the situation (see
http://redhanded.hobix.com/inspect/unicodeLibForRuby18.html), but they’re
far from perfect.
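
To make the bytes-versus-characters point concrete, here is a small sketch using `String#unpack`, which can read the same string either way: "C*" gives one integer per byte, "U*" gives one integer per UTF-8 character. It assumes the file is saved as UTF-8; the two method names are invented for the example. On Ruby 1.8, String#length and indexing count bytes, so the byte view is the one the rest of the language sees.

```ruby
# Hypothetical helper: the string as Ruby 1.8 sees it, one integer per byte.
def utf8_bytes(text)
  text.unpack("C*")
end

# Hypothetical helper: the string as a reader sees it, one integer per character.
def utf8_code_points(text)
  text.unpack("U*")
end

word = "kiné"

puts utf8_bytes(word).inspect        # é shows up as two bytes: 195, 169
puts utf8_code_points(word).inspect  # é shows up as one code point: 233
puts utf8_bytes(word).size
puts utf8_code_points(word).size
```

Four letters, five bytes: any code that steps through the string byte by byte can land in the middle of the é.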

I hope $KCODE can clear up most of your problems, though.

Cheers,
Dave


#6

Nick S. wrote:

Thank you both very much for the suggestions. First off I have
$KCODE = "u" in config/environment.rb (Rails). I have also tried to add it
into the class. But the error remained.

I haven’t had the issues you’re talking about, because I’m only doing
apps in English, but here are a couple of places you might start to look
for solutions:

http://wiki.rubyonrails.com/rails/pages/HowToUseUnicodeStrings

http://redhanded.hobix.com/inspect/unicodeLibForRuby18.html

I could use 1.9.0 because this includes oniguruma. The only
problem here is that I don’t know if Rails works with it.

Don’t. 1.9.0 isn’t for production, really; it’s an experimental version
which is growing some features that may become part of Ruby 2.0.

Cheers,
Dave