Dear all,
I am trying to implement some rewrites using regular expressions, and my
URIs will contain Greek characters.
The REs work fine when tested with pcretest:
[root@localhost ~]# pcretest
PCRE version 8.10 2010-06-25
re> #^[\x{0386}-\x{03FF}]+$#8
data> bv
No match
data> Τηλέ
0: \x{3a4}\x{3b7}\x{3bb}\x{3ad}
Note the 8 modifier, which tells PCRE to do UTF-8 matching.
With the same RE in nginx.conf, nginx complains:
[emerg]: pcre_compile() failed: character value in \x{...} sequence is
too large in
which I guess means that nginx somehow calls PCRE without the PCRE_UTF8
option flag.
Am I right? How can I implement these Greek character URL rewrites?
The system environment is:
- CentOS 5.4
- PCRE 8.10 with utf-8 and utf-properties enabled
- nginx 0.8.42
Cheers
Tilemahos
Posted at Nginx Forum:
Hello!
On Thu, Jul 01, 2010 at 11:33:49AM -0400, tmanolat wrote:
> re> #^[\x{0386}-\x{03FF}]+$#8
> [emerg]: pcre_compile() failed: character value in \x{...} sequence is
> too large in
> which I guess means that somehow nginx calls PCRE without the PCRE_UTF8
> option flag
> Am I right? How can I implement these Greek character URL rewrites?
Using (*UTF8) to switch pcre into utf-8 mode should work fine in
both nginx and pcretest. See man pcresyntax for details.
Maxim D.
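For readers landing on this thread, here is a minimal sketch of what that suggestion might look like in nginx.conf. The location and its `return 204` body are hypothetical and only for illustration; the pattern is the one from the original post. It assumes a PCRE build with UTF-8 support that is recent enough to understand (*UTF8). Location matching is done against the decoded URI, so Greek characters appear there as real UTF-8 text.

```nginx
# Hypothetical example: match URIs such as /Τηλέ that consist only of
# characters in the range U+0386..U+03FF.
# The leading (*UTF8) asks PCRE itself to compile the pattern in UTF-8
# mode, so the \x{...} values above 0xFF are accepted.
# The pattern is quoted because it contains curly braces.
location ~ "(*UTF8)^/[\x{0386}-\x{03FF}]+$" {
    return 204;   # placeholder body, replace with the real handling
}
```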
tmanolat at 2010-7-1 23:33 wrote:
> re> #^[\x{0386}-\x{03FF}]+$#8
> [emerg]: pcre_compile() failed: character value in \x{...} sequence is
> too large in
> which I guess means that somehow nginx calls PCRE without the PCRE_UTF8
> option flag
> Am I right? How can I implement these Greek character URL rewrites?
I use the raw bytes for Chinese character substitution in my substitution
module (hosted on Google Code).
I think you could convert the Greek characters like this:
'\x3a\x43\xb7\x3b\xb3\xad'
Posted at Nginx Forum: problem with PCRE matching, utf-8, Greek, rewrite
–
Weibin Y.
tmanolat Wrote:
> FYI I put in nginx.conf:
> ...
> if ($request_uri ~ "(*UTF8)^(.*)[\\?|&]filename=([% ,a-zA-Z0-9\x{386}-\x{3ff}_\-\.]+)(&.*)?$") {
> ...
I've got a very similar problem in nginx, but I don't really understand
your solution. Could you please post your nginx.conf, or at least some
more lines related to the UTF-8 filename conversions? It would be a
lifesaver!
Dear Maxim,
it works like a charm now.
FYI I put in nginx.conf:
...
if ($request_uri ~ "(*UTF8)^(.*)[\\?|&]filename=([% ,a-zA-Z0-9\x{386}-\x{3ff}_\-\.]+)(&.*)?$") {
...
Kindest regards,
Tilemahos Manolatos
PS. @Weibin Y.: I would like to avoid "fixed" character lists. I wanted
to use ranges of characters, so the above solution seems better to me
for matching all Greek characters.
Initially this worked well (\x{386}-\x{3ff} covers the Greek characters):
location ~ "^(/optionalwebappname)?/ProcessImageServlet.*$" {
    root /opt/myfilerepository/;
    rewrite ^(.+)$ http://static-dev.myhost.eu/$arg_hotel_id/$th$fn break;
    set $hid '';
    set $filename '';
    set $th '';
    if ($request_uri ~ "^(.*)[\\?|&]hotel_id=([0-9]+)(&.*)?$") {
        set $hid $2;
    }
    if ($request_uri ~ "(*UTF8)^(.*)[\\?|&]filename=([% ,a-zA-Z0-9\x{386}-\x{3ff}_\-\.]+)(&.*)?$") {
        set $fn $2;
    }
    if ($request_uri ~ "^(.*)[\\?|&]type=th(&.*)?$") {
        set $th 'th_';
    }
    rewrite ^(.+)$ http://static-dev.myhost.eu/$hid/$th$fn break;
    access_log logs/site-pis.log main;
    expires 1h;
}
However, I later found that the following works better, UTF-8 arguments
included. You had better check this out first; it is much more elegant:
location ~ "^(/optionalwebappname)?/ProcessImageServlet.*$" {
    set $th '';
    if ($request_uri ~ "^(.*)[\\?|&]type=th(&.*)?$") {
        set $th 'th_';
    }
    rewrite ^(.+)$ http://static-dev.myhost.eu/$arg_hotel_id/$th$arg_filename break;
    expires 1d;
}
I would really like a wiki page on UTF-8 support as well.
(*UTF8) doesn't work for me, though; I've tried.
When attempting to use (*UTF8) I always receive:
[emerg]: pcre_compile() failed: (*VERB) not recognized in
"(*UTF8)^/([^/^.]+)(?:/?)(?:index([0-9]).html?)?$" at
"8)^/([^/^.]+)(?:/?)(?:index([0-9]).html?)?$" in
/etc/nginx/sites-enabled/nexusddl.com:85
Yet my PCRE has UTF-8 support; I tested it in PHP (both nginx and PHP
are compiled against the PCRE library included in Debian).
Hello!
On Sat, Sep 25, 2010 at 11:43:48AM +1000, mat h wrote:
> When attempting to use *UTF8 I always receive.
> [emerg]: pcre_compile() failed: (*VERB) not recognized in
> "(*UTF8)^/([^/^.]+)(?:/?)(?:index([0-9]).html?)?$" at
> "8)^/([^/^.]+)(?:/?)(?:index([0-9]).html?)?$" in
> /etc/nginx/sites-enabled/nexusddl.com:85
> Yet my PCRE has UTF-8 support, tested it in PHP (both nginx and php
> compiled against PCRElib included in debian)
You need at least pcre 7.9 for (*UTF8) support.
http://www.pcre.org/changelog.txt
[…]
> if ($request_uri ~ "(*UTF8)^(.*)[\\?|&]filename=([% ,a-zA-Z0-9\x{386}-\x{3ff}_\-\.]+)(&.*)?$") {
>     set $fn $2;
> }
Note that (*UTF8) is meaningless here, as $request_uri doesn't
contain utf-8 characters: it's urlencoded.
Maxim D.
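To illustrate the point about urlencoding: in $request_uri a Greek letter arrives as its percent-encoded UTF-8 bytes (for example, Τ is sent as %CE%A4), so a pattern that wants to check for Greek characters there has to describe those escapes in plain ASCII. The following is only a rough sketch under that assumption and is not taken from the thread. The alternation is a hand-derived encoding of the range U+0386..U+03FF (bytes CE 86 to CE BF and CF 80 to CF BF), and the variable name $fn is borrowed from the config quoted above.

```nginx
# Sketch: capture a "filename" argument made of ASCII name characters
# and percent-encoded Greek letters (U+0386..U+03FF) from the raw,
# still-encoded request line. No (*UTF8) is needed because the subject
# and the pattern are pure ASCII. ~* makes the hex digits
# case-insensitive. Other escapes such as %20 are deliberately not
# covered here.
if ($request_uri ~* "[?&]filename=((?:%CE%(?:8[6-9A-F]|[9AB][0-9A-F])|%CF%[89AB][0-9A-F]|[a-z0-9_.,-])+)(?:&|$)") {
    set $fn $1;
}
```

If the value only needs to be passed through unchanged, using $arg_filename directly avoids the regular expression altogether.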
So it seems that the following is the most correct (at least it has
worked well for me so far):
location ~ "^(/optionalwebappname)?/ProcessImageServlet.*$" {
    set $th '';
    if ($request_uri ~ "^(.*)[\\?|&]type=th(&.*)?$") {
        set $th 'th_';
    }
    rewrite ^(.+)$ http://static-dev.myhost.eu/$arg_hotel_id/$th$arg_filename break;
    expires 1d;
}
Again, my problem was to rewrite some URLs with GET variables that were
expected to contain UTF-8 characters.
This way, $arg_filename is passed correctly to the rewrite.