Is it possible somehow to use the global modifier in a regex map match?
I’m trying to use the map directive to filter the query string leaving
my proxy_cache_key with only known parameters.
As a first test I tried the map below, which should simply capture all
parameters without filtering any of them, but it did not work: the
variable $args_filtered ends up empty.
map $args $args_filtered {
"~(?<list>[^=]*=[^&]+)/g" $list;
default /;
}
When I try the same map, without the /g modifier at the end of the
expression, the variable $args_filtered ends with only the first query
string parameter in it.
Hi Maxim, first thank you very much for your answer!
Maxim D. wrote in post #1154662:
Hello!
…
Note well that even if you are able to filter arguments, there
is an additional problem of order of the arguments.
The same problem would occur using the variable $args, right?
Simplest way to normalize arguments is to use all of them in
proxy_cache_key, like this:
proxy_cache_key $proxy_host$uri$is_args$arg_foo:$arg_bar;
I’ve tried this, but it seems that nginx only picks up the first
occurrence of the parameter, so 2 different requests end up cached
with the same key.
Example:
?fq=xxxxxx&sm=0&PageNumber=1 and
?fq=xxxxxx&sm=0&PageNumber=1&fq=yyyyyyyyyy
return the same content. Is there a way to avoid this behavior?
Hello!
On Fri, Aug 08, 2014 at 08:37:39AM +0200, Gabriel Arrais wrote:
"~(?<list>[^=]*=[^&]+)/g" $list;
default /;
}
When I try the same map, without the /g modifier at the end of the
expression, the variable $args_filtered ends with only the first query
string parameter in it.
No, it’s not supported. To use “/g”, one has to do regexp
matching multiple times and do something with the results of each
match, and this isn’t something nginx knows how to do. (In perl,
this is usually what happens automatically in substitution, “s///”,
but needs writing code when matching with “m//”.)
Note well that even if you are able to filter arguments, there
is an additional problem of order of the arguments.
Simplest way to normalize arguments is to use all of them in
proxy_cache_key, like this:
proxy_cache_key $proxy_host$uri$is_args$arg_foo:$arg_bar;
–
Maxim D.
http://nginx.org/
Hello!
On Fri, Aug 08, 2014 at 08:27:54PM +0200, Gabriel Arrais wrote:
?fq=xxxxxx&sm=0&PageNumber=1&fq=yyyyyyyyyy
return $filtered_args;
}
';
proxy_cache_key $host:$uri?$filtered_args;
Will do the job?
Doing this with an embedded perl snippet will be more or less
trivial, yes. Note though that in case of multiple arguments with
the same name it may be important to preserve their order.
I also suspect that split() + grep may be better/easier than a
single regular expression to match all needed arguments.
–
Maxim D.
http://nginx.org/
Hello!
On Fri, Aug 08, 2014 at 05:06:55PM +0200, Gabriel Arrais wrote:
The same problem would occur using the variable $args right?
Sure.
Example:
?fq=xxxxxx&sm=0&PageNumber=1 and
?fq=xxxxxx&sm=0&PageNumber=1&fq=yyyyyyyyyy
Are returning the same content. Is there a way to avoid this behavior?
There is no easy one, as nginx itself doesn’t know how to work
with multiple arguments with the same name. You may try to build
a regex to extract the second argument with the given name (3rd, 4th,
and so on) and include these in the cache key as well.
–
Maxim D.
http://nginx.org/
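A minimal sketch of the regex approach suggested above, assuming the repeated parameter is fq as in the example URLs. The variable name $fq_second and the capture name second are hypothetical, and this has not been tested against a live nginx. It only covers a second occurrence, so a third fq would need another map of the same shape.

```nginx
# Hypothetical sketch: capture the value of the second "fq" argument, if any.
# The map block belongs in the http context.
map $args $fq_second {
    # first fq=..., then lazily skip other arguments up to the next fq=...
    "~(?:^|&)fq=[^&]*&(?:.*?&)?fq=(?<second>[^&]*)" $second;
    default "";
}

# Put the second occurrence into the key next to the first one ($arg_fq).
# Other known arguments would be appended in the same way.
proxy_cache_key $proxy_host$uri$is_args$arg_fq:$fq_second:$arg_sm;
```

With this key, the two example requests no longer collide: the first yields an empty $fq_second, the second yields yyyyyyyyyy.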
Maxim D. wrote in post #1154708:
Hello!
On Fri, Aug 08, 2014 at 08:27:54PM +0200, Gabriel Arrais wrote:
?fq=xxxxxx&sm=0&PageNumber=1&fq=yyyyyyyyyy
return $filtered_args;
}
';
proxy_cache_key $host:$uri?$filtered_args;
Will do the job?
Doing this with an embedded perl snippet will be more or less
trivial, yes. Note though that in case of multiple arguments with
the same name it may be important to preserve their order.
In our case the order is not important, so cache performance matters
more =)
I also suspect that split() + grep may be better/easier than a
single regular expression to match all needed arguments.
Yes, certainly it would be easier.
Again, thank you so much for the quick responses and the attention.
Maxim D. wrote in post #1154691:
Hello!
On Fri, Aug 08, 2014 at 05:06:55PM +0200, Gabriel Arrais wrote:
The same problem would occur using the variable $args right?
Sure.
Example:
?fq=xxxxxx&sm=0&PageNumber=1 and
?fq=xxxxxx&sm=0&PageNumber=1&fq=yyyyyyyyyy
Are returning the same content. Is there a way to avoid this behavior?
There is no easy one, as nginx itself doesn’t know how to work
with multiple arguments with the same name. You may try to build
a regex to extract the second argument with the given name (3rd, 4th,
and so on) and include these in the cache key as well.
I think this approach would end up as a complicated solution…
Do you think that perl code like
perl_set $filtered_args '
sub {
    my $r = shift;
    my $args = $r->args;
    # [DESIRED_REGEX] is a placeholder; /g collects every match, not just the first
    my @parts = $args =~ /[DESIRED_REGEX]/g;
    # sort so that argument order does not change the cache key
    @parts = sort @parts;
    my $filtered_args = join("&", @parts);
    return $filtered_args;
}
';
proxy_cache_key $host:$uri?$filtered_args;
will do the job?
I’m trying it right now.
Again, thank you for your time.
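For reference, the split() + grep variant mentioned earlier in the thread might look roughly like the sketch below. The whitelist (fq, sm, PageNumber) is only an assumption taken from the example URLs, and the code is untested against a live nginx. It needs nginx built with ngx_http_perl_module.

```nginx
# Hypothetical sketch: keep only whitelisted arguments, sorted, in the cache key.
# perl_set is only valid in the http context.
perl_set $filtered_args '
    sub {
        my $r = shift;
        my $args = $r->args;
        return "" unless defined $args;

        # Assumed whitelist of parameter names; adjust to the real ones.
        my %known = map { $_ => 1 } qw(fq sm PageNumber);

        # Split on "&", keep name=value pairs whose name is whitelisted.
        my @parts = grep { /^([^=]+)=/ && $known{$1} } split /&/, $args;

        # Sorting makes the key independent of argument order.
        return join("&", sort @parts);
    }
';

proxy_cache_key $host:$uri?$filtered_args;
```

Because every name=value pair is kept, repeated parameters such as the two fq values in the example end up in the key. Sorting discards the original order, which is acceptable only when order does not matter, as stated above.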