GeoIP Module

Hey guys, I have run into a problem with the geo module. I have set up a
geo list containing a large number of IPs which we need "whitelisted" so
they get through to the upstream. These IPs belong to search engines.
Currently we have the list set up the following way:

geo $remote_addr $search {
    default 0;
    include geoip-search.conf;
}

The geoip-search.conf file contains the list of IPs in the following
format:

114.111.36.26/32 search;
114.111.36.28/32 search;
114.111.36.29/32 search;
114.111.36.30/32 search;
114.111.36.31/32 search;
114.111.36.32/32 search;
119.63.193.100/32 search;
119.63.193.101/32 search;
119.63.193.102/32 search;
119.63.193.103/32 search;

Then inside the configurations we do the following, which was based on
recommendations from Igor:

if ($search = search) {
    proxy_pass http://LB_HTTP_UPSTREAM;
    break;
}

Below that we also have some security checks which look for a cookie and
serve a different page if no cookie is present. We want the search
engine IPs to make it through to the upstream, but it appears that this
is no longer occurring. We had no problems in the past. Perhaps it is
due to something in 0.8.53: we upgraded to that a while ago, and after a
while we got complaints of Google bots not getting through. Our list
contains about 40,000 lines which cover well over 100,000 IPs. Anyone
have any ideas on what could be causing this?

Posted at Nginx Forum:

Sorry, this is not the GeoIP module; rather, it's the geo module.


On Wed, Dec 01, 2010 at 05:05:56AM -0500, Nam wrote:

> We want the search engine IPs to be able to make it through to the
> upstream, but it appears that this is no longer occurring. We had no
> problems in the past… Perhaps it is due to something in 0.8.53 as we
> had upgraded to that a while ago, and then after a while we got
> complaints of google bots not getting through. Our list contains about
> 40,000 lines which covers well over 100,000 IPs. Anyone have any ideas
> on what could be causing this?

It should work. Could you create a debug log of the request?
BTW, you may compress the geo file using this script:


#!/usr/bin/perl -w

use strict;
use warnings;
use Net::CIDR::Lite;

my %cidr;

# read "network region;" lines and collect the networks per region
while (<>) {
    if (/^(\S+)\s+(\S+);/) {
        my ($net, $region) = ($1, $2);
        if (!defined $cidr{$region}) {
            $cidr{$region} = Net::CIDR::Lite->new;
        }
        $cidr{$region}->add($net);
    }
}

# print the merged CIDR list, one "network region;" line per entry
for my $region (sort { $a cmp $b } keys %cidr) {
    print((join " $region;\n", $cidr{$region}->list), " $region;\n");
}

For example, the 10 lines above are compressed to just 4:

114.111.36.26/32 search;
114.111.36.28/30 search;
114.111.36.32/32 search;
119.63.193.100/30 search;
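For comparison (this is not part of the original script), the same
aggregation can be reproduced with Python's standard ipaddress module:

```python
# Aggregate adjacent /32 entries into larger CIDR blocks, mirroring
# what the Perl/Net::CIDR::Lite script above does for the geo file.
import ipaddress

# the 10 sample entries from the geo file above
nets = [
    "114.111.36.26/32", "114.111.36.28/32", "114.111.36.29/32",
    "114.111.36.30/32", "114.111.36.31/32", "114.111.36.32/32",
    "119.63.193.100/32", "119.63.193.101/32", "119.63.193.102/32",
    "119.63.193.103/32",
]

# collapse_addresses merges adjacent/overlapping networks
collapsed = ipaddress.collapse_addresses(
    ipaddress.ip_network(n) for n in nets
)
for net in collapsed:
    print(f"{net} search;")
```

This emits the same 4 aggregated lines shown above.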

Also, if you use the original client $remote_addr, then this

-geo $remote_addr $search {
+geo $search {
     default 0;
     include geoip-search.conf;
 }

will work slightly faster.

Also, you may avoid “if”:

geo $search {
    default usual_upstream;
    … search_upstream;
    … search_upstream;
    … search_upstream;
    … search_upstream;
}

upstream search_upstream {

}

upstream usual_upstream {

}

server {
    location / {
        proxy_pass http://$search;
    }
}
Igor S.
http://sysoev.ru/en/

On Wed, Dec 01, 2010 at 01:23:17PM -0500, Nam wrote:

> That may be a bit difficult… do you need to see the debug log from the
> requests NOT getting through, or just any request? Our servers are
> currently pushing well over 150 Mbps of traffic right now, and we
> cannot put it into debug mode and start messing around, but we can
> test it out on our test server and get just a single request's worth
> of debug log data.

You may enable the debug log for some addresses only:
http://nginx.org/en/docs/debugging_log.html
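For reference, a minimal sketch of such a per-address debug setup (the
client addresses are placeholders; nginx must be built with --with-debug):

```nginx
error_log /var/log/nginx/error.log;

events {
    # debug-level messages are logged only for these client addresses;
    # all other connections use the normal error_log level
    debug_connection 192.0.2.1;
    debug_connection 192.0.2.0/24;
}
```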


Igor S.
http://sysoev.ru/en/

Igor S. Wrote:

> It should work. Could you create a debug log of the request?

That may be a bit difficult… do you need to see the debug log from the
requests NOT getting through, or just any request? Our servers are
currently pushing well over 150 Mbps of traffic right now, and we cannot
put it into debug mode and start messing around, but we can test it out
on our test server and get just a single request's worth of debug log
data.

> for my $region (sort { $a cmp $b } keys %cidr) {
> 114.111.36.32/32 search;
> 119.63.193.100/30 search;

Awesome, we will have to use that to run through our large list. We have
a lot of IPs in that list, so this script would be handy. Thanks Igor.

> will work slightly faster.

Sounds good, we will implement that as well.

> }
> location / {
> proxy_pass http://$search;
> }

This is a very nice idea/feature, but unfortunately it will not work in
our case because we use this list across many sites we host. Some sites
have additional security features in place which always need to be
bypassed for search engine crawlers and our monitoring systems. In
configs which use security features, we include if statements to ensure
that those clients get a proxy_pass to that config's upstream.
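A minimal sketch of the per-site pattern described here (server name and
upstream name are hypothetical, not from our actual configs):

```nginx
# one shared geo list at http{} level, reused by every site
geo $search {
    default 0;
    include geoip-search.conf;
}

server {
    server_name site-a.example.com;   # hypothetical site

    location / {
        # search engine crawlers and monitors bypass the cookie check
        if ($search = search) {
            proxy_pass http://site_a_upstream;
            break;
        }

        # site-specific cookie/security checks would go here, then:
        proxy_pass http://site_a_upstream;
    }
}
```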




nginx mailing list
[email protected]
nginx Info Page


Igor S. Wrote:

> You may enable the debug log for some addresses only:
> http://nginx.org/en/docs/debugging_log.html

Yes, I understand that, but in the past on our production boxes we found
that simply having the --debug compilation parameter caused
instabilities and even crashes. It could have been unique to the version
we used at the time, but since then we have stopped using the --debug
flag on our production machines. These machines also currently carry a
lot of traffic… over 250 Mbps running behind a hardware load balancer to
6 servers, and we would prefer not to take them down and bring our
customers into a possibly unstable situation.

I am going to get some sleep and then run some more tests to see if I
can provide more information. I will check this thread when I get up for
any possible insight or anything else. Perhaps I can run an HTTP
benchmark test while analyzing the logs; still, that's going to be a LOT
of log data to go through when trying to replicate load to spot the
issue, I think. I will do my best to find out more information though.
Thanks Igor.

