GeoIPCity with nginx

Hello list!

I know MaxMind’s GeoIP Country database can be used easily with nginx.
But what about their GeoIP City database?

The default CSV database stands at over 100 MB in size (and will grow
even larger if the two normalized files are merged together). For this
reason, using the CIDR format may not be feasible, due to the excessive
memory requirement.

The binary file, however, is much smaller.

Has anybody been able to use the GeoIP City database with nginx? For
Apache, MaxMind provides mod_geoip, which works on the binary file,
making it very fast.

Does anyone have any solution (like mod_geoip) for nginx? I’m using PECL
geoip for PHP and the equivalent for Ruby, but I feel a geo lookup at the
server level would be much faster.

Is there any official or third-party GeoIP City module for nginx?

Thanks in advance.

Hello!

On Sat, Nov 22, 2008 at 05:18:31PM +0100, Bobby Dr wrote:

The binary file, however, is much smaller.
The problem with MaxMind’s city database, AFAIK, is that the text
information they provide isn’t CIDRs but IP ranges. This is
generally good for relational databases, but a worst case for those
who are able to work with CIDRs.

The binary file, AFAIK, is a radix tree dump with real CIDRs; that’s why
it’s much smaller.

Theoretically it should be possible to collapse the IP ranges into an
optimal set of CIDRs to make this usable with the native nginx geo
module, but this isn’t really an easy task.
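
For anyone who wants to experiment with such a collapse: Python’s
ipaddress module can compute the minimal CIDR cover of an inclusive
range. A minimal sketch (the range is just an illustration):

import ipaddress

start = ipaddress.ip_address("10.0.0.1")
end = ipaddress.ip_address("10.0.0.255")

# summarize_address_range() yields the minimal list of CIDR networks
# covering the inclusive range [start, end].
for net in ipaddress.summarize_address_range(start, end):
    print(net)
# 10.0.0.1/32, 10.0.0.2/31, 10.0.0.4/30, 10.0.0.8/29, 10.0.0.16/28,
# 10.0.0.32/27, 10.0.0.64/26, 10.0.0.128/25

Running each CSV row through this and then merging adjacent
equal-valued networks would be the collapse in question; doing that
merge optimally is the hard part.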

Has anybody been able to use the GeoIP City database with nginx? For
Apache, MaxMind provides mod_geoip, which works on the binary file,
making it very fast.

Does anyone have any solution (like mod_geoip) for nginx? I’m using PECL
geoip for PHP and the equivalent for Ruby, but I feel a geo lookup at the
server level would be much faster.

Is there any official or third-party GeoIP City module for nginx?

AFAIK, no.

Maxim D.

Maxim D. wrote:

Theoretically it should be possible to collapse ip ranges to
optimal set of cidr’s to make this usable with native nginx geo
module, but this isn’t really easy task.

I’ve written a geoip module from the ground up that operates on a CSV
file. Once you get down to city-level or even ISP-level IP searches, the
current approach of operating on netmaskable blocks takes up a lot of
memory. Trees seem to be a better option.
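
For illustration, here is a minimal bit-level trie in Python (the city
value is made up; an actual nginx module would of course implement this
in C):

import ipaddress

class Node:
    __slots__ = ("children", "value")
    def __init__(self):
        self.children = [None, None]
        self.value = None

class GeoTrie:
    def __init__(self):
        self.root = Node()

    def insert(self, cidr, value):
        # Walk one bit per level, down to the prefix length.
        net = ipaddress.ip_network(cidr)
        addr = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            bit = (addr >> (31 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = Node()
            node = node.children[bit]
        node.value = value

    def lookup(self, ip):
        # Longest-prefix match: remember the last value seen on the path.
        addr = int(ipaddress.ip_address(ip))
        node, best = self.root, self.root.value
        for i in range(32):
            node = node.children[(addr >> (31 - i)) & 1]
            if node is None:
                break
            if node.value is not None:
                best = node.value
        return best

t = GeoTrie()
t.insert("94.25.32.0/21", "Moscow")   # value is hypothetical
print(t.lookup("94.25.33.7"))         # -> Moscow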

Is there any official or third-party GeoIP City module for nginx?

AFAIK, no.

Unfortunately, my module is not something I can release right away,
since I store a lot of additional data against each IP range and my code
currently assumes those fields exist. I’ll release a cleaner version
once I figure out how to refactor the code.

Hello!

On Sun, Nov 23, 2008 at 02:12:25PM +0300, Igor S. wrote:

using the CIDR format may not be feasible, due to the excessive
memory requirement.

And if you convert the MaxMind GeoCity base file
GeoLiteCity_20081101/GeoLiteCity-Blocks.csv, which has 3014818 IP ranges,
you will get 4125519 CIDRs - one third more.

The increase is due to IP allocations, as I have shown above,
and due to MaxMind errors - they may split a single CIDR into 3 ranges like:

10.0.0.1-10.0.0.1
10.0.0.2-10.0.0.254
10.0.0.254-10.0.0.255

This is somewhat obvious. I’m talking about the situation where a
small number of CIDRs produces multiple non-overlapping ranges, e.g.

10.0.0.0/8 1;
10.255.255.127/32 2;

will result in the following ranges:

10.0.0.0-10.255.255.126 1;
10.255.255.127-10.255.255.127 2;
10.255.255.128-10.255.255.255 1;

and this in turn will result in a huge number of CIDRs.
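
To put a number on it, a quick sketch with Python’s ipaddress module,
counting the CIDRs needed to re-encode the three ranges above:

import ipaddress

def cidr_count(first, last):
    nets = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last))
    return sum(1 for _ in nets)

print(cidr_count("10.0.0.0", "10.255.255.126"))        # 23
print(cidr_count("10.255.255.127", "10.255.255.127"))  # 1
print(cidr_count("10.255.255.128", "10.255.255.255"))  # 1

So the two original CIDRs become 25 once round-tripped through ranges:
a single /32 punched into a /8 costs dozens of extra CIDRs.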

takes about 250M on i386 (fortunately, the memory is shared between master
and workers on a VM copy-on-write basis).

Sounds good anyway. :)

Yesterday I investigated using ranges instead of CIDRs; the in-memory base
will take about 25M, like MaxMind’s. However, the memory footprint in top
will be the same, as modern malloc()s in FreeBSD (and probably Linux)
lazily free memory using madvise(MADV_FREE), and nginx uses a lot of
memory while handling the base on reconfiguration.

The search should be as fast as with a simple radix tree, maybe even
faster: the simple radix tree goes through a short loop but causes tens
of TLB and cache misses, while searching for a suitable range goes
through a longer loop but causes only a few TLB and cache misses.

This should be an interesting alternative for range-centric bases.
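
A rough Python sketch of that kind of range lookup, using a flat sorted
array and binary search over the range starts (nginx’s real
implementation is in C and its layout differs; the ranges are the ones
from the example above):

import bisect
import ipaddress

def ip(s):
    return int(ipaddress.ip_address(s))

# Non-overlapping ranges sorted by start address: (start, end, value).
ranges = [
    (ip("10.0.0.0"),       ip("10.255.255.126"), 1),
    (ip("10.255.255.127"), ip("10.255.255.127"), 2),
    (ip("10.255.255.128"), ip("10.255.255.255"), 1),
]
starts = [r[0] for r in ranges]

def lookup(addr):
    a = ip(addr)
    i = bisect.bisect_right(starts, a) - 1  # last range starting <= a
    if i >= 0 and a <= ranges[i][1]:
        return ranges[i][2]
    return None

print(lookup("10.255.255.127"))  # -> 2
print(lookup("10.1.2.3"))        # -> 1

One contiguous array probed log(n) times touches far fewer cache lines
than chasing 32 pointers through a trie, which is the point above.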

10.0.0.1-10.0.0.1 delete;
10.0.0.2-10.0.0.254 delete;
10.0.0.254-10.0.0.255 delete;
10.0.0.1-10.0.0.255 1;

As far as I understand, with CIDRs one anyway has to define an identical
CIDR to override an erroneous one, no? What’s wrong with the same
approach applied to ranges?

E.g.

10.0.0.2-10.0.0.254 1;

should be enough to correct the error in the above example, just as in

10.0.0.0/24 2;
10.0.0.1/32 1;
10.0.0.254/31 1;

it’s enough to add

10.0.0.0/24 1;

The only problem I see with ranges is that if somebody adds
something like

10.0.0.2-10.0.0.2 1;

to the original database, they will probably also modify
10.0.0.2-10.0.0.254 to be 10.0.0.3-10.0.0.254, and private
modifications will in turn require further modifications.
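
A small Python sketch of the overriding semantics proposed here, where a
later range simply clips whatever it overlaps (bare integers stand in
for IPs to keep it short):

def override(ranges, new):
    # Apply (start, end, value), clipping any existing overlapping ranges.
    ns, ne, nv = new
    out = []
    for s, e, v in ranges:
        if e < ns or s > ne:        # no overlap: keep untouched
            out.append((s, e, v))
            continue
        if s < ns:                  # keep the piece below the new range
            out.append((s, ns - 1, v))
        if e > ne:                  # keep the piece above the new range
            out.append((ne + 1, e, v))
    out.append((ns, ne, nv))
    return sorted(out)

# The erroneous base from the example (last octet only):
base = [(1, 1, 1), (2, 254, 2), (254, 255, 1)]
print(override(base, (2, 254, 1)))
# [(1, 1, 1), (2, 254, 1), (255, 255, 1)]

With these semantics the single line 10.0.0.2-10.0.0.254 1; is indeed
enough to correct the base.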

Maxim D.

On Sun, Nov 23, 2008 at 12:05:48AM +0300, Maxim D. wrote:

The binary file, however, is much smaller.
Theoretically it should be possible to collapse the IP ranges into an
optimal set of CIDRs to make this usable with the native nginx geo
module, but this isn’t really an easy task.

No, single IP allocations may be equal to several CIDRs;
for example, some time ago I saw this IP range:

inetnum: 94.25.31.248 - 94.25.43.251

that is equal to 10 CIDRs:

94.25.31.248/29
94.25.32.0/21
94.25.40.0/23
94.25.42.0/24
94.25.43.0/25
94.25.43.128/26
94.25.43.192/27
94.25.43.224/28
94.25.43.240/29
94.25.43.248/30
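
That decomposition is easy to verify with Python’s ipaddress module:

import ipaddress

for net in ipaddress.summarize_address_range(
        ipaddress.ip_address("94.25.31.248"),
        ipaddress.ip_address("94.25.43.251")):
    print(net)   # prints exactly the 10 CIDRs listed above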

And if you convert the MaxMind GeoCity base file
GeoLiteCity_20081101/GeoLiteCity-Blocks.csv, which has 3014818 IP ranges,
you will get 4125519 CIDRs - one third more.

The increase is due to IP allocations, as I have shown above,
and due to MaxMind errors - they may split a single CIDR into 3 ranges like:

10.0.0.1-10.0.0.1
10.0.0.2-10.0.0.254
10.0.0.254-10.0.0.255

Has anybody been able to use the GeoIP City database with nginx? For
Apache, MaxMind provides mod_geoip, which works on the binary file,
making it very fast.

Does anyone have any solution (like mod_geoip) for nginx? I’m using PECL
geoip for PHP and the equivalent for Ruby, but I feel a geo lookup at the
server level would be much faster.

Last week I sped up the loading of huge geo bases (like MaxMind’s);
this will be in 0.7.23. However, the memory footprint is large: the
MaxMind base takes about 250M on i386 (fortunately, the memory is shared
between master and workers on a VM copy-on-write basis).

Yesterday I investigated using ranges instead of CIDRs; the in-memory base
will take about 25M, like MaxMind’s. However, the memory footprint in top
will be the same, as modern malloc()s in FreeBSD (and probably Linux)
lazily free memory using madvise(MADV_FREE), and nginx uses a lot of
memory while handling the base on reconfiguration.

The search should be as fast as with a simple radix tree, maybe even
faster: the simple radix tree goes through a short loop but causes tens
of TLB and cache misses, while searching for a suitable range goes
through a longer loop but causes only a few TLB and cache misses.

The only inconvenient thing with ranges is overriding ranges to correct
errors in an external base. For example, to correct

10.0.0.1-10.0.0.1 1;
10.0.0.2-10.0.0.254 2;
10.0.0.254-10.0.0.255 1;

something like this should be used:

10.0.0.1-10.0.0.1 delete;
10.0.0.2-10.0.0.254 delete;
10.0.0.254-10.0.0.255 delete;
10.0.0.1-10.0.0.255 1;
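
A Python sketch of that delete-then-redefine workflow (the range list
and helper are hypothetical, not nginx internals):

def delete_range(ranges, start, end):
    # Remove the range that exactly matches (start, end).
    ranges.remove(next((s, e, v) for s, e, v in ranges
                       if (s, e) == (start, end)))

# Erroneous base (last octet only, for brevity):
base = [(1, 1, 1), (2, 254, 2), (254, 255, 1)]
delete_range(base, 1, 1)
delete_range(base, 2, 254)
delete_range(base, 254, 255)
base.append((1, 255, 1))
print(base)   # [(1, 255, 1)]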