On Sun, Nov 23, 2008 at 12:05:48AM +0300, Maxim D. wrote:
The binary file is much smaller however.
optimal set of CIDRs to make this usable with the native nginx geo
module, but this isn't really an easy task.
No, a single IP allocation may be equal to several CIDRs.
For example, some time ago I saw this IP range:
inetnum: 94.25.31.248 - 94.25.43.251
that is equal to 10 CIDRs:
94.25.31.248/29
94.25.32.0/21
94.25.40.0/23
94.25.42.0/24
94.25.43.0/25
94.25.43.128/26
94.25.43.192/27
94.25.43.224/28
94.25.43.240/29
94.25.43.248/30
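
The decomposition above can be reproduced with a minimal sketch like the
one below. It assumes Python's standard ipaddress module; the helper
name range_to_cidrs is hypothetical and is not part of nginx or any
MaxMind tool.

```python
import ipaddress


def range_to_cidrs(first, last):
    """Return the minimal list of CIDR blocks covering first..last inclusive."""
    start = ipaddress.IPv4Address(first)
    end = ipaddress.IPv4Address(last)
    # summarize_address_range yields the smallest set of networks
    # that exactly covers the inclusive address range.
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]


# The inetnum range quoted above.
cidrs = range_to_cidrs("94.25.31.248", "94.25.43.251")
for cidr in cidrs:
    print(cidr)
```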
And if you convert the MaxMind GeoCity base file
GeoLiteCity_20081101/GeoLiteCity-Blocks.csv, which has 3014818 IP
ranges, you will get 4125519 CIDRs - about one third more.
The increase is due to IP allocations like the one I showed above
and to MaxMind errors - they may split a single CIDR into 3 ranges,
such as:
10.0.0.1-10.0.0.1
10.0.0.2-10.0.0.254
10.0.0.254-10.0.0.255
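
One way to limit this kind of inflation is to coalesce overlapping or
adjacent ranges before converting them. The sketch below is only an
illustration: merge_ranges and count_cidrs are hypothetical helpers, and
merging is valid only under the assumption that the merged ranges map to
the same value.

```python
import ipaddress


def merge_ranges(ranges):
    """Coalesce overlapping or directly adjacent (first, last) string pairs."""
    as_ints = sorted(
        (int(ipaddress.IPv4Address(a)), int(ipaddress.IPv4Address(b)))
        for a, b in ranges
    )
    merged = []
    for lo, hi in as_ints:
        # Extend the previous range if this one overlaps or touches it.
        if merged and lo <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return [
        (str(ipaddress.IPv4Address(lo)), str(ipaddress.IPv4Address(hi)))
        for lo, hi in merged
    ]


def count_cidrs(first, last):
    """Number of CIDR blocks needed to cover first..last inclusive."""
    nets = ipaddress.summarize_address_range(
        ipaddress.IPv4Address(first), ipaddress.IPv4Address(last)
    )
    return sum(1 for _ in nets)


# The three ranges from the example above (values assumed identical).
split = [
    ("10.0.0.1", "10.0.0.1"),
    ("10.0.0.2", "10.0.0.254"),
    ("10.0.0.254", "10.0.0.255"),
]
merged = merge_ranges(split)
print(merged)
print(sum(count_cidrs(a, b) for a, b in split), "CIDRs before merging")
print(sum(count_cidrs(a, b) for a, b in merged), "CIDRs after merging")
```

Converted separately, the three ranges need 15 CIDR blocks; merged into
10.0.0.1-10.0.0.255 they need 8.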
Has anybody been able to use the geo-city database with nginx? For
Apache, MaxMind provides mod_geoip, which works on the binary file,
making it very fast.
Does anyone have a solution (like mod_geoip) for nginx? I'm using the
PECL geoip extension for PHP and the one for Ruby, but I feel that geo
lookup at the server level would be much faster.
Last week I sped up loading of huge geo bases (like MaxMind's one);
this will be in 0.7.23. However, the memory footprint is large: the
MaxMind base takes about 250M on i386 (fortunately, the memory is
shared between the master and workers on a VM copy-on-write basis).
Yesterday I investigated using ranges instead of CIDRs: the in-memory
base will take about 25M, as MaxMind's one does. However, the memory
footprint shown in top will stay the same, because modern malloc()s in
FreeBSD and probably Linux free memory lazily using
madvise(MADV_FREE), and nginx uses a lot of memory while handling the
base on reconfiguration.
The search should be as fast as a simple radix tree, maybe even faster:
the simple radix tree goes through a short loop, but it causes tens of
TLB and cache misses, while searching for a suitable range goes through
a longer loop but causes only several TLB and cache misses.
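
To make the range lookup concrete, here is a minimal sketch that keeps
sorted, non-overlapping ranges in flat arrays and finds the candidate
with a binary search. The binary search is an assumption made for
illustration; the message does not say how nginx searches its ranges,
and the RangeTable class is hypothetical. Python cannot show the TLB
and cache behaviour, only the shape of the lookup.

```python
import bisect
import ipaddress


class RangeTable:
    """Sorted, non-overlapping IP ranges stored in flat parallel lists."""

    def __init__(self, entries):
        # entries: iterable of (first_ip, last_ip, value) tuples.
        rows = sorted(
            (int(ipaddress.IPv4Address(a)), int(ipaddress.IPv4Address(b)), v)
            for a, b, v in entries
        )
        self.starts = [r[0] for r in rows]
        self.ends = [r[1] for r in rows]
        self.values = [r[2] for r in rows]

    def lookup(self, ip, default=None):
        addr = int(ipaddress.IPv4Address(ip))
        # Index of the last range whose start is <= addr, or -1 if none.
        i = bisect.bisect_right(self.starts, addr) - 1
        if i >= 0 and addr <= self.ends[i]:
            return self.values[i]
        return default


table = RangeTable([
    ("94.25.31.248", "94.25.43.251", "A"),
    ("10.0.0.1", "10.0.0.255", "B"),
])
print(table.lookup("94.25.40.7"))
print(table.lookup("10.0.0.0"))
```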
The only awkward thing with ranges is overriding them to correct errors
in an external base. For example, to correct
10.0.0.1-10.0.0.1 1;
10.0.0.2-10.0.0.254 2;
10.0.0.254-10.0.0.255 1;
something like this should be used:
10.0.0.1-10.0.0.1 delete;
10.0.0.2-10.0.0.254 delete;
10.0.0.254-10.0.0.255 delete;
10.0.0.1-10.0.0.255 1;
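
The effect of such an override list can be modelled with a small
sketch. It assumes that entries are applied in order and that "delete"
removes a range only when its bounds match exactly; both are
assumptions about the semantics, not a description of the nginx
implementation, and apply_entries is a hypothetical helper.

```python
def apply_entries(entries):
    """Apply (first, last, value) entries in order; "delete" removes a range."""
    table = {}
    for first, last, value in entries:
        key = (first, last)
        if value == "delete":
            # Assumed semantics: exact-match removal, ignored if absent.
            table.pop(key, None)
        else:
            table[key] = value
    return table


entries = [
    # External base with the split ranges.
    ("10.0.0.1", "10.0.0.1", "1"),
    ("10.0.0.2", "10.0.0.254", "2"),
    ("10.0.0.254", "10.0.0.255", "1"),
    # Local overrides.
    ("10.0.0.1", "10.0.0.1", "delete"),
    ("10.0.0.2", "10.0.0.254", "delete"),
    ("10.0.0.254", "10.0.0.255", "delete"),
    ("10.0.0.1", "10.0.0.255", "1"),
]
result = apply_entries(entries)
print(result)
```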