GeoIP: any way to pull out lat/lon?

Hi guys

I’ve just got the geo module working, and I’ve geo2nginx.pl’d the
maxmind geolite country data which is working great. However, for our
application I need to get more information (specifically a lat/lon) back
from nginx to pass to PHP.

Any way I can do this? Or is the geo module too simple for this task?

Phillip B Oldham
The Activity People
[email protected] mailto:[email protected]


Policies

This e-mail and its attachments are intended for the above named
recipient(s) only and may be confidential. If they have come to you in
error, please reply to this e-mail and highlight the error. No action
should be taken regarding content, nor must you copy or show them to
anyone.

This e-mail has been created in the knowledge that Internet e-mail is
not a 100% secure communications medium, and we have taken steps to
ensure that this e-mail and attachments are free from any virus. We must
advise that in keeping with good computing practice the recipient should
ensure they are completely virus free, and that you understand and
observe the lack of security when e-mailing us.

On Tue, May 06, 2008 at 04:06:06PM +0100, Phillip B Oldham wrote:

> I’ve just got the geo module working, and I’ve geo2nginx.pl’d the
> maxmind geolite country data which is working great. However, for our
> application I need to get more information (specifically a lat/lon) back
> from nginx to pass to PHP.
>
> Any way I can do this? Or is the geo module too simple for this task?

The geo module simply maps an IP address to a string, and you may use any
string as the value. You will probably need to modify geo2nginx.pl so that
it emits the lat/lon.
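As a rough illustration of that suggestion, here is a minimal Python sketch (not a patch to geo2nginx.pl itself) that turns IP ranges with coordinates into geo.conf-style lines. The input layout (start IP, end IP, lat, lon) and the "lat,lon" value format are assumptions made for the example; adapt them to whatever your MaxMind export actually contains.

```python
import ipaddress

def range_to_geo_lines(start_ip, end_ip, lat, lon):
    """Return 'CIDR value;' lines covering start_ip..end_ip.

    The 'lat,lon' value format is an assumption for this sketch;
    the geo module treats the value as an opaque string.
    """
    first = ipaddress.IPv4Address(start_ip)
    last = ipaddress.IPv4Address(end_ip)
    value = f"{lat},{lon}"
    # An arbitrary range may need several CIDR blocks to be covered exactly.
    return [f"{net} {value};"
            for net in ipaddress.summarize_address_range(first, last)]

# Hypothetical sample rows using documentation address ranges.
rows = [
    ("192.0.2.0", "192.0.2.255", "51.50", "-0.12"),
    ("198.51.100.0", "198.51.100.127", "48.85", "2.35"),
]
for row in rows:
    for line in range_to_geo_lines(*row):
        print(line)
```

The application then only has to split the variable's value on the comma to recover the two coordinates.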

Looking into that further, the data I’ve got for lat/lon at city level
adds up to 120MB. I could convert this to a geo.conf file using
geo2nginx.pl, but what sort of impact would this have on nginx running?
Would each child process take up 120MB of RAM? Would nginx slow down for
each request having to work through such a large data set?

Igor S. wrote:

> The geo module simply maps ip to some string. You may use any strings.
> Probably you need to modify geo2nginx.pl to process lat/lon.


I’ve not created the geo.conf file yet from the city data, but will do
so today and do some checks.

My main worry is that nginx is running on a virtual machine with only
300MB of ram. So if I’ve got a file around the 100MB mark and I’ve one
child, wouldn’t that mean 200MB of memory is being taken up? Even 100MB
of memory is a lot just to get a lat/lon of the visitor.


On Wed, May 07, 2008 at 03:59:06PM +0100, Phillip B Oldham wrote:

> Looking into that further, the data I’ve got for lat/lon at city level
> adds up to 120Mb. I could convert this to a geo.conf file using
> geo2nginx.pl, but what sort of impact would this have on nginx running?
> Would each child process take up 120Mb ram? Would nginx slow down for
> each request having to work through such a large data set?

We are using a geo file with 141240 lines:

wc geo.conf
141240 282480 2979471 geo.conf

Could you show yours? Also, could you show a couple of lines from the file?

Performance should not depend on file size if you have enough memory.
For the geo map, workers use the same memory inherited from the parent
process on a copy-on-write basis. As there are no writes to this memory,
it stays shared. Also, duplicate values are stored only once.

What does

awk '{print $2}' geo.conf | sort | uniq | wc -l

show?
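For anyone who prefers to check this without the shell pipeline, a minimal Python sketch of the same measurement follows: it counts the entries and the distinct values in the second column. The function name is made up for the example, and unlike awk it skips lines with fewer than two fields.

```python
def geo_stats(lines):
    """Return (entry_count, unique_value_count) for geo.conf-style lines.

    Each line is expected to look like 'NETWORK VALUE;'. Lines with fewer
    than two whitespace-separated fields are skipped (an assumption of
    this sketch; awk would count them as an empty value).
    """
    total = 0
    values = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        total += 1
        values.add(fields[1])
    return total, len(values)

# Hypothetical sample: three networks, two distinct values.
sample = [
    "192.0.2.0/24 AU;",
    "198.51.100.0/24 CN;",
    "203.0.113.0/24 AU;",
]
print(geo_stats(sample))
```

To run it on a real file, pass an open file object, for example `geo_stats(open("geo.conf"))`. The gap between the two numbers shows how much the deduplication of values can save.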

Ok, I’ve done part of the conversion, and here’s what I get:

wc -l geo.conf

3937100 geo.conf

Which is a little larger than the 141240 you’re using. Will such a large
file slow nginx down?


On Thu, May 08, 2008 at 09:22:58AM +0100, Phillip B Oldham wrote:

> Ok, I’ve done part of the conversion, and here’s what I get:
>
> wc -l geo.conf
>
> 3937100 geo.conf
>
> Which is a little larger than the 141240 you’re using. Will such a large
> file slow nginx down?

As I understand it, it should not. nginx uses a radix tree for geo: the
first 6 or 7 bits can be resolved with a single TLB miss and up to 6 or 7
data cache misses. If most entries are /24 networks, then in the worst
case you will get 18 or 17 TLB and data cache misses.
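To make the reasoning concrete, here is a minimal Python sketch of a binary radix trie with longest-prefix lookup. It is not nginx's implementation, only an illustration of why the lookup cost is bounded by the prefix length (at most 32 steps for IPv4) rather than by the number of entries. The class and method names are invented for the example.

```python
import ipaddress

class Node:
    __slots__ = ("children", "value")

    def __init__(self):
        self.children = [None, None]  # child for bit 0 and for bit 1
        self.value = None

class GeoTrie:
    """Bit-at-a-time radix trie over IPv4 prefixes (illustrative only)."""

    def __init__(self):
        self.root = Node()

    def insert(self, cidr, value):
        net = ipaddress.IPv4Network(cidr)
        addr = int(net.network_address)
        node = self.root
        # Walk one level per prefix bit, creating nodes as needed.
        for i in range(net.prefixlen):
            bit = (addr >> (31 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = Node()
            node = node.children[bit]
        node.value = value

    def lookup(self, ip, default=None):
        addr = int(ipaddress.IPv4Address(ip))
        node = self.root
        best = node.value if node.value is not None else default
        # At most 32 steps, however many networks were inserted.
        for i in range(32):
            node = node.children[(addr >> (31 - i)) & 1]
            if node is None:
                break
            if node.value is not None:
                best = node.value  # remember the most specific match so far
        return best

trie = GeoTrie()
trie.insert("192.0.2.0/24", "51.50,-0.12")
trie.insert("192.0.2.128/25", "48.85,2.35")
print(trie.lookup("192.0.2.5"))
print(trie.lookup("192.0.2.200"))
```

A lookup for an address inside a /24 touches at most 24 nodes below the root, which matches the order of magnitude of the miss counts mentioned above.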


awk '{print $2}' geo.conf | sort | uniq | wc -l

119313

So am I right in thinking that nginx only stores each unique value with
a mapping to the ip range(s)? If so, 119313 isn’t too bad, and comes in
a shade under your list.


On Thu, May 08, 2008 at 09:31:27AM +0100, Phillip B Oldham wrote:

> awk '{print $2}' geo.conf | sort | uniq | wc -l
>
> 119313
>
> So am I right in thinking that nginx only stores each unique value with
> a mapping to the ip range(s)? If so, 119313 isn’t too bad, and comes in
> a shade under your list.

Yes, nginx stores unique values only.

But memory is also required for the radix tree itself.
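A rough way to get a feel for the tree's own footprint is to count how many nodes a plain binary trie would need for a given set of networks and multiply by a per-node size. The sketch below does that in Python. The 32-byte node size is only an assumption for illustration (four pointer-sized fields on a 64-bit machine), and nginx's real allocation may differ.

```python
import ipaddress

def count_trie_nodes(cidrs):
    """Count nodes of a one-bit-per-level trie holding the given prefixes."""
    nodes = set()
    for cidr in cidrs:
        net = ipaddress.IPv4Network(cidr)
        addr = int(net.network_address)
        # Every proper prefix of the network needs its own node on the path.
        for length in range(1, net.prefixlen + 1):
            nodes.add((length, addr >> (32 - length)))
    return len(nodes) + 1  # +1 for the root

def estimate_tree_bytes(cidrs, node_size=32):
    # node_size is an assumed figure, not taken from the nginx source.
    return count_trie_nodes(cidrs) * node_size

# Two adjacent /24 networks share the first 23 levels of the tree.
networks = ["192.0.2.0/24", "192.0.3.0/24"]
print(count_trie_nodes(networks), estimate_tree_bytes(networks))
```

Networks that share long prefixes share most of their path, so the node count grows more slowly than 24 nodes per /24 entry, but with millions of entries it can still be substantial.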



OK, I’ve merged the files I need to, and have generated a geo.conf. The
file size is huge: 140M! nginx actually fails to load on my virtual
machine which has 300M dedicated ram.

Unless anyone has any further suggestions, it looks as though I’ll have
to code up a plugin to query the maxmind .dat file for the information.


On Wed, May 14, 2008 at 02:03:35PM +0100, Phillip B Oldham wrote:

> OK, I’ve merged the files I need to, and have generated a geo.conf. The
> file size is huge: 140M! nginx actually fails to load on my virtual
> machine which has 300M dedicated ram.
>
> Unless anyone has any further suggestions, it looks as though I’ll have
> to code up a plugin to query the maxmind .dat file for the information.

You could probably use the MaxMind PHP plugin: it does not load the whole
database into memory, but looks entries up in the file at runtime.
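The general idea of looking things up in the file at request time, instead of loading everything, can be sketched as a binary search over fixed-width records. The Python below is only an illustration of that technique: the record layout (start IP, end IP, lat, lon) is invented for the example and is not the MaxMind .dat format, and the function names are hypothetical.

```python
import io
import ipaddress
import struct

# Assumed record layout: start IP, end IP (uint32), lat, lon (float32).
RECORD = struct.Struct(">IIff")

def write_records(fileobj, rows):
    """Write (start_ip, end_ip, lat, lon) rows sorted by start address."""
    for start, end, lat, lon in sorted(
            rows, key=lambda r: int(ipaddress.IPv4Address(r[0]))):
        fileobj.write(RECORD.pack(int(ipaddress.IPv4Address(start)),
                                  int(ipaddress.IPv4Address(end)),
                                  lat, lon))

def lookup(fileobj, ip):
    """Binary-search the file; only O(log n) records are ever read."""
    target = int(ipaddress.IPv4Address(ip))
    fileobj.seek(0, io.SEEK_END)
    lo, hi = 0, fileobj.tell() // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        fileobj.seek(mid * RECORD.size)
        start, end, lat, lon = RECORD.unpack(fileobj.read(RECORD.size))
        if target < start:
            hi = mid
        elif target > end:
            lo = mid + 1
        else:
            return lat, lon
    return None

# An in-memory buffer stands in for a file opened with open(path, "rb").
db = io.BytesIO()
write_records(db, [
    ("198.51.100.0", "198.51.100.255", 48.5, 2.25),
    ("192.0.2.0", "192.0.2.255", 51.5, -0.125),
])
print(lookup(db, "192.0.2.77"))
print(lookup(db, "203.0.113.1"))
```

Memory use stays constant regardless of database size, at the cost of a few seeks and reads per lookup, which the operating system's page cache usually absorbs for frequently hit regions.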
