How to separate robot access log and human access log

Hi all,

I'm trying to separate the robot access log from the human access log, so
I'm using the configuration below:

http {

    map $http_user_agent $ifbot {
        default 0;
        "~*rogerbot" 3;
        "~*ChinasoSpider" 3;
        "~*Yahoo" 1;
        "~*Bot" 1;
        "~*Spider" 1;
        "~*archive" 1;
        "~*search" 1;
        "~Mediapartners-Google" 1;
        "~*bingbot" 1;
        "~*YandexBot" 1;
        "~*Feedly" 2;
        "~*Superfeedr" 2;
        "~*QuiteRSS" 2;
        "~*g2reader" 2;
        "~*Digg" 2;
        "~*trendiction" 3;
        "~*AhrefsBot" 3;
        "~*curl" 3;
        "~*Ruby" 3;
        "~*Player" 3;
        "~*Go\ http\ package" 3;
        "~*Lynx" 3;
        "~*Sleuth" 3;
        "~*Python" 3;
        "~*Wget" 3;
        "~*perl" 3;
        "~*httrack" 3;
        "~*JikeSpider" 3;
        "~*PHP" 3;
        "~*WebIndex" 3;
        "~*magpie-crawler" 3;
        "~*JUC" 3;
        "~*Scrapy" 3;
        "~*libfetch" 3;
        "~*WinHTTrack" 3;
        "~*htmlparser" 3;
        "~*urllib" 3;
        "~*Zeus" 3;
        "~*scan" 3;
        "~*Indy\ Library" 3;
        "~*libwww-perl" 3;
        "~*GetRight" 3;
        "~*GetWeb!" 3;
        "~*Go!Zilla" 3;
        "~*Go-Ahead-Got-It" 3;
        "~*Download\ Demon" 3;
        "~*TurnitinBot" 3;
        "~*WebscanSpider" 3;
        "~*WebBench" 3;
        "~*YisouSpider" 3;
        "~*check_http" 3;
        "~*webmeup-crawler" 3;
        "~*omgili" 3;
        "~*blah" 3;
        "~*fountainfo" 3;
        "~*MicroMessenger" 3;
        "~*QQDownload" 3;
        "~*shoulu.jike.com" 3;
        "~*omgilibot" 3;
        "~*pyspider" 3;
    }

}

And in the server block, I'm using:

if ($ifbot = "1") {
    set $spiderbot 1;
}
if ($ifbot = "2") {
    set $rssbot 1;
}
if ($ifbot = "3") {
    return 403;
    access_log /web/log/badbot.log main;
}

access_log /web/log/location_access.log main;
access_log /web/log/spider_access.log main if=$spiderbot;
access_log /web/log/rssbot_access.log main if=$rssbot;

But it seems that nginx still writes some robot requests into both
location_access.log and spider_access.log.

How can I separate the robot logs from the human logs?

Another question: some robot requests are not logged to spider_access.log
but do show up in location_access.log. It seems that my map is not working.
Is there anything wrong with how I define the map?


On Mon, Apr 27, 2015 at 07:45:17PM -0400, meteor8488 wrote:

Hi there,

> I'm trying to separate the robot access log from the human access log, so
> I'm using the configuration below:

“if=” on the access_log line is what you want.

> access_log /web/log/location_access.log main;

“No ‘if=’ there” means “log all requests to this file”. (Unless overridden
later.)

> access_log /web/log/spider_access.log main if=$spiderbot;
> access_log /web/log/rssbot_access.log main if=$rssbot;

> But it seems that nginx still writes some robot requests into both
> location_access.log and spider_access.log.

Do you mean “some” or “all”?

> How can I separate the robot logs from the human logs?

If you want /web/log/location_access.log to only log some requests,
add an “if=” to mark the requests that you want logged.
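
For example, something like this (an untested sketch; $humanlog is just a
placeholder name here, you would need to set it yourself, for instance with
another map keyed on your $ifbot variable):

# a line is written only when $humanlog is not "0" and not empty
access_log /web/log/location_access.log main if=$humanlog;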

> Another question: some robot requests are not logged to spider_access.log
> but do show up in location_access.log. It seems that my map is not working.
> Is there anything wrong with how I define the map?

Example?

Anything written to /web/log/rssbot_access.log would match that
description, but I guess that’s not what you mean.

f

Francis D. [email protected]

Thanks for your reply.

I know that I can use "if=" to enable conditional logging.
But what I want to do is:

if $spiderbot=0, then log to location_access.log
if $spiderbot=1, then log to spider_access.log.

And I don't want the same requests logged to more than one file.
How can I do that?

thanks


On Tue, May 05, 2015 at 08:20:35AM -0400, meteor8488 wrote:

Hi there,

> if $spiderbot=0, then log to location_access.log

Set a variable which is non-zero when $spiderbot=0, and which is zero
or blank otherwise. Use that as the access_log if=$variable for
location_access.log.
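
One way to get such a variable is another map, placed next to your existing
one in the http block. An untested sketch, keyed off your $ifbot variable
(which, unlike $spiderbot, is always defined); $humanlog is just a name I
made up:

map $ifbot $humanlog {
    default 0;   # anything classified as a bot: skip location_access.log
    0       1;   # "human" user agents ($ifbot is 0): log them
}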

> if $spiderbot=1, then log to spider_access.log.

Set a variable which is non-zero when $spiderbot=1, and which is zero
or blank otherwise. ($spiderbot is probably perfect for this as-is.) Use
that as the access_log if=$variable for spider_access.log.

> And I don't want the same requests logged to more than one file.

For each loggable request, make sure that exactly one of your if=$variable
variables is non-zero.
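
Continuing the sketch above (again untested, and the names are only
examples), the access_log lines in your server block would become:

access_log /web/log/location_access.log main if=$humanlog;
access_log /web/log/spider_access.log main if=$spiderbot;
access_log /web/log/rssbot_access.log main if=$rssbot;

With $humanlog mapped from $ifbot, every request that is not rejected with
403 should have exactly one of these three variables non-zero, so each
request should end up in only one file.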

f

Francis D. [email protected]