Logging inconsistencies during apparent DoS

John_Barratt · July 26, 2008, 8:55am

Hi,
We have been having problems with an apparent SYN-flood DoS attack.
However, there are inconsistencies in the resulting nginx log entries
that, along with the environment it is in, make me wonder whether it
really is a DoS attack, or whether something else is going wrong.

We are running nginx 0.6.31 on OSX 10.5 Server. Details of the problem
go something like this :

We would see hundreds of active connections start to pile up on nginx,
to the point where the site would become un-responsive.
It appears that one of the internal buffers had filled up at times,
though this was not exactly matched to the site being un-responsive.
The warning in the error log was :

2008/07/26 07:29:55 [warn] 36499#0: *15481336 kqueue change list is
filled up, client: 210.8.118.47, server: www.host.com, request: "GET
/index.html HTTP/1.1", host: "www.host.com", referrer:
"http://www.host.com/"

At the same time as this was happening we would be getting a bunch of
400’s from reserved, or unroutable IPs being logged eg :

248.48.44.0 - - [26/Jul/2008:08:05:03 +1000] "-" 400 0 "-" "-" "-"

We would get hundreds of these logged per minute; I am assuming many
more didn’t get a chance to be logged, as they were waiting for a timeout.
Mixed in amongst these requests though were occasional requests that
looked like perfectly valid redirects, except for the fact that they had
the same invalid IP as above (248.48.44.0), and there was a zero byte
response reported, eg :

248.240.43.0 host.com - [05/Jul/2008:06:26:38 +1000] "GET / HTTP/1.1"
301 0 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1;
.NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)" "-"

Normal redirects typically record a 178 byte response size.
These requests seem to form a valid part of a user browsing the site,
as at similar times there were often other valid 200, and 301 requests
logged that had matching user agents and sometimes cookies, and with a
different, valid IP address.
Also of note is that that particular IP address is explicitly blocked
upstream via the range it is in, and also nginx is behind a load
balancer that should only pass off connections that have already been
established. So un-routable connections should not be able to get
through.
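
For anyone wanting to quantify entries like the ones above, here is a minimal sketch that counts access-log lines per minute whose client address falls in the reserved 240.0.0.0/4 block. The field layout in the regex is an assumption inferred from the two sample entries in this post, and the function name and SAMPLE list are made up for illustration; adjust the pattern to the actual log_format in use.

```python
import ipaddress
import re
from collections import Counter

# 240.0.0.0/4 is the reserved (former "Class E") IPv4 block.
RESERVED = ipaddress.ip_network("240.0.0.0/4")

# Assumed layout, inferred from the sample entries in this thread:
# ip, two fields, [time], "request", status, bytes, "referer", "agent", ...
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def reserved_hits_per_minute(lines):
    """Count entries from reserved addresses, keyed by (minute, status)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # line is not in the assumed format
        try:
            addr = ipaddress.ip_address(m.group("ip"))
        except ValueError:
            continue  # first field is not an IP address
        if addr in RESERVED:
            minute = m.group("time")[:17]  # e.g. "26/Jul/2008:08:05"
            counts[(minute, m.group("status"))] += 1
    return counts


# The two entries quoted in this post, each joined onto a single line.
SAMPLE = [
    '248.48.44.0 - - [26/Jul/2008:08:05:03 +1000] "-" 400 0 "-" "-" "-"',
    '248.240.43.0 host.com - [05/Jul/2008:06:26:38 +1000] "GET / HTTP/1.1" '
    '301 0 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; '
    '.NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)" "-"',
]

if __name__ == "__main__":
    for (minute, status), n in sorted(reserved_hits_per_minute(SAMPLE).items()):
        print(minute, status, n)
```

Keying on (minute, status) keeps the bare 400s separate from the zero-byte 301s, which is the distinction this post is asking about.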

Any comments/thoughts/suggestions very much appreciated as to whether
this is indeed perhaps a problem with nginx and/or somewhere upstream.

Thanks,

John Barratt.

John_Barratt · July 26, 2008, 2:41pm

Hi!

Every modern operating system, including Linux, the *BSDs and a couple of
other Unix-like systems, has SYN cookies to avoid the situation where
somebody floods your server with SYN-only packets, starting thousands of
webserver processes.

http://cr.yp.to/syncookies.html

On Linux:

echo 1 > /proc/sys/net/ipv4/tcp_syncookies

On FreeBSD:

sysctl -w net.inet.tcp.syncookies=1

I don’t know Mac OS that well, but try searching for something
like this with sysctl -a | grep syn; there is probably an equivalent
sysctl.
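
As a rough sketch of how the Linux setting above could be checked from a script rather than by hand, the snippet below reads the same procfs entry. The function name is made up for illustration, and the path only exists on Linux, so on other systems it simply reports that the knob was not found.

```python
from pathlib import Path

# Linux procfs entry for SYN cookies, the same file the echo command writes to.
DEFAULT_PATH = "/proc/sys/net/ipv4/tcp_syncookies"


def syncookies_enabled(path=DEFAULT_PATH):
    """Return True/False for the setting, or None if the entry is missing."""
    entry = Path(path)
    if not entry.exists():
        return None  # not Linux, or the kernel does not expose this knob
    # Any non-zero value means SYN cookies are turned on in some form.
    return entry.read_text().strip() != "0"


if __name__ == "__main__":
    state = syncookies_enabled()
    if state is None:
        print("tcp_syncookies entry not found on this system")
    else:
        print("SYN cookies enabled" if state else "SYN cookies disabled")
```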

Regards,
Istvan

John_Barratt · July 27, 2008, 12:27am

Hi Istvan,
Thanks for your response. Sorry, I perhaps should have been clearer in
saying it was an apparent attack. It looked like a SYN attack, but
there were a number of inconsistencies that indicated it was actually
something else, and potentially a problem within nginx.

Istvan Szukacs wrote:

Every modern operating system, including Linux, the *BSDs and a couple of
other Unix-like systems, has SYN cookies to avoid the situation where
somebody floods your server with SYN-only packets, starting thousands of
webserver processes.

http://cr.yp.to/syncookies.html

The servers affected by this problem are behind a firewall, and a load
balancer that should be preventing these getting through. The load
balancer will only hand off a connection to the web server after a full
SYN/SYN-ACK/ACK handshake.

sysctl -w net.inet.tcp.syncookies=1

I don’t know Mac OS that well, but try searching for something
like this with sysctl -a | grep syn; there is probably an equivalent
sysctl.

This particular variable doesn’t exist on OS X 10.5 (nor does anything
like it), so I am not sure how it would handle this situation.

The main inconsistency is the fact that we are seeing logged requests
that would have had to have a full TCP connection established, from an
unroutable (reserved space) address. eg :

248.240.43.0 host.com - [05/Jul/2008:06:26:38 +1000] "GET / HTTP/1.1"
301 0 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1;
.NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)" "-"

So either nginx is logging incorrect information (and hence perhaps
pointing to a broader problem), or the load balancer is badly
translating connections through from the virtual IP.
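
One way to gather evidence for either explanation is to look for user agents that show up both from a reserved address and from an ordinary one, as described in the first post. The sketch below does that; it assumes the same access-log layout as the entry quoted above, and the function name, the agent string and the sample lines are hypothetical placeholders, not real log data.

```python
import ipaddress
import re

# Reserved (former "Class E") IPv4 block.
RESERVED = ipaddress.ip_network("240.0.0.0/4")

# Assumed layout: ip, two fields, [time], "request", status, bytes,
# "referer", "agent". Only the address and the agent are captured.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \d+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def agents_seen_from_both(lines):
    """Return user agents logged from both reserved and other addresses."""
    from_reserved, from_other = set(), set()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None or m.group("agent") == "-":
            continue  # unparsable line, or no user agent to compare
        try:
            addr = ipaddress.ip_address(m.group("ip"))
        except ValueError:
            continue
        bucket = from_reserved if addr in RESERVED else from_other
        bucket.add(m.group("agent"))
    return from_reserved & from_other


# Hypothetical sample lines, for illustration only.
SAMPLE = [
    '248.240.43.0 host.com - [05/Jul/2008:06:26:38 +1000] '
    '"GET / HTTP/1.1" 301 0 "-" "ExampleBrowser/1.0" "-"',
    '203.0.113.10 host.com - [05/Jul/2008:06:26:40 +1000] '
    '"GET /index.html HTTP/1.1" 200 5120 "-" "ExampleBrowser/1.0" "-"',
    '203.0.113.11 host.com - [05/Jul/2008:06:27:01 +1000] '
    '"GET / HTTP/1.1" 200 5120 "-" "OtherBrowser/2.0" "-"',
]

if __name__ == "__main__":
    for agent in sorted(agents_seen_from_both(SAMPLE)):
        print(agent)
```

A shared user agent alone proves little, since common browsers produce identical strings; matching on cookies or tightening the time window would make any overlap more meaningful.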

I will add in extra firewalling at the individual hosts in an attempt to
help prevent this. However I am reluctant to do this without first
making sure we aren’t just covering up a larger problem, that will come
back to bite us soon.

Thanks again,

JB.

John_Barratt · July 27, 2008, 8:05am

On Sat, Jul 26, 2008 at 04:45:56PM +1000, John Barratt wrote:

It appears that one of the internal buffers had filled up at times,
though this was not exactly matched to the site being un-responsive.
The warning in the error log was :

2008/07/26 07:29:55 [warn] 36499#0: *15481336 kqueue change list is
filled up, client: 210.8.118.47, server: www.host.com, request: "GET
/index.html HTTP/1.1", host: "www.host.com", referrer:
"http://www.host.com/"

This is not an error, just a warning indicating peak load.
You may also have other kernel network buffers filling up.

It seems that Mac OS X has neither FreeBSD’s syncache nor its syncookies.

[...]
the same invalid IP as above (248.48.44.0), and there was a zero byte
[...]
logged that had matching user agents and sometimes cookies, and with a
different, valid IP address.

Also of note is that that particular IP address is explicitly blocked
upstream via the range it is in, and also nginx is behind a load
balancer that should only pass off connections that have already been
established. So un-routable connections should not be able to get through.

Any comments/thoughts/suggestions very much appreciated as to whether
this is indeed perhaps a problem with nginx and/or somewhere upstream.

If you have not set the set_real_ip_from directive, then nginx logs the
IP that it gets from the accept() syscall.
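
To illustrate what "the IP from accept()" means, here is a small self-contained sketch on the loopback interface. It is not nginx code and the function name is made up; it only shows that accept() hands back the source address of the TCP connection as the kernel saw it, so behind a load balancer the value depends on what source address the balancer presents.

```python
import socket


def accepted_peer_address():
    """Open a loopback connection; return (peer from accept(), client's own address)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    client = socket.create_connection(("127.0.0.1", port))
    conn, peer = server.accept()  # peer is the (ip, port) of the TCP source
    try:
        return peer, client.getsockname()
    finally:
        conn.close()
        client.close()
        server.close()


if __name__ == "__main__":
    peer, local = accepted_peer_address()
    print("accept() reported:", peer)
    print("client socket was:", local)
```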

You may set debug logging for request from these addresses:

events {
    debug_connection 240.0.0.0/4;