A bit confused

I’m trying to make some sense out of this and am left a bit cold! What
could cause this:

(I’ve left out any attempt at anonymising in case I hide something.)

From the docroot…

$ ls -l images/models/Lapierre/Overvolt*
-rw-r--r-- 1 right-bike right-bike 342373 Jun 11 20:09 images/models/Lapierre/Overvolt FS.png
-rw-r--r-- 1 right-bike right-bike 318335 Jun 11 20:09 images/models/Lapierre/Overvolt HT.png

$ curl -I right.bike
FS.png
HTTP/1.1 200 OK
Server: nginx/1.9.1
Date: Fri, 12 Jun 2015 01:47:14 GMT
Content-Type: image/png
Last-Modified: Thu, 11 Jun 2015 10:09:52 GMT
ETag: "55795e70-53965"
Expires: Sat, 13 Jun 2015 01:47:14 GMT
Cache-Control: max-age=86400
Accept-Ranges: bytes
Content-Length: 342373
Connection: Keep-Alive

$ curl -I right.bike
HT.png
HTTP/1.1 400 Bad Request
Server: nginx/1.9.1
Date: Fri, 12 Jun 2015 01:47:05 GMT
Content-Type: text/html
Content-Length: 172
Connection: close

The second one shows no entry at all in the access log, but I can’t find
any reason why they’re processed differently.

Suggestions please!


Steve H. BSc(Hons) MIITP

Linkedin: http://www.linkedin.com/in/steveholdoway
Skype: sholdowa

Just a quick addition… I’ve tried it from this office, which is IPv4,
and from IPv6-enabled locations. This makes no difference.

On 12/06/15 13:50, steve wrote:


Interesting. I tried it on my side and got the same results, but it does
work like this:

curl -I
right.bike
HTTP/1.1 200 OK
Server: nginx/1.9.1
Date: Fri, 12 Jun 2015 01:55:27 GMT
Content-Type: image/png
Content-Length: 318335
Last-Modified: Thu, 11 Jun 2015 10:09:54 GMT
Connection: keep-alive
ETag: "55795e72-4db7f"
Expires: Sat, 13 Jun 2015 01:55:27 GMT
Cache-Control: max-age=86400
Accept-Ranges: bytes

Melhores Cumprimentos // Best Regards

Miguel C.
IT - Sys Admin & Developer

Aargh!

On 12/06/15 13:56, Miguel C. wrote:


Well, it looks like there’s probably a workaround available. This happens
to about 20 out of 700 files…

Thanks for the lateral thinking.

Steve



On Fri, Jun 12, 2015 at 3:00 AM, steve wrote:


NP. I usually go for %20 because it’s what browsers do anyway, but it’s
indeed interesting that curl works for some and not others. What does
the nginx error log tell you?

BTW, I tested a few more URLs, and all the others give 404, but anything
with “right.bike H***” fails with 400.

Note that only “Overvolt\ H” and “Overvolt H” fail, not “Overvolt\ h”
or “Overvolt h”.

I just have no clue why; maybe it’s something in the config.

A bit more info…

On 12/06/15 14:15, Miguel C. wrote:


It seems to be objecting to the string ‘ H’ in the URL.



Hmm…

On 12/06/15 14:31, steve wrote:


I have tried this on a number of different installs, 1.7 to 1.9, by
touching ‘f H.png’ and then attempting to access it.

Most of my configs are for PHP-based CMSes, but I still get the same 400
code for static, cookie-free setups (that one was 1.7.1).

Should I be raising a bug, and if so, can someone point me towards a
howto please?

Steve



On Fri, Jun 12, 2015 at 01:50:15PM +1200, steve wrote:

Hi there,

I’m trying to make some sense out of this and am left a bit cold!
What could cause this:

Both requests are invalid - “space” may not appear in a url. Encode it
as %20 and things will work.
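For illustration, a small Python sketch (standard library only; the path is the one from the ls listing at the top of the thread) of why the space matters: an HTTP/1.x request line is space-separated, so an unencoded space adds an extra field, while urllib.parse.quote turns it into %20.

```python
from urllib.parse import quote

# Path from the `ls` listing above (relative to the docroot).
raw_path = "/images/models/Lapierre/Overvolt HT.png"

# quote() percent-encodes the space; "/" is kept because safe="/" by default.
encoded_path = quote(raw_path)

# A request line is "METHOD SP request-target SP HTTP-version",
# so a well-formed one splits into exactly three space-separated fields.
bad_line = f"HEAD {raw_path} HTTP/1.1"
good_line = f"HEAD {encoded_path} HTTP/1.1"

print(encoded_path)
print(len(bad_line.split(" ")))   # the space inside the path adds a fourth field
print(len(good_line.split(" ")))
```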

nginx happens to try one form of “dwim” error recovery when the character
after the invalid space(s) is not “H”, and does not try it when the
character is “H”.

$ curl -I right.bike FS.png
HTTP/1.1 200 OK

$ curl -I right.bike HT.png
HTTP/1.1 400 Bad Request

The second one shows no entry at all in the access log but I can’t
find any reason why they’re processed differently at all.

Suggestions please!

I presume that the nginx request-line parser stops at the whitespace which
says “end of url, what follows is the HTTP version”, sees that it does
not start with “H”, and decides “perhaps this is an invalid url; I’ll
carry on parsing and maybe I can helpfully handle this broken request”;
or sees that it does start with “H” and decides “clearly this was the end
of the url, I shall now identify the HTTP request version; oh, it’s
broken, error 400”.
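To make that description concrete, here is a toy Python model of the heuristic as presumed above. It is a hypothetical simplification written for this thread, not nginx's actual parser (which is C and handles far more cases); it only mimics the “does the text after the space start with ‘H’?” decision.

```python
import re

# Hypothetical, simplified model of the heuristic described above.
# NOT nginx's parser: it only mimics the "starts with 'H'?" decision.
VERSION = re.compile(r"HTTP/\d+\.\d+")

def parse_request_line(line):
    """Return ("ok", uri) or ("400", None) for a request line without CRLF."""
    method, sep, rest = line.partition(" ")
    if not sep or not rest:
        return "400", None
    uri, i = "", 0
    while i < len(rest):
        if rest[i] != " ":
            uri += rest[i]
            i += 1
            continue
        # Whitespace after (part of) the URI: look past the run of spaces.
        j = i
        while j < len(rest) and rest[j] == " ":
            j += 1
        tail = rest[j:]
        if tail.startswith("H"):
            # "Clearly the URI ended here": the rest must be the HTTP version.
            return ("ok", uri) if VERSION.fullmatch(tail) else ("400", None)
        if not tail:
            # Only trailing spaces left (assumption: accept the bare URI).
            return "ok", uri
        # Not "H": assume the space belongs to the URI and keep parsing.
        uri += rest[i:j]
        i = j
    # No version at all: an HTTP/0.9-style "GET /path" line.
    return "ok", uri

base = "/images/models/Lapierre/"
for name in ["Overvolt FS.png", "Overvolt HT.png", "Overvolt ht.png"]:
    print(name, parse_request_line(f"HEAD {base}{name} HTTP/1.1"))
```

Run against the names from the listing, the model accepts “Overvolt FS.png” and lowercase “Overvolt ht.png” but returns 400 for “Overvolt HT.png”, matching the space-H pattern reported in the thread.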

You could argue that nginx could try an extra level of dwimmery to try
to drag something useful out of the second broken request; or you could
argue that it should fail the first broken request as well.

Or you could accept that the client has broken the protocol, and the
server is mostly free to do what it likes in response.

I suspect that “fail the first broken request” won’t happen, as a
practical QoI matter; and “try to accept the second broken request”
might happen if someone who cares can provide a low-impact patch –
it’s easy for me to say “it’s a Simple Matter of Programming”, because
I don’t intend to write the patch ;-)

But “don’t make invalid requests” is the way to see the bicycle.

f

Francis D.

Hello

Can you try a command like the ones at

to see if you have special chars in some of the filenames?

Hi,

On 12/06/15 18:59, Francis D. wrote:

I have 750 image files, and many of them have spaces in their names. The
example I showed, and the 30 that return a 400 Bad Request status, all
contain ‘ H’ in the file name. ‘ h’, ‘ G’ and most similar strings
return a 200 status.

No matter what, one passes and one fails. It’s not repeatable behaviour.
So “don’t succeed in riding the bicycle most of the time” is the way I
see it.

The first time I’ve ever disagreed with you, Francis!

Steve

On Fri, Jun 12, 2015 at 07:38:19PM +1200, Steve H. wrote:


I have 750 image files, and many of them have spaces in their names.
The example I showed, and the 30 that return a 400 Bad Request status,
all contain ‘ H’ in the file name. ‘ h’, ‘ G’ and most similar strings
return a 200 status.

A filename with spaces isn’t a problem.

An http request (url) with spaces is a problem.

Create different files called “50%good”, “50%bad”, “%”, “%25”, and
“wtf?”, and try to access them as if their filenames can be used
directly in http requests. You’ll see different responses – errors
or not-the-file-you-wanted – all of which are understandable when you
accept that a filename cannot be used directly in a url.
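Python's standard urllib.parse can demonstrate this. The sketch below decodes each of those names as if it had been pasted straight into a URL path, then does the same after url-encoding it first. Python's decoder is only a stand-in here: nginx's own URI decoding may differ in detail (it may, for instance, reject a malformed %-escape with a 400 rather than pass it through).

```python
from urllib.parse import quote, unquote_to_bytes, urlsplit

# Filenames suggested above.
names = ["50%good", "50%bad", "%", "%25", "wtf?"]

def as_server_sees(name):
    """Decode a raw filename the way a generic URL parser (Python's) would."""
    path = urlsplit("/" + name).path   # a '?' starts the query string
    return unquote_to_bytes(path)      # valid '%xx' sequences are decoded

for name in names:
    raw = as_server_sees(name)            # used directly: often the wrong name
    fixed = as_server_sees(quote(name))   # url-encoded first: round-trips
    print(f"{name!r:10} raw -> {raw!r:14} encoded -> {fixed!r}")
```

Used raw, “%25” decodes to “%” (a different file), “50%bad” turns %ba into a single byte, and “wtf?” loses everything after the “?”; once quoted, every name decodes back to itself.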

You must always url-encode a filename when creating a url.

If the filename is restricted to alnum-dot-underscore, then “url-encode”
is the identity transform. For the full details, RTFRFC.
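A quick Python check of that identity claim, using made-up example names. Note that urllib.parse.quote also leaves “-” and “~” alone, since RFC 3986 counts them as unreserved as well.

```python
import re
from urllib.parse import quote

# Filenames limited to ASCII letters, digits, '.' and '_'.
PLAIN = re.compile(r"[A-Za-z0-9._]+")

def needs_encoding(name):
    """True if url-encoding would change this filename."""
    # safe="" escapes '/' too; filenames cannot contain it anyway.
    return quote(name, safe="") != name

# Hypothetical example names.
for name in ["Overvolt_FS.png", "Overvolt FS.png", "50%good"]:
    plain = bool(PLAIN.fullmatch(name))
    print(f"{name!r}: plain={plain}, needs_encoding={needs_encoding(name)}")
```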

No matter what, one passes and one fails. It’s not repeatable
behaviour.

space-H fails, space-anything-else passes. That looks repeatable to me
:-)

nginx trac ticket #196 (“Inconsistent behavior on uri's with unencoded
spaces followed by H”) has some background. The short version is that
all should fail, and all did fail, but to be kind to broken clients,
nginx was changed to let most pass. That was a convenient change, but
does lead to this confusion.

(I think that a subsequent change meant that the response is in HTTP/1
format, rather than the HTTP/0.9 that it originally should have been.
That one was a good change.)

So “don’t succeed in riding the bicycle most of the time” is
the way I see it.

urlpart=urlescape($filename)

Then always use $urlpart instead of $filename when you write the link,
and it will always work.
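The urlescape($filename) line above is pseudocode. A rough Python equivalent might look like this; link_for is a made-up helper name, and its default base path is simply the directory from the ls listing earlier in the thread.

```python
import html
from urllib.parse import quote

def link_for(filename, base="/images/models/Lapierre/"):
    """Hypothetical helper: build an HTML link to a file under `base`."""
    urlpart = quote(filename)        # the urlescape($filename) step
    text = html.escape(filename)     # link text is HTML, so escape it as HTML
    return f'<a href="{base}{urlpart}">{text}</a>'

print(link_for("Overvolt HT.png"))
```

The point is the same as in the pseudocode: the filename is only ever written into the href after it has been url-encoded.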

(This is a http thing, not an nginx thing. Other web servers will have
their own error-handling and error-correction, which will probably not
be identical to nginx’s.)

The first time I’ve ever disagreed with you, Francis!

Not a problem. I think the only difference of opinion is whether, given
a broken request, most should pass or none should pass. And both
opinions are reasonable.

I think that the Right Answer is for there to be Yet Another Option so
that one can configure “reject_malformed_http1_requests” to make all
requests containing space (and possibly all http/0.9 requests, as an
implementation-convenience consequence) fail immediately.
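As a rough sketch of what such a strict mode might check, in Python: the option name is the hypothetical one above (nginx has no such directive), and the regular expression is a simplification of the HTTP/1.x request-line grammar, not a full implementation of it.

```python
import re

# Sketch of the *hypothetical* "reject_malformed_http1_requests" policy;
# a simplified request-line pattern, not the full HTTP grammar.
STRICT = re.compile(r"[A-Z]+ [^ ]+ HTTP/\d\.\d")

def strict_accept(request_line):
    """Accept only 'METHOD SP target SP HTTP/x.y' with no space in the target."""
    # HTTP/0.9-style lines ("GET /path") have no version, so they fail too.
    return STRICT.fullmatch(request_line) is not None

for line in ["HEAD /images/models/Lapierre/Overvolt FS.png HTTP/1.1",
             "HEAD /images/models/Lapierre/Overvolt HT.png HTTP/1.1",
             "HEAD /images/models/Lapierre/Overvolt%20HT.png HTTP/1.1",
             "GET /index.html"]:
    print(strict_accept(line), line)
```

Under this policy both space-containing requests fail the same way, which is the consistency Steve was after, at the cost of refusing the broken requests nginx currently lets through.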

Or just revert the patch linked from the trac message and hear the
users complain.

But since I won’t be writing any of the code, my vote counts for little.

Cheers,

f
