Proxy_pass is double-encoding some pre-encoded URIs

Hello, just wanted to start by saying that nginx is my favorite server
for my personal projects - what an awesome piece of work. This is my
first bug/help request.

I’ve been using proxy_store as a “mirror on demand” for serving APT
packages to Debian machines. Occasionally a package has a tilde
(“~”) in its file name, and the GET that proxy_pass sends to the upstream
server fails.

Looking through nginx’s debug logs and tcpdumps, it seems APT makes
the initial GET with the URI already percent-encoded, and the URI is then
encoded a second time when proxy_pass sends the GET request to the
upstream server (“%7e” becomes “%257e”).
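
The double-encoding can be reproduced outside nginx. This is only a minimal Python sketch, not nginx’s actual code path: it assumes the proxy percent-encodes whatever path it is handed without decoding it first, and it uses the standard urllib.parse functions. The shortened path and the choice of safe="/+" are assumptions made for the illustration.

```python
from urllib.parse import quote, unquote

# Path as APT sends it: the tilde is already percent-encoded as %7e.
client_path = "/pool/main/b/binutils/binutils_2.18.1%7ecvs20080103-4+b1_amd64.deb"

# Encoding an already-encoded path escapes the "%" itself as "%25".
# safe="/+" is an assumption so that "/" and "+" are left untouched.
double_encoded = quote(client_path, safe="/+")

# One decode only gets back to the client's form; a second reaches "~".
once = unquote(double_encoded)
twice = unquote(once)

print(double_encoded)
print(once)
print(twice)
```

A server that decodes the path once ends up looking for a file whose name literally contains “%7e”, which would explain the 404.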

My proxy_store config:

location /apt-cache/debian/lenny {
      root /var/www/spawn.llnw.com/htdocs/proxy_store;
      recursive_error_pages on;
      error_page 404 = /apt-fetch-easynews$request_uri;
}

location /apt-fetch-easynews {
      internal;
      rewrite /apt-fetch-easynews/apt-cache/([^/]*)/([^/]*)(.*) /linux/debian$3 break;

      recursive_error_pages on;
      proxy_intercept_errors on;
      proxy_connect_timeout 6;
      proxy_read_timeout 20;
      proxy_next_upstream error timeout invalid_header http_500 http_503 http_404;
      proxy_pass http://debian.mirrors.easynews.com;

      proxy_store /var/www/default/htdocs/proxy_store/$request_uri;
      proxy_store_access user:rw group:rw all:r;

      # failover to kernel.org
      error_page 404 503 504 = /apt-fetch-kernelorg$request_uri;
}
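
To make the path mapping in the second location easier to follow, here is a rough Python sketch of what the rewrite does to a matching URI. It assumes the first two groups are meant to be `([^/]*)` (the quantifier looks lost in some renderings of this post), and `upstream_path` is a hypothetical helper name, not anything nginx provides.

```python
import re

# Pattern from the config, assuming "*" quantifiers in the first two groups.
PATTERN = re.compile(r"/apt-fetch-easynews/apt-cache/([^/]*)/([^/]*)(.*)")


def upstream_path(internal_uri: str) -> str:
    """Mimic `rewrite ... /linux/debian$3 break;` for a single URI."""
    match = PATTERN.search(internal_uri)
    if match is None:
        return internal_uri  # no match: the URI is left unchanged
    # The whole URI is replaced; only the third capture group is kept.
    return "/linux/debian" + match.group(3)


print(upstream_path("/apt-fetch-easynews/apt-cache/debian/lenny/pool/main/b/x.deb"))
```

The first two groups swallow “debian” and “lenny”, so only the “/pool/…” tail is carried over to the mirror’s “/linux/debian” tree.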

For URI:
http://localhost/apt-cache/debian/lenny/pool/main/b/binutils/binutils_2.18.1~cvs20080103-4+b1_amd64.deb

GET from client:

2008/05/22 01:32:42 [debug] 7400#0: *1 http request line: "GET /apt-cache/debian/lenny/pool/main/b/binutils/binutils_2.18.1%7ecvs20080103-4+b1_amd64.deb HTTP/1.1"
2008/05/22 01:32:42 [debug] 7400#0: *1 http uri: "/apt-cache/debian/lenny/pool/main/b/binutils/binutils_2.18.1~cvs20080103-4+b1_amd64.deb"
2008/05/22 01:32:42 [debug] 7400#0: *1 http args: ""
2008/05/22 01:32:42 [debug] 7400#0: *1 http exten: "deb"
2008/05/22 01:32:42 [debug] 7400#0: *1 http process request header line
2008/05/22 01:32:42 [debug] 7400#0: *1 http header: "Host: localhost"
2008/05/22 01:32:42 [debug] 7400#0: *1 http header: "Connection: keep-alive"
2008/05/22 01:32:42 [debug] 7400#0: *1 http header: "User-Agent: Debian APT-HTTP/1.3 (0.7.11)"
2008/05/22 01:32:42 [debug] 7400#0: *1 http header done

GET to upstream server:
2008/05/22 01:32:42 [debug] 7400#0: *1 http proxy header: "User-Agent: Debian APT-HTTP/1.3 (0.7.11)"
2008/05/22 01:32:42 [debug] 7400#0: *1 http proxy header:
"GET /linux/debian/pool/main/b/binutils/binutils_2.18.1%257ecvs20080103-4+b1_amd64.deb HTTP/1.0
Host: localhost
Connection: close
User-Agent: Debian APT-HTTP/1.3 (0.7.11)

"


404 from upstream server:
2008/05/22 01:32:43 [debug] 7400#0: epoll: fd:12 ev:0005 d:00002AAAAAAC5290
2008/05/22 01:32:43 [debug] 7400#0: *1 http upstream process header
2008/05/22 01:32:43 [debug] 7400#0: *1 malloc: 00000000006ABE90:4096
2008/05/22 01:32:43 [debug] 7400#0: *1 recv: fd:12 440 of 4096
2008/05/22 01:32:43 [debug] 7400#0: *1 http proxy status 404 "404 Not Found"
2008/05/22 01:32:43 [debug] 7400#0: *1 http proxy header: "Date: Thu, 22 May 2008 01:32:42 GMT"
2008/05/22 01:32:43 [debug] 7400#0: *1 http proxy header: "Server: Apache"
2008/05/22 01:32:43 [debug] 7400#0: *1 http proxy header: "Content-Length: 276"
2008/05/22 01:32:43 [debug] 7400#0: *1 http proxy header: "Connection: close"
2008/05/22 01:32:43 [debug] 7400#0: *1 http proxy header: "Content-Type: text/html; charset=iso-8859-1"
2008/05/22 01:32:43 [debug] 7400#0: *1 http proxy header done
2008/05/22 01:32:43 [debug] 7400#0: *1 finalize http upstream request: 404
2008/05/22 01:32:43 [debug] 7400#0: *1 finalize http proxy request
2008/05/22 01:32:43 [debug] 7400#0: *1 free rr peer 1 0
2008/05/22 01:32:43 [debug] 7400#0: *1 close http upstream connection: 12
2008/05/22 01:32:43 [debug] 7400#0: *1 event timer del: 12: 1211419982557

The same transaction as seen through tcpdump:

01:36:36.253662 IP 127.0.0.1.60417 > 69.16.168.244.80: P 1:242(241) ack 1 win 92 <nop,nop,timestamp 1058195635 33813862>
GET /linux/debian/pool/main/b/binutils/binutils_2.18.1%257ecvs20080103-4+b1_amd64.deb HTTP/1.0
Host: localhost
Connection: close
User-Agent: Debian APT-HTTP/1.3 (0.7.11)

01:36:36.324114 IP 69.16.168.244.80 > 127.0.0.1.60417: P 1:441(440) ack 242 win 54 <nop,nop,timestamp 33813870 1058195635>
HTTP/1.1 404 Not Found
Date: Thu, 22 May 2008 01:36:36 GMT
Server: Apache
Content-Length: 276
Connection: close
Content-Type: text/html; charset=iso-8859-1

404 Not Found

Not Found

The requested URL /linux/debian/pool/main/b/binutils/binutils_2.18.1%7ecvs20080103-4+b1_amd64.deb was not found on this server.

If you take the URI and fix the double-encoding by hand…
http://69.16.168.244/linux/debian/pool/main/b/binutils/binutils_2.18.1%257ecvs20080103-4+b1_amd64.deb
“%25” → “%”
http://69.16.168.244/linux/debian/pool/main/b/binutils/binutils_2.18.1~cvs20080103-4+b1_amd64.deb
…the once-encoded URI works.

I realize this could be considered an apt-get bug, but some browsers out
there may also pre-encode “unreserved” characters in their URIs, as
apt-get does (see RFC 2396, section 2.3:
http://www.ietf.org/rfc/rfc2396.txt).

Nginx does seem to know when to decode the original URI, since it is
saved in decoded form throughout the logs. Can the same logic be used by
proxy_pass to decide whether it should encode the GET request to
the upstream server?
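
One way to picture the kind of logic I am asking about is “decode once, then encode once”, so that pre-encoded and raw input converge on the same output. This is a hypothetical Python sketch and not how nginx works internally; `normalize_path` and the safe set "/+~" are assumptions chosen for the example.

```python
from urllib.parse import quote, unquote


def normalize_path(path: str) -> str:
    """Decode once, then encode once, so encoding is never applied twice."""
    decoded = unquote(path)
    # "~" is an unreserved character, so it is left as-is rather than
    # re-encoded; "/" and "+" are kept literal for this illustration.
    return quote(decoded, safe="/+~")


for candidate in ("/p/binutils_2.18.1%7ecvs.deb", "/p/binutils_2.18.1~cvs.deb"):
    print(normalize_path(candidate))
```

A caveat with this approach: an encoded slash (“%2F”) decodes to a plain “/” and is not restored, so it only suits paths where that distinction does not matter.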

joey

Slightly OT, but you do know there is an “apt-cacher” utility that does
exactly this for you? :)

I forgot to mention my platform and nginx version:

Linux localhost 2.6.21.7 #4 SMP Wed Oct 31 04:21:58 MST 2007 x86_64
GNU/Linux

2008/05/22 18:10:33 [notice] 8354#0: nginx/0.6.31
2008/05/22 18:10:33 [notice] 8354#0: built by gcc 4.2.3 (Debian 4.2.3-2)
2008/05/22 18:10:33 [notice] 8354#0: OS: Linux 2.6.21.7

I also observed the same behavior with nginx 0.5.36; I just didn’t look
into it.

I found where apt makes its URI encoding decisions, so if no one can
duplicate this “bug” or recall any other HTTP software that behaves like
this, I can take it up with apt’s developers. I’m not quite sure how
best to adhere to the RFC I cited.

joey

I fixed this with the following rewrite:

      location /apt-fetch-easynews {
             internal;

             +rewrite (.*)%25(.*) $1%$2;
             +rewrite (.*)%7e(.*) $1~$2;

             rewrite /apt-fetch-easynews/apt-cache/([^/]*)/([^/]*)(.*) /linux/debian$3 break;
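
For anyone wanting to check what the two added rewrites do to a given URI, here is a small Python approximation. It is a sketch that assumes each rule is applied once, in order, as a case-sensitive regex; `apply_fixups` is a hypothetical name used only here.

```python
import re


def apply_fixups(uri: str) -> str:
    """Apply the two fix-up rules in order, one substitution each."""
    # (.*) is greedy, so each rule rewrites only the last occurrence it finds.
    uri = re.sub(r"^(.*)%25(.*)$", r"\1%\2", uri, count=1)
    uri = re.sub(r"^(.*)%7e(.*)$", r"\1~\2", uri, count=1)
    return uri


print(apply_fixups("/pool/main/b/binutils/binutils_2.18.1%257ecvs20080103-4+b1_amd64.deb"))
```

Under these assumptions, only a single lower-case “%7e” (optionally double-encoded) is handled; an upper-case “%7E” or other encoded characters would need their own rules.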

Mike, I have looked at the various apt proxy programs. They all are
Twisted/Perl/CGI daemons of some sort.
All have pros and cons, but Nginx is much, much faster than all of them.
The only other thing I needed beyond these nginx stanzas was a script to
remove old files from the cache.
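
A cleanup script of that kind could look roughly like the following. This is not my actual script, just a minimal Python sketch; `remove_old_files`, the example cache path, and the 30-day threshold are all placeholders.

```python
import os
import time
from typing import List


def remove_old_files(cache_root: str, max_age_days: float) -> List[str]:
    """Delete regular files under cache_root not modified within max_age_days."""
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Compare the file's modification time against the cutoff.
            if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
                os.remove(path)
                removed.append(path)
    return removed


if __name__ == "__main__":
    # Placeholder path and age; adjust to the proxy_store directory in use.
    for path in remove_old_files("/var/www/default/htdocs/proxy_store", 30):
        print("removed", path)
```

Run from cron, something like this keeps the store from growing without bound; it goes by modification time, so a package that is still being requested but never refetched would also be removed.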

Hooray for nginx!
joey