Bare carriage returns in HTTP headers


#1

I’ve been using Mongrel for a while to write bare HTTP servlets as a
replacement for webrick and encountered an HTTP client using the
servlet that for some reason occasionally embeds carriage return
characters ('\r', 0x0d) inside the value fields of message headers.
Mongrel doesn’t like that, and aborts the request with a parse error.
I’m not sure if this is strictly permitted by RFC 2616, but at any
rate, changing Mongrel to accept these kinds of HTTP headers was a
single character change in the Ragel parser, viz.:

*** START OF PATCH ***

Index: http11_parser_common.rl

--- http11_parser_common.rl (revision 1037)
+++ http11_parser_common.rl (working copy)
@@ -46,7 +46,7 @@
 
   field_value = any* >start_value %write_value;
 
-  message_header = field_name ":" " "* field_value :> CRLF;
+  message_header = field_name ":" " "* field_value :>> CRLF;
 
   Request = Request_Line ( message_header )* ( CRLF @done );

*** END OF PATCH ***

All that was necessary was to change the concatenation operator in the
Ragel grammar from the entry-guarded form (:>) to the finish-guarded
form (:>>). From a cursory
reading of RFC 2616 I don’t see that a carriage return character there
should be illegal, but as Jon Postel was once quoted as saying: “Be
liberal in what you accept, and conservative in what you send.”
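
To make the failure concrete, here is a minimal Ruby sketch of the kind of request involved. The header name and value are invented, and the CTL check is my own illustration of the RFC 2616 rule, not Mongrel's actual parser:

```ruby
# A request whose header value embeds a bare CR (0x0d) that is not part of
# a CRLF line terminator. Header name and value are invented for illustration.
raw = "GET / HTTP/1.1\r\nHost: example.com\r\nX-Note: foo\rbar\r\n\r\n"

# RFC 2616 forbids CTLs (0x00-0x1f and 0x7f, with HT excepted via LWS) in a
# field value, so a strict check on the parsed value flags the stray CR:
value = "foo\rbar"
strict_ok = value !~ /[\x00-\x08\x0a-\x1f\x7f]/
puts strict_ok  # false: the bare \r makes the value non-compliant
```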


If not being normal is only natural, what can I do when I answer?
Normal or not, I'll just do what I feel, the way I feel it!
http://stormwyrm.blogspot.com


#2

“Be liberal in what you accept, and conservative in what you send.”

Sadly (from my perspective), this is definitely not the philosophy of
Mongrel, and the mongrel development ‘community’ (does it exist?) is not
partial to it.

I’ve run into other malformed HTTP requests in other circumstances, and
the solution I ended up with was using Apache rewrite maps to “fix”
those malformed requests before they even get to mongrel. I’m not sure
if that solution would work for this particular error, but sounds like
you’ve found another one.

I wouldn’t hold my breath for that patch to be incorporated in mongrel
though, the mongrel philosophy seems to be to be conservative in what it
accepts.

Jonathan


#3

Jonathan R. removed_email_address@domain.invalid wrote:

Dido S. wrote:

I’ve been using Mongrel for a while to write bare HTTP servlets as a
replacement for webrick and encountered an HTTP client using the
servlet that for some reason occasionally embeds carriage return
characters ('\r', 0x0d) inside the value fields of message headers.
Mongrel doesn’t like that, and aborts the request with a parse error.
I’m not sure if this is strictly permitted by RFC 2616, but at any
rate, changing Mongrel to accept these kinds of HTTP headers was a
single character change in the Ragel parser, viz.:

From a cursory
reading of RFC 2616 I don’t see that a carriage return character there
should be illegal, but as Jon Postel was once quoted as saying: “Be
liberal in what you accept, and conservative in what you send.”

"\r" is a control character and is not allowed in field values. This
also has the potential to break things with 3rd party libraries and
applications because it’s not allowed by HTTP.

I consider stopping bad things early in the pipeline a good policy:

The kernel I use enforces TCP and doesn’t allow corrupt IP packets to
get to Mongrel. Thus Mongrel doesn’t have to worry about
bad/malicious TCP traffic.

Along the same lines, Mongrel enforces HTTP, so the application
doesn’t have to worry about non-compliant HTTP traffic at all.

“Be liberal in what you accept, and conservative in what you send.”

Sadly (from my perspective), this is definitely not the philosophy of
Mongrel, and the mongrel development ‘community’ (does it exist?) is not
partial to it.

I believe that philosophy leads to huge compatibility issues down the
line. It makes proliferation of a new technology easier and faster
(“worse is better”); but HTTP has already “won” as a protocol and
fortunately most clients do a pretty good job (unlike with HTML and
HTML authors vs parsers).

I’ve run into other malformed HTTP requests in other circumstances, and
the solution I ended up with was using Apache rewrite maps to “fix”
those malformed requests before they even get to mongrel. I’m not sure
if that solution would work for this particular error, but sounds like
you’ve found another one.

There was a bug we fixed last year where the parser was too strict with
certain requests made by IE. Other than that, I don’t believe there
have been any changes to the way the parser behaves.

I wouldn’t hold my breath for that patch to be incorporated in mongrel
though, the mongrel philosophy seems to be to be conservative in what it
accepts.

Not speaking for the rest of the team, but I’m very much against this
patch.

ref: http://mongrel.rubyforge.org/wiki/Security


#4

On Tue, Mar 24, 2009 at 01:26:44PM -0700, Eric W. wrote:

I believe that philosophy leads to huge compatibility issues down the
line. It makes proliferation of a new technology easier and faster
(“worse is better”); but HTTP has already “won” as a protocol and
fortunately most clients do a pretty good job (unlike with HTML and
HTML authors vs parsers).

Fwiw, “Be liberal in what you accept, and conservative in what you
send.” is an awful engineering principle imo, and I strongly agree with
Eric here. Working around other code’s brokenness will just perpetuate
it and make things worse in the long run.


#5

I am ambivalent, so I will defer to Eric.

Evan


#6

On Wed, Mar 25, 2009 at 12:15 AM, Jonathan R. removed_email_address@domain.invalid
wrote:

I’ve run into other malformed HTTP requests in other circumstances, and the
solution I ended up with was using Apache rewrite maps to “fix” those
malformed requests before they even get to mongrel. I’m not sure if that
solution would work for this particular error, but sounds like you’ve found
another one.

Any hints on how I can do this? Obviously I have no control over what
hits my HTTP server and must deal with the broken clients as best I
can. Ironically, modifying the Mongrel parser to work around this
particular broken HTTP client turned out to be a lot easier than
sifting through the Apache documentation to find a particular module
that would do what needed to be done…

I wouldn’t hold my breath for that patch to be incorporated in mongrel
though, the mongrel philosophy seems to be to be conservative in what it
accepts.

Neither do I expect it to get incorporated, in spite of its
simplicity. The patch does not make me comfortable in the least.

By the way, if control characters are not allowed in field values, why
does the regex for field_value in the current SVN source have an any*
expression rather than a more restrictive class of characters that it
can accept? That means that we could put some other control character
there and have Mongrel accept it anyway, in spite of its being
prohibited by RFC 2616?
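
For comparison, a field_value restricted along RFC 2616 lines would admit HT, printable ASCII, and high-bit octets, but no other CTLs. A Ruby sketch of that character class follows (the Ragel grammar would express the same thing as a machine, so this is only an illustration):

```ruby
# TEXT per RFC 2616: any OCTET except CTLs, with HT (\t) permitted.
# any* accepts every byte; this class rejects bare CR, LF, NUL, etc.
FIELD_VALUE = /\A[\t\x20-\x7e\x80-\xff]*\z/n

puts !!("plain value"  =~ FIELD_VALUE)  # true
puts !!("bad\rvalue"   =~ FIELD_VALUE)  # false: bare CR is a CTL
puts !!("bad\x00value" =~ FIELD_VALUE)  # false: NUL is a CTL
```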




#7

Dido S. removed_email_address@domain.invalid wrote:

can. Ironically, modifying the Mongrel parser to work around this

By the way, if control characters are not allowed in field values, why
does the regex for field_value in the current SVN source have an any*
expression rather than a more restrictive class of characters that it
can accept? That means that we could put some other control character
there and have Mongrel accept it anyway, in spite of its being
prohibited by RFC 2616.

Good question. That’s for Zed since he originally created this parser
(but I don’t think he wants to be bothered about it anymore). Zed
probably had his reasons… maybe there are legit/real clients that send
invalid bytes we need to let through. But "\r" (and "\n") have more
potential to break things than other control chars because they’re part
of the CRLF delimiter sequence that external libraries/apps may not expect.

My reverse proxy (nginx) rejects header values with "\r" before it ever
gets to Mongrel, however.


#8

My problem was with invalid query strings being sent to me by a vendor,
not with problems in the header. So it won’t be exactly the same. I’m
not sure if an apache rewrite map can change headers or not; it can
change path/query string, which is all I needed. But I can show you what
I did, in case it gives you ideas. It was a bit of a pain to figure out.

Here’s the apache.conf I use to deploy my app (actually, this is a Rails
erb template for such a file, but you’ll figure it out):

http://umlaut.rubyforge.org/svn/trunk/lib/generators/mongrel_deploy_files/templates/umlaut_http.conf

The part to pay attention to is the bit that uses a perl script as an
apache ‘external rewrite map’, here:

# We want to re-write URLs with 'bad' < and > chars in the query
# string (eg from EBSCO) to escape them. We use a perl script
# that came with Umlaut to do that.

RewriteEngine on
RewriteMap query_escape prg:<%= destination_path('script/umlaut/rewrite_map.pl') %>
#RewriteLock /var/lock/subsys/apache.rewrite.lock
RewriteCond %{query_string} ^(.*[><].*)$
RewriteRule ^(.*)$ $1?${query_escape:%1} [R,L,NE]

And here’s the simple Perl script that replaced illegal chars in URL
path/query string:

http://umlaut.rubyforge.org/svn/trunk/script/umlaut/rewrite_map.pl
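
For reference, the protocol an Apache prg: rewrite map speaks is simple: one lookup key per line on stdin, one rewritten line per key on stdout, unbuffered. A rough Ruby sketch of such a map (an illustration, not the actual rewrite_map.pl; escape_query is a made-up helper name):

```ruby
#!/usr/bin/env ruby
# Sketch of an Apache external (prg:) rewrite map; illustrative only, not
# the actual rewrite_map.pl. Apache writes one lookup key per line to the
# program's stdin and reads one result line per key from its stdout.
$stdout.sync = true  # map programs must not buffer their output

# Percent-escape the characters that were making the query string illegal.
def escape_query(q)
  q.gsub('<', '%3C').gsub('>', '%3E')
end

while line = $stdin.gets
  puts escape_query(line.chomp)
end
```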

Hope that helps,

Jonathan


#9

Jonathan R. removed_email_address@domain.invalid wrote:

My problem was with invalid query strings being sent to me by a vendor,
not with problems in the header. So it won’t be exactly the same. I’m
not sure if an apache rewrite map can change headers or not; it can
change path/query string, which is all I needed. But I can show you what
I did, in case it gives you ideas. It was a bit of a pain to figure out.

And here’s the simple Perl script that replaced illegal chars in URL
path/query string:

http://umlaut.rubyforge.org/svn/trunk/script/umlaut/rewrite_map.pl

Those two substitutions are no longer needed with the SVN version (which
we currently run in production on a pretty heavy site). I think
it was IE6 sending them and we can’t ignore IE6 :<

    s/>/%3E/g;
    s/</%3C/g;

Unfortunately I don’t think it made the 1.1.5 release

http://mongrel.rubyforge.org/browser/trunk/ext/http11/http11_parser.c?rev=996

I don’t think I ever saw Mongrel error out on these. Is your vendor
really that brain damaged?
    s/\//%2F/g;
    s/\\/%5C/g;

But man, this just creeps me out:
s/ /+/g;

ps: "tr/ /+/;" should be a tick faster than "s/ /+/g;" :)


#10

Yes, my vendor is really that brain-damaged. Yes, I have told them
that. I’m not absolutely sure my vendor ever sends those; it was
< and > that I identified. But as long as I was writing the code, and
had been told that mongrel insisted on absolutely legal URIs and would
do nothing but refuse anything that wasn’t legal by the standard, I
might as well catch anything else that could make an illegal URI. But
actually, yeah, what they are doing is putting
unescaped xml fragments in a url query string. &foo=bar.
So yeah, a backslash will be in a query string too.
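
For what it's worth, the fix on the producing side is a one-liner. A Ruby sketch, with an invented stand-in for the vendor's fragment:

```ruby
require 'cgi'

# Invented XML-ish stand-in for the vendor's unescaped payload.
fragment = '<note>&foo=bar</note>'

# Escaped, it travels safely inside a query string:
escaped = CGI.escape(fragment)
puts escaped  # "%3Cnote%3E%26foo%3Dbar%3C%2Fnote%3E"
```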

Interesting to me that mongrel no longer chokes on these, since when it
was brought up before I was told that there was no way no how that
mongrel was ever going to do anything but choke on them. :) If I can
find my test cases from my vendor around, I’ll see if current mongrels
no longer need my workaround, even though you guys told me that would
never ever happen. But I run latest ruby gem release, I don’t run from
svn trunk.

Jonathan


#11

Oh, and PS, I know that IE6 sends those. Because I discovered it. Safari
does too, for that matter. If they are (illegally) in a URL in HTML or
entered in the location bar, etc.

My particular case in fact involved URLs in HTML (produced by a third
party, but targetting my app) delivered to an ordinary user agent like
IE6 or Firefox or Safari. Firefox would happily correct them before
sending them to the server. IE6 and Safari, no.

This is what I reported like a year and a half ago, and was told it
wasn’t mongrel’s problem. And brought up again like four months ago, to
see if with different developers you’d have a different opinion, and was
again told it wasn’t mongrel’s problem.

I guess someone with more pull than me found it inconvenient?

Jonathan


#12

Actually, I think the condition was that these URLs were being created
in Javascript and injected into the location header. I’m not sure
whether either of these browsers (IE or Safari) does the right thing
when < or > are entered in the location bar. I do know for a fact,
however, that the condition Jonathan is talking about is because the
vendor is doing a redirect in javascript (because I first brought it
up 2 1/2 years ago:
http://thread.gmane.org/gmane.comp.lang.ruby.mongrel.general/1107/focus=1179).

-Ross.


#13

Well, so much for benefit of the doubt…

-Ross.


#14

I tested it out in IE with manually entering it in the location bar. At
least in IE6, if you enter a < manually in a query string in the
location bar, IE will send it along in the HTTP request as is, without
escaping it.

(Which made it a lot easier to create a test case to make sure I had it
taken care of!)