HTTP parse error due to an extra percent sign


#1

If you append an extra percent sign to a URL that gets passed to
mongrel, it will return a Bad Request error. Kind of odd that
“http://localhost/%” causes a “Bad Request” instead of a “Not Found”
error.

Here is the error from the mongrel log:
HTTP parse error, malformed request (127.0.0.1):
#<Mongrel::HttpParserError: Invalid HTTP format, parsing fails.>

I’m using Nginx in front of mongrel. I understand this is a bad URL,
but is there any way to have mongrel ignore lone percent signs? Or
perhaps an Nginx rewrite rule that will encode extraneous percent signs?


#2

On Wed, Jan 7, 2009 at 11:44 AM, Robbie A. removed_email_address@domain.invalid
wrote:

but is there anyway to have mongrel ignore lone percent signs? Or
perhaps a Nginx rewrite rule that will encode extraneous percent signs?

Out of curiosity, why does mongrel’s handling of this case bother you?
Looks like entirely standard behaviour, see

http://groklaw.net/%
http://slashdot.org/%
http://w3c.org/%

(All produce status 400)

Stephan

Posted via http://www.ruby-forum.com/.


Mongrel-users mailing list
removed_email_address@domain.invalid
http://rubyforge.org/mailman/listinfo/mongrel-users


Stephan W.

-> http://stephan.sugarmotor.org
-> http://www.thrackle.org
-> http://www.buckmaster.ca
-> http://www.trafficlife.com
-> http://stephansmap.orgblog.stephansmap.org


#3

So how do you catch it? All of those errors are not very friendly and
completely bypass the site look and feel.

See these:

http://www.google.com/%
http://www.yahoo.com/%

Robbie



#4

Yes. I have run into this before. Mongrel will error on an invalid HTTP
URI, with one common case being characters not properly escaped, which
is what your example is. When one of the developers of my app brought
this up before, he was told by the Mongrel developer that this was
intentional, and would not be changed.

I didn’t like this then, and I don’t like it now, for a variety of
reasons, including that my app needs to respond to URLs sent by third
parties that are not under my control. Perhaps the current mongrel
developers (IS there even any active development on mongrel?) have a
different opinion, and this could be changed, or made configurable.

In the meantime, I have gotten around it with some mod_rewrite rules in
apache on top of mongrel, to take illegal URLs and escape/rewrite them
to be legal. Except that, due to some weird behavior (bugs?) in apache
and mod_rewrite around escaping, and the difficulty of controlling
escaping in the apache conf, I actually had to use an external perl file
too. Here’s what I do:

Apache conf, applying to mongrel urls (which in my setup are all urls on
a given apache virtual host)

RewriteEngine on
RewriteMap query_escape prg:/data/web/findit/Umlaut/distribution/script/rewrite_map.pl
#RewriteLock /var/lock/subsys/apache.rewrite.lock
RewriteCond %{query_string} ^(.*[><].*)$
RewriteRule ^(.*)$ $1?${query_escape:%1} [R,L,NE]

The rewrite_map.pl file:

#!/usr/bin/perl
$| = 1; # Turn off buffering
while (<STDIN>) {

    s/>/%3E/g;
    s/</%3C/g;
    s/\//%2F/g;
    s/\\/%5C/g;
    s/ /\+/g;
    print $_;

}
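
For comparison, here is a rough Ruby sketch of the same substitutions. It is not part of the setup described in this post. The names QUERY_ESCAPES, escape_query and run_rewrite_map are made up for illustration, and the driver assumes the usual behaviour of a prg: rewrite map, which reads one key per line and answers each with one line.

```ruby
# Same substitutions as the Perl script, written as a lookup table.
QUERY_ESCAPES = {
  ">"  => "%3E",
  "<"  => "%3C",
  "/"  => "%2F",
  "\\" => "%5C",
  " "  => "+"
}.freeze

# Escape one query string; every other character passes through untouched.
def escape_query(query)
  query.gsub(/[><\/\\ ]/, QUERY_ESCAPES)
end

# Hypothetical driver: answer each input line with exactly one output line,
# flushing after every line so the caller is never left waiting on a buffer.
def run_rewrite_map(input, output)
  input.each_line do |line|
    output.puts escape_query(line.chomp)
    output.flush
  end
end
```

Run as a map program, the last line of the script would be something like run_rewrite_map($stdin, $stdout).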

Looks like I’m not actually escaping bare ‘%’ chars, since I hadn’t run
into those before in the URLs I need to handle. It would be trickier to
add a regexp for that, since you need to distinguish an improper % from
a % that’s actually part of a valid percent-escape. Maybe something like:

s/%(?![0-9A-Fa-f]{2})/%25/g;

‘/%25’ would be a valid URI path representing the % char. ‘/%’ is not.
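
A minimal Ruby sketch of the bare-percent idea, for illustration only: a negative lookahead matches a % that is not followed by two hex digits, without consuming the characters after it. The function name escape_bare_percents is hypothetical. Note the limitation: it cannot tell whether a sender meant "%41" as a literal percent sign followed by "41", so well-formed escapes are always left alone.

```ruby
# Replace every "%" that is not followed by two hex digits with "%25".
# The lookahead leaves the following characters in place and keeps
# valid escapes such as "%3E" untouched.
def escape_bare_percents(str)
  str.gsub(/%(?![0-9A-Fa-f]{2})/, "%25")
end

puts escape_bare_percents("/%")      # prints /%25
puts escape_bare_percents("/a%20b")  # prints /a%20b
```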

Hope this helps,

Jonathan



Jonathan R.
Digital Services Software Engineer
The Sheridan Libraries
Johns Hopkins University
410.516.8886
rochkind (at) jhu.edu


#5

On Wed, Jan 7, 2009 at 12:06 PM, Jonathan R. removed_email_address@domain.invalid
wrote:

could be changed, or made configurable.
RewriteEngine on
while (<STDIN>) {

  s/>/%3E/g;
  s/</%3C/g;
  s/\//%2F/g;
  s/\\/%5C/g;
  s/ /\+/g;
  print $_;

}

It strikes me as a good thing that Apache weeds out bad URLs. Less
parsing for mongrel, less work, and one less point of failure to worry
about. (When I see code like above after “Turn off buffering” - with
all respect - I get worried.)

On the other hand, does Apache not allow configuring the page returned
for 400 Bad Request? This would then also allow addressing the issue that

“All of those errors are not very friendly and completely bypass the
site look and feel.” (“Robbie”)

Stephan



#6

On 1/7/09, Jonathan R. removed_email_address@domain.invalid wrote:

Yes. I have run into this before. Mongrel will error on an invalid HTTP URI,
with one common case being characters not properly escaped, which is what
your example is. When one of the developers of my app brought this up
before, he was told by the Mongrel developer that this was intentional, and
would not be changed.

Mongrel’s HTTP parser grammar was written by Zed to be very RFC
conformant.

I didn’t like this then, and I don’t like it now, for a variety of reasons,
including that my app needs to respond to URLs sent by third parties that
are not under my control. Perhaps the current mongrel developers (IS there
even any active development on mongrel?) have a different opinion, and this
could be changed, or made configurable.

The mongrel HTTP parser is very stable, and is in use by multiple
projects. I can’t speak for the other mongrel devs, but if it were up
to me alone, I’d keep mongrel’s HTTP parser RFC compliant.

Since you have a special case, I would suggest that you just take a
look at the grammar for the parser and consider compiling your own
parser. You could probably just remove ‘%’ from the unsafe type and
see if that works for you. It looks like this:

unsafe = (CTL | " " | "\"" | "#" | "%" | "<" | ">");
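
To make the effect of that set concrete, here is a small Ruby sketch. It is not the parser and not generated from its grammar. It only lists which characters of a path fall into the quoted unsafe set, under the assumption that a % is acceptable when it starts a well-formed %XX escape. The names UNSAFE_CHARS, PERCENT_ESCAPE and unsafe_chars_in are made up for illustration.

```ruby
# The quoted `unsafe` set as a character class: control characters
# (0x00-0x1F and DEL), space, double quote, "#", "%", "<" and ">".
UNSAFE_CHARS = /[\x00-\x1F\x7F "\#%<>]/

# Assumption for this sketch: "%" is fine when it starts a %XX escape.
PERCENT_ESCAPE = /%[0-9A-Fa-f]{2}/

# Hypothetical helper: list the raw unsafe characters left in a path once
# well-formed escapes are set aside.
def unsafe_chars_in(path)
  path.gsub(PERCENT_ESCAPE, "").scan(UNSAFE_CHARS)
end

p unsafe_chars_in("/%")    # prints ["%"]
p unsafe_chars_in("/%25")  # prints []
```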

Kirk H.


#7

Stephan W. wrote:

It strikes me as a good thing that Apache weeds out bad URLs. Less
parsing for mongrel, less work, and one less point of failure to worry
about. (When I see code like above after “Turn off buffering” - with
all respect - I get worried.)

Um, that code that worries you is the code that was necessary to get
Apache to ‘fix’ these bad URLs to be good URLs.

If you have a better way to do it, let me know and I’m happy to use it!
That actually took me several solid days of work to get that far,
because Apache is weird when it comes to escaping and mod_rewrite.
Without using the external perl rewrite map, I could only get it to end
up double-escaped or not properly escaped at all, I could NOT get
mod_rewrite alone without perl to rewrite > to %3E etc all by itself. I
kept ending up with things like %253E instead, because Apache would go
ahead and apply another escaping when I didn’t want it to. I could get
apache to do no escaping, or double escaping, but couldn’t get it to do
the kind of escaping I needed—until I figured out I had to resort to
an external perl rewrite map.
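
The double-escaping effect is easy to reproduce outside Apache. This Ruby sketch only illustrates the symptom, not Apache's internals: escaping an already-escaped value turns its % into %25, which is where %253E comes from.

```ruby
require "uri"

# Escaping ">" once gives the form the backend should receive.
once = URI.encode_www_form_component(">")
# Escaping the result again also escapes the "%" itself.
twice = URI.encode_www_form_component(once)

puts once   # prints %3E
puts twice  # prints %253E
```

Decoding "%253E" a single time gives back "%3E" rather than ">", so the receiving application would see the wrong value.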

Which yes, resulted in code that I don’t like that much either, but it
was all I could come up with to figure out a solution to my unavoidable
business problem.

So you like solving it in Apache rather than Mongrel, but don’t like the
best way I came up with to solve it in Apache after nearly a week of
hacking? Heh, I’m not sure what you’re suggesting.

Now that I’ve got it done, it works, but it was kind of a frustrating
four days of work hacking mod_rewrite and apache conf when that’s not
what I wanted to be doing. Oddly, I could find hardly anyone Googling
who had to deal with this problem before. I guess the circumstance of
having to deal with long complicated possibly ill-formed query strings
sent by third parties is rare. And having to deal with it at the Apache
layer is not the choice anyone else made, when they did have to deal
with it. (In general, doing complicated things in apache conf reminds me
of trying to do complicated things in sendmail. It gets unpredictable
and turns into ‘twist this knob and see what happens’ pretty quick. I’d
much rather be writing ruby than hacking apache confs.)

Jonathan



#8

On the other hand, does Apache not allow configuring the page returned
for 400 Bad Request. This would
then also allow addressing the issue that

I’m using Nginx and it does allow you to set error_pages, but that
doesn’t seem to work in this case (at least for me).

Has anyone got it to work so Nginx will use an actual error page instead
of the default when encountering a Bad Request?

Robbie


#9

Has anyone got it to work so Nginx will use an actual error page instead
of the default when encountering a Bad Request?

This is pretty easy to do: just use the “error_page” directive, list
all of the codes you want to capture, and point them at your 500 page.
For example, we capture all 5xx errors using these nginx config
rules:

error_page 500 502 503 504 /500.html;
location = /500.html {
    root /u/apps/project/public;
}

Just add 400 to the list of 5xx errors. You will also want to have a
“500.html” in your document root or modify the path to the file
accordingly.

/Cody


#10

This particular case doesn’t actually bother me. It may
be fine for “/%” to be a 400 rather than a 404.

My particular case involved needing to process malformed query strings
sent by third parties. I had no control over these third parties. I
needed to be able to process query strings that included un-escaped
ampersands and such. Yes, the third party sending me this information in
a query string was doing it in a way that was illegal and violated
standards, but they are more powerful than I, and I can not make them
change their behavior, and I need to handle those URLs anyway.

I took care of it with a rewrite on the apache end before it reached
mongrel though. This ended up being somewhat more complicated than I
hoped it would be because of Apache’s weird and unpredictable behavior
when it came to escaping, but now that I have it working, it works out.

Jonathan



#11

On 7 Jan 2009, at 21:31, Jonathan R. wrote:

Yes, the third party sending me this information in a query string
was doing it in a way that was illegal and violated standards, but
they are more powerful than I, and I can not make them change their
behavior, and I need to handle those URLs anyway.

This is a sad day, they won.

Next time, scream from the rooftops, unless you already signed your
free speech away, that is.

Printing an RFC and highlighting the relevant clauses and handing that
to a level higher than disgruntled developers is often relatively
effective too.


#12

On Thu, Jan 8, 2009 at 18:51, Jonathan R. removed_email_address@domain.invalid
wrote:

I can scream to the rooftops all I want. […]

I complained both privately and publicly about this. But the world is not
always as we would like it, and sometimes our software needs to deal with
incoming data that is not standards compliant, just the way it is.

Absolutely true. And I think the solution you devised is a good one:
take a piece of software that (for reasons unknown to me) accepts
malformed input, have it clean up the input and pass it on. No reason to
disable checks in a tool that actually does what it should.

Out of curiosity: what did this company respond when you asked them to
provide protocol-compliant data? I’d like to think that they at least
apologised profusely for being unable to keep the tubes clean, as it
were, instead of saying “standards shmandards”.

BR,

/David


#13

I can scream to the rooftops all I want. This particular product I am
working on is an OpenURL Resolver that handles OpenURLs representing
scholarly citations sent from third-party licensed search providers that
the library (where I work) pays for. We have contracts with literally
hundreds of such providers. This one provider in particular that sends
the bad URL is a particularly large company (EBSCO), with billions of
dollars in revenue, and thousands of customers, of which we are just
one. Most of their other customers are not using mongrel-fronted (or
Rails at all) solutions for OpenURL link resolving; the solutions they
are using manage to deal with the faulty URLs. Now ours does too, with
some ugly apache/perl hacking in front of mongrel.

I complained both privately and publicly about this. But the world is
not always as we would like it, and sometimes our software needs to deal
with incoming data that is not standards compliant, just the way it is.

Jonathan



#14

On 8 Jan 2009, at 17:51, Jonathan R. wrote:

We have contracts with literally hundreds of such providers.

Most of which do the right thing.

This one provider in particular that sends the bad URL is a
particularly large company (EBSCO), with billions of dollars in
revenue, and thousands of customers of which we are just one.

It should generally take one, or a handful of lines of code to
properly escape urls. It sounds like they can definitely afford to do
this.

Most of their other customers are not using mongrel-fronted (or
Rails at all) solutions for OpenURL link resolving; the solutions
they are using manage to deal with the faulty URLs.

There are plenty of stories of how Mongrel’s conformance to the RFC has
raised similar concerns. Indeed, this happened with Google too. Some of
the stories can be found on Zed’s blog. AFAIK, Google fixed their stuff.

Now ours does too, with some ugly apache/perl hacking in front of
mongrel.

As David said, your solution is a valid one. Changing mongrel is not.

I complained both privately and publicly about this. But the world
is not always as we would like it, and sometimes our software needs
to deal with incoming data that is not standards compliant, just the
way it is.

Understood. Nonetheless, you should keep trying, as we all should, to
improve this world.