If you append an extra percent sign to a URL that gets passed to Mongrel, it returns a Bad Request error. It seems odd that "http://localhost/%" causes a "Bad Request" (400) instead of a "Not Found" (404) error. Here is the error from the Mongrel log:

HTTP parse error, malformed request (127.0.0.1): #<Mongrel::HttpParserError: Invalid HTTP format, parsing fails.>

I'm using Nginx in front of Mongrel. I understand this is a bad URL, but is there any way to have Mongrel ignore lone percent signs? Or perhaps an Nginx rewrite rule that will encode extraneous percent signs?
on 2009-01-07 20:44
on 2009-01-07 21:23
On Wed, Jan 7, 2009 at 11:44 AM, Robbie Allen <lists@ruby-forum.com> wrote:
> but is there anyway to have mongrel ignore lone percent signs? Or
> perhaps a Nginx rewrite rule that will encode extraneous percent signs?

Out of curiosity, why does Mongrel's handling of this case bother you? It looks like entirely standard behaviour; see

http://groklaw.net/%
http://slashdot.org/%
http://w3c.org/%

(All produce status 400.)

Stephan

> Posted via http://www.ruby-forum.com/.
> _______________________________________________
> Mongrel-users mailing list
> Mongrel-users@rubyforge.org
> http://rubyforge.org/mailman/listinfo/mongrel-users

--
Stephan Wehner
-> http://stephan.sugarmotor.org
-> http://www.thrackle.org
-> http://www.buckmaster.ca
-> http://www.trafficlife.com
-> http://stephansmap.org -- blog.stephansmap.org
on 2009-01-07 21:28
So how do you catch it? All of those errors are not very friendly and completely bypass the site look and feel. See these:

http://www.google.com/%
http://www.yahoo.com/%

Robbie
on 2009-01-07 21:48
Yes. I have run into this before. Mongrel will error on an invalid HTTP
URI, with one common case being characters not properly escaped, which
is what your example is. When one of the developers of my app brought
this up before, he was told by the Mongrel developer that this was
intentional, and would not be changed.
I didn't like this then, and I don't like it now, for a variety of
reasons, including that my app needs to respond to URLs sent by third
parties that are not under my control. Perhaps the current mongrel
developers (IS there even any active development on mongrel?) have a
different opinion, and this could be changed, or made configurable.
In the meantime, I have gotten around it with some mod_rewrite rules in
Apache in front of Mongrel, which take illegal URLs and escape/rewrite them
to be legal. However, due to some weird behavior (bugs?) in Apache and
mod_rewrite around escaping, and the difficulty of controlling escaping in
the Apache conf, I actually had to use an external Perl file too. Here's what I do:
Apache conf, applying to Mongrel URLs (which in my setup are all URLs on
a given Apache virtual host):
RewriteEngine on
RewriteMap query_escape prg:/data/web/findit/Umlaut/distribution/script/rewrite_map.pl
#RewriteLock /var/lock/subsys/apache.rewrite.lock
RewriteCond %{query_string} ^(.*[\>\<].*)$
RewriteRule ^(.*)$ $1?${query_escape:%1} [R,L,NE]
The rewrite_map.pl file:
#!/usr/bin/perl
$| = 1; # Turn off buffering
while (<STDIN>) {
s/>/%3E/g;
s/</%3C/g;
s/\//%2F/g;
s/\\/%5C/g;
s/ /\+/g;
print $_;
}
##
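For readers who do not use Perl, here is a minimal Python sketch of the same kind of rewrite-map program. It assumes the usual contract for a `prg:` RewriteMap (one lookup key per line on standard input, one result line on standard output, flushed immediately). The names `escape_query` and `serve` are hypothetical, and this is an illustration rather than the script actually deployed above.

```python
import sys

# Same substitutions as the Perl script above. Each input character is
# mapped exactly once, so a '%' produced by one replacement is never
# escaped a second time.
QUERY_ESCAPES = str.maketrans({
    ">": "%3E",
    "<": "%3C",
    "/": "%2F",
    "\\": "%5C",
    " ": "+",
})


def escape_query(query: str) -> str:
    """Escape the characters handled by rewrite_map.pl in one query string."""
    return query.translate(QUERY_ESCAPES)


def serve(requests, replies) -> None:
    """Answer one lookup per input line, flushing after each reply.

    Flushing plays the role of '$| = 1' in the Perl version: the web
    server waits for each answer, so output must not sit in a buffer.
    """
    for line in requests:
        replies.write(escape_query(line.rstrip("\n")) + "\n")
        replies.flush()


# To use it as a map program, one would call: serve(sys.stdin, sys.stdout)
```

Doing all substitutions in a single pass is a design choice here: it avoids depending on the order of the individual replacements.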
Looks like I'm not actually escaping bare '%' chars, since I hadn't run
into those before in the URLs I need to handle. It would be trickier to
add a regexp for that, since you need to distinguish an improper % from
a % that's actually part of a percent-encoded escape such as %3E. Maybe something like:
s/%(?![0-9A-Fa-f]{2})/%25/g;
'/%25' would be a valid URI path representing the % char. '/%' is not.
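The idea of telling a lone % apart from one that starts a valid escape can be sketched in Python with a negative lookahead. This is only an illustration under the assumption that "valid" means "followed by exactly two hex digits"; the function name `escape_lone_percent` is made up for the example.

```python
import re

# A '%' that is NOT followed by two hex digits is treated as a lone
# percent sign. The lookahead only inspects the next two characters; it
# does not consume them, so they are left in place in the output.
LONE_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def escape_lone_percent(text: str) -> str:
    """Replace every lone '%' with '%25', leaving valid escapes untouched."""
    return LONE_PERCENT.sub("%25", text)
```

With this sketch, `escape_lone_percent("/%")` gives `"/%25"`, while an already valid `"/%25"` or `"/a%3Eb"` is returned unchanged.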
Hope this helps,
Jonathan
Robbie Allen wrote:
> but is there anyway to have mongrel ignore lone percent signs? Or
> perhaps a Nginx rewrite rule that will encode extraneous percent signs?
>
--
Jonathan Rochkind
Digital Services Software Engineer
The Sheridan Libraries
Johns Hopkins University
410.516.8886
rochkind (at) jhu.edu
on 2009-01-07 22:16
On Wed, Jan 7, 2009 at 12:06 PM, Jonathan Rochkind <rochkind@jhu.edu> wrote:
> while (<STDIN>) {
> s/>/%3E/g;
> s/</%3C/g;
> s/\//%2F/g;
> s/\\/%5C/g;
> s/ /\+/g;
> print $_;
> }

It strikes me as a good thing that Apache weeds out bad URLs. Less parsing for Mongrel, less work, and one less point of failure to worry about. (When I see code like the above after "Turn off buffering", with all respect, I get worried.)

On the other hand, does Apache not allow configuring the page returned for 400 Bad Request? This would then also allow addressing the issue that "All of those errors are not very friendly and completely bypass the site look and feel." ("Robbie")

Stephan
on 2009-01-07 22:25
> On the other hand, does Apache not allow configuring the page returned
> for 400 Bad Request. This would then also allow addressing the issue that

I'm using Nginx, and it does allow you to set error pages with error_page, but that doesn't seem to work in this case (at least for me).

Has anyone got it to work so Nginx will use an actual error page instead of the default when encountering a Bad Request?

Robbie
on 2009-01-07 22:41
On 1/7/09, Jonathan Rochkind <rochkind@jhu.edu> wrote:
> Yes. I have run into this before. Mongrel will error on an invalid HTTP URI,
> with one common case being characters not properly escaped, which is what
> your example is. When one of the developers of my app brought this up
> before, he was told by the Mongrel developer that this was intentional, and
> would not be changed.

Mongrel's HTTP parser grammar was written by Zed to be very RFC conformant.

> I didn't like this then, and I don't like it now, for a variety of reasons,
> including that my app needs to respond to URLs sent by third parties that
> are not under my control. Perhaps the current mongrel developers (IS there
> even any active development on mongrel?) have a different opinion, and this
> could be changed, or made configurable.

The Mongrel HTTP parser is very stable, and is in use by multiple projects. I can't speak for the other Mongrel devs, but if it were up to me alone, I'd keep Mongrel's HTTP parser RFC compliant.

Since you have a special case, I would suggest that you just take a look at the grammar for the parser and consider compiling your own parser. You could probably just remove '%' from the unsafe type, and see if that will work for you. It looks like this:

unsafe = (CTL | " " | "\"" | "#" | "%" | "<" | ">");

Kirk Haines
on 2009-01-07 22:47
Stephan Wehner wrote:
> It strikes me as a good thing that Apache weeds out bad URL's. Less
> parsing for mongrel, less work, and one less point of failure to worry
> about. (When I see code like above after "Turn off buffering" - with
> all respect - I get worried.)

Um, that code that worries you is the code that was necessary to get Apache to 'fix' these bad URLs to be good URLs. If you have a better way to do it, let me know and I'm happy to use it!

That actually took me several solid days of work to get that far, because Apache is _weird_ when it comes to escaping and mod_rewrite. Without using the external Perl rewrite map, I could only get it to end up double-escaped or not properly escaped at all. I could NOT get mod_rewrite alone, without Perl, to rewrite > to %3E etc. all by itself. I kept ending up with things like %253E instead, because Apache would go ahead and apply another escaping when I didn't want it to. I could get Apache to do no escaping, or double escaping, but couldn't get it to do the kind of escaping I needed, until I figured out I had to resort to an external Perl rewrite map. Which, yes, resulted in code that I don't like that much either, but it was all I could come up with as a solution to my unavoidable business problem.

So you like solving it in Apache rather than Mongrel, but don't like the best way I came up with to solve it in Apache after nearly a week of hacking? Heh, I'm not sure what you're suggesting. Now that I've got it done, it works, but it was kind of a frustrating four days of work hacking mod_rewrite and Apache conf when that's not what I wanted to be doing.

Oddly, when Googling I could find hardly anyone who had to deal with this problem before. I guess the circumstance of having to deal with long, complicated, possibly ill-formed query strings sent by third parties is rare. And dealing with it at the Apache layer is not the choice anyone else made, when they did have to deal with it.

(In general, doing complicated things in Apache conf reminds me of trying to do complicated things in sendmail. It gets unpredictable and turns into 'twist this knob and see what happens' pretty quick. I'd much rather be writing Ruby than hacking Apache confs.)

Jonathan
on 2009-01-07 23:15
> Has anyone got it to work so Nginx will use an actual error page instead
> of the default when encountering a Bad Request?

This is pretty easy to do: just use the "error_page" directive, list all of the status codes you want to capture, and then point it at your 500 page. E.g., we capture all 5xx errors using these nginx config rules:

error_page 500 502 503 504 /500.html;
location = /500.html {
    root /u/apps/project/public;
}

Just add 400 to the list of 5xx errors. You will also want to have a "500.html" in your document root, or modify the path to the file accordingly.

/Cody
on 2009-01-07 23:49
This particular case actually doesn't bother me in particular. It may be fine for "/%" to be a 400 rather than a 404.

My particular case involved needing to process malformed query strings sent by third parties. I had no control over these third parties. I _needed_ to be able to process query strings that included un-escaped ampersands and such. Yes, the third party sending me this information in a query string was doing it in a way that was illegal and violated standards, but they are more powerful than I, and I cannot make them change their behavior, and I need to handle those URLs anyway.

I took care of it with a rewrite on the Apache end before it reached Mongrel, though. This ended up being somewhat more complicated than I hoped it would be because of Apache's weird and unpredictable behavior when it came to escaping, but now that I have it working, it works out.

Jonathan
on 2009-01-08 01:47
On 7 Jan 2009, at 21:31, Jonathan Rochkind wrote:
> Yes, the third party sending me this information in a query string
> was doing it in a way that was illegal and violated standards, but
> they are more powerful than I, and I can not make them change their
> behavior, and I need to handle those URLs anyway.

This is a sad day, they won.

Next time, scream from the rooftops, unless you already signed your free speech away, that is.

Printing an RFC and highlighting the relevant clauses and handing that to a level higher than disgruntled developers is often relatively effective too.
on 2009-01-08 19:49
I can scream to the rooftops all I want. This particular product I am working on is an OpenURL Resolver that handles OpenURLs representing scholarly citations sent from third-party licensed search providers that the library (where I work) pays for.

We have contracts with literally hundreds of such providers. The one provider that sends the bad URL is a particularly large company (EBSCO), with billions of dollars in revenue, and thousands of customers of which we are just one. Most of their other customers are not using Mongrel-fronted (or Rails at all) solutions for OpenURL link resolving; the solutions they are using manage to deal with the faulty URLs. Now ours does too, with some ugly Apache/Perl hacking in front of Mongrel.

I complained both privately and publicly about this. But the world is not always as we would like it, and sometimes our software needs to deal with incoming data that is not standards compliant; that's just the way it is.

Jonathan

James Tucker wrote:
> Next time, scream from the rooftops, unless you already signed your
> free speech away, that is.
>
> Printing an RFC and highlighting the relevant clauses and handing that
> to a level higher than disgruntled developers is often relatively
> effective too.
on 2009-01-08 20:17
On Thu, Jan 8, 2009 at 18:51, Jonathan Rochkind <rochkind@jhu.edu> wrote:
> I can scream to the rooftops all I want. [...]
>
> I complained both privately and publically about this. But the world is not
> always as we would like it, and sometimes our software needs to deal with
> incoming data that is not standards compliant, just the way it is.

Absolutely true. And I think the solution you devised is a good one: take a piece of software that (for reasons unknown to me) accepts malformed input, have it clean up the input and pass it on. No reason to disable checks in a tool that actually does what it should.

Out of curiosity: what did this company respond when you asked them to provide protocol-compliant data? I'd like to think that they at least apologised profusely for being unable to keep the tubes clean, as it were, instead of saying "standards shmandards".

BR,
/David
on 2009-01-09 13:30
On 8 Jan 2009, at 17:51, Jonathan Rochkind wrote:
> We have contracts with literally hundreds of such providers.

Most of which do the right thing.

> This one provider in particular that sends the bad URL is a
> particular large company (EBSCO), with billions of dollars in
> revenue, and thousands of customers of which we are just one.

It should generally take one, or a handful of, lines of code to properly escape URLs. It sounds like they can definitely afford to do this.

> Most of their other customers are not using mongrel-fronted (or
> Rails at all) solutions for OpenURL link resolving; the solutions
> they are using manage to deal with the faulty URLs.

There are plenty of stories of how Mongrel's conformance to the RFC has raised similar concerns. Indeed, this happened with Google too. Some of the stories can be found on Zed's blog. AFAIK, Google fixed their stuff.

> Now ours does too, with some ugly apache/perl hacking in front of
> mongrel.

As David said, your solution is a valid one. Changing Mongrel is not.

> I complained both privately and publically about this. But the world
> is not always as we would like it, and sometimes our software needs
> to deal with incoming data that is not standards compliant, just the
> way it is.

Understood. Nonetheless, you should keep trying, as we all should, to improve this world.