CGI request query string parsing error in Rails

I’m using a 3rd party component that generates URLs with query strings.
These URLs are of the form:

http://host+port stuff/path/filename?&param1=xyz&param2=lskjf…

Notice the “&” on the first parameter in these URLs. In this form,
these URLs could not be served by WEBrick or Mongrel and I had to
modify this code to ensure that the first parameter did not have the
ampersand. A URL of the form:

http://host+port stuff/path/filename?param1=xyz&param2=lskjf…

works just fine for both WEBrick and Mongrel.

I’ve checked both RFC 1738 and RFC 3986, which define URL formats, and
it appears to me that the first parameter in a query string may indeed
have “&” in front of the name. The 3rd party developer develops against
Apache and has never had this problem.

In addition, I can make this error occur in an Apache + FastCGI setup as
well. I see the following error in all 3 server configurations.

You have a nil object when you didn’t expect it!
You might have expected an instance of Array.
The error occured while evaluating nil.include?

C:/ruby/lib/ruby/gems/1.8/gems/actionpack-1.12.5/lib/action_controller/cgi_ext/cgi_methods.rb:49:in `parse_request_parameters'

Is this possibly a bug in how Rails is parsing request URLs or am I
mistaken about the format of query strings per the RFCs?

Thanks,
Wes

On 9/18/06, Wes G. wrote:

Is this possibly a bug in how Rails is parsing request URLs or am I
mistaken about the format of query strings per the RFCs?

Empty query params are handled gracefully (ignored) in 1.2 - give Edge a
try.

jeremy

Empty query params are handled gracefully (ignored) in 1.2 - give Edge a
try.

jeremy

I don’t think I’m talking about “empty” query params here. Can you
define “empty query param” for me?

I’m talking about the very first query parameter in the query string
having a “&” in front of the name.

While I’ve been tempted to use Edge, I have enough trouble with released
versions that I’m going to continue to wait ;).

Wes

On Mon, 2006-09-18 at 21:14 +0200, Wes G. wrote:

I’m using a 3rd party component that generates URLs with query strings.
These URLs are of the form:

http://host+port stuff/path/filename?&param1=xyz&param2=lskjf…

It’s interesting how HTTP is touted as the “everyman protocol” with
claims that anyone can write a client or server, yet the RFC is so
complex that supposedly professional developers can’t get simple stuff
like this right.

I’ve checked both RFC 1738 and RFC 3986, which define URL formats, and
it appears to me that the first parameter in a query string may indeed
have “&” in front of the name. The 3rd party developer develops against
Apache and has never had this problem.

Yep, technically you can do this, but what does it mean? The standard
also says this should be a sequence of & and = characters that makes up
form values. So what does the stray & mean? How would you parse it?
This is generally translated into a Hash so is it dropped or is
something special done?

The issue you’re running into isn’t an HTTP level problem but more of
how an application should parse the parameters of a query form. When
they do this, it’s for a form from a browser. Without a good reason for
doing this it’s hard to justify suddenly allowing it. In fact I’d put
something like this in the realm of sneaky potentially dangerous stuff
since, if you decided to remove them when encountered, you could get
this:

/stuff/path/file?&&&&&&&&&&&&&&&&&&& &&& && &&&& &&&&&&param1=xyz

Just waiting to be abused there.

Instead, I’d tell the guy that while it’s not explicitly forbidden in
the HTTP RFC it is not standard practice, does not follow how any
browser submits forms, and isn’t allowed in many CGI processing
libraries.

Also, you should probably call him out on it since what he’s doing is
looping over the hash of params and he’s too lazy to do the join right
so he just tacks a “&” in front. I’m betting there’s other parts of the
code that are very questionable. He’s basically doing this:

string query = ""
foreach key in params
  query += "&" + key + "=" + params[key]
end

He’ll most likely fight you on it rather than change it since he’s
probably got code like this all over and he thought it was an ultra
clever solution. He’s also most likely using a C type language with
poor strings that don’t have a “join”. If he’s using ruby then
definitely kick him to the curb since he should know better.
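
The difference between the two approaches can be sketched in Ruby. This is only an illustration: the method names and sample params are made up here and are not taken from the third-party component, and the escaping step is an added assumption about what a careful generator would do.

```ruby
require 'uri'

# Mirrors the pseudocode above: every pair gets a "&" prefix,
# so the result always starts with a stray "&".
def prefixed_query(params)
  params.reduce('') { |query, (key, value)| query + "&#{key}=#{value}" }
end

# Builds each "key=value" pair first, then joins them with "&",
# so separators only appear between pairs.
def joined_query(params)
  params.map do |key, value|
    "#{URI.encode_www_form_component(key)}=#{URI.encode_www_form_component(value)}"
  end.join('&')
end

sample = { 'param1' => 'xyz', 'param2' => 'lskjf' }
puts prefixed_query(sample)
puts joined_query(sample)
```

With the join version there is no first-iteration special case to get wrong, which is why it is usually preferred where the language offers it.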

It’s classic Potpourri Turd Syndrome (his turd don’t stink, but everyone
else’s is crap).


Zed A. Shaw, MUDCRAP-CE Master Black Belt Sifu

http://mongrel.rubyforge.org/
http://www.lingr.com/room/3yXhqKbfPy8 – Come get help.

On Mon, Sep 18, 2006 at 10:34:06PM +0200, Wes G. wrote:

I’m talking about the very first query parameter in the query string
having a “&” in front of the name.

Yes, we know what you’re talking about. In reality, the “&” is before
the second param, as the first one is empty/missing.

Michael

Michael Darrin Chaney
http://www.michaelchaney.com/

My assertion is that the first parameter is allowed to have the “&” in
front of it, as in

filename?&param1=…

per the RFCs.

WG

Zed S. wrote:

Just some thoughts.

Yep, technically you can do this, but what does it mean? The standard
also says this should be a sequence of & and = characters that makes up
form values. So what does the stray & mean? How would you parse it?
This is generally translated into a Hash so is it dropped or is
something special done?

What would make the first & character “stray?” The standard says that
the query string lives between the “?” and the end of the URL (or a “#”
character if it exists). I don’t see why allowing the first & causes a
parsing problem. Then again, I don’t write parsers, so… :).
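
A small sketch of that rule, assuming Ruby's standard uri library and using example.com as a stand-in for the elided host and port. It shows that the query component is simply whatever sits between the "?" and the "#", leading "&" included, and that the leading "&" only becomes visible as an empty first segment once the string is split:

```ruby
require 'uri'

# Hypothetical URL; example.com and the "#top" fragment are placeholders.
url = 'http://example.com/path/filename?&param1=xyz&param2=lskjf#top'

uri = URI.parse(url)
query = uri.query        # everything between "?" and "#"
fragment = uri.fragment  # everything after "#"

# Splitting on "&" keeps the empty string that precedes the first "&".
segments = query.split('&')

puts query
puts fragment
p segments
```

So the URL itself parses without complaint; the open question is only what the application-level parameter parser does with that empty first segment.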

Instead, I’d tell the guy that while it’s not explicitly forbidden in
the HTTP RFC it is not standard practice, does not follow how any
browser submits forms, and isn’t allowed in many CGI processing
libraries.

I definitely see your point about “standard practice.” On the other
hand, there’s a reason that I went to the RFC since that defines the
standard (to the best of my knowledge). We’ll all give M$ crap for not
adhering to various HTML/CSS standards in a heartbeat, but here we are
picking and choosing. Again, I know what actually works is often
different than what should work, so that’s fine.

Apache will totally accept these kinds of URLs. So even if we go down
the path of “generally accepted,” I think you would agree that Apache is
the “ad-hoc gold standard” when it comes to what should be generally
accepted with regards to HTTP.

I’ve submitted a patch to him that handles this issue, and I’ve fixed my
local copy so it’s all good.

Thanks for the considered response,
Wes

On 9/18/06, Wes G. wrote:

My assertion is that the first parameter is allowed to have the “&” in
front of it, as in

filename?&param1=…

per the RFCs.

Right. By empty param we mean that query_string.split('&').map { |pair| key,
value = pair.split('=') } parses '&a=1&b=2' as [[nil, nil], ['a', '1'],
['b', '2']]. Rails 1.2 ignores the first param whereas 1.1 tries to
make sense of it and fails.

jeremy
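
For readers who want to see the empty param concretely, here is a minimal sketch in plain Ruby. It is not the Rails 1.1 or 1.2 source; the two method names are hypothetical and only illustrate the "try to make sense of it" and "ignore it" behaviours described above.

```ruby
# Naive parse in the spirit of the one-liner above: every segment produced
# by split('&') becomes a [key, value] pair, including the empty first one.
def naive_pairs(query_string)
  query_string.split('&').map do |pair|
    key, value = pair.split('=', 2)
    [key, value]
  end
end

# Tolerant parse: segments with no key (such as the empty string before a
# leading "&") are skipped instead of being turned into a nil key.
def tolerant_params(query_string)
  query_string.split('&').each_with_object({}) do |pair, params|
    key, value = pair.split('=', 2)
    next if key.nil? || key.empty?
    params[key] = value
  end
end

p naive_pairs('&a=1&b=2')
p tolerant_params('&a=1&b=2')
p tolerant_params('&&&&&&param1=xyz')
```

The tolerant version treats any run of leading or repeated "&" characters the same way, by dropping the empty segments, so it never hands a nil key to later code.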

Ah - the “empty parameter” that “exists” because of the use of
split('&') to parse the query string. Gotcha.

WG