Some questions about the upload module

Hi.

Since I’m writing a multipart form data parser for my WSGI framework:
http://hg.mperillo.ath.cx/wsgix/

I would like to ask a few questions about the upload module.

  • Is there any external documentation for the module?
    For the example in the source, how is the request body passed to the
    PHP application encoded?

  • I see from the source that the accepted Content-Disposition values are
    form-data and attachment.

    But rfc2388 requires only form-data; why is attachment supported?

  • In case of multiple file fields with the same name, the UA should use
    the multipart/mixed content type, but this is not supported by the
    module, am I right?

    By the way, since RFC 2388 does not define what happens when multiple
    normal fields with the same name are defined, I think I will not add
    support for multipart/mixed either.

  • As per rfc2388, each part can have a content-transfer-encoding,
    but as far as I can see, this is not supported by the upload module.

    Does every known browser always use a 7bit/8bit/binary transfer
    encoding?

Thanks Manlio P.

“Manlio P.” wrote:

For the example in the source, how is the request body passed to the
PHP application encoded?

I simply replace r->request_body->bufs with a chain of buffers
containing the new body. The ngx_http_read_client_request_body function
somehow manages not to read any more data, and the fastcgi and proxy
modules are clever enough to serialize those buffers properly.

I think Igor can explain to you in more detail why it works.

Greetings!

Since I’m writing a multipart form data parser for my WSGI framework:
http://hg.mperillo.ath.cx/wsgix/

I would like to ask a few questions about the upload module.

  • Is there any external documentation for the module?

No, there is no developer documentation. At the moment I see no reason
for it to exist. I rarely document code and always keep all my projects
in memory.

For the example in the source, how is the request body passed to the
PHP application encoded?

  • I see from the source that the accepted Content-Disposition values are
    form-data and attachment.
    But rfc2388 requires only form-data; why is attachment supported?

As far as I remember, Netscape and some old IE versions did send
“attachment”, probably according to RFC 1806.

  • In case of multiple file fields with the same name, the UA should use
    the multipart/mixed content type, but this is not supported by the
    module, am I right?

Correct. I have neither seen any multipart/mixed in a POST request nor
do I know what to do with it.

By the way, since RFC 2388 does not define what happens when multiple
normal fields with the same name are defined, I think I will not add
support for multipart/mixed either.

  • As per rfc2388, each part can have a content-transfer-encoding,
    but as far as I can see, this is not supported by the upload module.

    Does every known browser always use a 7bit/8bit/binary transfer
    encoding?

Theoretically they can use something other than binary, but I haven’t
seen such a browser.
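
For a parser that wants to cover the theoretical case anyway, a minimal Python sketch of per-part decoding could look like the following. The function name decode_part_body is hypothetical; it only illustrates handling the Content-Transfer-Encoding values mentioned above plus base64 and quoted-printable, and is not how the upload module behaves.

```python
import base64
import quopri


def decode_part_body(body: bytes, transfer_encoding: str = "binary") -> bytes:
    """Undo the Content-Transfer-Encoding of a single multipart part."""
    encoding = transfer_encoding.strip().lower()
    if encoding in ("", "7bit", "8bit", "binary"):
        # Identity encodings: the bytes on the wire are the field value.
        return body
    if encoding == "base64":
        return base64.b64decode(body)
    if encoding == "quoted-printable":
        return quopri.decodestring(body)
    raise ValueError(f"unsupported Content-Transfer-Encoding: {transfer_encoding!r}")
```

Since the browsers discussed here send the identity encodings, the first branch is the only one expected to run in practice.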

Manlio P. wrote:

The data is the same multipart/form-data encoded data, with extra
parameters (like the temporary file path) added in the content-disposition?

Yes, the data is the same multipart/form-data, but extra parameters are
added as fields.

Valery K. wrote:

“Manlio P.” wrote:

For the example in the source, how is the request body passed to the
PHP application encoded?

I simply replace r->request_body->bufs with a chain of buffers
containing the new body. The ngx_http_read_client_request_body function
somehow manages not to read any more data, and the fastcgi and proxy
modules are clever enough to serialize those buffers properly.

The data is the same multipart/form-data encoded data, with extra
parameters (like the temporary file path) added in the
content-disposition?

Thanks Manlio P.

Valery K. wrote:

clever enough to serialize those buffers properly.

The data is the same multipart/form-data encoded data, with extra
parameters (like the temporary file path) added in the
content-disposition?

Yes, the data is the same multipart/form-data, but extra parameters are
added as fields.

Can you give an example, using this sample data (from the HTML 4.01 spec)?

Content-Type: multipart/form-data; boundary=AaB03x

--AaB03x
Content-Disposition: form-data; name="submit-name"

Larry
--AaB03x
Content-Disposition: form-data; name="files"; filename="file1.txt"
Content-Type: text/plain

... contents of file1.txt ...
--AaB03x--

Thanks Manlio P.

Manlio P. wrote:

Hi.

Since I’m writing a multipart form data parser for my WSGI framework:
http://hg.mperillo.ath.cx/wsgix/

I would like to ask a few questions about the upload module.

Valery, do you know if current browsers set the content-length for each
of the file fields?

Thanks again Manlio P.

of the file fields?
I have no idea. Anyway, this doesn’t help with anything.

Manlio P. wrote:

Larry
--AaB03x
Content-Disposition: form-data; name="files"; filename="file1.txt"
Content-Type: text/plain

... contents of file1.txt ...
--AaB03x--

Assuming:

upload_set_form_field "${upload_field_name}_name" "$upload_file_name";
upload_set_form_field "${upload_field_name}_type" "$upload_content_type";
upload_set_form_field "${upload_field_name}_path" "$upload_tmp_path";

--AaB03x
Content-Disposition: form-data; name="submit-name"

Larry
--AaB03x
Content-Disposition: form-data; name="files_name"

file1.txt
--AaB03x
Content-Disposition: form-data; name="files_type"

text/plain
--AaB03x
Content-Disposition: form-data; name="files_path"

<path_temporary_file>
--AaB03x--

Where file <path_temporary_file> holds contents of file1.txt.
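
To make the consequence for the backend concrete, here is a small Python sketch that reads such a rewritten body with the standard library email parser. The helper name parse_form_fields and the path /tmp/0000000001 are made up for illustration; the field names assume the three upload_set_form_field directives shown above.

```python
from email import message_from_bytes


def parse_form_fields(content_type: str, body: bytes) -> dict:
    """Parse a multipart/form-data body into {field name: text value}."""
    # The email parser expects the Content-Type header in front of the body.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw)
    if not message.is_multipart():
        raise ValueError("not a multipart body")
    fields = {}
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        fields[name] = part.get_payload(decode=True).decode("utf-8")
    return fields


# Body as the backend would receive it; /tmp/0000000001 is a hypothetical
# stand-in for <path_temporary_file>.
BODY = (
    b'--AaB03x\r\n'
    b'Content-Disposition: form-data; name="submit-name"\r\n\r\n'
    b'Larry\r\n'
    b'--AaB03x\r\n'
    b'Content-Disposition: form-data; name="files_name"\r\n\r\n'
    b'file1.txt\r\n'
    b'--AaB03x\r\n'
    b'Content-Disposition: form-data; name="files_type"\r\n\r\n'
    b'text/plain\r\n'
    b'--AaB03x\r\n'
    b'Content-Disposition: form-data; name="files_path"\r\n\r\n'
    b'/tmp/0000000001\r\n'
    b'--AaB03x--\r\n'
)

fields = parse_form_fields("multipart/form-data; boundary=AaB03x", BODY)
# Regroup the three generated fields that describe the uploaded file.
upload = {key: fields["files_" + key] for key in ("name", "type", "path")}
```

The backend never sees the file contents in the request body; it only gets the path and opens the temporary file itself.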

Valery K. wrote:

Content-Disposition: form-data; name="submit-name"

Where file <path_temporary_file> holds contents of file1.txt.

Then I’m not sure I understand.
What’s the benefit of using multipart/form-data if data is encoded in
this way?

Isn’t it better to use application/x-www-form-urlencoded?

submit-name=Larry&files_name=file1.txt&files_type=text/plain&files_path=<path_temporary_file

Note that this was the original idea I had in mind to implement.

Using multipart/form-data, I would instead return
--AaB03x
Content-Disposition: form-data; name="submit-name"

Larry
--AaB03x
Content-Disposition: form-data; name="files_name"; filename="file1.txt"; path=<path_temporary_file>

--AaB03x--

Sorry if I’m being noisy, but this is important to me, since the
architecture of my multipart/form-data parser depends on it.

Thanks Manlio P.

“Manlio P.” wrote:

Then I’m not sure I understand.
What’s the benefit of using multipart/form-data if data is encoded in
this way?

Isn’t it better to use application/x-www-form-urlencoded?

submit-name=Larry&files_name=file1.txt&files_type=text/plain&files_path=<path_temporary_file

Note that this was the original idea I had in mind to implement.

It depends. First, if you have a simple form with one file field, it
might be more efficient to use application/x-www-form-urlencoded. But
generally you may have something like a text area for an email body, and
this information hardly fits into a request URI. Most frameworks are
unlikely to cope with request URIs of unlimited size, and this could be
troublesome.

Second, POST has side effects and GET hasn’t, thus they are incompatible
on this point.

Third, for application/x-www-form-urlencoded I have to do URL encoding.

All these things are arguments against application/x-www-form-urlencoded
from my point of view, thus I use multipart/form-data.

Using multipart/form-data, I would instead return
--AaB03x
Content-Disposition: form-data; name="submit-name"

Larry
--AaB03x
Content-Disposition: form-data; name="files_name"; filename="file1.txt"; path=<path_temporary_file>

--AaB03x--

I’m not sure an arbitrary framework can process the “path” parameter.
Moreover, it would treat this field as a file field, since filename is
present.
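
The objection can be illustrated with a short Python sketch of how a typical parser might classify a part from its Content-Disposition header. The function describe_part and the path value are hypothetical; the point is only that the presence of filename is what usually marks a part as a file, while an unknown parameter such as path is just carried along.

```python
from email.message import Message


def describe_part(content_disposition: str) -> dict:
    """Classify a part by the parameters of its Content-Disposition header."""
    header = Message()
    header["Content-Disposition"] = content_disposition
    # get_params() returns [("form-data", ""), ("name", ...), ...];
    # skip the first entry, which is the disposition type itself.
    params = dict(header.get_params(header="content-disposition")[1:])
    return {
        "name": params.get("name"),
        # Typical frameworks key on "filename" to decide that this is a file.
        "is_file": "filename" in params,
        # Anything else (for example a non-standard "path") is extra.
        "extra": {k: v for k, v in params.items() if k not in ("name", "filename")},
    }


# /tmp/0000000001 is a made-up stand-in for <path_temporary_file>.
proposed = describe_part(
    'form-data; name="files_name"; filename="file1.txt"; path="/tmp/0000000001"'
)
generated = describe_part('form-data; name="files_path"')
```

Here proposed["is_file"] is true even though the part body is empty, whereas the field generated by upload_set_form_field is seen as a normal field.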

Valery K. wrote:

Valery K. wrote:

Valery, do you know if current browsers set the content-length for each
of the file fields?
I have no idea. Anyway, this doesn’t help with anything.

However, I was wrong. This would be a nice hint. I’m going to investigate
this subject. Thanks.

I have checked with IE 6, Opera 8.52, Firefox (Iceweasel 3.0.1),
Konqueror 3.5.9, and they do not set the Content-Length.

They also never use multipart/mixed when there are multiple file fields
with the same name, as required(?) by the HTML 4.01 spec and RFC 2388.

And, finally, they don’t set the Content-Type (with the charset
parameter) for the normal input fields.
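
Given that behaviour, a parser can treat repeated names as a flat list of parts. The following Python sketch shows one way to do that with the standard library; the function name group_parts_by_name and the sample file names are illustrative assumptions, not taken from any browser capture.

```python
from collections import defaultdict
from email import message_from_bytes


def group_parts_by_name(content_type: str, body: bytes) -> dict:
    """Collect every part under its field name, keeping repeated names."""
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw)
    if not message.is_multipart():
        raise ValueError("not a multipart body")
    grouped = defaultdict(list)
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        # filename is None for normal (non-file) fields.
        filename = part.get_param("filename", header="content-disposition")
        grouped[name].append((filename, part.get_payload(decode=True)))
    return dict(grouped)


# Two file fields sharing the name "files", sent as sibling parts rather
# than nested in a multipart/mixed part.
SAMPLE = (
    b'--AaB03x\r\n'
    b'Content-Disposition: form-data; name="files"; filename="file1.txt"\r\n'
    b'Content-Type: text/plain\r\n\r\n'
    b'one\r\n'
    b'--AaB03x\r\n'
    b'Content-Disposition: form-data; name="files"; filename="file2.txt"\r\n'
    b'Content-Type: text/plain\r\n\r\n'
    b'two\r\n'
    b'--AaB03x--\r\n'
)

grouped = group_parts_by_name("multipart/form-data; boundary=AaB03x", SAMPLE)
```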

Regards Manlio P.

Valery K. wrote:

Valery, do you know if current browsers set the content-length for each
of the file fields?

I have no idea. Anyway, this doesn’t help with anything.

However, I was wrong. This would be a nice hint. I’m going to investigate
this subject. Thanks.

Valery K. wrote:

at this point.

Note that I have never written that POST should be changed to GET.

Third, for application/x-www-form-urlencoded I have to do URL encoding.

This is a good point.
But nginx already has support for the encoding.

All these things are arguments against application/x-www-form-urlencoded
from my point of view, thus I use multipart/form-data.

IMHO it is overhead, but fine.
I will implement a WSGI middleware that transforms a multipart/form-data
POST into an application/x-www-form-urlencoded POST.

[…]

Thanks Manlio P.