HTTP PUT for file uploads

On Sat, Sep 6, 2008 at 8:35 AM, Kon W. [email protected] wrote:

Interesting. This is exactly what we have been doing to get files
transmitted over satellite for many years ;-). Of course, the problem
there is one of no ACKs or NACKs during the transmission – they must
be batched up and sent via a backchannel after the transmission has
ended, for retransmission. If this is something that interests you I would
suggest looking at protocols like NORM.

Well, I’d want to speak standard HTTP, just like anything else, and I
think it would be easy. It doesn’t have to get into low-level TCP; the
chunks should be sent in such a way that each one looks like any normal
HTTP request. This will alleviate firewall issues, max upload size issues
(both webserver and PHP/scripting-level limits), etc.

I just want to make sure the client side is memory-efficient and doesn’t
load the entire file into memory to seek byte ranges, for example. I’m not
sure whether that can be done, or if it’s language-specific, etc.
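Something along these lines is what I’m picturing on the client - a minimal PHP sketch (PHP because it’s what I know; the /upload URL and the use of Content-Range are just my assumptions, not anything settled). fseek()/fread() only ever hold one chunk in memory:

```php
<?php
// Client-side sketch: stream one chunk at a time so the whole file
// is never loaded into memory. The endpoint URL and the Content-Range
// header are assumptions for illustration.
$path      = 'bigfile.bin';
$chunkSize = 131072; // 128 KB
$total     = filesize($path);
$fp        = fopen($path, 'rb');

for ($offset = 0; $offset < $total; $offset += $chunkSize) {
    fseek($fp, $offset);
    $chunk = fread($fp, min($chunkSize, $total - $offset));
    $end   = $offset + strlen($chunk) - 1;

    $ch = curl_init('http://example.com/upload'); // hypothetical endpoint
    curl_setopt_array($ch, [
        CURLOPT_CUSTOMREQUEST  => 'PUT',
        CURLOPT_POSTFIELDS     => $chunk,
        CURLOPT_HTTPHEADER     => ["Content-Range: bytes $offset-$end/$total"],
        CURLOPT_RETURNTRANSFER => true,
    ]);
    curl_exec($ch);
    curl_close($ch);
}
fclose($fp);
```

Each request is plain HTTP with an ordinary body, so firewalls and size limits see nothing unusual.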

But it is a good idea, no? I’ve been struggling to find any products
for a long time, and nothing out there avoids ugly applets, server
configuration changes, and/or separate servers, etc.

On Sat, Sep 6, 2008 at 4:45 AM, Valery K.
[email protected] wrote:

|<-- part1 0-19999/80000 -->|<-- zeroes 000000 -->|<-- part2 40000-59999/80000 -->|

Does anyone know which of the encoding methods below would be easiest to
add into nginx? I assume most would require external libraries, and I
would like to keep nginx’s footprint low, not requiring an excessive
number of external libraries.

The idea here would be that the client encodes using one of these and the
server decodes, to ensure safe transfer of the file. (Please let me know,
Valery, if you think this is totally unneeded - I expect every segment to
be checksummed anyway, so maybe it doesn’t need to be.)

uuencode/uudecode
base64 encoding/decoding
yenc (www.yenc.org)
? any other encoding methods ?

On Mon, Sep 08, 2008 at 02:12:30 -0700, mike wrote:

uuencode/uudecode
base64 encoding/decoding
yenc (www.yenc.org)
? any other encoding methods ?

UU/base64 are quite similar (both are rather simple and expand the
encoded text by 33%), yenc I haven’t heard of before. Do you need to
encode the chunks in any way? I’d probably send them raw with some
checksum (CRC32, MD4, MD5, SHA-something, depending on the application)
in headers.

BTW, if you keep the chunk size as a multiple of page size (usually 4K,
though you may encounter 8K and maybe more), you should be able to
mmap() chunks of the file for minimum overhead.

Best regards,
Grzegorz N.

On Sat, Sep 6, 2008 at 4:45 AM, Valery K.
[email protected] wrote:

  1. multipart/form-data does not bloat the size of the file; it doesn’t
    encode anything. RFC 1867 doesn’t explicitly say that any encoding
    should be applied;

That’s weird - I thought someone said it bloats it. I will have to
update that next time I post about this.

  1. the ideal solution is to have byte ranges instead of a segment ID, since
    concatenation of parts is not a scalable operation. With byte ranges the
    server will be able to put each part into its proper place in the file, while
    leaving other parts empty. I.e., if I have two parts
    with byte ranges 0-19999/80000 and 40000-59999/80000, I can lseek to the
    second part’s offset and start writing the two parts simultaneously:

|<-- part1 0-19999/80000 -->|<-- zeroes 000000 -->|<-- part2 40000-59999/80000 -->|

The reason I decided on segments is to be able to transfer multiple
segments in parallel, and I don’t know enough about server-side code and
shared filesystems to know whether byte ranges would work properly over
NFS or something else. I am thinking of a solution that has no
“it may not work” type of restrictions.

If only one file can be sent at a time, then I was thinking PHP (since
this was a first attempt at it, and I only know PHP) could seek to the
specific byte range; however, being able to split the file into segments
and send them at will allows multiple segments to be uploaded for the
same file and does not carry any NFS/locking risks.
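For the byte-range variant, the server side really is tiny in PHP - a minimal sketch (offset hardcoded here; it would actually come from a Content-Range-style header, and the shared path is just an assumption):

```php
<?php
// Server-side sketch: drop an incoming part at its byte offset.
// fseek() past the current end of file leaves a gap that reads back
// as zeroes - exactly the lseek() picture above.
$offset = 40000;                                // from a Content-Range-style header
$part   = file_get_contents('php://input');    // raw request body

$fp = fopen('/shared/tmp/upload.partial', 'c'); // create if missing, don't truncate
fseek($fp, $offset);
fwrite($fp, $part);
fclose($fp);
```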

If you want to code an extension that does this more cleanly and uses
byte ranges in a way that is safe over a network filesystem like NFS,
that works for me :) I only know PHP, and my assumptions are based on
how other things do it.

A 2 GB file at 128 KB chunks (segments) would wind up being roughly
2000 * 8 = 16,000 chunks. That’s a lot of files. I started thinking of
making “superchunks”, which would be groupings of 100 chunks or so: after
100 chunks in a row were successful, it would glue those together and
reduce the number of files…
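The gluing step itself seems simple enough - a rough sketch (the chunk.N naming scheme is just an assumption):

```php
<?php
// Rough sketch of gluing 100 sequential chunk files into one
// "superchunk". The chunk.N naming scheme is hypothetical.
$first = 0; // index of the first chunk in this group
$out   = fopen("superchunk.$first", 'wb');

for ($i = $first; $i < $first + 100; $i++) {
    $in = fopen("chunk.$i", 'rb');
    stream_copy_to_stream($in, $out); // copies without reading it all into memory
    fclose($in);
    unlink("chunk.$i");               // drop the small file once it's glued in
}
fclose($out);
```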

If this idea at least sounds viable, I think I could scrape together a
decent amount of cash from my side business and my company to fund
this. It would have to support operations safely over NFS, CIFS, or
single servers (so a local /tmp file wouldn’t work for NFS, since the
requests could be sent to any of the webservers; it would have to be a
shared directory, which should be user-configurable).

I suppose the client side isn’t too hard, since seeking throughout a file
there doesn’t involve odd locking and writing issues. It would be great
if the server end could be created in a way that it could become an
“unofficial” standard for uploading large files, or uploading over
unreliable or slow connections.

It would also take care of progress-bar stuff, as the server could give
feedback to the client when chunks get completed, and the client knows
how fast it is sending data… so during a chunk it would rely on its own
internal transfer stats, and it would be able to confirm that, for
example, everything up to byte 70000 is completed on the server…
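That feedback could be as simple as a header on the per-chunk response - a sketch, with a made-up X-Confirmed-Bytes header name:

```php
<?php
// Sketch: after storing a chunk, tell the client how much of the file
// is confirmed so far. The X-Confirmed-Bytes header name is made up.
$confirmed = 70000; // would really be computed from the parts on disk
header("X-Confirmed-Bytes: $confirmed");
```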

Also, there’s an issue of garbage collection. A job would have to
clean out the [shared] temp directory after a while - I thought
something like 4 days would be nice [user-configurable is best],
because a 2 GB file could take a long time and people might have to
resume it over the course of a couple of days, but any longer and we’d
have to assume it’s an orphan file that won’t be resumed again.
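The cleanup job could be a trivial cron’d PHP script - something like this, where the path, the *.partial naming, and the 4-day cutoff are all user-configurable assumptions:

```php
<?php
// Cleanup sketch: remove stale partial uploads from the shared temp
// directory. Path, *.partial naming, and the 4-day cutoff are assumptions.
$dir    = '/shared/tmp/uploads';
$cutoff = time() - 4 * 86400; // 4 days ago

foreach (glob("$dir/*.partial") as $file) {
    if (filemtime($file) < $cutoff) {
        unlink($file); // assume it's an orphan that won't be resumed
    }
}
```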

If you’re interested in this, I would love to have someone as
experienced as you - who has already dealt with handling file uploads
and created nginx modules! Let me know, feel free to contact me off
list. We might have to work out some more specifics, and you might
want to know how much $$ - I’d have to ask at work what they would pay
for it, but I’d pledge $500 out of my own pocket. I don’t believe any
other webserver or anything out there has anything like this (besides
maybe some thick client apps with specific servers that only handle
file uploads…)

Ideally, this would be something other people could create modules for
(Apache, etc.) as well, and it could be adopted by browsers directly,
alleviating the need for thick Java/Flash/etc. apps - if it’s done in a
“standard” enough way to be re-creatable on both the client and the
server…

I’d love to hear any thoughts or opinions, get any code going, etc. I’ve
actually got a PHP version of the server component that I think is
functional (with a surprisingly minimal amount of code), but I don’t
have a client to test it with yet. I was going to create a PHP client to
test it, too.

On Tue, Sep 9, 2008 at 12:43 AM, Grzegorz N.
[email protected] wrote:

UU/base64 are quite similar (both are rather simple and expand the
encoded text by 33%), yenc I haven’t heard of before. Do you need to
encode the chunks in any way? I’d probably send them raw with some
checksum (CRC32, MD4, MD5, SHA-something, depending on the application)
in headers.

BTW, if you keep the chunk size as a multiple of page size (usually 4K,
though you may encounter 8K and maybe more), you should be able to
mmap() chunks of the file for minimum overhead.

Thanks for the pointers - Valery would be implementing this for nginx
(unless someone else did), but he’s already got quite a good amount of
experience, and maybe I can talk him into it :)

After thinking about it more, it probably doesn’t need to be encoded,
as long as it is checksummed. I suppose headers can be used for that.
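For example, the server could recompute the hash before writing the chunk into place - a sketch using Content-MD5 (a real header, RFC 1864, though whether we’d use it or something custom is open):

```php
<?php
// Server-side sketch: verify the chunk's checksum before accepting it.
// Content-MD5 (RFC 1864) carries the base64-encoded binary MD5 digest;
// the client would compute it as base64_encode(md5($chunk, true)).
$body = file_get_contents('php://input');
$sent = isset($_SERVER['HTTP_CONTENT_MD5']) ? $_SERVER['HTTP_CONTENT_MD5'] : '';

if (base64_encode(md5($body, true)) !== $sent) {
    header('HTTP/1.1 400 Bad Request'); // client should resend this chunk
    exit;
}
// ...otherwise fseek()/fwrite() it into place as usual...
```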

Also, it would be neat if the server advertised its advanced upload
capability somehow, so that the client-side applet (and possibly
browsers with it built in some day) would know: “hey, if I upload to
this host, I can use the intelligent transfer system”.
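It could be as simple as answering an OPTIONS probe with a made-up header that clients look for - a sketch (X-Chunked-Upload is purely hypothetical):

```php
<?php
// Sketch of capability advertisement: answer an OPTIONS probe with a
// hypothetical header so clients know resumable chunked PUTs work here.
if ($_SERVER['REQUEST_METHOD'] === 'OPTIONS') {
    header('Allow: GET, PUT, OPTIONS');
    header('X-Chunked-Upload: 1'); // hypothetical capability flag
    exit;
}
```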

That would be the plan. First I’ve gotta get a working implementation
done and the kinks worked out of the “protocol”, but you can bet I will
push the idea/protocol/whatever to every browser and web server to get
more adoption.

On Sep 9, 2008, at 1:37 AM, Phillip B Oldham [email protected] wrote:
mike wrote:

Also, it would be neat if the server advertised its advanced upload
capability somehow, so that the client-side applet (and possibly
browsers with it built in some day) would know: “hey, if I upload to
this host, I can use the intelligent transfer system”.

And if you provide enough documentation on it, someone will probably
write a Firefox plug-in to take advantage of the system with the default
control. I might even take a whack at it if I’ve got the time.

Phillip B Oldham
The Activity People
[email protected]


The only way to do this would be to put some ammunition behind it - an
IETF draft.

Cheers
Kon

On Tue, Sep 9, 2008 at 8:32 AM, Kon W. [email protected] wrote:

The only way to do this would be to put some ammunition behind it - an
IETF draft.

Cheers
Kon

Of course.

Although I think it might be good to have some real-world traction and
examples first. I also don’t have time to try to rally official support;
I need it done sooner rather than later. Once it’s working, then I might
try to bug people about getting something added to a spec for more
robust HTTP uploading (and hopefully get our key learnings /
“protocol” adopted).