Something like a file server

mohits · October 18, 2009, 10:12am

Hi, I’m sorry if this is slightly OT, but I’m trying to find a way to do
the following. I have a bunch of processes that generate regular update
files. Each file may be between 100 KB and 4 MB in size.

On the other side, I have people who want to pick up the most recent
version of this file. So, I need to give them a fixed URL to the file.
When they make a request, they would like to get only the most recent
file. Initially, I thought that something like FTP would work, but I
run into the problem that if a request comes in when the file is being
updated, the client may not get a valid file.

The other extreme is to have an upload controller and a download
controller and then store the file in a database, so that it can be
served up through the database. The database serializes it so that I
don’t have to worry. But, since the files can be large, it seems a bit
of a waste to use this approach.

Is there a better way? Anything that you would recommend?

Thanks,
Mohit.

mohits · October 18, 2009, 10:37am

On Sun, Oct 18, 2009 at 01:10, Mohit S. wrote:

[...] if a request comes in when the file is being
updated, the client may not get a valid file.

A common approach used for something like this is to have a “current”
symlink and update it whenever you have a newer file.

E.g.:
$ touch some-file-v1
$ ln -s some-file-v1 current-version
$ touch some-file-v2
$ ln -sf some-file-v2 current-version

If you give people the URL to “current-version”, and only update the
symlink after you’ve created the new version, then they won’t be
downloading a file before it’s completely written out to disk.
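
One caveat with the commands above: depending on the ln implementation,
"ln -sf" may remove the old link before creating the new one, which
leaves a brief moment when “current-version” does not exist. A minimal
sketch of a swap that avoids this, assuming a POSIX shell and that both
names live on the same filesystem. The scratch directory, the file
contents, and the "current-version.tmp" name are made up for
illustration:

```shell
#!/bin/sh
# Sketch only: directory, contents, and the .tmp name are hypothetical.
set -eu

dir=$(mktemp -d)   # scratch directory standing in for the served directory
cd "$dir"

# First release: write the file completely, then point the symlink at it.
printf 'version 1\n' > some-file-v1
ln -s some-file-v1 current-version

# Next release: write the new file completely under its own name first.
printf 'version 2\n' > some-file-v2

# Create the new symlink under a temporary name, then rename it over the
# old one. The rename replaces the old link in a single step, so the name
# "current-version" never disappears.
# (The link points at a regular file, so plain mv renames over it. If it
# pointed at a directory, mv would move the link into that directory.)
ln -s some-file-v2 current-version.tmp
mv -f current-version.tmp current-version

cat current-version   # prints: version 2
```

The new file is only ever reachable through “current-version” after it
has been fully written, which is the property the clients need.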

We use the “store the files in the DB” approach for a few of our
projects at $work, and that works pretty well for us. It’s not really
a waste, especially if you plan on having multiple “physical”
webservers.

-Jacob

mohits · October 18, 2009, 10:44am

Hi Jacob,

Thanks for the quick reply.

Jacob H. wrote:

If you give people the URL to “current-version”, and only update the
symlink after you’ve created the new version, then they won’t be
downloading a file before it’s completely written out to disk.

This seems simple enough! Would this approach also work if someone was
already accessing the older file when we try to do the second set of
steps:

$ touch some-file-v2
$ ln -sf some-file-v2 current-version

Can I update a symlink while someone is already reading a file?

We use the “store the files in the DB” approach for a few of our
projects at $work, and that works pretty well for us. It’s not really
a waste, especially if you plan on having multiple “physical”
webservers.

Actually, I do use this for one of our solutions. The only concern is
that you need many more Mongrels if your files are very large - since
sending the file from database locks up the Mongrel for a longer period
of time… with small files, it works quite well.

Cheers,
Mohit.
10/18/2009 | 4:42 PM.

mohits · October 18, 2009, 11:06am

On Sun, Oct 18, 2009 at 01:42, Mohit S. wrote:

Would this approach also work if someone was
already accessing the older file when we try to do the second set of steps:

$ touch some-file-v2
$ ln -sf some-file-v2 current-version

Can I update a symlink while someone is already reading a file?

This shouldn’t cause a problem, because the file was already opened
using the old “real” file’s information about where it is on disk.
(Though this is pretty easy to confirm for whatever your particular
setup is.)
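
One way to run that check, as a rough sketch assuming a POSIX shell. The
scratch directory, file contents, and variable names are made up for
illustration, and a held-open file descriptor stands in for a client
that is partway through a download:

```shell
#!/bin/sh
# Sketch only: names and contents are hypothetical.
set -eu

dir=$(mktemp -d)
cd "$dir"

printf 'old contents\n' > some-file-v1
printf 'new contents\n' > some-file-v2
ln -s some-file-v1 current-version

# Simulate an in-progress download: open the file through the symlink
# on descriptor 3 and keep it open.
exec 3< current-version

# Repoint the symlink while descriptor 3 is still open.
ln -s some-file-v2 current-version.tmp
mv -f current-version.tmp current-version

# The open descriptor still refers to the file the symlink resolved to
# at open time; a fresh open follows the updated symlink.
in_flight=$(cat <&3)
exec 3<&-            # close descriptor 3
fresh=$(cat current-version)

echo "in-flight reader saw: $in_flight"   # old contents
echo "new reader sees:      $fresh"       # new contents
```

This only covers a reader that keeps one file handle open for the whole
transfer. A client that reconnects and resumes partway through would
open the path again and could get the newer file, so that case is worth
testing separately.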

We use the “store the files in the DB” approach for a few of our
projects at $work, and that works pretty well for us. It’s not really
a waste, especially if you plan on having multiple “physical”
webservers.

Actually, I do use this for one of our solutions. The only concern is
that you need many more Mongrels if your files are very large - since
sending the file from database locks up the Mongrel for a longer period
of time… with small files, it works quite well.

It should also be possible to have whatever is delegating to Mongrel
directly serve up files that exist on disk already. Then you could
save things from the DB to disk, and not tie up a Mongrel worker after
the first hit. This makes your “always download the most recent
version” trickier, though.

-Jacob

mohits · October 18, 2009, 11:31am

Jacob H. wrote:

It should also be possible to have whatever is delegating to Mongrel
directly serve up files that exist on disk already. Then you could
save things from the DB to disk, and not tie up a Mongrel worker after
the first hit. This makes your “always download the most recent
version” trickier, though.

Thanks Jacob. I think you’ve given me a couple of pointers on how to
proceed from here. I guess the above idea can be implemented using
Rails caching also.

Cheers,
Mohit.
10/18/2009 | 5:29 PM.