Upload a file into a database or onto a server?

Hi,

I would like some suggestions. I’m wondering if it’s better to upload a
file directly into a database or to put the file into the server and
keep the path in the database?

Thanks,

Eric

There’s no blanket one-size-fits-all answer to this question, though
there’s certainly no lack of opinions on it.

The file system is designed for storing, well, files, so from a
performance perspective, you will generally get better results storing
uploads in the file system, everything else being equal. However,
databases are much better than they were even a few short years ago at
handling BLOBs, and storing data in the database is now very workable
for many situations.
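To make the two options concrete, here is a minimal sketch in Python, using the standard-library sqlite3 module as a stand-in for whatever database is actually in use (this is not Rails or SQL Server code). The table names, column names, and helper functions are all hypothetical, chosen only for illustration.

```python
import os
import sqlite3
import tempfile

def make_db():
    """Create an in-memory database with one table per storage strategy."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE blob_uploads (id INTEGER PRIMARY KEY, name TEXT, data BLOB)")
    conn.execute("CREATE TABLE path_uploads (id INTEGER PRIMARY KEY, name TEXT, path TEXT)")
    return conn

def save_as_blob(conn, name, data):
    """Option 1: the file's bytes live in the database row itself."""
    cur = conn.execute("INSERT INTO blob_uploads (name, data) VALUES (?, ?)", (name, data))
    conn.commit()
    return cur.lastrowid

def load_blob(conn, upload_id):
    row = conn.execute("SELECT data FROM blob_uploads WHERE id = ?", (upload_id,)).fetchone()
    return bytes(row[0])

def save_as_path(conn, upload_dir, name, data):
    """Option 2: the bytes go to the file system; the row keeps only the path."""
    cur = conn.execute("INSERT INTO path_uploads (name, path) VALUES (?, '')", (name,))
    upload_id = cur.lastrowid
    # Prefix with the row id so two uploads with the same name do not collide.
    path = os.path.join(upload_dir, f"{upload_id}_{os.path.basename(name)}")
    with open(path, "wb") as f:
        f.write(data)
    conn.execute("UPDATE path_uploads SET path = ? WHERE id = ?", (path, upload_id))
    conn.commit()
    return upload_id

def load_from_path(conn, upload_id):
    row = conn.execute("SELECT path FROM path_uploads WHERE id = ?", (upload_id,)).fetchone()
    with open(row[0], "rb") as f:
        return f.read()

if __name__ == "__main__":
    conn = make_db()
    with tempfile.TemporaryDirectory() as upload_dir:
        blob_id = save_as_blob(conn, "report.pdf", b"example bytes")
        path_id = save_as_path(conn, upload_dir, "report.pdf", b"example bytes")
        print(load_blob(conn, blob_id) == load_from_path(conn, path_id))
```

Note that option 2 has two things to keep in step (the row and the file), which is where the migration and multi-server concerns come from.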

The big downsides to storing in the file system are:

  1. If you run distributed (multiple application servers), the uploaded
    file will only be available on the application server where it was
    uploaded.

  2. It makes it harder to migrate your system, because you have data in
    more places than just your codebase and database.

There are a number of factors to look at. If, for example, your
database uses a fiber-attached SAN while your application server has an
internal IDE hard drive, you probably will get better performance using
the database even with the overhead hit.

So, “better” in this case is going to depend on a lot of factors, many
of which can’t be determined from your post. If you want to post more
specifics about your situation, you might get some useful feedback. :)

Jeff LaMarche wrote:

So, “better” in this case is going to depend on a lot of factors, many
of which can’t be determined from your post. If you want to post more
specifics about your situation, you might get some useful feedback. :)

I know that my question was too general. But this is what I wanted:
general guidelines – pros and cons for different situations. Normally, I
have to use Windows 2000 with MS SQL Server 2000 (it’s an internal
policy). I build small applications that serve 200 to 1000 users. Thanks
to you and to the others for the opinions and suggestions.

Eric B. wrote:

I know that my question was too general. But this is what I wanted:
general guidelines – pros and cons for different situations. Normally, I
have to use Windows 2000 with MS SQL Server 2000 (it’s an internal
policy). I build small applications that serve 200 to 1000 users. Thanks
to you and to the others for the opinions and suggestions.

Hmm… not sure, really. I’ve never used SQL Server with Rails, so I
don’t know how well the connection adapter handles binaries. I know that
SQL Server is capable of handling BLOBs quite decently, and the fact
that you have a small userbase would nudge me toward using the database
unless the files being stored are very large (> 10 MB), but that’s
not really a very educated opinion given my lack of knowledge of
ActiveRecord with SQL Server. :-/

I would advise strongly against using the database to store files. I
just don’t see the point. File_Column for Rails makes it really easy to
sync files on the file system with the database. It’s just a simpler,
more elegant solution than storing and streaming blobs, especially if
you expect these files to be read often.

If you need to scale, you just devise a method for placing your file
directory on a shared location that all servers can access. Consider
using a SAN for this or some other method. If your application gets so
big that you have to scale to more than one machine, a SAN is a
worthwhile investment.
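One way to keep a later move to shared storage cheap is to store only relative paths in the database and resolve them against a configurable root, so each application server can point that root at the same shared mount. A minimal Python sketch of the idea; the `UPLOAD_ROOT` environment variable and the helper names are hypothetical, not part of any library.

```python
import os
from pathlib import Path

def upload_root():
    """Directory holding all uploads; on a cluster this would be the shared mount."""
    return Path(os.environ.get("UPLOAD_ROOT", "uploads"))

def relative_key(upload_id, filename):
    """Value to store in the database: relative, so it survives moving the root."""
    # Keep only the final path component of the client-supplied name.
    return f"{upload_id}/{Path(filename).name}"

def store(upload_id, filename, data):
    """Write the bytes under the current root and return the key for the database."""
    key = relative_key(upload_id, filename)
    target = upload_root() / key
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return key

def fetch(key):
    """Read the bytes back, resolving the stored key against the current root."""
    return (upload_root() / key).read_bytes()
```

Because the database rows never mention the root, moving the upload directory to a SAN or NFS mount becomes a copy plus a configuration change, with no rows to rewrite.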

The only advantage I ever see for storing it in the database is the fact
that it’s easier to back up. That’s a myth too, because I can easily
write a quick job to back up my database and my files at the same time.
With SQL Server, it’s more difficult, but you can use a DTS Package to
launch a script that will back up your files and back up the database.
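As a rough idea of what such a combined job could look like, here is a minimal Python sketch that snapshots a database and bundles it with the upload directory into a single archive. It uses sqlite3 purely as a stand-in; with SQL Server the snapshot step would be replaced by that product's own backup mechanism. The function name and the archive layout are hypothetical.

```python
import sqlite3
import tarfile
import tempfile
from pathlib import Path

def backup_all(db_path, upload_dir, archive_path):
    """Write a database snapshot and the upload directory into one .tar.gz file."""
    with tempfile.TemporaryDirectory() as tmp:
        snapshot = Path(tmp) / "database.sqlite3"
        src = sqlite3.connect(str(db_path))
        dst = sqlite3.connect(str(snapshot))
        try:
            # sqlite3's online backup API (Python 3.7+) copies a consistent snapshot.
            src.backup(dst)
        finally:
            dst.close()
            src.close()
        with tarfile.open(str(archive_path), "w:gz") as tar:
            tar.add(str(snapshot), arcname="database.sqlite3")
            tar.add(str(upload_dir), arcname="uploads")
    return archive_path
```

Running this from a scheduler produces one file to copy off the machine, although the job itself is still something that lives outside the application and has to be monitored.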

Jeff:
I said I would advise against it, not that someone should never do it.
However, the folks at 37signals agree with my thoughts.

This comes from another post on this topic.

“Unless you’re Flickr, it’s likely that NFS will carry you a long way
for very little investment in complexity. The 37signals cluster is
using NFS to handle all file uploads for hosted applications.”
– David Heinemeier H.

Brian H. wrote:

The only advantage I ever see for storing it in the database is the fact
that it’s easier to back up. That’s a myth too, because I can easily
write a quick job to back up my database and my files at the same time.
With SQL Server, it’s more difficult, but you can use a DTS Package to
launch a script that will back up your files and back up the database.

Sure, you can write a script to back up the file system separately from
the database. But that’s an additional point of failure, an additional
log to check, an additional backup file, an additional step to restore,
and an additional piece of code that exists outside of your Rails
application. It also means a cron job or scheduled task, and it’s one
more thing that has to be communicated to your successor or client when
you hand off the system, and one more thing that can get forgotten on a
hardware migration or system restore. Sure, none of these issues is
fatal, and they can all be mitigated by good procedures and
documentation. But in my personal definition of “elegance”, simplicity
is a factor, and you undoubtedly simplify your overall solution by
having files in the database, especially in a distributed environment,
though that simplicity does come at a price.

Don’t get me wrong: the file system is the right place to store files a
lot of the time, but it’s a bit shortsighted to make a blanket statement
that you should never, ever do it. Maybe in your personal experiences
it’s always made sense to store them in the file system, but I’ve
personally seen several situations where it made a lot more sense to
store files as BLOBs. I’ve seen environments where access to the SAN was
limited; the database servers had it, but the application server didn’t
and couldn’t get it. I’ve seen IT departments that were understaffed to
the point that even a few additional tasks to monitor made a difference,
but where the hardware resources devoted to the application were more
than capable of handling BLOBs for the application’s volume.

I’ve rarely seen a valid blanket statement in this industry. “Don’t
store files in the database” is not axiomatic.