Hosting images : DB or File System

pratik · June 4, 2006, 12:20pm

Hi,

I’m developing an application where I’ll have to store a lot of images
uploaded by users, and I’m still not sure whether I should store them
in MySQL as BLOBs or just store them on the filesystem.

If I store them on the filesystem, how do I scale once I need
multiple servers?

Thanks,
Pratik

pratik · June 4, 2006, 12:57pm

On 6/4/06, Pratik [email protected] wrote:

Pratik

rm -rf / 2>/dev/null - http://null.in

Rails mailing list
[email protected]
http://lists.rubyonrails.org/mailman/listinfo/rails

Indeed, storing in the filesystem is a “quick and dirty” solution with
many problems. Scaling is just one of them. If your binary image data
is part of your system’s data, use the database.

pratik · June 4, 2006, 2:50pm

Pratik wrote:

I’m developing an application where I’ll have to store a lot of images
uploaded by users, and I’m still not sure whether I should store them
in MySQL as BLOBs or just store them on the filesystem.

If I store them on the filesystem, how do I scale once I need
multiple servers?

How to scale images on the file system with multiple servers:

Start from scratch (0 users/hits/images) and move to the next level when
the load/response/metric of your choice becomes unacceptable. At each
step, various hardware upgrades and optimizations are possible -
additional/faster disks, disk arrays - so there are more steps than
these.

1. Store the images on the same box as your db, web server and
   application.
2. Move your db to a box of its own.
3. Move your application to a box of its own. Images stay on the web
   server box. Route image requests separately from application
   requests on the web tier.
4. Add application servers. The web tier can serve static requests
   faster than your application in all probability.
5. Move the images to a dedicated image server.
6. Create an image server cluster.
7. Create multiple global image server clusters.

Your path through these steps could be slightly different depending on
the complexity of your application, the size and number of images, your
requirement for immediate availability, patterns of use of images,
frequency of addition of new images, processing done on newly acquired
images, etc. You might add a dedicated image preprocessing server at
some stage of this build out.

I would be interested in seeing how you could handle images in the db
through the same scaling scenario. I’m not saying you couldn’t do it, or
even that it might not be a better fit under certain conditions, but it
would not be my first choice in common web applications that include
5-50 images per page.

If you go the db route, at every step of the build out you are going to
have to serve the images through the whole stack (db=>app=>web). That
reduces your flexibility in expanding and introduces a lot of overhead
for every transaction. If that weren’t enough, replicating multiple dbs
is much more brittle than syncing multiple file systems in my
experience.

That you are thinking about this at all is probably a premature
optimization. You don’t really know what the use patterns of your
application are, where the bottlenecks are, what problems can be solved
with existing hardware, etc. You don’t even really know what “a lot” is.
Is it 10K, 100K, 1M, 10M, or 100M? How many users? How many images per
user? How often do they view them?

For now, encapsulate access to the images and you can change when the
requirements become clearer.
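
One way to read "encapsulate access to the images" is to put a small storage interface between the application and whatever backend is in use today. The sketch below is only an illustration under that assumption: `ImageStore`, `FileSystemBackend`, and the method names are hypothetical and do not come from Rails or from any library mentioned in this thread.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical backend: keeps image bytes as plain files under a root directory.
class FileSystemBackend
  def initialize(root)
    @root = root
  end

  def write(key, data)
    path = path_for(key)
    FileUtils.mkdir_p(File.dirname(path))
    File.binwrite(path, data)
  end

  def read(key)
    File.binread(path_for(key))
  end

  def delete(key)
    FileUtils.rm_f(path_for(key))
  end

  private

  # Keys are assumed to be simple names; basename drops any directory parts.
  def path_for(key)
    File.join(@root, File.basename(key))
  end
end

# Hypothetical facade: the application only ever talks to this class, so the
# backend (local disk, an NFS mount, a DB table, ...) can be swapped later.
class ImageStore
  def initialize(backend)
    @backend = backend
  end

  def save(key, data)
    @backend.write(key, data)
    key
  end

  def fetch(key)
    @backend.read(key)
  end

  def remove(key)
    @backend.delete(key)
  end
end

store = ImageStore.new(FileSystemBackend.new(Dir.mktmpdir("images")))
store.save("avatar.png", "fake image bytes")
puts store.fetch("avatar.png").bytesize
```

Any object that responds to `write`, `read`, and `delete` could stand in for `FileSystemBackend`, which is the point of the exercise: callers would not need to change if the storage decision does.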

–

Ray

pratik · June 5, 2006, 6:37am

If I store them on the filesystem, how do I scale once I need
multiple servers?

Unless you’re Flickr, it’s likely that NFS will carry you a long way
for very little investment in complexity. The 37signals cluster is
using NFS to handle all file uploads for hosted applications.

David Heinemeier H.
http://www.loudthinking.com – Broadcasting Brain
http://www.basecamphq.com – Online project management
http://www.backpackit.com – Personal information manager
http://www.rubyonrails.com – Web-application framework

pratik · June 4, 2006, 3:29pm

If this is a big solution then I would consider MogileFS. I first
heard about it in a podcast interview with the creator of odeo.com, and
they use it to store and distribute MP3s. It’s also used at
LiveJournal. It’s good at distributing the load, at providing file
replication for security, and at fault tolerance.
Another plus is that from the Ruby client level it’s actually simpler
than file storage.

The server and service info is here:
http://www.danga.com/mogilefs/
Ruby client libraries (mogilefs-client) are here:
dev.robotcoop.com

pratik · June 5, 2006, 7:04am

Yeah, I’m with David on this one. We just wrote a remote-backup
application for our senior design project, here in Portland, using MySQL
as the DB back end. We saw how the DB would freak out with files being
stored within the tables. Not a good idea, since MySQL does single-file
tables, and our machine was pretty wimpy, with half a gig of RAM and
100 GB HDDs. That machine was swapping like there was no tomorrow once
we approached 1 GB worth of files stored.

We also looked at doing a distributed version for scaling, but had the
same problem with swapping. Store just the links/paths in the DB, keep
the files on a FS/NFS, and have fun!

Phil Johnston
http://newsclobber.com // News Aggregation for fun!
http://gwid.dietpeach.com // Event planning for fun!

pratik · June 5, 2006, 9:03am

Have you used MogileFS? (mogilefs · GitHub)

–
Best Regards,

Caiwangqin
http://www.uuzone.com
Mobile: +8613951787088
Tel: +86025-84818086 ext 233
Fax: +86025-84814993

pratik · June 5, 2006, 9:09am

Thanks everyone

I’ll post about whatever approach I take. I’m sure I’ll need more
input on the setup.

Regards,
Pratik

pratik · June 5, 2006, 8:08pm

On 5-jun-2006, at 6:35, David Heinemeier H. wrote:

If I store them on the filesystem, how do I scale once I need
multiple servers?

Unless you’re Flickr, it’s likely that NFS will carry you a long way
for very little investment in complexity. The 37signals cluster is
using NFS to handle all file uploads for hosted applications.

The only thing to watch out for is the directory size - some
filesystems will throw up (literally) after 3-4 thousand files are in
a single directory. file_column manages this very nicely by
segmenting uploads into dirs all by itself. Storing images in a DB is
a no-no - it’s +1 (convenient for you) and -6 for all the others.
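
As an illustration of segmenting uploads into directories, here is a minimal sketch that buckets files by a numeric record id so that no single directory grows without bound. This is not file_column's actual layout; `partitioned_path`, the bucket size of 1000, and the `uploads` root are assumptions made up for the example.

```ruby
# Hypothetical helper: groups uploads into bucket directories by record id.
FILES_PER_DIR = 1000

def partitioned_path(root, id, filename)
  # Integer division: ids 0-999 land in bucket 0, ids 1000-1999 in bucket 1, ...
  bucket = id / FILES_PER_DIR
  # Each bucket directory holds at most FILES_PER_DIR per-record subdirectories.
  File.join(root, format("%04d", bucket), id.to_s, File.basename(filename))
end

puts partitioned_path("uploads", 12, "cat.jpg")    # uploads/0000/12/cat.jpg
puts partitioned_path("uploads", 4321, "dog.png")  # uploads/0004/4321/dog.png
```

The bucket size is a tuning knob; anything comfortably below the point where the filesystem in use starts to struggle would do.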

–
Julian ‘Julik’ Tarkhanov
please send all personal mail to
me at julik.nl

pratik · June 6, 2006, 12:28am

The Wikimedia project (including Wikipedia) stores files in the DB
without using anything fancy, and they don’t seem to have a problem.

Now, I’m not saying we all should do that; I’m just adding a point to
the discussion.

-Nathan

pratik · June 6, 2006, 8:17am

The only thing to watch out for is for the directory length - some
filesystems will throw up (literally) after 3-4 thousand files are in
a single directory

The subject is misleading. All filesystems are databases; some are just
more efficient than others, and you’re going to get more efficiency out
of a database backed by a block device than out of one backed by another
database (e.g. MySQL).

Reiser4 claims to handle 10 million files in a directory no problem. I’m
guessing ZFS is up to the task as well…

subdir = filename.split("")[0...n].join("/") can buy a lot of leeway on
other, crustier filesystems though…
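
The one-liner can be wrapped into a small runnable sketch. The split/join expression is the one from the post, written with Ruby's exclusive range `...`; the `sharded_path` name, the `images` root, and the default depth of 2 are assumptions for illustration.

```ruby
# Hypothetical helper built around the expression from the post.
def sharded_path(root, filename, n = 2)
  # The first n characters become nested directory names: "photo.jpg" -> "p/h"
  subdir = filename.split("")[0...n].join("/")
  File.join(root, subdir, filename)
end

puts sharded_path("images", "photo.jpg")      # images/p/h/photo.jpg
puts sharded_path("images", "kitten.jpg", 3)  # images/k/i/t/kitten.jpg
```

Names whose leading characters include `.` or `/` would need sanitizing before being used this way, and if many names share a prefix the files will pile up in a few directories, so hashing the name first may spread them more evenly.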

pratik · June 6, 2006, 2:23am

Well… Not everyone can afford 150 servers and not everyone gets free
bandwidth and rackspace in 2 different countries.
And still they have bad days.
