Pratik wrote:
I’m developing an application where I’ll have to store a lot of images
coming from the users. And I’m still not sure if I should store them
in MySQL as BLOBs or just store them on the filesystem.
If I store them on the filesystem, how do I scale when I need
multiple servers?
How to scale images on the file system with multiple servers:
Start from scratch (0 users/hits/images) and move to the next level when
the load/response/metric of your choice becomes unacceptable. At each
step, various hardware upgrades and optimizations are possible -
additional/faster disks, disk arrays - so there are more steps than
these.
- Store the images on the same box as your db, web server and
  application.
- Move your db to a box of its own.
- Move your application to a box of its own. Images stay on web server
  box. Route image requests separately from application requests on the
  web tier.
- Add application servers. The web tier can serve static requests
  faster than your application in all probability.
- Move the images to a dedicated image server.
- Create an image server cluster.
- Create multiple global image server clusters.
Your path through these steps could be slightly different depending on
the complexity of your application, the size and number of images, your
requirement for immediate availability, patterns of use of images,
frequency of addition of new images, processing done on newly acquired
images, etc. You might add a dedicated image preprocessing server at
some stage of this build out.
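One way to keep those later steps cheap is to build every image URL in a
single place, so that moving from "images on the web server box" to "image
server cluster" is a configuration change. Here is a minimal sketch in
Python, not a recommendation of any particular setup. The host names, the
/images/ path and the image_url helper are all made up for illustration.

```python
import zlib

# Hypothetical host list: one entry while images live on the web server box,
# several entries once there is an image server cluster.
IMAGE_HOSTS = ["img1.example.com", "img2.example.com"]


def image_url(image_key: str, hosts=IMAGE_HOSTS) -> str:
    """Map an image key to a URL on one of the configured image hosts."""
    # crc32 gives the same value in every process, unlike Python's built-in
    # hash() for strings, so a given key always lands on the same host.
    index = zlib.crc32(image_key.encode("utf-8")) % len(hosts)
    return f"http://{hosts[index]}/images/{image_key}"


print(image_url("cat.jpg", ["www.example.com"]))
```

Note that simple modulo placement like this reassigns most keys when the
number of hosts changes, so it only works as-is if every host can serve
every image (for example, synced file systems).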
I would be interested in seeing how you could handle images in the db
through the same scaling scenario. I’m not saying you couldn’t do it, or
even that it might not be a better fit under certain conditions, but it
would not be my first choice in common web applications that include
5-50 images per page.
If you go the db route, at every step of the build out you are going to
have to serve the images through the whole stack (db=>app=>web). That
reduces your flexibility in expanding and introduces a lot of overhead
for every transaction. If that weren’t enough, replicating multiple dbs
is much more brittle than syncing multiple file systems in my
experience.
That you are thinking about this at all is probably a premature
optimization. You don’t really know what the use patterns of your
application are, where the bottlenecks are, what problems can be solved
with existing hardware, etc. You don’t even really know what “a lot” is.
Is it 10K, 100K, 1M, 10M, or 100M? How many users? How many images per
user? How often do they view them?
For now, encapsulate access to the images so that you can change how
they are stored when the requirements become clearer.
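To make "encapsulate access" concrete, here is a rough sketch in Python.
The ImageStore interface and FileSystemImageStore class are hypothetical
names of my own, not part of any framework. The point is only that the
rest of the application calls save/load and never touches paths or
tables directly.

```python
import os
from abc import ABC, abstractmethod


class ImageStore(ABC):
    """Hypothetical interface: the only way the application touches images."""

    @abstractmethod
    def save(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def load(self, key: str) -> bytes: ...


class FileSystemImageStore(ImageStore):
    """Keeps each image as a file named after its key under one root directory."""

    def __init__(self, root: str):
        self.root = root

    def _path(self, key: str) -> str:
        # Reject keys that contain directory parts so callers cannot
        # escape the root directory.
        if key in ("", ".", "..") or os.path.basename(key) != key:
            raise ValueError(f"invalid image key: {key!r}")
        return os.path.join(self.root, key)

    def save(self, key: str, data: bytes) -> None:
        path = self._path(key)
        os.makedirs(self.root, exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)

    def load(self, key: str) -> bytes:
        with open(self._path(key), "rb") as f:
            return f.read()
```

If the db route ever turns out to be the better fit, a second class that
implements the same two methods against a BLOB column could be swapped in
without changing any calling code.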
--
Ray