Scalable file uploads with Rails


I’m involved in a project where I have to re-architect file uploads in
a Rails application to make them scalable. Users will be uploading large
XML files (approx. 1MB), with a high probability of overlapping uploads
(several at the same time), which we try to minimize. The current system
runs a Mongrel cluster (3 Mongrels) behind Apache mod_proxy_balancer.
File uploads are handled by attachment_fu.

What choices do I have?

  1. Throw more Mongrel processes into the Mongrel cluster. We already
    have other applications running Mongrel clusters on the same machine,
    so this option is limited.

  2. Use BackgrounDRb. I looked a bit into BackgrounDRb, but I’m not
    sure it can help. Even if a middleman passes the upload task to a
    worker process, would that work? First of all, can you even pass the
    upload task? How would you do it? Would that completely free up the
    Mongrel process? Would I have to scale the BackgrounDRb process, or is
    there scalability built in? I couldn’t find an example on the web that
    does just that.

  3. Use Merb. I’m still trying to get my head around it. I found 2
    examples that show how to do file uploads with Merb, but they are
    kinda old, and Merb went through a lot of changes in the last year.
    Even if I could get one upload example working, how do I deal with
    scalability? Would I start a bunch of these Merb processes and use a
    proxy balancer to distribute the file uploads? From what I’m reading,
    these would take much less memory than having Mongrel processes
    running Rails, so I guess that would help me. I don’t think I’ve seen
    any examples on the web that do it.

  4. Write my own CGI upload handler in C/C++. This will get nasty
    because files are transmitted as multipart/form-data, where each part
    has its own headers, etc. If I could get this to work, I could leave
    the upload handling to Apache (which I guess would do a good job of
    scaling the uploads, and it would be fast too) and run some Ruby
    cron jobs to parse the files on the web server.
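
To illustrate the multipart framing mentioned in option 4, here is a
rough Ruby sketch that splits a multipart/form-data body into its parts.
It is only an illustration, not a usable parser: the boundary "XYZ", the
field names and the helper name split_multipart are made up, and it
assumes the boundary string never occurs inside the file content.

```ruby
CRLF = "\r\n"

# Split a multipart/form-data body into parts, each with its own headers.
# Illustration only: no streaming, no quoting rules, no nested multipart.
def split_multipart(body, boundary)
  delimiter = "--#{boundary}"
  # Ignore everything from the closing delimiter ("--boundary--") onwards.
  body = body.split("#{delimiter}--", 2).first
  # The first element is the preamble before the first delimiter; drop it.
  body.split(delimiter).drop(1).map do |chunk|
    raw_headers, content = chunk.sub(/\A\r\n/, "").split(CRLF * 2, 2)
    headers = raw_headers.split(CRLF).to_h do |line|
      name, value = line.split(": ", 2)
      [name.downcase, value]
    end
    # The CRLF before the next delimiter is not part of the content.
    { headers: headers, content: content.chomp(CRLF) }
  end
end

# Hypothetical request body with one text field and one uploaded file.
sample = [
  "--XYZ",
  'Content-Disposition: form-data; name="title"',
  "",
  "hello",
  "--XYZ",
  'Content-Disposition: form-data; name="file"; filename="a.xml"',
  "Content-Type: text/xml",
  "",
  "<a/>",
  "--XYZ--",
  ""
].join(CRLF)

parts = split_multipart(sample, "XYZ")
parts.each do |part|
  disposition = part[:headers]["content-disposition"]
  puts "#{disposition} (#{part[:content].bytesize} bytes)"
end
```

A real handler would also have to cope with bodies too large to hold in
memory, which is a big part of why doing this by hand gets nasty.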

I’d appreciate feedback on any of these choices.


If you are up to it, you can also use JRuby. JRuby uses native
threads so you should get good non-blocking performance without having
to configure any “runtimes”. I use it and get great performance.


We started using the nginx upload module about a month ago and it works
great. Whatever you do, you don’t want Rails in the file upload loop on a
busy site. You can easily starve out other requests and put your servers
into a death spiral.
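
To give an idea of what is left for Rails to do: as I understand the
upload module, nginx writes the request body to a temp file itself and
then passes the backend only ordinary form fields describing it. A
minimal sketch of the application side could look like this. The
parameter names "upload_tmp_path" and "upload_file_name", the helper
claim_upload and the spool directory are all assumptions for the
example; the real names depend on how the module is configured.

```ruby
require "fileutils"
require "securerandom"
require "tmpdir"

# Move a file that the front-end server already wrote to disk into a
# spool directory for later parsing. The param keys are hypothetical.
def claim_upload(params, spool_dir)
  tmp_path = params.fetch("upload_tmp_path")
  # Keep only the base name so a client-supplied name cannot escape spool_dir.
  original = File.basename(params.fetch("upload_file_name"))
  FileUtils.mkdir_p(spool_dir)
  # Random prefix so simultaneous uploads with the same name do not collide.
  dest = File.join(spool_dir, "#{SecureRandom.hex(8)}-#{original}")
  FileUtils.mv(tmp_path, dest)
  dest
end

# Usage with a stand-in for the temp file nginx would have written.
Dir.mktmpdir do |dir|
  tmp = File.join(dir, "0000000001")
  File.write(tmp, "<orders/>")
  params = { "upload_tmp_path" => tmp, "upload_file_name" => "../orders.xml" }
  puts claim_upload(params, File.join(dir, "spool"))
end
```

The Rails process only handles a short request with a couple of small
fields, so it is not tied up while the 1MB body trickles in, and the XML
parsing can happen later from the spool directory.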