I’m involved in a project where I have to re-architect file uploads in
a Rails application so that they scale. Users will upload fairly large
XML files (about 1 MB each), and there is a high probability that
uploads will overlap (arrive at the same time), something we try to
minimize. The current system runs a Mongrel cluster (3 Mongrels)
behind Apache with mod_proxy_balancer. File uploads are handled with
attachment_fu.
What choices do I have?
Throw more Mongrel processes into the Mongrel cluster. We already
have other applications running Mongrel clusters on the same machine,
so this option is limited.
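How far adding processes can go is easiest to judge with a back-of-the-envelope estimate. The sketch below assumes that each Mongrel running Rails serves one request at a time and that an upload occupies a process for its whole duration; the 4-second figure and the target rate are hypothetical, not measurements. It applies Little's law (concurrent requests = arrival rate × time per request).

```ruby
# Rough capacity estimate for a pool of single-request workers.
# Assumption: each Mongrel running Rails handles one request at a time,
# so N processes can serve at most N uploads concurrently.

# Sustainable upload rate: workers divided by seconds each upload holds a worker.
def max_uploads_per_second(workers, seconds_per_upload)
  workers.to_f / seconds_per_upload
end

# Little's law: workers needed = arrival rate * time per upload, rounded up.
def workers_needed(uploads_per_second, seconds_per_upload)
  (uploads_per_second * seconds_per_upload).ceil
end

# Hypothetical numbers: a 1 MB upload that ties up a worker for 4 seconds.
puts max_uploads_per_second(3, 4.0)   # prints 0.75
puts workers_needed(2.0, 4.0)         # prints 8
```

Because the number of workers needed grows linearly with both the upload rate and the time each upload holds a process, shortening that time matters as much as adding processes.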
Use BackgrounDRb. I looked into BackgrounDRb a bit, but I’m not
sure it can help. Even if a middleman passes the upload task to a
worker process, would that work? Can the upload task be passed at all,
and if so, how? Would that completely free up the Mongrel process?
Would I have to scale the BackgrounDRb process myself, or is
scalability built in? I couldn’t find an example on the web that does
exactly that.
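The hand-off asked about above can be sketched with Ruby's built-in thread-safe Queue standing in for BackgrounDRb. This is not BackgrounDRb's API, and handle_upload, start_worker and run_demo are hypothetical names for illustration. The sketch shows one possible shape: the request handler only writes the received file to disk and enqueues its path, and a separate worker does the slow processing later. In this shape the bytes still arrive through the web process; what moves to the worker is the work done after the file is saved.

```ruby
require "tmpdir"

# Stand-in for a background job system: Ruby's built-in thread-safe Queue.
# NOT BackgrounDRb's API; it only shows the shape of the hand-off.
JOBS = Queue.new
PROCESSED = Queue.new

# Hypothetical request-handler step: save the upload, enqueue its path, return.
def handle_upload(dir, filename, body)
  path = File.join(dir, filename)
  File.binwrite(path, body)
  JOBS << path              # hand off only the path, not the request itself
  path
end

# Hypothetical worker: pops paths until it sees :stop and does the slow work
# (here just recording each file's size in place of real XML parsing).
def start_worker
  Thread.new do
    while (path = JOBS.pop) != :stop
      PROCESSED << [File.basename(path), File.size(path)]
    end
  end
end

# Runs two uploads through the hand-off and collects what the worker did.
def run_demo
  Dir.mktmpdir do |dir|
    worker = start_worker
    handle_upload(dir, "a.xml", "<doc>1</doc>")
    handle_upload(dir, "b.xml", "<doc>22</doc>")
    JOBS << :stop
    worker.join             # wait so the temp dir is not removed mid-job
    results = []
    results << PROCESSED.pop until PROCESSED.empty?
    results
  end
end

p run_demo  # prints [["a.xml", 12], ["b.xml", 13]]
```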
Use Merb. I’m still trying to get my head around it. I found two
examples that show how to do file uploads with Merb, but they are
somewhat old, and Merb has gone through a lot of changes in the last
year. Even if I could get one upload example working, how would I deal
with scalability? Would I start several Merb processes and use a
proxy balancer to distribute the file uploads among them? From what
I’m reading, these would use much less memory than Mongrel processes
running Rails, so I guess that would help. I don’t think I’ve seen
any examples on the web that do this either.
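Spreading uploads across several lightweight processes is the job of a proxy balancer. As a simplified illustration, the sketch below picks backends in strict round-robin order. mod_proxy_balancer's default method (byrequests) counts requests per member rather than running this exact logic, and the backend URLs are hypothetical.

```ruby
# Minimal round-robin picker over a pool of backend processes (sketch only).
class RoundRobin
  def initialize(backends)
    raise ArgumentError, "need at least one backend" if backends.empty?
    @backends = backends.dup.freeze
    @next = 0
    @lock = Mutex.new       # uploads may be dispatched from several threads
  end

  # Returns the next backend in order, wrapping around at the end of the pool.
  def pick
    @lock.synchronize do
      backend = @backends[@next]
      @next = (@next + 1) % @backends.size
      backend
    end
  end
end

# Hypothetical upload-only processes on local ports.
pool = RoundRobin.new(%w[http://127.0.0.1:4000 http://127.0.0.1:4001 http://127.0.0.1:4002])
p Array.new(5) { pool.pick }
# prints ["http://127.0.0.1:4000", "http://127.0.0.1:4001", "http://127.0.0.1:4002", "http://127.0.0.1:4000", "http://127.0.0.1:4001"]
```

With equal weights each backend receives roughly the same number of uploads, so capacity grows with the number of processes started.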
Write my own CGI upload handler in C/C++. This will get messy
because files are transmitted as multipart/form-data, where each part
has its own headers and the parts are separated by a boundary string.
If I could get this to work, I could leave the upload handling to
Apache (which I assume would scale the uploads well and be fast, too)
and run Ruby cron jobs that parse the uploaded files on the web server.
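To give a sense of what a hand-written handler would have to parse, here is a simplified multipart/form-data splitter. It is written in Ruby to match the other examples rather than in C/C++, holds the whole body in memory, and skips cases a real parser must handle (streaming large bodies, binary data, malformed input). The boundary value and sample body are made up.

```ruby
# Simplified multipart/form-data splitter (sketch only).
CRLF = "\r\n"

# Splits a multipart body into parts, each with lower-cased headers and content.
def parse_multipart(body, boundary)
  delimiter = "--#{boundary}"
  sections = body.split(delimiter)
  sections.shift                           # drop the preamble before the first delimiter
  parts = []
  sections.each do |section|
    break if section.start_with?("--")     # "--boundary--" closes the body
    section = section.sub(/\A\r\n/, "").sub(/\r\n\z/, "")
    raw_headers, content = section.split(CRLF * 2, 2)
    next if content.nil?                   # malformed part: no blank line after headers
    pairs = raw_headers.split(CRLF).map do |line|
      name, value = line.split(":", 2)
      [name.strip.downcase, value.to_s.strip]
    end
    parts << { headers: pairs.to_h, content: content }
  end
  parts
end

# Pulls the filename parameter out of the Content-Disposition header, if any.
def filename_of(part)
  part[:headers].fetch("content-disposition", "")[/filename="([^"]*)"/, 1]
end

SAMPLE_BOUNDARY = "XyZ123"   # made-up boundary for the example
SAMPLE_BODY = [
  "--XyZ123",
  'Content-Disposition: form-data; name="upload"; filename="data.xml"',
  "Content-Type: text/xml",
  "",
  "<doc>hi</doc>",
  "--XyZ123--",
  ""
].join(CRLF)

parts = parse_multipart(SAMPLE_BODY, SAMPLE_BOUNDARY)
p parts.map { |part| [filename_of(part), part[:content]] }
# prints [["data.xml", "<doc>hi</doc>"]]
```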
I’d appreciate feedback on any of these choices.