Streaming from Database

I need to store rather large data in the database using blobs. I've been
working with Rails and have designed a proof-of-concept setup that works
pretty well, except there's a considerable performance hit with very
large files, because ActiveRecord loads the entire blob into memory
before the data can be passed to send_data.

I know at least one person is going to feel obliged to jump in here and
tell me that I’m an idiot for trying to store large files in the
database. Please don’t; there are valid reasons for doing this here,
including:

  1. It's a hard requirement. An existing J2EE system streams data from
    the database while allocating only a fixed-size buffer; I'm trying to
    show that Rails can handle it as well (see the sketch after this
    list).
  2. There will be multiple instances of the application running on
    multiple boxes, all connecting to the same database, so the database
    is the only place we can store data that will be accessible to all
    instances.
  3. The database is a big honking box with a fiber-connected SAN, which
    will perform much better than the single IDE drives in the
    application server boxes, even if we could replicate the files to
    every instance.
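
For the curious, the buffer-at-a-time idea in point 1 can be sketched
against PostgreSQL by slicing the bytea column with substring(), so only
one chunk is ever held in Ruby memory at a time. This is just an
untested sketch of the shape I'm after (the picture_data table and the
64 KB chunk size are assumptions from my schema), not working code:

def each_picture_chunk(picture_id, chunk_size = 65536)
  conn   = ActiveRecord::Base.connection
  offset = 1  # substring() on bytea is 1-based
  loop do
    result = conn.execute(
      "select substring(data from #{offset} for #{chunk_size}) " +
      "from picture_data where id = #{picture_id.to_i}")
    chunk = result.getvalue(0, 0)
    break if chunk.nil? || chunk.empty?
    # bytea comes back escaped over the text protocol; the driver's
    # unescape_bytea (PGconn.unescape_bytea here) undoes that
    yield PGconn.unescape_bytea(chunk)
    offset += chunk_size
  end
end

Calling this with a block that writes each chunk straight to the
response should keep memory use flat no matter how big the file is,
which is essentially what the J2EE code is doing with its buffer.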

I know that doing this will probably involve some database specific
code, and I don’t mind rolling up my sleeves and writing code, but I’m
still relatively new to Rails, so if anyone could offer any advice or
input about the best way to approach this or what objects and methods I
need to override, I’d really appreciate it.

Thanks much,
Jeff

BTW: My proof-of-concept system is PostgreSQL; the production system (if
and when) will either be Oracle or PostgreSQL. Right now I’m just
interested in PostgreSQL.

Okay, in case anyone is interested, I dove in and found a solution. I
don’t claim it’s the best solution, but here is one way that you can
bypass the memory overhead with large objects stored as bytea fields
(blobs) in PostgreSQL. The solution code is database-specific, so all
you MySQL folks won’t be able to use it, sorry.

Here was my original code to retrieve the object:

@thePic = Picture.find(params[:id])
send_data(@thePic.raw.data,
          :filename    => @thePic.raw.filename,
          :type        => @thePic.raw.content_type,
          :disposition => "attachment")

This wasn't usable for my situation with large files: ActiveRecord
materializes the entire blob as one Ruby string before send_data ever
runs. With a large file (>10 MB), this code would take 20 seconds or
longer before I would begin to see the download in my dev environment
(17" MacBook Pro).

With a little research and tweaking, I replaced it with this:

result = ActiveRecord::Base.connection.execute(
  "select a.content_type, a.filename, a.data " +
  "from picture_data a, pictures b " +
  "where b.raw_image_id = a.id and b.id = #{params[:id].to_i}")  # to_i guards against SQL injection
@response.headers['Pragma'] = ' '
@response.headers['Cache-Control'] = ' '
@response.headers['Content-Type'] = result.getvalue(0, 0)
@response.headers['Content-Disposition'] =
  "attachment; filename=#{result.getvalue(0, 1)}"
@response.headers['Accept-Ranges'] = 'bytes'
@response.headers['Content-Transfer-Encoding'] = 'binary'
@response.headers['Content-Description'] = 'File Transfer'
render_text result.getvalue(0, 2)

It's longer, uglier, and not as readable, but the end result? No
noticeable delay before the download starts. There are most likely ways
to improve on this, but as it stands, it gives me an order-of-magnitude
performance boost and makes it possible to keep large files in the
database :)

Comments, improvements, etc. welcome.
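
One caveat I haven't verified: PostgreSQL hands bytea values back
escaped over the text protocol, so the raw string from getvalue() may
need a pass through the driver's unescape function before being sent.
If so (untested guess, assuming the ruby-postgres driver), the last
line would become:

render_text PGconn.unescape_bytea(result.getvalue(0, 2))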

Jeff

Whoah… hold off on using that - it doesn’t work. :-/