Streaming a large XML file; optimizing large file downloads in RAILS

I occasionally need to stream a large XML data file that represents
key data in a DB. I’m porting over an application from PHP Symfony,
and with my initial implementation, it takes around 7 times as long
with rails. Also with Symfony, data begins to download almost as soon
as I invoke the URL, whereas with rails, all data is processed on the
server side before the client gets the first byte. I have a hand-
crafted query to hit the DB, and use fetch_hash to use the raw data
from the mysql gem and that renders extremely quickly. Also I’ve
tried writing only a tiny subset of the XML while reading the entire
result set; with that I get much faster performance, but of course that
way the output is no longer the complete XML.

I spent most of the past weekend trying to determine how to optimize
this (hoping to do at least as well as PHP symfony) but can’t do it.

I tried:

  • used render :text => (lambda do |response, output| … )
  • ruby 1.8.7 vs. ruby 1.9.2
  • rails 2.3.5 vs. rails 3
  • XmlBuilder vs. Nokogiri::XML::Builder
  • HAML vs ERB
  • passenger vs. script/server

Nothing honestly moved the performance needle in a serious way.

I’ve finally come to the conclusion that rails does not stream out as
I’d expect. Here’s a look at the perf stats rendered as the request
runs:

Rendered hgrants/_request_detail (2.2ms)
Rendered hgrants/_request_detail (3.9ms)
Rendered hgrants/_request_detail (2.4ms)
Rendered hgrants/_request_detail (2.3ms)
Rendered hgrants/_request_detail (242.7ms)
Rendered hgrants/_request_detail (2.2ms)
Rendered hgrants/_request_detail (1.9ms)
Rendered hgrants/_request_detail (1.8ms)

We went from an average 2ms up to 242ms then back down. I saw this
sporadically throughout the 1000 template renderings. That suggests to
me that memory is getting garbage collected. Also, I’m invoking the
request from curl, and it reports no data downloaded until after my
logfile tells me rails has finished processing all records in the
view. The model IDs that result in the over-sized ms count vary from
one request to another, so I’m convinced there is nothing in the app
that is doing this. I even tested this by removing the call to the
HAML template and replacing it with a block of generated text and
observed similar behavior.
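
One way to check the garbage-collection guess outside of Rails is to time a repeated unit of work and count GC runs around each iteration. The following is only a rough sketch: timed_with_gc is a made-up helper name, the string building is a stand-in for rendering one partial, and GC.count assumes Ruby 1.9 or later.

```ruby
require 'benchmark'

# Hypothetical helper (not from the original post): times a block and
# reports how many GC runs happened while it executed.
# GC.count is available on Ruby 1.9 and later.
def timed_with_gc
  gc_before = GC.count
  elapsed = Benchmark.realtime { yield }
  { :ms => elapsed * 1000.0, :gc_runs => GC.count - gc_before }
end

# Stand-in for rendering one partial: builds a throwaway string per "record".
samples = (1..1000).map do |i|
  timed_with_gc { "<request id=\"#{i}\">" + ("x" * 10_000) + "</request>" }
end

# If the slow iterations line up with GC runs, the spikes are probably GC pauses.
slowest = samples.max_by { |s| s[:ms] }
puts "slowest: %.2fms, gc runs during it: %d" % [slowest[:ms], slowest[:gc_runs]]
```

If the occasional slow iterations coincide with a non-zero gc_runs, that would be consistent with the spikes being GC pauses rather than anything in the app.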

This is how I’m invoking HAML from the XML Builder template:
xml << render(:partial => 'hgrants/request_detail.html.haml', :locals => { :model => model })

I also tried using this trick to try to get it to stream, but I
observed exactly the same behavior; no data showed up in curl until
all records had been processed.
render :text => (lambda do |response, output|
  extend ApplicationHelper

  xml = Builder::XmlMarkup.new(
    :target => StreamingOutputWrapper.new(output),
    :indent => 2)
  eval(default_template.source, binding, default_template.path)
end)
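
The StreamingOutputWrapper class used above is not shown in the post. As a guess at what such a wrapper might look like: Builder appends markup to its target with <<, while the output object handed to the proc is written to with write, so the wrapper only has to forward one to the other. The sketch below is an assumed reconstruction, with a StringIO standing in for the Rails output object.

```ruby
require 'stringio'

# Assumed reconstruction; the real class from the post is not shown.
# Builder::XmlMarkup sends its markup to the target via <<, so this
# forwards every appended fragment to the wrapped object's write method.
class StreamingOutputWrapper
  def initialize(output)
    @output = output
  end

  def <<(text)
    @output.write(text.to_s)
    self # return self so calls can be chained, like String#<<
  end
end

# StringIO stands in for the output object Rails passes to the proc.
io = StringIO.new
target = StreamingOutputWrapper.new(io)
target << "<requests>" << "</requests>"
puts io.string
```

Note that a wrapper like this only hands data to the next layer as it is produced; whether the client sees it early still depends on what that layer does with it.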

(Also, in rails 3, render :text with a Proc no longer works this way:
rails 3 renders the Proc as a to_str rather than calling it.)

This particular issue I can certainly work around but it’s
disappointing if it’s true that there’s no way in rails to stream
output to the browser for large pages. And particularly disappointing
if PHP/Symfony can outgun rails for streaming. I’ve been using rails
since 2006 and most requests have fairly small responses so maybe the
answer is to defer to a different technology for streaming larger
files. But it seems like there should be a good solution for
streaming data and flushing the output stream.

Any help is greatly appreciated!
Eric

On Oct 11, 2:44 pm, ehansen486 [email protected] wrote:

I occasionally need to stream a large XML data file that represents
key data in a DB. I’m porting over an application from PHP Symfony,
and with my initial implementation, it takes around 7 times as long
with rails.
[…]
I’ve finally come to the conclusion that rails does not stream out as
I’d expect.
[…]

Have you tried send_data? I think that’s what most people use to
stream dynamic content.

Alternatively, how does Symfony do its streaming? Can you write
something equivalent for Rails?

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

On Oct 11, 7:44pm, ehansen486 [email protected] wrote:

Nothing honestly moved the performance needle in a serious way.

I’ve finally come to the conclusion that rails does not stream out as
I’d expect. Here’s a look at the perf stats rendered as the request
runs:

It doesn’t. Rails 3.1 will apparently change some of that (see
“Automatic Flushing: The Rails 3.1 Plan”).

If you drop down to the rack level (i.e. write this as a rails metal)
you should be able to stream responses: the rack response body can be
anything that responds to each, and the server writes out every chunk
that each yields until you’re done.
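
A minimal sketch of that idea, assuming nothing beyond the basic Rack contract (an app responds to call(env) and returns status, headers and a body that responds to each). XmlRowStream, StreamingApp and the hard-coded rows are made-up names and data for illustration; in the real app the rows would come from the DB result set.

```ruby
# Hypothetical body object: yields the XML document one chunk at a time
# instead of building the whole string up front.
class XmlRowStream
  def initialize(rows)
    @rows = rows
  end

  # The server iterates the body with each and can write every yielded
  # chunk to the client as it arrives.
  def each
    yield %(<?xml version="1.0" encoding="UTF-8"?>\n<requests>\n)
    @rows.each do |row|
      yield %(  <request id="#{row[:id]}"/>\n)
    end
    yield "</requests>\n"
  end
end

# A Rack app: returns [status, headers, body].
StreamingApp = lambda do |env|
  rows = [{ :id => 1 }, { :id => 2 }] # placeholder for the DB result set
  [200, { "content-type" => "application/xml" }, XmlRowStream.new(rows)]
end
```

Whether the chunks actually reach the client early still depends on the server and on any middleware in between, since either can buffer the body before sending it.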

The docs also say that render :text => lambda { …} allows streaming,
but there are various conflicting opinions from actual users (I’ve never
tried that). This may also depend on the server (mongrel, thin etc.)
you use: it’s no good streaming data to rack if the next link
down the chain sits on it until it is done.

Fred

Hi Fred-

What you’re saying makes a lot of sense. As your automatic-flushing-
the-rails-3-1-plan article relates, for most rails interactions it’s
difficult to stream because of all the evaluation that needs to
occur. Larger file downloads really are a special case. Using rails
metal to respond seems logical.

When I get a moment I’ll create a brand new rails app and see if I can
get rails to stream as I’d expect; perhaps there is something in rack
that is preventing the streaming.

In rails 3, the render :text => lambda { … } is definitely broken.

Thanks for the help!

→ Eric

On Oct 11, 2:19pm, Frederick C. [email protected]

On 11 Oct, 20:44, ehansen486 [email protected] wrote:

I occasionally need to stream a large XML data file that represents
key data in a DB. I’m porting over an application from PHP Symfony,
[…]
This particular issue I can certainly work around but it’s
disappointing if it’s true that there’s no way in rails to stream
output to the browser for large pages. And particularly disappointing
if PHP/Symfony can outgun rails for streaming. I’ve been using rails
since 2006 and most requests have fairly small responses so maybe the
answer is to defer to a different technology for streaming larger
files. But it seems like there should be a good solution for
streaming data and flushing the output stream.

I’m in the same boat on Rails 2-3-stable. output.flush is said to be
deprecated and no longer works, and it seems that using render :text
=> proc { |response, output| … } doesn’t send streamed data at all.
I also tried send_data, without luck.

After some research I thought that the flush would happen after an
output.write, but that does not seem to be the case, at least where I looked.

We have potentially very large ajax requests (3mb) and from a java
server we were able to cut down the action time greatly by
manipulating the response; I’m trying to achieve the same from Rails
but nothing I tried currently works.

Claudio P. wrote in post #949941:
[…]

We have potentially very large ajax requests (3mb)

It sounds like Rails’ streaming needs to improve, but a 3MB Ajax request
is a huge design problem! For performance reasons, it should rarely be
necessary to request more than 100K or so through Ajax.

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

ehansen486 wrote in post #949547:

In rails 3, the render :text => lambda { … } is definitely broken.

I suppose then it might not be a bad idea to submit a documentation
patch to either remove or note that this is broken in Rails 3.0.

send_data


Tip: if you want to stream large amounts of on-the-fly generated data to
the browser, then use render :text => proc { … } instead. See
ActionController::Base#render for more information.