Asynchronous network access with Rack?

I’ve read that “threading is considered harmful” for Ruby web apps.
Well, I’m writing a Sinatra app which will build a page based on the
responses of several servers (Net::HTTP.get). I want to do these .gets
in parallel, as doing them synchronously would obviously mean the users
would wait for a long time.

Would it be “considered harmful” to do:

resp_a, resp_b, resp_c = nil
thread_a = Thread.new { resp_a = Net::HTTP.get site_a }
thread_b = Thread.new { resp_b = Net::HTTP.get site_b }
thread_c = Thread.new { resp_c = Net::HTTP.get site_c }
thread_a.join
thread_b.join
thread_c.join

Is there any possible harm that could come from this? Can threading
interfere with Rack in some way? I haven’t done much previous
development of threaded apps, so I would appreciate any tips.

I’ve used threads to fetch several web pages at once for a long time and
I’ve never had any problem, so long as you catch errors when a specific
web page can’t be obtained.
Tom R.

On Mar 8, 2010, at 3:13 PM, Tom R. wrote:

Would it be “considered harmful” to do: [...] Can threading
interfere with Rack in some way? I haven’t done much previous
development of threaded apps, so I would appreciate any tips.

IMHO you would be better served using the ‘thin’ Rack-compatible web
server and using its async mode along with EM::HTTP::Request. This way
you can use an event-driven style with zero threads: you basically pause
the client’s request connection while you make async calls to all the
other web services, and once they have all returned you fire the async
callback for thin to resume the client’s connection and return the
results.

Doing it this way will require a bit more mental twisting to get all
the async stuff correct but it will be far more scalable and will serve
you much better in the end.

Cheers-

Ezra Z.

On Mon, Mar 8, 2010 at 10:50 PM, Nick B. wrote:

I’ve read that “threading is considered harmful” for Ruby web apps.
Well, I’m writing a Sinatra app which will build a page based on the
responses of several servers (Net::HTTP.get). I want to do these .gets
in parallel, as doing them synchronously would obviously mean the users
would wait for a long time.

There are some historical reasons behind threading == harmful (defaults
for Rails, GIL & native gems, and a general lack of robustness in older
Ruby thread implementations).

Is there any possible harm that could come from this? Can threading
interfere with Rack in some way? I haven’t done much previous
development of threaded apps, so I would appreciate any tips.

I believe Sinatra/Rack is thread safe, so you should be fine on that
count.

What’s more important is that this model isn’t exactly a good
architecture. You are spawning a lot of threads per request and you have
no real external oversight into how they are working. You can’t send back
your response until you have received all your outbound responses, and
you are particularly vulnerable to timeouts: the client browser can time
out your request while you are still waiting on responses to outbound
connections.

You see a lot of solutions that use process-level concurrency
(BackgroundRb, DelayedJob, etc.), but most web solutions that aggregate
content from multiple sites (i.e. mashups) do it all in the browser, with
some cross-site scripting & JavaScript.

Technically I don’t see too many issues with the multi-threaded approach
you propose for smaller requests, but you will want to set an aggressive
timeout on the outbound requests.
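
One way to set such a timeout is on the Net::HTTP object itself, rather than around the whole call. This is only a minimal sketch, assuming plain-HTTP URLs; the helper names (build_client, fetch_or_nil) and the 2 and 4 second limits are made up for illustration and do not come from this thread:

```ruby
require "net/http"
require "uri"

# Hypothetical helper: builds a Net::HTTP client with short timeouts.
# Nothing is sent over the network until a request is actually made.
def build_client(uri, open_timeout: 2, read_timeout: 4)
  http = Net::HTTP.new(uri.host, uri.port)
  http.open_timeout = open_timeout  # seconds allowed to open the connection
  http.read_timeout = read_timeout  # seconds allowed for each read from the socket
  http
end

# Hypothetical helper: returns the response body, or nil when the site
# times out or cannot be reached.
def fetch_or_nil(url, open_timeout: 2, read_timeout: 4)
  uri = URI(url)
  client = build_client(uri, open_timeout: open_timeout, read_timeout: read_timeout)
  client.get(uri.request_uri).body
rescue Timeout::Error, SocketError, SystemCallError
  nil
end
```

Each Thread.new block from the original question could then call something like fetch_or_nil(site_a), so a slow site costs at most a few seconds instead of holding the whole page hostage.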

Tom R.:

I’ve never had any problem…

Awesome! Good to hear :)

Ezra Z.:

you would be better served using the ‘thin’ Rack-compatible web
server and using its async mode along with EM::HTTP::Request.

Thanks. I’ve been using Apache+Passenger because that’s what I know, but
I will investigate Thin if it is indeed more scalable. Are you referring
to RAM usage when you say it’s more scalable?

Richard C.:

javascript … you will want to set an aggressive timeout

This must happen server-side. But you’re right about the timeouts. And
some searching has revealed Timeout::timeout() to me! It would appear
that:

require 'net/http'
require 'timeout'

resp_a = nil
thread_a = Thread.new { Timeout::timeout(4) { resp_a = Net::HTTP.get site_a } }
thread_a.join

will do what I need, so long as I catch exceptions, too. And again, I’m
still open to other suggestions if anyone else has any!
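
For the "catch exceptions" part, here is a rough sketch in which one slow or broken site yields nil instead of taking down the page. The helper name fetch_within is hypothetical, and the blocks stand in for real Net::HTTP.get calls so the example runs without a network:

```ruby
require "timeout"

# Hypothetical helper: runs the block in its own thread with a deadline.
# The thread's value is the block's result, or nil on timeout or error.
def fetch_within(seconds, &work)
  Thread.new do
    begin
      Timeout.timeout(seconds) { work.call }
    rescue StandardError   # Timeout::Error is a StandardError too
      nil
    end
  end
end

threads = [
  fetch_within(1)   { "fast response" },
  fetch_within(0.1) { sleep 1; "too slow" },
  fetch_within(1)   { raise IOError, "connection reset" },
]

# Thread#value waits for each thread and returns what its block returned.
results = threads.map(&:value)
# results is ["fast response", nil, nil]
```

Note that Timeout.timeout works by raising an exception inside the running block, so it can interrupt code at an awkward moment; timeouts set on the HTTP client itself are a gentler alternative where they are available.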

Or this slightly shorter version:

thread_a = Thread.new { Net::HTTP.get site_a }
thread_b = Thread.new { Net::HTTP.get site_b }
thread_c = Thread.new { Net::HTTP.get site_c }
val1 = thread_a.value
val2 = thread_b.value
val3 = thread_c.value
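
The same idea extends to any number of sites. One thing worth knowing is that Thread#value joins the thread and re-raises any exception the thread died with, so failures surface at the .value call. A sketch with a hypothetical fetch_all helper; the fake_get lambda stands in for Net::HTTP.get so the example runs without a network:

```ruby
# Hypothetical helper: starts one thread per site and waits for all of them.
# Returns results in the same order as `sites`; failed fetches become nil.
def fetch_all(sites, fetcher)
  threads = sites.map { |site| Thread.new { fetcher.call(site) } }
  threads.map do |thread|
    begin
      thread.value        # joins, then returns the block's result
    rescue StandardError
      nil                 # Thread#value re-raised the thread's exception
    end
  end
end

# Stand-in for Net::HTTP.get: one site fails, the others return a body.
fake_get = lambda do |site|
  raise IOError, "unreachable" if site == "site_b"
  "body of #{site}"
end

pages = fetch_all(%w[site_a site_b site_c], fake_get)
# pages is ["body of site_a", nil, "body of site_c"]
```

Because all threads are started before any .value call, the total wait is roughly that of the slowest site rather than the sum of all of them.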