Calling an external API as quickly as possible

Hi

I’m a Java developer working with a team that’s moving over to Ruby
and Ruby on Rails - we’re really excited!

We are writing a replacement for a large, Java-based e-commerce
website for a client based in the United Kingdom. This will be the
first website of its kind written in RoR.

One of the things we need to do is access various external APIs to
help us build each web page. We are required to support a number of
different text-based APIs (XML, key/value pairs, etc.) via HTTP.

Typically, we will call an API to get some data - say, a list of
countries. When we receive the data, we build a list of Country
objects. If we simulate the same thing using ActiveRecord
(SELECT * FROM Countries), we find that the task is about 5 times
faster.

I am assuming the difference is that ActiveRecord creates its objects
in-line, while the data is arriving, whereas our HTTP code does not
start building objects until the entire response has been received.

Can anyone suggest ideas for how we might do this type of in-line
processing while reading the HTTP response? I do not want to have
this kind of low-level code in every controller, so we would probably
use some sort of helper.

On Feb 17, 2008, at 10:26 AM, Swordfish wrote:

Can anyone suggest ideas for how we might do this type of in-line
processing while reading the HTTP response? I do not want to have
this kind of low-level code in every controller, so we would probably
use some sort of helper.

I would recommend caching the results. Enduring HTTP request/response
cycles for infrequently changed data, such as lists of countries, is
very time-consuming. Another strategy you might examine is DRb. In
your worker, hit the web services periodically for a refresh, and from
your Rails application make the request to the local DRb instance. If
DRb is too big a hammer, then just run a cron job that periodically
updates a memcached server.
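The cron-plus-memcached idea can be illustrated in plain Ruby with a
minimal in-process TTL cache - the principle is the same: pay the HTTP
cost once per period, not once per page view. All names here are
hypothetical, not from the thread:

```ruby
# Minimal in-process TTL cache sketch. A cached value is reused until
# it is older than the TTL, at which point the block is called again
# to refresh it (in the thread's scenario, the block would hit the API).
class TtlCache
  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @store = {} # key => [expires_at, value]
  end

  # Returns the cached value for +key+, or yields to refresh it when
  # the entry is missing or expired.
  def fetch(key)
    expires_at, value = @store[key]
    return value if expires_at && Time.now < expires_at
    value = yield
    @store[key] = [Time.now + @ttl, value]
    value
  end
end

# Usage sketch: cache the country list for ten minutes.
# COUNTRY_CACHE = TtlCache.new(600)
# countries = COUNTRY_CACHE.fetch(:countries) { call_country_api }
```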

To extract the code from your controllers, use a before_filter in
app/controllers/application.rb, and the request to construct your
objects will be made before each action begins.

Still, it seems like a very expensive way to retrieve what seems to be
somewhat static data.

Hope this helps.

Many thanks for the response. I only used a country list as an
example; in fact, the API calls will retrieve dynamic data. All I
want to do is start reading the stream of data coming down the
socket before the stream has finished.
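For what it's worth, Net::HTTP can do exactly this: when request_get
is given a block, read_body yields the body in chunks as it comes off
the socket. A minimal sketch, assuming a hypothetical one-record-per-
line "code,name" format - the class, record, and URL names are made up
for illustration:

```ruby
require 'net/http'
require 'uri'

# Incrementally turns a key/value text stream into objects as chunks
# arrive, instead of waiting for the whole body. (Hypothetical format:
# one "code,name" country per line.)
class StreamingCountryParser
  Country = Struct.new(:code, :name)

  def initialize(&on_record)
    @buffer = ''
    @on_record = on_record
  end

  # Feed a raw chunk; every complete line in the buffer becomes a
  # Country object immediately, partial lines wait for the next chunk.
  def <<(chunk)
    @buffer << chunk
    while (line = @buffer.slice!(/\A[^\n]*\n/))
      code, name = line.chomp.split(',', 2)
      @on_record.call(Country.new(code, name))
    end
  end
end

# Sketch of wiring the parser to a streaming HTTP read. read_body with
# a block receives the response body chunk by chunk.
def fetch_countries(url)
  countries = []
  parser = StreamingCountryParser.new { |c| countries << c }
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.request_get(uri.path) do |response|
      response.read_body { |chunk| parser << chunk }
    end
  end
  countries
end
```

The helper-shaped part is the parser; each text format (XML,
key/value, ...) would get its own incremental parser behind the same
`<<` interface.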

Have you considered serving each section of the page asynchronously as
an ajax request? Basically, load the page with a bunch of empty divs.
Then with your Prototype’s dom:loaded event, issue a bunch of Ajax
requests to fill these divs in. What happens is that each is populated
by a different HTTP request, allowing for a perceived performance
improvement because the page loads and content populates as it is
available.

On 17 Feb 2008, at 22:17, s.ross wrote:

Have you considered serving each section of the page asynchronously as
an ajax request? Basically, load the page with a bunch of empty divs.
Then with your Prototype’s dom:loaded event, issue a bunch of Ajax
requests to fill these divs in. What happens is that each is populated
by a different HTTP request, allowing for a perceived performance
improvement because the page loads and content populates as it is
available.

Do know that although it is perceived as loading faster, it’ll
probably be slower and put a bigger strain on your server and database.

Three factors come into play:

  • Page deflating by Apache will be less effective, since each
    response has less data to work with and compress (the less data,
    the less efficient the compression usually is, especially with
    text)
  • Browsers limit the number of simultaneous connections to the same
    domain; that’s why Rails 2 has the asset host feature nowadays, to
    work around that issue
  • The number of extra database hits you’ll make. For example:
    - loading the page in one request: fetch the article with the id
      specified in the URL and eagerly load the associated comments
      and pictures: 1 database hit, 1 rendering cycle
    - loading the page using several requests: fetch the article,
      render the page; fetch the article and comments, render the
      comments section; fetch the article and pictures, render the
      picture section: 3 database hits, 3 rendering cycles

The last two can be worked around and optimized (balancing over
several virtual hostnames, caching of asynchronous pages), but one
may start wondering if it’s worth going through all this trouble.

On top of that, browsers that have JavaScript disabled will get a
largely empty page.

Best regards

Peter De Berdt

Swordfish wrote:

help us build each web page. We are required to support a number of
different text-based APIs (XML, key/value pairs etc etc) via HTTP.

If you do XML parsing with Ruby, it can become slow enough to be
noticeable in an interactive context. I’ve recently seen a 100 ms
delay for in-memory parsing of small, simple content (three or four
levels of elements, probably fewer than 100 elements) with the
built-in parser, and I was really surprised.

I’m not sure how you can get around that. Manual parsing with regexes
might be at least an order of magnitude faster, but the code can
become a mess if the XML is complex.

In your position, I’d benchmark your XML parsing on actual data in
memory, free of any network-related latency, to find out whether this
is one of the things slowing you down.
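Such an in-memory benchmark is quick to run with the standard
Benchmark library and REXML (Ruby's built-in parser), on a made-up
payload of about a hundred elements:

```ruby
require 'benchmark'
require 'rexml/document'

# Hypothetical payload shaped like a small API response: ~100 country
# elements, two levels deep, built entirely in memory so no network
# latency is involved in the measurement.
xml = '<countries>' +
      (1..100).map { |i|
        "<country><code>C#{i}</code><name>Country #{i}</name></country>"
      }.join +
      '</countries>'

# Time parsing plus one XPath query over the whole document.
seconds = Benchmark.realtime do
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//country').size
end
puts format('REXML parse + XPath: %.1f ms', seconds * 1000)
```

Swapping in your real payloads (and, for comparison, a regex-based
extractor) would show whether parsing is the bottleneck.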

I’ve also seen some XPath parsing being really slow on Ruby compiled
with pthread support. All my systems now have Ruby compiled without
pthread support for a minor global speed boost, so I can’t test
whether the problem was really in the XPath queries or in the basic
XML parsing, but leaving pthread out of the way gave me 100x the
performance on the Ruby script I have in mind.

Lionel

Very interesting use of filters. This is exactly the kind of approach
I was thinking about. I will have a go.

Many thanks.

On Feb 17, 2008, at 1:32 PM, Peter De Berdt wrote:

[...]

You raise good points. However, unless I completely missed Swordfish’s
question, the goal was to have the web service requests fulfilled
quickly and probably asynchronously. With Rails, options are limited
WRT any asynchronous execution. Seems one architecture that stays
inside the Rails framework would be something along the lines of:

before_filter :do_ws_fetches
after_filter :wait_for_fetches

def do_ws_fetches
  @ws = []
  @ws << Thread.new do
    # code to do first ws fetch
  end
  @ws << Thread.new do
    # code to do second ws fetch
  end
  # ... and so on ...

  # Don't join here, so everything else can go ahead pretty much as
  # planned, except render.
end

def fetches_still_happening?
  @ws.detect { |t| t.alive? }
end

def wait_for_fetches
  # Poll until the fetch threads finish, subject to some give-up /
  # timeout criteria.
  0.upto(timeout_in_milliseconds / 10) do
    break unless fetches_still_happening?
    sleep(0.01)
  end
end
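As an aside, Thread#join accepts a timeout in seconds, so the polling
loop above could also be written as a series of joins against a shared
deadline - a sketch with a hypothetical helper name:

```ruby
# Joins each thread against a shared deadline instead of polling.
# Thread#join(limit) returns the thread if it finished within +limit+
# seconds, or nil if the timeout elapsed first.
def join_with_deadline(threads, timeout_seconds)
  deadline = Time.now + timeout_seconds
  threads.each do |t|
    remaining = deadline - Time.now
    break if remaining <= 0
    t.join(remaining)
  end
end
```

This blocks only as long as the slowest fetch (up to the deadline),
rather than waking up every 10 ms.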

Of course, this whole idea leads us down the Rails-and-concurrency
path, but if all that’s being done is foraging for data from remote
sources, some careful programming can reduce the risk of stepping on
shared data.

WDYT?