Architecture question

PierreW · January 21, 2010, 11:24pm

Hi guys,

I need to make an “architecture” choice for my app. I believe it
corresponds to a fairly common case, yet I can’t find much input
online. I was hoping you could give me some recommendations or point
me in the right direction.

The app is a kind of mashup, so it does two things:

- handle user requests. I call it the “live” part.
- at regular intervals, build cache information, then merge it with
  the main info. It is my “background” part.

The background part will run once a day, and it takes a lot
of time and resources (it connects to many web services).

So here are my initial thoughts:

- create a background process for the “background” part
- have the main process handle the “live” part.

But if I do this, a few things are not obvious to me, especially since
it involves threads:

1- in the background process, I will have to run threads (indeed I
need to run the requests to the web services in parallel, otherwise it
will take too long). Is that doable or is it “dangerous”?
2- can the cache tables be in the same database as the main one, or
shall I create a totally different DB?
3- when the merging happens, will it “work” (since both the background
process and the main one will want to deal with the same database)?

Please do let me know if I am getting the whole thing wrong (or if I
see problems where they don’t exist). I keep reading that I need to be
very careful with threads/processes and connections to the DB, but I
am not quite sure where the limit is or which design choices are
really bad.

Thanks a lot!
Pierre

PierreW · January 21, 2010, 11:55pm

Quoting PierreW:

the main info. It is my “background” part.

There are a number of Ruby/Rails background processors (Workling/Starling,
BackgrounDRb, etc.). Most are multi-threaded or multi-process. Have a
cron job (or use any built-in cron-type scheduling) dump a bunch of web
service requests into the queue and let the background processor handle
the multitasking.
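
To give a rough idea of what "parallel requests fed from a queue" can look
like, here is a minimal sketch in plain Ruby using only Thread, Queue and
Mutex. It is not how Workling or BackgrounDRb work internally. The names
fetch_service and run_in_parallel are made up for illustration, and the
sleep stands in for a real web service call. Results are collected in
memory, so the database write can happen afterwards from a single thread.

```ruby
# Hypothetical stand-in for one web service call; a real version
# might use Net::HTTP here instead of sleeping.
def fetch_service(name)
  sleep 0.01 # simulate network latency
  "payload from #{name}"
end

# Runs the given block once per job, using a fixed number of worker threads.
# Returns a hash mapping each job to the block's result for it.
# Assumption: jobs are never nil or false (nil marks the end of the queue).
def run_in_parallel(jobs, worker_count: 4, &work)
  queue = Queue.new
  jobs.each { |job| queue << job }
  queue.close # once drained, pop returns nil instead of blocking

  results = {}
  lock = Mutex.new # guards the shared results hash

  workers = Array.new(worker_count) do
    Thread.new do
      while (job = queue.pop)
        value = work.call(job)
        lock.synchronize { results[job] = value }
      end
    end
  end

  workers.each(&:join) # wait for all workers; re-raises a worker's exception
  results
end

services = %w[weather news stocks maps]
results = run_in_parallel(services, worker_count: 2) { |name| fetch_service(name) }
results.each { |name, payload| puts "#{name}: #{payload}" }
```

The worker count caps how many requests are in flight at once, which
matters when the services are slow or rate-limited.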

Any decent database server can handle requests from multiple processes.
You may need to use some kind of locking (table locking, row locking,
transactions) to handle overlapping read/update/write requests. I know
MySQL and Postgres can handle this. Probably MSSQL (I don’t work w/
Windows). I don’t know enough about sqlite3 to say.

Use any of the Rails servers (Mongrel, Passenger, Apache, Nginx, …) for
the live part. Start with one of the debug/development-friendly,
single-threaded servers (WEBrick and others). Deploy on a heavier-duty
server that can handle the expected load, or at least the load until your
idea proves itself, or disproves itself with real users.

There are a number of solutions of varying scale that have relatively
light switching costs. Start small and friendly and then swap pieces of
the solution as the load increases.

And don’t chase the latest and greatest, “best” technology at the
expense of developing your idea.

HTH,
Jeffrey