Forum: Ruby on Rails So many users, so little time - looking for a better way

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- and Ruby-related community platforms.
Jeff Pritchard (jeffpritchard)
on 2008-10-21 04:33
OK, time to come clean.  Right now my app is doing something really
cheesy.  The first step is always admitting that you have a problem,
right?

So here's the deal.  One of the features of my app is that, based on
some user settings, it is supposed to send out "reminder" emails at
certain times of the day.  Each user has their own time settings for
these reminders.

Right now I have a background task that, once every 15 mins, looks at
every gosh darn user and decides whether or not to send them a reminder.
Once every 15 mins I hit every single user row in the database.
brrrrrrr.  Yeah, I know, gives me the willies too.

So is there some sort of a "best practice" established for how to do
some caching/intelligent-scheduling to overcome this horrendous mis-use
of the database?  Is there some sort of a plugin that would allow me to
sort the database by that time-of-day column and intelligently find just
those within a certain range of times?

I'm very nearly an SQL virgin, so if this is painfully simple and I'm
just a moron, be gentle.

thanks,
jp
Phillip Koebbe (pkoebbe)
on 2008-10-21 05:10
You should be able to construct a query (or use find with :conditions)
and let the database do the work. Post the relevant tables, fields, and
classes and I'm sure you'll get many offers of help.
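For instance, the window arithmetic plus a Rails 2-era finder might look like this (a sketch; the reminder_time column name is an assumption, and the finder call is commented out since it needs a Rails environment):

```ruby
# Compute the 15-minute window in plain Ruby, then let the database filter.
window_start = Time.now
window_end   = window_start + 15 * 60  # 900 seconds later

# Rails 2-era finder (assumes a reminder_time column on users):
# due_users = User.find(:all,
#   :conditions => ['reminder_time >= ? AND reminder_time < ?',
#                   window_start, window_end])
```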

Peace.
Hassan Schroeder (Guest)
on 2008-10-21 06:09
(Received via mailing list)
On Mon, Oct 20, 2008 at 7:33 PM, Jeff Pritchard
<rails-mailing-list@andreas-s.net> wrote:

> Right now I have a background task that, once every 15 mins, looks at
> every gosh darn user and decides whether or not to send them a reminder.
> Once every 15 mins I hit every single user row in the database.
> brrrrrrr.  Yeah, I know, gives me the willies too.

So, gosh, maybe you're running your server on an ancient Intel 286
box on Craplix "Muddy Mammoth" -- 15 minutes? so even that box
executed, oh, eleventeen gazillion instructions in that time period?

Plus /yet another/ SQL query. Brutal. I'll bet your phone is ringing off
the wall with customer complaints.

When your traffic gets twitteresque, please call back. Operators are
standing by.

/* see also  "premature optimization" */    :-)
--
Hassan Schroeder ------------------------ hassan.schroeder@gmail.com
Rick (Guest)
on 2008-10-21 06:44
(Received via mailing list)
Build yourself a carousel with buckets some time (15 min?) apart.
Install each user's task in its bucket and rotate the carousel with
the clock.  As a bucket appears, service any tasks it contains and
move them to the appropriate future bucket.

This is fairly easy to do with a doubly linked list - I'm not too sure
yet how it maps into the MVC world view.
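The carousel above can be sketched in plain Ruby (using a simple array of buckets rather than a doubly linked list; class and task names are hypothetical, with 15-minute buckets over one day):

```ruby
# A minimal timing-wheel ("carousel") sketch.
class Carousel
  SLOT_SECONDS = 15 * 60                        # one bucket per 15 minutes
  SLOTS        = 24 * 60 * 60 / SLOT_SECONDS    # 96 buckets per day

  def initialize
    @buckets = Array.new(SLOTS) { [] }
  end

  # Place a task in the bucket for the given seconds-past-midnight.
  def schedule(task, seconds_past_midnight)
    @buckets[seconds_past_midnight / SLOT_SECONDS % SLOTS] << task
  end

  # Service the bucket for the current slot and empty it; the caller
  # re-schedules recurring tasks into a future bucket.
  def rotate!(slot_index)
    due = @buckets[slot_index % SLOTS]
    @buckets[slot_index % SLOTS] = []
    due
  end
end
```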

Simon Macneall (Guest)
on 2008-10-21 06:52
(Received via mailing list)
Hi Jeff,

Depending on how you have the data stored, yes, you should be able to
do something in SQL to make the search better.

Does each user just have one time of day assigned to them? If so, then
you can just put a condition on the find query.

Something like user.reminder_time > start_range AND user.reminder_time <
end_range, and let the DB do the grunt work for you.

If the user notification is something like 'every 4 hours', then you
may have to process each user individually to work out whether the
current time fits that range. Or have a 'next_notification' column that
you update each time you send a notification to that user. Then do the
db query on the next_notification field.
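In sketch form (the column name and the 4-hour interval are just examples; the ActiveRecord calls are commented out since they need a Rails environment):

```ruby
# After sending a reminder, store the next due time; the periodic job
# then only asks the database for rows that are due.
interval          = 4 * 60 * 60  # 'every 4 hours'
sent_at           = Time.now
next_notification = sent_at + interval

# user.update_attribute(:next_notification, next_notification)
# due = User.find(:all,
#   :conditions => ['next_notification <= ?', Time.now])
```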

Cheers
Simon

Bsytlz (Guest)
on 2008-10-21 07:09
(Received via mailing list)
>
> When your traffic gets twitteresque, please call back. Operators are
> standing by.
>
> /* see also  "premature optimization" */    :-)

There is no such thing as premature optimization, maybe mis-
prioritized optimization (there may be more impactful things to address
first), but if you wait until you "have to" optimize to do it...it is
too late.  Especially if you consider that optimization may require
rethinking table structure and/or other fundamental architecture.
There is, however, such a thing as a premature, jerky, smart-ass
response like the one above.

That said, a couple of questions first:
 - How is your query structured now? find(:all, ?????)?
 - What restrictions do you have on the reminder period, if any? Can
they choose any time of day, any frequency? Or is it restricted to
"every 15 minutes", "every hour", etc.?
 - How many users do you have now? How many do you expect to need to
scale to?
 - How long are the queries taking now? Too long?

These are questions that you should answer/ask when making a decision
on the need for and approach to optimization. For instance, restricting
the notification times/intervals to a few fixed values lends itself to
multiple improvements, at a caching/queuing level as well as at an SQL
query level.

A few VERY general ideas on how to approach this are:

1) SQL optimization: use direct SQL connections and optimize the query
as appropriate for your DB type using constructs such as ORDER BY,
GROUP BY, creating indexes, etc.
2) Do the sort/search in "object space" using a partitioning approach.
3) Fake a crude "bucketing" system using additional look-up tables, or
more specifically has_many relationships to partition based on fixed
intervals.  For example, an hours_of_day model that
has_and_belongs_to_many :reminders, with one row for each of the 24
hours of the day.  You could then relate each reminder to one of the
rows.  Your background process then asks for all records that are
associated with the row representing the current time....convoluted, I
know.  The performance benefit may be questionable because of the JOINs
required for a habtm look-up, but it should be less brute-force than a
find all.

I have quite a few other ideas, but will leave it at this....
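One way to sketch idea (3); all model names here are hypothetical, and the habtm wiring is shown in comments since it needs a Rails environment:

```ruby
# Map a reminder time to one of 24 hour buckets (0..23).
def hour_bucket(reminder_time)
  reminder_time.hour
end

# class HourOfDay < ActiveRecord::Base
#   has_and_belongs_to_many :reminders
# end
#
# class Reminder < ActiveRecord::Base
#   has_and_belongs_to_many :hours_of_day
# end
#
# The background process then asks only for the current bucket:
# due = HourOfDay.find_by_hour(Time.now.hour).reminders
```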
Hassan Schroeder (Guest)
on 2008-10-21 07:28
(Received via mailing list)
On Mon, Oct 20, 2008 at 10:08 PM, Bsytlz <brianvdawson@gmail.com> wrote:

> There is no such thing as premature optimization, maybe mis-
> prioritized optimization (there may be more impactful things to address
> first), but if you wait until you "have to" optimize to do it...it is
> too late.

And the OP presented zero evidence of an actual problem, or hint
of any potential problem. Running one SQL query every 15 minutes
seems, uh, not too much to worry about. IMO.

> There is however a such thing as a premature jerky, smart ass response
> like the one above..

Thanks for the evaluation. And your qualifications on constructing
and running high-traffic web sites?  Just curious.

--
Hassan Schroeder ------------------------ hassan.schroeder@gmail.com
Simon Macneall (Guest)
on 2008-10-21 07:39
(Received via mailing list)
Did you read the original post?

It's not just one SQL query; it is one SQL query and then a hit on
every user object returned to see if it is time to send the email.
Anyone can see that eventually this is going to cause a problem, and I
don't know about you, but I would rather find that sort of thing now,
before my customers notice a problem, than wait until it actually causes
one.

Oh, and you don't actually need a qualification to be able to spot a
premature jerky response, just some experience on the internet.

Jeff Pritchard (jeffpritchard)
on 2008-10-21 09:55
There are up to two per day for each user, but they can be handled
separately.  Yes, currently using the disgusting "find(:all)" approach.

No, with just 100 users so far there is no sign of a problem.  I just
consider the current method to be very inelegant and would rather do it
in a more intelligent way.  Even if, once we get to a few thousand
users, it still isn't causing a "problem", it will be using significant
resources, thus reducing the responsiveness of the app.  That said, some
up-front profiling of the current method might convince me that I'm
wasting my time.  I'll do that first.  Point taken (even if made with
more than a touch of venom).

So is it generally the case that having the db hunt down the few records
we're after each time will execute substantially faster than returning
all of the records and doing the comparison "outside the database"?  As
a novice db guy, the two approaches sound roughly similar, since both
cases involve doing essentially the same comparison on the same number
of records.  Perhaps there is a lot of overhead involved in returning
records to the app.

I would think that some method of caching the info such that it didn't
have to actually go to the db every time would make the biggest
difference in execution time.  Unfortunately, such a solution would also
be more complex than just using a better query.

thanks for all the responses.

best,
jp

Phillip Koebbe (pkoebbe)
on 2008-10-21 12:44
Jeff Pritchard wrote:
> So is it generally the case that having the db hunt down the few records
> we're after each time will execute substantially faster than returning
> all of the records and doing the comparison "outside the database"?  As
> a novice db guy, the two approaches sound roughly similar, since both
> cases involve doing essentially the same comparison on the same number
> of records.  Perhaps there is a lot of overhead involved in returning
> records to the app.

Yes, the database is going to be faster at figuring out which records
need to be returned, ASSUMING proper indexing, table structure, and a
generally sound approach to querying. If you don't use indexes, your
data is not structured well, or you're querying on half the fields in
the database, it may well be faster to just pull it all back into Rails
and work through it.
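That "ASSUMING proper indexing" caveat can be satisfied with a one-line migration (a sketch; the reminder_time column name is an assumption):

```ruby
# Hypothetical migration: index the column the reminder query filters on,
# so the database can seek to the matching rows instead of scanning all users.
class AddIndexToUsersReminderTime < ActiveRecord::Migration
  def self.up
    add_index :users, :reminder_time
  end

  def self.down
    remove_index :users, :reminder_time
  end
end
```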

As to why this is so, two major things come to mind: marshaling the data
from the db server to the application and object creation. Suppose you
have your thousand users in your system, but only 20 of them have
reminders to send. If the database finds them for you, it returns just
twenty and your application creates just twenty objects (and you have no
more decisions to make). If you pull all one thousand back first, the
database has to send more data back to you and then you're going to
create 1000 objects in your application. If you need related data for
the decision making, you're going to be making more than 1000 objects,
and possibly more database queries (unless you've used eager loading,
which increases the amount of data sent back to the app).

Whatever creates the data should be the fastest at ripping through it.
The database engine is optimized for data storage and retrieval. When
used properly, nothing is going to be faster than the db engine's native
access to the data.

Peace.
Bharat Ruparel (bruparel)
on 2008-10-21 17:06
(Received via mailing list)
Jeff,
A few pointers, since I am maintaining and enhancing a rather large
application with similar functionality.  It seems to work well (most
of the time):

I am firing an SQL query every minute to check for reminders instead
of every 15 minutes, so the actual query is not an issue.  As for its
construction: you may want to create a separate table called
"reminders" (or something similar) which stores reminder records for
only those users who want them.  That way, if the majority of your
users do not use this functionality, you are not incurring unnecessary
overhead.  This table will have an id, as every Rails table does, and
will have an index on id.  It should also have the foreign keys, at
least for the users (user_id); make sure that it is indexed as well,
since you will be joining the reminders and users tables on this
foreign key.  You can have other foreign keys as well, for example,
event_id if the reminders are for particular events.  That is how a
one-to-many relationship works.  You may want to refer to any of the
online tutorials available on basic database design, which is a
prerequisite to making good Rails applications.
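A sketch of that "reminders" table as a Rails 2-era migration; every name here is illustrative:

```ruby
# Hypothetical migration: a narrow reminders table, indexed on the
# columns the minutely query and the users join actually touch.
class CreateReminders < ActiveRecord::Migration
  def self.up
    create_table :reminders do |t|
      t.integer  :user_id,  :null => false  # FK to users
      t.integer  :event_id                  # optional FK, if reminders target events
      t.datetime :remind_at                 # when to fire
    end
    add_index :reminders, :user_id    # join column
    add_index :reminders, :remind_at  # filter column for the periodic query
  end

  def self.down
    drop_table :reminders
  end
end
```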

The scheduling of these recurring tasks is also important.  You want
to use a cron-style scheduler that does not load the Rails environment
every time the task is run.  If you use the unix/linux cron to
schedule a rails runner, it will load the Rails environment every time
you call the task, which is a very slow and demanding process.  My
application uses Backgroundrb.  I am looking for alternatives which
are simpler, easier to debug, and more reliable.
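For comparison, the cron + runner setup being argued against is just one crontab line (a sketch; the path and mailer class are hypothetical), and its cost is exactly as described above: every run boots the whole Rails environment:

```shell
# crontab entry: every 15 minutes, boot Rails and deliver due reminders
*/15 * * * * cd /path/to/app && ruby script/runner 'ReminderMailer.send_due_reminders' >> log/cron.log 2>&1
```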

The point of all this is that you are not taking on a trivial task,
but rather a substantial one, if you want to better architect your
application, so you may want to think it through carefully.  Having
said that, you are thinking of the right things.  Maybe you want to
experiment with a prototype which is not in your critical path?

Hope this helps.

Regards,

Bharat
Hassan Schroeder (Guest)
on 2008-10-21 20:58
(Received via mailing list)
On Tue, Oct 21, 2008 at 12:55 AM, Jeff Pritchard
<rails-mailing-list@andreas-s.net>

> There are up to two per day for each user, but they can be handled
> separately.  Yes, currently using the disgusting "find(:all)" approach.

Since you mentioned using a background task for this, I assumed a
straight SQL query. My bad.

> No, with just 100 users so far there is no sign of a problem.  I just
> consider the current method to be very inelegant and would rather do it
> in a more intelligent way.  Even if, once we get to a few thousand
> users, it still isn't causing a "problem", it will be using significant
> resources, thus reducing the responsiveness of the app.

My point, which I attempted -- apparently unsuccessfully! -- to make
humorously, is that running a single SQL query against a properly
indexed database, with even a row count in the millions, is trivial.
Really. And if it's being done from a background app, it should have
little or no effect on your Rails app's performance from the user's
perspective anyway.

> So is it generally the case that having the db hunt down the few records
> we're after each time will execute substantially faster than returning
> all of the records and doing the comparison "outside the database"?

Absolutely -- that's what an RDBMS is designed for.

> I would think that some method of caching the info such that it didn't
> have to actually go to the db every time would make the biggest
> difference in execution time.  Unfortunately, such a solution would also
> be more complex than just using a better query.

The execution time to retrieve the list and build the queue is trivial
compared to the amount of time to actually process the mail queue,
particularly as the user count grows.

Still, for the sake of argument -- build your "mailing list" in memory,
and update it when a change event occurs in the app (user submits
a changed profile, new user joins, etc.). Though you'll still incur at
least one update/insert into the DB, which is more expensive than a
select.

One last venom-free comment regarding premature optimization --
if you "fix" something today that /isn't/ broken, you may be making it
harder to fix a /real/ bottleneck that crops up tomorrow.

Just sayin' ...

Best of luck,
--
Hassan Schroeder ------------------------ hassan.schroeder@gmail.com
Hemant Kumar (gnufied)
on 2008-10-21 20:58
(Received via mailing list)
On Tue, Oct 21, 2008 at 8:36 PM, Bharat <bcruparel@yahoo.com> wrote:
> who want them.  That way if majority of your users do not use this
>
> The scheduling of these recurring tasks is also important.  You want
> to use a cron style scheduler that does not load the rails environment
> every time the task is run.  If you use the unix/linux cron to
> schedule a rails runner, it will load rails environment everytime you
> call the task which is a very slow and demanding process.  My
> application uses Backgroundrb.  I am looking for alternatives which
> are simpler, easier to debug, and more reliable.

Can you file a bug report with a reliable way to reproduce these
issues? I am sure we can iron them out.
Bharat Ruparel (bruparel)
on 2008-10-21 21:13
Hemant Kumar wrote:

> Can you file a bug report with a reliable way to reproduce them? I am
> sure, we can iron it out.

OK Hemant, I will try to do it as constructively as possible.  You have
been most helpful.  Just the thought of debugging Backgroundrb requires
a lot of determination and will-power.  Is there a simple guide to
debugging a malfunctioning Backgroundrb worker anywhere?  I searched
long and hard on the web and did not come up with much.  If you could
publish a series of tutorials on debugging, that would be most helpful.
The Advanced Rails Recipes book recipe is of no use now, since the API
has changed.

I have done everything by the book, and my app still produces obscure
debugging messages in backgroundrb_debug.log files, though things seem
to be working.  Some decent troubleshooting guides are what the
community sorely needs.

Regards,

Bharat
Dejan Dimic (Guest)
on 2008-10-21 21:44
(Received via mailing list)
First of all, this is not an easy task.

There are two main parts to fulfilling it: gathering the users who
need to be reminded, and sending the actual reminders.

Depending on your application's data structure, the selection of users
should be done by the database engine, as databases are built for that
kind of task.
You should build an appropriate SQL statement, optimize indexes for
that query, and try to gain as much performance out of it as possible.

Sending the reminders is a different issue, and I would probably use a
set of daemons to perform it.

However, there are some additional issues you have to address.

Like:
 - What if, for any reason, the reminders are not sent at the
appropriate time? Should you ignore an unsent reminder or not?
 - How will you handle an increased number of reminders if your system
cannot send them all within one cycle? How will you detect such a
situation?
 - If you use multiple threads to send the reminders, how will they
share the same cache of selected users?

This is one of those interesting problems that arise from a
seemingly easy new feature like sending reminders to users.

If we had more information, perhaps our observations could be more
concrete.
Hemant Kumar (gnufied)
on 2008-10-21 21:49
(Received via mailing list)
On Wed, Oct 22, 2008 at 12:43 AM, Bharat Ruparel
<rails-mailing-list@andreas-s.net> wrote:
> hard on the web and did not come up with much.  If you could publish a
> series of tutorials on debugging that would be most helpful.  The
> Advanced Rails Recipes book recipe is of no use now since the API has
> changed.
>
> I have done everything by the book and my app still produces obscure
> debuggin messages in backgroundrb_debug.log files though things seem to
> be working.  Some decent troubleshooting guides is what the community
> needs sorely.
>

Okay, I will see what can be done about this. In the meanwhile, you
can upgrade to the 1.1 version, where debugging is more sanitized.
Sazima (Guest)
on 2008-10-21 21:52
(Received via mailing list)
Jeff,

You could use the same approach as Ryan in his last background job
screencast: http://railscasts.com/episodes/129-custom-daemon.

Or take a look at some scheduler for the task (already passing the
specific user information).

I spent some time researching background tasks, my notes might help
you:

BACKGROUND PROCESSES
--------------------

http://wiki.rubyonrails.org/rails/pages/HowToRunBa...

== Asynchronous Tasks

Launched from the Rails request/response cycle + optional Message Queueing

- Options
  . ar_mailer (only for emails)  => GOOD (only email)
  . starling + workling      => EXCELLENT
  . Ap4r
  . Spawn / background      => SIMPLE
  . ActiveMessaging
  . BackgroundFu
  . WorkerQueue
  . Rake in Background      => SIMPLE
    (http://railscasts.com/episodes/127-rake-in-background)

== Recurring Jobs (cron-style)

Scheduled or permanently running in the background

- Options
  . daemon_generator (rails_cron)  => SIMPLE
    http://railscasts.com/episodes/129-custom-daemon
  . BackgrounDRb
  . rufus-scheduler        => GOOD
  . Taskr
  . UNIX Cron and Runner (simple, but consumes resources and you can't
control it well)


Cheers, Sazima

This topic is locked and cannot be replied to.