Performance suggestions or best practices ideas?

mlopke · May 3, 2006, 12:49am

Any suggestions for applications that involve a lot of calculations on a
fairly large data set?

My app uses a set of raw data ~5k, applies some default/override rules
on the raw data and does some calculations on the data in combination
with a list of assumptions. A ranked list along with detailed metrics
is generated. The end user can manipulate some of the rules and
assumptions to generate different metrics for comparison.

My current approach maintains the original data set, default/overrides
and assumptions as separate models and does all the ranking calculations
on the fly. The performance is a bit sluggish if I’m looking at
more than 400 objects, and this is with only one user.

Any suggestions on caching options to explore? Should I consider using
a separate model for storing the results? This would make the ranking
and comparisons quicker, but I’m concerned about the overhead of having
to write a large amount of temporary data to the database.

Thanks,
Mike

mlopke · May 3, 2006, 8:36pm

The app looks at potential investments. Raw data is 5K rows of mixed
info (numbers, categories, dates, etc.).

The calculations are on par with Excel. I don’t think there is a lot of
room for improvement on the calculation algorithm. I was looking for a
way to avoid the expense of recalculating the same set of metrics. The
end user goes through these steps:

1. Set initial query parameters ($ to invest, investment type, etc.)
2. App gets raw data, combines it with overrides/defaults/assumptions and
   does the metrics calculation to generate ranked results
3. User looks at ranked results
4. User looks at details for a particular investment
5. User returns to ranked results to investigate other opportunities
…

I wanted to cache the ranked results rather than regenerate them
each time I return to the list. Is there a way to keep a persistent
object in memory, or is my only option to write the data to a table? A
new set of ranked results would be generated for each change to the
initial query or to the overrides/defaults/assumptions.
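One way to picture the caching described here is a small in-memory store keyed by a digest of the inputs, so the ranking is recomputed only when the query or the assumptions change. This is a minimal sketch, not the app's actual code: `RankingCache`, its methods, and the sample parameters are all hypothetical names, and an in-process Hash like this is not shared between separate server processes.

```ruby
require 'digest/sha1'

# Hypothetical in-memory cache for ranked results (illustration only).
class RankingCache
  def initialize
    @store = {}
  end

  # Returns the cached ranking for these inputs, or runs the block
  # (the expensive calculation) once and remembers its result.
  def fetch(query, assumptions)
    key = cache_key(query, assumptions)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end

  private

  # Sort each hash by key so that key order does not change the digest.
  def cache_key(query, assumptions)
    normalized = [query, assumptions].map { |h| h.sort_by { |k, _| k.to_s } }
    Digest::SHA1.hexdigest(Marshal.dump(normalized))
  end
end

cache = RankingCache.new
query = { amount: 50_000, type: 'bond' }   # made-up query parameters
assumptions = { discount_rate: 0.05 }      # made-up assumption

ranked = cache.fetch(query, assumptions) do
  # The real ranking calculation would go here; these are made-up [id, score] pairs.
  [[17, 0.92], [4, 0.81]]
end
puts ranked.inspect
```

A second `fetch` with the same query and assumptions returns the stored array without running the block; changing either input produces a new key and a fresh calculation.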

Thanks,
Mike

mlopke · May 3, 2006, 7:06pm

Could you do similar calculations in, say, Microsoft Excel? How does the
performance of your application compare with that of Excel? If Excel is
a lot faster, I’d say go hunting for tuning opportunities. But if
they’re comparable, and scale roughly the same way with increasing
problem size, you’re probably not going to be able to squeeze much
performance out without a better algorithm.

What sort of calculations are they? What is the 5K … numbers, rows of
a database table …??

Mike L. wrote:

and assumptions as separate models and does all the ranking calculations

–
M. Edward (Ed) Borasky

mlopke · May 3, 2006, 9:48pm

On 5/3/06, Mike L. wrote:

Is there a way to have a persistent
object in memory or is my only option to write the data to a table?

You could store an array of the ranking scores and IDs of the
investment options in the session[] object. That way they would
persist between pages.
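As a rough illustration of this suggestion, the full results can be reduced to [id, score] pairs before they go into the session, leaving the detailed metrics to be looked up by id when needed. This is only a sketch: `RankedInvestment` and `compact_ranking` are hypothetical names, the data is made up, and a plain Hash stands in for the Rails session[] object.

```ruby
# Hypothetical result record; the real app's model will differ.
RankedInvestment = Struct.new(:id, :score, :metrics)

# Keep only what the ranked list page needs: ids and scores, best first.
def compact_ranking(results)
  results.sort_by { |r| -r.score }.map { |r| [r.id, r.score] }
end

# Made-up sample data.
results = [
  RankedInvestment.new(1, 0.42, { yield_pct: 3.1 }),
  RankedInvestment.new(2, 0.91, { yield_pct: 5.4 }),
  RankedInvestment.new(3, 0.67, { yield_pct: 4.0 })
]

session = {} # stand-in for the Rails session in a controller action
session[:ranking] = compact_ranking(results)
puts session[:ranking].inspect
```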

Isaac

mlopke · May 3, 2006, 11:37pm

On May 3, 2006, at 1:22 PM, Mike L. wrote:

tried storing this in the session but I don’t really know.

Mike

For something like this you might want to look at memcached and Eric
Hodel’s CachedModel extension.

http://dev.robotcoop.com/Libraries/cached_model/files/README.html

Or you can set up a small DRb server to hold this info and store and
retrieve the results.
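To make the DRb option concrete, here is a minimal sketch using Ruby's standard drb library. `ResultStore`, its `put`/`get` methods, and the key format are hypothetical, and the server and client are shown in one file for brevity; in practice the server would run as its own long-lived process on a fixed URI.

```ruby
require 'drb/drb'

# Hypothetical store object exposed over DRb: a Hash guarded by a Mutex.
class ResultStore
  def initialize
    @data = {}
    @lock = Mutex.new
  end

  def put(key, value)
    @lock.synchronize { @data[key] = value }
    true
  end

  def get(key)
    @lock.synchronize { @data[key] }
  end
end

# Server side: port 0 asks the OS for any free port (an assumption for
# this sketch; a real deployment would pick a fixed port).
server = DRb.start_service('druby://localhost:0', ResultStore.new)

# Client side, e.g. inside the web app: talk to the store through its URI.
store = DRbObject.new_with_uri(server.uri)
store.put('user42:bonds', [[7, 0.93], [3, 0.88]]) # made-up key and scores
ranking = store.get('user42:bonds')
puts ranking.inspect

DRb.stop_service
```

Because the store lives outside the web processes, every request sees the same cached results, at the cost of running and monitoring one more process.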

Cheers-
-Ezra

mlopke · May 3, 2006, 10:22pm

Isaac R. wrote:

On 5/3/06, Mike L. wrote:

Is there a way to have a persistent
object in memory or is my only option to write the data to a table?

You could store an array of the ranking scores and IDs of the
investment options in the session[] object. That way they would
persist between pages.

Isaac

Thanks.

I’ve considered using the session[] object. This, however, brings up an
interesting question. What is the practical limit on what can be stored
in it? Let’s say I have 500 investments I’m considering and each has 20
metrics I’ve calculated. My hunch is that it would cause problems if I
tried storing this in the session but I don’t really know.
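One way to get a feel for the size in question is to serialize a comparable structure and count the bytes. The sketch below assumes the session store serializes with Marshal (as Ruby's PStore does); the data is randomly generated, and `sample_results` and `marshaled_bytes` are hypothetical helpers. It compares the full 500 x 20 metric set against a compact list of [id, score] pairs.

```ruby
# Build made-up results: `count` investments with `metrics_per_item` floats each.
def sample_results(count, metrics_per_item, seed = 42)
  rng = Random.new(seed)
  (1..count).map do |id|
    { id: id, metrics: Array.new(metrics_per_item) { rng.rand * 100 } }
  end
end

# Size of an object once serialized with Marshal.
def marshaled_bytes(obj)
  Marshal.dump(obj).bytesize
end

full = sample_results(500, 20)
# Compact form: one [id, score] pair per investment (first metric as the score).
compact = full.map { |r| [r[:id], r[:metrics].first] }

puts "full metrics:  #{marshaled_bytes(full)} bytes"
puts "id/score only: #{marshaled_bytes(compact)} bytes"
```

Whatever the absolute numbers turn out to be, the full set is written and read on every request that touches the session, which is the cost to weigh against recalculating.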

Mike

mlopke · May 5, 2006, 11:54pm

For something like this you might want to look at memcached and Eric
Hodel’s CachedModel extension.

http://dev.robotcoop.com/Libraries/cached_model/files/README.html

Or you can set up a small DRb server to hold this info and store and
retrieve the results.

Cheers-
-Ezra

Thanks. I’ll give it a look.