Large binary trees & stack level too deep

Hi All

This is my first post to the list, although I’ve been an avid reader
for over a year now. I’ll save everyone the usual Ruby is great and
rocks my world (and thanks Matz) lines since it is self-explanatory.

I’m counting on the experience of the community to shove me in the
right direction with my issue, but not with explicit help. Baptism by
fire, the best way to really learn.

I’m writing a postfix log parser, based on the pflog2mysql library in
the RAA, but without the MySQL storage. We need to analyze mail
traffic through our various mail relays and then bill per usage. The
parsing works fine, the analysis is the tricky part, so herewith my
story.

For each message encountered in the logs I build a Message object and
populate it with the data from the logs as we go along, having a
single object that represent the entire life cycle of the message as
it traversed the server processes. The Message objects are stored in
an aptly named MessageList class, which is a wrapper around a very
simple balanced binary tree.

I have two different tree classes at my disposal, one specializing in
storing messages and referencing them on their queue_id’s, and a
second more dynamic tree that I use to sort the messages based on
various criteria. Find the top messages based on size, number of
recipients etc…

When ever I perform a lookup in the MessageList class, it repopulates
the dynamic tree from the data inside the message tree, passing the
correct attributes for positioning in the tree and then performing the
lookups later. This process works very well and is extremely quick
(3500 lines sample log file parsed and sorted on two different
criteria in less than a second), however, on realistic data sets the
dynamic tree croaks with stack level too deep exceptions, and the
message tree doesn’t. I can only assume the message tree is much
broader than the dynamic tree.

This raises the question, is a binary tree right for sorting such high
volumes of information? I’m not really on par with the various
algorithms for performing tasks like this (web background). If someone
has any pointers for using Ruby to sift through big collections of
objects, and even quickly search the tree (or alternate data
structure) it would be dearly appreciated. I read another post in the
archives where a writer clarified the differences between a B-Tree and
normal binary tree, would the B-Tree yield better handling of so much
data?

I plan to release the final product as FOSS, so if anyone wants a
glance at severe pre-alpha proof of concept code, just shout. Please
refrain from giving explicit code examples, this is a personal
challenge for myself and I’m looking for guidance.

Thanks in advance.

Kenneth K.
[email protected]

Folding@home stats
http://fah-web.stanford.edu/cgi-bin/main.py?qtype=userpage&username=kenneth.kalmer

Kenneth K. wrote:

I’m writing a postfix log parser, based on the pflog2mysql library in
the RAA, but without the MySQL storage. We need to analyze mail
traffic through our various mail relays and then bill per usage. The
parsing works fine, the analysis is the tricky part, so herewith my
story.

This raises the question, is a binary tree right for sorting such high
volumes of information?

I plan to release the final product as FOSS, so if anyone wants a
Please refrain from giving explicit code examples, this is a personal
challenge for myself and I’m looking for guidance.

Well, if a binary tree gets too large and it is being called
recursively, then you really do need to make sure that you do not blow
your stack. Thinking back to old school ways of handling such things,
this is more of a coding approach rather than Ruby specific advice here,
you might consider using databases. If you set it up so that each
branch references the ways that you can go on the tree, you can then
walk the tree iteratively rather than recursively.

Just a thought.

On 7/23/07, Lloyd L. [email protected] wrote:

Well, if a binary tree gets too large and it is being called
recursively, then you really do need to make sure that you do not blow
your stack. Thinking back to old school ways of handling such things,
this is more of a coding approach rather than Ruby specific advice here,
you might consider using databases. If you set it up so that each
branch references the ways that you can go on the tree, you can then
walk the tree iteratively rather than recursively.

Just a thought.

Thanks Lloyd

Part of the exercise for me was exploring the possibility of using a
complete memory based solution as opposed to touching ad database. But
it seems you’ve got a point, I’ll compare using in-memory hashtables
and MySQL/sqlite3 to see how it pans out.

Best

Kenneth K.
[email protected]

Folding@home stats
http://fah-web.stanford.edu/cgi-bin/main.py?qtype=userpage&username=kenneth.kalmer