Hi All
This is my first post to the list, although I’ve been an avid reader
for over a year now. I’ll save everyone the usual Ruby is great and
rocks my world (and thanks Matz) lines since it is self-explanatory.
I’m counting on the experience of the community to shove me in the
right direction with my issue, but not with explicit help. Baptism by
fire, the best way to really learn.
I’m writing a postfix log parser, based on the pflog2mysql library in
the RAA, but without the MySQL storage. We need to analyze mail
traffic through our various mail relays and then bill per usage. The
parsing works fine, the analysis is the tricky part, so herewith my
story.
For each message encountered in the logs I build a Message object and
populate it with the data from the logs as we go along, having a
single object that represent the entire life cycle of the message as
it traversed the server processes. The Message objects are stored in
an aptly named MessageList class, which is a wrapper around a very
simple balanced binary tree.
I have two different tree classes at my disposal, one specializing in
storing messages and referencing them on their queue_id’s, and a
second more dynamic tree that I use to sort the messages based on
various criteria. Find the top messages based on size, number of
recipients etc…
When ever I perform a lookup in the MessageList class, it repopulates
the dynamic tree from the data inside the message tree, passing the
correct attributes for positioning in the tree and then performing the
lookups later. This process works very well and is extremely quick
(3500 lines sample log file parsed and sorted on two different
criteria in less than a second), however, on realistic data sets the
dynamic tree croaks with stack level too deep exceptions, and the
message tree doesn’t. I can only assume the message tree is much
broader than the dynamic tree.
This raises the question, is a binary tree right for sorting such high
volumes of information? I’m not really on par with the various
algorithms for performing tasks like this (web background). If someone
has any pointers for using Ruby to sift through big collections of
objects, and even quickly search the tree (or alternate data
structure) it would be dearly appreciated. I read another post in the
archives where a writer clarified the differences between a B-Tree and
normal binary tree, would the B-Tree yield better handling of so much
data?
I plan to release the final product as FOSS, so if anyone wants a
glance at severe pre-alpha proof of concept code, just shout. Please
refrain from giving explicit code examples, this is a personal
challenge for myself and I’m looking for guidance.
Thanks in advance.
–
Kenneth K.
[email protected]
Folding@home stats
http://fah-web.stanford.edu/cgi-bin/main.py?qtype=userpage&username=kenneth.kalmer