Good News: Performance improvement.
Bad News: Memory leak problem still exists.
rmmseg version 0.1.2
by pluskid
http://rmmseg.rubyforge.org
== DESCRIPTION
RMMSeg is an implementation of the MMSEG Chinese word segmentation
algorithm, which is based on two variants of the maximum matching
algorithm. Two algorithms are available:
- simple algorithm, which uses only forward maximum matching.
- complex algorithm, which uses three-word chunk maximum matching and 3
additional rules to resolve ambiguities.
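The difference between the two algorithms can be illustrated with a small, self-contained sketch. This is not RMMSeg's own code or API: the toy dictionary and the method names (match_words, chunks, best_chunk, simple_segment, complex_segment) are hypothetical and exist only for this example. The complex variant below applies only two of the three additional rules (largest average word length, then smallest variance of word lengths); the remaining rule in the MMSEG paper needs character frequency data and is left out.

```ruby
require 'set'

# Toy dictionary for illustration only (an assumption, not RMMSeg's data).
DICT = Set.new(%w[研究 研究生 生命 命 起源]).freeze
MAX_WORD_LEN = DICT.map(&:length).max

# Candidate words starting at pos, shortest first. A single character is
# always accepted as a word so that segmentation can never get stuck.
def match_words(chars, pos, dict, max_len)
  longest = [max_len, chars.length - pos].min
  lengths = (1..longest).select { |n| n == 1 || dict.include?(chars[pos, n].join) }
  lengths.map { |n| chars[pos, n].join }
end

# Simple algorithm: always take the longest word at the current position.
def simple_segment(text, dict = DICT, max_len = MAX_WORD_LEN)
  chars = text.chars
  words = []
  pos = 0
  while pos < chars.length
    word = match_words(chars, pos, dict, max_len).last
    words << word
    pos += word.length
  end
  words
end

# Enumerate every chunk of up to `depth` consecutive words starting at pos.
def chunks(chars, pos, dict, max_len, depth = 3)
  return [[]] if depth.zero? || pos >= chars.length
  match_words(chars, pos, dict, max_len).flat_map do |w|
    chunks(chars, pos + w.length, dict, max_len, depth - 1).map { |rest| [w] + rest }
  end
end

# Rank chunks by total length, then average word length, then low variance.
def best_chunk(candidates)
  candidates.max_by do |chunk|
    lens = chunk.map(&:length)
    total = lens.sum
    avg = total.to_f / lens.size
    variance = lens.sum { |l| (l - avg)**2 } / lens.size
    [total, avg, -variance]
  end
end

# Complex algorithm: pick the best three-word chunk, keep only its first
# word, then repeat from the position right after that word.
def complex_segment(text, dict = DICT, max_len = MAX_WORD_LEN)
  chars = text.chars
  words = []
  pos = 0
  while pos < chars.length
    word = best_chunk(chunks(chars, pos, dict, max_len)).first
    words << word
    pos += word.length
  end
  words
end

puts simple_segment("研究生命起源").join(" / ")   # 研究生 / 命 / 起源
puts complex_segment("研究生命起源").join(" / ")  # 研究 / 生命 / 起源
```

With this toy dictionary, the simple algorithm greedily takes 研究生 and is left with a stray 命, while the chunk-based variant looks three words ahead and prefers the evenly sized split.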
For more information about the algorithm, please refer to the
following:
- MMSEG: A Word Identification System for Mandarin Chinese Text Based on Two Variants of the Maximum Matching Algorithm
== CHANGES
- Add cache to find_match_words: performance improved.
- Implement Chunk as a module instead of a class: performance improved.
- Don’t store unnecessary data in dictionary: memory usage reduced.