rmmseg version 0.1.0 has been released!
RMMSeg is an implementation of the MMSEG Chinese word segmentation
algorithm. It is based on two variants of the maximum matching
algorithm. Two algorithms are available:
- simple algorithm, which uses only forward maximum matching.
- complex algorithm, which uses three-word chunk maximum matching and 3
additional rules to resolve ambiguities.
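
To illustrate how the two algorithms differ, here is a minimal
self-contained sketch in Ruby. It does not use the RMMSeg API: the toy
dictionary and all method names (word_lengths, simple_segment, chunks_at,
chunk_score, complex_segment) are hypothetical and chosen for illustration
only. The complex variant here applies chunk maximum matching plus two of
the tie-breaking rules described in the MMSEG paper (largest average word
length, then smallest variance of word lengths); the rule based on
character frequency is left out because it needs frequency data.

```ruby
require 'set'

# Hypothetical toy dictionary; a real segmenter loads a large word list.
DICT = Set.new(%w[研究 研究生 生命 命 起源 的]).freeze
MAX_LEN = DICT.map(&:length).max

# Lengths of all candidate words starting at chars[i]. A single character
# always counts as a word, so segmentation can never get stuck.
def word_lengths(chars, i)
  longest = [MAX_LEN, chars.length - i].min
  (1..longest).select { |n| n == 1 || DICT.include?(chars[i, n].join) }
end

# Simple algorithm: forward maximum matching, longest word first.
def simple_segment(text)
  chars = text.chars
  words = []
  i = 0
  while i < chars.length
    n = word_lengths(chars, i).max
    words << chars[i, n].join
    i += n
  end
  words
end

# All chunks of up to three consecutive words starting at chars[i].
def chunks_at(chars, i, depth = 3)
  return [[]] if depth.zero? || i >= chars.length
  word_lengths(chars, i).flat_map do |n|
    chunks_at(chars, i + n, depth - 1).map { |rest| [chars[i, n].join] + rest }
  end
end

# Rank a chunk: total length, then average word length, then low variance.
def chunk_score(chunk)
  lens = chunk.map(&:length)
  avg = lens.sum.to_f / lens.size
  variance = lens.sum { |l| (l - avg)**2 } / lens.size
  [lens.sum, avg, -variance]
end

# Complex algorithm (partial sketch): pick the best chunk, keep only its
# first word, then repeat from the position after that word.
def complex_segment(text)
  chars = text.chars
  words = []
  i = 0
  while i < chars.length
    best = chunks_at(chars, i).max_by { |chunk| chunk_score(chunk) }
    words << best.first
    i += best.first.length
  end
  words
end

p simple_segment("研究生命的起源")
p complex_segment("研究生命的起源")
```

With this toy dictionary, the simple algorithm greedily takes 研究生 and
yields 研究生 / 命 / 的 / 起源, while the chunk-based variant looks ahead
and yields 研究 / 生命 / 的 / 起源.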
For more information about the algorithm, please refer to:
- MMSEG: A Word Identification System for Mandarin Chinese Text Based on Two Variants of the Maximum Matching Algorithm
Changes:
0.1.0 / 2008-02-01
- Add a filter to strip Chinese punctuation.