RMMSeg 0.1.2 Released

Good News: Performance has improved.
Bad News: The memory leak still exists.

rmmseg version 0.1.2
by pluskid


RMMSeg is an implementation of the MMSEG Chinese word segmentation
algorithm. It is based on two variants of the maximum matching
algorithm, and both are available:

  • simple algorithm, which uses only forward maximum matching.
  • complex algorithm, which uses three-word chunk maximum matching and
    three additional rules to resolve ambiguities.
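
To make the difference between the two algorithms concrete, here is a
minimal sketch in plain Ruby. It is not RMMSeg's own code: the toy
dictionary and all method names are assumptions made for illustration.
The complex variant below ranks chunks by total length, then by average
word length, then by the variance of word lengths, which is my reading
of the MMSEG rules. The remaining rule needs character frequency data
and is left out.

```ruby
require 'set'

# Hypothetical toy dictionary; a real segmenter loads a much larger one.
DICTIONARY = Set.new(%w[研究 研究生 生命 命 起源])
MAX_WORD_LENGTH = DICTIONARY.map(&:length).max

# All candidate words starting at pos: every dictionary match, plus the
# single character at pos as a fallback so segmentation always advances.
def candidates(text, pos)
  max_len = [MAX_WORD_LENGTH, text.length - pos].min
  (1..max_len).map { |len| text[pos, len] }
              .select { |w| w.length == 1 || DICTIONARY.include?(w) }
end

# Simple algorithm: greedy forward maximum matching.
def simple_segment(text)
  words = []
  pos = 0
  while pos < text.length
    words << candidates(text, pos).max_by(&:length)
    pos += words.last.length
  end
  words
end

# Every chunk of up to three consecutive words starting at pos.
def chunks(text, pos, depth = 3)
  return [[]] if depth.zero? || pos >= text.length
  candidates(text, pos).flat_map do |word|
    chunks(text, pos + word.length, depth - 1).map { |rest| [word] + rest }
  end
end

# Ranking key for a chunk: total length first, then average word
# length, then negated variance of word lengths (smaller is better).
def chunk_score(chunk)
  lengths = chunk.map(&:length)
  total = lengths.sum
  mean = total.to_f / lengths.size
  variance = lengths.sum { |l| (l - mean)**2 } / lengths.size
  [total, mean, -variance]
end

# Complex algorithm (partial): pick the best chunk, emit its first
# word, then repeat from the position after that word.
def complex_segment(text)
  words = []
  pos = 0
  while pos < text.length
    best = chunks(text, pos).max_by { |chunk| chunk_score(chunk) }
    words << best.first
    pos += best.first.length
  end
  words
end

p simple_segment("研究生命起源")   # → ["研究生", "命", "起源"]
p complex_segment("研究生命起源")  # → ["研究", "生命", "起源"]
```

The greedy version grabs the longest word it sees first and is left
with a stray single character, while the chunk-based version looks
three words ahead and settles on the more even split.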

For more information about the algorithm, please refer to the
following essays:


Changes:

  • Add cache to find_match_words: performance improved.
  • Implement Chunk as a module instead of a class: performance improved.
  • Don’t store unnecessary data in dictionary: memory usage reduced.
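
As a rough illustration of why a cache on find_match_words can help:
when chunks are enumerated, the same text position is reached through
several different word combinations, so the same lookup would otherwise
be repeated. The sketch below is hypothetical and is not RMMSeg's
actual implementation. The class name, constructor arguments and the
lookups counter are assumptions, and only the method name is borrowed
from the change list.

```ruby
# Hypothetical matcher, not RMMSeg's actual code: the class name,
# constructor and dictionary are assumptions made for illustration.
class CachedMatcher
  attr_reader :lookups

  def initialize(text, dictionary, max_word_length)
    @text = text
    @dictionary = dictionary
    @max_word_length = max_word_length
    @cache = {}    # start index => words found at that index
    @lookups = 0   # counts how often the dictionary is actually scanned
  end

  # Words starting at index, computed once per index and then reused.
  def find_match_words(index)
    @cache[index] ||= begin
      @lookups += 1
      max_len = [@max_word_length, @text.length - index].min
      (1..max_len).map { |len| @text[index, len] }
                  .select { |w| w.length == 1 || @dictionary.include?(w) }
    end
  end
end

matcher = CachedMatcher.new("研究生命起源", %w[研究 研究生 生命 起源], 3)
matcher.find_match_words(2)
matcher.find_match_words(2)   # served from the cache
p matcher.lookups             # → 1
```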