Bug (?) with stop words in a French UTF-8 locale


#1

Strange thing.
Minimal example:
Index a text of three accented words such as “aprèl après aprèt”, then
search for each of the three words in turn. Two cases appear:

  • plain indexing: all three words give a hit;
  • indexing with FULL_FRENCH_STOP_WORDS: only one (“après”) gives a hit.
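
To make the expectation explicit, here is a minimal Python sketch of what I would expect a whole-word stop-word filter to do with that text. It is only an illustration: the three-entry list and the function name index_terms are made up, and I am assuming that “après” is on the real list. It is not taken from stopwords.c.

```python
# Hypothetical three-entry stop list, NOT the real FULL_FRENCH_STOP_WORDS.
# Assumption: "après" is a stop word, "aprèl" and "aprèt" are not.
STOP_WORDS = frozenset({"après", "avec", "dans"})


def index_terms(text, stop_words=frozenset()):
    """Return the words that remain searchable after dropping stop words.

    Matching is done on whole words only, so a word that merely shares
    a prefix with a stop word is kept.
    """
    return [word for word in text.split() if word not in stop_words]


text = "aprèl après aprèt"

# Plain indexing: every word is kept.
plain = index_terms(text)

# Indexing with the stop list: only the exact stop word is dropped.
filtered = index_terms(text, STOP_WORDS)

print(plain)
print(filtered)
```

Under that assumption, “aprèl” and “aprèt” should stay searchable and “après” should be the one that disappears, which is the opposite of what I observe.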

I checked extensively: there is no clear pattern as to which accented
words work and which do not. For instance, “Hélène” does not work, while
“Jérôme” does…

By the way, the list of French stop words in stopwords.c is strange:
some of the entries do not exist in the French language (inflected
participles…).