Solr 3.4 problem with words separated by comma without space

elisabeth benoit Thu, 08 Dec 2011 01:26:36 -0800

Hello,

I'm using Solr 3.4, and I'm having a problem with a request that returns
different results depending on whether there is a space after a comma.


The request "name, number rue taine paris" returns results with 4 words out
of 5 matching ("name", "number", "rue", "paris")

The request "name,number rue taine paris" (no space between the comma and
"number") returns no results unless I set mm=3, and then the matching words
are "rue", "taine", "paris".

If I check in the analysis page of the Solr admin interface, I get the same
analysis for the two requests. But it seems, in fact, that the missing space
after the comma prevents "name" and "number" from matching.
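
One way to see where the two requests diverge is to run both with debugQuery=on and compare the "parsedquery" entries in the debug output. The snippet below is only a sketch that builds the two URLs. The host, port and path, and the use of defType=dismax, are assumptions (the post mentions mm but does not say which handler is used), and debug_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance with the default select handler.
SOLR_SELECT = "http://localhost:8983/solr/select"


def debug_url(user_query, mm=None):
    """Build a select URL that asks Solr to report how it parsed the query."""
    # Assumption: the dismax parser is in use, since the post relies on mm.
    params = {"q": user_query, "defType": "dismax", "debugQuery": "on"}
    if mm is not None:
        params["mm"] = str(mm)
    return SOLR_SELECT + "?" + urlencode(params)


# The two requests from the post, differing only by the space after the comma.
with_space = debug_url("name, number rue taine paris")
without_space = debug_url("name,number rue taine paris")

print(with_space)
print(without_space)
```

If the parsed queries differ while the analysis page shows the same tokens, the difference would come from how the query parser hands the text to the analyzer rather than from the analyzer chain itself.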


My field type is


      <analyzer type="query">
        <!-- standard tokenization -->
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- normalization of accents, cedillas, the oe ligature, ... -->
        <charFilter class="solr.MappingCharFilterFactory"
                    mapping="mapping-ISOLatin1Accent.txt"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <!-- removal of dots (I.B.M. => IBM) -->
        <filter class="solr.StandardFilterFactory"/>
        <!-- lowercasing -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- removal of punctuation -->
        <filter class="solr.PatternReplaceFilterFactory"
                pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
        <!-- removal of empty tokens and oversized words -->
        <filter class="solr.LengthFilterFactory" min="1" max="100" />
        <!-- splitting of compound words -->
        <filter class="solr.WordDelimiterFilterFactory"
                splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="1"
                generateWordParts="1" generateNumberParts="1"
                catenateWords="0" catenateNumbers="1"
                catenateAll="0" preserveOriginal="1"/>
        <!-- removal of elisions (l', qu', ...) -->
        <filter class="solr.ElisionFilterFactory"
                articles="elisionwords.txt"/>
        <!-- removal of stop words -->
        <filter class="solr.StopFilterFactory" ignoreCase="1"
                words="stopwords.txt" enablePositionIncrements="true"/>
        <!-- stemming (plurals, ...) -->
        <filter class="solr.SnowballPorterFilterFactory" language="French"
                protected="protwords.txt"/>
        <!-- removal of any duplicate tokens -->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
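
As a side check on the punctuation filter above, its pattern only trims punctuation at the start and end of a token and leaves a comma inside a token untouched. The sketch below emulates it in Python. It assumes that Java's \p{Punct} covers the same ASCII characters as Python's string.punctuation, and strip_edge_punct is a hypothetical helper name, not part of Solr.

```python
import re
import string

# Python's re module has no \p{Punct}; string.punctuation is used instead,
# on the assumption that it lists the same ASCII punctuation characters.
PUNCT = "[" + re.escape(string.punctuation) + "]"
EDGE_PUNCT = re.compile("^(" + PUNCT + "*)(.*?)(" + PUNCT + "*)$")


def strip_edge_punct(token):
    """Emulate the filter's pattern with replacement="$2" on one token."""
    return EDGE_PUNCT.sub(r"\2", token)


for token in ["name,", "(rue)", "name,number", "..."]:
    print(repr(token), "->", repr(strip_edge_punct(token)))
```

So if "name,number" ever reached this filter as a single token, the comma would survive it. Whether that happens depends on what the tokenizer emits before the filter runs.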

Does anyone have a clue?

Thanks,
Elisabeth
