Hi Solr people!

Querying for "series:RCWP" returns the response below. Why does "RCWP 
Moisture Resistant" score lower than "D/CRCW-P e3" with the field definition 
below? I understand that dashes and spaces are ignored, but I would have 
expected matches near the beginning of the value to score higher. Can I change 
this behavior (in Solr 4)?

----------------------------------------------------------------------------------------------------------------------------------
<result>
        <doc>
                <str name="series">RCWP</str>
                <float name="score">3.2698402</float>
        </doc>
        <doc>
                <str name="series">D/CRCW-P e3</str>
                <float name="score">1.3624334</float>
        </doc>
        <doc>
                <str name="series">RCWP Moisture Resistant</str>
                <float name="score">0.5449734</float>
        </doc>
</result>
----------------------------------------------------------------------------------------------------------------------------------

<fieldType name="series" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
                <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[\-\s]+" replacement=""/>
                <tokenizer class="solr.KeywordTokenizerFactory"/>
                <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="50"/>
        </analyzer>
        <analyzer type="query">
                <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[\-\s]+" replacement=""/>
                <tokenizer class="solr.KeywordTokenizerFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
</fieldType>
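
To make the effect of this field type concrete, here is a rough Python sketch of what the index analyzer above appears to produce for the three values. It is an approximation, not Solr code: the function names `index_grams` and `query_term` are hypothetical, the stop word filter is left out (the contents of stopwords.txt are not shown here), and the order in which Solr emits the grams may differ. Only the set and number of grams is modelled.

```python
import re

def index_grams(value, min_gram=2, max_gram=50):
    """Approximate the index-time chain: strip dashes/whitespace,
    keep the whole value as one token, lowercase it, then emit n-grams."""
    # charFilter: same pattern as in the field type, replaced with ""
    text = re.sub(r"[\-\s]+", "", value)
    # KeywordTokenizer yields a single token; LowerCaseFilter lowercases it
    token = text.lower()
    grams = []
    # NGramFilter: every substring of length min_gram..max_gram
    for size in range(min_gram, min(max_gram, len(token)) + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams

def query_term(value):
    """Approximate the query-time chain: same char filter and lowercasing,
    but no n-grams, so the query stays a single term."""
    return re.sub(r"[\-\s]+", "", value).lower()

for series in ["RCWP", "D/CRCW-P e3", "RCWP Moisture Resistant"]:
    grams = index_grams(series)
    # number of indexed grams, and whether the query term is among them
    print(series, len(grams), query_term("RCWP") in grams)
```

Under these assumptions all three values contain the gram "rcwp", but they are indexed as 6, 36 and 210 grams respectively. The position of the match inside the value is not represented anywhere in this output. If the similarity in use normalizes by field length and counts each gram as a term, longer values would be penalized, which would be consistent with the ordering in the response above.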

Thanks,
Alexander
