Hello,
I have the same field type defined in Solr 4.6 and Solr 7.5. When I
search with both versions, I get different results, and I don't know why.
I have the following *field type definition in Solr 4.6*:
<fieldType name="text_type1" class="solr.TextField"
           positionIncrementGap="1000">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
I have the following *field type definition in Solr 7.5*:
<fieldType name="text_type1" class="solr.TextField"
           positionIncrementGap="1000">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
* I tried to use solr.WordDelimiterFilterFactory with Solr 7.5 instead of
solr.WordDelimiterGraphFilterFactory so the field types will be more alike,
but the result was the same.
I have *6 different documents, each with one of the following values in
field text1 of type text_type1* (the type defined above for each version):
KI_d5e7b43a
KI_b7c490bd
KI_7df2f026
KI_fa7d129d
KI_5867aec7
KI_7c3c0b93
My query is *text1=KI_7*.
Using Solr 4.6, I get 2 results: KI_7df2f026 and KI_7c3c0b93.
Using Solr 7.5, I get all 6 results.
Questions:
1. How come I get different results with the same data, when my field
definitions are the same (as far as I can tell)?
2. What are the expected results?
I think that the results Solr 7.5 returns are the correct ones, since at the
end of the analysis I get *KI* as a term and *7* as a term, both
during the indexing analysis and the query analysis, so, to my
understanding, all 6 results should be found.
Is this correct? If not, what am I missing? What don't I understand
correctly?
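To make my reasoning concrete, here is a rough Python sketch of how I
picture the terms. It is only an approximation and not Solr's actual
implementation: the helper approx_word_delimiter is hypothetical, it just
splits on non-alphanumeric characters and on letter/digit boundaries and
then lowercases, and it ignores stopwords.txt, synonyms.txt and the extra
catenated tokens produced at index time.

```python
import re

def approx_word_delimiter(value):
    # Hypothetical approximation of the filter chain (NOT Solr code):
    # split on non-alphanumerics and on letter/digit transitions,
    # then lowercase each part.
    return [part.lower() for part in re.findall(r"[A-Za-z]+|[0-9]+", value)]

docs = [
    "KI_d5e7b43a",
    "KI_b7c490bd",
    "KI_7df2f026",
    "KI_fa7d129d",
    "KI_5867aec7",
    "KI_7c3c0b93",
]

query_terms = approx_word_delimiter("KI_7")

# A document "matches" here if it contains every query term,
# regardless of the position of each term.
matches = [
    d for d in docs
    if set(query_terms) <= set(approx_word_delimiter(d))
]

print(query_terms)
print(matches)
```

Under these assumptions every one of the 6 values contains both query
terms somewhere, which is why I expected 6 hits. The only 2 values where
the number part 7 comes directly after KI are the 2 that Solr 4.6 returns.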
I would very much appreciate a full/partial answer, but even a link that
could explain at least the expected results part would be great.
Thanks in advance, I know this might be a tough one to answer [Hope not :)]
--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html