As I said, bugs may have been fixed since 3.1. WordDelimiterFilter (WDF) has
changed over time, and expecting it to give identical results across releases
is a classic Fool's Errand. Ditto for scoring in general - it's subject to
change across major releases.
I mean, sure, we could track down what specific change caused the
discrepancy, but what good would that do you? If it does happen to be a bug,
then of course it can be fixed in a future release, but as of this moment,
there is no evidence to suggest that it is the result of a bug, especially
considering WDF's evolution over time.
Compare the analyzer output between the two releases again, token by token
(the Analysis page in the admin UI shows it). Toggling one of the WDF
attributes may bring the 4.x output closer to the 3.1 output.
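A sketch of what I mean by toggling, assuming a scratch copy of your field type. The name text_delimiter_test is made up here, and the only change from your definition is catenateWords at index time:

```xml
<!-- Hypothetical scratch copy of text_delimiter, for comparison only. -->
<fieldType name="text_delimiter_test" class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <!-- The experiment: catenateWords changed from "1" to "0". -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="0" splitOnCaseChange="0"
            splitOnNumerics="0" stemEnglishPossessive="0"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <!-- Query-time chain left exactly as in the original. -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="0" splitOnCaseChange="0"
            splitOnNumerics="0" stemEnglishPossessive="0"/>
  </analyzer>
</fieldType>
```

Any field that uses it needs to be reindexed before you compare scores, since a change to index-time analysis only affects newly indexed documents.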
Rereading your previous message - maybe catenateWords was indeed broken in
3.1. If that was the case, it explains the difference, and that is a GOOD
difference, nothing to fret over. Or turn that attribute off, if that is
your preference.
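For what it's worth, here is a rough illustration of why an extra catenated token could show up in fieldNorm. It assumes the default similarity, where the length norm is the index-time boost times one over the square root of the number of terms, and it assumes the catenated token is counted toward the field length (whether it is depends on its position increment and on the discountOverlaps setting). The value foo-bar is a made-up example, with a boost of 1:

```latex
\[
\text{lengthNorm} = \text{boost} \cdot \frac{1}{\sqrt{\text{numTerms}}}
\]
\[
\text{catenateWords}=0:\ \{\text{foo},\ \text{bar}\}
\;\Rightarrow\; \frac{1}{\sqrt{2}} \approx 0.707
\qquad
\text{catenateWords}=1:\ \{\text{foo},\ \text{bar},\ \text{foobar}\}
\;\Rightarrow\; \frac{1}{\sqrt{3}} \approx 0.577
\]
```

The stored norm is rounded to a single byte, so the value you see in debug output will not match these raw numbers exactly.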
-- Jack Krupansky
-----Original Message-----
From: roySolr
Sent: Monday, January 28, 2013 9:16 AM
To: solr-user@lucene.apache.org
Subject: Re: Fieldnorm solr 4 -> specialchars(worddelimiter)
Hello Jack,
I'm using exactly the same fieldtype:
<fieldType name="text_delimiter" class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="1" splitOnCaseChange="0"
            splitOnNumerics="0" stemEnglishPossessive="0" />
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="0" splitOnCaseChange="0"
            splitOnNumerics="0" stemEnglishPossessive="0" />
  </analyzer>
</fieldType>
It looks like catenateWords has a different effect in Solr 4.1 than in the
previous version (3.1).
The analysis output is the same in both versions. I want exactly the same
results but can't get them.