Re: Fieldnorm solr 4 -> specialchars(worddelimiter)

Jack Krupansky Mon, 28 Jan 2013 06:32:49 -0800

As I said, bugs may have been fixed since 3.1. WDF (WordDelimiterFilter) has changed over time. Expecting it to give identical results across releases is a classic Fool's Errand. Ditto for scoring in general - it's subject to change across major releases.

I mean, sure, we could track down what specific change caused the discrepancy, but what good would that do you? If it does happen to be a bug, then of course it can be fixed in a future release, but as of this moment, there is no evidence to suggest that it is the result of a bug, especially considering WDF's evolution over time.

Compare the analyzer output between the two releases again. Maybe toggling one of the attributes will cause the 4.x output to more closely match the 3.1 output.
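
A minimal sketch of one way to run that comparison, assuming the field analysis handler is registered in solrconfig.xml on both installs the way it is in the example configuration. The sample value "foo-bar" is only a placeholder; substitute a value from one of the documents whose score changed.

```xml
<!-- solrconfig.xml: field analysis handler, as registered in the example config -->
<requestHandler name="/analysis/field" startup="lazy"
                class="solr.FieldAnalysisRequestHandler"/>
<!-- Request to send to each release (placeholder value, adjust to your data):
     /analysis/field?analysis.fieldtype=text_delimiter&analysis.fieldvalue=foo-bar
     Compare the tokens and their positions after WordDelimiterFilterFactory. -->
```

If the token text matches but the positions or the number of tokens differ, that alone could account for a different norm.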

Rereading your previous message - maybe catenateWords was indeed broken in 3.1. If that was the case, then that explains the difference and that is a GOOD difference, nothing to fret over. Or, maybe you need to turn that attribute off if that is your own preference.
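
For illustration only, turning it off would mean changing the index-time filter in the fieldtype quoted below to something like this. It is a sketch of the toggle, not a recommendation; every other attribute is copied unchanged from your config.

```xml
<!-- Index analyzer: same WordDelimiterFilter as in the quoted fieldType,
     but with catenateWords switched from "1" to "0" -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" catenateWords="0" splitOnCaseChange="0"
        splitOnNumerics="0" stemEnglishPossessive="0"/>
```

A change to the index-time analyzer only affects documents indexed after the change, so you would need to reindex before comparing scores again.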


-- Jack Krupansky

-----Original Message-----
From: roySolr
Sent: Monday, January 28, 2013 9:16 AM
To: solr-user@lucene.apache.org
Subject: Re: Fieldnorm solr 4 -> specialchars(worddelimiter)

Hello Jack,

I'm using exactly the same fieldtype:

<fieldType name="text_delimiter" class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="1" splitOnCaseChange="0"
            splitOnNumerics="0" stemEnglishPossessive="0"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="0" splitOnCaseChange="0"
            splitOnNumerics="0" stemEnglishPossessive="0"/>
  </analyzer>
</fieldType>

It looks like catenateWords has a different effect in Solr 4.1 than in the
previous version (3.1).
The analysis output is the same in both versions. I want exactly the same
results but can't get them.








--

View this message in context: http://lucene.472066.n3.nabble.com/Fieldnorm-solr-4-specialchars-worddelimiter-tp4036248p4036749.html
Sent from the Solr - User mailing list archive at Nabble.com.
