Using the definition you provided, I don't get the same output. Are you
sure you're running the same analysis on both versions? generateNumberParts=0
keeps the '12' from making it through the filter in both 1.4 and 3.6, so I
suspect you're not doing quite the same thing in the two cases.

Perhaps you're looking at index tokenization in one and query tokenization
in the other?
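
One way to rule that out is to run the same string through both the index and
query analyzers on each server and compare the token lists side by side. The
following is only a minimal sketch: the host names `solr14` and `solr36` and
the field type name `text` are placeholders, and it assumes the
FieldAnalysisRequestHandler is registered at `/analysis/field` in
solrconfig.xml, as it is in the example configuration.

```python
from urllib.parse import urlencode


def analysis_url(base_url, field_type, text):
    """Build a field-analysis URL that sends `text` through both the
    index-time and query-time analyzers of `field_type`."""
    params = {
        "analysis.fieldtype": field_type,
        "analysis.fieldvalue": text,  # analyzed with the index analyzer
        "analysis.query": text,       # analyzed with the query analyzer
        "wt": "json",
    }
    return base_url.rstrip("/") + "/analysis/field?" + urlencode(params)


# Hypothetical hosts and field type name; substitute your own.
for base in ("http://solr14:8983/solr", "http://solr36:8983/solr"):
    print(analysis_url(base, "text", "GAMES12"))
```

Opening the two printed URLs shows, for each version, which tokens the index
chain and the query chain produce for GAMES12.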

Best
Erick


On Mon, Nov 26, 2012 at 9:06 AM, Frederico Azeiteiro <
frederico.azeite...@cision.com> wrote:

> Hi,
>
>
>
> While updating our Solr to 3.6.1, I noticed some differences in results
> when searching for strings that combine letters and numbers.
>
> For a text field defined as:
>
> <analyzer type="index">
>
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>
> <charFilter class="solr.MappingCharFilterFactory"
> mapping="mapping-ISOLatin1Accent.txt"/>
>
> <filter class="solr.WordDelimiterFilterFactory"
> protected="protwords.txt" splitOnCaseChange="1" catenateAll="0"
> catenateNumbers="1" catenateWords="1" generateNumberParts="0"
> generateWordParts="1" stemEnglishPossessive="0"/>
>
> </analyzer>
>
> <analyzer type="query">
>
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>
> <filter class="solr.SynonymFilterFactory" ignoreCase="true"
> expand="true" synonyms="synonyms.txt"/>
>
> <filter class="solr.WordDelimiterFilterFactory"
> protected="protwords.txt" splitOnCaseChange="1" catenateAll="0"
> catenateNumbers="0" catenateWords="0" generateNumberParts="0"
> generateWordParts="1"/>
>
> </analyzer>
>
>
>
> Searching for the string GAMES12 returns a lot of results on 3.6.1 that
> are not returned on 1.4.0.
>
>
>
> It looks like WordDelimiterFilterFactory behaves differently in 3.6.1:
> the numeric part of the keyword is dropped and the search is
> performed using only GAMES.
>
>
>
> The analysis page returns the following for 1.4.0:
>
> org.apache.solr.analysis.WordDelimiterFilterFactory
> {protected=protwords.txt, splitOnCaseChange=1, generateNumberParts=0,
> catenateWords=0, generateWordParts=1, catenateAll=0, catenateNumbers=0}
>
> term position     | 1     | 2
> term text         | GAMES | 12
> term type         | word  | word
> source start,end  | 0,5   | 5,7
> payload           |       |
>
> And for 3.6.1:
>
>
>
> org.apache.solr.analysis.WordDelimiterFilterFactory
> {protected=protwords.txt, splitOnCaseChange=1, generateNumberParts=0,
> catenateWords=0, luceneMatchVersion=LUCENE_36, generateWordParts=1,
> catenateAll=0, catenateNumbers=0}
>
> position        | 1
> term text       | GAMES
> startOffset     | 0
> endOffset       | 5
> type            | word
> positionLength  | 1
>
> Is this something that can be modified/fixed to return the same results?
>
>
>
> Thank you.
>
>
>
> Regards,
>
> Frederico
