You need to set generateNumberParts="1" on the WordDelimiterFilterFactory,
assuming you actually want the number part generated.
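As a minimal sketch, here is the query-time filter from the schema quoted
below with only generateNumberParts changed from "0" to "1"; every other
attribute is copied as-is. Whether the index-time analyzer needs the same
change (followed by a reindex) is an assumption to verify on the analysis
page, not something confirmed in this thread.

```xml
<!-- Query analyzer filter from the quoted schema; only generateNumberParts differs. -->
<filter class="solr.WordDelimiterFilterFactory"
        protected="protwords.txt" splitOnCaseChange="1" catenateAll="0"
        catenateNumbers="0" catenateWords="0"
        generateNumberParts="1"
        generateWordParts="1"/>
```

With this setting, GAMES12 should again produce both GAMES and 12 at query
time, matching the 1.4.0 analysis output quoted below.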
The fact that your schema worked in 1.4 was probably simply a side effect of
this bug:
https://issues.apache.org/jira/browse/SOLR-1706
"wrong tokens output from WordDelimiterFilter depending upon options"
-- Jack Krupansky
-----Original Message-----
From: Frederico Azeiteiro
Sent: Monday, November 26, 2012 9:06 AM
To: solr-user@lucene.apache.org
Subject: Search differences between solr 1.4.0 and 3.6.1
Hi,
While upgrading our Solr to 3.6.1, I noticed some differences in results
when using search strings that combine letters and numbers.
For a text field defined as:
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<charFilter class="solr.MappingCharFilterFactory"
mapping="mapping-ISOLatin1Accent.txt"/>
<filter class="solr.WordDelimiterFilterFactory"
protected="protwords.txt" splitOnCaseChange="1" catenateAll="0"
catenateNumbers="1" catenateWords="1" generateNumberParts="0"
generateWordParts="1" stemEnglishPossessive="0"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" ignoreCase="true"
expand="true" synonyms="synonyms.txt"/>
<filter class="solr.WordDelimiterFilterFactory"
protected="protwords.txt" splitOnCaseChange="1" catenateAll="0"
catenateNumbers="0" catenateWords="0" generateNumberParts="0"
generateWordParts="1"/>
</analyzer>
Searching for the string GAMES12 returns many results on 3.6.1 that are
not returned on 1.4.0.
It looks like WordDelimiterFilterFactory behaves differently in 3.6.1:
the numeric part of the keyword is ignored and the search is
performed using only GAMES.
Analysis output for 1.4.0:
org.apache.solr.analysis.WordDelimiterFilterFactory
{protected=protwords.txt, splitOnCaseChange=1, generateNumberParts=0,
catenateWords=0, generateWordParts=1, catenateAll=0, catenateNumbers=0}
term position     1       2
term text         GAMES   12
term type         word    word
source start,end  0,5     5,7
payload
And for 3.6.1:
org.apache.solr.analysis.WordDelimiterFilterFactory
{protected=protwords.txt, splitOnCaseChange=1, generateNumberParts=0,
catenateWords=0, luceneMatchVersion=LUCENE_36, generateWordParts=1,
catenateAll=0, catenateNumbers=0}
position        1
term text       GAMES
startOffset     0
endOffset       5
type            word
positionLength  1
Is this something that can be modified/fixed to return the same results?
Thank you.
Regards,
Frederico