Hi guys,

I ran into a problem after enabling WordDelimiterFilterFactory for both indexing and querying (the relevant part of schema.xml is pasted at the bottom of this email).
*1. Steps to reproduce:*
1.1 The indexed sample document contains only one sentence: "This is a TechNote."
1.2 The query is: q=TechNote
1.3 Result: no matches are returned, although the sentence clearly contains the word 'TechNote'.

*2. Output with debugQuery enabled*
With debugQuery turned on:
http://localhost:7111/solr/test/select?indent=on&version=2.2&q=TechNote&fq=&start=0&rows=0&fl=*%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=id%3A001&hl.fl=
I get the following information:

<str name="rawquerystring">TechNote</str>
<str name="querystring">TechNote</str>
<str name="parsedquery">PhraseQuery(all:"tech note")</str>
<str name="parsedquery_toString">all:"tech note"</str>
<lst name="explain"/>
<str name="otherQuery">id:001</str>
<lst name="explainOther">
  <str name="001">
0.0 = fieldWeight(all:"tech note" in 0), product of:
  0.0 = tf(phraseFreq=0.0)
  0.61370564 = idf(all: tech=1 note=1)
  0.25 = fieldNorm(field=all, doc=0)
  </str>
</lst>

It seems the raw query string is converted into the phrase query "tech note", but its phrase frequency is 0, so there are no matches.

*3. Result from the admin/analysis.jsp page*
From analysis.jsp, the query 'TechNote' seems to match the input document: the matching terms are highlighted in red on the page (marked with *...* below).

Index Analyzer

org.apache.solr.analysis.WhitespaceTokenizerFactory {}
  term position:    1     2     3     4
  term text:        This  is    a     TechNote.
  term type:        word  word  word  word
  source start,end: 0,4   5,7   8,9   10,19

org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt, expand=true, ignoreCase=true}
  term position:    1     2     3     4
  term text:        This  is    a     TechNote.
  term type:        word  word  word  word
  source start,end: 0,4   5,7   8,9   10,19

org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1, generateNumberParts=1, catenateWords=1, generateWordParts=1, catenateAll=0, catenateNumbers=1}
  term position:    1     2     3     4      5
  term text:        This  is    a     Tech   Note / TechNote
  term type:        word  word  word  word   word / word
  source start,end: 0,4   5,7   8,9   10,14  14,18 / 10,18

org.apache.solr.analysis.LowerCaseFilterFactory {}
  term position:    1     2     3     4      5
  term text:        this  is    a     tech   note / technote
  term type:        word  word  word  word   word / word
  source start,end: 0,4   5,7   8,9   10,14  14,18 / 10,18

org.apache.solr.analysis.SnowballPorterFilterFactory {protected=protwords.txt, language=English}
  term position:    1     2     3     4      5
  term text:        this  is    a     *tech* *note* / technot
  term type:        word  word  word  word   word / word
  source start,end: 0,4   5,7   8,9   10,14  14,18 / 10,18

Query Analyzer

org.apache.solr.analysis.WhitespaceTokenizerFactory {}
  term position:    1
  term text:        TechNote
  term type:        word
  source start,end: 0,8

org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt, expand=true, ignoreCase=true}
  term position:    1
  term text:        TechNote
  term type:        word
  source start,end: 0,8

org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1, generateNumberParts=1, catenateWords=0, generateWordParts=1, catenateAll=0, catenateNumbers=0}
  term position:    1     2
  term text:        Tech  Note
  term type:        word  word
  source start,end: 0,4   4,8

org.apache.solr.analysis.LowerCaseFilterFactory {}
  term position:    1     2
  term text:        tech  note
  term type:        word  word
  source start,end: 0,4   4,8

org.apache.solr.analysis.SnowballPorterFilterFactory {protected=protwords.txt, language=English}
  term position:    1     2
  term text:        tech  note
  term type:        word  word
  source start,end: 0,4   4,8

*4. My questions:*
4.1: Why do debugQuery and analysis.jsp give different results?
4.2: As I understand it, at index time the word 'TechNote' is converted into 1) 'technote' and 2) 'tech note', according to my config in schema.xml.
And at query time, 'TechNote' is converted into 'tech note', so it SHOULD match. Am I right?
4.3: Why is the frequency of the phrase 'tech note' 0 in the debugQuery output (0.0 = tf(phraseFreq=0.0))?

Any suggestions or comments are absolutely welcome!

*5. fieldType definition in schema.xml*
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>

Thanks very much!
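P.S. For reference, here is a minimal Python sketch of the reasoning in 4.2. It is a simplified model, not Solr or Lucene code. The case-change regex and the helper names (analyze, phrase_freq, etc.) are illustrative assumptions. The synonym filter and Snowball stemmer are left out, since per the analysis.jsp output they do not change 'tech' or 'note'. The catenated token is placed at the position of its last part, because that is where analysis.jsp shows 'TechNote'. Under these assumptions the exact phrase "tech note" is found once ('tech' at position 4, 'note' at position 5), which agrees with analysis.jsp and disagrees with the tf(phraseFreq=0.0) line from debugQuery.

```python
import re


def whitespace_tokenize(text):
    """Split on whitespace; positions start at 1, as in analysis.jsp."""
    return [(i + 1, tok) for i, tok in enumerate(text.split())]


def split_word_parts(token):
    """Simplified stand-in (assumption) for generateWordParts=1 +
    splitOnCaseChange=1: keep letter/digit runs, split at lower->upper
    case changes; punctuation such as '.' is dropped."""
    return re.findall(r"[A-Z]?[a-z]+|[A-Z]+|[0-9]+", token)


def word_delimiter(tokens, catenate_words):
    """Give every word part its own position. With catenate_words, also emit
    the joined parts at the position of the last part (where analysis.jsp
    shows 'TechNote': position 5, together with 'Note')."""
    out, pos = [], 0
    for _, token in tokens:
        parts = split_word_parts(token)
        for part in parts:
            pos += 1
            out.append((pos, part))
        if catenate_words and len(parts) > 1:
            out.append((pos, "".join(parts)))
    return out


def lowercase(tokens):
    return [(pos, term.lower()) for pos, term in tokens]


def analyze(text, catenate_words):
    # Synonym filter and Snowball stemmer deliberately omitted (simplification).
    return lowercase(word_delimiter(whitespace_tokenize(text), catenate_words))


def phrase_freq(doc_tokens, query_tokens):
    """Count exact (slop=0) phrase matches: each query term must appear in the
    document at the same offset from the phrase start as in the query."""
    positions = {}
    for pos, term in doc_tokens:
        positions.setdefault(term, set()).add(pos)
    first_pos, first_term = query_tokens[0]
    matches = 0
    for start in positions.get(first_term, set()):
        if all(start + (qpos - first_pos) in positions.get(term, set())
               for qpos, term in query_tokens):
            matches += 1
    return matches


doc = analyze("This is a TechNote.", catenate_words=True)   # index-time chain
query = analyze("TechNote", catenate_words=False)            # query-time chain
print(doc)
print(query)
print(phrase_freq(doc, query))
```

Running it prints the analyzed document tokens, the analyzed query tokens [(1, 'tech'), (2, 'note')], and a phrase count of 1.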