After upgrading to 1.4.1, it is fixed. Thanks very much for your help!
Regards,
Yandong Yao

2010/9/14 yandong yao <yydz...@gmail.com>

> Hi Robert,
>
> I am using solr 1.4, will try with 1.4.1 tomorrow.
>
> Thanks very much!
>
> Regards,
> Yandong Yao
>
> 2010/9/14 Robert Muir <rcm...@gmail.com>
>
>> did you index with solr 1.4 (or are you using solr 1.4) ?
>>
>> at a quick glance, it looks like it might be this:
>> https://issues.apache.org/jira/browse/SOLR-1852 , which was fixed in 1.4.1
>>
>> On Tue, Sep 14, 2010 at 5:40 AM, yandong yao <yydz...@gmail.com> wrote:
>>
>> > Hi Guys,
>> >
>> > I encountered a problem when enabling WordDelimiterFilterFactory for both
>> > index and query (pasted relative part of schema.xml at the bottom of
>> > email).
>> >
>> > *1. Steps to reproduce:*
>> >   1.1 The indexed sample document contains only one sentence: "This is a
>> >       TechNote."
>> >   1.2 Query is: q=TechNote
>> >   1.3 Result: no matches return, while the above sentence contains word
>> >       'TechNote' absolutely.
>> >
>> > *2. Output when enabling debugQuery*
>> > By turning on debugQuery
>> >
>> > http://localhost:7111/solr/test/select?indent=on&version=2.2&q=TechNote&fq=&start=0&rows=0&fl=*%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=id%3A001&hl.fl=
>> > ,
>> > get following information:
>> >
>> > <str name="rawquerystring">TechNote</str>
>> > <str name="querystring">TechNote</str>
>> > <str name="parsedquery">PhraseQuery(all:"tech note")</str>
>> > <str name="parsedquery_toString">all:"tech note"</str>
>> > <lst name="explain"/>
>> > <str name="otherQuery">id:001</str>
>> > <lst name="explainOther">
>> >   <str name="001">
>> >     0.0 = fieldWeight(all:"tech note" in 0), product of:
>> >       0.0 = tf(phraseFreq=0.0)
>> >       0.61370564 = idf(all: tech=1 note=1)
>> >       0.25 = fieldNorm(field=all, doc=0)
>> >   </str>
>> > </lst>
>> >
>> > Seems that the raw query string is converted to phrase query "tech note",
>> > while its term frequency is 0, so no matches.
>> >
>> > *3. Result from admin/analysis.jsp page*
>> >
>> > From analysis.jsp, seems the query 'TechNote' matches the input document,
>> > see below words marked by RED color.
>> >
>> > Index Analyzer
>> >
>> > org.apache.solr.analysis.WhitespaceTokenizerFactory {}
>> >   term position      1      2      3      4
>> >   term text          This   is     a      TechNote.
>> >   term type          word   word   word   word
>> >   source start,end   0,4    5,7    8,9    10,19
>> >   payload
>> >
>> > org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt,
>> > expand=true, ignoreCase=true}
>> >   term position      1      2      3      4
>> >   term text          This   is     a      TechNote.
>> >   term type          word   word   word   word
>> >   source start,end   0,4    5,7    8,9    10,19
>> >   payload
>> >
>> > org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1,
>> > generateNumberParts=1, catenateWords=1, generateWordParts=1, catenateAll=0,
>> > catenateNumbers=1}
>> >   term position      1      2      3      4      5
>> >   term text          This   is     a      Tech   Note
>> >                                                 TechNote
>> >   term type          word   word   word   word   word
>> >                                                 word
>> >   source start,end   0,4    5,7    8,9    10,14  14,18
>> >                                                 10,18
>> >   payload
>> >
>> > org.apache.solr.analysis.LowerCaseFilterFactory {}
>> >   term position      1      2      3      4      5
>> >   term text          this   is     a      tech   note
>> >                                                 technote
>> >   term type          word   word   word   word   word
>> >                                                 word
>> >   source start,end   0,4    5,7    8,9    10,14  14,18
>> >                                                 10,18
>> >   payload
>> >
>> > org.apache.solr.analysis.SnowballPorterFilterFactory
>> > {protected=protwords.txt, language=English}
>> >   term position      1      2      3      4      5
>> >   term text          this   is     a      *tech* *note*
>> >                                                 technot
>> >   term type          word   word   word   word   word
>> >                                                 word
>> >   source start,end   0,4    5,7    8,9    10,14  14,18
>> >                                                 10,18
>> >   payload
>> >
>> > Query Analyzer
>> >
>> > org.apache.solr.analysis.WhitespaceTokenizerFactory {}
>> >   term position      1
>> >   term text          TechNote
>> >   term type          word
>> >   source start,end   0,8
>> >   payload
>> >
>> > org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt,
>> > expand=true, ignoreCase=true}
>> >   term position      1
>> >   term text          TechNote
>> >   term type          word
>> >   source start,end   0,8
>> >   payload
>> >
>> > org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1,
>> > generateNumberParts=1, catenateWords=0, generateWordParts=1, catenateAll=0,
>> > catenateNumbers=0}
>> >   term position      1      2
>> >   term text          Tech   Note
>> >   term type          word   word
>> >   source start,end   0,4    4,8
>> >   payload
>> >
>> > org.apache.solr.analysis.LowerCaseFilterFactory {}
>> >   term position      1      2
>> >   term text          tech   note
>> >   term type          word   word
>> >   source start,end   0,4    4,8
>> >   payload
>> >
>> > org.apache.solr.analysis.SnowballPorterFilterFactory
>> > {protected=protwords.txt, language=English}
>> >   term position      1      2
>> >   term text          tech   note
>> >   term type          word   word
>> >   source start,end   0,4    4,8
>> >   payload
>> >
>> > *4. My questions are:*
>> >   4.1: Why debugQuery and analysis.jsp has different result?
>> >   4.2: From my understanding, during indexing, the word 'TechNote' will be
>> >        converted to: 1) 'technote' and 2) 'tech note' according to my
>> >        config in schema.xml. And at query time, 'TechNote' will be
>> >        converted to 'tech note', thus it SHOULD match. Am I right?
>> >   4.3: Why the phrase frequency 'tech note' is 0 in the output of
>> >        debugQuery result (0.0 = tf(phraseFreq=0.0))?
>> >
>> > Any suggestion/comments are absolutely welcome!
>> >
>> > *5. fieldType definition in schema.xml*
>> >
>> > <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>> >   <analyzer type="index">
>> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>> >             ignoreCase="true" expand="true"/>
>> >     <filter class="solr.WordDelimiterFilterFactory"
>> >             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> >             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>> >     <filter class="solr.LowerCaseFilterFactory"/>
>> >     <filter class="solr.SnowballPorterFilterFactory" language="English"
>> >             protected="protwords.txt"/>
>> >   </analyzer>
>> >   <analyzer type="query">
>> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>> >             ignoreCase="true" expand="true"/>
>> >     <filter class="solr.WordDelimiterFilterFactory"
>> >             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>> >             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>> >     <filter class="solr.LowerCaseFilterFactory"/>
>> >     <filter class="solr.SnowballPorterFilterFactory" language="English"
>> >             protected="protwords.txt"/>
>> >   </analyzer>
>> > </fieldType>
>> >
>> > Thanks very much!
>>
>> --
>> Robert Muir
>> rcm...@gmail.com
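
The expectation in question 4.2 can be checked with a small toy model in Python. This is only a sketch and not Solr or Lucene code: the function names (split_on_case_change, analyze, phrase_matches) are hypothetical, the splitting rule is a rough stand-in for splitOnCaseChange=1 with catenateWords, and synonyms and stemming are left out. It reproduces the token positions shown by analysis.jsp and shows that a phrase query for "tech note" would be expected to find the terms at consecutive positions.

```python
import re


def split_on_case_change(token):
    """Rough stand-in for splitOnCaseChange=1: 'TechNote.' -> ['Tech', 'Note'].

    Punctuation is dropped; this is an approximation, not Solr's algorithm.
    """
    return re.findall(r"[A-Z]?[a-z]+|[A-Z]+", token)


def analyze(text, catenate_words):
    """Return (position, term) pairs, with positions starting at 1.

    With catenate_words=True the joined word is stacked on the position of
    its last part, as in the index-side analysis.jsp output.
    """
    tokens = []
    position = 0
    for raw in text.split():
        parts = split_on_case_change(raw)
        for part in parts:
            position += 1
            tokens.append((position, part.lower()))
        if catenate_words and len(parts) > 1:
            tokens.append((position, "".join(parts).lower()))
    return tokens


def phrase_matches(index_tokens, phrase_terms):
    """True if the phrase terms occur at consecutive positions in the index."""
    if not phrase_terms:
        return False
    positions = {}
    for position, term in index_tokens:
        positions.setdefault(term, set()).add(position)
    return any(
        all(start + offset in positions.get(term, set())
            for offset, term in enumerate(phrase_terms))
        for start in positions.get(phrase_terms[0], set())
    )


# Index side uses catenateWords=1, query side catenateWords=0, as in schema.xml.
index_tokens = analyze("This is a TechNote.", catenate_words=True)
query_terms = [term for _, term in analyze("TechNote", catenate_words=False)]

print(index_tokens)
print(query_terms)
print(phrase_matches(index_tokens, query_terms))
```

In this model 'tech' sits at position 4 and 'note' at position 5, so the phrase check succeeds, which agrees with what analysis.jsp suggests and with the behaviour reported after the upgrade to 1.4.1.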