Hi Robert,

I am using Solr 1.4; I will try 1.4.1 tomorrow.
Thanks very much!

Regards,
Yandong Yao

2010/9/14 Robert Muir <rcm...@gmail.com>

> Did you index with Solr 1.4 (or are you using Solr 1.4)?
>
> At a quick glance, it looks like it might be this:
> https://issues.apache.org/jira/browse/SOLR-1852, which was fixed in 1.4.1.
>
> On Tue, Sep 14, 2010 at 5:40 AM, yandong yao <yydz...@gmail.com> wrote:
>
> > Hi Guys,
> >
> > I encountered a problem when enabling WordDelimiterFilterFactory for both
> > index and query analysis (the relevant part of schema.xml is pasted at the
> > bottom of this email).
> >
> > *1. Steps to reproduce:*
> > 1.1 The indexed sample document contains only one sentence: "This is a
> > TechNote."
> > 1.2 The query is: q=TechNote
> > 1.3 Result: no matches are returned, even though the sentence clearly
> > contains the word 'TechNote'.
> >
> > *2. Output when enabling debugQuery*
> > With debugQuery turned on:
> >
> > http://localhost:7111/solr/test/select?indent=on&version=2.2&q=TechNote&fq=&start=0&rows=0&fl=*%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=id%3A001&hl.fl=
> >
> > I get the following information:
> >
> > <str name="rawquerystring">TechNote</str>
> > <str name="querystring">TechNote</str>
> > <str name="parsedquery">PhraseQuery(all:"tech note")</str>
> > <str name="parsedquery_toString">all:"tech note"</str>
> > <lst name="explain"/>
> > <str name="otherQuery">id:001</str>
> > <lst name="explainOther">
> >   <str name="001">
> > 0.0 = fieldWeight(all:"tech note" in 0), product of:
> >   0.0 = tf(phraseFreq=0.0)
> >   0.61370564 = idf(all: tech=1 note=1)
> >   0.25 = fieldNorm(field=all, doc=0)
> >   </str>
> > </lst>
> >
> > It seems the raw query string is converted to the phrase query "tech note",
> > but its phrase frequency is 0, so nothing matches.
> >
> > *3. Result from the admin/analysis.jsp page*
> >
> > According to analysis.jsp, the query 'TechNote' does match the input
> > document; see the highlighted words *tech* and *note* below.
> >
> > Index Analyzer
> >
> > org.apache.solr.analysis.WhitespaceTokenizerFactory {}
> >   position  text       type  start,end
> >   1         This       word  0,4
> >   2         is         word  5,7
> >   3         a          word  8,9
> >   4         TechNote.  word  10,19
> >
> > org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt, expand=true, ignoreCase=true}
> >   position  text       type  start,end
> >   1         This       word  0,4
> >   2         is         word  5,7
> >   3         a          word  8,9
> >   4         TechNote.  word  10,19
> >
> > org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1, generateNumberParts=1, catenateWords=1, generateWordParts=1, catenateAll=0, catenateNumbers=1}
> >   position  text       type  start,end
> >   1         This       word  0,4
> >   2         is         word  5,7
> >   3         a          word  8,9
> >   4         Tech       word  10,14
> >   5         Note       word  14,18
> >   5         TechNote   word  10,18
> >
> > org.apache.solr.analysis.LowerCaseFilterFactory {}
> >   position  text       type  start,end
> >   1         this       word  0,4
> >   2         is         word  5,7
> >   3         a          word  8,9
> >   4         tech       word  10,14
> >   5         note       word  14,18
> >   5         technote   word  10,18
> >
> > org.apache.solr.analysis.SnowballPorterFilterFactory {protected=protwords.txt, language=English}
> >   position  text       type  start,end
> >   1         this       word  0,4
> >   2         is         word  5,7
> >   3         a          word  8,9
> >   4         *tech*     word  10,14
> >   5         *note*     word  14,18
> >   5         technot    word  10,18
> >
> > Query Analyzer
> >
> > org.apache.solr.analysis.WhitespaceTokenizerFactory {}
> >   position  text       type  start,end
> >   1         TechNote   word  0,8
> >
> > org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt, expand=true, ignoreCase=true}
> >   position  text       type  start,end
> >   1         TechNote   word  0,8
> >
> > org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1, generateNumberParts=1, catenateWords=0, generateWordParts=1, catenateAll=0, catenateNumbers=0}
> >   position  text       type  start,end
> >   1         Tech       word  0,4
> >   2         Note       word  4,8
> >
> > org.apache.solr.analysis.LowerCaseFilterFactory {}
> >   position  text       type  start,end
> >   1         tech       word  0,4
> >   2         note       word  4,8
> >
> > org.apache.solr.analysis.SnowballPorterFilterFactory {protected=protwords.txt, language=English}
> >   position  text       type  start,end
> >   1         tech       word  0,4
> >   2         note       word  4,8
> >
> > *4. My questions:*
> > 4.1 Why do debugQuery and analysis.jsp give different results?
> > 4.2 From my understanding, during indexing the word 'TechNote' is
> > converted into 1) 'technote' and 2) 'tech note', according to my config in
> > schema.xml. At query time, 'TechNote' is converted into 'tech note', so it
> > SHOULD match. Am I right?
> > 4.3 Why is the phrase frequency of 'tech note' 0 in the debugQuery output
> > (0.0 = tf(phraseFreq=0.0))?
> >
> > Any suggestions or comments are absolutely welcome!
> >
> > *5. fieldType definition in schema.xml*
> >
> > <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
> >   <analyzer type="index">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >             ignoreCase="true" expand="true"/>
> >     <filter class="solr.WordDelimiterFilterFactory"
> >             generateWordParts="1" generateNumberParts="1" catenateWords="1"
> >             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.SnowballPorterFilterFactory" language="English"
> >             protected="protwords.txt"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >             ignoreCase="true" expand="true"/>
> >     <filter class="solr.WordDelimiterFilterFactory"
> >             generateWordParts="1" generateNumberParts="1" catenateWords="0"
> >             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.SnowballPorterFilterFactory" language="English"
> >             protected="protwords.txt"/>
> >   </analyzer>
> > </fieldType>
> >
> > Thanks very much!
>
> --
> Robert Muir
> rcm...@gmail.com
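The numbers in the explainOther output of section 2 can be checked by hand. The following is a minimal sketch, assuming Lucene's DefaultSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDoc / (docFreq + 1)), with a phrase's idf taken as the sum of its terms' idf values) and assuming a one-document index (maxDoc = 1), which is the value that reproduces the reported idf. The function names are illustrative only, not Lucene's API.

```python
import math


def tf(freq: float) -> float:
    # Assumed DefaultSimilarity tf: square root of the (phrase) frequency.
    return math.sqrt(freq)


def idf(doc_freq: int, max_doc: int) -> float:
    # Assumed DefaultSimilarity idf: 1 + ln(maxDoc / (docFreq + 1)).
    return 1.0 + math.log(max_doc / (doc_freq + 1))


def phrase_field_weight(phrase_freq, doc_freqs, max_doc, field_norm):
    # The phrase idf is the sum of the idf of each term in the phrase;
    # fieldWeight is the product tf * idf * fieldNorm.
    phrase_idf = sum(idf(df, max_doc) for df in doc_freqs)
    return tf(phrase_freq) * phrase_idf * field_norm, phrase_idf


# tech=1, note=1 and fieldNorm=0.25 come from the explainOther output;
# max_doc=1 is an assumption (a one-document index).
weight, phrase_idf = phrase_field_weight(0.0, [1, 1], max_doc=1, field_norm=0.25)
print(f"idf = {phrase_idf:.8f}")  # idf = 0.61370564
print(f"fieldWeight = {weight}")  # fieldWeight = 0.0
```

Because fieldWeight is a product, the zero phrase frequency alone zeroes the score; the idf and fieldNorm factors are ordinary. Had the phrase been found once (phrase_freq=1.0), the same formula would give about 0.1534.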
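Question 4.2 can be checked against the positions that analysis.jsp reports. The sketch below is a deliberately simplified model, not Solr's WordDelimiterFilter: it splits each whitespace token on case changes, drops delimiters such as the trailing period, gives each part the next position, and (when catenate_words is on) stacks the catenated word on the last part's position, as the WordDelimiterFilterFactory rows above show. It also assumes, in simplified form, that a query whose tokens span several positions is parsed as a phrase, matching the `PhraseQuery(all:"tech note")` in the debug output. Synonyms, number handling and Snowball stemming are left out (stemming leaves 'tech' and 'note' unchanged above); all helper names are hypothetical.

```python
import re

# Hypothetical splitter: capitalized or lowercase word, all-caps run, or digits.
# Anything else (e.g. the trailing '.') is treated as a delimiter and dropped.
PART = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|[0-9]+")


def analyze(text, catenate_words):
    """Whitespace-tokenize, split on case changes, lowercase; return (term, position) pairs."""
    out = []
    pos = 0
    for token in text.split():
        parts = PART.findall(token)
        for part in parts:
            pos += 1
            out.append((part.lower(), pos))
        if catenate_words and len(parts) > 1:
            # Catenated word stacked on the last part's position.
            out.append(("".join(parts).lower(), pos))
    return out


def to_query(tokens, field="all"):
    """Simplified parse: one position -> term query, several positions -> phrase query."""
    if len({p for _, p in tokens}) > 1:
        terms = " ".join(t for t, _ in tokens)
        return f'{field}:"{terms}"'
    return f"{field}:{tokens[0][0]}"


def phrase_freq(index_tokens, phrase):
    """Count start positions where the phrase terms occur at consecutive positions."""
    positions = {}
    for term, pos in index_tokens:
        positions.setdefault(term, set()).add(pos)
    return sum(
        all(start + i in positions.get(term, set()) for i, term in enumerate(phrase))
        for start in positions.get(phrase[0], set())
    )


index_tokens = analyze("This is a TechNote.", catenate_words=True)
query_tokens = analyze("TechNote", catenate_words=False)
print(index_tokens)  # [('this', 1), ('is', 2), ('a', 3), ('tech', 4), ('note', 5), ('technote', 5)]
print(to_query(query_tokens))  # all:"tech note"
print(phrase_freq(index_tokens, [t for t, _ in query_tokens]))  # 1
```

Under this model the phrase is found once, at positions 4 and 5, which is the expectation behind question 4.2. The model does not reproduce the phraseFreq=0.0 that debugQuery reports, which is consistent with Robert's suggestion that the cause is a bug in Solr 1.4 rather than the schema itself.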