Re: A question on WordDelimiterFilterFactory

yandong yao Tue, 14 Sep 2010 06:48:40 -0700

Hi Robert,

I am using Solr 1.4; I will try 1.4.1 tomorrow.


Thanks very much!

Regards,
Yandong Yao

2010/9/14 Robert Muir <rcm...@gmail.com>

> Did you index with Solr 1.4 (or are you using Solr 1.4)?
>
> At a quick glance, it looks like it might be this:
> https://issues.apache.org/jira/browse/SOLR-1852 , which was fixed in 1.4.1
>
> On Tue, Sep 14, 2010 at 5:40 AM, yandong yao <yydz...@gmail.com> wrote:
>
> > Hi Guys,
> >
> > I ran into a problem when enabling WordDelimiterFilterFactory for both
> > indexing and querying (the relevant part of schema.xml is pasted at the
> > bottom of this email).
> >
> > *1. Steps to reproduce:*
> >    1.1 The indexed sample document contains only one sentence: "This is a
> > TechNote."
> >    1.2 The query is: q=TechNote
> >    1.3 Result: no matches are returned, although the sentence clearly
> > contains the word 'TechNote'.
> >
> > *2. Output when enabling debugQuery*
> > Turning on debugQuery with
> >
> > http://localhost:7111/solr/test/select?indent=on&version=2.2&q=TechNote&fq=&start=0&rows=0&fl=*%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=id%3A001&hl.fl=
> >
> > I get the following output:
> >
> > <str name="rawquerystring">TechNote</str>
> > <str name="querystring">TechNote</str>
> > <str name="parsedquery">PhraseQuery(all:"tech note")</str>
> > <str name="parsedquery_toString">all:"tech note"</str>
> > <lst name="explain"/>
> > <str name="otherQuery">id:001</str>
> > <lst name="explainOther">
> > <str name="001">
> > 0.0 = fieldWeight(all:"tech note" in 0), product of:
> >   0.0 = tf(phraseFreq=0.0)
> >   0.61370564 = idf(all: tech=1 note=1)
> >   0.25 = fieldNorm(field=all, doc=0)
> > </str>
> > </lst>
> >
> > It seems the raw query string is converted to the phrase query "tech note",
> > but its phrase frequency is 0, so nothing matches.
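To make the debugQuery output concrete, here is a minimal Java sketch of exact (slop = 0) phrase matching. It is a simplified model, not Lucene's PhraseQuery scorer. A two-term phrase matches only where the second term sits exactly one position after the first. The postings are transcribed from the analysis.jsp output in section 3 and are an assumption about what was actually written to the index. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical illustration of exact (slop = 0) phrase matching over term
 * positions. Simplified model only; this is not Lucene's implementation.
 */
public class PhraseMatchSketch {

    /** Counts start positions p such that phrase[i] occurs at position p + i for every i. */
    static int phraseFreq(Map<String, List<Integer>> postings, String... phrase) {
        int freq = 0;
        for (int start : postings.getOrDefault(phrase[0], List.of())) {
            boolean matched = true;
            for (int i = 1; i < phrase.length; i++) {
                List<Integer> positions = postings.getOrDefault(phrase[i], List.of());
                if (!positions.contains(start + i)) {
                    matched = false;
                    break;
                }
            }
            if (matched) {
                freq++;
            }
        }
        return freq;
    }

    /** Index-side terms for "This is a TechNote." as analysis.jsp displays them (assumed). */
    static Map<String, List<Integer>> indexPostings() {
        Map<String, List<Integer>> postings = new HashMap<>();
        postings.put("this", List.of(1));
        postings.put("is", List.of(2));
        postings.put("a", List.of(3));
        postings.put("tech", List.of(4));
        postings.put("note", List.of(5));
        postings.put("technot", List.of(5)); // catenated "TechNote" after stemming
        return postings;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> postings = indexPostings();
        // Query side: "TechNote" becomes "tech" followed by "note".
        System.out.println(phraseFreq(postings, "tech", "note")); // prints 1

        // If "note" had been stored one position further along, the phrase
        // would no longer line up and the frequency would drop to 0.
        postings.put("note", List.of(6));
        System.out.println(phraseFreq(postings, "tech", "note")); // prints 0
    }
}
```

The explain output reports both terms as present (idf: tech=1, note=1) but phraseFreq=0.0. Under this model, that suggests the positions actually stored for document 001 do not line up the way analysis.jsp displays them.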
> >
> > *3. Result from admin/analysis.jsp page*
> >
> > On analysis.jsp, the query 'TechNote' does seem to match the input
> > document; see the words marked in red (between asterisks) below.
> >
> > Index Analyzer
> >
> > org.apache.solr.analysis.WhitespaceTokenizerFactory {}
> >   term position     1      2      3      4
> >   term text         This   is     a      TechNote.
> >   term type         word   word   word   word
> >   source start,end  0,4    5,7    8,9    10,19
> >
> > org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt,
> > expand=true, ignoreCase=true}
> >   term position     1      2      3      4
> >   term text         This   is     a      TechNote.
> >   term type         word   word   word   word
> >   source start,end  0,4    5,7    8,9    10,19
> >
> > org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1,
> > generateNumberParts=1, catenateWords=1, generateWordParts=1, catenateAll=0,
> > catenateNumbers=1}
> >   term position     1      2      3      4      5
> >   term text         This   is     a      Tech   Note
> >                                                 TechNote
> >   term type         word   word   word   word   word
> >                                                 word
> >   source start,end  0,4    5,7    8,9    10,14  14,18
> >                                                 10,18
> >
> > org.apache.solr.analysis.LowerCaseFilterFactory {}
> >   term position     1      2      3      4      5
> >   term text         this   is     a      tech   note
> >                                                 technote
> >   term type         word   word   word   word   word
> >                                                 word
> >   source start,end  0,4    5,7    8,9    10,14  14,18
> >                                                 10,18
> >
> > org.apache.solr.analysis.SnowballPorterFilterFactory {protected=protwords.txt,
> > language=English}
> >   term position     1      2      3      4      5
> >   term text         this   is     a      *tech* *note*
> >                                                 technot
> >   term type         word   word   word   word   word
> >                                                 word
> >   source start,end  0,4    5,7    8,9    10,14  14,18
> >                                                 10,18
> >
> > Query Analyzer
> >
> > org.apache.solr.analysis.WhitespaceTokenizerFactory {}
> >   term position     1
> >   term text         TechNote
> >   term type         word
> >   source start,end  0,8
> >
> > org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt,
> > expand=true, ignoreCase=true}
> >   term position     1
> >   term text         TechNote
> >   term type         word
> >   source start,end  0,8
> >
> > org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1,
> > generateNumberParts=1, catenateWords=0, generateWordParts=1, catenateAll=0,
> > catenateNumbers=0}
> >   term position     1      2
> >   term text         Tech   Note
> >   term type         word   word
> >   source start,end  0,4    4,8
> >
> > org.apache.solr.analysis.LowerCaseFilterFactory {}
> >   term position     1      2
> >   term text         tech   note
> >   term type         word   word
> >   source start,end  0,4    4,8
> >
> > org.apache.solr.analysis.SnowballPorterFilterFactory {protected=protwords.txt,
> > language=English}
> >   term position     1      2
> >   term text         tech   note
> >   term type         word   word
> >   source start,end  0,4    4,8
> >
> >
> > *4. My questions are:*
> >    4.1: Why do debugQuery and analysis.jsp give different results?
> >    4.2: As I understand it, at index time the word 'TechNote' is
> > converted to 1) 'technote' and 2) 'tech note', according to my config in
> > schema.xml. At query time, 'TechNote' is converted to 'tech note', so it
> > SHOULD match. Am I right?
> >    4.3: Why is the phrase frequency of 'tech note' 0 in the debugQuery
> > output (0.0 = tf(phraseFreq=0.0))?
> > Any suggestions or comments are very welcome!
> >
> >
> > *5. fieldType definition in schema.xml*
> >
> >    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
> >      <analyzer type="index">
> >        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >                ignoreCase="true" expand="true"/>
> >        <filter class="solr.WordDelimiterFilterFactory"
> >                generateWordParts="1" generateNumberParts="1" catenateWords="1"
> >                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> >        <filter class="solr.LowerCaseFilterFactory"/>
> >        <filter class="solr.SnowballPorterFilterFactory" language="English"
> >                protected="protwords.txt"/>
> >      </analyzer>
> >      <analyzer type="query">
> >        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >                ignoreCase="true" expand="true"/>
> >        <filter class="solr.WordDelimiterFilterFactory"
> >                generateWordParts="1" generateNumberParts="1" catenateWords="0"
> >                catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> >        <filter class="solr.LowerCaseFilterFactory"/>
> >        <filter class="solr.SnowballPorterFilterFactory" language="English"
> >                protected="protwords.txt"/>
> >      </analyzer>
> >    </fieldType>
> >
> >
> > Thanks very much!
> >
>
>
>
> --
> Robert Muir
> rcm...@gmail.com
>
