Re: A question on WordDelimiterFilterFactory

yandong yao Tue, 14 Sep 2010 20:00:27 -0700

After upgrading to 1.4.1, it is fixed.

Thanks very much for your help!


Regards,
Yandong Yao

2010/9/14 yandong yao <yydz...@gmail.com>

> Hi Robert,
>
> I am using Solr 1.4; I will try 1.4.1 tomorrow.
>
> Thanks very much!
>
> Regards,
> Yandong Yao
>
> 2010/9/14 Robert Muir <rcm...@gmail.com>
>
>> did you index with solr 1.4 (or are you using solr 1.4)?
>>
>> at a quick glance, it looks like it might be this:
>> https://issues.apache.org/jira/browse/SOLR-1852 , which was fixed in
>> 1.4.1
>>
>> On Tue, Sep 14, 2010 at 5:40 AM, yandong yao <yydz...@gmail.com> wrote:
>>
>> > Hi Guys,
>> >
>> > I encountered a problem when enabling WordDelimiterFilterFactory for both
>> > index and query (I pasted the relevant part of schema.xml at the bottom
>> > of this email).
>> >
>> > *1. Steps to reproduce:*
>> >    1.1 The indexed sample document contains only one sentence:
>> >        "This is a TechNote."
>> >    1.2 The query is: q=TechNote
>> >    1.3 Result: no matches are returned, although the sentence above
>> >        clearly contains the word 'TechNote'.
>> >
>> > *2. Output when enabling debugQuery*
>> > Turning on debugQuery with
>> >
>> > http://localhost:7111/solr/test/select?indent=on&version=2.2&q=TechNote&fq=&start=0&rows=0&fl=*%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=id%3A001&hl.fl=
>> >
>> > gives the following information:
>> >
>> > <str name="rawquerystring">TechNote</str>
>> > <str name="querystring">TechNote</str>
>> > <str name="parsedquery">PhraseQuery(all:"tech note")</str>
>> > <str name="parsedquery_toString">all:"tech note"</str>
>> > <lst name="explain"/>
>> > <str name="otherQuery">id:001</str>
>> > <lst name="explainOther">
>> > <str name="001">
>> > 0.0 = fieldWeight(all:"tech note" in 0), product of:
>> >   0.0 = tf(phraseFreq=0.0)
>> >   0.61370564 = idf(all: tech=1 note=1)
>> >   0.25 = fieldNorm(field=all, doc=0)
>> > </str>
>> > </lst>
>> >
>> > It seems that the raw query string is converted to the phrase query
>> > "tech note", whose phrase frequency in the document is 0, so there are
>> > no matches.
>> >
>> > *3. Result from admin/analysis.jsp page*
>> >
>> > On analysis.jsp, the query 'TechNote' appears to match the input
>> > document; see the highlighted words below (RED on the page, marked with
>> > asterisks here).
>> >
>> > Index Analyzer
>> >
>> > org.apache.solr.analysis.WhitespaceTokenizerFactory {}
>> >   term position     1     2     3     4
>> >   term text         This  is    a     TechNote.
>> >   term type         word  word  word  word
>> >   source start,end  0,4   5,7   8,9   10,19
>> >   payload
>> >
>> > org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt,
>> > expand=true, ignoreCase=true}
>> >   term position     1     2     3     4
>> >   term text         This  is    a     TechNote.
>> >   term type         word  word  word  word
>> >   source start,end  0,4   5,7   8,9   10,19
>> >   payload
>> >
>> > org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1,
>> > generateNumberParts=1, catenateWords=1, generateWordParts=1, catenateAll=0,
>> > catenateNumbers=1}
>> >   term position     1     2     3     4      5
>> >   term text         This  is    a     Tech   Note
>> >   term type         word  word  word  word   word
>> >   source start,end  0,4   5,7   8,9   10,14  14,18
>> >   payload
>> >   additional stacked token: TechNote (type word, start,end 10,18)
>> >
>> > org.apache.solr.analysis.LowerCaseFilterFactory {}
>> >   term position     1     2     3     4      5
>> >   term text         this  is    a     tech   note
>> >   term type         word  word  word  word   word
>> >   source start,end  0,4   5,7   8,9   10,14  14,18
>> >   payload
>> >   additional stacked token: technote (type word, start,end 10,18)
>> >
>> > org.apache.solr.analysis.SnowballPorterFilterFactory
>> > {protected=protwords.txt, language=English}
>> >   term position     1     2     3     4       5
>> >   term text         this  is    a     *tech*  *note*
>> >   term type         word  word  word  word    word
>> >   source start,end  0,4   5,7   8,9   10,14   14,18
>> >   payload
>> >   additional stacked token: technot (type word, start,end 10,18)
>> >
>> >
>> >
>> >
>> >
>> > Query Analyzer
>> >
>> > org.apache.solr.analysis.WhitespaceTokenizerFactory {}
>> >   term position     1
>> >   term text         TechNote
>> >   term type         word
>> >   source start,end  0,8
>> >   payload
>> >
>> > org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt,
>> > expand=true, ignoreCase=true}
>> >   term position     1
>> >   term text         TechNote
>> >   term type         word
>> >   source start,end  0,8
>> >   payload
>> >
>> > org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1,
>> > generateNumberParts=1, catenateWords=0, generateWordParts=1, catenateAll=0,
>> > catenateNumbers=0}
>> >   term position     1     2
>> >   term text         Tech  Note
>> >   term type         word  word
>> >   source start,end  0,4   4,8
>> >   payload
>> >
>> > org.apache.solr.analysis.LowerCaseFilterFactory {}
>> >   term position     1     2
>> >   term text         tech  note
>> >   term type         word  word
>> >   source start,end  0,4   4,8
>> >   payload
>> >
>> > org.apache.solr.analysis.SnowballPorterFilterFactory
>> > {protected=protwords.txt, language=English}
>> >   term position     1     2
>> >   term text         tech  note
>> >   term type         word  word
>> >   source start,end  0,4   4,8
>> >   payload
>> >
>> >
>> > *4. My questions are:*
>> >    4.1: Why do debugQuery and analysis.jsp give different results?
>> >    4.2: From my understanding, during indexing the word 'TechNote' will be
>> >         converted to 1) 'technote' and 2) 'tech note' according to my
>> >         config in schema.xml. At query time, 'TechNote' will be converted
>> >         to 'tech note', so it SHOULD match. Am I right?
>> >    4.3: Why is the phrase frequency of 'tech note' 0 in the debugQuery
>> >         output (0.0 = tf(phraseFreq=0.0))?
>> >
>> > Any suggestions or comments are very welcome!
>> >
>> >
>> > *5. fieldType definition in schema.xml*
>> >
>> >    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>> >      <analyzer type="index">
>> >        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>> >                ignoreCase="true" expand="true"/>
>> >        <filter class="solr.WordDelimiterFilterFactory"
>> >                generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> >                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>> >        <filter class="solr.LowerCaseFilterFactory"/>
>> >        <filter class="solr.SnowballPorterFilterFactory" language="English"
>> >                protected="protwords.txt"/>
>> >      </analyzer>
>> >      <analyzer type="query">
>> >        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>> >                ignoreCase="true" expand="true"/>
>> >        <filter class="solr.WordDelimiterFilterFactory"
>> >                generateWordParts="1" generateNumberParts="1" catenateWords="0"
>> >                catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>> >        <filter class="solr.LowerCaseFilterFactory"/>
>> >        <filter class="solr.SnowballPorterFilterFactory" language="English"
>> >                protected="protwords.txt"/>
>> >      </analyzer>
>> >    </fieldType>
>> >
>> >
>> > Thanks very much!
>> >
>>
>>
>>
>> --
>> Robert Muir
>> rcm...@gmail.com
>>
>
>
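Note for readers of the archive: the "0.0 = tf(phraseFreq=0.0)" line in the quoted debug output can be reproduced with a small stand-alone sketch. It is not Solr code. It assumes Lucene's classic DefaultSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1))) and an index holding a single document. The function names and both position layouts are hypothetical, chosen only to show that an exact phrase match needs consecutive positions; the thread does not show which positions Solr 1.4 actually wrote to the index.

```python
import math


def phrase_freq(positions, phrase):
    """Count exact (slop 0) occurrences of `phrase` in a term -> positions map."""
    count = 0
    for start in positions.get(phrase[0], []):
        # Each following term must sit exactly one position after the previous one.
        if all(start + offset in positions.get(term, [])
               for offset, term in enumerate(phrase[1:], start=1)):
            count += 1
    return count


def classic_idf(doc_freq, num_docs):
    """Assumed classic formula: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_weight(freq, idf, field_norm):
    """Assumed classic formula: tf * idf * fieldNorm, with tf = sqrt(freq)."""
    return math.sqrt(freq) * idf * field_norm


# Hypothetical layouts for the indexed tokens of "TechNote".
adjacent = {"tech": [4], "note": [5], "technot": [5]}  # consecutive positions
gapped = {"tech": [4], "note": [6], "technot": [5]}    # a gap between the parts

phrase = ["tech", "note"]
# For a phrase, the per-term idf values are summed; each term has docFreq=1
# in a one-document index, as in "idf(all: tech=1 note=1)".
idf = sum(classic_idf(1, 1) for _ in phrase)

for name, layout in (("adjacent", adjacent), ("gapped", gapped)):
    freq = phrase_freq(layout, phrase)
    print(name, freq, field_weight(freq, idf, 0.25))
```

Under these assumptions each term's idf is 1 + ln(1/2), and the two together give the 0.61370564 seen in the explain output. The score is still 0 whenever the phrase frequency is 0, which is what any position gap between "tech" and "note" would produce.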
