RE: StopFilterFactory and "qf" containing some fields that use it and some that do not

Dyer, James Wed, 12 Jan 2011 15:22:55 -0800

Here is what debug says each of these queries parses to:

1. q=life&defType=edismax&qf=Title  ... returns 277,635 results
2. q=the life&defType=edismax&qf=Title ... returns 277,635 results
3. q=life&defType=edismax&qf=Title Contributor  ... returns 277,635 results
4. q=the life&defType=edismax&qf=Title Contributor ... returns 0 results


1. +DisjunctionMaxQuery((Title:life))
2. +((DisjunctionMaxQuery((Title:life)))~1)
3. +DisjunctionMaxQuery((CTBR_SEARCH:life | Title:life))
4. +((DisjunctionMaxQuery((Contributor:the)) DisjunctionMaxQuery((Contributor:life | Title:life)))~2)

I see what's going on here.  Because "the" is a stop word for Title, it gets 
removed from the Title side of the first clause, leaving a clause that searches 
only Contributor.  This means that "Contributor" is required to contain "the".  
dismax does the same thing.  I guess I should have run debug before asking the 
mailing list!
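
To make the mechanics concrete, here is a rough toy model of what the debug 
output above suggests is happening.  This is not Solr code: the `analyze` and 
`build_clauses` helpers and the one-word stop list are made up for 
illustration, and stemming is ignored.  It only shows that a term dropped by 
one field's analyzer still produces a clause if another field keeps it.

```python
# Toy model only, not Solr internals. Field names come from this thread;
# the stop list and helper names are assumptions for illustration.
STOPWORDS = {"the"}


def analyze(field, term):
    """Return the analyzed token for a field, or None if the field drops it."""
    token = term.lower()
    if field == "Title" and token in STOPWORDS:  # only Title has a stop filter
        return None
    return token


def build_clauses(query, qf):
    """Build one clause per query term from the fields that keep the term."""
    clauses = []
    for term in query.split():
        pairs = [(field, analyze(field, term)) for field in qf]
        alternatives = [f"{field}:{tok}" for field, tok in pairs if tok is not None]
        if alternatives:  # a term dropped by every field yields no clause at all
            clauses.append(alternatives)
    return clauses


print(build_clauses("the life", ["Title"]))
# [['Title:life']]
print(build_clauses("the life", ["Title", "Contributor"]))
# [['Contributor:the'], ['Title:life', 'Contributor:life']]
```

With only Title in "qf" there is one clause, so requiring all clauses is easy 
to satisfy.  With Contributor added there are two clauses, and if both must 
match (as the "~2" in the debug output suggests), a document needs "the" in 
Contributor to be returned.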

It looks like the only workarounds I have are either to filter out the 
stopwords in the client when this happens, or to enable stop words for all the 
fields that are used in "qf" alongside stopword-enabled fields.  Unless... 
someone has a better idea?
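
For the first workaround, a minimal client-side sketch might look like the 
following.  The `strip_stopwords` and `build_params` helpers are hypothetical, 
and the stop list is an assumption that would need to mirror stopwords.txt.  
The whitespace split is naive: it would mangle phrase queries and query 
operators, so treat this as a starting point only.

```python
from urllib.parse import urlencode

# Assumption: this set is kept in sync with the server's stopwords.txt.
STOPWORDS = {"the", "a", "an"}


def strip_stopwords(query, stopwords=STOPWORDS):
    """Drop stop words from a plain keyword query before sending it to Solr."""
    kept = [word for word in query.split() if word.lower() not in stopwords]
    # If every word is a stop word, send the query unchanged rather than empty.
    return " ".join(kept) if kept else query


def build_params(query, qf):
    """Build the query string for an edismax request over the given fields."""
    return urlencode({
        "q": strip_stopwords(query),
        "defType": "edismax",
        "qf": " ".join(qf),
    })


print(build_params("the life", ["Title", "Contributor"]))
```

The request for "the life" then carries only "life" in "q", which is the same 
query as case 3 above.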

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Markus Jelsma [mailto:markus.jel...@openindex.io] 
Sent: Wednesday, January 12, 2011 4:44 PM
To: solr-user@lucene.apache.org
Cc: Jayendra Patil
Subject: Re: StopFilterFactory and "qf" containing some fields that use it and 
some that do not


> Have used edismax and Stopword filters as well. But usually use the fq
> parameter e.g. fq=title:the life and never had any issues.

That is because the mm parameter applies only to the main query; filter 
queries are not affected by it.

> 
> Can you turn on the debugQuery and check whats the Query formed for all the
> combinations you mentioned.
> 
> Regards,
> Jayendra
> 
> On Wed, Jan 12, 2011 at 5:19 PM, Dyer, James <james.d...@ingrambook.com> wrote:
> > I'm running into a problem with StopFilterFactory in conjunction with
> > (e)dismax queries that have a mix of fields, only some of which use
> > StopFilterFactory.  It seems that if even 1 field on the "qf" parameter
> > does not use StopFilterFactory, then stop words are not removed when
> > searching any fields.  Here's an example of what I mean:
> > 
> > - I have 2 fields indexed:
> >  > Title is "textStemmed", which includes StopFilterFactory (see below).
> >  > Contributor is "textSimple", which does not include StopFilterFactory
> >    (see below).
> > - "The" is a stop word in stopwords.txt
> > - q=life&defType=edismax&qf=Title  ... returns 277,635 results
> > - q=the life&defType=edismax&qf=Title ... returns 277,635 results
> > - q=life&defType=edismax&qf=Title Contributor  ... returns 277,635 results
> > - q=the life&defType=edismax&qf=Title Contributor ... returns 0 results
> > 
> > It seems as if the stop words are not being stripped from the query
> > because "qf" contains a field that doesn't use StopFilterFactory.  I
> > tested combining stemmed fields with non-stemmed fields in "qf", and
> > stemming seems to be applied regardless.  But stop word removal is
> > not.
> > 
> > Does anyone have ideas on what is going on?  Is this a feature or
> > possibly a bug?  Any known workarounds?  Any advice is appreciated.
> > 
> > James Dyer
> > E-Commerce Systems
> > Ingram Content Group
> > (615) 213-4311
> > ________________________________
> > <fieldType name="textSimple" class="solr.TextField"
> > positionIncrementGap="100">
> > <analyzer type="index">
> > <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > <filter class="solr.LowerCaseFilterFactory"/>
> > </analyzer>
> > <analyzer type="query">
> > <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > <filter class="solr.LowerCaseFilterFactory"/>
> > </analyzer>
> > </fieldType>
> > 
> > <fieldType name="textStemmed" class="solr.TextField"
> > positionIncrementGap="100">
> > <analyzer type="index">
> > <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > <filter class="solr.StopFilterFactory" ignoreCase="true"
> > words="stopwords.txt" enablePositionIncrements="true" />
> > <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> > generateNumberParts="0" catenateWords="0" catenateNumbers="0"
> > catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
> > stemEnglishPossessive="1" />
> > <filter class="solr.LowerCaseFilterFactory"/>
> > <filter class="solr.PorterStemFilterFactory"/>
> > </analyzer>
> > <analyzer type="query">
> > <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> > ignoreCase="true" expand="true"/>
> > <filter class="solr.StopFilterFactory" ignoreCase="true"
> > words="stopwords.txt" enablePositionIncrements="true" />
> > <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> > generateNumberParts="0" catenateWords="0" catenateNumbers="0"
> > catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
> > stemEnglishPossessive="1" />
> > <filter class="solr.LowerCaseFilterFactory"/>
> > <filter class="solr.PorterStemFilterFactory"/>
> > </analyzer>
> > </fieldType>
