Re: Phrase query search with stopwords

Yonik Seeley Mon, 24 Nov 2008 11:05:39 -0800

Robert,

I've reproduced (sort of) this bad behavior with the example schema.
There was an example configuration "bug" introduced in SOLR-521
where enablePositionIncrements="true" was only set on the index
analyzer but not the query analyzer for the "text" fieldType.


A query on the example data of
features:"Optimized for High Volume Web Traffic"
will not match any documents.
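
To illustrate why a mismatch in enablePositionIncrements breaks an exact phrase query, here is a minimal Python sketch. It is not Solr or Lucene code. The stopword list, the analyze() and phrase_matches() helpers, and the simplified slop-0 matching rule are all assumptions made for illustration.

```python
STOPWORDS = {"by", "for"}  # assumed stopword list, for illustration only


def analyze(text, enable_position_increments):
    """Return (term, position) pairs after lowercasing and stopword removal."""
    tokens = []
    position = 0
    for word in text.lower().split():
        if word in STOPWORDS:
            if enable_position_increments:
                position += 1  # leave a gap where the stopword was
            continue
        position += 1
        tokens.append((word, position))
    return tokens


def phrase_matches(doc_tokens, query_tokens):
    """Simplified exact (slop 0) phrase match: relative positions must agree."""
    if not query_tokens:
        return False
    doc_positions = {}
    for term, pos in doc_tokens:
        doc_positions.setdefault(term, set()).add(pos)
    first_term, first_pos = query_tokens[0]
    for start in doc_positions.get(first_term, ()):
        if all((start + (pos - first_pos)) in doc_positions.get(term, ())
               for term, pos in query_tokens):
            return True
    return False


text = "Optimized for High Volume Web Traffic"
# Index analyzer keeps the gap left by "for"; positions are 1, 3, 4, 5, 6.
indexed = analyze(text, enable_position_increments=True)
# Query analyzer without the setting collapses the gap; positions are 1..5.
mismatched_query = analyze(text, enable_position_increments=False)
# Query analyzer with the setting produces the same gaps as the index.
matching_query = analyze(text, enable_position_increments=True)
```

In this sketch, phrase_matches(indexed, matching_query) succeeds, while phrase_matches(indexed, mismatched_query) fails because the query expects "high" immediately after "optimized" but the index has a one-position gap there.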

You seem to indicate that enablePositionIncrements="true" is set for
both your index and query analyzer.  Can you verify that, and verify
that you restarted solr and reindexed after that change was made?

-Yonik



On Thu, Nov 20, 2008 at 1:30 PM, Robert Haschart <[EMAIL PROTECTED]> wrote:
> Greetings all,
>
> I'm having trouble tracking down why a particular query is not working.   A
> user is trying to do a search for alternate_form_title_text:"three films by
> louis malle"  specifically to find the 4 records that contain the phrase
> "Three films by Louis Malle" in their alternate_form_title_text field.
> However, the search returns 0 records.
>
> The modified searches:
>
> alternate_form_title_text:"three films by louis malle"~1
>
> or
>
> alternate_form_title_text:"three films" AND alternate_form_title_text:"louis
> malle"
>
> both return the 4 records.   So it seems that the word "by", which is
> listed in the stopword filter list, is causing the problem.
>
> The analyzer/filter sequence for indexing the alternate_form_title_text
> field is _almost_ exactly the same as the sequence for querying that field.
>
> for indexing the sequence is:
>
> org.apache.solr.analysis.HTMLStripWhitespaceTokenizerFactory   {}
> schema.UnicodeNormalizationFilterFactory {composed=false,
> remove_modifiers=true, fold=true, version=icu4j, remove_diacritics=true}
> schema.CJKFilterFactory   {bigrams=false}
> org.apache.solr.analysis.StopFilterFactory   {words=stopwords.txt,
> ignoreCase=true, enablePositionIncrements=true}
> org.apache.solr.analysis.WordDelimiterFilterFactory{generateNumberParts=1,
> catenateWords=1, generateWordParts=1, catenateAll=0, catenateNumbers=1}
> org.apache.solr.analysis.LowerCaseFilterFactory   {}
> org.apache.solr.analysis.EnglishPorterFilterFactory
> {protected=protwords.txt}
> org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory   {}
>
> for querying the sequence is:
>
> org.apache.solr.analysis.WhitespaceTokenizerFactory   {}
> schema.UnicodeNormalizationFilterFactory {composed=false,
> remove_modifiers=true, fold=true, version=icu4j, remove_diacritics=true}
> schema.CJKFilterFactory   {bigrams=false}
> org.apache.solr.analysis.SynonymFilterFactory   {synonyms=synonyms.txt,
> expand=true, ignoreCase=true}
> org.apache.solr.analysis.StopFilterFactory   {words=stopwords.txt,
> ignoreCase=true, enablePositionIncrements=true}
> org.apache.solr.analysis.WordDelimiterFilterFactory{generateNumberParts=1,
> catenateWords=0, generateWordParts=1, catenateAll=0, catenateNumbers=0}
> org.apache.solr.analysis.LowerCaseFilterFactory   {}
> org.apache.solr.analysis.EnglishPorterFilterFactory
> {protected=protwords.txt}
> org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory   {}
>
>
> If I run a test through the field analysis admin page, submitting the
> string *three films by louis malle* through both the Field value (Index) and
> the Field value (query), the results (shown below) seem to indicate that the
> query ought to find the 4 records in question, but it does not, and I'm at a
> loss to explain why.
>
>
>     Index Analyzer
>
> term position   1       2       4       5
> term text       three   film    loui    mall
> term type       word    word    word    word
> source start,end        0,5     6,11    15,20   21,26
>
>
>
>     Query Analyzer
>
> term position   1       2       4       5
> term text       three   film    loui    mall
> term type       word    word    word    word
> source start,end        0,5     6,11    15,20   21,26
>
>
>
>
