Re: Phrase query search with stopwords

Yonik Seeley Fri, 28 Nov 2008 08:59:47 -0800

See https://issues.apache.org/jira/browse/SOLR-879
We never enabled position increments in the query parser.
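
To illustrate why that matters, here is a minimal Python sketch (not Solr or
Lucene code) of how a stopword gap interacts with phrase matching. The
stopword set, the analyze() and phrase_matches() helpers, and the simplified
slop rule are all assumptions made for illustration. Real analysis also stems
terms, and Lucene's sloppy phrase scoring is more involved than this.

```python
from itertools import product

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"by", "the", "with"}


def analyze(text, keep_gaps):
    """Return (term, position) pairs with stopwords removed.

    keep_gaps=True mimics position increments being honored: a removed
    stopword still advances the position counter and leaves a hole.
    keep_gaps=False mimics the hole being ignored.
    """
    tokens = []
    position = 0
    for word in text.lower().split():
        if word in STOPWORDS:
            if keep_gaps:
                position += 1  # leave a gap where the stopword was
            continue
        position += 1
        tokens.append((word, position))
    return tokens


def phrase_matches(doc_tokens, query_tokens, slop=0):
    """Simplified phrase check: every query term must appear in the document
    at the same relative offset, allowing the offsets to drift apart by at
    most `slop` positions. This is an approximation, not Lucene's algorithm.
    """
    if not query_tokens:
        return False
    positions = {}
    for term, pos in doc_tokens:
        positions.setdefault(term, []).append(pos)
    candidates = [positions.get(term, []) for term, _ in query_tokens]
    query_positions = [pos for _, pos in query_tokens]
    for combo in product(*candidates):
        shifts = [d - q for d, q in zip(combo, query_positions)]
        if max(shifts) - min(shifts) <= slop:
            return True
    return False


indexed = analyze("Three films by Louis Malle", keep_gaps=True)
query_no_gaps = analyze("three films by louis malle", keep_gaps=False)
query_with_gaps = analyze("three films by louis malle", keep_gaps=True)

print(phrase_matches(indexed, query_no_gaps))          # False
print(phrase_matches(indexed, query_no_gaps, slop=1))  # True
print(phrase_matches(indexed, query_with_gaps))        # True
```

With the gap kept on the index side but dropped on the query side, the exact
phrase misses, and a slop of 1 is enough to bridge the hole. That is
consistent with the ~1 workaround reported further down in this thread.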


-Yonik

On Mon, Nov 24, 2008 at 9:48 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> Ack!  I tried it too, and it failed for me also.
> The analysis page indicates that the tokens are all in the same
> positions... need to look into this deeper.
> Could you open up a JIRA issue?
>
> -Yonik
>
> On Mon, Nov 24, 2008 at 5:58 PM, Robert Haschart <[EMAIL PROTECTED]> wrote:
>> Yonik,
>>
>> I did make sure enablePositionIncrements="true" is set for both indexing
>> and queries. I then re-indexed a couple of test record sets and submitted
>> a query from the Solr admin page, this time searching for
>> title_text:"gone with the wind", which should return three hits. Again it
>> returns 0 hits.
>>
>> I also tried modifying SolrQueryParser to set setEnablePositionIncrements
>> to true, thinking that would fix the problem, but it doesn't seem to.
>>
>>
>> -Bob
>>
>>
>> Yonik Seeley wrote:
>>
>>> Robert,
>>>
>>> I've reproduced (sort of) this bad behavior with the example schema.
>>> There was an example configuration "bug" introduced in SOLR-521
>>> where enablePositionIncrements="true" was only set on the index
>>> analyzer but not the query analyzer for the "text" fieldType.
>>>
>>> A query on the example data of
>>> features:"Optimized for High Volume Web Traffic"
>>> will not match any documents.
>>>
>>> You seem to indicate that enablePositionIncrements="true" is set for
>>> both your index and query analyzer.  Can you verify that, and verify
>>> that you restarted solr and reindexed after that change was made?
>>>
>>> -Yonik
>>>
>>>
>>>
>>> On Thu, Nov 20, 2008 at 1:30 PM, Robert Haschart <[EMAIL PROTECTED]>
>>> wrote:
>>>
>>>>
>>>> Greetings all,
>>>>
>>>> I'm having trouble tracking down why a particular query is not working.
>>>> A user is trying to search for
>>>> alternate_form_title_text:"three films by louis malle"
>>>> specifically to find the 4 records that contain the phrase
>>>> "Three films by Louis Malle" in their alternate_form_title_text field.
>>>> However, the search returns 0 records.
>>>>
>>>> The modified searches:
>>>>
>>>> alternate_form_title_text:"three films by louis malle"~1
>>>>
>>>> or
>>>>
>>>> alternate_form_title_text:"three films" AND alternate_form_title_text:"louis malle"
>>>>
>>>> both return the 4 records. So it seems that the word "by", which is
>>>> listed in the stopword filter list, is causing the problem.
>>>>
>>>> The analyzer/filter sequence for indexing the alternate_form_title_text
>>>> field is _almost_ exactly the same as the sequence for querying that
>>>> field.
>>>>
>>>> for indexing the sequence is:
>>>>
>>>> org.apache.solr.analysis.HTMLStripWhitespaceTokenizerFactory   {}
>>>> schema.UnicodeNormalizationFilterFactory   {composed=false, remove_modifiers=true, fold=true, version=icu4j, remove_diacritics=true}
>>>> schema.CJKFilterFactory   {bigrams=false}
>>>> org.apache.solr.analysis.StopFilterFactory   {words=stopwords.txt, ignoreCase=true, enablePositionIncrements=true}
>>>> org.apache.solr.analysis.WordDelimiterFilterFactory   {generateNumberParts=1, catenateWords=1, generateWordParts=1, catenateAll=0, catenateNumbers=1}
>>>> org.apache.solr.analysis.LowerCaseFilterFactory   {}
>>>> org.apache.solr.analysis.EnglishPorterFilterFactory   {protected=protwords.txt}
>>>> org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory   {}
>>>>
>>>> for querying the sequence is:
>>>>
>>>> org.apache.solr.analysis.WhitespaceTokenizerFactory   {}
>>>> schema.UnicodeNormalizationFilterFactory   {composed=false, remove_modifiers=true, fold=true, version=icu4j, remove_diacritics=true}
>>>> schema.CJKFilterFactory   {bigrams=false}
>>>> org.apache.solr.analysis.SynonymFilterFactory   {synonyms=synonyms.txt, expand=true, ignoreCase=true}
>>>> org.apache.solr.analysis.StopFilterFactory   {words=stopwords.txt, ignoreCase=true, enablePositionIncrements=true}
>>>> org.apache.solr.analysis.WordDelimiterFilterFactory   {generateNumberParts=1, catenateWords=0, generateWordParts=1, catenateAll=0, catenateNumbers=0}
>>>> org.apache.solr.analysis.LowerCaseFilterFactory   {}
>>>> org.apache.solr.analysis.EnglishPorterFilterFactory   {protected=protwords.txt}
>>>> org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory   {}
>>>>
>>>>
>>>> If I run a test through the field analysis admin page, submitting the
>>>> string "three films by louis malle" through both the Field value (Index)
>>>> and the Field value (query), the results (shown below) seem to indicate
>>>> that the query ought to find the 4 records in question, but it does not,
>>>> and I'm at a loss to explain why.
>>>>
>>>>
>>>>   Index Analyzer
>>>>
>>>> term position      1       2       4       5
>>>> term text          three   film    loui    mall
>>>> term type          word    word    word    word
>>>> source start,end   0,5     6,11    15,20   21,26
>>>>
>>>>
>>>>
>>>>   Query Analyzer
>>>>
>>>> term position      1       2       4       5
>>>> term text          three   film    loui    mall
>>>> term type          word    word    word    word
>>>> source start,end   0,5     6,11    15,20   21,26
>>>>
>>>>
>>>>
>>>>
>>>>
>>
>>
>>
>
