Yonik,

I did make sure that enablePositionIncrements="true" is set for both indexing and queries. I then ran a test in which I re-indexed a couple of test record sets and submitted a query from the Solr admin page, this time searching for title_text:"gone with the wind", which should return three hits. Again it returns 0 hits.

I also tried modifying SolrQueryParser to set setEnablePositionIncrements to true, thinking that would fix the problem, but it doesn't seem to.


-Bob


Yonik Seeley wrote:

Robert,

I've reproduced (sort of) this bad behavior with the example schema.
There was an example configuration "bug" introduced in SOLR-521
where enablePositionIncrements="true" was only set on the index
analyzer but not the query analyzer for the "text" fieldType.

A query on the example data of
features:"Optimized for High Volume Web Traffic"
will not match any documents.
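
The effect of such a mismatch can be illustrated with a toy model. This is a sketch only, not Lucene or Solr code: the stopword list, the whitespace-only tokenizer, and the names `analyze` and `phrase_matches` are all assumptions made for illustration. It shows that when the index side keeps a gap for a removed stopword and the query side does not, an exact phrase match on positions fails.

```python
# Hypothetical stopword list, for illustration only.
STOPWORDS = {"for", "by", "the", "with"}


def analyze(text, enable_position_increments):
    """Lowercase, split on whitespace, drop stopwords.

    Returns a list of (term, position) pairs. When
    enable_position_increments is True, a removed stopword leaves a
    gap in the positions; otherwise the positions stay consecutive.
    """
    tokens = []
    position = 0
    for word in text.lower().split():
        position += 1
        if word in STOPWORDS:
            if not enable_position_increments:
                position -= 1  # the removed token leaves no gap
            continue
        tokens.append((word, position))
    return tokens


def phrase_matches(indexed, query):
    """Exact phrase match: the relative positions of the query terms
    must occur unchanged somewhere in the indexed field."""
    if not query:
        return False
    positions = {}
    for term, pos in indexed:
        positions.setdefault(term, set()).add(pos)
    first_term, first_pos = query[0]
    for start in positions.get(first_term, ()):
        if all(start + (pos - first_pos) in positions.get(term, ())
               for term, pos in query):
            return True
    return False


phrase = "Optimized for High Volume Web Traffic"
indexed = analyze(phrase, enable_position_increments=True)
print(phrase_matches(indexed, analyze(phrase, True)))
print(phrase_matches(indexed, analyze(phrase, False)))
```

In this model the first check succeeds because both sides agree on the gap left by "for", and the second fails because the query expects "high" directly after "optimized".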

You seem to indicate that enablePositionIncrements="true" is set for
both your index and query analyzer.  Can you verify that, and verify
that you restarted solr and reindexed after that change was made?

-Yonik



On Thu, Nov 20, 2008 at 1:30 PM, Robert Haschart <[EMAIL PROTECTED]> wrote:
Greetings all,

I'm having trouble tracking down why a particular query is not working.   A
user is trying to do a search for alternate_form_title_text:"three films by
louis malle"  specifically to find the 4 records that contain the phrase
"Three films by Louis Malle" in their alternate_form_title_text field.
However, the search returns 0 records.

The modified searches:

alternate_form_title_text:"three films by louis malle"~1

or

alternate_form_title_text:"three films" AND alternate_form_title_text:"louis
malle"

both return the 4 records.   So it seems that the word "by", which is
listed in the stopword filter list, is causing the problem.
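
One possible reading of why a slop of 1 rescues the query can be sketched with a toy model of sloppy phrase matching. This is a simplified illustration, not Lucene's actual scorer, and `sloppy_phrase_matches` is a made-up name. It assumes, purely as a hypothesis, that the index keeps a gap where "by" was removed while the phrase query is built with consecutive positions.

```python
from itertools import product


def sloppy_phrase_matches(indexed, query, slop=0):
    """Toy sloppy phrase match over (term, position) pairs.

    Each candidate document position is shifted back by the term's
    offset within the query. The phrase matches if, for some choice
    of positions, the shifted values differ by at most `slop`.
    """
    if not query:
        return False
    positions = {}
    for term, pos in indexed:
        positions.setdefault(term, []).append(pos)
    candidates = [positions.get(term, []) for term, _ in query]
    if any(not c for c in candidates):
        return False  # some query term is absent from the field
    first_pos = query[0][1]
    offsets = [pos - first_pos for _, pos in query]
    for combo in product(*candidates):
        shifted = [p - off for p, off in zip(combo, offsets)]
        if max(shifted) - min(shifted) <= slop:
            return True
    return False


# Hypothetical positions: the index keeps a gap at position 3 ("by").
indexed = [("three", 1), ("films", 2), ("louis", 4), ("malle", 5)]
# Hypothetical query built without the gap.
query_no_gap = [("three", 1), ("films", 2), ("louis", 3), ("malle", 4)]

print(sloppy_phrase_matches(indexed, query_no_gap, slop=0))
print(sloppy_phrase_matches(indexed, query_no_gap, slop=1))
```

Under these assumptions the exact match (slop 0) fails and the match with slop 1 succeeds, which is consistent with the behaviour of the ~1 query above but does not by itself prove that this is the cause.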

The analyzer/filter sequence for indexing the alternate_form_title_text
field is _almost_ exactly the same as the sequence for querying that field.

for indexing the sequence is:

org.apache.solr.analysis.HTMLStripWhitespaceTokenizerFactory   {}
schema.UnicodeNormalizationFilterFactory {composed=false,
remove_modifiers=true, fold=true, version=icu4j, remove_diacritics=true}
schema.CJKFilterFactory   {bigrams=false}
org.apache.solr.analysis.StopFilterFactory   {words=stopwords.txt,
ignoreCase=true, enablePositionIncrements=true}
org.apache.solr.analysis.WordDelimiterFilterFactory   {generateNumberParts=1,
catenateWords=1, generateWordParts=1, catenateAll=0, catenateNumbers=1}
org.apache.solr.analysis.LowerCaseFilterFactory   {}
org.apache.solr.analysis.EnglishPorterFilterFactory
{protected=protwords.txt}
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory   {}

for querying the sequence is:

org.apache.solr.analysis.WhitespaceTokenizerFactory   {}
schema.UnicodeNormalizationFilterFactory {composed=false,
remove_modifiers=true, fold=true, version=icu4j, remove_diacritics=true}
schema.CJKFilterFactory   {bigrams=false}
org.apache.solr.analysis.SynonymFilterFactory   {synonyms=synonyms.txt,
expand=true, ignoreCase=true}
org.apache.solr.analysis.StopFilterFactory   {words=stopwords.txt,
ignoreCase=true, enablePositionIncrements=true}
org.apache.solr.analysis.WordDelimiterFilterFactory   {generateNumberParts=1,
catenateWords=0, generateWordParts=1, catenateAll=0, catenateNumbers=0}
org.apache.solr.analysis.LowerCaseFilterFactory   {}
org.apache.solr.analysis.EnglishPorterFilterFactory
{protected=protwords.txt}
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory   {}
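
To make the "almost exactly the same" concrete, the two sequences above can be compared mechanically. The following is a small sketch: the factory names are shortened to their class names, the parameter values are copied from the lists above, and `chain_differences` is a hypothetical helper, not part of Solr.

```python
# Parameters that are identical in both chains, copied from the lists above.
UNICODE = {"composed": "false", "remove_modifiers": "true", "fold": "true",
           "version": "icu4j", "remove_diacritics": "true"}
STOP = {"words": "stopwords.txt", "ignoreCase": "true",
        "enablePositionIncrements": "true"}

INDEX_CHAIN = [
    ("HTMLStripWhitespaceTokenizerFactory", {}),
    ("UnicodeNormalizationFilterFactory", UNICODE),
    ("CJKFilterFactory", {"bigrams": "false"}),
    ("StopFilterFactory", STOP),
    ("WordDelimiterFilterFactory",
     {"generateNumberParts": "1", "catenateWords": "1",
      "generateWordParts": "1", "catenateAll": "0", "catenateNumbers": "1"}),
    ("LowerCaseFilterFactory", {}),
    ("EnglishPorterFilterFactory", {"protected": "protwords.txt"}),
    ("RemoveDuplicatesTokenFilterFactory", {}),
]

QUERY_CHAIN = [
    ("WhitespaceTokenizerFactory", {}),
    ("UnicodeNormalizationFilterFactory", UNICODE),
    ("CJKFilterFactory", {"bigrams": "false"}),
    ("SynonymFilterFactory",
     {"synonyms": "synonyms.txt", "expand": "true", "ignoreCase": "true"}),
    ("StopFilterFactory", STOP),
    ("WordDelimiterFilterFactory",
     {"generateNumberParts": "1", "catenateWords": "0",
      "generateWordParts": "1", "catenateAll": "0", "catenateNumbers": "0"}),
    ("LowerCaseFilterFactory", {}),
    ("EnglishPorterFilterFactory", {"protected": "protwords.txt"}),
    ("RemoveDuplicatesTokenFilterFactory", {}),
]


def chain_differences(index_chain, query_chain):
    """Return (only_in_index, only_in_query, changed_params).

    changed_params maps a factory name to {param: (index, query)}
    for every parameter whose value differs between the chains.
    """
    index_params = dict(index_chain)
    query_params = dict(query_chain)
    only_index = [n for n, _ in index_chain if n not in query_params]
    only_query = [n for n, _ in query_chain if n not in index_params]
    changed = {}
    for name, params in index_chain:
        other = query_params.get(name)
        if other is not None and params != other:
            changed[name] = {
                key: (params.get(key), other.get(key))
                for key in sorted(set(params) | set(other))
                if params.get(key) != other.get(key)
            }
    return only_index, only_query, changed


only_index, only_query, changed = chain_differences(INDEX_CHAIN, QUERY_CHAIN)
print(only_index)
print(only_query)
print(changed)
```

In this comparison the StopFilterFactory settings come out identical on both sides; the differences are the tokenizer, the query-only SynonymFilterFactory, and the catenateWords and catenateNumbers settings of WordDelimiterFilterFactory.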


If I run a test through the field analysis admin page, submitting the
string "three films by louis malle" through both the Field value (Index) and
the Field value (query), the results (shown below) seem to indicate that the
query ought to find the 4 records in question, but it does not, and I'm at a
loss to explain why.


   Index Analyzer

term position      1       2       4       5
term text          three   film    loui    mall
term type          word    word    word    word
source start,end   0,5     6,11    15,20   21,26



   Query Analyzer

term position      1       2       4       5
term text          three   film    loui    mall
term type          word    word    word    word
source start,end   0,5     6,11    15,20   21,26





