Greetings all,

I'm having trouble tracking down why a particular query is not working. A user is searching for alternate_form_title_text:"three films by louis malle", specifically to find the 4 records that contain the phrase "Three films by Louis Malle" in their alternate_form_title_text field.
However, the search returns 0 records.

The modified searches:

alternate_form_title_text:"three films by louis malle"~1

or

alternate_form_title_text:"three films" AND alternate_form_title_text:"louis malle"

Both return the 4 records, so it seems that the word "by", which is in the stopword list, is causing the problem.
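
A toy model may help show why ~1 rescues the query. It assumes, hypothetically, that the index keeps the gap left by the removed "by" (as the analysis page output further down shows), while the phrase query is built with its terms at consecutive positions. The function `phrase_matches` and its "spread of offsets <= slop" rule are a simplified illustration only, not Lucene's actual phrase-matching code.

```python
# Toy model of positional phrase matching. This is a simplification for
# illustration only, not Lucene's PhraseQuery / SloppyPhraseScorer code.

def phrase_matches(doc_positions, query_terms, slop=0):
    """Return True if the phrase matches the document within `slop`.

    doc_positions maps each indexed term to its list of positions.
    query_terms is a list of (term, relative_position) pairs.
    For each chosen occurrence, offset = doc_position - relative_position;
    the phrase matches if max(offset) - min(offset) <= slop.
    """
    if not query_terms:
        return False

    def search(i, offsets):
        if i == len(query_terms):
            return max(offsets) - min(offsets) <= slop
        term, rel = query_terms[i]
        # Try every occurrence of this term in the document.
        return any(search(i + 1, offsets + [pos - rel])
                   for pos in doc_positions.get(term, []))

    return search(0, [])


# Index-side positions as reported by the analysis page.
INDEXED = {"three": [1], "film": [2], "loui": [4], "mall": [5]}

# Query terms keeping the gap left by the removed "by" ...
WITH_GAP = [("three", 0), ("film", 1), ("loui", 3), ("mall", 4)]
# ... versus (hypothetically) the gap being closed up when the query is built.
NO_GAP = [("three", 0), ("film", 1), ("loui", 2), ("mall", 3)]

print(phrase_matches(INDEXED, WITH_GAP))          # True
print(phrase_matches(INDEXED, NO_GAP))            # False
print(phrase_matches(INDEXED, NO_GAP, slop=1))    # True
```

Under that assumption the exact phrase misses by one position and ~1 absorbs the difference, which would be consistent with the behaviour described above, though this is only a model.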

The analyzer/filter sequence for indexing the alternate_form_title_text field is _almost_ exactly the same as the sequence for querying that field.

For indexing, the sequence is:

org.apache.solr.analysis.HTMLStripWhitespaceTokenizerFactory   {}
schema.UnicodeNormalizationFilterFactory   {composed=false, remove_modifiers=true, fold=true, version=icu4j, remove_diacritics=true}
schema.CJKFilterFactory   {bigrams=false}
org.apache.solr.analysis.StopFilterFactory   {words=stopwords.txt, ignoreCase=true, enablePositionIncrements=true}
org.apache.solr.analysis.WordDelimiterFilterFactory   {generateNumberParts=1, catenateWords=1, generateWordParts=1, catenateAll=0, catenateNumbers=1}
org.apache.solr.analysis.LowerCaseFilterFactory   {}
org.apache.solr.analysis.EnglishPorterFilterFactory   {protected=protwords.txt}
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory   {}

For querying, the sequence is:

org.apache.solr.analysis.WhitespaceTokenizerFactory   {}
schema.UnicodeNormalizationFilterFactory   {composed=false, remove_modifiers=true, fold=true, version=icu4j, remove_diacritics=true}
schema.CJKFilterFactory   {bigrams=false}
org.apache.solr.analysis.SynonymFilterFactory   {synonyms=synonyms.txt, expand=true, ignoreCase=true}
org.apache.solr.analysis.StopFilterFactory   {words=stopwords.txt, ignoreCase=true, enablePositionIncrements=true}
org.apache.solr.analysis.WordDelimiterFilterFactory   {generateNumberParts=1, catenateWords=0, generateWordParts=1, catenateAll=0, catenateNumbers=0}
org.apache.solr.analysis.LowerCaseFilterFactory   {}
org.apache.solr.analysis.EnglishPorterFilterFactory   {protected=protwords.txt}
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory   {}
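
To pin down exactly where the two chains differ, here is a quick Python sketch that diffs them as transcribed above. The helpers `parse_chain` and `diff_chains` are hypothetical, written just for this comparison (they are not part of Solr), and they compare only which stages are present and their parameters, not the order of the stages.

```python
# The two analyzer chains, transcribed by hand from the listings above.
INDEX_CHAIN = """
org.apache.solr.analysis.HTMLStripWhitespaceTokenizerFactory {}
schema.UnicodeNormalizationFilterFactory {composed=false, remove_modifiers=true, fold=true, version=icu4j, remove_diacritics=true}
schema.CJKFilterFactory {bigrams=false}
org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true, enablePositionIncrements=true}
org.apache.solr.analysis.WordDelimiterFilterFactory {generateNumberParts=1, catenateWords=1, generateWordParts=1, catenateAll=0, catenateNumbers=1}
org.apache.solr.analysis.LowerCaseFilterFactory {}
org.apache.solr.analysis.EnglishPorterFilterFactory {protected=protwords.txt}
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
"""

QUERY_CHAIN = """
org.apache.solr.analysis.WhitespaceTokenizerFactory {}
schema.UnicodeNormalizationFilterFactory {composed=false, remove_modifiers=true, fold=true, version=icu4j, remove_diacritics=true}
schema.CJKFilterFactory {bigrams=false}
org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt, expand=true, ignoreCase=true}
org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt, ignoreCase=true, enablePositionIncrements=true}
org.apache.solr.analysis.WordDelimiterFilterFactory {generateNumberParts=1, catenateWords=0, generateWordParts=1, catenateAll=0, catenateNumbers=0}
org.apache.solr.analysis.LowerCaseFilterFactory {}
org.apache.solr.analysis.EnglishPorterFilterFactory {protected=protwords.txt}
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
"""


def parse_params(text):
    """Turn '{a=1, b=2}' into {'a': '1', 'b': '2'}; '{}' becomes {}."""
    inner = text.strip().strip("{}").strip()
    if not inner:
        return {}
    return dict(item.strip().split("=", 1) for item in inner.split(","))


def parse_chain(listing):
    """Turn 'ClassName {params}' lines into (name, params) pairs."""
    stages = []
    for line in listing.strip().splitlines():
        name, params = line.split(None, 1)
        stages.append((name, parse_params(params)))
    return stages


def diff_chains(index_chain, query_chain):
    """Return stages unique to each chain and stages whose params differ."""
    index, query = dict(index_chain), dict(query_chain)
    only_index = [name for name in index if name not in query]
    only_query = [name for name in query if name not in index]
    changed = {}
    for name in index.keys() & query.keys():
        p_i, p_q = index[name], query[name]
        diffs = {k: (p_i.get(k), p_q.get(k))
                 for k in sorted(p_i.keys() | p_q.keys())
                 if p_i.get(k) != p_q.get(k)}
        if diffs:
            changed[name] = diffs
    return only_index, only_query, changed


only_index, only_query, changed = diff_chains(parse_chain(INDEX_CHAIN),
                                              parse_chain(QUERY_CHAIN))
print("index only:", only_index)
print("query only:", only_query)
for name, diffs in changed.items():
    print("different params:", name, diffs)
```

Run against the listings, it reports the tokenizer (HTMLStripWhitespaceTokenizerFactory vs WhitespaceTokenizerFactory), the query-only SynonymFilterFactory, and the WordDelimiterFilterFactory catenateWords/catenateNumbers settings (1 at index time, 0 at query time).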


If I run a test through the field analysis admin page, submitting the string "three films by louis malle" as both the Field value (Index) and the Field value (Query), the results (shown below) seem to indicate that the query ought to find the 4 records in question, but it does not, and I'm at a loss to explain why.


     Index Analyzer

term position      1       2       4       5
term text          three   film    loui    mall
term type          word    word    word    word
source start,end   0,5     6,11    15,20   21,26



     Query Analyzer

term position      1       2       4       5
term text          three   film    loui    mall
term type          word    word    word    word
source start,end   0,5     6,11    15,20   21,26
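
For reference, the positions and offsets in these tables can be reproduced, minus the stemming that turns films/louis/malle into film/loui/mall, with a toy model of whitespace tokenizing plus stopword removal. The `analyze` function and the one-word stopword set are hypothetical stand-ins, not Solr code. The point is that with position increments enabled, the dropped "by" still uses up position 3.

```python
import re

# Hypothetical one-word stand-in for stopwords.txt.
STOPWORDS = {"by"}


def analyze(text, enable_position_increments=True):
    """Toy analyzer: whitespace-tokenize, lowercase, drop stopwords.

    Returns (term, position, start, end) tuples, with positions starting
    at 1 as on the analysis page. With enable_position_increments=True a
    removed stopword still consumes a position, leaving a gap. No stemming.
    """
    tokens = []
    position = 0
    skipped = 0  # stopwords removed since the last emitted token
    for match in re.finditer(r"\S+", text):
        term = match.group().lower()
        if term in STOPWORDS:
            skipped += 1
            continue
        position += 1 + (skipped if enable_position_increments else 0)
        skipped = 0
        tokens.append((term, position, match.start(), match.end()))
    return tokens


for token in analyze("three films by louis malle"):
    print(token)
# ('three', 1, 0, 5)
# ('films', 2, 6, 11)
# ('louis', 4, 15, 20)
# ('malle', 5, 21, 26)

print([pos for _, pos, _, _ in analyze("three films by louis malle", False)])
# [1, 2, 3, 4]
```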


