Robert, I've reproduced (sort of) this bad behavior with the example schema. There was an example configuration "bug" introduced in SOLR-521 where enablePositionIncrements="true" was only set on the index analyzer but not the query analyzer for the "text" fieldType.
A query on the example data of features:"Optimized for High Volume Web Traffic" will not match any documents.

You seem to indicate that enablePositionIncrements="true" is set for both your index and query analyzer. Can you verify that, and verify that you restarted Solr and reindexed after that change was made?

-Yonik

On Thu, Nov 20, 2008 at 1:30 PM, Robert Haschart <[EMAIL PROTECTED]> wrote:
> Greetings all,
>
> I'm having trouble tracking down why a particular query is not working. A
> user is trying to do a search for alternate_form_title_text:"three films by
> louis malle" specifically to find the 4 records that contain the phrase
> "Three films by Louis Malle" in their alternate_form_title_text field.
> However the search returns 0 records.
>
> The modified searches:
>
> alternate_form_title_text:"three films by louis malle"~1
>
> or
>
> alternate_form_title_text:"three films" AND alternate_form_title_text:"louis
> malle"
>
> both return the 4 records. So it seems that it is the word "by", which is
> listed in the stopword filter list, that is causing the problem.
>
> The analyzer/filter sequence for indexing the alternate_form_title_text
> field is _almost_ exactly the same as the sequence for querying that field.
>
> for indexing the sequence is:
>
> org.apache.solr.analysis.HTMLStripWhitespaceTokenizerFactory {}
> schema.UnicodeNormalizationFilterFactory {composed=false,
> remove_modifiers=true, fold=true, version=icu4j, remove_diacritics=true}
> schema.CJKFilterFactory {bigrams=false}
> org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt,
> ignoreCase=true, enablePositionIncrements=true}
> org.apache.solr.analysis.WordDelimiterFilterFactory{generateNumberParts=1,
> catenateWords=1, generateWordParts=1, catenateAll=0, catenateNumbers=1}
> org.apache.solr.analysis.LowerCaseFilterFactory {}
> org.apache.solr.analysis.EnglishPorterFilterFactory
> {protected=protwords.txt}
> org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
>
> for querying the sequence is:
>
> org.apache.solr.analysis.WhitespaceTokenizerFactory {}
> schema.UnicodeNormalizationFilterFactory {composed=false,
> remove_modifiers=true, fold=true, version=icu4j, remove_diacritics=true}
> schema.CJKFilterFactory {bigrams=false}
> org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt,
> expand=true, ignoreCase=true}
> org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt,
> ignoreCase=true, enablePositionIncrements=true}
> org.apache.solr.analysis.WordDelimiterFilterFactory{generateNumberParts=1,
> catenateWords=0, generateWordParts=1, catenateAll=0, catenateNumbers=0}
> org.apache.solr.analysis.LowerCaseFilterFactory {}
> org.apache.solr.analysis.EnglishPorterFilterFactory
> {protected=protwords.txt}
> org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
>
>
> If I run a test through the field analysis admin page, submitting the
> string *three films by louis malle* through both the Field value (Index) and
> the Field value (query), the results (shown below) seem to indicate that the
> query ought to find the 4 records in question, but it does not, and I'm at a
> loss to explain why.
>
>
> Index Analyzer
>
> term position      1      2      4      5
> term text          three  film   loui   mall
> term type          word   word   word   word
> source start,end   0,5    6,11   15,20  21,26
>
>
> Query Analyzer
>
> term position      1      2      4      5
> term text          three  film   loui   mall
> term type          word   word   word   word
> source start,end   0,5    6,11   15,20  21,26
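
For reference, here is a rough Python sketch of why a position-increment mismatch between index and query analysis (the SOLR-521 example-schema situation described at the top of this message) would make the exact phrase fail while the ~1 variant still matches. This is a simplified model of phrase matching written for illustration, not Lucene's actual implementation. The function name phrase_matches and the data layout are hypothetical, and the term positions are taken from the analysis output quoted above.

```python
from itertools import product


def phrase_matches(doc_positions, query_terms, slop=0):
    """Simplified phrase check (an illustration, not Lucene's real code).

    doc_positions: dict mapping term -> list of positions in the indexed field.
    query_terms:   list of (term, position) pairs produced by the query analyzer.
    A candidate matches when every term sits at the same shift from its query
    position, allowing the shifts to differ by at most `slop`.
    """
    candidates = []
    for term, _ in query_terms:
        if term not in doc_positions:
            return False
        candidates.append(doc_positions[term])

    for combo in product(*candidates):
        shifts = [doc_pos - q_pos
                  for doc_pos, (_, q_pos) in zip(combo, query_terms)]
        if max(shifts) - min(shifts) <= slop:
            return True
    return False


# Index side with enablePositionIncrements=true: position 3 ("by") is a gap.
indexed = {"three": [1], "film": [2], "loui": [4], "mall": [5]}

# Query analyzer that also preserves the gap left by the stopword.
query_with_gap = [("three", 1), ("film", 2), ("loui", 4), ("mall", 5)]

# Query analyzer that drops the stopword without leaving a gap.
query_without_gap = [("three", 1), ("film", 2), ("loui", 3), ("mall", 4)]

print(phrase_matches(indexed, query_with_gap))             # True
print(phrase_matches(indexed, query_without_gap))          # False
print(phrase_matches(indexed, query_without_gap, slop=1))  # True
```

In this model the exact phrase only matches when both sides agree on the gap, which is why it is worth double-checking that the query analyzer actually in use (after a restart and reindex) produces the positions shown on the analysis page.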