Specifically, what is happening is that the query parser passes "of" to the analyzer for the name field. The analyzer removes stopwords, including "of", which leaves no term to query. A Lucene BooleanQuery with no terms will match... nothing. But when you add another clause, you have the combination of an empty term and a specific term, which is equivalent to just using the specific term. Think of a sequence of terms to be ANDed as a set: if a term analyzes to no terms, there is nothing to add to the set of terms to be ANDed.
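
The set analogy can be sketched in a few lines of Python. This is only a toy model, not Solr or Lucene code: the names `STOPWORDS`, `analyze`, `required_terms` and `matches` are hypothetical, and for simplicity it runs every field through the same stopword-removing analyzer, which a real int field would not use.

```python
# Toy model of the behavior described above. Nothing here is a Solr or
# Lucene API; all names are hypothetical and chosen for illustration.
STOPWORDS = {"of"}


def analyze(text):
    """Whitespace-split, lowercase, and drop stopwords (assumed analyzer)."""
    return [t.lower() for t in text.split() if t.lower() not in STOPWORDS]


def required_terms(clauses):
    """Build the set of terms that MUST match for an AND of clauses.

    A clause whose text analyzes to no terms adds nothing to the set.
    """
    must = set()
    for field, text in clauses:
        for term in analyze(text):
            must.add((field, term))
    return must


def matches(doc, clauses):
    """A query with no terms matches nothing; otherwise all terms must match."""
    must = required_terms(clauses)
    if not must:
        return False
    return all(term in doc.get(field, set()) for field, term in must)


# A made-up document with a name field and a class id.
doc = {"name": {"bank", "america"}, "all_class_ids": {"371"}}

only_stopword = [("name", "of")]
stopword_and_id = [("name", "of"), ("all_class_ids", "371")]
only_id = [("all_class_ids", "371")]
```

In this sketch, `matches(doc, only_stopword)` is false because the required set is empty, while `required_terms(stopword_and_id)` and `required_terms(only_id)` are the same set, so the two queries match the same documents.
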

Diving a little deeper, the "AND" operator of the two terms simply means that all terms "MUST" be present, but since your first term analyzed to no terms, only one term is present. Another example where this could happen is a query such as "$,@. AND 371" - the "$,@." gets parsed as a term, but then all the punctuation gets removed by the analyzer, leaving no term.

These days, the recommended practice is to keep stopwords in the index but remove them at query time unless all of the terms in the query are stop words. In fact, it would be better to only remove stop words at query time when they are not at either end of the query. This way, queries such as "to be or not to be", "vitamin a", and "the office" can still provide meaningful and precise matches even as stop words are generally ignored.

-- Jack Krupansky

On Mon, Feb 16, 2015 at 4:32 PM, Arun Rangarajan <arunrangara...@gmail.com> wrote:

> Solr version 4.2.1
>
> In my schema, I have "text" type defined as follows:
> ---
> <fieldType name="text" class="solr.TextField"
> positionIncrementGap="100">
>
> <analyzer type="index">
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> <filter class="solr.StopFilterFactory" words="stopwords.txt"
> ignoreCase="true"/>
> <filter class="solr.WordDelimiterFilterFactory"
> preserveOriginal="1" generateWordParts="1" generateNumberParts="1"
> catenateWords="1" catenateNumbers="0" catenateAll="1"
> splitOnCaseChange="1"/>
> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> <filter class="solr.ASCIIFoldingFilterFactory"/>
> </analyzer>
>
> <analyzer type="query">
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> <filter class="solr.StopFilterFactory" words="stopwords.txt"
> ignoreCase="true"/>
> <filter class="solr.WordDelimiterFilterFactory"
> preserveOriginal="1" generateWordParts="1" generateNumberParts="1"
> catenateWords="0" catenateNumbers="0" catenateAll="0"
> splitOnCaseChange="0"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> <filter class="solr.ASCIIFoldingFilterFactory"/>
> </analyzer>
>
> </fieldType>
> ---
>
> Field "name" is of type "text".
>
> I have another multi-valued int field called "all_class_ids".
>
> Both fields are indexed. I have 'of' in stopwords.txt file.
>
> I am using lucene query parser.
>
> This query
> q=name:of&rows=0
> gives no results as expected.
>
> However, this query:
> q=name:of AND all_class_ids:(371)&rows=0
> gives results and is equal to the same number of results as
> q=all_class_ids:(371)&rows=0
>
> This is happening only for stopwords. Why?
>
> Thanks.
>