If you have "doors" in your index and a person enters "the doors", why not just drop stop-words at query time? If a person searches for "music by the doors", you have "music doors" in the index, and the person really uses quotes to get the exact phrase, you can try it like Hoss said and retry without stop words if you get an inadequate response from the first query. Or you could drop the stop words from the phrase, but add some slop to the phrase to account for the gaps.
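Here is a rough sketch of the two ideas in Python, just to make them concrete. It is not Solr code: the stop list, the function names, and the `run_query` callable are all made up for illustration, and using the number of dropped words as the slop value is only a guess at a reasonable default. The `"..."~N` form assumes the standard Lucene query parser's proximity syntax.

```python
# Hypothetical stop list; in practice you would reuse your analyzer's list.
STOP_WORDS = {"a", "an", "the", "of", "on", "by", "in", "and"}


def strip_stop_words(text):
    """Return (non-stop terms, number of stop words dropped)."""
    terms = text.lower().split()
    kept = [t for t in terms if t not in STOP_WORDS]
    return kept, len(terms) - len(kept)


def fallback_queries(user_query, is_phrase=False):
    """Yield query strings to try in order: the original, then a stopped variant."""
    kept, dropped = strip_stop_words(user_query)
    if is_phrase:
        yield f'"{user_query}"'
        if dropped and kept:
            # Assumption: one position of slop per dropped word covers the gaps.
            yield f'"{" ".join(kept)}"~{dropped}'
    else:
        yield user_query
        if dropped and kept:
            yield " ".join(kept)


def search_with_fallback(user_query, run_query, is_phrase=False, min_hits=1):
    """Run each candidate query until one returns at least min_hits results.

    run_query is a hypothetical callable that takes a query string and
    returns a list of hits (for example, a thin wrapper around a Solr call).
    """
    results = []
    for query in fallback_queries(user_query, is_phrase):
        results = run_query(query)
        if len(results) >= min_hits:
            break
    return results
```

A query made only of stop words (a title that is pure stops) is never stripped, so it is tried once as entered.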

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Ronald K. Braun <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Wednesday, March 26, 2008 9:05:08 PM
Subject: Re: Making stop-words optional with DisMax?

Hi Otis,

> I skimmed your email. You are indexing book and music titles. Those tend
> to be short.
> Do you really benefit from removing stop words in the first place? I'd try
> keeping all the stop words and seeing if that has any negative
> side-effects in your context.

Thanks for your skim and response! We do keep all stop-words -- as you say, makes sense since we aren't dealing with long free text fields and because some titles are pure stops. The negative side-effects lie in stop-words being treated with the same importance as non-stop-words for matching purposes. This manifests in two ways:

1. Users occasionally get the stop-words wrong -- say, wrong choice of preposition, which torpedoes the query since some of the query terms aren't present in the target. For example "on mice and men" may return nothing (no match for "on") even though it is equivalent to "of mice and men" in a stopped sense.

2. Our original indexed data doesn't always have leading articles and such. For example, we index on "Doors" since that is our sourced data but frequently get queried for "The Doors".

Articles and prepositions (the stuff of good stop-lists) seem to me to be in a fuzzier class -- use 'em if you have 'em during matching, but don't kill your queries because of them. Hence some desire to make them in some way "optional" during matching.

Ron