If you have "doors" in your index and a person enters "the doors", why not
just drop stop words at query time?
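
A minimal sketch of query-time stop-word removal, assuming a small
hypothetical stop list (a real Solr setup would use the stopwords file
configured for the field's query analyzer). One design choice here, also an
assumption: if a query consists only of stop words, it is kept as-is rather
than reduced to nothing, since some titles are pure stops.

```python
# Hypothetical stop list, for illustration only.
STOP_WORDS = {"a", "an", "and", "by", "of", "on", "the"}


def drop_stop_words(query: str) -> list[str]:
    """Lowercase and split a user query, then remove stop words."""
    terms = query.lower().split()
    kept = [t for t in terms if t not in STOP_WORDS]
    # A query made only of stop words is kept whole instead of becoming empty.
    return kept or terms


def matches(indexed_title: str, query: str) -> bool:
    """True if every remaining query term appears among the title's terms."""
    title_terms = set(indexed_title.lower().split())
    return all(t in title_terms for t in drop_stop_words(query))


print(drop_stop_words("the doors"))                   # ['doors']
print(matches("Doors", "The Doors"))                  # True
print(matches("Of Mice and Men", "on mice and men"))  # True
```

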
If a person searches for "music by the doors", you have "music doors" in the
index, and the person really uses quotes to get the exact phrase, you can do
what Hoss suggested and retry without stop words if the first query returns
inadequate results. Alternatively, you could drop the stop words from the
phrase but add some slop to it to account for the gaps.
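
A rough sketch of both ideas combined: issue the exact phrase first, and only
if it returns too few hits, retry with stop words removed and a phrase slop
(Lucene query syntax `"term1 term2"~N`). The `search` callable stands in for
whatever actually sends the query to Solr and is hypothetical, as are the stop
list and the `min_hits` threshold. Setting the slop to the number of dropped
words is a heuristic assumption, not a rule.

```python
from typing import Callable, List

STOP_WORDS = {"a", "an", "and", "by", "of", "on", "the"}  # hypothetical stop list


def phrase(terms: List[str], slop: int = 0) -> str:
    """Render terms as a Lucene phrase query string, e.g. '"music doors"~2'."""
    q = '"' + " ".join(terms) + '"'
    return f"{q}~{slop}" if slop > 0 else q


def search_with_fallback(user_phrase: str,
                         search: Callable[[str], List[str]],
                         min_hits: int = 1) -> List[str]:
    """Try the exact phrase; if it yields fewer than min_hits results, retry
    without stop words, using a slop equal to the number of words dropped."""
    terms = user_phrase.lower().split()
    hits = search(phrase(terms))
    if len(hits) >= min_hits:
        return hits
    kept = [t for t in terms if t not in STOP_WORDS]
    if not kept or len(kept) == len(terms):
        return hits  # nothing (or everything) would be dropped; no useful retry
    slop = len(terms) - len(kept)  # heuristic: allow one gap per dropped word
    return search(phrase(kept, slop))


# Toy stand-in for a Solr request: records each query string it receives,
# and only the relaxed phrase "finds" anything.
issued: List[str] = []


def fake_search(q: str) -> List[str]:
    issued.append(q)
    return ["doc-1"] if q == '"music doors"~2' else []


result = search_with_fallback("music by the doors", fake_search)
print(issued)  # ['"music by the doors"', '"music doors"~2']
print(result)  # ['doc-1']
```

The extra round trip only happens when the strict query comes up short, so
users who type the title correctly still get exact-phrase precision.
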

Otis 

--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Ronald K. Braun <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Wednesday, March 26, 2008 9:05:08 PM
Subject: Re: Making stop-words optional with DisMax?

Hi Otis,

> I skimmed your email.  You are indexing book and music titles.  Those tend to
> be short.  Do you really benefit from removing stop words in the first place?
> I'd try keeping all the stop words and seeing if that has any negative
> side-effects in your context.

Thanks for your skim and response!  We do keep all stop-words -- as
you say, makes sense since we aren't dealing with long free text
fields and because some titles are pure stops.

The negative side-effects lie in stop-words being treated with the
same importance as non-stop-words for matching purposes.  This
manifests in two ways:  1. Users occasionally get the stop-words wrong
-- say, wrong choice of preposition, which torpedoes the query since
some of the query terms aren't present in the target.  For example "on
mice and men" may return nothing (no match for "on") even though it is
equivalent to "of mice and men" in a stopped sense.  2. Our original
indexed data doesn't always have leading articles and such.  For
example, we index on "Doors" since that is our sourced data but
frequently get queried for "The Doors".  Articles and prepositions
(the stuff of good stop-lists) seem to me to be in a fuzzier class --
use 'em if you have 'em during matching, but don't kill your queries
because of them.  Hence some desire to make them in some way
"optional" during matching.

Ron


