Hi Otis,

> I skimmed your email.  You are indexing book and music titles.  Those tend to 
> be short.
> Do you really benefit from removing stop words in the first place?  I'd try 
> keeping all the stop
> words and seeing if that has any negative side-effects in your context.

Thanks for your skim and response!  We do keep all stop-words -- as
you say, that makes sense since we aren't dealing with long free-text
fields, and because some titles consist entirely of stop-words.

The negative side-effects lie in stop-words being treated with the
same importance as non-stop-words for matching purposes.  This
manifests in two ways:

1. Users occasionally get the stop-words wrong -- say, the wrong choice
   of preposition -- which torpedoes the query since some of the query
   terms aren't present in the target.  For example, "on mice and men"
   may return nothing (no match for "on") even though it is equivalent
   to "of mice and men" once stop-words are removed.

2. Our original indexed data doesn't always have leading articles and
   such.  For example, we index "Doors" since that is what our sourced
   data contains, but we frequently get queried for "The Doors".

Articles and prepositions (the stuff of good stop-lists) seem to me to
be in a fuzzier class: use 'em if you have 'em during matching, but
don't kill your queries because of them.  Hence some desire to make
them in some way "optional" during matching.
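
To make the "optional" idea concrete, here is a rough sketch in plain
Python.  It is not our actual setup and not tied to any search library:
the stop list, the 0.5 bonus weight, and the names tokenize, score and
search are all made up for illustration.  Non-stop terms must match;
stop terms only add to the score when they happen to be present.

```python
# Hypothetical stop list, for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "on", "in", "and", "to", "for"}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive analyzer)."""
    return text.lower().split()


def score(query, title):
    """Return a score if the title matches the query, else None.

    Non-stop query terms are required; stop-words are optional and only
    contribute a small bonus when they appear in the title.
    """
    q_tokens = tokenize(query)
    t_tokens = set(tokenize(title))

    required = [t for t in q_tokens if t not in STOP_WORDS]
    optional = [t for t in q_tokens if t in STOP_WORDS]

    # A query made only of stop-words (some titles are pure stops)
    # falls back to requiring every term.
    if not required:
        required, optional = q_tokens, []

    if any(t not in t_tokens for t in required):
        return None

    bonus = 0.5 * sum(1 for t in optional if t in t_tokens)  # assumed weight
    return len(required) + bonus


def search(query, titles):
    """Return matching titles, best score first."""
    hits = [(score(query, t), t) for t in titles]
    hits = [(s, t) for s, t in hits if s is not None]
    hits.sort(key=lambda pair: -pair[0])
    return [t for _, t in hits]
```

With titles ["Of Mice and Men", "Doors", "The The"], both "on mice and
men" and "The Doors" now find their targets, while a pure-stop query
such as "the the" still requires all of its terms.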

Ron
