We use two fields, one that keeps stopwords and one with them removed.
The exact field (stopwords kept) gets a higher boost than the stopped
one. That works pretty well.
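
A toy sketch of the two-field idea, assuming all-terms-required
matching per field. The stopword list, the boost values, and every
name (STOPWORDS, EXACT_BOOST, STOPPED_BOOST, score) are hypothetical,
not Solr settings. Boosts of matching fields are summed here for
simplicity; Solr's dismax parser instead takes the maximum field score
plus a tie-breaker. It shows the stopped field rescuing queries like
"on mice and men" or "the doors", while the exact field still ranks
exact matches first and handles all-stopword titles.

```python
# Toy model of two-field matching; not Solr code.
# Hypothetical stopword list and boosts, chosen only for illustration.
STOPWORDS = {"a", "an", "and", "in", "of", "on", "the", "to"}
EXACT_BOOST = 2.0    # field that keeps every word
STOPPED_BOOST = 1.0  # field with stopwords removed


def tokens(text):
    """Lowercase whitespace tokenizer."""
    return text.lower().split()


def stopped(terms):
    """Drop stopwords, as the stopped field's analyzer would."""
    return [t for t in terms if t not in STOPWORDS]


def field_matches(query_terms, field_terms):
    """All query terms must appear in the field; an empty query matches nothing."""
    return bool(query_terms) and set(query_terms) <= set(field_terms)


def score(query, title):
    """Sum the boosts of every field in which the query matches."""
    q, d = tokens(query), tokens(title)
    total = 0.0
    if field_matches(q, d):
        total += EXACT_BOOST
    if field_matches(stopped(q), stopped(d)):
        total += STOPPED_BOOST
    return total


print(score("of mice and men", "Of Mice and Men"))  # 3.0: both fields match
print(score("on mice and men", "Of Mice and Men"))  # 1.0: only the stopped field
print(score("the doors", "Doors"))                  # 1.0: only the stopped field
print(score("the the", "The The"))                  # 2.0: only the exact field
```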

It helps to have an automated relevance test when tuning the boost
(and other things). I extracted queries and clicks from the logs
for a couple of months. Not perfect, but it is hard to argue with
32 million clicks.
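
One way to automate such a test, sketched in Python: replay logged
(query, clicked result) pairs against a ranking function and measure
how high the clicked result lands. The post doesn't say which metric
was used; mean reciprocal rank is swapped in here as one common
choice, and the log entries and result ids below are hypothetical.
Running it once per candidate boost setting gives a single number to
compare.

```python
from typing import Callable, Iterable, Sequence, Tuple


def mean_reciprocal_rank(
    clicks: Iterable[Tuple[str, str]],
    rank: Callable[[str], Sequence[str]],
) -> float:
    """Average 1/position of the clicked doc in rank(query); 0 if it is missing."""
    total, n = 0.0, 0
    for query, clicked in clicks:
        results = list(rank(query))
        n += 1
        if clicked in results:
            total += 1.0 / (results.index(clicked) + 1)
    return total / n if n else 0.0


# Hypothetical log entries: (query, id of the result the user clicked).
log = [("the doors", "doors"), ("of mice and men", "mice_men")]

# Hypothetical rankings produced by one boost setting under test.
rankings = {
    "the doors": ["doors", "the_who"],
    "of mice and men": ["men_in_black", "mice_men"],
}

print(mean_reciprocal_rank(log, lambda q: rankings.get(q, [])))  # 0.75
```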

wunder

On 3/26/08 6:05 PM, "Ronald K. Braun" <[EMAIL PROTECTED]> wrote:

> Hi Otis,
> 
>> I skimmed your email.  You are indexing book and music titles.  Those tend to
>> be short.  Do you really benefit from removing stop words in the first place?
>> I'd try keeping all the stop words and seeing if that has any negative
>> side-effects in your context.
> 
> Thanks for your skim and response!  We do keep all stop-words -- as
> you say, makes sense since we aren't dealing with long free text
> fields and because some titles are pure stops.
> 
> The negative side-effects lie in stop-words being treated with the
> same importance as non-stop-words for matching purposes.  This
> manifests in two ways:  1. Users occasionally get the stop-words wrong
> -- say, wrong choice of preposition, which torpedoes the query since
> some of the query terms aren't present in the target.  For example "on
> mice and men" may return nothing (no match for "on") even though it is
> equivalent to "of mice and men" in a stopped sense.  2. Our original
> indexed data doesn't always have leading articles and such.  For
> example, we index on "Doors" since that is our sourced data but
> frequently get queried for "The Doors".  Articles and prepositions
> (the stuff of good stop-lists) seem to me to be in a fuzzier class --
> use 'em if you have 'em during matching, but don't kill your queries
> because of them.  Hence some desire to make them in some way
> "optional" during matching.
> 
> Ron
