We use two fields: one that keeps stopwords and one that removes them. The exact field (stopwords kept) has a higher boost than the stopped one. That works pretty well.
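Here is a toy sketch of why the two-field setup helps with the cases Ron describes. It assumes a simple boolean model: each field matches only if it contains every query term analyzed for that field, and a document's score is the sum of the boosts of its matching fields. This is not Lucene's scoring. The stop list, the field names, and the boost values (`EXACT_BOOST`, `STOPPED_BOOST`) are hypothetical. In Solr the two fields would typically be filled from one source with a `copyField` and queried together with per-field boosts (for example through the dismax `qf` parameter), but nothing below uses Solr.

```python
import re

# Hypothetical stop list; a real setup would use its analyzer's stopword file.
STOPWORDS = {"a", "an", "and", "in", "of", "on", "the", "to"}

# Hypothetical boosts: the exact (stopwords-kept) field outweighs the stopped one.
EXACT_BOOST = 2.0
STOPPED_BOOST = 1.0


def tokens(text):
    """Lowercase and split on non-alphanumerics, keeping every word."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stopped(words):
    """Drop stopwords, mimicking a field analyzed with a stop filter."""
    return [w for w in words if w not in STOPWORDS]


def index(title):
    """Build both field representations for one title."""
    exact = tokens(title)
    return {"exact": set(exact), "stopped": set(stopped(exact))}


def score(query, doc):
    """Sum the boosts of every field that contains all of its query terms."""
    q_exact = set(tokens(query))
    q_stopped = set(stopped(tokens(query)))
    total = 0.0
    if q_exact and q_exact <= doc["exact"]:
        total += EXACT_BOOST
    # An all-stopword query leaves q_stopped empty; without this guard the
    # empty set would be a subset of every document and match everything.
    if q_stopped and q_stopped <= doc["stopped"]:
        total += STOPPED_BOOST
    return total


def search(query, titles):
    """Return (title, score) pairs with a nonzero score, best first.

    sorted() is stable, so ties keep their input order."""
    scored = [(t, score(query, index(t))) for t in titles]
    return sorted([p for p in scored if p[1] > 0], key=lambda p: -p[1])


if __name__ == "__main__":
    titles = ["Of Mice and Men", "Mice and Men", "Doors"]
    print(search("of mice and men", titles))
    print(search("on mice and men", titles))
    print(search("The Doors", titles))
```

With these titles, "of mice and men" ranks the exact title first (3.0, both fields match) ahead of "Mice and Men" (1.0, stopped field only). The wrong-preposition query "on mice and men" still finds both titles through the stopped field, and "The Doors" finds "Doors" even though the indexed title has no leading article.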
It helps to have an automated relevance test when tuning the boost (and other things). I extracted queries and clicks from the logs for a couple of months. Not perfect, but it is hard to argue with 32 million clicks.

wunder

On 3/26/08 6:05 PM, "Ronald K. Braun" wrote:

> Hi Otis,
>
>> I skimmed your email. You are indexing book and music titles. Those tend to
>> be short.
>> Do you really benefit from removing stop words in the first place? I'd try
>> keeping all the stop words and seeing if that has any negative side-effects
>> in your context.
>
> Thanks for your skim and response! We do keep all stop-words -- as
> you say, makes sense since we aren't dealing with long free text
> fields and because some titles are pure stops.
>
> The negative side-effects lie in stop-words being treated with the
> same importance as non-stop-words for matching purposes. This
> manifests in two ways: 1. Users occasionally get the stop-words wrong
> -- say, wrong choice of preposition, which torpedoes the query since
> some of the query terms aren't present in the target. For example "on
> mice and men" may return nothing (no match for "on") even though it is
> equivalent to "of mice and men" in a stopped sense. 2. Our original
> indexed data doesn't always have leading articles and such. For
> example, we index on "Doors" since that is our sourced data but
> frequently get queried for "The Doors". Articles and prepositions
> (the stuff of good stop-lists) seem to me to be in a fuzzier class --
> use 'em if you have 'em during matching, but don't kill your queries
> because of them. Hence some desire to make them in some way
> "optional" during matching.
>
> Ron