Ultraseek (and Infoseek) never used stopwords. Stopwords cause odd failures, 
like not being able to search for “Vitamin A”.

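Stopwords are an on/off approach to term frequency: a word is either indexed 
or dropped. idf (inverse document frequency) is a proportional approach: 
common words stay searchable but carry little weight. Once you have idf, you 
don’t need stopwords.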
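
A minimal sketch of the proportional idea, assuming a smoothed textbook idf 
formula rather than Lucene's exact similarity. The documents and the idf() 
helper are made up for illustration.

```python
import math

def idf(term, docs):
    """Smoothed inverse document frequency (textbook variant, an assumption).

    docs is a list of token sets. Terms found in every document score
    lowest, but never zero, so they remain searchable.
    """
    n = len(docs)
    df = sum(1 for doc in docs if term in doc)
    return math.log((n + 1) / (df + 1)) + 1

# Hypothetical four-document corpus.
corpus = [
    "vitamin a deficiency and night vision",
    "a guide to vitamin c",
    "the history of a small town",
    "a recipe for bread",
]
docs = [set(text.split()) for text in corpus]

for term in ("a", "vitamin", "deficiency"):
    print(term, round(idf(term, docs), 3))
```

The word "a" appears in every document and gets the lowest weight, while 
"vitamin" and "deficiency" score progressively higher. Nothing is thrown 
away, so a query for “Vitamin A” can still match on both terms.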

When I was bringing up Solr for Netflix, I started with an analysis chain that 
used stopwords. A surprising number of movie titles entirely disappeared. I 
wrote a blog post about it. Ten years ago!

https://observer.wunderwood.org/2007/05/31/do-all-stopword-queries-matter/
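
A rough sketch of how a stopword filter makes whole titles vanish. The 
stopword list, the analyze() helper, and the sample strings are hypothetical 
and are not taken from the Netflix catalog or from Solr's analysis chain.

```python
# Hypothetical stopword list, for illustration only.
STOPWORDS = frozenset(
    {"a", "an", "and", "be", "is", "it", "not", "or", "the", "to"}
)

def analyze(text, stopwords=frozenset()):
    """Lowercase, split on whitespace, and drop any stopwords."""
    return [tok for tok in text.lower().split() if tok not in stopwords]

for title in ("To Be or Not to Be", "It", "Vitamin A"):
    print(repr(title), "->", analyze(title, STOPWORDS))
```

With the filter on, the first two strings analyze to an empty token list, so 
there is nothing left to index or match, and “Vitamin A” collapses to just 
"vitamin". Calling analyze() without a stopword set keeps every token.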

Mostly, stopwords were a performance hack back when people ran search engines 
on 16-bit machines. Neither disks nor RAM were big enough to hold the posting 
lists for common words.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Jun 29, 2017, at 1:46 PM, Rick Leir <rl...@leirtech.com> wrote:
> 
> Walter
> Sorry for the tangent, but the stopwords feature sounds useful. You say you 
> do not use this? Did Ultraseek not do it either? 
> Thanks
> Rick
> 
> On June 29, 2017 10:53:42 AM EDT, Walter Underwood <wun...@wunderwood.org> 
> wrote:
>> Nope. Haven’t used stopwords for the last 20 years.
>> 
>> I wonder if lowercaseOperators is true. The docs don’t give the default
>> value for that in edismax.
>> 
>> https://lucene.apache.org/solr/guide/6_6/the-extended-dismax-query-parser.html
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
>>> On Jun 29, 2017, at 4:42 AM, Rick Leir <rl...@leirtech.com> wrote:
>>> 
>>> Stopwords?
>>> 
>>> On June 28, 2017 5:13:43 PM EDT, Walter Underwood <wun...@wunderwood.org> wrote:
>>>> Is there some special casing in the highlighter to skip query syntax
>>>> words? The words “and” and “or” don’t get highlighted.
>>>> 
>>>> This is in 6.5.0.
>>>> 
>>>>     <str name="hl.fl">question</str>
>>>>     <str name="hl.encoder">html</str>
>>>>     <str name="hl.fragsize">440</str>
>>>>     <str name="hl.method">fastVector</str>
>>>>     <str name="hl.snippets">1</str>
>>>> 
>>>> wunder
>>>> Walter Underwood
>>>> wun...@wunderwood.org
>>>> http://observer.wunderwood.org/  (my blog)
>>> 
>>> -- 
>>> Sorry for being brief. Alternate email is rickleir at yahoo dot com
> 
> -- 
> Sorry for being brief. Alternate email is rickleir at yahoo dot com
