bq: Mostly, stopwords were a performance hack back when people ran search engines on 16-bit machines
Ah, _those_ were the days when programmers were _real_ programmers.
Actually I'm glad they're gone but that's another story.

"to be or not to be". Can't search that if you enable stopwords.

Chris Hostetter wrote a fun blog on the fact that Lucene query parsers
are not strict boolean logic with the title "Why Not AND, OR, And NOT",
purposely choosing that title as it's totally unsearchable if you're
using stopwords.

FWIW,
Erick

On Thu, Jun 29, 2017 at 1:57 PM, David Hastings <hastings.recurs...@gmail.com> wrote:
> Agreed. Stop words from the moment I started using them caused complaints
> and problems right off the bat. They may have been implemented less than a
> week before needing a re-index to fix all the problems they caused.
>
> On Thu, Jun 29, 2017 at 4:55 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>
>> Ultraseek (and Infoseek) never used stopwords. They cause odd failures,
>> like not being able to search for “Vitamin A”.
>>
>> Stopwords are an on/off approach to term frequency. idf is a proportional
>> approach. Once you have idf, you don’t need stopwords.
>>
>> When I was bringing up Solr for Netflix, I started with an analysis chain
>> that used stopwords. A surprising number of movie titles entirely
>> disappeared. I wrote a blog post about it. Ten years ago!
>>
>> https://observer.wunderwood.org/2007/05/31/do-all-stopword-queries-matter/
>>
>> Mostly, stopwords were a performance hack back when people ran search
>> engines on 16-bit machines. Neither disks nor RAM were big enough to hold
>> the posting lists for common words.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/ (my blog)
>>
>> > On Jun 29, 2017, at 1:46 PM, Rick Leir <rl...@leirtech.com> wrote:
>> >
>> > Walter
>> > Sorry for the tangent, but the stopwords feature sounds useful. You say
>> > you do not use this? Did Ultraseek not do it either?
>> > Thanks
>> > Rick
>> >
>> > On June 29, 2017 10:53:42 AM EDT, Walter Underwood <wun...@wunderwood.org> wrote:
>> >> Nope. Haven’t used stopwords for the last 20 years.
>> >>
>> >> I wonder if lowercaseOperators is true. The docs don’t give the default
>> >> value for that in edismax.
>> >>
>> >> https://lucene.apache.org/solr/guide/6_6/the-extended-dismax-query-parser.html
>> >>
>> >> wunder
>> >> Walter Underwood
>> >> wun...@wunderwood.org
>> >> http://observer.wunderwood.org/ (my blog)
>> >>
>> >>> On Jun 29, 2017, at 4:42 AM, Rick Leir <rl...@leirtech.com> wrote:
>> >>>
>> >>> Stopwords?
>> >>>
>> >>> On June 28, 2017 5:13:43 PM EDT, Walter Underwood <wun...@wunderwood.org> wrote:
>> >>>> Is there some special casing in the highlighter to skip query syntax
>> >>>> words? The words “and” and “or” don’t get highlighted.
>> >>>>
>> >>>> This is in 6.5.0.
>> >>>>
>> >>>> <str name="hl.fl">question</str>
>> >>>> <str name="hl.encoder">html</str>
>> >>>> <str name="hl.fragsize">440</str>
>> >>>> <str name="hl.method">fastVector</str>
>> >>>> <str name="hl.snippets">1</str>
>> >>>>
>> >>>> wunder
>> >>>> Walter Underwood
>> >>>> wun...@wunderwood.org
>> >>>> http://observer.wunderwood.org/ (my blog)
>> >>>
>> >>> --
>> >>> Sorry for being brief. Alternate email is rickleir at yahoo dot com
>> >
>> > --
>> > Sorry for being brief. Alternate email is rickleir at yahoo dot com
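
To make the "on/off versus proportional" point from the thread concrete, here is a minimal Python sketch. It is not Lucene or Solr code: the stopword list, the toy documents, and the function names are all made up for illustration, and the idf formula is one common smoothed variant, not necessarily the one any particular Lucene similarity uses.

```python
import math

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "and", "be", "not", "or", "the", "to"}

# Toy corpus (made up); each string stands in for one indexed document.
DOCS = [
    "vitamin a is a nutrient",
    "to be or not to be is a question",
    "a movie about a king",
    "the king and the queen",
]


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real analyzer)."""
    return text.lower().split()


def remove_stopwords(tokens):
    """On/off approach: a term is either kept or dropped entirely."""
    return [t for t in tokens if t not in STOPWORDS]


def idf(term, docs):
    """Proportional approach: common terms get a low weight, not zero.

    Uses log(1 + N / df), an assumed smoothed variant; returns 0.0 for
    terms that appear in no document.
    """
    df = sum(1 for d in docs if term in set(tokenize(d)))
    if df == 0:
        return 0.0
    return math.log(1 + len(docs) / df)


if __name__ == "__main__":
    # With stopwords enabled, the whole query vanishes.
    print(remove_stopwords(tokenize("to be or not to be")))  # -> []
    # "Vitamin A" loses the part that distinguishes it.
    print(remove_stopwords(tokenize("Vitamin A")))  # -> ['vitamin']
    # With idf, "a" still matches but contributes less than "vitamin".
    print(idf("a", DOCS) < idf("vitamin", DOCS))  # -> True
```

With the stopword filter, "to be or not to be" analyzes to nothing, so there is nothing left to match. With idf weighting and no stopword filter, the same terms stay searchable and simply count for less.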