My blog post has a list of movie titles. I forgot to list the TV series “Once and Again”.
Some bands that are not searchable with stopwords:

* The Who
* Was (not Was)
* The The

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jun 29, 2017, at 2:09 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
> bq: Mostly, stopwords were a performance hack back when people ran
> search engines on 16-bit machines
>
> Ah, _those_ were the days when programmers were _real_ programmers.
> Actually I'm glad they're gone but that's another story.
>
> "to be or not to be". Can't search that if you enable stopwords.
>
> Chris Hostetter wrote a fun blog on the fact that Lucene query parsers
> are not strict boolean logic with the title "Why Not AND, OR, And NOT"
> purposely choosing that title as it's totally unsearchable if you're
> using stopwords.
>
> FWIW,
> Erick
>
> On Thu, Jun 29, 2017 at 1:57 PM, David Hastings
> <hastings.recurs...@gmail.com> wrote:
>> Agreed. Stop words from the moment I started using them caused complaints
>> and problems right off the bat. They may have been implemented less than a
>> week before needing a re-index to fix all the problems they caused.
>>
>> On Thu, Jun 29, 2017 at 4:55 PM, Walter Underwood <wun...@wunderwood.org>
>> wrote:
>>
>>> Ultraseek (and Infoseek) never used stopwords. They cause odd failures,
>>> like not being able to search for “Vitamin A”.
>>>
>>> Stopwords are an on/off approach to term frequency. idf is a proportional
>>> approach. Once you have idf, you don’t need stopwords.
>>>
>>> When I was bringing up Solr for Netflix, I started with an analysis chain
>>> that used stopwords. A surprising number of movie titles entirely
>>> disappeared. I wrote a blog post about it. Ten years ago!
>>>
>>> https://observer.wunderwood.org/2007/05/31/do-all-stopword-queries-matter/
>>>
>>> Mostly, stopwords were a performance hack back when people ran search
>>> engines on 16-bit machines. Neither disks nor RAM were big enough to hold
>>> the posting lists for common words.
>>>
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>>
>>>
>>>> On Jun 29, 2017, at 1:46 PM, Rick Leir <rl...@leirtech.com> wrote:
>>>>
>>>> Walter
>>>> Sorry for the tangent, but the stopwords feature sounds useful. You say you do not use this? Did Ultraseek not do it either?
>>>> Thanks
>>>> Rick
>>>>
>>>> On June 29, 2017 10:53:42 AM EDT, Walter Underwood <wun...@wunderwood.org> wrote:
>>>>> Nope. Haven’t used stopwords for the last 20 years.
>>>>>
>>>>> I wonder if lowercaseOperators is true. The docs don’t give the default
>>>>> value for that in edismax.
>>>>>
>>>>> https://lucene.apache.org/solr/guide/6_6/the-extended-dismax-query-parser.html
>>>>>
>>>>> wunder
>>>>> Walter Underwood
>>>>> wun...@wunderwood.org
>>>>> http://observer.wunderwood.org/  (my blog)
>>>>>
>>>>>
>>>>>> On Jun 29, 2017, at 4:42 AM, Rick Leir <rl...@leirtech.com> wrote:
>>>>>>
>>>>>> Stopwords?
>>>>>>
>>>>>> On June 28, 2017 5:13:43 PM EDT, Walter Underwood <wun...@wunderwood.org> wrote:
>>>>>>> Is there some special casing in the highlighter to skip query syntax
>>>>>>> words? The words “and” and “or” don’t get highlighted.
>>>>>>>
>>>>>>> This is in 6.5.0.
>>>>>>>
>>>>>>> <str name="hl.fl">question</str>
>>>>>>> <str name="hl.encoder">html</str>
>>>>>>> <str name="hl.fragsize">440</str>
>>>>>>> <str name="hl.method">fastVector</str>
>>>>>>> <str name="hl.snippets">1</str>
>>>>>>>
>>>>>>> wunder
>>>>>>> Walter Underwood
>>>>>>> wun...@wunderwood.org
>>>>>>> http://observer.wunderwood.org/  (my blog)
>>>>>>
>>>>>> --
>>>>>> Sorry for being brief. Alternate email is rickleir at yahoo dot com
>>>>
>>>> --
>>>> Sorry for being brief. Alternate email is rickleir at yahoo dot com
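
For anyone skimming the thread, the failures mentioned above (band names, "to be or not to be", "Vitamin A") can be reproduced with a toy analyzer. This is only a sketch in Python, not Solr or Lucene code: the STOPWORDS set, the analyze() and idf() helpers, and the sample docs are all made up for illustration, and the idf formula is just one common smoothed variant, not necessarily what a given Lucene similarity computes.

```python
import math
import string

# Hypothetical stopword list for illustration only; not Solr's or Lucene's default.
STOPWORDS = frozenset({"a", "and", "be", "not", "or", "the", "to", "was", "who"})


def analyze(text, stopwords=frozenset()):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    tokens = [t.strip(string.punctuation) for t in text.lower().split()]
    return [t for t in tokens if t and t not in stopwords]


def idf(term, docs):
    """One smoothed idf variant (an assumption): log((N + 1) / (df + 1)) + 1."""
    df = sum(1 for d in docs if term in analyze(d))
    return math.log((len(docs) + 1) / (df + 1)) + 1


# With stopwords enabled, several queries from this thread lose every term.
queries = ["The Who", "Was (not Was)", "The The", "to be or not to be", "Vitamin A"]
for q in queries:
    print(repr(q), "->", analyze(q, STOPWORDS))

# Without stopwords, idf down-weights common terms in proportion to how
# many documents contain them, instead of removing them outright.
docs = ["the who live", "the the", "the vitamin a study", "the netflix catalog"]
for term in ("the", "vitamin"):
    print(term, round(idf(term, docs), 3))
```

The on/off versus proportional distinction shows up directly: the stopword filter returns an empty token list for the band names, so there is nothing left to match, while idf keeps "the" searchable and simply gives it a lower weight than "vitamin".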