I would agree with removing the stopword filter from the example configs. It is not a “best practice” or even a recommended practice.
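
For illustration only, here is a rough sketch of what a text field type without the stopword filter could look like. This is an assumption for discussion, not a copy of any shipped example config; the name "text_nostop" is made up. The point is just that the analyzer chain has no solr.StopFilterFactory entry and there is no stopwords.txt to maintain.

```xml
<!-- Hypothetical field type: the name "text_nostop" is invented for this sketch. -->
<fieldType name="text_nostop" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Deliberately no solr.StopFilterFactory: "the", "and", "or", "a" stay in the index. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a chain like this, queries such as "to be or not to be" or "The Who" keep all of their terms, and idf is left to down-weight the common ones.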
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Jun 29, 2017, at 8:01 PM, Rick Leir <rl...@leirtech.com> wrote:
>
> Walter, Erick, David
> Thanks for the info. Maybe the default for stopwords should be disabled?
> Cheers -- Rick
>
> On June 29, 2017 5:14:16 PM EDT, Walter Underwood <wun...@wunderwood.org> wrote:
>> My blog post has a list of movie titles. I forgot to list the TV series
>> “Once and Again”.
>>
>> Some bands that are not searchable with stopwords:
>>
>> * The Who
>> * Was (not Was)
>> * The The
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/ (my blog)
>>
>>
>>> On Jun 29, 2017, at 2:09 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>>>
>>> bq: Mostly, stopwords were a performance hack back when people ran
>>> search engines on 16-bit machines
>>>
>>> Ah, _those_ were the days when programmers were _real_ programmers.
>>> Actually I'm glad they're gone but that's another story.
>>>
>>> "to be or not to be". Can't search that if you enable stopwords.
>>>
>>> Chris Hostetter wrote a fun blog on the fact that Lucene query parsers
>>> are not strict boolean logic with the title "Why Not AND, OR, And NOT"
>>> purposely choosing that title as it's totally unsearchable if you're
>>> using stopwords.
>>>
>>> FWIW,
>>> Erick
>>>
>>> On Thu, Jun 29, 2017 at 1:57 PM, David Hastings
>>> <hastings.recurs...@gmail.com> wrote:
>>>> Agreed. Stop words from the moment I started using them caused complaints
>>>> and problems right off the bat. They may have been implemented less than a
>>>> week before needing a re-index to fix all the problems they caused.
>>>>
>>>> On Thu, Jun 29, 2017 at 4:55 PM, Walter Underwood <wun...@wunderwood.org>
>>>> wrote:
>>>>
>>>>> Ultraseek (and Infoseek) never used stopwords. They cause odd failures,
>>>>> like not being able to search for “Vitamin A”.
>>>>>
>>>>> Stopwords are an on/off approach to term frequency. idf is a proportional
>>>>> approach. Once you have idf, you don’t need stopwords.
>>>>>
>>>>> When I was bringing up Solr for Netflix, I started with an analysis chain
>>>>> that used stopwords. A surprising number of movie titles entirely
>>>>> disappeared. I wrote a blog post about it. Ten years ago!
>>>>>
>>>>> https://observer.wunderwood.org/2007/05/31/do-all-stopword-queries-matter/
>>>>>
>>>>> Mostly, stopwords were a performance hack back when people ran search
>>>>> engines on 16-bit machines. Neither disks nor RAM were big enough to hold
>>>>> the posting lists for common words.
>>>>>
>>>>> wunder
>>>>> Walter Underwood
>>>>> wun...@wunderwood.org
>>>>> http://observer.wunderwood.org/ (my blog)
>>>>>
>>>>>
>>>>>> On Jun 29, 2017, at 1:46 PM, Rick Leir <rl...@leirtech.com> wrote:
>>>>>>
>>>>>> Walter
>>>>>> Sorry for the tangent, but the stopwords feature sounds useful. You say
>>>>>> you do not use this? Did Ultraseek not do it either?
>>>>>> Thanks
>>>>>> Rick
>>>>>>
>>>>>> On June 29, 2017 10:53:42 AM EDT, Walter Underwood
>>>>>> <wun...@wunderwood.org> wrote:
>>>>>>> Nope. Haven’t used stopwords for the last 20 years.
>>>>>>>
>>>>>>> I wonder if lowercaseOperators is true. The docs don’t give the default
>>>>>>> value for that in edismax.
>>>>>>>
>>>>>>> https://lucene.apache.org/solr/guide/6_6/the-extended-dismax-query-parser.html
>>>>>>>
>>>>>>> wunder
>>>>>>> Walter Underwood
>>>>>>> wun...@wunderwood.org
>>>>>>> http://observer.wunderwood.org/ (my blog)
>>>>>>>
>>>>>>>
>>>>>>>> On Jun 29, 2017, at 4:42 AM, Rick Leir <rl...@leirtech.com> wrote:
>>>>>>>>
>>>>>>>> Stopwords?
>>>>>>>>
>>>>>>>> On June 28, 2017 5:13:43 PM EDT, Walter Underwood
>>>>>>>> <wun...@wunderwood.org> wrote:
>>>>>>>>> Is there some special casing in the highlighter to skip query syntax
>>>>>>>>> words? The words “and” and “or” don’t get highlighted.
>>>>>>>>>
>>>>>>>>> This is in 6.5.0.
>>>>>>>>>
>>>>>>>>> <str name="hl.fl">question</str>
>>>>>>>>> <str name="hl.encoder">html</str>
>>>>>>>>> <str name="hl.fragsize">440</str>
>>>>>>>>> <str name="hl.method">fastVector</str>
>>>>>>>>> <str name="hl.snippets">1</str>
>>>>>>>>>
>>>>>>>>> wunder
>>>>>>>>> Walter Underwood
>>>>>>>>> wun...@wunderwood.org
>>>>>>>>> http://observer.wunderwood.org/ (my blog)
>>>>>>>>
>>>>>>>> --
>>>>>>>> Sorry for being brief. Alternate email is rickleir at yahoo dot com
>>>>>>
>>>>>> --
>>>>>> Sorry for being brief. Alternate email is rickleir at yahoo dot com
>>>>>
>>>>>
>
>
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com