Walter, Erick, David
Thanks for the info. Maybe the default for stopwords should be disabled?
Cheers -- Rick

On June 29, 2017 5:14:16 PM EDT, Walter Underwood <wun...@wunderwood.org> wrote:
>My blog post has a list of movie titles. I forgot to list the TV series
>“Once and Again”.
>
>Some bands that are not searchable with stopwords:
>
>* The Who
>* Was (not Was)
>* The The
>
>wunder
>Walter Underwood
>wun...@wunderwood.org
>http://observer.wunderwood.org/ (my blog)
>
>
>> On Jun 29, 2017, at 2:09 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>>
>> bq: Mostly, stopwords were a performance hack back when people ran
>> search engines on 16-bit machines
>>
>> Ah, _those_ were the days when programmers were _real_ programmers.
>> Actually I'm glad they're gone but that's another story.
>>
>> "to be or not to be". Can't search that if you enable stopwords.
>>
>> Chris Hostetter wrote a fun blog on the fact that Lucene query parsers
>> are not strict boolean logic with the title "Why Not AND, OR, And NOT"
>> purposely choosing that title as it's totally unsearchable if you're
>> using stopwords.
>>
>> FWIW,
>> Erick
>>
>> On Thu, Jun 29, 2017 at 1:57 PM, David Hastings
>> <hastings.recurs...@gmail.com> wrote:
>>> Agreed. Stop words from the moment I started using them caused complaints
>>> and problems right off the bat. They may have been implemented less than a
>>> week before needing a re-index to fix all the problems they caused.
>>>
>>> On Thu, Jun 29, 2017 at 4:55 PM, Walter Underwood <wun...@wunderwood.org>
>>> wrote:
>>>
>>>> Ultraseek (and Infoseek) never used stopwords. They cause odd failures,
>>>> like not being able to search for “Vitamin A”.
>>>>
>>>> Stopwords are an on/off approach to term frequency. idf is a proportional
>>>> approach. Once you have idf, you don’t need stopwords.
>>>>
>>>> When I was bringing up Solr for Netflix, I started with an analysis chain
>>>> that used stopwords. A surprising number of movie titles entirely
>>>> disappeared. I wrote a blog post about it. Ten years ago!
>>>>
>>>> https://observer.wunderwood.org/2007/05/31/do-all-stopword-queries-matter/
>>>>
>>>> Mostly, stopwords were a performance hack back when people ran search
>>>> engines on 16-bit machines. Neither disks nor RAM were big enough to hold
>>>> the posting lists for common words.
>>>>
>>>> wunder
>>>> Walter Underwood
>>>> wun...@wunderwood.org
>>>> http://observer.wunderwood.org/ (my blog)
>>>>
>>>>
>>>>> On Jun 29, 2017, at 1:46 PM, Rick Leir <rl...@leirtech.com> wrote:
>>>>>
>>>>> Walter
>>>>> Sorry for the tangent, but the stopwords feature sounds useful. You say
>>>>> you do not use this? Did Ultraseek not do it either?
>>>>> Thanks
>>>>> Rick
>>>>>
>>>>> On June 29, 2017 10:53:42 AM EDT, Walter Underwood
>>>>> <wun...@wunderwood.org> wrote:
>>>>>> Nope. Haven’t used stopwords for the last 20 years.
>>>>>>
>>>>>> I wonder if lowercaseOperators is true. The docs don’t give the default
>>>>>> value for that in edismax.
>>>>>>
>>>>>> https://lucene.apache.org/solr/guide/6_6/the-extended-dismax-query-parser.html
>>>>>>
>>>>>> wunder
>>>>>> Walter Underwood
>>>>>> wun...@wunderwood.org
>>>>>> http://observer.wunderwood.org/ (my blog)
>>>>>>
>>>>>>
>>>>>>> On Jun 29, 2017, at 4:42 AM, Rick Leir <rl...@leirtech.com> wrote:
>>>>>>>
>>>>>>> Stopwords?
>>>>>>>
>>>>>>> On June 28, 2017 5:13:43 PM EDT, Walter Underwood
>>>>>>> <wun...@wunderwood.org> wrote:
>>>>>>>> Is there some special casing in the highlighter to skip query syntax
>>>>>>>> words? The words “and” and “or” don’t get highlighted.
>>>>>>>>
>>>>>>>> This is in 6.5.0.
>>>>>>>>
>>>>>>>> <str name="hl.fl">question</str>
>>>>>>>> <str name="hl.encoder">html</str>
>>>>>>>> <str name="hl.fragsize">440</str>
>>>>>>>> <str name="hl.method">fastVector</str>
>>>>>>>> <str name="hl.snippets">1</str>
>>>>>>>>
>>>>>>>> wunder
>>>>>>>> Walter Underwood
>>>>>>>> wun...@wunderwood.org
>>>>>>>> http://observer.wunderwood.org/ (my blog)
>>>>>>>
>>>>>>> --
>>>>>>> Sorry for being brief. Alternate email is rickleir at yahoo dot com
>>>>>
>>>>> --
>>>>> Sorry for being brief. Alternate email is rickleir at yahoo dot com

--
Sorry for being brief. Alternate email is rickleir at yahoo dot com
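
For anyone reading this thread later, here is a minimal sketch of the "on/off versus proportional" point quoted above. It is not from any of the posters and is not Solr code: the stopword list, the tiny corpus, and the helper names (`tokenize`, `remove_stopwords`, `idf`) are all made up for illustration, and the idf formula is one common smoothed BM25-style form, assumed here rather than taken from the thread.

```python
import math

# Hypothetical stopword list for illustration only (not Solr's default list).
STOPWORDS = {"a", "and", "be", "not", "or", "the", "to"}

# Hypothetical toy corpus built from titles mentioned in the thread.
DOCS = [
    "the who live at leeds",
    "vitamin a and the eye",
    "to be or not to be",
    "the the soul mining",
]


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for a real analysis chain."""
    return text.lower().split()


def remove_stopwords(tokens):
    """On/off approach: a term is either kept or dropped entirely."""
    return [t for t in tokens if t not in STOPWORDS]


def idf(term, docs):
    """Proportional approach: common terms get a small weight, not zero."""
    n = len(docs)
    df = sum(1 for d in docs if term in set(tokenize(d)))
    return math.log(1 + (n - df + 0.5) / (df + 0.5))


if __name__ == "__main__":
    # With stopwords, some queries lose terms or vanish completely.
    for query in ["to be or not to be", "The Who", "Vitamin A"]:
        print(query, "->", remove_stopwords(tokenize(query)))

    # With idf, every term stays searchable; common ones just count for less.
    for term in ["the", "who", "vitamin"]:
        print(term, round(idf(term, DOCS), 3))
```

With the stopword filter the first query ends up with no terms at all, while "The Who" and "Vitamin A" each lose a word. With idf alone, "the" is still indexed and searchable, just weighted lower than "vitamin".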