Walter, Erick, David
Thanks for the info. Maybe stopwords should be disabled by default?
Cheers -- Rick

On June 29, 2017 5:14:16 PM EDT, Walter Underwood <wun...@wunderwood.org> wrote:
>My blog post has a list of movie titles. I forgot to list the TV series
>“Once and Again”.
>
>Some bands that are not searchable with stopwords:
>
>* The Who
>* Was (not Was)
>* The The
>
>wunder
>Walter Underwood
>wun...@wunderwood.org
>http://observer.wunderwood.org/  (my blog)
>
>
>> On Jun 29, 2017, at 2:09 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>> 
>> bq: Mostly, stopwords were a performance hack back when people ran
>> search engines on 16-bit machines
>> 
>> Ah, _those_ were the days when programmers were _real_ programmers.
>> Actually I'm glad they're gone but that's another story.
>> 
>> "to be or not to be". Can't search that if you enable stopwords.
>> 
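(An aside, not from the thread.) A minimal Python sketch of why such a query vanishes under stopword removal. The stopword set and the `analyze` helper are illustrative assumptions, not the actual Lucene or Solr analysis chain or its default list.

```python
import re

# Illustrative stopword set (an assumption, not any engine's actual default list).
STOPWORDS = {"a", "an", "and", "are", "be", "not", "or", "the", "to", "was"}

def analyze(text, stopwords=STOPWORDS):
    """Lowercase the text, split it into alphanumeric tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in stopwords]

# Queries taken from this thread; an empty token list means nothing is left to search.
for query in ["to be or not to be", "Vitamin A", "The The", "Was (not Was)"]:
    print(f"{query!r} -> {analyze(query)}")
```

With this set, "to be or not to be" analyzes to no tokens at all, and "Vitamin A" degrades to a search for "vitamin" alone.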
>> Chris Hostetter wrote a fun blog on the fact that Lucene query parsers
>> are not strict boolean logic with the title "Why Not AND, OR, And NOT"
>> purposely choosing that title as it's totally unsearchable if you're
>> using stopwords.
>> 
>> FWIW,
>> Erick
>> 
>> On Thu, Jun 29, 2017 at 1:57 PM, David Hastings
>> <hastings.recurs...@gmail.com> wrote:
>>> Agreed.  Stop words from the moment I started using them caused complaints
>>> and problems right off the bat.  They may have been implemented less than a
>>> week before needing a re-index to fix all the problems they caused.
>>> 
>>> On Thu, Jun 29, 2017 at 4:55 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>>> 
>>>> Ultraseek (and Infoseek) never used stopwords. They cause odd failures,
>>>> like not being able to search for “Vitamin A”.
>>>> 
>>>> Stopwords are an on/off approach to term frequency. idf is a proportional
>>>> approach. Once you have idf, you don’t need stopwords.
>>>> 
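(An aside, not from the thread.) A rough Python sketch of the on/off versus proportional contrast. It uses the textbook form idf = log(N / df), which is an assumption here; Lucene's similarity classes use their own smoothed variants. The collection size, document frequencies, and stopword set are made-up numbers for illustration.

```python
import math

def idf(doc_freq, num_docs):
    """Textbook inverse document frequency: log(N / df)."""
    return math.log(num_docs / doc_freq)

def stopword_weight(term, stopwords):
    """On/off weighting: a term either counts fully or is discarded."""
    return 0.0 if term in stopwords else 1.0

NUM_DOCS = 1_000_000  # hypothetical collection size
# Hypothetical document frequencies, invented for this example.
DOC_FREQ = {"the": 900_000, "a": 800_000, "who": 50_000, "vitamin": 1_000}
STOPWORDS = {"the", "a"}  # illustrative stopword set

for term, df in DOC_FREQ.items():
    on_off = stopword_weight(term, STOPWORDS)
    print(f"{term:8s} stopword weight={on_off:.1f}  idf={idf(df, NUM_DOCS):.2f}")
```

Under idf, very common terms still get a small nonzero weight, so they can contribute to matching "Vitamin A" or "The Who" without dominating the score.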
>>>> When I was bringing up Solr for Netflix, I started with an analysis chain
>>>> that used stopwords. A surprising number of movie titles entirely
>>>> disappeared. I wrote a blog post about it. Ten years ago!
>>>> 
>>>> https://observer.wunderwood.org/2007/05/31/do-all-stopword-queries-matter/
>>>> 
>>>> Mostly, stopwords were a performance hack back when people ran search
>>>> engines on 16-bit machines. Neither disks nor RAM were big enough to hold
>>>> the posting lists for common words.
>>>> 
>>>> wunder
>>>> Walter Underwood
>>>> wun...@wunderwood.org
>>>> http://observer.wunderwood.org/  (my blog)
>>>> 
>>>> 
>>>>> On Jun 29, 2017, at 1:46 PM, Rick Leir <rl...@leirtech.com> wrote:
>>>>> 
>>>>> Walter
>>>>> Sorry for the tangent, but the stopwords feature sounds useful. You say
>>>>> you do not use this? Did Ultraseek not do it either?
>>>>> Thanks
>>>>> Rick
>>>>> 
>>>>> On June 29, 2017 10:53:42 AM EDT, Walter Underwood <wun...@wunderwood.org> wrote:
>>>>>> Nope. Haven’t used stopwords for the last 20 years.
>>>>>> 
>>>>>> I wonder if lowercaseOperators is true. The docs don’t give the default
>>>>>> value for that in edismax.
>>>>>> 
>>>>>> https://lucene.apache.org/solr/guide/6_6/the-extended-dismax-query-parser.html
>>>>>> 
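(An aside, not from the thread.) One way to rule out the lowercaseOperators question is to set it explicitly on the request. The sketch below only builds the request URL; the host, collection name, and query text are hypothetical, while `defType`, `lowercaseOperators`, and the `hl.*` parameters are the ones discussed in this thread.

```python
from urllib.parse import urlencode

# Hypothetical host and collection name; substitute your own.
BASE_URL = "http://localhost:8983/solr/mycollection/select"

params = {
    "defType": "edismax",
    "q": "cats and dogs",           # hypothetical query containing a lowercase "and"
    "lowercaseOperators": "false",  # ask edismax not to treat "and"/"or" as operators
    "hl": "true",
    "hl.fl": "question",            # highlight field from the original config
    "hl.method": "fastVector",
}

url = BASE_URL + "?" + urlencode(params)
print(url)
```

If "and" is highlighted with this setting but not without it, the operator handling rather than the highlighter is the likely cause.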
>>>>>> wunder
>>>>>> Walter Underwood
>>>>>> wun...@wunderwood.org
>>>>>> http://observer.wunderwood.org/  (my blog)
>>>>>> 
>>>>>> 
>>>>>>> On Jun 29, 2017, at 4:42 AM, Rick Leir <rl...@leirtech.com> wrote:
>>>>>>> 
>>>>>>> Stopwords?
>>>>>>> 
>>>>>>> On June 28, 2017 5:13:43 PM EDT, Walter Underwood <wun...@wunderwood.org> wrote:
>>>>>>>> Is there some special casing in the highlighter to skip query syntax
>>>>>>>> words? The words “and” and “or” don’t get highlighted.
>>>>>>>> 
>>>>>>>> This is in 6.5.0.
>>>>>>>> 
>>>>>>>>    <str name="hl.fl">question</str>
>>>>>>>>    <str name="hl.encoder">html</str>
>>>>>>>>    <str name="hl.fragsize">440</str>
>>>>>>>>    <str name="hl.method">fastVector</str>
>>>>>>>>    <str name="hl.snippets">1</str>
>>>>>>>> 
>>>>>>>> wunder
>>>>>>>> Walter Underwood
>>>>>>>> wun...@wunderwood.org
>>>>>>>> http://observer.wunderwood.org/  (my blog)
>>>>>>> 
>>>>>>> --
>>>>>>> Sorry for being brief. Alternate email is rickleir at yahoo dot com
>>>>> 
>>>>> --
>>>>> Sorry for being brief. Alternate email is rickleir at yahoo dot com
>>>> 
>>>> 

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 
