Re: Stopwords impact on search

2020-04-26 Thread Steven White
Thanks Walter. Much appreciated. To the Solr dev team, it would be of great help if Walter's IDF summary were made part of the stop-filter section: https://lucene.apache.org/solr/guide/8_5/filter-descriptions.html#stop-filter Steve On Fri, Apr 24, 2020 at 8:49 PM Walter Underwood wrote: > IDF and sto

Re: Stopwords impact on search

2020-04-24 Thread Walter Underwood
IDF and stopword removal are different approaches to the same thing. Removing stopwords is a binary decision on how important common words are for search. It says some words are completely useless. IDF is a proportional measure on how important common words are for search. Instead of removing a
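The contrast drawn above, a binary cut versus a proportional weight, can be sketched in a few lines of Python. Everything here is an assumption for illustration: the toy corpus, the function name `idf`, and the BM25-style smoothed formula are not taken from the thread.

```python
import math

def idf(term, docs):
    """BM25-style inverse document frequency: rarer terms score higher."""
    n = sum(1 for d in docs if term in d.split())  # docs containing the term
    total = len(docs)
    return math.log(1 + (total - n + 0.5) / (n + 0.5))

# Hypothetical four-document corpus.
docs = [
    "indexing and searching text",
    "the index and the query",
    "lucene scoring and ranking",
    "lucene faceting",
]

# "and" is in 3 of 4 docs, "lucene" in 2, "faceting" in 1.
weights = {t: idf(t, docs) for t in ("and", "lucene", "faceting")}
```

In this sketch the common word "and" still gets a small positive weight rather than being dropped outright, which is the proportional behaviour the message describes.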

Re: Stopwords impact on search

2020-04-24 Thread Steven White
Hi everyone, I get why and when not indexing stopwords is a bad idea and can give you 0 or incomplete results. But what about the quality of search results when stopwords are indexed vs. not indexed? 1) Stopwords are removed and I do a word search, not a phrase search, for "solr and lucene are so c

Re: Stopwords impact on search

2020-04-24 Thread Walter Underwood
I’m astonished that the default still has that. It was a bad idea in Solr 1.3, when it bit my ass. We help people with this about once a month and the advice is always the same. Imagine all the poor people who never ask about it and run with that default! wunder Walter Underwood wun...@wunderwoo

Re: Stopwords impact on search

2020-04-24 Thread Jan Høydahl
Turns out there is already a JIRA for this SOLR-10992 where both you and I commented already :) But it’s 3 years old... > 24. apr. 2020 kl. 16:34 skrev Erick Erickson : > > +1 to removing stopword filters. > >> On Apr 24, 2020, at 10:28 AM, Jan

Re: Stopwords impact on search

2020-04-24 Thread Rohan Kasat
So do we use stopwords filter as part of query analyzer, to avoid highlighting of these stop words ? Regards, Rohan On Fri, Apr 24, 2020 at 7:45 AM Walter Underwood wrote: > Agreed. Here is an article from 13 years ago when I accidentally turned on > stopword removal at Netflix. It caused bad p

Re: Stopwords impact on search

2020-04-24 Thread Walter Underwood
Agreed. Here is an article from 13 years ago when I accidentally turned on stopword removal at Netflix. It caused bad problems. https://observer.wunderwood.org/2007/05/31/do-all-stopword-queries-matter/ Infoseek was not removing stopwords when I joined them in 1996. Since then, I’ve always left

Re: Stopwords impact on search

2020-04-24 Thread Erick Erickson
+1 to removing stopword filters. > On Apr 24, 2020, at 10:28 AM, Jan Høydahl wrote: > > I tend to agree. Should we simply remove the stopword filters from the > default configsets shipping with Solr? > > Jan > >> 24. apr. 2020 kl. 14:44 skrev David Hastings : >> >> you should never use the s

Re: Stopwords impact on search

2020-04-24 Thread Jan Høydahl
I tend to agree. Should we simply remove the stopword filters from the default configsets shipping with Solr? Jan > 24. apr. 2020 kl. 14:44 skrev David Hastings : > > you should never use the stopword filter unless you have a very specific > purpose > > On Fri, Apr 24, 2020 at 8:33 AM Steven W

Re: Stopwords impact on search

2020-04-24 Thread David Hastings
you should never use the stopword filter unless you have a very specific purpose On Fri, Apr 24, 2020 at 8:33 AM Steven White wrote: > Hi everyone, > > What is the impact, if any, of stopwords on my search ranking quality? > Will my ranking improve if I do not index stopwords? > > I'm trying

Re: StopWords behavior with phrases

2019-05-21 Thread Jan Høydahl
Well perhaps you don't need to remove stopwords at all? :) Or a middle ground is to NOT remove stopwords in your 'index' analyzer, then you have the flexibility of removing them on query side. Thus if you use &stopwords=false on your call perhaps that works? -- Jan Høydahl, search solution arc

Re: Stopwords param of edismax parser not working

2019-03-29 Thread Branham, Jeremy (Experis)
Hi Ashish – Are you using v7.3? If so, I think this is the spot in code that should be executing: https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.3.0/solr/core/src/java/org/apache/solr/search/ExtendedDismaxQParser.java#L310 Haven’t dug into the logic, but I tested on my server [

Re: Stopwords param of edismax parser not working

2019-03-28 Thread Erick Erickson
and to say anything about your particular situation we need to see the field definitions from the schema for the field you expect stopwords to be removed from and the stopwords file for those fields. But Walter’s comment is germane. Stopwords lead to a number of incongruities and are best just

Re: Stopwords param of edismax parser not working

2019-03-28 Thread Walter Underwood
Why are you removing stopwords? That hack made sense in the 1950s, but I haven’t removed stopwords for the last twenty years. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Mar 28, 2019, at 2:47 AM, Ashish Bisht wrote: > > Hi, > > We are trying

Re: Stopwords magic

2015-03-31 Thread Jack Krupansky
Use the Solr Admin UI analysis page to see how the text is analyzed at both index and query time. My e-book does have more narrative and examples for stop word processing: http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html -- Jack Kr

Re: Stopwords in shingles suggester

2015-02-13 Thread O. Klein
I found the issue in Jira https://issues.apache.org/jira/browse/SOLR-6468 O. Klein wrote > With more and more people starting to use the Suggester it seems that > enablePositionIncrements for StopFilterFactory is still needed. > > Not sure why it is being removed from Solr5, but is there a way t

Re: Stopwords in shingles suggester

2015-02-12 Thread O. Klein
With more and more people starting to use the Suggester it seems that enablePositionIncrements for StopFilterFactory is still needed. Not sure why it is being removed from Solr5, but is there a way to keep the functionality beyond lucene 4.3 ? Or can this feature be reinstated? -- View this mes

Re: Stopwords in shingles suggester

2014-10-27 Thread O. Klein
I changed luceneMatchVersion to 4.3 and got the behavior i was looking for. -- View this message in context: http://lucene.472066.n3.nabble.com/Stopwords-in-shingles-suggester-tp4166057p4166192.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Stopwords in shingles suggester

2014-10-27 Thread Ahmet Arslan
Hi, I think you can set fillerToken value? Ahmet On Monday, October 27, 2014 8:03 PM, O. Klein wrote: Thank you all for your input. The stopword is being replaced by the fillerToken as shown in the article. Changing positionIncrementGap makes no difference and as of Solr 4.4, the enablePo

Re: Stopwords in shingles suggester

2014-10-27 Thread O. Klein
Thank you all for your input. The stopword is being replaced by the fillerToken as shown in the article. Changing positionIncrementGap makes no difference and as of Solr 4.4, the enablePositionIncrements argument is no longer supported in the StopFilterFactory. So how do I get this working in S

Re: Stopwords in shingles suggester

2014-10-27 Thread Vikas Agarwal
Is this what you are looking for? Basically, you can use analyzers for this purpose. You can even write your own analyzer. On Mon, Oct 27, 2014 at 6:26 PM, O. Klein wrote: > Is there a way in Solr to filter out

Re: Stopwords in shingles suggester

2014-10-27 Thread Shawn Heisey
On 10/27/2014 6:56 AM, O. Klein wrote: > Is there a way in Solr to filter out stopwords in shingles like ES does? > > http://www.elasticsearch.org/blog/searching-with-shingles/ If I read that correctly, ES isn't doing anything differently than Solr does. They use the same filters that Solr does.

Re: Stopwords in shingles suggester

2014-10-27 Thread Dikshant Shahi
Configure a fieldType in schema.xml as below: .. .. ** Thanks, Dikshant On Mon, Oct 27, 2014 at 6:26 PM, O. Klein wrote: > Is there a way in Solr to filter out stopwords in shingles like ES does? > > http://www.elasticsearch.org/blog/searching-w

RE: Stopwords in shingles suggester

2014-10-27 Thread Markus Jelsma
You do not want stopwords in your shingles? Then put the stopword filter on top of the shingle filter. Markus -Original message- > From:O. Klein > Sent: Monday 27th October 2014 13:56 > To: solr-user@lucene.apache.org > Subject: Stopwords in shingles suggester > > Is there a way in Sol
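The ordering suggested above (stopword filter first, shingle filter second) can be illustrated in plain Python. This is only a sketch of the ordering idea, not Solr's analysis chain: the stopword list and both function names are hypothetical, and the sketch ignores position increments, which is what the fillerToken discussion in this thread is about.

```python
STOPWORDS = {"of", "the", "a"}  # hypothetical list for illustration

def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop stopwords, keeping the remaining tokens in order."""
    return [t for t in tokens if t not in stopwords]

def shingles(tokens, size=2):
    """Build word n-grams (shingles) of the given size."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]

tokens = "bank of america".split()

# Stop filter applied before shingling: no shingle contains a stopword.
result = shingles(remove_stopwords(tokens))

# Shingling the raw tokens keeps the stopword inside the shingles.
unfiltered = shingles(tokens)
```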

Re: Stopwords

2014-06-26 Thread David Stuart
Hi, Not really as the words don’t exist in the corpus field. The way we have got around it in the past is to have another non stopped field that is also searched on (in addition to the stopped field) with a boost to the score for matches. As a slight alternative you could do the above bu
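A minimal sketch of the two-field approach described above, written as a request built in Python. The field names `text_stopped` and `text_exact`, the boost value, and the helper `build_query` are hypothetical; `defType` and `qf` are the usual edismax request parameters.

```python
from urllib.parse import urlencode

def build_query(user_query):
    """Search a stopped field plus a boosted non-stopped copy of the same text."""
    params = {
        "q": user_query,
        "defType": "edismax",
        # Hypothetical field names; the non-stopped field gets the boost.
        "qf": "text_stopped text_exact^2",
    }
    return "/select?" + urlencode(params)

url = build_query("de het")
```

With this layout a query made only of stopwords can still match through the non-stopped field, while ordinary queries get an extra score contribution when they match there too.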

Re: Stopwords

2014-06-26 Thread Shuai Zhang
Hi, In fact, you can use analysis page to check the result of query or index process!   -- Gabriel Zhang On Thursday, June 26, 2014 5:33 PM, Geert Van Huychem wrote: Hello   We have the default dutch stopwords implemented in our Solr instance, so words like ‘de’, ‘het’, ‘ben’ are filt

Re: stopwords issue with edismax

2014-03-04 Thread Jack Krupansky
--Original Message- From: sureshrk19 Sent: Tuesday, March 4, 2014 1:57 PM To: solr-user@lucene.apache.org Subject: Re: stopwords issue with edismax Thanks Jack. I could fix this problem by adding stopwords 'filter' condition in definition for "number" and "all_code"

Re: stopwords issue with edismax

2014-03-04 Thread sureshrk19
Thanks Jack. I could fix this problem by adding stopwords 'filter' condition in definition for "number" and "all_code" -- View this message in context: http://lucene.472066.n3.nabble.com/stopwords-issue-with-edismax-tp4120339p4121176.html Sent from the Solr - User mailing list archive at N

Re: stopwords issue with edismax

2014-03-02 Thread Jack Krupansky
Original Message- From: sureshrk19 Sent: Monday, March 3, 2014 1:05 AM To: solr-user@lucene.apache.org Subject: Re: stopwords issue with edismax Jack, Thanks for the reply. Yes. your observation is right. I see, stopwords are not being ignored at query time. Say, I'm searching f

Re: stopwords issue with edismax

2014-03-02 Thread sureshrk19
Jack, Thanks for the reply. Yes. your observation is right. I see, stopwords are not being ignored at query time. Say, I'm searching for 'bank of america'. I'm expecting 'of' should not be the part of search. But, here I see 'of' is being sent. Same is the query syntax for 'OR' and 'AND' operator

Re: stopwords issue with edismax

2014-02-28 Thread Jack Krupansky
Krupansky -Original Message- From: sureshrk19 Sent: Friday, February 28, 2014 1:12 PM To: solr-user@lucene.apache.org Subject: Re: stopwords issue with edismax Thanks for taking time on this... Here is my request handler definition: edismax explicit 10 all_text nu

Re: stopwords issue with edismax

2014-02-28 Thread sureshrk19
Thanks for taking time on this... Here is my request handler definition: edismax explicit 10 all_text number party name all_code ent_name all_text number^3 name^5 party^3 all_code^2 ent_name^7 id description AND Name which is indexed

Re: stopwords issue with edismax

2014-02-28 Thread Ahmet Arslan
Hi, From the URLs you provided, it is not clear that you use edismax query parser at all. Thats why I asked complete list of parameters. Can you paste request handler definition from solrconfig.xml?  And what do you expect and what is not working for you. On Friday, February 28, 2014 7:30

Re: stopwords issue with edismax

2014-02-28 Thread sureshrk19
explicit For all handlers I have the same setting. Another observation I have is, I'm getting results when I use, 'q.op=OR' the default operator set in solrconfig.xml is 'AND' the query working fine is: http://localhost:8080/solr/collection1/select?q=bank+america&wt=json&indent=true&q.op=OR

Re: stopwords issue with edismax

2014-02-28 Thread Ahmet Arslan
Can you give the parameters defined in the defaults section of the request handler in solrconfig.xml? By the way echoParams=all will list all parameters. On Friday, February 28, 2014 5:18 PM, sureshrk19 wrote: Ahmet, Thanks for the reply.. Here is the query: http://localhost:8080/solr/collection1/select?q

Re: stopwords issue with edismax

2014-02-28 Thread sureshrk19
Ahmet, Thanks for the reply.. Here is the query: http://localhost:8080/solr/collection1/select?q=a+of+b&fq=type%3AEntity&wt=json&indent=true And here is my stopwords_en.txt content a an and are as at be but by for if in into is it no not of on or -- View this message in context: http://l

Re: stopwords issue with edismax

2014-02-28 Thread Ahmet Arslan
Hi Suresh, Can you give us full set of parameters you use for edismax? qf, mm, etc. And content of your stopwords.txt. Is a listed there too? Ahmet On Friday, February 28, 2014 8:54 AM, sureshrk19 wrote: Hi All, I'm having a problem while searching for some string with a word defined in stop

Re: stopwords in solr

2012-11-28 Thread 曹霖
Yep, it is a bad idea to eliminate stopwords during indexing; maybe you can eliminate stopwords during querying. That is flexible. 2012/11/28 Walter Underwood > Eliminating stopwords is generally a bad idea. It means you cannot search > for "vitamin a".

Re: stopwords in solr

2012-11-27 Thread Walter Underwood
Eliminating stopwords is generally a bad idea. It means you cannot search for "vitamin a". Back in the 1970's, search engines eliminated stopwords so they could work on 16-bit machines. That isn't a problem any more. wunder On Nov 27, 2012, at 10:33 PM, Joe Zhang wrote: > that is really stran

Re: stopwords in solr

2012-11-27 Thread Andy Lester
On Nov 28, 2012, at 12:33 AM, Joe Zhang wrote: > that is really strange. so basic stopwords such as "a" "the" are not > eliminated from the index? There is no list of "basic stopwords" anywhere. If you want stop words, you have to put them in the file yourself. There are not really any sensi

Re: stopwords in solr

2012-11-27 Thread Joe Zhang
that is really strange. so basic stopwords such as "a" "the" are not eliminated from the index? On Tue, Nov 27, 2012 at 11:16 PM, 曹霖 wrote: > just no stopwords are considered in that case > > 2012/11/28 Joe Zhang > > > t no stopwords are considered in > > this case > > >

Re: stopwords in solr

2012-11-27 Thread 曹霖
just no stopwords are considered in that case 2012/11/28 Joe Zhang > t no stopwords are considered in > this case >

Re: stopwords as privacy measure

2012-01-10 Thread Michael Lissner
It's a bit of a privacy through obscurity measure, unfortunately. The problem is that American courts do a lousy job of removing social security numbers from cases that I put on my site. I do anonymization before sending the cases to Solr, but if you're clever (and the stopwords weren't in plac

Re: stopwords as privacy measure

2012-01-09 Thread Erik Hatcher
Mike - Indeed users won't be able to *search* for things removed by the stop filter at index time (the terms literally aren't in the index then). But be careful with the stored value. Analysis does not affect stored content. Are you anonymizing before sending to Solr (if so, why stop-word blo
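The distinction made above, that analysis changes what is indexed but not what is stored, can be shown with a toy model. This is a sketch in plain Python, not Lucene's data structures; the stopword list, the function name `index_document`, and the sample text are hypothetical.

```python
STOPWORDS = {"secret"}  # hypothetical "blocked" term

def index_document(doc_id, text, index, stored):
    """Store the raw text, but index only the tokens that survive the stop filter."""
    stored[doc_id] = text  # stored value is exactly what was sent in
    for token in text.lower().split():
        if token not in STOPWORDS:
            index.setdefault(token, set()).add(doc_id)

index, stored = {}, {}
index_document(1, "case mentions secret number", index, stored)

# The term cannot be searched for, because it is not in the index...
searchable = "secret" in index
# ...but it is still present in the stored value that gets returned.
returned = "secret" in stored[1]
```

In this model a stop filter hides the term from search only; anything that must not be shown has to be removed before the document is sent for indexing.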

Re: stopwords as privacy measure

2012-01-08 Thread Michael Lissner
I've got them configured at index and query time, so sounds like I'm all set. I'm doing anonymization of social security numbers, converting them to xxx-xx-. I don't *think* users can find a way of identifying these docs if the stopwords-based block works. Thank you both for the confirma

Re: stopwords as privacy measure

2012-01-08 Thread Gora Mohanty
On Mon, Jan 9, 2012 at 5:03 AM, Michael Lissner wrote: > I have a unique use case where I have words in my corpus that users > shouldn't ever be allowed to search for. My theory is that if I add these to > the stopwords list, that should do the trick. Yes, that should work. Are you including the

Re: stopwords as privacy measure

2012-01-08 Thread Ted Dunning
On Sun, Jan 8, 2012 at 3:33 PM, Michael Lissner < mliss...@michaeljaylissner.com> wrote: > I have a unique use case where I have words in my corpus that users > shouldn't ever be allowed to search for. My theory is that if I add these > to the stopwords list, that should do the trick. > That shou

Re: StopWords coming in Top 10 terms despite using StopFilterFactory

2011-09-23 Thread Shawn Heisey
On 9/23/2011 1:45 AM, Pranav Prakash wrote: Maybe I am wrong. But my intentions of using both of them is - first I want to use phrase queries so used CommonGramsFilterFactory. Secondly, I don't want those stopwords in my index, so I have used StopFilterFactory to remove them. CommonGrams is n

Re: StopWords coming in Top 10 terms despite using StopFilterFactory

2011-09-23 Thread Pranav Prakash
> You've got CommonGramsFilterFactory and StopFilterFactory both using > stopwords.txt, which is a confusing configuration. Normally you'd want one > or the other, not both ... but if you did legitimately have both, you'd want > them to each use a different wordlist. > Maybe I am wrong. But my in

Re: StopWords coming in Top 10 terms despite using StopFilterFactory

2011-09-22 Thread Shawn Heisey
On 9/22/2011 3:54 AM, Pranav Prakash wrote: Hi List, I included StopFilterFactory and I can see it taking action in the Analyzer Interface. However, when I go to Schema Analyzer, I see those stop words in the top 10 terms. Is this normal? You've got CommonGramsFilterFactory and

Re: stopwords not working in multicore setup

2011-03-25 Thread Christopher Bottaro
Ahh, thank you for the hints Martin... German stopwords without Umlaut work correctly. So I'm trying to figure out where the UTF-8 chars are getting messed up. Using the Solr admin web UI, I did a search for title:für and the xml (or json) output in the browser shows the query with the proper enc

Re: stopwords file configuration

2010-11-16 Thread alendo
I'm replying to myself because I found the mistake. The Italian stopwords file that I found on the Apache site contains a shell-style comment on the same line as each stopword; the stopwords tokenizer is probably basic and doesn't accept comments on the same line as a stopword. I dropped them and now
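The fix described above, dropping comments that share a line with a stopword, could be done as a preprocessing step along these lines. The function name `load_stopwords`, the `#` marker, and the sample lines are assumptions for illustration; the sketch makes no claim about which comment styles a given Solr version accepts.

```python
def load_stopwords(lines, comment_marker="#"):
    """Return the set of stopwords, ignoring trailing and full-line comments."""
    words = set()
    for line in lines:
        # Keep only the text before the comment marker, if any.
        word = line.split(comment_marker, 1)[0].strip()
        if word:
            words.add(word)
    return words

# Hypothetical file content with a comment on the same line as a stopword.
sample = ["ad   # trailing comment", "# full-line comment", "", "al"]
stopwords = load_stopwords(sample)
```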

Re: stopwords in AND clauses

2010-09-13 Thread Xavier Noria
On Mon, Sep 13, 2010 at 4:29 PM, Simon Willnauer wrote: > On Mon, Sep 13, 2010 at 3:27 PM, Xavier Noria wrote: >> Let's suppose we have a regular search field body_t, and an internal >> boolean flag flag_t not exposed to the user. >> >> I'd like >> >>    body_t:foo AND flag_t:true > > this is so

Re: stopwords in AND clauses

2010-09-13 Thread Simon Willnauer
On Mon, Sep 13, 2010 at 3:27 PM, Xavier Noria wrote: > Let's suppose we have a regular search field body_t, and an internal > boolean flag flag_t not exposed to the user. > > I'd like > >    body_t:foo AND flag_t:true this is solr right? why don't you use filterquery for your unexposed flag_t fiel
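The filter-query suggestion above can be sketched as a request built in Python: the user's text goes in `q`, and the internal flag goes in `fq` so it never passes through the user-facing query. The helper name `build_select_url` and the relative `/select` path are assumptions; `body_t` and `flag_t` are the field names from the question.

```python
from urllib.parse import urlencode

def build_select_url(user_term):
    """Keep the user's query in q and the internal flag in a separate fq."""
    params = {
        "q": "body_t:" + user_term,
        "fq": "flag_t:true",  # restricts results without affecting scoring
    }
    return "/select?" + urlencode(params)

url = build_select_url("foo")
```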

Re: Stopwords

2010-03-17 Thread Mark Miller
On 03/17/2010 12:03 PM, Robert Muir wrote: On Wed, Mar 17, 2010 at 11:48 AM, Grant Ingersoll wrote: Yes and no. Putting our historian hat on, stop words were often seen as contributing very little to scores and also taking up a lot of room on disk back in the days when disk was very pre

Re: Stopwords

2010-03-17 Thread Robert Muir
On Wed, Mar 17, 2010 at 11:48 AM, Grant Ingersoll wrote: > Yes and no.  Putting our historian hat on, stop words were often seen as > contributing very little to scores and also taking up a lot of room on disk > back in the days when disk was very precious.  Times, as they say, have > changed.

Re: Stopwords

2010-03-17 Thread Grant Ingersoll
On Mar 16, 2010, at 9:51 PM, blargy wrote: > > I was reading "Scaling Lucene and Solr" > (http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr/) > and I came across the section StopWords. > > In there it mentioned that it's not recommended to remove st

Re: Stopwords

2010-03-17 Thread Anthony Serfes
010 11:13 AM To: Subject: Re: Stopwords That discussion cites a paper via a URL: http://doc.rero.ch/lm.php?url#16;00,43,4,20091218142456-GY/Dolamic_Ljiljana__When_Stopword_Lists_Make_the_Difference_20091218.pdf Unfortunately when I go to this URL I get: "L'accès à ce document est limité.

Re: Stopwords

2010-03-17 Thread Glen Newton
That discussion cites a paper via a URL: http://doc.rero.ch/lm.php?url#16;00,43,4,20091218142456-GY/Dolamic_Ljiljana__When_Stopword_Lists_Make_the_Difference_20091218.pdf Unfortunately when I go to this URL I get: "L'accès à ce document est limité." But I tracked down the paper. Here is its refe

Re: Stopwords

2010-03-17 Thread Ahmet Arslan
> I was reading "Scaling Lucen and Solr" > (http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr/) > and I came across the section StopWords. > > In there it mentioned that its not recommended to remove > stop words at index > time. Why is this the cas

Re: Stopwords not working as expected

2010-01-02 Thread Bogdan Vatkov
@Mahout experts: could you please elaborate on that? It seems that I am stopping successfully quite some words with the stopwords mechanism in Solr (I do not get search results when querying with stopwords with the localhost/solr/select interface) but this somehow is not effective when Solr index

Re: Stopwords not working as expected

2010-01-02 Thread Lance Norskog
Fields are both stored and indexed. The stored copy is exactly what you sent in. The index is built with the "text" type's analysis stack and is not stored. This output has the stopwords removed. The output is not stored in one place, but parts of it are scattered around the Lucene index data struc

Re: Stopwords when facetting

2009-07-07 Thread Chris Hostetter
: When indexing or querying text, i'm using the solr.StopFilterFactory ; it seems to works just fine... : : But I want to use the text field as a facet, and get all the commonly : used words in a set of results, without the stopwords. As far as I : tried, I always get stopwords, and numerical

Re: Stopwords when facetting

2009-07-03 Thread Erik Hatcher
Pierre - the field you're faceting must not have the StopFilter applied at indexing time, or the words you want removed aren't in the stop word list file. Erik On Jul 3, 2009, at 5:21 AM, Pierre-Yves LANDRON wrote: Hello, When indexing or querying text, i'm using the solr.StopF

Re: stopwords and phrase queries

2008-03-25 Thread Vinci
Hi, I think Solr allows you to do asymmetric query processing and indexing. (*Not all the preprocessing can be asymmetric - stemming, lowercasing must be symmetric) To make the query work, at least you need to make the stop words to be indexed and then the query should not do the stop word removal

Re: stopwords and phrase queries

2008-03-25 Thread Sean Timm
Music is another domain where this is a real problem. E.g., "The The", "The Who", not to mention the song and album names. -Sean Walter Underwood wrote: We do a similar thing with a no stopword, no stemming field. There are a surprising number of movie titles that are entirely stopwords. "Be

Re: stopwords and phrase queries

2008-03-21 Thread Walter Underwood
We do a similar thing with a no stopword, no stemming field. There are a surprising number of movie titles that are entirely stopwords. "Being There" was the first one I noticed, but "To be and to have" wins the prize for being all-stopwords in two languages. See my list, here: http://wunderwood

RE: stopwords and phrase queries

2008-03-21 Thread Lance Norskog
Yes. Our in-house example is the movie title "The Sound Of Music". Given in quotes as a phrase this will pull up "anystopword Sound anystopword Music". For example, "A Sound With Music". Your example is also a test case of ours. For some Lucenicious reason "six stopwords in a row" does not find an
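The behaviour described above, where a quoted phrase matches any document with other stopwords in the same slots, can be reproduced with a toy position-aware matcher. This is a sketch under stated assumptions, not Lucene's phrase query: the stopword list and the names `analyze` and `phrase_matches` are hypothetical, and the model simply leaves a position gap wherever a stopword was removed.

```python
STOPWORDS = {"the", "of", "a", "with"}  # hypothetical list

def analyze(text):
    """Return (position, token) pairs, leaving gaps where stopwords were."""
    return [(i, t) for i, t in enumerate(text.lower().split())
            if t not in STOPWORDS]

def phrase_matches(query, document):
    """True if the surviving query tokens occur at the same relative positions."""
    q = analyze(query)
    d = analyze(document)
    if not q:
        return False
    first = q[0][0]
    offsets = [(pos - first, tok) for pos, tok in q]
    doc_tokens = set(d)
    return any(
        all((start + off, tok) in doc_tokens for off, tok in offsets)
        for start, _ in d
    )

hit = phrase_matches("The Sound Of Music", "A Sound With Music")
miss = phrase_matches("The Sound Of Music", "Sound Music")
```

In this model the first call matches because only the gap between "sound" and "music" is checked, not which word filled it; the second does not match because the gap is missing.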