Re: Filtering results by minimum relevancy score

2017-04-13 Thread Walter Underwood
BM25 came out of work on probabilistic engines, but using BM25 in Solr doesn’t automatically make it probabilistic. I read a paper once that showed the two models are not that different, maybe by Karen Spärck Jones. Still, even with a probabilistic model, relevance cutoffs don’t work. It is st

Re: Filtering results by minimum relevancy score

2017-04-13 Thread alessandro.benedetti
Hi Koji, strictly speaking about TF-IDF (and BM25, which is an evolution of that approach), I would say it is a weighting function/numerical statistic that can be used for ranking functions and is based on probabilistic concepts (such as IDF), but it is not a probabilistic function [1]. Indeed a BM25

Re: Filtering results by minimum relevancy score

2017-04-12 Thread Koji Sekiguchi
Hi Walter, May I ask a tangential question? I'm curious about the following line you wrote: > Solr is a vector-space engine. Some early engines (Verity VDK) were probabilistic engines. Those do give an absolute estimate of the relevance of each hit. Unfortunately, the relevance of results is just no

Re: Filtering results by minimum relevancy score

2017-04-12 Thread David Kramer
Thank you! That worked. From: Ahmet Arslan Date: Wednesday, April 12, 2017 at 3:15 PM To: "solr-user@lucene.apache.org" , David Kramer Subject: Re: Filtering results by minimum relevancy score Hi, I cannot find it. However it should be something like q=hello&fq={!frange

Re: Filtering results by minimum relevancy score

2017-04-12 Thread Doug Turnbull
David, I think it can be done, but a score has no real *meaning* to your domain other than the one you engineer into it. There's no 1-100 scale that guarantees at 100 that your users will love the results. Solr isn't really a turnkey solution. It requires you to understand more deeply what relevan

Re: Filtering results by minimum relevancy score

2017-04-12 Thread Ahmet Arslan
Hi, I cannot find it. However it should be something like  q=hello&fq={!frange l=0.5}query($q) Ahmet On Wednesday, April 12, 2017, 10:07:54 PM GMT+3, Ahmet Arslan wrote: Hi David, A function query named "query" returns the score for the given subquery.  Combined with frange query parser this is
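A minimal sketch of how a client might assemble the filter above, in Python. Only the q/fq syntax comes from the post; the helper name and the 0.5 threshold are illustrative assumptions. As others in this thread point out, the raw score has no absolute meaning, so any cutoff value is specific to the query and the index.

```python
from urllib.parse import urlencode


def min_score_params(user_query, min_score):
    """Build Solr request parameters that drop hits scoring below min_score.

    Hypothetical helper: only the fq syntax is taken from the post above.
    """
    return {
        "q": user_query,
        # {!frange l=...} keeps documents whose function value is at least l
        # (the lower bound is inclusive by default);
        # query($q) evaluates to the score of the main query.
        "fq": "{!frange l=%s}query($q)" % min_score,
    }


params = min_score_params("hello", 0.5)
# urlencode percent-encodes the braces, '$' and the space for use in a URL.
query_string = urlencode(params)
```

Passing the parameters through urlencode matters in practice, because the braces, the dollar sign and the space in the local-params syntax are not safe to paste into a URL unescaped.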

Re: Filtering results by minimum relevancy score

2017-04-12 Thread Walter Underwood
Fine. It can’t be done. If it was easy, Solr/Lucene would already have the feature, right? Solr is a vector-space engine. Some early engines (Verity VDK) were probabilistic engines. Those do give an absolute estimate of the relevance of each hit. Unfortunately, the relevance of results is just

Re: Filtering results by minimum relevancy score

2017-04-12 Thread Ahmet Arslan
Hi David, A function query named "query" returns the score for the given subquery. Combined with the frange query parser this is possible. I tried it in the past. I am searching for the original post. I think it was Yonik's post. https://cwiki.apache.org/confluence/display/solr/Function+Queries Ahmet

Re: Filtering results by minimum relevancy score

2017-04-12 Thread David Kramer
The idea is to not return poorly matching results, not to limit the number of results returned. One query may have hundreds of excellent matches and another query may have 7. So cutting off by the number of results is trivial but not useful. Again, we are not doing this for performance reasons

Re: Filtering results by minimum relevancy score

2017-04-12 Thread Erick Erickson
Well, just because ES has it doesn't mean it's A Good Thing. IMO, it's just a "feel good" kind of thing for people who don't really understand scoring. From that page: "Note, most times, this does not make much sense, but is provided for advanced use cases." I've written enough weasel-worded cav

Re: Filtering results by minimum relevancy score

2017-04-12 Thread Shawn Heisey
On 4/10/2017 8:59 AM, David Kramer wrote: > I’ve done quite a bit of searching on this. Pretty much every page I > find says it’s a bad idea and won’t work well, but I’ve been asked to > at least try it to reduce the number of completely unrelated results > returned. We are not trying to normalize

Re: Filtering results by minimum relevancy score

2017-04-12 Thread Dorian Hoxha
@alessandro Elastic-search has it: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-min-score.html On Wed, Apr 12, 2017 at 1:49 PM, alessandro.benedetti wrote: > I am not completely sure that the potential benefit of merging less docs in > sharded pagination overcom

Re: Filtering results by minimum relevancy score

2017-04-12 Thread alessandro.benedetti
I am not completely sure that the potential benefit of merging fewer docs in sharded pagination outweighs the additional time needed to apply the filtering function query. I would need to investigate the frange internals in more detail. Cheers - --- Alessandro Benedetti Search C

Re: Filtering results by minimum relevancy score

2017-04-11 Thread Dorian Hoxha
Can't the filter be used in cases when you're paginating in sharded-scenario ? So if you do limit=10, offset=10, each shard will return 20 docs ? While if you do limit=10, _score<=last_page.min_score, then each shard will return 10 docs ? (they will still score all docs, but merging will be faster)
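The arithmetic in the question above can be illustrated with a toy simulation in Python. This is a sketch under stated assumptions, not how Solr merges shard responses: shards are plain lists of scores, the function names are hypothetical, scores are assumed distinct, and the cutoff uses a strict less-than so the last hit of the previous page is not returned again (with tied scores a tiebreaker would be needed).

```python
import heapq
from itertools import islice


def page_by_offset(shards, limit, offset):
    """Each shard must return its top (offset + limit) scores."""
    per_shard = [sorted(s, reverse=True)[:offset + limit] for s in shards]
    merged = heapq.merge(*per_shard, reverse=True)
    page = list(islice(merged, offset, offset + limit))
    return page, sum(len(p) for p in per_shard)


def page_by_score_cutoff(shards, limit, last_min_score):
    """Each shard returns only its top `limit` scores below the cutoff."""
    per_shard = [
        sorted((x for x in s if x < last_min_score), reverse=True)[:limit]
        for s in shards
    ]
    merged = heapq.merge(*per_shard, reverse=True)
    page = list(islice(merged, limit))
    return page, sum(len(p) for p in per_shard)


# Two toy shards; page 1 with limit=2 would be [9.0, 8.0].
shards = [[9.0, 7.0, 5.0, 3.0, 1.0], [8.0, 6.0, 4.0, 2.0, 0.5]]
offset_page, offset_merged = page_by_offset(shards, limit=2, offset=2)
cutoff_page, cutoff_merged = page_by_score_cutoff(shards, limit=2, last_min_score=8.0)
```

Both calls produce the same second page, but the offset variant merges offset + limit entries per shard while the cutoff variant merges only limit entries per shard. Every document is still scored on every shard in both cases.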

Re: Filtering results by minimum relevancy score

2017-04-11 Thread alessandro.benedetti
Can I ask what the final requirement is here? What are you trying to do? - Just display fewer results? You can easily do that at search client time, cutting after a certain amount - Make search faster by returning fewer results? This is not going to work, as you need to score all of them as Erick expla

Re: Filtering results by minimum relevancy score

2017-04-10 Thread Mikhail Khludnev
Here we go https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-FunctionRangeQueryParser . But it's 100% YAGNI. You'd better tweak search to be more precise. On Mon, Apr 10, 2017 at 7:12 PM, Ahmet Arslan wrote: > Hi, > I remember that this is possible via frange query pars

Re: Filtering results by minimum relevancy score

2017-04-10 Thread Ahmet Arslan
Hi, I remember that this is possible via the frange query parser. But I don't have the query string at hand. Ahmet On Monday, April 10, 2017, 9:00:09 PM GMT+3, David Kramer wrote: I’ve done quite a bit of searching on this. Pretty much every page I find says it’s a bad idea and won’t work well, but

Re: Filtering results by minimum relevancy score

2017-04-10 Thread Erick Erickson
Well, that's rather the point, the low-scoring docs aren't unrelated, someone just thinks they are. Flippancy aside, the score is, as you've researched, a bad gauge. Since Lucene has to compute the score of a doc before it knows the score, at any point in the collection process you may get a doc t

Re: Filtering results based on a set of values for a field

2011-08-19 Thread Erick Erickson
Good luck, and let us know what the results are. About dropping the cache.. That shouldn't be a problem, it should just be computed when your component is called the first time, so starting the server (or opening a new searcher) should re-compute it. Your filters shouldn't be very big, just maxDoc

Re: Filtering results based on a set of values for a field

2011-08-18 Thread Tomas Zerolo
On Thu, Aug 18, 2011 at 02:32:48PM -0400, Erick Erickson wrote: > Hmmm, I'm still not getting it... > > You have one or more lists. These lists change once a month or so. Are > you trying > to include or exclude the documents in these lists? In our specific case to include *only* the documents ha

Re: Filtering results based on a set of values for a field

2011-08-18 Thread Erick Erickson
Hmmm, I'm still not getting it... You have one or more lists. These lists change once a month or so. Are you trying to include or exclude the documents in these lists? And do the authors you want to include or exclude change on a per-query basis or would you be all set if you just had a filter tha

Re: Filtering results based on a set of values for a field

2011-08-18 Thread Tomas Zerolo
On Thu, Aug 18, 2011 at 08:36:08AM -0400, Erick Erickson wrote: > How does this list of authors get selected? The reason I'm asking is > I'm wondering > if you can "define the problem away". In other words, I'm wondering if this > is an XY problem (http://people.apache.org/~hossman/#xyproblem). :-

Re: Filtering results based on a set of values for a field

2011-08-18 Thread Erick Erickson
How does this list of authors get selected? The reason I'm asking is I'm wondering if you can "define the problem away". In other words, I'm wondering if this is an XY problem (http://people.apache.org/~hossman/#xyproblem). I can't imagine you expect a user to specify up to 2k authors, so there mu

Re: Filtering results based on a set of values for a field

2011-08-17 Thread Tomas Zerolo
On Tue, Aug 16, 2011 at 07:56:51AM +, tomas.zer...@axelspringer.de wrote: > Hello, Solrs > > we are trying to filter out documents written by (one or more of) the authors > from > a mediumish list (~2K). The document set itself is in the millions. [...] Sorry. Forgot to say that we are usin

Re: Filtering results based on score

2010-11-01 Thread Ahmet Arslan
> As part of the Solr results I am able to get the max score. If I > want to filter > the results based on the max score, let's say the max > score is 10 and I need > only the results between the max score and 50% of the max > score. This max score is > going to change dynamically. How can we implement this? Do we
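The reply is cut off in this archive, so the following is not necessarily what was recommended there. It is only a minimal client-side sketch in Python of the requirement as stated in the question: keep hits whose score is at least a given fraction of the top score. The document layout (dicts with a "score" key) and the function name are assumptions.

```python
def keep_top_fraction(docs, fraction=0.5):
    """Keep docs scoring at least `fraction` of the best score in `docs`."""
    if not docs:
        return []
    max_score = max(d["score"] for d in docs)
    cutoff = max_score * fraction
    return [d for d in docs if d["score"] >= cutoff]


hits = [
    {"id": 1, "score": 10.0},
    {"id": 2, "score": 5.0},
    {"id": 3, "score": 4.9},
]
kept_ids = [d["id"] for d in keep_top_fraction(hits, 0.5)]
```

Because the cutoff is recomputed from each response, it follows the max score as it changes from query to query. It can only filter the rows that were actually fetched, so the requested page must be large enough to reach below the cutoff.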

Re: Filtering results

2010-02-04 Thread Ahmet Arslan
> Hi > I want to add a filter to my query which takes documents > whose "city" > field has either Bangalore or Cochin or Bombay. How do I do > this? > > fq=city:bangalore&fq=city:bombay&fq=city:cochin > will take the > intersection. I need the union. fq=city:(bangalore OR cochin OR bombay) sam
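Separate fq parameters are intersected, so the union has to be expressed inside a single fq, as in the answer above. A small Python sketch of building that value from a list; the helper name is hypothetical, and quoting values that contain whitespace is an added assumption rather than something stated in the post.

```python
def union_filter(field, values):
    """Return an fq value matching docs whose field equals any of values."""
    # Quote values containing whitespace so each is treated as one phrase.
    terms = ['"%s"' % v if " " in v else v for v in values]
    return "%s:(%s)" % (field, " OR ".join(terms))


fq = union_filter("city", ["bangalore", "cochin", "bombay"])
```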

Re: Filtering results

2008-09-18 Thread ristretto . rb
> To: solr-user@lucene.apache.org >> Sent: Thursday, September 18, 2008 7:35:43 PM >> Subject: Re: Filtering results >> >> Otis, >> >> Would be reasonable to run a query like this >> >> http://localhost:8280/solr/select/?q=terms_x&version=2.2&

Re: Filtering results

2008-09-18 Thread Otis Gospodnetic
-- Lucene - Solr - Nutch - Original Message > From: ristretto.rb <[EMAIL PROTECTED]> > To: solr-user@lucene.apache.org > Sent: Thursday, September 18, 2008 7:35:43 PM > Subject: Re: Filtering results > > Otis, > > Would be reasonable to run a query like thi

Re: Filtering results

2008-09-18 Thread ristretto . rb
l requirements, but running >> 1+10 queries doesn't sound good to me from scalability/performance point of >> view. >> >> Otis >> -- >> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch >> >> >> >> - Original Mess

Re: Filtering results

2008-09-16 Thread Gene Campbell
1+10 queries doesn't sound good to me from scalability/performance point of > view. > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message >> From: ristretto.rb <[EMAIL PROTECTED]> >> To: solr-user

Re: Filtering results

2008-09-16 Thread Otis Gospodnetic
- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: ristretto.rb <[EMAIL PROTECTED]> > To: solr-user@lucene.apache.org > Sent: Tuesday, September 16, 2008 6:45:02 PM > Subject: Re: Filtering results > > thanks. very interesting

Re: Filtering results

2008-09-16 Thread ristretto . rb
Thanks. Very interesting. The plot thickens. And, yes, I think field collapsing is exactly what I'm after. I am now considering trying this patch. I have a Solr 1.2 instance on Jetty. It looks like I need to install the patch. Does anyone use that patch? Recommend it? The wiki page (http:/

Re: Filtering results

2008-09-16 Thread Chris Hostetter
: 1. Identify all records that would match search terms. (Suppose I : search for 'dog', and get 450,000 matches) : 2. Of those records, find the distinct list of groups over all the : matches. (Suppose there are 300.) : 3. Now get the top ranked record from each group, as if you search : just
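The three quoted steps can be sketched on the client side in Python. This only illustrates the desired outcome, one top-ranked record per group, and is not how the field collapsing patch works internally. The record layout (dicts with "group" and "score" keys) and the function name are assumptions.

```python
def top_record_per_group(matches):
    """Return the best-scoring record of each group, best group first."""
    best = {}
    for doc in matches:  # step 1: iterate over all matching records
        current = best.get(doc["group"])  # step 2: one slot per distinct group
        if current is None or doc["score"] > current["score"]:
            best[doc["group"]] = doc  # step 3: keep the top-ranked record
    return sorted(best.values(), key=lambda d: d["score"], reverse=True)


matches = [
    {"id": "a1", "group": "A", "score": 2.0},
    {"id": "b1", "group": "B", "score": 3.5},
    {"id": "a2", "group": "A", "score": 4.0},
    {"id": "c1", "group": "C", "score": 1.0},
]
top_ids = [d["id"] for d in top_record_per_group(matches)]
```

Doing this outside the engine means pulling every match to the client, which is impractical at the sizes mentioned in the thread; the sketch is only meant to pin down what result is being asked for.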

Re: Filtering results

2008-09-16 Thread Gene Campbell
Thanks for the reply, Erik. Sorry for being vague. To be clear, we have 1-2 million records and roughly 12000-14000 groups. Each record is in one and only one group. I see it working something like this: 1. Identify all records that would match search terms. (Suppose I search for 'dog', and get 45

Re: Filtering results

2008-09-16 Thread Erik Hatcher
Personally, I'd send three requests to Solr, one for each group. &rows=1&fq=category:A ... and so on. But that'd depend on how many groups you have. One can always hack custom request handlers to do this sort of thing all as a single request, but I'd guess it ain't that much slower to ju
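A minimal Python sketch of the one-request-per-group idea above. Only rows=1 and the fq=category:A pattern come from the post; the function name, the example query and the group names B and C are illustrative assumptions.

```python
from urllib.parse import urlencode


def per_group_queries(user_query, field, groups):
    """Build one query string per group, each asking for the single top hit."""
    return [
        urlencode([("q", user_query), ("rows", 1), ("fq", "%s:%s" % (field, g))])
        for g in groups
    ]


queries = per_group_queries("dog", "category", ["A", "B", "C"])
```

The number of requests grows linearly with the number of groups, which is why this only suits a handful of groups.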