Re: Filtering results by minimum relevancy score

Erick Erickson Mon, 10 Apr 2017 09:06:01 -0700

Well, that's rather the point, the low-scoring docs aren't unrelated,
someone just thinks they are.

Flippancy aside, the score is, as you've researched, a bad gauge.
Since Lucene only learns a doc's score by computing it during
collection, at any point in the collection process you may get a doc
that scores 10x the previous top score. Or 1/10 of the previous low
score.

Point being that until the complete list is assembled, you really
can't say much about any particular document.
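
To make that concrete, here's a toy sketch in plain Python (not Lucene
code). The scores and the 50% cutoff are made up for illustration; all
it shows is that a cutoff relative to the top score gives one answer
while docs are still arriving and a different one once the list is
complete.

```python
# Hypothetical scores, in the order a collector might happen to see them.
scores = [1.2, 0.9, 1.0, 12.0, 0.1]


def kept_while_streaming(scores, fraction=0.5):
    """Keep a doc if it scores at least `fraction` of the best score seen so far."""
    kept, top = [], 0.0
    for s in scores:
        top = max(top, s)
        if s >= fraction * top:
            kept.append(s)
    return kept


def kept_after_collection(scores, fraction=0.5):
    """Apply the same cutoff once the real top score is known."""
    top = max(scores)
    return [s for s in scores if s >= fraction * top]


streamed = kept_while_streaming(scores)   # [1.2, 0.9, 1.0, 12.0]
final = kept_after_collection(scores)     # [12.0]
```

The first three docs look fine until the 12.0 shows up, at which point
they'd all fall under the cutoff.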

I think it's just a bad idea to try to use _score_ for this. Rather,
refine how you query to reduce the number of unrelated documents.

Of course then someone will complain that "there are docs I know that
should be returned that aren't."

You mentioned trying to use the score in a filter query. How would
that work? You don't know on the way in whether the top scoring doc
will be 100 or 1. Even a normalized score can't be computed until you
know the min/max, which you don't know until the last doc is scored.
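
Here's the normalization problem as a small Python sketch. It assumes
plain min/max scaling to [0, 1], and the helper name and scores are
invented; this is not how Solr does anything internally.

```python
def min_max_normalize(scores):
    """Scale scores to [0, 1]. Needs every score before any can be scaled."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores equal: treat every doc as a full match.
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


# After two docs, the doc scoring 4.0 looks like a perfect match...
partial = min_max_normalize([2.0, 4.0])          # [0.0, 1.0]
# ...but one more doc drops its normalized score to 0.25.
complete = min_max_normalize([2.0, 4.0, 10.0])   # [0.0, 0.25, 1.0]
```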

This is the inescapable tension between precision and recall. In
essence, you're being asked to increase precision at the expense of
recall (i.e. return fewer documents that are "more relevant"). The
best way to do that is to refine the query.
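
If it helps when explaining the tradeoff to whoever is asking, here's
a toy calculation in Python. The doc ids and the "relevant" set are
hypothetical, and in practice you'd need human judgments to build
that set.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for a result list against a judged-relevant set."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: these four docs are the ones users actually want.
relevant = {"d1", "d2", "d3", "d4"}

# Broad query: finds everything relevant, plus four unrelated docs.
broad = precision_recall(
    ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"], relevant
)  # (0.5, 1.0)

# Tighter query: nothing unrelated, but half the relevant docs are gone.
narrow = precision_recall(["d1", "d2"], relevant)  # (1.0, 0.5)
```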

Of course one option is to just count on people getting tired of paging.

Best,
er...@notverymuchhelp.com

On Mon, Apr 10, 2017 at 7:59 AM, David Kramer <david.kra...@shoebuy.com> wrote:
> I’ve done quite a bit of searching on this.  Pretty much every page I find 
> says it’s a bad idea and won’t work well, but I’ve been asked to at least try 
> it to reduce the number of completely unrelated results returned.  We are not 
> trying to normalize the number, or display it as a percentage, and I 
> understand why those are not mathematically sound.  We are relying on Solr 
> for pagination, so we can’t just filter out low scores from the results.
>
> I had assumed that you could use score in the filter query, but that doesn’t 
> appear to be the case.  Is there a special way to reference it, or is there 
> another way to attack the problem?  It seems like something that should be 
> allowed and possible.
>
> Thanks.
