+1. Of course it is doable, but that doesn't mean you should do it, which is what I was trying to say before (I was typing on my iPod, so it wasn't fast) and what Walter has now said. It is entirely conceivable to me that someone could search for a very common word such that the scores of all relevant (and thus "good") documents fall below your predefined threshold.

At any rate, proceed at your own peril. To implement it, look into the SearchComponent functionality.

On Feb 11, 2009, at 12:20 PM, Walter Underwood wrote:

Don't bother doing this. It doesn't work.

This seems like a good idea, something that would be useful for
almost every Lucene installation, but it isn't in Lucene because
it does not work in the real world.

A few problems:

* Some users want every match and don't care how many pages of
results they look at.

* Some users are very bad at creating queries that match their
information needs. Others are merely bad, not very bad. The good
matches for their query are on top, but the good matches for
their information need are on the third page.

* Misspellings can put the right match (partial match) at the
bottom. I did this yesterday at my library site, typing
"Katherine Kerr" instead of the correct "Katharine Kerr".
Their search engine showed no matches (grrr), so I had to
search again with "Kerr".

* Most users do not know how to repair their queries, like
I did with "Katherine Kerr", changing it to "Kerr". Even if
they do, you shouldn't make them. Just show the weakly
relevant results.

* Documents have errors, just like queries. I find bad data
on our site about once a month, and we have professional
editors. We still haven't fixed our entry for "Betty Page"
to read "Bettie Page".

* People may use non-title words in the query, like searching
for "batman" when they want "The Dark Knight".

So, don't do this. If you are forced to do it, make sure that you
measure your search quality before and after it is implemented,
because it will get worse. Then you can stop doing it.
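One minimal way to do that before/after measurement, sketched in Python. This assumes you have a hand-judged set of relevant document ids for each test query; the function name, the document ids, the scores, and the 1.0 cutoff below are all invented for illustration and come from no real index.

```python
from typing import List, Set


def recall(returned_ids: List[str], relevant_ids: Set[str]) -> float:
    """Fraction of the judged-relevant documents that were returned."""
    if not relevant_ids:
        return 0.0
    found = len(set(returned_ids) & relevant_ids)
    return found / len(relevant_ids)


# Hypothetical ranked results for one query: (document id, score).
ranked = [("d1", 3.1), ("d2", 2.8), ("d3", 0.4)]
relevant = {"d1", "d3"}  # hypothetical human judgments

# Recall without any score cutoff.
before = recall([doc_id for doc_id, _ in ranked], relevant)

# Recall after dropping everything scored below an arbitrary cutoff of 1.0.
after = recall([doc_id for doc_id, score in ranked if score >= 1.0], relevant)
```

In this made-up case the weakly scored but relevant "d3" is lost once the cutoff is applied, so recall drops from 1.0 to 0.5. Running the same comparison over a real set of judged queries is what would show whether the filter helps or hurts.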

wunder

On 2/11/09 8:28 AM, "Cheng Zhang" <zhangyongji...@yahoo.com> wrote:

Just did some research. It seems that it's doable with additional code added
to Solr, but not out of the box. Thank you, Grant.



----- Original Message ----
From: Grant Ingersoll <gsing...@apache.org>
To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Sent: Wednesday, February 11, 2009 8:14:01 AM
Subject: Re: score filter

At what point do you draw the line? 0.01 is too low, but what about 0.5 or
0.3?  In fact, there may be queries where 0.01 is relevant.

Relevance is a tricky thing and putting in arbitrary cutoffs is usually not a good thing. An alternative might be to instead look at the difference between scores and see if the gap is larger than some delta, but even that is subject
to the vagaries of scoring.
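A rough sketch of the gap idea, in Python and outside of Lucene or Solr entirely. It assumes the scores are already sorted in descending order; the function name and the delta value are hypothetical, and choosing delta is just as arbitrary as choosing a fixed cutoff.

```python
from typing import List


def cut_at_gap(scores: List[float], delta: float) -> List[float]:
    """Keep the leading scores up to the first drop larger than delta.

    `scores` must be sorted in descending order, as in a ranked result list.
    """
    kept = scores[:1]
    for prev, cur in zip(scores, scores[1:]):
        if prev - cur > delta:
            break  # first large gap: everything after it is dropped
        kept.append(cur)
    return kept
```

With scores of 3.2, 3.0, 2.9, 0.4, 0.3 and a delta of 1.0, this keeps the first three. With uniformly tiny scores (say 0.02, 0.015, 0.01) it keeps everything, which a fixed threshold would not.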

What kind of relevance testing have you done so far to come up with those
values?  See also
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Debugging-Relevance-Issues-in-Search/


On Feb 11, 2009, at 10:16, Cheng Zhang <zhangyongji...@yahoo.com> wrote:

Hi Grant,

In my case, for example, I am searching for a book. Some of the returned documents have high relevance (score > 3), but some documents with a low score (< 0.01)
are useless.

Without a "score filter", I have to go through each document to find out the number of documents I'm interested in (score > nnn). This causes a problem for pagination. For example, if I only need to display the first 10 records, I need to retrieve all 1000 documents to figure out the number of meaningful
documents, i.e. those with score > nnn.

Thx,
Kevin




----- Original Message ----
From: Grant Ingersoll <gsing...@apache.org>
To: solr-user@lucene.apache.org
Sent: Wednesday, February 11, 2009 6:47:11 AM
Subject: Re: score filter

What's the motivation for wanting to do this? The reason I ask is that score is a relative thing determined by Lucene based on your index statistics. It is only meaningful for comparing the results of a specific query with a specific instance of the index. In other words, it isn't useful to filter on because there is no way of knowing what a good cutoff value would be. So, you won't be able to do score:[1.2 TO *], because score is not an actual Field.

That being said, you probably could implement a HitCollector at the Lucene level and somehow hook it into Solr to do what you want. Or, of course, just stop processing the results in your app after you see a score below a certain value. Naturally, this still means you have to retrieve the results.
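For the second option (stopping in the application), a minimal client-side sketch in Python. It assumes the hits arrive sorted by descending score and are represented as (document id, score) pairs; that shape and the function name are hypothetical and do not correspond to any Solr client API.

```python
from itertools import takewhile
from typing import Iterable, List, Tuple

Hit = Tuple[str, float]  # (document id, score); a hypothetical shape


def hits_above(hits: Iterable[Hit], min_score: float) -> List[Hit]:
    """Consume score-sorted hits and stop at the first one below min_score."""
    return list(takewhile(lambda hit: hit[1] >= min_score, hits))
```

Because the input is sorted, processing can stop at the first low score instead of scanning the whole list, but the results still have to be fetched from the server first.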

-Grant


On Feb 10, 2009, at 10:01 PM, Cheng Zhang wrote:

Hello,

Is there a way to set a score filter? I tried "+score:[1.2 TO *]" but it did
not work.

Many thanks,
Kevin


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
Solr/Lucene:
http://www.lucidimagination.com/search


