Don't bother doing this. It doesn't work. This seems like a good idea, something that would be useful for almost every Lucene installation, but it isn't in Lucene because it does not work in the real world.
A few problems:

* Some users want every match and don't care how many pages of results they look at.
* Some users are very bad at creating queries that match their information needs. Others are merely bad, not very bad. The good matches for their query are on top, but the good matches for their information need are on the third page.
* Misspellings can put the right match (partial match) at the bottom. I did this yesterday at my library site, typing "Katherine Kerr" instead of the correct "Katharine Kerr". Their search engine showed no matches (grrr), so I had to search again with "Kerr".
* Most users do not know how to repair their queries, like I did with "Katherine Kerr", changing it to "Kerr". Even if they do, you shouldn't make them. Just show the weakly relevant results.
* Documents have errors, just like queries. I find bad data on our site about once a month, and we have professional editors. We still haven't fixed our entry for "Betty Page" to read "Bettie Page".
* People may use non-title words in the query, like searching for "batman" when they want "The Dark Knight".

So, don't do this. If you are forced to do it, make sure that you measure your search quality before and after it is implemented, because it will get worse. Then you can stop doing it.

wunder

On 2/11/09 8:28 AM, "Cheng Zhang" <zhangyongji...@yahoo.com> wrote:

> Just did some research. It seems that it's doable with additional code added
> to Solr but not out of the box. Thank you, Grant.
>
> ----- Original Message ----
> From: Grant Ingersoll <gsing...@apache.org>
> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> Sent: Wednesday, February 11, 2009 8:14:01 AM
> Subject: Re: score filter
>
> At what point do you draw the line? 0.01 is too low, but what about 0.5 or
> 0.3? In fact, there may be queries where 0.01 is relevant.
>
> Relevance is a tricky thing and putting in arbitrary cutoffs is usually not a
> good thing.
> An alternative might be to instead look at the difference between
> scores and see if the gap is larger than some delta, but even that is subject
> to the vagaries of scoring.
>
> What kind of relevance testing have you done so far to come up with those
> values? See also
> http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Debugging-Relevance-Issues-in-Search/
>
>
> On Feb 11, 2009, at 10:16, Cheng Zhang <zhangyongji...@yahoo.com> wrote:
>
>> Hi Grant,
>>
>> In my case, for example, searching for a book: some of the returned documents
>> have high relevance (score > 3), but some of the documents with low scores
>> (<0.01) are useless.
>>
>> Without a "score filter", I have to go through each document to find out the
>> number of documents I'm interested in (score > nnn). This causes problems
>> for pagination. For example, if I only need to display the first 10 records, I
>> need to retrieve all 1000 documents to figure out the number of meaningful
>> documents which have score > nnn.
>>
>> Thx,
>> Kevin
>>
>> ----- Original Message ----
>> From: Grant Ingersoll <gsing...@apache.org>
>> To: solr-user@lucene.apache.org
>> Sent: Wednesday, February 11, 2009 6:47:11 AM
>> Subject: Re: score filter
>>
>> What's the motivation for wanting to do this? The reason I ask is that score is
>> a relative thing determined by Lucene based on your index statistics. It is
>> only meaningful for comparing the results of a specific query with a specific
>> instance of the index. In other words, it isn't useful to filter on b/c
>> there is no way of knowing what a good cutoff value would be. So, you won't
>> be able to do score:[1.2 TO *] because score is not an actual Field.
>>
>> That being said, you probably could implement a HitCollector at the Lucene
>> level and somehow hook it into Solr to do what you want. Or, of course, just
>> stop processing the results in your app after you see a score below a certain
>> value.
>> Naturally, this still means you have to retrieve the results.
>>
>> -Grant
>>
>>
>> On Feb 10, 2009, at 10:01 PM, Cheng Zhang wrote:
>>
>>> Hello,
>>>
>>> Is there a way to set a score filter? I tried "+score:[1.2 TO *]" but it did
>>> not work.
>>>
>>> Many thanks,
>>> Kevin
>>>
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>> Solr/Lucene:
>> http://www.lucidimagination.com/search
>
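For illustration only, here is a minimal sketch of the two application-side cutoffs Grant mentions in this thread: stopping once a score falls below a fixed threshold, and cutting where the drop between adjacent scores exceeds some delta. It assumes the application has already retrieved the scores for one query, sorted highest first. The class name, method names, sample scores, and the 0.01 and 1.0 values are hypothetical and are not part of Lucene or Solr. As Grant points out, this still means retrieving the results, and wunder's advice above about measuring search quality before and after applies to either approach.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, not a Lucene or Solr API. Operates on scores an
 * application has already retrieved for a single query, sorted from
 * highest to lowest. Both cutoffs are arbitrary by construction.
 */
public class ScoreCutoff {

    /**
     * Fixed-threshold cutoff: stop processing once a score drops below
     * minScore. Returns how many leading results are kept.
     */
    public static int keepAboveThreshold(float[] sortedScores, float minScore) {
        int kept = 0;
        for (float score : sortedScores) {
            if (score < minScore) {
                break; // scores are sorted, so every later score is lower too
            }
            kept++;
        }
        return kept;
    }

    /**
     * Gap-based cutoff: cut the list at the first place where the drop
     * between two adjacent scores is larger than maxGap. Returns how many
     * leading results are kept (all of them if no such gap exists).
     */
    public static int keepBeforeGap(float[] sortedScores, float maxGap) {
        for (int i = 1; i < sortedScores.length; i++) {
            if (sortedScores[i - 1] - sortedScores[i] > maxGap) {
                return i; // keep results 0..i-1, drop everything after the gap
            }
        }
        return sortedScores.length;
    }

    public static void main(String[] args) {
        // Made-up scores for one query, highest first.
        float[] scores = {3.4f, 3.1f, 2.9f, 0.8f, 0.6f, 0.009f, 0.005f};

        int byThreshold = keepAboveThreshold(scores, 0.01f);
        int byGap = keepBeforeGap(scores, 1.0f);

        System.out.println("threshold 0.01 keeps " + byThreshold + ": "
                + Arrays.toString(Arrays.copyOf(scores, byThreshold)));
        System.out.println("gap > 1.0 keeps " + byGap + ": "
                + Arrays.toString(Arrays.copyOf(scores, byGap)));
    }
}
```

With these made-up scores, the threshold keeps the first five results while the gap rule keeps only the first three. The two rules disagree on the same list, which is the thread's point: neither number means anything outside one query against one index.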