+1. Of course it is doable, but that doesn't mean you should do it,
which is what I was trying to say before (I was typing on my iPod, so
it wasn't fast) and which Walter has now said. It is entirely
conceivable to me that someone could search for a very common word such
that the scores of all relevant (and thus "good") documents are below
your predefined threshold.
At any rate, proceed at your own peril. To implement it, look into
the SearchComponent functionality.
On Feb 11, 2009, at 12:20 PM, Walter Underwood wrote:
Don't bother doing this. It doesn't work.
This seems like a good idea, something that would be useful for
almost every Lucene installation, but it isn't in Lucene because
it does not work in the real world.
A few problems:
* Some users want every match and don't care how many pages of
results they look at.
* Some users are very bad at creating queries that match their
information needs. Others are merely bad, not very bad. The good
matches for their query are on top, but the good matches for
their information need are on the third page.
* Misspellings can put the right match (partial match) at the
bottom. I did this yesterday at my library site, typing
"Katherine Kerr" instead of the correct "Katharine Kerr".
Their search engine showed no matches (grrr), so I had to
search again with "Kerr".
* Most users do not know how to repair their queries, like
I did with "Katherine Kerr", changing it to "Kerr". Even if
they do, you shouldn't make them. Just show the weakly
relevant results.
* Documents have errors, just like queries. I find bad data
on our site about once a month, and we have professional
editors. We still haven't fixed our entry for "Betty Page"
to read "Bettie Page".
* People may use non-title words in the query, like searching
for "batman" when they want "The Dark Knight".
So, don't do this. If you are forced to do it, make sure that you
measure your search quality before and after it is implemented,
because it will get worse. Then you can stop doing it.
wunder
On 2/11/09 8:28 AM, "Cheng Zhang" <zhangyongji...@yahoo.com> wrote:
Just did some research. It seems that it's doable with additional
code added to Solr, but not out of the box. Thank you, Grant.
----- Original Message ----
From: Grant Ingersoll <gsing...@apache.org>
To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Sent: Wednesday, February 11, 2009 8:14:01 AM
Subject: Re: score filter
At what point do you draw the line? 0.01 is too low, but what about
0.5 or 0.3? In fact, there may be queries where 0.01 is relevant.
Relevance is a tricky thing and putting in arbitrary cutoffs is usually
not a good thing. An alternative might be to instead look at the
difference between scores and see if the gap is larger than some delta,
but even that is subject to the vagaries of scoring.
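A minimal sketch of that gap idea in plain Java, assuming the scores
arrive already sorted from highest to lowest. ScoreGapCutoff, keepCount
and the delta value are hypothetical names chosen for illustration;
nothing here is a Solr or Lucene API, and the same caveats about
scoring still apply.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of Solr or Lucene. */
final class ScoreGapCutoff {

    /**
     * Returns how many leading results to keep.
     * Assumes scores are sorted in descending order.
     * Cuts at the first position where the drop between two
     * neighbouring scores is larger than delta; keeps everything
     * if no such drop exists.
     */
    static int keepCount(float[] scores, float delta) {
        for (int i = 1; i < scores.length; i++) {
            if (scores[i - 1] - scores[i] > delta) {
                return i;
            }
        }
        return scores.length;
    }

    public static void main(String[] args) {
        // Made-up scores, only to exercise the method.
        float[] scores = {3.4f, 3.1f, 2.9f, 0.02f, 0.01f};
        int keep = keepCount(scores, 1.0f);
        System.out.println(Arrays.toString(Arrays.copyOf(scores, keep)));
    }
}
```

Note that when every score is low (say 0.03, 0.02, 0.01 for a very
common word) there is no large gap, so nothing is cut. That is the one
advantage over a fixed threshold, which would drop all of them.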
What kind of relevance testing have you done so far to come up with
those values? See also
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Debugging-Relevance-Issues-in-Search/
On Feb 11, 2009, at 10:16, Cheng Zhang <zhangyongji...@yahoo.com> wrote:
Hi Grant,
In my case, for example, I am searching for a book. Some of the
returned documents have high relevance (score > 3), but some documents
with low scores (< 0.01) are useless.
Without a "score filter", I have to go through each document to find
out the number of documents I'm interested in (score > nnn). This
causes some problems for pagination. For example, if I only need to
display the first 10 records, I need to retrieve all 1000 documents to
figure out the number of meaningful documents, which have score > nnn.
Thx,
Kevin
----- Original Message ----
From: Grant Ingersoll <gsing...@apache.org>
To: solr-user@lucene.apache.org
Sent: Wednesday, February 11, 2009 6:47:11 AM
Subject: Re: score filter
What's the motivation for wanting to do this? The reason I ask is that
score is a relative thing determined by Lucene based on your index
statistics. It is only meaningful for comparing the results of a
specific query with a specific instance of the index. In other words,
it isn't useful to filter on b/c there is no way of knowing what a good
cutoff value would be. Also, you won't be able to do score:[1.2 TO *]
because score is not an actual Field.
That being said, you probably could implement a HitCollector at the
Lucene level and somehow hook it into Solr to do what you want. Or, of
course, just stop processing the results in your app after you see a
score below a certain value. Naturally, this still means you have to
retrieve the results.
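As a rough sketch of the "stop processing in your app" option, in plain
Java: the Hit class, takeWhileAtLeast and the minimum score are
hypothetical stand-ins for whatever your client code gets back from
Solr, and the hits are assumed to be sorted by score, highest first.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical client-side helper; not part of Solr or Lucene. */
final class ScoreThreshold {

    /** Stand-in for one search result as seen by the application. */
    static final class Hit {
        final String id;
        final float score;

        Hit(String id, float score) {
            this.id = id;
            this.score = score;
        }
    }

    /**
     * Keeps hits until the first one scoring below minScore, then stops.
     * Assumes hits are sorted by descending score, so everything after
     * the first low score is low as well.
     */
    static List<Hit> takeWhileAtLeast(List<Hit> hits, float minScore) {
        List<Hit> kept = new ArrayList<>();
        for (Hit hit : hits) {
            if (hit.score < minScore) {
                break;
            }
            kept.add(hit);
        }
        return kept;
    }
}
```

This only trims what the application displays. It does not reduce what
has to be fetched, and the cutoff is only meaningful for the one query
and index it was tuned against.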
-Grant
On Feb 10, 2009, at 10:01 PM, Cheng Zhang wrote:
Hello,
Is there a way to set a score filter? I tried "+score:[1.2 TO *]" but
it did not work.
Many thanks,
Kevin
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/