I've done that already. All you need to do is create a custom request handler.

Among other things, my handler does the following:

It receives a threshold factor, such as 0.85. The score of the first document returned is assumed to be that of the "best" matching document. Document #30 (this number is configurable), or the last document if fewer than 30 are returned, is taken as the "worst" document.

factor = 0.85 (for example)
bestScore = 1000 (for example)
worstScore = 500 (for example score of the document #30)
Then the handler applies the formula: threshold = bestScore * factor + worstScore * (1 - factor)

In the example, threshold = 1000 * 0.85 + 500 * 0.15 = 925. This means that the documents whose score is above 925 are at least 85% similar to the first document returned, measured on the scale from the worst score to the best.
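
As a rough illustration of the calculation above, here is a minimal Python sketch. The function name score_threshold and the worst_rank parameter are made up for this example and are not part of any Solr API; the sketch also assumes the scores arrive sorted from highest to lowest.

```python
def score_threshold(scores, factor=0.85, worst_rank=30):
    """Interpolate a cutoff between the best and the "worst" score.

    scores: relevance scores sorted from highest to lowest (assumption).
    worst_rank: 1-based rank of the document treated as the worst match;
                the last document is used if fewer were returned.
    """
    if not scores:
        raise ValueError("no scores to compute a threshold from")
    best = scores[0]
    # Fall back to the last document when fewer than worst_rank came back.
    worst = scores[min(worst_rank, len(scores)) - 1]
    return best * factor + worst * (1 - factor)
```

With a best score of 1000, a score of 500 at rank 30, and a factor of 0.85, this returns 925 up to floating-point rounding.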

So we obtain the threshold from the scores of the documents returned. Why 30? Because statistically there is not much difference between using 30, 50 or 100. (This may depend on the number of documents you want to return; in my case it is the best 3 or 4.)

Once we have the threshold, all the handler needs to do is check whether the score of the next document to be included in the result set is above it.
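
The check described above could look roughly like this in Python. The names select_above_threshold and max_results are hypothetical, chosen only for this sketch, and the documents are assumed to be (id, score) pairs already sorted by descending score.

```python
def select_above_threshold(docs, threshold, max_results=4):
    """Collect documents in rank order while their score stays above the threshold.

    docs: iterable of (doc_id, score) pairs sorted by descending score (assumption).
    max_results: upper bound on how many documents to return.
    """
    selected = []
    for doc_id, score in docs:
        # Scores are sorted, so the first one at or below the cutoff ends the scan.
        if score <= threshold or len(selected) >= max_results:
            break
        selected.append(doc_id)
    return selected
```

Because the input is sorted, the loop can stop at the first document that fails the check instead of scanning the whole result list.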

If you need any further help, don't hesitate to ask for it.

Pako



Umar Shah wrote:
Hi,

is there some way of limiting the results  above some fixed threshold?

thanks in anticipation
-umar

