Hello Franck,

I've had the same issue in the past.

I addressed that by adding a random value to each document.
I then use this value in the "bf" (boost function) parameter, so that the random value slightly alters each document's score.

This results in a natural shuffling of documents which had the same score before.
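As a rough illustration of the approach above, here is a minimal Python sketch. The field name "random_boost", the helper names and the 0.01 weight are all assumptions made up for this example, not values from my actual setup; the right weight depends on how large your natural scores are.

```python
import random


def add_random_boost(docs, field="random_boost", rng=None):
    """Index-time step: attach a random value in [0, 1) to every document.

    'random_boost' is a hypothetical field name; it would have to exist
    in the schema as a numeric field.
    """
    rng = rng or random.Random()
    for doc in docs:
        doc[field] = rng.random()
    return docs


def build_query_params(user_query, field="random_boost", weight=0.01):
    """Query-time step: dismax parameters with an additive random boost.

    product(field, weight) keeps the random contribution small, so it
    mostly reorders documents whose natural scores are (nearly) equal.
    """
    return {
        "q": user_query,
        "defType": "dismax",
        "bf": f"product({field},{weight})",
    }


docs = add_random_boost([{"id": "1"}, {"id": "2"}], rng=random.Random(0))
params = build_query_params("solr")
```

The resulting dict can be passed to whatever HTTP client you use to query Solr.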

I think you can also use a random sort field (see http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html). A random sort field gives a unique random value to each doc per requested field name, i.e. random_1234() gives a different distribution of random values than random_4321(). This can be helpful to give documents a different random value without reindexing everything. Additionally, you can change the random_call() every day to make sure the results order changes from time to time, but not at each query :-)
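To illustrate the "change it every day" idea, here is a small Python sketch that derives the random field name from the current date and uses it as a tie breaker in the sort parameter. It assumes the schema declares a dynamic field "random_*" backed by RandomSortField; the helper name and the date-based suffix are just choices made for this example.

```python
from datetime import date


def daily_random_sort(day=None, prefix="random_"):
    """Build a sort clause whose random tie breaker changes once per day.

    Assumes a dynamic field 'random_*' of type RandomSortField exists
    in the schema (an assumption for this sketch).
    """
    day = day or date.today()
    seed = day.strftime("%Y%m%d")
    # Sort by relevance first, then by the day's random ordering.
    return f"score desc, {prefix}{seed} asc"


sort_param = daily_random_sort(date(2012, 3, 21))
```

Note that a sort tie breaker like this only reorders documents whose scores are exactly equal.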

The only reason why I chose not to use random sort fields is specific to my case: I needed to bound the random values (using scale(random_whatever(),0,1)) so that the random tie breaker doesn't take precedence over the natural scoring of documents. That scale function needs to compute the min and max random values for the selected documents, which seemed to be costly for large sets (query time multiplied by about 10 for a docset of about 100k docs), but I might be wrong here.
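To show why that scaling step needs the min and max up front, here is a plain Python illustration of min-max scaling. This is only a sketch of the general technique, not Solr's actual implementation, and the function and parameter names are made up for the example.

```python
def scale(values, lo=0.0, hi=1.0):
    """Min-max scale values into [lo, hi].

    A full pass over all values is needed to find the min and max
    before any single value can be scaled.
    """
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # All values are identical: map everything to the lower bound.
        return [lo for _ in values]
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


scaled = scale([2, 4, 6])  # [0.0, 0.5, 1.0]
```

That extra pass over every candidate value is consistent with the cost I observed on large docsets, though I have not profiled it in detail.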

I hope this helps,

--
Tanguy

On 21/03/2012 13:51, fbrisbart wrote:
Hi all,

I have, in my dataset, documents from different sources (forum, news,
reviews, ...)
And I'd like to have a mix of them in my search results.


The problem is that, when ranking on relevance alone, the results are
often grouped by source (e.g. 50 'forum' docs before the first 'review'
doc).
So, I am looking for a way to spread the results out slightly and avoid
this behaviour.

I could run 1 search per source and manually do the mix. But, I have ~10
different sources, and I'm afraid this will be too slow.

Is there a clean & fast way to do that? I am possibly thinking about
implementing a custom Scorer.



Thanks,
Franck

