Hello Franck,
I've had the same issue in the past.
I addressed that by adding a random value to each document.
I use this value in the "bf" (boost function) parameter, so that the
random value slightly alters each document's score.
This results in a natural shuffling of documents which had the same
score before.
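
To make the "bf" idea concrete, here is a minimal Python sketch of the
request parameters. Everything named here is hypothetical and only for
illustration: the field name random_boost, the weight 0.01, the "text"
query field and the use of the dismax parser are assumptions, not a
description of any real setup.

```python
import random
from urllib.parse import urlencode


def random_boost_value(rng=random):
    """Value to store in a (hypothetical) float field at index time."""
    return rng.random()  # uniform in [0, 1)


def build_params(user_query, boost_field="random_boost", weight=0.01):
    """Build dismax parameters where a small random boost perturbs the score."""
    return {
        "defType": "dismax",
        "q": user_query,
        "qf": "text",  # assumed default search field
        # bf contributes additively to the score; the small weight keeps
        # the random part from dominating the relevance part.
        "bf": f"product({boost_field},{weight})",
        "fl": "*,score",
    }


params = build_params("solr shuffle")
query_string = urlencode(params)
```

The weight is the knob to tune: the larger it is, the more the random
value reorders documents whose scores were not identical to begin with.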
I think you can also use a random field, i.e. the random sort field type (see
http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html).
A random sort field gives each doc a unique random value per requested
field name, i.e. random_1234() gives a different distribution of random
values than random_4321(). This can be helpful to give documents a
different random value without reindexing everything. Additionally, you
can change the random_call() name every day to make sure the results
order changes from time to time, but not at each query :-)
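
The "change the name every day" trick could be sketched as follows in
Python. This assumes the schema declares a dynamic field such as
random_* backed by the random sort field type, and that the field is
referenced by its bare name in the sort parameter; both the prefix and
the date-based suffix are choices made for this example only.

```python
from datetime import date


def daily_random_field(day=None, prefix="random_"):
    """Name of a random sort field that changes once per day."""
    day = day or date.today()
    # Same name for every query on a given day, so the order is stable
    # within the day and reshuffled the next day.
    return f"{prefix}{day:%Y%m%d}"


def build_sort(day=None):
    """Relevance first, then the day-specific random field as tie breaker."""
    return f"score desc, {daily_random_field(day)} asc"
```

For example, build_sort(date(2012, 3, 21)) yields
"score desc, random_20120321 asc".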
The only reason why I chose not to use random sort fields is very
personal: I needed to box the random values (using
scale(random_whatever(),0,1)) so that the random tie breaker doesn't take
precedence over the natural scoring of documents. That scale function
needs to compute the min and max random values for the selected documents,
which seemed to be costly for large sets (query time multiplied by about
10 for a docset of about 100k docs), but I might be wrong here.
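
To show why a min/max rescaling has to look at every selected document
before it can produce a single value, here is a conceptual Python
sketch. It is not Solr's implementation of scale(), only the same kind
of arithmetic on a plain list of values.

```python
def scale(values, target_min=0.0, target_max=1.0):
    """Min-max rescaling of values into [target_min, target_max]."""
    values = list(values)
    if not values:
        return []
    # Both bounds require a full pass over the whole set of values.
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to the lower bound.
        return [target_min for _ in values]
    span = target_max - target_min
    return [target_min + (v - lo) * span / (hi - lo) for v in values]
```

With 100k matching docs, that extra pass over the random values happens
in addition to the normal scoring work.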
I hope this helps,
--
Tanguy
On 21/03/2012 13:51, fbrisbart wrote:
Hi all,
I have, in my dataset, documents from different sources (forum, news,
reviews, ...)
And I'd like to have a mix of them in my search results.
The problem is that, when ranking only by relevance, the results are
often grouped by source (e.g. 50 'forum' docs before the first 'review'
doc).
So, I am looking for a way to spread the results out slightly and avoid
this behaviour.
I could run 1 search per source and manually do the mix. But, I have ~10
different sources, and I'm afraid this will be too slow.
Is there a clean and fast way to do that? I am possibly thinking about
implementing a custom Scorer.
Thanks,
Franck