Re: Disseminate results from different sources

Tanguy Moal Wed, 21 Mar 2012 10:33:13 -0700

Hello Franck,

I've had the same issue in the past.


I addressed that by adding a random value to each document.

I use this value in the "bf" parameter, so that the random value alters the documents' scores more or less.

This results in a natural shuffling of documents which had the same score before.
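
A minimal sketch of what such a request could look like, assuming the dismax query parser. The field names `random_boost` and `text` and the 0.01 weight are hypothetical placeholders, not values from the setup described above; the weight only needs to be small enough that the random part nudges scores without dominating them.

```python
from urllib.parse import urlencode


def build_params(user_query, random_field="random_boost", weight=0.01):
    """Build dismax parameters where a stored random value nudges the score.

    random_field is a hypothetical numeric field filled with a random
    value at index time; weight keeps its contribution small.
    """
    return {
        "defType": "dismax",
        "q": user_query,
        "qf": "text",  # hypothetical search field
        # bf adds the function's value to each document's score
        "bf": f"product({random_field},{weight})",
        "rows": 10,
    }


params = build_params("camera")
query_string = urlencode(params)  # ready to append to a /select URL
print(query_string)
```

Documents whose relevance scores were identical now differ by a small random amount, which is what produces the shuffling.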

I think you can also use a random field (random sort field type); see
http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html

Using a random sort field gives a unique random value to each doc per requested field name, i.e. random_1234() gives a different distribution of random values than random_4321(). This can be helpful to give documents a different random value without reindexing everything. Additionally, you can change the random_call() every day to make sure you change the results order from time to time, but not at each query :-)
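
One way to get the "changes every day, but not at each query" behaviour is to derive the field name from the current date. The sketch below assumes the schema declares a dynamic field `random_*` of type solr.RandomSortField; the helper names and the use of the field as a sort tie-breaker are illustrative choices, not something prescribed above.

```python
from datetime import date


def daily_random_field(day=None, prefix="random_"):
    """Return a random-field name that stays stable for a whole day."""
    day = day or date.today()
    return f"{prefix}{day:%Y%m%d}"


def sort_param(day=None):
    """Sort by relevance first; break ties with the day's random field."""
    return f"score desc,{daily_random_field(day)} asc"


print(sort_param(date(2012, 3, 21)))
```

Every query issued on the same day uses the same field name and so sees the same ordering; the next day the name changes and the order is reshuffled.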

The only reason why I chose not to use random sort fields is very personal: I needed to box the random values (using scale(random_whatever(),0,1)) so that the random tie breaker doesn't take precedence over the natural scoring of documents, and that scale function needs to compute the min and max random values for the selected documents, which seemed to be costly for large sets (about 10 times the query time for a docset of about 100k docs), but I might be wrong here.
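
To illustrate where that cost could come from, here is the min-max arithmetic behind a scale(x,0,1) call, written in plain Python. This is only a sketch of the formula, not Solr's implementation: the point is that the minimum and maximum of the whole set must be known before any single value can be scaled.

```python
def scale(values, target_min=0.0, target_max=1.0):
    """Linearly map values into [target_min, target_max]."""
    lo, hi = min(values), max(values)  # requires a full pass over the set
    if hi == lo:
        # All values are equal, so there is no range to stretch.
        return [target_min for _ in values]
    span = target_max - target_min
    return [target_min + (v - lo) * span / (hi - lo) for v in values]


print(scale([2, 4, 6]))
```

Once the values are boxed into [0, 1], the random part can only separate documents whose natural scores are already very close.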


I hope this helps,

--
Tanguy

On 21/03/2012 13:51, fbrisbart wrote:

Hi all,

I have, in my dataset, documents from different sources (forum, news,
reviews, ...),
and I'd like to have a mix of them in my search results.


The problem is that, depending only on the relevance, the results are
often grouped by source (e.g. 50 'forum' docs before the first 'review'
doc).
So, I am looking for a way to slightly disseminate the results and avoid
this behaviour.

I could run 1 search per source and manually do the mix. But, I have ~10
different sources, and I'm afraid this will be too slow.

Is there a clean & fast way to do that? I am possibly thinking about
implementing a custom Scorer.



Thanks,
Franck
