: bq: we don't want to use either the primary key or the record's
: update date as the tie-breaker, as it may introduce a new bias into the
: ranking algorithm
: 
: Are you thinking of adding something to your main clause to force this?
: If so, why not just use sorting by adding a sort clause like:
: 
: &sort=score desc, datefield desc

i think that is what Gregg mentioned wanting to avoid -- because it will
bias results in favor of documents with newer values in the date field.

i believe he wants a consistent ordering that resolves ties in docs
with identical scores in some way that doesn't favor documents based on
any externally visible property of the documents themselves.

hashing on the uniqueKey seems like it should work, since it would
essentially be a random value generated with a consistent seed (the key)
regardless of the shards or document addition order -- but depending on
your hashing algorithm it could still introduce some bias assuming your
uniqueKeys have some semantic meaning to begin with (if they don't you
could just sort on them).
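
A rough sketch of the uniqueKey-hash idea, done client-side before the docs are sent to Solr. Everything named here is an assumption for illustration: the "id" uniqueKey, the hypothetical "tiebreak_hash" field, and the choice of MD5 (used only because it spreads values evenly, not for security).

```python
import hashlib


def tiebreak_hash(unique_key: str) -> str:
    """Return a fixed-width hex digest derived only from the uniqueKey.

    The same key always yields the same value, no matter which shard
    holds the document or in what order documents were added.
    """
    return hashlib.md5(unique_key.encode("utf-8")).hexdigest()


# Hypothetical documents; "id" stands in for the schema's uniqueKey.
docs = [{"id": "doc-1"}, {"id": "doc-2"}, {"id": "doc-3"}]
for doc in docs:
    # "tiebreak_hash" is an assumed field name, indexed as a plain string.
    doc["tiebreak_hash"] = tiebreak_hash(doc["id"])
```

With such a field indexed, ties could then be broken with something like &sort=score desc, tiebreak_hash asc (again assuming that field name).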

To be safe, you could generate the hash using more than just the uniqueKey
... why not use *all* of the fields in the document?

https://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/update/processor/SignatureUpdateProcessorFactory.html
http://wiki.apache.org/solr/Deduplication
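
For illustration only, here is a minimal client-side sketch of hashing *all* of the fields. It is not the algorithm SignatureUpdateProcessorFactory uses; the function name, the SHA-1 choice, and the NUL separator are assumptions made for this example.

```python
import hashlib


def document_signature(doc: dict) -> str:
    """Hash every field name and value of a document into one hex digest.

    Fields are visited in sorted name order so the result does not depend
    on the order in which the fields happen to be listed.
    """
    digest = hashlib.sha1()
    for name in sorted(doc):
        values = doc[name]
        # Treat single-valued and multi-valued fields uniformly.
        if not isinstance(values, (list, tuple)):
            values = [values]
        for value in values:
            # NUL separators keep ("ab", "c") distinct from ("a", "bc").
            digest.update(name.encode("utf-8"))
            digest.update(b"\x00")
            digest.update(str(value).encode("utf-8"))
            digest.update(b"\x00")
    return digest.hexdigest()
```

Because the digest depends on every field value, any change to a document changes its tie-break position, which may or may not be acceptable for a given index.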


-Hoss
