benwtrent commented on PR #12582:
URL: https://github.com/apache/lucene/pull/12582#issuecomment-1755980866

   To address some of @jpountz's worries around adversarial cases, I tested one.
   
   Using the Cohere-Wiki dataset, I created 100 clusters via KMeans and indexed 
the documents sorted by their cluster labels, so that successive segments see 
different regions of the vector space.
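
For anyone who wants to reproduce the indexing order, here is a minimal sketch of the idea. It is not the script used for the numbers in this comment: the tiny pure-Python Lloyd's iteration only stands in for whatever KMeans implementation is used, and the names `kmeans_labels` and `cluster_sorted_order` are hypothetical.

```python
import random


def kmeans_labels(vectors, k, iters=10, seed=0):
    """Assign each vector a cluster label using plain Lloyd's iterations.

    Illustrative stand-in only; a real run would use a library KMeans.
    """
    rng = random.Random(seed)
    # Start from k distinct input vectors as initial centroids.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, v in enumerate(vectors):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [vectors[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_sorted_order(labels):
    """Return document indices ordered by cluster label (stable within a cluster)."""
    return sorted(range(len(labels)), key=lambda i: labels[i])


if __name__ == "__main__":
    docs = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0], [1.0, 0.0], [11.0, 10.0]]
    labels = kmeans_labels(docs, k=2)
    # Documents would be indexed in this order, one cluster after another.
    print(cluster_sorted_order(labels))
```

Indexing in this order means each flush sees vectors from only a few clusters, which is what makes the case adversarial for per-segment quantiles.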
   
   This ended up creating 11 segments in total. Two of the segments needed to 
be fully requantized. The other 9 only needed their offsets recalculated.
   
   HNSW Float32 Recall@10: `0.840`
   
   Quantized Recall@10: `0.787`
   Recall@10|15 (fraction of the true top 10 found when gathering 15): `0.848`
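
To make the Recall@10|15 definition concrete, here is a small sketch of how such a metric can be computed. The function name `recall_at` and its parameters are hypothetical and this is not the benchmark code behind the numbers above; it assumes each query has a list of exact nearest-neighbor ids and a ranked list of ids returned by the search.

```python
def recall_at(true_neighbors, retrieved, k, gathered=None):
    """Fraction of the true top-k ids found among the first `gathered` results.

    With gathered == k this is plain Recall@k; with gathered > k it is
    the oversampled variant written as Recall@k|gathered.
    """
    gathered = k if gathered is None else gathered
    hits = 0
    for truth, found in zip(true_neighbors, retrieved):
        hits += len(set(truth[:k]) & set(found[:gathered]))
    return hits / (k * len(true_neighbors))


if __name__ == "__main__":
    truth = [[1, 2, 3]]      # exact top 3 for one query
    found = [[1, 9, 2, 3]]   # ranked results from the approximate search
    print(recall_at(truth, found, k=3))              # plain Recall@3
    print(recall_at(truth, found, k=3, gathered=4))  # Recall@3|4
```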
   
   So we can achieve similar recall, with modest oversampling, without 
requantizing every vector, even in adversarial cases. In extreme cases, we 
will still requantize the segment and potentially recalculate the quantiles.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

