Re: [PR] Fix global score update bug in MultiLeafKnnCollector [lucene]

via GitHub Thu, 13 Jun 2024 12:59:02 -0700


benwtrent commented on PR #13463:
URL: https://github.com/apache/lucene/pull/13463#issuecomment-2166662079


   @gsmiller @mayya-sharipova 
   
   OK, I did some more testing. My initial tests didn't fully exercise these code paths because the segments were still very large, so I switched to flushing every 1MB.
   
   CohereV2 (1M vectors, 768 dims, flushing every 1MB, `mip` similarity):
   
   | fanout       | 0     | 10    | 50    | 100   | 200   |
   |--------------|-------|-------|-------|-------|-------|
   | candidate    | 12715 | 13319 | 15642 | 18312 | 23225 |
   | baseline     | 15759 | 16514 | 19449 | 22457 | 27245 |
   
   So, this PR is actually BETTER than baseline: the candidate's numbers are lower at every fanout.
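   A quick way to quantify the gap is to compute, per fanout, how much lower the candidate's number is than baseline. The sketch below is a hypothetical helper (the class and method names are illustrative, not part of Lucene). The table does not name the metric, so the code treats the values as plain counts and assumes lower is better, which is what the conclusion implies.

```java
// Hypothetical helper, not Lucene code: relative reduction of candidate vs. baseline
// for the CohereV2 numbers in the table (assumes lower values are better).
public class FanoutComparison {
    static final int[] FANOUTS = {0, 10, 50, 100, 200};
    static final int[] CANDIDATE = {12715, 13319, 15642, 18312, 23225};
    static final int[] BASELINE = {15759, 16514, 19449, 22457, 27245};

    /** Fraction by which candidate is below baseline; positive means the candidate is lower. */
    static double reduction(int candidate, int baseline) {
        return (baseline - candidate) / (double) baseline;
    }

    public static void main(String[] args) {
        for (int i = 0; i < FANOUTS.length; i++) {
            double r = reduction(CANDIDATE[i], BASELINE[i]);
            // Print the reduction as a percentage with one decimal place.
            System.out.printf("fanout %3d: %.1f%% lower than baseline%n", FANOUTS[i], r * 100);
        }
    }
}
```

   With these numbers the candidate comes out roughly 19% lower at fanouts 0 through 50, narrowing to about 18.5% at 100 and 14.8% at 200.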
   
   I also ran the same index with NO multi-leaf collector: 36361.
   
   My previous experiments may have just hit a bad edge case where the difference between the two is so slight that the candidate actually comes out worse.
   
   I'm going to test with a different dataset unless someone beats me to it.
   
   Hopefully further testing confirms that this candidate is better overall :). I would be very confused if it were truly worse.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


