benwtrent commented on PR #13463: URL: https://github.com/apache/lucene/pull/13463#issuecomment-2166662079
@gsmiller @mayya-sharipova OK, I did some more testing. My initial testing didn't fully exercise these paths because the segments were still very large. So, I switched to flushing every 1MB.

CohereV2 (1M, 768 dims, flushing every 1MB, `mip` similarity):

| fanout    | 0     | 10    | 50    | 100   | 200   |
|-----------|-------|-------|-------|-------|-------|
| candidate | 12715 | 13319 | 15642 | 18312 | 23225 |
| baseline  | 15759 | 16514 | 19449 | 22457 | 27245 |

So, this PR is actually BETTER than baseline: the candidate is lower at every fanout.

Additionally, I ran this same index with NO multi-leaf collector: 36361.

My previous experiments might have just hit a bad edge case where the difference between the two is so slight that the candidate is actually worse.

I am gonna test with a different data set unless others beat me to it. Hopefully further testing proves that this candidate is indeed better overall :). I would be very confused if it was truly worse.
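To put a size on "BETTER than baseline", here is a quick sketch that computes, for each fanout, how much lower the candidate number is than the baseline number, as a fraction of baseline. It uses only the values from the CohereV2 table. It assumes lower is better, as the comment implies, and it does not assume anything about what unit the numbers measure. The names `relative_reduction` and `reductions` are hypothetical helpers written for this illustration. The no-multi-leaf-collector run (36361) is left out because the comment does not say which fanout it used.

```python
# Values copied from the CohereV2 table (1M docs, 768 dims, flush every 1MB, mip).
# Assumption: lower is better, as implied by "this PR is actually BETTER than baseline".
FANOUTS = [0, 10, 50, 100, 200]
CANDIDATE = [12715, 13319, 15642, 18312, 23225]
BASELINE = [15759, 16514, 19449, 22457, 27245]


def relative_reduction(candidate, baseline):
    """Fraction by which candidate is below baseline (positive means candidate is lower)."""
    return (baseline - candidate) / baseline


def reductions():
    """Map each fanout to the candidate's relative reduction versus baseline."""
    return {
        fanout: relative_reduction(c, b)
        for fanout, c, b in zip(FANOUTS, CANDIDATE, BASELINE)
    }


if __name__ == "__main__":
    for fanout, r in reductions().items():
        print(f"fanout {fanout:>3}: candidate is {r:.1%} lower than baseline")
```

Across the fanouts tested, this works out to the candidate being roughly 15% to 20% lower than baseline. The gap is largest at fanout 50 and smallest at fanout 200.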