jpountz commented on issue #12448: URL: https://github.com/apache/lucene/issues/12448#issuecomment-1643869696

Thinking a bit more about this optimization, I wonder whether it would still work well under concurrent indexing. If I understand the optimization correctly, it relies on the n-th collected document generally having a more competitive value than the (n-k)-th collected document in order to keep inserting into the circular buffer. But that wouldn't hold under concurrent indexing, e.g. when flushed segments have k+1 docs or more.

For instance, assume two indexing threads that each index 10 documents between two consecutive refreshes. The first segment could have timestamps 0, 2, 4, ..., 18 and the second segment timestamps 1, 3, 5, ..., 19. When they get merged, the resulting segment's timestamps would be 0, 2, 4, ..., 18, 1, 3, 5, ..., 19 in doc order. Now if you collect the top-5 hits by descending timestamp, wouldn't the optimization automatically disable itself once it has timestamps `[10, 12, 14, 16, 18]` in the queue and sees timestamp `1`, since `1 < 10`?
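The scenario above can be sketched with a small standalone simulation. This is not Lucene code; it is a hypothetical model of the merged segment's doc order and a size-k min-heap standing in for the top-k queue, used only to show where the "values mostly increase with doc id" assumption first breaks:

```java
import java.util.PriorityQueue;

public class InterleavedTimestamps {
    public static void main(String[] args) {
        // Merged doc order described above: one segment flushed timestamps
        // 0, 2, ..., 18, the other 1, 3, ..., 19, and the merge concatenates them.
        long[] merged = new long[20];
        for (int i = 0; i < 10; i++) merged[i] = 2L * i;          // 0, 2, ..., 18
        for (int i = 0; i < 10; i++) merged[10 + i] = 2L * i + 1; // 1, 3, ..., 19

        int k = 5;
        // Min-heap holding the k largest timestamps seen so far; its head is the
        // current "bottom" of the top-k queue.
        PriorityQueue<Long> topK = new PriorityQueue<>();
        boolean optimizationEnabled = true;
        for (long timestamp : merged) {
            if (topK.size() < k) {
                topK.add(timestamp);
            } else if (timestamp > topK.peek()) {
                topK.poll();
                topK.add(timestamp);
            } else if (optimizationEnabled) {
                // First hit that is less competitive than the queue bottom: the
                // monotonicity assumption no longer holds, so a real
                // implementation would presumably stop relying on it here.
                System.out.println("Would disable at timestamp " + timestamp
                        + ", queue bottom = " + topK.peek());
                optimizationEnabled = false;
            }
        }
        System.out.println("Final top-" + k + " bottom: " + topK.peek());
    }
}
```

Running this, the hypothetical disable point is exactly the one in the comment: the queue holds `[10, 12, 14, 16, 18]` when timestamp `1` arrives, and the final top-5 (`15..19`) is still correct because non-competitive hits are merely skipped rather than mis-ranked.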
Thinking a bit more about this optimization, I wonder if it would still work well under concurrent indexing. If I understand the optimization correctly, it relies on the fact that the n-th collected document would generally have a more competitive value than the (n-k)-th collected document to keep inserting into the circular buffer. But this wouldn't be true, e.g. under concurrent indexing if flushing segments that have (k+1) docs or more? For instance, assume two indexing threads that index 10 documents each between two consecutive refreshes. The first segment could have timestamps 0, 2, 4, ..., 18 and the second segment could have timestamps 1, 3, 5, ..., 19. Then when they get merged, this would create a segment whose timestamps would be 0, 2, 4, ..., 18, 1, 3, 5, ..., 19. Now if you collect the top-5 hits by descending timestamp, the optimization would automatically disable itself when it has timestamps `[10, 12, 14, 16, 18]` in the queue and sees timestamp `1`, since `1 < 10`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org