jpountz commented on issue #12448: URL: https://github.com/apache/lucene/issues/12448#issuecomment-1643869696

Thinking a bit more about this optimization, I wonder whether it would still work well under concurrent indexing. If I understand the optimization correctly, it relies on the n-th collected document generally having a more competitive value than the (n-k)-th collected document in order to keep inserting into the circular buffer. But that wouldn't hold under concurrent indexing, e.g. when flushed segments have k+1 docs or more.

For instance, assume two indexing threads that each index 10 documents between two consecutive refreshes. The first segment could have timestamps 0, 2, 4, ..., 18 and the second segment timestamps 1, 3, 5, ..., 19. When they get merged, the resulting segment's timestamps would be 0, 2, 4, ..., 18, 1, 3, 5, ..., 19 in doc order. Now if you collect the top-5 hits by descending timestamp, wouldn't the optimization automatically disable itself once it has timestamps `[10, 12, 14, 16, 18]` in the queue and sees timestamp `1`, since `1 < 10`?
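The scenario above can be sketched with a small standalone simulation. This is not Lucene code; it is a hypothetical model of the merged segment's doc order and a size-k min-heap standing in for the top-k queue, used only to show where the "values mostly increase with doc id" assumption first breaks:

```java
import java.util.PriorityQueue;

public class InterleavedTimestamps {
    public static void main(String[] args) {
        // Merged doc order described above: one segment flushed timestamps
        // 0, 2, ..., 18, the other 1, 3, ..., 19, and the merge concatenates them.
        long[] merged = new long[20];
        for (int i = 0; i < 10; i++) merged[i] = 2L * i;          // 0, 2, ..., 18
        for (int i = 0; i < 10; i++) merged[10 + i] = 2L * i + 1; // 1, 3, ..., 19

        int k = 5;
        // Min-heap holding the k largest timestamps seen so far; its head is the
        // current "bottom" of the top-k queue.
        PriorityQueue<Long> topK = new PriorityQueue<>();
        boolean optimizationEnabled = true;
        for (long timestamp : merged) {
            if (topK.size() < k) {
                topK.add(timestamp);
            } else if (timestamp > topK.peek()) {
                topK.poll();
                topK.add(timestamp);
            } else if (optimizationEnabled) {
                // First hit that is less competitive than the queue bottom: the
                // monotonicity assumption no longer holds, so a real
                // implementation would presumably stop relying on it here.
                System.out.println("Would disable at timestamp " + timestamp
                        + ", queue bottom = " + topK.peek());
                optimizationEnabled = false;
            }
        }
        System.out.println("Final top-" + k + " bottom: " + topK.peek());
    }
}
```

Running this, the hypothetical disable point is exactly the one in the comment: the queue holds `[10, 12, 14, 16, 18]` when timestamp `1` arrives, and the final top-5 (`15..19`) is still correct because non-competitive hits are merely skipped rather than mis-ranked.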
Thinking a bit more about this optimization, I wonder if it would still work well under concurrent indexing. If I understand the optimization correctly, it relies on the fact that the n-th collected document would generally have a more competitive value than the (n-k)-th collected document to keep inserting into the circular buffer. But this wouldn't be true, e.g. under concurrent indexing if flushing segments that have (k+1) docs or more? For instance, assume two indexing threads that index 10 documents each between two consecutive refreshes. The first segment could have timestamps 0, 2, 4, ..., 18 and the second segment could have timestamps 1, 3, 5, ..., 19. Then when they get merged, this would create a segment whose timestamps would be 0, 2, 4, ..., 18, 1, 3, 5, ..., 19. Now if you collect the top-5 hits by descending timestamp, the optimization would automatically disable itself when it has timestamps `[10, 12, 14, 16, 18]` in the queue and sees timestamp `1`, since `1 < 10`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org