[GitHub] [lucene] jpountz commented on issue #12448: [Performance] sort query improvement for sequential ordered data [e.g. timestamp field sort in log data]

via GitHub Thu, 20 Jul 2023 01:32:35 -0700


jpountz commented on issue #12448:
URL: https://github.com/apache/lucene/issues/12448#issuecomment-1643505863


   Lazily heapifying sounds interesting, and thanks for sharing performance 
numbers for data that occurs in random order. Do you also have performance 
numbers for the case where the index sort order is the opposite of the query 
sort order? I'm curious how much this optimization saves in that case, since 
it is the case you're trying to optimize.
   
   > We don't have benchmark for numeric sort in Lucene itself
   
   Did you look at this task on nightly benchmarks? 
http://people.apache.org/~mikemccand/lucenebench/TermDTSort.html
   
   You might also be interested in checking out this 
[paper](https://www.vldb.org/pvldb/vol15/p3472-yu.pdf) where Tencent describes 
optimizations that they made for a similar problem in section 4.5.2: they 
configure an index sort by ascending timestamp on their data, but still want to 
be able to perform both queries by ascending timestamp and descending 
timestamp. To handle the case where the index sort and the query sort are 
opposite, they run the query on exponentially growing windows of documents, 
starting from the end of the doc ID space.
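
   To make the windowing idea concrete, here is a minimal sketch in Python. It 
is a toy model, not the paper's implementation and not Lucene code: 
`top_k_newest`, `matches`, and `initial_window` are hypothetical names, and it 
assumes a single segment where doc ID order equals ascending timestamp order. 
Because higher doc IDs always carry newer timestamps, the search can stop as 
soon as `k` hits have been collected.

```python
from typing import Callable, List


def top_k_newest(
    max_doc: int,
    matches: Callable[[int], bool],
    k: int,
    initial_window: int = 4,
) -> List[int]:
    """Return up to k matching doc IDs, newest (highest doc ID) first.

    Assumes doc IDs are assigned in ascending timestamp order, so a
    descending-timestamp query only needs to walk backwards from max_doc.
    """
    hits: List[int] = []
    hi = max_doc              # exclusive upper bound of the next window
    window = initial_window   # size of the next window, doubled each round

    while hi > 0 and len(hits) < k:
        lo = max(0, hi - window)
        # Scan the current window from its highest doc ID downwards.
        for doc in range(hi - 1, lo - 1, -1):
            if matches(doc):
                hits.append(doc)
                if len(hits) == k:
                    break
        hi = lo        # the next window ends where this one started
        window *= 2    # exponential growth bounds the number of rounds

    return hits
```

   For example, `top_k_newest(100, lambda d: d % 3 == 0, 5)` visits windows of 
4, 8, and 16 documents and stops after scanning only the last 13 doc IDs. A 
selective query degrades towards a full scan, which is why the window grows 
exponentially rather than by a fixed step.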
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
