jpountz opened a new pull request, #12011:
URL: https://github.com/apache/lucene/pull/12011

   When flushing segments that have an index sort configured, postings lists 
are loaded into arrays and reordered according to the index sort.
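
   For context, here is a rough sketch of what "reordering" means for a single postings list. Each doc ID is mapped to its index-sorted position, and the entries are then sorted by new doc ID while each freq stays attached to its doc. The `oldToNew` map and the class and method names are hypothetical. Sorting packed longs with `Arrays.sort` is a swapped-in stand-in for Lucene's `TimSorter`, not what Lucene actually does.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Lucene code): reorder one postings list after an
 * index sort. oldToNew maps doc IDs in flush order to index-sorted doc IDs.
 */
public class PostingsReorderSketch {

  /** Remaps doc IDs through oldToNew and sorts docs/freqs together by new doc ID. */
  static void reorder(int[] docs, int[] freqs, int count, int[] oldToNew) {
    long[] packed = new long[count];
    for (int i = 0; i < count; i++) {
      // Doc IDs and freqs are non-negative ints, so packing (newDoc, freq) into
      // one long makes a plain numeric sort order the entries by new doc ID.
      packed[i] = ((long) oldToNew[docs[i]] << 32) | freqs[i];
    }
    Arrays.sort(packed); // stand-in for the TimSorter-based sort
    for (int i = 0; i < count; i++) {
      docs[i] = (int) (packed[i] >>> 32); // high 32 bits: new doc ID
      freqs[i] = (int) packed[i];         // low 32 bits: freq
    }
  }

  public static void main(String[] args) {
    int[] docs = {0, 2, 3};
    int[] freqs = {5, 1, 7};
    int[] oldToNew = {3, 0, 2, 1}; // old doc 0 becomes new doc 3, and so on
    reorder(docs, freqs, docs.length, oldToNew);
    System.out.println(Arrays.toString(docs) + " " + Arrays.toString(freqs));
    // prints "[1, 2, 3] [7, 1, 5]"
  }
}
```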
   
   This reordering is implemented with `TimSorter`, a variant of merge sort. 
Like merge sort, an important part of `TimSorter` consists of merging two 
contiguous sorted slices of the array into a single sorted slice. This merging 
can be done either with external memory, which is the classical approach, or 
in place, which needs no extra memory but is much slower. Until now, we 
allocated a fixed budget of `maxDoc / 64` slots for these external-memory 
merges; when it was not enough, sorted slices were merged in place.
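
   To show the two merge strategies side by side, here is a minimal sketch under a fixed temporary budget. It is not `TimSorter`'s code. `MergeSketch` and its methods are hypothetical names, and the in-place fallback is a textbook rotation-based merge that may differ from the one Lucene uses.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: merges the adjacent sorted slices a[from, mid) and
 * a[mid, to). If the left slice fits in the fixed temporary budget, it is
 * copied out and merged in one linear pass; otherwise a rotation-based
 * in-place merge is used, which needs no buffer but does more work.
 */
public class MergeSketch {
  private final int[] tmp; // fixed budget, e.g. maxDoc / 64 slots

  MergeSketch(int budget) {
    tmp = new int[budget];
  }

  void merge(int[] a, int from, int mid, int to) {
    if (mid - from <= tmp.length) {
      mergeWithBuffer(a, from, mid, to);
    } else {
      mergeInPlace(a, from, mid, to);
    }
  }

  private void mergeWithBuffer(int[] a, int from, int mid, int to) {
    int len = mid - from;
    System.arraycopy(a, from, tmp, 0, len); // move the left slice to scratch space
    int i = 0, j = mid, k = from;
    while (i < len && j < to) {
      a[k++] = tmp[i] <= a[j] ? tmp[i++] : a[j++]; // <= keeps the merge stable
    }
    // Copy back leftover left entries; leftover right entries are already in place.
    System.arraycopy(tmp, i, a, k, len - i);
  }

  // No buffer: split both slices, rotate the middle blocks, recurse on each half.
  private static void mergeInPlace(int[] a, int from, int mid, int to) {
    if (from >= mid || mid >= to) {
      return;
    }
    if (to - from == 2) {
      if (a[mid] < a[from]) {
        swap(a, from, mid);
      }
      return;
    }
    int cut1, cut2;
    if (mid - from > to - mid) {
      cut1 = from + (mid - from) / 2;
      cut2 = lowerBound(a, mid, to, a[cut1]);
    } else {
      cut2 = mid + (to - mid) / 2;
      cut1 = upperBound(a, from, mid, a[cut2]);
    }
    rotate(a, cut1, mid, cut2);
    int newMid = cut1 + (cut2 - mid);
    mergeInPlace(a, from, cut1, newMid);
    mergeInPlace(a, newMid, cut2, to);
  }

  private static int lowerBound(int[] a, int lo, int hi, int key) { // first a[i] >= key
    while (lo < hi) {
      int m = (lo + hi) >>> 1;
      if (a[m] < key) lo = m + 1; else hi = m;
    }
    return lo;
  }

  private static int upperBound(int[] a, int lo, int hi, int key) { // first a[i] > key
    while (lo < hi) {
      int m = (lo + hi) >>> 1;
      if (a[m] <= key) lo = m + 1; else hi = m;
    }
    return lo;
  }

  private static void rotate(int[] a, int lo, int mid, int hi) { // swap blocks [lo,mid) and [mid,hi)
    reverse(a, lo, mid);
    reverse(a, mid, hi);
    reverse(a, lo, hi);
  }

  private static void reverse(int[] a, int lo, int hi) { // reverses a[lo, hi)
    for (int i = lo, j = hi - 1; i < j; i++, j--) swap(a, i, j);
  }

  private static void swap(int[] a, int i, int j) {
    int t = a[i]; a[i] = a[j]; a[j] = t;
  }

  public static void main(String[] args) {
    int[] a = {1, 4, 7, 9, 2, 3, 8};
    new MergeSketch(2).merge(a, 0, 4, a.length); // left slice (4) exceeds budget (2)
    System.out.println(Arrays.toString(a)); // prints "[1, 2, 3, 4, 7, 8, 9]"
  }
}
```

Both paths produce the same result. The buffered one does a single pass, while the in-place one pays for it with extra recursion, binary searches and rotations.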
   
   I recently looked at profiles of an index where a non-negligible share of 
the time was spent on in-place merges, so I would like to propose the 
following changes:
    - Increase the maximum RAM budget to `maxDoc / 8`. This should avoid 
in-place merges for all postings lists with a `docFreq` of up to `maxDoc / 4`.
    - Allocate this RAM budget lazily rather than eagerly as today. This avoids 
allocating O(maxDoc) memory for fields such as primary keys, which have only a 
couple of postings per term.
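
   A minimal sketch of a lazily allocated, capped budget along the lines of the two bullets. `LazyMergeBuffer`, `tryReserve` and the grow-to-exact-size policy are hypothetical choices for illustration, not the code in this PR. Since `TimSorter` never needs more than X temporary slots for a list with fewer than 2*X entries, a cap of `maxDoc / 8` slots covers lists of up to `maxDoc / 4` entries.

```java
/**
 * Hypothetical sketch: nothing is allocated up front, the buffer only grows
 * when a merge needs more slots, and it never grows past maxDoc / 8.
 * Callers fall back to an in-place merge when tryReserve returns false.
 */
public class LazyMergeBuffer {
  private final int maxSlots;
  private int[] tmp = new int[0]; // no O(maxDoc) allocation until a merge asks for it

  LazyMergeBuffer(int maxDoc) {
    this.maxSlots = maxDoc / 8;
  }

  /** Ensures at least `needed` slots are available; false means "merge in place". */
  boolean tryReserve(int needed) {
    if (needed > maxSlots) {
      return false;
    }
    if (tmp.length < needed) {
      // Grow to exactly what this merge needs. The contents are scratch space,
      // so the old array does not need to be copied.
      tmp = new int[needed];
    }
    return true;
  }

  int allocatedSlots() {
    return tmp.length;
  }

  public static void main(String[] args) {
    LazyMergeBuffer buf = new LazyMergeBuffer(1_000_000); // cap = 125,000 slots
    System.out.println(buf.allocatedSlots());    // 0: nothing allocated yet
    System.out.println(buf.tryReserve(1));       // true, e.g. a primary-key term
    System.out.println(buf.allocatedSlots());    // 1
    System.out.println(buf.tryReserve(200_000)); // false: exceeds maxDoc / 8
  }
}
```

Growing to the exact size keeps the sketch consistent with the memory bound discussed below; a real implementation might prefer to over-allocate a little to reduce reallocations.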
   
   Overall memory usage would therefore never be more than 50% higher than 
today: `TimSorter` never needs more than X temporary slots unless the postings 
list has at least 2*X entries, and those 2*X entries are already loaded into 
memory today. For fields with short postings lists, memory usage should 
actually be lower.
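
   The memory argument above can be written out as a small calculation. The following are assumptions of this sketch: one temporary slot costs the same as one loaded postings entry, the lazy buffer grows to exactly what a merge needs, and $N$ is the length of the longest postings list sorted for the field, so that a buffer reused across terms is also covered. $M$ is `maxDoc`.

```latex
\begin{aligned}
\text{today:}\quad & \mathrm{mem}_{\text{old}} = N + \tfrac{M}{64} \\
\text{proposed:}\quad & \mathrm{mem}_{\text{new}} \le N + \min\!\left(\tfrac{N}{2},\ \tfrac{M}{8}\right)
  \le \tfrac{3}{2}\,N \le \tfrac{3}{2}\,\mathrm{mem}_{\text{old}} \\
\text{short postings:}\quad & N < \tfrac{M}{32}
  \;\Longrightarrow\; \mathrm{mem}_{\text{new}} \le N + \tfrac{N}{2} < N + \tfrac{M}{64} = \mathrm{mem}_{\text{old}}
\end{aligned}
```

For a primary-key field ($N = 1$), this gives at most 1.5 slots instead of $1 + M/64$.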
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

