sherman opened a new issue, #12203:
URL: https://github.com/apache/lucene/issues/12203

   ### Description
   
   The question is about scalable merging/compaction of doc values, given the following context:
   
   - I have a large sharded index.
   - Each shard can contain segments with millions of documents.
   - There are several hundred fields in the index, and half of them are doc values.
   
   I sometimes run into problems with merge times when I need to merge or compact a large segment: merging a segment is a single-threaded operation, carried out entirely by one merge thread.
   
   Looking at the 9.x doc values codec format, it appears possible to apply map/reduce-style techniques when writing a new, large doc values segment. This is because all metadata is read before any field data is read, and every doc values type stores offset and length fields in the metadata.
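   For illustration, here is a simplified, hypothetical view of the per-field bookkeeping I mean. It is not the actual Lucene90DocValuesFormat layout; the only point it captures is that each field's entry in the metadata addresses its data as an (offset, length) slice of the data file, so a field's block can be relocated as long as the metadata is rewritten.

```java
// Hypothetical, simplified sketch -- not the real Lucene90DocValuesFormat layout.
// It only illustrates the property relied on above: every field entry in the
// metadata addresses its data as an (offset, length) slice of the data file.
public class DocValuesMetaSketch {
  record FieldMeta(String field, String dvType, long dataOffset, long dataLength) {}

  public static void main(String[] args) {
    FieldMeta price = new FieldMeta("price", "NUMERIC", 0L, 4_096L);
    FieldMeta title = new FieldMeta("title", "BINARY", 4_096L, 131_072L);
    // If the two blocks move around in the data file, only dataOffset changes.
    System.out.println(price + "\n" + title);
  }
}
```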
   
   My basic idea is to write each field in parallel to its own file, then perform a low-level merge of the binary data into a single data file, and finally rewrite only the metadata to update the offsets.
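
   To make the idea concrete, here is a rough, hypothetical sketch of the workflow in plain Java. It is not written against any real Lucene codec API; `writeFieldData`, `writeMetadata`, the file names, and the metadata layout are all placeholders. The shape is: merge each field into its own temporary file in parallel, concatenate the per-field files into one data file, and then write metadata that records the new offset and length of each field's block.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.*;
import java.util.*;
import java.util.concurrent.*;

/**
 * Toy sketch of the proposed per-field parallel merge. All names here
 * (FieldBlock, writeFieldData, writeMetadata) are hypothetical placeholders,
 * not real Lucene codec APIs.
 */
public class ParallelDocValuesMergeSketch {

  /** Offset/length of one field's merged block inside the final data file. */
  record FieldBlock(String field, long offset, long length) {}

  public static void main(String[] args) throws Exception {
    List<String> fields = List.of("price", "timestamp", "category");
    Path tmpDir = Files.createTempDirectory("dv-merge");
    Path dataFile = tmpDir.resolve("merged.dvd");

    // Phase 1: merge each field's doc values into its own temporary file, in parallel.
    ExecutorService pool = Executors.newFixedThreadPool(4);
    Map<String, Future<Path>> perField = new LinkedHashMap<>();
    for (String field : fields) {
      perField.put(field, pool.submit(() -> {
        Path tmp = tmpDir.resolve(field + ".part");
        writeFieldData(field, tmp);   // placeholder for re-encoding this field's values
        return tmp;
      }));
    }

    // Phase 2: concatenate the per-field files into one data file,
    // recording the offset/length of each block for the metadata.
    List<FieldBlock> blocks = new ArrayList<>();
    try (FileChannel out = FileChannel.open(dataFile,
        StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
      for (String field : fields) {
        Path part = perField.get(field).get();
        long offset = out.position();
        try (FileChannel in = FileChannel.open(part, StandardOpenOption.READ)) {
          long size = in.size();
          long transferred = 0;
          while (transferred < size) {
            transferred += in.transferTo(transferred, size - transferred, out);
          }
        }
        blocks.add(new FieldBlock(field, offset, out.position() - offset));
        Files.delete(part);
      }
    }
    pool.shutdown();

    // Phase 3: rewrite only the metadata with the new offsets.
    writeMetadata(tmpDir.resolve("merged.dvm"), blocks);
  }

  static void writeFieldData(String field, Path out) throws IOException {
    Files.write(out, ("values-of-" + field).getBytes()); // stand-in for real encoding
  }

  static void writeMetadata(Path metaFile, List<FieldBlock> blocks) throws IOException {
    StringBuilder sb = new StringBuilder();
    for (FieldBlock b : blocks) {
      sb.append(b.field()).append(' ').append(b.offset()).append(' ')
        .append(b.length()).append('\n');
    }
    Files.write(metaFile, sb.toString().getBytes());
  }
}
```

   In this sketch the concatenation step stays sequential, but it is pure byte copying; the expensive per-field re-encoding is what gets parallelized.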
   
   As I am still new to Lucene development, could someone please provide some 
critique of this idea?
   

