sherman opened a new issue, #12203: URL: https://github.com/apache/lucene/issues/12203
### Description

This question is about scalable merging/compaction of doc values, given the following context: I have a large sharded index, and each shard can contain segments with millions of documents. The index has several hundred fields, about half of which are doc values.

I sometimes run into long merge times when I need to merge or compact a large segment. The problem is that merging is a single-threaded operation: each segment is merged by a single merger thread.

Looking at the 9.x doc values codec format, it appears possible to apply map-reduce techniques when writing a new, large doc values segment, because all metadata is read before any field data is read, and every doc values type stores offset and length fields in its metadata. My basic idea is to write each field in parallel to its own file, then perform a low-level merge of the binary data, and after that rewrite only the metadata to update the offsets.

As I am still new to Lucene development, could someone please provide some critique of this idea?
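To make the proposal concrete, here is a minimal, hypothetical sketch of the map-reduce shape I have in mind, written with plain `java.nio` rather than Lucene's codec API. The `ParallelFieldMergeSketch` class, `FieldMerger` callback, and `FieldSlice` record are invented names for illustration only; a real implementation would have to plug into the actual doc values producer/consumer and rewrite the real metadata entries.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical sketch: merge each field's doc values into its own temporary
 * file in parallel (map), then concatenate the per-field files into one data
 * file and record the new offsets/lengths (reduce) so the metadata could be
 * rewritten afterwards. Not based on Lucene's codec API.
 */
public class ParallelFieldMergeSketch {

  /** Offset and length of one field's slice in the combined data file. */
  record FieldSlice(String field, long offset, long length) {}

  /** Callback that writes the merged doc values of one field to the given path. */
  interface FieldMerger {
    void mergeFieldTo(String field, Path out) throws IOException;
  }

  static List<FieldSlice> mergeFields(List<String> fields,
                                      FieldMerger merger,
                                      Path combinedData,
                                      int threads) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      // Map phase: merge each field into its own temp file, in parallel.
      Map<String, Future<Path>> perField = new LinkedHashMap<>();
      for (String field : fields) {
        perField.put(field, pool.submit(() -> {
          Path tmp = Files.createTempFile("dv-" + field + "-", ".tmp");
          merger.mergeFieldTo(field, tmp);
          return tmp;
        }));
      }

      // Reduce phase: concatenate the per-field files and record each field's
      // new offset/length so the metadata can be rewritten to point at them.
      List<FieldSlice> slices = new ArrayList<>();
      try (FileChannel out = FileChannel.open(combinedData,
          StandardOpenOption.CREATE, StandardOpenOption.WRITE,
          StandardOpenOption.TRUNCATE_EXISTING)) {
        for (Map.Entry<String, Future<Path>> e : perField.entrySet()) {
          Path tmp = e.getValue().get();
          long offset = out.position();
          try (FileChannel in = FileChannel.open(tmp, StandardOpenOption.READ)) {
            long transferred = 0, size = in.size();
            while (transferred < size) {
              transferred += in.transferTo(transferred, size - transferred, out);
            }
          }
          slices.add(new FieldSlice(e.getKey(), offset, out.position() - offset));
          Files.delete(tmp);
        }
      }
      return slices;
    } finally {
      pool.shutdown();
    }
  }
}
```

The point of the sketch is only the shape of the work: the per-field merges are independent and can run concurrently, and the final step is a sequential byte-level concatenation plus an offset fix-up in the metadata, rather than a re-encode of the field data.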