wjp719 opened a new pull request, #11995: URL: https://github.com/apache/lucene/pull/11995
When index sorting is enabled, stored fields (`.fdt` files) need to be decompressed and re-compressed according to the new doc id order. This PR adds a docId offset index, so that we only copy the original `.fdt` data into the new `.fdt` file and only need to maintain the doc offset index according to the new doc id order. This works for both the flush and merge process.

This PR has two benefits:
1. Currently, with index sorting, before flush we write all the original uncompressed data to a temp file and then read it back at flush time. With this PR we can write the final `.fdt` file before flush and only write the doc offset index at flush time. This reduces IO throughput by about 30% in our log scenario.
2. Doc indexing performance improves by about 30% in our log scenario.

The additional overhead is the storage of the new doc offset index files, about 1% in our log scenario.
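To make the idea concrete, here is a minimal, hypothetical sketch (not the code in this PR and not a Lucene API): the already-written `.fdt` bytes are copied unchanged, and only a per-document offset index is rebuilt in the new doc id order. The names `oldToNew`, `docOffsets`, and `buildOffsetIndex` are illustrative assumptions.

```java
import java.util.Arrays;

public class DocOffsetIndexSketch {

  /**
   * Illustrative only. Given the byte offset of each document in the original
   * fdt data and the old-to-new doc id mapping produced by the index sort,
   * build the offset index in new doc id order. The fdt bytes themselves are
   * never decompressed or re-compressed.
   *
   * @param oldToNew   oldToNew[oldDocId] = newDocId (the sort's doc id remapping)
   * @param docOffsets docOffsets[oldDocId] = byte offset of that doc in the copied fdt data
   * @return offsets indexed by new doc id
   */
  static long[] buildOffsetIndex(int[] oldToNew, long[] docOffsets) {
    long[] newOrderOffsets = new long[docOffsets.length];
    for (int oldDoc = 0; oldDoc < docOffsets.length; oldDoc++) {
      newOrderOffsets[oldToNew[oldDoc]] = docOffsets[oldDoc];
    }
    return newOrderOffsets;
  }

  public static void main(String[] args) {
    // Three docs written in arrival order at offsets 0, 120, 250;
    // the index sort reorders them so that old doc 2 becomes new doc 0, etc.
    int[] oldToNew = {1, 2, 0};
    long[] docOffsets = {0L, 120L, 250L};
    System.out.println(Arrays.toString(buildOffsetIndex(oldToNew, docOffsets)));
    // prints [250, 0, 120]: only this small index is rewritten for the sorted order
  }
}
```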