[GitHub] [lucene] wjp719 opened a new pull request, #11995: enable fully directly copy merge/flush fdt files when index sorting

GitBox Sat, 03 Dec 2022 04:58:12 -0800


wjp719 opened a new pull request, #11995:
URL: https://github.com/apache/lucene/pull/11995


   With index sorting enabled, fdt files have to be decompressed and recompressed 
according to the new doc ID order. This PR adds a doc ID offset index, so that 
we only copy the original fdt files into a new fdt file and only need to maintain 
the doc offset index according to the new doc ID order. This works in both the 
flush and merge paths.
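
   As an illustration of the idea only (not the code in this PR and not Lucene's 
stored fields API), here is a minimal sketch. It assumes each document is stored 
as one uncompressed record, and that the sort is given as a hypothetical 
`newToOld` doc ID map. The class and method names are made up for this example. 
The data bytes stay in arrival order; only the offset index is reordered.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical model: "data" stands in for the copied fdt bytes,
// "offsets"/"lengths" stand in for the doc offset index.
class DocOffsetIndexSketch {
    private final byte[] data;     // records in original (arrival) order
    private final int[] offsets;   // offsets[newDocId] = start of the record in data
    private final int[] lengths;   // lengths[newDocId] = record length in bytes

    private DocOffsetIndexSketch(byte[] data, int[] offsets, int[] lengths) {
        this.data = data;
        this.offsets = offsets;
        this.lengths = lengths;
    }

    // newToOld[newDocId] = oldDocId, i.e. the result of the index sort.
    public static DocOffsetIndexSketch build(String[] docs, int[] newToOld) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int[] oldOffsets = new int[docs.length];
        int[] oldLengths = new int[docs.length];

        // Step 1: append every record once, in arrival order (the "direct copy").
        for (int oldDoc = 0; oldDoc < docs.length; oldDoc++) {
            byte[] bytes = docs[oldDoc].getBytes(StandardCharsets.UTF_8);
            oldOffsets[oldDoc] = out.size();
            oldLengths[oldDoc] = bytes.length;
            out.write(bytes, 0, bytes.length);
        }

        // Step 2: reorder only the small offset index to follow the new doc IDs.
        int[] offsets = new int[docs.length];
        int[] lengths = new int[docs.length];
        for (int newDoc = 0; newDoc < newToOld.length; newDoc++) {
            offsets[newDoc] = oldOffsets[newToOld[newDoc]];
            lengths[newDoc] = oldLengths[newToOld[newDoc]];
        }
        return new DocOffsetIndexSketch(out.toByteArray(), offsets, lengths);
    }

    // Look up a document by its doc ID in the sorted order.
    public String document(int newDocId) {
        return new String(data, offsets[newDocId], lengths[newDocId], StandardCharsets.UTF_8);
    }

    // Copy of the underlying bytes, to show they were never reordered.
    public byte[] rawData() {
        return data.clone();
    }
}
```

   In this model, `document(newDocId)` returns documents in sorted order while 
`rawData()` still holds the records in the order they were written. A real 
stored fields format also has to deal with compression, which this sketch leaves out.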
   
   This PR has two benefits:
   1. Currently, with index sorting, we have to write all of the original 
uncompressed data to a temp file before flush, then read the data back at flush 
time. With this PR, the final fdt file can be written before flush, and only the 
doc offset index is written at flush. This reduces IO throughput by 30% in our 
log scenario.
   2. It improves doc indexing performance by 30% in our log scenario.
   
   The additional overhead is the storage for the new doc offset index files, 
which is 1% in our log scenario.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-h...@lucene.apache.org
