Deepika0510 opened a new issue, #12395:
URL: https://github.com/apache/lucene/issues/12395

   ### Description
   
   There is an opportunity to improve the functionality and performance of the 
existing Disk Usage API through a re-implementation.
   
   Currently, the best tool we have for this is based on a custom Codec that 
separates storage by field. To get the statistics, we read an existing index and 
write it out again with the custom codec, using `addIndexes` followed by a 
force-merge. This is time-consuming and inefficient, so in practice it tends not 
to get done.
   
   Instead, we could estimate the storage used by each field by iterating over 
its structures (i.e., inverted index, doc values, stored fields, etc.) and 
tracking the number of bytes read. Since we would only enumerate the existing 
index, this would not require force-merging all the data through `addIndexes`, 
and it would not intrude on the codec APIs.



