ajantha-bhat opened a new pull request, #6661:
URL: https://github.com/apache/iceberg/pull/6661

   Currently, the [partitions metadata table](https://iceberg.apache.org/docs/latest/spark-queries/#partitions) exposes only the data file stats:
   ```
   file_count
   record_count
   ```
   
   When delete files are present, these stats are inaccurate, because the values are not decremented for deleted rows.
   So, capture the delete file stats as well, to give a rough idea of why these stats are inaccurate.
   **Note that we do not apply the deletes to the data files to compute the effective result, as that would be a very expensive operation. Users are advised to run `rewrite_data_files` periodically to apply the delete files to the data files.**
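
To illustrate why the delete stats give only a rough idea, here is a minimal Python sketch (not Iceberg code) that derives loose bounds on the number of live rows in a partition. The `PartitionStats` class and `live_record_bounds` function are hypothetical names used only for this illustration. The sketch relies on two assumptions: a position delete record removes at most one row, and an equality delete record can remove any number of rows, including zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionStats:
    """Hypothetical holder for a few of the per-partition counters."""
    record_count: int
    pos_delete_record_count: int
    eq_delete_record_count: int


def live_record_bounds(stats: PartitionStats) -> tuple[int, int]:
    """Return (lower, upper) bounds on live rows without reading any files.

    Assumption: each position delete removes at most one row, while an
    equality delete may match any number of rows (possibly none).
    """
    # Deletes never add rows, so record_count is always an upper bound.
    upper = stats.record_count
    if stats.eq_delete_record_count > 0:
        # An equality delete could match every row in the partition.
        lower = 0
    else:
        # Position deletes may overlap or target already-removed rows,
        # so this is only a lower bound, not an exact count.
        lower = max(0, stats.record_count - stats.pos_delete_record_count)
    return lower, upper
```

The exact number of live rows can only be known by applying the deletes, which is the expensive operation this PR avoids.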
   
   Delete file stats to be added:
   
   ```
   pos_delete_file_count
   pos_delete_record_count
   eq_delete_file_count
   eq_delete_record_count
   ```
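
As a rough sketch of how these counters relate to each other, the Python snippet below groups file entries by partition and increments the matching pair of columns depending on the file's content type. It is not the implementation in this PR: `FileEntry`, `FileContent`, and `partition_stats` are hypothetical names, and the entries are plain in-memory objects rather than manifest entries. Only the six column names come from the description above.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class FileContent(Enum):
    """Hypothetical enum for the three kinds of files tracked per partition."""
    DATA = 0
    POSITION_DELETES = 1
    EQUALITY_DELETES = 2


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical, simplified stand-in for a tracked file."""
    partition: tuple
    content: FileContent
    record_count: int


# Maps each content type to its (file counter, record counter) columns.
COLUMNS = {
    FileContent.DATA: ("file_count", "record_count"),
    FileContent.POSITION_DELETES: ("pos_delete_file_count", "pos_delete_record_count"),
    FileContent.EQUALITY_DELETES: ("eq_delete_file_count", "eq_delete_record_count"),
}

STAT_COLUMNS = [name for pair in COLUMNS.values() for name in pair]


def partition_stats(entries):
    """Aggregate per-partition counters without applying any deletes."""
    stats = defaultdict(lambda: dict.fromkeys(STAT_COLUMNS, 0))
    for entry in entries:
        file_col, record_col = COLUMNS[entry.content]
        row = stats[entry.partition]
        row[file_col] += 1
        row[record_col] += entry.record_count
    return dict(stats)
```

Note that `record_count` here still counts every row in the data files. The delete counters are reported alongside it and are never subtracted from it.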
   
   Note:
   - Docs will be updated in a follow-up PR, probably after renaming `file_count` and `record_count`.
   - The same schema will also be used for the partition stats feature when it is implemented.
   
   Fixes #6042 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

