ajantha-bhat opened a new pull request, #6661: URL: https://github.com/apache/iceberg/pull/6661
Currently the [partitions metadata table](https://iceberg.apache.org/docs/latest/spark-queries/#partitions) only has the data file stats:
```
file_count
record_count
```
When delete files are present, these stats are inaccurate (as we don't decrement these values). So, capture the delete file stats to give a rough idea of why these stats are inaccurate.

**Note that we are not applying the deletes to the data files and computing the effective result, as that would be a very expensive operation. Users are advised to run rewrite_data_files periodically to apply the delete files to the data files.**

Delete file stats to be added:
```
pos_delete_file_count
pos_delete_record_count
eq_delete_file_count
eq_delete_record_count
```
Note:
- Docs will be updated in a follow-up PR, probably after renaming file_count and record_count.
- The same schema will also be used for the partition stats feature during implementation.

Fixes #6042
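
As a rough illustration of the columns described above, here is a minimal Python sketch of per-partition aggregation in which delete files are counted separately and never subtracted from the data file counts. This is not the PR's implementation: `FileEntry`, `partition_stats`, the string content labels, and the sample partitions are all hypothetical, and the sketch assumes that a delete file's record count is the number of delete rows it holds.

```python
from collections import defaultdict
from dataclasses import dataclass

# Column-name prefix per file content type. The keys are illustrative
# labels chosen for this sketch, not Iceberg API values.
PREFIX_BY_CONTENT = {
    "data": "",
    "position_deletes": "pos_delete_",
    "equality_deletes": "eq_delete_",
}

# The six stats columns: the two existing ones plus the four proposed ones.
COLUMNS = tuple(
    prefix + suffix
    for prefix in PREFIX_BY_CONTENT.values()
    for suffix in ("file_count", "record_count")
)


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical stand-in for one data or delete file in a partition."""
    partition: str
    content: str       # one of the PREFIX_BY_CONTENT keys
    record_count: int  # rows in a data file, or delete rows in a delete file


def partition_stats(entries):
    """Aggregate file and record counts per partition, by content type.

    Deletes are only counted here; they are never applied to the data
    file counts, so file_count and record_count stay as recorded.
    """
    stats = defaultdict(lambda: {col: 0 for col in COLUMNS})
    for entry in entries:
        prefix = PREFIX_BY_CONTENT[entry.content]
        row = stats[entry.partition]
        row[prefix + "file_count"] += 1
        row[prefix + "record_count"] += entry.record_count
    return dict(stats)


# Made-up sample input for illustration only.
example_entries = [
    FileEntry("day=2023-01-01", "data", 100),
    FileEntry("day=2023-01-01", "data", 50),
    FileEntry("day=2023-01-01", "position_deletes", 10),
    FileEntry("day=2023-01-02", "equality_deletes", 3),
]
example_stats = partition_stats(example_entries)
```

In this sketch the first partition still reports its two data files and their full record count even though a position delete file covers some of those rows, which is the kind of gap the extra columns are meant to make visible.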