wypoon commented on issue #6042: URL: https://github.com/apache/iceberg/issues/6042#issuecomment-1316220819
I'm trying to understand the proposed behavior. To go back to @ajantha-bhat's example: suppose you have a partition `{A}` with `record_count`=6 and `file_count`=2 (3 records in each file), and you now delete 3 records in one file. I understand that `pos_delete_file_count` will be 1 and `pos_delete_record_count` will be 3. But what about `record_count` and `file_count`? Will `file_count` be 3 (is it supposed to be the total number of files, counting delete files along with data files)? And what will `record_count` be? When is it possible to compute `record_count` correctly from metadata alone (without applying delete files)?

Another example: suppose you have two partitions `{A}` and `{B}`, with `record_count`=1000 and `file_count`=1 for partition `{B}`. Suppose you rename `B` to `C` using `UPDATE <table> SET <partition column> = 'C' WHERE <partition column> = 'B'` with merge-on-read, resulting in 1 delete file and 1 new pure data file. If you run `SELECT * FROM <table>.partitions` currently, you get an entry for each of `{A}`, `{B}` and `{C}`. What should the behavior be? Should there be an entry for `{B}`, and if so, what should be shown for it? And what should be shown for `{C}`?
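To make the arithmetic in the first example concrete, here is a minimal Python sketch of the two readings of the counts. It is only an illustration, not Iceberg code: the classes and every key other than `pos_delete_file_count` and `pos_delete_record_count` are hypothetical names. The "live" estimate assumes every positional delete entry targets a distinct row that has not already been deleted; if delete files overlap, or equality deletes are involved, subtracting counts from metadata alone would not be reliable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    record_count: int  # rows written to the data file


@dataclass(frozen=True)
class PosDeleteFile:
    record_count: int  # number of (file, position) entries in the delete file


def partition_counts(data_files, pos_delete_files):
    """Summarize one partition using file-level metadata only (hypothetical model)."""
    data_records = sum(f.record_count for f in data_files)
    deleted = sum(d.record_count for d in pos_delete_files)
    return {
        # Reading 1: counts cover data files only.
        "data_file_count": len(data_files),
        "data_record_count": data_records,
        # Counts named in the discussion.
        "pos_delete_file_count": len(pos_delete_files),
        "pos_delete_record_count": deleted,
        # Reading 2: file count includes delete files.
        "total_file_count": len(data_files) + len(pos_delete_files),
        # Assumption: each delete entry removes one distinct, still-live row.
        "live_record_estimate": data_records - deleted,
    }


# Partition {A}: two data files with 3 records each, one delete file with 3 entries.
counts = partition_counts([DataFile(3), DataFile(3)], [PosDeleteFile(3)])
print(counts)
```

Under this model the partition would report either 2 or 3 files and either 6 or 3 records, depending on which reading is intended, which is exactly the ambiguity I am asking about.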