wypoon commented on issue #6042:
URL: https://github.com/apache/iceberg/issues/6042#issuecomment-1316220819

   I'm trying to understand the proposed behavior.
   To go back to @ajantha-bhat's example: Suppose you have a partition `{A}` 
with `record_count`=6 and `file_count`=2 (3 records in each file). Suppose you 
now delete 3 records in one file. I understand that `pos_delete_file_count` 
will be 1 and `pos_delete_record_count` will be 3. But what about 
`record_count` and `file_count`? Will `file_count` be 3 (is it supposed to be 
the total number of files, data files plus delete files)? And `record_count`? 
When is it possible to correctly compute the `record_count` using metadata 
alone (without applying delete files)?
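
   To make the numbers concrete, here is a minimal sketch in Python. The 
`PartitionStats` class and its `live_record_count` helper are hypothetical 
(not Iceberg's API), and it assumes one possible reading of the proposal, in 
which `record_count` and `file_count` keep counting data files only:

```python
from dataclasses import dataclass


@dataclass
class PartitionStats:
    """Hypothetical per-partition counters; not Iceberg's actual API."""
    record_count: int = 0             # records in data files (assumed to ignore deletes)
    file_count: int = 0               # data files only (assumption)
    pos_delete_record_count: int = 0  # rows listed in position delete files
    pos_delete_file_count: int = 0    # number of position delete files

    def live_record_count(self) -> int:
        # Only meaningful if every position delete targets a distinct row
        # that still exists in a data file of this partition.
        return self.record_count - self.pos_delete_record_count


# Partition {A}: two data files with 3 records each.
stats = PartitionStats(record_count=6, file_count=2)

# Delete 3 records in one file via a single position delete file.
stats.pos_delete_record_count += 3
stats.pos_delete_file_count += 1
```

   Under this reading `file_count` stays 2 and the subtraction gives 3 live 
records. Under the other reading `file_count` would become 3. The subtraction 
itself is only safe when no row is deleted twice, which metadata alone may not 
guarantee.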
   
   Another example: Suppose you have two partitions `{A}` and `{B}`. Let's say 
`record_count`=1000 and `file_count`=1 for partition `{B}`. Suppose you rename 
`B` to `C` (using an `UPDATE <table> SET <partition column> = 'C' WHERE 
<partition column> = 'B'` with merge-on-read, resulting in 1 delete file 
and 1 new pure data file). If you do a `SELECT * FROM <table>.partitions` 
currently, you will get an entry for each of `{A}`, `{B}` and `{C}`. What 
should the behavior be? Should there be an entry for `{B}`, and if so, what 
should be shown for it? And what should be shown for `{C}`?
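
   As an illustration of the rename scenario, the sketch below aggregates the 
files left after the `UPDATE` into one row per partition. The tuple layout and 
the `partition_rows` function are hypothetical, and the rule that delete files 
are counted separately from data files is an assumption, not the decided 
behavior. Partition `{A}` is left out because its counts are not given here.

```python
from collections import defaultdict

# (partition, content, record_count) for each file after the UPDATE.
# Hypothetical layout; values follow the example in the text.
files = [
    ("B", "data", 1000),              # original data file, still referenced
    ("B", "position_deletes", 1000),  # delete file covering every row in B
    ("C", "data", 1000),              # new data file written by the UPDATE
]


def partition_rows(entries):
    """Aggregate per-file entries into one stats row per partition."""
    rows = defaultdict(lambda: {
        "record_count": 0,
        "file_count": 0,
        "pos_delete_record_count": 0,
        "pos_delete_file_count": 0,
    })
    for partition, content, count in entries:
        row = rows[partition]
        if content == "data":
            row["record_count"] += count
            row["file_count"] += 1
        else:
            # Assumption: delete files do not add to file_count.
            row["pos_delete_record_count"] += count
            row["pos_delete_file_count"] += 1
    return dict(rows)


rows = partition_rows(files)
```

   With these assumptions `{B}` still gets a row (`record_count`=1000, 
`pos_delete_record_count`=1000), even though none of its records are live, 
while `{C}` shows 1000 records and no deletes.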


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


