aokolnychyi commented on code in PR #12098:
URL: https://github.com/apache/iceberg/pull/12098#discussion_r1940457488
##########
format/spec.md:
##########
@@ -927,20 +927,21 @@ These rows must be sorted (in ascending manner with NULL FIRST) by `partition` f

 The schema of the partition statistics file is as follows:

-| v1 | v2 | Field id, name | Type | Description |
-|----|----|----------------|------|-------------|
-| _required_ | _required_ | **`1 partition`** | `struct<..>` | Partition data tuple, schema based on the unified partition type considering all specs in a table |
-| _required_ | _required_ | **`2 spec_id`** | `int` | Partition spec id |
-| _required_ | _required_ | **`3 data_record_count`** | `long` | Count of records in data files |
-| _required_ | _required_ | **`4 data_file_count`** | `int` | Count of data files |
-| _required_ | _required_ | **`5 total_data_file_size_in_bytes`** | `long` | Total size of data files in bytes |
-| _optional_ | _optional_ | **`6 position_delete_record_count`** | `long` | Count of records in position delete files |
-| _optional_ | _optional_ | **`7 position_delete_file_count`** | `int` | Count of position delete files |
-| _optional_ | _optional_ | **`8 equality_delete_record_count`** | `long` | Count of records in equality delete files |
-| _optional_ | _optional_ | **`9 equality_delete_file_count`** | `int` | Count of equality delete files |
-| _optional_ | _optional_ | **`10 total_record_count`** | `long` | Accurate count of records in a partition after applying the delete files if any |
-| _optional_ | _optional_ | **`11 last_updated_at`** | `long` | Timestamp in milliseconds from the unix epoch when the partition was last updated |
-| _optional_ | _optional_ | **`12 last_updated_snapshot_id`** | `long` | ID of snapshot that last updated this partition |
+| v1 | v2 | v3 | Field id, name | Type | Description |
+|----|----|----|----------------|------|-------------|
+| _required_ | _required_ | _required_ | **`1 partition`** | `struct<..>` | Partition data tuple, schema based on the unified partition type considering all specs in a table |
+| _required_ | _required_ | _required_ | **`2 spec_id`** | `int` | Partition spec id |
+| _required_ | _required_ | _required_ | **`3 data_record_count`** | `long` | Count of records in data files |
+| _required_ | _required_ | _required_ | **`4 data_file_count`** | `int` | Count of data files |
+| _required_ | _required_ | _required_ | **`5 total_data_file_size_in_bytes`** | `long` | Total size of data files in bytes |
+| _optional_ | _optional_ | _required_ | **`6 position_delete_record_count`** | `long` | Count of position deletes across position delete files and deletion vectors |
+| _optional_ | _optional_ | _required_ | **`7 position_delete_file_count`** | `int` | Count of position delete files ignoring deletion vectors |
+| | | _required_ | **`13 dv_count`** | `int` | Count of deletion vectors |
+| _optional_ | _optional_ | _required_ | **`8 equality_delete_record_count`** | `long` | Count of records in equality delete files |
+| _optional_ | _optional_ | _required_ | **`9 equality_delete_file_count`** | `int` | Count of equality delete files |
+| _optional_ | _optional_ | _optional_ | **`10 total_record_count`** | `long` | Accurate count of records in a partition after applying deletes if any |

Review Comment:
I don't think we can make this required and I am afraid we have to fix the implementation. If there are equality deletes, determining this would be very expensive. If we don't have equality deletes and have only DVs, then we can populate this value without an expensive computation.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
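The distinction drawn in the review comment (cheap when a partition has only DVs, expensive once equality deletes are present) can be sketched as follows. This is a minimal Python sketch, not Iceberg's implementation: `PartitionDeleteStats` and `cheap_total_record_count` are hypothetical names whose fields mirror the proposed v3 columns. It assumes that when a partition has no equality delete files and no non-DV position delete files, each DV marks distinct rows of a single data file, so `position_delete_record_count` is an exact deleted-row count and can be subtracted from `data_record_count`. In every other case it returns `None`, meaning `total_record_count` stays unset (optional).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PartitionDeleteStats:
    """Hypothetical holder for the delete-related counters of one partition stats row.

    Field names mirror the proposed v3 partition statistics columns.
    """
    data_record_count: int
    position_delete_record_count: int  # deletes across position delete files and DVs
    position_delete_file_count: int    # position delete files, excluding DVs
    dv_count: int                      # number of deletion vectors
    equality_delete_record_count: int
    equality_delete_file_count: int


def cheap_total_record_count(stats: PartitionDeleteStats) -> Optional[int]:
    """Return total_record_count if it is derivable from the counters alone, else None."""
    if stats.equality_delete_file_count > 0:
        # Equality deletes must be matched against data rows to know how many
        # rows they remove: expensive, so leave the optional field unset.
        return None
    if stats.position_delete_file_count > 0:
        # Assumption: non-DV position delete files are not guaranteed to be
        # disjoint from each other or from DVs, so subtracting their count
        # could undercount live rows. Stay conservative: DVs only.
        return None
    # Only DVs remain: each DV deletes distinct positions in one data file,
    # so the deleted-row count is exact.
    return stats.data_record_count - stats.position_delete_record_count
```

For example, a partition with 1,000 data records and 3 DVs covering 40 deleted rows would yield 960, while any equality delete file in the partition makes the helper return `None`.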
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org