amogh-jahagirdar commented on code in PR #12098:
URL: https://github.com/apache/iceberg/pull/12098#discussion_r1934775737


##########
format/spec.md:
##########
@@ -927,20 +927,21 @@ These rows must be sorted (in ascending manner with NULL 
FIRST) by `partition` f
 
 The schema of the partition statistics file is as follows:
 
-| v1 | v2 | Field id, name | Type | Description |
-|----|----|----------------|------|-------------|
-| _required_ | _required_ | **`1 partition`** | `struct<..>` | Partition data 
tuple, schema based on the unified partition type considering all specs in a 
table |
-| _required_ | _required_ | **`2 spec_id`** | `int` | Partition spec id |
-| _required_ | _required_ | **`3 data_record_count`** | `long` | Count of 
records in data files |
-| _required_ | _required_ | **`4 data_file_count`** | `int` | Count of data 
files |
-| _required_ | _required_ | **`5 total_data_file_size_in_bytes`** | `long` | 
Total size of data files in bytes |
-| _optional_ | _optional_ | **`6 position_delete_record_count`** | `long` | 
Count of records in position delete files |
-| _optional_ | _optional_ | **`7 position_delete_file_count`** | `int` | Count 
of position delete files |
-| _optional_ | _optional_ | **`8 equality_delete_record_count`** | `long` | 
Count of records in equality delete files |
-| _optional_ | _optional_ | **`9 equality_delete_file_count`** | `int` | Count 
of equality delete files |
-| _optional_ | _optional_ | **`10 total_record_count`** | `long` | Accurate 
count of records in a partition after applying the delete files if any |
-| _optional_ | _optional_ | **`11 last_updated_at`** | `long` | Timestamp in 
milliseconds from the unix epoch when the partition was last updated |
-| _optional_ | _optional_ | **`12 last_updated_snapshot_id`** | `long` | ID of 
snapshot that last updated this partition |
+| v1 | v2 | v3 | Field id, name | Type | Description |
+|----|----|----|----------------|------|-------------|
+| _required_ | _required_ | _required_ | **`1 partition`** | `struct<..>` | 
Partition data tuple, schema based on the unified partition type considering 
all specs in a table |
+| _required_ | _required_ | _required_ | **`2 spec_id`** | `int` | Partition 
spec id |
+| _required_ | _required_ | _required_ | **`3 data_record_count`** | `long` | 
Count of records in data files |
+| _required_ | _required_ | _required_ | **`4 data_file_count`** | `int` | 
Count of data files |
+| _required_ | _required_ | _required_ | **`5 total_data_file_size_in_bytes`** 
| `long` | Total size of data files in bytes |
+| _optional_ | _optional_ | _required_ | **`6 position_delete_record_count`** 
| `long` | Count of position deletes across position delete files and deletion 
vectors |
+| _optional_ | _optional_ | _required_ | **`7 position_delete_file_count`** | 
`int` | Count of position delete files ignoring deletion vectors |

Review Comment:
   IMO, there's value in having these be required because having null in these 
fields isn't really that informative since it just means that the state is 
unknown and needs to be derived using other metadata. Populating these fields 
accurately isn't that much of a burden for writers (so if there are no deletes, 
it'll just be 0) and eliminates any special casing around having to deal with 
null values here. I'd imagine the potential extra storage from these fields is 
also pretty negligible. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to