flyrain commented on code in PR #7105: URL: https://github.com/apache/iceberg/pull/7105#discussion_r1322057077
##########
format/spec.md:
##########
@@ -702,6 +703,41 @@ Blob metadata is a struct with the following fields:
 | _optional_ | _optional_ | **`properties`** | `map<string, string>` | Additional properties associated with the statistic. Subset of Blob properties in the Puffin file. |
+#### Partition statistics
+
+Partition statistics files are based on [Partition Statistics file spec](#partition-statistics-file). Partition statistics are informational. A reader can choose to
+ignore partition statistics information. Partition statistics support is not required to read the table correctly.
+Each table snapshot may be associated with at most one partition statistic file and the table can contain many partition statistics files associated with different table snapshots.
+A writer can optionally write the partition statistics file during each write operation. If the statistics file is written for the specific snapshot,
+it must be accurate and must be registered in the table metadata file to be considered as a valid statistics file for the reader.
+
+Partition statistics files metadata within `partition-statistics` table metadata field is a struct with the following fields:
+
+| v1 | v2 | Field name | Type | Description |
+|----|----|------------|------|-------------|
+| _required_ | _required_ | **`snapshot-id`** | `long` | ID of the Iceberg table's snapshot the partition statistics file is associated with. |
+| _required_ | _required_ | **`statistics-file-path`** | `string` | Path of the partition statistics file. See [Partition Statistics file](#partition-statistics-file). |
+| _required_ | _required_ | **`max-data-sequence-number`** | `long` | Maximum data sequence number of the Iceberg table's snapshot the partition statistics was computed from. |

Review Comment:
How about removing `max-data-sequence-number` temporarily, so that we can move on this PR? We can get a sequence number from a snapshot without any issue.
And we can always add the `max-data-sequence-number` back if necessary.
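
To illustrate the suggestion, here is a minimal sketch of how a reader could recover a sequence number through `snapshot-id` if the `max-data-sequence-number` field were dropped. The classes `SnapshotEntry` and `PartitionStatsEntry` and the method `sequenceNumberFor` are hypothetical stand-ins written for this example, not the Iceberg API, and the sketch assumes the snapshot's own sequence number is an acceptable substitute for the removed field.

```java
import java.util.HashMap;
import java.util.Map;

public class PartitionStatsSketch {

    // Hypothetical stand-in for a snapshot entry in table metadata.
    static final class SnapshotEntry {
        final long snapshotId;
        final long sequenceNumber;

        SnapshotEntry(long snapshotId, long sequenceNumber) {
            this.snapshotId = snapshotId;
            this.sequenceNumber = sequenceNumber;
        }
    }

    // Hypothetical stand-in for a `partition-statistics` entry that carries
    // only `snapshot-id` and `statistics-file-path`.
    static final class PartitionStatsEntry {
        final long snapshotId;
        final String statisticsFilePath;

        PartitionStatsEntry(long snapshotId, String statisticsFilePath) {
            this.snapshotId = snapshotId;
            this.statisticsFilePath = statisticsFilePath;
        }
    }

    // Resolve the sequence number by following snapshot-id to the snapshot.
    static long sequenceNumberFor(PartitionStatsEntry stats,
                                  Map<Long, SnapshotEntry> snapshotsById) {
        SnapshotEntry snapshot = snapshotsById.get(stats.snapshotId);
        if (snapshot == null) {
            throw new IllegalArgumentException(
                "No snapshot with id " + stats.snapshotId);
        }
        return snapshot.sequenceNumber;
    }

    public static void main(String[] args) {
        Map<Long, SnapshotEntry> snapshots = new HashMap<>();
        snapshots.put(100L, new SnapshotEntry(100L, 7L));

        // The path below is a made-up example value.
        PartitionStatsEntry stats =
            new PartitionStatsEntry(100L, "metadata/partition-stats-100.parquet");

        System.out.println(sequenceNumberFor(stats, snapshots));
    }
}
```

In Iceberg's Java API the equivalent value is exposed by `Snapshot.sequenceNumber()`, so the lookup would go through the table's snapshot for the given ID rather than a hand-built map.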
