RussellSpitzer commented on code in PR #11606:
URL: https://github.com/apache/iceberg/pull/11606#discussion_r1855332823

##########
docs/docs/spark-procedures.md:
##########
@@ -936,3 +936,40 @@ as an `UPDATE_AFTER` image, resulting in the following pre/post update images:
 |-----|--------|--------------|
 | 3 | Robert | UPDATE_BEFORE|
 | 3 | Dan | UPDATE_AFTER |
+
+## Table stats
+
+### `compute_table_stats`
+
+This procedure calculates the Number of Distinct Values (NDV) statistics for a specified table.
+By default, statistics are computed for all columns using the table's current snapshot.
+The procedure can be optionally configured to compute statistics for a specific snapshot and/or a subset of columns.
+
+| Argument Name | Required? | Type | Description |
+|---------------|-----------|---------------|-------------------------------------|
+| `table` | ✔️ | string | Name of the table |
+| `snapshot_id` | | string | id of the snapshot to collect stats |
+| `columns` | | array<string> | columns to collect stats |
+
+#### Output
+
+| Output Name | Type | Description |
+|-------------------|--------|-------------------------------------------------|
+| `statistics_file` | string | path to stats file created from by this command |

Review Comment:
   Why would we add this to the documentation here? I'm not sure what the value is since it will be evident from the value returned where the file is?

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
For additional commands, e-mail: issues-h...@iceberg.apache.org