szehon-ho commented on code in PR #11606:
URL: https://github.com/apache/iceberg/pull/11606#discussion_r1870121070


##########
docs/docs/spark-procedures.md:
##########
@@ -936,3 +936,40 @@ as an `UPDATE_AFTER` image, resulting in the following pre/post update images:
 |-----|--------|--------------|
 | 3   | Robert | UPDATE_BEFORE|
 | 3   | Dan    | UPDATE_AFTER |
+
+## Table Statistics
+
+### `compute_table_stats`
+
+This procedure calculates the [Number of Distinct Values (NDV) statistics](../../format/puffin-spec.md) for a specific table.
+By default, statistics are computed for all columns using the table's current snapshot.
+The procedure can be optionally configured to compute statistics for a specific snapshot and/or a subset of columns.
+
+| Argument Name | Required? | Type          | Description                         |
+|---------------|-----------|---------------|-------------------------------------|
+| `table`       | ✔️        | string        | Name of the table                   |
+| `snapshot_id` |           | string        | Id of the snapshot to collect stats |
+| `columns`     |           | array<string> | Columns to collect stats            |
+
+#### Output
+
+| Output Name       | Type   | Description                                     |
+|-------------------|--------|-------------------------------------------------|
+| `statistics_file` | string | Path to stats file created from by this command |
+
+#### Examples
+
+Collect statistics of the latest snapshot of table `my_table`
+```sql
+CALL catalog_name.system.compute_table_stats('my_table');
+```
+
+Collect statistics of the snapshot with id `snap1` of table `my_table`
+```sql
+CALL catalog_name.system.compute_table_stats(snapshot_id => 'snap1', table => 'my_table' );

Review Comment:
   I'm sorry to keep adding comments sporadically, but can we put `table` first here and in the next one?

   I feel each example adds more arguments than the previous one, and it'd be clearer to always add the new ones at the end.
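
   As a rough sketch of the ordering I have in mind for this example (argument names and values are taken from the diff above; I have not run this, and the exact formatting is up to you):

   ```sql
   -- Same call as in the diff, with the named arguments reordered so that
   -- `table` comes first and the newly introduced `snapshot_id` comes last.
   CALL catalog_name.system.compute_table_stats(table => 'my_table', snapshot_id => 'snap1');
   ```

   Since these are named arguments, the reordering should only affect readability, not behavior.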



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

