Re: [PR] Spark: Add sort_by parameter to rewrite_manifests procedure [iceberg]

via GitHub Thu, 05 Mar 2026 09:12:36 -0800


RussellSpitzer commented on code in PR #15467:
URL: https://github.com/apache/iceberg/pull/15467#discussion_r2891250022



##########
docs/docs/spark-procedures.md:
##########
@@ -491,6 +491,7 @@ Data files in manifests are sorted by fields in the partition spec. This procedu
 | `table`       | ✔️  | string | Name of the table to update |
 | `use_caching` | ️   | boolean | Use Spark caching during operation (defaults to false). Enabling caching can increase memory footprint on executors. |
 | `spec_id`     | ️   | int | Spec id of the manifests to rewrite (defaults to current spec id) |
+| `sort_by`     | ️   | array<string> | List of partition field names to cluster manifests by. Choosing frequently queried partition fields can reduce planning time by skipping unnecessary manifests. If not set, manifests will be sorted by all partition fields in spec order. |

Review Comment:
   "partition transform names"?
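The `sort_by` row claims that clustering by frequently queried partition fields lets planning skip manifests. The following is a minimal Python sketch of why, under loudly stated assumptions. It is a toy model and not Iceberg's `RewriteManifests` code. Each data file is reduced to a dict of partition values, and manifests are fixed-size chunks of the sorted file list. A manifest is skipped when the filter value falls outside that field's min/max range across its files, which is roughly how the per-field lower/upper bound summaries in the manifest list are used. The helper names `cluster_into_manifests` and `manifests_scanned`, and the fields `day` and `region`, are hypothetical.

```python
from typing import Dict, List, Sequence

# Toy model (hypothetical): a data file is represented only by its partition values.
Partition = Dict[str, int]


def cluster_into_manifests(
    files: List[Partition], sort_by: Sequence[str], files_per_manifest: int
) -> List[List[Partition]]:
    """Sort files by the chosen partition fields, then cut the sorted list into
    fixed-size manifests (a simplification; real manifests are size-based)."""
    ordered = sorted(files, key=lambda p: tuple(p[f] for f in sort_by))
    return [
        ordered[i : i + files_per_manifest]
        for i in range(0, len(ordered), files_per_manifest)
    ]


def manifests_scanned(manifests: List[List[Partition]], field: str, value: int) -> int:
    """Count manifests that planning cannot skip for the filter `field == value`,
    using only each manifest's min/max bound for that field."""
    scanned = 0
    for manifest in manifests:
        lower = min(p[field] for p in manifest)
        upper = max(p[field] for p in manifest)
        if lower <= value <= upper:
            scanned += 1
    return scanned


# 40 files: one per (day, region) pair, 10 days x 4 regions.
files = [{"day": d, "region": r} for d in range(10) for r in range(4)]

# Default behaviour per the docs row: all partition fields in spec order.
spec_order = cluster_into_manifests(files, ["day", "region"], 4)
# Clustering by the frequently filtered field only.
by_region = cluster_into_manifests(files, ["region"], 4)

print(manifests_scanned(spec_order, "region", 2))  # 10: every manifest spans regions 0..3
print(manifests_scanned(by_region, "region", 2))   # 3
print(manifests_scanned(spec_order, "day", 5))     # 1
```

With spec-order clustering, every manifest covers all four regions, so a `region = 2` filter prunes nothing. Clustering by `region` narrows each manifest's range and leaves only three candidates. The fixed chunk size is a simplification, so these numbers show the direction of the effect, not its real magnitude.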



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.