zachdisc commented on code in PR #12840:
URL: https://github.com/apache/iceberg/pull/12840#discussion_r2054892334
##########
api/src/main/java/org/apache/iceberg/actions/RewriteManifests.java:
##########
@@ -44,6 +45,28 @@ public interface RewriteManifests
*/
RewriteManifests rewriteIf(Predicate<ManifestFile> predicate);
+ /**
+ * Rewrite manifests in a given order, based on partition field names
+ *
+ * <p>Supply an optional set of partition field names to cluster the rewritten manifests by. For
+ * example, given a table PARTITIONED BY (a, b, c, d), one may wish to rewrite and cluster
+ * manifests by ('d', 'b') only, based on known query patterns. Rewriting Manifests in this way
+ * will yield manifest_lists that point to manifest_files containing data files for common 'd' and
+ * 'b' partitions.
+ *
+ * <p>If not set, manifests will be rewritten in the order of the transforms in the table's
+ * current partition spec.
+ *
+ * @param partitionFields Exact transformed column names used for partitioning; not the raw column
+ * names that partitions are derived from. E.G. supply 'data_bucket' and not 'data' for a
+ * bucket(N, data) partition * definition
+ * @return this for method chaining
+ */
+ default RewriteManifests clusterBy(List<String> partitionFields) {
Review Comment:
In the original proposal we talked about updating the Spark API to match the
Java API, which has a `clusterBy` method:
https://github.com/apache/iceberg/blob/90d1c90b6e6f26fdfe7c0c6c09a1ecb2fc2b3f2a/core/src/main/java/org/apache/iceberg/BaseRewriteManifests.java#L116.
Educate me - I see clustering and sorting as synonyms. Clustering is just
the word for the Spark technique of sorting data and using
`repartitionByRange` to split it into "clusters". I'm not married to either
name.
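
To make the "clustering vs. sorting" point concrete, here is a minimal
plain-Java sketch with no Iceberg or Spark dependency. It assumes each manifest
entry can be reduced to a map of partition field name to value; the class
`ClusterBySketch` and its helpers are hypothetical names for illustration, not
part of either API. Sorting by the chosen fields makes entries with equal
`(d, b)` values adjacent, and the groups that fall out are the "clusters":

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: hypothetical names, not the Iceberg or Spark API.
class ClusterBySketch {

  // Builds the clustering key for one entry, e.g. [d, b] for fields ("d", "b").
  static List<String> clusterKey(Map<String, String> partition, List<String> fields) {
    List<String> key = new ArrayList<>();
    for (String field : fields) {
      key.add(partition.get(field));
    }
    return key;
  }

  // Sorts entries by the clustering key, then collects entries that share a key.
  static Map<List<String>, List<Map<String, String>>> cluster(
      List<Map<String, String>> entries, List<String> fields) {
    List<Map<String, String>> sorted = new ArrayList<>(entries);
    sorted.sort(
        Comparator.comparing(
            (Map<String, String> e) -> String.join("\u0000", clusterKey(e, fields))));

    // After the sort, equal keys are adjacent; LinkedHashMap keeps sorted key order.
    Map<List<String>, List<Map<String, String>>> groups = new LinkedHashMap<>();
    for (Map<String, String> entry : sorted) {
      groups.computeIfAbsent(clusterKey(entry, fields), k -> new ArrayList<>()).add(entry);
    }
    return groups;
  }

  public static void main(String[] args) {
    // Made-up partition values for a table PARTITIONED BY (a, b, c, d).
    List<Map<String, String>> entries =
        List.of(
            Map.of("a", "1", "b", "2", "c", "3", "d", "y"),
            Map.of("a", "4", "b", "1", "c", "5", "d", "x"),
            Map.of("a", "6", "b", "2", "c", "7", "d", "y"));

    // Cluster by ('d', 'b') only, as in the Javadoc example.
    cluster(entries, List.of("d", "b"))
        .forEach((key, group) -> System.out.println(key + " -> " + group.size() + " entries"));
  }
}
```

In this sketch the sort is what produces the clusters, which is why the two
words can read as synonyms; the naming question is only about which term the
API should expose.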
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]