RussellSpitzer commented on code in PR #9731:
URL: https://github.com/apache/iceberg/pull/9731#discussion_r1799738308

##########
api/src/main/java/org/apache/iceberg/actions/RewriteManifests.java:
##########
@@ -44,6 +47,43 @@ public interface RewriteManifests
    */
   RewriteManifests rewriteIf(Predicate<ManifestFile> predicate);
 
+  /**
+   * Rewrite manifests in a given order, based on partition field names
+   *
+   * <p>Supply an optional set of partition field names to cluster the rewritten manifests by. For
+   * example, given a table PARTITIONED BY (a, b, c, d), you may wish to rewrite and cluster
+   * manifests by ('d', 'b') only, based on your query patterns. Rewriting Manifests in this way
+   * will yield manifest_lists that point to manifest_files containing data files for common 'd' and
+   * 'b' partitions.
+   *
+   * <p>If not set, manifests will be rewritten in the order of the transforms in the table's
+   * current partition spec.
+   *
+   * @param partitionFieldClustering Exact transformed column names used for partitioning; not the
+   *     raw column names that partitions are derived from. E.G. supply 'data_bucket' and not 'data'
+   *     for a bucket(N, data) partition * definition
+   * @return this for method chaining
+   */
+  default RewriteManifests clusterBy(List<String> partitionFieldClustering) {
+    throw new UnsupportedOperationException(
+        this.getClass().getName() + " doesn't implement clusterBy(List<String>)");
+  }
+
+  /**
+   * Rewrite manifests in a given order, dictated by a custom Function
+   *
+   * <p>Supply a Function which will apply its own custom clustering logic based on supplied {@link
+   * org.apache.iceberg.DataFile} attributes.
+   *
+   * @param clusterStrategyFunction A Function that returns a String to be used for manifest
+   *     clustering
+   * @return this method for chaining
+   */
+  default RewriteManifests clusterBy(Function<DataFile, String> clusterStrategyFunction) {

Review Comment:
   I don't feel good about opening up a public API like this that takes an arbitrary function unless we have a very good reason to. Do we need this kind of flexibility?

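Illustrative note (not part of the PR or of the review): the sketch below shows one way the field-name overload could be read as a restricted special case of the function overload, which is the trade-off the comment asks about. It does not use any Iceberg types. `FakeDataFile`, `keyFor`, and the `name=value` key format joined with `/` are all assumptions invented for this example, and nothing here claims to match how the PR actually builds clustering keys.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class ClusterKeySketch {

  // Hypothetical stand-in for a data file and its partition tuple
  // (partition field name -> value). This is NOT org.apache.iceberg.DataFile.
  static final class FakeDataFile {
    final String path;
    final Map<String, String> partition;

    FakeDataFile(String path, Map<String, String> partition) {
      this.path = path;
      this.partition = partition;
    }
  }

  // Turns an ordered list of partition field names into a key function.
  // The "name=value" format joined by "/" is an assumption made for this sketch.
  // A field name missing from the partition map is rendered as "name=null".
  static Function<FakeDataFile, String> keyFor(List<String> partitionFieldNames) {
    return file ->
        partitionFieldNames.stream()
            .map(name -> name + "=" + file.partition.get(name))
            .collect(Collectors.joining("/"));
  }

  public static void main(String[] args) {
    FakeDataFile file =
        new FakeDataFile("f1.parquet", Map.of("a", "1", "b", "2", "c", "3", "d", "4"));

    // Cluster by ('d', 'b') only, mirroring the example in the Javadoc above.
    Function<FakeDataFile, String> byDThenB = keyFor(List.of("d", "b"));
    System.out.println(byDThenB.apply(file)); // prints "d=4/b=2"
  }
}
```

Under these assumptions, every key the list form can produce is also expressible as a function, while the function form additionally admits keys that depend on arbitrary file attributes. That extra surface is what a public API would have to keep supporting.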