Re: [PR] Spark: Adding simple custom partition sort order option to RewriteManifests Spark Action [iceberg]

via GitHub Mon, 04 Mar 2024 19:05:30 -0800


zachdisc commented on code in PR #9731:
URL: https://github.com/apache/iceberg/pull/9731#discussion_r1512074114



##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteManifestsSparkAction.java:
##########
@@ -309,24 +312,6 @@ private List<ManifestFile> writePartitionedManifests(
       clusteredManifestEntryDF =
           manifestEntryDF.withColumn(
               CUSTOM_CLUSTERING_COLUMN_NAME, clusteringUdf.apply(col("data_file")));
-    } else if (partitionFieldSortOrder != null) {
-      LOG.info(
-          "Sorting manifests for specId {} by partition columns in order of {} ",
-          spec.specId(),
-          partitionFieldSortOrder);
-
-      // Map the top level partition column names to the column name referenced within the manifest
-      // entry dataframe
-      Column[] actualPartitionColumns =
-          partitionFieldSortOrder.stream()
-              .map(p -> col("data_file.partition." + p))
-              .toArray(Column[]::new);

Review Comment:
   This part might be the trick, though. I wonder if we can modify 
`PartitionSortFunction` to return a column: one we could define as a struct 
of the partition column names in the standard `sort` case, and as just a 
String in the more customizable case.
   
   Actually, I'm wondering if we could make it return a struct in general, or 
something we could treat as one, for the same reason I give below: wanting to 
deliver some level of hierarchy.
   
   I'll have to think on it and tinker.
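
   To make the hierarchy point concrete, here is a minimal, Spark-free sketch. 
Everything in it is hypothetical: `SortKeySketch`, `PartitionKey`, and the 
`region`/`day` fields are illustration-only names, not part of Iceberg or of 
this PR. It only assumes that a struct-like key is compared field by field, 
while a String key is compared as flat text.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, Spark-free illustration; none of these names exist in Iceberg.
public class SortKeySketch {

  // Stand-in for a two-field partition tuple, e.g. (region, day).
  public static final class PartitionKey {
    final String region;
    final int day;

    public PartitionKey(String region, int day) {
      this.region = region;
      this.day = day;
    }

    // Flattened, String-style rendering of the key.
    String render() {
      return region + "/" + day;
    }
  }

  // Struct-style ordering: compare field by field, each with its own type.
  public static List<String> sortAsStruct(List<PartitionKey> keys) {
    List<PartitionKey> sorted = new ArrayList<>(keys);
    sorted.sort(
        Comparator.comparing((PartitionKey k) -> k.region).thenComparingInt(k -> k.day));
    List<String> out = new ArrayList<>();
    for (PartitionKey k : sorted) {
      out.add(k.render());
    }
    return out;
  }

  // String-style ordering: flatten first, then compare as plain text.
  public static List<String> sortAsString(List<PartitionKey> keys) {
    List<String> out = new ArrayList<>();
    for (PartitionKey k : keys) {
      out.add(k.render());
    }
    out.sort(Comparator.naturalOrder());
    return out;
  }
}
```

   For the keys `(us, 9)`, `(us, 10)`, `(eu, 10)`, the field-by-field ordering 
gives `eu/10, us/9, us/10`, while the flattened ordering gives 
`eu/10, us/10, us/9`, because `"10"` sorts before `"9"` as text. That is the 
kind of per-level ordering a struct-like return value would keep.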



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


