pvary commented on code in PR #12629:
URL: https://github.com/apache/iceberg/pull/12629#discussion_r2031216069


##########
data/src/main/java/org/apache/iceberg/data/PartitionStatsHandler.java:
##########
@@ -149,6 +162,92 @@ public static PartitionStatisticsFile computeAndWriteStatsFile(Table table, long
         table, snapshot.snapshotId(), schema(partitionType), sortedStats);
   }
 
+  /**
+   * Incrementally computes the stats after the snapshot that has partition stats file till the
+   * given snapshot and writes the combined result into a {@link PartitionStatisticsFile} after
+   * merging the stats.
+   *
+   * @param table The {@link Table} for which the partition statistics is computed.
+   * @param snapshotId snapshot for which partition statistics are computed.
+   * @return {@link PartitionStatisticsFile} for the given snapshot, or null if no statistics are
+   *     present.
+   */
+  public static PartitionStatisticsFile computeAndWriteStatsFileIncremental(
+      Table table, long snapshotId) throws IOException {
+    Preconditions.checkArgument(table != null, "Table cannot be null");
+    Snapshot snapshot = table.snapshot(snapshotId);
+    Preconditions.checkArgument(snapshot != null, "Snapshot not found: %s", snapshotId);
+
+    StructType partitionType = Partitioning.partitionType(table);
+    Schema statsFileSchema = schema(partitionType);
+    PartitionStatisticsFile statisticsFile = latestStatsFile(table, snapshotId);

Review Comment:
   With the new `InternalData` we can do this in `PartitionStatsHandler`:
   ```
     private static DataWriter<StructLike> dataWriter(
         Schema dataSchema, OutputFile outputFile, FileFormat fileFormat) throws IOException {
       return new DataWriter<>(
           InternalData.write(fileFormat, outputFile).schema(dataSchema).build(),
           fileFormat,
           outputFile.location(),
           PartitionSpec.unpartitioned(),
           null,
           null,
           null);
     }
   
     private static CloseableIterable<StructLike> dataReader(Schema schema, InputFile inputFile) {
       FileFormat fileFormat = FileFormat.fromFileName(inputFile.location());
       Preconditions.checkArgument(
           fileFormat != null, "Unable to determine format of file: %s", inputFile.location());
       return InternalData.read(fileFormat, inputFile).project(schema).build();
     }
   ```
   
   In that case we can move `PartitionStatsHandler` to `core` and basically merge the two util classes.
   
   Am I missing something?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

