szehon-ho commented on code in PR #9335:
URL: https://github.com/apache/iceberg/pull/9335#discussion_r1777502004


##########
core/src/main/java/org/apache/iceberg/AllManifestsTableTaskParser.java:
##########
@@ -39,6 +39,8 @@ class AllManifestsTableTaskParser {
   private static final String MANIFEST_LIST_LOCATION = 
"manifest-list-Location";
   private static final String RESIDUAL = "residual-filter";
   private static final String REFERENCE_SNAPSHOT_ID = "reference-snapshot-id";
+  private static final String REFERENCE_SNAPSHOT_TIMESTAMP_MILLIS =

Review Comment:
   while its descriptive, its overly longer than other column names, how about 
combine with @RussellSpitzer original suggestion and 'reference-snapshot-time'



##########
core/src/main/java/org/apache/iceberg/AllDataFilesTable.java:
##########
@@ -66,5 +68,14 @@ protected TableScan newRefinedScan(Table table, Schema 
schema, TableScanContext
     protected CloseableIterable<ManifestFile> manifests() {
       return reachableManifests(snapshot -> 
snapshot.dataManifests(table().io()));
     }
+
+    @Override
+    protected CloseableIterable<Pair<Snapshot, ManifestFile>> 
snapshotManifestPairs() {

Review Comment:
   Can we deprecate the other one?



##########
core/src/main/java/org/apache/iceberg/MetricsUtil.java:
##########
@@ -456,4 +458,54 @@ public <T> void set(int pos, T value) {
       throw new UnsupportedOperationException("StructWithReadableMetrics is 
read only");
     }
   }
+
+  static class StructWithRefSnapshot implements StructLike {
+    private final StructLike struct;
+    private final Schema projectedSchema;
+    private final Snapshot snapshot;
+
+    private final int refSnapshotIdPosition;
+    private final int refSnapshotTimestampMillisPosition;
+
+    StructWithRefSnapshot(
+        StructLike struct,
+        Schema projectedSchema,
+        Snapshot snapshot,
+        Types.NestedField refSnapshotIdField,
+        Types.NestedField refSnapshotTimestampMillisField) {
+      this.struct = struct;
+      this.projectedSchema = projectedSchema;
+      this.snapshot = snapshot;
+      this.refSnapshotIdPosition =
+          Optional.ofNullable(refSnapshotIdField)
+              .map(f -> projectedSchema.columns().indexOf(refSnapshotIdField))
+              .orElse(-1);
+
+      this.refSnapshotTimestampMillisPosition =
+          Optional.ofNullable(refSnapshotIdField)
+              .map(f -> 
projectedSchema.columns().indexOf(refSnapshotTimestampMillisField))
+              .orElse(-1);
+    }
+
+    @Override
+    public int size() {
+      return projectedSchema.columns().size();
+    }
+
+    @Override
+    public <T> T get(int pos, Class<T> javaClass) {

Review Comment:
   Wont this not work for Flink, the pos are not always at the end? See  
https://github.com/apache/iceberg/pull/6222/files 
   
   
https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/MetricsUtil.java#L441
   
   I think we should make a generic struct for this scenario now that we have 
so many fields.  ie, have a position map kind of like:  
https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/util/StructProjection.java.
  do you think its possible?
   
   



##########
core/src/main/java/org/apache/iceberg/BaseEntriesTable.java:
##########
@@ -273,18 +273,30 @@ static class ManifestReadTask extends BaseFileScanTask 
implements DataTask {
     private final Schema fileProjection;
     private final Schema dataTableSchema;
     private final FileIO io;
+    private final Snapshot snapshot;
     private final ManifestFile manifest;
     private final Map<Integer, PartitionSpec> specsById;
 
     private ManifestReadTask(
-        Table table, ManifestFile manifest, Schema projection, Expression 
filter) {
-      this(table.schema(), table.io(), table.specs(), manifest, projection, 
filter);
+        Table table,
+        Pair<Snapshot, ManifestFile> snapshotManifestPair,

Review Comment:
   Can we just add an argument Snapshot, instead of passing pair?



##########
core/src/main/java/org/apache/iceberg/AllDataFilesTable.java:
##########
@@ -66,5 +68,14 @@ protected TableScan newRefinedScan(Table table, Schema 
schema, TableScanContext
     protected CloseableIterable<ManifestFile> manifests() {
       return reachableManifests(snapshot -> 
snapshot.dataManifests(table().io()));
     }
+
+    @Override
+    protected CloseableIterable<Pair<Snapshot, ManifestFile>> 
snapshotManifestPairs() {

Review Comment:
   Also just my opinion on name, but to me Pairs is an implementation detail, 
maybe we can call it manifestsWithSnapshot to explain more logically



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to