szehon-ho commented on code in PR #9335: URL: https://github.com/apache/iceberg/pull/9335#discussion_r1777502004
########## core/src/main/java/org/apache/iceberg/AllManifestsTableTaskParser.java: ########## @@ -39,6 +39,8 @@ class AllManifestsTableTaskParser { private static final String MANIFEST_LIST_LOCATION = "manifest-list-Location"; private static final String RESIDUAL = "residual-filter"; private static final String REFERENCE_SNAPSHOT_ID = "reference-snapshot-id"; + private static final String REFERENCE_SNAPSHOT_TIMESTAMP_MILLIS = Review Comment: while its descriptive, its overly longer than other column names, how about combine with @RussellSpitzer original suggestion and 'reference-snapshot-time' ########## core/src/main/java/org/apache/iceberg/AllDataFilesTable.java: ########## @@ -66,5 +68,14 @@ protected TableScan newRefinedScan(Table table, Schema schema, TableScanContext protected CloseableIterable<ManifestFile> manifests() { return reachableManifests(snapshot -> snapshot.dataManifests(table().io())); } + + @Override + protected CloseableIterable<Pair<Snapshot, ManifestFile>> snapshotManifestPairs() { Review Comment: Can we deprecate the other one? ########## core/src/main/java/org/apache/iceberg/MetricsUtil.java: ########## @@ -456,4 +458,54 @@ public <T> void set(int pos, T value) { throw new UnsupportedOperationException("StructWithReadableMetrics is read only"); } } + + static class StructWithRefSnapshot implements StructLike { + private final StructLike struct; + private final Schema projectedSchema; + private final Snapshot snapshot; + + private final int refSnapshotIdPosition; + private final int refSnapshotTimestampMillisPosition; + + StructWithRefSnapshot( + StructLike struct, + Schema projectedSchema, + Snapshot snapshot, + Types.NestedField refSnapshotIdField, + Types.NestedField refSnapshotTimestampMillisField) { + this.struct = struct; + this.projectedSchema = projectedSchema; + this.snapshot = snapshot; + this.refSnapshotIdPosition = + Optional.ofNullable(refSnapshotIdField) + .map(f -> projectedSchema.columns().indexOf(refSnapshotIdField)) + .orElse(-1); + + this.refSnapshotTimestampMillisPosition = + Optional.ofNullable(refSnapshotIdField) + .map(f -> projectedSchema.columns().indexOf(refSnapshotTimestampMillisField)) + .orElse(-1); + } + + @Override + public int size() { + return projectedSchema.columns().size(); + } + + @Override + public <T> T get(int pos, Class<T> javaClass) { Review Comment: Wont this not work for Flink, the pos are not always at the end? See https://github.com/apache/iceberg/pull/6222/files https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/MetricsUtil.java#L441 I think we should make a generic struct for this scenario now that we have so many fields. ie, have a position map kind of like: https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/util/StructProjection.java. do you think its possible? ########## core/src/main/java/org/apache/iceberg/BaseEntriesTable.java: ########## @@ -273,18 +273,30 @@ static class ManifestReadTask extends BaseFileScanTask implements DataTask { private final Schema fileProjection; private final Schema dataTableSchema; private final FileIO io; + private final Snapshot snapshot; private final ManifestFile manifest; private final Map<Integer, PartitionSpec> specsById; private ManifestReadTask( - Table table, ManifestFile manifest, Schema projection, Expression filter) { - this(table.schema(), table.io(), table.specs(), manifest, projection, filter); + Table table, + Pair<Snapshot, ManifestFile> snapshotManifestPair, Review Comment: Can we just add an argument Snapshot, instead of passing pair? ########## core/src/main/java/org/apache/iceberg/AllDataFilesTable.java: ########## @@ -66,5 +68,14 @@ protected TableScan newRefinedScan(Table table, Schema schema, TableScanContext protected CloseableIterable<ManifestFile> manifests() { return reachableManifests(snapshot -> snapshot.dataManifests(table().io())); } + + @Override + protected CloseableIterable<Pair<Snapshot, ManifestFile>> snapshotManifestPairs() { Review Comment: Also just my opinion on name, but to me Pairs is an implementation detail, maybe we can call it manifestsWithSnapshot to explain more logically -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org