ajantha-bhat commented on code in PR #9437: URL: https://github.com/apache/iceberg/pull/9437#discussion_r1454359159
##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/BaseSparkAction.java:
##########
@@ -150,6 +154,21 @@ protected Dataset<FileInfo> contentFileDS(Table table, Set<Long> snapshotIds) {
     Broadcast<Table> tableBroadcast = sparkContext.broadcast(serializableTable);
     int numShufflePartitions = spark.sessionState().conf().numShufflePartitions();
+    return manifestBeanDS(table, snapshotIds, numShufflePartitions)
+        .flatMap(new ReadManifest(tableBroadcast), FileInfo.ENCODER);
+  }
+
+  protected Dataset<PartitionEntryBean> partitionEntryDS(Table table) {
+    Table serializableTable = SerializableTableWithSize.copyOf(table);
+    Broadcast<Table> tableBroadcast = sparkContext.broadcast(serializableTable);
+    int numShufflePartitions = spark.sessionState().conf().numShufflePartitions();
+
+    return manifestBeanDS(table, null, numShufflePartitions)

Review Comment:
   > Is it actually correct? This code would go via ALL_MANIFESTS table. Shouldn't we only look for manifests in a particular snapshot for which we compute the stats?

   I just followed the same pattern as the partitions metadata table, which goes through all the manifests from all the snapshots.
   https://github.com/apache/iceberg/blob/31d18f51b9e8590f7ca316463b080bd1153e8f9e/core/src/main/java/org/apache/iceberg/PartitionsTable.java#L186-L195

   Let me think about this today and get back to you.
   Also, I need to understand why the partitions metadata table goes through all manifests (cc: @szehon-ho)
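
To make the question above concrete, here is a minimal sketch of the difference between reading the manifests of every snapshot and reading only the manifests of the snapshot the stats are computed for. It is a toy model built on plain Java collections, not Iceberg code: the class `ManifestScopeSketch`, both method names, and the manifest file names are hypothetical and exist only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ManifestScopeSketch {

  // Assumption: a table is modeled as snapshot id -> manifest paths reachable from that snapshot.
  // Union of the manifests of every snapshot (the "all manifests" style of scan).
  static Set<String> manifestsOfAllSnapshots(Map<Long, List<String>> manifestsBySnapshot) {
    Set<String> result = new LinkedHashSet<>();
    for (List<String> manifests : manifestsBySnapshot.values()) {
      result.addAll(manifests);
    }
    return result;
  }

  // Manifests reachable from one snapshot only; empty if the snapshot id is unknown.
  static Set<String> manifestsOfSnapshot(
      Map<Long, List<String>> manifestsBySnapshot, long snapshotId) {
    return new LinkedHashSet<>(manifestsBySnapshot.getOrDefault(snapshotId, List.of()));
  }

  public static void main(String[] args) {
    Map<Long, List<String>> manifestsBySnapshot = new LinkedHashMap<>();
    manifestsBySnapshot.put(1L, List.of("m1.avro"));
    manifestsBySnapshot.put(2L, List.of("m1.avro", "m2.avro"));
    // Hypothetical rewrite: snapshot 3 replaces m1 and m2 with a single manifest m3.
    manifestsBySnapshot.put(3L, List.of("m3.avro"));

    // Scanning every snapshot still visits the replaced manifests m1 and m2.
    System.out.println(manifestsOfAllSnapshots(manifestsBySnapshot)); // [m1.avro, m2.avro, m3.avro]

    // Scanning only snapshot 3 visits just the manifest that snapshot references.
    System.out.println(manifestsOfSnapshot(manifestsBySnapshot, 3L)); // [m3.avro]
  }
}
```

In this model, the per-snapshot scan touches only what the chosen snapshot references, while the all-snapshots scan also visits manifests that later snapshots replaced. Whether that difference affects the computed partition stats in the PR depends on how the entries are filtered afterwards, which is the open question in the thread.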