ajantha-bhat commented on code in PR #9437:
URL: https://github.com/apache/iceberg/pull/9437#discussion_r1454359159


##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/BaseSparkAction.java:
##########
@@ -150,6 +154,21 @@ protected Dataset<FileInfo> contentFileDS(Table table, Set<Long> snapshotIds) {
     Broadcast<Table> tableBroadcast = sparkContext.broadcast(serializableTable);
     int numShufflePartitions = spark.sessionState().conf().numShufflePartitions();
 
+    return manifestBeanDS(table, snapshotIds, numShufflePartitions)
+        .flatMap(new ReadManifest(tableBroadcast), FileInfo.ENCODER);
+  }
+
+  protected Dataset<PartitionEntryBean> partitionEntryDS(Table table) {
+    Table serializableTable = SerializableTableWithSize.copyOf(table);
+    Broadcast<Table> tableBroadcast = sparkContext.broadcast(serializableTable);
+    int numShufflePartitions = spark.sessionState().conf().numShufflePartitions();
+
+    return manifestBeanDS(table, null, numShufflePartitions)

Review Comment:
   > Is it actually correct? This code would go via ALL_MANIFESTS table. Shouldn't we only look for manifests in a particular snapshot for which we compute the stats?
   
   I just followed the same pattern as the partitions metadata table, which goes through all the manifests from all the snapshots.
   
   
https://github.com/apache/iceberg/blob/31d18f51b9e8590f7ca316463b080bd1153e8f9e/core/src/main/java/org/apache/iceberg/PartitionsTable.java#L186-L195
   
   Let me think about this today and get back to you. I also need to understand why the partitions metadata table goes through all manifests (cc: @szehon-ho).
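
   To make the distinction in the question concrete, here is a minimal sketch in plain Java. It is only a toy model and does not use any Iceberg API: the map from snapshot ID to manifest paths, the class `ManifestScopeSketch`, and the helpers `allManifests` and `manifestsOf` are all hypothetical names, and the sample data is made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ManifestScopeSketch {

  // Hypothetical stand-in for table metadata: snapshot ID -> manifest paths
  // listed by that snapshot. This is not an Iceberg type.
  static Map<Long, List<String>> sampleSnapshots() {
    Map<Long, List<String>> snapshots = new LinkedHashMap<>();
    snapshots.put(1L, List.of("m1.avro"));
    snapshots.put(2L, List.of("m1.avro", "m2.avro"));
    // Assumption: snapshot 3 replaced the earlier manifests with a new one.
    snapshots.put(3L, List.of("m3.avro"));
    return snapshots;
  }

  // Union of the manifests of every snapshot, deduplicated by path.
  static Set<String> allManifests(Map<Long, List<String>> snapshots) {
    Set<String> result = new LinkedHashSet<>();
    for (List<String> manifests : snapshots.values()) {
      result.addAll(manifests);
    }
    return result;
  }

  // Manifests listed by a single snapshot only.
  static Set<String> manifestsOf(Map<Long, List<String>> snapshots, long snapshotId) {
    List<String> manifests = snapshots.get(snapshotId);
    if (manifests == null) {
      throw new IllegalArgumentException("Unknown snapshot: " + snapshotId);
    }
    return new LinkedHashSet<>(manifests);
  }

  public static void main(String[] args) {
    Map<Long, List<String>> snapshots = sampleSnapshots();
    System.out.println("all snapshots:   " + allManifests(snapshots));
    System.out.println("snapshot 3 only: " + manifestsOf(snapshots, 3L));
  }
}
```

   In this toy data the union over all snapshots contains manifests that snapshot 3 no longer lists, so anything computed from the union could include entries that are not part of the snapshot the stats are meant to describe. Whether that applies to the code in this PR is what I still need to check.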



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

