[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #7920: Core: Add total data size to Partitions table

via GitHub Tue, 27 Jun 2023 06:26:35 -0700


ajantha-bhat commented on code in PR #7920:
URL: https://github.com/apache/iceberg/pull/7920#discussion_r1243699771



##########
docs/spark-queries.md:
##########
@@ -319,7 +319,7 @@ SELECT * FROM prod.db.table.partitions;
 |  {20211002, 10}|           1|         1|         0|
 
 Note:
-1. For unpartitioned tables, the partitions table will contain only the record_count and file_count columns.
+1. For unpartitioned tables, the partitions table will contain only the record_count, file_count and total_data_size_in_bytes columns.

Review Comment:
   This statement needs to be updated. Unpartitioned tables will have a few more 
fields, and the previous PR missed updating this statement. 
   
   
https://github.com/apache/iceberg/blob/4e6c7bad35ae3886378ec50a79dac0368df875a8/core/src/main/java/org/apache/iceberg/PartitionsTable.java#L96-L105



##########
core/src/main/java/org/apache/iceberg/PartitionsTable.java:
##########
@@ -82,7 +82,12 @@ public class PartitionsTable extends BaseMetadataTable {
                 10,
                 "last_updated_snapshot_id",
                 Types.LongType.get(),
-                "Id of snapshot that last updated this partition"));
+                "Id of snapshot that last updated this partition"),
+            Types.NestedField.required(
+                11,
+                "total_data_size_in_bytes",

Review Comment:
   For partition stats, I am using the field name `data_file_size_in_bytes` 
here:
   
   
https://github.com/apache/iceberg/pull/7105/files#diff-36347a47c3bf67ea2ef6309ea96201814032d21bb5f162dfae4045508c15588aR736
   
   Let us wait and see what others think about the naming of this field. Based 
on that, either you or I will have to rename the field accordingly. 



##########
docs/flink-queries.md:
##########
@@ -436,7 +436,7 @@ SELECT * FROM prod.db.table$partitions;
 | {20211002, 10} | 1            | 1          | 0       |
 
 Note:
-For unpartitioned tables, the partitions table will contain only the record_count and file_count columns.
+For unpartitioned tables, the partitions table will contain only the record_count, file_count and total_data_size_in_bytes columns.

Review Comment:
   This statement needs to be updated. Unpartitioned tables will have a few more 
fields, and the previous PR missed updating this statement. 
   
   
https://github.com/apache/iceberg/blob/4e6c7bad35ae3886378ec50a79dac0368df875a8/core/src/main/java/org/apache/iceberg/PartitionsTable.java#L96-L105



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

