[GitHub] [iceberg] dramaticlly opened a new issue, #6794: Support Changed PartitionCount in AddFiles Procedure output

via GitHub Thu, 09 Feb 2023 11:47:54 -0800


dramaticlly opened a new issue, #6794:
URL: https://github.com/apache/iceberg/issues/6794


   ### Feature Request / Improvement
   
   Today the Spark `add_files` procedure 
https://iceberg.apache.org/docs/latest/spark-procedures/#add_files returns only 
the number of files added to the Iceberg table and omits other statistics, such 
as the number of changed partitions. Partition stats are especially helpful when 
the added files are spread across multiple partitions.
   
   
   Since the number of added files is sourced from the snapshot summary 
https://github.com/apache/iceberg/blob/master/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/procedures/AddFilesProcedure.java#L156-L158
 , we can easily do the same for the changed partition count 
https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/SnapshotSummary.java#L53
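
A minimal sketch of the idea, assuming the snapshot summary is available as a plain `Map<String, String>`. The class and method names (`AddFilesOutputSketch`, `summaryCount`, `outputRow`) are hypothetical and are not part of Iceberg. The two keys are meant to mirror the `SnapshotSummary` constants for added data files and changed partition count as I understand them, and defaulting a missing key to 0 is my own assumption, not necessarily what the procedure does today.

```java
import java.util.Map;

// Hypothetical sketch: not the actual AddFilesProcedure code.
public class AddFilesOutputSketch {
    // Assumed to match the SnapshotSummary property keys.
    static final String ADDED_FILE_PROP = "added-data-files";
    static final String CHANGED_PARTITION_COUNT_PROP = "changed-partition-count";

    // Read a numeric summary property; a missing key is treated as 0 (assumption).
    static long summaryCount(Map<String, String> summary, String key) {
        String value = summary.get(key);
        return value == null ? 0L : Long.parseLong(value);
    }

    // Build the proposed two-column output row:
    // [added_files_count, changed_partition_count].
    static long[] outputRow(Map<String, String> summary) {
        return new long[] {
            summaryCount(summary, ADDED_FILE_PROP),
            summaryCount(summary, CHANGED_PARTITION_COUNT_PROP)
        };
    }

    public static void main(String[] args) {
        // Example summary values taken from the output shown in this issue.
        Map<String, String> summary = Map.of(
            ADDED_FILE_PROP, "1084",
            CHANGED_PARTITION_COUNT_PROP, "24");
        long[] row = outputRow(summary);
        System.out.println(row[0] + " " + row[1]);
    }
}
```

In the real procedure the map would presumably come from the summary of the snapshot created by the import, and the output schema would need a second `long` column next to `added_files_count`.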
   
   But as @RussellSpitzer suggested, we need to be cautious, since this changes 
the procedure's return type and needs to wait for the next major release.
   
   Input
   ```scala
   // stripMargin removes the leading '|' margin so the SQL parses cleanly
   val sql = """
   |CALL iceberg.system.add_files(
   |table => 'foo.bar',
   |source_table => '`parquet`.`s3a://bucket/warehouse/foo.db/bar/data/`',
   |check_duplicate_files => false)""".stripMargin
   spark.sql(sql).show
   ```
   
   Output
   ```
   +-----------------+
   |added_files_count|
   +-----------------+
   |             1084| 
   +-----------------+
   ```
   
   Desired
   ```
   +-----------------+-----------------------+
   |added_files_count|changed_partition_count|
   +-----------------+-----------------------+
   |             1084|                     24|
   +-----------------+-----------------------+
   ```
   
   
   ### Query engine
   
   Spark


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

