sungwy commented on issue #1227: URL: https://github.com/apache/iceberg-python/issues/1227#issuecomment-2413868478

Thanks for raising this @MrDerecho - in the initial version of `add_files`, we wanted to limit it to just Parquet files that were created in an external system. The assumption is that unless the files are created by an Iceberg client that is cognizant of the Iceberg schema, there is no way for the Parquet-writing process to use the correct field IDs in the produced Parquet schema.

> Currently, if I am using pyiceberg to create/maintain my iceberg tables and I use Trino (AWS Athena) to do compaction on the same (using Spark)- the files created via compaction are unable to be "re-added" using the add_files method at a later time.

This sounds like a really cool use case, but I'd like to understand it better - why isn't the application (Trino/Spark) that is doing the compaction committing the compacted files into Iceberg itself?
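For reference, here is a minimal sketch of the intended `add_files` flow described above; the catalog name, table identifier, and S3 paths are illustrative placeholders, not values from this issue:

```python
from pyiceberg.catalog import load_catalog

# Hypothetical catalog/table names for illustration only.
catalog = load_catalog("default")
tbl = catalog.load_table("db.events")

# add_files is intended for Parquet files written by an external,
# non-Iceberg-aware process, i.e. files whose schemas do NOT carry
# Iceberg field IDs. Files produced by an Iceberg-aware compaction job
# already embed field IDs, which is why re-adding them fails in the
# scenario described in this issue.
tbl.add_files(file_paths=[
    "s3://my-bucket/external/part-0000.parquet",
    "s3://my-bucket/external/part-0001.parquet",
])
```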
Thanks for raising this @MrDerecho - in the initial version of add_files, we wanted to limit it to just parquet files that that were created in an external system. The assumption is that unless the files are created by an Iceberg client and are cognizant of the Iceberg schema, there would be no way for the parquet writing process to be use the correct field IDs in the produced parquet schema. > Currently, if I am using pyiceberg to create/maintain my iceberg tables and I use Trino (AWS Athena) to do compaction on the same (using Spark)- the files created via compaction are unable to be "re-added" using the add_files method at a later time. This sounds like a really cool use case, but I'd like to understand it better - why isn't the application (Trino/Spark) that is doing the compaction committing the compacted files into Iceberg itself? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org