sungwy commented on issue #1227: URL: https://github.com/apache/iceberg-python/issues/1227#issuecomment-2413868478

Thanks for raising this @MrDerecho - in the initial version of `add_files`, we wanted to limit it to just Parquet files that were created in an external system. The assumption is that unless the files are created by an Iceberg client that is cognizant of the Iceberg schema, there is no way for the Parquet-writing process to use the correct field IDs in the produced Parquet schema.

> Currently, if I am using pyiceberg to create/maintain my iceberg tables and I use Trino (AWS Athena) to do compaction on the same (using Spark)- the files created via compaction are unable to be "re-added" using the add_files method at a later time.

This sounds like a really cool use case, but I'd like to understand it better - why isn't the application (Trino/Spark) that is doing the compaction committing the compacted files into Iceberg itself?
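For reference, here is a minimal sketch of the intended `add_files` flow described above; the catalog name, table identifier, and S3 paths are illustrative placeholders, not values from this issue:

```python
from pyiceberg.catalog import load_catalog

# Hypothetical catalog/table names for illustration only.
catalog = load_catalog("default")
tbl = catalog.load_table("db.events")

# add_files is intended for Parquet files written by an external,
# non-Iceberg-aware process, i.e. files whose schemas do NOT carry
# Iceberg field IDs. Files produced by an Iceberg-aware compaction job
# already embed field IDs, which is why re-adding them fails in the
# scenario described in this issue.
tbl.add_files(file_paths=[
    "s3://my-bucket/external/part-0000.parquet",
    "s3://my-bucket/external/part-0001.parquet",
])
```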
Thanks for raising this @MrDerecho - in the initial version of add_files, we wanted to limit it to just parquet files that that were created in an external system. The assumption is that unless the files are created by an Iceberg client and are cognizant of the Iceberg schema, there would be no way for the parquet writing process to be use the correct field IDs in the produced parquet schema. > Currently, if I am using pyiceberg to create/maintain my iceberg tables and I use Trino (AWS Athena) to do compaction on the same (using Spark)- the files created via compaction are unable to be "re-added" using the add_files method at a later time. This sounds like a really cool use case, but I'd like to understand it better - why isn't the application (Trino/Spark) that is doing the compaction committing the compacted files into Iceberg itself? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org