afiodorov commented on PR #1743: URL: https://github.com/apache/iceberg-python/pull/1743#issuecomment-2692884445
Hey! The regex was just an example; it's not part of the API, but the partition deduction function is. The issue I'm having at work is that our pipelines keep writing Hive-partitioned Parquet files to S3, and we have hundreds of tables, many of which I don't maintain. However, we need a quick conversion (and ongoing upkeep) of those tables to Iceberg. add_files is almost what we need for both the initial migration and the subsequent upkeep, but it assumes the partition columns are written in the Parquet files, which isn't the case for us. We don't want to rewrite all the Parquet files or touch the pipelines at all; we just need a quick, hassle-free in-place migration. If you add another API that deals with DataFile directly, as opposed to add_files, which makes these assumptions, that would work for us. Until then, I might just have to deploy my fork to solve our problem.
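
To illustrate what "partition deduction" means here, a minimal sketch of deriving partition values from a Hive-style path rather than from the Parquet columns might look like the following. The function name `deduce_hive_partition` and the example bucket layout are hypothetical and are not part of PyIceberg or of this PR; the values come back as raw strings and would still need casting to the table's partition field types.

```python
from typing import Dict, Optional
from urllib.parse import unquote

# Directory name Hive uses when a partition value is NULL.
HIVE_NULL = "__HIVE_DEFAULT_PARTITION__"


def deduce_hive_partition(file_path: str) -> Dict[str, Optional[str]]:
    """Hypothetical helper: parse key=value directory segments of a path.

    Returns a mapping of partition column name to its raw string value
    (or None for Hive's default-partition marker). No type casting is done.
    """
    partition: Dict[str, Optional[str]] = {}
    # Skip the last segment: it is the file name, not a partition directory.
    for segment in file_path.split("/")[:-1]:
        key, sep, value = segment.partition("=")
        if not sep or not key:
            continue  # not a key=value directory (scheme, bucket, table dir)
        # Hive percent-encodes special characters in partition values.
        decoded = unquote(value)
        partition[key] = None if decoded == HIVE_NULL else decoded
    return partition


if __name__ == "__main__":
    # Assumed example layout, for illustration only.
    path = "s3://bucket/events/dt=2024-01-01/country=US/part-0000.parquet"
    print(deduce_hive_partition(path))
```

A function of this shape would be called once per file, so the data files themselves never need to be rewritten to carry the partition columns.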