Re: [PR] Manual deduction of partitions [iceberg-python]

via GitHub Sun, 02 Mar 2025 12:09:35 -0800


afiodorov commented on PR #1743:
URL: https://github.com/apache/iceberg-python/pull/1743#issuecomment-2692884445


   Hey! The regex was just an example; it's not part of the API. The 
partition deduction function is.
   
   The issue I'm having at work is that our pipelines keep writing 
hive-partitioned parquet files to s3. We have hundreds of tables, many of 
which I don't maintain, yet we need a quick conversion (and upkeep) of those 
tables to iceberg.
   
   add_files is almost what we need for both the initial migration and the 
subsequent upkeep. However, it assumes the partition columns are written in 
the parquet files, which isn't the case for us. We don't want to rewrite all 
the parquet files or touch the pipelines at all; we just need a quick, 
hassle-free, in-place migration.
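
   To illustrate the gap: in a hive layout the partition values live only in 
the directory names, so they have to be recovered from each file's path. Below 
is a minimal sketch of that step, assuming `key=value` path segments. 
`hive_partition_values` is a hypothetical helper name used for illustration; 
it is not part of PyIceberg or of this PR.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def hive_partition_values(file_path: str) -> dict[str, str]:
    """Return the key=value directory segments of a hive-style path.

    Hypothetical helper for illustration only; not a PyIceberg API.
    """
    # Drop the scheme and bucket so only the object key is inspected.
    key = urlparse(file_path).path
    values: dict[str, str] = {}
    # Use the parent directory: the last component is the file name.
    for segment in PurePosixPath(key).parent.parts:
        name, sep, raw = segment.partition("=")
        if sep and name:
            # Hive percent-encodes special characters in partition values.
            values[name] = unquote(raw)
    return values


example = "s3://bucket/warehouse/events/dt=2025-03-02/country=US/part-0.parquet"
partitions = hive_partition_values(example)
```

   The values come back as strings, so they would still need converting to 
the table's partition field types before being attached to a data file.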
   
   If you add another API that deals with DataFile directly, as opposed to 
add_files, which assumes certain things, that would work for us. Until then I 
might just have to deploy my fork to solve our problem.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


