andormarkus commented on PR #1742: URL: https://github.com/apache/iceberg-python/pull/1742#issuecomment-2692428005
Thanks for your feedback! I understand your concern about adding this to the `Table` class. I agree it's not the ideal location and could lead to confusion.

The primary issue I'm trying to solve involves distributed environments. Your suggested approach works well in a single process, but my use case involves multiple distributed processes: one process writes data files and another commits them to the table. The two processes need a simple way to communicate.

Passing `DataFile` objects between processes requires serialization, which I've found hard to implement properly. I tried `jsonpickle` and custom serialization methods but ran into significant issues.

What I need is a simpler workflow where:

1. A process writes Parquet files in an Iceberg-compatible format (as `Table.append` does).
2. It returns only simple strings that are easy to pass between systems.
3. Another process can take these strings and commit them through any API.

This approach avoids passing complex objects like `DataFile` between distributed components. I'm open to alternative implementations that meet these requirements.
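A minimal sketch of the string-based handoff described above. It assumes the writer only needs to hand over Parquet file paths and that the committer holds a pyiceberg `Table`. The JSON envelope (`version`, `file_paths`) and the helper names `encode_handoff`, `decode_handoff` and `commit_handoff` are hypothetical and are not part of pyiceberg. The commit step calls `Table.add_files(file_paths=...)`, which recent pyiceberg releases provide for registering existing Parquet files. That method has its own requirements on the files it accepts, so treat this as an illustration rather than a drop-in solution.

```python
import json
from typing import List, Protocol


class SupportsAddFiles(Protocol):
    """Anything exposing add_files(file_paths=...), e.g. a pyiceberg Table (assumption)."""

    def add_files(self, file_paths: List[str]) -> None: ...


def encode_handoff(file_paths: List[str]) -> str:
    """Writer side (hypothetical helper): turn written file paths into a plain JSON string."""
    return json.dumps({"version": 1, "file_paths": list(file_paths)})


def decode_handoff(message: str) -> List[str]:
    """Committer side (hypothetical helper): parse and validate the JSON string."""
    payload = json.loads(message)
    if payload.get("version") != 1:
        raise ValueError(f"unsupported handoff version: {payload.get('version')!r}")
    paths = payload.get("file_paths")
    if not isinstance(paths, list) or not all(isinstance(p, str) for p in paths):
        raise ValueError("file_paths must be a list of strings")
    return paths


def commit_handoff(table: SupportsAddFiles, message: str) -> List[str]:
    """Register every handed-off file with the table through a single add_files call."""
    paths = decode_handoff(message)
    table.add_files(file_paths=paths)
    return paths
```

Because the message is an ordinary string, it can travel through any queue, RPC layer or job argument. The trade-off is that the committer must rebuild file metadata from the Parquet files themselves, whereas a `DataFile` object would have carried that metadata along.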