andormarkus commented on PR #1742:
URL: https://github.com/apache/iceberg-python/pull/1742#issuecomment-2692428005

   Thanks for your feedback!
   
   I understand your concern about adding this to the `Table` class. I agree 
it's not the ideal location and could lead to confusion.
   
   The primary issue I'm trying to solve involves distributed environments. 
While your suggested approach works well in a single process, my use case 
involves multiple distributed processes. One process writes data files and 
another commits them to the table, requiring simple communication between these 
processes.
   
   Passing `DataFile` objects between processes requires serialization, which 
I've found challenging to implement properly. I tried `jsonpickle` and custom 
serialization methods but encountered significant issues.
   
   What I need is a simpler workflow where:
    1. A process writes Parquet files in Iceberg-compatible format (like 
`Table.append` does)
    2. It returns just simple strings (easily passed between systems)
    3. Another process can take these simple strings and use any API to commit 
them
   
   This approach avoids having to pass complex objects like `DataFile` between 
distributed components. I'm open to alternative implementations that meet these 
requirements.
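   
   To make the handoff concrete, here is a minimal sketch of steps 2 and 3 
using only the standard library. The function names and the `commit` callable 
are hypothetical, not part of PyIceberg. In practice `commit` might wrap 
something like `Table.add_files(file_paths=...)`, assuming that API accepts 
files written this way.

```python
import json
from typing import Callable, List


def encode_paths(paths: List[str]) -> str:
    """Writer side: pack the written file locations into one plain string."""
    # De-duplicate and sort so the message is stable across retries.
    return json.dumps(sorted(set(paths)))


def decode_paths(message: str) -> List[str]:
    """Committer side: recover the file locations, rejecting anything else."""
    paths = json.loads(message)
    if not isinstance(paths, list) or not all(isinstance(p, str) for p in paths):
        raise ValueError("expected a JSON list of strings")
    return paths


def commit_paths(message: str, commit: Callable[[List[str]], None]) -> int:
    """Decode the message and hand the paths to a caller-supplied commit function.

    `commit` is a placeholder for whatever API registers the files with the
    table; it is an assumption of this sketch, not an existing PyIceberg hook.
    Returns the number of paths passed to `commit`.
    """
    paths = decode_paths(message)
    if paths:  # skip the commit entirely when the writer produced nothing
        commit(paths)
    return len(paths)


# Hypothetical example: the paths below are placeholders, not real files.
message = encode_paths([
    "s3://bucket/tbl/data/part-1.parquet",
    "s3://bucket/tbl/data/part-0.parquet",
])
received: List[str] = []
commit_paths(message, received.extend)
```

   Only a JSON string crosses the process boundary, so no `DataFile` 
serialization is needed on either side.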


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

