andormarkus commented on issue #1751:
URL: 
https://github.com/apache/iceberg-python/issues/1751#issuecomment-3625274692

   Hi @Fokko,
   
   I see this issue has been marked as stale, but I'd like to follow up as we 
now have production-scale code working with this approach.
   
   I'm happy to submit a documentation PR to help the community benefit from 
this distributed write pattern. Our implementation has been running 
successfully in production, handling high-volume concurrent writes with 
centralized commits to avoid the blocking bottleneck.
   
   However, I still need your guidance on a few points before proceeding:
   
   1. **`__bytes__` support**: You mentioned supporting `__bytes__` to 
return Avro-encoded bytes. Could you please elaborate on this? What would be 
the preferred API design? Should this be a method on the `DataFile` class 
itself?
   
   2. **Public API for `_dataframe_to_data_files`**: Currently, achieving 
distributed writes requires using the private function 
`pyiceberg.io.pyarrow._dataframe_to_data_files`. Should we consider making 
this (or a wrapper around it) part of the public API? This seems essential for 
the distributed write pattern.
   
   3. **Documentation scope**: Where would be the most appropriate place to 
document this pattern? Should it be:
      - A new section in the main docs about distributed/concurrent writes?
      - An advanced patterns/recipes page?
      - Part of the write operations documentation?
   
   I'm ready to contribute once I have clarity on the direction you'd like to 
take. Our production code includes both partitioned and non-partitioned table 
handling, proper serialization/deserialization, and queue-based coordination 
(AWS Kinesis, though SQS works fine as well).
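   
   To make the shape of the pattern concrete, here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not the production code and not the pyiceberg API: the dictionary descriptor, `worker_write`, `committer_drain`, and the `s3://example-bucket/...` path are hypothetical stand-ins, JSON stands in for whatever serialization is eventually chosen, and `queue.Queue` stands in for Kinesis or SQS.

```python
import json
import queue


def worker_write(batch_id, rows, commit_queue):
    """Hypothetical worker: 'writes' a data file and publishes its metadata.

    A real worker would write a Parquet file and obtain the file's metadata
    from the library; here we only build a small descriptor dictionary.
    """
    descriptor = {
        # Hypothetical path, for illustration only.
        "file_path": f"s3://example-bucket/data/batch-{batch_id}.parquet",
        "record_count": len(rows),
    }
    # Serialize the descriptor so it can travel through a message queue.
    commit_queue.put(json.dumps(descriptor))


def committer_drain(commit_queue):
    """Hypothetical committer: collects every pending descriptor.

    A real committer would then append all collected files to the table
    in a single commit, so that workers never contend on commits.
    """
    descriptors = []
    while True:
        try:
            payload = commit_queue.get_nowait()
        except queue.Empty:
            break
        descriptors.append(json.loads(payload))
    return descriptors


# Two workers publish their file metadata; one committer gathers it.
commit_queue = queue.Queue()
for batch_id, rows in enumerate([[1, 2, 3], [4, 5]]):
    worker_write(batch_id, rows, commit_queue)

pending = committer_drain(commit_queue)
total_records = sum(d["record_count"] for d in pending)
```

   The point of the split is that only the committer touches table metadata: workers can scale out freely, while commits stay serialized in one place.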
   
   Looking forward to your guidance!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

