Re: [I] [Feature] Add Support for Distributed Write [iceberg-python]

2025-05-29 Thread via GitHub
b-phi commented on issue #1751: URL: https://github.com/apache/iceberg-python/issues/1751#issuecomment-2920271698 I was able to get distributed writes working smoothly for my use case. Tasks in a Ray cluster use `_dataframe_to_data_files` to write the data files to S3, then the resulting D

Re: [I] [Feature] Add Support for Distributed Write [iceberg-python]

2025-05-16 Thread via GitHub
andormarkus commented on issue #1751: URL: https://github.com/apache/iceberg-python/issues/1751#issuecomment-2877526453 Hello @potatochipcoconut, my solution is tested and working with AWS Lambda + AWS SQS. How soon do you need a solution? I can share our code; however, I need to

Re: [I] [Feature] Add Support for Distributed Write [iceberg-python]

2025-05-16 Thread via GitHub
andormarkus commented on issue #1751: URL: https://github.com/apache/iceberg-python/issues/1751#issuecomment-2877529033 Hi @Fokko, can you please elaborate on this? > Should we support `__bytes__` to return the Avro encoded bytes?

Re: [I] [Feature] Add Support for Distributed Write [iceberg-python]

2025-05-13 Thread via GitHub
potatochipcoconut commented on issue #1751: URL: https://github.com/apache/iceberg-python/issues/1751#issuecomment-2876995679 Hello, I'm interested in this feature and am trying to test out the strategy in an AWS Lambda environment. I am getting the following error during deserialization

Re: [I] [Feature] Add Support for Distributed Write [iceberg-python]

2025-03-12 Thread via GitHub
Fokko commented on issue #1751: URL: https://github.com/apache/iceberg-python/issues/1751#issuecomment-2719219162 Hey @andormarkus, thanks for sharing, that looks great! I'm all in favor of supporting this. Very much looking forward to the PR. Should we support `__bytes__` to return th

Re: [I] [Feature] Add Support for Distributed Write [iceberg-python]

2025-03-07 Thread via GitHub
andormarkus commented on issue #1751: URL: https://github.com/apache/iceberg-python/issues/1751#issuecomment-2706056065 Hi @Fokko, I'd like to share a working example that demonstrates how to serialize and deserialize both partitioned and non-partitioned tables: ```python output

Re: [I] [Feature] Add Support for Distributed Write [iceberg-python]

2025-03-05 Thread via GitHub
andormarkus commented on issue #1751: URL: https://github.com/apache/iceberg-python/issues/1751#issuecomment-2702284211 Hi @Fokko, thank you so much for the code snippet. I have extended the test and ran into the following problem with partitioned tables (non-partitioned tabl

Re: [I] [Feature] Add Support for Distributed Write [iceberg-python]

2025-03-04 Thread via GitHub
Fokko commented on issue #1751: URL: https://github.com/apache/iceberg-python/issues/1751#issuecomment-2697492762 @andormarkus Sure thing, does the following help: ``` from io import BytesIO from pyiceberg.avro.decoder_fast import CythonBinaryDecoder from pyiceberg.avro.e

Re: [I] [Feature] Add Support for Distributed Write [iceberg-python]

2025-03-03 Thread via GitHub
andormarkus commented on issue #1751: URL: https://github.com/apache/iceberg-python/issues/1751#issuecomment-2695366541 Hi @Fokko, based on the source code, writing to a manifest / Avro file can be achieved like this: ```python manifest_path = f"temp-manifest-{uuid.uuid4()}.avro"

[I] [Feature] Add Support for Distributed Write [iceberg-python]

2025-03-03 Thread via GitHub
andormarkus opened a new issue, #1751: URL: https://github.com/apache/iceberg-python/issues/1751 ### Feature Request / Improvement ## Problem Statement A key problem in distributed Iceberg systems is that commit processes can block each other when multiple workers try to update tab