[I] How can I achieve fast writes with pyiceberg and s3 tables? [iceberg-python]

via GitHub Thu, 08 May 2025 10:17:58 -0700


potatochipcoconut opened a new issue, #1984:
URL: https://github.com/apache/iceberg-python/issues/1984


   ### Question
   
   Hello pyicebergers,
   
   I am new to Iceberg and S3 Tables and am experimenting with them as part of 
an IDP pipeline in which we would store OCR data in S3 Tables. The pipeline 
needs to support high throughput and concurrency.
   
   I have a PoC pipeline set up that works, but the writes are too slow with a 
basic implementation.
   
   Does anyone know what could be done to improve the write performance?
   
   I've been reading through various articles, issues, PRs, etc., but I am not 
sure which approach to try.
   
   Thank you
   
   https://github.com/apache/iceberg-python/issues/1751 (could this be useful?)
   
   Additionally, I read that setting `PYICEBERG_MAX_WORKERS` could help with 
concurrency, but I could not find any reference to it in the pyiceberg code. 
How does that setting get consumed?
   https://github.com/apache/iceberg-python/pull/444
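
To make the question about `PYICEBERG_MAX_WORKERS` concrete, here is a minimal, stdlib-only sketch of one way such a variable could be consumed without its literal name ever appearing in the code: environment variables with a common prefix are turned into lower-case, dash-separated config keys, and a `max-workers` key then sizes a thread pool. The prefix handling, the `max-workers` key name, and the helpers `config_from_env`, `max_workers`, and `make_executor` are assumptions for illustration only, not pyiceberg's confirmed implementation.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Mapping, Optional

ENV_PREFIX = "PYICEBERG_"  # assumed prefix, for illustration only


def config_from_env(environ: Mapping[str, str]) -> Dict[str, str]:
    """Collect PREFIX_* variables into lower-case, dash-separated keys."""
    config: Dict[str, str] = {}
    for name, value in environ.items():
        if name.startswith(ENV_PREFIX):
            # e.g. PYICEBERG_MAX_WORKERS -> "max-workers"
            key = name[len(ENV_PREFIX):].lower().replace("_", "-")
            config[key] = value
    return config


def max_workers(environ: Mapping[str, str]) -> Optional[int]:
    """Return the configured worker count, or None if it is not set."""
    raw = config_from_env(environ).get("max-workers")
    return int(raw) if raw is not None else None


def make_executor(environ: Optional[Mapping[str, str]] = None) -> ThreadPoolExecutor:
    """Build a thread pool sized from the environment.

    With max_workers=None, ThreadPoolExecutor picks its own default size.
    """
    env = os.environ if environ is None else environ
    return ThreadPoolExecutor(max_workers=max_workers(env))
```

If the library follows a pattern like this, searching the code for the derived key (`max-workers`) rather than the full environment variable name may be more fruitful.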
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org
