koenvo commented on PR #1995:
URL: https://github.com/apache/iceberg-python/pull/1995#issuecomment-2932586312
Did an update and ran a quick benchmark with different `concurrent_tasks` settings on `to_arrow_batch_reader()`:

```python
import tqdm
import pyarrow as pa

pool = pa.default_memory_pool()  # memory pool used for tracking (assuming pyarrow's default pool)
table = catalog.load_table("some_table")

# Benchmark loop
reader = table.scan().to_arrow_batch_reader(concurrent_tasks=100)
for batch in tqdm.tqdm(reader):
    print(pool.max_memory())
```

### Results (including `pool.max_memory()`)

- `concurrent_tasks=1` → `52it [00:06, 7.73it/s]` | Max memory: **7.4 MB**
- `concurrent_tasks=10` → `391it [00:06, 61.98it/s]` | Max memory: **36.3 MB**
- `concurrent_tasks=100` → `1030it [00:09, 106.84it/s]` | Max memory: **1.76 GB**

> Note: Performance also depends on the network connection.
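For reference, a minimal sketch that runs the same loop once per `concurrent_tasks` value and reports throughput plus peak memory. It assumes the `concurrent_tasks` parameter added by this PR, a placeholder catalog/table name, and pyarrow's default memory pool; note that `max_memory()` reports the process-wide peak, so for isolated per-setting numbers each run should happen in a fresh process:

```python
# Repro sketch: compare throughput and peak memory across concurrent_tasks values.
# Catalog and table names are placeholders; adjust for your environment.
import time

import pyarrow as pa
from pyiceberg.catalog import load_catalog

catalog = load_catalog("default")            # placeholder catalog name
table = catalog.load_table("ns.some_table")  # placeholder table identifier
pool = pa.default_memory_pool()              # assumed pool; max_memory() is cumulative per process

for concurrent_tasks in (1, 10, 100):
    start = time.perf_counter()
    batches = 0
    reader = table.scan().to_arrow_batch_reader(concurrent_tasks=concurrent_tasks)
    for _ in reader:
        batches += 1
    elapsed = time.perf_counter() - start
    print(
        f"concurrent_tasks={concurrent_tasks}: "
        f"{batches} batches in {elapsed:.2f}s "
        f"({batches / elapsed:.1f} it/s), "
        f"pool.max_memory()={pool.max_memory() / 1024 ** 2:.1f} MB"
    )
```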