koenvo commented on PR #1995:
URL: https://github.com/apache/iceberg-python/pull/1995#issuecomment-2932586312

   I pushed an update and ran a quick benchmark of `to_arrow_batch_reader()` with different `concurrent_tasks` settings:
   
   ```python
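   import pyarrow as pa
   import tqdm

   pool = pa.default_memory_pool()  # assumed: the pool whose peak is reported below
   table = catalog.load_table("some_table")  # `catalog` is an already-configured pyiceberg catalog

   # Benchmark loop: stream batches and print the pool's peak allocation so far
   p = table.scan().to_arrow_batch_reader(concurrent_tasks=100)
   for batch in tqdm.tqdm(p):
       print(pool.max_memory())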
   ```
   
   ### Results (including `pool.max_memory()`):
   - `concurrent_tasks=1` → `52it [00:06,  7.73it/s]` | Max memory: **7.4 MB**
   - `concurrent_tasks=10` → `391it [00:06, 61.98it/s]` | Max memory: **36.3 MB**
   - `concurrent_tasks=100` → `1030it [00:09, 106.84it/s]` | Max memory: **1.76 GB**
   
   > Note: Performance also depends on the network connection.
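
To make the trade-off in these numbers easier to read, here is a rough sketch that computes how throughput and peak memory change between consecutive settings. It uses only the figures reported above. The helper name `step_ratios` is hypothetical, and treating 1.76 GB as 1760 MB is an assumption about the units.

```python
# Figures copied from the results above: (concurrent_tasks, batches/s, max memory in MB).
# Assumption: 1.76 GB is treated as 1760 MB (decimal units).
results = [
    (1, 7.73, 7.4),
    (10, 61.98, 36.3),
    (100, 106.84, 1760.0),
]


def step_ratios(rows):
    """Return (tasks_from, tasks_to, throughput ratio, memory ratio) for each step."""
    out = []
    for (t0, r0, m0), (t1, r1, m1) in zip(rows, rows[1:]):
        out.append((t0, t1, r1 / r0, m1 / m0))
    return out


ratios = step_ratios(results)
for t0, t1, speedup, mem_growth in ratios:
    print(f"{t0} -> {t1} tasks: {speedup:.1f}x throughput, {mem_growth:.1f}x peak memory")
```

On these figures, going from 1 to 10 tasks gives roughly 8x throughput for about 5x peak memory, while going from 10 to 100 gives under 2x throughput for roughly 48x peak memory. This is a single run, so the ratios are indicative only.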
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

