koenvo commented on PR #1995:
URL: https://github.com/apache/iceberg-python/pull/1995#issuecomment-2932586312
Did an update and ran a quick benchmark with different `concurrent_tasks` settings on `to_arrow_batch_reader()`:

```python
import tqdm
import pyarrow as pa

pool = pa.default_memory_pool()  # memory pool used for tracking (assuming pyarrow's default pool)
table = catalog.load_table("some_table")

# Benchmark loop
reader = table.scan().to_arrow_batch_reader(concurrent_tasks=100)
for batch in tqdm.tqdm(reader):
    print(pool.max_memory())
```

### Results (including `pool.max_memory()`)

- `concurrent_tasks=1` → `52it [00:06, 7.73it/s]` | Max memory: **7.4 MB**
- `concurrent_tasks=10` → `391it [00:06, 61.98it/s]` | Max memory: **36.3 MB**
- `concurrent_tasks=100` → `1030it [00:09, 106.84it/s]` | Max memory: **1.76 GB**

> Note: Performance also depends on the network connection.
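For reference, a minimal sketch that runs the same loop once per `concurrent_tasks` value and reports throughput plus peak memory. It assumes the `concurrent_tasks` parameter added by this PR, a placeholder catalog/table name, and pyarrow's default memory pool; note that `max_memory()` reports the process-wide peak, so for isolated per-setting numbers each run should happen in a fresh process:

```python
# Repro sketch: compare throughput and peak memory across concurrent_tasks values.
# Catalog and table names are placeholders; adjust for your environment.
import time

import pyarrow as pa
from pyiceberg.catalog import load_catalog

catalog = load_catalog("default")            # placeholder catalog name
table = catalog.load_table("ns.some_table")  # placeholder table identifier
pool = pa.default_memory_pool()              # assumed pool; max_memory() is cumulative per process

for concurrent_tasks in (1, 10, 100):
    start = time.perf_counter()
    batches = 0
    reader = table.scan().to_arrow_batch_reader(concurrent_tasks=concurrent_tasks)
    for _ in reader:
        batches += 1
    elapsed = time.perf_counter() - start
    print(
        f"concurrent_tasks={concurrent_tasks}: "
        f"{batches} batches in {elapsed:.2f}s "
        f"({batches / elapsed:.1f} it/s), "
        f"pool.max_memory()={pool.max_memory() / 1024 ** 2:.1f} MB"
    )
```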