OliLay opened a new issue, #45749: URL: https://github.com/apache/arrow/issues/45749
### Describe the bug, including details regarding any error messages, version, and platform.

Hey guys, we noticed something when using S3/Azure Blob with the Arrow file system implementation. We do a lot of concurrent file operations and use a larger thread pool size than the default (32 instead of 8, roughly as recommended [here](https://arrow.apache.org/docs/cpp/threading.html#cpu-vs-i-o)).

When using a larger thread pool and dispatching a similar number of async operations, these operations take significantly longer to finish than they do with a smaller thread pool.

[This](https://github.com/OliLay/arrow/blob/7d28657ae758f2df42454fa096bd5224eacc7a97/cpp/src/arrow/filesystem/s3fs_benchmark.cc#L58) benchmark reproduces the issue. It opens 32 files asynchronously with different thread pool sizes, against files in a real S3 bucket (not MinIO). The results are quite surprising (the first parameter is the number of files, the second is the number of threads):

```
---------------------------------------------------------------------------------------
Benchmark                                               Time          CPU   Iterations
---------------------------------------------------------------------------------------
OpenFileAsync/32/4/iterations:1/process_time         0.254 s      0.139 s            1
OpenFileAsync/32/8/iterations:1/process_time         0.186 s      0.329 s            1
OpenFileAsync/32/16/iterations:1/process_time        0.181 s      0.799 s            1
OpenFileAsync/32/32/iterations:1/process_time        0.328 s       1.63 s            1
OpenFileAsync/32/64/iterations:1/process_time        0.385 s       1.71 s            1
```

Contrary to what you would expect for I/O tasks, with 32 threads we take almost twice as long as with 8 or 16 threads.

One could work around this by sizing the thread pool dynamically according to the workload (e.g. decreasing it to 16 threads when there are only 32 files), but this is cumbersome to do for every use case and not possible everywhere (e.g. when different use cases share a thread pool concurrently).
I verified that this holds for the Azure file system as well as the S3 file system, so I think it has to do with how Arrow's thread pool works. Profiling with perf gave no additional insight (the flamegraph looks almost identical); I assume there must be some lock contention in the thread pool.

I just wanted to report this and ask whether you have seen similar issues before. I am not too familiar with the thread pool code; could you point me in a direction for improving this without dynamically adjusting the thread pool size?

Note: To reproduce the issue reliably, run the benchmark executable for one parameter at a time, as I assume some caching takes place when all parameters are executed in the same run.

### Component(s)

C++