[I] [C++] Thread pool performance behaves counter-intuitive for I/O intensive operations (e.g. file system) [arrow]

via GitHub Tue, 11 Mar 2025 08:30:34 -0700


OliLay opened a new issue, #45749:
URL: https://github.com/apache/arrow/issues/45749


   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   Hey guys,
   
   we noticed unexpected behavior when using S3/Azure Blob through the Arrow file system implementation. We run a lot of concurrent file operations and use a larger thread pool size than the default (32 instead of 8, roughly as recommended [here](https://arrow.apache.org/docs/cpp/threading.html#cpu-vs-i-o)).
   
   With a larger thread pool and a similar number of dispatched async operations, the time until those operations finish is significantly higher than with a smaller thread pool. [This](https://github.com/OliLay/arrow/blob/7d28657ae758f2df42454fa096bd5224eacc7a97/cpp/src/arrow/filesystem/s3fs_benchmark.cc#L58) benchmark reproduces the issue. It opens 32 files asynchronously with different thread pool sizes, against files in a real S3 bucket (not MinIO).
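
For comparison, here is a minimal sketch of what one would naively expect from purely latency-bound tasks. It is not the linked benchmark and does not use Arrow: `RunSimulatedIo` and `IdealWallTime` are hypothetical names, and each "file open" is just a sleep standing in for a remote round trip. Under that assumption, wall time should shrink roughly as `ceil(tasks / threads) * latency` when threads are added.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

struct RunResult {
  int completed;                      // number of tasks that finished
  std::chrono::milliseconds elapsed;  // wall-clock time for the whole batch
};

// Run `num_tasks` simulated I/O tasks on `num_threads` plain std::thread workers.
// Each task only sleeps for `latency` (assumption: no CPU work, no contention).
RunResult RunSimulatedIo(int num_tasks, int num_threads,
                         std::chrono::milliseconds latency) {
  assert(num_tasks >= 0 && num_threads >= 1);
  std::atomic<int> next{0};
  std::atomic<int> completed{0};
  const auto start = std::chrono::steady_clock::now();
  std::vector<std::thread> workers;
  workers.reserve(num_threads);
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&] {
      // Claim task indices until none are left.
      while (next.fetch_add(1) < num_tasks) {
        std::this_thread::sleep_for(latency);
        completed.fetch_add(1);
      }
    });
  }
  for (auto& w : workers) w.join();
  const auto end = std::chrono::steady_clock::now();
  return {completed.load(),
          std::chrono::duration_cast<std::chrono::milliseconds>(end - start)};
}

// Lower bound on wall time in this idealized model:
// the busiest worker handles ceil(tasks / threads) tasks back to back.
std::chrono::milliseconds IdealWallTime(int num_tasks, int num_threads,
                                        std::chrono::milliseconds latency) {
  const int rounds = (num_tasks + num_threads - 1) / num_threads;
  return latency * rounds;
}
```

Calling `RunSimulatedIo(32, 8, std::chrono::milliseconds(50))` and then the same with 32 threads should show the second run finishing in roughly a quarter of the time in this toy model, which is the opposite of what the real benchmark reports.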
   
   The results are quite surprising (the first parameter is the number of files, the second is the number of threads):
   ```
   --------------------------------------------------------------------------------------
   Benchmark                                            Time             CPU   Iterations
   --------------------------------------------------------------------------------------
   OpenFileAsync/32/4/iterations:1/process_time      0.254 s         0.139 s            1
   OpenFileAsync/32/8/iterations:1/process_time      0.186 s         0.329 s            1
   OpenFileAsync/32/16/iterations:1/process_time     0.181 s         0.799 s            1
   OpenFileAsync/32/32/iterations:1/process_time     0.328 s          1.63 s            1
   OpenFileAsync/32/64/iterations:1/process_time     0.385 s          1.71 s            1
   ```
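
One way to read these numbers is to divide the CPU column by the Time column. This is a small sketch, assuming the `process_time` suffix means CPU time is summed over all threads of the process (as with Google Benchmark's `MeasureProcessCPUTime()`); `Row`, `ReportedRows` and `CpuPerWall` are names made up for illustration. The input values are copied from the table, nothing is measured here.

```cpp
#include <cassert>
#include <vector>

// One line of the benchmark output (all rows use 32 files).
struct Row {
  int threads;
  double wall_s;  // "Time" column, in seconds
  double cpu_s;   // "CPU" column, in seconds
};

// Values copied verbatim from the reported results.
std::vector<Row> ReportedRows() {
  return {{4, 0.254, 0.139},
          {8, 0.186, 0.329},
          {16, 0.181, 0.799},
          {32, 0.328, 1.63},
          {64, 0.385, 1.71}};
}

// Process CPU seconds consumed per wall-clock second, i.e. roughly
// how many cores were busy on average during the run.
double CpuPerWall(const Row& row) {
  assert(row.wall_s > 0.0);
  return row.cpu_s / row.wall_s;
}
```

By this measure, total CPU time grows more than tenfold between 4 and 32 threads (0.139 s to 1.63 s) for the same 32 file opens, so the extra threads seem to add CPU work rather than just waiting on the network. That would be compatible with contention or per-thread setup cost, but the table alone cannot tell which.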
   Contrary to what you would expect for I/O tasks, with 32 threads we take almost double the time compared to 8 or 16 threads. One could work around this by sizing the thread pool dynamically according to the workload (e.g. decreasing it to 16 threads when there are only 32 files), but this is cumbersome to do for every use case and not possible everywhere (e.g. when a thread pool is used concurrently by different use cases).
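
The workaround could look roughly like the sketch below. `ChooseIoThreads` is a hypothetical helper, not an Arrow function, and the "half the number of files" ratio is an assumption taken only from the single 32-files-to-16-threads example above. In an Arrow program the result would presumably be handed to `arrow::io::SetIOThreadPoolCapacity`, which the sketch deliberately does not call so that it stays self-contained.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: pick an I/O thread pool size for a batch of file operations.
// Assumption: about half as many threads as pending files, clamped to
// [1, max_threads]. The ratio is illustrative, not a tuned or recommended value.
int ChooseIoThreads(int pending_files, int max_threads) {
  assert(max_threads >= 1);
  const int wanted = pending_files / 2;
  return std::min(max_threads, std::max(1, wanted));
}
```

For example, `ChooseIoThreads(32, 32)` yields 16. The drawback is the one already mentioned: if several callers share one pool, there is no single "pending files" count to size it by.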
   
   I verified that this holds for the Azure file system as well as the S3 file system, so I think it has to do with how Arrow's thread pool works. Profiling with perf did not give me any additional insight (the flamegraphs look almost identical); I assume there must be some lock contention in the thread pool.
   
   I just wanted to report this and ask whether you have seen similar issues already. I am not too familiar with the thread pool code; could you point me in a direction for improving this without dynamically adjusting the thread pool size?
   
   Note: To reproduce the issue reliably, run the benchmark executable for one parameter combination at a time, as I assume some caching happens when all parameters are executed in the same run.
   
   
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
