uchenily opened a new issue, #45847: URL: https://github.com/apache/arrow/issues/45847
### Describe the bug, including details regarding any error messages, version, and platform.

I am running a performance test on the plan described below. The execution plan is fairly straightforward: two source nodes, one hash join node, and one aggregation node.

```ast
        aggr
          +
        join
    +-----+------+
source_0     source_1
(probe)      (build)
```

However, the performance is far below my expectations, so I put together the following table for further analysis. Each column is the number of batches generated on the build side, each row is the number of batches produced on the probe side, and each batch has a size of `1<<15`.

| time (in seconds) | build 1 | build 2 | build 4 | build 16 | build 32 | build 64 |
| ----------------- | ------- | ------- | ------- | -------- | -------- | -------- |
| probe 1           | 0.03    | 0.04    | 0.06    | 0.03     | 7.6      | 269.7    |
| probe 2           | 0.04    | 0.04    | 0.06    | 7.7      | 59.1     | 176.0    |
| probe 4           | 0.05    | 0.06    | 0.06    | 7.7      | 56.1     | 229.9    |
| probe 16          | 0.11    | 0.12    | 0.12    | 3.2      | 32.2     | 145.8    |
| probe 32          | 0.19    | 0.19    | 0.20    | 3.1      | 32.2     | 107.5    |
| probe 64          | 0.36    | 0.36    | 0.36    | 3.4      | 44.9     | 134.9    |
| probe 256         | 1.33    | 1.33    | 1.34    | 5.5      | 30.7     | 197.3    |
| probe 1024        | 5.2     | 5.3     | 5.2     | 5.3      | 17.7     | 50.9     |

More information:

1. I ran these tests on an Intel x86_64 machine with about 100 cores.
2. In scenarios where the execution time exceeds 10 seconds, CPU utilization is very low; most of the time only one CPU core is in use.
3. The `arrow/compute/row/grouper.cc:ConsumeImpl` method is quite time-consuming (map phase).
4. Generating the data also takes a considerable amount of time, but in the tests above it does not account for a significant portion of the total.
5. The data distribution certainly affects the execution time as well, but I have not verified this.
6. In some scenarios, the execution time is longer even with a smaller amount of data, which seems unreasonable.

### Component(s)

C++

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
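
The observation in item 6 (less data, longer run time) can be checked mechanically against the timing table in the report. The following is a minimal sketch, not part of Arrow: the names `BUILD_COUNTS`, `TIMINGS`, and `find_inversions` are made up for illustration, and the numbers are copied verbatim from the table. For each build-side batch count, it lists the pairs of probe-side batch counts where the smaller probe input took strictly longer.

```python
# Timings in seconds, copied from the table in the report.
# Key: probe-side batch count; list position: build-side batch count,
# in the order given by BUILD_COUNTS.
BUILD_COUNTS = [1, 2, 4, 16, 32, 64]
TIMINGS = {
    1:    [0.03, 0.04, 0.06, 0.03, 7.6, 269.7],
    2:    [0.04, 0.04, 0.06, 7.7, 59.1, 176.0],
    4:    [0.05, 0.06, 0.06, 7.7, 56.1, 229.9],
    16:   [0.11, 0.12, 0.12, 3.2, 32.2, 145.8],
    32:   [0.19, 0.19, 0.20, 3.1, 32.2, 107.5],
    64:   [0.36, 0.36, 0.36, 3.4, 44.9, 134.9],
    256:  [1.33, 1.33, 1.34, 5.5, 30.7, 197.3],
    1024: [5.2, 5.3, 5.2, 5.3, 17.7, 50.9],
}


def find_inversions():
    """Return (build, small_probe, large_probe, t_small, t_large) tuples
    where, at the same build count, the run with fewer probe batches
    took strictly longer than a run with more probe batches."""
    probes = sorted(TIMINGS)
    inversions = []
    for col, build in enumerate(BUILD_COUNTS):
        for i, small in enumerate(probes):
            for large in probes[i + 1:]:
                t_small = TIMINGS[small][col]
                t_large = TIMINGS[large][col]
                if t_small > t_large:
                    inversions.append((build, small, large, t_small, t_large))
    return inversions


if __name__ == "__main__":
    for build, small, large, t_small, t_large in find_inversions():
        print(f"build {build}: probe {small} took {t_small}s, "
              f"but probe {large} took only {t_large}s")
```

On the reported numbers, such inversions appear only in the build 16, build 32, and build 64 columns; the build 1, 2, and 4 columns grow monotonically with the probe-side batch count.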