Xiaoccer opened a new pull request, #16031: URL: https://github.com/apache/doris/pull/16031
# Proposed changes

Issue Number: close #xxx

## Problem summary

Add bthread to separate IO logic from computation when executing the OlapScanner, which speeds up queries whose data is already cached.

* In OlapScanner's execution chain, change std::mutex/std::condition_variable to bthread::Mutex/bthread::ConditionVariable
* Add an AsyncIO class to hand tasks off from bthread to pthread
* Change how the reader's/filesystem's interfaces are used when running under bthread

## Performance test

* 1 BE + 1 FE
* base: commit_id-bd2280b4ce702e24cc31ca1d379aeaf6f00ce69c

### ssbf100 benchmark

| sql  | base(s) | bthread(s) |
| ---- | ------- | ---------- |
| q1.1 | 0.92    | 0.904      |
| q1.2 | 1.122   | 1.126      |
| q1.3 | 0.052   | 0.052      |
| q2.1 | 12.041  | 12.072     |
| q2.2 | 0.741   | 0.705      |
| q2.3 | 0.635   | 0.671      |
| q3.1 | 3.492   | 3.422      |
| q3.2 | 0.46    | 0.47       |
| q3.3 | 0.649   | 0.679      |
| q3.4 | 0.089   | 0.087      |
| q4.1 | 4.358   | 4.685      |
| q4.2 | 0.519   | 0.572      |
| q4.3 | 0.457   | 0.463      |

Performance with bthread and with pthread is almost the same.

### cached read

Description: First execute q1.1.sql to cache the data, then execute q1.1.sql, q2.1.sql and q4.1.sql concurrently.

| sql                       | base(s) | bthread(s) |
| ------------------------- | ------- | ---------- |
| first time q1.1           | 2.673   | 2.721      |
| second time (cached) q1.1 | 13.441  | **0.206**  |
| first time q2.1           | 53.846  | 53.793     |
| first time q4.1           | 53.864  | 53.903     |

When using bthread, if the data a query needs is already cached, the result is returned quickly without waiting for a free thread in the thread pool.

## Checklist(Required)

1. Does it affect the original behavior:
    - [x] Yes
    - [ ] No
    - [ ] I don't know
2. Has unit tests been added:
    - [ ] Yes
    - [x] No
    - [ ] No Need
3. Has document been added or modified:
    - [ ] Yes
    - [x] No
    - [ ] No Need
4. Does it need to update dependencies:
    - [ ] Yes
    - [x] No
5. Are there any changes that cannot be rolled back:
    - [ ] Yes (If Yes, please explain WHY)
    - [x] No

## Further comments

If this is a relatively large or complex change, kick off the discussion at [d...@doris.apache.org](mailto:d...@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
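The split between computation and blocking IO described in the problem summary can be sketched roughly as follows. This is only an illustration, not the AsyncIO class from this PR: `IoPool`, `read_block`, and `scan_one_block` are hypothetical names, and the sketch uses standard-library threads and futures instead of bthread. In particular, `std::future::get` blocks the calling OS thread, whereas the PR relies on bthread::Mutex/bthread::ConditionVariable so that a waiting bthread does not occupy its worker.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical dedicated pool for blocking IO (an assumption for
// illustration; it is not the AsyncIO implementation from the PR).
class IoPool {
public:
    explicit IoPool(std::size_t n) {
        assert(n > 0);
        for (std::size_t i = 0; i < n; ++i) {
            workers_.emplace_back([this] { worker_loop(); });
        }
    }

    ~IoPool() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stop_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }

    IoPool(const IoPool&) = delete;
    IoPool& operator=(const IoPool&) = delete;

    // Queue a blocking task on the IO threads; the caller gets a future.
    template <typename F>
    auto submit(F&& fn) -> std::future<decltype(fn())> {
        using R = decltype(fn());
        auto task = std::make_shared<std::packaged_task<R()>>(std::forward<F>(fn));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mu_);
            queue_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mu_);
                cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
                if (queue_.empty()) return;  // stop requested and queue drained
                job = std::move(queue_.front());
                queue_.pop();
            }
            job();  // run the blocking IO outside the lock
        }
    }

    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> queue_;
    std::vector<std::thread> workers_;
    bool stop_ = false;
};

// Stand-in for a blocking storage read (hypothetical).
int read_block(int block_id) { return block_id * 2; }

// Compute side: hand the read to the IO pool, then continue with the result.
int scan_one_block(IoPool& io, int block_id) {
    std::future<int> pending = io.submit([block_id] { return read_block(block_id); });
    return pending.get() + 1;  // "computation" applied to the data that was read
}
```

With an `IoPool io(2);`, calling `scan_one_block(io, 20)` returns 41. The point of the structure is that the compute path only enqueues the read and waits for its result, so slow reads queue up on the IO threads rather than on the threads that run the scanners.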