yixiutt opened a new issue, #10300:
URL: https://github.com/apache/doris/issues/10300

   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### Description
   
   Running compaction on a single replica and letting the other replicas fetch 
the result, instead of compacting on every replica, can save a lot of CPU in 
the cluster.
   Here is my design:
   1. When a replica wants to do compaction, it asks the FE for peer info.
   2. If it gets the peer info successfully, it checks the peer's compaction 
info and version info.
   3. It makes a decision based on the peer replica's version info and 
compaction status.
   
   Some details:
   1. If a replica wants to compact versions [A-D], it checks the peer 
replicas for the longest consecutive version range starting at A, such as 
[A-C]. If the local replica has [A-x] and [x-C], it can fetch the data from 
the peer.
   2. If there is no suitable rowset to fetch, it checks the peer's compaction 
status. If the peer replica is running a compaction whose versions overlap, 
the local replica just waits for the next round, which means it can fetch the 
data from the peer once the peer finishes that compaction.
   3. Control the frequency of the peer compaction status check by adding an 
interval for it.
   4. Reuse the engine clone task code when fetching files from the peer. The 
two kinds of file clone differ too much, so I cannot use the engine clone task 
directly and have to copy its code instead.
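
The decision in details 1 to 3 can be sketched roughly as below. This is only an illustration, not the actual BE code: the names `PeerInfo`, `pick_peer_rowset`, `decide` and `should_check_peer` are hypothetical, versions are modelled as inclusive `(start, end)` pairs, and falling back to local compaction when the peer is idle is an assumption (the issue only says that local compaction still happens in some rounds).

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple

Version = Tuple[int, int]  # inclusive (start, end) version range of one rowset


class Action(Enum):
    FETCH_FROM_PEER = "fetch"
    WAIT_NEXT_ROUND = "wait"
    COMPACT_LOCALLY = "local"


@dataclass
class PeerInfo:
    rowsets: List[Version]                # rowset versions held by the peer
    compacting: Optional[Version] = None  # range the peer is compacting, if any


def pick_peer_rowset(local: List[Version], peer: List[Version]) -> Optional[Version]:
    """Longest peer rowset that starts at the first local version, ends on a
    local rowset boundary, and covers at least two local rowsets."""
    if not local:
        return None
    start = local[0][0]
    local_ends = {end for _, end in local}
    best: Optional[Version] = None
    for p_start, p_end in peer:
        if p_start != start or p_end not in local_ends:
            continue
        covered = sum(1 for s, e in local if s >= p_start and e <= p_end)
        if covered >= 2 and (best is None or p_end > best[1]):
            best = (p_start, p_end)
    return best


def overlaps(a: Version, b: Version) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def decide(local: List[Version], peer: PeerInfo) -> Tuple[Action, Optional[Version]]:
    """`local` is the sorted, non-empty list of rowsets picked for compaction."""
    if not local:
        raise ValueError("no local rowsets selected for compaction")
    target = pick_peer_rowset(local, peer.rowsets)
    if target is not None:
        return Action.FETCH_FROM_PEER, target
    wanted = (local[0][0], local[-1][1])
    if peer.compacting is not None and overlaps(peer.compacting, wanted):
        return Action.WAIT_NEXT_ROUND, None
    # Assumption: compact locally when there is nothing to fetch or wait for.
    return Action.COMPACT_LOCALLY, None


def should_check_peer(now_s: float, last_check_s: float, interval_s: float) -> bool:
    """Detail 3: only query the peer's status once per interval."""
    return now_s - last_check_s >= interval_s
```

For example, with local rowsets `[(0, 3), (4, 6), (7, 9)]` and a peer that already holds `(0, 6)`, `decide` returns the fetch action for `(0, 6)`, so the local replica would skip compacting its first two rowsets.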
   
   Some test results:
   A tablet has 2 replicas and Flink is used to import data. The compaction 
statistics are listed below.
             do_local    do_fetch
   be_1      11581       14461
   be_2      14513       11526
   
   About 50% of the compactions did not need to run locally, because the 
result was fetched from the peer instead.
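
The "about 50%" figure can be checked from the table with a few lines of arithmetic. The snippet below is just that calculation, under the reading that `do_fetch` counts compaction rounds served by fetching from the peer and `do_local` counts rounds compacted locally; the helper name `fetch_ratio` is made up for illustration.

```python
# Counters copied from the table above.
stats = {
    "be_1": {"do_local": 11581, "do_fetch": 14461},
    "be_2": {"do_local": 14513, "do_fetch": 11526},
}


def fetch_ratio(do_local: int, do_fetch: int) -> float:
    """Share of compaction rounds that were served by fetching from the peer."""
    return do_fetch / (do_local + do_fetch)


per_be = {be: fetch_ratio(**counts) for be, counts in stats.items()}
total_local = sum(c["do_local"] for c in stats.values())
total_fetch = sum(c["do_fetch"] for c in stats.values())
overall = fetch_ratio(total_local, total_fetch)

for be, ratio in per_be.items():
    print(f"{be}: {ratio:.1%} fetched")
print(f"overall: {overall:.1%} fetched")
```

Across both BEs this is 25987 fetches out of 52081 rounds, which is just under one half.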
   
   I will keep updating this issue as I reach new conclusions.
   
   ### Use case
   
   _No response_
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

