weizuo93 opened a new issue #4834:
URL: https://github.com/apache/incubator-doris/issues/4834


   A large number of small segment files makes scan operations inefficient. 
Compaction can merge multiple small files into one large file. So we could 
take the tablet scan frequency into consideration when selecting a tablet for 
compaction, and preferentially compact those tablets that have been scanned 
frequently during the most recent period of time.
   
   Taking the compaction strategy of `Kudu` as a reference, a `scan frequency` 
can be calculated for each tablet over the most recent period of time and 
taken into consideration when calculating the compaction score. The new 
compaction score can be calculated like this:
   
     `new_compaction_score = k1 * tablet_scan_frequency + k2 * old_compaction_score`
   
   The weights `k1` and `k2` can be set dynamically through the HTTP interface 
`/api/update_config`.
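
A minimal sketch of the weighted score above, in Python for brevity. The names `TabletStats`, `new_compaction_score` (as a function) and `pick_tablet` are hypothetical and only illustrate the formula; they are not existing Doris code, and choosing the tablet with the highest score is an assumption about how the score would be used.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TabletStats:
    """Hypothetical per-tablet inputs to the proposed score."""
    tablet_id: int
    scan_frequency: float        # scans per second over the recent window
    old_compaction_score: float  # score from the existing strategy


def new_compaction_score(stats: TabletStats, k1: float, k2: float) -> float:
    # new_compaction_score = k1 * tablet_scan_frequency + k2 * old_compaction_score
    return k1 * stats.scan_frequency + k2 * stats.old_compaction_score


def pick_tablet(tablets: Iterable[TabletStats], k1: float, k2: float) -> TabletStats:
    # Assumption: the tablet with the highest combined score is compacted first.
    return max(tablets, key=lambda t: new_compaction_score(t, k1, k2))
```

With `k1 = 0` the ranking falls back to the old score alone, so the existing behavior would remain available through configuration.
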
   We can add a per-tablet metric `query_scan_count` that records how many 
times the tablet has been scanned. The tablet scan frequency can then be 
calculated like this:
   
   `tablet_scan_frequency = (now_query_scan_count - last_query_scan_count) / (now_time - last_time)`
   `last_query_scan_count` (and with it `last_time`) will be updated every 
time an `interval` passes, and `interval` is configurable (for example, `300` 
seconds).
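
The bookkeeping described above could look roughly like the following Python sketch. `ScanFrequencyTracker`, `record_scan` and `refresh` are hypothetical names, and recomputing the frequency only once per `interval` is an assumption; the proposal does not specify when the value is refreshed.

```python
class ScanFrequencyTracker:
    """Hypothetical per-tablet tracker for the proposed scan frequency."""

    def __init__(self, interval_s: float = 300.0, now: float = 0.0) -> None:
        self.interval_s = interval_s      # configurable window, in seconds
        self.query_scan_count = 0         # monotonically increasing scan counter
        self.last_query_scan_count = 0    # counter value at the last refresh
        self.last_time = now              # timestamp of the last refresh
        self.scan_frequency = 0.0         # scans per second in the last window

    def record_scan(self) -> None:
        # Called once for every scan of the tablet.
        self.query_scan_count += 1

    def refresh(self, now: float) -> float:
        elapsed = now - self.last_time
        if elapsed < self.interval_s:
            # The interval has not passed yet: keep the previous value.
            return self.scan_frequency
        # tablet_scan_frequency =
        #   (now_query_scan_count - last_query_scan_count) / (now_time - last_time)
        self.scan_frequency = (
            self.query_scan_count - self.last_query_scan_count
        ) / elapsed
        self.last_query_scan_count = self.query_scan_count
        self.last_time = now
        return self.scan_frequency
```

Timestamps are passed in explicitly so the sketch does not depend on a particular clock source.
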
   




