weizuo93 opened a new issue #4834: URL: https://github.com/apache/incubator-doris/issues/4834
A large number of small segment files makes scan operations inefficient. Compaction can merge multiple small files into one large file, so we could take tablet scan frequency into account when selecting a tablet for compaction, and preferentially compact the tablets that have been scanned frequently during the most recent period.

Taking the compaction strategy of `Kudu` as a reference, a `scan frequency` can be calculated for each tablet over the most recent period and factored into the compaction score. The new compaction score can be calculated like this:

`new_compaction_score = k1 * tablet_scan_frequency + k2 * old_compaction_score`

`k1` and `k2` can be set dynamically through the HTTP interface `/api/update_config`.

We can add a metric `query_scan_count` for each tablet, which records the scan count of the tablet. The tablet scan frequency can then be calculated like this:

`tablet_scan_frequency = (now_query_scan_count - last_query_scan_count) / (now_time - last_time)`

`last_query_scan_count` is updated every time an `interval` passes, and `interval` is configurable (for example, `300` seconds).
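
To make the two formulas concrete, here is a minimal Python sketch of the bookkeeping they imply. It is only an illustration of the proposal, not Doris code: the class `TabletScanStats`, the methods `record_scan` and `maybe_refresh`, and the cached `scan_frequency` field are hypothetical names, and it assumes the frequency is recomputed at the moment the snapshot is refreshed, which the proposal does not spell out.

```python
from dataclasses import dataclass


@dataclass
class TabletScanStats:
    """Hypothetical per-tablet bookkeeping for the proposed metric."""

    query_scan_count: int = 0       # running scan counter (the proposed metric)
    last_query_scan_count: int = 0  # counter value at the last snapshot
    last_time: float = 0.0          # time of the last snapshot, in seconds
    scan_frequency: float = 0.0     # scans per second over the last interval

    def record_scan(self) -> None:
        # Called once per query scan on this tablet.
        self.query_scan_count += 1

    def maybe_refresh(self, now_time: float, interval: float = 300.0) -> None:
        # Assumption: the frequency is recomputed only when at least
        # `interval` seconds have passed since the last snapshot.
        elapsed = now_time - self.last_time
        if elapsed <= 0 or elapsed < interval:
            return
        delta = self.query_scan_count - self.last_query_scan_count
        self.scan_frequency = delta / elapsed
        self.last_query_scan_count = self.query_scan_count
        self.last_time = now_time


def new_compaction_score(stats: TabletScanStats, old_compaction_score: float,
                         k1: float, k2: float) -> float:
    # new_compaction_score = k1 * tablet_scan_frequency + k2 * old_compaction_score
    return k1 * stats.scan_frequency + k2 * old_compaction_score


if __name__ == "__main__":
    stats = TabletScanStats()
    for _ in range(600):
        stats.record_scan()
    stats.maybe_refresh(now_time=300.0)  # 600 scans over 300 s
    print(stats.scan_frequency)                           # 2.0
    print(new_compaction_score(stats, 10, k1=0.5, k2=1.0))  # 11.0
```

With `k1 = 0` the score reduces to `k2 * old_compaction_score`, so the existing selection order is preserved up to scaling; raising `k1` shifts priority toward frequently scanned tablets.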