weizuo93 opened a new issue #4997: URL: https://github.com/apache/incubator-doris/issues/4997
Rows deleted by a `delete operation` are not physically removed from disk until base compaction is performed on the relevant tablet. Until then, the logically deleted data not only occupies disk space but also hurts scan performance. So it is necessary to run compaction on tablets that contain a lot of deleted rows.

Can we take `rows_del_filtered` into consideration when selecting a tablet for a compaction task? For each tablet, we can record the number of rows filtered during scan operations since the last base compaction, and use that count as one factor when selecting a tablet for compaction. The `tablet score` for compaction could be calculated like this:

`tablet_score = k1 * tablet_scan_frequency + k2 * old_compaction_score + k3 * rows_del_filtered`

`k1`, `k2`, and `k3` can be set dynamically through the HTTP interface `/api/update_config`.

Of course, rows in `DEL_PARTIAL_SATISFIED` blocks and rows in `DEL_SATISFIED` blocks affect scan performance differently, so the two cases can be treated separately.
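
The weighted score proposed in this issue can be sketched as follows. This is only an illustration of the formula, not Doris code: the `TabletStats` class, its field names, the `pick_tablet` helper, and the sample weights are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class TabletStats:
    """Hypothetical per-tablet counters; names are illustrative only."""
    tablet_id: int
    tablet_scan_frequency: float  # how often the tablet is scanned
    old_compaction_score: float   # score used by the existing selection logic
    rows_del_filtered: int        # rows filtered by delete predicates during
                                  # scans since the last base compaction


def tablet_score(stats, k1, k2, k3):
    """tablet_score = k1 * scan_frequency + k2 * old_score + k3 * rows_del_filtered."""
    return (k1 * stats.tablet_scan_frequency
            + k2 * stats.old_compaction_score
            + k3 * stats.rows_del_filtered)


def pick_tablet(tablets, k1, k2, k3):
    """Return the tablet with the highest score, or None if there are no tablets."""
    return max(tablets, key=lambda t: tablet_score(t, k1, k2, k3), default=None)


# Two made-up tablets: tablet 1 is scanned often, tablet 2 holds many deleted rows.
tablets = [
    TabletStats(1, tablet_scan_frequency=10, old_compaction_score=5, rows_del_filtered=0),
    TabletStats(2, tablet_scan_frequency=2, old_compaction_score=5, rows_del_filtered=1000),
]

# With a non-zero k3, the tablet carrying many deleted rows is preferred.
best = pick_tablet(tablets, k1=1.0, k2=1.0, k3=0.01)
```

With these sample weights, setting `k3` to zero reduces the selection to scan frequency plus the old compaction score, so the weight on `rows_del_filtered` controls how strongly deleted rows pull a tablet forward in the queue.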