weizuo93 opened a new issue #4997:
URL: https://github.com/apache/incubator-doris/issues/4997


   Rows deleted by a `delete operation` are not removed from disk until 
base compaction is performed on the relevant tablet. The logically deleted 
data not only occupies disk space but also degrades scan performance. It is 
therefore necessary to run a compaction task on any tablet that contains a 
lot of deleted rows. 
   
   Can we take `rows_del_filtered` into consideration when selecting a tablet 
for a compaction task?
   
   For each tablet, we can record the number of rows filtered out by delete 
conditions during scans since the last base compaction, and use that count as 
one factor when selecting a tablet for a compaction task. The `tablet score` 
for compaction can be calculated like this:
   
     `tablet_score = k1 * tablet_scan_frequency + k2 * old_compaction_score  + 
k3 * rows_del_filtered`
   
   `k1`, `k2` and `k3` can be set dynamically through the HTTP interface 
`/api/update_config`.
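   
   To make the proposal concrete, here is a minimal sketch of the scoring 
idea in Python. It is an illustration only, not Doris code: `TabletStats`, 
`record_scan`, `on_base_compaction`, `tablet_score` and `pick_tablet` are 
hypothetical names, and the weights are passed in directly rather than read 
from the config.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TabletStats:
    """Hypothetical per-tablet counters used only for this illustration."""
    scan_frequency: float
    old_compaction_score: float
    rows_del_filtered: int = 0  # rows filtered by delete conditions since last base compaction

    def record_scan(self, filtered_rows: int) -> None:
        # Accumulate the rows a scan had to filter out because of delete conditions.
        self.rows_del_filtered += filtered_rows

    def on_base_compaction(self) -> None:
        # Base compaction physically removes deleted rows, so the counter restarts.
        self.rows_del_filtered = 0


def tablet_score(stats: TabletStats, k1: float, k2: float, k3: float) -> float:
    # tablet_score = k1 * tablet_scan_frequency + k2 * old_compaction_score
    #                + k3 * rows_del_filtered
    return (k1 * stats.scan_frequency
            + k2 * stats.old_compaction_score
            + k3 * stats.rows_del_filtered)


def pick_tablet(tablets: Dict[int, TabletStats], k1: float, k2: float, k3: float) -> int:
    # Choose the tablet id with the highest score.
    return max(tablets, key=lambda tid: tablet_score(tablets[tid], k1, k2, k3))
```

   With `k3 = 0` the selection falls back to the first two terms, so the new 
factor can be switched off without changing the formula.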
   
   Of course, rows in `DEL_PARTIAL_SATISFIED` blocks and rows in 
`DEL_SATISFIED` blocks affect scan performance differently, so the two can be 
counted and weighted separately.
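
   One way to treat the two kinds of blocks separately is to keep two 
counters and give each its own weight. The sketch below is in Python and only 
illustrates the idea: the function name, the parameter names and the two 
weights `k3_satisfied` and `k3_partial` are assumptions, not existing Doris 
configuration items.

```python
def del_filtered_term(rows_del_satisfied: int,
                      rows_del_partial_satisfied: int,
                      k3_satisfied: float,
                      k3_partial: float) -> float:
    """Weighted contribution of deleted rows to the tablet score.

    Hypothetical replacement for the single `k3 * rows_del_filtered` term:
    rows from DEL_SATISFIED blocks and rows from DEL_PARTIAL_SATISFIED blocks
    are counted separately and weighted independently.
    """
    return (k3_satisfied * rows_del_satisfied
            + k3_partial * rows_del_partial_satisfied)
```

   Setting both weights to the same value reduces this to the single 
`k3 * rows_del_filtered` term.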




