morningman opened a new issue #2551: [Compaction] Support compact only one 
rowset
URL: https://github.com/apache/incubator-doris/issues/2551
 
 
   **Background**
   
   Historically, we have not selected the last rowset when performing 
compaction, for two reasons:
   
   1. The last rowset may be rolled back.
   2. We use the version hash of the last rowset as the version hash of the 
tablet. The version hash is obtained by XORing the version hashes of multiple 
rowsets. If the compaction includes the last rowset, the final version hash 
changes, so the version hash of the tablet on the BE no longer matches the 
version hash saved in the FE's metadata.
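
The second reason can be illustrated with a small sketch. It assumes that a compacted rowset takes the XOR of its input rowsets' version hashes, which is one reading of the description above; the hash values and the `compacted_version_hash` helper are hypothetical and are not taken from the Doris code base.

```python
from functools import reduce
from operator import xor


def compacted_version_hash(rowset_hashes):
    """XOR the version hashes of the input rowsets (assumed combination rule)."""
    return reduce(xor, rowset_hashes, 0)


# Hypothetical version hashes of three rowsets; the last entry is the newest.
rowset_hashes = [0x1A2B, 0x3C4D, 0x5E6F]

# The FE remembers the version hash of the last rowset as the tablet's hash.
fe_tablet_hash = rowset_hashes[-1]

# Compacting only the older rowsets leaves the last rowset untouched.
after_safe = [compacted_version_hash(rowset_hashes[:-1]), rowset_hashes[-1]]
be_hash_safe = after_safe[-1]

# Compacting every rowset replaces the last rowset with the merged one.
after_unsafe = [compacted_version_hash(rowset_hashes)]
be_hash_unsafe = after_unsafe[-1]

print(be_hash_safe == fe_tablet_hash)    # BE and FE still agree
print(be_hash_unsafe == fe_tablet_hash)  # BE and FE now disagree
```

Under this assumption, leaving the last rowset out of compaction keeps the hash the FE recorded, while including it produces a different value.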
   
   In version 0.11, neither of the above issues exists. First, rowsets no 
longer have a rollback mechanism. Second, the version hash is no longer used. 
Therefore, in theory, we can compact the last rowset.
   
   **Motivation**
   
   The motivation for this change is that if a user loads a large amount of 
data in one load job, a large number of segments may be generated in one 
rowset. The data in these segments overlaps, which makes reading them 
relatively inefficient. If there is no subsequent load job, this rowset remains 
the last rowset and is therefore never compacted.
   
   **What changes?**
   
   The main changes are as follows:
   
   Add a field `segments_overlap` to the rowset meta to indicate whether the 
data in the segments of this rowset overlaps. The possible values are `UNKNOWN`, 
`OVERLAPPING` and `NONOVERLAPPING`. `UNKNOWN` exists for compatibility with 
rowsets created before this change.
   
   Previously, we decided whether the data in the segments of a rowset overlaps 
by checking whether the start version and end version of the rowset are the 
same. With this change, the segments are treated as non-overlapping if:
   
   The start version and end version are not the same, or the `segments_overlap` 
value is `NONOVERLAPPING`.
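
A minimal sketch of the old and new checks, reading the condition above as the test for "the segments do not overlap" (an interpretation, since the issue text does not spell out the result). `SegmentsOverlap`, `RowsetMeta` and both function names are hypothetical stand-ins for illustration, not the actual Doris types.

```python
from dataclasses import dataclass
from enum import Enum


class SegmentsOverlap(Enum):
    UNKNOWN = 0          # rowsets written before the field existed
    OVERLAPPING = 1
    NONOVERLAPPING = 2


@dataclass
class RowsetMeta:
    start_version: int
    end_version: int
    segments_overlap: SegmentsOverlap = SegmentsOverlap.UNKNOWN


def is_non_overlapping_old(meta: RowsetMeta) -> bool:
    # Old rule: only the version range is consulted.
    return meta.start_version != meta.end_version


def is_non_overlapping_new(meta: RowsetMeta) -> bool:
    # New rule: the version range differs, or the meta says NONOVERLAPPING.
    return (
        meta.start_version != meta.end_version
        or meta.segments_overlap is SegmentsOverlap.NONOVERLAPPING
    )


# A single-version rowset that has already been compacted on its own.
compacted_single = RowsetMeta(5, 5, SegmentsOverlap.NONOVERLAPPING)
# A single-version rowset from before the change, with no overlap info.
legacy_single = RowsetMeta(5, 5)

print(is_non_overlapping_old(compacted_single))  # False
print(is_non_overlapping_new(compacted_single))  # True
print(is_non_overlapping_new(legacy_single))     # False
```

In this sketch a legacy rowset with `UNKNOWN` falls back to the version-range rule, so existing rowsets are judged the same way as before.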
   
   At the same time, I also modified the compaction logic so that cumulative 
compaction can now run on a single rowset.
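
One way such a single-rowset rule could look is sketched below. The acceptance condition (a lone rowset is worth compacting only if it has several overlapping segments) is an assumption made for illustration; `Rowset` and `should_compact` are hypothetical names, and the actual selection logic in Doris may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rowset:
    start_version: int
    end_version: int
    num_segments: int
    overlapping: bool  # whether the segments' data overlaps


def should_compact(candidates: List[Rowset]) -> bool:
    """Decide whether a candidate list is worth a cumulative compaction."""
    if not candidates:
        return False
    if len(candidates) >= 2:
        # Several rowsets can always be merged into one.
        return True
    # Assumed rule for a single rowset: compact only when it would
    # actually reduce overlap between its segments.
    only = candidates[0]
    return only.overlapping and only.num_segments > 1


big_load = Rowset(7, 7, num_segments=40, overlapping=True)
already_merged = Rowset(2, 6, num_segments=3, overlapping=False)

print(should_compact([big_load]))        # True
print(should_compact([already_merged]))  # False
```

This covers the case from the motivation: a large load that is never followed by another load can still have its overlapping segments merged.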

