morningman opened a new issue #2551: [Compaction] Support compact only one rowset
URL: https://github.com/apache/incubator-doris/issues/2551

**Background**

For historical reasons, we do not select the last rowset when performing compaction. There are two reasons:

1. The last rowset may be rolled back.
2. We use the version hash of the last rowset as the version hash of the tablet. The version hash is obtained by XORing the version hashes of multiple rowsets. If the compaction includes the last rowset, the final version hash changes, so the version hash of the tablet on the BE no longer matches the version hash saved in the FE's metadata.

In version 0.11, neither of these issues exists. First, rowsets no longer have a rollback mechanism. Second, the version hash is no longer used. Therefore, in theory, we can compact the last rowset.

**Motivation**

If a user loads a large amount of data in one load job, a large number of segments may be generated in one rowset. The data in these segments overlaps, which makes reading them relatively inefficient. If there is no subsequent load job, this rowset remains the last rowset and is therefore never compacted.

**What changes?**

The main changes are as follows:

Add a field `segments_overlap` to the rowset meta to indicate whether the data in the segments of this rowset overlaps. The values are `UNKNOWN`, `OVERLAPPING` and `NONOVERLAPPING`. `UNKNOWN` exists for compatibility with rowsets created before this change.

Previously, whether the segments of a rowset overlap was judged only by checking whether the start version and end version of the rowset are the same. With the modified logic, the segments are treated as non-overlapping if the start version and end version are not the same, or if the `segments_overlap` value is `NONOVERLAPPING`.

I also modified the compaction logic, so cumulative compaction can now run on a single rowset.
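
The overlap judgment described above can be sketched roughly as follows. This is a minimal illustration, not the actual Doris BE code: `RowsetMeta`, `is_segments_overlapping`, and `worth_compacting_alone` are hypothetical names, and the single-rowset eligibility rule (more than one segment, and overlapping) is an assumption made for the example rather than something stated in the issue.

```python
from dataclasses import dataclass
from enum import Enum


class SegmentsOverlap(Enum):
    UNKNOWN = 0         # rowsets created before the field existed
    OVERLAPPING = 1
    NONOVERLAPPING = 2


@dataclass
class RowsetMeta:
    # Hypothetical, simplified stand-in for the real rowset meta.
    start_version: int
    end_version: int
    num_segments: int
    segments_overlap: SegmentsOverlap = SegmentsOverlap.UNKNOWN


def is_segments_overlapping(meta: RowsetMeta) -> bool:
    # Modified judgment from the issue: segments are non-overlapping when the
    # start and end versions differ, or when the meta says NONOVERLAPPING.
    non_overlapping = (
        meta.start_version != meta.end_version
        or meta.segments_overlap is SegmentsOverlap.NONOVERLAPPING
    )
    return not non_overlapping


def worth_compacting_alone(meta: RowsetMeta) -> bool:
    # Assumption for illustration only: a single rowset is a useful
    # compaction input if it has several segments that overlap.
    return meta.num_segments > 1 and is_segments_overlapping(meta)
```

With this sketch, an old rowset whose field is `UNKNOWN` falls back to the previous version-based check, so a rowset with equal start and end versions is still treated as overlapping unless it is explicitly marked `NONOVERLAPPING`.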