ZhangYu0123 opened a new issue #4017:
URL: https://github.com/apache/incubator-doris/issues/4017


   **Describe the bug**
   
   Compaction on the BE continuously merges Rowset versions and deletes the 
input Rowsets once a merge completes. If the FE then issues a query for a 
version that falls inside the merged range, the BE can no longer find a Rowset 
version path for that version and returns the error 
OLAP_ERR_VERSION_ALREADY_MERGED = -230.
   
   The meaning of this error is described in #3270; see also PRs #3271 and 
#3859.
   
   
   **Resolution**
   To keep Rowset compaction efficient while still allowing queries on earlier 
versions, and to keep the change low-risk, this design delays the deletion of 
merged Rowsets. The main ideas are as follows:
   
   (1) Data structure changes
   - Add _expired_snapshot_rs_version_map to Tablet to hold the merged 
Rowsets.
   - Add _expired_snapshot_rs_metas to TabletMeta to hold the merged 
RowsetMetas.
   - Redefine the RowsetGraph structure in Rowset and rename it to 
VersionedRowsetTracker, with the following responsibilities:
   a) Provide the original RowsetGraph functionality, and add path information 
(pathVersion) to each Vertex. Vertices with the same pathVersion belong to the 
same merged path; a pathVersion of -1 means the Rowset has not been merged.
   b) Maintain the collection of merged Rowsets, 
_expired_snapshot_rs_path_map. The key of the map is the pathVersion and the 
value is the list of Rowsets with that pathVersion.
   c) Maintain the current maximum pathVersion and assign the next value to 
the Vertices of the Rowsets merged by the next compaction.
   
![image](https://user-images.githubusercontent.com/67053339/86512214-adb23100-be32-11ea-81af-be059a5ba955.png)
   In the figure, the Rowset versions on paths whose pathVersion is not -1 
are the Rowsets subject to delayed deletion.
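
   The data structure changes in (1) can be sketched roughly as follows. This 
is only an illustration under simplifying assumptions, not the actual BE code: 
the Version, RowsetMeta and TrackedRowset types and the add_active, 
add_expired_path and path_version_of methods are hypothetical stand-ins, and 
the graph edges of the original RowsetGraph are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

// Hypothetical, simplified stand-ins for the BE types (assumptions).
struct Version {
    int64_t first;   // start version, inclusive
    int64_t second;  // end version, inclusive
    bool operator<(const Version& o) const {
        return first != o.first ? first < o.first : second < o.second;
    }
};

struct RowsetMeta {
    Version version;
    int64_t creation_time;  // seconds since epoch
};
using RowsetMetaPtr = std::shared_ptr<RowsetMeta>;

// pathVersion of a Rowset that has not been merged.
constexpr int64_t kNotMerged = -1;

struct TrackedRowset {
    RowsetMetaPtr meta;
    int64_t path_version;
};

class VersionedRowsetTracker {
public:
    // Track a live (not yet merged) Rowset.
    void add_active(const RowsetMetaPtr& rs) {
        _rowsets[rs->version] = TrackedRowset{rs, kNotMerged};
    }

    // Record the inputs of one compaction: all of them get the same,
    // newly assigned pathVersion and are kept for delayed deletion.
    int64_t add_expired_path(const std::vector<RowsetMetaPtr>& merged) {
        const int64_t path = ++_max_path_version;
        for (const auto& rs : merged) {
            _rowsets[rs->version] = TrackedRowset{rs, path};
            _expired_snapshot_rs_path_map[path].push_back(rs);
        }
        return path;
    }

    // Returns kNotMerged for live Rowsets (and for unknown versions).
    int64_t path_version_of(const Version& v) const {
        auto it = _rowsets.find(v);
        return it == _rowsets.end() ? kNotMerged : it->second.path_version;
    }

    const std::map<int64_t, std::vector<RowsetMetaPtr>>& expired_paths() const {
        return _expired_snapshot_rs_path_map;
    }

private:
    std::map<Version, TrackedRowset> _rowsets;
    // key: pathVersion, value: Rowsets merged together on that path
    std::map<int64_t, std::vector<RowsetMetaPtr>> _expired_snapshot_rs_path_map;
    int64_t _max_path_version = 0;
};
```

   In this sketch, every call to add_expired_path corresponds to one 
compaction, so the Rowsets it receives share one pathVersion and can later be 
cleaned up together.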
   
   (2) Compaction process changes
   - After the compaction merge, the task enters the modify_rowsets stage. At 
the end of modify_rowsets, the tablet adds the Rowsets removed from 
rs_version_map to _expired_snapshot_rs_version_map; the removed RowsetMetas 
are handled the same way.
   - In the reconstruct_rowset_graph logic of VersionedRowsetTracker, also use 
the Rowsets in _expired_snapshot_rs_metas to build the VersionedRowsetTracker. 
Add the merged Rowset list to _expired_snapshot_rs_path_map, incrementing the 
pathVersion by 1.
   - Remove the gc operation at the end of compaction.
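
   A minimal sketch of the modify_rowsets change described in (2), assuming 
simplified types. The Rowset and Tablet structs and the signature of 
modify_rowsets here are hypothetical and do not match the real BE classes. For 
brevity the sketch also updates the path map directly, whereas the design does 
that while rebuilding the VersionedRowsetTracker.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Version = std::pair<int64_t, int64_t>;  // [start, end], inclusive

// Hypothetical stand-in for a Rowset (assumption).
struct Rowset {
    Version version;
    std::string id;
};

// Hypothetical stand-in for the parts of Tablet touched by compaction.
struct Tablet {
    std::map<Version, Rowset> rs_version_map;
    std::map<Version, Rowset> expired_snapshot_rs_version_map;
    std::map<int64_t, std::vector<Version>> expired_snapshot_rs_path_map;
    int64_t max_path_version = 0;
};

// Replace the compaction inputs with the output Rowset. Instead of being
// dropped, the inputs move to the expired map under a new pathVersion.
// Returns the pathVersion assigned to the merged inputs.
int64_t modify_rowsets(Tablet& tablet, const Rowset& output,
                       const std::vector<Version>& inputs) {
    const int64_t path = ++tablet.max_path_version;
    for (const Version& v : inputs) {
        auto it = tablet.rs_version_map.find(v);
        if (it == tablet.rs_version_map.end()) {
            continue;  // unknown input: nothing to move
        }
        tablet.expired_snapshot_rs_version_map[v] = it->second;
        tablet.expired_snapshot_rs_path_map[path].push_back(v);
        tablet.rs_version_map.erase(it);
    }
    tablet.rs_version_map[output.version] = output;
    return path;
}
```

   Note that nothing is garbage-collected here; the inputs stay readable until 
the GC process in (3) sweeps their path.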
   
   (3) GC process changes
   - Add a cleanup task for _expired_snapshot_rs_metas to start_trash_sweep in 
TabletManager.
   - When cleaning, check every path in VersionedRowsetTracker whose 
pathVersion is not -1. When the time elapsed since the creation time of the 
Rowset with the largest version number on a path exceeds 
config:tablet_rowset_expired_snapshot_sweep_time (a new configuration, 30 
minutes by default), add all Rowsets on that path to the storage_engine's 
unused_rowset for cleaning.
   - After cleaning, rebuild VersionedRowsetTracker from 
_expired_snapshot_rs_metas and _rs_meta, and delete the keys of the cleaned 
pathVersions from _expired_snapshot_rs_path_map.
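
   The sweep rule in (3) could look roughly like the sketch below. It assumes 
the sweep time is compared against the age of the newest Rowset on each path; 
the RowsetMeta struct, the PathMap alias and the sweep_expired_paths function 
are hypothetical names used only for illustration, and the returned vector 
stands in for the storage_engine's unused_rowset.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for RowsetMeta (assumption).
struct RowsetMeta {
    int64_t start_version;
    int64_t end_version;
    int64_t creation_time;  // seconds since epoch
};

// key: pathVersion, value: Rowsets merged together on that path
using PathMap = std::map<int64_t, std::vector<RowsetMeta>>;

// Collect the Rowsets of every expired path whose newest Rowset (largest
// end version) is older than sweep_time_sec, and drop those paths from the
// map. The default of 1800 seconds mirrors the 30 minute default proposed
// for tablet_rowset_expired_snapshot_sweep_time.
std::vector<RowsetMeta> sweep_expired_paths(PathMap& path_map, int64_t now,
                                            int64_t sweep_time_sec = 1800) {
    std::vector<RowsetMeta> unused;
    for (auto it = path_map.begin(); it != path_map.end();) {
        const std::vector<RowsetMeta>& rowsets = it->second;
        auto newest = std::max_element(
                rowsets.begin(), rowsets.end(),
                [](const RowsetMeta& a, const RowsetMeta& b) {
                    return a.end_version < b.end_version;
                });
        if (newest != rowsets.end() &&
            now - newest->creation_time > sweep_time_sec) {
            // The whole path is cleaned as one unit.
            unused.insert(unused.end(), rowsets.begin(), rowsets.end());
            it = path_map.erase(it);
        } else {
            ++it;  // still within the sweep time (or empty): keep the path
        }
    }
    return unused;
}
```

   Deciding per path rather than per Rowset keeps a path either fully readable 
or fully removed, so a query never sees a partially cleaned version path.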
   
   (4) Find the Rowset to be read
   When reading data, also look for the Rowset in 
_expired_snapshot_rs_version_map.
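
   To illustrate (4), the sketch below captures a version path for a query 
using both the live and the expired Rowset versions. It is a simplification 
and not the BE's actual path search: it uses a greedy walk that assumes 
compaction only produces nested version ranges, and the function name 
capture_consistent_versions, its signature, and the value of OLAP_SUCCESS are 
assumptions made for this example. Only the -230 error code comes from the 
description above.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

using Version = std::pair<int64_t, int64_t>;  // [start, end], inclusive

constexpr int OLAP_SUCCESS = 0;  // assumed value for this sketch
constexpr int OLAP_ERR_VERSION_ALREADY_MERGED = -230;

// Build a chain of Rowset versions covering [0, target] from the live
// versions plus the expired (merged but not yet deleted) versions.
int capture_consistent_versions(const std::set<Version>& active,
                                const std::set<Version>& expired,
                                int64_t target, std::vector<Version>* path) {
    std::set<Version> all(active);
    all.insert(expired.begin(), expired.end());
    path->clear();
    int64_t next = 0;
    while (next <= target) {
        bool found = false;
        Version best;
        // The set is ordered by (start, end), so the last match is the
        // widest Rowset starting at `next` that does not pass `target`.
        for (auto it = all.lower_bound(Version(next, next));
             it != all.end() && it->first == next; ++it) {
            if (it->second <= target) {
                best = *it;
                found = true;
            }
        }
        if (!found) {
            path->clear();
            return OLAP_ERR_VERSION_ALREADY_MERGED;
        }
        path->push_back(best);
        next = best.second + 1;
    }
    return OLAP_SUCCESS;
}
```

   With live versions [0-5] and [6-6], a query for version 3 fails if only the 
live set is searched, but succeeds once the expired versions [0-3] and [4-5] 
are included.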

