sunyuting1 commented on PR #7169:
URL: https://github.com/apache/hbase/pull/7169#issuecomment-3125181168

   > Does this cause any real problems? As in the below catch block, we
   > mentioned that it is also OK to not delete the committed HStoreFile as it will
   > not cause any data corruption problem.
   
   If the file is not deleted promptly, data corruption can eventually occur when 
deletes are involved: the deleted data reappears after the region is reopened. 
A concrete example:
   
   1. Assume file A contains rowkey:a, value:1 and file B contains rowkey:a, 
value:2. The effective value is 2.
   2. The merge (compaction) of files A and B fails after the output file C, 
containing rowkey:a, value:2, has been written. Files A and B still exist and 
the storefiles list is still A/B, but the data directory now holds three files, 
A/B/C. The effective value is still 2 (note that file C has not been added to 
the storefiles list).
   3. File D is written with a delete marker: rowkey:a, type=DeleteFamily.
   4. Files A/B/D are merged into file E. The storefiles list is updated to E 
(files A/B/D are removed), so the data of rowkey:a is cleared, as it should be, 
and the data directory holds two files, C/E.
   5. Some time later the region is reopened and file C is loaded. The data of 
rowkey:a reappears, and the value is incorrectly restored to 2.
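
   The five steps above can be replayed with a toy model. This is only a sketch 
in Python, not HBase code: `Cell`, `visible_put`, `read` and `major_compact` are 
hypothetical names, timestamps are plain integers, and it assumes that the 
merge in step 4 drops the delete marker and that reopening the region loads 
every file found in the store directory.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    row: str
    ts: int
    kind: str                  # "Put" or "DeleteFamily"
    value: Optional[str] = None


def visible_put(cells):
    """Newest Put not masked by a DeleteFamily marker, or None."""
    delete_ts = max((c.ts for c in cells if c.kind == "DeleteFamily"), default=-1)
    puts = [c for c in cells if c.kind == "Put" and c.ts > delete_ts]
    return max(puts, key=lambda c: c.ts) if puts else None


def read(files, row):
    """Value a reader would see for `row` across the given files."""
    cell = visible_put([c for f in files for c in f if c.row == row])
    return cell.value if cell else None


def major_compact(files):
    """Merge files into one; masked Puts and delete markers are dropped."""
    out = []
    for row in sorted({c.row for f in files for c in f}):
        cell = visible_put([c for f in files for c in f if c.row == row])
        if cell:
            out.append(cell)
    return out


# Step 1: files A and B are on disk and in the storefiles list.
on_disk = {"A": [Cell("a", 1, "Put", "1")], "B": [Cell("a", 2, "Put", "2")]}
store_files = ["A", "B"]

# Step 2: the merge writes C, then fails before the list is updated.
# C stays on disk as an orphan because nothing deletes it.
on_disk["C"] = major_compact([on_disk[n] for n in store_files])

# Step 3: file D carries the DeleteFamily marker for rowkey:a.
on_disk["D"] = [Cell("a", 3, "DeleteFamily")]
store_files.append("D")

# Step 4: A/B/D are merged into E; the list becomes [E], A/B/D are removed.
on_disk["E"] = major_compact([on_disk[n] for n in store_files])
for name in store_files:
    del on_disk[name]
store_files = ["E"]
value_before_reopen = read([on_disk[n] for n in store_files], "a")

# Step 5: reopening loads every file left on disk, including orphan C.
store_files = sorted(on_disk)
value_after_reopen = read([on_disk[n] for n in store_files], "a")

print(value_before_reopen, value_after_reopen)
```

   In this model the value is absent before the reopen and comes back as 2 
afterwards. Deleting C at the point where the merge fails in step 2 leaves only 
E on disk, and the deleted row stays deleted.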

