sunyuting1 commented on PR #7169: URL: https://github.com/apache/hbase/pull/7169#issuecomment-3125181168
> Does this cause any real problems? As in the below catch block, we mentioned that it is also OK to not delete the committed HStoreFile as it will not cause any data corruption problem.

If the file is not deleted in time, data corruption will eventually occur in the deletion scenario, because the deleted data reappears after the region is reopened. A specific example:

1. Assume that file A contains rowkey:a, value:1 and file B contains rowkey:a, value:2. At this point the effective data is value:2.
2. The merge of files A and B fails. Files A/B still exist, and the generated file C contains rowkey:a, value:2. The storefiles list is still A/B, while the actual data files on disk are A/B/C. The effective data is still value:2 (note that file C has not been added to the storefiles list).
3. File D is written with a delete marker: rowkey:a, type=DeleteFamily.
4. Files A/B/D are merged and file E is generated. The storefiles list is updated to E (files A/B/D are removed) and the data for rowkey:a should now be cleared, but the actual data files on disk are C/E.
5. Much later, the region is reopened and file C is loaded. The data for rowkey:a appears again, and the value is incorrectly restored to 2.
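The five steps above can be replayed with a toy model. This is only an illustrative sketch, not HBase code: `Cell`, `read`, `compact`, and `simulate` are hypothetical names, sequence numbers stand in for timestamps, and it assumes that the merge in step 4 drops the delete marker together with the cells it masks (as a major compaction would) and that reopening a region loads every file left in the store directory.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    row: str
    seq: int                      # larger means newer (stand-in for a timestamp)
    value: Optional[str] = None   # None models a DeleteFamily-style marker


def read(files, row):
    """Visible value of `row` across `files`; None if absent or deleted."""
    cells = [c for f in files for c in f if c.row == row]
    if not cells:
        return None
    # The newest cell wins; a delete marker hides everything older.
    return max(cells, key=lambda c: c.seq).value


def compact(files):
    """Merge files, keeping the newest live cell per row.

    Assumption: delete markers and the cells they mask are dropped.
    """
    out = []
    for row in sorted({c.row for f in files for c in f}):
        newest = max((c for f in files for c in f if c.row == row),
                     key=lambda c: c.seq)
        if newest.value is not None:
            out.append(newest)
    return out


def simulate():
    seen = {}
    # Step 1: A and B are on disk and in the storefiles list.
    disk = {"A": [Cell("a", 1, "1")], "B": [Cell("a", 2, "2")]}
    storefiles = ["A", "B"]
    seen[1] = read([disk[n] for n in storefiles], "a")

    # Step 2: the merge fails after C is written; C stays on disk, unlisted.
    disk["C"] = compact([disk[n] for n in storefiles])
    seen[2] = read([disk[n] for n in storefiles], "a")

    # Step 3: D carries the delete marker for row "a".
    disk["D"] = [Cell("a", 3, None)]
    storefiles.append("D")
    seen[3] = read([disk[n] for n in storefiles], "a")

    # Step 4: A/B/D are merged into E and removed; C is still on disk.
    disk["E"] = compact([disk[n] for n in storefiles])
    for name in storefiles:
        del disk[name]
    storefiles = ["E"]
    seen[4] = read([disk[n] for n in storefiles], "a")

    # Step 5: reopening the region loads every file in the directory.
    storefiles = sorted(disk)
    seen[5] = read([disk[n] for n in storefiles], "a")
    return seen, storefiles
```

In this model `simulate()` reports the row as deleted after step 4 and as value 2 again after step 5, because the leftover file C is picked up alongside E on reopen.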