junegunn commented on PR #8001:
URL: https://github.com/apache/hbase/pull/8001#issuecomment-4179453905

   I updated the patch to compare the qualifiers of consecutive delete markers, so the counter only increments when successive markers target the same column. With this change, N no longer needs to be large to avoid the worst-case regression.
   
   N=3 works correctly with this approach:
   
   - Same-column accumulation (the real problem): seeks after 3 consecutive markers, so the optimization kicks in quickly.
   - Different-column DCs (the false-positive case): the counter resets whenever the qualifier changes, so every marker is skipped as before. No overhead.
   - One delete per row (the common case): the counter never reaches 3. Zero overhead.
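   To make the three cases concrete, here is a minimal Java sketch of a same-column counter. It is not the code in the patch: `SameColumnDeleteCounter`, `onDeleteMarker`, `reset`, `countSeeks` and the `Action` enum are hypothetical names. It assumes the counter is reset at every row boundary and models a seek as jumping past the remaining markers for the same column in that row.

   ```java
   import java.nio.charset.StandardCharsets;
   import java.util.Arrays;

   /**
    * Hypothetical sketch (not the patch): counts consecutive delete markers that
    * target the same qualifier and requests a seek once the run reaches N.
    */
   public class SameColumnDeleteCounter {

     /** Stand-in for the scanner's match codes. */
     public enum Action { SKIP, SEEK_NEXT_COL }

     private final int threshold;   // N
     private byte[] lastQualifier;  // qualifier of the previous delete marker
     private int count;             // length of the current same-qualifier run

     public SameColumnDeleteCounter(int threshold) {
       this.threshold = threshold;
     }

     /** Assumed to be called whenever the scan moves to a new row. */
     public void reset() {
       lastQualifier = null;
       count = 0;
     }

     /** Decides what to do with the next delete marker in the current row. */
     public Action onDeleteMarker(byte[] qualifier) {
       if (lastQualifier != null && Arrays.equals(lastQualifier, qualifier)) {
         count++;                   // same column as the previous marker
       } else {
         lastQualifier = qualifier; // qualifier changed: start a new run
         count = 1;
       }
       if (count >= threshold) {
         count = 0;                 // the seek leaves this column behind
         return Action.SEEK_NEXT_COL;
       }
       return Action.SKIP;
     }

     /** Counts seeks over a workload given as rows of marker qualifiers. */
     static int countSeeks(int n, String[][] rows) {
       SameColumnDeleteCounter counter = new SameColumnDeleteCounter(n);
       int seeks = 0;
       for (String[] row : rows) {
         counter.reset();           // new row: forget the previous run
         for (int i = 0; i < row.length; i++) {
           byte[] q = row[i].getBytes(StandardCharsets.UTF_8);
           if (counter.onDeleteMarker(q) == Action.SEEK_NEXT_COL) {
             seeks++;
             // Model the seek: jump past the remaining markers for this column.
             while (i + 1 < row.length && row[i + 1].equals(row[i])) {
               i++;
             }
           }
         }
       }
       return seeks;
     }

     public static void main(String[] args) {
       String[] sameColumn = new String[10];
       Arrays.fill(sameColumn, "q1");
       // Same-column accumulation: one seek after the third marker.
       System.out.println("same-column: " + countSeeks(3, new String[][] {sameColumn}));
       // Different-column DCs: the counter resets on every qualifier change.
       System.out.println("different-column: "
           + countSeeks(3, new String[][] {{"q1", "q2", "q3", "q4"}}));
       // One delete per row: the counter is reset per row and never reaches 3.
       System.out.println("one-per-row: "
           + countSeeks(3, new String[][] {{"q1"}, {"q1"}, {"q1"}, {"q1"}}));
     }
   }
   ```

   With N=3 this reports 1, 0 and 0 seeks for the three workloads. In the same-column case the last seven markers are never visited.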
   
   Even with the qualifier comparison, false positives remain: exactly N consecutive redundant DCs for the same qualifier still trigger a seek, and that seek lands on the very next marker.
   
   > DC(q1) -skip-> DC(q1) -skip-> DC(q1) -seek-> DC(q2) -skip-> DC(q2) -skip-> DC(q2) -seek-> DC(q3)
   
   This should be rare in practice. If this overhead is a concern, increasing N is the only remedy.
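   The trace above can be replayed with a small Java sketch. The class and method names (`DeleteMarkerTrace`, `trace`) are hypothetical and not part of the patch. The sketch assumes a single row of redundant delete markers identified only by qualifier, and models a seek as jumping past any further markers for the same column. A seek that skips nothing is labeled `seek(wasted)`.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   /**
    * Hypothetical sketch (not the patch): replays the skip/seek decisions for one
    * row of redundant delete markers, each identified only by its qualifier.
    */
   public class DeleteMarkerTrace {

     /** Returns one action per marker the scan actually visits. */
     static List<String> trace(List<String> qualifiers, int n) {
       List<String> actions = new ArrayList<>();
       String last = null;
       int count = 0;
       for (int i = 0; i < qualifiers.size(); i++) {
         String q = qualifiers.get(i);
         count = q.equals(last) ? count + 1 : 1;  // reset on qualifier change
         last = q;
         if (count < n) {
           actions.add("skip");
           continue;
         }
         count = 0;
         // Model the seek: jump past any further markers for the same column.
         int skipped = 0;
         while (i + 1 < qualifiers.size() && qualifiers.get(i + 1).equals(q)) {
           i++;
           skipped++;
         }
         // A seek that skips nothing is the false positive: a plain skip
         // would have reached the same next marker.
         actions.add(skipped == 0 ? "seek(wasted)" : "seek");
       }
       return actions;
     }

     public static void main(String[] args) {
       List<String> markers = List.of("q1", "q1", "q1", "q2", "q2", "q2", "q3");
       System.out.println("N=3: " + trace(markers, 3));
       System.out.println("N=4: " + trace(markers, 4));
     }
   }
   ```

   With N=3 this prints `[skip, skip, seek(wasted), skip, skip, seek(wasted), skip]`, matching the diagram. With N=4 the same sequence is all skips. In this model, raising N does not eliminate the false positive. It only shifts it to runs of exactly N markers, and it also delays the seek for genuine same-column accumulation.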
   
   Here are the results.
   
   - The regression with non-redundant DeleteFamily markers is fixed.
       - <img width="2304" height="1920" alt="image" src="https://github.com/user-attachments/assets/1b7d193d-87c7-4611-b170-fe4e35edebf9" />
   - No overhead in the worst case.
       - <img width="2304" height="1920" alt="image" src="https://github.com/user-attachments/assets/d640a82a-b7c4-464a-93c5-e3700b9a7ec2" />
   - The best-case benefit still holds.
       - <img width="2304" height="1920" alt="image" src="https://github.com/user-attachments/assets/ee606aee-29fb-4b72-bdbb-80d6d6c3e638" />
       - <img width="2304" height="1920" alt="image" src="https://github.com/user-attachments/assets/577815a8-cfc0-49a6-abf3-bbd7436aa0bb" />
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.