Re: [PR] HBASE-30115 Introduce approximate progress estimation for TableRecordReader based on row key position [hbase]

via GitHub Mon, 04 May 2026 04:39:34 -0700


jinhyukify commented on PR #8134:
URL: https://github.com/apache/hbase/pull/8134#issuecomment-4370716821


   @junegunn Thank you for your feedback.
   
   ### Pluggable `RowKeyProgress` feels like overkill
   
   Thank you for raising this! I agree the abstraction comes with a real 
maintenance cost, and a string-config switch is appealing.
   
   The reason I'd like to keep the pluggable interface is that we have a 
concrete in-house use case where the row-key salt follows a `[a-z][0-9]{N}` 
pattern. Neither uniform nor hex reports a sensible curve for that schema (the 
byte-level interpretation under-reports through most of the alphabet, and the 
hex interpretation collapses every key past `f` to zero). With the pluggable 
interface, we can just ship a small custom implementation. Otherwise, users 
with similar non-uniform (but not quite hex) schemas would be stuck needing a 
follow-up patch.                                                           
   
   ### Naming could align with `RegionSplitter`
   Great point. I hadn't connected it to RegionSplitter clearly enough. Renamed:
   - `ByteBasedRowKeyProgress` → `UniformRowKeyProgress`
   - `HexPrefixRowKeyProgress` → `HexStringRowKeyProgress`
   
   ### Generalize `HexStringRowKeyProgress` to handle any-length hex keys 
(HexString?)
   Thank you, this was the right call. Removed the 
`hbase.mapreduce.rowkey.progress.hex.prefix.length` config. the class now 
auto-derives the prefix length from the start/stop rows (longest common prefix 
+ a small resolution margin, capped to fit double precision). Non-hex trailing 
bytes contribute zero:                                                          
                                                                                
                                                                                
                                                       
   -  startrow `a1` → `a10000`
   - rowkey `bea\x20\x20\x21\x22` → `bea0000`
   - stoprow  `cf123` → `cf1230`
   
   No prefix length to tune, and works for both pure-hex and hex-prefixed 
strings.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] HBASE-30115 Introduce approximate progress estimation for TableRecordReader based on row key position [hbase]

Reply via email to