jinhyukify commented on PR #8134:
URL: https://github.com/apache/hbase/pull/8134#issuecomment-4370716821
@junegunn Thank you for your feedback.
### Pluggable `RowKeyProgress` feels like overkill
Thank you for raising this! I agree the abstraction comes with a real
maintenance cost, and a string-config switch is appealing.
The reason I'd like to keep the pluggable interface is that we have a
concrete in-house use case where the row-key salt follows a `[a-z][0-9]{N}`
pattern. Neither uniform nor hex reports a sensible curve for that schema (the
byte-level interpretation under-reports through most of the alphabet, and the
hex interpretation collapses every key past `f` to zero). With the pluggable
interface, we can just ship a small custom implementation. Otherwise, users
with similar non-uniform (but not quite hex) schemas would be stuck needing a
follow-up patch.
### Naming could align with `RegionSplitter`
Great point. I hadn't connected it to RegionSplitter clearly enough. Renamed:
- `ByteBasedRowKeyProgress` → `UniformRowKeyProgress`
- `HexPrefixRowKeyProgress` → `HexStringRowKeyProgress`
### Generalize `HexStringRowKeyProgress` to handle any-length hex keys
(HexString?)
Thank you, this was the right call. Removed the
`hbase.mapreduce.rowkey.progress.hex.prefix.length` config. the class now
auto-derives the prefix length from the start/stop rows (longest common prefix
+ a small resolution margin, capped to fit double precision). Non-hex trailing
bytes contribute zero:
- startrow `a1` → `a10000`
- rowkey `bea\x20\x20\x21\x22` → `bea0000`
- stoprow `cf123` → `cf1230`
No prefix length to tune, and works for both pure-hex and hex-prefixed
strings.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]