[ 
https://issues.apache.org/jira/browse/HBASE-30115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JinHyuk Kim updated HBASE-30115:
--------------------------------
    Description: 
h1. Background

Currently, {{TableRecordReaderImpl.getProgress()}} always returns {*}0{*}, 
providing no progress feedback to the MapReduce framework. This makes it 
impossible for users to monitor scan progress during long-running jobs.

!mapreduce-progress-0.png|width=1095,height=236!

 
h1. Suggestion

This patch estimates progress by converting row keys to numeric values and 
computing the fraction of the key space covered so far: {{(current - start) / 
(stop - start)}}.
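
A minimal sketch of that fraction for raw-byte keys, not the patch's actual code. The class and method names ({{UniformProgressSketch}}, {{toNumber}}, {{progress}}) and the choice to zero-pad keys to a common width are assumptions made for illustration.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; names are not taken from the patch.
final class UniformProgressSketch {

  /** Convenience helper for building ASCII row keys in examples. */
  static byte[] bytes(String s) {
    return s.getBytes(StandardCharsets.US_ASCII);
  }

  /** Reads the key as an unsigned big-endian integer, right-padded with 0x00 to {@code width} bytes. */
  static BigInteger toNumber(byte[] key, int width) {
    byte[] padded = Arrays.copyOf(key, width); // pads with zeros when the key is shorter
    return new BigInteger(1, padded);          // signum 1: treat the bytes as non-negative
  }

  /** Computes (current - start) / (stop - start), clamped to [0, 1]. */
  static float progress(byte[] start, byte[] stop, byte[] current) {
    int width = Math.max(start.length, Math.max(stop.length, current.length));
    BigInteger lo = toNumber(start, width);
    BigInteger hi = toNumber(stop, width);
    BigInteger cur = toNumber(current, width);
    if (hi.compareTo(lo) <= 0) {
      return 0f; // empty or inverted range: nothing meaningful to report
    }
    double fraction = cur.subtract(lo).doubleValue() / hi.subtract(lo).doubleValue();
    return (float) Math.max(0.0, Math.min(1.0, fraction));
  }
}
```

With start {{09}} and stop {{a1}}, this sketch yields about 0.10 for key {{50}} and about 0.18 for key {{90}}, matching the raw-byte column of the comparison table further down.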

Since the {{TableInputFormat}} splitter sets start/stop row keys from region 
boundaries, they are only empty for the table's very first region (empty start) 
or last region (empty stop). In those cases, we *probe* the table with a 
forward or reverse scan (limit 1) to discover the actual boundary row key.
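
The boundary resolution can be sketched as below. This is an illustration under stated assumptions: a sorted in-memory set stands in for the table, and {{first()}} / {{last()}} stand in for the forward and reverse limit-1 scans. The names {{BoundaryProbeSketch}}, {{resolveStart}} and {{resolveStop}} are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.NavigableSet;

// Hypothetical illustration only; a NavigableSet stands in for the HBase table.
final class BoundaryProbeSketch {

  /** Unsigned lexicographic byte order, the order in which HBase sorts row keys. */
  static final Comparator<byte[]> ROW_ORDER = Arrays::compareUnsigned;

  /** Convenience helper for building ASCII row keys in examples. */
  static byte[] bytes(String s) {
    return s.getBytes(StandardCharsets.US_ASCII);
  }

  /** An empty start row means "first region": use the first existing row instead. */
  static byte[] resolveStart(byte[] startRow, NavigableSet<byte[]> rows) {
    if (startRow.length > 0 || rows.isEmpty()) {
      return startRow; // already concrete, or nothing to probe
    }
    return rows.first(); // stand-in for a forward scan with limit 1
  }

  /** An empty stop row means "last region": use the last existing row instead. */
  static byte[] resolveStop(byte[] stopRow, NavigableSet<byte[]> rows) {
    if (stopRow.length > 0 || rows.isEmpty()) {
      return stopRow;
    }
    return rows.last(); // stand-in for a reverse scan with limit 1
  }
}
```

Non-empty boundaries pass through unchanged, so the probe only costs anything for the first and last splits of the table.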

The implementation is pluggable via {{hbase.mapreduce.rowkey.progress.class}} 
configuration:
* {{UniformRowKeyProgress}} (default): treats row keys as raw bytes uniformly 
distributed across the byte range. Works well for most key designs.
* {{HexStringRowKeyProgress}}: interprets leading bytes as hex characters 
({{[0-9a-f]}}). Gives accurate linear progress for tables using hex-encoded 
hash prefixes (e.g. MD5). The raw byte approach is inaccurate for hex keys 
because there are large byte gaps between {{'9'→'a'}} (0x39→0x61) and between 
{{"0f"→"10"}} (0x3066→0x3130) that don't correspond to actual key distance. The 
hex prefix length is auto-derived from the start/stop rows (longest common 
prefix plus a small resolution margin, capped to fit {{double}} precision); 
non-hex trailing bytes contribute zero, so suffixes do not affect progress.
* Users can implement the {{RowKeyProgress}} interface for custom key encoding 
strategies (e.g. non-uniform salts like {{[a-z][0-9]\{N\}}}).
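
A rough sketch of the hex interpretation described above, not the patch's implementation. The class name, the margin of 4 digits, and the cap of 13 hex digits (52 bits, which a {{double}} represents exactly) are assumptions. How such a class would plug into {{RowKeyProgress}} is left out because the interface's methods are not given here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; constants and names are assumptions.
final class HexProgressSketch {
  static final int MARGIN = 4;      // assumed "small resolution margin"
  static final int MAX_DIGITS = 13; // 13 hex digits = 52 bits, exact in a double

  /** Convenience helper for building ASCII row keys in examples. */
  static byte[] bytes(String s) {
    return s.getBytes(StandardCharsets.US_ASCII);
  }

  /** Returns 0..15 for [0-9a-f], or -1 for any other byte. */
  static int hexValue(byte b) {
    if (b >= '0' && b <= '9') return b - '0';
    if (b >= 'a' && b <= 'f') return b - 'a' + 10;
    return -1;
  }

  static int commonPrefixLength(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    int i = 0;
    while (i < n && a[i] == b[i]) i++;
    return i;
  }

  /** Reads {@code digits} leading hex characters; missing or non-hex trailing bytes count as 0. */
  static double toNumber(byte[] key, int digits) {
    double value = 0;
    boolean hexEnded = false;
    for (int i = 0; i < digits; i++) {
      int digit = 0;
      if (!hexEnded && i < key.length) {
        int h = hexValue(key[i]);
        if (h < 0) hexEnded = true; else digit = h;
      }
      value = value * 16 + digit;
    }
    return value;
  }

  /** Computes (current - start) / (stop - start) over hex digit values, clamped to [0, 1]. */
  static float progress(byte[] start, byte[] stop, byte[] current) {
    int digits = Math.min(MAX_DIGITS, commonPrefixLength(start, stop) + MARGIN);
    double lo = toNumber(start, digits);
    double hi = toNumber(stop, digits);
    double cur = toNumber(current, digits);
    if (hi <= lo) return 0f;
    return (float) Math.max(0.0, Math.min(1.0, (cur - lo) / (hi - lo)));
  }
}
```

With start {{09}} and stop {{a1}}, this sketch gives about 0.47 for key {{50}} and about 0.89 for key {{90}}. A key such as {{50-user123}} gives the same value as {{50}}, because everything from the first non-hex byte onward contributes zero.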

After this change, job progress is reported while the scan runs, as in the screenshot below.
 
!mapreduce-progress-after.png|width=1792,height=119!
 
h2. Why a pluggable estimator (and the hex variant) is needed

The default {{UniformRowKeyProgress}} assumes row keys span the full 0x00–0xFF 
byte range. But hex-encoded hash prefixes (MD5/SHA, the most common salting 
scheme) only use {{0-9a-f}}. The byte gap between {{'9' (0x39)}} and {{'a' 
(0x61)}} contains 39 byte values that no key ever occupies, so byte-level 
interpolation is wildly non-linear.

Concrete example: scan from {{09}} to {{a1}} (see attached graph):
||Real progress||{{UniformRowKeyProgress}}||{{HexStringRowKeyProgress}}||
|50% (key {{50}})|~10%|~47%|
|88% (key {{90}})|~18%|~89%|
|99% (key {{a0}})|~100%|~99%|

!byte-based-vs-hex.png|width=597,height=385!

{{UniformRowKeyProgress}} stays under 20% for nearly the whole job, then snaps 
to 100% the instant the scan crosses into {{a*}}. This breaks YARN progress 
bars and ETA estimation.

  was:
h1. Background

Currently, {{TableRecordReaderImpl.getProgress()}} always returns {*}0{*}, 
providing no progress feedback to the MapReduce framework. This makes it 
impossible for users to monitor scan progress during long-running jobs.

!mapreduce-progress-0.png|width=1095,height=236!

 
h1. Suggestion

This patch estimates progress by converting row keys to numeric values and 
computing the fraction of the key space covered so far: {{{}(current - start) / 
(stop - start){}}}.

Since the {{TableInputFormat}} splitter sets start/stop row keys from region 
boundaries, they are only empty for the table's very first region (empty start) 
or last region (empty stop). In those cases, we *probe* the table with a 
forward or reverse scan (limit 1) to discover the actual boundary row key.

The implementation is pluggable via {{hbase.mapreduce.rowkey.progress.class}} 
configuration:
 * {{ByteBasedRowKeyProgress}} (default) : treats row keys as raw bytes. Works 
well for most key designs.
 * {{HexPrefixRowKeyProgress}} : interprets leading bytes as hex characters 
([0-9a-f]). Gives accurate linear progress for tables using hex-encoded hash 
prefixes (e.g. MD5). The raw byte approach is inaccurate for hex keys because 
there are large byte gaps between '9'→'a' (0x39→0x61) and between "0f"→"10" 
(0x3066→0x3130) that don't correspond to actual key distance. The prefix length 
is configurable via {{hbase.mapreduce.rowkey.progress.hex.prefix.length}} 
(default 4). Bytes beyond the prefix are ignored, so non-hex suffixes do not 
affect progress.
 * Users can implement the {{RowKeyProgress}} interface for custom key encoding 
strategies.

After this change, you can monitor the progress in this way.
 
!mapreduce-progress-after.png|width=1792,height=119!
 
h2. Why a pluggable estimator (and the hex variant) is needed

The default {{ByteBasedRowKeyProgress}} assumes row keys span the full 
0x00–0xFF byte range. But hex-encoded hash prefixes (MD5/SHA, the most common 
salting scheme) only use {{{}0–9a–f{}}}. The byte gap between {{'9' (0x39)}} 
and {{'a' (0x61)}} contains 39 byte values that no key ever occupies, so 
byte-level interpolation is wildly non-linear.

Concrete example: scan from {{09}} to {{a1}} (see attached graph):
||Real progress||{{ByteBased}}||{{HexPrefix}}||
|50% (key {{{}50{}}})|~10%|~47%|
|88% (key {{{}90{}}})|~18%|~89%|
|99% (key {{{}a0{}}})|~100%|~99%|

!byte-based-vs-hex.png|width=597,height=385!

{{ByteBased}} stays under 20% for nearly the whole job, then snaps to 100% the 
instant the scan crosses into {{{}a*{}}}. This breaks YARN progress bars, ETA 
estimation.


> Introduce approximate progress estimation for TableRecordReader based on row 
> key position
> -----------------------------------------------------------------------------------------
>
>                 Key: HBASE-30115
>                 URL: https://issues.apache.org/jira/browse/HBASE-30115
>             Project: HBase
>          Issue Type: Task
>          Components: mapreduce
>            Reporter: JinHyuk Kim
>            Assignee: JinHyuk Kim
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: byte-based-vs-hex.png, mapreduce-progress-0.png, 
> mapreduce-progress-after.png
>


