[ 
https://issues.apache.org/jira/browse/HBASE-28905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17888129#comment-17888129
 ] 

Hudson commented on HBASE-28905:
--------------------------------

Results for branch branch-2.6
        [build #218 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.6/218/]:
 (/) *{color:green}+1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.6/218/General_20Nightly_20Build_20Report/]


(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.6/218/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.6/218/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.6/218/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk17 hadoop3 checks{color}
-- For more information [see jdk17 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.6/218/JDK17_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Skip excessive evaluations of LINK_NAME_PATTERN and REF_NAME_PATTERN regular 
> expressions
> ----------------------------------------------------------------------------------------
>
>                 Key: HBASE-28905
>                 URL: https://issues.apache.org/jira/browse/HBASE-28905
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.6.0, 3.0.0-beta-1, 2.7.0
>            Reporter: Charles Connell
>            Assignee: Charles Connell
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.0.0, 2.7.0, 2.6.1
>
>         Attachments: cpu_time_flamegraph_2.6.0.html, 
> cpu_time_flamegraph_with_optimization.html, 
> performance_test_query_latency_2.6.0.png, 
> performance_test_query_latency_with_optimization.png
>
>
> To test if a file is a link file, HBase checks if its file name matches the 
> regex
> {code:java}
> ^(?:((?:[_\p{Digit}\p{IsAlphabetic}]+))(?:\=))?((?:[_\p{Digit}\p{IsAlphabetic}][-_.\p{Digit}\p{IsAlphabetic}]*))=((?:[a-f0-9]+))-([0-9a-f]+(?:(?:_SeqId_[0-9]+_)|(?:_del))?)$
> {code}
> To test if an HFile has a "reference name," HBase checks if its file name 
> matches the regex
> {code:java}
> ^([0-9a-f]+(?:(?:_SeqId_[0-9]+_)|(?:_del))?|^(?:((?:[_\p{Digit}\p{IsAlphabetic}]+))(?:\=))?((?:[_\p{Digit}\p{IsAlphabetic}][-_.\p{Digit}\p{IsAlphabetic}]*))=((?:[a-f0-9]+))-([0-9a-f]+(?:(?:_SeqId_[0-9]+_)|(?:_del))?)$)\.(.+)$
> {code}
> Matching against these big regexes is computationally expensive. HBASE-27474 
> introduced (in 2.6.0) [code in a hot 
> path|https://github.com/apache/hbase/blob/1602c531b245b4d455b48161757cde2ec3d1930b/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderImpl.java#L1716]
>  in {{HFileReaderImpl}} that checks whether an HFile is a link or reference 
> file while deciding whether to cache blocks from that file. In flamegraphs 
> taken at my company during performance tests, these regex evaluations took 
> 2-3% of the CPU time on a busy RegionServer.
> Later, the hot-path invocation of the regexes was removed in HBASE-28596 in 
> branch-2 and later, but not branch-2.6, so only the 2.6.x series suffers the 
> performance regression. Nonetheless, all invocations of these regexes are 
> still unnecessarily expensive and can be fast-failed easily.
> The link name pattern contains a literal "=", so any string that does not 
> contain a "=" cannot match the regex. The reference name pattern contains a 
> literal ".", so any string that does not contain a "." cannot match the 
> regex either. This optimization is mostly helpful in 2.6.x, but is valid in 
> all branches.
> Running performance tests of this optimization removed the regex evaluations 
> from my flamegraphs entirely, and reduced query latency by 15%. Some charts 
> are attached.
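
The fast-fail idea in the description above can be sketched in Java roughly as follows. This is a minimal illustration, not the actual HBASE-28905 patch: the class name NamePatternFastFail and the methods isLinkName and isReferenceName are hypothetical, and the two patterns are copied from the regexes quoted in the description. Each method checks for the required literal character with indexOf before handing the name to the regex engine.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only (not HBase source code).
final class NamePatternFastFail {

    // Link-name regex, copied from the issue description.
    static final String LINK_NAME_REGEX =
        "^(?:((?:[_\\p{Digit}\\p{IsAlphabetic}]+))(?:\\=))?"
        + "((?:[_\\p{Digit}\\p{IsAlphabetic}][-_.\\p{Digit}\\p{IsAlphabetic}]*))"
        + "=((?:[a-f0-9]+))-([0-9a-f]+(?:(?:_SeqId_[0-9]+_)|(?:_del))?)$";

    static final Pattern LINK_NAME_PATTERN = Pattern.compile(LINK_NAME_REGEX);

    // Reference-name regex from the issue description; it embeds the link regex.
    static final Pattern REF_NAME_PATTERN = Pattern.compile(
        "^([0-9a-f]+(?:(?:_SeqId_[0-9]+_)|(?:_del))?|" + LINK_NAME_REGEX + ")\\.(.+)$");

    private NamePatternFastFail() {
    }

    // The link pattern requires a literal '=', so a name without one cannot match.
    static boolean isLinkName(String name) {
        if (name.indexOf('=') < 0) {
            return false; // fast fail: skip the regex entirely
        }
        return LINK_NAME_PATTERN.matcher(name).matches();
    }

    // The reference pattern requires a literal '.', so a name without one cannot match.
    static boolean isReferenceName(String name) {
        if (name.indexOf('.') < 0) {
            return false; // fast fail: skip the regex entirely
        }
        return REF_NAME_PATTERN.matcher(name).matches();
    }
}
```

For an ordinary HFile name made only of hex digits, both methods return after a single indexOf scan and never invoke the regex engine. Names that do contain the literal character are still decided by the full regex, so the result is the same as matching against the pattern alone.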



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
