[ https://issues.apache.org/jira/browse/HBASE-28905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17887679#comment-17887679 ]
Hudson commented on HBASE-28905:
--------------------------------

Results for branch master
[build #1179 on builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/1179/]: (/) *{color:green}+1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/1179/General_20Nightly_20Build_20Report/]

(/) {color:green}+1 jdk17 hadoop3 checks{color}
-- For more information [see jdk17 report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/1179/JDK17_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}

> Skip excessive evaluations of LINK_NAME_PATTERN and REF_NAME_PATTERN regular
> expressions
> ----------------------------------------------------------------------------------------
>
>                 Key: HBASE-28905
>                 URL: https://issues.apache.org/jira/browse/HBASE-28905
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.6.0, 3.0.0-beta-1, 2.7.0
>            Reporter: Charles Connell
>            Assignee: Charles Connell
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: cpu_time_flamegraph_2.6.0.html,
> cpu_time_flamegraph_with_optimization.html,
> performance_test_query_latency_2.6.0.png,
> performance_test_query_latency_with_optimization.png
>
>
> To test if a file is a link file, HBase checks if its file name matches the
> regex
> {code:java}
> ^(?:((?:[_\p{Digit}\p{IsAlphabetic}]+))(?:\=))?((?:[_\p{Digit}\p{IsAlphabetic}][-_.\p{Digit}\p{IsAlphabetic}]*))=((?:[a-f0-9]+))-([0-9a-f]+(?:(?:_SeqId_[0-9]+_)|(?:_del))?)$
> {code}
> To test if an HFile has a "reference name," HBase checks if its file name
> matches the regex
> {code:java}
> ^([0-9a-f]+(?:(?:_SeqId_[0-9]+_)|(?:_del))?|^(?:((?:[_\p{Digit}\p{IsAlphabetic}]+))(?:\=))?((?:[_\p{Digit}\p{IsAlphabetic}][-_.\p{Digit}\p{IsAlphabetic}]*))=((?:[a-f0-9]+))-([0-9a-f]+(?:(?:_SeqId_[0-9]+_)|(?:_del))?)$)\.(.+)$
> {code}
> Matching against these big regexes is computationally expensive. HBASE-27474
> introduced (in 2.6.0) [code in a hot
> path|https://github.com/apache/hbase/blob/1602c531b245b4d455b48161757cde2ec3d1930b/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderImpl.java#L1716]
> in {{HFileReaderImpl}} that checks whether an HFile is a link or reference
> file while deciding whether to cache blocks from that file. In flamegraphs
> taken at my company during performance tests, these regex evaluations took
> 2-3% of the CPU time on a busy RegionServer.
> The hot-path invocation of the regexes was later removed by HBASE-28596 in
> branch-2 and later, but not in branch-2.6, so only the 2.6.x series suffers
> the performance regression. Nonetheless, all invocations of these regexes are
> still unnecessarily expensive and can be fast-failed easily.
> The link name pattern contains a literal "=", so any string that does not
> contain a "=" can be assumed to not match the regex. The reference name
> pattern contains a literal ".", so any string that does not contain a "." can
> be assumed to not match the regex. This optimization is mostly helpful in
> 2.6.x, but is valid in all branches.
> In performance tests, this optimization removed the regex evaluations from my
> flamegraphs entirely and reduced query latency by 15%. Some charts are
> attached.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
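The fast-fail idea in the issue description can be illustrated with a minimal Java sketch. This is not the actual HBase patch: the class name `FastFailPatterns` and the methods `isLinkName` and `isReferenceName` are hypothetical, and the two patterns are copied from the issue text rather than from the HBase source. Each check first does a cheap `indexOf` scan for the literal character the pattern requires, and only runs the regex when that character is present.

```java
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of HBase.
final class FastFailPatterns {
    // Link name regex as quoted in the issue description.
    static final Pattern LINK_NAME_PATTERN = Pattern.compile(
        "^(?:((?:[_\\p{Digit}\\p{IsAlphabetic}]+))(?:\\=))?"
        + "((?:[_\\p{Digit}\\p{IsAlphabetic}][-_.\\p{Digit}\\p{IsAlphabetic}]*))"
        + "=((?:[a-f0-9]+))-([0-9a-f]+(?:(?:_SeqId_[0-9]+_)|(?:_del))?)$");

    // Reference name regex as quoted in the issue description.
    static final Pattern REF_NAME_PATTERN = Pattern.compile(
        "^([0-9a-f]+(?:(?:_SeqId_[0-9]+_)|(?:_del))?|"
        + "^(?:((?:[_\\p{Digit}\\p{IsAlphabetic}]+))(?:\\=))?"
        + "((?:[_\\p{Digit}\\p{IsAlphabetic}][-_.\\p{Digit}\\p{IsAlphabetic}]*))"
        + "=((?:[a-f0-9]+))-([0-9a-f]+(?:(?:_SeqId_[0-9]+_)|(?:_del))?)$)\\.(.+)$");

    // Every link name contains a literal '=', so a name without one cannot
    // match and the regex evaluation is skipped.
    static boolean isLinkName(String name) {
        return name.indexOf('=') >= 0 && LINK_NAME_PATTERN.matcher(name).matches();
    }

    // Every reference name contains a literal '.', so a name without one
    // cannot match and the regex evaluation is skipped.
    static boolean isReferenceName(String name) {
        return name.indexOf('.') >= 0 && REF_NAME_PATTERN.matcher(name).matches();
    }

    public static void main(String[] args) {
        // Made-up file names, used only to exercise the checks.
        System.out.println(isLinkName("mytable=abc123-def456"));  // true
        System.out.println(isLinkName("abc123"));                 // false (no '=', regex skipped)
        System.out.println(isLinkName("a=b"));                    // false (regex runs and rejects)
        System.out.println(isReferenceName("abc123.def456"));     // true
        System.out.println(isReferenceName("abc123"));            // false (no '.', regex skipped)
    }
}
```

The pre-check is only a necessary condition: a name that contains the character still goes through the full regex, so the result is the same as without the pre-check, and only the cost of rejecting plain HFile names changes.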