[
https://issues.apache.org/jira/browse/HBASE-29502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18017158#comment-18017158
]
Hudson commented on HBASE-29502:
--------------------------------
Results for branch branch-2
[build #1316 on
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1316/]:
(/) *{color:green}+1 overall{color}*
----
details (if available):
(/) {color:green}+1 general checks{color}
-- For more information [see general
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1316/General_20Nightly_20Build_20Report/]
(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2)
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1316/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]
(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3)
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1316/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1316/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 jdk17 hadoop3 checks{color}
-- For more information [see jdk17
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1316/JDK17_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 jdk17 hadoop 3.3.5 backward compatibility checks{color}
-- For more information [see jdk17
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1316/JDK17_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 jdk17 hadoop 3.3.6 backward compatibility checks{color}
-- For more information [see jdk17
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1316/JDK17_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 jdk17 hadoop 3.4.0 backward compatibility checks{color}
-- For more information [see jdk17
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1316/JDK17_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 source release artifact{color}
-- See build output for details.
(/) {color:green}+1 client integration test for HBase 2 {color}
(/) {color:green}+1 client integration test for 3.3.5 {color}
(/) {color:green}+1 client integration test for 3.3.6 {color}
(/) {color:green}+1 client integration test for 3.4.0 {color}
(/) {color:green}+1 client integration test for 3.4.1 {color}
> RegionReplicaReplicationEndpoint fails to forward mutations when meta cache
> does not contain secondary replica locations
> ------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-29502
> URL: https://issues.apache.org/jira/browse/HBASE-29502
> Project: HBase
> Issue Type: Bug
> Components: read replicas
> Affects Versions: 2.7.0, 2.6.3
> Reporter: Charles Connell
> Assignee: Charles Connell
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.7.0, 2.6.4
>
>
> (this only affects 2.x versions)
> When region replicas are enabled in "asynchronous WAL replication" mode, each
> RegionServer uses a {{RegionReplicaReplicationEndpoint}} object to tail its
> own WAL. Each mutation in its WAL may be related to a region which has its
> primary replica on this RegionServer, and has one or more secondary replicas
> on other servers. So, for each mutation in the WAL,
> {{RegionReplicaReplicationEndpoint}} decides whether any other servers are
> hosting replicas of the relevant region, and if so, sends an RPC to those
> servers containing the mutations they should apply to their memstores.
> When region replicas are enabled, a {{RegionReplicaReplicationEndpoint}}
> instance is created, with its own {{ConnectionImplementation}} and therefore
> its own {{MetaCache}}. This {{RegionReplicaReplicationEndpoint}} immediately
> starts attempting to send mutations to secondary replica regions, even though
> they will not be open for a few more seconds or minutes. During this window,
> the {{MetaCache}} gets populated with entries that say that most regions are
> hosted on only one server. These cached lookups remain in use indefinitely,
> even though they are incorrect for most of their lifetime. Without knowing
> where the secondary replica regions are hosted, or whether they exist at all,
> the {{RegionReplicaReplicationEndpoint}} cannot forward mutations to them. As
> a result, the secondary replica regions' memstores do not receive updates, so
> their data is even more stale than it should be. Users reading from those
> replicas would get unnecessarily stale results.
> {{RegionReplicaReplicationEndpoint}} actually contains cache-busting logic
> seemingly designed to fix this exact problem:
> {code:java}
> // Replicas can take a while to come online. The cache may have only the primary. If we
> // keep going to the cache, we will not learn of the replicas and their locations after
> // they come online.
> if (useCache && locations.size() == 1 && TableName.isMetaTableName(tableName)) {
>   if (tableDescriptors.get(tableName).getRegionReplication() > 1) {
>     // Make an obnoxious log here. See how bad this issue is. Add a timer if happening
>     // too much.
>     LOG.info("Skipping location cache; only one location found for {}", tableName);
>     useCache = false;
>     continue;
>   }
> }
> {code}
> However, because of the {{TableName.isMetaTableName(tableName)}} clause, the
> cache-busting only takes effect if the mutation being forwarded belongs to
> the meta table. I don't know why that restriction would make sense.
> In this ticket I plan to just remove the "is meta table" clause to fix this
> bug.
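
The effect of the "is meta table" clause described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the HBase implementation: {{ReplicaLookupSketch}}, its {{meta}} and {{cache}} maps, and the {{locate}} and {{locationsAfterReplicasOpen}} methods are all hypothetical names invented for this example, and the real endpoint resolves locations through its connection rather than through plain maps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the lookup loop. None of these types are HBase classes. */
final class ReplicaLookupSketch {
  // Hypothetical stand-in for hbase:meta: the authoritative server list per region.
  final Map<String, List<String>> meta = new HashMap<>();
  // Hypothetical stand-in for the endpoint's private MetaCache.
  final Map<String, List<String>> cache = new HashMap<>();

  /** Returns the locations the endpoint would forward mutations to. */
  List<String> locate(String region, int regionReplication,
      boolean metaOnlyClause, boolean isMetaTable) {
    boolean useCache = true;
    while (true) {
      List<String> locations;
      if (useCache && cache.containsKey(region)) {
        locations = cache.get(region);
      } else {
        // Cache miss or cache bypass: read the authoritative map and refresh the cache.
        locations = new ArrayList<>(meta.get(region));
        cache.put(region, locations);
      }
      // With the clause present, cache-busting is only allowed for the meta table.
      boolean tableGate = !metaOnlyClause || isMetaTable;
      if (useCache && locations.size() == 1 && tableGate && regionReplication > 1) {
        useCache = false; // retry once, bypassing the cache
        continue;
      }
      return locations;
    }
  }

  /** Simulates an endpoint that starts before the secondary replicas open. */
  static int locationsAfterReplicasOpen(boolean metaOnlyClause) {
    ReplicaLookupSketch sketch = new ReplicaLookupSketch();
    // Only the primary is open when the endpoint performs its first lookup.
    sketch.meta.put("region-a", Arrays.asList("rs1"));
    sketch.locate("region-a", 3, metaOnlyClause, false);
    // The secondaries open later; the cache still holds the primary-only entry.
    sketch.meta.put("region-a", Arrays.asList("rs1", "rs2", "rs3"));
    return sketch.locate("region-a", 3, metaOnlyClause, false).size();
  }

  public static void main(String[] args) {
    System.out.println("with clause:    " + locationsAfterReplicasOpen(true));
    System.out.println("without clause: " + locationsAfterReplicasOpen(false));
  }
}
```

In this model, a user-table lookup made with the clause in place keeps returning the single cached primary location, while the same lookup without the clause bypasses the cache once and picks up all three replica locations.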
--
This message was sent by Atlassian Jira
(v8.20.10#820010)