[ https://issues.apache.org/jira/browse/HBASE-23601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Duo Zhang reopened HBASE-23601: ------------------------------- Missed on master. Reopen > OutputSink.WriterThread exception gets stuck and repeated indefinietly > ---------------------------------------------------------------------- > > Key: HBASE-23601 > URL: https://issues.apache.org/jira/browse/HBASE-23601 > Project: HBase > Issue Type: Bug > Components: read replicas > Affects Versions: 2.2.2 > Reporter: Szabolcs Bukros > Assignee: Szabolcs Bukros > Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.0 > > > When a WriterThread runs into an exception (ie: NotServingRegionException), > the exception is stored in the controller. It is never removed and can not be > overwritten either. > > {code:java} > public void run() { > try { > doRun(); > } catch (Throwable t) { > LOG.error("Exiting thread", t); > controller.writerThreadError(t); > } > }{code} > Thanks to this every time PipelineController.checkForErrors() is called the > same old exception is rethrown. > > For example in RegionReplicaReplicationEndpoint.replicate there is a while > loop that does the actual replicating. Every time it loops, it calls > checkForErrors(), catches the rethrown exception, logs it but does nothing > about it. This results in ~2GB log files in ~5min in my experience. > > My proposal would be to clean up the stored exception when it reaches > RegionReplicaReplicationEndpoint.replicate and make sure we restart the > WriterThread that died throwing it. -- This message was sent by Atlassian Jira (v8.20.10#820010)