[ https://issues.apache.org/jira/browse/SOLR-14940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17220704#comment-17220704 ]
Anver Sotnikov commented on SOLR-14940:
---------------------------------------

Mike, you were right. We instrumented Solr with extra logging on registerCloseHook and shutdown in ReplicationHandler and confirmed that the leak was due to a flaky ZK connection. We bumped the timeouts (SOLR-10471) and fine-tuned GC as well. Replication now goes into recovery far less often than before.

Stack trace from registerCloseHook:
{code}
at org.apache.solr.handler.ReplicationHandler.registerCloseHook(ReplicationHandler.java:1397)
java.lang.RuntimeException: ReplicationHandler.registerCloseHooks
	at org.apache.solr.handler.ReplicationHandler.registerCloseHook(ReplicationHandler.java:1397) ~[?:?]
	at org.apache.solr.handler.ReplicationHandler.inform(ReplicationHandler.java:1239) ~[?:?]
	at org.apache.solr.cloud.ReplicateFromLeader.startReplication(ReplicateFromLeader.java:109) ~[?:?]
	at org.apache.solr.cloud.ZkController.startReplicationFromLeader(ZkController.java:1327) ~[?:?]
	at org.apache.solr.cloud.RecoveryStrategy.doSyncOrReplicateRecovery(RecoveryStrategy.java:713) ~[?:?]
	at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:334) ~[?:?]
	at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:317) ~[?:?]
	at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180) ~[metrics-core-4.1.5.jar:4.1.5]
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
	at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:212) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
	at java.lang.Thread.run(Unknown Source)
{code}

> ReplicationHandler memory leak through SolrCore.closeHooks
> ----------------------------------------------------------
>
>                 Key: SOLR-14940
>                 URL: https://issues.apache.org/jira/browse/SOLR-14940
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: replication (java)
>         Environment: Solr Cloud Cluster on v.8.6.2 configured as 3 TLOG nodes
> with 2 cores in each JVM.
>            Reporter: Anver Sotnikov
>            Priority: Major
>         Attachments: Actual references to hooks that in turn hold references
> to ReplicationHandlers.png, Memory Analyzer SolrCore.closeHooks .png
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> We are experiencing a memory leak in a Solr Cloud cluster configured as 3 TLOG
> nodes.
> The leader does not seem to be affected, while the followers are.
>
> Looking at a memory dump, we noticed that SolrCore holds many references to
> ReplicationHandler instances through the anonymous inner classes stored in
> SolrCore.closeHooks.
> ReplicationHandler registers these hooks as anonymous inner classes in
> SolrCore.closeHooks through ReplicationHandler.inform() ->
> ReplicationHandler.registerCloseHook().
>
> Whenever ZkController.stopReplicationFromLeader is called, it shuts down the
> ReplicationHandler (ReplicationHandler.shutdown()), BUT the reference to that
> ReplicationHandler stays in SolrCore.closeHooks. Once replication is
> started again on the same SolrCore, a new ReplicationHandler is created and
> registered in closeHooks.
>
> It looks like there are a few scenarios in which replication is stopped and
> restarted on the same core, and in our TLOG setup this happens quite often.
>
> Potential solutions:
> # Allow unregistering SolrCore.closeHooks so it can be used from
> ReplicationHandler.shutdown
> # Hack but easier - break the link between the ReplicationHandler close hooks
> and the full ReplicationHandler object, so the ReplicationHandler can be GCed
> even when its hooks are still registered in SolrCore.closeHooks
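
To make the leak pattern and the first proposed fix concrete, here is a minimal sketch in plain Java. It is not Solr code: Core, Handler, removeCloseHook and CloseHookLeakDemo are hypothetical stand-ins for SolrCore, ReplicationHandler and an assumed unregister method, and the model only tracks how many hooks stay registered after repeated stop/start cycles on one core.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for SolrCore: it only models the closeHooks list.
class Core {
    interface CloseHook { void preClose(); }

    private final List<CloseHook> closeHooks = new ArrayList<>();

    void addCloseHook(CloseHook hook) { closeHooks.add(hook); }

    // Assumed API for option 1: let callers unregister a hook again.
    boolean removeCloseHook(CloseHook hook) { return closeHooks.remove(hook); }

    int hookCount() { return closeHooks.size(); }
}

// Hypothetical stand-in for ReplicationHandler.
class Handler {
    private final Core core;
    private Core.CloseHook hook;

    Handler(Core core) { this.core = core; }

    void inform() {
        // The anonymous inner class calls an instance method, so it captures
        // Handler.this. The core's list therefore keeps this handler reachable
        // for as long as the hook stays registered.
        hook = new Core.CloseHook() {
            @Override public void preClose() { shutdown(false); }
        };
        core.addCloseHook(hook);
    }

    void shutdown(boolean unregister) {
        if (unregister && hook != null) {
            core.removeCloseHook(hook); // drops the core -> hook -> handler chain
            hook = null;
        }
    }
}

class CloseHookLeakDemo {
    // Simulates `restarts` stop/start replication cycles on one core and
    // returns how many hooks the core still holds afterwards.
    static int registeredHooks(int restarts, boolean unregisterOnShutdown) {
        Core core = new Core();
        Handler current = null;
        for (int i = 0; i < restarts; i++) {
            if (current != null) {
                current.shutdown(unregisterOnShutdown);
            }
            current = new Handler(core);
            current.inform();
        }
        return core.hookCount();
    }

    public static void main(String[] args) {
        System.out.println("without unregistering: " + registeredHooks(5, false));
        System.out.println("with unregistering:    " + registeredHooks(5, true));
    }
}
```

In this model the hook count grows by one per restart unless shutdown removes the hook, which mirrors what the memory dump showed. The second option (breaking the link) would instead keep the hook registered but have it avoid a strong reference to the handler, for example by using a static nested class that holds only what it needs.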