[ 
https://issues.apache.org/jira/browse/SOLR-14940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17220704#comment-17220704
 ] 

Anver Sotnikov commented on SOLR-14940:
---------------------------------------

Mike, you were right. We instrumented Solr with extra logging on
registerCloseHook and shutdown in ReplicationHandler and confirmed that the
leak was due to a flaky ZK connection. We bumped the timeouts (SOLR-10471) and
fine-tuned GC as well. Replication now goes into recovery far less often than
before.

Stack trace from registerCloseHook:
{code}
at org.apache.solr.handler.ReplicationHandler.registerCloseHook(ReplicationHandler.java:1397)
java.lang.RuntimeException: ReplicationHandler.registerCloseHooks
        at org.apache.solr.handler.ReplicationHandler.registerCloseHook(ReplicationHandler.java:1397) ~[?:?]
        at org.apache.solr.handler.ReplicationHandler.inform(ReplicationHandler.java:1239) ~[?:?]
        at org.apache.solr.cloud.ReplicateFromLeader.startReplication(ReplicateFromLeader.java:109) ~[?:?]
        at org.apache.solr.cloud.ZkController.startReplicationFromLeader(ZkController.java:1327) ~[?:?]
        at org.apache.solr.cloud.RecoveryStrategy.doSyncOrReplicateRecovery(RecoveryStrategy.java:713) ~[?:?]
        at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:334) ~[?:?]
        at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:317) ~[?:?]
        at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180) ~[metrics-core-4.1.5.jar:4.1.5]
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
        at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:212) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
        at java.lang.Thread.run(Unknown Source)
{code}
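
For readers who want to reproduce this kind of instrumentation: the trace
above has the shape you get when an exception is created only to record the
call stack and is never thrown. The following is a minimal sketch of that
general technique, not the actual patch we applied. The class and method
names (CallSiteLogging, captureCallSite) are hypothetical, and it prints to
stdout instead of using Solr's logger.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class CallSiteLogging {

    /**
     * Renders the current call stack as text by creating (never throwing)
     * a RuntimeException whose message is the given label.
     */
    static String captureCallSite(String label) {
        StringWriter buffer = new StringWriter();
        new RuntimeException(label).printStackTrace(new PrintWriter(buffer, true));
        return buffer.toString();
    }

    /** Hypothetical stand-in for the method being instrumented. */
    static void registerCloseHook() {
        // In a real service this would go to the application logger.
        System.out.println(captureCallSite("registerCloseHook called"));
    }

    public static void main(String[] args) {
        registerCloseHook();
    }
}
```

Logging one such trace on registration and one on shutdown should make it
possible to see which code path creates each handler and whether every
registration is eventually paired with a shutdown.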



> ReplicationHandler memory leak through SolrCore.closeHooks
> ----------------------------------------------------------
>
>                 Key: SOLR-14940
>                 URL: https://issues.apache.org/jira/browse/SOLR-14940
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: replication (java)
>         Environment: Solr Cloud Cluster on v.8.6.2 configured as 3 TLOG nodes 
> with 2 cores in each JVM.
>  
>            Reporter: Anver Sotnikov
>            Priority: Major
>         Attachments: Actual references to hooks that in turn hold references 
> to ReplicationHandlers.png, Memory Analyzer SolrCore.closeHooks .png
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> We are experiencing a memory leak in a Solr Cloud cluster configured as 3 TLOG 
> nodes.
> The leader does not seem to be affected, while the followers are.
>  
> Looking at a memory dump, we noticed that SolrCore.closeHooks holds many 
> anonymous inner class instances, each of which in turn holds a reference to 
> a ReplicationHandler.
> ReplicationHandler registers these hooks as anonymous inner classes in 
> SolrCore.closeHooks through ReplicationHandler.inform() -> 
> ReplicationHandler.registerCloseHook().
>  
> Whenever ZkController.stopReplicationFromLeader is called, it shuts down the 
> ReplicationHandler (ReplicationHandler.shutdown()), but the reference to that 
> ReplicationHandler stays in SolrCore.closeHooks. Once replication is 
> started again on the same SolrCore, a new ReplicationHandler is created and 
> registered in closeHooks.
>  
> It looks like there are a few scenarios in which replication is stopped and 
> restarted on the same core, and in our TLOG setup this happens quite often.
>  
> Potential solutions:
>  # Allow unregistering hooks from SolrCore.closeHooks so that 
> ReplicationHandler.shutdown can remove its own hook.
>  # A hack, but easier: break the link between the ReplicationHandler close 
> hooks and the full ReplicationHandler object so that the ReplicationHandler 
> can be GCed even while its hooks are still registered in SolrCore.closeHooks.
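
The leak pattern and the two proposed fixes in the description above can be
modelled in a few lines. This is a simplified sketch under stated
assumptions, not Solr code: Core, Handler, CloseHook, removeCloseHook and
weakHook are hypothetical stand-ins for SolrCore, ReplicationHandler and
their close hooks, and the removal method only illustrates what option 1
might look like.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

class CloseHookLeakSketch {

    /** Stand-in for a close-hook callback. */
    interface CloseHook { void preClose(); }

    /** Stand-in for the long-lived core that owns the hook list. */
    static class Core {
        final List<CloseHook> closeHooks = new ArrayList<>();
        void addCloseHook(CloseHook hook) { closeHooks.add(hook); }
        // Hypothetical removal method (option 1).
        boolean removeCloseHook(CloseHook hook) { return closeHooks.remove(hook); }
    }

    /** Stand-in for the handler that is recreated on every replication restart. */
    static class Handler {
        boolean shutDown;
        CloseHook hook;

        void register(Core core) {
            // The anonymous inner class keeps an implicit reference to this
            // Handler, so the core's list keeps the Handler reachable.
            hook = new CloseHook() {
                @Override public void preClose() { shutdown(); }
            };
            core.addCloseHook(hook);
        }

        /** Leaky variant: the hook stays registered in the core. */
        void shutdown() { shutDown = true; }

        /** Option 1: shut down and remove this handler's own hook. */
        void shutdownAndUnregister(Core core) {
            shutdown();
            core.removeCloseHook(hook);
        }
    }

    /** Option 2: a hook that reaches the handler only through a weak reference. */
    static CloseHook weakHook(Handler handler) {
        WeakReference<Handler> ref = new WeakReference<>(handler);
        return () -> {
            Handler h = ref.get();  // null once the handler has been collected
            if (h != null) h.shutdown();
        };
    }

    /** Simulates repeated stop/start cycles and returns the hooks left in the core. */
    static int hooksAfterRestarts(int restarts, boolean unregister) {
        Core core = new Core();
        for (int i = 0; i < restarts; i++) {
            Handler handler = new Handler();
            handler.register(core);
            if (unregister) handler.shutdownAndUnregister(core);
            else handler.shutdown();
        }
        return core.closeHooks.size();
    }

    public static void main(String[] args) {
        System.out.println("leaky: " + hooksAfterRestarts(5, false));
        System.out.println("unregistering: " + hooksAfterRestarts(5, true));
    }
}
```

In this model the leaky variant leaves one hook, and therefore one reachable
handler, per restart, while the unregistering variant leaves none. With the
weak-reference hook the list still grows by one small object per restart, but
the handler itself is no longer kept alive by it, which is why the
description calls that option a hack.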



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
