RE: Solr Replication during Tomcat shutdown causes shutdown to hang/fail
I was helping Nick look into this, and I think we may have figured out the core of the problem.

The problem is easily reproducible: start replication on the slave, then send a shutdown command to Tomcat (e.g. catalina.sh stop).

With a debugger attached, it looks like the fsyncService thread is blocking VM shutdown because it is created as a non-daemon thread.

Essentially, the fsyncService thread is running when 'catalina.sh stop' is executed. The shutdown calls SnapPuller.destroy(), which aborts the current sync. Around line 517 of SnapPuller.java there is code that is supposed to clean up the fsyncService thread, but I don't think it gets executed: the thread that called SnapPuller.fetchLatestIndex() is configured as a daemon thread, so the JVM ends up shutting it down before it can clean up the fsyncService.

So it seems like either

    if (fsyncService != null)
        ExecutorUtil.shutdownNowAndAwaitTermination(fsyncService);

could be added around line 1706 of SnapPuller.java, or

    puller.setDaemon(false);

could be added around line 230 of ReplicationHandler.java. The second option needs some additional work, though (and I think that work might be needed regardless): the cleanup code in SnapPuller (around line 517) that shuts down the fsync thread never gets executed, because logReplicationTimeAndConfFiles() can throw an IOException and bypass the rest of the finally block. So the call to logReplicationTimeAndConfFiles() around line 512 would need to be wrapped in a try/catch block that catches the IOException.

I can submit patches if needed, and cross-post to the dev mailing list.

-Phil
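To illustrate the finally-block issue described above, here is a minimal standalone sketch. It is not the actual SnapPuller code: the class FsyncCleanupSketch and the helpers logStats() and shutdownAndAwait() are hypothetical stand-ins (for logReplicationTimeAndConfFiles() and ExecutorUtil.shutdownNowAndAwaitTermination() respectively), and the simulated IOException is an assumption made only to trigger the skipped cleanup.

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FsyncCleanupSketch {

    // Hypothetical stand-in for logReplicationTimeAndConfFiles(); it always fails here.
    static void logStats() throws IOException {
        throw new IOException("simulated failure while writing replication stats");
    }

    // Hypothetical stand-in for ExecutorUtil.shutdownNowAndAwaitTermination().
    static void shutdownAndAwait(ExecutorService service) {
        service.shutdownNow();
        try {
            service.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Shape of the problem: the IOException leaves the finally block
    // before the executor is shut down, so its non-daemon worker survives.
    static boolean unsafeCleanup(ExecutorService fsyncService) {
        try {
            try {
                fsyncService.execute(() -> { }); // starts the (non-daemon) worker thread
            } finally {
                logStats();                      // throws ...
                shutdownAndAwait(fsyncService);  // ... so this line is skipped
            }
        } catch (IOException e) {
            System.err.println("cleanup aborted: " + e.getMessage());
        }
        return fsyncService.isShutdown();
    }

    // Shape of the fix: catch the IOException locally so the shutdown always runs.
    static boolean safeCleanup(ExecutorService fsyncService) {
        try {
            fsyncService.execute(() -> { });
        } finally {
            try {
                logStats();
            } catch (IOException e) {
                System.err.println("could not log replication stats: " + e.getMessage());
            } finally {
                shutdownAndAwait(fsyncService);
            }
        }
        return fsyncService.isTerminated();
    }

    public static void main(String[] args) {
        ExecutorService leaked = Executors.newSingleThreadExecutor();
        System.out.println("unsafe: executor shut down = " + unsafeCleanup(leaked));
        leaked.shutdownNow(); // without this, the demo JVM itself would not exit promptly

        ExecutorService cleaned = Executors.newSingleThreadExecutor();
        System.out.println("safe: executor terminated = " + safeCleanup(cleaned));
    }
}
```

In the unsafe variant the executor is still running after the method returns, and its idle non-daemon worker thread is what would keep a JVM from exiting. The safe variant keeps the logging failure from affecting the shutdown by giving the executor cleanup its own finally clause.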
Re: Solr Replication during Tomcat shutdown causes shutdown to hang/fail
see the ticket here:
https://issues.apache.org/jira/browse/SOLR-6579

including a patch to fix it.

On Thu, Oct 2, 2014 at 9:44 AM, Shawn Heisey wrote:

> On 10/2/2014 7:25 AM, Phil Black-Knight wrote:
> > I was helping to look into this with Nick & I think we may have figured out
> > the core of the problem...
> >
> > The problem is easily reproducible by starting replication on the slave and
> > then sending a shutdown command to tomcat (e.g. catalina.sh stop).
> >
> > With a debugger attached, it looks like the fsyncService thread is blocking
> > VM shutdown because it is created as a non-daemon thread.
>
> > I can submit patches if needed... and cross post to the dev mailing list...
>
> File a detailed issue in Jira and attach your patch there. This is our
> bugtracker. You need an account on the Apache jira instance to do this:
>
> https://issues.apache.org/jira/browse/SOLR
>
> Thanks,
> Shawn
Re: Solr Replication during Tomcat shutdown causes shutdown to hang/fail
I haven't seen any activity regarding this in Jira, just curious if it would be looked into anytime soon...

On Thu, Oct 2, 2014 at 10:11 AM, Phil Black-Knight <pblackkni...@globalgiving.org> wrote:

> see the ticket here:
> https://issues.apache.org/jira/browse/SOLR-6579
>
> including a patch to fix it.
>
> On Thu, Oct 2, 2014 at 9:44 AM, Shawn Heisey wrote:
>
>> On 10/2/2014 7:25 AM, Phil Black-Knight wrote:
>> > I was helping to look into this with Nick & I think we may have figured out
>> > the core of the problem...
>> >
>> > The problem is easily reproducible by starting replication on the slave and
>> > then sending a shutdown command to tomcat (e.g. catalina.sh stop).
>> >
>> > With a debugger attached, it looks like the fsyncService thread is blocking
>> > VM shutdown because it is created as a non-daemon thread.
>>
>> > I can submit patches if needed... and cross post to the dev mailing list...
>>
>> File a detailed issue in Jira and attach your patch there. This is our
>> bugtracker. You need an account on the Apache jira instance to do this:
>>
>> https://issues.apache.org/jira/browse/SOLR
>>
>> Thanks,
>> Shawn