A little more about "At certain timing, this method also throw " SnapPuller - java.lang.InterruptedException at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404) at java.util.concurrent.FutureTask.get(FutureTask.java:191) at ....SnapPuller.openNewSearcherAndUpdateCommitPoint(SnapPuller.java:680)
This is a scenario I am less confident about: the old core didn't complete the
close() method during reloadCore, so the fetch went on to execute the
openNewSearcherAndUpdateCommitPoint method. Meanwhile an HTTP request, for
example, finished its processing and called the SolrCore close() method;
refCount reached 0, and close() ran through all of its remaining cleanup.

On Thu, Oct 20, 2016 at 8:44 AM, Jihwan Kim <jihwa...@gmail.com> wrote:

> Hi,
> We are using Solr 4.10.4 and experiencing an out-of-memory exception. It
> seems the problem is caused by the following code and scenario.
>
> This is the last part of the fetchLatestIndex method in SnapPuller.java:
>
>       // we must reload the core after we open the IW back up
>       if (reloadCore) {
>         reloadCore();
>       }
>
>       if (successfulInstall) {
>         if (isFullCopyNeeded) {
>           // let the system know we are changing dir's and the old one
>           // may be closed
>           if (indexDir != null) {
>             LOG.info("removing old index directory " + indexDir);
>             core.getDirectoryFactory().doneWithDirectory(indexDir);
>             core.getDirectoryFactory().remove(indexDir);
>           }
>         }
>         if (isFullCopyNeeded) {
>           solrCore.getUpdateHandler().newIndexWriter(isFullCopyNeeded);
>         }
>
>         openNewSearcherAndUpdateCommitPoint(isFullCopyNeeded);
>       }
>
> Inside reloadCore, it creates a new core, registers it, and tries to close
> the current/old core. When the close of the old core proceeds normally, it
> throws the exception "SnapPull failed: org.apache.solr.common.SolrException:
> Index fetch failed Caused by: java.lang.RuntimeException: Interrupted while
> waiting for core reload to finish Caused by: java.lang.InterruptedException."
>
> Despite this exception, the process seems OK, because it only terminates the
> SnapPuller thread while all the other threads that perform the close complete
> normally.
>
> *Now, the problem arises when the close() method called during reloadCore
> doesn't really close the core.*
> This is the beginning of the close() method:
>
>   public void close() {
>     int count = refCount.decrementAndGet();
>     if (count > 0) return; // close is called often, and only actually
>                            // closes if nothing is using it.
>     if (count < 0) {
>       log.error("Too many close [count:{}] on {}. Please report this
>           exception to solr-user@lucene.apache.org", count, this);
>       assert false : "Too many closes on SolrCore";
>       return;
>     }
>     log.info(logid + " CLOSING SolrCore " + this);
>
> While an HTTP request is executing, the refCount is greater than 1. So when
> the old core is asked to close during the core reload, the if (count > 0)
> condition simply returns from this method.
>
> Then the fetchLatestIndex method in SnapPuller runs the next piece of code
> and executes openNewSearcherAndUpdateCommitPoint. If you look at this
> method, it tries to open a new searcher on the solrCore that was captured in
> the SnapPuller constructor, and I believe that reference points to the old
> core. At certain timings, this method also throws:
>
> SnapPuller - java.lang.InterruptedException
>     at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:191)
>     at ....SnapPuller.openNewSearcherAndUpdateCommitPoint(SnapPuller.java:680)
>
> After this exception, things start to go bad.
>
> *In summary, I have two questions.*
> 1. Can you confirm this memory / thread issue?
> 2. When the core reload happens successfully (whether or not it throws the
> exception), does Solr need to call the openNewSearcherAndUpdateCommitPoint
> method?
>
> Thanks.
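
P.S. To make the refCount behavior above concrete, here is a minimal
standalone sketch (my own toy code with made-up names, not Solr's) of the
guarded close() quoted above. It shows how the reload's close() becomes a
no-op while a request still holds a reference, leaving the old core alive:

    import java.util.concurrent.atomic.AtomicInteger;

    class RefCountedCore {
        // a new core starts with one reference, like SolrCore
        private final AtomicInteger refCount = new AtomicInteger(1);
        private final String name;

        RefCountedCore(String name) { this.name = name; }

        // e.g. an incoming HTTP request borrows the core
        void open() { refCount.incrementAndGet(); }

        void close() {
            int count = refCount.decrementAndGet();
            if (count > 0) {
                // same early return as SolrCore.close(): nothing is freed
                System.out.println(name + ": close skipped, refCount=" + count);
                return;
            }
            // searcher, index writer, and directories would be released here
            System.out.println(name + ": CLOSING for real");
        }

        public static void main(String[] args) {
            RefCountedCore oldCore = new RefCountedCore("oldCore");
            oldCore.open();  // an in-flight request holds a reference
            oldCore.close(); // reloadCore's close: refCount drops to 1, core stays open
            // SnapPuller then calls openNewSearcherAndUpdateCommitPoint on it...
            oldCore.close(); // only when the request finishes does it really close
        }
    }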
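
And for the stack trace at the top: get() on a FutureTask that has not yet
completed parks in FutureTask.awaitDone and throws InterruptedException the
moment the waiting thread is interrupted, which matches what the replication
thread would see when the close/reload interrupts it. A tiny sketch (again
my own example, not Solr code):

    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.FutureTask;

    public class InterruptedGetDemo {
        public static void main(String[] args) throws Exception {
            // nobody ever runs this task, so get() blocks indefinitely
            FutureTask<String> task = new FutureTask<>(() -> "new searcher");
            Thread waiter = new Thread(() -> {
                try {
                    task.get(); // parks in FutureTask.awaitDone
                } catch (InterruptedException e) {
                    // the same exception reported at SnapPuller.java:680
                    System.out.println("interrupted while waiting: " + e);
                } catch (ExecutionException e) {
                    // not reached in this demo
                }
            });
            waiter.start();
            Thread.sleep(200);  // let the waiter park in get()
            waiter.interrupt(); // mirrors the reload interrupting the puller
            waiter.join();
        }
    }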