Please file a JIRA issue so that we can address this.

- Mark
On Jul 2, 2013, at 6:20 AM, Daniel Collins <danwcoll...@gmail.com> wrote:

> On looking at the code in SolrDispatchFilter, is this intentional or not?
> I think I remember Mark Miller mentioning that in an OOM case, the best
> course of action is basically to kill the process, since there is very
> little Solr can do once it has run out of memory. Yet it seems that Solr
> catches the OOM itself and just logs it as an error, rather than letting
> it propagate back up to the JVM.
>
> We have also seen OOMs in IndexWriter, which has specific code to handle
> OOM cases and seems to fall back to the transaction log (but fails to
> commit anything). I understand the logic of that, but in reality, I've
> seen that the tlog can get corrupted in this case, so we still need to be
> monitoring the system and forcibly killing the process.
>
> On 27 June 2013 00:03, Timothy Potter <thelabd...@gmail.com> wrote:
>
>> Thanks for the feedback Daniel ... For now, I've opted to just kill
>> the JVM with System.exit(1) in the SolrDispatchFilter code and will
>> restart it with a Linux supervisor. Not elegant but the alternative of
>> having a zombie Solr instance walking around my cluster is much worse
>> ;-) Will try to dig into the code that is trapping this error but for
>> now I've lost too many hours on this problem.
>>
>> Cheers,
>> Tim
>>
>> On Wed, Jun 26, 2013 at 2:43 PM, Daniel Collins <danwcoll...@gmail.com>
>> wrote:
>>> Ooh, I guess Jetty is trapping that java.lang.OutOfMemoryError, and
>>> throwing it/packaging it as a java.lang.RuntimeException. The -XX option
>>> assumes that the application doesn't handle the Errors and so they would
>>> reach the JVM and thus invoke the handler.
>>> Since Jetty has an exception handler that is dealing with anything
>>> (including Errors), they never reach the JVM, hence no handler.
>>>
>>> Not much we can do short of not using Jetty?
>>>
>>> That's a pain, I'd just written a nice OOM handler too!
>>>
>>> On 26 June 2013 20:37, Timothy Potter <thelabd...@gmail.com> wrote:
>>>
>>>> A little more to this ...
>>>>
>>>> Just on the off chance this was a weird Jetty issue or something, I
>>>> tried with the latest 9.... and the problem still occurs :-(
>>>>
>>>> This is on Java 7 on Debian:
>>>>
>>>> java version "1.7.0_21"
>>>> Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
>>>> Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)
>>>>
>>>> Here is an example stack trace from the log:
>>>>
>>>> 2013-06-26 19:31:33,801 [qtp632640515-62] ERROR solr.servlet.SolrDispatchFilter Q:22 - null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
>>>>     at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:670)
>>>>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:380)
>>>>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
>>>>     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1423)
>>>>     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:450)
>>>>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
>>>>     at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564)
>>>>     at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:213)
>>>>     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1083)
>>>>     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:379)
>>>>     at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175)
>>>>     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1017)
>>>>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136)
>>>>     at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:258)
>>>>     at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109)
>>>>     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>>>>     at org.eclipse.jetty.server.Server.handle(Server.java:445)
>>>>     at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:260)
>>>>     at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:225)
>>>>     at org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:358)
>>>>     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:596)
>>>>     at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:527)
>>>>     at java.lang.Thread.run(Thread.java:722)
>>>> Caused by: java.lang.OutOfMemoryError: Java heap space
>>>>
>>>> On Wed, Jun 26, 2013 at 12:27 PM, Timothy Potter <thelabd...@gmail.com>
>>>> wrote:
>>>>> Recently upgraded to 4.3.1 but this problem has persisted for a while
>>>>> now ...
>>>>>
>>>>> I'm using the following configuration when starting Jetty:
>>>>>
>>>>> -XX:OnOutOfMemoryError="/home/solr/oom_killer.sh 83 %p"
>>>>>
>>>>> If an OOM is triggered during Solr web app initialization (such as by
>>>>> me lowering -Xmx to a value that is too low to initialize Solr with),
>>>>> then the script gets called and does what I expect!
>>>>>
>>>>> However, once the Solr webapp has initialized and Solr is happily
>>>>> responding to updates and queries, an OOM in this situation doesn't
>>>>> actually invoke the script! All I see is the following in the
>>>>> stdout/stderr log of my process:
>>>>>
>>>>> #
>>>>> # java.lang.OutOfMemoryError: Java heap space
>>>>> # -XX:OnOutOfMemoryError="/home/solr/oom_killer.sh 83 %p"
>>>>> # Executing /bin/sh -c "/home/solr/oom_killer.sh 83 21358"...
>>>>>
>>>>> The oom_killer.sh script doesn't actually get called!
>>>>>
>>>>> So to recap, it works if an OOM occurs during initialization but once
>>>>> Solr is running, the OOM killer doesn't fire correctly. This leads me
>>>>> to believe my script is fine and there's something else going wrong.
>>>>> Here's the oom_killer.sh script (pretty basic):
>>>>>
>>>>> #!/bin/bash
>>>>> SOLR_PORT=$1
>>>>> SOLR_PID=$2
>>>>> NOW=$(date +"%Y%m%d_%H%M")
>>>>> (
>>>>> echo "Running OOM killer script for process $SOLR_PID for Solr on port 89$SOLR_PORT"
>>>>> kill -9 $SOLR_PID
>>>>> echo "Killed process $SOLR_PID"
>>>>> exec /home/solr/solr-dg/dg-solr.sh recover $SOLR_PORT &
>>>>> echo "Restarted Solr on 89$SOLR_PORT after OOM"
>>>>> ) | tee oom_killer-89$SOLR_PORT-$NOW.log
>>>>>
>>>>> Anyone see anything like this before? Suggestions on where to begin
>>>>> tracking down this issue?
>>>>>
>>>>> Cheers,
>>>>> Tim
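
For anyone following the thread, the workaround discussed above (stopping the JVM from the error path when the logged RuntimeException is really a wrapped OutOfMemoryError) could look roughly like the sketch below. This is not the actual Solr or Jetty code: the class OomGuard, its method names, and the depth limit of 64 are all hypothetical and chosen only for illustration. It also swaps in Runtime.halt for the System.exit(1) mentioned in the thread, on the assumption that skipping shutdown hooks is preferable when the heap is already exhausted.

```java
// Hypothetical helper, not part of Solr or Jetty; all names are illustrative.
final class OomGuard {
    private OomGuard() {}

    /** Returns true if t, or any throwable in its cause chain, is an OutOfMemoryError. */
    static boolean causedByOutOfMemory(Throwable t) {
        // Bound the walk (64 is an arbitrary assumption) so that a cyclic
        // cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 64; depth++) {
            if (t instanceof OutOfMemoryError) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }

    /** Stops the JVM immediately if the failure traces back to an OutOfMemoryError. */
    static void haltOnOutOfMemory(Throwable t) {
        if (causedByOutOfMemory(t)) {
            // halt() does not run shutdown hooks, which may themselves need heap.
            // The thread above used System.exit(1) instead.
            Runtime.getRuntime().halt(1);
        }
    }
}
```

A catch block that currently only logs the error could call OomGuard.haltOnOutOfMemory(ex) before logging, leaving the restart to an external supervisor as described above.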