I found the suggesters very memory-hungry. I had one particularly large index where the suggester should only have been filtering a small number of docs, but it was mmap'ing the entire index. I only ever saw this behavior with the suggesters.
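In case it helps anyone reproduce that, a rough way to see what the JVM has actually mapped is pmap against the Solr pid. The pgrep pattern and the grep filter below are just examples for a start.jar-style install, so adjust for your layout:

    pmap -x $(pgrep -f start.jar) | grep 'data/index' | sort -k3 -n | tail -20

Column 3 of pmap -x is the resident size per mapping, so the tail of that listing shows which index files are actually being pulled into memory.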
On 22 November 2017 at 03:17, Walter Underwood <wun...@wunderwood.org> wrote:
> All our customizations are in solr.in.sh. We’re using the one we
> configured for 6.3.0. I’ll check for any differences between that and the
> 6.5.1 script.
>
> I don’t see any arguments at all in the dashboard. I do see them in a ps
> listing, right at the end.
>
> java -server -Xms8g -Xmx8g -XX:+UseG1GC -XX:+ParallelRefProcEnabled
> -XX:G1HeapRegionSize=8m -XX:MaxGCPauseMillis=200 -XX:+UseLargePages
> -XX:+AggressiveOpts -XX:+HeapDumpOnOutOfMemoryError -verbose:gc
> -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps
> -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution
> -XX:+PrintGCApplicationStoppedTime
> -Xloggc:/solr/logs/solr_gc.log -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M
> -Dcom.sun.management.jmxremote
> -Dcom.sun.management.jmxremote.local.only=false
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dcom.sun.management.jmxremote.port=18983
> -Dcom.sun.management.jmxremote.rmi.port=18983
> -Djava.rmi.server.hostname=new-solr-c01.test3.cloud.cheggnet.com
> -DzkClientTimeout=15000
> -DzkHost=zookeeper1.test3.cloud.cheggnet.com:2181,zookeeper2.test3.cloud.cheggnet.com:2181,zookeeper3.test3.cloud.cheggnet.com:2181/solr-cloud
> -Dsolr.log.level=WARN
> -Dsolr.log.dir=/solr/logs -Djetty.port=8983 -DSTOP.PORT=7983
> -DSTOP.KEY=solrrocks -Dhost=new-solr-c01.test3.cloud.cheggnet.com
> -Duser.timezone=UTC -Djetty.home=/apps/solr6/server
> -Dsolr.solr.home=/apps/solr6/server/solr -Dsolr.install.dir=/apps/solr6
> -Dgraphite.prefix=solr-cloud.new-solr-c01
> -Dgraphite.host=influx.test.cheggnet.com
> -javaagent:/apps/solr6/newrelic/newrelic.jar
> -Dnewrelic.environment=test3 -Dsolr.log.muteconsole -Xss256k
> -Dsolr.log.muteconsole
> -XX:OnOutOfMemoryError=/apps/solr6/bin/oom_solr.sh 8983 /solr/logs
> -jar start.jar --module=http
>
> I’m still confused why we are hitting OOM in 6.5.1 but weren’t in 6.3.0.
> Our load benchmarks use prod logs. We added suggesters, but those use
> analyzing infix, so they are search indexes, not in-memory.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
>
> > On Nov 21, 2017, at 5:46 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> >
> > On 11/20/2017 6:17 PM, Walter Underwood wrote:
> >> When I ran load benchmarks with 6.3.0, an overloaded cluster would get
> >> super slow but keep functioning. With 6.5.1, we hit 100% CPU, then
> >> start getting OOMs. That is really bad, because it means we need to
> >> reboot every node in the cluster.
> >> Also, the JVM OOM hook isn’t running the process killer (JVM
> >> 1.8.0_121-b13). Using the G1 collector with the Shawn Heisey settings
> >> in an 8G heap.
> > <snip>
> >> This is not good behavior in prod. The process goes to the bad place,
> >> then we need to wait until someone is paged and kills it manually.
> >> Luckily, it usually drops out of the live nodes for each collection
> >> and doesn’t take user traffic.
> >
> > There was a bug, fixed long before 6.3.0, where the OOM killer script
> > wasn't working because the arguments enabling it were in the wrong
> > place. It was fixed in 5.5.1 and 6.0.
> >
> > https://issues.apache.org/jira/browse/SOLR-8145
> >
> > If the scripts that you are using to get Solr started originated with a
> > much older version of Solr than you are currently running, maybe you've
> > got the arguments in the wrong order.
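Replying inline on the argument-order point: if I remember the ticket right, the mechanics are just how java argument parsing works. Only flags that appear before "-jar start.jar" are JVM options; anything after it is handed to Jetty as a program argument and the JVM never sees it. Roughly, reusing the paths and port from the listing above, and with the value quoted so the shell passes it as a single argument:

    # OOM hook visible to the JVM (flag comes before -jar start.jar):
    java -Xms8g -Xmx8g \
         -XX:OnOutOfMemoryError="/apps/solr6/bin/oom_solr.sh 8983 /solr/logs" \
         -jar start.jar --module=http

    # OOM hook silently ignored by the JVM (it trails -jar start.jar,
    # so Jetty receives it as an ordinary program argument instead):
    java -Xms8g -Xmx8g -jar start.jar --module=http \
         -XX:OnOutOfMemoryError="/apps/solr6/bin/oom_solr.sh 8983 /solr/logs"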
> >
> > Do you see the commandline arguments for the OOM killer (only available
> > on *NIX systems, not Windows) on the admin UI dashboard? If they are
> > properly placed, you will see them on the dashboard, but if they aren't
> > properly placed, then you won't see them. This is what the argument
> > looks like for one of my Solr installs:
> >
> > -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /var/solr/logs
> >
> > Something which you probably already know: If you're hitting OOM, you
> > need a larger heap, or you need to adjust the config so it uses less
> > memory. There are no other ways to "fix" OOM problems.
> >
> > Thanks,
> > Shawn
> >
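On Shawn's last point, for anyone following along with the stock 6.x scripts: the heap and GC knobs normally live in solr.in.sh rather than in the java line itself, something like this (the variable names are from the shipped include script, the values are just examples):

    SOLR_HEAP="8g"
    GC_TUNE="-XX:+UseG1GC -XX:+ParallelRefProcEnabled -XX:G1HeapRegionSize=8m -XX:MaxGCPauseMillis=200"

Raising SOLR_HEAP buys headroom, but if the extra memory is going to something like a suggester build reading the whole index (per the observation at the top of this mail), shrinking that usage is the more durable fix.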