bq: but those use analyzing infix, so they are search indexes, not in-memory

Sure, but they can still consume heap. Most of the index is
memory-mapped, of course, but some control structures, in-memory
indexes and the like are still kept on the heap.
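
For what it's worth, a rough way to see that split on a live node (just a
sketch, assuming Linux, a JDK 8 jmap on the PATH, and that Solr was
started via start.jar; the pgrep pattern and the file extensions are
only illustrative):

  SOLR_PID=$(pgrep -f 'start.jar')
  # Memory-mapped Lucene segment files (off-heap, served from the OS page cache)
  pmap -x "$SOLR_PID" | grep -E '\.(fdt|tim|doc|pos|dvd)' | head
  # Heap layout and current occupancy, for comparison
  jmap -heap "$SOLR_PID"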

I suppose testing without the suggester would nail that down, though.

The second thing I'd be interested in is a heap dump from each of the
two versions, to get a sense of whether something really wonky crept
in between them. Certainly nothing intentional that I know of.
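
If you want to grab those dumps, something like this is what I had in
mind, taken on a node from each version under the same load (a sketch,
assuming a JDK 8 jmap on the PATH and that Solr was started via
start.jar; adjust the pgrep pattern and the output path for your
setup):

  SOLR_PID=$(pgrep -f 'start.jar')
  # 'live' forces a full GC first, so only reachable objects land in the dump
  jmap -dump:live,format=b,file=/tmp/solr-heap-${SOLR_PID}.hprof "$SOLR_PID"

Then compare the two .hprof files in something like Eclipse MAT or
VisualVM to see what is actually holding onto the heap.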

Erick

On Tue, Nov 21, 2017 at 8:17 AM, Walter Underwood <wun...@wunderwood.org> wrote:
> All our customizations are in solr.in.sh. We’re using the one we configured 
> for 6.3.0. I’ll check for any differences between that and the 6.5.1 script.
>
> I don’t see any arguments at all in the dashboard. I do see them in a ps 
> listing, right at the end.
>
> java -server -Xms8g -Xmx8g -XX:+UseG1GC -XX:+ParallelRefProcEnabled 
> -XX:G1HeapRegionSize=8m -XX:MaxGCPauseMillis=200 -XX:+UseLargePages 
> -XX:+AggressiveOpts -XX:+HeapDumpOnOutOfMemoryError -verbose:gc 
> -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
> -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution 
> -XX:+PrintGCApplicationStoppedTime -Xloggc:/solr/logs/solr_gc.log 
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M 
> -Dcom.sun.management.jmxremote 
> -Dcom.sun.management.jmxremote.local.only=false 
> -Dcom.sun.management.jmxremote.ssl=false 
> -Dcom.sun.management.jmxremote.authenticate=false 
> -Dcom.sun.management.jmxremote.port=18983 
> -Dcom.sun.management.jmxremote.rmi.port=18983 
> -Djava.rmi.server.hostname=new-solr-c01.test3.cloud.cheggnet.com 
> -DzkClientTimeout=15000 
> -DzkHost=zookeeper1.test3.cloud.cheggnet.com:2181,zookeeper2.test3.cloud.cheggnet.com:2181,zookeeper3.test3.cloud.cheggnet.com:2181/solr-cloud
>  -Dsolr.log.level=WARN -Dsolr.log.dir=/solr/logs -Djetty.port=8983 
> -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks 
> -Dhost=new-solr-c01.test3.cloud.cheggnet.com -Duser.timezone=UTC 
> -Djetty.home=/apps/solr6/server -Dsolr.solr.home=/apps/solr6/server/solr 
> -Dsolr.install.dir=/apps/solr6 -Dgraphite.prefix=solr-cloud.new-solr-c01 
> -Dgraphite.host=influx.test.cheggnet.com 
> -javaagent:/apps/solr6/newrelic/newrelic.jar -Dnewrelic.environment=test3 
> -Dsolr.log.muteconsole -Xss256k -Dsolr.log.muteconsole 
> -XX:OnOutOfMemoryError=/apps/solr6/bin/oom_solr.sh 8983 /solr/logs -jar 
> start.jar --module=http
>
> I’m still confused why we are hitting OOM in 6.5.1 but weren’t in 6.3.0. Our 
> load benchmarks use prod logs. We added suggesters, but those use analyzing 
> infix, so they are search indexes, not in-memory.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
>> On Nov 21, 2017, at 5:46 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>>
>> On 11/20/2017 6:17 PM, Walter Underwood wrote:
>>> When I ran load benchmarks with 6.3.0, an overloaded cluster would get 
>>> super slow but keep functioning. With 6.5.1, we hit 100% CPU, then start 
>>> getting OOMs. That is really bad, because it means we need to reboot every 
>>> node in the cluster.
>>> Also, the JVM OOM hook isn’t running the process killer (JVM 
>>> 1.8.0_121-b13). Using the G1 collector with the Shawn Heisey settings in an 
>>> 8G heap.
>> <snip>
>>> This is not good behavior in prod. The process goes to the bad place, then 
>>> we need to wait until someone is paged and kills it manually. Luckily, it 
>>> usually drops out of the live nodes for each collection and doesn’t take 
>>> user traffic.
>>
>> There was a bug, fixed long before 6.3.0, where the OOM killer script wasn't 
>> working because the arguments enabling it were in the wrong place.  It was 
>> fixed in 5.5.1 and 6.0.
>>
>> https://issues.apache.org/jira/browse/SOLR-8145
>>
>> If the scripts that you are using to get Solr started originated with a much 
>> older version of Solr than you are currently running, maybe you've got the 
>> arguments in the wrong order.
>>
>> Do you see the commandline arguments for the OOM killer (only available on 
>> *NIX systems, not Windows) on the admin UI dashboard?  If they are properly 
>> placed, you will see them on the dashboard, but if they aren't properly 
>> placed, then you won't see them.  This is what the argument looks like for 
>> one of my Solr installs:
>>
>> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /var/solr/logs
>>
>> Something which you probably already know:  If you're hitting OOM, you need 
>> a larger heap, or you need to adjust the config so it uses less memory.  
>> There are no other ways to "fix" OOM problems.
>>
>> Thanks,
>> Shawn
>
