Just now I see about 40 "Searcher@XXXX main" entries displayed in the Solr Web UI under collection -> Plugins/Stats -> CORE.
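
(Side note for anyone reproducing this: the same CORE stats can be pulled and counted outside the UI. The following is only a rough sketch; it assumes the /admin/mbeans endpoint with the cat/stats parameters that the Plugins/Stats page itself appears to use, and the host, port and collection name are placeholders.)

    import java.io.InputStream;
    import java.net.URL;
    import java.util.Scanner;

    public class SearcherStats {
        public static void main(String[] args) throws Exception {
            // Fetch the CORE category mbean stats, the same data the admin UI
            // shows under Plugins/Stats -> CORE (endpoint and params assumed).
            URL url = new URL("http://localhost:8983/solr/collection1/admin/mbeans"
                    + "?cat=CORE&stats=true&wt=json");
            try (InputStream in = url.openStream();
                 Scanner s = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
                String body = s.hasNext() ? s.next() : "";
                // Each registered searcher shows up as a "Searcher@... main" bean,
                // so counting that marker gives the number of open searchers.
                int count = body.split("Searcher@", -1).length - 1;
                System.out.println("registered searchers: " + count);
            }
        }
    }
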
I think this is abnormal. Soft commit is set to 1.5s, but warmupTime takes about 3s. Could that lead to so many searchers? maxWarmingSearchers is set to 4 in my solrconfig.xml; shouldn't that prevent Solr from creating more than 4 searchers?

> On Dec 21, 2015, at 14:43, zhenglingyun <konghuaru...@163.com> wrote:
>
> Thanks, Erick, for pointing out that the memory changes in a sawtooth pattern.
> The problem that troubles me is that the bottom point of the sawtooth keeps
> increasing. And when the used capacity of the old generation exceeds the
> threshold set by CMS's CMSInitiatingOccupancyFraction, GC keeps running and
> uses a lot of CPU cycles, but the used old-generation memory does not decrease.
>
> After taking Rahul's advice, I decreased Xms and Xmx from 16G to 8G, and
> changed the JVM parameters from
>
>     -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
>     -XX:-CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=70
>     -XX:+CMSParallelRemarkEnabled
>
> to
>
>     -XX:NewRatio=3
>     -XX:SurvivorRatio=4
>     -XX:TargetSurvivorRatio=90
>     -XX:MaxTenuringThreshold=8
>     -XX:+UseConcMarkSweepGC
>     -XX:+UseParNewGC
>     -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4
>     -XX:+CMSScavengeBeforeRemark
>     -XX:PretenureSizeThreshold=64m
>     -XX:+UseCMSInitiatingOccupancyOnly
>     -XX:CMSInitiatingOccupancyFraction=50
>     -XX:CMSMaxAbortablePrecleanTime=6000
>     -XX:+CMSParallelRemarkEnabled
>     -XX:+ParallelRefProcEnabled
>     -XX:-CMSConcurrentMTEnabled
>
> which is taken from bin/solr.in.sh.
> I hope this can reduce GC pause times and the number of full GCs, and maybe
> the memory-growth problem will disappear if I'm lucky.
>
> After several days of running, the memory on one of my two servers increased
> to 90% again...
> (When Solr is started, the memory used by Solr is less than 1G.)
>
> Following is the output of jstat -gccause -h5 <pid> 1000:
>
>   S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
>  9.56   0.00   8.65  91.31  65.89  69379 3076.096 16563 1579.639 4655.735 Allocation Failure   No GC
>  9.56   0.00  51.10  91.31  65.89  69379 3076.096 16563 1579.639 4655.735 Allocation Failure   No GC
>  0.00   9.23  10.23  91.35  65.89  69380 3076.135 16563 1579.639 4655.774 Allocation Failure   No GC
>  7.90   0.00   9.74  91.39  65.89  69381 3076.165 16564 1579.683 4655.848 CMS Final Remark     No GC
>  7.90   0.00  67.45  91.39  65.89  69381 3076.165 16564 1579.683 4655.848 CMS Final Remark     No GC
>   S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
>  0.00   7.48  16.18  91.41  65.89  69382 3076.200 16565 1579.707 4655.908 CMS Initial Mark     No GC
>  0.00   7.48  73.77  91.41  65.89  69382 3076.200 16565 1579.707 4655.908 CMS Initial Mark     No GC
>  8.61   0.00  29.86  91.45  65.89  69383 3076.228 16565 1579.707 4655.936 Allocation Failure   No GC
>  8.61   0.00  90.16  91.45  65.89  69383 3076.228 16565 1579.707 4655.936 Allocation Failure   No GC
>  0.00   7.46  47.89  91.46  65.89  69384 3076.258 16565 1579.707 4655.966 Allocation Failure   No GC
>   S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
>  8.67   0.00  11.98  91.49  65.89  69385 3076.287 16565 1579.707 4655.995 Allocation Failure   No GC
>  0.00  11.76   9.24  91.54  65.89  69386 3076.321 16566 1579.759 4656.081 CMS Final Remark     No GC
>  0.00  11.76  64.53  91.54  65.89  69386 3076.321 16566 1579.759 4656.081 CMS Final Remark     No GC
>  7.25   0.00  20.39  91.57  65.89  69387 3076.358 16567 1579.786 4656.144 CMS Initial Mark     No GC
>  7.25   0.00  81.56  91.57  65.89  69387 3076.358 16567 1579.786 4656.144 CMS Initial Mark     No GC
>   S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
>  0.00   8.05  34.42  91.60  65.89  69388 3076.391 16567 1579.786 4656.177 Allocation Failure   No GC
>  0.00   8.05  84.17  91.60  65.89  69388 3076.391 16567 1579.786 4656.177 Allocation Failure   No GC
>  8.54   0.00  55.14  91.62  65.89  69389 3076.420 16567 1579.786 4656.205 Allocation Failure   No GC
>  0.00   7.74  12.42  91.66  65.89  69390 3076.456 16567 1579.786 4656.242 Allocation Failure   No GC
>  9.60   0.00  11.00  91.70  65.89  69391 3076.492 16568 1579.841 4656.333 CMS Final Remark     No GC
>   S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
>  9.60   0.00  69.24  91.70  65.89  69391 3076.492 16568 1579.841 4656.333 CMS Final Remark     No GC
>  0.00   8.70  18.21  91.74  65.89  69392 3076.529 16569 1579.870 4656.400 CMS Initial Mark     No GC
>  0.00   8.70  61.92  91.74  65.89  69392 3076.529 16569 1579.870 4656.400 CMS Initial Mark     No GC
>  7.36   0.00   3.49  91.77  65.89  69393 3076.570 16569 1579.870 4656.440 Allocation Failure   No GC
>  7.36   0.00  42.03  91.77  65.89  69393 3076.570 16569 1579.870 4656.440 Allocation Failure   No GC
>   S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
>  0.00   9.77   0.00  91.80  65.89  69394 3076.604 16569 1579.870 4656.475 Allocation Failure   No GC
>  9.08   0.00   9.92  91.82  65.89  69395 3076.632 16570 1579.913 4656.545 CMS Final Remark     No GC
>  9.08   0.00  58.90  91.82  65.89  69395 3076.632 16570 1579.913 4656.545 CMS Final Remark     No GC
>  0.00   8.44  16.20  91.86  65.89  69396 3076.664 16571 1579.930 4656.594 CMS Initial Mark     No GC
>  0.00   8.44  71.95  91.86  65.89  69396 3076.664 16571 1579.930 4656.594 CMS Initial Mark     No GC
>   S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
>  8.11   0.00  30.59  91.90  65.89  69397 3076.694 16571 1579.930 4656.624 Allocation Failure   No GC
>  8.11   0.00  93.41  91.90  65.89  69397 3076.694 16571 1579.930 4656.624 Allocation Failure   No GC
>  0.00   9.77  57.34  91.96  65.89  69398 3076.724 16571 1579.930 4656.654 Allocation Failure   No GC
>
> Full GC no longer seems to free any garbage (or is garbage being produced as
> fast as GC can free it?).
> On the other hand, the other replica of the collection on the other server
> (the collection has two replicas) uses 40% of the old-generation memory and
> doesn't trigger so many full GCs.
>
> Following is the output of the Eclipse MAT leak suspects report:
>
> Problem Suspect 1
>
> 4,741 instances of "org.apache.lucene.index.SegmentCoreReaders", loaded by
> "org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978" occupy
> 3,743,067,520 (64.12%) bytes. These instances are referenced from one
> instance of "java.lang.Object[]", loaded by "<system class loader>"
>
> Keywords
> java.lang.Object[]
> org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978
> org.apache.lucene.index.SegmentCoreReaders
>
> Details »
>
> Problem Suspect 2
>
> 2,815 instances of "org.apache.lucene.index.StandardDirectoryReader", loaded
> by "org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978" occupy
> 970,614,912 (16.63%) bytes. These instances are referenced from one instance
> of "java.lang.Object[]", loaded by "<system class loader>"
>
> Keywords
> java.lang.Object[]
> org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978
> org.apache.lucene.index.StandardDirectoryReader
>
> Details »
>
> Class structure in the above "Details":
>
> java.lang.Thread @XXX
>   <Java Local> java.util.ArrayList @XXXX
>     elementData java.lang.Object[3141] @XXXX
>       org.apache.lucene.search.FieldCache$CacheEntry @XXXX
>       org.apache.lucene.search.FieldCache$CacheEntry @XXXX
>       org.apache.lucene.search.FieldCache$CacheEntry @XXXX
>       ...
>       a lot of org.apache.lucene.search.FieldCache$CacheEntry
>       (1205 in Suspect 1, 2785 in Suspect 2)
>
> Is it normal to have this many org.apache.lucene.search.FieldCache$CacheEntry
> instances?
>
> Thanks.
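
(A possible way to cross-check the MAT numbers above from inside the running process: Lucene 4.x exposes the live cache entries through FieldCache.DEFAULT.getCacheEntries(). The sketch below is illustrative only; the FieldCache class and methods are the stock Lucene 4.4 API, but the surrounding class is made up, and it would have to run inside the Solr JVM, for example from a temporary custom request handler.)

    import org.apache.lucene.search.FieldCache;

    public class FieldCacheDump {
        // Prints every entry currently held by Lucene's default FieldCache.
        // This has to run inside the Solr JVM to see the same entries MAT reported.
        public static void dump() {
            FieldCache.CacheEntry[] entries = FieldCache.DEFAULT.getCacheEntries();
            System.out.println("FieldCache entries: " + entries.length);
            for (FieldCache.CacheEntry entry : entries) {
                // toString() includes the reader key, field name and cache type,
                // which is enough to spot entries tied to segments that should
                // already have been closed.
                System.out.println(entry);
            }
        }
    }

(If such a dump listed thousands of entries whose reader keys belong to segments that should long since have been closed, that would line up with the SegmentCoreReaders counts in Suspect 1, i.e. old searchers/readers not being released.)
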
>
>
>> On Dec 16, 2015, at 00:44, Erick Erickson <erickerick...@gmail.com> wrote:
>>
>> Rahul's comments were spot on. You can gain more confidence that this
>> is normal if you attach a memory reporting program (jconsole is one):
>> you'll see the memory grow for quite a while, then garbage collection
>> kicks in and you'll see it drop in a sawtooth pattern.
>>
>> Best,
>> Erick
>>
>> On Tue, Dec 15, 2015 at 8:19 AM, zhenglingyun <konghuaru...@163.com> wrote:
>>> Thank you very much.
>>> I will try reducing the heap memory and check whether the memory still
>>> keeps increasing or not.
>>>
>>>> On Dec 15, 2015, at 19:37, Rahul Ramesh <rr.ii...@gmail.com> wrote:
>>>>
>>>> You should actually decrease the Solr heap size. Let me explain a bit.
>>>>
>>>> Solr requires very little heap memory for its own operation and more
>>>> memory for keeping data in main memory. This is because Solr uses mmap
>>>> for accessing the index files.
>>>> Please check the link
>>>> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>>>> for understanding how Solr operates on files.
>>>>
>>>> Solr has a typical garbage collection problem once you set the heap size
>>>> to a large value: it will have indeterminate pauses due to GC. The amount
>>>> of heap memory required is difficult to tell. However, the way we tuned
>>>> this parameter is to set it to a low value and increase it by 1GB whenever
>>>> an OOM is thrown.
>>>>
>>>> Please check the problems of having a large Java heap:
>>>>
>>>> http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
>>>>
>>>> Just for your reference, in our production setup we have around 60GB of
>>>> data per node spread across 25 collections. We have configured 8GB as heap
>>>> and leave the rest of the memory for the OS to manage. We do around 1000
>>>> (searches + inserts)/second on the data.
>>>>
>>>> I hope this helps.
>>>>
>>>> Regards,
>>>> Rahul
>>>>
>>>> On Tue, Dec 15, 2015 at 4:33 PM, zhenglingyun <konghuaru...@163.com> wrote:
>>>>
>>>>> Hi, list
>>>>>
>>>>> I'm new to Solr. Recently I encountered a "memory leak" problem with
>>>>> SolrCloud.
>>>>>
>>>>> I have two 64GB servers running a SolrCloud cluster. In the SolrCloud I
>>>>> have one collection with about 400k docs. The index size of the
>>>>> collection is about 500MB. Memory for Solr is 16GB.
>>>>>
>>>>> Following is "ps aux | grep solr":
>>>>>
>>>>> /usr/java/jdk1.7.0_67-cloudera/bin/java
>>>>> -Djava.util.logging.config.file=/var/lib/solr/tomcat-deployment/conf/logging.properties
>>>>> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
>>>>> -Djava.net.preferIPv4Stack=true -Dsolr.hdfs.blockcache.enabled=true
>>>>> -Dsolr.hdfs.blockcache.direct.memory.allocation=true
>>>>> -Dsolr.hdfs.blockcache.blocksperbank=16384
>>>>> -Dsolr.hdfs.blockcache.slab.count=1 -Xms16608395264 -Xmx16608395264
>>>>> -XX:MaxDirectMemorySize=21590179840 -XX:+UseParNewGC
>>>>> -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled
>>>>> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
>>>>> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
>>>>> -Xloggc:/var/log/solr/gc.log
>>>>> -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh
>>>>> -DzkHost=bjzw-datacenter-hadoop-160.d.yourmall.cc:2181,bjzw-datacenter-hadoop-163.d.yourmall.cc:2181,bjzw-datacenter-hadoop-164.d.yourmall.cc:2181/solr
>>>>> -Dsolr.solrxml.location=zookeeper -Dsolr.hdfs.home=hdfs://datacenter/solr
>>>>> -Dsolr.hdfs.confdir=/var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/hadoop-conf
>>>>> -Dsolr.authentication.simple.anonymous.allowed=true
>>>>> -Dsolr.security.proxyuser.hue.hosts=*
>>>>> -Dsolr.security.proxyuser.hue.groups=*
>>>>> -Dhost=bjzw-datacenter-solr-15.d.yourmall.cc -Djetty.port=8983
>>>>> -Dsolr.host=bjzw-datacenter-solr-15.d.yourmall.cc -Dsolr.port=8983
>>>>> -Dlog4j.configuration=file:///var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/log4j.properties
>>>>> -Dsolr.log=/var/log/solr -Dsolr.admin.port=8984
>>>>> -Dsolr.max.connector.thread=10000 -Dsolr.solr.home=/var/lib/solr
>>>>> -Djava.net.preferIPv4Stack=true -Dsolr.hdfs.blockcache.enabled=true
>>>>> -Dsolr.hdfs.blockcache.direct.memory.allocation=true
>>>>> -Dsolr.hdfs.blockcache.blocksperbank=16384
>>>>> -Dsolr.hdfs.blockcache.slab.count=1 -Xms16608395264 -Xmx16608395264
>>>>> -XX:MaxDirectMemorySize=21590179840 -XX:+UseParNewGC
>>>>> -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled
>>>>> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
>>>>> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
>>>>> -Xloggc:/var/log/solr/gc.log
>>>>> -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh
>>>>> -DzkHost=bjzw-datacenter-hadoop-160.d.yourmall.cc:2181,bjzw-datacenter-hadoop-163.d.yourmall.cc:2181,bjzw-datacenter-hadoop-164.d.yourmall.cc:2181/solr
>>>>> -Dsolr.solrxml.location=zookeeper -Dsolr.hdfs.home=hdfs://datacenter/solr
>>>>> -Dsolr.hdfs.confdir=/var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/hadoop-conf
>>>>> -Dsolr.authentication.simple.anonymous.allowed=true
>>>>> -Dsolr.security.proxyuser.hue.hosts=*
>>>>> -Dsolr.security.proxyuser.hue.groups=*
>>>>> -Dhost=bjzw-datacenter-solr-15.d.yourmall.cc -Djetty.port=8983
>>>>> -Dsolr.host=bjzw-datacenter-solr-15.d.yourmall.cc -Dsolr.port=8983
>>>>> -Dlog4j.configuration=file:///var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/log4j.properties
>>>>> -Dsolr.log=/var/log/solr -Dsolr.admin.port=8984
>>>>> -Dsolr.max.connector.thread=10000 -Dsolr.solr.home=/var/lib/solr
>>>>> -Djava.endorsed.dirs=/usr/lib/bigtop-tomcat/endorsed -classpath
>>>>> /usr/lib/bigtop-tomcat/bin/bootstrap.jar
>>>>> -Dcatalina.base=/var/lib/solr/tomcat-deployment
>>>>> -Dcatalina.home=/usr/lib/bigtop-tomcat -Djava.io.tmpdir=/var/lib/solr/
>>>>> org.apache.catalina.startup.Bootstrap start
>>>>>
>>>>> The Solr version is solr4.4.0-cdh5.3.0.
>>>>> The JDK version is 1.7.0_67.
>>>>>
>>>>> Soft commit time is 1.5s, and we have a real-time indexing/partial-update
>>>>> rate of about 100 docs per second.
>>>>>
>>>>> When freshly started, Solr uses about 500M of memory (the memory shown in
>>>>> the Solr UI panel).
>>>>> After several days of running, Solr runs into long GC pauses and stops
>>>>> responding to user queries.
>>>>>
>>>>> While Solr is running, the memory it uses keeps increasing until it
>>>>> reaches some large value, then drops to a low level (because of GC),
>>>>> then keeps increasing to a larger value again, then drops to a low level
>>>>> again ... and keeps climbing to ever larger values ... until Solr stops
>>>>> responding and I restart it.
>>>>>
>>>>> I don't know how to solve this problem. Can you give me some advice?
>>>>>
>>>>> Thanks.
>>>>>
>>>
>
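
PS: following up on Erick's jconsole suggestion quoted above, a small JMX client can record the heap's low points over several days, which makes it easier to see whether the bottom of the sawtooth really keeps creeping up. This is only a sketch using the standard java.lang.management / javax.management.remote APIs: it assumes remote JMX is enabled on the Solr process, and the port and sampling interval are placeholders.

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryUsage;
    import javax.management.MBeanServerConnection;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class HeapLogger {
        public static void main(String[] args) throws Exception {
            // Attach to the Solr JVM over remote JMX. The port is a placeholder
            // and remote JMX has to be enabled on the Solr process for this to work.
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:18983/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection conn = connector.getMBeanServerConnection();
                MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                        conn, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
                for (int i = 0; i < 1440; i++) {   // one sample per minute, about a day
                    MemoryUsage heap = memory.getHeapMemoryUsage();
                    // The low points right after a collection are the "bottom of the
                    // sawtooth"; if they climb steadily over days, something is retained.
                    System.out.printf("used=%dMB committed=%dMB max=%dMB%n",
                            heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
                    Thread.sleep(60000L);
                }
            } finally {
                connector.close();
            }
        }
    }

(jstat -gcutil against the Solr pid gives much the same picture without any code; the point here is only to keep a long-running record of the post-GC lows.)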