Just now I see about 40 "Searcher@XXXX main" entries displayed in the Solr Web UI under collection -> Plugins/Stats -> CORE.
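
(Side note for anyone reproducing this: the same CORE stats can be pulled and counted outside the UI. The following is only a rough sketch; it assumes the /admin/mbeans endpoint with the cat/stats parameters that the Plugins/Stats page itself appears to use, and the host, port and collection name are placeholders.)

    import java.io.InputStream;
    import java.net.URL;
    import java.util.Scanner;

    public class SearcherStats {
        public static void main(String[] args) throws Exception {
            // Fetch the CORE category mbean stats, the same data the admin UI
            // shows under Plugins/Stats -> CORE (endpoint and params assumed).
            URL url = new URL("http://localhost:8983/solr/collection1/admin/mbeans"
                    + "?cat=CORE&stats=true&wt=json");
            try (InputStream in = url.openStream();
                 Scanner s = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
                String body = s.hasNext() ? s.next() : "";
                // Each registered searcher shows up as a "Searcher@... main" bean,
                // so counting that marker gives the number of open searchers.
                int count = body.split("Searcher@", -1).length - 1;
                System.out.println("registered searchers: " + count);
            }
        }
    }
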
I think this is abnormal. Soft commit is set to 1.5s, but warmupTime takes about 3s. Could that lead to so many searchers? maxWarmingSearchers is set to 4 in my solrconfig.xml; shouldn't that prevent Solr from creating more than 4 searchers?

> On Dec 21, 2015, at 14:43, zhenglingyun <konghuaru...@163.com> wrote:
>
> Thanks, Erick, for pointing out that the memory changes in a sawtooth pattern.
> The problem that troubles me is that the bottom point of the sawtooth keeps
> increasing. And when the used capacity of the old generation exceeds the
> threshold set by CMS's CMSInitiatingOccupancyFraction, GC keeps running and
> uses a lot of CPU cycles, but the used old-generation memory does not decrease.
>
> After taking Rahul's advice, I decreased Xms and Xmx from 16G to 8G, and
> changed the JVM parameters from
>
>     -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
>     -XX:-CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=70
>     -XX:+CMSParallelRemarkEnabled
>
> to
>
>     -XX:NewRatio=3
>     -XX:SurvivorRatio=4
>     -XX:TargetSurvivorRatio=90
>     -XX:MaxTenuringThreshold=8
>     -XX:+UseConcMarkSweepGC
>     -XX:+UseParNewGC
>     -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4
>     -XX:+CMSScavengeBeforeRemark
>     -XX:PretenureSizeThreshold=64m
>     -XX:+UseCMSInitiatingOccupancyOnly
>     -XX:CMSInitiatingOccupancyFraction=50
>     -XX:CMSMaxAbortablePrecleanTime=6000
>     -XX:+CMSParallelRemarkEnabled
>     -XX:+ParallelRefProcEnabled
>     -XX:-CMSConcurrentMTEnabled
>
> which is taken from bin/solr.in.sh.
> I hope this can reduce GC pause times and the number of full GCs, and maybe
> the memory-growth problem will disappear if I'm lucky.
>
> After several days of running, the memory on one of my two servers increased
> to 90% again...
> (When Solr is started, the memory used by Solr is less than 1G.)
>
> Following is the output of jstat -gccause -h5 <pid> 1000:
>
>   S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
>  9.56   0.00   8.65  91.31  65.89  69379 3076.096 16563 1579.639 4655.735 Allocation Failure   No GC
>  9.56   0.00  51.10  91.31  65.89  69379 3076.096 16563 1579.639 4655.735 Allocation Failure   No GC
>  0.00   9.23  10.23  91.35  65.89  69380 3076.135 16563 1579.639 4655.774 Allocation Failure   No GC
>  7.90   0.00   9.74  91.39  65.89  69381 3076.165 16564 1579.683 4655.848 CMS Final Remark     No GC
>  7.90   0.00  67.45  91.39  65.89  69381 3076.165 16564 1579.683 4655.848 CMS Final Remark     No GC
>   S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
>  0.00   7.48  16.18  91.41  65.89  69382 3076.200 16565 1579.707 4655.908 CMS Initial Mark     No GC
>  0.00   7.48  73.77  91.41  65.89  69382 3076.200 16565 1579.707 4655.908 CMS Initial Mark     No GC
>  8.61   0.00  29.86  91.45  65.89  69383 3076.228 16565 1579.707 4655.936 Allocation Failure   No GC
>  8.61   0.00  90.16  91.45  65.89  69383 3076.228 16565 1579.707 4655.936 Allocation Failure   No GC
>  0.00   7.46  47.89  91.46  65.89  69384 3076.258 16565 1579.707 4655.966 Allocation Failure   No GC
>   S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
>  8.67   0.00  11.98  91.49  65.89  69385 3076.287 16565 1579.707 4655.995 Allocation Failure   No GC
>  0.00  11.76   9.24  91.54  65.89  69386 3076.321 16566 1579.759 4656.081 CMS Final Remark     No GC
>  0.00  11.76  64.53  91.54  65.89  69386 3076.321 16566 1579.759 4656.081 CMS Final Remark     No GC
>  7.25   0.00  20.39  91.57  65.89  69387 3076.358 16567 1579.786 4656.144 CMS Initial Mark     No GC
>  7.25   0.00  81.56  91.57  65.89  69387 3076.358 16567 1579.786 4656.144 CMS Initial Mark     No GC
>   S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
>  0.00   8.05  34.42  91.60  65.89  69388 3076.391 16567 1579.786 4656.177 Allocation Failure   No GC
>  0.00   8.05  84.17  91.60  65.89  69388 3076.391 16567 1579.786 4656.177 Allocation Failure   No GC
>  8.54   0.00  55.14  91.62  65.89  69389 3076.420 16567 1579.786 4656.205 Allocation Failure   No GC
>  0.00   7.74  12.42  91.66  65.89  69390 3076.456 16567 1579.786 4656.242 Allocation Failure   No GC
>  9.60   0.00  11.00  91.70  65.89  69391 3076.492 16568 1579.841 4656.333 CMS Final Remark     No GC
>   S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
>  9.60   0.00  69.24  91.70  65.89  69391 3076.492 16568 1579.841 4656.333 CMS Final Remark     No GC
>  0.00   8.70  18.21  91.74  65.89  69392 3076.529 16569 1579.870 4656.400 CMS Initial Mark     No GC
>  0.00   8.70  61.92  91.74  65.89  69392 3076.529 16569 1579.870 4656.400 CMS Initial Mark     No GC
>  7.36   0.00   3.49  91.77  65.89  69393 3076.570 16569 1579.870 4656.440 Allocation Failure   No GC
>  7.36   0.00  42.03  91.77  65.89  69393 3076.570 16569 1579.870 4656.440 Allocation Failure   No GC
>   S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
>  0.00   9.77   0.00  91.80  65.89  69394 3076.604 16569 1579.870 4656.475 Allocation Failure   No GC
>  9.08   0.00   9.92  91.82  65.89  69395 3076.632 16570 1579.913 4656.545 CMS Final Remark     No GC
>  9.08   0.00  58.90  91.82  65.89  69395 3076.632 16570 1579.913 4656.545 CMS Final Remark     No GC
>  0.00   8.44  16.20  91.86  65.89  69396 3076.664 16571 1579.930 4656.594 CMS Initial Mark     No GC
>  0.00   8.44  71.95  91.86  65.89  69396 3076.664 16571 1579.930 4656.594 CMS Initial Mark     No GC
>   S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
>  8.11   0.00  30.59  91.90  65.89  69397 3076.694 16571 1579.930 4656.624 Allocation Failure   No GC
>  8.11   0.00  93.41  91.90  65.89  69397 3076.694 16571 1579.930 4656.624 Allocation Failure   No GC
>  0.00   9.77  57.34  91.96  65.89  69398 3076.724 16571 1579.930 4656.654 Allocation Failure   No GC
>
> Full GC no longer seems to free any garbage (or is garbage being produced as
> fast as GC can free it?).
> On the other hand, the other replica of the collection on the other server
> (the collection has two replicas) uses 40% of the old-generation memory and
> doesn't trigger so many full GCs.
>
> Following is the output of the Eclipse MAT leak suspects report:
>
> Problem Suspect 1
>
> 4,741 instances of "org.apache.lucene.index.SegmentCoreReaders", loaded by
> "org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978" occupy
> 3,743,067,520 (64.12%) bytes. These instances are referenced from one
> instance of "java.lang.Object[]", loaded by "<system class loader>"
>
> Keywords
> java.lang.Object[]
> org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978
> org.apache.lucene.index.SegmentCoreReaders
>
> Details »
>
> Problem Suspect 2
>
> 2,815 instances of "org.apache.lucene.index.StandardDirectoryReader", loaded
> by "org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978" occupy
> 970,614,912 (16.63%) bytes. These instances are referenced from one instance
> of "java.lang.Object[]", loaded by "<system class loader>"
>
> Keywords
> java.lang.Object[]
> org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978
> org.apache.lucene.index.StandardDirectoryReader
>
> Details »
>
> Class structure in the above "Details":
>
> java.lang.Thread @XXX
>   <Java Local> java.util.ArrayList @XXXX
>     elementData java.lang.Object[3141] @XXXX
>       org.apache.lucene.search.FieldCache$CacheEntry @XXXX
>       org.apache.lucene.search.FieldCache$CacheEntry @XXXX
>       org.apache.lucene.search.FieldCache$CacheEntry @XXXX
>       ...
>       a lot of org.apache.lucene.search.FieldCache$CacheEntry
>       (1205 in Suspect 1, 2785 in Suspect 2)
>
> Is it normal to have this many org.apache.lucene.search.FieldCache$CacheEntry
> instances?
>
> Thanks.
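
(A possible way to cross-check the MAT numbers above from inside the running process: Lucene 4.x exposes the live cache entries through FieldCache.DEFAULT.getCacheEntries(). The sketch below is illustrative only; the FieldCache class and methods are the stock Lucene 4.4 API, but the surrounding class is made up, and it would have to run inside the Solr JVM, for example from a temporary custom request handler.)

    import org.apache.lucene.search.FieldCache;

    public class FieldCacheDump {
        // Prints every entry currently held by Lucene's default FieldCache.
        // This has to run inside the Solr JVM to see the same entries MAT reported.
        public static void dump() {
            FieldCache.CacheEntry[] entries = FieldCache.DEFAULT.getCacheEntries();
            System.out.println("FieldCache entries: " + entries.length);
            for (FieldCache.CacheEntry entry : entries) {
                // toString() includes the reader key, field name and cache type,
                // which is enough to spot entries tied to segments that should
                // already have been closed.
                System.out.println(entry);
            }
        }
    }

(If such a dump listed thousands of entries whose reader keys belong to segments that should long since have been closed, that would line up with the SegmentCoreReaders counts in Suspect 1, i.e. old searchers/readers not being released.)
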
>
>
>> On Dec 16, 2015, at 00:44, Erick Erickson <erickerick...@gmail.com> wrote:
>>
>> Rahul's comments were spot on. You can gain more confidence that this
>> is normal if you attach a memory reporting program (jconsole is one):
>> you'll see the memory grow for quite a while, then garbage collection
>> kicks in and you'll see it drop in a sawtooth pattern.
>>
>> Best,
>> Erick
>>
>> On Tue, Dec 15, 2015 at 8:19 AM, zhenglingyun <konghuaru...@163.com> wrote:
>>> Thank you very much.
>>> I will try reducing the heap memory and check whether the memory still
>>> keeps increasing or not.
>>>
>>>> On Dec 15, 2015, at 19:37, Rahul Ramesh <rr.ii...@gmail.com> wrote:
>>>>
>>>> You should actually decrease the Solr heap size. Let me explain a bit.
>>>>
>>>> Solr requires very little heap memory for its own operation and more
>>>> memory for keeping data in main memory. This is because Solr uses mmap
>>>> for accessing the index files.
>>>> Please check the link
>>>> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>>>> for understanding how Solr operates on files.
>>>>
>>>> Solr has a typical garbage collection problem once you set the heap size
>>>> to a large value: it will have indeterminate pauses due to GC. The amount
>>>> of heap memory required is difficult to tell. However, the way we tuned
>>>> this parameter is to set it to a low value and increase it by 1GB whenever
>>>> an OOM is thrown.
>>>>
>>>> Please check the problems of having a large Java heap:
>>>>
>>>> http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
>>>>
>>>> Just for your reference, in our production setup we have around 60GB of
>>>> data per node spread across 25 collections. We have configured 8GB as heap
>>>> and leave the rest of the memory for the OS to manage. We do around 1000
>>>> (searches + inserts)/second on the data.
>>>>
>>>> I hope this helps.
>>>>
>>>> Regards,
>>>> Rahul
>>>>
>>>> On Tue, Dec 15, 2015 at 4:33 PM, zhenglingyun <konghuaru...@163.com> wrote:
>>>>
>>>>> Hi, list
>>>>>
>>>>> I'm new to Solr. Recently I encountered a "memory leak" problem with
>>>>> SolrCloud.
>>>>>
>>>>> I have two 64GB servers running a SolrCloud cluster. In the SolrCloud I
>>>>> have one collection with about 400k docs. The index size of the
>>>>> collection is about 500MB. Memory for Solr is 16GB.
>>>>>
>>>>> Following is "ps aux | grep solr":
>>>>>
>>>>> /usr/java/jdk1.7.0_67-cloudera/bin/java
>>>>> -Djava.util.logging.config.file=/var/lib/solr/tomcat-deployment/conf/logging.properties
>>>>> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
>>>>> -Djava.net.preferIPv4Stack=true -Dsolr.hdfs.blockcache.enabled=true
>>>>> -Dsolr.hdfs.blockcache.direct.memory.allocation=true
>>>>> -Dsolr.hdfs.blockcache.blocksperbank=16384
>>>>> -Dsolr.hdfs.blockcache.slab.count=1 -Xms16608395264 -Xmx16608395264
>>>>> -XX:MaxDirectMemorySize=21590179840 -XX:+UseParNewGC
>>>>> -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled
>>>>> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
>>>>> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
>>>>> -Xloggc:/var/log/solr/gc.log
>>>>> -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh
>>>>> -DzkHost=bjzw-datacenter-hadoop-160.d.yourmall.cc:2181,bjzw-datacenter-hadoop-163.d.yourmall.cc:2181,bjzw-datacenter-hadoop-164.d.yourmall.cc:2181/solr
>>>>> -Dsolr.solrxml.location=zookeeper -Dsolr.hdfs.home=hdfs://datacenter/solr
>>>>> -Dsolr.hdfs.confdir=/var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/hadoop-conf
>>>>> -Dsolr.authentication.simple.anonymous.allowed=true
>>>>> -Dsolr.security.proxyuser.hue.hosts=*
>>>>> -Dsolr.security.proxyuser.hue.groups=*
>>>>> -Dhost=bjzw-datacenter-solr-15.d.yourmall.cc -Djetty.port=8983
>>>>> -Dsolr.host=bjzw-datacenter-solr-15.d.yourmall.cc -Dsolr.port=8983
>>>>> -Dlog4j.configuration=file:///var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/log4j.properties
>>>>> -Dsolr.log=/var/log/solr -Dsolr.admin.port=8984
>>>>> -Dsolr.max.connector.thread=10000 -Dsolr.solr.home=/var/lib/solr
>>>>> -Djava.net.preferIPv4Stack=true -Dsolr.hdfs.blockcache.enabled=true
>>>>> -Dsolr.hdfs.blockcache.direct.memory.allocation=true
>>>>> -Dsolr.hdfs.blockcache.blocksperbank=16384
>>>>> -Dsolr.hdfs.blockcache.slab.count=1 -Xms16608395264 -Xmx16608395264
>>>>> -XX:MaxDirectMemorySize=21590179840 -XX:+UseParNewGC
>>>>> -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled
>>>>> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
>>>>> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
>>>>> -Xloggc:/var/log/solr/gc.log
>>>>> -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh
>>>>> -DzkHost=bjzw-datacenter-hadoop-160.d.yourmall.cc:2181,bjzw-datacenter-hadoop-163.d.yourmall.cc:2181,bjzw-datacenter-hadoop-164.d.yourmall.cc:2181/solr
>>>>> -Dsolr.solrxml.location=zookeeper -Dsolr.hdfs.home=hdfs://datacenter/solr
>>>>> -Dsolr.hdfs.confdir=/var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/hadoop-conf
>>>>> -Dsolr.authentication.simple.anonymous.allowed=true
>>>>> -Dsolr.security.proxyuser.hue.hosts=*
>>>>> -Dsolr.security.proxyuser.hue.groups=*
>>>>> -Dhost=bjzw-datacenter-solr-15.d.yourmall.cc -Djetty.port=8983
>>>>> -Dsolr.host=bjzw-datacenter-solr-15.d.yourmall.cc -Dsolr.port=8983
>>>>> -Dlog4j.configuration=file:///var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/log4j.properties
>>>>> -Dsolr.log=/var/log/solr -Dsolr.admin.port=8984
>>>>> -Dsolr.max.connector.thread=10000 -Dsolr.solr.home=/var/lib/solr
>>>>> -Djava.endorsed.dirs=/usr/lib/bigtop-tomcat/endorsed -classpath
>>>>> /usr/lib/bigtop-tomcat/bin/bootstrap.jar
>>>>> -Dcatalina.base=/var/lib/solr/tomcat-deployment
>>>>> -Dcatalina.home=/usr/lib/bigtop-tomcat -Djava.io.tmpdir=/var/lib/solr/
>>>>> org.apache.catalina.startup.Bootstrap start
>>>>>
>>>>> The Solr version is solr4.4.0-cdh5.3.0.
>>>>> The JDK version is 1.7.0_67.
>>>>>
>>>>> Soft commit time is 1.5s, and we have a real-time indexing/partial-update
>>>>> rate of about 100 docs per second.
>>>>>
>>>>> When freshly started, Solr uses about 500M of memory (the memory shown in
>>>>> the Solr UI panel).
>>>>> After several days of running, Solr runs into long GC pauses and stops
>>>>> responding to user queries.
>>>>>
>>>>> While Solr is running, the memory it uses keeps increasing until it
>>>>> reaches some large value, then drops to a low level (because of GC),
>>>>> then keeps increasing to a larger value again, then drops to a low level
>>>>> again ... and keeps climbing to ever larger values ... until Solr stops
>>>>> responding and I restart it.
>>>>>
>>>>> I don't know how to solve this problem. Can you give me some advice?
>>>>>
>>>>> Thanks.
>>>>>
>>>
>
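
PS: following up on Erick's jconsole suggestion quoted above, a small JMX client can record the heap's low points over several days, which makes it easier to see whether the bottom of the sawtooth really keeps creeping up. This is only a sketch using the standard java.lang.management / javax.management.remote APIs: it assumes remote JMX is enabled on the Solr process, and the port and sampling interval are placeholders.

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryUsage;
    import javax.management.MBeanServerConnection;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class HeapLogger {
        public static void main(String[] args) throws Exception {
            // Attach to the Solr JVM over remote JMX. The port is a placeholder
            // and remote JMX has to be enabled on the Solr process for this to work.
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:18983/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection conn = connector.getMBeanServerConnection();
                MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                        conn, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
                for (int i = 0; i < 1440; i++) {   // one sample per minute, about a day
                    MemoryUsage heap = memory.getHeapMemoryUsage();
                    // The low points right after a collection are the "bottom of the
                    // sawtooth"; if they climb steadily over days, something is retained.
                    System.out.printf("used=%dMB committed=%dMB max=%dMB%n",
                            heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
                    Thread.sleep(60000L);
                }
            } finally {
                connector.close();
            }
        }
    }

(jstat -gcutil against the Solr pid gives much the same picture without any code; the point here is only to keep a long-running record of the post-GC lows.)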