For the SessionExpiredException: Solr throws this exception, and then the shard goes down.
From the following discussion, it seems that Solr is losing its connection to ZooKeeper and throwing the exception. In the ZooKeeper configuration file, zoo.cfg, is it safe to increase the syncLimit shown in the snippet below?

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/sanfs/mnt/vol01/solr/zookeeperdata/2
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60

Thanks,
Satya

On Mon, May 8, 2017 at 12:04 PM Satya Marivada <satya.chaita...@gmail.com> wrote:

> The 3g of memory is doing well, performing a GC at 600-700 MB.
>
> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
>
> Here are my JVM start-up parameters:
>
> java -server -Xms3g -Xmx3g -XX:NewRatio=3 -XX:SurvivorRatio=4
> -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8
> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ConcGCThreads=4
> -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark
> -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly
> -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000
> -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled
> -XX:-OmitStackTraceInFastThrow -verbose:gc -XX:+PrintHeapAtGC
> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
> -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
> -Xloggc:/sanfs/mnt/vol01/solr/solr-6.3.0/server/logs/solr_gc.log
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M
> -DzkClientTimeout=15000 .......
>
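On the syncLimit question, it may help to turn the zoo.cfg values above into wall-clock times. The Java sketch below is a hypothetical helper (the class and method names are made up for illustration). It assumes ZooKeeper's documented defaults of minSessionTimeout = 2 × tickTime and maxSessionTimeout = 20 × tickTime when neither is set in zoo.cfg, and that the server clamps the session timeout a client requests into that range. Under those assumptions, syncLimit (5 ticks = 10 s) bounds how far a follower may lag behind the leader inside the ensemble, while the Solr session length comes from -DzkClientTimeout=15000, which falls inside the 4 s to 40 s window.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: converts the zoo.cfg tick settings quoted above into milliseconds.
public class ZkTimeouts {
    // Values copied from the zoo.cfg snippet in this thread.
    static final int TICK_TIME_MS = 2000;
    static final int INIT_LIMIT_TICKS = 10;
    static final int SYNC_LIMIT_TICKS = 5;
    // Session timeout requested by Solr through -DzkClientTimeout=15000.
    static final int ZK_CLIENT_TIMEOUT_MS = 15000;

    // Assumed ZooKeeper defaults when minSessionTimeout/maxSessionTimeout are absent from zoo.cfg.
    static int minSessionTimeout(int tickTimeMs) {
        return 2 * tickTimeMs;
    }

    static int maxSessionTimeout(int tickTimeMs) {
        return 20 * tickTimeMs;
    }

    // Assumed server behaviour: the requested timeout is clamped into [min, max].
    static int negotiatedSessionTimeout(int requestedMs, int tickTimeMs) {
        int min = minSessionTimeout(tickTimeMs);
        int max = maxSessionTimeout(tickTimeMs);
        return Math.max(min, Math.min(max, requestedMs));
    }

    static Map<String, Integer> summary() {
        Map<String, Integer> m = new LinkedHashMap<>();
        // Time a follower has to connect to and sync with the leader.
        m.put("initLimitMs", INIT_LIMIT_TICKS * TICK_TIME_MS);
        // Time a follower may lag behind the leader (ensemble-internal, not the client session).
        m.put("syncLimitMs", SYNC_LIMIT_TICKS * TICK_TIME_MS);
        m.put("minSessionTimeoutMs", minSessionTimeout(TICK_TIME_MS));
        m.put("maxSessionTimeoutMs", maxSessionTimeout(TICK_TIME_MS));
        // Session timeout the Solr client would end up with.
        m.put("solrSessionTimeoutMs", negotiatedSessionTimeout(ZK_CLIENT_TIMEOUT_MS, TICK_TIME_MS));
        return m;
    }

    public static void main(String[] args) {
        summary().forEach((name, ms) -> System.out.println(name + " = " + ms + " ms"));
    }
}
```

If those assumptions hold, raising syncLimit would not lengthen the Solr client's ZooKeeper session; the setting that does is zkClientTimeout, capped by maxSessionTimeout on the ZooKeeper side.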
> On Mon, May 8, 2017 at 11:50 AM Walter Underwood <wun...@wunderwood.org> wrote:
>
>> Which garbage collector are you using? The default GC will probably give
>> long pauses.
>>
>> You need to use CMS or G1.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/ (my blog)
>>
>>> On May 8, 2017, at 8:48 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>>>
>>> 3G of memory should not lead to long GC pauses unless you're running
>>> very close to the edge of available memory. Paradoxically, running
>>> with 6G of memory may lead to _fewer_ noticeable pauses since the
>>> background threads can do the work, well, in the background.
>>>
>>> Best,
>>> Erick
>>>
>>> On Mon, May 8, 2017 at 7:29 AM, Satya Marivada <satya.chaita...@gmail.com> wrote:
>>>> Hi Piyush and Shawn,
>>>>
>>>> If the cause is long GC pauses, may I ask what the solution is? I
>>>> suspect the same problem in our case too. We started with 3G of
>>>> memory for the heap. Did you have to adjust the memory allotted?
>>>> Very much appreciated.
>>>>
>>>> Thanks,
>>>> Satya
>>>>
>>>> On Sat, May 6, 2017 at 12:36 PM Piyush Kunal <piyush.ku...@myntra.com> wrote:
>>>>> We faced this issue as well and found the cause to be long GC pauses,
>>>>> on either the client side or the server side.
>>>>> Regards,
>>>>> Piyush
>>>>>
>>>>> On Sat, May 6, 2017 at 6:10 PM, Shawn Heisey <apa...@elyograg.org> wrote:
>>>>>
>>>>>> On 5/3/2017 7:32 AM, Satya Marivada wrote:
>>>>>>> I sometimes see the exception below in my logs. What could be causing it?
>>>>>>>
>>>>>>> org.apache.zookeeper.KeeperException$SessionExpiredException:
>>>>>>
>>>>>> Based on my limited research, this would tend to indicate that the
>>>>>> heartbeats ZK uses to detect when sessions have gone inactive are not
>>>>>> occurring in a timely fashion.
>>>>>>
>>>>>> Common causes seem to be:
>>>>>>
>>>>>> JVM Garbage collections. These can cause the entire JVM to pause for an
>>>>>> extended period of time, and this time may exceed the configured timeouts.
>>>>>>
>>>>>> Excess client connections to ZK. ZK limits the number of connections
>>>>>> from each client address, with the idea of preventing denial of service
>>>>>> attacks. If a client is misbehaving, it may make more connections than
>>>>>> it should. You can try increasing the limit in the ZK config, but if
>>>>>> this is the reason for the exception, then something's probably wrong,
>>>>>> and you may be just hiding the real problem.
>>>>>>
>>>>>> Although we might have bugs causing the second situation, the first
>>>>>> situation seems more likely.
>>>>>>
>>>>>> Thanks,
>>>>>> Shawn
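To check whether GC pauses line up with the session expirations that Shawn and Piyush describe, the GC log already configured in the start-up parameters (-Xloggc together with -XX:+PrintGCApplicationStoppedTime) can be scanned for long stop-the-world pauses. This is a minimal sketch, not a Solr or ZooKeeper tool: it assumes the JDK 8 line format "Total time for which application threads were stopped: N seconds" with a '.' decimal separator, and the class name and the default 15000 ms threshold (borrowed from -DzkClientTimeout) are hypothetical choices. Per Piyush's note, it is worth running it against the GC logs of both the Solr nodes and the ZooKeeper servers.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds long stop-the-world pauses in a JDK 8 GC log.
public class GcPauseScan {
    // Line format assumed from -XX:+PrintGCApplicationStoppedTime on JDK 8.
    private static final Pattern STOPPED = Pattern.compile(
            "Total time for which application threads were stopped: ([0-9]+(?:\\.[0-9]+)?) seconds");

    // Returns every pause, in milliseconds, that is at least thresholdMs long.
    static List<Double> longPauses(List<String> lines, double thresholdMs) {
        List<Double> pauses = new ArrayList<>();
        for (String line : lines) {
            Matcher m = STOPPED.matcher(line);
            if (m.find()) {
                // The log reports seconds; convert to milliseconds.
                double ms = Double.parseDouble(m.group(1)) * 1000.0;
                if (ms >= thresholdMs) {
                    pauses.add(ms);
                }
            }
        }
        return pauses;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java GcPauseScan <gc-log-file> [thresholdMs]");
            return;
        }
        // Default threshold: the 15000 ms session timeout requested via -DzkClientTimeout.
        double thresholdMs = args.length > 1 ? Double.parseDouble(args[1]) : 15000.0;
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (double ms : longPauses(lines, thresholdMs)) {
            System.out.printf("pause of %.0f ms >= %.0f ms%n", ms, thresholdMs);
        }
    }
}
```

Passing a smaller threshold as the second argument shows how close the worst pauses come to the session timeout, even when none exceed it. With -XX:+UseGCLogFileRotation, each rotated file has to be scanned separately.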