Hey Maulin,

I hope you are using some tool to look at your gc.log file (there are a couple
available online) or at least grepping for pauses -- quick sketch below. Do you
mind sharing your G1GC settings and some screenshots from your gc.log
analyzer's output?
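If you don't have an analyzer handy, here is a rough Python sketch for pulling
stop-the-world pause times out of a gc.log. It assumes the classic Java 8 /
-XX:+PrintGCDetails format that Solr 6.x uses, where each collection logs a
"[Times: user=... sys=..., real=N.NN secs]" line -- adjust the regex if your
log looks different:

import re
import sys

# Wall-clock pause time as reported on the "[Times: ... real=N.NN secs]" lines
# (classic Java 8 gc.log format; tweak for unified logging or other formats).
PAUSE_RE = re.compile(r"real=(\d+\.\d+) secs")

def pauses(path):
    with open(path, errors="replace") as f:
        for line in f:
            m = PAUSE_RE.search(line)
            if m:
                yield float(m.group(1))

if __name__ == "__main__":
    times = sorted(pauses(sys.argv[1]))
    if not times:
        sys.exit("no pause lines matched -- check the log format/regex")
    print("count=%d  max=%.2fs  p99=%.2fs  pauses>10s=%d" % (
        len(times),
        times[-1],
        times[int(len(times) * 0.99)],
        sum(t > 10 for t in times)))

Run it as "python gc_pauses.py gc.log"; if the max/p99 numbers really are in
the 60-200 sec range, that at least confirms the pauses are GC and not just
slow queries.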
-SG

On Wed, Jan 31, 2018 at 9:16 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> Just to double check, when you say you're seeing 60-200 sec GC pauses,
> are you looking at the GC logs (or using some kind of monitor) or is
> that the time it takes the query to respond to the client? Because a
> single GC pause that long on 40G is unusual no matter what. Another
> take on Jason's question is: for all the JVMs you're running, how much
> _total_ heap is allocated? And how much physical memory is on the box?
> I generally start with _at least_ half the memory left to the OS....
>
> These are fairly horrible, what generates such queries?
> AND * AND *
>
> Best,
> Erick
>
> On Wed, Jan 31, 2018 at 7:28 AM, Jason Gerlowski <gerlowsk...@gmail.com> wrote:
> > Hi Maulin,
> >
> > To clarify, when you said "...allocated 40 GB RAM to each shard."
> > above, I'm going to assume you meant "to each node" instead. If you
> > actually did mean "to each shard" above, please correct me and anyone
> > who chimes in afterward.
> >
> > Firstly, it's really hard to even take guesses about potential causes
> > or remediations without more details about your load characteristics
> > (average/peak QPS, index size, average document size, etc.). If no one
> > gives any satisfactory advice, please consider uploading additional
> > details to help us help you.
> >
> > Secondly, I don't know anything about the load characteristics you're
> > putting on your Solr cluster, but I'm curious whether you've
> > experimented with lower RAM settings. Generally speaking, the more RAM
> > you have, the longer your GC pauses are likely to be (even with the
> > tuning that various GC settings provide). If you can get away with
> > giving the Solr process less RAM, you should see your GC pauses
> > shrink. Was 40GB chosen after some trial-and-error experimentation, or
> > is it something you could investigate?
> >
> > For a bit more overview on this, see this slightly outdated (but still
> > useful) wiki page:
> > https://wiki.apache.org/solr/SolrPerformanceProblems#RAM
> >
> > Hope that helps, even if just to disqualify some potential
> > causes/solutions to close in on a real fix.
> >
> > Best,
> >
> > Jason
> >
> > On Wed, Jan 31, 2018 at 8:17 AM, Maulin Rathod <mrat...@asite.com> wrote:
> >
> >> Hi,
> >>
> >> We are using Solr Cloud 6.1. We have around 20 collections on 4 nodes
> >> (we have 2 shards and each shard has 2 replicas). We have allocated
> >> 40 GB RAM to each shard.
> >>
> >> Intermittently we see long GC pauses (60 sec to 200 sec) due to which
> >> Solr stops responding and hence collections go into recovery mode. It
> >> takes a minimum of 5-10 minutes (sometimes more, and we have to
> >> restart the Solr node) to recover all collections. We are using the
> >> default GC settings (CMS) as per solr.cmd.
> >>
> >> We tried different G1 GC settings to see if they help, but we still
> >> see long GC pauses (60 sec to 200 sec) and also found that memory
> >> usage is higher with G1 GC.
> >>
> >> What could be the reason for the long GC pauses and how can we fix
> >> it? Insufficient memory, a problem with the GC settings, or something
> >> else? Any suggestions would be greatly appreciated.
> >>
> >> In our analysis we also found some inefficient queries (which use *
> >> many times) in the Solr logs. Could they be a reason for the high
> >> memory usage?
> >>
> >> Slow Query
> >> --------------
> >>
> >> INFO (qtp1239731077-498778) [c:documents s:shard1 r:core_node1
> >> x:documents] o.a.s.c.S.Request [documents] webapp=/solr path=/select
> >> params={df=summary&distrib=false&fl=id&shards.purpose=4&start=0&fsv=true
> >> &sort=description+asc,id+desc&fq=&shard.url=s1.asite.com:8983/solr/documents|s1r1.asite.com:8983/solr/documents
> >> &rows=250&version=2
> >> &q=((id:(REV78364_24705418+REV78364_24471492+REV78364_24471429+REV78364_24470771+REV78364_24470271+))
> >> +OR+summary:((HPC*+AND+*+AND+*+AND+OH1150*+AND+*+AND+*+AND+U0*+AND+*+AND+*+AND+HGS*+AND+*+AND+*+AND+MDL*+AND+*+AND+*+AND+100067*+AND+*+AND+-*+AND+Reinforcement*+AND+*+AND+Mode*)+))
> >> ++AND++(title:((*HPC\+\-\+OH1150\+\-\+U0\+\-\+HGS\+\-\+MDL\+\-\+100067\+-\+Reinforcement\+Mode*)+))
> >> +AND+project_id:(-2+78243+78365+78364)+AND+is_active:true
> >> +AND+((isLatest:(true)+AND+isFolderActive:true+AND+isXref:false+AND+-document_type_id:(3+7)
> >> +AND+((is_public:true+OR+distribution_list:4858120+OR+folderadmin_list:4858120+OR+author_user_id:4858120)
> >> +AND+((defaultAccess:(true)+OR+allowedUsers:(4858120)+OR+allowedRoles:(6342201+172408+6336860)+OR+combinationUsers:(4858120))+AND+-blockedUsers:(4858120))))
> >> +OR+(isLatestRevPrivate:(true)+AND+allowedUsersForPvtRev:(4858120)+AND+-folderadmin_list:(4858120)))
> >> &shards.tolerant=true&NOW=1516786982952&isShard=true&wt=javabin}
> >> hits=0 status=0 QTime=83309
> >>
> >> Regards,
> >>
> >> Maulin
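One more thought on the summary/title clauses Erick called out above: every
bare * in there is an extra match-everything wildcard clause the node has to
evaluate, which burns CPU and heap for no gain. If those clauses come from
splitting user input on whitespace and AND-ing the tokens (just a guess on my
part -- the helper below is hypothetical, not from your code), it may be worth
dropping the bare * tokens before the query string is built:

# Hypothetical sketch: drop bare "*" tokens before building a summary clause.
# Assumes the app splits user input on whitespace and ANDs the tokens, which
# is only a guess at how clauses like "HPC* AND * AND * AND OH1150* ..." arise.
def build_summary_clause(user_input):
    tokens = [t for t in user_input.split() if t.strip("*")]
    if not tokens:
        return None  # nothing usable; skip the summary clause entirely
    return "summary:(" + " AND ".join(tokens) + ")"

# build_summary_clause("HPC* * * OH1150* * * U0*")
#   -> 'summary:(HPC* AND OH1150* AND U0*)'

Even with the bare * terms gone a query like that is still heavy, but it
should at least cut down on the work (and garbage) per request.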