On 12/3/2016 9:46 PM, S G wrote:
> The symptom we see is that the java clients querying Solr see response
> times in 10s of seconds (not milliseconds).
<snip>
> Some numbers for the Solr Cloud:
>
> *Overall infrastructure:*
> - Only one collection
> - 16 VMs used
> - 8 shards (1 leader and 1 replica per shard - each core on separate VM)
>
> *Overview from one core:*
> - Num Docs:193,623,388
> - Max Doc:230,577,696
> - Heap Memory Usage:231,217,880
> - Deleted Docs:36,954,308
> - Version:2,357,757
> - Segment Count:37

The heap memory usage number isn't useful. It doesn't cover all the
memory used.

> *Stats from QueryHandler/select*
> - requests:78,557
> - errors:358
> - timeouts:0
> - totalTime:1,639,975.27
> - avgRequestsPerSecond:2.62
> - 5minRateReqsPerSecond:1.39
> - 15minRateReqsPerSecond:1.64
> - avgTimePerRequest:20.87
> - medianRequestTime:0.70
> - 75thPcRequestTime:1.11
> - 95thPcRequestTime:191.76

These times are in *milliseconds*, not seconds ... and these are even
better numbers than you showed before. Where are you seeing 10 plus
second query times? Solr is not showing numbers like that.

If your VM host has 16 VMs on it and each one has a total memory size of
92GB, then if that machine doesn't have 1.5 terabytes of memory, you're
oversubscribed, and this is going to lead to terrible performance... but
the numbers you've shown here do not show terrible performance.

> Plus, on every server, we are seeing lots of exceptions.
> For example:
>
> Between 8:06:55 PM and 8:21:36 PM, exceptions are:
>
> 1) Request says it is coming from leader, but we are the leader:
> update.distrib=FROMLEADER&distrib.from=HOSTB_ca_1_1456430020/&wt=javabin&version=2
>
> 2) org.apache.solr.common.SolrException: Request says it is coming from
> leader, but we are the leader
>
> 3) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 4) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 5) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 6) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 7) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: No live SolrServers
> available to handle this request. Zombie server list:
> [HOSTA_ca_1_1456429897]
>
> 8) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: No live SolrServers
> available to handle this request. Zombie server list:
> [HOSTA_ca_1_1456429897]
>
> 9) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 10) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 11) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 12) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast

These errors sound like timeouts, possibly caused by long GC pauses ...
but as already mentioned, the query handler statistics do not indicate
long query times. If a long GC were to happen during a query, then the
query time would be long as well.

The core information above doesn't include the size of the index on
disk. That number would be useful for telling you whether there's enough
memory.

As I said at the beginning of the thread, I haven't seen anything here
to indicate a memory leak, and others are using version 4.10 without any
problems. If there were a memory leak in a released version of Solr,
many people would have run into problems with it.

Thanks,
Shawn
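
P.S. The arithmetic behind two of the points above can be checked with a
few lines of Python. This is only an illustrative sketch: the function
and variable names are made up for this example and are not part of
Solr, and the inputs are simply the figures quoted in this message. It
also assumes, as the paragraph above does, that all 16 VMs share one
host.

```python
# Figures copied from the QueryHandler/select stats quoted in this message.
requests = 78_557
total_time_ms = 1_639_975.27  # the handler reports times in milliseconds


def avg_request_ms(total_ms: float, count: int) -> float:
    """Mean time per request, in milliseconds."""
    return total_ms / count


def total_vm_memory_gb(vm_count: int, gb_per_vm: int) -> int:
    """Physical memory a single host needs so that no VM memory is oversubscribed."""
    return vm_count * gb_per_vm


avg_ms = avg_request_ms(total_time_ms, requests)
# Assumption: all 16 VMs live on one host and each is sized at 92GB.
needed_gb = total_vm_memory_gb(16, 92)

print(f"average request time: {avg_ms:.2f} ms")
print(f"memory needed for 16 VMs: {needed_gb} GB")
```

The average works out to roughly 20.9 ms, in line with the reported
avgTimePerRequest and nowhere near 10 seconds, and 16 x 92GB is 1472GB,
which is where the "1.5 terabytes" figure comes from.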