Getting rid of group.truncate=true group.facet=true group=true group.field=edition group.limit=30 group.ngroups=true group.format=grouped
makes Solr behave again under normal load, but of course the results
are a bit messed up, with duplicates of a sort (the point of grouping is
collapsing product variants into one superproduct).

Cheers
Marek

> I'm attaching the gc log, looks ok at the beginning and then within 10
> minutes starts stopping everything for 20 sec or so.
> (sorry for pasting in, wouldn't get through as an attachment)
>
> Cheers
> Marek
>
> Hi,
> thanks for the quick response.
> We have meanwhile tried to remove the group.facet=true from the set of
> parameters and couldn't reproduce the problem using the same stress
> test, so I think 80% chance this is the root cause.
> We have tried solr 6.4.1, same problem occurs.
> There is only a small number of documents (100 - 200K); the index on
> disk is only 544 MB. Very low traffic - definitely less than 10 qps.
> No OOM errors.
> GC log shows
> 2017-03-04T18:40:48.056+0000: 407.215: Total time for which application
> threads were stopped: 19.2522394 seconds, Stopping threads took:
> 0.0003600 seconds
> 2017-03-04T18:40:49.135+0000: 408.294: Total time for which application
> threads were stopped: 0.0256060 seconds, Stopping threads took:
> 0.0240290 seconds
> 2017-03-04T18:40:50.146+0000: 409.305: Total time for which application
> threads were stopped: 0.0106780 seconds, Stopping threads took:
> 0.0090890 seconds
>
> 19 seconds is worrying.
>
> I will try some traces when I'm not under stress test myself.
>
> Thanks
> Marek
>
>
>>> The "Unable to write response, client closed connection or we are
>>> shutting down" bits mean you're timing out. Or maybe something much
>>> more serious. You can up the timeouts, but that's not particularly
>>> useful since the response is so long anyway.
>>>
>>> Before jumping to conclusions, I'd _really_ recommend you figure out
>>> the root cause. First set up jmeter or the like so you can create a
>>> stress test and reproduce this at will on a test machine.
>>>
>>> Things I'd check:
>>>
>>>> At what point do things get slow? 10 QPS? 100 QPS? 1,000 QPS? Let's get a
>>>> benchmark here for a reality check. If you're throwing 1,000 QPS at a
>>>> single Solr instance that's simply unrealistic. 100 QPS/node is on the
>>>> high side of what I'd expect.
>>>> How many docs do you have on a node?
>>>> Look at your Solr logs for any anomalies, particularly OOM errors.
>>>> Turn on GC logs and see if you're spending an inordinate amount of time in
>>>> GC. Note you can get a clue if this is the issue by just increasing the
>>>> JVM heap as a quick test. Not conclusive, but if you give the app another
>>>> 4G and your timings change radically, problem identified.
>>>> That JIRA you pointed to is unlikely to be the real issue since your
>>>> performance is OK to start. It's still possible, but..
>>>> Attach a profiler to see where the time is being spent. Must be on a test
>>>> machine since profilers are generally intrusive.
>>>> Grab a couple of stack traces and see if that gives a clue.
>>> I really have to emphasize, though, that until you do a Root Cause
>>> Analysis, you're just guessing. Going to 6.4 and using JSON facets is a
>>> shot in the dark.
>>>
>>> Best,
>>> Erick
>>>
>>>
>>>
>>> On Sat, Mar 4, 2017 at 8:45 AM, Marek Tichy <ma...@gn.apc.org> wrote:
>>>> Hi,
>>>>
>>>> I'm in a bit of a crisis here. Trying to deploy a new search on an
>>>> ecommerce website which has been tested (but not stress tested). The
>>>> core has been running for years without any performance problems but we
>>>> have now changed two things:
>>>>
>>>> 1) started using group.facet=true in a rather complicated query - see below
>>>>
>>>> 2) added a new core with suggester component
>>>>
>>>> Solr version was 5.2, upgraded to 5.5.4 to try, no improvement.
>>>>
>>>> What happens under real load is the query response times start getting
>>>> higher (QTime > 10000 ms) and most requests end up like this:
>>>> org.apache.solr.servlet.HttpSolrCall; Unable to write response, client
>>>> closed connection or we are shutting down
>>>>
>>>> Could it be this issue https://issues.apache.org/jira/browse/SOLR-4763
>>>> ? And if so, would upgrading to 6.4 help or changing the app to start
>>>> using JSON.facet ?
>>>>
>>>> Any help would be greatly appreciated.
>>>>
>>>> Thanks
>>>>
>>>> Marek
>>>>
>>>>
>>>> INFO - 2017-03-04 16:04:42.619; [ x:kcore]
>>>> org.apache.solr.core.SolrCore; [kcore] webapp=/solr path=/select
>>>> params={f.ebook_formats.facet.mincount=1&f.languageid.facet.limit=10&f.ebook_formats.facet.limit=10&fq=((type:knihy)+OR+(type:defekty))&fq=authorid:(27544)&f.thematicgroupid.facet.mincount=1&group.ngroups=true&group.ngroups=true&f.type.facet.limit=10&group.facet=true&f.articleparts.facet.mincount=1&f.articleparts.facet.limit=10&group.field=edition&group=true&facet.field=categoryid&facet.field={!ex%3Dat}articletypeid_grouped&facet.field={!ex%3Dat}type&facet.field={!ex%3Dsw}showwindow&facet.field={!ex%3Dtema}thematicgroupid&facet.field={!ex%3Dformat}articleparts&facet.field={!ex%3Dformat}ebook_formats&facet.field={!ex%3Dlang}languageid&f.categoryid.facet.mincount=1&group.limit=30&start=0&f.type.facet.mincount=1&f.thematicgroupid.facet.limit=10&sort=score+desc&rows=12&version=2.2&f.languageid.facet.mincount=1&q=&group.truncate=false&group.format=grouped&f.showwindow.facet.mincount=1&f.articletypeid_grouped.facet.mincount=1&f.categoryid.facet.limit=100&f.showwindow.facet.limit=10&f.articletypeid_grouped.facet.limit=10&facet=true}
>>>> hits=1 status=0 QTime=19214
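
For anyone following up on the GC angle: the "application threads were
stopped" entries quoted in the thread can be pulled out of a full GC log
mechanically instead of by eye. What follows is only a minimal Python sketch,
not something posted in the thread. The function name long_pauses and the
1-second threshold are made up for illustration, and it assumes each entry
sits on a single line in the real log file (the mail client wrapped them in
the message above).

```python
import re

# Matches safepoint pause entries in the format quoted in the thread, e.g.
# "2017-03-04T18:40:48.056+0000: 407.215: Total time for which application
#  threads were stopped: 19.2522394 seconds, Stopping threads took: ..."
# Assumption: in the real log file each entry is on ONE line.
PAUSE_RE = re.compile(
    r"^(?P<stamp>\S+): (?P<uptime>\d+\.\d+): "
    r"Total time for which application threads were stopped: "
    r"(?P<stopped>\d+\.\d+) seconds"
)


def long_pauses(lines, threshold_s=1.0):
    """Return (timestamp, seconds) for every pause of at least threshold_s.

    threshold_s is an illustrative default, not a recommended value.
    Lines that are not pause entries are skipped silently.
    """
    hits = []
    for line in lines:
        match = PAUSE_RE.match(line.strip())
        if match is None:
            continue
        stopped = float(match.group("stopped"))
        if stopped >= threshold_s:
            hits.append((match.group("stamp"), stopped))
    return hits


# The three entries from the thread, rejoined onto single lines.
SAMPLE = [
    "2017-03-04T18:40:48.056+0000: 407.215: Total time for which application "
    "threads were stopped: 19.2522394 seconds, Stopping threads took: "
    "0.0003600 seconds",
    "2017-03-04T18:40:49.135+0000: 408.294: Total time for which application "
    "threads were stopped: 0.0256060 seconds, Stopping threads took: "
    "0.0240290 seconds",
    "2017-03-04T18:40:50.146+0000: 409.305: Total time for which application "
    "threads were stopped: 0.0106780 seconds, Stopping threads took: "
    "0.0090890 seconds",
]

for stamp, seconds in long_pauses(SAMPLE):
    print(f"{stamp} stopped {seconds:.2f}s")
# prints: 2017-03-04T18:40:48.056+0000 stopped 19.25s
```

To scan a real log, pass the open file object instead of SAMPLE, e.g.
long_pauses(open(path)) with path pointing at wherever your GC log lives.
Comparing the timestamps it reports against slow QTime entries in the Solr
log would show whether the long requests line up with the pauses.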