Hi Vadim,

the thing is that those exact same queries that take longer during a load test perform just fine when executed at a slower request rate. The slow ones are also random, i.e. there is no pattern in the bad/slow queries.
My first thought was some kind of contention and/or connection starvation for the internal shard communication?

Fred.

On Wednesday, 28 September 2011 at 13:18, Vadim Kisselmann wrote:

> Hi Fred,
>
> analyze the queries which take longer.
> We observe our queries and see the problems with q-time with queries which
> are complex, with phrase queries or queries which contains numbers or
> special characters.
> if you don't know it:
> http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance
>
> Regards
> Vadim
>
>
> 2011/9/28 Frederik Kraus <frederik.kr...@gmail.com>
>
> > Hi,
> >
> > I am experiencing a strange issue doing some load tests. Our setup:
> >
> > - 2 server with each 24 cpu cores, 130GB of RAM
> > - 10 shards per server (needed for response times) running in a single
> > tomcat instance
> > - each query queries all 20 shards (distributed search)
> >
> > - each shard holds about 1.5 mio documents (small shards are needed due to
> > rather complex queries)
> > - all caches are warmed / high cache hit rates (99%) etc.
> >
> >
> > Now for some reason we cannot seem to fully utilize all CPU power (no disk
> > IO), ie. increasing concurrent users doesn't increase CPU-Load at a point,
> > decreases throughput and increases the response times of the individual
> > queries.
> >
> > Also 1-2% of the queries take significantly longer: avg somewhere at 100ms
> > while 1-2% take 1.5s or longer.
> >
> > Any ideas are greatly appreciated :)
> >
> > Fred.