On 3/25/2018 7:15 AM, Deepak Goel wrote:
$ Why is the 'qps' not increasing with an increase in threads? (If I
understand the qps parameter correctly?)

Likely because I sent all these queries to a single copy of the index.  We only have two copies of the index in production, plus a third copy on a dev server running a newer version of Solr. I sent the queries from the test program to the production server pair that's designated "standby" -- not receiving queries unless the other pair is down.

Our Solr servers do not handle a high query load.  It's usually less than two queries per second.

Handling a very high query load requires load balancing to multiple copies of the index (replicas in SolrCloud terminology). We don't need that, so we don't have a bunch of copies.  The only reason we have two copies is so we can handle hardware failure gracefully.  I bypassed the load balancer for these tests.
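
To illustrate what bypassing the balancer means here, a rough sketch in Python (standard library only; the host names and core name below are made up, not our real ones):

import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Going through the balancer would spread queries across both copies.
# Pointing at one backend directly, as these tests did, measures what
# a single copy of the index can do on its own.
BALANCED = "http://lb.example.com:8983/solr/mycore/select"   # hypothetical balanced endpoint
DIRECT = "http://idxa1.example.com:8983/solr/mycore/select"  # one backend, no balancing

def num_found(base_url, q):
    params = urlencode({"q": q, "rows": 10, "wt": "json"})
    with urlopen(base_url + "?" + params) as resp:
        return json.load(resp)["response"]["numFound"]

print(num_found(DIRECT, "banjo"))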

$ Is it possible to run with 10 & 5 & 2 threads?

Sure.

I have updated the gist with those results.

https://gist.github.com/elyograg/abedf4ae28467059e46781f7d474f379

$ What was the server utilisation (CPU, memory) when you ran the test?

I actually never looked while I was running the earlier tests, so I ran additional tests to gather that data.  The updated gist has server-side vmstat output, captured during a 20-thread test and again during a 200-thread test.  The server named idxa1 shows a higher CPU load because it aggregates the shard data and builds the query responses, in addition to serving three of the seven shards.  The server named idxa2 has four shards.  The extra shard on idxa2 is very small - a little over 321000 docs and a little over 500MB of disk - and it is where new docs are written.
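
In case it's useful: capturing that data was nothing fancy.  A sketch of the equivalent (not the exact invocation I used):

import subprocess

# Sample system-wide stats once per second for 30 samples while the
# benchmark runs; vmstat's us/sy/id columns under 'cpu' give the CPU
# picture, and the free/cache columns cover memory.
subprocess.run(["vmstat", "1", "30"])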

The CPU load on idxa2 is similar at both thread levels.  I think this is because all queries are served from cache.  But idxa1 shows a higher load: even when the cache is used, that server must still aggregate the shard data (which was pulled from cache) and create the responses.  The aggregation step is not cached, because Solr has no way to know that what it receives from the shards is cached data.
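
For anyone who hasn't seen a distributed query up close, this is roughly what the fan-out looks like from the client side.  The 'shards' parameter is real Solr syntax; the host and core names are invented for the example:

import json
from urllib.parse import urlencode
from urllib.request import urlopen

# The node that receives this request sends the query to every shard
# listed, merges the per-shard top-N hits by score, and builds the
# final response.  Each shard's hits may come from its cache, but the
# merge itself happens fresh on every request.
shards = ",".join([
    "idxa1.example.com:8983/solr/shard1",  # hypothetical shard layout
    "idxa1.example.com:8983/solr/shard2",
    "idxa2.example.com:8983/solr/shard3",
])
params = urlencode({"q": "banjo", "rows": 10, "wt": "json", "shards": shards})
with urlopen("http://idxa1.example.com:8983/solr/shard1/select?" + params) as resp:
    print(json.load(resp)["response"]["numFound"])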

Here's the benchmark output from the 200-thread test during which I gathered the CPU information:

query count: 200000
elapsed count: 200000
query median: 488.0
elapsed median: 500.0
query 75th: 674.0
elapsed 75th: 686.0
query 95th: 1006.0
elapsed 95th: 1018.0
query 99th: 1283.01
elapsed 99th: 1299.0
total time in seconds: 542
numThreads: 200
queries per thread: 1000
qps: 369
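
For what it's worth, those numbers are just arithmetic over the raw per-query timings.  This is not the actual test program, only a sketch of the calculations it implies (the percentile method here is nearest-rank, which may differ slightly from what the real program does):

import statistics

def report(latencies_ms, total_seconds):
    # Summarize raw per-query latencies the way the output above does.
    ls = sorted(latencies_ms)
    pick = lambda p: ls[int(p / 100.0 * (len(ls) - 1))]  # nearest-rank percentile
    print("query count:", len(ls))
    print("query median:", statistics.median(ls))
    print("query 75th:", pick(75))
    print("query 95th:", pick(95))
    print("query 99th:", pick(99))
    print("total time in seconds:", total_seconds)
    print("qps: %d" % (len(ls) / total_seconds))  # 200000 / 542 ~= 369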

$ The 'query median' increases from 35 to 470 as you increase threads from
20 to 200 (You had mentioned earlier that QTime for the Banjo query was 11
when you had hit it the second time around)

When I got 11 ms, that was a *single* query.  This program runs a lot of them, so I'm not surprised by the increase.  I did the one-off queries on the dev server, not on the standby production servers that received the load test.  The hardware specs are similar, except that in dev the entire index lives on one server running Solr 6.6.2, and that server also hosts other indexes that are not handled by the production pair I used for the load test.
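
As a sanity check, the 200-thread numbers hang together under Little's law (in-flight requests = throughput x mean latency):

# 200 threads at 369 qps implies a mean round trip of about
# 200 / 369 = 0.542 s, which lines up with the elapsed percentiles
# above and with 542 total seconds for 1000 queries per thread.
threads, qps = 200, 369
print("implied mean latency: %.3f seconds" % (threads / qps))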

$ Can you please give Linux server configuration if possible?

What *exactly* are you looking for here?  I've got some information below, but I do not know if it's what you are after.

High level, first server (idxa1):
Dell PowerEdge 2950 III
Two 4-core CPUs.
model name      : Intel(R) Xeon(R) CPU           E5440  @ 2.83GHz
64GB memory
Solr is version 4.7.2, with an 8GB heap
About 140GB of index data
CentOS 6, kernel 2.6.32-431.11.2.el6.centos.plus.x86_64
Oracle Java:
java version "1.7.0_72"
Java(TM) SE Runtime Environment (build 1.7.0_72-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.72-b04, mixed mode)

Differences on the second server (idxa2):
model name      : Intel(R) Xeon(R) CPU           E5420  @ 2.50GHz
Slightly more index data (about 500MB more).
Kernel 2.6.32-504.12.2.el6.centos.plus.x86_64

The whole production index is in the ballpark of 280GB and contains over 187 million docs.  The dev server has more than 188 million docs.  I think the counts differ because we very recently deleted a bunch of data from the database but skipped the corresponding update of the Solr index.  The production indexes have been rebuilt since the delete, but the dev index hasn't.

The network between the client running the test and the Solr servers includes a layer 3 switch, some layer 2 switches, and a firewall.  All network hardware is made by Cisco.  The entire path (including the firewall) is gigabit.

Thanks,
Shawn
