Hi, I am trying to benchmark Solr query performance and to measure the maximum throughput and the latency I can get from a given Solr cluster.
My configuration is as follows:

Number of Solr nodes: 4
Number of shards: 2
replication-factor: 2
Index size: 55 GB
Shard/core size: 27.7 GB
maxConnsPerHost: 1000

The Solr nodes are VMs with 16 vCPU cores and 112 GB RAM. The CPU is mapped 1-1 and is not overcommitted.

I am generating query load with a Java client program that fires Solr queries read from a static file. The client uses the Apache HttpClient library to invoke the queries, and I have configured it to create at most 300 connections. The queries mostly follow this pattern:

q=*:*&fl=orderNo,purchaseOrderNos,timestamp,eventName,eventID,_src_&fq=((orderNo:<orderNoValue>+AND+purchaseOrderNos:<purchaseOrderNoValue>)+OR+(+orderNo:<orderNoValue>)+OR+(+purchaseOrderNos:<purchaseOrderNoValue>))&sort=eventTimestamp+desc&rows=20&wt=javabin&version=2

Results so far:

Max throughput: 12,000 to 12,500 reqs/sec
95th-percentile query latency: 30 to 40 ms

I am measuring latency and throughput on the client side in my program. The maximum throughput I can reach (the sum of the individual clients' throughput) is 12,000 reqs/sec, running 4 clients with 50 threads each. Even if I increase the number of clients, the throughput stays the same, so it seems I am hitting either the maximum capacity of the cluster or some other limit that prevents me from putting more stress on the servers.

CPU usage peaks at 60% to 70%, and I have not been able to push it higher by adding client threads or by generating load from more client nodes. Memory usage is around 16% on all nodes except one, where it is 41%. There is hardly any I/O since this is a read-only test. What is limiting my throughput? Is there some internal thread-pool limit I am hitting that keeps me from driving CPU/memory usage higher? My JVM settings are provided below.
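For reference, the measurement logic in my client is roughly like the sketch below. This is a simplified stand-in, not my actual code: it uses the JDK's built-in java.net.http.HttpClient instead of Apache HttpClient so it is self-contained, and the host, collection, query-file name, and thread count are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class QueryLoadClient {

    // Nearest-rank percentile over the recorded per-request latencies.
    static long percentile(long[] latenciesMs, double p) {
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) throws Exception {
        // Placeholder host/collection/file names -- not my real setup.
        String baseUrl = "http://solr-host:8983/solr/orders/select";
        List<String> queries = Files.readAllLines(Paths.get("queries.txt"));
        int threads = 50; // 50 worker threads per client process

        HttpClient http = HttpClient.newHttpClient();
        long[] latencies = new long[queries.size()];
        AtomicInteger next = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                int i;
                // Each worker pulls the next unprocessed query until the file is exhausted.
                while ((i = next.getAndIncrement()) < queries.size()) {
                    long t0 = System.nanoTime();
                    try {
                        HttpRequest req = HttpRequest
                                .newBuilder(URI.create(baseUrl + "?" + queries.get(i)))
                                .GET().build();
                        http.send(req, HttpResponse.BodyHandlers.discarding());
                    } catch (Exception e) {
                        // The real client should count errors separately.
                    }
                    latencies[i] = (System.nanoTime() - t0) / 1_000_000;
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);

        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("throughput: %.0f reqs/sec, p95 latency: %d ms%n",
                queries.size() / seconds, percentile(latencies, 95));
    }
}
```

Throughput here is total completed requests divided by wall-clock time, and the reported numbers above are the sum of this figure across the 4 client processes.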
I am using G1GC with the following settings:

-DSTOP.KEY=solrrocks -DSTOP.PORT=7983
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.port=13001
-Dcom.sun.management.jmxremote.rmi.port=13001 -Dcom.sun.management.jmxremote.ssl=false
-Djetty.home=/app/solr6/server -Djetty.port=8983
-Dlog4j.configuration=file:<log4j.properties file>
-Dsolr.autoSoftCommit.maxTime=5000 -Dsolr.autoSoftCommit.minTime=5000
-Dsolr.install.dir=/app/solr6 -Dsolr.log.dir=/app/solrdata6/logs
-Dsolr.log.muteconsole -Dsolr.solr.home=<solr home dir>
-Duser.timezone=UTC -DzkClientTimeout=15000 -DzkHost=<ZkHostName>
-XX:+AlwaysPreTouch -XX:+ResizeTLAB -XX:+UseG1GC -XX:+UseGCLogFileRotation
-XX:+UseLargePages -XX:+UseTLAB -XX:-UseBiasedLocking
-XX:GCLogFileSize=20M -XX:MaxGCPauseMillis=50 -XX:NumberOfGCLogFiles=9
-XX:OnOutOfMemoryError=/app/solr6/bin/oom_solr.sh
-Xloggc:<solr gc log file> -Xms11g -Xmx11g -Xss256k -verbose:gc

I have not customized the Solr cache values; documentCache, queryResultCache, fieldValueCache, etc. are all at their defaults. I read in one of the Solr performance documents that it is better to leave more memory to the operating system and make use of the OS buffer cache.

Is this the best query throughput I can extract from this combination of cluster size and index size? Any ideas are highly appreciated.

Thanks,
Suresh
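In case cache tuning turns out to be relevant: my understanding is that these sizes are configured in the <query> section of solrconfig.xml, along these lines (illustrative values only, not what I am running or recommending):

```xml
<!-- solrconfig.xml excerpt: example cache sizing (illustrative values only) -->
<query>
  <filterCache      class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>
  <queryResultCache class="solr.LRUCache"     size="512" initialSize="512" autowarmCount="32"/>
  <documentCache    class="solr.LRUCache"     size="512" initialSize="512" autowarmCount="0"/>
</query>
```

Given that my fq clauses contain per-order values that rarely repeat, I would not expect a large filterCache hit rate, which is partly why I left the defaults in place.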