Hi Shawn,

Yes, I am running Solr in cloud mode. Even after adding the parameters rows=0 and distrib=false, the query takes more than 15 seconds to respond, which I attribute to the index holding more than a billion documents. Also, the soft commit interval cannot be raised to a higher value because of a requirement from the business team.
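The zero-row, non-distributed query described above can be timed from the client and compared with the QTime that Solr reports in its responseHeader. A large gap between the two points at time spent outside the search itself (network, queueing, JVM pauses), and a large QTime points at the search. This is a minimal sketch that uses only the Python standard library. The function names are hypothetical, and the host and collection (`hostname`, `parts`) come from the URL in this message.

```python
import json
import time
import urllib.parse
import urllib.request


def build_select_url(base_url: str, collection: str) -> str:
    """Build the match-all, zero-row, non-distributed query discussed in the thread."""
    params = {
        "q": "*:*",          # match-all query, usually the cheapest option
        "rows": "0",         # return no documents, only the header and numFound
        "distrib": "false",  # query only the receiving core, not the whole collection
        "wt": "json",
    }
    return f"{base_url.rstrip('/')}/{collection}/select?{urllib.parse.urlencode(params)}"


def summarize(body: bytes, wall_seconds: float) -> dict:
    """Put Solr's own QTime (ms) next to the client-side wall-clock time (ms)."""
    data = json.loads(body)
    return {
        "numFound": data["response"]["numFound"],
        "qtime_ms": data["responseHeader"]["QTime"],
        "wall_ms": round(wall_seconds * 1000),
    }


def time_query(base_url: str, collection: str, timeout: float = 60.0) -> dict:
    """Send the query once and report both timings."""
    url = build_select_url(base_url, collection)
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read()
    return summarize(body, time.perf_counter() - start)
```

For example, `time_query("http://hostname:8983/solr", "parts")` sends one request and returns a dict with `numFound`, `qtime_ms`, and `wall_ms`. Running it several times in a row helps separate one-off pauses from consistently slow searches.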
This query always takes more than 10 seconds:

http://hostname:8983/solr/parts/select?indent=on&q=*:*&rows=0&wt=json&distrib=false

Here are my Java heap and G1GC settings:

    /usr/java/default/bin/java -server -Xmx31g -Xms31g -XX:+UseG1GC -XX:MaxGCPauseMillis=250 -XX:ConcGCThreads=5 -XX:ParallelGCThreads=10 -XX:+UseLargePages -XX:+AggressiveOpts -XX:+PerfDisableSharedMem -XX:+ParallelRefProcEnabled -XX:InitiatingHeapOccupancyPercent=50 -XX:G1ReservePercent=18 -XX:MaxNewSize=6G -XX:PrintFLSStatistics=1 -XX:+PrintPromotionFailure -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/solr7/logs/heapdump -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime

The JVM heap has never exceeded 20 GB in my setup, and young-generation G1GC pauses stay in the millisecond range (25-200 ms).

On Mon, Aug 5, 2019 at 6:37 PM Shawn Heisey <apa...@elyograg.org> wrote:
> On 8/4/2019 10:15 PM, dinesh naik wrote:
> > My question is regarding the custom query being used. Here I am querying
> > for field _root_, which is available in all of my cluster and defined as a
> > string field. The result for _root_:abc might not get me any match as
> > well (I am OK with not finding any matches; the query should not be taking
> > 10-15 seconds for getting the response).
>
> Typically the *:* query is the fastest option. It is special syntax
> that means "all documents" and it usually executes very quickly. It
> will be faster than querying for a value in a specific field, which is
> what you have defined currently.
>
> I will typically add a "rows" parameter to the ping handler with a value
> of 1, so Solr will not be retrieving a large amount of data. If you are
> running Solr in cloud mode, you should experiment with setting the
> distrib parameter to false, which will hopefully limit the query to the
> receiving node only.
>
> Erick has already mentioned GC pauses as a potential problem.
> With a 10-15 second response time, I think that has high potential to be the
> underlying cause.
>
> The response you included at the beginning of the thread indicates there
> are 1.3 billion documents, which is going to require a fair amount of
> heap memory. If seeing such long ping times with a *:* query is
> something that happens frequently, your heap may be too small, which
> will cause frequent full garbage collections.
>
> The very low autoSoftCommit time can contribute to system load. I think
> it's very likely, especially with such a large index, that in many cases
> those automatic commits are taking far longer than 5 seconds to
> complete. If that's the case, you're not achieving a 5 second
> visibility interval and you are putting a lot of load on Solr, so I
> would consider increasing it.
>
> Thanks,
> Shawn

--
Best Regards,
Dinesh Naik
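Shawn's GC-pause hypothesis can be checked against the GC output that the JVM flags in this thread already enable. The reported 25-200 ms young-collection times are a separate figure. Because -XX:+PrintGCApplicationStoppedTime reports every stop-the-world safepoint pause, not only young collections, the longest stopped time shows whether any pause comes close to the 10-15 second responses. This is a minimal sketch that assumes the JDK 8 line format shown in the comment. The function names and the 1-second default threshold are hypothetical choices for illustration.

```python
import re
from typing import Iterable, List

# Assumed JDK 8 line format from -XX:+PrintGCApplicationStoppedTime, e.g.
# "...: Total time for which application threads were stopped: 0.0123456 seconds, ..."
# Some locales print a comma as the decimal separator, so both are accepted.
STOPPED_RE = re.compile(
    r"Total time for which application threads were stopped: ([0-9]+[.,][0-9]+) seconds"
)


def stopped_pauses(lines: Iterable[str]) -> List[float]:
    """Return every stop-the-world pause (in seconds) found in the log lines."""
    pauses = []
    for line in lines:
        match = STOPPED_RE.search(line)
        if match:
            pauses.append(float(match.group(1).replace(",", ".")))
    return pauses


def pause_report(lines: Iterable[str], threshold_s: float = 1.0) -> dict:
    """Summarize pauses: count, longest, total, and how many exceed threshold_s."""
    pauses = stopped_pauses(lines)
    return {
        "count": len(pauses),
        "max_s": max(pauses, default=0.0),
        "total_s": sum(pauses),
        "over_threshold": sum(1 for p in pauses if p > threshold_s),
    }
```

For example, `pause_report(open(path))`, with `path` pointing at wherever this JVM's GC output is written, shows whether any single pause is in the multi-second range. If the longest pauses line up in time with the slow queries, that supports the GC explanation. If they stay in the millisecond range, the time is more likely spent in the search or in commit-related work.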