On Thu, 2020-07-02 at 11:16 +0000, Kommu, Vinodh K. wrote:
> We are performing QA performance testing on couple of collections
> which holds 2 billion and 3.5 billion docs respectively.
How many shards?

> 1. Our performance team noticed that read operations are pretty
> more than write operations like 100:1 ratio, is this expected during
> indexing or solr nodes are doing any other operations like syncing?

Are you saying that there are 100 times more read operations than
write operations while you are indexing? That does not sound too
unrealistic, as the disk cache might be filled with the data that the
writers are flushing. In that case, more RAM would help. Okay, more
RAM nearly always helps, but such a massive difference in
IO-utilization does indicate that you are starved for cache.

I noticed you have at least 18 replicas. That's a lot. Just to sanity
check: How many replicas is each physical box handling? If they are
sharing resources, fewer replicas would probably be better.

> 3. Our client timeout is set to 2mins, can they increase further
> more? Would that help or create any other problems?

It does not hurt the server to increase the client timeout: the
initiated query will keep running until it is finished, independent of
whether there is still a client around to receive the result. If you
want a real cap on query processing time, you should look at
timeAllowed:
https://lucene.apache.org/solr/guide/7_7/common-query-parameters.html#timeallowed-parameter
but due to its inherent limitations it might not help in your
situation.

> 4. When we created an empty collection and loaded same data file,
> it loaded fine without any issues so having more documents in a
> collection would create such problems?

Solr 7 does have a problem with sparse DocValues and many documents,
leading to excessive IO-activity, which might be what you are seeing.
I can see from an earlier post that you were using streaming
expressions for another collection: this is one of the things that are
affected by the Solr 7 DocValues issue.
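If re-indexing is an option for you, one way to avoid sparse DocValues is to make sure every document gets a value for every docValues field, for example with a schema default. A sketch (the field name and type here are made up, not from your schema):

```xml
<!-- schema.xml sketch: "last_updated" is a hypothetical field.
     A default means every document gets a value for the field,
     so its DocValues data is dense rather than sparse. -->
<field name="last_updated" type="plong" indexed="false" stored="false"
       docValues="true" default="0"/>
```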
More info about DocValues and streaming:
https://issues.apache.org/jira/browse/SOLR-13013

Fairly in-depth info on the problem with Solr 7 docValues:
https://issues.apache.org/jira/browse/LUCENE-8374

If this is your problem, upgrading to Solr 8 and indexing the
collection from scratch should fix it. Alternatively, you can port the
LUCENE-8374 patch from Solr 7.3 to 7.7, or you can ensure that there
are values defined for all docValues fields in all your documents.

> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method) ...
> Remote error message: java.util.concurrent.TimeoutException: Idle
> timeout expired: 600000/600000 ms

There is a default timeout of 10 minutes (distribUpdateSoTimeout?).
You should be able to change it in solr.xml:
https://lucene.apache.org/solr/guide/8_5/format-of-solr-xml.html

BUT if an update takes more than 10 minutes to be processed, it
indicates that the cluster is overloaded. Increasing the timeout is
just a band-aid.

- Toke Eskildsen, Royal Danish Library
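P.S. A sketch of how that idle timeout could be raised in solr.xml, if you do decide to go the band-aid route (the 600000 ms default matches the timeout in your stack trace; check the guide for your exact version before relying on the placement):

```xml
<solr>
  <solrcloud>
    <!-- Socket timeout for distributed updates, in milliseconds.
         600000 (10 minutes) is the default; raise with caution. -->
    <int name="distribUpdateSoTimeout">${distribUpdateSoTimeout:600000}</int>
  </solrcloud>
</solr>
```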