Varun Thacker created SOLR-14909:
------------------------------------

Summary: Add replica is very slow on a large cluster
Key: SOLR-14909
URL: https://issues.apache.org/jira/browse/SOLR-14909
Project: Solr
Issue Type: Task
Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: 7.6
Reporter: Varun Thacker

We create ~100 collections every day for new incoming data.

We first issue a create-collection request for each collection (4 shards and createNodeSet=empty). This creates collections with no replicas.

We then issue async add-replica calls for all the shards, creating 1 replica each. 100 collections x 4 shards = 400 add-replica calls. Every add-replica call passes the node parameter, telling Solr where the replica should be created.

The cluster currently has 190 nodes, and after we upgraded to Solr 7.7.3 we noticed that the add-replica calls took 2 hours and 45 minutes to complete! Clearly something was wrong, as the same cluster previously running Solr 7.3.1 took only a few minutes.

A thread dump of the overseer showed 100 threads stuck here. (Why 100? That's the Solr default thread pool size, set by MAX_PARALLEL_TASKS in OverseerTaskProcessor.)

{code:java}
"OverseerThreadFactory-13-thread-1226-processing-n:10.128.18.69:8983_solr" #11163 prio=5 os_prio=0 cpu=0.69ms elapsed=987.97s tid=0x00007f01f8051000 nid=0xd7a waiting for monitor entry [0x00007f01c1121000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at java.lang.Object.wait(java.base@11.0.5/Native Method)
	- waiting on <no object reference available>
	at org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper$SessionRef.get(PolicyHelper.java:449)
	- waiting to re-lock in wait() <0x00000007259e6a98> (a java.lang.Object)
	at org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper.getSession(PolicyHelper.java:493)
	at org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper.getReplicaLocations(PolicyHelper.java:121)
	at org.apache.solr.cloud.api.collections.Assign.getPositionsUsingPolicy(Assign.java:382)
	at org.apache.solr.cloud.api.collections.Assign$PolicyBasedAssignStrategy.assign(Assign.java:630)
	at org.apache.solr.cloud.api.collections.Assign.getNodesForNewReplicas(Assign.java:368)
	at org.apache.solr.cloud.api.collections.AddReplicaCmd.buildReplicaPositions(AddReplicaCmd.java:360)
	at org.apache.solr.cloud.api.collections.AddReplicaCmd.addReplica(AddReplicaCmd.java:146)
	at org.apache.solr.cloud.api.collections.AddReplicaCmd.call(AddReplicaCmd.java:91)
	at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:294)
{code}

This is strange, because each add-replica API call creates a single replica and specifies which node it must be created on, so there is no placement decision left for the policy framework to make.

The slowdown was in Assign.getNodesForNewReplicas. We noticed that the overseer reads a SKIP_NODE_ASSIGNMENT flag ( https://github.com/apache/lucene-solr/commit/17cb1b17172926d0d9aed3dfd3b9adb90cf65e0f#diff-ee29887eff6e474e58fcf3c02077f179R355 ) which, when set, skips the call to that method. So we started passing SKIP_NODE_ASSIGNMENT=true, and still no luck! The replicas took just as long to create.

It turned out that the Collections Handler wasn't passing the SKIP_NODE_ASSIGNMENT parameter to the overseer. The add-replica call only forwards a specific set of params to the overseer: https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.7.3/solr/core/src/java/org/apache/solr/handler/admin/CollectionsHandler.java#L823 . We changed this to also pass SKIP_NODE_ASSIGNMENT.

Now creating the replicas takes approximately 4 minutes, versus the 2 hours 45 minutes it was taking previously.

Only master forwards that param to the overseer ( https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/admin/CollectionsHandler.java#L938 ). However, it doesn't matter in master because the autoscaling framework is gone ( https://github.com/apache/lucene-solr/commit/cc0c111/ ).

I believe this will be seen in all versions from Solr 7.6 ( https://issues.apache.org/jira/browse/SOLR-12739 ) through every 8.x release.

Lastly, I manually tried to add a single replica with and without the flag. Without the flag it took 20 seconds, and with the flag 2 seconds.
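
For anyone trying to reproduce the workload, here is a minimal sketch of how the 400 add-replica requests described above could be generated. It only builds the Collections API URLs and does not send them. Several things are assumptions rather than details from our setup: the collection names (`data_N`), the async request IDs, the base URL, and the round-robin node choice are hypothetical, and the request-parameter name `skipNodeAssignment` is what I understand the SKIP_NODE_ASSIGNMENT constant to resolve to. Note that on an unpatched 7.7.3 the handler drops that parameter, so it only has an effect with the CollectionsHandler change described above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AddReplicaRequests {

    // Assumption: the request-parameter name behind SKIP_NODE_ASSIGNMENT.
    static final String SKIP_NODE_ASSIGNMENT = "skipNodeAssignment";

    /** Builds one async ADDREPLICA URL that pins the replica to a given node. */
    static String addReplicaUrl(String baseUrl, String collection, String shard,
                                String node, String asyncId) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "ADDREPLICA");
        params.put("collection", collection);
        params.put("shard", shard);
        params.put("node", node);               // explicit placement, no policy needed
        params.put("async", asyncId);           // hypothetical request id
        params.put(SKIP_NODE_ASSIGNMENT, "true");

        StringBuilder url = new StringBuilder(baseUrl).append("/admin/collections?");
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) {
                url.append('&');
            }
            url.append(e.getKey()).append('=')
               .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
            first = false;
        }
        return url.toString();
    }

    /** One request per shard of every collection, nodes assigned round-robin. */
    static List<String> buildAll(String baseUrl, int collections, int shardsPerCollection,
                                 List<String> nodes) {
        List<String> urls = new ArrayList<>();
        int next = 0;
        for (int c = 1; c <= collections; c++) {
            String collection = "data_" + c;    // hypothetical naming scheme
            for (int s = 1; s <= shardsPerCollection; s++) {
                String node = nodes.get(next++ % nodes.size());
                String asyncId = collection + "_shard" + s + "_add";
                urls.add(addReplicaUrl(baseUrl, collection, "shard" + s, node, asyncId));
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("10.128.18.69:8983_solr"); // node name taken from the thread dump
        List<String> urls = buildAll("http://localhost:8983/solr", 100, 4, nodes);
        System.out.println(urls.size() + " add-replica requests");
        System.out.println(urls.get(0));
    }
}
```

Since every request already names its target node, the policy session the overseer threads are blocked on in the dump above should not be needed for any of them, which is why skipping node assignment is safe for this workload.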