madrob commented on a change in pull request #2318: URL: https://github.com/apache/lucene-solr/pull/2318#discussion_r573957903
########## File path: solr/core/src/java/org/apache/solr/cloud/api/collections/CreateCollectionCmd.java ########## @@ -213,7 +218,16 @@ public void call(ClusterState clusterState, ZkNodeProps message, @SuppressWarnin ZkStateReader.NODE_NAME_PROP, nodeName, ZkStateReader.REPLICA_TYPE, replicaPosition.type.name(), CommonAdminParams.WAIT_FOR_FINAL_STATE, Boolean.toString(waitForFinalState)); - ocmh.overseer.offerStateUpdate(Utils.toJSON(props)); + if(isPrs) { + ZkWriteCommand command = new SliceMutator(ocmh.cloudManager).addReplica(clusterState, props); + byte[] data = Utils.toJSON(Collections.singletonMap(collectionName, command.collection)); +// log.info("collection updated : {}", new String(data, StandardCharsets.UTF_8)); + ocmh.zkStateReader.getZkClient().setData(collectionPath, data, true); Review comment: We're doing this for each replica individually, which turns into a lot more writes. Previously the overseer would batch up the updates, I believe, and minimize the number of updates to state.json. Can we do a ZK multi-operation here instead of lots of small writes in a tight loop? ########## File path: solr/core/src/java/org/apache/solr/cloud/api/collections/CreateCollectionCmd.java ########## @@ -41,19 +42,11 @@ import org.apache.solr.cloud.ZkController; import org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.ShardRequestTracker; import org.apache.solr.cloud.overseer.ClusterStateMutator; +import org.apache.solr.cloud.overseer.SliceMutator; +import org.apache.solr.cloud.overseer.ZkWriteCommand; import org.apache.solr.common.SolrException; import org.apache.solr.common.SolrException.ErrorCode; -import org.apache.solr.common.cloud.Aliases; -import org.apache.solr.common.cloud.ClusterState; -import org.apache.solr.common.cloud.DocCollection; -import org.apache.solr.common.cloud.DocRouter; -import org.apache.solr.common.cloud.ImplicitDocRouter; -import org.apache.solr.common.cloud.Replica; -import org.apache.solr.common.cloud.ReplicaPosition; -import org.apache.solr.common.cloud.ZkConfigManager; -import org.apache.solr.common.cloud.ZkNodeProps; -import org.apache.solr.common.cloud.ZkStateReader; -import org.apache.solr.common.cloud.ZooKeeperException; +import org.apache.solr.common.cloud.*; Review comment: please please please configure your IDE to not do this. it's been an issue on other PRs as well. ########## File path: solr/core/src/java/org/apache/solr/cloud/api/collections/CreateCollectionCmd.java ########## @@ -213,7 +218,16 @@ public void call(ClusterState clusterState, ZkNodeProps message, @SuppressWarnin ZkStateReader.NODE_NAME_PROP, nodeName, ZkStateReader.REPLICA_TYPE, replicaPosition.type.name(), CommonAdminParams.WAIT_FOR_FINAL_STATE, Boolean.toString(waitForFinalState)); - ocmh.overseer.offerStateUpdate(Utils.toJSON(props)); + if(isPrs) { Review comment: This section looks like almost a duplicate of the other `if(isPrs)` block earlier. No concrete reason, but this doesn't seem right. ########## File path: solr/core/src/java/org/apache/solr/cloud/api/collections/CreateCollectionCmd.java ########## @@ -256,6 +279,18 @@ public void call(ClusterState clusterState, ZkNodeProps message, @SuppressWarnin shardRequestTracker.processResponses(results, shardHandler, false, null, Collections.emptySet()); @SuppressWarnings({"rawtypes"}) boolean failure = results.get("failure") != null && ((SimpleOrderedMap)results.get("failure")).size() > 0; + if(isPrs) { + TimeOut timeout = new TimeOut(Integer.getInteger("solr.waitToSeeReplicasInStateTimeoutSeconds", 120), TimeUnit.SECONDS, timeSource); // could be a big cluster + PerReplicaStates prs = PerReplicaStates.fetch(collectionPath, ocmh.zkStateReader.getZkClient(), null); + while (!timeout.hasTimedOut()) { + if(prs.allActive()) break; + Thread.sleep(100); Review comment: Can we set a watch (maybe on an arbitrary down replica?) instead of doing a loop+sleep? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org