madrob commented on a change in pull request #2318:
URL: https://github.com/apache/lucene-solr/pull/2318#discussion_r573957903



##########
File path: 
solr/core/src/java/org/apache/solr/cloud/api/collections/CreateCollectionCmd.java
##########
@@ -213,7 +218,16 @@ public void call(ClusterState clusterState, ZkNodeProps 
message, @SuppressWarnin
             ZkStateReader.NODE_NAME_PROP, nodeName,
             ZkStateReader.REPLICA_TYPE, replicaPosition.type.name(),
             CommonAdminParams.WAIT_FOR_FINAL_STATE, 
Boolean.toString(waitForFinalState));
-        ocmh.overseer.offerStateUpdate(Utils.toJSON(props));
+        if(isPrs) {
+          ZkWriteCommand command = new 
SliceMutator(ocmh.cloudManager).addReplica(clusterState, props);
+          byte[] data = Utils.toJSON(Collections.singletonMap(collectionName, 
command.collection));
+//        log.info("collection updated : {}", new String(data, 
StandardCharsets.UTF_8));
+          ocmh.zkStateReader.getZkClient().setData(collectionPath, data, true);

Review comment:
       We're doing this for each replica individually, which turns into a lot 
more writes. Previously the overseer would batch up the updates, I believe, and 
minimize the number of updates to state.json. Can we do a ZK multi-operation 
here instead of lots of small writes in a tight loop?

##########
File path: 
solr/core/src/java/org/apache/solr/cloud/api/collections/CreateCollectionCmd.java
##########
@@ -41,19 +42,11 @@
 import org.apache.solr.cloud.ZkController;
 import 
org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.ShardRequestTracker;
 import org.apache.solr.cloud.overseer.ClusterStateMutator;
+import org.apache.solr.cloud.overseer.SliceMutator;
+import org.apache.solr.cloud.overseer.ZkWriteCommand;
 import org.apache.solr.common.SolrException;
 import org.apache.solr.common.SolrException.ErrorCode;
-import org.apache.solr.common.cloud.Aliases;
-import org.apache.solr.common.cloud.ClusterState;
-import org.apache.solr.common.cloud.DocCollection;
-import org.apache.solr.common.cloud.DocRouter;
-import org.apache.solr.common.cloud.ImplicitDocRouter;
-import org.apache.solr.common.cloud.Replica;
-import org.apache.solr.common.cloud.ReplicaPosition;
-import org.apache.solr.common.cloud.ZkConfigManager;
-import org.apache.solr.common.cloud.ZkNodeProps;
-import org.apache.solr.common.cloud.ZkStateReader;
-import org.apache.solr.common.cloud.ZooKeeperException;
+import org.apache.solr.common.cloud.*;

Review comment:
       please please please configure your IDE to not do this. it's been an 
issue on other PRs as well.

##########
File path: 
solr/core/src/java/org/apache/solr/cloud/api/collections/CreateCollectionCmd.java
##########
@@ -213,7 +218,16 @@ public void call(ClusterState clusterState, ZkNodeProps 
message, @SuppressWarnin
             ZkStateReader.NODE_NAME_PROP, nodeName,
             ZkStateReader.REPLICA_TYPE, replicaPosition.type.name(),
             CommonAdminParams.WAIT_FOR_FINAL_STATE, 
Boolean.toString(waitForFinalState));
-        ocmh.overseer.offerStateUpdate(Utils.toJSON(props));
+        if(isPrs) {

Review comment:
       This section looks like almost a duplicate of the other `if(isPrs)` 
block earlier. No concrete reason, but this doesn't seem right.

##########
File path: 
solr/core/src/java/org/apache/solr/cloud/api/collections/CreateCollectionCmd.java
##########
@@ -256,6 +279,18 @@ public void call(ClusterState clusterState, ZkNodeProps 
message, @SuppressWarnin
       shardRequestTracker.processResponses(results, shardHandler, false, null, 
Collections.emptySet());
       @SuppressWarnings({"rawtypes"})
       boolean failure = results.get("failure") != null && 
((SimpleOrderedMap)results.get("failure")).size() > 0;
+      if(isPrs) {
+        TimeOut timeout = new 
TimeOut(Integer.getInteger("solr.waitToSeeReplicasInStateTimeoutSeconds", 120), 
TimeUnit.SECONDS, timeSource); // could be a big cluster
+        PerReplicaStates prs = PerReplicaStates.fetch(collectionPath, 
ocmh.zkStateReader.getZkClient(), null);
+        while (!timeout.hasTimedOut()) {
+          if(prs.allActive()) break;
+          Thread.sleep(100);

Review comment:
       Can we set a watch (maybe on an arbitrary down replica?) instead of 
doing a loop+sleep?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to