Andrzej Bialecki created SOLR-14347: ---------------------------------------
Summary: Autoscaling placement wrong due to incorrect LockLevel for collection modifications Key: SOLR-14347 URL: https://issues.apache.org/jira/browse/SOLR-14347 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: AutoScaling Affects Versions: 8.5 Reporter: Andrzej Bialecki Assignee: Andrzej Bialecki Steps to reproduce: * create a cluster of a few nodes (tested with 7 nodes) * define per-collection policies that distribute replicas exclusively on different nodes per policy * concurrently create a few collections, each using a different policy * resulting replica placement will be seriously wrong, causing many policy violations Running the same scenario but instead creating collections sequentially results in no violations. I suspect this is caused by incorrect locking level for all collection operations (as defined in {{CollectionParams.CollectionAction}}) that create new replica placements - i.e. CREATE, ADDREPLICA, MOVEREPLICA, DELETENODE, REPLACENODE, SPLITSHARD, RESTORE, REINDEXCOLLECTION. All of these operations use the policy engine to create new replica placements, and as a result they change the cluster state. However, currently these operations are locked (in {{OverseerCollectionMessageHandler.lockTask}} ) using {{LockLevel.COLLECTION}}. In practice this means that the lock is held only for the particular collection that is being modified. A straightforward fix for this issue is to change the locking level to CLUSTER (and I confirm this fixes the scenario described above). However, this effectively serializes all collection operations listed above, which will result in general slow-down of all collection operations. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org