Ilan Ginzburg created SOLR-14546:
------------------------------------

             Summary: OverseerTaskProcessor can process messages out of order
                 Key: SOLR-14546
                 URL: https://issues.apache.org/jira/browse/SOLR-14546
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: SolrCloud
    Affects Versions: master (9.0)
            Reporter: Ilan Ginzburg


*Summary:* 

Collection API and ConfigSet API messages are processed out of order by the 
{{OverseerTaskProcessor}}/{{OverseerCollectionConfigSetProcessor}} because of 
the way locking is handled.

 

*Some context:*

The Overseer task processor dequeues messages from the Collection and ConfigSet 
ZooKeeper queue at {{/overseer/collection-queue-work}} and hands processing to 
an executor pool with a maximum of 100 parallel tasks ({{ClusterStateUpdater}}, 
on the other hand, uses a single thread to consume and process the 
{{/overseer/queue}} queue and does not suffer from the problem described here).

Locking prevents tasks modifying the same or related data from running 
concurrently. For the Collection API, locking can be done at {{CLUSTER}}, 
{{COLLECTION}}, {{SHARD}} or {{REPLICA}} level and each command has its locking 
level defined in {{CollectionParams.CollectionAction}}.
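
As an illustration, here is a minimal sketch of how a locking level could be 
associated with each command. The enum names and the exact levels shown below 
are simplified assumptions for readability; the real definitions live in 
{{CollectionParams.CollectionAction}}:

{code:java}
// Illustrative sketch only: a minimal model of per-command locking levels.
// The real definitions live in CollectionParams.CollectionAction; the enum
// names and levels below are simplified assumptions.
enum LockLevel { CLUSTER, COLLECTION, SHARD, REPLICA, NONE }

enum Action {
  CREATE(LockLevel.COLLECTION),        // create a collection
  CREATEALIAS(LockLevel.COLLECTION),   // create an alias for a collection
  CREATESHARD(LockLevel.COLLECTION),   // add a shard to a collection
  ADDREPLICA(LockLevel.SHARD);         // add a replica to a shard (finer-grained lock)

  final LockLevel lockLevel;

  Action(LockLevel lockLevel) {
    this.lockLevel = lockLevel;
  }
}
{code}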

Multiple tasks locking at the same or at competing locking levels do not 
execute concurrently. Commands locking at {{COLLECTION}} level for the same 
collection (for example {{CREATE}} creating a collection, {{CREATEALIAS}} 
creating an alias for a collection and {{CREATESHARD}} creating a new shard 
for the collection) are executed sequentially, and they are expected to 
execute in submission order. Commands locking at different levels are also 
serialized when needed (for example changes to a shard of a collection and 
changes to the collection itself).

 

*The issue:*

The way {{OverseerTaskProcessor.run()}} deals with messages that cannot be 
executed because competing messages are already running (i.e. the lock is 
unavailable) is not correct. The method basically iterates over the set of 
messages to execute, skips those that can't be executed due to locking and 
executes those that can be (assuming enough executors remain available).
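
A simplified, self-contained sketch of this dispatch pattern (assumed names 
and toy locking; this is not the actual {{OverseerTaskProcessor.run()}} code):

{code:java}
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified sketch of the problematic dispatch pattern (assumed names, not
// the actual OverseerTaskProcessor code): a message whose lock is unavailable
// is skipped rather than blocking the messages behind it, so a later message
// can acquire the lock first.
class DispatchSketch {
  static final int MAX_PARALLEL_TASKS = 100;

  record Task(long id, String lockKey) {}

  private final Map<String, Semaphore> locks = new LinkedHashMap<>();  // toy: one lock per collection
  private final Map<Long, Task> blockedTasks = new LinkedHashMap<>();  // retried first on the next pass
  private final ExecutorService executor = Executors.newFixedThreadPool(MAX_PARALLEL_TASKS);
  private final AtomicInteger runningTasks = new AtomicInteger();

  void dispatch(List<Task> heads) {
    for (Task head : heads) {
      if (runningTasks.get() >= MAX_PARALLEL_TASKS) {
        blockedTasks.put(head.id(), head);  // parked until the next iteration of the outer loop
        continue;
      }
      Semaphore lock = locks.computeIfAbsent(head.lockKey(), k -> new Semaphore(1));
      if (!lock.tryAcquire()) {
        continue;  // lock held by a running task: skipped, later heads are still tried
      }
      runningTasks.incrementAndGet();
      executor.submit(() -> {
        try {
          // ... execute the command for this task ...
        } finally {
          runningTasks.decrementAndGet();
          // Releasing here lets a *later* head in the same pass grab the lock
          // before the earlier, skipped head is ever retried.
          lock.release();
        }
      });
    }
  }
}
{code}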

The issue of out of order execution occurs when at least 3 messages compete for 
a lock. The following scenario shows out of order execution (messages numbered 
in enqueue order):
 * Message 1 gets the lock and runs,
 * Message 2 can’t be executed because the lock is taken,
 * Message 1 finishes execution and releases the lock,
 * Message 3 can be executed, gets the lock and runs,
 * Message 2 eventually runs when Message 3 finishes and releases the lock.

We therefore get execution order 1 - 3 - 2 when the expected execution order 
is 1 - 2 - 3. Order 1 - 3 - 2 might not make sense, or might result in a final 
state different from the one 1 - 2 - 3 would produce.

(There is a variant of this scenario in which, after Message 2 can't be 
executed, the maximum number of parallel tasks is reached; the remaining tasks 
in {{heads}}, including Message 3, are then put into the {{blockedTasks}} map. 
These tasks are the first ones considered for processing on the next iteration 
of the main {{while}} loop, and the reordering is similar.)
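
To make the interleaving concrete, here is a toy, scripted simulation (not 
Solr code; the timing of the lock release is hard-coded) that reproduces the 
1 - 3 - 2 execution order:

{code:java}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Toy, scripted reproduction of the 1 - 3 - 2 ordering (not Solr code). The
// lock held by Message 1 is released only after Message 2 has already been
// skipped in the current pass, so Message 3 acquires it first.
public class ReorderDemo {
  public static void main(String[] args) {
    List<Integer> pending = List.of(1, 2, 3);     // submission order
    List<Integer> executionOrder = new ArrayList<>();
    Queue<Integer> skippedThisPass = new ArrayDeque<>();
    boolean lockAvailable = true;

    for (int message : pending) {
      if (lockAvailable) {
        lockAvailable = false;
        executionOrder.add(message);              // acquires the lock and "runs"
      } else {
        skippedThisPass.add(message);             // lock held: skipped, retried on a later pass
      }
      if (message == 2) {
        lockAvailable = true;  // scripted: Message 1 finishes and releases the lock
      }                        // after 2 was skipped but before 3 is considered
    }
    // A later pass eventually runs the skipped message once the lock is free again.
    executionOrder.addAll(skippedThisPass);

    System.out.println(executionOrder);  // prints [1, 3, 2]; submission order was [1, 2, 3]
  }
}
{code}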

 

Note that from the client perspective, there do not need to be 3 outstanding 
tasks. The third task required to observe the reordering could be enqueued 
after the first one has completed.

 

*Impact of the issue:*

This problem makes {{MultiThreadedOCPTest.testFillWorkQueue()}} fail because 
the test requires task execution in submission order.

 

Even though three messages are required for the reordering to happen, they are 
not necessarily messages at the same locking level: a message locking at 
{{SHARD}} or {{REPLICA}} level prevents execution of a {{COLLECTION}} level 
message for the same collection. Sequences of messages can be built that lead 
to incorrect results or failures (for example {{ADDREPLICA}} followed by 
{{CREATEALIAS}} followed by {{DELETEALIAS}}: alias creation and deletion could 
be reordered, which makes no sense), but I wasn't able to come up with a 
realistic production example that would be impacted by this issue (which 
doesn't mean one doesn't exist!).


