Ishan Chattopadhyaya created SOLR-13945:
-------------------------------------------

             Summary: SPLITSHARD data loss due to "rollback"
                 Key: SOLR-13945
                 URL: https://issues.apache.org/jira/browse/SOLR-13945
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Ishan Chattopadhyaya


# As per SOLR-7673, there is a commit on the parent shard *after state changes* 
have happened, i.e. from active/construction/construction to 
inactive/active/active. Please see 
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java#L586-L588
# Due to SOLR-12509, there's now a cleanup/rollback method called 
"cleanupAfterFailure" in the finally block that resets the state to 
active/construction/construction. Please see: 
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java#L657
# If step 2 is entered because of a failure in step 1, any documents that were 
indexed into the subshards (which are already active by that point) are lost 
once the parent is reset to active.
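
The sequence above can be illustrated with a small, self-contained simulation. This is only a sketch of the scenario as I understand it, not the actual SplitShardCmd code: the Shard class, the State enum and the simulate method are hypothetical names invented for illustration, and the "commit" is modelled as a simple flag that may fail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitRollbackSketch {
    enum State { ACTIVE, CONSTRUCTION, INACTIVE }

    // Hypothetical stand-in for a shard: a state plus the documents it holds.
    static final class Shard {
        State state;
        final List<String> docs = new ArrayList<>();
        Shard(State state) { this.state = state; }
    }

    /** Returns the documents left in non-active subshards after the operation. */
    static List<String> simulate(boolean parentCommitFails) {
        Shard parent = new Shard(State.ACTIVE);
        Shard sub0 = new Shard(State.CONSTRUCTION);
        Shard sub1 = new Shard(State.CONSTRUCTION);
        boolean success = false;
        try {
            // State switch: active/construction/construction -> inactive/active/active.
            parent.state = State.INACTIVE;
            sub0.state = State.ACTIVE;
            sub1.state = State.ACTIVE;

            // Updates arriving now are routed to the (active) subshards.
            sub0.docs.add("doc-1");
            sub1.docs.add("doc-2");

            // Step 1: the commit on the parent happens only after the state switch.
            if (parentCommitFails) {
                throw new IllegalStateException("commit on parent failed");
            }
            success = true;
        } catch (IllegalStateException e) {
            // Swallowed here so the sketch can report the outcome.
        } finally {
            // Step 2: unconditional rollback on failure (the suspicious part).
            if (!success) {
                parent.state = State.ACTIVE;
                sub0.state = State.CONSTRUCTION;
                sub1.state = State.CONSTRUCTION;
            }
        }

        // Step 3: documents sitting in subshards that are no longer active are lost.
        List<String> lost = new ArrayList<>();
        for (Shard s : Arrays.asList(sub0, sub1)) {
            if (s.state != State.ACTIVE) {
                lost.addAll(s.docs);
            }
        }
        return lost;
    }
}
```

Calling simulate(true) returns both documents as lost, while simulate(false) returns an empty list, which is the asymmetry described in step 3.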

If my above understanding is correct, I am wondering:
# Why is a commit to the parent shard needed *after* the parent shard is 
inactive, the subshards are active and the split operation has completed?
# This rollback looks very suspicious. If the state of the subshards is already 
active and the parent is inactive, then why set them back to construction? It 
seems like a crucial check is missing there. Also, why do we reset the subshard 
status back to construction instead of inactive? It is extremely misleading 
(and, frankly, ridiculous) for any external clusterstate monitoring tools to 
see the subshards go from CONSTRUCTION to ACTIVE to CONSTRUCTION and then 
disappear.
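
To make the "missing check" concrete, here is a minimal sketch of the kind of guard I have in mind. It is an assumption about what such a check could look like, not a proposed patch: RollbackGuardSketch, splitAlreadySwitched and shouldRollback are hypothetical names and do not exist in SplitShardCmd.

```java
public class RollbackGuardSketch {
    enum State { ACTIVE, CONSTRUCTION, INACTIVE }

    /**
     * True when the state switch has already completed, i.e. the parent is
     * inactive and every subshard is active.
     */
    static boolean splitAlreadySwitched(State parent, State... subShards) {
        if (parent != State.INACTIVE) {
            return false;
        }
        for (State s : subShards) {
            if (s != State.ACTIVE) {
                return false;
            }
        }
        return true;
    }

    /**
     * Hypothetical guard for the cleanup path: roll back only when the
     * operation failed AND the subshards have not yet taken over.
     */
    static boolean shouldRollback(boolean failed, State parent, State... subShards) {
        return failed && !splitAlreadySwitched(parent, subShards);
    }
}
```

With a guard like this, a failure that happens after the switch to inactive/active/active would leave the subshards in place instead of resetting them to construction.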



