Enrico Minack created SPARK-52508:
-------------------------------------

             Summary: Read from fallback storage should consider replication delay
                 Key: SPARK-52508
                 URL: https://issues.apache.org/jira/browse/SPARK-52508
             Project: Spark
          Issue Type: Sub-task
          Components: k8s, Kubernetes
    Affects Versions: 4.1.0
            Reporter: Enrico Minack


When the storage decommissioning feature is used on Kubernetes with a
distributed filesystem as the fallback storage, an executor may be unable
to see shuffle data that the decommissioned executor has just written to
that filesystem. This is caused by replication delay in the distributed
filesystem. Because the dependent executor knows that the shuffle data is
located on the fallback storage, it can wait and retry the read when it
encounters a {{FileNotFoundException}}.
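
The deferred read described above could be sketched roughly as follows.
This is an illustration only, not Spark code: it is written in Python
rather than Scala, uses Python's FileNotFoundError as a stand-in for
{{FileNotFoundException}}, and the function name, parameters, and
fixed-wait policy are all assumptions made for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def read_with_retry(
    read: Callable[[], T],
    max_attempts: int = 5,            # hypothetical setting, not a Spark config
    wait_s: float = 0.5,              # hypothetical fixed wait between attempts
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call read(), retrying while the file is not yet visible.

    A missing file is treated as possibly transient (replication delay)
    only up to max_attempts; after that the error is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return read()
        except FileNotFoundError:
            if attempt == max_attempts:
                raise          # give up: the file is likely really missing
            sleep(wait_s)      # give replication time to catch up
    raise AssertionError("unreachable")
```

Bounding the number of attempts matters here: a file that never appears
still has to surface as a failure so that the usual recovery for lost
shuffle data can take over.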



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

