[
https://issues.apache.org/jira/browse/SPARK-52504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Enrico Minack updated SPARK-52504:
----------------------------------
Description:
Spark on Kubernetes provides the storage decommissioning feature, where shuffle
data stored on an executor is migrated to other executors and, optionally, to
the fallback storage when that executor gets decommissioned.
Using this feature allows for dynamic allocation on Kubernetes. However, using
this in anger has shown a number of issues that render this setup unfit for
production use.
This umbrella issue collects those issues to provide an overview.
The following can be used to reproduce the issues using a local minikube K8S
cluster:
[https://gist.github.com/EnricoMi/e9daa1176bce4c1211af3f3c5848112a]
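
For orientation, the kind of setup described above could be sketched as follows.
This is a minimal illustration and not the configuration from the gist: the
selection of properties is an assumption about a typical decommissioning setup,
the fallback storage path is a hypothetical placeholder, and the helper
to_submit_args is invented here only to render the properties as spark-submit
arguments.

```python
# Sketch only: an assumed, typical set of properties for storage
# decommissioning with dynamic allocation. Not taken from the linked gist.
DECOMMISSION_CONF = {
    # Let executors be decommissioned gracefully instead of killed outright.
    "spark.decommission.enabled": "true",
    # Migrate blocks off a decommissioning executor.
    "spark.storage.decommission.enabled": "true",
    "spark.storage.decommission.shuffleBlocks.enabled": "true",
    "spark.storage.decommission.rddBlocks.enabled": "true",
    # Optional fallback storage; this bucket path is a hypothetical placeholder.
    "spark.storage.decommission.fallbackStorage.path": "s3a://example-bucket/spark-fallback/",
    # Scale the number of executors up and down with the workload.
    "spark.dynamicAllocation.enabled": "true",
}


def to_submit_args(conf):
    """Render a dict of Spark properties as a flat list of --conf arguments."""
    args = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # Print one "--conf key=value" pair per line for pasting into spark-submit.
    submit_args = to_submit_args(DECOMMISSION_CONF)
    for flag, setting in zip(submit_args[0::2], submit_args[1::2]):
        print(flag, setting)
```

The printed pairs can be appended to a spark-submit command line; the fallback
storage property can be dropped if no fallback storage is wanted.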
> Spark on Kubernetes with storage decommissioning
> ------------------------------------------------
>
> Key: SPARK-52504
> URL: https://issues.apache.org/jira/browse/SPARK-52504
> Project: Spark
> Issue Type: Umbrella
> Components: k8s, Kubernetes
> Affects Versions: 4.1.0
> Reporter: Enrico Minack
> Priority: Blocker