[
https://issues.apache.org/jira/browse/HADOOP-18639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sejin Hwang updated HADOOP-18639:
---------------------------------
Affects Version/s: 3.1.2
> DockerContainerDeletionTask is not removed from the Nodemanager's statestore
> when the task is completed.
> --------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-18639
> URL: https://issues.apache.org/jira/browse/HADOOP-18639
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 3.1.2
> Reporter: Sejin Hwang
> Priority: Major
> Labels: pull-request-available
>
> The YARN NodeManager's deletion service has two types of deletion tasks:
> FileDeletionTask, which deletes log, usercache, and appcache files, and
> DockerContainerDeletionTask, which deletes Docker containers.
>
> A FileDeletionTask is removed from the statestore when the task is
> completed, but a DockerContainerDeletionTask is not.
> Completed DockerContainerDeletionTask entries therefore accumulate
> continuously in the statestore.
>
> When the NodeManager restarts, its deletion service runs every
> DockerContainerDeletionTask that has accumulated in the statestore.
> As a result, pending FileDeletionTask and DockerContainerDeletionTask work
> is delayed unnecessarily while the accumulated tasks are processed, which
> can fill the disk in environments where a large number of containers are
> allocated and released.
> I will attach a patch soon.
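
For readers who want to see the accumulation described above in isolation, here is a minimal sketch. It is not the Hadoop code and not the pending patch: StateStore, DeletionTask, removeOnCompletion, and pendingAfterRun are hypothetical names invented for this illustration, and the statestore is modeled as a plain in-memory map. It only shows that a task which does not remove its own record on completion leaves one entry behind per run, which a restart would then have to process again.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DeletionStateSketch {

    /** Hypothetical stand-in for the NodeManager statestore: task id -> description. */
    static final class StateStore {
        private final Map<Integer, String> tasks = new LinkedHashMap<>();

        void store(int id, String description) { tasks.put(id, description); }

        void remove(int id) { tasks.remove(id); }

        int size() { return tasks.size(); }
    }

    /** Hypothetical deletion task; the flag models whether the cleanup step exists. */
    static final class DeletionTask implements Runnable {
        private final int id;
        private final StateStore store;
        private final boolean removeOnCompletion;

        DeletionTask(int id, StateStore store, boolean removeOnCompletion) {
            this.id = id;
            this.store = store;
            this.removeOnCompletion = removeOnCompletion;
        }

        @Override
        public void run() {
            // The real deletion work (files or a Docker container) would happen here.
            if (removeOnCompletion) {
                // Drop the persisted record so a restart does not replay this task.
                store.remove(id);
            }
        }
    }

    /** Persists and runs taskCount tasks, then returns how many records remain stored. */
    static int pendingAfterRun(boolean removeOnCompletion, int taskCount) {
        StateStore store = new StateStore();
        for (int id = 0; id < taskCount; id++) {
            store.store(id, "docker-container-" + id);
            new DeletionTask(id, store, removeOnCompletion).run();
        }
        return store.size();
    }

    public static void main(String[] args) {
        System.out.println("without cleanup: " + pendingAfterRun(false, 3)); // prints "without cleanup: 3"
        System.out.println("with cleanup: " + pendingAfterRun(true, 3));     // prints "with cleanup: 0"
    }
}
```

In this model, the leftover records in the first case are what a restarted deletion service would find and re-run before it gets to anything else.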
--
This message was sent by Atlassian Jira
(v8.20.10#820010)