[
https://issues.apache.org/jira/browse/HADOOP-13837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15704069#comment-15704069
]
Weiwei Yang edited comment on HADOOP-13837 at 11/29/16 3:43 AM:
----------------------------------------------------------------
Hello [~aw]
bq. The proposed patch assumes that the process will actually end
The hadoop_status_daemon_wrapper waits at most 5 seconds. If the process
doesn't reach the expected state (started or stopped) within that time, it
terminates and returns error code 1, so it won't loop forever.
A plain sleep has the problem that you don't know how long to sleep. In some
cases the process doesn't stop, and we should wait until the timeout expires;
in other cases the process stops within 1 or 2 seconds, so we only wait 1 or 2
seconds.
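
To illustrate the difference between a fixed sleep and a bounded wait, here is a minimal sketch of a polling loop with a timeout. It is not the code from the attached patches: the function name wait_for_process_exit and its arguments are hypothetical, and it uses kill -0 rather than ps -p for the liveness check, which assumes the caller is allowed to signal the process.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration only; NOT the code from the
# HADOOP-13837 patches.
#
# Polls once per second until the process with the given pid is gone,
# giving up after a timeout (default: 5 seconds).
# Returns 0 if the process went away in time, 1 if the timeout expired.
wait_for_process_exit() {
  local pid=$1
  local timeout=${2:-5}
  local waited=0

  # kill -0 sends no signal; it only reports whether the pid can be signalled.
  # Assumption: the caller has permission to signal the process.
  while kill -0 "${pid}" >/dev/null 2>&1; do
    if [ "${waited}" -ge "${timeout}" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

With a helper like this, a process that stops after 1 or 2 seconds costs only 1 or 2 seconds of waiting, while a process that never stops costs at most the timeout before an error is returned.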
> Process check bug in hadoop_stop_daemon of hadoop-functions.sh
> --------------------------------------------------------------
>
> Key: HADOOP-13837
> URL: https://issues.apache.org/jira/browse/HADOOP-13837
> Project: Hadoop Common
> Issue Type: Bug
> Components: scripts
> Reporter: Weiwei Yang
> Assignee: Weiwei Yang
> Attachments: HADOOP-13837.01.patch, HADOOP-13837.02.patch,
> check_proc.sh
>
>
> stop-yarn.sh always prints {{ERROR: Unable to kill ...}} after
> {{Trying to kill with kill -9}}; see the following output:
> {code}
> <NM_HOST>: WARNING: nodemanager did not stop gracefully after 5 seconds:
> Trying to kill with kill -9
> <NM_HOST>: ERROR: Unable to kill 18097
> {code}
> hadoop_stop_daemon doesn't check process liveness correctly; this bug can be
> reproduced easily with the script. kill -9 needs some time to take effect, so
> checking for the process immediately afterwards will usually still find it
> and report the error.
> {code}
> function hadoop_stop_daemon
> {
> ...
> kill -9 "${pid}" >/dev/null 2>&1
> fi
> if ps -p "${pid}" > /dev/null 2>&1; then
> hadoop_error "ERROR: Unable to kill ${pid}"
> else
> ...
> }
> {code}