Zeke Harris created AURORA-1215:
-----------------------------------
Summary: Improve gc_executor to better handle tasks stuck in STARTING state
Key: AURORA-1215
URL: https://issues.apache.org/jira/browse/AURORA-1215
Project: Aurora
Issue Type: Task
Components: Executor
Reporter: Zeke Harris
If a task is lost on a slave for some reason while the scheduler still thinks
it is STARTING, the gc_executor does not know how to handle it and simply passes
(takes no action). It should probably instead tell the scheduler that the task
needs to be transitioned to a different state (FAILED?).
Here's an example of a log line showing this happening:
{code}
I0320 07:22:01.281100 19634 executor_base.py:45] Executor [20150206-190136-2126263306-5050-29652-S51]: Know nothing about task 1426024330051-mesos-test-oom-0-8e9c1594-fbba-4932-bb4e-140ce79100ad, but scheduler says STARTING - passing
{code}
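A minimal sketch of the proposed handling, assuming the gc_executor can compare the scheduler's view of task states with the set of tasks it knows about locally. The names `ScheduleStatus`, `reconcile`, `scheduler_view` and `local_task_ids` are hypothetical and do not reflect the actual gc_executor code. Because the ticket leaves the target state open ("FAILED?"), it is a parameter here that defaults to FAILED.

```python
from enum import Enum
from typing import Dict, Set


class ScheduleStatus(Enum):
    # Hypothetical subset of scheduler task states, for illustration only.
    STARTING = "STARTING"
    RUNNING = "RUNNING"
    FAILED = "FAILED"


def reconcile(
    scheduler_view: Dict[str, ScheduleStatus],
    local_task_ids: Set[str],
    target: ScheduleStatus = ScheduleStatus.FAILED,
) -> Dict[str, ScheduleStatus]:
    """Return status updates to send back to the scheduler.

    A task the scheduler believes is STARTING but that this executor knows
    nothing about is reported as `target` instead of being silently passed.
    """
    updates: Dict[str, ScheduleStatus] = {}
    for task_id, status in scheduler_view.items():
        if task_id in local_task_ids:
            # Known locally: left to the existing GC logic (not modeled here).
            continue
        if status is ScheduleStatus.STARTING:
            # Previously: "Know nothing about task ..., but scheduler says
            # STARTING - passing". Proposed: ask for a state transition.
            updates[task_id] = target
        # Other unknown-task states are left to existing handling (not modeled).
    return updates


if __name__ == "__main__":
    view = {
        "task-a": ScheduleStatus.STARTING,  # lost on the slave
        "task-b": ScheduleStatus.RUNNING,   # not STARTING, ignored here
        "task-c": ScheduleStatus.STARTING,  # still known locally
    }
    print(reconcile(view, {"task-c"}))
```

Only "task-a" yields an update (to FAILED). Whether FAILED or some other terminal state is the right target is the open question raised above.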
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)