[ 
https://issues.apache.org/jira/browse/AURORA-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368255#comment-14368255
 ] 

brian wickman commented on AURORA-1054:
---------------------------------------

I *think* understand what's going on now.  This is almost certainly an artifact 
from the open source refactor (fd241cdd).  Trying to understand the intricacies 
of the race but the gist is that we fork a runner, it forks a task that 
swallows SIGTERM, we send SIGUSR1 to the runner which tears its children down 
by doing the SIGTERM -> SIGKILL escalation, but the escalation grace period is 
60s.  This happens to correspond with the default timeout in stop(), when could 
result in a raised TaskError some of the time depending on whether the executor 
or runner wins.  Will continue to dig deeper to understand what behavior 
precisely we're trying to test and targeting that specifically.

> src.test.python.apache.aurora.executor.thermos_task_runner appears to be flaky
> ------------------------------------------------------------------------------
>
>                 Key: AURORA-1054
>                 URL: https://issues.apache.org/jira/browse/AURORA-1054
>             Project: Aurora
>          Issue Type: Story
>          Components: Executor
>            Reporter: Bill Farner
>            Assignee: brian wickman
>
> I've seen this test fail on a few reviews recently, but succeeds on a retry.
> {noformat}
>                      ==================== FAILURES ====================
>                       TestThermosTaskRunnerIntegration.test_integration_stop 
>                      
>                      self = 
> <test_thermos_task_runner.TestThermosTaskRunnerIntegration object at 
> 0x7ff3e1dc6fd0>
>                      
>                          def test_integration_stop(self):
>                            with self.yield_sleepy(ThermosTaskRunner, 
> sleep=1000, exit_code=0) as task_runner:
>                              task_runner.start()
>                              task_runner.forked.wait()
>                          
>                              assert task_runner.status is None
>                          
>                              task_runner.stop()
>                          
>                      >       assert task_runner.status is not None
>                      E       assert None is not None
>                      E        +  where None = 
> <apache.aurora.executor.thermos_task_runner.ThermosTaskRunner object at 
> 0x7ff3e3333490>.status
>                      
>                      
> src/test/python/apache/aurora/executor/test_thermos_task_runner.py:175: 
> AssertionError
>                      -------------- Captured stderr call --------------
>                      Writing log files to disk in /tmp/user/2396/tmpqHF5bz
>                      ERROR] Could not quitquitquit runner: Cannot take 
> control of a task in terminal state.
>                       generated xml file: 
> /home/jenkins/jenkins-slave/workspace/AuroraBot/dist/test-results/src.test.python.apache.aurora.executor.thermos_task_runner.xml
>  
>                      ====== 1 failed, 7 passed in 81.16 seconds 
> =======
>                      src.test.python.apache.aurora.admin.admin                
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.admin.host_maintenance     
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.admin.maintenance          
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.client.api.api             
>                        .....   SUCCESS
>                      
> src.test.python.apache.aurora.client.api.instance_watcher                     
>   .....   SUCCESS
>                      src.test.python.apache.aurora.client.api.job_monitor     
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.client.api.mux             
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.client.api.quota_check     
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.client.api.restarter       
>                        .....   SUCCESS
>                      
> src.test.python.apache.aurora.client.api.scheduler_client                     
>   .....   SUCCESS
>                      src.test.python.apache.aurora.client.api.sla             
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.client.api.task_util       
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.client.api.updater         
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.client.api.updater_util    
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.client.base                
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.client.binding_helper      
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.client.cli.api             
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.client.cli.client          
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.client.cli.command_hooks   
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.client.cli.config          
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.client.cli.cron            
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.client.cli.inspect         
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.client.cli.job             
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.client.cli.plugins         
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.client.cli.quota           
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.client.cli.sla             
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.client.cli.supdate         
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.client.cli.task            
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.client.cli.update          
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.client.cli.version         
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.client.config              
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.client.hooks.hooked_api    
>                        .....   SUCCESS
>                      
> src.test.python.apache.aurora.client.hooks.non_hooked_api                     
>   .....   SUCCESS
>                      src.test.python.apache.aurora.common.test_aurora_job_key 
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.common.test_cluster        
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.common.test_cluster_option 
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.common.test_clusters       
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.common.test_http_signaler  
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.common.test_pex_version    
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.common.test_shellify       
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.common.test_transport      
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.config.test_base           
>                        .....   SUCCESS
>                      
> src.test.python.apache.aurora.config.test_constraint_parsing                  
>   .....   SUCCESS
>                      src.test.python.apache.aurora.config.test_loader         
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.config.test_thrift         
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.executor.common.task_info  
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.executor.executor_base     
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.executor.executor_detector 
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.executor.executor_vars     
>                        .....   SUCCESS
>                      src.test.python.apache.aurora.executor.status_manager    
>                        .....   SUCCESS
>                      
> src.test.python.apache.aurora.executor.thermos_task_runner                    
>   .....   FAILURE
>                      src.test.python.apache.thermos.common.test_pathspec      
>                        .....   SUCCESS
>                      
> src.test.python.apache.thermos.core.test_runner_integration                   
>   .....   SUCCESS
>                      src.test.python.apache.thermos.monitoring.test_disk      
>                        .....   SUCCESS
>                      
> FAILURE
> 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to