Hi Ceph,

Today I did something wrong and that blocked the lab for a good half hour. 

a) I ran two teuthology-kill commands simultaneously, which made them deadlock 
each other
b) I let them run unattended, only to come back to the terminal 30 minutes later 
and find them stuck.

Sure, two simultaneous teuthology-kill runs should not deadlock, and that needs 
to be fixed. But the easy workaround to avoid that trouble is to just not let 
teuthology-kill run forever. Even for ~200 jobs it takes at most a minute or 
two. If it takes longer, it probably means another teuthology-kill is competing 
with it, and it should be interrupted and restarted later. From now on I'll do

timeout 120 teuthology-kill .... || echo FAIL!

as a generic safeguard.
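
The same safeguard could also be wrapped in a small shell function that says whether the command was interrupted by the time limit or failed on its own. This is only a sketch: `guarded` is a made-up name, not part of teuthology, and it assumes GNU coreutils `timeout`, which exits with status 124 when the limit is reached.

```shell
# Hypothetical helper, not part of teuthology.
# Usage: guarded SECONDS COMMAND [ARGS...]
# Runs COMMAND under a time limit and reports why it did not succeed.
guarded() {
    limit="$1"
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        # GNU timeout returns 124 when it had to interrupt the command.
        echo "FAIL! '$*' still running after ${limit}s, interrupted; retry later" >&2
    elif [ "$status" -ne 0 ]; then
        # The command ended by itself, but with an error.
        echo "FAIL! '$*' exited with status $status" >&2
    fi
    return "$status"
}
```

It would be called as `guarded 120 teuthology-kill ....`, with the same arguments as before. A status of 124 then suggests a competing teuthology-kill rather than a real failure.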

Apologies for the troubles.

-- 
Loïc Dachary, Artisan Logiciel Libre
