Occasionally, I get SSH timeouts or other network errors in the middle of a play. The task fails for the affected host, but the playbook as a whole does not fail: it carries on with the remaining hosts and without the node that failed. You see output something like this:

    ...
    2014-11-24 23:54:08,717 p=21193 u=ubuntu | TASK: [common | Install latest pip from PyPi] ****************************
    2014-11-24 23:54:12,034 p=21193 u=ubuntu | changed: [box0]
    *2014-11-24 23:54:09,443 p=21193 u=ubuntu | fatal: [box1] => SSH encountered an unknown error during the connection. We recommend you re-run the command using -vvvv, which will enable SSH debugging output to help diagnose the issue*
    2014-11-24 23:54:11,001 p=21193 u=ubuntu | changed: [box2]
    2014-11-24 23:54:11,278 p=21193 u=ubuntu | changed: [box3]
    2014-11-24 23:54:12,305 p=21193 u=ubuntu | changed: [box4]
    ...

You eventually get this at the bottom of the output:

    box0: ok=29 changed=23 unreachable=0 failed=0
    box1: ok=18 *changed=15 unreachable=1* failed=0
    box2: ok=77 changed=61 unreachable=0 failed=0
    box3: ok=77 changed=62 unreachable=0 failed=0
    box4: ok=77 changed=62 unreachable=0 failed=0

This means that box1 doesn't get anything that should have happened after the task that failed, because it got dropped from the play - and from all groups - at that point. As a result, I occasionally end up with one cluster node that doesn't work properly, because it was only partially provisioned and set up.

Is there a way to make this SSH error fatal, so that the whole playbook/ansible run stops at that point, the same way it would for an error in a task itself?
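
This does not answer the question of stopping the run, but the recap at the bottom of the output can at least be checked mechanically after the fact. Below is a minimal sketch in Python, assuming the recap lines look like the ones quoted above (including the optional `*` emphasis markers used in this post). The function names `find_incomplete_hosts` and `exit_code_for` are hypothetical helpers, not part of Ansible.

```python
import re

# Matches recap lines such as "box1: ok=18 changed=15 unreachable=1 failed=0".
# The optional asterisks allow for the emphasis markers used in this post.
# Assumption: the recap format is the one quoted above; other Ansible
# versions may print additional counters.
RECAP_RE = re.compile(
    r"(?P<host>\S+)\s*:\s+ok=(?P<ok>\d+)\s+\*?changed=(?P<changed>\d+)\s+"
    r"unreachable=(?P<unreachable>\d+)\*?\s+failed=(?P<failed>\d+)"
)


def find_incomplete_hosts(lines):
    """Return the hosts whose recap shows unreachable > 0 or failed > 0."""
    incomplete = []
    for line in lines:
        match = RECAP_RE.search(line)
        if match is None:
            continue  # not a recap line (task output, blank line, ...)
        if int(match.group("unreachable")) > 0 or int(match.group("failed")) > 0:
            incomplete.append(match.group("host"))
    return incomplete


def exit_code_for(lines):
    """Return 1 if any host was dropped or failed, otherwise 0."""
    return 1 if find_incomplete_hosts(lines) else 0
```

Run over the recap shown above, this flags only box1, so a wrapper script could call `sys.exit(exit_code_for(open("run.log")))` (with `run.log` being a hypothetical saved copy of the output) to treat a partially provisioned node as a failure of the whole run.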
