I've found that with an unreliable connection to a target you face at least 
two challenges: the Ansible connection itself and the connection to the 
package repo can both be unstable. Sometimes both recover and the run 
finishes, and sometimes they don't. Two options:

- Create an execution environment (an Ansible control node) as close as 
possible to the edge system, where the connection to the target is more 
reliable, and run your play there.
or
- Schedule and run your play on the target system that you are patching.  The 
target becomes the Ansible control node.  Push the play to the target, 
schedule it to run, and then inspect the local logs for completion.
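
A minimal sketch of the second option, not a tested recipe. It assumes 
ansible-core and the at daemon are installed on the target and that the 
ansible.posix collection is available where the play is launched. The 
inventory group, the paths, and the patch_local.yml name are all hypothetical.

```yaml
# Sketch only: edge_targets, /opt/patching and patch_local.yml are
# hypothetical names chosen for illustration.
- name: Hand the patching run over to the target itself
  hosts: edge_targets
  become: true
  tasks:
    - name: Create a directory for the local play
      ansible.builtin.file:
        path: /opt/patching
        state: directory
        mode: "0700"

    - name: Copy the patching play to the target
      ansible.builtin.copy:
        src: patch_local.yml   # assumed to be written for hosts: localhost
        dest: /opt/patching/patch_local.yml
        mode: "0600"

    - name: Schedule a local run that does not depend on the SSH session
      ansible.posix.at:
        command: >-
          ansible-playbook -i localhost, -c local
          /opt/patching/patch_local.yml
          > /var/log/patch_local.log 2>&1
        count: 5
        units: minutes
        unique: true
```

Once the job is queued, the connection can drop without affecting the run. 
The result is then read from /var/log/patch_local.log (again, a hypothetical 
path) whenever the target is reachable again.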

Troubleshooting flaky, unreliable connections is the hardest part.

There are probably many other ways to manage this.

Regards,
Stan

From: [email protected] <[email protected]> On 
Behalf Of Francisco Palomares
Sent: Monday, March 20, 2023 12:58 PM
To: Ansible Project <[email protected]>
Subject: [EXTERNAL] [ansible-project] unreachable and retries

Hi Ansible Experts !

I am working on a playbook that executes a long-running task (patching) which 
can cause serious damage to the target server if the session gets 
disconnected halfway through the execution.

Using async with retries seems like the right approach, I hope.

I am now trying to handle the case where an "unreachable" error occurs while 
the async job is running.
We could have intermittent LDAP and network issues. Thanks to async, the patch 
shell script won't stop running, but I want to inform the engineers (email, 
dashboard, etc.) that the playbook has actually stopped due to an 
"unreachable" error.


But apparently ignore_unreachable does not work as I expected when used with 
retries.

Based on the feedback given in this issue, 
https://github.com/ansible/ansible/issues/78358, 
ignore_unreachable is not honored with retries.

I am not really looking for "keep retrying" functionality. I just want to end 
up with a failed task so that I can rescue properly afterwards.

This is an example of what I am trying to achieve:

- name: patching
  become: yes
  block:
    - name: run patching async
      async: 43200
      poll: 0
      shell: my_patch.sh
      register: patch_sleeper

    - name: wait for async job to end
      async_status:
        jid: '{{ patch_sleeper.ansible_job_id }}'
      register: job_result
      until: job_result.finished
      retries: 720
      delay: 1
      ignore_unreachable: true

    - name: error handling for unreachable in the middle of the run
      fail:
        msg: Detected unreachable host error. Forcing a fail to trigger any rescue.
      when: job_result.unreachable is defined
  rescue:
    - name: send message to Monitoring Dashboard
      my_method:
        message: there was an error during patching


And this is an extract of the output I get, that clearly shows that the 
ignore_unreachable = true is ignored (no pun intended :) )

(ansible 2.12)


TASK [wait for async job to end] *************************************************************
FAILED - RETRYING: [mynode]: wait for async job to end (720 retries left).
...
FAILED - RETRYING: [mynode]: wait for async job to end (705 retries left).
fatal: [mynode]: UNREACHABLE! => changed=false
  msg: 'Failed to connect to the host via ssh: '
  skip_reason: Host mynode is unreachable
  unreachable: true

NO MORE HOSTS LEFT *************************************


I explored developing my own custom_async_status, callbacks or action_plugins, 
but none of those seem able to change the status from "unreachable" to 
"failed".

Is there any way to convert the "unreachable" status into "failed" so that it 
can be rescued? Or to somehow make ignore_unreachable work with retries?

Thanks in advance,
FP


--
You received this message because you are subscribed to the Google Groups 
"Ansible Project" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to 
[email protected]<mailto:[email protected]>.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/ansible-project/b912e01c-d869-40cc-8d72-26c7572cfebbn%40googlegroups.com.
