This is the Ansible list.
You may have better luck on the AWX list...
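
One thing that stands out in the traceback quoted below: line 330 of wsrelay.py passes a generator expression straight to asyncio.gather(), which expects each awaitable as a separate positional argument. On Python 3.9 the generator appears to get wrapped in a task of its own, which would be consistent with both the "was never awaited" warning and the "Task got bad yield" error. A minimal sketch of the difference, assuming nothing about the rest of the AWX code; cleanup_offline_host here is only a hypothetical stand-in for the real method, and the host names are made up:

```python
import asyncio


async def cleanup_offline_host(hostname: str) -> str:
    # Hypothetical stand-in for WebSocketRelayManager.cleanup_offline_host;
    # the real method's body is not shown in the thread.
    await asyncio.sleep(0)
    return hostname


async def cleanup_all(hosts):
    # Unpack the generator so gather() receives one coroutine per host.
    return await asyncio.gather(*(cleanup_offline_host(h) for h in hosts))


async def cleanup_all_buggy(hosts):
    # Same shape as the call in the quoted traceback: gather() receives a
    # single generator object instead of the coroutines it would produce.
    return await asyncio.gather(cleanup_offline_host(h) for h in hosts)


def run_buggy(hosts):
    # Return the name of the exception raised by the buggy form, if any.
    # Python 3.9 raises RuntimeError ("Task got bad yield"); newer
    # interpreters may reject the generator earlier with a TypeError.
    try:
        asyncio.run(cleanup_all_buggy(hosts))
    except (RuntimeError, TypeError) as exc:
        return type(exc).__name__
    return None
```

With the unpacked form, asyncio.run(cleanup_all(["web-a", "web-b"])) returns the host names in the order they were passed. If this is what is happening, isolating the clusters would only hide the crash, so it is worth raising with the AWX folks as a possible bug.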

On Thu, 20 Jul 2023 at 16:56, Pabbisetty h <[email protected]> wrote:
>
> We have an AWX with a 2-cluster K8S configuration (common external Postgres 
> db), with container instances as execution envs.
>
> When we fire up only 1 cluster, all works fine.
>
> When we bring up the second cluster, the “awx.main.wsrelay” will try to 
> connect from pods in cluster1 to pods on cluster2 (and the other way around).
> Because it can’t find the other pods, the coroutine 
> 'WebSocketRelayManager.cleanup_offline_host' fails, and it marks its own 
> pod as failing.
>
> In the end, all TASK pods are restarted until Backoff.
>
> Can we somehow isolate the WebSocket relay system for “Heartbeat” & 
> “Wsrelay”, and group the pods per cluster?
>
> Or is this behaviour a bug? 
> (https://github.com/ansible/awx/blob/devel/docs/websockets.md)
>
>  Logs:
>
> awx-test2-task 2023-07-20 10:34:45,625 INFO success: superwatcher entered 
> RUNNING state, process has stayed up for > than 1 seconds (startsecs)
>
> │ awx-test2-task 2023-07-20 10:34:46,739 INFO     [-] 
> awx.main.commands.run_callback_receiver Callback receiver started with pid=50
>
> │ awx-test2-task 2023-07-20 10:34:46,764 INFO     [-] awx.main.wsrelay Active 
> instance with hostname awx-test2-task-5<>7bsn8 is registered.
>
> │ awx-test2-task 2023-07-20 10:34:46,807 WARNING  [-] 
> awx.main.dispatch.periodic periodic beat started
>
> │ awx-test2-task 2023-07-20 10:34:46,832 INFO     [-] awx.main.dispatch 
> Running worker dispatcher listening to queues ['tower_broadcast_all', 
> 'tower_settings_change', 'awx-test2-task-<>-7bsn8']
>
> │ awx-test2-task 2023-07-20 10:34:56,776 INFO     [-] awx.main.wsrelay Adding 
> {'awx-test2-web-6<>d7-tzscp', 'awx-test2-web-6<>7cdd7-xqw29', 
> 'awx-test1-web-6<>c-29wzn', 'awx-test1-web-6<>8 │
>
> │ awx-test2-task 2023-07-20 10:34:56,794 INFO     [-] awx.main.wsrelay 
> Connection from awx-test2-task-5<>5-7bsn8 to 198.0.0.0 established.
>
> │ awx-test2-task 2023-07-20 10:34:56,795 INFO     [-] awx.main.wsrelay 
> Starting producer for metrics
>
> │ awx-test2-task 2023-07-20 10:34:56,798 INFO     [-] awx.main.wsrelay 
> Connection from awx-test2-task-584bdc44f5-7bsn8 to 198.0.0.0 established.
>
> │ awx-test2-task 2023-07-20 10:34:56,798 INFO     [-] awx.main.wsrelay 
> Starting producer for metrics
>
> │ awx-test2-task 2023-07-20 10:35:06,780 INFO     [-] awx.main.wsrelay 
> Removing {'awx-test1-web-6<>c-29wzn', 'awx-test1-web-68<>fc-zx8sf'} from 
> websocket broadcast list
>
> │ awx-test2-task /usr/lib64/python3.9/asyncio/events.py:80: RuntimeWarning: 
> coroutine 'WebSocketRelayManager.cleanup_offline_host' was never awaited
>
> │ awx-test2-task   self._context.run(self._callback, *self._args)
>
> │ awx-test2-task RuntimeWarning: Enable tracemalloc to get the object 
> allocation traceback
>
> │ awx-test2-task 2023-07-20 10:35:06,789 WARNING  [-] awx.main.wsrelay 
> Connection from awx-test2-task-5<>5-7bsn8 to 172.0.0.x cancelled.    ->> 
> Cluster1
>
> │ awx-test2-task 2023-07-20 10:35:06,790 WARNING  [-] awx.main.wsrelay 
> Connection from awx-test2-task-5<>5-7bsn8 to 172.x.x.x.x cancelled.    ->> 
> Cluster1
>
> │ awx-test2-task 2023-07-20 10:35:06,791 WARNING  [-] awx.main.wsrelay 
> Connection from awx-test2-task-5<>5-7bsn8 to 198.x.x.x cancelled.    ->> 
> Cluster2
>
> │ awx-test2-task 2023-07-20 10:35:06,793 WARNING  [-] awx.main.wsrelay 
> Connection from awx-test2-task-5<>5-7bsn8 to 198.x.x.x cancelled.    ->> 
> Cluster2
>
> awx-test2-task Traceback (most recent call last):
>
> │ awx-test2-task   File "/usr/bin/awx-manage", line 8, in <module>
>
> │ awx-test2-task     sys.exit(manage())
>
> │ awx-test2-task   File 
> "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/__init__.py", line 
> 200, in manage
>
> │ awx-test2-task     execute_from_command_line(sys.argv)
>
> │ awx-test2-task   File 
> "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/core/management/__init__.py",
>  line 442, in execute_from_command_line
>
> │ awx-test2-task     utility.execute()
>
> │ awx-test2-task   File 
> "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/core/management/__init__.py",
>  line 436, in execute
>
> │ awx-test2-task     self.fetch_command(subcommand).run_from_argv(self.argv)
>
> │ awx-test2-task   File 
> "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/core/management/base.py",
>  line 412, in run_from_argv
>
> │ awx-test2-task     self.execute(*args, **cmd_options)
>
> │ awx-test2-task   File 
> "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/core/management/base.py",
>  line 458, in execute
>
> │ awx-test2-task     output = self.handle(*args, **options)
>
> │ awx-test2-task   File 
> "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/management/commands/run_wsrelay.py",
>  line 168, in handle
>
> │ awx-test2-task     asyncio.run(websocket_relay_manager.run())
>
> │ awx-test2-task   File "/usr/lib64/python3.9/asyncio/runners.py", line 44, 
> in run
>
> │ awx-test2-task     return loop.run_until_complete(main)
>
> │ awx-test2-task   File "/usr/lib64/python3.9/asyncio/base_events.py", line 
> 647, in run_until_complete
>
> │ awx-test2-task     return future.result()
>
> │ awx-test2-task   File 
> "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/wsrelay.py", 
> line 330, in run
>
> │ awx-test2-task     await asyncio.gather(self.cleanup_offline_host(h) for h 
> in deleted_remote_hosts)
>
> │ awx-test2-task   File 
> "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/wsrelay.py", 
> line 330, in <genexpr>
>
> │ awx-test2-task     await asyncio.gather(self.cleanup_offline_host(h) for h 
> in deleted_remote_hosts)
>
> │ awx-test2-task RuntimeError: Task got bad yield: <coroutine object 
> WebSocketRelayManager.cleanup_offline_host at 0x<>40>
>
> │ awx-test2-task 2023-07-20 10:35:08,314 WARN exited: wsrelay (exit status 1; 
> not expected)
>
> │ awx-test2-task 2023-07-20 10:35:09,317 INFO spawned: 'wsrelay' with pid 133
>
> │ awx-test2-task 2023-07-20 10:35:11,359 INFO     [-] awx.main.wsrelay Active 
> instance with hostname awx-test2-task-58<>5-7bsn8 is registered.
>
> Repeats N times,
>
> and then: removed self from capacit
>
> 2023-07-20 11:00:48,825 INFO gave up: wsrelay entered FATAL state, too many 
> start retries too quickly
>
> │ awx-test2-task Processing Event: ver:3.0 server:supervisor serial:0 
> pool:superwatcher poolserial:0 eventname:PROCESS_STATE_FATAL len:64
>
> │ awx-test2-task 2023-07-20 11:00:49,827 WARN received SIGQUIT indicating 
> exit request
>
> │ awx-test2-task 2023-07-20 11:00:49,827 INFO waiting for superwatcher, 
> dispatcher, callback-receiver to die
>
> │ awx-test2-task 2023-07-20 11:00:49,829 WARNING  
> [24ff42c8c9c64921a6097197bec680a3] awx.main.dispatch received SIGTERM, 
> stopping
>
> │ awx-test2-task 2023-07-20 11:00:49,828 WARNING  [-] 
> awx.main.commands.run_callback_receiver received SIGTERM, stopping
>
> │ awx-test2-task 2023-07-20 11:00:49,893 WARNING  
> [24ff42c8c9c64921a6097197bec680a3] awx.main.tasks.system Normal shutdown 
> signal for instance awx-test2-task-584bdc44f5-qfs4d, removed self from 
> capacit
>
> │ awx-test2-task 2023-07-20 11:00:50,432 INFO stopped: dispatcher (exit 
> status 0)
>
> --
> You received this message because you are subscribed to the Google Groups 
> "Ansible Project" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/ansible-project/3cfc4845-fbdb-4355-8132-61db45066fb9n%40googlegroups.com.
