This is the Ansible list. You may have better luck on the AWX list...

On Thu, 20 Jul 2023 at 16:56, Pabbisetty h <[email protected]> wrote:
>
> We have an AWX deployment spread across two K8s clusters (sharing a common external Postgres DB), with container instances as execution environments.
>
> When we bring up only one cluster, everything works fine.
>
> When we bring up the second cluster, “awx.main.wsrelay” tries to connect from pods in cluster1 to pods in cluster2 (and the other way around). Because it can’t find the other pods, the coroutine 'WebSocketRelayManager.cleanup_offline_host' fails, and the pod marks itself as failing.
>
> In the end, all task pods are restarted until they hit BackOff.
>
> Can we somehow isolate the WebSocket relay system (“Heartbeet” and “Wsrelay”) and group the pods per cluster?
>
> Or is this behaviour a bug?
> (https://github.com/ansible/awx/blob/devel/docs/websockets.md)
>
> Logs:
>
> awx-test2-task 2023-07-20 10:34:45,625 INFO success: superwatcher entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
> awx-test2-task 2023-07-20 10:34:45,625 INFO success: superwatcher entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
> awx-test2-task 2023-07-20 10:34:46,739 INFO [-] awx.main.commands.run_callback_receiver Callback receiver started with pid=50
> awx-test2-task 2023-07-20 10:34:46,764 INFO [-] awx.main.wsrelay Active instance with hostname awx-test2-task-5<>7bsn8 is registered.
> awx-test2-task 2023-07-20 10:34:46,807 WARNING [-] awx.main.dispatch.periodic periodic beat started
> awx-test2-task 2023-07-20 10:34:46,832 INFO [-] awx.main.dispatch Running worker dispatcher listening to queues ['tower_broadcast_all', 'tower_settings_change', 'awx-test2-task-<>-7bsn8']
> awx-test2-task 2023-07-20 10:34:56,776 INFO [-] awx.main.wsrelay Adding {'awx-test2-web-6<>d7-tzscp', 'awx-test2-web-6<>7cdd7-xqw29', 'awx-test1-web-6<>c-29wzn', 'awx-test1-web-6<>8
> awx-test2-task 2023-07-20 10:34:56,794 INFO [-] awx.main.wsrelay Connection from awx-test2-task-5<>5-7bsn8 to 198.0.0.0 established.
> awx-test2-task 2023-07-20 10:34:56,795 INFO [-] awx.main.wsrelay Starting producer for metrics
> awx-test2-task 2023-07-20 10:34:56,798 INFO [-] awx.main.wsrelay Connection from awx-test2-task-584bdc44f5-7bsn8 to 198.0.0.0 established.
> awx-test2-task 2023-07-20 10:34:56,798 INFO [-] awx.main.wsrelay Starting producer for metrics
> awx-test2-task 2023-07-20 10:35:06,780 INFO [-] awx.main.wsrelay Removing {'awx-test1-web-6<>c-29wzn', 'awx-test1-web-68<>fc-zx8sf'} from websocket broadcast list
> awx-test2-task /usr/lib64/python3.9/asyncio/events.py:80: RuntimeWarning: coroutine 'WebSocketRelayManager.cleanup_offline_host' was never awaited
> awx-test2-task   self._context.run(self._callback, *self._args)
> awx-test2-task RuntimeWarning: Enable tracemalloc to get the object allocation traceback
> awx-test2-task 2023-07-20 10:35:06,789 WARNING [-] awx.main.wsrelay Connection from awx-test2-task-5<>5-7bsn8 to 172.0.0.x cancelled. ->> Cluster1
> awx-test2-task 2023-07-20 10:35:06,790 WARNING [-] awx.main.wsrelay Connection from awx-test2-task-5<>5-7bsn8 to 172.x.x.x.x cancelled. ->> Cluster1
> awx-test2-task 2023-07-20 10:35:06,791 WARNING [-] awx.main.wsrelay Connection from awx-test2-task-5<>5-7bsn8 to 198.x.x.x cancelled. ->> Cluster2
> awx-test2-task 2023-07-20 10:35:06,793 WARNING [-] awx.main.wsrelay Connection from awx-test2-task-5<>5-7bsn8 to 198.x.x.x cancelled. ->> Cluster2
> awx-test2-task Traceback (most recent call last):
> awx-test2-task   File "/usr/bin/awx-manage", line 8, in <module>
> awx-test2-task     sys.exit(manage())
> awx-test2-task   File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/__init__.py", line 200, in manage
> awx-test2-task     execute_from_command_line(sys.argv)
> awx-test2-task   File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line
> awx-test2-task     utility.execute()
> awx-test2-task   File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/core/management/__init__.py", line 436, in execute
> awx-test2-task     self.fetch_command(subcommand).run_from_argv(self.argv)
> awx-test2-task   File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/core/management/base.py", line 412, in run_from_argv
> awx-test2-task     self.execute(*args, **cmd_options)
> awx-test2-task   File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/django/core/management/base.py", line 458, in execute
> awx-test2-task     output = self.handle(*args, **options)
> awx-test2-task   File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/management/commands/run_wsrelay.py", line 168, in handle
> awx-test2-task     asyncio.run(websocket_relay_manager.run())
> awx-test2-task   File "/usr/lib64/python3.9/asyncio/runners.py", line 44, in run
> awx-test2-task     return loop.run_until_complete(main)
> awx-test2-task   File "/usr/lib64/python3.9/asyncio/base_events.py", line 647, in run_until_complete
> awx-test2-task     return future.result()
> awx-test2-task   File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/wsrelay.py", line 330, in run
> awx-test2-task     await asyncio.gather(self.cleanup_offline_host(h) for h in deleted_remote_hosts)
> awx-test2-task   File "/var/lib/awx/venv/awx/lib64/python3.9/site-packages/awx/main/wsrelay.py", line 330, in <genexpr>
> awx-test2-task     await asyncio.gather(self.cleanup_offline_host(h) for h in deleted_remote_hosts)
> awx-test2-task RuntimeError: Task got bad yield: <coroutine object WebSocketRelayManager.cleanup_offline_host at 0x<>40>
> awx-test2-task 2023-07-20 10:35:08,314 WARN exited: wsrelay (exit status 1; not expected)
> awx-test2-task 2023-07-20 10:35:08,314 WARN exited: wsrelay (exit status 1; not expected)
> awx-test2-task 2023-07-20 10:35:09,317 INFO spawned: 'wsrelay' with pid 133
> awx-test2-task 2023-07-20 10:35:09,317 INFO spawned: 'wsrelay' with pid 133
> awx-test2-task 2023-07-20 10:35:11,359 INFO [-] awx.main.wsrelay Active instance with hostname awx-test2-task-58<>5-7bsn8 is registered.
>
> Repeats N times, and then: removed self from capacit
>
> 2023-07-20 11:00:48,825 INFO gave up: wsrelay entered FATAL state, too many start retries too quickly
> awx-test2-task Processing Event: ver:3.0 server:supervisor serial:0 pool:superwatcher poolserial:0 eventname:PROCESS_STATE_FATAL len:64
> awx-test2-task 2023-07-20 11:00:49,827 WARN received SIGQUIT indicating exit request
> awx-test2-task 2023-07-20 11:00:49,827 WARN received SIGQUIT indicating exit request
> awx-test2-task 2023-07-20 11:00:49,827 INFO waiting for superwatcher, dispatcher, callback-receiver to die
> awx-test2-task 2023-07-20 11:00:49,827 INFO waiting for superwatcher, dispatcher, callback-receiver to die
> awx-test2-task 2023-07-20 11:00:49,829 WARNING [24ff42c8c9c64921a6097197bec680a3] awx.main.dispatch received SIGTERM, stopping
> awx-test2-task 2023-07-20 11:00:49,828 WARNING [-] awx.main.commands.run_callback_receiver received SIGTERM, stopping
> awx-test2-task 2023-07-20 11:00:49,893 WARNING [24ff42c8c9c64921a6097197bec680a3] awx.main.tasks.system Normal shutdown signal for instance awx-test2-task-584bdc44f5-qfs4d, removed self from capacit
> awx-test2-task 2023-07-20 11:00:50,432 INFO stopped: dispatcher (exit status 0)
>
> --
> You received this message because you are subscribed to the Google Groups "Ansible Project" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To view this discussion on the web visit https://groups.google.com/d/msgid/ansible-project/3cfc4845-fbdb-4355-8132-61db45066fb9n%40googlegroups.com.
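For what it's worth, the traceback looks more like a coding bug than a multi-cluster limitation: line 330 of wsrelay.py passes a generator expression to asyncio.gather() instead of unpacking it into separate arguments. On Python 3.9 (the version in the logs) a bare generator is still accepted as a legacy generator-based coroutine, so gather() wraps the whole generator in one Task; when that generator yields a coroutine object, the Task fails with "Task got bad yield", and the yielded coroutine is never awaited, which would explain the RuntimeWarning in the logs. The path runs whenever hosts are removed from the broadcast list, which the unreachable cluster1 web pods appear to trigger. The sketch below reproduces the pattern outside AWX. It is an illustration under assumptions: cleanup_offline_host here is a hypothetical stand-in, not AWX's real method, and the *(...) unpacking is the usual fix for this pattern, not necessarily the patch applied upstream. Python 3.12+ rejects the generator up front with a TypeError instead, so the sketch accepts either error.

```python
import asyncio


# Hypothetical stand-in for WebSocketRelayManager.cleanup_offline_host:
# it only records which host it was asked to clean up.
async def cleanup_offline_host(host, cleaned):
    await asyncio.sleep(0)  # yield to the event loop once, as real I/O would
    cleaned.append(host)
    return host


async def buggy(hosts, cleaned):
    # Same shape as the traceback: ONE generator object is passed to gather().
    # Python 3.9-3.11 wrap the generator in a Task, which fails with
    # "RuntimeError: Task got bad yield" when it yields a coroutine object.
    # Python 3.12+ raise TypeError because a generator is no longer accepted.
    return await asyncio.gather(cleanup_offline_host(h, cleaned) for h in hosts)


async def fixed(hosts, cleaned):
    # Unpack the generator so gather() receives each coroutine as its own argument.
    return await asyncio.gather(*(cleanup_offline_host(h, cleaned) for h in hosts))


async def main():
    hosts = ["awx-test1-web-a", "awx-test1-web-b"]  # hypothetical hostnames

    cleaned = []
    buggy_error = None
    try:
        await buggy(hosts, cleaned)
    except (RuntimeError, TypeError) as exc:
        buggy_error = type(exc).__name__

    cleaned_ok = []
    results = await fixed(hosts, cleaned_ok)
    return buggy_error, cleaned, results, cleaned_ok


if __name__ == "__main__":
    print(asyncio.run(main()))
```

On Python 3.9 to 3.11 the buggy call should also emit a "coroutine 'cleanup_offline_host' was never awaited" RuntimeWarning, mirroring the log, while the unpacked call cleans up every host. Whether a fixed AWX release exists for your version is worth checking on the AWX list or issue tracker.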
-- You received this message because you are subscribed to the Google Groups "Ansible Project" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/ansible-project/CAF8BbLbpdOVbY2m_1_qYi035%2Bj-RKH8XLCb8E_wackcCSfW_HA%40mail.gmail.com.
