I just wanted to follow up on this issue, as it corrected itself today. I 
started a drain/remove on two hosts a few weeks back; after the rolling restart 
of the mgr/mon daemons on the cluster, it seems the ops queue either became 
locked or overwhelmed with requests. I had a degraded PG during the rolling 
reboot of the mon/mgr nodes, and that seems to have blocked the ceph orch, 
balancer, and autoscale-status CLI commands from returning. I could see in the 
manager debug logs that the balancer was indeed running and returning results 
from its internal scheduled process, but the CLI would hang indefinitely. This 
morning the last degraded/offline PG was resolved and all commands are running 
again.
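
For anyone hitting something similar: raising the mgr debug level is how I 
confirmed the balancer was still running internally even while the CLI hung. A 
minimal sketch (the mgr daemon name below is a placeholder; adjust for your 
deployment):

```shell
# Raise mgr log verbosity cluster-wide so module activity (balancer,
# orchestrator, etc.) shows up in the active manager's log.
ceph config set mgr debug_mgr 4/5

# Under cephadm, daemon logs go to journald; tail the active mgr's log.
# "mgr.<host>.<id>" is a placeholder for the actual daemon name from
# "ceph orch ps".
cephadm logs --name mgr.<host>.<id> -- -f

# Remember to drop verbosity back down afterwards.
ceph config rm mgr debug_mgr
```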

Moving forward, is there a way to view the ops queue, or to monitor whether 
the queue fills up and starts to deprioritize CLI commands?
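
As far as I can tell, the closest things available today are the health 
warnings for slow ops and each daemon's admin socket; a sketch of where to 
look (daemon names are placeholders, and the admin socket commands must be run 
on the host where that daemon lives):

```shell
# Cluster-wide: stuck requests surface as SLOW_OPS in health output.
ceph health detail

# Per-daemon: dump in-flight ops over the daemon's admin socket.
ceph daemon mon.<id> ops
ceph daemon osd.<id> dump_ops_in_flight

# The mgr's internal finisher queue depth is visible in its perf
# counters, which can hint at a backed-up command queue.
ceph daemon mgr.<name> perf dump | grep -A 3 finisher
```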

Tim


On 7/22/22, 6:32 PM, "Tim Olow" <[email protected]> wrote:

    Howdy,
    
    I seem to be facing a problem on my 16.2.9 Ceph cluster. After a staggered 
reboot of my 3 infra nodes, all ceph orch commands hang, much like in this 
previously reported issue [1].
    
    I have paused orch and rebuilt a manager by hand as outlined here [2], but 
the issue persists. I am unable to scale services up or down, restart daemons, 
etc.
    
    ceph orch ls --verbose
    <snip>
    [{'flags': 8,
      'help': 'List services known to orchestrator',
      'module': 'mgr',
      'perm': 'r',
      'sig': [argdesc(<class 'ceph_argparse.CephPrefix'>, req=True, 
name=prefix, n=1, numseen=0, prefix=orch),
              argdesc(<class 'ceph_argparse.CephPrefix'>, req=True, 
name=prefix, n=1, numseen=0, prefix=ls),
              argdesc(<class 'ceph_argparse.CephString'>, req=False, 
name=service_type, n=1, numseen=0),
              argdesc(<class 'ceph_argparse.CephString'>, req=False, 
name=service_name, n=1, numseen=0),
              argdesc(<class 'ceph_argparse.CephBool'>, req=False, name=export, 
n=1, numseen=0),
              argdesc(<class 'ceph_argparse.CephChoices'>, req=False, 
name=format, n=1, numseen=0, 
strings=plain|json|json-pretty|yaml|xml-pretty|xml),
              argdesc(<class 'ceph_argparse.CephBool'>, req=False, 
name=refresh, n=1, numseen=0)]}]
    Submitting command:  {'prefix': 'orch ls', 'target': ('mon-mgr', '')}
    submit {"prefix": "orch ls", "target": ["mon-mgr", ""]} to mon-mgr
    
    <hang>
    
    
    Debug output on the manager:
    
    debug 2022-07-22T23:27:12.509+0000 7fc180230700  0 log_channel(audit) log 
[DBG] : from='client.1084220 -' entity='client.admin' cmd=[{"prefix": "orch 
ls", "target": ["mon-mgr", ""]}]: dispatch
    
    I have collected a startup log from the manager and uploaded it for 
review [3].
    
    
    Many Thanks,
    
    Tim
    
    
    [1] https://www.spinics.net/lists/ceph-users/msg68398.html
    [2] 
https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#manually-deploying-a-mgr-daemon
    [3] https://pastebin.com/Dvb8sEbz
    
    _______________________________________________
    ceph-users mailing list -- [email protected]
    To unsubscribe send an email to [email protected]
    
