Hi Torkil,

It is possible that you are hitting the balancer issue in 19.2.0 that 
affects clusters with a large number of PGs: https://tracker.ceph.com/issues/68657
Try turning the balancer off with `ceph balancer off` and see whether the 
orchestrator recovers.
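
If it helps to see whether the upgrade moves again afterwards, below is a 
minimal Python sketch (my own, not part of Ceph) that summarises the JSON 
printed by `ceph versions` per daemon type. The function name 
upgrade_progress and the "ceph version 19.2.0" prefix match are assumptions 
for illustration; the sample data is the output quoted further down in this 
thread.

```python
import json


def upgrade_progress(versions, target="ceph version 19.2.0"):
    """Return {daemon_type: (upgraded, total)} from `ceph versions` JSON.

    Hypothetical helper: a daemon counts as upgraded when its version
    string starts with `target`.
    """
    progress = {}
    for daemon_type, counts in versions.items():
        if daemon_type == "overall":
            continue  # skip the summary section, it mixes all daemon types
        total = sum(counts.values())
        upgraded = sum(n for ver, n in counts.items() if ver.startswith(target))
        progress[daemon_type] = (upgraded, total)
    return progress


REEF = "ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable)"
SQUID = "ceph version 19.2.0 (16063ff2022298c9300e49a547a16ffda59baf13) squid (stable)"

# Same numbers as the `ceph versions` output quoted in this thread.
sample = json.loads(json.dumps({
    "mon": {REEF: 4, SQUID: 1},
    "mgr": {SQUID: 3},
    "osd": {REEF: 548},
    "mds": {REEF: 3},
    "overall": {REEF: 555, SQUID: 4},
}))

for daemon_type, (done, total) in upgrade_progress(sample).items():
    print(f"{daemon_type}: {done}/{total} on 19.2.0")
```

On a live cluster you would feed it the parsed output of `ceph versions` 
instead of the sample; if the counts do not change between runs, the upgrade 
is still stuck.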

Best,
Laimis J.

> On 17 Dec 2024, at 13:15, Torkil Svensgaard <[email protected]> wrote:
> 
> 
> 
> On 17/12/2024 12:05, Torkil Svensgaard wrote:
>> Hi
>> I am running an upgrade from 18.2.4 to 19.2.0. It managed to upgrade the 
>> managers, but there has been no further progress.
> 
> It now seems to have upgraded 1 MON, and then the orchestrator 
> crashed again:
> 
> "
> {
>    "mon": {
>       "ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef 
> (stable)": 4,
>        "ceph version 19.2.0 (16063ff2022298c9300e49a547a16ffda59baf13) squid 
> (stable)": 1
>    },
>    "mgr": {
>        "ceph version 19.2.0 (16063ff2022298c9300e49a547a16ffda59baf13) squid 
> (stable)": 3
>    },
>    "osd": {
>       "ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef 
> (stable)": 548
>    },
>    "mds": {
>       "ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef 
> (stable)": 3
>    },
>    "overall": {
>        "ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef 
> (stable)": 555,
>        "ceph version 19.2.0 (16063ff2022298c9300e49a547a16ffda59baf13) squid 
> (stable)": 4
>    }
> }
> "
> 
> Mvh.
> 
> Torkil
> 
> 
>> If I fail over the mgr it goes:
>> "
>> [root@ceph-flash1 ~]# ceph orch upgrade status
>> Error ENOTSUP: Module 'orchestrator' is not enabled/loaded (required by 
>> command 'orch upgrade status'): use `ceph mgr module enable orchestrator` to 
>> enable it
>> "
>> From mgr log:
>> "
>> ...
>> 2024-12-17T10:43:11.729+0000 7f70efafe640  0 log_channel(audit) log [DBG] : 
>> from='client.2110010386 -' entity='client.admin' cmd=[{"prefix": "orch 
>> upgrade status", "target": ["mon-mgr", ""]}]: dispatch
>> 2024-12-17T10:43:11.733+0000 7f70ebaf6640  0 [cephadm INFO cherrypy.error] 
>> [17/Dec/2024:10:43:11] ENGINE Bus STARTING
>> 2024-12-17T10:43:11.733+0000 7f70ebaf6640  0 log_channel(cephadm) log [INF] 
>> : [17/Dec/2024:10:43:11] ENGINE Bus STARTING
>> 2024-12-17T10:43:11.811+0000 7f70e7aee640  0 [dashboard INFO 
>> dashboard.module] Engine started.
>> 2024-12-17T10:43:11.861+0000 7f70ebaf6640  0 [cephadm INFO cherrypy.error] 
>> [17/Dec/2024:10:43:11] ENGINE Serving on 
>> https://172.21.15.148:7150
>> 2024-12-17T10:43:11.861+0000 7f70ebaf6640  0 log_channel(cephadm) log [INF] 
>> : [17/Dec/2024:10:43:11] ENGINE Serving on 
>> https://172.21.15.148:7150
>> 2024-12-17T10:43:11.864+0000 7f70a2d7a640  0 [cephadm ERROR cherrypy.error] 
>> [17/Dec/2024:10:43:11] ENGINE Error in HTTPServer.serve
>> Traceback (most recent call last):
>>   File "/lib/python3.9/site-packages/cheroot/server.py", line 1823, in serve
>>     self._connections.run(self.expiration_interval)
>>   File "/lib/python3.9/site-packages/cheroot/connections.py", line 203, in 
>> run
>>     self._run(expiration_interval)
>>   File "/lib/python3.9/site-packages/cheroot/connections.py", line 246, in 
>> _run
>>     new_conn = self._from_server_socket(self.server.socket)
>>   File "/lib/python3.9/site-packages/cheroot/connections.py", line 300, in 
>> _from_server_socket
>>     s, ssl_env = self.server.ssl_adapter.wrap(s)
>>   File "/lib/python3.9/site-packages/cheroot/ssl/builtin.py", line 277, in 
>> wrap
>>     s = self.context.wrap_socket(
>>   File "/lib64/python3.9/ssl.py", line 501, in wrap_socket
>>     return self.sslsocket_class._create(
>>   File "/lib64/python3.9/ssl.py", line 1074, in _create
>>     self.do_handshake()
>>   File "/lib64/python3.9/ssl.py", line 1343, in do_handshake
>>     self._sslobj.do_handshake()
>> ssl.SSLZeroReturnError: TLS/SSL connection has been closed (EOF) 
>> (_ssl.c:1133)
>> 2024-12-17T10:43:11.865+0000 7f70a2d7a640 -1 log_channel(cephadm) log [ERR] 
>> : [17/Dec/2024:10:43:11] ENGINE Error in HTTPServer.serve
>> Traceback (most recent call last):
>>   File "/lib/python3.9/site-packages/cheroot/server.py", line 1823, in serve
>>     self._connections.run(self.expiration_interval)
>>   File "/lib/python3.9/site-packages/cheroot/connections.py", line 203, in 
>> run
>>     self._run(expiration_interval)
>>   File "/lib/python3.9/site-packages/cheroot/connections.py", line 246, in 
>> _run
>>     new_conn = self._from_server_socket(self.server.socket)
>>   File "/lib/python3.9/site-packages/cheroot/connections.py", line 300, in 
>> _from_server_socket
>>     s, ssl_env = self.server.ssl_adapter.wrap(s)
>>   File "/lib/python3.9/site-packages/cheroot/ssl/builtin.py", line 277, in 
>> wrap
>>     s = self.context.wrap_socket(
>>   File "/lib64/python3.9/ssl.py", line 501, in wrap_socket
>>     return self.sslsocket_class._create(
>>   File "/lib64/python3.9/ssl.py", line 1074, in _create
>>     self.do_handshake()
>>   File "/lib64/python3.9/ssl.py", line 1343, in do_handshake
>>     self._sslobj.do_handshake()
>> ssl.SSLZeroReturnError: TLS/SSL connection has been closed (EOF) 
>> (_ssl.c:1133)
>> 2024-12-17T10:43:11.963+0000 7f70ebaf6640  0 [cephadm INFO cherrypy.error] 
>> [17/Dec/2024:10:43:11] ENGINE Serving on 
>> http://172.21.15.148:8765
>> 2024-12-17T10:43:11.963+0000 7f70ebaf6640  0 log_channel(cephadm) log [INF] 
>> : [17/Dec/2024:10:43:11] ENGINE Serving on 
>> http://172.21.15.148:8765
>> 2024-12-17T10:43:11.963+0000 7f70ebaf6640  0 [cephadm INFO cherrypy.error] 
>> [17/Dec/2024:10:43:11] ENGINE Bus STARTED
>> 2024-12-17T10:43:11.964+0000 7f70ebaf6640  0 log_channel(cephadm) log [INF] 
>> : [17/Dec/2024:10:43:11] ENGINE Bus STARTED
>> ...
>> "
>> It will recover after some timeout, maybe 5-10 mins, and then just sit there 
>> with no upgrade progress.
>> Nothing in mgr/cephadm/osd_remove_queue.
>> Suggestions?
>> Mvh.
>> Torkil
> 
> -- 
> Torkil Svensgaard
> Sysadmin
> MR-Forskningssektionen, afs. 714
> DRCMR, Danish Research Centre for Magnetic Resonance
> Hvidovre Hospital
> Kettegård Allé 30
> DK-2650 Hvidovre
> Denmark
> Tel: +45 386 22828
> E-mail: [email protected]
> _______________________________________________
> ceph-users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
