Hi,

After a power outage on my test Ceph cluster, two OSDs fail to restart. The log file shows:

[email protected]: Failed with result 'timeout'.
Sep 21 11:55:02 mostha1 systemd[1]: Failed to start Ceph osd.2 for 250f9864-0142-11ee-8e5f-00266cf8869c.
Sep 21 11:55:12 mostha1 systemd[1]: [email protected]: Service RestartSec=10s expired, scheduling restart.
Sep 21 11:55:12 mostha1 systemd[1]: [email protected]: Scheduled restart job, restart counter is at 2.
Sep 21 11:55:12 mostha1 systemd[1]: Stopped Ceph osd.2 for 250f9864-0142-11ee-8e5f-00266cf8869c.
Sep 21 11:55:12 mostha1 systemd[1]: [email protected]: Found left-over process 1858 (bash) in control group while starting unit. Ignoring.
Sep 21 11:55:12 mostha1 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Sep 21 11:55:12 mostha1 systemd[1]: [email protected]: Found left-over process 2815 (podman) in control group while starting unit. Ignoring.

This is not critical, as it is a test cluster and the data is actually rebalancing onto the other OSDs, but I would like to know how to return to HEALTH_OK status.

smartctl shows the HDDs are OK.

So is there a way to recover the OSDs from this state? The version is 15.2.17 (I just moved from 15.2.13 to 15.2.17 yesterday, and will move to a more recent release as soon as this problem is solved).

Thanks

Patrick

_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
