Public bug reported:

[Impact]

The `osd_fast_shutdown` option (available in Nautilus and enabled
by default) may cause the cluster log to receive too many entries
of `osd.X reported immediately failed by osd.Y` on large clusters.

This happens as the monitor no longer receives the OSD message to
notify that the OSD is shutting down and now relies on other OSDs
telling it about the failed OSD (not really 'failed' but shutdown.)

This might be an issue for LMA stacks/tools that check ceph logs
for failed lines, and then require additional logic to filter on
an intended OSD (fast) shutdown; might not be an option/possible,
and require an admin to analyze.

[Fix]

The `osd_fast_shutdown_notify_mon` option for OSDs (disabled by
default) can tell the monitor it is shutting down (done in slow/
non-fast shutdown) under `osd_fast_shutdown`.

This introduces minimal delay (the ack from the mon is required
to prevent the messages), and addresses the cluster log issue.
PS: the `osd_mon_shutdown_timeout` option can be used to control
the maximum amount of time waiting for the monitor ack to arrive.

The new option should be available in the following Ceph releases:
- Pacific 16.2.0 [1] [Hirsute+]
- Octopus 15.2.11 [2] [Focal/Groovy; Ussuri+]
- Nautilus TBD (at least 14.2.20) [3] [Eoan is EOL; Train]

This bug tracks the release of this patch in Ubuntu/Cloud Archive.

[Test Case]

- Stop an OSD and watch the OSD and MON logs.

- Before / or with `osd_fast_shutdown_notify_mon = false`:

```
osd log:

2021-01-09T18:59:52.448+0000 7f937fcdc700 -1 received  signal: Terminated from 
-bash  (PID: 408) UID: 1000
2021-01-09T18:59:52.448+0000 7f937fcdc700 -1 osd.2 22 *** Got signal Terminated 
***
2021-01-09T18:59:52.448+0000 7f937fcdc700 -1 osd.2 22 *** Immediate shutdown 
(osd_fast_shutdown=true) ***

mon log:

$ cat out/mon.a.log | grep '^2021-01-09T18:59:' | grep 'osd.0 reported 
immediately failed by osd.' | rev | cut -d: -f1 | rev | sort | uniq -c
      4  osd.0 reported immediately failed by osd.1
      4  osd.0 reported immediately failed by osd.2
      4  osd.0 reported immediately failed by osd.3
      4  osd.0 reported immediately failed by osd.4
      4  osd.0 reported immediately failed by osd.5
      4  osd.0 reported immediately failed by osd.6
      4  osd.0 reported immediately failed by osd.7
      4  osd.0 reported immediately failed by osd.8
      4  osd.0 reported immediately failed by osd.9
```

- After / with `osd_fast_shutdown_notify_mon = true`:

```
osd log:

2021-01-14T17:21:10.825+0000 7feceded1700 -1 received  signal: Terminated from 
-bash  (PID: 1750) UID: 1000
2021-01-14T17:21:10.825+0000 7feceded1700 -1 osd.0 80 *** Got signal Terminated 
***
2021-01-14T17:21:10.825+0000 7feceded1700 -1 osd.0 80 *** Immediate shutdown 
(osd_fast_shutdown=true) ***
2021-01-14T17:21:10.825+0000 7feceded1700  0 osd.0 80 prepare_to_stop telling 
mon we are shutting down
...
2021-01-14T17:21:11.021+0000 7fecdac0c700  0 osd.0 80 got_stop_ack starting 
shutdown
2021-01-14T17:21:11.021+0000 7feceded1700  0 osd.0 80 prepare_to_stop starting 
shutdown

mon log:

2021-01-14T17:21:10.829+0000 7f62fce61700  0 log_channel(cluster) log [INF] : 
osd.0 marked itself down
2021-01-14T17:21:10.885+0000 7f62ff666700  1 mon.a@0(leader).osd e80 do_prune 
osdmap full prune enabled
2021-01-14T17:21:10.889+0000 7f62ff666700  0 log_channel(cluster) log [WRN] : 
Health check failed: 1 osds down (OSD_DOWN)
2021-01-14T17:21:10.957+0000 7f62fb65e700  1 mon.a@0(leader).osd e81 e81: 10 
total, 9 up, 10 in
2021-01-14T17:21:11.013+0000 7f62fb65e700  0 log_channel(cluster) log [DBG] : 
osdmap e81: 10 total, 9 up, 10 in
```

[Where problems could occur]

Any regression from this patch should manifest in OSD shutdown, but only
when the option is enabled.

The patch is quite small and contained to the OSD shutdown path.

It is effectively a nop when the option is disabled (by default).

It is a small change from the newly introduced default behavior,
but it just re-introduces a message in the shutdown path, which
is how it used to be done on previous releases and even earlier
stable releases in Nautilus.

[1] https://tracker.ceph.com/issues/49683
[2] https://tracker.ceph.com/issues/49681
[3] https://tracker.ceph.com/issues/49682

** Affects: cloud-archive
     Importance: Undecided
         Status: Invalid

** Affects: cloud-archive/train
     Importance: Undecided
         Status: Confirmed

** Affects: cloud-archive/ussuri
     Importance: Undecided
         Status: Confirmed

** Affects: cloud-archive/victoria
     Importance: Undecided
         Status: Invalid

** Affects: cloud-archive/wallaby
     Importance: Undecided
         Status: Invalid

** Affects: ceph (Ubuntu)
     Importance: Undecided
         Status: Confirmed

** Affects: ceph (Ubuntu Focal)
     Importance: Undecided
         Status: Confirmed

** Affects: ceph (Ubuntu Groovy)
     Importance: Undecided
         Status: Confirmed

** Affects: ceph (Ubuntu Hirsute)
     Importance: Undecided
         Status: Confirmed

** Also affects: ceph (Ubuntu Hirsute)
   Importance: Undecided
       Status: New

** Also affects: ceph (Ubuntu Focal)
   Importance: Undecided
       Status: New

** Also affects: ceph (Ubuntu Groovy)
   Importance: Undecided
       Status: New

** Also affects: cloud-archive
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/ussuri
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/wallaby
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/victoria
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/train
   Importance: Undecided
       Status: New

** Description changed:

  [Impact]
  
  The `osd_fast_shutdown` option (available in Nautilus and enabled
  by default) may cause the cluster log to receive too many entries
  of `osd.X reported immediately failed by osd.Y` on large clusters.
  
  This happens as the monitor no longer receives the OSD message to
  notify that the OSD is shutting down and now relies on other OSDs
  telling it about the failed OSD (not really 'failed' but shutdown.)
  
  This might be an issue for LMA stacks/tools that check ceph logs
  for failed lines, and then require additional logic to filter on
  an intended OSD (fast) shutdown; might not be an option/possible,
  and require an admin to analyze.
  
  [Fix]
  
  The `osd_fast_shutdown_notify_mon` option for OSDs (disabled by
  default) can tell the monitor it is shutting down (done in slow/
  non-fast shutdown) under `osd_fast_shutdown`.
  
  This introduces minimal delay (the ack from the mon is required
  to prevent the messages), and addresses the cluster log issue.
  PS: the `osd_mon_shutdown_timeout` option can be used to control
  the maximum amount of time waiting for the monitor ack to arrive.
  
  The new option should be available in the following Ceph releases:
- - Pacific 16.2.0 [1]
- - Octopus 15.2.11 [2]
- - Nautilus TBD (at least 14.2.20)
+ - Pacific 16.2.0 [1] [Hirsute+]
+ - Octopus 15.2.11 [2] [Focal/Groovy; Ussuri+]
+ - Nautilus TBD (at least 14.2.20) [Eoan is EOL; Train]
  
  This bug tracks the release of this patch in Ubuntu/Cloud Archive.
  
  [Test Case]
  
  - Stop an OSD and watch the OSD and MON logs.
  
  - Before / or with `osd_fast_shutdown_notify_mon = false`:
  
  ```
  osd log:
  
  2021-01-09T18:59:52.448+0000 7f937fcdc700 -1 received  signal: Terminated 
from -bash  (PID: 408) UID: 1000
  2021-01-09T18:59:52.448+0000 7f937fcdc700 -1 osd.2 22 *** Got signal 
Terminated ***
  2021-01-09T18:59:52.448+0000 7f937fcdc700 -1 osd.2 22 *** Immediate shutdown 
(osd_fast_shutdown=true) ***
  
  mon log:
  
  $ cat out/mon.a.log | grep '^2021-01-09T18:59:' | grep 'osd.0 reported 
immediately failed by osd.' | rev | cut -d: -f1 | rev | sort | uniq -c
-       4  osd.0 reported immediately failed by osd.1
-       4  osd.0 reported immediately failed by osd.2
-       4  osd.0 reported immediately failed by osd.3
-       4  osd.0 reported immediately failed by osd.4
-       4  osd.0 reported immediately failed by osd.5
-       4  osd.0 reported immediately failed by osd.6
-       4  osd.0 reported immediately failed by osd.7
-       4  osd.0 reported immediately failed by osd.8
-       4  osd.0 reported immediately failed by osd.9
+       4  osd.0 reported immediately failed by osd.1
+       4  osd.0 reported immediately failed by osd.2
+       4  osd.0 reported immediately failed by osd.3
+       4  osd.0 reported immediately failed by osd.4
+       4  osd.0 reported immediately failed by osd.5
+       4  osd.0 reported immediately failed by osd.6
+       4  osd.0 reported immediately failed by osd.7
+       4  osd.0 reported immediately failed by osd.8
+       4  osd.0 reported immediately failed by osd.9
  ```
- 
  
  - After / with `osd_fast_shutdown_notify_mon = true`:
  
  ```
  osd log:
  
  2021-01-14T17:21:10.825+0000 7feceded1700 -1 received  signal: Terminated 
from -bash  (PID: 1750) UID: 1000
  2021-01-14T17:21:10.825+0000 7feceded1700 -1 osd.0 80 *** Got signal 
Terminated ***
  2021-01-14T17:21:10.825+0000 7feceded1700 -1 osd.0 80 *** Immediate shutdown 
(osd_fast_shutdown=true) ***
  2021-01-14T17:21:10.825+0000 7feceded1700  0 osd.0 80 prepare_to_stop telling 
mon we are shutting down
  ...
  2021-01-14T17:21:11.021+0000 7fecdac0c700  0 osd.0 80 got_stop_ack starting 
shutdown
  2021-01-14T17:21:11.021+0000 7feceded1700  0 osd.0 80 prepare_to_stop 
starting shutdown
  
  mon log:
  
  2021-01-14T17:21:10.829+0000 7f62fce61700  0 log_channel(cluster) log [INF] : 
osd.0 marked itself down
  2021-01-14T17:21:10.885+0000 7f62ff666700  1 mon.a@0(leader).osd e80 do_prune 
osdmap full prune enabled
  2021-01-14T17:21:10.889+0000 7f62ff666700  0 log_channel(cluster) log [WRN] : 
Health check failed: 1 osds down (OSD_DOWN)
  2021-01-14T17:21:10.957+0000 7f62fb65e700  1 mon.a@0(leader).osd e81 e81: 10 
total, 9 up, 10 in
  2021-01-14T17:21:11.013+0000 7f62fb65e700  0 log_channel(cluster) log [DBG] : 
osdmap e81: 10 total, 9 up, 10 in
  ```
  
  [Where problems could occur]
  
  Any regression from this patch should manifest in OSD shutdown, but only
  when the option is enabled.
  
  The patch is quite small and contained to the OSD shutdown path.
  
  It is effectively a nop when the option is disabled (by default).
  
  It is a small change from the newly introduced default behavior,
  but it just re-introduces a message in the shutdown path, which
  is how it used to be done on previous releases and even earlier
  stable releases in Nautilus.
  
  [1] https://tracker.ceph.com/issues/49683
  [2] https://tracker.ceph.com/issues/49681
  [3] https://tracker.ceph.com/issues/49682

** Changed in: cloud-archive/train
       Status: New => Confirmed

** Changed in: cloud-archive/ussuri
       Status: New => Confirmed

** Changed in: cloud-archive/victoria
       Status: New => Invalid

** Changed in: cloud-archive/wallaby
       Status: New => Invalid

** Changed in: ceph (Ubuntu Focal)
       Status: New => Confirmed

** Changed in: ceph (Ubuntu Groovy)
       Status: New => Confirmed

** Changed in: ceph (Ubuntu Hirsute)
       Status: New => Confirmed

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1922561

Title:
  osd shutdown may flood the cluster log with 'osd.X reported
  immediately failed by osd.Y'

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1922561/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

Reply via email to