Package: ocfs2-tools
Version: 1.8.5-7
Severity: normal

Hi,

I have two up-to-date Debian systems running DRBD and OCFS2. There is only one DRBD device, which is configured as Primary/Primary. Everything works well except one thing: the system can't mount the DRBD device at boot, so I have to mount it by hand.

Relevant configs:

# cat /etc/drbd.d/r0.res
resource r0 {
    meta-disk internal;
    device /dev/drbd0;
    syncer {
        verify-alg sha1;
    }
    net {
        allow-two-primaries;
    }
    startup {
        become-primary-on both;
    }
    on t2app1 {
        disk /dev/vg-t2app1/lvdrbd0;
        address 192.168.72.21:7789;
    }
    on t2app2 {
        disk /dev/vg-t2app2/lvdrbd0;
        address 192.168.72.22:7789;
    }
}

# cat /etc/ocfs2/cluster.conf
cluster:
    node_count = 2
    name = data

node:
    ip_port = 7777
    ip_address = 192.168.72.21
    number = 1
    name = t2app1
    cluster = data

node:
    ip_port = 7777
    ip_address = 192.168.72.22
    number = 2
    name = t2app2
    cluster = data

# grep drbd /etc/fstab
/dev/drbd0 /drbd ocfs2 _netdev,defaults 0 0

With these same settings (completely identical) on Debian 9, the mount works at boot without any problem.

Additional information:

Status of services after reboot:

# systemctl status drbd.mount
● drbd.mount - /drbd
   Loaded: loaded (/etc/fstab; generated)
   Active: failed (Result: exit-code) since Wed 2019-09-04 15:13:49 CEST; 16s ago
    Where: /drbd
     What: /dev/drbd0
     Docs: man:fstab(5)
           man:systemd-fstab-generator(8)

Sep 04 15:13:49 t2app1 systemd[1]: Mounting /drbd...
Sep 04 15:13:49 t2app1 mount[720]: mount.ocfs2: I/O error on channel while opening device /dev/drbd0
Sep 04 15:13:49 t2app1 systemd[1]: drbd.mount: Mount process exited, code=exited, status=1/FAILURE
Sep 04 15:13:49 t2app1 systemd[1]: drbd.mount: Failed with result 'exit-code'.
Sep 04 15:13:49 t2app1 systemd[1]: Failed to mount /drbd.

# systemctl status drbd.service
● drbd.service - LSB: Control DRBD resources.
   Loaded: loaded (/etc/init.d/drbd; generated)
   Active: active (exited) since Wed 2019-09-04 15:13:53 CEST; 41s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 686 ExecStart=/etc/init.d/drbd start (code=exited, status=0/SUCCESS)

Sep 04 15:13:49 t2app1 systemd[1]: Starting LSB: Control DRBD resources....
Sep 04 15:13:49 t2app1 drbd[686]: Starting DRBD resources:[
Sep 04 15:13:49 t2app1 drbd[686]: create res: r0
Sep 04 15:13:49 t2app1 drbd[686]: prepare disk: r0
Sep 04 15:13:49 t2app1 drbd[686]: adjust disk: r0
Sep 04 15:13:49 t2app1 drbd[686]: adjust net: r0
Sep 04 15:13:49 t2app1 drbd[686]: ]
Sep 04 15:13:53 t2app1 drbd[686]: WARN: stdin/stdout is not a TTY; using /dev/console.
Sep 04 15:13:53 t2app1 systemd[1]: Started LSB: Control DRBD resources..

# systemctl status ocfs2.service
● ocfs2.service - Mount ocfs2 Filesystems
   Loaded: loaded (/lib/systemd/system/ocfs2.service; enabled; vendor preset: enabled)
   Active: active (exited) since Wed 2019-09-04 15:13:49 CEST; 1min 40s ago
     Docs: man:ocfs2(7)
           man:mount.ocfs2(8)
  Process: 796 ExecStart=/usr/lib/ocfs2-tools/ocfs2 start (code=exited, status=0/SUCCESS)
 Main PID: 796 (code=exited, status=0/SUCCESS)

Sep 04 15:13:49 t2app1 systemd[1]: Starting Mount ocfs2 Filesystems...
Sep 04 15:13:49 t2app1 ocfs2[796]: Starting Oracle Cluster File System (OCFS2) mount.ocfs2: I/O error on channel while opening device /dev/drbd0
Sep 04 15:13:49 t2app1 ocfs2[796]: Failed
Sep 04 15:13:49 t2app1 systemd[1]: Started Mount ocfs2 Filesystems.

# grep drbd /var/log/syslog
Sep 4 15:13:29 t2app1 systemd[1]: dev-drbd0.device: Dependency Before=network-online.target ignored (.device units cannot be delayed)
Sep 4 15:13:29 t2app1 systemd[1]: dev-drbd0.device: Dependency Before=network.target ignored (.device units cannot be delayed)
Sep 4 15:13:31 t2app1 systemd[1]: Unmounting /drbd...
Sep 4 15:13:31 t2app1 systemd[1021]: drbd.mount: Succeeded.
Sep 4 15:13:49 t2app1 systemd-modules-load[373]: Inserted module 'drbd'
Sep 4 15:13:49 t2app1 kernel: [ 3.422172] drbd: initialized. Version: 8.4.10 (api:1/proto:86-101)
Sep 4 15:13:49 t2app1 kernel: [ 3.422173] drbd: srcversion: 9B4D87C5E865DF526864868
Sep 4 15:13:49 t2app1 kernel: [ 3.422174] drbd: registered as block device major 147
Sep 4 15:13:49 t2app1 drbd[686]: Starting DRBD resources:[
Sep 4 15:13:49 t2app1 drbd[686]: create res: r0
Sep 4 15:13:49 t2app1 drbd[686]: prepare disk: r0
Sep 4 15:13:49 t2app1 systemd[1]: Found device /dev/drbd0.
Sep 4 15:13:49 t2app1 systemd[1]: Mounting /drbd...
Sep 4 15:13:49 t2app1 mount[720]: mount.ocfs2: I/O error on channel while opening device /dev/drbd0
Sep 4 15:13:49 t2app1 systemd[1]: drbd.mount: Mount process exited, code=exited, status=1/FAILURE
Sep 4 15:13:49 t2app1 systemd[1]: drbd.mount: Failed with result 'exit-code'.
Sep 4 15:13:49 t2app1 systemd[1]: Failed to mount /drbd.
Sep 4 15:13:49 t2app1 kernel: [ 3.964487] drbd r0: Starting worker thread (from drbdsetup-84 [727])
Sep 4 15:13:49 t2app1 kernel: [ 3.964926] block drbd0: disk( Diskless -> Attaching )
Sep 4 15:13:49 t2app1 kernel: [ 3.964984] drbd r0: Method to ensure write ordering: flush
Sep 4 15:13:49 t2app1 kernel: [ 3.964986] block drbd0: max BIO size = 1048576
Sep 4 15:13:49 t2app1 kernel: [ 3.964988] block drbd0: drbd_bm_resize called with capacity == 20970808
Sep 4 15:13:49 t2app1 kernel: [ 3.965029] block drbd0: resync bitmap: bits=2621351 words=40959 pages=80
Sep 4 15:13:49 t2app1 kernel: [ 3.965030] block drbd0: size = 10 GB (10485404 KB)
Sep 4 15:13:49 t2app1 kernel: [ 3.967075] block drbd0: recounting of set bits took additional 0 jiffies
Sep 4 15:13:49 t2app1 kernel: [ 3.967077] block drbd0: 8192 KB (2048 bits) marked out-of-sync by on disk bit-map.
Sep 4 15:13:49 t2app1 kernel: [ 3.967080] block drbd0: disk( Attaching -> UpToDate )
Sep 4 15:13:49 t2app1 kernel: [ 3.967082] block drbd0: attached to UUIDs 0756A08452088D5B:0000000000000000:724560592EBD27FB:724460592EBD27FB
Sep 4 15:13:49 t2app1 drbd[686]: adjust disk: r0
Sep 4 15:13:49 t2app1 kernel: [ 3.969313] drbd r0: conn( StandAlone -> Unconnected )
Sep 4 15:13:49 t2app1 kernel: [ 3.969324] drbd r0: Starting receiver thread (from drbd_w_r0 [730])
Sep 4 15:13:49 t2app1 kernel: [ 3.969443] drbd r0: receiver (re)started
Sep 4 15:13:49 t2app1 kernel: [ 3.969450] drbd r0: conn( Unconnected -> WFConnection )
Sep 4 15:13:49 t2app1 drbd[686]: adjust net: r0
Sep 4 15:13:49 t2app1 drbd[686]: ]
Sep 4 15:13:49 t2app1 ocfs2[796]: Starting Oracle Cluster File System (OCFS2) mount.ocfs2: I/O error on channel while opening device /dev/drbd0
Sep 4 15:13:53 t2app1 kernel: [ 8.202810] drbd r0: Handshake successful: Agreed network protocol version 101
Sep 4 15:13:53 t2app1 kernel: [ 8.202812] drbd r0: Feature flags enabled on protocol level: 0x7 TRIM THIN_RESYNC WRITE_SAME.
Sep 4 15:13:53 t2app1 kernel: [ 8.202852] drbd r0: conn( WFConnection -> WFReportParams )
Sep 4 15:13:53 t2app1 kernel: [ 8.202853] drbd r0: Starting ack_recv thread (from drbd_r_r0 [741])
Sep 4 15:13:53 t2app1 kernel: [ 8.238525] block drbd0: drbd_sync_handshake:
Sep 4 15:13:53 t2app1 kernel: [ 8.238527] block drbd0: self 0756A08452088D5A:0000000000000000:724560592EBD27FB:724460592EBD27FB bits:2048 flags:0
Sep 4 15:13:53 t2app1 kernel: [ 8.238529] block drbd0: peer 2F26127377330A65:0756A08452088D5B:724560592EBD27FB:724460592EBD27FB bits:1 flags:0
Sep 4 15:13:53 t2app1 kernel: [ 8.238530] block drbd0: uuid_compare()=-1 by rule 50
Sep 4 15:13:53 t2app1 kernel: [ 8.238532] block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) disk( UpToDate -> Outdated ) pdsk( DUnknown -> UpToDate )
Sep 4 15:13:53 t2app1 kernel: [ 8.246935] block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 25(1), total 25; compression: 100.0%
Sep 4 15:13:53 t2app1 kernel: [ 8.246982] block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 32(1), total 32; compression: 100.0%
Sep 4 15:13:53 t2app1 kernel: [ 8.246984] block drbd0: conn( WFBitMapT -> WFSyncUUID )
Sep 4 15:13:53 t2app1 kernel: [ 8.251561] block drbd0: updated sync uuid 0757A08452088D5A:0000000000000000:724560592EBD27FB:724460592EBD27FB
Sep 4 15:13:53 t2app1 kernel: [ 8.251920] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
Sep 4 15:13:53 t2app1 kernel: [ 8.252922] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
Sep 4 15:13:53 t2app1 kernel: [ 8.252926] block drbd0: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
Sep 4 15:13:53 t2app1 kernel: [ 8.252935] block drbd0: Began resync as SyncTarget (will sync 8192 KB [2048 bits set]).
Sep 4 15:13:53 t2app1 kernel: [ 8.253390] block drbd0: role( Secondary -> Primary )
Sep 4 15:13:53 t2app1 drbd[686]: WARN: stdin/stdout is not a TTY; using /dev/console.
Sep 4 15:13:58 t2app1 kernel: [ 12.624923] block drbd0: Resync done (total 4 sec; paused 0 sec; 2048 K/sec)
Sep 4 15:13:58 t2app1 kernel: [ 12.624926] block drbd0: updated UUIDs 2F26127377330A65:0000000000000000:0757A08452088D5B:0756A08452088D5B
Sep 4 15:13:58 t2app1 kernel: [ 12.624930] block drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
Sep 4 15:13:58 t2app1 kernel: [ 12.625225] block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0
Sep 4 15:13:58 t2app1 kernel: [ 12.626743] block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)

# mount | grep drbd
[EMPTY]

Workaround 1:

Add some delay to the start of ocfs2.service:

# diff -ruN ocfs2.service /lib/systemd/system/ocfs2.service
--- ocfs2.service 2019-09-04 14:43:55.613155935 +0200
+++ /lib/systemd/system/ocfs2.service 2019-09-04 15:21:34.502068371 +0200
@@ -7,6 +7,7 @@
 [Service]
 Type=oneshot
 RemainAfterExit=yes
+ExecStartPre=/bin/sleep 3
 ExecStart=/usr/lib/ocfs2-tools/ocfs2 start
 ExecStop=/usr/lib/ocfs2-tools/ocfs2 stop
 ExecReload=/usr/lib/ocfs2-tools/ocfs2 restart

Workaround 2:

Create an rc.local systemd unit and place a 'mount -a' command in it:

# cat <<EOF >> /etc/systemd/system/rc-local.service
[Unit]
Description=/etc/rc.local
ConditionPathExists=/etc/rc.local

[Service]
Type=forking
ExecStart=/etc/rc.local start
TimeoutSec=0
StandardOutput=tty
RemainAfterExit=yes
SysVStartPriority=99

[Install]
WantedBy=multi-user.target
EOF

# cat <<EOF >> /etc/rc.local
#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#
# By default this script does nothing.

/bin/sleep 3 && mount -a

exit 0
EOF

# chmod +x /etc/rc.local
# systemctl enable rc-local
# reboot

Hope this helps,

a.
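
A fixed 'sleep 3' only helps while DRBD happens to come up within three seconds. As a rough sketch of a less timing-dependent variant of the workarounds above, the delay could be replaced by a loop that retries a probe until it succeeds or a timeout expires. The function names wait_until and device_readable and the 30-second timeout are hypothetical, and treating a successful one-block read as "ready to mount" is an assumption that has not been verified against DRBD 8.4:

```shell
#!/bin/sh
# wait_until TIMEOUT CMD [ARGS...]
# Re-run CMD about once per second until it succeeds or TIMEOUT seconds
# have elapsed. Returns 0 on success, 1 on timeout.
wait_until() {
    timeout=$1
    shift
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Hypothetical readiness probe (an assumption, not a DRBD-specific check):
# the device counts as ready once a single block can be read from it.
device_readable() {
    dd if="$1" of=/dev/null bs=4096 count=1
}

# Example use with the device and mount point from this report:
#   wait_until 30 device_readable /dev/drbd0 && mount /drbd
```

Saved as a script, something like this could be called in place of '/bin/sleep 3' in either workaround. It is still only a workaround for the ordering between drbd and the ocfs2 mount.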