retitle 857573 No longer umounts AoE/NBD-based file systems, causing data loss
severity 857573 grave
thanks

Since this is a regression from the jessie version and might cause
data loss, I have no other choice than to raise the severity.

Steps to reproduce:

* On another system in the same (ethernet broadcast) domain, create
  a blob, create a file system on it, and export it using vblade(1):

    fallocate --length 16m /tmp/blob
    mkfs.ext4 /tmp/blob
    vblade 165 1 eth0 /tmp/blob

  assuming eth0 is the network interface. The shelf and slot numbers
  (165, 1) may be somewhat arbitrary but must be unique in the domain.

* On the actual system, enable AoE and mount the file system:

    modprobe aoe
    mkdir -p /mnt/share
    mount /dev/etherd/e165.1 /mnt/share

* Reboot


Expected behaviour, and seen on jessie:

The system reboots without any unusual activity; after the reboot, the
filesystem on that device is clean.


Observed behaviour on stretch:

Shutdown stalls for three minutes as systemd cannot umount the
filesystem ("A stop job is running for /mnt/share"). Mounting it again
after the reboot shows that the filesystem wasn't cleanly umounted:

| EXT4-fs (etherd!e165.1): recovery complete
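
To spot that message without reading the whole kernel log, something
like the following could be used. It is only a minimal sketch: the
function name is made up for illustration, and it assumes the log text
is fed on stdin (e.g. from dmesg) in the format shown above.

```shell
#!/bin/sh
# Print the device names for which the kernel logged an ext4 journal
# recovery, which indicates the file system was not cleanly unmounted.
# Hypothetical helper; reads kernel log text on stdin.
ext4_recovered_devices() {
    # Keep only the text between the parentheses of matching lines.
    sed -n 's/.*EXT4-fs (\([^)]*\)): recovery complete.*/\1/p'
}
```

Usage would be along the lines of "dmesg | ext4_recovered_devices";
no output means no such recovery was logged.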


As said before, the most likely cause is that the old init script
/etc/init.d/networking is no longer executed. That script detects
whether AoE (or something similar) is in use and, if so, skips
network deconfiguration.
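
For illustration, a check of that kind can be sketched roughly as
below. This is a simplified sketch and not the actual code from
/etc/init.d/networking: the function name and the lists of device and
file system patterns are assumptions chosen to cover the cases
mentioned in this report.

```shell
#!/bin/sh
# Return 0 (true) if the given mounts table (default: /proc/mounts)
# lists a mount that depends on the network, judged either by device
# name or by file system type. Pattern lists are illustrative only.
network_mounts_in_use() {
    mounts_file="${1:-/proc/mounts}"
    [ -r "$mounts_file" ] || return 1

    while read -r dev mountpoint fstype rest; do
        case "$dev" in
            /dev/etherd/e*|/dev/nbd*) return 0 ;;   # AoE and NBD block devices
        esac
        case "$fstype" in
            nfs|nfs4|cifs) return 0 ;;              # network file systems
        esac
    done < "$mounts_file"
    return 1
}
```

A shutdown step could then skip network deconfiguration whenever
"network_mounts_in_use" returns true.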

Using swap over AoE would probably make the system hang; I haven't
tested that, though.

The thing that puzzles me here: That same check also handles NFS and
CIFS mounts, but apparently they are still umounted in a sane way
in stretch.

With a network block device (NBD), however, things are even worse: Upon
shutdown, the kernel triggers a BUG and the system hangs:

         Unmounting /mnt/share...
   ------------[ cut here ]------------
   kernel BUG at /build/linux-9r9Ph5/linux-4.9.18/fs/buffer.c:3060!
   invalid opcode: 0000 [#1] SMP
   Modules linked in: nbd snd_hda_codec_generic crct10dif_pclmul ppdev crc32_pclmul ghash_clmulni_intel pcspkr evdev joydev serio_raw virtio_balloon virtio_console snd_hda_intel qxl snd_hda_codec sg ttm snd_hda_core drm_kms_helper snd_hwdep snd_pcm parport_pc acpi_cpufreq parport tpm_tis tpm_tis_core snd_timer tpm drm snd button soundcore sunrpc ip_tables x_tables autofs4 ext4 crc16 jbd2 crc32c_generic fscrypto ecb mbcache hid_generic usbhid hid aoe sr_mod cdrom ata_generic virtio_net virtio_blk crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd psmouse ata_piix ehci_pci uhci_hcd ehci_hcd libata floppy usbcore usb_common virtio_pci virtio_ring virtio i2c_piix4 scsi_mod
   CPU: 0 PID: 984 Comm: umount Not tainted 4.9.0-2-amd64 #1 Debian 4.9.18-1
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
   task: ffff92df39eec040 task.stack: ffffb5950045c000
   RIP: 0010:[<ffffffff8b43a133>]  [<ffffffff8b43a133>] submit_bh_wbc+0x173/0x1d0
   RSP: 0018:ffffb5950045fdc0  EFLAGS: 00010246
   RAX: 0000000000000005 RBX: ffff92df3a313b60 RCX: 0000000000000000
   RDX: ffff92df3a313b60 RSI: 0000000000000148 RDI: 0000000000000001
   RBP: 0000000000000148 R08: 0000000000000000 R09: 0000000000000001
   R10: 000000000000145c R11: 000000000001c2a1 R12: 0000000000000001
   R13: ffff92df3d184400 R14: 0000000000002176 R15: ffff92df3d1d1800
   FS:  00007f1d984822c0(0000) GS:ffff92df3fc00000(0000) knlGS:0000000000000000
   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   CR2: 00007f1d97cc4d00 CR3: 0000000039e01000 CR4: 00000000000406f0
   Stack:
    ffff92df3a313b60 0000000000000148 0000000000000001 ffff92df3d184400
    0000000000002176 ffff92df3d1d1800 ffffffff8b43c7ee ffffffff8bf11580
    ffff92df3a313b60 ffffffffc0526b6d 0000000000002175 ffff92df3d138800
   Call Trace:
    [<ffffffff8b43c7ee>] ? __sync_dirty_buffer+0x4e/0xf0
    [<ffffffffc0526b6d>] ? ext4_commit_super+0x20d/0x2b0 [ext4]
    [<ffffffffc0527ae3>] ? ext4_put_super+0xd3/0x3a0 [ext4]
    [<ffffffff8b404459>] ? generic_shutdown_super+0x69/0xf0
    [<ffffffff8b4047a1>] ? kill_block_super+0x21/0x60
    [<ffffffff8b4048a4>] ? deactivate_locked_super+0x34/0x60
    [<ffffffff8b42370b>] ? cleanup_mnt+0x3b/0x80
    [<ffffffff8b294af9>] ? task_work_run+0x79/0xa0
    [<ffffffff8b203284>] ? exit_to_usermode_loop+0xa4/0xb0
    [<ffffffff8b203a94>] ? syscall_return_slowpath+0x54/0x60
    [<ffffffff8b7fb108>] ? system_call_fast_compare_end+0x99/0x9b
   Code: 1f 09 c7 41 09 fc 48 89 ef 44 89 65 14 e8 96 13 0c 00 5b 31 c0 5d 41 5c 41 5d 41 5e 41 5f c3 3e 80 62 01 f7 e9 fb fe ff ff 0f 0b <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 80 3d 83 bb a4 00 00 75 ac be 1b
   RIP  [<ffffffff8b43a133>] submit_bh_wbc+0x173/0x1d0
    RSP <ffffb5950045fdc0>
   ---[ end trace d8257e9f866874aa ]---


Overall, my main interest is to get this fixed. I happen to maintain
the AoE userland (client and server) in Debian and am willing to handle
the issue there; however, I'd need an idea of how this should be done.
Also, that approach would still require dealing with all other
network-based file systems as well.

Regards,

    Christoph
