retitle 857573 No longer umounts AoE/NBD-based file systems, causing data loss
severity 857573 grave
thanks
Since this is a regression from the jessie version and might cause data loss, I have no choice but to raise the severity.

Steps to reproduce:

* On another system in the same (Ethernet broadcast) domain, create a blob, create a file system on it, and export it using vblade(1):

      fallocate --length 16m /tmp/blob
      mkfs.ext4 /tmp/blob
      vblade 165 1 eth0 /tmp/blob

  assuming eth0 is the network interface. The shelf and slot numbers (165, 1) may be somewhat arbitrary but must be unique in the domain.

* On the actual system, enable AoE and mount the file system:

      modprobe aoe
      mkdir -p /mnt/share
      mount /dev/etherd/e165.1 /mnt/share

* Reboot.

Expected behaviour, and seen on jessie: The system reboots without any notable unusual activity; after the reboot, the file system on that device is clean.

Observed behaviour on stretch: Shutdown stalls for three minutes because systemd cannot umount the file system ("A stop job is running for /mnt/share"). Mounting it again after the reboot shows that the file system wasn't cleanly umounted:

| EXT4-fs (etherd!e165.1): recovery complete

As said before, the most likely cause is that the old init script /etc/init.d/networking is no longer executed. That script detects whether AoE (or something similar) is in use and, if so, skips deconfiguring the network. Using swap over AoE would probably make the system hang; I haven't tested that, though.

What puzzles me here: that same check also handles NFS and CIFS mounts, yet apparently those are still umounted in a sane way in stretch.

With a network block device (NBD), however, things are even worse: upon shutdown, the kernel triggers a BUG and the system hangs:

Unmounting /mnt/share...
------------[ cut here ]------------
kernel BUG at /build/linux-9r9Ph5/linux-4.9.18/fs/buffer.c:3060!
invalid opcode: 0000 [#1] SMP
Modules linked in: nbd snd_hda_codec_generic crct10dif_pclmul ppdev crc32_pclmul ghash_clmulni_intel pcspkr evdev joydev serio_raw virtio_balloon virtio_console snd_hda_intel qxl snd_hda_codec sg ttm snd_hda_core drm_kms_helper snd_hwdep snd_pcm parport_pc acpi_cpufreq parport tpm_tis tpm_tis_core snd_timer tpm drm snd button soundcore sunrpc ip_tables x_tables autofs4 ext4 crc16 jbd2 crc32c_generic fscrypto ecb mbcache hid_generic usbhid hid aoe sr_mod cdrom ata_generic virtio_net virtio_blk crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd psmouse ata_piix ehci_pci uhci_hcd ehci_hcd libata floppy usbcore usb_common virtio_pci virtio_ring virtio i2c_piix4 scsi_mod
CPU: 0 PID: 984 Comm: umount Not tainted 4.9.0-2-amd64 #1 Debian 4.9.18-1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
task: ffff92df39eec040 task.stack: ffffb5950045c000
RIP: 0010:[<ffffffff8b43a133>] [<ffffffff8b43a133>] submit_bh_wbc+0x173/0x1d0
RSP: 0018:ffffb5950045fdc0 EFLAGS: 00010246
RAX: 0000000000000005 RBX: ffff92df3a313b60 RCX: 0000000000000000
RDX: ffff92df3a313b60 RSI: 0000000000000148 RDI: 0000000000000001
RBP: 0000000000000148 R08: 0000000000000000 R09: 0000000000000001
R10: 000000000000145c R11: 000000000001c2a1 R12: 0000000000000001
R13: ffff92df3d184400 R14: 0000000000002176 R15: ffff92df3d1d1800
FS: 00007f1d984822c0(0000) GS:ffff92df3fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1d97cc4d00 CR3: 0000000039e01000 CR4: 00000000000406f0
Stack:
 ffff92df3a313b60 0000000000000148 0000000000000001 ffff92df3d184400
 0000000000002176 ffff92df3d1d1800 ffffffff8b43c7ee ffffffff8bf11580
 ffff92df3a313b60 ffffffffc0526b6d 0000000000002175 ffff92df3d138800
Call Trace:
 [<ffffffff8b43c7ee>] ? __sync_dirty_buffer+0x4e/0xf0
 [<ffffffffc0526b6d>] ? ext4_commit_super+0x20d/0x2b0 [ext4]
 [<ffffffffc0527ae3>] ? ext4_put_super+0xd3/0x3a0 [ext4]
 [<ffffffff8b404459>] ? generic_shutdown_super+0x69/0xf0
 [<ffffffff8b4047a1>] ? kill_block_super+0x21/0x60
 [<ffffffff8b4048a4>] ? deactivate_locked_super+0x34/0x60
 [<ffffffff8b42370b>] ? cleanup_mnt+0x3b/0x80
 [<ffffffff8b294af9>] ? task_work_run+0x79/0xa0
 [<ffffffff8b203284>] ? exit_to_usermode_loop+0xa4/0xb0
 [<ffffffff8b203a94>] ? syscall_return_slowpath+0x54/0x60
 [<ffffffff8b7fb108>] ? system_call_fast_compare_end+0x99/0x9b
Code: 1f 09 c7 41 09 fc 48 89 ef 44 89 65 14 e8 96 13 0c 00 5b 31 c0 5d 41 5c 41 5d 41 5e 41 5f c3 3e 80 62 01 f7 e9 fb fe ff ff 0f 0b <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 80 3d 83 bb a4 00 00 75 ac be 1b
RIP [<ffffffff8b43a133>] submit_bh_wbc+0x173/0x1d0
 RSP <ffffb5950045fdc0>
---[ end trace d8257e9f866874aa ]---

Overall, my main interest is getting this fixed. I happen to maintain the AoE userland (client and server) in Debian and am willing to handle the issue there; however, I'd need an idea of how this should be done. Also, that approach would still leave all the other network-based file systems to be dealt with.

Regards,
Christoph
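For illustration, here is a minimal sketch of the kind of check described above: deciding whether any mounted file system sits on an AoE or NBD block device, or is an NFS/CIFS mount, so that the network is kept up until it has been umounted. This is a hypothetical helper, not the logic actually used in /etc/init.d/networking; the function name uses_network_storage and the device patterns /dev/etherd/* and /dev/nbd* are assumptions.

```shell
# Hypothetical helper (not taken from /etc/init.d/networking).
# Returns 0 if any entry in a mounts table (default: /proc/mounts)
# lives on an AoE or NBD block device, or is an NFS/CIFS mount;
# returns 1 otherwise.
uses_network_storage() {
    mounts_file="${1:-/proc/mounts}"
    # Each line: device mountpoint fstype options dump pass
    while read -r dev _mnt fstype _rest; do
        case "$dev" in
            # Assumed device paths for AoE (aoe driver) and NBD
            /dev/etherd/*|/dev/nbd*) return 0 ;;
        esac
        case "$fstype" in
            # Network file systems that also need the network at umount time
            nfs|nfs4|cifs) return 0 ;;
        esac
    done < "$mounts_file"
    return 1
}
```

A shutdown script could then call `if uses_network_storage; then ...; fi` and skip deconfiguring the network while such mounts are still present.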