I've verified the fix in the way I suspected I'd have to, with one extra wrinkle.
1) In a trusty VM, I verified that the C test case from the gist failed (it did).
2) I launched a xenial lxd container on the VM and built the Go test case with version 1.6.2-0ubuntu5~16.04.2 of golang-1.6-go.
3) It did not fail in the lxd container for reasons I couldn't understand but it did fail when copied out of the container on to the trusty VM (failed 563 times out of 100k).
4) I then installed golang-1.6-go version 1.6.2-0ubuntu5~16.04.3 in the container and rebuilt the Go test case with the new compiler.
5) This did not fail when run directly on the trusty VM (0 failures out of 100k runs).

So I'm confident the fix has helped.

** Tags removed: verification-needed verification-needed-xenial
** Tags added: verification-done-xenial

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1672819

Title:
  exec'ing a setuid binary from a threaded program sometimes fails to
  setuid

Status in Linux:
  Unknown
Status in golang-1.6 package in Ubuntu:
  Invalid
Status in linux package in Ubuntu:
  Fix Released
Status in golang-1.6 source package in Xenial:
  Fix Committed
Status in linux source package in Xenial:
  Fix Released
Status in golang-1.6 source package in Yakkety:
  Invalid
Status in linux source package in Yakkety:
  Fix Released
Status in golang-1.6 source package in Zesty:
  Invalid
Status in linux source package in Zesty:
  Fix Released

Bug description:
  == SRU template for golang-1.6 ==

  [Impact]
  The kernel bug reported below means that occasionally (maybe 1 in 1000
  times) the snapd -> snap-confine exec that is part of a snap execution
  fails to take the setuid bit on the snap-confine binary into account
  which means that the execution fails. This is extremely confusing for
  the user of the snap who just sees a permission denied error with no
  explanation.

  The kernel bug has been fixed in Xenial+ but not all users of snapd
  are on xenial+ kernels (they might be on trusty or another
  distribution entirely). Backporting this fix will mean that the snapd
  in the core snap will get the workaround next time it is built and
  because the snapd in trusty or the other distro will re-exec into the
  snapd in the core snap before execing snap-confine, users should not
  see the above behaviour.

  [Test case]
  This will be a bit tricky as the kernel bug has been fixed. A xenial
  container on a trusty host/VM should do the trick. The test case from
  https://gist.github.com/chipaca/806c90d96c437444f27f45a83d00a813
  should be sufficient to demonstrate the bug and then, once golang-1.6
  has been upgraded from proposed, the fix.

  [Regression potential]
  If there is a bug in the patch it could cause deadlocks in currently
  working programs. But the patch is pretty simple and has passed review
  upstream so I think it should be OK.

  == SRU REQUEST XENIAL, YAKKETY, ZESTY ==

  Due to two race conditions in check_unsafe_exec(), exec'ing a setuid
  binary from a threaded program sometimes fails to setuid.

  == Fix ==

  Sauce patch for Xenial, Yakkety + Zesty:

  https://lists.ubuntu.com/archives/kernel-team/2017-May/084102.html

  This fix re-executes the unsafe check if there is a discrepancy
  between the expected fs count and the found count during the racy
  window during thread exec or exit. This re-check occurs very
  infrequently and saves a lot of additional locking on per thread
  structures that would make performance of fork/exec/exit prohibitively
  expensive.

  == Test case ==

  See the example C code in the patch,
  https://lists.ubuntu.com/archives/kernel-team/2017-May/084102.html

  Run the test code as follows:

  for i in $(seq 1000); do ./a; done

  With the patch, no messages are emitted; without the patch, one sees a
  message:

  "Failed, got euid 1000 (expecting 0)"

  ..which shows the setuid program failed the check_unsafe_exec()
  because of the race.
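
  The loop above only shows failures as they scroll past. A minimal
  sketch of a helper that counts them instead, assuming a POSIX shell
  and grep; count_failing_runs is a hypothetical name and is not part of
  the patch or the gist:

```shell
#!/bin/sh
# count_failing_runs RUNS PATTERN CMD [ARGS...]
# Hypothetical helper (not from the patch or the gist): runs CMD RUNS
# times and prints the number of runs whose output matched PATTERN.
count_failing_runs() {
    runs=$1
    pattern=$2
    shift 2
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Merge stderr into stdout, since the reproducer may print its
        # failure message on either stream.
        if "$@" 2>&1 | grep -q -e "$pattern"; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}
```

  For example, count_failing_runs 1000 'Failed, got euid' ./a prints 0
  if no run emits the message, and the number of affected runs
  otherwise.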

  == Regression potential ==

  breaking existing safe exec semantics.

  ====================

  This can be reproduced with
  https://gist.github.com/chipaca/806c90d96c437444f27f45a83d00a813

  With that, and go 1.8, if you run “make” and then

  for i in `seq 99`; do ./a_go; done

  you'll see a variable number of “GOT 1000” (or whatever your user id
  is). If you don't, add one or two more 9s on there.

  That's a simple go reproducer. You can also use “a_p” instead of
  “a_go” to see one that only uses pthreads. “a_c” is a C version that
  does *not* reproduce the issue.

  But it's not pthreads: if in a_go.go you comment out the “import "C"”,
  you'll still see the “GOT 1000” messages, in a static binary that uses
  no pthreads, just clone(2). You'll also see a bunch of warnings
  because it's not properly handling an EAGAIN from clone, but that's
  unrelated.

  If you pin the process to a single thread using taskset, you don't get
  the issue from a_go; a_p continues to reproduce the issue. In some
  virtualized environments we haven't been able to reproduce the issue
  either (e.g. some aws instances), but kvm works (you need -smp to see
  the issue from a_go).

  ProblemType: Bug
  DistroRelease: Ubuntu 16.04
  Package: linux-image-4.4.0-64-generic 4.4.0-64.85
  ProcVersionSignature: Ubuntu 4.4.0-64.85-generic 4.4.44
  Uname: Linux 4.4.0-64-generic x86_64
  NonfreeKernelModules: zfs zunicode zcommon znvpair zavl
  ApportVersion: 2.20.1-0ubuntu2.5
  Architecture: amd64
  AudioDevicesInUse:
   USER        PID ACCESS COMMAND
   /dev/snd/pcmC0D0p:   john       2354 F...m pulseaudio
   /dev/snd/controlC0:  john       2354 F.... pulseaudio
  CurrentDesktop: Unity
  Date: Tue Mar 14 17:17:23 2017
  HibernationDevice: RESUME=UUID=b9fd155b-dcbe-4337-ae77-6daa6569beaf
  InstallationDate: Installed on 2014-04-27 (1051 days ago)
  InstallationMedia: Ubuntu 14.04 LTS "Trusty Tahr" - Release amd64 (20140417)
  MachineType: Dell Inc. Latitude E6510
  ProcFB: 0 inteldrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.4.0-64-generic root=/dev/mapper/ubuntu--vg-root ro enable_mtrr_cleanup mtrr_spare_reg_nr=8 mtrr_gran_size=32M mtrr_chunk_size=32M quiet splash
  RelatedPackageVersions:
   linux-restricted-modules-4.4.0-64-generic N/A
   linux-backports-modules-4.4.0-64-generic  N/A
   linux-firmware                            1.157.8
  SourcePackage: linux
  SystemImageInfo: Error: command ['system-image-cli', '-i'] failed with exit code 2:
  UpgradeStatus: Upgraded to xenial on 2015-06-18 (634 days ago)
  dmi.bios.date: 12/05/2013
  dmi.bios.vendor: Dell Inc.
  dmi.bios.version: A16
  dmi.board.vendor: Dell Inc.
  dmi.chassis.type: 9
  dmi.chassis.vendor: Dell Inc.
  dmi.modalias: dmi:bvnDellInc.:bvrA16:bd12/05/2013:svnDellInc.:pnLatitudeE6510:pvr0001:rvnDellInc.:rn:rvr:cvnDellInc.:ct9:cvr:
  dmi.product.name: Latitude E6510
  dmi.product.version: 0001
  dmi.sys.vendor: Dell Inc.