It may be worth trying this on real H/W to factor out the QEMU component.

** Description changed:

  When running the stress-ng vector floating point stressor in QEMU PPC64 virtual machines I get floating point verification errors when running more stressor instances than the number of virtual CPUs.

  How to reproduce:

  Create a PPC64 VM in QEMU on an x86 host with 8 virtual CPUs. Log in, and then do:

  get latest stress-ng:

  sudo apt-get build-dep stress-ng
  git clone https://github.com/ColinIanKing/stress-ng
  cd stress-ng
  make clean; make -j $(nproc)

  ./stress-ng --vecfp 32 --verify -t 10

  One should get failures such as:

  stress-ng: info: [1487] setting to a 10 second run per stressor
  stress-ng: info: [1487] dispatching hogs: 32 vecfp
  stress-ng: fail: [1489] vecfp: floatv64div float vector operation result mismatch, got 1078998925312.000000, expected 180812.062500
  stress-ng: fail: [1489] vecfp: floatv64div float vector operation result mismatch, got 46779686912.000000, expected 13278722.000000
  stress-ng: fail: [1489] vecfp: floatv64div float vector operation result mismatch, got 24992688128.000000, expected 26213772.000000
  stress-ng: fail: [1489] vecfp: floatv64div float vector operation result mismatch, got 17185787904.000000, expected 39415832.000000
  stress-ng: fail: [1488] vecfp: floatv16div float vector operation result mismatch, got 157250576.000000, expected 33576.261719
  stress-ng: fail: [1488] vecfp: floatv16div float vector operation result mismatch, got 170314032.000000, expected 13129044.000000
  stress-ng: fail: [1488] vecfp: floatv16div float vector operation result mismatch, got 183516080.000000, expected 26348392.000000
  stress-ng: fail: [1488] vecfp: floatv16div float vector operation result mismatch, got 196647552.000000, expected 39365508.000000
  etc..

  However, running fewer instances than the number of CPUs runs fine without any errors:

  ./stress-ng --vecfp 1 --verify -t 10
  stress-ng: info: [1521] setting to a 10 second run per stressor
  stress-ng: info: [1521] dispatching hogs: 1 vecfp
  stress-ng: info: [1521] passed: 1: vecfp (1)
  stress-ng: info: [1521] failed: 0
  stress-ng: info: [1521] skipped: 0
  stress-ng: info: [1521] metrics untrustworthy: 0
  stress-ng: info: [1521] successful run completed in 19.00s

  It appears this only fails when the number of instances of the vecfp stressor is more than the number of virtual CPUs.

  This seems to indicate that vector floating point registers are being clobbered between processes, which could be an exploitable security issue.

  Reproduced with Ubuntu Lunar PPC64 VM (6.2.0-20-generic) and x86 host
- (6.2.0-21-generic + qemu-kvm 1:5.0-5ubuntu6)
+ (6.2.0-21-generic + qemu-kvm 1:5.0-5ubuntu6).
+
+ List of PPC64el kernels reproducers:
+
+ Lunar: 6.2.0-20-generic
+ Mantic: 6.3.0-7-generic
+
  Not sure if this is a kernel or KVM issue, or both.

** Information type changed from Public to Private Security

** Changed in: linux (Ubuntu Lunar)
   Importance: Undecided => High

** Changed in: linux (Ubuntu)
   Importance: Undecided => High

** Also affects: linux (Ubuntu Mantic)
   Importance: High
       Status: New

** Description changed:

  List of PPC64el kernels reproducers:

- Lunar: 6.2.0-20-generic
- Mantic: 6.3.0-7-generic
+ Focal: 5.4.0-148-generic
+ Lunar: 6.2.0-20-generic
+ Mantic: 6.3.0-7-generic

  Not sure if this is a kernel or KVM issue, or both.

** Description changed:

  List of PPC64el kernels reproducers:

- Focal: 5.4.0-148-generic
+ Focal: 5.4.0-148-generic
+ Jammy: 5.15.0-58-generic
  Lunar: 6.2.0-20-generic
  Mantic: 6.3.0-7-generic

  Not sure if this is a kernel or KVM issue, or both.

** Also affects: linux (Ubuntu Focal)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Focal)
   Importance: Undecided => High

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2026883

Title:
  vector floating point registers get clobbered when running stress-ng --vecfp with more instances than CPUs

Status in linux package in Ubuntu:
  New
Status in linux source package in Focal:
  New
Status in linux source package in Lunar:
  New
Status in linux source package in Mantic:
  New

Bug description:
  When running the stress-ng vector floating point stressor in QEMU PPC64 virtual machines I get floating point verification errors when running more stressor instances than the number of virtual CPUs.

  How to reproduce:

  Create a PPC64 VM in QEMU on an x86 host with 8 virtual CPUs. Log in, and then do:

  get latest stress-ng:

  sudo apt-get build-dep stress-ng
  git clone https://github.com/ColinIanKing/stress-ng
  cd stress-ng
  make clean; make -j $(nproc)

  ./stress-ng --vecfp 32 --verify -t 10

  One should get failures such as:

  stress-ng: info: [1487] setting to a 10 second run per stressor
  stress-ng: info: [1487] dispatching hogs: 32 vecfp
  stress-ng: fail: [1489] vecfp: floatv64div float vector operation result mismatch, got 1078998925312.000000, expected 180812.062500
  stress-ng: fail: [1489] vecfp: floatv64div float vector operation result mismatch, got 46779686912.000000, expected 13278722.000000
  stress-ng: fail: [1489] vecfp: floatv64div float vector operation result mismatch, got 24992688128.000000, expected 26213772.000000
  stress-ng: fail: [1489] vecfp: floatv64div float vector operation result mismatch, got 17185787904.000000, expected 39415832.000000
  stress-ng: fail: [1488] vecfp: floatv16div float vector operation result mismatch, got 157250576.000000, expected 33576.261719
  stress-ng: fail: [1488] vecfp: floatv16div float vector operation result mismatch, got 170314032.000000, expected 13129044.000000
  stress-ng: fail: [1488] vecfp: floatv16div float vector operation result mismatch, got 183516080.000000, expected 26348392.000000
  stress-ng: fail: [1488] vecfp: floatv16div float vector operation result mismatch, got 196647552.000000, expected 39365508.000000
  etc..

  However, running fewer instances than the number of CPUs runs fine without any errors:

  ./stress-ng --vecfp 1 --verify -t 10
  stress-ng: info: [1521] setting to a 10 second run per stressor
  stress-ng: info: [1521] dispatching hogs: 1 vecfp
  stress-ng: info: [1521] passed: 1: vecfp (1)
  stress-ng: info: [1521] failed: 0
  stress-ng: info: [1521] skipped: 0
  stress-ng: info: [1521] metrics untrustworthy: 0
  stress-ng: info: [1521] successful run completed in 19.00s

  It appears this only fails when the number of instances of the vecfp stressor is more than the number of virtual CPUs.

  This seems to indicate that vector floating point registers are being clobbered between processes, which could be an exploitable security issue.

  Reproduced with Ubuntu Lunar PPC64 VM (6.2.0-20-generic) and x86 host (6.2.0-21-generic + qemu-kvm 1:5.0-5ubuntu6).

  List of PPC64el kernels reproducers:

  Focal: 5.4.0-148-generic
  Jammy: 5.15.0-58-generic
  Lunar: 6.2.0-20-generic
  Mantic: 6.3.0-7-generic

  Not sure if this is a kernel or KVM issue, or both.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2026883/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
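
The comparison in the bug description (no failures with few instances, failures once the instance count exceeds the number of virtual CPUs) could be scripted roughly as below. This is a minimal sketch, not part of the original report: the helper names count_vecfp_failures, summarize_vecfp_ops and run_vecfp are hypothetical, and it assumes the failure lines have the "stress-ng: fail: [pid] vecfp: <op> float vector operation result mismatch" shape quoted in the description.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, and the log format is assumed
# to match the failure lines quoted in the bug description.

# Count vecfp verification failures in stress-ng output read from stdin.
# grep -c exits non-zero when nothing matches, so "|| true" keeps a clean
# run (count 0) from looking like an error.
count_vecfp_failures() {
    grep -c 'stress-ng: fail:.*vecfp:' || true
}

# Print "<count> <operation>" per failing vector operation (e.g. floatv64div).
summarize_vecfp_ops() {
    sed -n 's/.*stress-ng: fail:.*vecfp: \([a-z0-9]*\) float vector operation result mismatch.*/\1/p' |
        sort | uniq -c | awk '{ print $1, $2 }'
}

# Run the stressor with the given number of instances, using the same
# options as in the report, and print the number of failures seen.
# stderr is merged into stdout so the messages are captured either way.
run_vecfp() {
    ./stress-ng --vecfp "$1" --verify -t 10 2>&1 | count_vecfp_failures
}

# Example use from the stress-ng build directory inside the VM:
#   cpus=$(nproc)
#   echo "$cpus instances:          $(run_vecfp "$cpus") failures"
#   echo "$((cpus * 4)) instances: $(run_vecfp "$((cpus * 4))") failures"
```

On the 8-vCPU VM from the report, the second invocation corresponds to the failing `--vecfp 32` case; summarize_vecfp_ops can be fed a saved log to see which operations (floatv16div, floatv64div, ...) mismatch.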