[Kernel-packages] [Bug 2026883] Re: vector floating point registers get clobbered when running stress-ng --vecfp with more instances than CPUs

Colin Ian King Tue, 11 Jul 2023 08:55:38 -0700

It may be worth trying this on real H/W to factor out the QEMU
component.

** Description changed:


  When running the stress-ng vector floating point stressor in QEMU PPC64
  virtual machines I get floating point verification errors when running
  more stressor instances than the number of virtual CPUs.
  
  How to reproduce:
  
  Create a PPC64 VM in QEMU on an x86 host with 8 virtual CPUs. Log in, and
  then do:
  
  get latest stress-ng:
  
  sudo apt-get build-dep stress-ng
  git clone https://github.com/ColinIanKing/stress-ng
  cd stress-ng
  make clean; make -j $(nproc)
  ./stress-ng --vecfp 32 --verify -t 10
  
  One should get failures such as:
  stress-ng: info:  [1487] setting to a 10 second run per stressor
  stress-ng: info:  [1487] dispatching hogs: 32 vecfp
  stress-ng: fail:  [1489] vecfp: floatv64div float vector operation result mismatch, got 1078998925312.000000, expected 180812.062500
  stress-ng: fail:  [1489] vecfp: floatv64div float vector operation result mismatch, got 46779686912.000000, expected 13278722.000000
  stress-ng: fail:  [1489] vecfp: floatv64div float vector operation result mismatch, got 24992688128.000000, expected 26213772.000000
  stress-ng: fail:  [1489] vecfp: floatv64div float vector operation result mismatch, got 17185787904.000000, expected 39415832.000000
  stress-ng: fail:  [1488] vecfp: floatv16div float vector operation result mismatch, got 157250576.000000, expected 33576.261719
  stress-ng: fail:  [1488] vecfp: floatv16div float vector operation result mismatch, got 170314032.000000, expected 13129044.000000
  stress-ng: fail:  [1488] vecfp: floatv16div float vector operation result mismatch, got 183516080.000000, expected 26348392.000000
  stress-ng: fail:  [1488] vecfp: floatv16div float vector operation result mismatch, got 196647552.000000, expected 39365508.000000
  etc..
  
  However, with fewer instances than the number of CPUs this runs fine
  without any errors:
  ./stress-ng --vecfp 1 --verify -t 10
  stress-ng: info:  [1521] setting to a 10 second run per stressor
  stress-ng: info:  [1521] dispatching hogs: 1 vecfp
  stress-ng: info:  [1521] passed: 1: vecfp (1)
  stress-ng: info:  [1521] failed: 0
  stress-ng: info:  [1521] skipped: 0
  stress-ng: info:  [1521] metrics untrustworthy: 0
  stress-ng: info:  [1521] successful run completed in 19.00s
  
  It appears this only fails when the number of instances of the vecfp
  stressor is more than the number of virtual CPUs.  This seems to
  indicate that vector floating point registers are being clobbered
  between processes, which could be a security exploitable issue.
  
  Reproduced with Ubuntu Lunar PPC64 VM (6.2.0-20-generic) and x86 host
- (6.2.0-21-generic + qemu-kvm  1:5.0-5ubuntu6)
+ (6.2.0-21-generic + qemu-kvm  1:5.0-5ubuntu6).
+ 
+ List of PPC64el kernels reproducers:
+     
+     Lunar: 6.2.0-20-generic
+     Mantic: 6.3.0-7-generic
+ 
  
  Not sure if this is a kernel or KVM issue, or both.

** Information type changed from Public to Private Security

** Changed in: linux (Ubuntu Lunar)
   Importance: Undecided => High

** Changed in: linux (Ubuntu)
   Importance: Undecided => High

** Also affects: linux (Ubuntu Mantic)
   Importance: High
       Status: New

** Description changed:

  When running the stress-ng vector floating point stressor in QEMU PPC64
  virtual machines I get floating point verification errors when running
  more stressor instances than the number of virtual CPUs.
  
  How to reproduce:
  
  Create a PPC64 VM in QEMU on an x86 host with 8 virtual CPUs. Log in, and
  then do:
  
  get latest stress-ng:
  
  sudo apt-get build-dep stress-ng
  git clone https://github.com/ColinIanKing/stress-ng
  cd stress-ng
  make clean; make -j $(nproc)
  ./stress-ng --vecfp 32 --verify -t 10
  
  One should get failures such as:
  stress-ng: info:  [1487] setting to a 10 second run per stressor
  stress-ng: info:  [1487] dispatching hogs: 32 vecfp
  stress-ng: fail:  [1489] vecfp: floatv64div float vector operation result mismatch, got 1078998925312.000000, expected 180812.062500
  stress-ng: fail:  [1489] vecfp: floatv64div float vector operation result mismatch, got 46779686912.000000, expected 13278722.000000
  stress-ng: fail:  [1489] vecfp: floatv64div float vector operation result mismatch, got 24992688128.000000, expected 26213772.000000
  stress-ng: fail:  [1489] vecfp: floatv64div float vector operation result mismatch, got 17185787904.000000, expected 39415832.000000
  stress-ng: fail:  [1488] vecfp: floatv16div float vector operation result mismatch, got 157250576.000000, expected 33576.261719
  stress-ng: fail:  [1488] vecfp: floatv16div float vector operation result mismatch, got 170314032.000000, expected 13129044.000000
  stress-ng: fail:  [1488] vecfp: floatv16div float vector operation result mismatch, got 183516080.000000, expected 26348392.000000
  stress-ng: fail:  [1488] vecfp: floatv16div float vector operation result mismatch, got 196647552.000000, expected 39365508.000000
  etc..
  
  However, with fewer instances than the number of CPUs this runs fine
  without any errors:
  ./stress-ng --vecfp 1 --verify -t 10
  stress-ng: info:  [1521] setting to a 10 second run per stressor
  stress-ng: info:  [1521] dispatching hogs: 1 vecfp
  stress-ng: info:  [1521] passed: 1: vecfp (1)
  stress-ng: info:  [1521] failed: 0
  stress-ng: info:  [1521] skipped: 0
  stress-ng: info:  [1521] metrics untrustworthy: 0
  stress-ng: info:  [1521] successful run completed in 19.00s
  
  It appears this only fails when the number of instances of the vecfp
  stressor is more than the number of virtual CPUs.  This seems to
  indicate that vector floating point registers are being clobbered
  between processes, which could be a security exploitable issue.
  
  Reproduced with Ubuntu Lunar PPC64 VM (6.2.0-20-generic) and x86 host
  (6.2.0-21-generic + qemu-kvm  1:5.0-5ubuntu6).
  
  List of PPC64el kernels reproducers:
-     
-     Lunar: 6.2.0-20-generic
-     Mantic: 6.3.0-7-generic
  
+     Focal: 5.4.0-148-generic
+     Lunar: 6.2.0-20-generic
+     Mantic: 6.3.0-7-generic
  
  Not sure if this is a kernel or KVM issue, or both.

** Description changed:

  When running the stress-ng vector floating point stressor in QEMU PPC64
  virtual machines I get floating point verification errors when running
  more stressor instances than the number of virtual CPUs.
  
  How to reproduce:
  
  Create a PPC64 VM in QEMU on an x86 host with 8 virtual CPUs. Log in, and
  then do:
  
  get latest stress-ng:
  
  sudo apt-get build-dep stress-ng
  git clone https://github.com/ColinIanKing/stress-ng
  cd stress-ng
  make clean; make -j $(nproc)
  ./stress-ng --vecfp 32 --verify -t 10
  
  One should get failures such as:
  stress-ng: info:  [1487] setting to a 10 second run per stressor
  stress-ng: info:  [1487] dispatching hogs: 32 vecfp
  stress-ng: fail:  [1489] vecfp: floatv64div float vector operation result mismatch, got 1078998925312.000000, expected 180812.062500
  stress-ng: fail:  [1489] vecfp: floatv64div float vector operation result mismatch, got 46779686912.000000, expected 13278722.000000
  stress-ng: fail:  [1489] vecfp: floatv64div float vector operation result mismatch, got 24992688128.000000, expected 26213772.000000
  stress-ng: fail:  [1489] vecfp: floatv64div float vector operation result mismatch, got 17185787904.000000, expected 39415832.000000
  stress-ng: fail:  [1488] vecfp: floatv16div float vector operation result mismatch, got 157250576.000000, expected 33576.261719
  stress-ng: fail:  [1488] vecfp: floatv16div float vector operation result mismatch, got 170314032.000000, expected 13129044.000000
  stress-ng: fail:  [1488] vecfp: floatv16div float vector operation result mismatch, got 183516080.000000, expected 26348392.000000
  stress-ng: fail:  [1488] vecfp: floatv16div float vector operation result mismatch, got 196647552.000000, expected 39365508.000000
  etc..
  
  However, with fewer instances than the number of CPUs this runs fine
  without any errors:
  ./stress-ng --vecfp 1 --verify -t 10
  stress-ng: info:  [1521] setting to a 10 second run per stressor
  stress-ng: info:  [1521] dispatching hogs: 1 vecfp
  stress-ng: info:  [1521] passed: 1: vecfp (1)
  stress-ng: info:  [1521] failed: 0
  stress-ng: info:  [1521] skipped: 0
  stress-ng: info:  [1521] metrics untrustworthy: 0
  stress-ng: info:  [1521] successful run completed in 19.00s
  
  It appears this only fails when the number of instances of the vecfp
  stressor is more than the number of virtual CPUs.  This seems to
  indicate that vector floating point registers are being clobbered
  between processes, which could be a security exploitable issue.
  
  Reproduced with Ubuntu Lunar PPC64 VM (6.2.0-20-generic) and x86 host
  (6.2.0-21-generic + qemu-kvm  1:5.0-5ubuntu6).
  
  List of PPC64el kernels reproducers:
  
-     Focal: 5.4.0-148-generic
+     Focal: 5.4.0-148-generic
+     Jammy: 5.15.0-58-generic
      Lunar: 6.2.0-20-generic
      Mantic: 6.3.0-7-generic
  
  Not sure if this is a kernel or KVM issue, or both.

** Also affects: linux (Ubuntu Focal)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Focal)
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2026883

Title:
  vector floating point registers get clobbered when running stress-ng
  --vecfp with more instances than CPUs

Status in linux package in Ubuntu:
  New
Status in linux source package in Focal:
  New
Status in linux source package in Lunar:
  New
Status in linux source package in Mantic:
  New

Bug description:
  When running the stress-ng vector floating point stressor in QEMU
  PPC64 virtual machines I get floating point verification errors when
  running more stressor instances than the number of virtual CPUs.

  How to reproduce:

  Create a PPC64 VM in QEMU on an x86 host with 8 virtual CPUs. Log in,
  and then do:

  get latest stress-ng:

  sudo apt-get build-dep stress-ng
  git clone https://github.com/ColinIanKing/stress-ng
  cd stress-ng
  make clean; make -j $(nproc)
  ./stress-ng --vecfp 32 --verify -t 10

  One should get failures such as:
  stress-ng: info:  [1487] setting to a 10 second run per stressor
  stress-ng: info:  [1487] dispatching hogs: 32 vecfp
  stress-ng: fail:  [1489] vecfp: floatv64div float vector operation result mismatch, got 1078998925312.000000, expected 180812.062500
  stress-ng: fail:  [1489] vecfp: floatv64div float vector operation result mismatch, got 46779686912.000000, expected 13278722.000000
  stress-ng: fail:  [1489] vecfp: floatv64div float vector operation result mismatch, got 24992688128.000000, expected 26213772.000000
  stress-ng: fail:  [1489] vecfp: floatv64div float vector operation result mismatch, got 17185787904.000000, expected 39415832.000000
  stress-ng: fail:  [1488] vecfp: floatv16div float vector operation result mismatch, got 157250576.000000, expected 33576.261719
  stress-ng: fail:  [1488] vecfp: floatv16div float vector operation result mismatch, got 170314032.000000, expected 13129044.000000
  stress-ng: fail:  [1488] vecfp: floatv16div float vector operation result mismatch, got 183516080.000000, expected 26348392.000000
  stress-ng: fail:  [1488] vecfp: floatv16div float vector operation result mismatch, got 196647552.000000, expected 39365508.000000
  etc..
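
When triaging a long run it can help to count the failures per vecfp
method rather than read every line. The following is a minimal sketch,
not part of stress-ng: the function name tally_vecfp_failures is
hypothetical, and it assumes each failure record arrives on a single
line in the form shown above (the wrapping in this mail comes from the
mail client, not from stress-ng).

```shell
#!/bin/sh
# Hypothetical helper: tally stress-ng verification failures per vecfp method.
# Reads stress-ng output on stdin and assumes one record per line, e.g.
#   stress-ng: fail:  [PID] vecfp: METHOD float vector operation result mismatch, ...
# Prints "METHOD COUNT" pairs, sorted by method name.
tally_vecfp_failures() {
    awk '$1 == "stress-ng:" && $2 == "fail:" && $4 == "vecfp:" { count[$5]++ }
         END { for (m in count) print m, count[m] }' | sort
}
```

A possible invocation, redirecting stderr so the records are captured
whichever stream stress-ng logs to:

  ./stress-ng --vecfp 32 --verify -t 10 2>&1 | tally_vecfp_failures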

  However, with fewer instances than the number of CPUs this runs fine
  without any errors:
  ./stress-ng --vecfp 1 --verify -t 10
  stress-ng: info:  [1521] setting to a 10 second run per stressor
  stress-ng: info:  [1521] dispatching hogs: 1 vecfp
  stress-ng: info:  [1521] passed: 1: vecfp (1)
  stress-ng: info:  [1521] failed: 0
  stress-ng: info:  [1521] skipped: 0
  stress-ng: info:  [1521] metrics untrustworthy: 0
  stress-ng: info:  [1521] successful run completed in 19.00s

  It appears this only fails when the number of instances of the vecfp
  stressor is more than the number of virtual CPUs.  This seems to
  indicate that vector floating point registers are being clobbered
  between processes, which could be an exploitable security issue.
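
To check where the failures start relative to the CPU count, the run can
be repeated with a few instance counts around that number. This is only
a sketch under stated assumptions: the function names and the STRESS_NG
variable are hypothetical, the chosen counts (1, N, N+1 and 4N) are
arbitrary, and it assumes stress-ng exits non-zero when a stressor
reports a verification failure.

```shell
#!/bin/sh
# Hypothetical helper: instance counts to try on a machine with $1 CPUs:
# a single instance, one per CPU, one more than the CPUs, and 4x the CPUs.
vecfp_instance_counts() {
    cpus=$1
    echo "1 $cpus $((cpus + 1)) $((cpus * 4))"
}

# Run the vecfp stressor once per count and report pass/fail based on the
# exit status (assumed non-zero on a verification failure).
# STRESS_NG may name another binary; it defaults to ./stress-ng.
vecfp_sweep() {
    for n in $(vecfp_instance_counts "$1"); do
        if "${STRESS_NG:-./stress-ng}" --vecfp "$n" --verify -t 10 >/dev/null 2>&1; then
            echo "$n instances: pass"
        else
            echo "$n instances: FAIL"
        fi
    done
}
```

From the stress-ng build directory this could be run as:

  vecfp_sweep "$(nproc)"

On the 8 CPU VM described above that tries 1, 8, 9 and 32 instances.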

  Reproduced with Ubuntu Lunar PPC64 VM (6.2.0-20-generic) and x86 host
  (6.2.0-21-generic + qemu-kvm  1:5.0-5ubuntu6).

  PPC64el kernels that reproduce the issue:

      Focal: 5.4.0-148-generic
      Jammy: 5.15.0-58-generic
      Lunar: 6.2.0-20-generic
      Mantic: 6.3.0-7-generic

  Not sure if this is a kernel or KVM issue, or both.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2026883/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
