Vincent, I'll update the description to match the SRU requirements, but
your original description will still be available.

** Description changed:

- Hello!
+ [Impact]
+ IBRS can be mistakenly left enabled in the host after switching away
+ from an IBRS-enabled VM, which causes performance overhead in the
+ host. Conversely, IBRS can be mistakenly disabled in the VM when
+ context-switching from the host. This could be considered a security
+ issue (CVE) for the host.
  
- As of Linux 4.4.0-119, when a KVM guest is using IBRS, this incurs a
- very large performance penalty on the hosts and other guests.
+ [Fix]
+ The patch fixes the logic inside x86_virt_spec_ctrl so that it checks
+ ibrs_enabled and ORs SPEC_CTRL_IBRS into hostval, because
+ x86_spec_ctrl_base is zero by default. It is zero because the upstream
+ implementation differs from Xenial's: upstream doesn't use IBRS as the
+ formal fix.
  
- From my understanding, the patch
- f676aa34b4027d1a7a4bbcc58b81b20c68c7ce0c is incomplete. If host doesn't
- handle IBRS itself (which is now the case by default since 4.4.0-116: it
- relies on retpoline instead) but the guest does (eg running an earlier
- kernel), the guest will set IBRS for the CPU it is running on from time
- to time but if it gets preempted at some point, the IBRS bit will stay,
- incurring a major performance penalty for all other users of the CPU
- (host userland, host kernel and other guests not caring about IBRS). The
- equivalent patch in mainline (d28b387fb74da95d69d2615732f50cceb38e9a4d)
- ensure the appropriate MSR is correctly restored when switching from one
- guest to another or from one guest to host.
+ In addition, after a VM exit the SPEC_CTRL value also needs to be
+ saved manually by reading the SPEC_CTRL MSR, because the MSR intercept
+ is disabled by default in hardware_setup() (v4.4) and vmx_init()
+ (v3.13). Guest accesses to the SPEC_CTRL MSR are direct and don't
+ trigger a trap, so vmx_set_msr() isn't called.
  
- The issue is easy to reproduce: host running 4.4.0-119, exposing
- "spec_ctrl" to a guest running CentOS 7.4 with its January kernel. Wait
- a few minutes and the host will become pretty slow. A simple shell loop
- will take 10 more times to execute. Executing "sysctl -w
- kernel.ibrs_dump=1" will show that most real cores have now their IBRS
- bit set to 1.
+ The v3.13 kernel hasn't been tested. However, the patch can be viewed
+ at:
+ 
http://kernel.ubuntu.com/git/gavinguo/ubuntu-trusty-amd64.git/log/?h=sf00191076-sru
  
- A workaround is to reeanble IBRS on the host (sysctl -w
- kernel.ibrs_enabled=1). This way, IBRS will be correctly disabled when
- changing context.
+ The v4.4 patch:
+ 
http://kernel.ubuntu.com/git/gavinguo/ubuntu-xenial.git/log/?h=sf00191076-spectre-v2-regres-backport-juerg
  
- A long term solution would be to properly backport the patch from
- mainline. It is not part of the 4.4 stable branch and it seems not
- trivial to port.
+ [Test]
  
- A mid term solution could be to remove the faulty patch (not exposing
- IBRS), since most VM don't need it anymore. This also salvage the
- ability to use IBPB (which doesn't seem to alter performance that much)
- but it isn't believed to be essential.
+ The patch has been tested on the 4.4.0-140.166 and works fine.
+ 
+ The reproducing environment:
+ Guest kernel version: 4.4.0-138.164
+ Host kernel version: 4.4.0-140.166
+ 
+ (host IBRS, guest IBRS)
+ 
+ - 1). (0, 1).
+ The case can be reproduced by the following instructions:
+ guest$ echo 1 | sudo tee /proc/sys/kernel/ibrs_enabled
+ 1
+ 
+ <Several minutes later...>
+ 
+ host$ cat /proc/sys/kernel/ibrs_enabled
+ 0
+ host$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
+ 11111111111111000000000000000000010010100000000000000000
+ 
+ On some CPUs, the IBRS bit in the SPEC_CTRL MSR is mistakenly
+ enabled.
+ 
+ host$ taskset -c 5 stress-ng -c 1 --cpu-ops 2500
+ stress-ng: info:  [11264] defaulting to a 86400 second run per stressor
+ stress-ng: info:  [11264] dispatching hogs: 1 cpu
+ stress-ng: info:  [11264] cache allocate: default cache size: 35840K
+ stress-ng: info:  [11264] successful run completed in 33.48s
+ 
+ The host kernel doesn't notice that the IBRS bit is set, so the
+ situation is the same as "echo 2 > /proc/sys/kernel/ibrs_enabled" on
+ the host. stress-ng is a pure userspace CPU-bound workload, so its
+ performance drops to about 1/3: the run takes 33.48s, whereas without
+ IBRS enabled it needs about 10s.
+ 
+ - 2). (1, 1): disabling IBRS in the host should give (0, 1), but it
+ actually becomes (0, 0). The guest IBRS has been mistakenly disabled.
+ 
+ guest$ echo 2 | sudo tee /proc/sys/kernel/ibrs_enabled
+ guest$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
+ 11111111111111111111111111111111111111111111111111111111
+ 
+ host$ echo 2 | sudo tee /proc/sys/kernel/ibrs_enabled
+ host$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
+ 11111111111111111111111111111111111111111111111111111111
+ host$ echo 0 | sudo tee /proc/sys/kernel/ibrs_enabled
+ host$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
+ 00000000000000000000000000000000000000000000000000000000
+ 
+ guest$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
+ 00000000000000000000000000000000000000000000000000000000

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1764956

Title:
  Guests using IBRS incur a large performance penalty

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Xenial:
  In Progress

Bug description:
  [Impact]
  IBRS can be mistakenly left enabled in the host after switching away
  from an IBRS-enabled VM, which causes performance overhead in the
  host. Conversely, IBRS can be mistakenly disabled in the VM when
  context-switching from the host. This could be considered a security
  issue (CVE) for the host.

  [Fix]
  The patch fixes the logic inside x86_virt_spec_ctrl so that it checks
  ibrs_enabled and ORs SPEC_CTRL_IBRS into hostval, because
  x86_spec_ctrl_base is zero by default. It is zero because the upstream
  implementation differs from Xenial's: upstream doesn't use IBRS as the
  formal fix.
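
  As a rough illustration of the check described above, the following
  shell sketch models only the arithmetic; it is not the kernel code.
  The function name host_spec_ctrl and its two arguments are
  hypothetical stand-ins for x86_spec_ctrl_base and ibrs_enabled;
  SPEC_CTRL_IBRS is bit 0 of the SPEC_CTRL MSR (0x48).

```shell
# Model of the host-side SPEC_CTRL value (illustration only).
SPEC_CTRL_IBRS=1    # IBRS is bit 0 of the SPEC_CTRL MSR (0x48)

# host_spec_ctrl BASE IBRS_ENABLED
# BASE stands in for x86_spec_ctrl_base, IBRS_ENABLED for the
# kernel.ibrs_enabled sysctl (0, 1 or 2). Prints the resulting hostval.
host_spec_ctrl() {
    hostval=$1
    if [ "$2" -ne 0 ]; then
        # Host uses IBRS: OR the bit into the base value.
        hostval=$(( hostval | SPEC_CTRL_IBRS ))
    fi
    echo "$hostval"
}
```

  With a zero base, "host_spec_ctrl 0 0" prints 0 and
  "host_spec_ctrl 0 1" prints 1, which would be the value the host
  expects to find in the MSR once the guest is no longer running.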

  In addition, after a VM exit the SPEC_CTRL value also needs to be
  saved manually by reading the SPEC_CTRL MSR, because the MSR intercept
  is disabled by default in hardware_setup() (v4.4) and vmx_init()
  (v3.13). Guest accesses to the SPEC_CTRL MSR are direct and don't
  trigger a trap, so vmx_set_msr() isn't called.

  The v3.13 kernel hasn't been tested. However, the patch can be viewed
  at:
  
http://kernel.ubuntu.com/git/gavinguo/ubuntu-trusty-amd64.git/log/?h=sf00191076-sru

  The v4.4 patch:
  
http://kernel.ubuntu.com/git/gavinguo/ubuntu-xenial.git/log/?h=sf00191076-spectre-v2-regres-backport-juerg

  [Test]

  The patch has been tested on the 4.4.0-140.166 and works fine.

  The reproducing environment:
  Guest kernel version: 4.4.0-138.164
  Host kernel version: 4.4.0-140.166

  (host IBRS, guest IBRS)

  - 1). (0, 1).
  The case can be reproduced by the following instructions:
  guest$ echo 1 | sudo tee /proc/sys/kernel/ibrs_enabled
  1

  <Several minutes later...>

  host$ cat /proc/sys/kernel/ibrs_enabled
  0
  host$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
  11111111111111000000000000000000010010100000000000000000

  On some CPUs, the IBRS bit in the SPEC_CTRL MSR is mistakenly
  enabled.
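
  The per-CPU check above can be summarised with a small helper. This
  is only a sketch: count_ibrs_set is a hypothetical name, and it
  assumes rdmsr prints one hexadecimal value per line without a 0x
  prefix.

```shell
# count_ibrs_set: read one SPEC_CTRL value per line on stdin (hex, as
# printed by "rdmsr 0x48 -p N") and print how many of them have
# bit 0 (IBRS) set.
count_ibrs_set() {
    count=0
    while read -r val || [ -n "$val" ]; do
        # Skip empty lines.
        [ -n "$val" ] || continue
        if [ $(( 0x$val & 1 )) -ne 0 ]; then
            count=$(( count + 1 ))
        fi
    done
    echo "$count"
}
```

  It could be used as
  "for i in {0..55}; do sudo rdmsr 0x48 -p $i; done | count_ibrs_set".
  Any non-zero result while the host's ibrs_enabled is 0 would point to
  the leaked bit described here.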

  host$ taskset -c 5 stress-ng -c 1 --cpu-ops 2500
  stress-ng: info:  [11264] defaulting to a 86400 second run per stressor
  stress-ng: info:  [11264] dispatching hogs: 1 cpu
  stress-ng: info:  [11264] cache allocate: default cache size: 35840K
  stress-ng: info:  [11264] successful run completed in 33.48s

  The host kernel doesn't notice that the IBRS bit is set, so the
  situation is the same as "echo 2 > /proc/sys/kernel/ibrs_enabled" on
  the host. stress-ng is a pure userspace CPU-bound workload, so its
  performance drops to about 1/3: the run takes 33.48s, whereas without
  IBRS enabled it needs about 10s.
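
  The "about 1/3" figure follows from the two run times. A minimal
  sketch of the arithmetic, where slowdown is a hypothetical helper
  name and 10 is the approximate baseline quoted above:

```shell
# slowdown SLOW_SECONDS FAST_SECONDS
# Prints how many times longer the slow run took, to one decimal place.
slowdown() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}
```

  "slowdown 33.48 10" prints 3.3, i.e. the affected CPU needs a bit
  more than three times as long for the same work.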

  - 2). (1, 1): disabling IBRS in the host should give (0, 1), but it
  actually becomes (0, 0). The guest IBRS has been mistakenly disabled.

  guest$ echo 2 | sudo tee /proc/sys/kernel/ibrs_enabled
  guest$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
  11111111111111111111111111111111111111111111111111111111

  host$ echo 2 | sudo tee /proc/sys/kernel/ibrs_enabled
  host$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
  11111111111111111111111111111111111111111111111111111111
  host$ echo 0 | sudo tee /proc/sys/kernel/ibrs_enabled
  host$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
  00000000000000000000000000000000000000000000000000000000

  guest$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
  00000000000000000000000000000000000000000000000000000000
