Vincent, I'll update the description to match the SRU requirements, but your original description will still be available.
** Description changed:

- Hello!
+ [Impact]
+ IBRS would be mistakenly enabled in the host when switching from an
+ IBRS-enabled VM, which causes performance overhead in the host. The
+ other condition is that IBRS could be mistakenly disabled in the VM
+ when context-switching from the host. This could be considered a CVE.

- As of Linux 4.4.0-119, when a KVM guest is using IBRS, this incurs a
- very large performance penalty on the hosts and other guests.
+ [Fix]
+ The patch fixes the logic inside x86_virt_spec_ctrl so that it checks
+ ibrs_enabled and ORs hostval with SPEC_CTRL_IBRS, because
+ x86_spec_ctrl_base is zero by default. The upstream implementation
+ differs from Xenial's: upstream doesn't use IBRS as the formal fix,
+ so the base value is zero by default.

- From my understanding, the patch
- f676aa34b4027d1a7a4bbcc58b81b20c68c7ce0c is incomplete. If the host
- doesn't handle IBRS itself (which is now the case by default since
- 4.4.0-116: it relies on retpoline instead) but the guest does (e.g.
- running an earlier kernel), the guest will set IBRS for the CPU it is
- running on from time to time but if it gets preempted at some point,
- the IBRS bit will stay, incurring a major performance penalty for all
- other users of the CPU (host userland, host kernel and other guests
- not caring about IBRS). The equivalent patch in mainline
- (d28b387fb74da95d69d2615732f50cceb38e9a4d) ensures the appropriate MSR
- is correctly restored when switching from one guest to another or from
- one guest to host.
+ In addition, after the VM exit, the SPEC_CTRL register value needs to
+ be saved manually by reading the SPEC_CTRL MSR, because the MSR
+ intercept is disabled by default in hardware_setup (v4.4) and
+ vmx_init (v3.13). Access to the SPEC_CTRL MSR in the VM is direct and
+ doesn't trigger a trap, so the vmx_set_msr() function isn't called.

- The issue is easy to reproduce: host running 4.4.0-119, exposing
- "spec_ctrl" to a guest running CentOS 7.4 with its January kernel. Wait
- a few minutes and the host will become pretty slow. A simple shell loop
- will take 10 times longer to execute. Executing "sysctl -w
- kernel.ibrs_dump=1" will show that most real cores now have their IBRS
- bit set to 1.
+ The v3.13 kernel hasn't been tested. However, the patch can be viewed
+ at:
+ http://kernel.ubuntu.com/git/gavinguo/ubuntu-trusty-amd64.git/log/?h=sf00191076-sru

- A workaround is to reenable IBRS on the host (sysctl -w
- kernel.ibrs_enabled=1). This way, IBRS will be correctly disabled when
- changing context.
+ The v4.4 patch:
+ http://kernel.ubuntu.com/git/gavinguo/ubuntu-xenial.git/log/?h=sf00191076-spectre-v2-regres-backport-juerg

- A long term solution would be to properly backport the patch from
- mainline. It is not part of the 4.4 stable branch and it seems not
- trivial to port.
+ [Test]

- A mid term solution could be to remove the faulty patch (not exposing
- IBRS), since most VMs don't need it anymore. This also salvages the
- ability to use IBPB (which doesn't seem to alter performance that much)
- but it isn't believed to be essential.
+ The patch has been tested on 4.4.0-140.166 and works fine.
+
+ The reproduction environment:
+ Guest kernel version: 4.4.0-138.164
+ Host kernel version: 4.4.0-140.166
+
+ (host IBRS, guest IBRS)
+
+ - 1). (0, 1).
+ The case can be reproduced with the following instructions:
+ guest$ echo 1 | sudo tee /proc/sys/kernel/ibrs_enabled
+ 1
+
+ <Several minutes later...>
+
+ host$ cat /proc/sys/kernel/ibrs_enabled
+ 0
+ host$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
+ 11111111111111000000000000000000010010100000000000000000
+
+ Some of the IBRS bits inside the SPEC_CTRL MSR are mistakenly
+ enabled.
+
+ host$ taskset -c 5 stress-ng -c 1 --cpu-ops 2500
+ stress-ng: info: [11264] defaulting to a 86400 second run per stressor
+ stress-ng: info: [11264] dispatching hogs: 1 cpu
+ stress-ng: info: [11264] cache allocate: default cache size: 35840K
+ stress-ng: info: [11264] successful run completed in 33.48s
+
+ The host kernel doesn't notice that the IBRS bit is enabled. So the
+ situation is the same as "echo 2 > /proc/sys/kernel/ibrs_enabled" in
+ the host. Since this stress-ng run is a pure userspace CPU workload,
+ performance drops to about 1/3. Without IBRS enabled, the run takes
+ about 10s.
+
+ - 2). (1, 1): disabling IBRS in the host should give (0, 1), but it
+ actually becomes (0, 0). The guest IBRS has been mistakenly disabled.
+
+ guest$ echo 2 | sudo tee /proc/sys/kernel/ibrs_enabled
+ guest$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
+ 11111111111111111111111111111111111111111111111111111111
+
+ host$ echo 2 | sudo tee /proc/sys/kernel/ibrs_enabled
+ host$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
+ 11111111111111111111111111111111111111111111111111111111
+ host$ echo 0 | sudo tee /proc/sys/kernel/ibrs_enabled
+ host$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
+ 00000000000000000000000000000000000000000000000000000000
+
+ guest$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
+ 00000000000000000000000000000000000000000000000000000000

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1764956

Title:
  Guests using IBRS incur a large performance penalty

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Xenial:
  In Progress

Bug description:
  [Impact]
  IBRS would be mistakenly enabled in the host when switching from an
  IBRS-enabled VM, which causes performance overhead in the host. The
  other condition is that IBRS could be mistakenly disabled in the VM
  when context-switching from the host.
  This could be considered a CVE.

  [Fix]
  The patch fixes the logic inside x86_virt_spec_ctrl so that it checks
  ibrs_enabled and ORs hostval with SPEC_CTRL_IBRS, because
  x86_spec_ctrl_base is zero by default. The upstream implementation
  differs from Xenial's: upstream doesn't use IBRS as the formal fix,
  so the base value is zero by default.

  In addition, after the VM exit, the SPEC_CTRL register value needs to
  be saved manually by reading the SPEC_CTRL MSR, because the MSR
  intercept is disabled by default in hardware_setup (v4.4) and
  vmx_init (v3.13). Access to the SPEC_CTRL MSR in the VM is direct and
  doesn't trigger a trap, so the vmx_set_msr() function isn't called.

  The v3.13 kernel hasn't been tested. However, the patch can be viewed
  at:
  http://kernel.ubuntu.com/git/gavinguo/ubuntu-trusty-amd64.git/log/?h=sf00191076-sru

  The v4.4 patch:
  http://kernel.ubuntu.com/git/gavinguo/ubuntu-xenial.git/log/?h=sf00191076-spectre-v2-regres-backport-juerg

  [Test]

  The patch has been tested on 4.4.0-140.166 and works fine.

  The reproduction environment:
  Guest kernel version: 4.4.0-138.164
  Host kernel version: 4.4.0-140.166

  (host IBRS, guest IBRS)

  - 1). (0, 1).
  The case can be reproduced with the following instructions:
  guest$ echo 1 | sudo tee /proc/sys/kernel/ibrs_enabled
  1

  <Several minutes later...>

  host$ cat /proc/sys/kernel/ibrs_enabled
  0
  host$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
  11111111111111000000000000000000010010100000000000000000

  Some of the IBRS bits inside the SPEC_CTRL MSR are mistakenly
  enabled.

  host$ taskset -c 5 stress-ng -c 1 --cpu-ops 2500
  stress-ng: info: [11264] defaulting to a 86400 second run per stressor
  stress-ng: info: [11264] dispatching hogs: 1 cpu
  stress-ng: info: [11264] cache allocate: default cache size: 35840K
  stress-ng: info: [11264] successful run completed in 33.48s

  The host kernel doesn't notice that the IBRS bit is enabled.
  So the situation is the same as "echo 2 >
  /proc/sys/kernel/ibrs_enabled" in the host. Since this stress-ng run
  is a pure userspace CPU workload, performance drops to about 1/3.
  Without IBRS enabled, the run takes about 10s.

  - 2). (1, 1): disabling IBRS in the host should give (0, 1), but it
  actually becomes (0, 0). The guest IBRS has been mistakenly disabled.

  guest$ echo 2 | sudo tee /proc/sys/kernel/ibrs_enabled
  guest$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
  11111111111111111111111111111111111111111111111111111111

  host$ echo 2 | sudo tee /proc/sys/kernel/ibrs_enabled
  host$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
  11111111111111111111111111111111111111111111111111111111
  host$ echo 0 | sudo tee /proc/sys/kernel/ibrs_enabled
  host$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
  00000000000000000000000000000000000000000000000000000000

  guest$ for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
  00000000000000000000000000000000000000000000000000000000

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1764956/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
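
The per-CPU rdmsr listings in the [Test] steps are long digit strings that are hard to compare by eye. As a rough aid, the sketch below counts how many CPUs report a non-zero SPEC_CTRL value in such a string. The function name count_nonzero_cpus is hypothetical and not part of the report or of msr-tools, and the sketch assumes the output has been concatenated into one digit per CPU, as in the listings above.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not from the bug report): summarise the
# concatenated output of
#   for i in {0..55}; do sudo rdmsr 0x48 -p $i; done
# Assumes exactly one hex digit per CPU, as shown in the report's listings.
count_nonzero_cpus() {
    cnc_values=$1
    # Total number of CPUs is the length of the digit string.
    cnc_total=${#cnc_values}
    # Remove every '0'; what remains is one character per non-zero CPU.
    cnc_nonzero=$(printf '%s' "$cnc_values" | tr -d '0')
    printf '%s of %s CPUs report a non-zero SPEC_CTRL value\n' \
        "${#cnc_nonzero}" "$cnc_total"
}

# Example: the host listing from case 1) of the report.
count_nonzero_cpus 11111111111111000000000000000000010010100000000000000000
```

On a host with ibrs_enabled set to 0, any non-zero count here would match the symptom described in case 1); after the fix the expectation would be a count of zero.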