AMD: deal with RDSEED issues

Jan Beulich Fri, 31 Oct 2025 05:35:42 -0700

On 31.10.2025 13:14, Roger Pau Monné wrote:
> On Fri, Oct 31, 2025 at 12:47:51PM +0100, Jan Beulich wrote:
>> On 31.10.2025 11:54, Roger Pau Monné wrote:
>>> On Fri, Oct 31, 2025 at 11:29:44AM +0100, Jan Beulich wrote:
>>>> On 31.10.2025 11:22, Roger Pau Monné wrote:
>>>>> On Tue, Oct 28, 2025 at 04:32:17PM +0100, Jan Beulich wrote:
>>>>>> Both patches also want 'x86/CPU: extend is_forced_cpu_cap()'s "reach"' in
>>>>>> place.
>>>>>>
>>>>>> 1: disable RDSEED on Fam17 model 47 stepping 0
>>>>>> 2: disable RDSEED on most of Zen5
>>>>>
>>>>> For both patches: don't we need to set the feature in the max policy
>>>>> to allow for incoming migrations of guests that have already seen the
>>>>> feature?
>>>>
>>>> No, such guests should not run on affected hosts (unless overrides are in 
>>>> place),
>>>> or else they'd face sudden malfunction of RDSEED. If an override was in 
>>>> place on
>>>> the source host, an override will also need to be put in place on the 
>>>> destination
>>>> one.
>>>
>>> But they may be malfunctioning before already, if started on a
>>> vulnerable hosts without this fix and having seen RDSEED?
>>
>> Yes. But there could also be ones coming from good hosts. Imo ...
>>
>>> IMO after this fix is applied you should do pool leveling, at which
>>> point RDSEED shouldn't be advertised anymore.  Having the feature in
>>> the max policy allows to evacuate running guests while updating the
>>> pool.  Otherwise those existing guests would be stuck to run on
>>> non-updated hosts.
>>
>> ... we need to err on the side of caution.
> 
> While I understand your concerns, this would cause failures in the
> upgrade and migration model used by both XCP-ng and XenServer at
> least, as it could prevent eviction of running VMs to updated hosts.
> 
> At a minimum we would need an option to allow the feature to be set on
> the max policy.


That's where the 3rd patch comes into play. "cpuid=rdseed" is the respective
override. Just that it doesn't work correctly without that further patch.

>  Overall I think safety of migration (in this specific
> regard) should be enforced by the toolstack (or orchestration layer),
> rather than the hypervisor itself.  The hypervisor can reject
> incompatible policies, but should leave the rest of the decisions to
> higher layers as it doesn't have enough knowledge.

But without rendering guests vulnerable behind the admin's back.

Jan

Re: [PATCH v2 for-4.21 0/2] x86/AMD: deal with RDSEED issues

Reply via email to