On 16.02.2026 18:17, Andrew Cooper wrote:
> On 16/02/2026 4:39 pm, Jan Beulich wrote:
>> On 16.02.2026 17:29, Andrew Cooper wrote:
>>> On 16/02/2026 3:51 pm, Jan Beulich wrote:
>>>> The label used upon the function failing is wrong.
>>> Is it?  Which label do you think it ought to be?
>> fail_sched, as Roger did point out in reply to the original other patch.
>> After all ...
>>
>>>>  Instead of correcting
>>>> the label, move the invocation up a little, to also avoid it altogether
>>>> for the idle domain (where ->vmtrace_size would be zero, and hence the
>>>> function would bail right away anyway).
>>>>
>>>> Fixes: 217dd79ee292 ("xen/domain: Add vmtrace_size domain creation 
>>>> parameter")
>>>> Reported-by: Roger Pau Monné <[email protected]>
>>>> Signed-off-by: Jan Beulich <[email protected]>
>>>>
>>>> --- a/xen/common/domain.c
>>>> +++ b/xen/common/domain.c
>>>> @@ -493,14 +493,14 @@ struct vcpu *vcpu_create(struct domain *
>>>>          set_bit(_VPF_down, &v->pause_flags);
>>>>          vcpu_info_reset(v);
>>>>          init_waitqueue_vcpu(v);
>>>> +
>>>> +        if ( vmtrace_alloc_buffer(v) != 0 )
>>>> +            goto fail_wq;
>>>>      }
>>>>  
>>>>      if ( sched_init_vcpu(v) != 0 )
>>>>          goto fail_wq;
>> ... this comes first, and ...
>>
>>>> -    if ( vmtrace_alloc_buffer(v) != 0 )
>>>> -        goto fail_wq;
>>>> -
>>>>      if ( arch_vcpu_create(v) != 0 )
>>>>          goto fail_sched;
>> ... here the correct label is used.
> 
> Eww, yes.  So multiple observations.
> 
> 1) This only functions in the first place because
> destroy_waitqueue_vcpu() is idempotent to v->waitqueue_vcpu being NULL
> which covers the idle case where init_waitqueue_vcpu() was never called.
> 
> 2) sched_destroy_vcpu() can be made idempotent against v->sched_unit.
> 
> Then we don't need multiple labels and this all gets a lot easier to
> untangle.

Yes, but as a backportable fix what I have here is the most suitable
first step, I'd say.

With what you suggest, I'd then want to check whether either or both of
the function invocations could move into vcpu_teardown(). At least for
destroy_waitqueue_vcpu() I can't really figure why it's called only in
complete_domain_destroy(); for sched_destroy_vcpu() it may well be that
it can't be done earlier. Or wait, looks like vm_event_cleanup() would
need moving up in domain_kill(). The comment there right now explains
why it can't be done later; it's not quite clear to me whether moving
it ahead of (or into) domain_teardown() would introduce any problems.
Tamas?

Jan

Reply via email to