HPET: use single, global, low-priority vector for broadcast IRQ

Roger Pau Monné Thu, 23 Oct 2025 05:50:07 -0700

On Thu, Oct 23, 2025 at 12:37:22PM +0200, Jan Beulich wrote:
> On 23.10.2025 10:39, Roger Pau Monné wrote:
> > On Wed, Oct 22, 2025 at 11:21:15AM +0200, Jan Beulich wrote:
> >> On 21.10.2025 15:49, Roger Pau Monné wrote:
> >>> On Tue, Oct 21, 2025 at 08:42:13AM +0200, Jan Beulich wrote:
> >>>> On 20.10.2025 18:22, Roger Pau Monné wrote:
> >>>>> On Mon, Oct 20, 2025 at 01:18:34PM +0200, Jan Beulich wrote:
> >>>>>> @@ -476,19 +486,50 @@ static struct hpet_event_channel *hpet_g
> >>>>>>  static void set_channel_irq_affinity(struct hpet_event_channel *ch)
> >>>>>>  {
> >>>>>>      struct irq_desc *desc = irq_to_desc(ch->msi.irq);
> >>>>>> +    struct msi_msg msg = ch->msi.msg;
> >>>>>>  
> >>>>>>      ASSERT(!local_irq_is_enabled());
> >>>>>>      spin_lock(&desc->lock);
> >>>>>> -    hpet_msi_mask(desc);
> >>>>>> -    hpet_msi_set_affinity(desc, cpumask_of(ch->cpu));
> >>>>>> -    hpet_msi_unmask(desc);
> >>>>>> +
> >>>>>> +    per_cpu(vector_irq, ch->cpu)[HPET_BROADCAST_VECTOR] = ch->msi.irq;
> >>>>>> +
> >>>>>> +    /*
> >>>>>> +     * Open-coding a reduced form of hpet_msi_set_affinity() here.  
> >>>>>> With the
> >>>>>> +     * actual update below (either of the IRTE or of [just] message 
> >>>>>> address;
> >>>>>> +     * with interrupt remapping message address/data don't change) 
> >>>>>> now being
> >>>>>> +     * atomic, we can avoid masking the IRQ around the update.  As a 
> >>>>>> result
> >>>>>> +     * we're no longer at risk of missing IRQs (provided 
> >>>>>> hpet_broadcast_enter()
> >>>>>> +     * keeps setting the new deadline only afterwards).
> >>>>>> +     */
> >>>>>> +    cpumask_copy(desc->arch.cpu_mask, cpumask_of(ch->cpu));
> >>>>>> +
> >>>>>>      spin_unlock(&desc->lock);
> >>>>>>  
> >>>>>> -    spin_unlock(&ch->lock);
> >>>>>> +    msg.dest32 = cpu_physical_id(ch->cpu);
> >>>>>> +    msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
> >>>>>> +    msg.address_lo |= MSI_ADDR_DEST_ID(msg.dest32);
> >>>>>> +    if ( msg.dest32 != ch->msi.msg.dest32 )
> >>>>>> +    {
> >>>>>> +        ch->msi.msg = msg;
> >>>>>> +
> >>>>>> +        if ( iommu_intremap != iommu_intremap_off )
> >>>>>> +        {
> >>>>>> +            int rc = iommu_update_ire_from_msi(&ch->msi, &msg);
> >>>>>>  
> >>>>>> -    /* We may have missed an interrupt due to the temporary masking. 
> >>>>>> */
> >>>>>> -    if ( ch->event_handler && ch->next_event < NOW() )
> >>>>>> -        ch->event_handler(ch);
> >>>>>> +            ASSERT(rc <= 0);
> >>>>>> +            if ( rc > 0 )
> >>>>>> +            {
> >>>>>> +                ASSERT(msg.data == 
> >>>>>> hpet_read32(HPET_Tn_ROUTE(ch->idx)));
> >>>>>> +                ASSERT(msg.address_lo ==
> >>>>>> +                       hpet_read32(HPET_Tn_ROUTE(ch->idx) + 4));
> >>>>>> +            }
> >>>>>
> >>>>> The sequence of asserts seem wrong here, the asserts inside of the rc
> >>>>>> 0 check will never trigger, because there's an ASSERT(rc <= 0)
> >>>>> ahead of them?
> >>>>
> >>>> Hmm. My way of thinking was that if we get back 1 (which we shouldn't),
> >>>> we ought to check (and presumably fail on) data or address having 
> >>>> changed.
> >>>
> >>> Right, but the ASSERT(rc <= 0) will prevent reaching any of the
> >>> followup ASSERTs if rc == 1?
> >>
> >> Which is no problem, as we'd be dead already anyway if the first assertion
> >> triggered. Nevertheless I've switched the if() to >= 0 (which then pointed
> >> out a necessary change in AMD IOMMU code).
> > 
> > Right, so and adjusted if condition plus an ASSERT_UNREACHABLE() at
> > the end of the if code block?
> 
> That is, instead of
> 
>             ASSERT(rc <= 0);
>             if ( rc >= 0 )
>             {
>                 ASSERT(msg.data == hpet_read32(HPET_Tn_ROUTE(ch->idx)));
>                 ASSERT(msg.address_lo ==
>                        hpet_read32(HPET_Tn_ROUTE(ch->idx) + 4));
>             }
> 
> you'd prefer
> 
>             if ( rc >= 0 )
>             {
>                 ASSERT(msg.data == hpet_read32(HPET_Tn_ROUTE(ch->idx)));
>                 ASSERT(msg.address_lo ==
>                        hpet_read32(HPET_Tn_ROUTE(ch->idx) + 4));
>                 ASSERT_UNREACHABLE();
>             }
> 
> ? That's wrong though (for rc == 0), i.e. I fear I don't see what you mean.


Oh, I see, sorry for the suggestions, it's indeed wrong.  FTAOD, what
do you plan to use then here?

You could replace the ASSERT_UNREACHABLE() for ASSERT(rc == 0) in my
suggestion I think?

Or maybe just do:

ASSERT(rc <= 0);
if ( !rc )
{
    ASSERT(msg.data == hpet_read32(HPET_Tn_ROUTE(ch->idx)));
    ASSERT(msg.address_lo ==
           hpet_read32(HPET_Tn_ROUTE(ch->idx) + 4));
}

Was your original intention with those checks to ensure that for the
rc == 0 case the message fields remained unchanged?

> >>>  IOW, we possibly want:
> >>>
> >>>             if ( rc > 0 )
> >>>             {
> >>>                 dprintk(XENLOG_ERR,
> >>>                         "Unexpected HPET MSI setup returned: data: %#x 
> >>> address: %#lx expected data %#x address %#lx\n",
> >>>                         msg.data, msg.address,
> >>>                         ch->msi.msg.data, ch->msi.msg.address);
> >>>                 ASSERT_UNREACHABLE();
> >>>                 hpet_msi_mask(desc);
> >>>                 hpet_write32(msg.data, HPET_Tn_ROUTE(ch->idx));
> >>>                 hpet_write32(msg.address_lo, HPET_Tn_ROUTE(ch->idx) + 4);
> >>>                 hpet_msi_unmask(desc);
> >>>             }
> >>>             ASSERT(!rc);
> >>
> >> To be honest, for my taste this goes too far as to what follows an
> >> ASSERT_UNREACHABLE().
> > 
> > I can understand that.  It's the best way I've come up with attempting
> > to recover from a possible error in the release case, but I don't
> > particularly like it either.
> > 
> >>> I'm unsure about attempting to propagate the returned values on release
> >>> builds, I guess it's slightly better than possibly using an outdated
> >>> RTE entry?  Albeit this should never happen.
> >>
> >> Yes to the last remark; I don't actually see what you would want to do
> >> with the propagated return value.
> > 
> > OK, I can this this not being clear.  By propagate here I mean
> > propagate to the hardware registers, not to the function caller.
> 
> I.e. you still think adding the two hpet_write32() is going to be useful?
> The mask/unmask, as I did say in another reply to your comments, isn't
> useful here anyway (for already not being atomic), so I wouldn't see much
> sense in having them.

Right, for it to be correct the masking would need to include the
iommu_update_ire_from_msi() call also.

> Plus of course we'd want to avoid the writes on
> release builds if the values actually match, i.e. the construct would then
> rather end up as two if-mismatch-then-write-else-assert-unreachable ones.

My concern would be that after this change we won't cope anymore with
iommu_update_ire_from_msi() returning a value > 1.  Which might be
fine, as it's in theory not possible, but seems less robust than it
was before the change.  I guess it's the price to pay for avoiding the
masking (unless you have other options).

Looking at the existing code is likely no worse than when
iommu_update_ire_from_msi() returning an error, and that case is
already silently ignored by hpet_msi_set_affinity().  So I think
silently ignoring > 0 is not that different, and doesn't make the
current handling much worse.  It would be nice to handle those better,
but can be done separately.

Thanks, Roger.

Re: [PATCH v2 for-4.21 2/9] x86/HPET: use single, global, low-priority vector for broadcast IRQ

Reply via email to