On Tue, Oct 29, 2024 at 05:43:24PM +0100, Jan Beulich wrote:
> On 29.10.2024 12:03, Roger Pau Monne wrote:
> > When using AMD-Vi interrupt remapping the vector field in the IO-APIC RTE is
> > repurposed to contain part of the offset into the remapping table.  
> > Previous to
> > 2ca9fbd739b8 Xen had logic so that the offset into the interrupt remapping
> > table would match the vector.  Such logic was mandatory for end of 
> > interrupt to
> > work, since the vector field (even when not containing a vector) is used by 
> > the
> > IO-APIC to find for which pin the EOI must be performed.
> > 
> > Introduce a table to store the EOI handlers when using interrupt remapping, 
> > so
> > that the IO-APIC driver can translate pins into EOI handlers without having 
> > to
> > read the IO-APIC RTE entry.  Note that to simplify the logic such table is 
> > used
> > unconditionally when interrupt remapping is enabled, even if strictly it 
> > would
> > only be required for AMD-Vi.
> 
> In here I think you mean "handle" when you use "handler"?

Indeed.

> Plus with what you said
> earlier about vector vs EOI handle, and with the code using "vector" all over 
> the
> place, their (non-)relationship could also do with clarifying (perhaps better 
> in
> a code comment in __io_apic_eoi()).

I've attempted to clarify the relation between vector vs EOI handle in
the first paragraph, and how that applies to AMD-Vi.  I can move
(part?) of that into the comment in __ioapic_write_entry(), maybe:

/*
 * Might be called before io_apic_pin_eoi is allocated.  Entry will be
 * updated once the array is allocated and there's a write against the
 * pin.
 *
 * Note that the vector field is only cached for raw RTE writes when
 * using IR.  In that case the vector field might have been repurposed
 * to store something different than the target vector, and hence need
 * to be cached for performing EOI.
 */

> > @@ -273,6 +293,13 @@ void __ioapic_write_entry(
> >      {
> >          __io_apic_write(apic, 0x11 + 2 * pin, eu.w2);
> >          __io_apic_write(apic, 0x10 + 2 * pin, eu.w1);
> > +        /*
> > +         * Called in clear_IO_APIC_pin() before io_apic_pin_eoi is 
> > allocated.
> > +         * Entry will be updated once the array is allocated and there's a
> > +         * write against the pin.
> > +         */
> > +        if ( io_apic_pin_eoi )
> > +            io_apic_pin_eoi[apic][pin] = e.vector;
> 
> The comment here looks a little misleading to me. clear_IO_APIC_pin() calls
> here to, in particular, set the mask bit. With the mask bit the vector isn't
> meaningful anyway (and indeed clear_IO_APIC_pin() sets it to zero, at which
> point recording IRQ_VECTOR_UNASSIGNED might be better than the bogus vector
> 0x00).

Note that clear_IO_APIC_pin() performs the call to
__ioapic_write_entry() with raw == false, at which point
__ioapic_write_entry() will call iommu_update_ire_from_apic() if IOMMU
IR is enabled.  The cached 'vector' value will be the IOMMU entry
offset for the AMD-Vi case, as the IOMMU code will perform the call to
__ioapic_write_entry() with raw == true.

What matters is that the cached value matches what's written in the
IO-APIC RTE, and the current logic ensures this.

What's the benefit of using IRQ_VECTOR_UNASSIGNED if the result is
reading the RTE and finding that vector == 0?

Looking at clear_IO_APIC_pin() - I think the function is slightly
bogus.  If entry.trigger is not set, the logic to switch the entry to
level triggered  will fetch the entry contents without requesting a
raw RTE, at which point the entry.vector field can not be used as
the EOI handle since it will contain the vector, not the IR table
offset.  I will need to make a further patch to fix this corner
case.

> > @@ -298,9 +325,17 @@ static void __io_apic_eoi(unsigned int apic, unsigned 
> > int vector, unsigned int p
> >      /* Prefer the use of the EOI register if available */
> >      if ( ioapic_has_eoi_reg(apic) )
> >      {
> > +        if ( io_apic_pin_eoi )
> > +            vector = io_apic_pin_eoi[apic][pin];
> > +
> >          /* If vector is unknown, read it from the IO-APIC */
> >          if ( vector == IRQ_VECTOR_UNASSIGNED )
> > +        {
> >              vector = __ioapic_read_entry(apic, pin, true).vector;
> 
> Related to my comment higher up regarding vector vs EOI handle: Here we're
> doing a raw read, i.e. we don't really fetch the vector but the EOI handle
> in the AMD case. Why is it that this isn't sufficient for directed EOI to
> work (perhaps with the conditional adjusted)?

It is enough, but we don't want to be doing such read for each EOI,
hence why we cache it in io_apic_pin_eoi.

> Then again - are we ever taking this path? Certainly not when coming from
> clear_IO_APIC_pin(), hence ...
> 
> > +            if ( io_apic_pin_eoi )
> 
> ... I'm unconvinced this conditional is needed.

Hm, maybe.  I can adjust but seems more fragile to trigger a
dereference for the extra cost of a conditional in what should be a
non-common path anyway.

> > +                /* Update cached value so further EOI don't need to fetch 
> > it. */
> > +                io_apic_pin_eoi[apic][pin] = vector;
> > +        }
> >  
> >          *(IO_APIC_BASE(apic)+16) = vector;
> >      }
> > @@ -1022,8 +1057,27 @@ static void __init setup_IO_APIC_irqs(void)
> >  
> >      apic_printk(APIC_VERBOSE, KERN_DEBUG "init IO_APIC IRQs\n");
> >  
> > +    if ( iommu_intremap )
> > +    {
> > +        io_apic_pin_eoi = xmalloc_array(typeof(*io_apic_pin_eoi), 
> > nr_ioapics);
> 
> Nit: Strictly speaking this and ...
> 
> > +        BUG_ON(!io_apic_pin_eoi);
> > +    }
> > +
> >      for (apic = 0; apic < nr_ioapics; apic++) {
> > -        for (pin = 0; pin < nr_ioapic_entries[apic]; pin++) {
> > +        const unsigned int nr_entries = nr_ioapic_entries[apic];
> > +
> > +        if ( iommu_intremap )
> > +        {
> > +            io_apic_pin_eoi[apic] = 
> > xmalloc_array(typeof(**io_apic_pin_eoi),
> > +                                                  nr_entries);
> 
> ... and this should be xvmalloc_array() in new code.

Sorry, didn't notice we have that now.

> Also this 2nd conditional may better use io_apic_pin_eoi, such that the two
> conditionals don't need keeping in sync. Note also how Andrew previously
> pointed out that both conditionals aren't Misra-compliant right now.

Oh, yes, completely forgot to adjust the MISRA comment from Andrew,
sorry.

Thanks, Roger.

Reply via email to