Hi Kevin,

On 12/06/2026 09:42, Tian, Kevin wrote:
>> From: Matt Evans <[email protected]>
>> Sent: Wednesday, June 10, 2026 11:43 PM
>>
>> +int vfio_pci_dma_buf_find_pfn(struct vfio_pci_dma_buf *priv,
>> +                          struct vm_area_struct *vma,
>> +                          unsigned long address,
>> +                          unsigned int order,
>> +                          unsigned long *out_pfn)
>> +{
>> +    /*
>> +     * Given a VMA (start, end, pgoffs) and a fault address,
>> +     * search the corresponding DMABUF's phys_vec[] to find the
>> +     * range representing the address's offset into the VMA, and
>> +     * its PFN.
>> +     *
>> +     * The phys_vec[] ranges represent contiguous spans of VAs
>> +     * upwards from the buffer offset 0; the actual PFNs might be
>> +     * in any order, overlap/alias, etc.  Calculate an offset of
>> +     * the desired page given VMA start/pgoff and address, then
>> +     * search upwards from 0 to find which span contains it.
>> +     *
>> +     * On success, a valid PFN for a page sized by 'order' is
>> +     * returned into out_pfn.
>> +     *
>> +     * Failure occurs if:
>> +     * - The page would cross the edge of the VMA
>> +     * - The page isn't entirely contained within a range
>> +     * - We find a range, but the final PFN isn't aligned to the
>> +     *   requested order.
>> +     *
>> +     * (Upon failure, the caller is expected to try again with a
>> +     * smaller order; the tests above will always succeed for
>> +     * order=0 as the limit case.)
>> +     *
>> +     * It's suboptimal if DMABUFs are created with neigbouring
> 
> s/neigbouring/neighboring/

Ah, not a typo. :)  That is en_GB and AFAIK is permitted.

>> +     * ranges that are physically contiguous, since hugepages
>> +     * can't straddle range boundaries.  (The construction of the
>> +     * ranges vector should merge such ranges.)
> 
> though the field is called 'phys_vec', removing 'vector' in description
> is clearer here.

Fair, reworded.

>> +     *
>> +     * Finally, vma_pgoff_adjust is used for a DMABUF representing
>> +     * a VFIO BAR mmap, which is created from the start of the
>> +     * offset region.
> 
> Elaborate it a little bit that the vm_pgoff is already counted in paddr
> of phys_vec so it should be skipped when finding the pfn.

OK!  Expanded this paragraph slightly to explain that vma_pgoff_adjust
avoids double-accounting, and that a BAR mmap() DMABUF is created such
that the start of the VMA (even with an offset) equals the start of the
DMABUF and equals the start of the physical range.

>> +     */
>> +
>> +    const unsigned long pagesize = PAGE_SIZE << order;
>> +    unsigned long vma_off = ((vma->vm_pgoff - priv->vma_pgoff_adjust)
>> <<
>> +                             PAGE_SHIFT) & VFIO_PCI_OFFSET_MASK;
>> +    unsigned long rounded_page_addr = ALIGN_DOWN(address,
>> pagesize);
>> +    unsigned long rounded_page_end = rounded_page_addr + pagesize;
>> +    unsigned long page_buf_offset;
>> +    unsigned long page_buf_offset_end;
> 
> what about "fault_offset[_end]"? page_buf is a bit confusing.

I went round several times with these names, thanks for the input.  Just
tried it out and your suggestion is clearer.

>> +    unsigned long range_buf_offset = 0;
> 
> could this be called 'range_start' then the 'range_start' in latter loop
> is renamed to 'phys_start'?
> 
> Not strong... just feel such naming helps me understand the logic easier

Anything that helps helps, thanks.  I ended up renaming this to
range_start_offset (as offset is IMHO important).

I'm a fan of diagrams but this is too large to include in a comment.
But for posterity on the list, and using the new names, an illustration
of a DMABUF with 3 ranges in phys_vec, where a mapping's
faulting page offset lies in range [1]:

                               fault_addr--+
                                           v                   VMA
                    +-----------------+----------+-----------------+
                    |                 | Faulting |                 |
                    |                 | (hg)page |                 |
                    |                 |          |                 |
 |---- vma_off ---->+-----------------+----------+-----------------+
 |                                    .          .
 |                                    .          .
 |--------- fault_offset ------------>.          .             DMABUF
 +-------------------------+---------------------------+--------------+
 | phys_vec[0]             | phys_vec[1]         .     | phys_vec[2]  |
 |    .paddr               |          .          .     |              |
 |    .len                 |          .          .     |              |
 +-------------------------+---------------------------+--------------+
 0                         :          .          .     :              L
 |-- range_start_offset -->:          .          .  -->: range_len
                           :          .          .     :
                           V          .          .     :
                           +----------+----------+-----+
                           |.paddr    | PFN      |     |
                           |          |          |     |
                           |          |          |     |
                           +----------+----------+-----+
                                      P

 P = paddr + (fault_offset - range_start_offset)
 L = sum(phys_vec[0...2].len)

>> +    unsigned int i;
>> +
>> +    if (rounded_page_addr < vma->vm_start || rounded_page_end >
>> vma->vm_end) {
>> +            if (order > 0)
>> +                    return -EAGAIN;
>> +
>> +            /* A fault address outside of the VMA is absurd. */
>> +            WARN(1, "Fault addr 0x%lx outside VMA 0x%lx-0x%lx\n",
>> +                 address, vma->vm_start, vma->vm_end);
>> +            return -EFAULT;
>> +    }
>> +
>> +    /*
>> +     * page_buff_offset[_end] is the span of DMABUF offsets
>> +     * corresponding to the faulting page:
>> +     */
> 
> if the naming is kept then s/page_buff_offset/page_buf_offset/
> 
> otherwise,
> 
> Reviewed-by: Kevin Tian <[email protected]>

Thank you,


Matt


Reply via email to