On Sun, Jun 7, 2026 at 11:42 PM Christian König <[email protected]>
wrote:

> On 6/5/26 20:44, Bobby Eshleman wrote:
> > On Fri, Jun 05, 2026 at 11:30:07AM +0200, Christian König wrote:
> >> On 6/4/26 02:42, Bobby Eshleman wrote:
> >>> From: Bobby Eshleman <[email protected]>
> >>>
> >>> get_sg_table() emitted one PAGE_SIZE sg entry per page even when the
> >>> underlying folio was larger.
> >>>
> >>> Instead, walk folios[] and emit one sg entry per folio. When folios
> >>> represent large pages (as is for MFD_HUGETLB), each sg entry is a large
> >>> page. Normal PAGE_SIZE sg tables are unchanged.
> >>>
> >>> Required by net/core/devmem to support rx-buf-size > PAGE_SIZE with
> >>> udmabuf.
> >>
> >> That doesn't explain why this is required.
> >
> > Sure, can definitely add. Devmem currently requires dmabuf sg entries to
> > be length and size aligned when it allocates niovs for NIC page pools.
> > Though udmabuf is not violating any dmabuf contract by emitting
> > PAGE_SIZE entries and the above restriction is probably more a
> > shortfalling of devmem, by emitting a single entry per folio this patch
> > allows udmabuf to be used by devmem for large pages.
> >
> >>
> >> Please note that accessing the pages/folio of an sg-table returned by
> DMA-buf is illegal and strictly forbidden!
> >>
> >> Regards,
> >> Christian.
> >
> > It seems both devmem and io_uring zcrx at least introspect through to
> > the sg-table to build NIC page pools (not accessing the memory itself,
> > however). Is there a better way?
>
> That's an absolute NO-GO! We need to stop that immediately.
>
> Touching the underlying struct page of an DMA-buf exported sg-table is
> strictly forbidden.
>
> We even have code to wrap the sg_table and hide the struct pages on debug
> builds to catch those issues, see function dma_buf_wrap_sg_table().
>
> My last status is that the NIC page pools are build directly from the DMA
> addresses exposed by the sg_table.
>
> Was there any change I'm not aware of?
>
> Regards,
> Christian.


Oh no change, your mental model is still current.
They just go through each sg and use sg_dma_address() on each.

Best,
Bobby


> >
> > Best,
> > Bobby
> >
> >>
> >>> Signed-off-by: Bobby Eshleman <[email protected]>
> >>> ---
> >>>  drivers/dma-buf/udmabuf.c | 47
> ++++++++++++++++++++++++++++++++++++++++++-----
> >>>  1 file changed, 42 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
> >>> index 94b8ecb892bb..f28dd3788ada 100644
> >>> --- a/drivers/dma-buf/udmabuf.c
> >>> +++ b/drivers/dma-buf/udmabuf.c
> >>> @@ -141,26 +141,63 @@ static void vunmap_udmabuf(struct dma_buf *buf,
> struct iosys_map *map)
> >>>         vm_unmap_ram(map->vaddr, ubuf->pagecount);
> >>>  }
> >>>
> >>> +/* Return the number of contiguous pages backed by the folio at @i.
> >>> + * A udmabuf may map only part of a folio, or reference the same folio
> >>> + * in multiple non-contiguous runs, so folio_nr_pages() can't be used.
> >>> + */
> >>> +static pgoff_t udmabuf_folio_nr_pages(struct udmabuf *ubuf, pgoff_t i)
> >>> +{
> >>> +       struct folio *f = ubuf->folios[i];
> >>> +       pgoff_t j;
> >>> +
> >>> +       for (j = 1; i + j < ubuf->pagecount; j++) {
> >>> +               if (ubuf->folios[i + j] != f)
> >>> +                       break;
> >>> +               /* Same folio, but not a sequential offset within it.
> */
> >>> +               if (ubuf->offsets[i + j] != ubuf->offsets[i] + j *
> PAGE_SIZE)
> >>> +                       break;
> >>> +       }
> >>> +       return j;
> >>> +}
> >>> +
> >>> +/* Count the contiguous folio runs in @ubuf, one sg entry per run. */
> >>> +static unsigned int udmabuf_sg_nents(struct udmabuf *ubuf)
> >>> +{
> >>> +       unsigned int nents = 0;
> >>> +       pgoff_t i;
> >>> +
> >>> +       for (i = 0; i < ubuf->pagecount; i +=
> udmabuf_folio_nr_pages(ubuf, i))
> >>> +               nents++;
> >>> +       return nents;
> >>> +}
> >>> +
> >>>  static struct sg_table *get_sg_table(struct device *dev, struct
> dma_buf *buf,
> >>>                                      enum dma_data_direction direction)
> >>>  {
> >>>         struct udmabuf *ubuf = buf->priv;
> >>> -       struct sg_table *sg;
> >>>         struct scatterlist *sgl;
> >>> -       unsigned int i = 0;
> >>> +       struct sg_table *sg;
> >>> +       pgoff_t i, run;
> >>> +       unsigned int nents;
> >>>         int ret;
> >>>
> >>> +       nents = udmabuf_sg_nents(ubuf);
> >>> +
> >>>         sg = kzalloc_obj(*sg);
> >>>         if (!sg)
> >>>                 return ERR_PTR(-ENOMEM);
> >>>
> >>> -       ret = sg_alloc_table(sg, ubuf->pagecount, GFP_KERNEL);
> >>> +       ret = sg_alloc_table(sg, nents, GFP_KERNEL);
> >>>         if (ret < 0)
> >>>                 goto err_alloc;
> >>>
> >>> -       for_each_sg(sg->sgl, sgl, ubuf->pagecount, i)
> >>> -               sg_set_folio(sgl, ubuf->folios[i], PAGE_SIZE,
> >>> +       sgl = sg->sgl;
> >>> +       for (i = 0; i < ubuf->pagecount; i += run) {
> >>> +               run = udmabuf_folio_nr_pages(ubuf, i);
> >>> +               sg_set_folio(sgl, ubuf->folios[i], run << PAGE_SHIFT,
> >>>                              ubuf->offsets[i]);
> >>> +               sgl = sg_next(sgl);
> >>> +       }
> >>>
> >>>         ret = dma_map_sgtable(dev, sg, direction, 0);
> >>>         if (ret < 0)
> >>>
> >>> --
> >>> 2.53.0-Meta
> >>>
> >>
>
>

Reply via email to