Currently, `fill_sg_entry()` splits the scatterlist using `UINT_MAX`.
This creates a non-page-aligned DMA length (`0xFFFFFFFF`) for the
first entry, resulting in non-page-aligned DMA addresses for all
subsequent entries.

While the underlying IOMMU mapping may be contiguous, hardware
DMA engines often require explicit address alignment (e.g., page,
cacheline, or storage sector boundaries). Passing unaligned
addresses and lengths can cause explicit failures in DMA descriptor
creation or silent data corruption if lower unaligned bits are
truncated.

Fix this by splitting the scatterlist into 2G chunks. An alternative
previously considered was to use the largest page aligned chunk within
`UINT_MAX` (`ALIGN_DOWN(UINT_MAX, PAGE_SIZE)`) to satisfy page
alignment. A 2G chunk is better as it naturally aligns with most known
hardware boundaries, while also allowing compiler optimizations with
simple bit shifts. This ensures all scatterlist DMA addresses and
lengths remain page aligned and satisfy hardware constraints.

Page-aligned entries allow the system to cleanly chunk payloads into
PCIe MaxPayloadSize (MPS) (e.g., 128 bytes, 256 bytes, 512 bytes).
As a result, this may help reduce TLP fragmentation in P2P transfers
and alleviate potential congestion within a logical PCIe switch
partition, especially when Relaxed Ordering is not possible due to
hardware constraints.

Reported-by: sashiko-bot <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Fixes: 3aa31a8bb11e ("dma-buf: provide phys_vec to scatter-gather mapping 
routine")
Cc: [email protected]
Signed-off-by: David Hu <[email protected]>
---
 Changes in v2:
 - Updated commit title and message to reflect the switch to 2G chunks
 - Switch to using 2G as the max sg entry size as it naturally aligns
   with most hardware boundaries, while allowing compiler optimizations
   with bit shifts (David Laight)
 - Optimized away division calculation for `nent`, and multiplication
   calculation for sgl address, by dropping the `for` loop in favor of a
   `while (length)` loop (David Laight)
 - Dropped `min_t` in favor of `min()` to maintain a strict type
   checking safety net (David Laight)

 drivers/dma-buf/dma-buf-mapping.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/dma-buf/dma-buf-mapping.c 
b/drivers/dma-buf/dma-buf-mapping.c
index 794acff2546a..2d88e08c5ebf 100644
--- a/drivers/dma-buf/dma-buf-mapping.c
+++ b/drivers/dma-buf/dma-buf-mapping.c
@@ -5,16 +5,17 @@
  */
 #include <linux/dma-buf-mapping.h>
 #include <linux/dma-resv.h>
+#include <linux/sizes.h>
+
+#define MAX_SG_ENT_SZ ((size_t)SZ_2G)
 
 static struct scatterlist *fill_sg_entry(struct scatterlist *sgl, size_t 
length,
                                         dma_addr_t addr)
 {
-       unsigned int len, nents;
-       int i;
+       size_t len;
 
-       nents = DIV_ROUND_UP(length, UINT_MAX);
-       for (i = 0; i < nents; i++) {
-               len = min_t(size_t, length, UINT_MAX);
+       while (length) {
+               len = min(length, MAX_SG_ENT_SZ);
                length -= len;
                /*
                 * DMABUF abuses scatterlist to create a scatterlist
@@ -24,11 +25,12 @@ static struct scatterlist *fill_sg_entry(struct scatterlist 
*sgl, size_t length,
                 * does not require the CPU list for mapping or unmapping.
                 */
                sg_set_page(sgl, NULL, 0, 0);
-               sg_dma_address(sgl) = addr + (dma_addr_t)i * UINT_MAX;
+               sg_dma_address(sgl) = addr;
                sg_dma_len(sgl) = len;
+               addr += len;
+               /* Unconditionally advance. On last segment, this becomes NULL 
*/
                sgl = sg_next(sgl);
        }
-
        return sgl;
 }
 
@@ -41,14 +43,14 @@ static unsigned int calc_sg_nents(struct dma_iova_state 
*state,
 
        if (!state || !dma_use_iova(state)) {
                for (i = 0; i < nr_ranges; i++)
-                       nents += DIV_ROUND_UP(phys_vec[i].len, UINT_MAX);
+                       nents += DIV_ROUND_UP(phys_vec[i].len, MAX_SG_ENT_SZ);
        } else {
                /*
                 * In IOVA case, there is only one SG entry which spans
                 * for whole IOVA address space, but we need to make sure
                 * that it fits sg->length, maybe we need more.
                 */
-               nents = DIV_ROUND_UP(size, UINT_MAX);
+               nents = DIV_ROUND_UP(size, MAX_SG_ENT_SZ);
        }
 
        return nents;
-- 
2.55.0.rc0.799.gd6f94ed593-goog

Reply via email to