This series significantly reduces the IOMMU/DMA overhead for I/O, particularly when the IOMMU is configured in STRICT or LAZY mode. I modified t/io_uring in fio to exercise this path and tested with an Intel Optane device. On my setup, I see the following improvements:
- STRICT: before = 570 KIOPS, after = 5.01 MIOPS
- LAZY: before = 1.93 MIOPS, after = 5.01 MIOPS
- PASSTHROUGH: before = 5.01 MIOPS, after = 5.01 MIOPS

The STRICT/LAZY numbers clearly show the benefit of avoiding per-I/O dma_map/dma_unmap and reusing the pre-mapped DMA addresses.

-- Anuj Gupta
