On 9/12/25 17:51, Peter Maydell wrote:
> (3) address_space_cache_init(), which initializes a MemoryRegionCache
> which you can then use for hopefully faster read and write
> operations via address_space_read_cached() and
> address_space_write_cached().
(And the cached ld/st variants as well)
> Again, subject to limitations: must operate on RAM;
You can operate on non-RAM but it will not be any faster.
> you might not be able to access the whole
> range you wanted. This currently seems to be used solely by
> virtio.
Indeed. The APIs compare like this:

             fast    gives void*    limits
    direct   -       -              -
    map      y       y              1 MR, 1 bounce buffer
    cached   y       -              1 MR
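Concretely, the cached path looks something like this (a minimal sketch; "as", "table_gpa" and the fallback path are assumptions, and error handling is elided):

```c
/* Sketch: cached access to a guest-memory table.  Assumes "as" is the
 * device's AddressSpace and "table_gpa" the guest-physical address of
 * the structure; hypothetical names, error paths elided. */
MemoryRegionCache cache = MEMORY_REGION_CACHE_INVALID;
int64_t len = address_space_cache_init(&cache, as, table_gpa,
                                       table_size, /* is_write */ false);
if (len < table_size) {
    /* the cache could not cover the whole range: fall back to the
     * uncached address_space_read()/write() path */
}
uint32_t word = ldl_le_phys_cached(&cache, 0);   /* cached ld variant */
address_space_read_cached(&cache, 4, buf, sizeof(buf));
address_space_cache_destroy(&cache);
```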
MemoryRegionCache has the additional complication of needing a
MemoryListener to invalidate the cache if the cache is long-lived.
This could be done globally; it just wasn't necessary while virtio was
the only user.
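The invalidation pattern is roughly the following (a sketch modelled on what virtio does; MyDevState and the field names are hypothetical, and virtio actually defers the re-init rather than doing it inline in the listener as shown here):

```c
/* Sketch: invalidating a long-lived MemoryRegionCache when the guest
 * memory map changes, via a MemoryListener .commit callback. */
static void mydev_memory_commit(MemoryListener *listener)
{
    MyDevState *s = container_of(listener, MyDevState, listener);

    /* The memory map may have changed: drop and re-create the cache. */
    address_space_cache_destroy(&s->table_cache);
    address_space_cache_init(&s->table_cache, s->as, s->table_gpa,
                             s->table_size, /* is_write */ false);
}

/* at realize time: */
s->listener.commit = mydev_memory_commit;
memory_listener_register(&s->listener, s->as);
```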
You could create a MemoryRegionCache for the duration of (say) a single
function call, and then you don't need to deal with invalidation; but
then the single bounce buffer limitation is not a problem and map/unmap
is probably easier to use.
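For comparison, the transient map/unmap pattern is (a sketch; "as", "gpa" and the slow-path comment are assumptions):

```c
/* Sketch: transient direct access via address_space_map()/unmap().
 * The mapping may come back shorter than requested, or NULL if the
 * single bounce buffer is already in use. */
hwaddr len = size;
void *p = address_space_map(as, gpa, &len, /* is_write */ true,
                            MEMTX_ATTRS_UNSPECIFIED);
if (!p || len < size) {
    /* partial or failed mapping: use address_space_write() instead */
}
memcpy(p, data, len);
address_space_unmap(as, p, len, /* is_write */ true, /* access_len */ len);
```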
> In particular, I'm working on a GICv5 model. This device puts a
> lot of its working data structures into guest memory, so we're going
> to be accessing guest memory a lot. The device spec says if you point
> it at not-RAM you get to keep both pieces, and requires the guest
> not to try to change the contents of that memory underfoot without
> notifying it, so this seems like it ought to be a good candidate
> for some kind of "act like you have this memory cached so you don't
> need to keep looking it up every time" API...
Yes, that is indeed a good use.
> Does the MemoryRegionCache API cover all the use cases we use
> address_space_map() and dma_memory_map() for? (i.e. could we
> deprecate the latter and transition code over to the new API?)
No, there's no way to get a void* from MemoryRegionCache (you could get
one when the underlying block is RAM by peeking at the struct members;
but there's no bounce buffering by design).
> Incidentally, on the subject of the dma.h wrappers -- I've never
> really been very clear why we have these. Some devices use them,
> but a lot do not.
All PCI devices use them.
> The fact that the dma wrappers put in smp_mb()
> barriers leaves me wondering if all those other devices that
> don't use them have subtle bugs, but OTOH I've never noticed
> any problems...
The idea was that PCI specifies the ordering of DMA operations and the
memory barrier provides that ordering when the operations are performed
by the host CPU.
In practice the cases in which ordering is required are limited, and
personally I prefer to write these barriers in the device model so that
the synchronization algorithm is documented. That means you can use
map/unmap or MemoryRegionCache instead, both of which are also faster.
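An explicit barrier in the device model might look like this (a hypothetical completion-ring sketch; the field and constant names are assumptions, not an existing device):

```c
/* Sketch: documenting the ordering requirement in the device model
 * instead of relying on the dma_* wrappers' implicit smp_mb().
 * The guest must not observe DESC_DONE before the payload. */
address_space_write_cached(&s->ring_cache, desc_off, payload, plen);
smp_wmb();                       /* order payload before status update */
stl_le_phys_cached(&s->ring_cache, status_off, DESC_DONE);
```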
The original, and more distinctive, DMA wrapper is dma_blk_io(): a
wrapper around the block layer APIs that copes with operations spanning
multiple memory regions and with the fact that the address_space_map()
bounce buffer is only one page long. It works together with QEMUSGList
and is used in several block device models (IDE, SCSI, NVMe).
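The usage pattern is roughly (a sketch of the QEMUSGList plumbing those models share; "s", "seg", "nsegs" and the callback are assumptions):

```c
/* Sketch: build a scatter-gather list from hypothetical descriptor
 * entries and hand it to the block layer via dma_blk_read(). */
QEMUSGList qsg;
qemu_sglist_init(&qsg, DEVICE(s), /* alloc hint */ nsegs, s->dma_as);
for (int i = 0; i < nsegs; i++) {
    qemu_sglist_add(&qsg, seg[i].addr, seg[i].len);
}
dma_blk_read(s->blk, &qsg, /* offset */ lba << BDRV_SECTOR_BITS,
             BDRV_SECTOR_SIZE, dma_complete_cb, s);
/* dma_blk_read() maps each element (bouncing where it must) and issues
 * the I/O; call qemu_sglist_destroy() in the completion callback. */
```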
Paolo