On 4/29/22 08:54, Tian, Kevin wrote:
>> From: Joao Martins <[email protected]>
>> Sent: Friday, April 29, 2022 5:09 AM
>>
>> Add to iommu domain operations a set of callbacks to
>> perform dirty tracking, particulary to start and stop
>> tracking and finally to test and clear the dirty data.
>
> to be consistent with other context, s/test/read/
>
/me nods
>>
>> Drivers are expected to dynamically change its hw protection
>> domain bits to toggle the tracking and flush some form of
>
> 'hw protection domain bits' sounds a bit weird. what about
> just using 'translation structures'?
>
I replace with that instead.
>> control state structure that stands in the IOVA translation
>> path.
>>
>> For reading and clearing dirty data, in all IOMMUs a transition
>> from any of the PTE access bits (Access, Dirty) implies flushing
>> the IOTLB to invalidate any stale data in the IOTLB as to whether
>> or not the IOMMU should update the said PTEs. The iommu core APIs
>> introduce a new structure for storing the dirties, albeit vendor
>> IOMMUs implementing .read_and_clear_dirty() just use
>
> s/vendor IOMMUs/iommu drivers/
>
> btw according to past history in iommu mailing list sounds like
> 'vendor' is not a term welcomed in the kernel, while there are
> many occurrences in this series.
>
Hmm, I wasn't aware actually.
Will move away from using 'vendor'.
> [...]
>> Although, The ARM SMMUv3 case is a tad different that its x86
>> counterparts. Rather than changing *only* the IOMMU domain device entry
>> to
>> enable dirty tracking (and having a dedicated bit for dirtyness in IOPTE)
>> ARM instead uses a dirty-bit modifier which is separately enabled, and
>> changes the *existing* meaning of access bits (for ro/rw), to the point
>> that marking access bit read-only but with dirty-bit-modifier enabled
>> doesn't trigger an perm io page fault.
>>
>> In pratice this means that changing iommu context isn't enough
>> and in fact mostly useless IIUC (and can be always enabled). Dirtying
>> is only really enabled when the DBM pte bit is enabled (with the
>> CD.HD bit as a prereq).
>>
>> To capture this h/w construct an iommu core API is added which enables
>> dirty tracking on an IOVA range rather than a device/context entry.
>> iommufd picks one or the other, and IOMMUFD core will favour
>> device-context op followed by IOVA-range alternative.
>
> Above doesn't convince me on the necessity of introducing two ops
> here. Even for ARM it can accept a per-domain op and then walk the
> page table to manipulate any modifier for existing mappings. It
> doesn't matter whether it sets one bit in the context entry or multiple
> bits in the page table.
>
OK
> [...]
>> +
>
> Miss comment for this function.
>
ack
>> +unsigned int iommu_dirty_bitmap_record(struct iommu_dirty_bitmap
>> *dirty,
>> + unsigned long iova, unsigned long length)
>> +{
>> + unsigned long nbits, offset, start_offset, idx, size, *kaddr;
>> +
>> + nbits = max(1UL, length >> dirty->pgshift);
>> + offset = (iova - dirty->iova) >> dirty->pgshift;
>> + idx = offset / (PAGE_SIZE * BITS_PER_BYTE);
>> + offset = offset % (PAGE_SIZE * BITS_PER_BYTE);
>> + start_offset = dirty->start_offset;
>
> could you elaborate the purpose of dirty->start_offset? Why dirty->iova
> doesn't start at offset 0 of the bitmap?
>
It is to deal with page-unaligned addresses.
Like if the start of the bitmap -- and hence bitmap base IOVA for the first bit
of the
bitmap -- isn't page-aligned and starts in the offset of a given page. Thus
start-offset
is to know bit in the pinned page does dirty::iova correspond to.
>> +
>> + while (nbits > 0) {
>> + kaddr = kmap(dirty->pages[idx]) + start_offset;
>> + size = min(PAGE_SIZE * BITS_PER_BYTE - offset, nbits);
>> + bitmap_set(kaddr, offset, size);
>> + kunmap(dirty->pages[idx]);
>
> what about the overhead of kmap/kunmap when it's done for every
> dirtied page (as done in patch 18)?
Isn't it an overhead mainly with highmem? Otherwise it ends up being
page_to_virt(...)
But anyways the kmap's should be cached, and teardown when pinning the next
user data.
Performance analysis is also something I want to fully hash out (as mentioned
in the cover
letter).
_______________________________________________
iommu mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/iommu