On 12.05.2017 at 21:25, Felix Kuehling wrote:
On 17-05-12 04:43 AM, Christian König wrote:
On 12.05.2017 at 10:37, zhoucm1 wrote:
If SDMA is faster, then even if they wait for it to finish, the total
time is still shorter than with the CPU, isn't it? Of course, the
precondition is that the SDMA engine is exclusive. They can reserve an
SDMA engine for PT updates.
No, if I understood Felix's numbers correctly, the setup and wait time
for SDMA is a bit (but not much) longer than doing it with the CPU.
I'm skeptical of claims that SDMA is faster. Even when you use SDMA to
write the page table, the CPU still has to do about the same amount of
work writing PTEs into the SDMA IBs. SDMA can only save CPU time in
certain cases:
* Copying PTEs from GART table if they are on the same GPU (not
possible on Vega10 due to different MTYPE bits)
* Generating PTEs for contiguous VRAM BOs
At least for system memory BOs, writing the PTEs directly to
write-combining VRAM should be faster than first writing them to IBs in
cached system memory, then kicking off an SDMA transfer and waiting for
completion.
That's unfortunately not correct at all.
Nicolai did quite a few measurements on this, and even with WC enabled,
on most systems the SDMA is more efficient than the CPU at transferring
even small amounts of memory over the bus.
And no, we couldn't figure out why; it indeed doesn't make much sense
when WC is enabled.
I think the SDMA is simply optimized for those kinds of transfers, so
it comes out ahead even considering the overhead of allocating an IB.
So anything larger than, I would say, 1KB is handled faster when you
write it to system memory and then copy it to VRAM with the SDMA.
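As a minimal sketch of that rule of thumb (this is not amdgpu code: the
names, the 8-byte PTE size and the exact cutoff are assumptions for
illustration, and the 1KB figure is only the rough estimate above, not
a measured constant):

```c
#include <stddef.h>

/* Assumption: one page table entry is a 64-bit value. */
#define PTE_SIZE_BYTES 8u

/* Hypothetical cutoff derived from the rough "larger than 1KB" estimate. */
#define PT_UPDATE_SDMA_THRESHOLD_BYTES 1024u

/* Hypothetical names for the two update paths discussed in this thread. */
enum pt_update_path {
    PT_UPDATE_CPU,  /* CPU writes the PTEs directly into WC-mapped VRAM */
    PT_UPDATE_SDMA  /* PTEs go to system memory, SDMA copies them to VRAM */
};

/*
 * Pick an update path from the size of the update alone.
 * Updates strictly larger than the threshold go to SDMA,
 * everything else stays on the CPU.
 */
static enum pt_update_path choose_pt_update_path(size_t num_ptes)
{
    size_t bytes = num_ptes * PTE_SIZE_BYTES;

    return bytes > PT_UPDATE_SDMA_THRESHOLD_BYTES ? PT_UPDATE_SDMA
                                                  : PT_UPDATE_CPU;
}
```

With these assumed values, an update of up to 128 PTEs (1KB) would stay
on the CPU and anything larger would go through the SDMA. A real
heuristic would have to be tuned per system, since the measurements
only hold on most systems, not all.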
Regards,
Christian.