On 12.05.2017 at 21:25, Felix Kuehling wrote:
On 17-05-12 04:43 AM, Christian König wrote:
On 12.05.2017 at 10:37, zhoucm1 wrote:


If the SDMA is faster, then even if they wait for it to finish, the
total time is shorter than with the CPU, isn't it? Of course, the
precondition is that the SDMA is exclusive. They can reserve an SDMA
engine for PT updating.

No, if I understood Felix's numbers correctly, the setup and wait time
for SDMA is a bit (but not much) longer than doing it with the CPU.
I'm skeptical of claims that SDMA is faster. Even when you use SDMA to
write the page table, the CPU still has to do about the same amount of
work writing PTEs into the SDMA IBs. SDMA can only save CPU time in
certain cases:

   * Copying PTEs from GART table if they are on the same GPU (not
     possible on Vega10 due to different MTYPE bits)
   * Generating PTEs for contiguous VRAM BOs
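
As a rough illustration of the second case, the sketch below contrasts a CPU loop that stores every PTE individually with a single "generate" command that describes a contiguous range by start address, increment and count. Everything here is hypothetical: the `pte_gen_cmd` layout, the function names and the simplified PTE format (address ORed with flags) are assumptions for illustration, not the amdgpu or SDMA packet definitions, and `run_pte_gen` only emulates on the CPU what an engine would do.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command: one small descriptor covers a whole contiguous range. */
struct pte_gen_cmd {
    uint64_t first_addr; /* physical address of the first page */
    uint64_t incr;       /* page size in bytes */
    uint64_t flags;      /* simplified PTE flag bits (assumed format) */
    uint32_t count;      /* number of PTEs to generate */
};

/* CPU path: one store per PTE, so the work scales with the page count. */
static void fill_ptes_cpu(uint64_t *pt, const uint64_t *pages, size_t n,
                          uint64_t flags)
{
    for (size_t i = 0; i < n; i++)
        pt[i] = pages[i] | flags;
}

/* Emulation of what an engine would do with the descriptor above.
 * The CPU only had to fill in the four fields of the command. */
static void run_pte_gen(uint64_t *pt, const struct pte_gen_cmd *cmd)
{
    for (uint32_t i = 0; i < cmd->count; i++)
        pt[i] = (cmd->first_addr + (uint64_t)i * cmd->incr) | cmd->flags;
}
```

For scattered system memory pages no such compact description exists, so the CPU has to write every PTE somewhere in either case.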

At least for system memory BOs writing the PTEs directly to
write-combining VRAM should be faster than writing them to cached system
memory IBs first and then kicking off an SDMA transfer and waiting for
completion.

That's unfortunately not correct at all.

Nicolai did quite a few measurements on this, and even with WC enabled, on most systems the SDMA is more efficient than the CPU at transferring even small amounts of memory over the bus.

And no, we couldn't figure out why; it indeed doesn't make much sense when WC is enabled.

I think the SDMA is simply optimized for those kinds of transfers, so it comes out ahead even considering the overhead of allocating an IB.

So anything larger than, I would say, 1KB is handled faster when you write it to system memory and then copy it to VRAM with the SDMA.
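
Expressed as a size-based policy, that rule of thumb could look like the minimal sketch below. The names `choose_pt_update_path` and `SDMA_COPY_THRESHOLD_BYTES` are made up for illustration, and the 1KB cutoff is only the rough estimate given above, not a measured constant. Assuming 8-byte PTEs, 1KB corresponds to 128 entries.

```c
#include <stddef.h>

/* Rough estimate from the discussion above; hypothetical name,
 * would need tuning per system. */
#define SDMA_COPY_THRESHOLD_BYTES 1024u

enum pt_update_path {
    PT_UPDATE_CPU,  /* write PTEs directly into write-combined VRAM */
    PT_UPDATE_SDMA  /* write to system memory, then copy with the SDMA */
};

/* Pick the update path from the size of the update alone. */
static enum pt_update_path choose_pt_update_path(size_t nbytes)
{
    return nbytes > SDMA_COPY_THRESHOLD_BYTES ? PT_UPDATE_SDMA
                                              : PT_UPDATE_CPU;
}
```

A real policy would probably also take into account whether the caller has to wait for completion, which is the other cost discussed in this thread.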

Regards,
Christian.
_______________________________________________
amd-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
