On 2016-08-09 11:35, Christian König wrote:
Am 09.08.2016 um 17:49 schrieb Jay Cornwall:
On 2016-08-09 07:52, Christian König wrote:
From: Christian König <[email protected]>
We align to 64KB, but when userspace aligns even more we can easily
use more.
Signed-off-by: Christian König <[email protected]>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index e6c030b..88f4109 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -817,13 +817,13 @@ static void amdgpu_vm_frag_ptes(struct
amdgpu_pte_update_params *params,
* allocation size to the fragment size.
*/
- /* SI and newer are optimized for 64KB */
- uint64_t frag_flags =
AMDGPU_PTE_FRAG(AMDGPU_LOG2_PAGES_PER_FRAG);
- uint64_t frag_align = 1 << AMDGPU_LOG2_PAGES_PER_FRAG;
+ const uint64_t frag_align = 1 << AMDGPU_LOG2_PAGES_PER_FRAG;
uint64_t frag_start = ALIGN(start, frag_align);
uint64_t frag_end = end & ~(frag_align - 1);
+ uint32_t frag;
+
/* system pages are non continuously */
if (params->src || params->pages_addr || !(flags &
AMDGPU_PTE_VALID) ||
(frag_start >= frag_end)) {
@@ -832,6 +832,10 @@ static void amdgpu_vm_frag_ptes(struct
amdgpu_pte_update_params *params,
return;
}
+ /* use more than 64KB fragment size if possible */
+ frag = lower_32_bits(frag_start | frag_end);
+ frag = likely(frag) ? __ffs(frag) : 31;
+
/* handle the 4K area at the beginning */
if (start != frag_start) {
amdgpu_vm_update_ptes(params, vm, start, frag_start,
@@ -841,7 +845,7 @@ static void amdgpu_vm_frag_ptes(struct
amdgpu_pte_update_params *params,
/* handle the area in the middle */
amdgpu_vm_update_ptes(params, vm, frag_start, frag_end, dst,
- flags | frag_flags);
+ flags | AMDGPU_PTE_FRAG(frag));
/* handle the 4K area at the end */
if (frag_end != end) {
Would this change not direct larger fragments away from the BigK TLB
partition?
My understanding was VM_L2_CNTL3.L2_CACHE_BIGK_FRAGMENT_SIZE is an
exact match and not a minimum size. I can't find any immediate
documentation on that topic to confirm.
Yeah I was questioning that myself as well, especially since you wrote
in the initial patch that SI and later are optimized for 64K.
The 64K figure came from VM documentation. It was otherwise unqualified
but my guess is it was based on VidMM's page size (64K), the average
gaming work set size, and the number of BigK entries. Apparently it's
still good as we haven't changed it since.
So I tested it on Tonga and Polaris10 and it seems to work as
expected, e.g. a 1MB fragment size really results in not reading the
other page table entries as soon as it is cached.
But I'm not sure how exactly this partitioning of the L2 works and
what effect it should have.
OK. As long as there's no regression on e.g. Heaven, where I benchmarked
the original change at + several percent, then it should be fine.
--
Jay Cornwall
_______________________________________________
amd-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/amd-gfx