Thanks, will have a look - can't look right now, will see if I can sometime tomorrow.
Alex On Mon, 3 Jun 2019 at 13:27, Koenig, Christian <[email protected]> wrote: > Am 03.06.19 um 14:21 schrieb Alex Smith: > > On Mon, 3 Jun 2019 at 11:57, Koenig, Christian <[email protected]> > wrote: > >> Am 02.06.19 um 12:32 schrieb Alex Smith: >> > Put the uncached GTT type at a higher index than the visible VRAM type, >> > rather than having GTT first. >> > >> > When we don't have dedicated VRAM, we don't have a non-visible VRAM >> > type, and the property flags for GTT and visible VRAM are identical. >> > According to the spec, for types with identical flags, we should give >> > the one with better performance a lower index. >> > >> > Previously, apps which follow the spec guidance for choosing a memory >> > type would have picked the GTT type in preference to visible VRAM (all >> > Feral games will do this), and end up with lower performance. >> > >> > On a Ryzen 5 2500U laptop (Raven Ridge), this improves average FPS in >> > the Rise of the Tomb Raider benchmark by up to ~30%. Tested a couple of >> > other (Feral) games and saw similar improvement on those as well. >> >> Well that patch doesn't looks like a good idea to me. >> >> Using VRAM over uncached GTT should have something between no and only >> minimal performance difference on APU. >> >> To make things even worse VRAM is still needed for scanout and newer >> laptops have only a very very low default setting (32 or 16MB). So you >> can end up in VRAM clashing on those systems. >> >> Can you check some kernel statistics to figure out what exactly is going >> on here? >> > > What statistics should I look at? > > > First of all take a look at amdgpu_gem_info file in the debugfs directory > while using GTT and match that to using VRAM. You should see a lot more of > GTT ... CPU_GTT_USWC entries with the GTT variant. If the CPU_GTT_USWC flag > is missing we have found the problem. > > If that looks ok, then take a look at the ttm_page_pool or > ttm_dma_page_pool file and see how many wc/uc and wc/uc huge pages you got. > Huge pages should be used for anything larger than 2MB, if not we have > found the problem. > > If that still isn't the issue I need to take a look at the VM code again > and see if we still map VRAM/GTT differently on APUs. > > Thanks, > Christian. > > > Thanks, > Alex > > >> >> Regards, >> Christian. >> >> > >> > Signed-off-by: Alex Smith <[email protected]> >> > --- >> > I noticed that the memory types advertised on my Raven laptop looked a >> > bit odd so played around with it and found this. I'm not sure if it is >> > actually expected that the performance difference between visible VRAM >> > and GTT is so large, seeing as it's not dedicated VRAM, but the results >> > are clear (and consistent, tested multiple times). >> > --- >> > src/amd/vulkan/radv_device.c | 18 +++++++++++++++--- >> > 1 file changed, 15 insertions(+), 3 deletions(-) >> > >> > diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c >> > index 3cf050ed220..d36ee226ebd 100644 >> > --- a/src/amd/vulkan/radv_device.c >> > +++ b/src/amd/vulkan/radv_device.c >> > @@ -171,12 +171,11 @@ radv_physical_device_init_mem_types(struct >> radv_physical_device *device) >> > .heapIndex = vram_index, >> > }; >> > } >> > - if (gart_index >= 0) { >> > + if (gart_index >= 0 && device->rad_info.has_dedicated_vram) { >> > device->mem_type_indices[type_count] = >> RADV_MEM_TYPE_GTT_WRITE_COMBINE; >> > device->memory_properties.memoryTypes[type_count++] = >> (VkMemoryType) { >> > .propertyFlags = >> VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | >> > - VK_MEMORY_PROPERTY_HOST_COHERENT_BIT | >> > - (device->rad_info.has_dedicated_vram ? 0 : >> VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT), >> > + VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, >> > .heapIndex = gart_index, >> > }; >> > } >> > @@ -189,6 +188,19 @@ radv_physical_device_init_mem_types(struct >> radv_physical_device *device) >> > .heapIndex = visible_vram_index, >> > }; >> > } >> > + if (gart_index >= 0 && !device->rad_info.has_dedicated_vram) { >> > + /* Put GTT after visible VRAM for GPUs without dedicated >> VRAM >> > + * as they have identical property flags, and according >> to the >> > + * spec, for types with identical flags, the one with >> greater >> > + * performance must be given a lower index. */ >> > + device->mem_type_indices[type_count] = >> RADV_MEM_TYPE_GTT_WRITE_COMBINE; >> > + device->memory_properties.memoryTypes[type_count++] = >> (VkMemoryType) { >> > + .propertyFlags = >> VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT | >> > + VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | >> > + VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, >> > + .heapIndex = gart_index, >> > + }; >> > + } >> > if (gart_index >= 0) { >> > device->mem_type_indices[type_count] = >> RADV_MEM_TYPE_GTT_CACHED; >> > device->memory_properties.memoryTypes[type_count++] = >> (VkMemoryType) { >> >> >
_______________________________________________ mesa-dev mailing list [email protected] https://lists.freedesktop.org/mailman/listinfo/mesa-dev
