[PATCH] drm/amd/pm: refine amdgpu pm sysfs node error code

2025-09-02 Thread Yang Wang
Returns different error codes based on the scenario to help the user app understand the AMDGPU device status when an exception occurs. Signed-off-by: Yang Wang --- drivers/gpu/drm/amd/pm/amdgpu_pm.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/

Re: [PATCH v6 00/11] Improvements to S5 power consumption

2025-09-02 Thread Mario Limonciello
On 8/17/2025 9:00 PM, Mario Limonciello (AMD) wrote: A variety of issues both in function and in power consumption have been raised as a result of devices not being put into a low power state when the system is powered off. There have been some localized changes[1] to PCI core to help these issu

Re: [PATCH v3 6/6] drm: panel-backlight-quirks: Log applied panel brightness quirks

2025-09-02 Thread Mario Limonciello
On 8/29/2025 10:01 AM, Antheas Kapenekakis wrote: On Fri, 29 Aug 2025 at 16:57, Antheas Kapenekakis wrote: Currently, when a panel brightness quirk is applied, there is no log indicating that a quirk was applied. Unwrap the drm device on its own and use drm_info() to log when a quirk is applie

[PATCH][next] drm/amd/amdgpu: Fix missing error return on kzalloc failure

2025-09-02 Thread Colin Ian King
Currently the kzalloc failure check just reports the failure and sets the variable ret to -ENOMEM, which is not checked later for this specific error. Fix this by just returning -ENOMEM rather than setting ret. Fixes: 4fb930715468 ("drm/amd/amdgpu: remove redundant host to psp cmd buf alloca
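A minimal userspace sketch of the fixed pattern, assuming calloc() as a stand-in for kzalloc() and a hypothetical setup_cmd_buf() in place of the real psp command-buffer code. On allocation failure the function returns -ENOMEM immediately instead of storing it in a variable that later code never checks.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical buffer type; not the real psp command buffer. */
struct cmd_buf {
	unsigned char data[64];
};

/* Hypothetical stand-in: calloc() plays the role of kzalloc().
 * simulate_oom forces the failure path so it can be exercised. */
static int setup_cmd_buf(struct cmd_buf **out, int simulate_oom)
{
	struct cmd_buf *buf = simulate_oom ? NULL : calloc(1, sizeof(*buf));

	/* Fixed pattern: bail out right away rather than setting
	 * ret = -ENOMEM and continuing. */
	if (!buf)
		return -ENOMEM;

	*out = buf;
	return 0;
}
```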

[PATCH v7 5/8] drm/amdgpu: Implement TTM handling for MMIO_REMAP placement

2025-09-02 Thread Srinivasan Shanmugam
Implement TTM-level behavior for AMDGPU_PL_MMIO_REMAP so it behaves as a CPU-visible IO page: * amdgpu_evict_flags(): mark as unmovable * amdgpu_res_cpu_visible(): consider CPU-visible * amdgpu_bo_move(): use null move when src/dst is MMIO_REMAP * amdgpu_ttm_io_mem_reserve(): program base/is_iomem

Re: [PATCH v2] drm/amdgpu: Increment drm driver minor version for list handles ioctl

2025-09-02 Thread Alex Deucher
On Tue, Sep 2, 2025 at 3:05 PM David Francis wrote: > > With the addition of the drm ioctl > DRM_IOCTL_AMDGPU_GEM_LIST_HANDLES, > the drm driver version should be incremented (to 65) > > Signed-off-by: David Francis Reviewed-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3

RE: [PATCH 1/3] drm/amdgpu: fix userq VM validation v3

2025-09-02 Thread Liang, Prike
[Public] Regards, Prike > -----Original Message----- > From: amd-gfx On Behalf Of Christian > König > Sent: Thursday, August 28, 2025 11:02 PM > To: Khatri, Sunil > Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander > > Subject: [PATCH 1/3] drm/amdgpu: fix userq VM validation v3 > > T

[PATCH v7 2/8] drm/amdgpu/uapi: Introduce AMDGPU_GEM_DOMAIN_MMIO_REMAP

2025-09-02 Thread Srinivasan Shanmugam
Add a new GEM domain bit AMDGPU_GEM_DOMAIN_MMIO_REMAP to allow userspace to request the MMIO remap (HDP flush) page via GEM_CREATE. - include/uapi/drm/amdgpu_drm.h: * define AMDGPU_GEM_DOMAIN_MMIO_REMAP * include the bit in AMDGPU_GEM_DOMAIN_MASK v2: Add early reject in amdgpu_gem_create_ioct

Re: [PATCH 2/4] drm/amdgpu: Clarify that BO size is in bytes in comments

2025-09-02 Thread Christian König
On 02.09.25 10:08, Timur Kristóf wrote: > On Tue, 2025-09-02 at 08:43 +0200, Christian König wrote: >> On 01.09.25 12:00, Timur Kristóf wrote: >>> To avoid confusion with dwords. >>> >>> Signed-off-by: Timur Kristóf >>> --- >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 4 ++-- >>>  1 file chang

[v13 06/11] drm/amdgpu/mes12: implement detect and reset callback

2025-09-02 Thread Jesse . Zhang
Implement support for the hung queue detect and reset functionality. v2: Always use AMDGPU_MES_SCHED_PIPE Signed-off-by: Alex Deucher Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 31 ++ 1 file changed, 31 insertions(+) diff --git a/drivers/gp

Re: [PATCH] drm/amdgpu: Correct misnamed function in amdgpu_gem.c

2025-09-02 Thread Alex Deucher
On Sun, Aug 31, 2025 at 6:53 AM Srinivasan Shanmugam wrote: > > The header comment above amdgpu_gem_list_handles_ioctl referenced > drm_amdgpu_gem_list_handles_ioctl. Update the comment to reflect the > actual function identifier to avoid misleading prototype warnings. > > Fixes the below: > drive

Re: [PATCH 1/4] drm/amdgpu: Fix allocating extra dwords for rings

2025-09-02 Thread Timur Kristóf
On Tue, 2025-09-02 at 08:41 +0200, Christian König wrote: > On 01.09.25 12:00, Timur Kristóf wrote: > > The amdgpu_bo_create_kernel function takes a byte count, > > so we need to multiply the extra dword count by four. > > (The ring_size is already in bytes so that one is correct here.) > > Good c
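The unit fix can be sketched as follows. ring_alloc_bytes() is a hypothetical helper, not the amdgpu code; it only shows that the ring size (already in bytes) is added as-is, while the extra dword count is scaled by four before being passed to a byte-based allocator such as amdgpu_bo_create_kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: ring_size_bytes is already a byte count,
 * extra_dw is a count of 32-bit dwords, and the allocator wants bytes. */
static size_t ring_alloc_bytes(size_t ring_size_bytes, size_t extra_dw)
{
	/* Each dword is 4 bytes, so scale before adding. */
	return ring_size_bytes + extra_dw * sizeof(uint32_t);
}
```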

Re: [PATCH] drm/amdkfd: fix p2p links bug in topology

2025-09-02 Thread Alex Deucher
On Mon, Aug 25, 2025 at 10:33 AM Eric Huang wrote: > > When creating p2p links, KFD needs to check XGMI link > with two conditions, hive_id and is_sharing_enabled, > but it is missing to check is_sharing_enabled, so add > it to fix the error. > > Signed-off-by: Eric Huang Acked-by: Alex Deucher
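A simplified sketch of the corrected check. The struct xgmi_info type is hypothetical and carries only the two fields named in the patch; it further assumes that a hive_id of 0 means "not in a hive" and that both devices must have sharing enabled. The real KFD topology code is structured differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the per-device XGMI fields. */
struct xgmi_info {
	uint64_t hive_id;        /* assumed: 0 means not part of a hive */
	bool is_sharing_enabled;
};

/* Report an XGMI p2p link only when both conditions hold:
 * same (non-zero) hive and sharing enabled on both ends. */
static bool is_xgmi_p2p_link(const struct xgmi_info *a,
			     const struct xgmi_info *b)
{
	return a->hive_id != 0 && a->hive_id == b->hive_id &&
	       a->is_sharing_enabled && b->is_sharing_enabled;
}
```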

[PATCH 06/11] drm/amd/display: Update dchubbub.h for hubbub perfmon support

2025-09-02 Thread waynelin
From: Wenjing Liu [why] dchubbub supports performance monitoring for hubbub. The interfaces define the performance monitoring events and their attributes. Reviewed-by: Alvin Lee Signed-off-by: Wenjing Liu Signed-off-by: Wayne Lin --- .../gpu/drm/amd/display/dc/inc/hw/dchubbub.h | 22 +++

[PATCH v7 6/8] drm/amdgpu/ttm: Initialize AMDGPU_PL_MMIO_REMAP Heap

2025-09-02 Thread Srinivasan Shanmugam
Add a one-page TTM range manager for AMDGPU_PL_MMIO_REMAP via amdgpu_ttm_init_on_chip(). This only registers the placement with TTM; no BO is allocated in this patch. The singleton 4K remap BO is created and freed in the following patch. This split follows to separate heap bring-up from BO alloca

Re: [REGRESSION] AMD HDMI/DP audio broken after suspend since commit 50e0bae34fa6

2025-09-02 Thread Mario Limonciello
On 8/31/2025 5:12 AM, Przemysław Kopa wrote: Hello, I'm running Radeon RX 9060 XT and since upgrading to the kernel 6.15 I'm facing an issue with audio via DisplayPort. After waking from S3 suspend (sometimes, but not always) audio doesn't work - pavucontrol shows that the output is disconnected

RE: [PATCH 2/3] drm/amdgpu: remove check for BO reservation add assert instead

2025-09-02 Thread Liang, Prike
[Public] Regards, Prike > -----Original Message----- > From: amd-gfx On Behalf Of Christian > König > Sent: Thursday, August 28, 2025 11:02 PM > To: Khatri, Sunil > Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander > > Subject: [PATCH 2/3] drm/amdgpu: remove check for BO reservation

Re: [PATCH] drm/amdgpu: Increment drm driver minor version for list handles ioctl

2025-09-02 Thread Mario Limonciello
On 8/29/2025 1:30 PM, David Francis wrote: With the addition of the drm ioctl DRM_IOCTL_AMDGPU_GEM_LIST_HANDLES, the drm driver version should be incremented (to 65) Signed-off-by: David Francis --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

[PATCH 4/8] drm/amd/display: incorrect conditions for failing dto calculations

2025-09-02 Thread Fangzhi Zuo
From: Clay King [Why & How] Previously, when calculating dto phase, we would incorrectly fail when phase <=0 without additionally checking for the integer value. This meant that calculations would incorrectly fail when the desired pixel clock was an exact multiple of the reference clock. Reviewe
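A hypothetical model of the corrected condition. It assumes the DTO value is split into an integer part (pixclk / refclk) and a phase (pixclk % refclk). The names and the split are illustrative, not the DC implementation. When the pixel clock is an exact multiple of the reference clock, the phase is 0 but the integer part is not, and that case must not be treated as a failure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical split of pixclk/refclk into integer and fractional phase. */
struct dto_params {
	uint64_t integer; /* whole multiples of refclk */
	uint64_t phase;   /* remainder, i.e. fractional part * refclk */
};

static bool calc_dto(uint64_t pixclk_khz, uint64_t refclk_khz,
		     struct dto_params *out)
{
	if (refclk_khz == 0)
		return false;

	out->integer = pixclk_khz / refclk_khz;
	out->phase = pixclk_khz % refclk_khz;

	/* Fail only when BOTH parts are zero. phase == 0 on its own just
	 * means pixclk is an exact multiple of refclk. */
	return out->integer != 0 || out->phase != 0;
}
```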

[PATCH 5/8] drm/amd/display: Clear the CUR_ENABLE register on DCN314 w/out DPP PG

2025-09-02 Thread Fangzhi Zuo
From: Ivan Lipski [Why&How] On DCN314, clearing DPP SW structure without power gating it can cause a double cursor in full screen with non-native scaling. Add a W/A that clears CURSOR0_CONTROL cursor_enable flag if dcn10_plane_atomic_power_down is called and DPP power gating is disabled. Reviewed-b

Re: [PATCH v6 7/8] drm/amdgpu/ttm: Allocate/Free 4K MMIO_REMAP Singleton

2025-09-02 Thread Alex Deucher
On Tue, Sep 2, 2025 at 9:27 AM Christian König wrote: > > On 02.09.25 15:25, Alex Deucher wrote: > > On Tue, Sep 2, 2025 at 3:38 AM Christian König > > wrote: > >> > >> On 02.09.25 05:29, Srinivasan Shanmugam wrote: > >>> Add mmio_remap bookkeeping to amdgpu_device and introduce > >>> amdgpu_ttm

Re: [PATCH] drm/amd: Re-enable common modes for eDP and LVDS

2025-09-02 Thread Harry Wentland
On 2025-08-28 10:08, Mario Limonciello (AMD) wrote: > [Why] > Although compositors will add their own modes, Xorg won't use its own > modes and will only stick to modes advertised by the driver. This means a > user that used to pick 1024x768 could no longer access it unless the > panel's native

Re: [PATCH v9 08/14] drm/amdgpu: add userq object va track helpers

2025-09-02 Thread Alex Deucher
On Mon, Sep 1, 2025 at 5:13 AM Liang, Prike wrote: > > [Public] > > > > Regards, > Prike > > > -----Original Message----- > > From: Alex Deucher > > Sent: Thursday, August 28, 2025 6:13 AM > > To: Liang, Prike ; Koenig, Christian > > > > Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexande

Re: [PATCH][next] drm/amdgpu/amdkfd: Avoid a couple hundred -Wflex-array-member-not-at-end warnings

2025-09-02 Thread Kuehling, Felix
On 2025-08-29 5:58 a.m., Gustavo A. R. Silva wrote: -Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it, globally. Move the conflicting declarations to the end of the corresponding structures. Notice that `struct dev_pagemap` is a flexible structure, th

[PATCH v7 0/8] drm/amdgpu: add MMIO-remap singleton BO for HDP flush v7

2025-09-02 Thread Srinivasan Shanmugam
This series introduces a kernel-managed singleton BO representing the MMIO-remap (HDP flush) page and exposes it to userspace through a new GEM domain. Design -- - A tiny (1-page) TTM bucket is introduced for AMDGPU_PL_MMIO_REMAP (mirroring doorbells). - A singleton BO is created during am

[PATCH v7 8/8] drm/amdgpu/gem: Return Handle to MMIO_REMAP Singleton in GEM_CREATE

2025-09-02 Thread Srinivasan Shanmugam
Enable userspace to obtain a handle to the kernel-owned MMIO_REMAP singleton when AMDGPU_GEM_DOMAIN_MMIO_REMAP is requested via amdgpu_gem_create_ioctl(). Validate the fixed 4K constraint: if PAGE_SIZE > AMDGPU_GPU_PAGE_SIZE return -EINVAL; when provided, size and alignment must equal AMDGPU_GPU_P
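A sketch of the size and alignment validation described above. The CPU page size is passed in as a parameter (playing PAGE_SIZE) and GPU_PAGE_SIZE is a stand-in for AMDGPU_GPU_PAGE_SIZE. The function name is hypothetical, and treating 0 as "not provided" is an assumption.

```c
#include <errno.h>
#include <stdint.h>

#define GPU_PAGE_SIZE 4096u /* stand-in for AMDGPU_GPU_PAGE_SIZE */

/* Hypothetical validator for a MMIO_REMAP GEM_CREATE request.
 * cpu_page_size plays PAGE_SIZE; size/alignment of 0 are assumed
 * to mean "not provided". */
static int validate_mmio_remap_args(uint64_t cpu_page_size,
				    uint64_t size, uint64_t alignment)
{
	/* The remap page is fixed at 4K; larger CPU pages cannot map it alone. */
	if (cpu_page_size > GPU_PAGE_SIZE)
		return -EINVAL;
	if (size && size != GPU_PAGE_SIZE)
		return -EINVAL;
	if (alignment && alignment != GPU_PAGE_SIZE)
		return -EINVAL;
	return 0;
}
```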

[PATCH v7 4/8] drm/amdgpu: Wire up MMIO_REMAP placement and User-visible strings

2025-09-02 Thread Srinivasan Shanmugam
Wire up the conversions and strings for the new MMIO_REMAP placement: * amdgpu_mem_type_to_domain() maps AMDGPU_PL_MMIO_REMAP -> domain * amdgpu_bo_placement_from_domain() accepts the new domain * amdgpu_bo_mem_stats_placement() and amdgpu_bo_print_info() report it * res cursor supports the new pl

[PATCH v7 3/8] drm/amdgpu/ttm: Add New AMDGPU_PL_MMIO_REMAP Placement

2025-09-02 Thread Srinivasan Shanmugam
Introduce a kernel-internal TTM placement type AMDGPU_PL_MMIO_REMAP for the HDP flush MMIO remap page. Plumbing added: - amdgpu_res_cursor.{first,next}: treat MMIO_REMAP like DOORBELL - amdgpu_ttm_io_mem_reserve(): return BAR bus address + offset for MMIO_REMAP, mark as uncached I/O - amdgpu_ttm_

[PATCH v7 1/8] drm/ttm: Bump TTM_NUM_MEM_TYPES to 9 (Prep for AMDGPU_PL_MMIO_REMAP)

2025-09-02 Thread Srinivasan Shanmugam
Increase TTM_NUM_MEM_TYPES from 8 to 9 to accommodate the upcoming AMDGPU_PL_MMIO_REMAP placement. Cc: Alex Deucher Suggested-by: Christian König Signed-off-by: Srinivasan Shanmugam Reviewed-by: Christian König Reviewed-by: Alex Deucher --- include/drm/ttm/ttm_resource.h | 2 +- 1 file chang

[PATCH] drm/amdgpu: Correct info field of bad page threshold exceed CPER

2025-09-02 Thread Xiang Liu
Correct valid_bits and ms_chk_bits of section info field for bad page threshold exceed CPER to match OOB's behavior. Signed-off-by: Xiang Liu --- drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper

Re: [PATCH][next] drm/amd/amdgpu: Fix missing error return on kzalloc failure

2025-09-02 Thread Alex Deucher
Applied. Thanks! Alex On Tue, Sep 2, 2025 at 8:49 AM Colin Ian King wrote: > > Currently the kzalloc failure check just reports the failure > and sets the variable ret to -ENOMEM, which is not checked later > for this specific error. Fix this by just returning -ENOMEM rather > than setting

Re: [PATCH] drm/amdgpu: Fix function header names in amdgpu_connectors.c

2025-09-02 Thread Alex Deucher
On Sun, Aug 31, 2025 at 6:13 AM Srinivasan Shanmugam wrote: > > Align the function headers for `amdgpu_max_hdmi_pixel_clock` and > `amdgpu_connector_dvi_mode_valid` with the function implementations so > they match the expected kdoc style. > > Fixes the below: > drivers/gpu/drm/amd/amdgpu/amdgpu_c

Re: [PATCH] drm/amdkfd: fix p2p links bug in topology

2025-09-02 Thread Eric Huang
Ping ... On 2025-08-25 10:23, Eric Huang wrote: When creating p2p links, KFD needs to check XGMI link with two conditions, hive_id and is_sharing_enabled, but it is missing to check is_sharing_enabled, so add it to fix the error. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_t

Re: [PATCH v6 7/8] drm/amdgpu/ttm: Allocate/Free 4K MMIO_REMAP Singleton

2025-09-02 Thread Christian König
On 02.09.25 15:31, Alex Deucher wrote: > On Tue, Sep 2, 2025 at 9:27 AM Christian König > wrote: >> >> On 02.09.25 15:25, Alex Deucher wrote: >>> On Tue, Sep 2, 2025 at 3:38 AM Christian König >>> wrote: On 02.09.25 05:29, Srinivasan Shanmugam wrote: > Add mmio_remap bookkeeping t

RE: [PATCH 0/8] DC Patches August 25, 2025

2025-09-02 Thread Wheeler, Daniel
[Public] Hi all, This week this patchset was tested on 4 systems, two dGPU and two APU based, and tested across multiple display and connection types. APU * Single Display eDP -> 1080p 60hz, 1920x1200 165hz, 3840x2400 60hz * Single Display DP (SST DSC) -> 4k144hz, 4k240hz

[PATCH 8/8] drm/amd/display: Promote DC to 3.2.348

2025-09-02 Thread Fangzhi Zuo
From: Taimur Hassan Summary: * Refactor bounding box values handling * Fix incorrect condition to fail dto clk calculation * Skip check downlink setting for a certain MST branch device * Fix double cursor issue on dcn314 Signed-off-by: Taimur Hassan Signed-off-by: Alex Hung Tested-by: Dan Whe

[PATCH 2/8] drm/amd/display: Optimize custom brightness curve interpolation

2025-09-02 Thread Fangzhi Zuo
From: Mario Limonciello [Why] The custom brightness curve works by walking through all data points one by one. When the brightness value is at either extreme, this is a lot of data points to walk. This is especially noticeable when moving a brightness slider around, where it can lag. [How] Bisect the
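A minimal sketch of bisecting a sorted curve instead of walking it point by point. It assumes a hypothetical struct curve_point (input level, output value) with strictly increasing inputs and linear interpolation between neighbours; it is not the DC code. Lookup cost drops from O(n) to O(log n) per brightness change.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical curve point: input brightness level -> output value. */
struct curve_point {
	uint32_t in;
	uint32_t out;
};

/* Assumes n >= 2 and pts sorted by strictly increasing .in. */
static uint32_t curve_lookup(const struct curve_point *pts, size_t n,
			     uint32_t x)
{
	size_t lo = 0, hi = n - 1;

	/* Clamp to the ends of the curve. */
	if (x <= pts[0].in)
		return pts[0].out;
	if (x >= pts[n - 1].in)
		return pts[n - 1].out;

	/* Bisect; invariant: pts[lo].in <= x < pts[hi].in */
	while (hi - lo > 1) {
		size_t mid = lo + (hi - lo) / 2;

		if (pts[mid].in <= x)
			lo = mid;
		else
			hi = mid;
	}

	/* Linear interpolation; signed 64-bit math also handles a
	 * decreasing segment. */
	int64_t dx = (int64_t)pts[hi].in - pts[lo].in;
	int64_t dy = (int64_t)pts[hi].out - pts[lo].out;

	return (uint32_t)(pts[lo].out + dy * ((int64_t)x - pts[lo].in) / dx);
}
```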

Re: [PATCH v6 7/8] drm/amdgpu/ttm: Allocate/Free 4K MMIO_REMAP Singleton

2025-09-02 Thread Christian König
On 02.09.25 15:25, Alex Deucher wrote: > On Tue, Sep 2, 2025 at 3:38 AM Christian König > wrote: >> >> On 02.09.25 05:29, Srinivasan Shanmugam wrote: >>> Add mmio_remap bookkeeping to amdgpu_device and introduce >>> amdgpu_ttm_mmio_remap_bo_init()/fini() to manage a kernel-owned, >>> one-page (4K

Re: evergreen_packet3_check:... radeon 0000:1d:00.0: vbo resource seems too big for the bo

2025-09-02 Thread Borislav Petkov
On Mon, Sep 01, 2025 at 11:27:01AM +0200, Michel Dänzer wrote: > use some kind of debug output API which doesn't hit dmesg by default You still want to be enabled by default so that normal users can see it and actually report it. > (can be a non-once variant instead, that's more useful for user-s

[PATCH 1/1] drm/amdgpu: use KMEM_CACHE instead of kmem_cache_create

2025-09-02 Thread Longlong Xia
Use KMEM_CACHE() instead of kmem_cache_create() to simplify the code. Signed-off-by: Longlong Xia --- drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/am

Re: [PATCH 3/4] drm/amdgpu: Fill extra dwords with NOPs

2025-09-02 Thread Timur Kristóf
On Mon, 2025-09-01 at 11:13 +0100, Tvrtko Ursulin wrote: > > Hi, > > On 01/09/2025 11:00, Timur Kristóf wrote: > > Technically not necessary, but clear the extra dwords too, > > so that the command processors don't read uninitialized memory. > > > > Fixes: c8c1a1d2ef04 ("drm/amdgpu: define and a

[v13 09/11] drm/amdgpu/userq: add a detect and reset callback

2025-09-02 Thread Jesse . Zhang
From: Alex Deucher Add a detect and reset callback and add the implementation for mes. The callback will detect all hung queues of a particular ip type (e.g., GFX or compute or SDMA) and reset them. v2: increase reset counter and set fence force completion v3: Removed userq_mutex in mes_userq_d

Re: [PATCH 3/4] drm/amdgpu: Fill extra dwords with NOPs

2025-09-02 Thread Tvrtko Ursulin
On 02/09/2025 12:30, Timur Kristóf wrote: On Mon, 2025-09-01 at 11:13 +0100, Tvrtko Ursulin wrote: Hi, On 01/09/2025 11:00, Timur Kristóf wrote: Technically not necessary, but clear the extra dwords too, so that the command processors don't read uninitialized memory. Fixes: c8c1a1d2ef04 ("

Re: [PATCH 1/1] drm/amdgpu: use KMEM_CACHE instead of kmem_cache_create

2025-09-02 Thread Christian König
On 02.09.25 09:27, Longlong Xia wrote: > Use KMEM_CACHE() instead of kmem_cache_create() to simplify the code. In general a good cleanup, but why are we using a separate kmem_cache here in the first place? SLAB_HWCACHE_ALIGN rounds up the struct size to 128 bytes and that is something kzalloc()

Re: [PATCH 2/4] drm/amdgpu: Clarify that BO size is in bytes in comments

2025-09-02 Thread Timur Kristóf
On Tue, 2025-09-02 at 13:10 +0200, Christian König wrote: > On 02.09.25 10:08, Timur Kristóf wrote: > > On Tue, 2025-09-02 at 08:43 +0200, Christian König wrote: > > > On 01.09.25 12:00, Timur Kristóf wrote: > > > > To avoid confusion with dwords. > > > > > > > > Signed-off-by: Timur Kristóf > >

Re: [PATCH 1/4] drm/amdgpu: Fix allocating extra dwords for rings

2025-09-02 Thread Christian König
On 02.09.25 10:26, Timur Kristóf wrote: > On Tue, 2025-09-02 at 08:41 +0200, Christian König wrote: >> On 01.09.25 12:00, Timur Kristóf wrote: >>> The amdgpu_bo_create_kernel function takes a byte count, >>> so we need to multiply the extra dword count by four. >>> (The ring_size is already in byte

[v13 11/11] drm/amdgpu: Implement user queue reset functionality

2025-09-02 Thread Jesse . Zhang
From: Alex Deucher This patch adds robust reset handling for user queues (userq) to improve recovery from queue failures. The key components include: 1. Queue detection and reset logic: - amdgpu_userq_detect_and_reset_queues() identifies failed queues - Per-IP detect_and_reset callbacks fo

Re: [PATCH v5 1/2] drm/buddy: Optimize free block management with RB tree

2025-09-02 Thread Arunpravin Paneer Selvam
On 9/2/2025 3:53 PM, Jani Nikula wrote: On Tue, 02 Sep 2025, Arunpravin Paneer Selvam wrote: Replace the freelist (O(n)) used for free block management with a red-black tree, providing more efficient O(log n) search, insert, and delete operations. This improves scalability and performance w

Re: [PATCH v5 1/2] drm/buddy: Optimize free block management with RB tree

2025-09-02 Thread Jani Nikula
On Tue, 02 Sep 2025, Arunpravin Paneer Selvam wrote: > Replace the freelist (O(n)) used for free block management with a > red-black tree, providing more efficient O(log n) search, insert, > and delete operations. This improves scalability and performance > when managing large numbers of free blo

Re: [PATCH 3/4] drm/amdgpu: Fill extra dwords with NOPs

2025-09-02 Thread Timur Kristóf
On Tue, 2025-09-02 at 11:54 +0200, Christian König wrote: > On 02.09.25 09:42, Timur Kristóf wrote: > > On Tue, 2025-09-02 at 08:39 +0200, Christian König wrote: > > > On 01.09.25 12:00, Timur Kristóf wrote: > > > > Technically not necessary, but clear the extra dwords too, > > > > so that the comm

Re: [PATCH 3/4] drm/amdgpu: Fill extra dwords with NOPs

2025-09-02 Thread Christian König
On 02.09.25 09:42, Timur Kristóf wrote: > On Tue, 2025-09-02 at 08:39 +0200, Christian König wrote: >> On 01.09.25 12:00, Timur Kristóf wrote: >>> Technically not necessary, but clear the extra dwords too, >>> so that the command processors don't read uninitialized memory. >> >> That is most likely

[PATCH v6 1/8] drm/ttm: Bump TTM_NUM_MEM_TYPES to 9 (Prep for AMDGPU_PL_MMIO_REMAP)

2025-09-02 Thread Srinivasan Shanmugam
Increase TTM_NUM_MEM_TYPES from 8 to 9 to accommodate the upcoming AMDGPU_PL_MMIO_REMAP placement. Cc: Alex Deucher Suggested-by: Christian König Signed-off-by: Srinivasan Shanmugam Reviewed-by: Christian König Reviewed-by: Alex Deucher --- include/drm/ttm/ttm_resource.h | 2 +- 1 file chang

Re: [PATCH 3/3] drm/amdgpu: revert "Rename VM invalidate to status lock" v2

2025-09-02 Thread Khatri, Sunil
On 8/28/2025 8:32 PM, Christian König wrote: This reverts commit 0479956c94b1cfa6a1ab9206eff76072944ece8b. It turned out that protecting the status of each bo_va only with a spinlock was just hiding problems instead of solving them. Revert the whole approach, add a separate stats_lock and loc

Re: [PATCH 2/3] drm/amdgpu: remove check for BO reservation add assert instead

2025-09-02 Thread Khatri, Sunil
LGTM but someone else should check. Acked-by: Sunil Khatri On 8/28/2025 8:31 PM, Christian König wrote: We should leave such checks to lockdep and not implement something manually. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 13 + 1 file changed, 1

Re: [PATCH 1/3] drm/amdgpu: fix userq VM validation v3

2025-09-02 Thread Khatri, Sunil
On 8/28/2025 8:31 PM, Christian König wrote: That was actually complete nonsense and not validating the BOs at all. The code just cleared all VM areas where it couldn't grab the lock for a BO. Try to fix this. Only compile tested at the moment. v2: fix fence slot reservation as well as pointed

[v13 03/11] drm/amd/amdgpu: Implement MES suspend/resume gang functionality for v12

2025-09-02 Thread Jesse . Zhang
This commit implements the actual MES (Micro Engine Scheduler) suspend and resume gang operations for version 12 hardware. Previously these functions were just stubs returning success. v2: Always use AMDGPU_MES_SCHED_PIPE Signed-off-by: Alex Deucher Signed-off-by: Jesse Zhang --- drivers/gpu/d

[v13 07/11] drm/amdgpu: add user queue reset source

2025-09-02 Thread Jesse . Zhang
From: Alex Deucher Track resets from user queues. Signed-off-by: Alex Deucher Reviewed-by: Christian König Reviewed-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 3 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 1 + 2 files changed, 4 insertions(+) diff --git a/drivers/

[v13 04/11] drm/amdgpu/mes: add front end for detect and reset hung queue

2025-09-02 Thread Jesse . Zhang
From: Alex Deucher Helper function to detect and reset hung queues. MES will return an array of doorbell indices of which queues are hung and were optionally reset. v2: Clear the doorbell array before detection Signed-off-by: Alex Deucher Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/

[v13 02/11] drm/amdgpu: adjust MES API used for suspend and resume

2025-09-02 Thread Jesse . Zhang
From: Alex Deucher Use the suspend and resume API rather than remove queue and add queue API. The former just preempts the queue while the latter removes it from the scheduler completely. There is no need to do that, we only need preemption in this case. V2: replace queue_active with queue state

[v13 01/11] drm/amdgpu: Add preempt and restore callbacks to userq funcs

2025-09-02 Thread Jesse . Zhang
From: Alex Deucher Add two new function pointers to struct amdgpu_userq_funcs: - preempt: To handle preemption of user mode queues - restore: To restore preempted user mode queues These callbacks will allow the driver to properly manage queue preemption and restoration when needed, such as durin

Re: [PATCH 4/4] drm/amdgpu: Set SDMA v3 copy_max_bytes to 0x3fff00

2025-09-02 Thread Timur Kristóf
On Tue, 2025-09-02 at 08:45 +0200, Christian König wrote: > On 01.09.25 12:00, Timur Kristóf wrote: > > SDMA v3-v5 can copy almost 4 MiB in a single copy operation. > > Use the same value as PAL and Mesa for copy_max_bytes. > > > > For reference, see oss2DmaCmdBuffer.cpp in PAL: > > "Due t
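For a sense of scale, 0x3fff00 is 4 MiB minus 256 bytes (4,194,048 bytes). The sketch below uses a hypothetical sdma_copy_packets() helper to count how many copy operations a transfer needs when each one is capped at that value; it is not the amdgpu copy path.

```c
#include <stdint.h>

#define SDMA_COPY_MAX_BYTES 0x3fff00u /* value proposed in the patch */

/* Hypothetical helper: copies needed for `bytes`, assuming each copy
 * moves at most SDMA_COPY_MAX_BYTES (round-up division). */
static uint32_t sdma_copy_packets(uint64_t bytes)
{
	return (uint32_t)((bytes + SDMA_COPY_MAX_BYTES - 1) /
			  SDMA_COPY_MAX_BYTES);
}
```

With this cap, an 8 MiB buffer needs three copies rather than two, because 2 × 0x3fff00 falls 512 bytes short of 8 MiB.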

Re: [PATCH 3/4] drm/amdgpu: Fill extra dwords with NOPs

2025-09-02 Thread Timur Kristóf
On Tue, 2025-09-02 at 08:39 +0200, Christian König wrote: > On 01.09.25 12:00, Timur Kristóf wrote: > > Technically not necessary, but clear the extra dwords too, > > so that the command processors don't read uninitialized memory. > > That is most likely a really bad idea. > > The extra DWs are f

Re: [PATCH] drm/amdgpu: Release hive reference properly

2025-09-02 Thread Sun, Ce(Overlord)
[AMD Official Use Only - AMD Internal Distribution Only] Reviewed-by: Ce Sun Best Regards, Sun,Ce From: Lazar, Lijo Sent: Tuesday, September 2, 2025 2:22 PM To: amd-gfx@lists.freedesktop.org Cc: Zhang, Hawking ; Deucher, Alexander ; Sun, Ce(Overlord) Sub

Re: [PATCH v6 8/8] drm/amdgpu/gem: Return Handle to MMIO_REMAP Singleton in GEM_CREATE

2025-09-02 Thread Christian König
On 02.09.25 05:29, Srinivasan Shanmugam wrote: > Enable userspace to obtain a handle to the kernel-owned MMIO_REMAP > singleton when AMDGPU_GEM_DOMAIN_MMIO_REMAP is requested via > amdgpu_gem_create_ioctl(). > > Validate the fixed 4K constraint: if PAGE_SIZE > AMDGPU_GPU_PAGE_SIZE > return -EINVAL

Re: [PATCH v6 7/8] drm/amdgpu/ttm: Allocate/Free 4K MMIO_REMAP Singleton

2025-09-02 Thread Christian König
On 02.09.25 05:29, Srinivasan Shanmugam wrote: > Add mmio_remap bookkeeping to amdgpu_device and introduce > amdgpu_ttm_mmio_remap_bo_init()/fini() to manage a kernel-owned, > one-page (4K) BO in AMDGPU_GEM_DOMAIN_MMIO_REMAP. > > Bookkeeping: > - adev->rmmio_remap.bo : kernel-owned singleton BO