[PATCH v5 1/8] drm/amdgpu/userq: extend userq state

2025-10-02 Thread Prike Liang
Extend the userq state to identify invalid userq cases. Signed-off-by: Prike Liang Reviewed-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_u

Re: [PATCH v7 3/3] drm/buddy: Add KUnit tests for allocator performance under fragmentation

2025-10-02 Thread Arunpravin Paneer Selvam
On 9/26/2025 4:30 PM, Matthew Auld wrote: On 23/09/2025 10:02, Arunpravin Paneer Selvam wrote: Add KUnit test cases that create severe memory fragmentation and measure allocation/free performance. The tests simulate two scenarios - 1. Allocation under severe fragmentation     - Allocate the

Re: [PATCH] drm: amd: Use kmalloc_array to prevent overflow of dynamic size calculation

2025-10-02 Thread kernel test robot
Hi Bhanu, kernel test robot noticed the following build warnings: [auto build test WARNING on amd-pstate/linux-next] [also build test WARNING on amd-pstate/bleeding-edge v6.17] [cannot apply to linus/master next-20251002] [If your patch is applied to the wrong git tree, kindly drop us a note

RE: [PATCH v5 4/8] drm/amdgpu: keeping waiting userq fence infinitely

2025-10-02 Thread Liang, Prike
[Public] Hi Alex, Apologies for overlooking your earlier review comments. I just see patches 1-4 have already been reviewed. Can we proceed to land the series (patches 1-6) in drm-next? Regards, Prike > -----Original Message----- > From: Liang, Prike > Sent: Monday, September 29, 2025

[PATCH v5 3/8] drm/amdgpu: track the userq bo va for its obj management

2025-10-02 Thread Prike Liang
Track the userq obj over its lifetime, and reference and dereference the buffer flag when it is created and destroyed. Suggested-by: Alex Deucher Signed-off-by: Prike Liang --- drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 28 +++ 1 file changed, 28 insertions(+) diff --

[PATCH v3 09/11 6.1.y] minmax.h: move all the clamp() definitions after the min/max() ones

2025-10-02 Thread Eliav Farber
From: David Laight [ Upstream commit c3939872ee4a6b8bdcd0e813c66823b31e6e26f7 ] At some point the definitions for clamp() got added in the middle of the ones for min() and max(). Re-order the definitions so they are more sensibly grouped. Link: https://lkml.kernel.org/r/8bb285818e4846469121c8

Re: [PATCH 2/3] drm/amdkfd: svm unmap use page aligned address

2025-10-02 Thread Chen, Xiaogang
On 10/2/2025 12:43 PM, Philip Yang wrote: svm_range_unmap_from_gpus uses page-aligned start and end addresses; the end address is inclusive. Fixes: 38c55f6719f7 ("drm/amdkfd: Handle lack of READ permissions in SVM mapping") Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 4

Re: [PATCH 1/3] drm/amdgpu: svm check hmm range kzalloc return NULL

2025-10-02 Thread Chen, Xiaogang
On 10/2/2025 12:43 PM, Philip Yang wrote: Add a NULL check for the hmm_range kzalloc return. If get_pages fails, free hmm_range and set it to NULL to avoid a double free in get_pages_done. Fixes: 29e6f5716115 ("drm/amdgpu: use user provided hmm_range buffer in amdgpu_ttm_tt_get_user_

[PATCH 1/3] drm/amdgpu: svm check hmm range kzalloc return NULL

2025-10-02 Thread Philip Yang
Add a NULL check for the hmm_range kzalloc return. If get_pages fails, free hmm_range and set it to NULL to avoid a double free in get_pages_done. Fixes: 29e6f5716115 ("drm/amdgpu: use user provided hmm_range buffer in amdgpu_ttm_tt_get_user_pages") Signed-off-by: Philip Yang --- drive
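The pattern this patch describes (check the allocation, and on a later failure free the buffer and clear the pointer so cleanup cannot free it again) can be sketched in userspace C. This is only an illustration, not the driver code: struct range_buf, fake_get_pages(), range_setup() and range_done() are hypothetical names, and calloc()/free() stand in for kzalloc()/kfree().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct hmm_range. */
struct range_buf {
	unsigned long start;
	unsigned long end;
};

/* Hypothetical stand-in for get_pages; fails only when asked to. */
static int fake_get_pages(struct range_buf *r, int should_fail)
{
	assert(r != NULL);
	return should_fail ? -EFAULT : 0;
}

/*
 * Allocate the buffer, check for NULL, and on a later failure free it
 * and clear the caller's pointer so cleanup cannot free it twice.
 */
static int range_setup(struct range_buf **out, int should_fail)
{
	int ret;

	*out = calloc(1, sizeof(**out)); /* plays the role of kzalloc */
	if (!*out)
		return -ENOMEM;

	ret = fake_get_pages(*out, should_fail);
	if (ret) {
		free(*out);
		*out = NULL;
	}
	return ret;
}

/* Cleanup path; safe after a failed setup because free(NULL) is a no-op. */
static void range_done(struct range_buf **r)
{
	free(*r);
	*r = NULL;
}
```

Calling range_done() after a failed range_setup() is harmless in this sketch, which is the property the commit message is after.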

[PATCH 5/7] drm/amd: Pass IP suspend errors up to callers

2025-10-02 Thread Mario Limonciello
If IP suspend fails the callers should be notified so that they can potentially react. Signed-off-by: Mario Limonciello --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/dri

[PATCH 1/7] drm/amd: Unify shutdown() callback behavior

2025-10-02 Thread Mario Limonciello
[Why] The shutdown() callback uses amdgpu_ip_suspend() which doesn't notify drm clients during shutdown. This could lead to hangs. [How] Change amdgpu_pci_shutdown() to call the same sequence as suspend/resume. Signed-off-by: Mario Limonciello --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++

[PATCH 3/3] drm/amdkfd: Don't stuck in svm restore worker

2025-10-02 Thread Philip Yang
If vma is not found, the application has freed the memory using madvise MADV_FREE, but the driver doesn't receive the unmap from the CPU MMU notifier callback, so the memory is still mapped on GPUs. The svm restore work will schedule the work to retry forever. Then user queues are not resumed and cause application hang

[PATCH v3 03/11 6.1.y] minmax: improve macro expansion and type checking

2025-10-02 Thread Eliav Farber
From: Linus Torvalds [ Upstream commit 22f5468731491e53356ba7c028f0fdea20b18e2c ] This clarifies the rules for min()/max()/clamp() type checking and makes them a much more efficient macro expansion. In particular, we now look at the type and range of the inputs to see whether they work together

[PATCH v3 01/11 6.1.y] minmax: simplify min()/max()/clamp() implementation

2025-10-02 Thread Eliav Farber
From: Linus Torvalds [ Upstream commit dc1c8034e31b14a2e5e212104ec508aec44ce1b9 ] Now that we no longer have any C constant expression contexts (ie array size declarations or static initializers) that use min() or max(), we can simplify the implementation by not having to worry about the result s

[PATCH v3 07/11 6.1.y] minmax.h: reduce the #define expansion of min(), max() and clamp()

2025-10-02 Thread Eliav Farber
From: David Laight [ Upstream commit b280bb27a9f7c91ddab730e1ad91a9c18a051f41 ] Since the test for signed values being non-negative only relies on __builtin_constant_p() (not is_constexpr()) it can use the 'ux' variable instead of the caller supplied expression. This means that the #define par

[PATCH v3 05/11 6.1.y] minmax.h: add whitespace around operators and after commas

2025-10-02 Thread Eliav Farber
From: David Laight [ Upstream commit 71ee9b16251ea4bf7c1fe222517c82bdb3220acc ] Patch series "minmax.h: Cleanups and minor optimisations". Some tidyups and minor changes to minmax.h. This patch (of 7): Link: https://lkml.kernel.org/r/c50365d214e04f9ba256d417c8beb...@acums.aculab.com Link: h

[PATCH 6/7] drm/amd: Fix error handling with multiple userq IDRs

2025-10-02 Thread Mario Limonciello
If multiple userq IDRs are in use and an error occurs while handling one at suspend or resume, it will be silently discarded. Switch the suspend/resume() code to use guards and return immediately. Signed-off-by: Mario Limonciello --- drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 25 ++---
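As a rough illustration of how an error can be silently discarded when iterating over several items, and how returning immediately avoids that, consider the userspace C sketch below. It is not the amdgpu code: suspend_all_lossy() and suspend_all_strict() are hypothetical, and the status array simply stands in for the per-queue results.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical lossy variant: only the status of the last item
 * survives, so an earlier error is overwritten by a later success.
 */
static int suspend_all_lossy(const int *status, size_t n)
{
	int ret = 0;

	assert(status != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		ret = status[i];
	return ret;
}

/* Hypothetical strict variant: return as soon as an item fails. */
static int suspend_all_strict(const int *status, size_t n)
{
	assert(status != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (status[i])
			return status[i];
	}
	return 0;
}
```

With statuses {0, -5, 0}, the lossy variant reports success while the strict variant reports -5.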

[PATCH 4/7] drm/amd: Don't always set IP block HW status to false

2025-10-02 Thread Mario Limonciello
amdgpu_device_ip_suspend_phase2() calls amdgpu_ip_block_suspend() which already sets HW block status to false when succeeding with IP suspend. Remove the explicit call in amdgpu_device_ip_suspend_phase2() so that the status is accurate. Signed-off-by: Mario Limonciello --- drivers/gpu/drm/amd/am

[PATCH 0/7] Improve suspend callback usage

2025-10-02 Thread Mario Limonciello
The shutdown() callback doesn't use the same code as suspend() callbacks. This series unifies them and then also improves error handling for all suspend flows. Mario Limonciello (7): drm/amd: Unify shutdown() callback behavior drm/amd: Stop exporting amdgpu_device_ip_suspend() outside amdg

[PATCH 3/7] drm/amd: Remove comment about handling errors in amdgpu_device_ip_suspend_phase1()

2025-10-02 Thread Mario Limonciello
Error handling was introduced in commit e095026f0066e ("drm/amdgpu: validate suspend before function call") so the comment about TODO is no longer needed. Fixes: e095026f0066e ("drm/amdgpu: validate suspend before function call") Signed-off-by: Mario Limonciello --- drivers/gpu/drm/amd/amdgpu/am

[PATCH 2/7] drm/amd: Stop exporting amdgpu_device_ip_suspend() outside amdgpu_device

2025-10-02 Thread Mario Limonciello
amdgpu_device_ip_suspend() doesn't have a caller outside of amdgpu_device.c. Make it static. No intended functional changes. Signed-off-by: Mario Limonciello --- drivers/gpu/drm/amd/amdgpu/amdgpu.h| 1 - drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- 2 files changed, 1 insertion(+),

[PATCH 7/7] drm/amd: Pass userq suspend failures up to caller

2025-10-02 Thread Mario Limonciello
If a userq failed to suspend the rest of the suspend sequence may have problems. Pass the error code up to the caller for a decision on what to do. Signed-off-by: Mario Limonciello --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --gi

[PATCH 2/3] drm/amdkfd: svm unmap use page aligned address

2025-10-02 Thread Philip Yang
svm_range_unmap_from_gpus uses page-aligned start and end addresses; the end address is inclusive. Fixes: 38c55f6719f7 ("drm/amdkfd: Handle lack of READ permissions in SVM mapping") Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 delet
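The "page-aligned, inclusive end" convention can be illustrated with a small userspace C sketch. It is not taken from kfd_svm.c: SKETCH_PAGE_SHIFT (assuming 4 KiB pages), struct page_range, to_page_range() and page_count() are hypothetical names used only to show how a byte range maps to a first and last page number when the end is inclusive.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Page-number range; 'last' is inclusive. */
struct page_range {
	uint64_t first;
	uint64_t last;
};

/* Convert a byte range [addr, addr + size) to inclusive page numbers. */
static struct page_range to_page_range(uint64_t addr, uint64_t size)
{
	struct page_range r;

	assert(size > 0);
	r.first = addr >> SKETCH_PAGE_SHIFT;
	/* Subtract 1 before shifting so 'last' is the final page touched. */
	r.last = (addr + size - 1) >> SKETCH_PAGE_SHIFT;
	return r;
}

/* With an inclusive end, the page count is last - first + 1. */
static uint64_t page_count(struct page_range r)
{
	return r.last - r.first + 1;
}
```

Mixing an exclusive end with an API that expects an inclusive one is off by one page, which is the class of mistake such a fix guards against.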

kfd_ioctl_create_queue_args::write/read_pointer_address are still FROM KFD ?

2025-10-02 Thread Liu, Robert
[AMD Official Use Only - AMD Internal Distribution Only] The struct below is shared by kfd and libhsakmt. In the latest codeline, it seems they are going TO KFD now, together with ring_base and size. If this is true, it'd be nice to make the comments aligned with the code. struct kfd_ioctl_

[PATCH] drm/sched: Avoid killing entity last used by parent on child SIGKILL

2025-10-02 Thread David Rosca
drm_sched_entity_flush should only kill the entity if the current process is the last user of the entity. The last_user is only set when adding a new job, so entities that had no jobs submitted to them have NULL last_user and would always be killed. Another issue is setting last_user to NULL from drm

[PATCH 19/19 v6.1.y] minmax.h: remove some #defines that are only expanded once

2025-10-02 Thread Eliav Farber
From: David Laight [ Upstream commit 2b97aaf74ed534fb838d09867d09a3ca5d795208 ] The bodies of __signed_type_use() and __unsigned_type_use() are much the same size as their names - so put the bodies in the only line that expands them. Similarly __signed_type() is defined separately for 64bit and