Re: [PATCH RFC v2 0/3] drm/ttm: allow direct reclaim to be skipped

2025-09-18 Thread Christian König
On 18.09.25 22:09, Thadeu Lima de Souza Cascardo wrote: > On certain workloads, like on ChromeOS when opening multiple tabs and > windows, and switching desktops, memory pressure can build up and latency > is observed as high order allocations result in memory reclaim. This was > observed when runn

RE: [PATCH] drm/amd/pm: Avoid interface mismatch messaging

2025-09-18 Thread Lazar, Lijo
[Public] First, it's not the developer who is using the system. Any mismatch information in a dmesg log is always an alarm to the user who is not aware of the implementation details. This mismatch is bound to happen when PMFW is not loaded along with the driver. In such cases, the mechanism t

[PATCH] drm/amdgpu/userq: assign an error code for invalid userq va

2025-09-18 Thread Prike Liang
It should return an error code if userq VA validation fails. Fixes: 9e46b8bb0539 ("drm/amdgpu: validate userq buffer virtual address and size") Signed-off-by: Prike Liang --- drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdg

RE: [PATCH] drm/amd/pm: Avoid interface mismatch messaging

2025-09-18 Thread Wang, Yang(Kevin)
[Public] >> " is not used": Actually, this is used to display information to users and >> developers, not the driver itself. This "driver_if_version" is used to show what driver if source file version is using on current driver, this information is used to *developers* and *user* from dmesg.lo

RE: [PATCH] drm/amd/pm: Avoid interface mismatch messaging

2025-09-18 Thread Wang, Yang(Kevin)
[AMD Official Use Only - AMD Internal Distribution Only] >> PMFW interface version is not used by some IP implementations like SMU >> v13.0.6/12, instead rely on PMFW version checks. Avoid the log if interface >> version is not used. " is not used": Actually, this is used to display information

[PATCH] drm/amd/pm: Avoid interface mismatch messaging

2025-09-18 Thread Lijo Lazar
PMFW interface version is not used by some IP implementations like SMU v13.0.6/12, instead rely on PMFW version checks. Avoid the log if interface version is not used. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 3 ++- drivers/gpu/drm/amd/pm/swsmu/smu13/s

RE: [PATCH] drm/amd/pm: place the smu 13.0.0 pptable header into the correct folder

2025-09-18 Thread Gadre, Mangesh
[AMD Official Use Only - AMD Internal Distribution Only] Reviewed-by:mangesh.ga...@amd.com >-Original Message- >From: amd-gfx On Behalf Of Yang Wang >Sent: Friday, September 19, 2025 7:17 AM >To: amd-gfx@lists.freedesktop.org >Subject: [PATCH] drm/amd/pm: place the smu 13.0.0 pptable hea

RE: [PATCH v2] drm/amdgpu: Enable MES lr_compute_wa by default

2025-09-18 Thread Zhang, Yifan
[Public] Yes. enable_lr_compute_wa is ignored in current MES FW setting. (zeroed in mes_set_hw_res_pkt packet). Best Regards, Yifan -Original Message- From: Limonciello, Mario Sent: Friday, September 19, 2025 5:36 AM To: Alex Deucher Cc: amd-gfx@lists.freedesktop.org; Zhang, Yifan Su

[PATCH v3] drm/amdgpu: Enable MES lr_compute_wa by default

2025-09-18 Thread Mario Limonciello (AMD)
From: Mario Limonciello The MES set resources packet has an optional bit 'lr_compute_wa' which can be used for preventing MES hangs on long compute jobs. Set this bit by default. Co-developed-by: Yifan Zhang Signed-off-by: Yifan Zhang Signed-off-by: Mario Limonciello --- v3: * gate on fw ve

RE: [PATCH next] drm/amdgpu/userq: Fix error codes in mes_userq_mqd_create()

2025-09-18 Thread Liang, Prike
[Public] Regards, Prike > -Original Message- > From: Koenig, Christian > Sent: Thursday, September 18, 2025 5:57 PM > To: Dan Carpenter ; Liang, Prike > > Cc: Deucher, Alexander ; David Airlie > ; Simona Vetter ; Sharma, Shashank > ; Arvind Yadav ; Khatri, > Sunil ; Zhang, Jesse(J

Re: [PATCH v6 00/11] Improvements to S5 power consumption

2025-09-18 Thread Greg Kroah-Hartman
On Wed, Sep 03, 2025 at 01:14:18PM +0200, Rafael J. Wysocki wrote: > On Wed, Sep 3, 2025 at 6:41 AM Mario Limonciello wrote: > > > > On 8/17/2025 9:00 PM, Mario Limonciello (AMD) wrote: > > > A variety of issues both in function and in power consumption have been > > > raised as a result of device

Re: [PATCH] drm/amd/display: Fix DVI-D/HDMI adapters

2025-09-18 Thread Alex Hung
On 9/8/25 11:45, Timur Kristóf wrote: On Mon, 2025-09-08 at 11:40 -0600, Alex Hung wrote: On 9/8/25 11:36, Alex Deucher wrote: @alexh...@amd.com@Wentland, Harry   Were you planning to pick this up for this week's promotion or should I grab it? I will send them to weekly DC promotion. T

[PATCH] drm/amd/pm: place the smu 13.0.0 pptable header into the correct folder

2025-09-18 Thread Yang Wang
Place the smu 13.0.0 pptable header in the correct folder Signed-off-by: Yang Wang --- drivers/gpu/drm/amd/pm/{ => swsmu}/inc/smu_v13_0_0_pptable.h | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename drivers/gpu/drm/amd/pm/{ => swsmu}/inc/smu_v13_0_0_pptable.h (100%) diff --git a/drive

RE: [PATCH 05/10] drm/amd/ras: Amdgpu handle ras ioctl command

2025-09-18 Thread Chai, Thomas
[AMD Official Use Only - AMD Internal Distribution Only] The patch "[PATCH 3/3] drm/amdgpu: Add amdgpu drm ras ioctl for unified ras module" actually defines the IOCTL interface. And released it on the internal mailing list, please help review. -Original Message- From: amd-gfx On Beha

RE: [PATCH 13/21] drm/amd/ras: Add eeprom ras functions

2025-09-18 Thread Chai, Thomas
[AMD Official Use Only - AMD Internal Distribution Only] Ok, will update. -Original Message- From: amd-gfx On Behalf Of Alex Deucher Sent: Friday, September 19, 2025 4:16 AM To: Chai, Thomas Cc: amd-gfx@lists.freedesktop.org; Zhang, Hawking ; Zhou1, Tao ; Li, Candice ; Yang, Stanley

RE: [PATCH 06/10] drm/amd/ras: Add amdgpu ras system functions

2025-09-18 Thread Chai, Thomas
[AMD Official Use Only - AMD Internal Distribution Only] -Original Message- From: amd-gfx On Behalf Of Alex Deucher Sent: Friday, September 19, 2025 4:05 AM To: Chai, Thomas Cc: amd-gfx@lists.freedesktop.org; Zhang, Hawking ; Zhou1, Tao ; Li, Candice ; Yang, Stanley Subject: Re: [PATC

RE: [PATCH 01/21] drm/amd/ras: Add unified ras core folder

2025-09-18 Thread Chai, Thomas
[AMD Official Use Only - AMD Internal Distribution Only] Ok, will squash this patch. -Original Message- From: Alex Deucher Sent: Friday, September 19, 2025 4:00 AM To: Chai, Thomas Cc: amd-gfx@lists.freedesktop.org; Zhang, Hawking ; Zhou1, Tao ; Li, Candice ; Yang, Stanley Subject: R

RE: [PATCH 01/10] drm/amd/ras: Add amdgpu ras manager folder

2025-09-18 Thread Chai, Thomas
[AMD Official Use Only - AMD Internal Distribution Only] Ok, will squash this patch. -Original Message- From: Alex Deucher Sent: Friday, September 19, 2025 3:58 AM To: Chai, Thomas Cc: amd-gfx@lists.freedesktop.org; Zhang, Hawking ; Zhou1, Tao ; Li, Candice ; Yang, Stanley Subject: R

Re: [v5 1/4] drm/amdgpu: Refactor VCN v5.0.1 HW init into separate instance function

2025-09-18 Thread Jiang, Sonny
[AMD Official Use Only - AMD Internal Distribution Only] This series is Reviewed-by: Sonny Jiang From: Zhang, Jesse(Jie) Sent: Tuesday, September 16, 2025 11:11 PM To: Zhang, Jesse(Jie) ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Koenig, Christia

Re: [PATCH 06/10] drm/amd/ras: Add amdgpu ras system functions

2025-09-18 Thread Alex Deucher
On Wed, Sep 17, 2025 at 9:37 PM YiPeng Chai wrote: > > Add amdgpu ras system functions. > > Signed-off-by: YiPeng Chai > Reviewed-by: Tao Zhou > --- > .../gpu/drm/amd/ras/ras_mgr/amdgpu_ras_sys.c | 268 ++ > drivers/gpu/drm/amd/ras/ras_mgr/ras_sys.h | 109 +++ > 2 files

[PATCH v2 0/3] drm/amdgpu: Handle MMIO_REMAP as fixed I/O via dma-buf

2025-09-18 Thread Srinivasan Shanmugam
This series makes the amdgpu dma-buf exporter handle AMDGPU_PL_MMIO_REMAP (the HDP flush page) as a BAR-mapped register window (MMIO). The HDP flush “MMIO_REMAP” page is a BAR-backed I/O window (not RAM). When another PCIe device (GPU) needs to poke that window (e.g., device-to-device HDP flush),

[PATCH v3 3/3] drm/amdkfd: free system struct pages when migration bit is cleared

2025-09-18 Thread James Zhu
if destination is on system ram. migrate_vma_pages can fail if a CPU thread faults on the same page. However, the page table is locked and only one of the new pages will be inserted. The device driver will see that the MIGRATE_PFN_MIGRATE bit is cleared if it loses the race. Signed-off-by: James Z

[PATCH 03/11] PCI: Move pci_rebar_size_to_bytes() and export it

2025-09-18 Thread Ilpo Järvinen
pci_rebar_size_to_bytes() is in drivers/pci/pci.h but would be useful for endpoint drivers as well. Move the function into rebar.c and export it. In addition, convert the literal to where the number comes from (PCI_REBAR_MIN_SIZE). Signed-off-by: Ilpo Järvinen --- drivers/pci/pci.h | 4
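
For readers unfamiliar with the conversion this patch moves, a minimal userspace sketch follows. It assumes the Resizable BAR size encoding in which the value 0 stands for 1 MB and each increment doubles the size. The names rebar_size_to_bytes and REBAR_MIN_SIZE_SHIFT are hypothetical stand-ins for illustration, not the kernel's definitions.

```c
#include <stdint.h>

/* Hypothetical stand-in: encoded size 0 corresponds to 1 MB (2^20 bytes). */
#define REBAR_MIN_SIZE_SHIFT 20

/* Convert an encoded Resizable BAR size to bytes (illustrative only). */
static inline uint64_t rebar_size_to_bytes(int size)
{
	return UINT64_C(1) << (size + REBAR_MIN_SIZE_SHIFT);
}
```

Naming the shift, rather than writing a bare 20, is the same idea as the patch's replacement of the literal with a named minimum size.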

[PATCH v2 1/7] drm/amdgpu/pm: Add definition for gpu_metrics v1.9

2025-09-18 Thread Lijo Lazar
Add gpu metrics definition which is only a set of gpu metrics attributes. A field is encoded by its id, type and number of instances. Signed-off-by: Lijo Lazar --- .../gpu/drm/amd/include/kgd_pp_interface.h| 117 ++ 1 file changed, 117 insertions(+) diff --git a/drivers/gpu/

Re: [PATCH 1/3] drm/amdgpu: adjust MES API used for suspend and resume

2025-09-18 Thread Alex Deucher
On Wed, Sep 10, 2025 at 4:16 AM Jesse.Zhang wrote: > > Use the suspend and resume API rather than remove queue > and add queue API. The former just preempts the queue > while the latter remove it from the scheduler completely. > There is no need to do that, we only need preemption > in this case.

Re: [PATCH v2] drm/amdgpu: Enable MES lr_compute_wa by default

2025-09-18 Thread Mario Limonciello
On 9/18/2025 2:05 PM, Alex Deucher wrote: On Thu, Sep 18, 2025 at 2:59 PM Mario Limonciello wrote: The MES set resources packet has an optional bit 'lr_compute_wa' which can be used for preventing MES hangs on long compute jobs. Set this bit by default. Co-developed-by: Yifan Zhang Signe

Re: [PATCH v2] drm/amdgpu: Enable MES lr_compute_wa by default

2025-09-18 Thread Alex Deucher
On Thu, Sep 18, 2025 at 5:35 PM Mario Limonciello wrote: > > > > On 9/18/2025 2:05 PM, Alex Deucher wrote: > > On Thu, Sep 18, 2025 at 2:59 PM Mario Limonciello > > wrote: > >> > >> The MES set resources packet has an optional bit 'lr_compute_wa' > >> which can be used for preventing MES hangs on

Re: [PATCH v3 04/10] drm/amdgpu/userq: extend userq state

2025-09-18 Thread Alex Deucher
On Thu, Sep 18, 2025 at 4:18 AM Prike Liang wrote: > > Extend the userq state for identifying the > userq invalid cases. > > Signed-off-by: Prike Liang Reviewed-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/

Re: [PATCH v2 9/9] drm/amdgpu: validate userq va for GEM unmap

2025-09-18 Thread Alex Deucher
On Wed, Sep 17, 2025 at 10:37 PM Liang, Prike wrote: > > [Public] > > Regards, > Prike > > > -Original Message- > > From: Alex Deucher > > Sent: Wednesday, September 17, 2025 10:10 PM > > To: Liang, Prike > > Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander > > ; Koenig, Chris

Re: [PATCH v7 05/12] PCI/PM: Disable device wakeups when halting or powering off system

2025-09-18 Thread Bjorn Helgaas
On Wed, Sep 10, 2025 at 11:52:00AM -0500, Mario Limonciello wrote: > On 9/10/25 10:06 AM, Bjorn Helgaas wrote: > > On Tue, Sep 09, 2025 at 02:16:12PM -0500, Mario Limonciello (AMD) wrote: > > > PCI devices can be configured as wakeup sources from low power states. > > > However, when the system is

Re: [PATCH v3 08/10] drm/amdgpu: keeping waiting userq fence infinitely

2025-09-18 Thread Alex Deucher
On Thu, Sep 18, 2025 at 5:14 AM Prike Liang wrote: > > Keeping waiting the userq fence infinitely until > hang detection, and then suspend the hang queue and > set the fence error. > > Signed-off-by: Prike Liang Acked-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 11 ++

[PATCH RFC v2 1/3] ttm: pool: allow requests to prefer latency over throughput

2025-09-18 Thread Thadeu Lima de Souza Cascardo
The TTM pool allocator prefers to allocate higher order pages such that the GPU will spend less time walking page tables and provide better throughput. There were cases where too much fragmented memory led to a 30% change in the throughput of a given GPU workload on a datacenter. On a desktop work

[PATCH] drm/amdgpu/atom: Check kcalloc() for WS buffer in amdgpu_atom_execute_table_locked()

2025-09-18 Thread Guangshuo Li
kcalloc() may fail. When WS is non-zero and allocation fails, ectx.ws remains NULL while ectx.ws_size is set, leading to a potential NULL pointer dereference in atom_get_src_int() when accessing WS entries. Return -ENOMEM on allocation failure to avoid the NULL dereference. Fixes: 6396bb221514 ("
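
The failure mode described above can be sketched in userspace C. This is only an analogue under stated assumptions: struct exec_ctx, ctx_init_ws and the alloc_fn hook are hypothetical names invented for illustration, and calloc() stands in for kcalloc(). It is not the code from the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical execution context, loosely modelled on the description above. */
struct exec_ctx {
	unsigned int *ws;   /* workspace buffer, NULL when ws_size == 0 */
	size_t ws_size;     /* number of workspace entries */
};

/* Allocator hook so a failure can be simulated; pass calloc for normal use. */
typedef void *(*alloc_fn)(size_t n, size_t size);

/* Set up the workspace; returns 0 on success or -ENOMEM on failure. */
static int ctx_init_ws(struct exec_ctx *ctx, size_t ws_count, alloc_fn alloc)
{
	ctx->ws = NULL;
	ctx->ws_size = 0;

	if (ws_count == 0)
		return 0;

	ctx->ws = alloc(ws_count, sizeof(*ctx->ws));
	if (!ctx->ws)
		return -ENOMEM;   /* never leave ws_size set while ws is NULL */

	ctx->ws_size = ws_count;
	return 0;
}

/* Helper that always fails, standing in for an out-of-memory condition. */
static void *failing_alloc(size_t n, size_t size)
{
	(void)n;
	(void)size;
	return NULL;
}
```

Setting ws_size only after the allocation succeeds keeps the pointer and the size consistent, so later accessors cannot index into a NULL buffer.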

[PATCH RFC v2 0/3] drm/ttm: allow direct reclaim to be skipped

2025-09-18 Thread Thadeu Lima de Souza Cascardo
On certain workloads, like on ChromeOS when opening multiple tabs and windows, and switching desktops, memory pressure can build up and latency is observed as high order allocations result in memory reclaim. This was observed when running on an amdgpu. This is caused by TTM pool allocations and tu

Re: [PATCH v2] drm/amd/display: Only restore backlight after amdgpu_dm_init or dm_resume

2025-09-18 Thread Matthew Schwartz
On 9/11/25 10:55 AM, Mario Limonciello wrote: > On 9/11/25 12:48 PM, Matthew Schwartz wrote: >> On clients that utilize AMD_PRIVATE_COLOR properties for HDR support, >> brightness sliders can include a hardware controlled portion and a >> gamma-based portion. This is the case on the Steam Deck OLED

[PATCH RFC v2 3/3] drm/amdgpu: allow allocation preferences when creating GEM object

2025-09-18 Thread Thadeu Lima de Souza Cascardo
When creating a GEM object on amdgpu, it may be specified that latency during allocation should be preferred over throughput when processing. That will reflect into the TTM operation, which will lead to the use of direct reclaim for higher order pages when throughput is preferred, even if latency

[PATCH RFC v2 2/3] ttm: pool: add a module parameter to set latency preference

2025-09-18 Thread Thadeu Lima de Souza Cascardo
This allows a system-wide setting for allocations of higher order pages not to use direct reclaim. The default setting is to keep existing behavior and allow direct reclaim when allocating higher order pages. Signed-off-by: Thadeu Lima de Souza Cascardo --- drivers/gpu/drm/ttm/ttm_pool.c | 12 ++

Re: [PATCH v3 05/10] drm/amdgpu: add userq invalid VA query

2025-09-18 Thread Alex Deucher
On Thu, Sep 18, 2025 at 4:29 AM Prike Liang wrote: > > Add the userq invalid VA query interface. > > Signed-off-by: Prike Liang Move patches 1-3, and 5 to the end of the series so we can land the validation changes before the query status changes. Alex > --- > drivers/gpu/drm/amd/amdgpu/amdgp

Re: [PATCH v3 07/10] drm/amdgpu: track the userq bo va for its obj management

2025-09-18 Thread Alex Deucher
On Thu, Sep 18, 2025 at 4:18 AM Prike Liang wrote: > > Track the userq obj for its life time, and reference and > dereference the buffer flag at its creating and destroying > period. > > Suggested-by: Alex Deucher > Signed-off-by: Prike Liang > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c |

Re: [PATCH v3 06/10] drm/amdgpu: add userq object va track helpers

2025-09-18 Thread Alex Deucher
On Thu, Sep 18, 2025 at 4:29 AM Prike Liang wrote: > > Add the userq object virtual address get(),mapped() and put() > helpers for tracking the userq obj va address usage. > > Signed-off-by: Prike Liang > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 1 + > drivers/gpu/drm/amd/amdgpu/amdgp

Re: [PATCH v3 05/10] drm/amdgpu: add userq invalid VA query

2025-09-18 Thread Alex Deucher
On Thu, Sep 18, 2025 at 4:29 AM Prike Liang wrote: > > Add the userq invalid VA query interface. > > Signed-off-by: Prike Liang Reviewed-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amd

Re: [PATCH 05/10] drm/amd/ras: Amdgpu handle ras ioctl command

2025-09-18 Thread Alex Deucher
On Wed, Sep 17, 2025 at 9:36 PM YiPeng Chai wrote: > > Amdgpu handle ras ioctl command. Where is the actual IOCTL interface defined? Alex > > V2: > Remove non-standard device information. > > Signed-off-by: YiPeng Chai > Reviewed-by: Tao Zhou > --- > .../gpu/drm/amd/ras/ras_mgr/amdgpu_ras_

Re: [PATCH 13/21] drm/amd/ras: Add eeprom ras functions

2025-09-18 Thread Alex Deucher
On Wed, Sep 17, 2025 at 9:44 PM YiPeng Chai wrote: > > Add eeprom ras functions. > > Signed-off-by: YiPeng Chai > Reviewed-by: Tao Zhou > --- > drivers/gpu/drm/amd/ras/rascore/ras_eeprom.c | 1368 ++ > drivers/gpu/drm/amd/ras/rascore/ras_eeprom.h | 217 +++ > 2 files changed, 1

Re: [PATCH 01/21] drm/amd/ras: Add unified ras core folder

2025-09-18 Thread Alex Deucher
On Wed, Sep 17, 2025 at 9:39 PM YiPeng Chai wrote: > > Add unified ras core folder. > > Signed-off-by: YiPeng Chai > Reviewed-by: Tao Zhou > --- > drivers/gpu/drm/amd/ras/rascore/Makefile | 0 > 1 file changed, 0 insertions(+), 0 deletions(-) > create mode 100644 drivers/gpu/drm/amd/ras/rascor

Re: [PATCH 01/10] drm/amd/ras: Add amdgpu ras manager folder

2025-09-18 Thread Alex Deucher
On Wed, Sep 17, 2025 at 9:54 PM YiPeng Chai wrote: > > Add amdgpu ras manager folder. > > Signed-off-by: YiPeng Chai > Reviewed-by: Tao Zhou > --- > drivers/gpu/drm/amd/ras/ras_mgr/Makefile | 0 > 1 file changed, 0 insertions(+), 0 deletions(-) > create mode 100644 drivers/gpu/drm/amd/ras/ras_

RE: [PATCH 01/21] drm/amd/ras: Add unified ras core folder

2025-09-18 Thread Zhang, Hawking
[AMD Official Use Only - AMD Internal Distribution Only] Series is Reviewed-by: Hawking Zhang Regards, Hawking -Original Message- From: Chai, Thomas Sent: Wednesday, September 17, 2025 21:32 To: amd-gfx@lists.freedesktop.org Cc: Zhang, Hawking ; Zhou1, Tao ; Li, Candice ; Yang, Stanle

[pull] amdgpu, amdkfd drm-fixes-6.17

2025-09-18 Thread Alex Deucher
Hi Dave, Simona, Fixes for 6.17. The following changes since commit f83ec76bf285bea5727f478a68b894f5543ca76e: Linux 6.17-rc6 (2025-09-14 14:21:14 -0700) are available in the Git repository at: https://gitlab.freedesktop.org/agd5f/linux.git tags/amd-drm-fixes-6.17-2025-09-18 for you to fe

Re: [PATCH v2] drm/amdgpu: Enable MES lr_compute_wa by default

2025-09-18 Thread Alex Deucher
On Thu, Sep 18, 2025 at 2:59 PM Mario Limonciello wrote: > > The MES set resources packet has an optional bit 'lr_compute_wa' > which can be used for preventing MES hangs on long compute jobs. > > Set this bit by default. > > Co-developed-by: Yifan Zhang > Signed-off-by: Yifan Zhang > Signed-off

Re: [PATCH] drm/amd/display: Fix DVI-D/HDMI adapters

2025-09-18 Thread Timur Kristóf
On Mon, 2025-09-08 at 11:40 -0600, Alex Hung wrote: > > > On 9/8/25 11:36, Alex Deucher wrote: > > @alexh...@amd.com@Wentland, Harry > >   Were you planning to pick this up for this week's promotion or > > should > > I grab it? > > I will send them to weekly DC promotion. > > Thanks. > > > >

[PATCH v3 01/10] drm/amdgpu: add UAPI for user queue query status

2025-09-18 Thread Prike Liang
From: Alex Deucher Add an API to query queue status such as whether the queue is hung or whether vram is lost. Reviewed-by: Christian König Reviewed-by: Sunil Khatri Signed-off-by: Alex Deucher Reviewed-by: Prike Liang --- include/uapi/drm/amdgpu_drm.h | 14 ++ 1 file changed, 1

RE: [PATCH 01/10] drm/amd/ras: Add amdgpu ras manager folder

2025-09-18 Thread Zhang, Hawking
[AMD Official Use Only - AMD Internal Distribution Only] Series is Reviewed-by: Hawking Zhang Regards, Hawking -Original Message- From: Chai, Thomas Sent: Wednesday, September 17, 2025 21:35 To: amd-gfx@lists.freedesktop.org Cc: Zhang, Hawking ; Zhou1, Tao ; Li, Candice ; Yang, Stanle

[PATCH v2] drm/amdgpu: Enable MES lr_compute_wa by default

2025-09-18 Thread Mario Limonciello
The MES set resources packet has an optional bit 'lr_compute_wa' which can be used for preventing MES hangs on long compute jobs. Set this bit by default. Co-developed-by: Yifan Zhang Signed-off-by: Yifan Zhang Signed-off-by: Mario Limonciello --- v2: * drop module parameter * add more descr

Re: [PATCH 2/2] drm/amdgpu: clean up and unify hw fence handling

2025-09-18 Thread David Wu
thank you Alex. The series is: Tested-by: David (Ming Qiang) Wu Reviewed-by: David (Ming Qiang) Wu David On 2025-09-05 11:25, Alex Deucher wrote: Decouple the amdgpu fence from the amdgpu_job structure. This lets us clean up the separate fence ops for the embedded fence and other fences. Th

Re: [PATCH v2] drm/amdgpu/gfx11: Add Cleaner Shader Support for GFX11.0.1/11.0.4 GPUs

2025-09-18 Thread Alex Deucher
On Thu, Sep 11, 2025 at 11:09 AM Srinivasan Shanmugam wrote: > > Enable the cleaner shader for additional GFX11.5.2/11.5.3 series GPUs to > ensure data isolation among GPU tasks. The cleaner shader is tasked with > clearing the Local Data Store (LDS), Vector General Purpose Registers > (VGPRs), an

[PATCH 7/9] drm/amdgpu: keeping waiting userq fence infinitely

2025-09-18 Thread Prike Liang
Keep waiting on the userq fence indefinitely until a hang is detected, then suspend the hung queue and set the fence error. Signed-off-by: Prike Liang --- drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/

[v2 1/2] drm/amdgpu: Simplify user queue locking with global device mutex

2025-09-18 Thread Jesse . Zhang
The current user queue implementation uses a dual-mutex scheme with both per-device (adev->userq_mutex) and per-process (uq_mgr->userq_mutex) locking. This overcomplicated design creates potential deadlock scenarios and makes the code harder to maintain. Simplify the locking by switching entirely

Re: [PATCH] drm/amd/display: Remove duplicated code

2025-09-18 Thread Alex Deucher
On Mon, Sep 8, 2025 at 9:09 AM Ray Wu wrote: > > [Why&How] > Remove duplicated code > > Signed-off-by: Ray Wu Reviewed-by: Alex Deucher > --- > drivers/gpu/drm/amd/display/dc/resource/dcn35/dcn35_resource.c | 3 --- > .../gpu/drm/amd/display/dc/resource/dcn351/dcn351_resource.c | 3 --- > d

[PATCH] drm/amdgpu: Update amdgpu_vcn5_fw_shared for vcn_5_0_1

2025-09-18 Thread Sonny Jiang
Align vcn5_fw_shared structure with FW Signed-off-by: Sonny Jiang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h index bebfc2b34afe..dc8a17bcc3c8 10

[PATCH v2 1/2] amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw

2025-09-18 Thread Yifan Zhang
There is a race between amdgpu_amdkfd_device_fini_sw and interrupt handling. It triggers if amdgpu_amdkfd_device_fini_sw runs between kfd_cleanup_nodes and kfree(kfd) and a KGD interrupt is generated. Kernel panic log: BUG: kernel NULL pointer dereference, address: 0098 amdgpu :c8:00.0: amdgpu: Requesting 4 part

Re: [PATCH v2] drm/amd/display: Only restore backlight after amdgpu_dm_init or dm_resume

2025-09-18 Thread Mario Limonciello
On 9/18/2025 11:37 AM, Matthew Schwartz wrote: On 9/11/25 10:55 AM, Mario Limonciello wrote: On 9/11/25 12:48 PM, Matthew Schwartz wrote: On clients that utilize AMD_PRIVATE_COLOR properties for HDR support, brightness sliders can include a hardware controlled portion and a gamma-based porti

[PATCH v3 05/10] drm/amdgpu: add userq invalid VA query

2025-09-18 Thread Prike Liang
Add the userq invalid VA query interface. Signed-off-by: Prike Liang --- drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c index 83f0ecdaa0b7..3b57352e523a 100644

Re: [PATCH] drm/amdgpu: use hmm_pfns instead of array of pages

2025-09-18 Thread Christian König
On 17.09.25 19:22, Sunil Khatri wrote: > we dont need to allocate local array of pages to hold > the pages returned by the hmm, instead we could use > the hmm_range structure itself to get to hmm_pfn > and get the required pages directly. > > This saved alloc/free a lot of memory without > any imp

[PATCH v3 07/10] drm/amdgpu: track the userq bo va for its obj management

2025-09-18 Thread Prike Liang
Track the userq obj for its lifetime, and reference and dereference the buffer flag when it is created and destroyed. Suggested-by: Alex Deucher Signed-off-by: Prike Liang --- drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a

[PATCH v3 03/10] drm/amdgpu/userq: extend queue flags for user queue query status

2025-09-18 Thread Prike Liang
Add the userq flag to identify the invalid userq cases. Signed-off-by: Prike Liang --- include/uapi/drm/amdgpu_drm.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h index 7292f7bfcd13..62520c4e4b19 100644 --- a/include/uapi/drm/

[PATCH next] drm/amdgpu/userq: Fix error codes in mes_userq_mqd_create()

2025-09-18 Thread Dan Carpenter
Return the error code if amdgpu_userq_input_va_validate() fails. Don't return success. Fixes: 9e46b8bb0539 ("drm/amdgpu: validate userq buffer virtual address and size") Signed-off-by: Dan Carpenter --- drivers/gpu/drm/amd/amdgpu/mes_userqueue.c | 15 +-- 1 file changed, 9 insertio

[PATCH v2 2/2] amd/amdkfd: enhance kfd process check in switch partition

2025-09-18 Thread Yifan Zhang
The current switch partition path only checks whether kfd_processes_table is empty. The kfd_processes_table entry is deleted in kfd_process_notifier_release, but kfd_process teardown happens in kfd_process_wq_release. Process A workqueue -> kfd_process_wq_release -> Access kfd_node member Process B switch partition -> am

Re: [PATCH] drm/amdgpu: Fix vbios build number parsing logic

2025-09-18 Thread Alex Deucher
On Thu, Sep 18, 2025 at 9:09 AM Lijo Lazar wrote: > > It's not necessary that the build string and atom header section has a > difference of 32 bytes. Use the remaining bytes in the section as copy > limit. > > Signed-off-by: Lijo Lazar Acked-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgp

Re: [PATCH 2/2] amd/amdkfd: drain kfd process workqueue before switch partition

2025-09-18 Thread Zhang, Yifan
[Public] Thanks. Changed in v2. Best Regards, Yifan From: Lazar, Lijo Sent: Wednesday, September 17, 2025 1:51 PM To: Zhang, Yifan ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Kuehling, Felix ; Yang, Philip ; Zhang, Yifan Subject: RE: [PATCH

[PATCH] drm/amdgpu: add module parameter enable_lr_compute_wa

2025-09-18 Thread Yifan Zhang
Default value is 0. No functional change with this patch. Signed-off-by: Yifan Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 11 +++ drivers/gpu/drm/amd/amdgpu/mes_v11_0.c| 2 ++ drivers/gpu/drm/amd/amdgpu/mes_v12

Re: [PATCH] drm/amdgpu/atom: Check kcalloc() for WS buffer in amdgpu_atom_execute_table_locked()

2025-09-18 Thread Alex Deucher
On Thu, Sep 18, 2025 at 8:44 AM Guangshuo Li wrote: > > kcalloc() may fail. When WS is non-zero and allocation fails, ectx.ws > remains NULL while ectx.ws_size is set, leading to a potential NULL > pointer dereference in atom_get_src_int() when accessing WS entries. > > Return -ENOMEM on allocatio

Re: [PATCH V11 06/47] drm/colorop: Add 1D Curve subtype

2025-09-18 Thread Pekka Paalanen
On Tue, 16 Sep 2025 17:01:07 -0600 Alex Hung wrote: > On 8/26/25 03:03, Pekka Paalanen wrote: > > On Thu, 21 Aug 2025 11:54:32 -0600 > > Alex Hung wrote: > > > >> On 8/21/25 06:23, Xaver Hugl wrote: > We user space folks have been convinced at this point that the sRGB EOTF > is ac

[PATCH v2] drm/amdgpu: Use kmalloc_array() instead of kmalloc()

2025-09-18 Thread Rahul Kumar
Documentation/process/deprecated.rst recommends against the use of kmalloc with dynamic size calculations due to the risk of overflow and smaller allocation being made than the caller was expecting. Replace kmalloc() with kmalloc_array() in amdgpu_amdkfd_gfx_v10.c, amdgpu_amdkfd_gfx_v10_3.c, amdgp
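
The overflow risk mentioned above can be shown with a small userspace analogue. This is a sketch only: alloc_array is a hypothetical helper that mirrors the idea of an overflow-checked array allocation, with malloc() standing in for the kernel allocator.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Userspace analogue of an overflow-checked array allocation:
 * refuse the request if n * size would wrap around instead of
 * silently allocating a buffer smaller than the caller expects.
 */
static void *alloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;
	return malloc(n * size);
}
```

With an open-coded malloc(n * size), a wrapped product would yield a short buffer and later out-of-bounds writes; the checked form fails the allocation instead.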

Re: [PATCH V11 31/47] drm/colorop: add BT2020/BT709 OETF and Inverse OETF

2025-09-18 Thread Pekka Paalanen
On Fri, 15 Aug 2025 21:28:47 -0600 Alex Hung wrote: > On 8/15/25 20:45, Shengyu Qu wrote: > > Hi, > > > > Thanks for reply. I guess we need to point this out in documentation or > > code comment? As I can see different definition somewhere like this[1]. > > > > Best regards, > > Shengyu > > >

[PATCH] drm/amdgpu: Fix vbios build number parsing logic

2025-09-18 Thread Lijo Lazar
It's not necessary that the build string and atom header section has a difference of 32 bytes. Use the remaining bytes in the section as copy limit. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/atom.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/

Re: [PATCH next] drm/amdgpu/userq: Fix error codes in mes_userq_mqd_create()

2025-09-18 Thread Christian König
On 18.09.25 11:46, Dan Carpenter wrote: > Return the error code if amdgpu_userq_input_va_validate() fails. Don't > return success. > > Fixes: 9e46b8bb0539 ("drm/amdgpu: validate userq buffer virtual address and > size") > Signed-off-by: Dan Carpenter > --- > drivers/gpu/drm/amd/amdgpu/mes_u

Re: [PATCH] drm/amdgpu: Fix pipelining jobs with timeline syncobj dependencies

2025-09-18 Thread David Rosca
On 18. 09. 25 9:47, Tvrtko Ursulin wrote: On 17/09/2025 11:54, David Rosca wrote: Hi, On 17. 09. 25 12:15, Tvrtko Ursulin wrote: Hi, On 17/09/2025 10:59, David Rosca wrote: drm_syncobj_find_fence returns fence chain for timeline syncobjs. Scheduler expects normal fences as job dependenci

[PATCH 05/20] drm/amd/display: Handle interpolation for first data point

2025-09-18 Thread IVAN.LIPSKI
From: Mario Limonciello [Why] If the first data point for a custom brightness curve is not 0% luminance then the first few luminance values will be ignored. [How] Check signal is below first data point and if so do linear interpolation to 0 instead. Reviewed-by: Alex Hung Signed-off-by: Mario
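
The interpolation idea in this patch description can be sketched as follows. The sketch assumes the curve is anchored at the origin (signal 0 maps to luminance 0); struct curve_point and interp_below_first are hypothetical names for illustration and do not come from the display code.

```c
#include <stdint.h>

/* Hypothetical data point of a custom brightness curve. */
struct curve_point {
	uint32_t signal;     /* input signal level */
	uint32_t luminance;  /* luminance at that level */
};

/*
 * For a signal below the first data point, interpolate linearly
 * between the origin (0, 0) and that point. Signals at or above
 * the point simply return the point's luminance in this sketch.
 */
static uint32_t interp_below_first(const struct curve_point *first,
				   uint32_t signal)
{
	if (signal >= first->signal)
		return first->luminance;

	/* Here first->signal > signal >= 0, so the divisor is non-zero. */
	return (uint32_t)(((uint64_t)signal * first->luminance) / first->signal);
}
```

The 64-bit intermediate keeps the multiplication from overflowing before the division.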

[PATCH v3 10/10] drm/amdgpu: validate userq va for GEM unmap

2025-09-18 Thread Prike Liang
When a user unmaps a userq VA, the driver must ensure the queue has no in-flight jobs. If there is pending work, the kernel should wait for the attached eviction (bookkeeping) fence to signal before deleting the mapping. Suggested-by: Christian König Signed-off-by: Prike Liang --- drivers/gpu/

[PATCH v3 02/10] drm/amdgpu/userq: implement support for query status

2025-09-18 Thread Prike Liang
From: Alex Deucher Query the status of the user queue, currently whether the queue is hung and whether or not VRAM is lost. v2: Misc cleanups Reviewed-by: Sunil Khatri Signed-off-by: Alex Deucher Reviewed-by: Prike Liang --- drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 35 +++

[PATCH v3 09/10] drm/amdgpu: validate the queue va for resuming the queue

2025-09-18 Thread Prike Liang
The userq VA must be validated as mapped before trying to resume the queue. Signed-off-by: Prike Liang Reviewed-by: Christian König Reviewed-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 24 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h | 1

[PATCH v3 08/10] drm/amdgpu: keeping waiting userq fence infinitely

2025-09-18 Thread Prike Liang
Keep waiting on the userq fence indefinitely until a hang is detected, then suspend the hung queue and set the fence error. Signed-off-by: Prike Liang --- drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/

[PATCH v3 06/10] drm/amdgpu: add userq object va track helpers

2025-09-18 Thread Prike Liang
Add the userq object virtual address get(), mapped() and put() helpers for tracking the userq obj VA usage. Signed-off-by: Prike Liang --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 73 -- drivers/gpu/drm/amd/amdgpu/

[PATCH v3 04/10] drm/amdgpu/userq: extend userq state

2025-09-18 Thread Prike Liang
Extend the userq state for identifying the userq invalid cases. Signed-off-by: Prike Liang --- drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h index 2260b1fb8a22..

Re: [PATCH] drm/amdgpu: Fix pipelining jobs with timeline syncobj dependencies

2025-09-18 Thread Tvrtko Ursulin
On 17/09/2025 11:54, David Rosca wrote: Hi, On 17. 09. 25 12:15, Tvrtko Ursulin wrote: Hi, On 17/09/2025 10:59, David Rosca wrote: drm_syncobj_find_fence returns fence chain for timeline syncobjs. Scheduler expects normal fences as job dependencies to be able to determine whether the fence

Re: [PATCH] drm/amdgpu: add missing comment for the new argument

2025-09-18 Thread Christian König
On 18.09.25 06:09, Sunil Khatri wrote: > In function 'amdgpu_vm_lock_done_list' update the comment > for the new argument 'vm'. > > Reported-by: kernel test robot > Closes: > https://lore.kernel.org/oe-kbuild-all/202509180211.uaqme0zj-...@intel.com/ > Signed-off-by: Sunil Khatri Reviewed-by: C