The problem is reproducible with enable UBSAN.

 
================================================================================
[    3.866643] UBSAN: Undefined behaviour in drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:379:29
[    3.866656] index 2 is out of range for type 'amdgpu_uvd_inst [2]'
[    3.866667] CPU: 0 PID: 59 Comm: kworker/0:1 Not tainted 4.16.0-rc7+ #3
[    3.866677] Hardware name: Gigabyte Technology Co., Ltd. GA-990FXA-UD7/GA-990FXA-UD7, BIOS F9 06/08/2012
[    3.866693] Workqueue: events work_for_cpu_fn
[    3.866702] Call Trace:
[    3.866710]  dump_stack+0x85/0xc5
[    3.866719]  ubsan_epilogue+0x9/0x40
[    3.866727]  __ubsan_handle_out_of_bounds+0x89/0x90
[    3.866737]  ? rcu_read_lock_sched_held+0x58/0x60
[    3.866746]  ? __kmalloc+0x26c/0x2d0
[    3.866846]  amdgpu_fence_driver_start_ring+0x259/0x280 [amdgpu]
[    3.866896]  amdgpu_ring_init+0x12c/0x710 [amdgpu]
[    3.866906]  ? sprintf+0x42/0x50
[    3.866956]  amdgpu_gfx_kiq_init_ring+0x1bc/0x3a0 [amdgpu]
[    3.867009]  gfx_v8_0_sw_init+0x1ad3/0x2360 [amdgpu]
[    3.867062]  ? smu7_init+0xec/0x160 [amdgpu]
[    3.867109]  amdgpu_device_init+0x112c/0x1dc0 [amdgpu]
[    3.867120]  ? rcu_read_lock_sched_held+0x58/0x60
[    3.867166]  amdgpu_driver_load_kms+0x74/0x2e0 [amdgpu]
[    3.867178]  drm_dev_register+0x134/0x1c0
[    3.867223]  amdgpu_pci_probe+0x163/0x270 [amdgpu]
[    3.867233]  local_pci_probe+0x42/0xa0
[    3.867242]  work_for_cpu_fn+0x16/0x20
[    3.867250]  process_one_work+0x269/0x640
[    3.867260]  worker_thread+0x216/0x3d0
[    3.867268]  ? process_one_work+0x640/0x640
[    3.867276]  kthread+0x113/0x130
[    3.867282]  ? kthread_create_worker_on_cpu+0x50/0x50
[    3.867293]  ret_from_fork+0x27/0x50
[    3.867304] ================================================================================
[    3.869808] [drm] Found UVD firmware Version: 1.130 Family ID: 16
[    3.871505] [drm] Found VCE firmware Version: 53.26 Binary ID: 3


The fix will follow.

Regards,
Leo



On 06/25/2018 03:02 PM, Alex Deucher wrote:
On Mon, Jun 25, 2018 at 2:59 PM, James Zhu <[email protected]> wrote:

On 2018-06-25 02:53 PM, Alex Deucher wrote:

On Mon, Jun 25, 2018 at 2:37 PM, James Zhu <[email protected]> wrote:

For one UVD instance case,:


In function amdgpu_driver_load_kms, all ring->me should be set to zero.
     adev = kzalloc(sizeof(struct amdgpu_device), GFP_KERNEL);


For two UVD instances cases:

static void uvd_v7_0_set_ring_funcs(struct amdgpu_device *adev)
..
     for (i = 0; i < adev->uvd.num_uvd_inst; i++) {
         adev->uvd.inst[i].ring.me = i;

static void uvd_v7_0_set_enc_ring_funcs(struct amdgpu_device *adev)

     for (j = 0; j < adev->uvd.num_uvd_inst; j++) {
             adev->uvd.inst[j].ring_enc[i].me = j;

uvd_v4_2_early_init in uvd_v4_2.c  adev->uvd.num_uvd_inst = 1;
uvd_v5_0_early_init in uvd_v5_0.c  adev->uvd.num_uvd_inst = 1;
uvd_v6_0_early_init in uvd_v6_0.c  adev->uvd.num_uvd_inst = 1;
uvd_v7_0_early_init in uvd_v7_0.c
     if (adev->asic_type == CHIP_VEGA20)
         adev->uvd.num_uvd_inst = UVD7_MAX_HW_INSTANCES_VEGA20;/*2*/
     else
         adev->uvd.num_uvd_inst = 1;


I didn't know when ring->me is set to 2. Maybe there is some leakage
somewhere.

What about older uvd (4.2, 5.0, 6.0) blocks?

I think the below code will reset
adev->uvd.inst[AMDGPU_MAX_UVD_INSTANCES].ring->me and
adev->uvd.inst[AMDGPU_MAX_UVD_INSTANCES].ring_enc[AMDGPU_MAX_UVD_ENC_RINGS]->me
to zero.
for older uvd IP UVD block.

adev = kzalloc(sizeof(struct amdgpu_device), GFP_KERNEL);

Do I understand correctly?
Yes, it should.  That's why it doesn't make sense that it would be
getting another value.

Alex

James

Alex

Best regards!

James zhu


On 2018-06-25 01:29 PM, Deucher, Alexander wrote:

Odd. The structure should be 0 initialized.  Does this patch help?


Alex

________________________________
From: Timothy Pearson <[email protected]>
Sent: Monday, June 25, 2018 11:53:12 AM
To: Zhu, James
Cc: [email protected]; Deucher, Alexander; Zhou,
David(ChunMing); Koenig, Christian
Subject: Re: [PATCH] Increase AMDGPU_MAX_UVD_INSTANCES to 3

n 06/25/2018 09:46 AM, James Zhu wrote:

On 2018-06-23 08:02 PM, Timothy Pearson wrote:

amdgpu_fence_driver_start_ring() attempts to access
UVD instance 2 during setup, while the existing UVD
instance count only allows instances 0 and 1.

Increase AMDGPU_MAX_UVD_INSTANCES by one to avoid the
invalid array access.

Caught by UBSAN.

Hi Timothy,

 From design of view, it is not right to just change
AMDGPU_MAX_UVD_INSTANCES to 3.

Could you tell me some detail of UBSAN test and attach the dmesg also?

Definitely, was looking for some feedback from anyone knowing more about
the internals of the UVD system.

What's happening is that "ring->me" in amdgpu_fence_driver_start_ring()
(drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:379) is set to a value of
"2".  The overall dmesg is otherwise uninteresting, but I can try to
grab the UBSAN output if needed.

--
Timothy Pearson
Raptor Engineering
+1 (415) 727-8645 (direct line)
+1 (512) 690-0200 (switchboard)
https://www.raptorengineering.com



_______________________________________________
amd-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


_______________________________________________
amd-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Reply via email to