but how to understand 1)
what do you mean "The CS can still be blocked because of
amdgpu_ctx_add_fence()."
See amdgpu_ctx_add_fence(), it can block for previous command
submissions just like entity_push_job(). So only moving
entity_push_job() out of locking the PD doesn't help at all.
for 2)The order of submission isn't correct any more.
I think since the pointer "job" is already a dirty pointer,
meaningless that we talking about it if the order is correct ...
The problem isn't parser->job, but rather that the job is referencing
the entity which is part of the context and we already called
amdgpu_ctx_put() on that one.
Regards,
Christian.
Am 06.09.2017 um 10:04 schrieb Liu, Monk:
>The patch doesn't work at all:
1. The CS can still be blocked because of amdgpu_ctx_add_fence().
2. The order of submission isn't correct any more.
3. We could end up using freed up memory because we now drop the
ctx reference to early.
I suddenly found that the parser->job is really a wild pointer:
amdgpu_cs_parser_fini(p, 0, true);
trace_amdgpu_cs_ioctl(job);
amd_sched_entity_push_job(&job->base);
so "cs_parser_fini" cannot be called before "entity_push_job", that
part is correct
but how to understand 1)
what do you mean "The CS can still be blocked because of
amdgpu_ctx_add_fence()."
for 2)The order of submission isn't correct any more.
I think since the pointer "job" is already a dirty pointer,
meaningless that we talking about it if the order is correct ...
BR Monk
------------------------------------------------------------------------
*From:* amd-gfx <[email protected]> on behalf of
Christian König <[email protected]>
*Sent:* Tuesday, September 5, 2017 9:14:23 PM
*To:* [email protected]; Zhou, David(ChunMing)
*Subject:* [PATCH] drm/amdgpu: revert "fix deadlock of reservation
between cs and gpu reset v2"
From: Christian König <[email protected]>
This reverts commit 10e709cb296c98424c03408d23e3addeddcd4088.
The patch doesn't work at all:
1. The CS can still be blocked because of amdgpu_ctx_add_fence().
2. The order of submission isn't correct any more.
3. We could end up using freed up memory because we now drop the
ctx reference to early.
This needs to be fixed cleanly by doing the context handling after the BO
handling, but this is a larger task just avoid the obvious crashes for
now.
Signed-off-by: Christian König <[email protected]>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index b96776c..2db4010 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1061,7 +1061,6 @@ static int amdgpu_cs_submit(struct
amdgpu_cs_parser *p,
cs->out.handle = amdgpu_ctx_add_fence(p->ctx, ring, p->fence);
job->uf_sequence = cs->out.handle;
amdgpu_job_free_resources(job);
- amdgpu_cs_parser_fini(p, 0, true);
trace_amdgpu_cs_ioctl(job);
amd_sched_entity_push_job(&job->base);
@@ -1120,10 +1119,7 @@ int amdgpu_cs_ioctl(struct drm_device *dev,
void *data, struct drm_file *filp)
goto out;
r = amdgpu_cs_submit(&parser, cs);
- if (r)
- goto out;
- return 0;
out:
amdgpu_cs_parser_fini(&parser, r, reserved_buffers);
return r;
--
2.7.4
_______________________________________________
amd-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/amd-gfx