scheduler: stop setting rq to NULL

Andrey Grodzovsky Thu, 09 Aug 2018 10:40:31 -0700


On 08/06/2018 04:14 AM, Christian König wrote:

On 03.08.2018 at 20:41, Andrey Grodzovsky wrote:
On 08/06/2018 08:44 AM, Christian König wrote:
On 03.08.2018 at 16:54, Andrey Grodzovsky wrote:
[SNIP]
    >
    > Second of all, even after we removed the entity from rq in
    > drm_sched_entity_flush to terminate any subsequent submissions
    > to the queue, the other thread doing push job can just acquire
    > a run queue again from drm_sched_entity_get_free_sched and
    > continue submission

Hi Christian

    That is actually desired.

    When another process is now using the entity to submit jobs,
    adding it back to the rq is actually the right thing to do
    because the entity is still in use.
Yes, no problem if it's another process. But what about another thread from the same process? Is it a possible use case that 2 threads from the same process submit to the same entity concurrently? If so, and we specifically kill one, the other will not stop even if we want it to, because the current code lets it simply reacquire a rq for itself.
Well, you can't kill a single thread of a process (you can only interrupt it); a SIGKILL will always kill the whole process.
Is the following scenario possible and acceptable?
2 threads from the same process are working on the same queue. Thread A is currently in drm_sched_entity_flush->wait_event_timeout (the process is getting shut down because of a SIGKILL sent by the user) while thread B is still inside drm_sched_entity_push_job before 'if (reschedule)'. 'A' stops waiting because the queue became empty and then removes the entity queue from the scheduler's run queue, while B goes inside 'reschedule' because it evaluates to true ('first' is true, and so are all the rest of the conditions), acquires a new rq, later adds it back to a scheduler (maybe a different one), and keeps submitting jobs as much as it likes. It can then be stuck for up to 'timeout' time in its own drm_sched_entity_flush waiting for them.
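The interleaving described above can be replayed in a small user-space model. This is only a sketch under stated assumptions: the model_* types and functions are hypothetical stand-ins and not the kernel's drm_sched code, and the 'reschedule' conditions are reduced to "the queue was empty and the entity is not on a run queue".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins, NOT the kernel's drm_sched types. */
struct model_rq {
    int nr_entities;        /* entities currently linked into this run queue */
};

struct model_entity {
    struct model_rq *rq;    /* run queue the entity is assigned to */
    bool on_rq;             /* true while the entity is linked into rq */
    int queued_jobs;        /* jobs waiting in the entity's queue */
};

/* Thread A: once the queue is empty, the wait ends and the entity is unlinked. */
void model_flush(struct model_entity *e)
{
    assert(e->rq != 0);
    if (e->queued_jobs == 0 && e->on_rq) {
        e->rq->nr_entities--;
        e->on_rq = false;
    }
}

/*
 * Thread B: pushing the first job into an empty queue links the entity
 * again, here onto whichever run queue was picked (assumed simplification
 * of the 'reschedule' and 'first' handling).
 */
void model_push_job(struct model_entity *e, struct model_rq *picked)
{
    bool first = (e->queued_jobs == 0);

    e->queued_jobs++;
    if (first && !e->on_rq) {
        e->rq = picked;
        e->rq->nr_entities++;
        e->on_rq = true;
    }
}

/* Replays "A flushes, then B pushes" and reports whether B got back on a rq. */
bool model_replay_race(void)
{
    struct model_rq rq0 = { .nr_entities = 1 };
    struct model_rq rq1 = { .nr_entities = 0 };
    struct model_entity e = { .rq = &rq0, .on_rq = true, .queued_jobs = 0 };

    model_flush(&e);            /* thread A removes the entity from rq0 */
    model_push_job(&e, &rq1);   /* thread B re-adds it, now on rq1 */

    return e.on_rq && e.rq == &rq1 &&
           rq0.nr_entities == 0 && rq1.nr_entities == 1 &&
           e.queued_jobs == 1;
}
```

In this model the entity ends up schedulable again on a different run queue with one job pending, which is the outcome being asked about. Whether the kernel can actually reach this ordering is the open question in the thread.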
I'm not 100% sure, but I don't think that can happen.
See, flushing the fd is done while dropping the fd, which happens only after all threads of the process in question are killed.
Yeah, this FD handling is indeed a big gray area for me, but as far as I remember flushing is done for each thread when it exits (possibly due to a signal). Now, signal interception and processing (as a result of which .flush will get called if SIGKILL was received) is done at certain points, one of which is when returning from an IOCTL. So if the first thread was at the very end of the CS ioctl when SIGKILL was received while the other one was at the beginning, then I think we might see something like the scenario above.
SIGKILL isn't processed as long as any thread of the application is still inside the kernel. That's why we have wait_event_killable().


Can you tell me where this is happening? What I see in the code is that do_group_exit calls zap_other_threads, which just adds SIGKILL to the signal sets of the other threads in the group and sends a wake-up. Then do_exit will close all FDs for the current thread, and so .flush will be called; when the last thread drops its refcount for the FD, .release will be called.
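The difference between .flush and .release mentioned above can be sketched with a tiny reference-counting model. It assumes the usual VFS convention that the flush hook runs on every close of a descriptor while the release hook runs once, when the last reference to the open file is dropped. The model_* names are hypothetical, and the sketch deliberately does not settle whether each exiting thread closes descriptors on its own, which is the point under discussion here.

```c
#include <assert.h>

/* Hypothetical model of one open file referenced by several descriptors. */
struct model_file {
    int refcount;       /* references still held on this open file */
    int flush_calls;    /* how often the .flush-like hook ran */
    int release_calls;  /* how often the .release-like hook ran */
};

/* Take an additional reference on the open file. */
void model_get(struct model_file *f)
{
    f->refcount++;
}

/* Close one reference: flush runs every time, release only on the last one. */
void model_close(struct model_file *f)
{
    assert(f->refcount > 0);
    f->flush_calls++;
    if (--f->refcount == 0)
        f->release_calls++;
}

/* Opens a file, takes a second reference, closes both, reports hook counts. */
void model_two_closers(int *flushes, int *releases)
{
    struct model_file f = { .refcount = 1 };

    model_get(&f);
    model_close(&f);    /* first closer: flush only */
    model_close(&f);    /* last closer: flush, then release */

    *flushes = f.flush_calls;
    *releases = f.release_calls;
}
```

With two holders of the same open file, the flush hook runs twice and the release hook once, so any ordering guarantee has to come from when each close happens and not from release alone.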


Andrey

So I don't think that the scenario above is possible, but I'm really not 100% sure either.
Christian.
Andrey
Otherwise the flushing wouldn't make too much sense. In other words, imagine an application where a thread does a write() on a fd which is killed.
The idea of the flush is to preserve the data, and that won't work if that isn't correctly ordered.
My understanding was that the introduction of entity->last is to force immediate termination of job submissions by any thread from the terminating process.
We could consider reordering that once more. Going to play out all scenarios in my head over the weekend :)
Christian.
Andrey
_______________________________________________
dri-devel mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 2/2] drm/scheduler: stop setting rq to NULL
