On 10/29/25 21:17, Rodrigo Vivi wrote: > On Wed, Aug 13, 2025 at 03:29:39PM +0200, Christian König wrote: >> On 13.08.25 14:43, Janusz Krzysztofik wrote: >>> Hi Christian, >>> >>> On Tuesday, 8 July 2025 12:09:58 CEST Christian König wrote: >>>> On 08.07.25 10:56, Janusz Krzysztofik wrote: >>>>>> >>>>>> There is no reason to test enabling signaling each of the element in a >>>>>> loop. So there should be something like 4096 calls to the >>>>>> dma_fence_chain_cb function each jumping to the next unsignaled fence >>>>>> and re-installing the callback. >>>>> >>>>> So how building a chain should look like in real use cases? When a user >>>>> builds a chained link of her fence with another fence then may she enable >>>>> signaling on the new chain link? If that other fence occurs a chain link >>>>> then >>>>> who should take care of disabling signaling on it so signaling is enabled >>>>> only >>>>> on the last link of the chain, not leading to a situation similar to what >>>>> we >>>>> have now in the test case? IOW, what's a correct use pattern of >>>>> dma_fence_chain? I can't find that documented anywhere, neither in >>>>> inline >>>>> docs nor in commit descriptions. >>>> >>>> The dma_fence_chain container is basically a single linked list which >>>> allows to "chain" together multiple dma_fence objects. >>>> >>>> The use cases is to keep only a single fence even when multiple >>>> asynchronous operations have been started. >>>> >>>> So you usually keep only the most recently created dma_fence_chain and >>>> eventually ask that one to signal when you need to wait for all fences >>>> inside the container. >>>> >>>> The tricky part is that since the chain can get very long the >>>> implementation can't use recursion or otherwise we would potentially >>>> overflow the kernel stack. And that needs to be tested and made sure we >>>> don't accidentally break the implementation somehow. >>>> >>>> Keeping all elements of a dma_fence_chain in an array and asking all of >>>> them to signal individually makes no sense, for this use case we have the >>>> dma_fence_array in the first place. >>> >>> I'm going to submit a patch that drops enabling of signaling on each link >>> of >>> the tested chain, as you suggested. Don't you mind if I add your Suggested- >>> by:? >> >> Sure. >> >> Thanks for looking into that, > > > Hi Christian, > > Okay, so the take is that that IGT case is non-realistic and that that kernel > change is not needed. We still have a problem with this patch here take so > long to complete in a few platforms in a way that the watchdogs gets triggered > and test case is marked as failure. While if we do the reschedule waiting a > bit > longer it end up completing successfully as expected. > > Since the goal here is not performance but compliance, can we go with this > cond_reschedule here? Or do you have any other suggestion here on how > to handle this test case and make it pass on those old platforms? > > Or should we simply disable the tests for those platforms?
I think for this particular case we should just completely remove the test. HW can have some parallelism and out of order execution, but it is just unrealistic that 1k of submission made in the order A..Z signal in the order Z..A. What we should test is that in no possible combination the kernel runs into a stack overflow because of recursion in the release handler (for example) of 1k submissions. What we should also test from userspace is that we can't overload the callback functions of a fence. E.g. something make a submission and spawn thousands of threads who want to wait for this single submission. Regards, Christian. > > Thanks, > Rodrigo. > >> Christian. >> >>> >>> Thanks, >>> Janusz >>> >>>> >>>> Regards, >>>> Christian. >>>> >>>>> >>>>> Thanks, >>>>> Janusz >>>> >>> >>> >>> >>> >>
