Op 14-10-2021 om 15:25 schreef Tvrtko Ursulin:
>
> On 14/10/2021 13:05, Maarten Lankhorst wrote:
>> Op 14-10-2021 om 10:37 schreef Tvrtko Ursulin:
>>>
>>> On 13/10/2021 11:41, Maarten Lankhorst wrote:
>>>> No memory should be allocated when calling i915_gem_object_wait,
>>>> because it may be called to idle a BO when evicting memory.
>>>>
>>>> Fix this by using dma_resv_iter helpers to call
>>>> i915_gem_object_wait_fence() on each fence, which cleans up the code a lot.
>>>> Also remove dma_resv_prune, it's questionably.
>>>>
>>>> This will result in the following lockdep splat.
>>>
>>> <snip>
>>>
>>>> @@ -37,56 +36,17 @@ i915_gem_object_wait_reservation(struct dma_resv *resv,
>>>> unsigned int flags,
>>>> long timeout)
>>>> {
>>>> - struct dma_fence *excl;
>>>> - bool prune_fences = false;
>>>> -
>>>> - if (flags & I915_WAIT_ALL) {
>>>> - struct dma_fence **shared;
>>>> - unsigned int count, i;
>>>> - int ret;
>>>> + struct dma_resv_iter cursor;
>>>> + struct dma_fence *fence;
>>>> - ret = dma_resv_get_fences(resv, &excl, &count, &shared);
>>>> - if (ret)
>>>> - return ret;
>>>> -
>>>> - for (i = 0; i < count; i++) {
>>>> - timeout = i915_gem_object_wait_fence(shared[i],
>>>> - flags, timeout);
>>>> - if (timeout < 0)
>>>> - break;
>>>> + dma_resv_iter_begin(&cursor, resv, flags & I915_WAIT_ALL);
>>>> + dma_resv_for_each_fence_unlocked(&cursor, fence) {
>>>> - dma_fence_put(shared[i]);
>>>> - }
>>>> -
>>>> - for (; i < count; i++)
>>>> - dma_fence_put(shared[i]);
>>>> - kfree(shared);
>>>> -
>>>> - /*
>>>> - * If both shared fences and an exclusive fence exist,
>>>> - * then by construction the shared fences must be later
>>>> - * than the exclusive fence. If we successfully wait for
>>>> - * all the shared fences, we know that the exclusive fence
>>>> - * must all be signaled. If all the shared fences are
>>>> - * signaled, we can prune the array and recover the
>>>> - * floating references on the fences/requests.
>>>> - */
>>>> - prune_fences = count && timeout >= 0;
>>>> - } else {
>>>> - excl = dma_resv_get_excl_unlocked(resv);
>>>> + timeout = i915_gem_object_wait_fence(fence, flags, timeout);
>>>> + if (timeout <= 0)
>>>> + break;
>>>
>>> You have another change in behaviour here, well a bug really. When
>>> userspace passes in zero timeout you fail to report activity in other than
>>> the first fence.
>>
>> Hmm, not necessarily, passing 0 to i915_gem_object_wait_fence timeout = 0 is
>> a special case and means test only. It will return 1 on success.
>
> I tried to enumerate the whole chain here. All for timeout == 0. Please
> double check I did not make a mistake somewhere since there are many return
> code inversions here.
>
> As building blocks for the whole "game" we have:
>
> 1. dma_fence_default_wait, it returns for states:
>
> not signaled -> 0
> signaled -> 1
>
> 2. i915_request_wait
>
> not signaled -> -ETIME
> signaled -> 0
>
> Then i915_gem_object_wait_fence builds on top of it and has therefore these
> possible outputs:
>
> signaled -> 0
> not signaled:
> i915 path -> -ETIME
> ext fence -> 0
>
> So this looks a like problem already with 0 for signaled and not signaled.
> Unless it is by design that the return value does not want to report external
> fences? But it is not documented and it still waits on them so odd.
>
> Then in i915_gem_object_wait_reservation we have a loop:
>
> for (i = 0; i < count; i++) {
> timeout = i915_gem_object_wait_fence(shared[i],
> flags, timeout);
> if (timeout < 0)
> break;
>
> So short circuit happens only for i915 fences, by virtue of no negative
> return codes otherwise.
>
> If we focus for i915 fences only for a moment. It means it keeps skipping
> signaled to check if any is not, therefore returning -ETIME if any is not
> signaled. i915_gem_object_wait passes the negative return on.
>
> With your patch you have:
>
> + timeout = i915_gem_object_wait_fence(fence, flags, timeout);
> + if (timeout <= 0)
> + break;
>
> Which means you break on first signaled fence (i915 or external), therefore
> missing to report any possible subsequent unsignaled fences. So gem_wait
> ioctl breaks unless I am missing something.
You're cc'd on a mail I sent to König regarding this.
"Re: [PATCH 20/28] drm/i915: use new iterator in
i915_gem_object_wait_reservation"
[email protected]
timeout = 0 is a special case, fence_wait should return 1 if signaled, or 0 if
waiting. Not -ETIME, as i915 does currently.
This means our i915_fence_wait() handler is currently very wrong too, needs to
be fixed. It returns 0 if timeout = 0 even
if signaled.
I think it cancels the fail in our gem_object_wait, but more consistency is
definitely needed first.
I think it's best to keep the current semantics for i915_reuest_wait, but make
it a wrapper around a
fixed i915_request_wait_timeout(), which would have the correct return
semantics.
~Maarten