Re: [PATCH v4] drm/panfrost: Move the GPU reset bits outside the timeout handler

Boris Brezillon Thu, 05 Nov 2020 05:40:40 -0800

On Thu, 5 Nov 2020 13:27:04 +0000
Steven Price <[email protected]> wrote:


> > +   old_status = atomic_xchg(&queue->status,
> > +                            PANFROST_QUEUE_STATUS_STOPPED);
> > +   WARN_ON(old_status != PANFROST_QUEUE_STATUS_ACTIVE &&
> > +           old_status != PANFROST_QUEUE_STATUS_STOPPED);
> > +   if (old_status == PANFROST_QUEUE_STATUS_STOPPED)
> > +           goto out;  
> 
> NIT: It's slightly cleaner if you swap the above lines, i.e.:
> 
>       if (old_status == PANFROST_QUEUE_STATUS_STOPPED)
>               goto out;
>       WARN_ON(old_status != PANFROST_QUEUE_STATUS_ACTIVE);

I agree.

> 
> > +
> > +   drm_sched_stop(&queue->sched, bad);
> > +   if (bad)
> > +           drm_sched_increase_karma(bad);
> > +
> > +   stopped = true;
> > +
> > +   /*
> > +    * Set the timeout to max so the timer doesn't get started
> > +    * when we return from the timeout handler (restored in
> > +    * panfrost_scheduler_start()).
> > +    */
> > +   queue->sched.timeout = MAX_SCHEDULE_TIMEOUT;
> > +
> > +out:
> >     mutex_unlock(&queue->lock);
> >   
> >     return stopped;
> >   }
> >   
> > +static void panfrost_scheduler_start(struct panfrost_queue_state *queue)
> > +{
> > +   enum panfrost_queue_status old_status;
> > +
> > +   mutex_lock(&queue->lock);
> > +   old_status = atomic_xchg(&queue->status,
> > +                            PANFROST_QUEUE_STATUS_STARTING);
> > +   if (WARN_ON(old_status != PANFROST_QUEUE_STATUS_STOPPED))
> > +           goto out;  
> 
> The error handling isn't great here - in this case the queue status is 
> left in _STATUS_STARTING, which at best would lead to another WARN_ON 
> being hit, but also has the effect of ignoring job faults. Probably the 
> timeout would eventually get things back to normal.
> 
> Obviously this situation will never occur™, but we can do better either 
> by continuing with the normal logic below, or even better replacing 
> atomic_xchg() with an atomic_cmpxchg() (so leave the status alone if not 
> _STOPPED). Both seem like better error recovery options to me. But keep 
> the WARN_ON because something has clearly gone wrong if this happens.

The second approach doesn't unblock things if we end up with
old_status != STOPPED and the queue is really stopped (which shouldn't
happen, unless we have a problem in our state machine). I think I'll
go for the first option and restart the queue unconditionally (I'm
keeping the WARN_ON(), of course).
_______________________________________________
dri-devel mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v4] drm/panfrost: Move the GPU reset bits outside the timeout handler

Reply via email to