On Tue, May 05, 2015 at 10:36:24AM -0400, Peter Hurley wrote:
> On 05/04/2015 12:52 AM, Mario Kleiner wrote:
> > On 04/16/2015 03:03 PM, Daniel Vetter wrote:
> >> On Thu, Apr 16, 2015 at 08:30:55AM -0400, Peter Hurley wrote:
> >>> On 04/15/2015 01:31 PM, Daniel Vetter wrote:
> >>>> On Wed, Apr 15, 2015 at 09:00:04AM -0400, Peter Hurley wrote:
> >>>>> Hi Daniel,
> >>>>>
> >>>>> On 04/15/2015 03:17 AM, Daniel Vetter wrote:
> >>>>>> This was a bit too much cargo-culted, so let's make it solid:
> >>>>>> - vblank->count doesn't need to be an atomic, writes are always done
> >>>>>> under the protection of dev->vblank_time_lock. Switch to an unsigned
> >>>>>> long instead and update comments. Note that atomic_read is just a
> >>>>>> normal read of a volatile variable, so no need to audit all the
> >>>>>> read-side access specifically.
> >>>>>>
> >>>>>> - The barriers for the vblank counter seqlock weren't complete: The
> >>>>>> read-side was missing the first barrier between the counter read and
> >>>>>> the timestamp read, it only had a barrier between the ts and the
> >>>>>> counter read. We need both.
> >>>>>>
> >>>>>> - Barriers weren't properly documented. Since barriers only work if
> >>>>>> you have them on both sides of the transaction it's prudent to
> >>>>>> reference where the other side is. To avoid duplicating the
> >>>>>> write-side comment 3 times extract a little store_vblank() helper.
> >>>>>> In that helper also assert that we do indeed hold
> >>>>>> dev->vblank_time_lock, since in some cases the lock is acquired a
> >>>>>> few functions up in the callchain.
> >>>>>>
> >>>>>> Spotted while reviewing a patch from Chris Wilson to add a fastpath to
> >>>>>> the vblank_wait ioctl.
> >>>>>>
> >>>>>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> >>>>>> Cc: Mario Kleiner <mario.kleiner.de at gmail.com>
> >>>>>> Cc: Ville Syrjälä <ville.syrjala at linux.intel.com>
> >>>>>> Cc: Michel Dänzer <michel at daenzer.net>
> >>>>>> Signed-off-by: Daniel Vetter <daniel.vetter at intel.com>
> >>>>>> ---
> >>>>>> drivers/gpu/drm/drm_irq.c | 92 ++++++++++++++++++++++++-----------------------
> >>>>>> include/drm/drmP.h        |  8 +++--
> >>>>>> 2 files changed, 54 insertions(+), 46 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> >>>>>> index c8a34476570a..23bfbc61a494 100644
> >>>>>> --- a/drivers/gpu/drm/drm_irq.c
> >>>>>> +++ b/drivers/gpu/drm/drm_irq.c
> >>>>>> @@ -74,6 +74,33 @@ module_param_named(vblankoffdelay, drm_vblank_offdelay, int, 0600);
> >>>>>> module_param_named(timestamp_precision_usec, drm_timestamp_precision, int, 0600);
> >>>>>> module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 0600);
> >>>>>>
> >>>>>> +static void store_vblank(struct drm_device *dev, int crtc,
> >>>>>> + unsigned vblank_count_inc,
> >>>>>> + struct timeval *t_vblank)
> >>>>>> +{
> >>>>>> + struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
> >>>>>> + u32 tslot;
> >>>>>> +
> >>>>>> + assert_spin_locked(&dev->vblank_time_lock);
> >>>>>> +
> >>>>>> + if (t_vblank) {
> >>>>>> + tslot = vblank->count + vblank_count_inc;
> >>>>>> + vblanktimestamp(dev, crtc, tslot) = *t_vblank;
> >>>>>> + }
> >>>>>> +
> >>>>>> + /*
> >>>>>> + * vblank timestamp updates are protected on the write side with
> >>>>>> + * vblank_time_lock, but on the read side done locklessly using a
> >>>>>> + * sequence-lock on the vblank counter. Ensure correct ordering using
> >>>>>> + * memory barriers. We need the barrier both before and also after the
> >>>>>> + * counter update to synchronize with the next timestamp write.
> >>>>>> + * The read-side barriers for this are in drm_vblank_count_and_time.
> >>>>>> + */
> >>>>>> + smp_wmb();
> >>>>>> + vblank->count += vblank_count_inc;
> >>>>>> + smp_wmb();
> >>>>>
> >>>>> The comment and the code are each self-contradictory.
> >>>>>
> >>>>> If vblank->count writes are always protected by vblank_time_lock
> >>>>> (something I
> >>>>> did not verify but that the comment above asserts), then the trailing
> >>>>> write
> >>>>> barrier is not required (and the assertion that it is in the comment is
> >>>>> incorrect).
> >>>>>
> >>>>> A spin unlock operation is always a write barrier.
> >>>>
> >>>> Hm yeah. Otoh to me that's bordering on "code too clever for my own
> >>>> good".
> >>>> That the spinlock is held I can assure. That no one goes around and does
> >>>> multiple vblank updates (because somehow that code raced with the hw
> >>>> itself) I can't easily assure with a simple assert or something similar.
> >>>> It's not the case right now, but that can change.
> >>>
> >>> The algorithm would be broken if multiple updates for the same vblank
> >>> count were allowed; that's why it checks to see if the vblank count has
> >>> not advanced before storing a new timestamp.
> >>>
> >>> Otherwise, the read side would not be able to determine that the
> >>> timestamp is valid by double-checking that the vblank count has not
> >>> changed.
> >>>
> >>> And besides, even if the code looped without dropping the spinlock,
> >>> the correct write order would still be observed because it would still
> >>> be executing on the same cpu.
> >>>
> >>> My objection to the write memory barrier is not about optimization;
> >>> it's about correct code.
> >>
> >> Well diff=0 is not allowed, I guess I could enforce this with some
> >> WARN_ON. And I still think my point of non-local correctness is solid.
> >> With the smp_wmb() removed the following still works correctly:
> >>
> >> spin_lock(vblank_time_lock);
> >> store_vblank(dev, crtc, 1, ts1);
> >> spin_unlock(vblank_time_lock);
> >>
> >> spin_lock(vblank_time_lock);
> >> store_vblank(dev, crtc, 1, ts2);
> >> spin_unlock(vblank_time_lock);
> >>
> >> But with the smp_wmb(); removed the following would be broken:
> >>
> >> spin_lock(vblank_time_lock);
> >> store_vblank(dev, crtc, 1, ts1);
> >> store_vblank(dev, crtc, 1, ts2);
> >> spin_unlock(vblank_time_lock);
> >>
> >> because the compiler/cpu is free to reorder the store for vblank->count
> >> _ahead_ of the store for the timestamp. And that would trick readers into
> >> believing that they have a valid timestamp when they potentially raced.
> >>
> >> Now you're correct that right now there's no such thing going on, and it's
> >> unlikely to happen (given the nature of vblank updates). But my point is
> >> that if we optimize this then the correctness can't be proven locally
> >> anymore by just looking at store_vblank, but instead you must audit all
> >> the callers. And leaking locking/barriers like that is too fragile design
> >> for my taste.
> >>
> >> But you insist that my approach is broken somehow and dropping the smp_wmb
> >> is needed for correctness. I don't see how that's the case at all.
>
> Daniel,
>
> I've been really busy this last week; my apologies for not replying promptly.
>
> > Fwiw, i spent some time reeducating myself about memory barriers (thanks
> > for your explanations) and thinking about this, and the last version of
> > your patch looks good to me. It also makes sense to me to leave that last
> > smb_wmb() in place to make future use of the helper robust - for non-local
> > correctness, to avoid having to audit all future callers of that helper.
>
> My concern wrt unnecessary barriers in this algorithm is that the trailing
> barrier now appears mandatory, when in fact it is not.
>
> Moreover, this algorithm is, in general, fragile and not designed to handle
> random or poorly-researched changes.
Less fragility is exactly why I want that surplus barrier. But I've run
out of new ideas for how to explain that ...
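Maybe a sketch helps. Below is a hypothetical userspace rendering of the
write side (names like store_vblank_sketch and VBLANK_SLOTS are made up
for illustration; smp_wmb() is approximated with the GCC full barrier
__sync_synchronize(), and the timestamp ring indexing stands in for the
vblanktimestamp() macro). The point is that with the trailing barrier,
back-to-back calls stay correct without auditing the callers for an
intervening spin_unlock:

```c
#include <assert.h>
#include <string.h>

/* Userspace stand-in for the kernel primitive (assumption: the GCC
 * builtin full barrier is at least as strong as smp_wmb()). */
#define smp_wmb() __sync_synchronize()

#define VBLANK_SLOTS 4 /* hypothetical ring size */

struct vblank_state {
	unsigned long count;
	struct { long sec, usec; } time[VBLANK_SLOTS];
};

/* Sketch of store_vblank(): store the timestamp for the *new* count
 * first, then publish the count, with barriers on both sides of the
 * counter update. */
static void store_vblank_sketch(struct vblank_state *vbl,
				unsigned inc, long sec, long usec)
{
	unsigned tslot = (vbl->count + inc) % VBLANK_SLOTS;

	vbl->time[tslot].sec = sec;
	vbl->time[tslot].usec = usec;

	/* Pairs with the read side: the timestamp store must be visible
	 * before the new counter value. */
	smp_wmb();
	vbl->count += inc;
	/* The contested trailing barrier: keeps the counter store ahead
	 * of the *next* timestamp store, so correctness stays local to
	 * this helper even if a caller invokes it twice under one lock. */
	smp_wmb();
}
```

A second call immediately after the first simply fills the next slot; no
caller-side ordering knowledge is required.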
> For example, if only the read and store operations are considered, it's
> obviously
> unsafe, since a read may unwittingly retrieve a store in progress.
>
>
> CPU 0 | CPU 1
> |
> /* vblank->count == 0 */
> |
> drm_vblank_count_and_time() | store_vblank(.., inc = 2, ...)
> |
> cur_vblank <= LOAD vblank->count |
> | tslot = vblank->count + 2
> | /* tslot == 2 */
> | STORE vblanktime[0]
This line here is wrong, it should be "STORE vblanktime[2]".
The "STORE vblanktime[0]" happened way earlier, before the two smp_wmb
barriers and the previous update of vblank->count.
I'm also somewhat confused about how you draw a line across both cpus for
barriers, because barriers only have cpu-local effects (which is why we
always need a barrier on both ends of a transaction).
In short I still don't follow what's wrong.
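For reference, here's roughly how I picture the read side pairing up with
the write-side barriers (a hypothetical userspace sketch of the
drm_vblank_count_and_time loop; read_count_and_time_sketch and
VBLANK_SLOTS are made-up names, and smp_rmb() is approximated with the
GCC full barrier __sync_synchronize()):

```c
#include <assert.h>
#include <string.h>

/* Userspace stand-in for the kernel primitive (assumption: the GCC
 * builtin full barrier is at least as strong as smp_rmb()). */
#define smp_rmb() __sync_synchronize()

#define VBLANK_SLOTS 4 /* hypothetical ring size */

struct vblank_state {
	unsigned long count;
	struct { long sec, usec; } time[VBLANK_SLOTS];
};

/* Seqlock-style lockless read: sample the counter, read the matching
 * timestamp slot, then re-check the counter.  If a store_vblank() ran
 * in between, the two counter samples differ and we retry instead of
 * returning a half-updated timestamp. */
static unsigned long read_count_and_time_sketch(const struct vblank_state *vbl,
						long *sec, long *usec)
{
	unsigned long cur;

	do {
		cur = vbl->count;
		/* Pairs with the writer's smp_wmb() *after* the counter
		 * update (the barrier the read side was missing). */
		smp_rmb();
		*sec = vbl->time[cur % VBLANK_SLOTS].sec;
		*usec = vbl->time[cur % VBLANK_SLOTS].usec;
		/* Pairs with the writer's smp_wmb() *before* the counter
		 * update. */
		smp_rmb();
	} while (cur != vbl->count);

	return cur;
}
```

With both read-side barriers in place, a reader that races a writer can
only exit the loop with a counter/timestamp pair that was published
together.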
-Daniel
> - - - - - - - - - - - - - - - - - - - - - smp_wmb() - - - - - - - - - -
> /* cur_vblank == 0 */ |
> local <= LOAD vblanktime[0] |
> smp_rmb - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> |
> * cpu has loaded the wrong timestamp * |
> |
> local <= LOAD vblank->count |
> cur_vblank == local? |
> yes - exit loop |
> | vblank->count += 2
> - - - - - - - - - - - - - - - - - - - - - smp_wmb() - - - - - - - - - -
>
> Regards,
> Peter Hurley
>
>
> > I also tested your patch + a slightly modified version of Chris vblank
> > delayed disable / instant query patches + my fixes using my own stress
> > tests and hardware timing test equipment on both intel and nouveau, and
> > everything seems to work fine.
> >
> > So i'm all for including this patch and it has my
> >
> > Reviewed-and-tested-by: Mario Kleiner <mario.kleiner.de at gmail.com>
> >
> > I just sent out an updated version of my patches, so they don't conflict
> > with this one and also fix a compile failure of drm/qxl with yours.
> >
> > Thanks,
> > -mario
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch