On 03/28/2017 01:02 PM, Pekka Paalanen wrote:
Hi Mario,

I'm glad to hear from you, it's this kind of detail we really
appreciate you for. :-)

On Tue, 28 Mar 2017 00:59:41 +0200
Mario Kleiner <[email protected]> wrote:

Hi Daniel & Pekka,

finally managed to go through the whole patch series, updated my own
application to current Wayland/Weston and tested it a bit. I like it! It
would have gotten my Reviewed-by's if I had actually managed to review
it in a more reasonable time before the merge ;).

The only thing I would suggest is to make the time window for the
output update coalescing somewhat tighter, mostly to prevent repaint
deadlines from shifting.

In weston_output_maybe_repaint() you have this...

        msec_to_repaint = timespec_sub_to_msec(&output->next_repaint, now);
        if (msec_to_repaint > 1)
                return ret;

...for skipping an output from coalescing. Given that
timespec_sub_to_msec floor()s to full msecs, that means you'd accept a
next_repaint almost 2 msecs (~1.999999 msecs) into the future, so for
the candidate output that would be like moving the repaint window
deadline ~2 msecs closer to the start of a refresh cycle, cutting more
clients off earlier from getting their surface updates out for the next
vblank.

True, I think. Ideally we'd have a tunable (automatically chosen)
variable for the coalescing window.

After return from weston_output_repaint() you call ...

weston_compositor_read_presentation_clock(...&now)

... again to update "now", so if weston_output_repaint() for the current
output involves some serious compositing work, you shift the "now" point
for the following outputs in the compositor's output list further into
the future. Depending on where an output is in compositor->output_list,
such delays can add up and move the repaint window deadline for later
outputs closer by more than 2 msecs. I think that's not so good for
predictability if the position of an output in the output_list, and the
potentially varying composition workload on preceding outputs, can shift
the repaint deadlines for later outputs by a large amount and defer an
actual coalesced update for all outputs caught up in this even further.

I think I see what you mean. If an output is first in the list, it
might get postponed, while if it is last in the list, the updates to
'now' may cause it to get repainted now instead of postponed. And this
is arbitrary, although it should be "stable" (we don't reorder the
list). Except it probably isn't stable as different CRTCs run async,
which means the timings (relative cycle phases) change all the time.

I could argue that outputs later in the list should not be postponed
like they would be if they were first in the list, in case the other
repaint programming has actually taken significant time. Postponing
would mean it will be even later than it already is. But it also makes
the repaint/postpone decision less deterministic.

Considering we don't know how long it'll take to program the repaint
of any output, I'm not sure what we can do. From the determinism point
of view, and with the assumption that simply programming the repaints
(gl-renderer) should not take a significant amount of time, I would
agree to not update 'now' in the middle of the list traversal.


Yes, I think we agree. Of course, once we have a presentation_queue extension, clients could tell the compositor their exact timing requirements, so the compositor would have more info to work with wrt. deadlines and what related stuff to group together or skip. But at the moment the best we can do is to provide clients with some predictable and relatively stable deadlines for an output.

So I'd probably drop that
weston_compositor_read_presentation_clock(...&now) to prevent this kind
of drift? And make the msec_to_repaint deadline more like >= 1 instead
of > 1 to limit the time window to at most 1 msec?

Ideally we'd probably have timers with better than 1 msec granularity to
deal with high refresh rate displays. The underlying timerfd api for
wl_event_source_timer_update() seems to support nsecs resolution.

Indeed. However, I'm not sure we want to promote the libwayland-server
event loop API by adding another timer function that works with
nanoseconds or with absolute timestamps; I'd rather see Weston migrate
to a proper event loop library. But that's a whole other story.

Gaming or stereo panels with 144 Hz or 165 Hz refresh are now becoming
more common. One of my users already uses a commercially available 240
Hz BenQ Zowie panel for reliable fullscreen high-speed animations under
X11 with the FOSS graphics stack. Refresh durations of only ~4 msecs
are now a thing, and shifting repaint deadlines by 2 msecs or more would
have a significant impact there.

That's a very good point. We should probably move from milliseconds to
nanoseconds.

I assume a very important intended use case for this output coalescing
is to make sure that outputs which are tightly genlocked/synchronized in
their video refresh cycles will really update/page-flip somewhat
reliably together and do so efficiently, e.g., if this is implemented on
top of some solid atomic flip kms-driver support? Stuff like
stereoscopic 3D output to two separate genlocked outputs for the two
eyes (3D cinema, medical/science applications, advanced VR/AR HMDs). Or
even multi-display walls or VR CAVE environments with > 2 outputs? For
such apps one would assume the outputs are tightly synchronized, so even
a < 1 msec window for coalescing should be fine.

Quite right about the atomic KMS. We want to coalesce all reasonably
possible output updates into a single KMS atomic commit, and not just
for genlocked CRTCs but any time vblanks happen close enough.

I believe the fundamental reason for it is to coalesce all video mode
changes into a single commit, particularly at start-up.

If you think about typical VR setups, like 1 output driving a
regular desktop GUI display and the other driving a consumer
VR HMD like the HTC Vive or Oculus Rift, we'd also want to make
sure the repaint behavior of the output driving the HMD is very
predictable and stable, even in the presence of some activity on the
regular non-synchronized desktop screen, so apps can minimize
motion-to-photon latency. Output coalescing which too liberally
coalesces outputs that are unrelated in their refresh cycles could hurt
such applications quite a bit. Or it could create funny beat patterns
when the outputs' refresh cycles drift against each other (60 Hz vs.
75/90/144 Hz) and the compositor alternates between coalescing updates
together and treating them separately, in a way that could cause
hard-to-understand or hard-to-avoid frame drops?

Yeah, the beating is in the latency and deadlines, which means the
probability to accidentally miss deadlines may rise.


Yes.

However, it seems that VR/HMDs are going to go a very different route:
there is work underway by Keith Packard and others to split DRM KMS
resources in the kernel, offering a sub-node with which another process
aside the regular display server could be driving the pageflips
directly. See:
https://keithp.com/blogs/Valve/

There are also other developments, like priority based, pre-emptive
task scheduling in the GPUs, which I'm told AMD already has written:
https://lists.freedesktop.org/archives/amd-gfx/2017-March/006106.html

Here's a related IRC log, between 06:50 and 07:31:
https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=dri-devel&highlight_names=&date=2017-03-28


Yep, I read keithp's Valve announcement with some excitement, as that would be very useful for my type of application as well, especially if there were a dynamic way to split off such resources. Currently I solve the problem of mixed GUI + fullscreen app multi-display setups by setting up separate X-Screens + ZaphodHeads, to statically configure X-Screens/outputs that are mostly left alone by the desktop environments. That works well in terms of timing/performance, but then there's the fun of guiding users through the mysteries of xorg.conf, bugs in not-so-widely-used code paths in X/ddx/mesa/..., static configuration...
The IRC log is interesting to me, thanks!

Before I was aware of those plans, I wrote an idea for Wayland:
https://gist.github.com/ppaalanen/e0d2744ff71188e9a294726a37ca83c2

That is not far from the suggestion you sent a long time ago for
optimizing the latency of fullscreen apps hitting the direct scanout
path, except we'd just have better guarantees and less heuristics with
HMDs.


I'll have to look at your proposal in detail later. Interesting developments. :)

I've read that the latest Vulkan spec now includes a
VK_GOOGLE_display_timing extension, intended for VR apps, which is
pretty close to what was proposed for Wayland's presentation_queue
extension or VDPAU's frame scheduling for video playback.

Comments more to the point of patch 10/11 below...

On 03/13/2017 01:48 PM, Pekka Paalanen wrote:
On Fri, 10 Mar 2017 14:52:58 +0000
Daniel Stone <[email protected]> wrote:

Hi Pekka,

On 10 March 2017 at 13:41, Pekka Paalanen
<[email protected]> wrote:
On Wed,  1 Mar 2017 11:34:09 +0000 Daniel Stone <[email protected]> wrote:
       * the deadline given by repaint_msec? In that case we delay until
       * the deadline of the next frame, to give clients a more predictable
       * timing of the repaint cycle to lock on. */
-     if (presented_flags == WP_PRESENTATION_FEEDBACK_INVALID && msec_rel < 0)
-             timespec_add_nsec(&output->next_repaint, &output->next_repaint,
-                               refresh_nsec);
+     if (presented_flags == WP_PRESENTATION_FEEDBACK_INVALID &&
+         msec_rel < 0) {
+             while (timespec_sub_to_nsec(&output->next_repaint, &now) < 0) {
+                     timespec_add_nsec(&output->next_repaint,
+                                       &output->next_repaint,
+                                       refresh_nsec);
+             }
+     }

 out:
      output->repaint_status = REPAINT_SCHEDULED;

the very point is to postpone the repaint to a future time so that
clients would not be racing each other any more than they do when the
repaint loop is continuously active. In that sense this is a correct
change. It just seems a bit awkward to use a loop instead of computing
the rounded number of frame periods by division.

Sure, not that difficult, just a little ugly:
int n_periods = !!(msec_rel % (refresh_nsec * NSEC_TO_MSEC));
n_periods += msec_rel / (refresh_nsec * NSEC_TO_MSEC);

Maybe I'll send my own proposal for this, if I can come up with
something nicer.

However, what strikes me as odd here is that this is done only when
coming from start_repaint_loop(). How could start_repaint_loop() cause
weston_output_finish_frame() to be called with a timestamp that is
older than the most recent vblank?

No idea: if you look at the diff, the original code did the exact same
thing, but only adjusted by a maximum of one period. I agree with you
that it would probably seem better to postpone in all cases, giving
sheer predictability (always respecting the repaint window exactly)
for speed. But I'd like some more input before changing this, and I'd
like to change it in a separate patch to this.

Ok, so I think I can clarify this. The point of this routine wasn't to
cope with massive scheduling delays / system overload or other mild
disasters, but only to make sure that clients always see a predictable
repaint timing, regardless of whether the repaint loop was/is running
when they make their surface.commit, or the output repaint was idle and
is just restarting the repaint loop, or the specific client is the
reason/trigger for the restart of the repaint loop. This is to make sure
that timing-sensitive clients can "latch" onto some stable compositor
timing. See commit b7df04ebc048e18417a13d577d977d6e36c8a8ca

If the repaint loop is already running when a client requests an update,
then the repaint window will define the cutoff point / deadline for
deciding whether a client's update makes it for the next vblank, or gets
deferred to the next frame.

If the repaint loop was idle, clients should see the same cutoff
deadline as if it were running. So, two cases for that:

- If the first client in that refresh cycle triggered
weston_output_schedule_repaint() -> output->start_repaint_loop ->
weston_output_finish_frame while the output, in its current refresh
cycle, is before the repaint window deadline, then you'll see msec_rel
>= 0 and the output should perform a repaint for that client (and
possibly others which commit before the repaint window deadline),
targeting the next vblank.

- If the first client in that refresh cycle triggered a restart of the
repaint loop after the repaint window cutoff deadline, then you get
msec_rel < 0 in both the old and the current code, and the client's
update should get deferred to the vblank after the next vblank, just as
would have been the case if the repaint loop had been running all the
time. This is what that if-statement => next_repaint += refresh_nsec is
supposed to do.

What we want to prevent, by shifting the repaint by 1 frame specifically
during restart of the repaint loop if msec_rel < 0, is this: one client
triggers a restart too late in the frame (= after the repaint window
deadline), its repaint misses the intended next vblank (n) due to the
lateness, and a kms page-flip therefore skips to the vblank (n+1) after
this next one. But now all other potential latecomers from this frame
(targeting vblank n), and all the clients who wanted to commit an update
properly in time for the next refresh cycle (n+1), get delayed by a full
frame while a kms page-flip waits for the (n+1) vblank and only then
triggers weston_output_finish_frame for an update at vblank (n+2)
earliest. You'd punish clients which are in time with an extra 1 frame
delay (earliest processing/completion at n+2 instead of n+1), and all
other latecomers with an extra 2 frame delay (n+2 vs. n), while at the
same time not achieving an improvement for the first latecomer, which
will likely still skip to n+1 instead of n.

This would be a common scenario, e.g., for a 16 msec refresh duration
and our default repaint window of 7 msecs: a 7/16 ≈ 44% probability of a
random client triggering this.

While the above explanation is true and correct as far as I understand,
it does not answer my question. :-)


Oops! :)

The question was: when would the postponing adjustment be more than one
refresh period? Why need the loop, why is not a one-shot adjustment
enough?

start_repaint_loop() is specifically written to check that the
timestamp from the immediate vblank query is not several periods in the
past. So the only time I see adjustment loop spinning more than once is
if we fell back to a flip in start_repaint_loop() and for some reason
are extremely late in processing the resulting page flip event, or the
page flip event carries a bad timestamp.


Agreed. The adjustment can never be more than 1 refresh period, ergo it doesn't need a loop, unless we'd hit the very unlikely case of Weston getting preempted for a long time exactly during the few instructions leading from output->start_repaint_loop to weston_output_finish_frame reading the compositor clock. In that case we'd glitch badly anyway. I can't think of a way, apart from a kms driver bug, that the pageflip event could carry a bad timestamp, and for the pageflip-driven path we don't hit that one-shot adjustment (or potential loop) anyway, as the flags aren't WP_PRESENTATION_FEEDBACK_INVALID.


Unfortunately, I also conflated another topic in the same discussion:
if the delay sanity check fails, should we postpone also then.

I guess if it failed due to msecs_rel < -1000 we should treat it as a repaint loop restart, setting

if (msecs_rel < -1000)
        presented_flags = WP_PRESENTATION_FEEDBACK_INVALID;

I can't think of a way msecs_rel could exceed 1000 unless refresh_nsec were literally >= 1 second, in which case next_repaint = now would be the only safe choice, i.e.:

if (msecs_rel < -1000)
        presented_flags = WP_PRESENTATION_FEEDBACK_INVALID;
else
        output->next_repaint = now;



Sure!

Did you see a case where it happened?

No, it's just something that had been bugging me ever since I started
looking into repaint. Found purely by inspection.

I think it would only happen with a) broken driver reporting bad
timestamps with pageflip events (we already verify timestamps from the
vblank query and fall back to pageflip otherwise), or b) something
stalled weston for too long.

a) would be a bug we want to log. b) might be a bug we want to log, or
just high system load in general which we wouldn't want to log, really.
Unfortunately there is no way to tell.

There is the original reason we want to ensure the target time is in
the future, and there are no cases where doing so would be bad.
Therefore I think this is a good idea.


I'm not sure if generally ensuring that the target time is in the future
helps, hurts, or doesn't matter in practice. That code wasn't meant to
deal with overload/scheduling delays. I assume you can only get larger
delays during a repaint loop restart, or during an already running
repaint loop on system overload, and in that case all predictability is
probably out the window and the whole desktop will glitch anyway. One
could argue that it would be better to just accept defeat and get an
output repaint out as fast as possible, so that not too many client
schedule_repaint requests back up during the extra time given to them by
shifting the repaint deadline further into the future, possibly making
the glitch longer. Not sure if that would matter in practice, though.

That, indeed, I agree with. It should not matter, things have already
fallen apart if the while-loop would spin more than once.

That still leaves the question: should this patch be merged? :-)

I tend towards yes for the merge. I think it can't hurt in the "restart repaint loop" case. It won't help avoid the big glitch which already happened, but it gets us back to well-defined behavior after the glitch, and it may help debugging at some point if we know what next_repaint to expect wrt. display timing, especially if one needs to compare Weston debug output with drm/kms timestamping debug output. In that sense the while loop is easy to understand, and even if it inefficiently spun many times, that would likely be nothing compared to the time lost during the big glitch/preemption.

thanks,
-mario



Thanks,
pq

I'd like to get rid of the loop if it doesn't make things too
messy, even when the "insanity check" already ensures we don't
loop too long.

As long as the above looks reasonable, then sure; happy to expand the
!! to a ternary or something as well.

Failed insanity check makes the target time 'now'. Should that also be
subject to postponing for a period? I think it would be more
consistent. That is, make the comparison <= 0.

Er, unsure. Probably?

What were the arguments that made you think this is an undesirable patch?
I couldn't come up with any.

None per se. I flagged it as such because a) it isn't required to
achieve the actual goal of the patchset, b) given the somewhat
intricate nature of the code in question, I wasn't sure if it was
deliberate or not, and c) I only noticed it by inspection and have
never observed it myself. But it does indeed seem to be correct, so
I'll take that as a positive comment, and maybe wait to see what Mario
says before we think about merging it.

Yeah, let's leave this brewing for a bit.


Thanks,
pq


_______________________________________________
wayland-devel mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/wayland-devel
