On 20/02/14 12:07, Pekka Paalanen wrote:
Hi Mario,
OK, now I am magically subscribed. Thanks to the moderator!
I have replies to your comments below, but while reading what you said,
I started wondering whether Wayland would be good for you after all.
It seems that for your timing-sensitive experiment programs, you and
your developers put a great deal of effort into
- detecting the hardware and drivers,
- determining how the display server works, so that you can
- try to make it do exactly what you want, and
- detect if it still does not do exactly what you want and bail,
while also
- trying to make sure you get the right timing feedback from the kernel
unmangled.
Yes. It's "trust buf verify". If i know that the api / protocol is well
defined and suitable for my purpose and have verified that at least the
reference compositor implements the protocol correctly then i can at
least hope that all other compositors are also implemented correctly, so
stuff should work as expected. And i can verify that at least some
subset of compositors really works, and try to submit bug reports or
patches if they don't.
Sounds like the display server is a huge source of problems to you, but
I am not quite sure how running on top of a display server benefits you.
Your experiment programs want to be in precise control, get accurate
timings, and they are always fullscreen. Your users / test subjects
never switch away from the program while it's running, you don't need
windowing or multi-tasking, AFAIU, nor any of the application
interoperability features that are the primary features of a display
server.
They are fullscreen and timing sensitive in probably 95% of all typical
application cases during actual "production use" while experiments are
run. But some applications need the toolkit to present in regular
windows and GUI thingys, and a few even need compositing to combine my
windows with windows of other apps. Some setups run multi-display, where
some displays are used for fullscreen stimulus presentation to the
tested person, but another display may be used for control/feedback or
during debugging by the experimenter, in which case the regular desktop
GUI and the UI of the scripting environment are needed on that display.
One popular case during debugging is having a half-transparent
fullscreen window for stimulus presentation, with the whole regular GUI,
code editor and debugger behind that window, so one can set breakpoints
etc. The window is made transparent for mouse and keyboard input, so
users can interact with the editor.
So in most cases I need a display server running, because I sometimes
need compositing, and I often need a fully functional GUI during the at
least 50% of the work time that users spend debugging and testing their
code - and they also don't want to be separated from their e-mail
clients and web browsers etc. during that time.
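If I read the core protocol right, the click-through part at least maps
to just an empty input region on the surface - roughly like this, as a
sketch only (binding of wl_compositor and creation of the surface are
omitted):

#include <wayland-client.h>

/* Sketch: make the stimulus surface "transparent" for input by giving it
 * an empty input region, so pointer/touch input goes to whatever is
 * underneath (keyboard focus is up to the compositor's focus policy). */
static void make_click_through(struct wl_compositor *compositor,
                               struct wl_surface *surface)
{
    /* A newly created region is empty. */
    struct wl_region *empty = wl_compositor_create_region(compositor);

    wl_surface_set_input_region(surface, empty);
    wl_region_destroy(empty);
    wl_surface_commit(surface);
}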
Why not take the display server completely out of the equation?
I understand that some years ago, it would probably not have been
feasible and X11 was the de facto interface to do any graphics.
However, it seems you are already married to DRM/KMS so that you get
accurate timing feedback, so why not port your experiment programs
(the framework) directly on top of DRM/KMS instead of Wayland?
Yes and no. DRM/KMS will be the most often used backend and is the best
bet if I need timing control, and it's the one I'm most familiar with. I
also want to keep the option of running on other backends if timing is
not of much importance, or if it can be improved on them, should the
need arise.
With Mesa EGL and GBM, you can still use hardware-accelerated OpenGL if
you want to, but you will also be in explicit control of when you push
the rendered buffer into KMS for display. Software rendering by direct
pixel poking is also possible, and in the end you just push that buffer
to KMS as usual too. You do not need any graphics-card-specific code; it
is all abstracted in the public APIs offered by Mesa and libdrm, e.g.
GBM. The new libinput should make hooking into input devices much less
painful, etc. All this is thanks to Wayland, because on Wayland there is
no single "the server" like the X.org X server is. There will be lots of
servers, and each one needs the same infrastructure you would need to
run without a display server.
No display server obfuscating your view to the hardware, no
compositing manager fiddling with your presentation, and most likely no
random programs hogging the GPU at random times. Would the trade-off
not be worth it?
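Just to sketch the shape of it (a sketch only: mode setting via
drmModeSetCrtc(), the actual EGL/GL rendering into the buffer, and all
error handling are omitted, and crtc_id/width/height are assumed to come
from drmModeGetResources()/drmModeGetConnector()):

#include <fcntl.h>
#include <stdint.h>
#include <gbm.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

/* Called from drmHandleEvent() with the kernel's pageflip completion
 * timestamp - the kind of feedback you want to trust for timing. */
static void page_flip_done(int fd, unsigned int frame,
                           unsigned int sec, unsigned int usec, void *data)
{
}

static int present_one_buffer(uint32_t crtc_id, uint32_t width,
                              uint32_t height)
{
    int fd = open("/dev/dri/card0", O_RDWR | O_CLOEXEC);

    /* Allocate a scanout-capable buffer; EGL/GL could render into it. */
    struct gbm_device *gbm = gbm_create_device(fd);
    struct gbm_bo *bo = gbm_bo_create(gbm, width, height,
                                      GBM_FORMAT_XRGB8888,
                                      GBM_BO_USE_SCANOUT |
                                      GBM_BO_USE_RENDERING);

    uint32_t fb_id;
    drmModeAddFB(fd, width, height, 24, 32,
                 gbm_bo_get_stride(bo), gbm_bo_get_handle(bo).u32, &fb_id);

    /* Queue the flip; completion arrives as an event on the DRM fd. */
    drmModePageFlip(fd, crtc_id, fb_id, DRM_MODE_PAGE_FLIP_EVENT, NULL);

    drmEventContext evctx = {
        .version = DRM_EVENT_CONTEXT_VERSION,
        .page_flip_handler = page_flip_done,
    };
    drmHandleEvent(fd, &evctx);  /* blocks (on a blocking fd) until done */
    return 0;
}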
I thought about EGL/GBM etc. as a last resort for especially demanding
cases, timing-wise. But given that the good old X-Server was good enough
for almost everything so far, I'd expect Wayland to perform as well as
or better timing-wise. If that turns out to be true, it would be good
enough for hopefully almost all use cases, with all the benefits of
compositing and GUI support when needed, and without having to
reimplement my own display server. E.g., I'm also using GStreamer as
media backend (still 0.10 though), and while there is Wayland
integration, I doubt there will be Psychtoolbox integration. Or things
like Optimus-style hybrid graphics laptops with one GPU rendering and
the other GPU displaying. I assume Wayland will tackle this stuff at
some point, whereas I'm not too keen on potentially learning and
tackling all the ins and outs of render nodes or dma-buf juggling
myself.
Of course your GUI tools and apps could continue using a display server
and would probably like to be ported to be Wayland compliant, I'm just
suggesting this for the sensitive experiment programs. Would this be
possible for your infrastructure?
Not impossible when absolutely needed, just inconvenient for the user,
plus yet another display backend for me to maintain and test.
Psychtoolbox is a set of extensions for both GNU/Octave and Mathworks
Matlab - both use the same scripting language, so users can choose which
to use and have portable code between them. Matlab uses a GUI based on
Java/AWT/Swing on top of X11, whereas Octave just gained a Qt-based GUI
at the beginning of this year. You can run both apps in a terminal or
console, but most of my users are psychologists/neuro-biologists/
physicians, most of them with only basic programming skills and somewhat
frightened by command line environments. Most would touch a non-GUI
environment only in moments of highest despair, let alone learn how to
switch between a GUI environment and a console.
On Thu, 20 Feb 2014 04:56:02 +0100
Mario Kleiner <[email protected]> wrote:
On 17/02/14 14:12, Pekka Paalanen wrote:
On Mon, 17 Feb 2014 01:25:07 +0100
Mario Kleiner <[email protected]> wrote:
Hello Pekka,
I'm not yet subscribed to wayland-devel, and a bit short on time atm.,
so I'll take a shortcut via direct e-mail for some quick feedback on
your Wayland presentation extension v2.
Hi Mario,
I'm very happy to hear from you! I have seen your work fly by on
dri-devel@ (IIRC) mailing list, and when I was writing the RFCv2 email,
I was thinking whether I should personally CC you. Sorry I didn't. I
will definitely include you on v3.
I hope you don't mind me adding wayland-devel@ to CC, your feedback is
much appreciated and backs up my design nicely. ;-)
Hi again,
still not subscribed, but maybe the server accepts the e-mail to
wayland-devel anyway, as I'm subscribed to other xorg lists? I got an
"invalid captcha" error when trying to subscribe, even though no captcha
was ever presented to me. You may need to cc wayland-devel again for me.
I guess the web interface is still down. Emailing
[email protected] should do, if I recall the
address correctly. And someone might process the moderation queue, too.
No worries anyway, I won't cut anything from you out, so it's all
copied below.
1. Wrt. an additional "preroll_feedback" request
<http://lists.freedesktop.org/archives/wayland-devel/2014-January/013014.html>,
essentially the equivalent of glXGetSyncValuesOML(), that would be very
valuable to us.
...
Indeed, the "preroll_feedback" request was modeled to match
glXGetSyncValuesOML.
Do you need to be able to call GetSyncValues at any time and have it
return ASAP? Do you call it continuously, and even between frames?
Yes, at any time, even between frames, with a return ASAP. This is
driven by the specific needs of user code, e.g., to poll for a vblank,
or to establish a baseline of the current (msc, ust) to time other stuff
relative to. Psychtoolbox is an extension to a scripting language, so
user code often decides how this is used.
Internally to the toolkit these calls are used on X11/GLX to translate
target timestamps into target vblank counts for glXSwapBuffersMscOML(),
because OML_sync_control is based on vblank counts, not absolute system
time, as you know, but ptb exposes an API where user code specifies
target timestamps, like in your protocol proposal. The query is also
needed to work around some problems with the blocking nature of the X11
protocol when one tries to swap multiple independent windows with
different rates and uses glXWaitForSbcOML to wait for swap completion.
E.g., what doesn't work on X11 is using different X display connections
- one per window - and creating GLX contexts which share resources
across those connections; so if you need multi-window operation, you
have to create all GLX contexts on the same X display connection and run
all GLX calls over that connection. If you run multiple independent
animations in different windows, you have to avoid blocking that
connection, so I use glXGetSyncValuesOML to poll the current msc and ust
to find out when it is safe to do a blocking call.
I hope that the way Wayland's protocol works will make some of these
hacks unnecessary, but user script code can call glXGetSyncValuesOML at
any time.
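Roughly like this, just as a sketch - assuming the refresh duration has
been measured beforehand, and with the usual glXGetProcAddress()
plumbing and error handling left out:

#include <math.h>
#include <stdint.h>
#include <GL/glx.h>
#include <GL/glxext.h>  /* OML_sync_control (GLX_GLXEXT_PROTOTYPES) */

/* Translate a user-provided target time (UST, microseconds) into a
 * target vblank count for OML_sync_control. 'refresh_us' is the measured
 * refresh duration in microseconds. */
static void swap_at_time(Display *dpy, GLXDrawable win,
                         int64_t target_ust, double refresh_us)
{
    int64_t ust, msc, sbc;

    /* Cheap poll of the current (ust, msc, sbc) triplet - callable at
     * any time, also between frames. */
    glXGetSyncValuesOML(dpy, win, &ust, &msc, &sbc);

    /* Round up, so the swap never happens earlier than requested. */
    int64_t target_msc = msc;
    if (target_ust > ust)
        target_msc += (int64_t)ceil((double)(target_ust - ust) / refresh_us);

    glXSwapBuffersMscOML(dpy, win, target_msc, 0, 0);
}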
Wayland won't allow you to use different connections either - even more
strictly than X11, because there simply are no sharable references to
protocol objects. But OTOH, Wayland only provides you a low-level,
direct, async interface to the protocol as a library, so you will be
doing all the blocking in your app or GUI toolkit.
Yes. If I understood correctly, with Wayland rendering is separated from
presentation: all rendering and buffer management is client-side, only
buffer presentation is server-side. So I hope I'll be able to untangle
everything rendering related (like how many OpenGL contexts to have and
what resources they should or should not share) from everything
presentation related. If the interface is async and has thread-safety in
mind, that should hopefully help a lot on the presentation side.
The X11 protocol and at least Xlib is not async and not thread-safe by
default, and at least under DRI2 the X server controls and owns the
back- and front-buffers, so you have lots of roundtrips and blocking
behavior on the single connection to make sure no backbuffer is touched
too early (while a swap is still pending), many hacks to get around
that, and some race conditions in DRI2 around drawable invalidation.
So I'd be really surprised if Wayland weren't an improvement for me for
multi-threaded or multi-window operations.
Sounds like we will need the "subscribe to streaming vblank events
interface" then. The recommended usage pattern would be to subscribe
only when needed, and unsubscribe ASAP.
There is one detail, though. Presentation timestamp is defined as "turns
to light", not vblank. If the compositor knows about monitor latency, it
will add this time to the presentation timestamp. To keep things
consistent, we'd need to define it as a stream of turned-to-light
events.
Yes, makes sense. The drm/kms timestamps are defined as "first pixel of
the frame leaves the graphics card's output connector", aka start of
active scanout. In the time of CRT monitors, that was ~ "turns to
light". In my field CRT monitors are still very popular and actively
hunted for because of that very well defined timing behaviour - or very
expensive displays which have custom-made panel or DLP controllers with
well defined timing, built specifically for research use.
If the compositor knew the precise monitor latency, it could add that as
a constant offset to those timestamps. Do you know of reliable ways to
get this info from any common commercial display equipment? Apple's OSX
has an API in its CoreVideo framework for getting that number, and I
implement it in the OSX backend of my toolkit, but I haven't ever seen
that function return anything other than "undefined" for any display.
Or would it be enough for you to present a dummy frame, and just wait
for the presentation feedback as usual? Since you are asking, I guess
this is not enough.
Wouldn't be enough in all use cases. Especially if I want to synchronize
other modalities like sound or digital I/O, it is nice not to add
overhead or complexity on the graphics side for things not related to
rendering.
We could take it even further if you need to monitor the values
continuously. We could add a protocol interface, where you could
subscribe to an event, per-surface or maybe per-output, whenever the
vblank counter increases. If the compositor is not in a continuous
repaint loop, it could use a timer to approximately sample the values
(ask DRM), or whatever. The benefit would be that we would avoid a
roundtrip for each query, as the compositor is streaming the events to
your app. Do you need the values so often that this would be worth it?
For special applications this would be OK, as in those cases power
consumption is not an issue.
I could think of cases where this could be useful to have per-output, if
it is somewhat accurate. E.g., driving shutter glasses or similar
mechanisms in sync with refresh, or emitting digital triggers or network
packets at certain target vblank counts. Neuroscientists often have to
synchronize stimulation or recording hardware to visual stimulus
presentation, or simply log certain events, e.g., the start of refresh
cycles, so it is possible to synchronize the timing of different
devices. I hoped to use glXWaitForMscOML() for this in the past, but it
didn't work out because of the constraint that everything would have to
go over one X display connection, and that connection wasn't allowed to
block for more than fractions of a millisecond.
I really do hope something like driving shutter glasses would be done
by the kernel drivers, but I guess we're not there yet. Otherwise it'd
be the compositor, but we don't have Wayland protocol for it yet,
either. I have seen some discussion fly by about TV 3D modes.
I'd hope that as well, long-term. Until then I have to do hacks in my
client app...
On drm/kms, queuing a vblank event would do the trick without the need
for Wayland to poll for vblanks.
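I.e., something along these lines with libdrm on the compositor side (a
sketch; 'fd' is the DRM device fd, and CRTCs other than the first one
would additionally need the high-crtc/secondary bits set in the request
type):

#include <string.h>
#include <xf86drm.h>

/* Called from drmHandleEvent() with the vblank sequence number and the
 * kernel's timestamp; here one could emit a trigger, log the event, and
 * re-queue a request for the following vblank. */
static void on_vblank(int fd, unsigned int sequence,
                      unsigned int tv_sec, unsigned int tv_usec, void *data)
{
}

static void queue_vblank_event(int fd)
{
    drmVBlank vbl;

    memset(&vbl, 0, sizeof vbl);
    vbl.request.type = DRM_VBLANK_RELATIVE | DRM_VBLANK_EVENT;
    vbl.request.sequence = 1;     /* the next vblank from now */
    drmWaitVBlank(fd, &vbl);      /* with _EVENT this returns immediately */

    drmEventContext evctx = {
        .version = DRM_EVENT_CONTEXT_VERSION,
        .vblank_handler = on_vblank,
    };
    drmHandleEvent(fd, &evctx);   /* dispatches the event when it arrives */
}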
Of course I don't know how much use there would be for such
functionality outside of my slightly exotic application domain, except
maybe for "things to do with stereo shutter-glasses", like NVidia's
NVision stuff.
2. As for the decision boundary for your presentation target timestamps:
fwiw, the only precedent I know of is the NV_present_video extension,
and the way I expose it to user code in my toolkit software is that a
target timestamp means "present at that time or later, but never
earlier", i.e., at the closest vblank with tVblank >= tTargetTime.
E.g., the NV_present_video extension:
https://www.opengl.org/registry/specs/NV/present_video.txt
Iow., presentation will be delayed somewhere between 0 and 1 refresh
duration wrt. the requested target timestamp tTargetTime, or on average
half a refresh duration if one assumes a totally uniform distribution of
target times. If I understand your protocol correctly, I would need to
add half a refresh duration to achieve the same semantics, i.e., never
present earlier than requested. Not a big deal.
In the end it doesn't matter to me, as long as the behavior is well
specified in the protocol and consistent across all Wayland compositor
implementations.
Yes, you are correct, and we could define it either way. The
differences would arise when the refresh duration is not a constant.
Wayland presentation extension (RFC) requires the compositor to be able
to predict the time of presentation when it is picking updates from the
queue. The implicit assumption is that the compositor either makes its
predicted presentation time or misses it, but never manages to present
earlier than the prediction.
Makes sense.
For dynamic refresh displays, I guess that means the compositor's
prediction must be optimistic: the earliest possible time, even if the
hardware usually takes longer.
Does it sound reasonable to you?
Yes, makes sense to me. For my use case, presenting a frame too late is
ok'ish, as that is the expected behaviour - on time or delayed, but
never earlier than wanted. So an optimistic guess for dynamic refresh
would be the right thing to do when picking a frame for presentation
from the queue.
But I don't have any experience with dynamic refresh displays, so I
don't know how they would behave in practice or where you'd find them. I
read about NVidia's G-Sync stuff lately and about AMD's Freesync
variant, but are there any other solutions available? I know that some
(laptop?) panels have the option of a lower-frequency self-refresh. How
do they behave when there's suddenly activity from the compositor?
You know roughly as much as I do. You even recalled the name
Freesync. :-)
I heard Chrome(OS?) is doing some sort of dynamic scanout rate
switching based on update rates or something, so that would count as
dynamic refresh too, IMO.
I wrote it this way, because I think the compositor would be in a
better position to guess the point of time for this screen update,
rather than clients working on old feedback.
Ah, but the "not before" vs. "rounded to" is orthogonal to
predicting or not the time of the next screen update.
I assume in your use cases the refresh duration will be required to be
constant for predictability, so it does not really matter whether we
pick "not before" or "rounding" semantics for the frame target time.
Yes. Then I can always tweak my target timestamps to get the behavior I
want. For your current proposal I'd add half a refresh duration to what
user code provides me, because the advice for my users is to provide
target timestamps which are half a refresh before the target vblank for
maximum robustness. This way your "rounding" semantics would turn into
the "not before" semantics I need to expose to my users.
Exactly, and because you aim for constant refresh rate outputs, there
should be no problems. Your users depend on constant refresh rate
already, so that they actually can compute the timestamp according to
your advice.
I can also always feed just single frames into your presentation queue
and wait for present_feedback for those single frames, so I can be sure
the "proper" frame was presented.
Making that work so that you can actually hit every scanout cycle with
a new image is something the compositor should implement, yes, but the
protocol does not guarantee it. I suspect it would work in practise
though.
It works well enough on the X-Server, so I'd expect it to work on
Wayland as well. This link...
<https://github.com/Psychtoolbox-3/Psychtoolbox-3/blob/master/Psychtoolbox/PsychDocumentation/ECVP2010Poster_VisualTimingPrecision.pdf?raw=true>
...points to some results of tests I did a couple of years ago. Quite
outdated by now, but Linux + X11 came out as more reliable wrt. timing
precision than Windows and OSX, especially when realtime scheduling or
even a realtime kernel was used, with both the proprietary graphics
drivers and the open-source drivers. I was very pleased with that :) -
The other thing I learned there is how much dynamic power management on
the GPU can bite you if the rendering/presentation behavior of your app
doesn't match the expectations of the algorithms used to control
up/downclocking on the GPU. That would be another topic related to a
presentation extension btw.: having some sort of hints to the compositor
about what scheduling priority to choose, or whether GPU power
management should be somehow affected by the timing needs of clients...
But, this might be a reason to have the "frame callback" be queueable.
We have just had lots of discussion about what do we do with the frame
callback, what it means, and how it should work. It's not clear yet.
Ideally I could select, when queuing a frame, whether the compositor
will skip the frame if it is late - that is, when frames with a later
target presentation time are already queued which are closer to the
predicted presentation time. Your current design is optimized for video
playback, where you want to drop late frames to maintain audio-video
sync. I have both cases. For video playback and periodic/smooth
animations I want to drop frames which are late, but for some other
purposes I need to be sure that, although no frame is ever presented too
early, all frames are presented in sequence without dropping any, even
if that increases the mean error of target time vs. real presentation
time.
Things like this stuff:
http://link.springer.com/article/10.3758/s13414-013-0605-z
Such studies need to present sequences of potentially dozens of images
in rapid succession, without dropping a single frame. Duplicating a
frame is not great, but not as bad as dropping one, because certain
frames in such sequences may have a special meaning.
This can also be achieved by my client just queuing one frame for
presentation and waiting for present feedback before queuing the next
one. I would lose the potential robustness advantage of queuing in the
server instead of one round trip away in my client. But that's how it's
done atm. with X11/GLX/DRI2 and on other operating systems because they
don't have presentation queues.
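I.e., the loop looks roughly like this (a sketch only; I'm guessing that
the generated C bindings for your extension will end up looking like the
wp_presentation / wp_presentation_feedback names below - they may well
differ in your draft - and I'm leaving out all display, surface and
buffer setup):

#include <stdbool.h>
#include <stdint.h>
#include <wayland-client.h>
#include "presentation-time-client-protocol.h"  /* assumed generated header */

struct frame_state { bool done; };

static void presented(void *data, struct wp_presentation_feedback *fb,
                      uint32_t tv_sec_hi, uint32_t tv_sec_lo,
                      uint32_t tv_nsec, uint32_t refresh,
                      uint32_t seq_hi, uint32_t seq_lo, uint32_t flags)
{
    /* Presentation timestamp, refresh duration, MSC and feedback flags
     * arrive here - this is where I'd judge timestamp trustworthiness. */
    ((struct frame_state *)data)->done = true;
    wp_presentation_feedback_destroy(fb);
}

static void discarded(void *data, struct wp_presentation_feedback *fb)
{
    ((struct frame_state *)data)->done = true;  /* frame was never shown */
    wp_presentation_feedback_destroy(fb);
}

static void sync_output(void *data, struct wp_presentation_feedback *fb,
                        struct wl_output *output)
{
}

static const struct wp_presentation_feedback_listener feedback_listener = {
    .sync_output = sync_output,
    .presented = presented,
    .discarded = discarded,
};

/* Commit one buffer with a feedback object attached, then block until
 * its presented/discarded event arrives before committing the next. */
static void present_and_wait(struct wl_display *dpy,
                             struct wp_presentation *pres,
                             struct wl_surface *surf, struct wl_buffer *buf)
{
    struct frame_state st = { .done = false };
    struct wp_presentation_feedback *fb = wp_presentation_feedback(pres, surf);

    wp_presentation_feedback_add_listener(fb, &feedback_listener, &st);
    wl_surface_attach(surf, buf, 0, 0);
    wl_surface_damage(surf, 0, 0, INT32_MAX, INT32_MAX);
    wl_surface_commit(surf);

    while (!st.done)
        wl_display_dispatch(dpy);  /* one frame in flight, one roundtrip each */
}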
A flag for "do not skip", that sounds simpler than I thought. I've
heard that QtQuick would want no-skip for whatever reason, too, and so
far thought posting one by one rather than queueing in advance would do
it, since I didn't understand the use case. You gave another one.
But for dynamic refresh, can you think of reasons to prefer one or the
other?
My reason for picking the "rounding" semantics is that an app can queue
many frames in advance, but it will only know the one current or
previous refresh duration. Therefore subtracting half of that from the
actually intended presentation times may become incorrect if the
duration changes in the future. When the compositor does the equivalent
extrapolation for each framebuffer flip it schedules, it has better
accuracy as it can use the most current refresh duration estimate for
each pick from the queue.
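To illustrate what I mean by the pick (just a sketch of the idea, not
Weston's actual repaint code, and glossing over the details of queue
handling in the RFC):

#include <stddef.h>
#include <stdint.h>

struct queued_update {
    uint64_t target_ns;   /* client-requested target time */
    /* ...buffer, surface state, etc. omitted... */
};

/* Pick the update to show on the flip the compositor is about to
 * schedule. 'predicted_ns' is the predicted turns-to-light time of that
 * flip, 'refresh_ns' the compositor's *current* refresh duration
 * estimate - which may differ from whatever the client knew when it
 * queued the frames. */
static struct queued_update *
pick_update(struct queued_update *q, size_t n,
            uint64_t predicted_ns, uint64_t refresh_ns)
{
    struct queued_update *best = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        /* "rounding": eligible if the target is at most half a refresh
         * after the prediction; with "not before" the condition would be
         * target_ns <= predicted_ns. */
        if (q[i].target_ns <= predicted_ns + refresh_ns / 2)
            if (!best || q[i].target_ns > best->target_ns)
                best = &q[i];   /* latest eligible update wins */
    }
    return best;
}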
The "rounding" semantics has the problem that if the client doesn't want
it, as in my case, it needs to know the refresh duration to counteract
your rounding by shifting the timestamps. On a dynamic refresh display i
wouldn't know how to do that. But this is basically your thinking
backwards, for a different definition of robust.
Right. When you have to set time based on an integer frame counter, it
is hard to reason about when it actually is. The only possibility is to
use the "not before" definition, because you just cannot subtract half
with integers, and the counter ticks when the buffer flips, basically.
But with nanosecond timestamps we actually have a choice.
Since the definition is turns-to-light and not vblank, I still think
rounding is more appropriate, since it is the closest we can do for a
request to "show this at time t". But yes, I agree the wanted behaviour
may depend on the use case.
Agreed.
As far as I'm concerned, I'd tell my users that if they want it to work
well enough on a dynamic refresh display, they'd have to provide
different target timestamps to accommodate the different display
technology - in these cases ones that are optimized for your "rounding"
semantics.
So I think your proposal is fine. My concern is mostly not to get any
surprises on regular, common fixed refresh rate displays, so that user
code written many years ago doesn't suddenly behave in weird and
backwards-incompatible ways, even on exactly the same hardware, just
because it runs on Wayland instead of X11 or on some other operating
system. Having to rewrite dozens of their scripts wouldn't be the
behavior my users would expect after upgrading to some future
distribution with Wayland as display server ;-)
Keeping an eye on such concerns is exactly where I need help. :-)
3. I'm not sure if this applies to Wayland or if it is already covered
by some other part of the Wayland protocol - I'm not yet really familiar
with Wayland, so I'm just asking: as part of the present feedback it
would be very valuable for my kind of applications to know how the
present was done. Which backend was used? Was the presentation performed
by a KMS pageflip or something else, e.g., some partial buffer copy? The
INTEL_swap_event extension for X11/GLX exposes this info to classic
X11/GLX clients. The reason is that drm/kms pageflips have very reliable
and precise timestamps attached, so I know those timestamps are
trustworthy for very timing sensitive applications like mine. If page
flipping isn't used, but instead something like a (partial) framebuffer
copy/blit, then at least on drm the returned timestamps are not
reliable/trustworthy enough, as they could be off by up to one video
refresh duration. My software, upon detecting anything but a
page-flipped presentation, logs errors or even aborts a session, as
wrong presentation timestamps could be a disaster for those applications
if they went unnoticed.
So this kind of feedback about how the present was executed would be
useful to us.
How presentation was executed is not exposed in the Wayland protocol in
any direct or reliable way.
But rather than knowing how presentation was executed, it sounds like
you would really want to know how much you can trust the feedback,
right?
Yes. Which means I also need to know how much I can trust the feedback
about how much I can trust the feedback ;-).
Therefore I prefer low-level feedback about what is actually happening,
so I can look at the code myself and also sometimes perform testing to
find out which presentation methods are trustworthy and which are not.
Then I can have some whitelists or blacklists of what is OK and what is
not.
In Weston's case, there are many ways the presentation can be executed.
Weston-on-DRM, regardless of using the GL (hardware) or Pixman
(software) renderer, will possibly composite and always flip, AFAIU. So
this is both vsync'd and accurate by your standards.
That's good. This is the main use case, so knowing I use the drm backend
would basically mean everything's fine and trustworthy.
Most of my users' applications run as unoccluded, undecorated fullscreen
windows, just filling one or multiple outputs, so they use page-flipping
on X11 as well. For windowed presentation I couldn't make any
guarantees, but if Wayland always composites a full framebuffer and
flips in such a case, then timestamps should be trustworthy.
Yes.
Weston-on-fbdev (only Pixman renderer) is not vsync'd, and the
presentation timing is based on reading a software clock after an
arbitrary delay. IOW, the feedback and presentation is rubbish.
Weston-on-X11 is basically as useless as Weston-on-fbdev, in its current
implementation.
Probably neither of those is relevant for me.
Weston-on-rpi (rpi-renderer) uses vsync'd flips to the best of our
knowledge, but can and will fall back to "firmware compositing" when
the scene is too complex for its "hardware direct scanout engine". We
do not know when it falls back. The presentation time is recorded by
reading a software clock (clock_gettime) in a thread that gets called
asynchronously by the userspace driver library on rpi. It should be
better than fbdev, but for your purposes still a black box of rubbish I
guess.
There were a few requests for Raspberry Pi, but its hardware is mostly
too weak for our purposes. So it's also not a big deal for my purpose.
Well, it depends. If you have only ever seen the X desktop on RPi, then
you have not seen what it can actually do.
DispmanX (the proprietary display API) can easily allow you to flip
full-HD buffers at framerate, if you first have enough memory to
preload them. There is a surprising amount of power compared to the
"size", but it's all behind proprietary interfaces.
FWIW, Weston on rpi uses DispmanX, and compared to X, it flies.
I've seen a couple of impressive things with those in the lab. The
limitations for my toolkit are more wrt. RAM size, not wrt. the GPU.
Model B's 512 MB of RAM is the minimum I'd probably need, but I should
probably play with one and see how far I get.
It seems like we could have several flags describing the aspects of
the presentation feedback or presentation itself:
1. vsync'd or not
2. hardware or software clock, i.e. DRM/KMS ioctl reported time vs.
compositor calls clock_gettime() as soon as it is notified the screen
update is done (so maybe kernel vs. userspace clock rather?)
3. did a screen update completion event exist, or was it faked by an
arbitrary timer
4. flip vs. copy?
Yes, that would be sufficient for my purpose. I can always get the
OpenGL renderer/vendor string to do some basic checks for which KMS
driver is in use, and your flags would give me all the needed dynamic
information to judge reliability.
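E.g., something like this in the completion handler (a sketch; the flag
names below are hypothetical, just following your numbered list - the
real identifiers are whatever the final protocol headers define):

#include <stdint.h>

#define FEEDBACK_VSYNC         (1u << 0)  /* update was vsync'd            */
#define FEEDBACK_HW_CLOCK      (1u << 1)  /* timestamp from kernel/hw clock */
#define FEEDBACK_HW_COMPLETION (1u << 2)  /* real completion event, no timer */

/* Sketch: judge timestamp trustworthiness from the feedback flags
 * (your items 1-3); anything less and I'd log an error or abort the
 * session, as described above. */
static int timestamp_is_trustworthy(uint32_t flags)
{
    const uint32_t required = FEEDBACK_VSYNC | FEEDBACK_HW_CLOCK |
                              FEEDBACK_HW_COMPLETION;

    return (flags & required) == required;
}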
Except that the renderer info won't help you, if the machine has more
than one GPU. It is quite possible to render on one GPU and scan out
on another.
Yes, but I already use libpciaccess to enumerate GPUs on the bus, and
other more scary things, so I guess there will be a few more scary and
shady low-level things to add ;-)
Now, I don't know what implements the "buffer copy" update method you
referred to with X11, or what it actually means, because I can think of
two different cases:
a) copy from application buffer to the current framebuffer on screen,
but timed so that it won't tear, or
b) copy from application buffer to an off-screen framebuffer, which is
later flipped to screen.
Case b) is the normal compositing case that Weston-on-DRM does in GL and
Pixman renderers.
Case a) was the typical case on classic X11 for swaps of non-fullscreen
windows. Case b) is fine. On X11 + old compositors there was no suitable
protocol in place. The X server/DDX would copy to some offscreen buffer
and the compositor would do something with that offscreen buffer, but
the timestamps would refer to when the offscreen copy was scheduled to
happen, not to when the final composition showed up on the screen. So
the timestamps were overly optimistic and useless for my purpose, only
good enough for some throttling.
Weston can also flip the application buffer directly on screen if the
buffer is suitable and the scene graph allows; a decision it does on a
frame-by-frame basis. However, this flip does not imply that *all*
compositing would have been skipped. Maybe there is a hardware overlay
that was just right for one wl_surface, but other things are still
composited, i.e. copied.
Given that my app mostly renders to opaque fullscreen windows, I'd
expect Weston to mostly just flip my buffer onto the screen.
Indeed, if you use EGL. At the moment we have no other portable ways to
post suitable buffers.
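I.e., the usual wl_egl_window path, roughly (a sketch; registry setup,
EGL initialization and config choice are omitted):

#include <wayland-egl.h>
#include <EGL/egl.h>

/* Sketch: let EGL allocate flippable buffers for a wl_surface via a
 * wl_egl_window. After eglMakeCurrent() and GL rendering,
 * eglSwapBuffers() attaches and commits the buffer; a fullscreen, opaque
 * surface like this is what the compositor can scan out directly. */
static EGLSurface make_gl_surface(EGLDisplay egl_dpy, EGLConfig config,
                                  struct wl_surface *surface,
                                  int width, int height)
{
    struct wl_egl_window *native =
        wl_egl_window_create(surface, width, height);

    return eglCreateWindowSurface(egl_dpy, config,
                                  (EGLNativeWindowType)native, NULL);
}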
I'm not sure what we should be distinguishing here. No matter how
weston updates the screen, the flags 1-3 would still be applicable and
fully describe the accuracy of presentation and feedback, I think.
However, the copy case a) is something I don't see accounted for. So
would the flag 4 be actually "copy case a) or not"?
Yes, if Weston uses page-flipping in the end on drm/kms and the returned
timestamp is the page-flip completion timestamp from the kernel, then it
doesn't matter if copies were involved somewhere.
The problem is that I do not know why the X11 framebuffer copy case you
describe would have inaccurate timestamps. Do you know where it comes
from, and would it make sense to describe it as a flag?
That case luckily doesn't seem to exist on Weston + drm/kms, if I
understand you correctly - and the code in drm_output_repaint(), if I
look at the right bits?
You're right on Weston. It still does not prevent someone else writing
a compositor that does the just-in-time copy into a live framebuffer
trick, but if someone does that, I would expect them to give accurate
presentation feedback nevertheless. There is no excuse to give
anything else.
FYI, while Weston is the reference compositor, there is no "the
server". Every major DE is writing their own compositor-servers. Most
will run on DRM/KMS directly, and I think one has chosen to run only on
top of another Wayland compositor.
Yes, that will be a whole lot of extra "fun" of testing, once I have a
Wayland backend ;-) -- In the end I'll have to focus on maybe two or
three compositors like Weston, KWin and GNOME Shell wrt. timing.
Old compositors like Compiz would, for partial screen updates, only
recompose the affected part of the backbuffer and then use functions
like glXCopySubBufferMESA() or glCopyPixels() to copy the affected
backbuffer area to the frontbuffer, maybe after waiting for vblank
before executing the command.
Client calls glXSwapBuffers/glXSwapBuffersMscOML -> server/DDX queues a
vblank event in the kernel -> kernel delivers the vblank event to the
server/DDX at the target msc count -> ...random delay... -> DDX copies
the client window backbuffer pixmap to an offscreen surface, posts
damage, and returns the vblank timestamp of the triggering vblank event
as the swap completion timestamp to the client -> ...random delay... ->
compositor does its randomly time-consuming thing, which eventually
leads to an update of the display.
I regret I asked. ;-)
So in any case the returned timestamp from the vblank event delivered by
the kernel was always an (overly) optimistic one, also because all the
required rendering and copy operations would be queued through the GPU
command stream, where they might get delayed by other pending rendering
commands, or by the GPU waiting for the scanout position to be outside
the target area of a copy, to avoid tearing.
Without a compositor it worked the same for regular, non-fullscreen
windows, just that the DDX queued some blit (I think via EXA or UXA, or
maybe now SNA on Intel, iirc) into the GPU's command stream, with the
same potential delays due to queued commands in the command stream.
Only if the compositor was unredirecting fullscreen windows, or there
was no compositor at all and a fullscreen window, would the client get
KMS page-flipping and the reliable page-flip completion timestamps from
the kernel.
This system of flags was the first thing that came to my mind. Using
any numerical values like "the error in presentation timestamp is
always between [min, max]" seems very hard to implement in a
compositor, and does not even convey information like vsync or not.
Indeed. I would have trust issues with that.
I suppose a single flag "timestamp is accurate" does not cut it? :-)
What do you think?
Yes, the flags sound good to me. As long as I know I'm on a drm/kms
backend, they should be all I need.
That... is a problem. :-)
From your feedback so far, I think you have only requested additional
features:
- ability to subscribe to a stream of vblank-like events
- do-not-skip flag for queued updates
Yes, and the present_feedback flags.
One more useful flag for me could be to know if the presented frame was
composited - together with some other content - or if my own buffer was
just flipped onscreen and no composition took place. More specifically -
and maybe there's already some other event in the protocol for that -
I'd like to know if my presented surface was obscured by something else,
e.g., some kind of popup window like "system updates available", "you
have new mail", a user ALT+Tabbing away from the window, etc. On X11 I
find out indirectly about such unwanted visual disruptions because the
compositor would fall back to compositing instead of simple
page-flipping. On Wayland something similar would be cool, if it doesn't
already exist.
Not sure if that still belongs in a present_feedback extension, though.
I will think about those.
Cool, thanks!
-mario