Re: Sub 16ms render but missing swap
Hi, On Wed, Oct 18, 2023 at 12:00:45AM +, Joe M wrote: > In a GLES app wired up to Weston/Wayland using EGL, we're calling > `eglSwapBuffers` in a render loop that takes about 12-13ms to draw our > content, based on CPU timing the run loop. However, the call to swap incurs a > duration of around 20ms, meaning we're only hitting 30fps. Most GPUs nowadays aren’t synchronous, your driver creates command buffers which are sent to the GPU asynchronously, and then the GPU does its work. But unless you call glFinish() or wait on a sync object, you won’t know at which time the GPU is actually done executing your commands. Most notably, eglSwapBufers() will not wait until your buffer is fully drawn, so that you can start preparing for the next frame ahead of time. > I've investigated a little bit into using the presentation callback protocol, > but my understanding is that the EGL display (part of drm-wayland in the mesa > source IIRC) is itself already plugged in to the necessary Wayland callbacks > such that using `eglSwapBuffers` is sufficient to maximize available draw > time. > I also tried enabling the profile flag in weston.ini that allows you to > collect a trace, viewable in *wesgr*. And I read Pekka's blog posts about > these diagrams, but I still can't understand if our app is doing something > suboptimally. I can't decipher what the wesgr diagram is saying. > The other weston.ini knob I've tried is the `repaint` window, setting it to > 12 or 13 to match our drawing loop. This seems to help sometimes but not in > this case. > A few questions: 1. What other avenues of investigation should I pursue for > the swap delay? As in, why when I take 12 ms to render do I not see about 4ms > for the swap call to return? My display is running in at 60hz. 2. Has EGL > been optimized to use the available wayland callbacks and maximize available > client drawing time? 3. Does EGL leverage "weston_direct_display_v1" when > available? What's required to take advantage of it in the app code? (ie. run > fullscreen?) > Thanks! -- Link Mauve
Re: Sub 16ms render but missing swap
Hi Joe, On Wed, 18 Oct 2023 at 02:00, Joe M wrote: > A few questions: > 1. What other avenues of investigation should I pursue for the swap delay? > As in, why when I take 12 ms to render do I not see about 4ms for the swap > call to return? My display is running in at 60hz. Further to Emmanuel's point about GPU rendering being async (you can validate by calling glFinish before eglSwapBuffers, which will wait for everything to complete) - which hardware platform are you using here, and which software stack as well? As in, do your Weston + drivers + etc come from upstream projects or are they provided by a vendor? > 2. Has EGL been optimized to use the available wayland callbacks and > maximize available client drawing time? Yes, very much. > 3. Does EGL leverage "weston_direct_display_v1" when available? What's > required to take advantage of it in the app code? (ie. run fullscreen?) No need. We bypass composition as much as we possibly can. You can try using weston-simple-egl with the flag to use direct-display if you want to satisfy yourself, but it's in no way required to bypass GPU composition and use the display controller to scan out. Cheers, Daniel
Re: Sub 16ms render but missing swap
On Wed, 18 Oct 2023 09:18:45 +0200 Emmanuel Gil Peyrot wrote: > Hi, > > On Wed, Oct 18, 2023 at 12:00:45AM +, Joe M wrote: > > In a GLES app wired up to Weston/Wayland using EGL, we're calling > > `eglSwapBuffers` in a render loop that takes about 12-13ms to draw > > our content, based on CPU timing the run loop. However, the call to > > swap incurs a duration of around 20ms, meaning we're only hitting > > 30fps. > > Most GPUs nowadays aren’t synchronous, your driver creates command > buffers which are sent to the GPU asynchronously, and then the GPU > does its work. But unless you call glFinish() or wait on a sync > object, you won’t know at which time the GPU is actually done > executing your commands. Most notably, eglSwapBufers() will not wait > until your buffer is fully drawn, so that you can start preparing for > the next frame ahead of time. Indeed, this is probably the explanation here. If the CPU already takes 12-13 ms before eglSwapBuffers submits that work to the GPU, then the GPU has fairly little time left to finish its work if you intend to not miss the flip deadline to keep up with 60 Hz. If Weston is compositing with GL ES, then also Weston's compositing GPU job needs to finish before the deadline, not only the application's. OTOH, Weston's compositing work is usually simple blits, so it's usually fast unless your bottleneck is GPU memory bandwidth. > > I've investigated a little bit into using the presentation callback > > protocol, but my understanding is that the EGL display (part of > > drm-wayland in the mesa source IIRC) is itself already plugged in > > to the necessary Wayland callbacks such that using `eglSwapBuffers` > > is sufficient to maximize available draw time. I also tried > > enabling the profile flag in weston.ini that allows you to collect > > a trace, viewable in *wesgr*. And I read Pekka's blog posts about > > these diagrams, but I still can't understand if our app is doing > > something suboptimally. I can't decipher what the wesgr diagram is > > saying. The other weston.ini knob I've tried is the `repaint` > > window, setting it to 12 or 13 to match our drawing loop. This > > seems to help sometimes but not in this case. A few questions: 1. > > What other avenues of investigation should I pursue for the swap > > delay? As in, why when I take 12 ms to render do I not see about > > 4ms for the swap call to return? My display is running in at 60hz. > > 2. Has EGL been optimized to use the available wayland callbacks > > and maximize available client drawing time? The problem is, eglSwapBuffers call is the point where the GPU usually *starts* your rendering work. You could glFlush a part of your work earlier, or Mesa might flush automatically at some points. A trace for wesgr, if you compress and attach it, could perhaps confirm this from Weston's point of view. eglSwapBuffers itself may stall for example 20 ms, because with eglSwapInterval 1 (the default) it will wait for your previous frame to be taken by the compositor into its screen update, and that screen update cannot start until the screen update before that has gone through. Maximum screen update rate is the monitor refresh rate. So you are likely both decimating frame rate and not maxing out GPU usage, or in other words, the system is automatically adjusting to the vsync'd steady framerate you *can* keep up: 30 Hz. Weston's repaint-window setting adjusts how long Weston will wait for app updates after the last KMS screen refresh finished before "locking" the next screen update contents. The time of locking is the deadline for apps to call eglSwapBuffers in order to have the possibility of making it for the very next screen update. You can still miss the actual screen update deadline (vblank), because even if everything has been submitted to the GPU and Weston has submitted the screen update to the KMS, KMS will not take that update into use until the GPU work it depends on has finished. Once the GPU work has finished, the update can be taken into use on the next vblank. You could try setting eglSwapInterval to 0 in your app and check what framerate you measure in your app. If that's still at or below 60 Hz, then I think your app is just too heavy for the hardware to meet 60 Hz. If it's steadily well over 60 fps, then it gets more interesting. > > 3. Does EGL leverage > > "weston_direct_display_v1" when available? What's required to take > > advantage of it in the app code? (ie. run fullscreen?) Thanks! Mesa does not use that, and the extension is both not going to help here, and not meant for this. The reason it is not going to help is that whenever at all possible, Weston will already automatically skip compositing with GL ES and just push application buffers straight to screen via DRM KMS. weston_direct_display is meant for content that will crash the hardware (yes, I mean really the hardware, not software) if anything other than DRM KMS attempts to access the pixels. IOW, it is
Re: [PATCH RFC v6 08/10] drm/msm/dpu: Allow NULL FBs in atomic commit
On 9/24/2023 3:29 AM, Dmitry Baryshkov wrote: On 29/08/2023 03:05, Jessica Zhang wrote: Since solid fill planes allow for a NULL framebuffer in a valid commit, add NULL framebuffer checks to atomic commit calls within DPU. Reviewed-by: Dmitry Baryshkov Signed-off-by: Jessica Zhang --- drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c | 9 ++- drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c | 41 --- 2 files changed, 34 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c index 8ce7586e2ddf..5e845510e8c1 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c @@ -451,6 +451,7 @@ static void _dpu_crtc_blend_setup_mixer(struct drm_crtc *crtc, struct drm_plane_state *state; struct dpu_crtc_state *cstate = to_dpu_crtc_state(crtc->state); struct dpu_plane_state *pstate = NULL; + const struct msm_format *fmt; struct dpu_format *format; struct dpu_hw_ctl *ctl = mixer->lm_ctl; @@ -470,7 +471,13 @@ static void _dpu_crtc_blend_setup_mixer(struct drm_crtc *crtc, pstate = to_dpu_plane_state(state); fb = state->fb; - format = to_dpu_format(msm_framebuffer_format(pstate->base.fb)); + if (drm_plane_solid_fill_enabled(state)) + fmt = dpu_get_msm_format(&_dpu_crtc_get_kms(crtc)->base, + DRM_FORMAT_ABGR, 0); + else + fmt = msm_framebuffer_format(pstate->base.fb); + + format = to_dpu_format(fmt); if (pstate->stage == DPU_STAGE_BASE && format->alpha_enable) bg_alpha_enable = true; diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c index c2aaaded07ed..114c803ff99b 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c @@ -837,8 +837,13 @@ static int dpu_plane_atomic_check(struct drm_plane *plane, pipe_cfg->dst_rect = new_plane_state->dst; - fb_rect.x2 = new_plane_state->fb->width; - fb_rect.y2 = new_plane_state->fb->height; + if (drm_plane_solid_fill_enabled(new_plane_state)) + return 0; This would skip all the width checks, dpu_plane_atomic_check_pipe(), etc. Could you please confirm that all of those checks are irrelevant for solid fill? Hi Dmitry, Good point -- I think it would be good to keep the rect and pipe checks for solid fill. + + if (new_plane_state->pixel_source == DRM_PLANE_PIXEL_SOURCE_FB && new_plane_state->fb) { + fb_rect.x2 = new_plane_state->fb->width; + fb_rect.y2 = new_plane_state->fb->height; + } /* Ensure fb size is supported */ if (drm_rect_width(&fb_rect) > MAX_IMG_WIDTH || @@ -1082,21 +1087,32 @@ static void dpu_plane_sspp_atomic_update(struct drm_plane *plane) struct drm_crtc *crtc = state->crtc; struct drm_framebuffer *fb = state->fb; bool is_rt_pipe; - const struct dpu_format *fmt = - to_dpu_format(msm_framebuffer_format(fb)); + const struct dpu_format *fmt; struct dpu_sw_pipe_cfg *pipe_cfg = &pstate->pipe_cfg; struct dpu_sw_pipe_cfg *r_pipe_cfg = &pstate->r_pipe_cfg; struct dpu_kms *kms = _dpu_plane_get_kms(&pdpu->base); struct msm_gem_address_space *aspace = kms->base.aspace; struct dpu_hw_fmt_layout layout; bool layout_valid = false; - int ret; - ret = dpu_format_populate_layout(aspace, fb, &layout); - if (ret) - DPU_ERROR_PLANE(pdpu, "failed to get format layout, %d\n", ret); - else - layout_valid = true; + if (state->pixel_source == DRM_PLANE_PIXEL_SOURCE_FB && fb) { + int ret; + + fmt = to_dpu_format(msm_framebuffer_format(fb)); + + ret = dpu_format_populate_layout(aspace, fb, &layout); + if (ret) + DPU_ERROR_PLANE(pdpu, "failed to get format layout, %d\n", ret); + else + layout_valid = true; + + DPU_DEBUG_PLANE(pdpu, "FB[%u] " DRM_RECT_FP_FMT "->crtc%u " DRM_RECT_FMT + ", %4.4s ubwc %d\n", fb->base.id, DRM_RECT_FP_ARG(&state->src), + crtc->base.id, DRM_RECT_ARG(&state->dst), + (char *)&fmt->base.pixel_format, DPU_FORMAT_IS_UBWC(fmt)); + } else { + fmt = dpu_get_dpu_format(DRM_FORMAT_ABGR); #define DPU_SOLID_FILL_FORMAT ? Acked. Also, I don't think that solid_fill planes consume bandwidth, so this likely needs to be fixed too. You're right -- I think we can actually return early here if solid fill is enabled. Thanks, Jessica Zhang + } pstate->pending = true; @@ -1104,11 +1120,6 @@ static void dpu_plane_sspp_atomic_update(struct drm_plane *plane) pstate->needs_qos_remap |= (is_rt_pipe != pdpu->is_rt_pipe); pdpu->is_rt_pipe = is_rt_pipe; - DPU_DEBUG_PLANE(pdpu, "FB[%u] " DRM_RECT_FP_FMT "->crtc%u " DRM_RECT_FMT - ", %4.4s ubwc %d\n", fb->base.id,
Re: [PATCH RFC v6 01/10] drm: Introduce pixel_source DRM plane property
On 9/24/2023 3:06 AM, Dmitry Baryshkov wrote: On 29/08/2023 03:05, Jessica Zhang wrote: Add support for pixel_source property to drm_plane and related documentation. In addition, force pixel_source to DRM_PLANE_PIXEL_SOURCE_FB in DRM_IOCTL_MODE_SETPLANE as to not break legacy userspace. This enum property will allow user to specify a pixel source for the plane. Possible pixel sources will be defined in the drm_plane_pixel_source enum. Currently, the only pixel sources are DRM_PLANE_PIXEL_SOURCE_FB (the default value) and DRM_PLANE_PIXEL_SOURCE_NONE. Signed-off-by: Jessica Zhang Acked-by: Dmitry Baryshkov Minor question below --- drivers/gpu/drm/drm_atomic_state_helper.c | 1 + drivers/gpu/drm/drm_atomic_uapi.c | 4 ++ drivers/gpu/drm/drm_blend.c | 90 +++ drivers/gpu/drm/drm_plane.c | 19 +-- include/drm/drm_blend.h | 2 + include/drm/drm_plane.h | 21 6 files changed, 133 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/drm_atomic_state_helper.c b/drivers/gpu/drm/drm_atomic_state_helper.c index 784e63d70a42..01638c51ce0a 100644 --- a/drivers/gpu/drm/drm_atomic_state_helper.c +++ b/drivers/gpu/drm/drm_atomic_state_helper.c @@ -252,6 +252,7 @@ void __drm_atomic_helper_plane_state_reset(struct drm_plane_state *plane_state, plane_state->alpha = DRM_BLEND_ALPHA_OPAQUE; plane_state->pixel_blend_mode = DRM_MODE_BLEND_PREMULTI; + plane_state->pixel_source = DRM_PLANE_PIXEL_SOURCE_FB; if (plane->color_encoding_property) { if (!drm_object_property_get_default_value(&plane->base, diff --git a/drivers/gpu/drm/drm_atomic_uapi.c b/drivers/gpu/drm/drm_atomic_uapi.c index d867e7f9f2cd..454f980e16c9 100644 --- a/drivers/gpu/drm/drm_atomic_uapi.c +++ b/drivers/gpu/drm/drm_atomic_uapi.c @@ -544,6 +544,8 @@ static int drm_atomic_plane_set_property(struct drm_plane *plane, state->src_w = val; } else if (property == config->prop_src_h) { state->src_h = val; + } else if (property == plane->pixel_source_property) { + state->pixel_source = val; } else if (property == plane->alpha_property) { state->alpha = val; } else if (property == plane->blend_mode_property) { @@ -616,6 +618,8 @@ drm_atomic_plane_get_property(struct drm_plane *plane, *val = state->src_w; } else if (property == config->prop_src_h) { *val = state->src_h; + } else if (property == plane->pixel_source_property) { + *val = state->pixel_source; } else if (property == plane->alpha_property) { *val = state->alpha; } else if (property == plane->blend_mode_property) { diff --git a/drivers/gpu/drm/drm_blend.c b/drivers/gpu/drm/drm_blend.c index 6e74de833466..c3c57bae06b7 100644 --- a/drivers/gpu/drm/drm_blend.c +++ b/drivers/gpu/drm/drm_blend.c @@ -185,6 +185,21 @@ * plane does not expose the "alpha" property, then this is * assumed to be 1.0 * + * pixel_source: + * pixel_source is set up with drm_plane_create_pixel_source_property(). + * It is used to toggle the active source of pixel data for the plane. + * The plane will only display data from the set pixel_source -- any + * data from other sources will be ignored. + * + * Possible values: + * + * "NONE": + * No active pixel source. + * Committing with a NONE pixel source will disable the plane. + * + * "FB": + * Framebuffer source set by the "FB_ID" property. + * * Note that all the property extensions described here apply either to the * plane or the CRTC (e.g. for the background color, which currently is not * exposed and assumed to be black). @@ -615,3 +630,78 @@ int drm_plane_create_blend_mode_property(struct drm_plane *plane, return 0; } EXPORT_SYMBOL(drm_plane_create_blend_mode_property); + +static const struct drm_prop_enum_list drm_pixel_source_enum_list[] = { + { DRM_PLANE_PIXEL_SOURCE_NONE, "NONE" }, + { DRM_PLANE_PIXEL_SOURCE_FB, "FB" }, +}; + +/** + * drm_plane_create_pixel_source_property - create a new pixel source property + * @plane: DRM plane + * @extra_sources: Bitmask of additional supported pixel_sources for the driver. + * DRM_PLANE_PIXEL_SOURCE_FB and DRM_PLANE_PIXEL_SOURCE_NONE will + * always be enabled as supported sources. + * + * This creates a new property describing the current source of pixel data for the + * plane. The pixel_source will be initialized as DRM_PLANE_PIXEL_SOURCE_FB by default. + * + * Drivers can set a custom default source by overriding the pixel_source value in + * drm_plane_funcs.reset() + * + * The property is exposed to userspace as an enumeration property called + * "pixel_source" and has the following enumeration values: + * + * "NONE": + * No active pixel source + * + * "FB": + * Framebuffer pixel source + * + * Returns: + *