Re: Sub 16ms render but missing swap

2023-10-18 Thread Emmanuel Gil Peyrot
Hi,

On Wed, Oct 18, 2023 at 12:00:45AM +, Joe M wrote:
> In a GLES app wired up to Weston/Wayland using EGL, we're calling 
> `eglSwapBuffers` in a render loop that takes about 12-13ms to draw our 
> content, based on CPU timing the run loop. However, the call to swap incurs a 
> duration of around 20ms, meaning we're only hitting 30fps.

Most GPUs nowadays aren’t synchronous, your driver creates command
buffers which are sent to the GPU asynchronously, and then the GPU does
its work.  But unless you call glFinish() or wait on a sync object, you
won’t know at which time the GPU is actually done executing your
commands.  Most notably, eglSwapBufers() will not wait until your buffer
is fully drawn, so that you can start preparing for the next frame ahead
of time.

> I've investigated a little bit into using the presentation callback protocol, 
> but my understanding is that the EGL display (part of drm-wayland in the mesa 
> source IIRC) is itself already plugged in to the necessary Wayland callbacks 
> such that using `eglSwapBuffers` is sufficient to maximize available draw 
> time.
> I also tried enabling the profile flag in weston.ini that allows you to 
> collect a trace, viewable in *wesgr*.  And I read Pekka's blog posts about 
> these diagrams, but I still can't understand if our app is doing something 
> suboptimally. I can't decipher what the wesgr diagram is saying.
> The other weston.ini knob I've tried is the `repaint` window, setting it to 
> 12 or 13 to match our drawing loop. This seems to help sometimes but not in 
> this case.
> A few questions:  1. What other avenues of investigation should I pursue for 
> the swap delay? As in, why when I take 12 ms to render do I not see about 4ms 
> for the swap call to return? My display is running in at 60hz.  2. Has EGL 
> been optimized to use the available wayland callbacks and maximize available 
> client drawing time?  3. Does EGL leverage "weston_direct_display_v1" when 
> available? What's required to take advantage of it in the app code? (ie. run 
> fullscreen?)
> Thanks!

-- 
Link Mauve


Re: Sub 16ms render but missing swap

2023-10-18 Thread Daniel Stone
Hi Joe,

On Wed, 18 Oct 2023 at 02:00, Joe M  wrote:
> A few questions:
>   1. What other avenues of investigation should I pursue for the swap delay? 
> As in, why when I take 12 ms to render do I not see about 4ms for the swap 
> call to return? My display is running in at 60hz.

Further to Emmanuel's point about GPU rendering being async (you can
validate by calling glFinish before eglSwapBuffers, which will wait
for everything to complete) - which hardware platform are you using
here, and which software stack as well? As in, do your Weston +
drivers + etc come from upstream projects or are they provided by a
vendor?

>   2. Has EGL been optimized to use the available wayland callbacks and 
> maximize available client drawing time?

Yes, very much.

>   3. Does EGL leverage "weston_direct_display_v1" when available? What's 
> required to take advantage of it in the app code? (ie. run fullscreen?)

No need. We bypass composition as much as we possibly can. You can try
using weston-simple-egl with the flag to use direct-display if you
want to satisfy yourself, but it's in no way required to bypass GPU
composition and use the display controller to scan out.

Cheers,
Daniel


Re: Sub 16ms render but missing swap

2023-10-18 Thread Pekka Paalanen
On Wed, 18 Oct 2023 09:18:45 +0200
Emmanuel Gil Peyrot  wrote:

> Hi,
> 
> On Wed, Oct 18, 2023 at 12:00:45AM +, Joe M wrote:
> > In a GLES app wired up to Weston/Wayland using EGL, we're calling
> > `eglSwapBuffers` in a render loop that takes about 12-13ms to draw
> > our content, based on CPU timing the run loop. However, the call to
> > swap incurs a duration of around 20ms, meaning we're only hitting
> > 30fps.  
> 
> Most GPUs nowadays aren’t synchronous, your driver creates command
> buffers which are sent to the GPU asynchronously, and then the GPU
> does its work.  But unless you call glFinish() or wait on a sync
> object, you won’t know at which time the GPU is actually done
> executing your commands.  Most notably, eglSwapBufers() will not wait
> until your buffer is fully drawn, so that you can start preparing for
> the next frame ahead of time.

Indeed, this is probably the explanation here.

If the CPU already takes 12-13 ms before eglSwapBuffers submits that
work to the GPU, then the GPU has fairly little time left to finish its
work if you intend to not miss the flip deadline to keep up with 60 Hz.

If Weston is compositing with GL ES, then also Weston's compositing GPU
job needs to finish before the deadline, not only the application's.
OTOH, Weston's compositing work is usually simple blits, so it's
usually fast unless your bottleneck is GPU memory bandwidth.

> > I've investigated a little bit into using the presentation callback
> > protocol, but my understanding is that the EGL display (part of
> > drm-wayland in the mesa source IIRC) is itself already plugged in
> > to the necessary Wayland callbacks such that using `eglSwapBuffers`
> > is sufficient to maximize available draw time. I also tried
> > enabling the profile flag in weston.ini that allows you to collect
> > a trace, viewable in *wesgr*.  And I read Pekka's blog posts about
> > these diagrams, but I still can't understand if our app is doing
> > something suboptimally. I can't decipher what the wesgr diagram is
> > saying. The other weston.ini knob I've tried is the `repaint`
> > window, setting it to 12 or 13 to match our drawing loop. This
> > seems to help sometimes but not in this case. A few questions:  1.
> > What other avenues of investigation should I pursue for the swap
> > delay? As in, why when I take 12 ms to render do I not see about
> > 4ms for the swap call to return? My display is running in at 60hz.
> > 2. Has EGL been optimized to use the available wayland callbacks
> > and maximize available client drawing time?

The problem is, eglSwapBuffers call is the point where the GPU usually
*starts* your rendering work. You could glFlush a part of your work
earlier, or Mesa might flush automatically at some points.

A trace for wesgr, if you compress and attach it, could perhaps
confirm this from Weston's point of view.

eglSwapBuffers itself may stall for example 20 ms, because with
eglSwapInterval 1 (the default) it will wait for your previous frame to
be taken by the compositor into its screen update, and that screen
update cannot start until the screen update before that has gone
through. Maximum screen update rate is the monitor refresh rate. So you
are likely both decimating frame rate and not maxing out GPU usage, or
in other words, the system is automatically adjusting to the vsync'd
steady framerate you *can* keep up: 30 Hz.

Weston's repaint-window setting adjusts how long Weston will wait for
app updates after the last KMS screen refresh finished before "locking"
the next screen update contents. The time of locking is the deadline
for apps to call eglSwapBuffers in order to have the possibility of
making it for the very next screen update. You can still miss the
actual screen update deadline (vblank), because even if everything has
been submitted to the GPU and Weston has submitted the screen update to
the KMS, KMS will not take that update into use until the GPU work it
depends on has finished. Once the GPU work has finished, the update can
be taken into use on the next vblank.

You could try setting eglSwapInterval to 0 in your app and check what
framerate you measure in your app. If that's still at or below 60 Hz,
then I think your app is just too heavy for the hardware to meet 60 Hz.
If it's steadily well over 60 fps, then it gets more interesting.

> >  3. Does EGL leverage
> > "weston_direct_display_v1" when available? What's required to take
> > advantage of it in the app code? (ie. run fullscreen?) Thanks!  

Mesa does not use that, and the extension is both not going to help
here, and not meant for this.

The reason it is not going to help is that whenever at all possible,
Weston will already automatically skip compositing with GL ES and just
push application buffers straight to screen via DRM KMS.

weston_direct_display is meant for content that will crash the hardware
(yes, I mean really the hardware, not software) if anything other than
DRM KMS attempts to access the pixels. IOW, it is 

Re: [PATCH RFC v6 08/10] drm/msm/dpu: Allow NULL FBs in atomic commit

2023-10-18 Thread Jessica Zhang




On 9/24/2023 3:29 AM, Dmitry Baryshkov wrote:

On 29/08/2023 03:05, Jessica Zhang wrote:

Since solid fill planes allow for a NULL framebuffer in a valid commit,
add NULL framebuffer checks to atomic commit calls within DPU.

Reviewed-by: Dmitry Baryshkov 
Signed-off-by: Jessica Zhang 
---
  drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c  |  9 ++-
  drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c | 41 
---

  2 files changed, 34 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c

index 8ce7586e2ddf..5e845510e8c1 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
@@ -451,6 +451,7 @@ static void _dpu_crtc_blend_setup_mixer(struct 
drm_crtc *crtc,

  struct drm_plane_state *state;
  struct dpu_crtc_state *cstate = to_dpu_crtc_state(crtc->state);
  struct dpu_plane_state *pstate = NULL;
+    const struct msm_format *fmt;
  struct dpu_format *format;
  struct dpu_hw_ctl *ctl = mixer->lm_ctl;
@@ -470,7 +471,13 @@ static void _dpu_crtc_blend_setup_mixer(struct 
drm_crtc *crtc,

  pstate = to_dpu_plane_state(state);
  fb = state->fb;
-    format = to_dpu_format(msm_framebuffer_format(pstate->base.fb));
+    if (drm_plane_solid_fill_enabled(state))
+    fmt = dpu_get_msm_format(&_dpu_crtc_get_kms(crtc)->base,
+    DRM_FORMAT_ABGR, 0);
+    else
+    fmt = msm_framebuffer_format(pstate->base.fb);
+
+    format = to_dpu_format(fmt);
  if (pstate->stage == DPU_STAGE_BASE && format->alpha_enable)
  bg_alpha_enable = true;
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c

index c2aaaded07ed..114c803ff99b 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c
@@ -837,8 +837,13 @@ static int dpu_plane_atomic_check(struct 
drm_plane *plane,

  pipe_cfg->dst_rect = new_plane_state->dst;
-    fb_rect.x2 = new_plane_state->fb->width;
-    fb_rect.y2 = new_plane_state->fb->height;
+    if (drm_plane_solid_fill_enabled(new_plane_state))
+    return 0;


This would skip all the width checks, dpu_plane_atomic_check_pipe(), 
etc. Could you please confirm that all of those checks are irrelevant 
for solid fill?


Hi Dmitry,

Good point -- I think it would be good to keep the rect and pipe checks 
for solid fill.





+
+    if (new_plane_state->pixel_source == DRM_PLANE_PIXEL_SOURCE_FB && 
new_plane_state->fb) {

+    fb_rect.x2 = new_plane_state->fb->width;
+    fb_rect.y2 = new_plane_state->fb->height;
+    }
  /* Ensure fb size is supported */
  if (drm_rect_width(&fb_rect) > MAX_IMG_WIDTH ||
@@ -1082,21 +1087,32 @@ static void 
dpu_plane_sspp_atomic_update(struct drm_plane *plane)

  struct drm_crtc *crtc = state->crtc;
  struct drm_framebuffer *fb = state->fb;
  bool is_rt_pipe;
-    const struct dpu_format *fmt =
-    to_dpu_format(msm_framebuffer_format(fb));
+    const struct dpu_format *fmt;
  struct dpu_sw_pipe_cfg *pipe_cfg = &pstate->pipe_cfg;
  struct dpu_sw_pipe_cfg *r_pipe_cfg = &pstate->r_pipe_cfg;
  struct dpu_kms *kms = _dpu_plane_get_kms(&pdpu->base);
  struct msm_gem_address_space *aspace = kms->base.aspace;
  struct dpu_hw_fmt_layout layout;
  bool layout_valid = false;
-    int ret;
-    ret = dpu_format_populate_layout(aspace, fb, &layout);
-    if (ret)
-    DPU_ERROR_PLANE(pdpu, "failed to get format layout, %d\n", ret);
-    else
-    layout_valid = true;
+    if (state->pixel_source == DRM_PLANE_PIXEL_SOURCE_FB && fb) {
+    int ret;
+
+    fmt = to_dpu_format(msm_framebuffer_format(fb));
+
+    ret = dpu_format_populate_layout(aspace, fb, &layout);
+    if (ret)
+    DPU_ERROR_PLANE(pdpu, "failed to get format layout, 
%d\n", ret);

+    else
+    layout_valid = true;
+
+    DPU_DEBUG_PLANE(pdpu, "FB[%u] " DRM_RECT_FP_FMT "->crtc%u " 
DRM_RECT_FMT
+    ", %4.4s ubwc %d\n", fb->base.id, 
DRM_RECT_FP_ARG(&state->src),

+    crtc->base.id, DRM_RECT_ARG(&state->dst),
+    (char *)&fmt->base.pixel_format, 
DPU_FORMAT_IS_UBWC(fmt));

+    } else {
+    fmt = dpu_get_dpu_format(DRM_FORMAT_ABGR);


#define DPU_SOLID_FILL_FORMAT ?


Acked.



Also, I don't think that solid_fill planes consume bandwidth, so this 
likely needs to be fixed too.


You're right -- I think we can actually return early here if solid fill 
is enabled.


Thanks,

Jessica Zhang





+    }
  pstate->pending = true;
@@ -1104,11 +1120,6 @@ static void dpu_plane_sspp_atomic_update(struct 
drm_plane *plane)

  pstate->needs_qos_remap |= (is_rt_pipe != pdpu->is_rt_pipe);
  pdpu->is_rt_pipe = is_rt_pipe;
-    DPU_DEBUG_PLANE(pdpu, "FB[%u] " DRM_RECT_FP_FMT "->crtc%u " 
DRM_RECT_FMT
-    ", %4.4s ubwc %d\n", fb->base.id,

Re: [PATCH RFC v6 01/10] drm: Introduce pixel_source DRM plane property

2023-10-18 Thread Jessica Zhang




On 9/24/2023 3:06 AM, Dmitry Baryshkov wrote:

On 29/08/2023 03:05, Jessica Zhang wrote:

Add support for pixel_source property to drm_plane and related
documentation. In addition, force pixel_source to
DRM_PLANE_PIXEL_SOURCE_FB in DRM_IOCTL_MODE_SETPLANE as to not break
legacy userspace.

This enum property will allow user to specify a pixel source for the
plane. Possible pixel sources will be defined in the
drm_plane_pixel_source enum.

Currently, the only pixel sources are DRM_PLANE_PIXEL_SOURCE_FB (the
default value) and DRM_PLANE_PIXEL_SOURCE_NONE.

Signed-off-by: Jessica Zhang 


Acked-by: Dmitry Baryshkov 

Minor question below


---
  drivers/gpu/drm/drm_atomic_state_helper.c |  1 +
  drivers/gpu/drm/drm_atomic_uapi.c |  4 ++
  drivers/gpu/drm/drm_blend.c   | 90 
+++

  drivers/gpu/drm/drm_plane.c   | 19 +--
  include/drm/drm_blend.h   |  2 +
  include/drm/drm_plane.h   | 21 
  6 files changed, 133 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_atomic_state_helper.c 
b/drivers/gpu/drm/drm_atomic_state_helper.c

index 784e63d70a42..01638c51ce0a 100644
--- a/drivers/gpu/drm/drm_atomic_state_helper.c
+++ b/drivers/gpu/drm/drm_atomic_state_helper.c
@@ -252,6 +252,7 @@ void __drm_atomic_helper_plane_state_reset(struct 
drm_plane_state *plane_state,

  plane_state->alpha = DRM_BLEND_ALPHA_OPAQUE;
  plane_state->pixel_blend_mode = DRM_MODE_BLEND_PREMULTI;
+    plane_state->pixel_source = DRM_PLANE_PIXEL_SOURCE_FB;
  if (plane->color_encoding_property) {
  if (!drm_object_property_get_default_value(&plane->base,
diff --git a/drivers/gpu/drm/drm_atomic_uapi.c 
b/drivers/gpu/drm/drm_atomic_uapi.c

index d867e7f9f2cd..454f980e16c9 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -544,6 +544,8 @@ static int drm_atomic_plane_set_property(struct 
drm_plane *plane,

  state->src_w = val;
  } else if (property == config->prop_src_h) {
  state->src_h = val;
+    } else if (property == plane->pixel_source_property) {
+    state->pixel_source = val;
  } else if (property == plane->alpha_property) {
  state->alpha = val;
  } else if (property == plane->blend_mode_property) {
@@ -616,6 +618,8 @@ drm_atomic_plane_get_property(struct drm_plane 
*plane,

  *val = state->src_w;
  } else if (property == config->prop_src_h) {
  *val = state->src_h;
+    } else if (property == plane->pixel_source_property) {
+    *val = state->pixel_source;
  } else if (property == plane->alpha_property) {
  *val = state->alpha;
  } else if (property == plane->blend_mode_property) {
diff --git a/drivers/gpu/drm/drm_blend.c b/drivers/gpu/drm/drm_blend.c
index 6e74de833466..c3c57bae06b7 100644
--- a/drivers/gpu/drm/drm_blend.c
+++ b/drivers/gpu/drm/drm_blend.c
@@ -185,6 +185,21 @@
   * plane does not expose the "alpha" property, then this is
   * assumed to be 1.0
   *
+ * pixel_source:
+ *    pixel_source is set up with 
drm_plane_create_pixel_source_property().
+ *    It is used to toggle the active source of pixel data for the 
plane.

+ *    The plane will only display data from the set pixel_source -- any
+ *    data from other sources will be ignored.
+ *
+ *    Possible values:
+ *
+ *    "NONE":
+ *    No active pixel source.
+ *    Committing with a NONE pixel source will disable the plane.
+ *
+ *    "FB":
+ *    Framebuffer source set by the "FB_ID" property.
+ *
   * Note that all the property extensions described here apply either 
to the
   * plane or the CRTC (e.g. for the background color, which currently 
is not

   * exposed and assumed to be black).
@@ -615,3 +630,78 @@ int drm_plane_create_blend_mode_property(struct 
drm_plane *plane,

  return 0;
  }
  EXPORT_SYMBOL(drm_plane_create_blend_mode_property);
+
+static const struct drm_prop_enum_list drm_pixel_source_enum_list[] = {
+    { DRM_PLANE_PIXEL_SOURCE_NONE, "NONE" },
+    { DRM_PLANE_PIXEL_SOURCE_FB, "FB" },
+};
+
+/**
+ * drm_plane_create_pixel_source_property - create a new pixel source 
property

+ * @plane: DRM plane
+ * @extra_sources: Bitmask of additional supported pixel_sources for 
the driver.
+ *   DRM_PLANE_PIXEL_SOURCE_FB and 
DRM_PLANE_PIXEL_SOURCE_NONE will

+ *   always be enabled as supported sources.
+ *
+ * This creates a new property describing the current source of pixel 
data for the
+ * plane. The pixel_source will be initialized as 
DRM_PLANE_PIXEL_SOURCE_FB by default.

+ *
+ * Drivers can set a custom default source by overriding the 
pixel_source value in

+ * drm_plane_funcs.reset()
+ *
+ * The property is exposed to userspace as an enumeration property 
called

+ * "pixel_source" and has the following enumeration values:
+ *
+ * "NONE":
+ * No active pixel source
+ *
+ * "FB":
+ *    Framebuffer pixel source
+ *
+ * Returns:
+ *