Re: [Mesa-dev] [PATCH 0/6] Support for 10 bpc EGLSurface

2013-09-16 Thread Daniel Vetter
On Sun, Sep 15, 2013 at 12:16:42AM -0700, Kristian Høgsberg wrote:
> This little series adds support for creating EGLSurfaces with color buffers
> using the ARGB2101010 pixel format.  With the new KMS addFB2 ioctl we can
> create KMS framebuffers with that format, and this series ends up adding
> the pixel format to gbm so we can generate buffers with that format.

Creating 10 bpc RGB framebuffers also works with the old addFB ioctl, same
for rgb565 and rgb555. Heck, even c8 palette mode works ;-) Just so you
don't unduly restrict the wayland/weston support here ...

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
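
(For readers following along: a minimal sketch of what Daniel describes,
using the addFB2 path via libdrm. The function name create_argb2101010_fb
and the handle/stride parameters are illustrative assumptions, not code
from this series.)

  #include <xf86drmMode.h>
  #include <drm_fourcc.h>

  /* Illustrative sketch: wrap an existing buffer object 'handle' with
   * pitch 'stride' in a KMS framebuffer using the ARGB2101010 format. */
  static int
  create_argb2101010_fb(int drm_fd, uint32_t width, uint32_t height,
                        uint32_t handle, uint32_t stride, uint32_t *fb_id)
  {
     uint32_t handles[4] = { handle };
     uint32_t pitches[4] = { stride };
     uint32_t offsets[4] = { 0 };

     return drmModeAddFB2(drm_fd, width, height, DRM_FORMAT_ARGB2101010,
                          handles, pitches, offsets, fb_id, 0);
  }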


Re: [Mesa-dev] [PATCH] i965/hsw: compute DDX in a subspan based only on top row

2013-09-16 Thread Chia-I Wu
On Sat, Sep 14, 2013 at 5:15 AM, Paul Berry  wrote:
> On 12 September 2013 22:06, Chia-I Wu  wrote:
>>
>> From: Chia-I Wu 
>>
>> Consider only the top-left and top-right pixels to approximate DDX in a
>> 2x2
>> subspan, unless the application or the user requests a more accurate
>> approximation.  This results in a less accurate approximation.  However,
>> it
>> improves the performance of Xonotic with Ultra settings by 24.3879% +/-
>> 0.832202% (at 95.0% confidence) on Haswell.  No noticeable image quality
>> difference observed.
>>
>> No piglit gpu.tests regressions (tested with v1)
>>
>> I failed to come up with an explanation for the performance difference, as
>> the
>> change does not affect Ivy Bridge.  If anyone has the insight, please
>> kindly
>> enlighten me.  Performance differences may also be observed on other games
>> that call textureGrad and dFdx.
>>
>> v2: Honor GL_FRAGMENT_SHADER_DERIVATIVE_HINT and add a drirc option.
>> Update
>> comments.
>
>
> I'm not entirely comfortable making a change that has a known negative
> impact on computational accuracy (even one that leads to such an impressive
> performance improvement) when we don't have any theories as to why the
> performance improvement happens, or why the improvement doesn't apply to Ivy
> Bridge.  In my experience, making changes to the codebase without
> understanding why they improve things almost always leads to improvements
> that are brittle, since it's likely that the true source of the improvement
> is a coincidence that will be wiped out by some future change (or won't be
> relevant to client programs other than this particular benchmark).  Having a
> theory as to why the performance improvement happens would help us be
> confident that we're applying the right fix under the right circumstances.
That is how I feel, as I've mentioned.  I am really glad to have the
discussion.  I have actually done some experiments.  It is just that
those experiments only tell me which theories are likely to be wrong;
they cannot tell me whether a theory is right.

So I have a micro benchmark that draws a 256x256 texture to a 512x512
window, using texture2D() or textureGrad().  It can also set up the
vertex buffer such that the texture is rotated around the z-axis by 15
degrees.
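
(For reference, a hypothetical reconstruction of the two fragment shaders
such a benchmark would compare - not Chia-I's actual test code; in GLSL
1.30 the texture2D() call is spelled texture():)

  static const char *fs_tex2d =
     "#version 130\n"
     "uniform sampler2D tex;\n"
     "in vec2 uv;\n"
     "out vec4 color;\n"
     "void main() { color = texture(tex, uv); }\n";

  static const char *fs_texgrad =
     "#version 130\n"
     "uniform sampler2D tex;\n"
     "in vec2 uv;\n"
     "out vec4 color;\n"
     "void main() { color = textureGrad(tex, uv, dFdx(uv), dFdy(uv)); }\n";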

On Haswell, when the texture is not rotated, rendering with
textureGrad() is less than 1% slower than rendering with texture2D().
The slowdown could be from the additional instructions to calculate
DDX/DDY or the extra message payload.  Whether this patch is applied
or not does not make any difference.

When the texture is rotated, rendering with textureGrad() is ~3%
slower than rendering with texture2D() without the patch.  With the
patch, the difference is down to less than 1% again.

Computing the LOD in the shader results in a ~17% slowdown compared to textureGrad().

As a better way to control the result of DDX, I also hacked the driver
so that DDX produced the values I specified for each pixel.  When not
all pixels in the subspan have the same gradient, rendering is ~6%
slower compared to when all pixels in the subspan have the same
gradient.

The weird thing is, in SIMD8 mode, two subspans are processed at the
same time.  When all pixels in one of the subspans have the same
gradient, whether the pixels in the other subspan have the same
gradient or not does not matter.

As for Ivy Bridge, rendering with textureGrad() is always
significantly slower than rendering with texture2D().  Computing LOD
in the shader results in another ~4% slowdown compared to
textureGrad().
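
(Side note for readers: per the v2 changelog above, an application that
needs the accurate behavior can presumably opt back in through the
standard GL hint; the drirc option name is not shown in this thread. A
one-line sketch:)

  /* Sketch: ask for accurate derivatives; v2 of this patch is said to
   * honor this hint, so GL_NICEST should select the precise DDX. */
  glHint(GL_FRAGMENT_SHADER_DERIVATIVE_HINT, GL_NICEST);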

>
> For example, here's one theory as to why we might be seeing an improvement:
> perhaps Haswell's sample_d processing is smart enough to realize that when
> all the gradient values within a sub-span are the same, that means that all
> of the sampling for the sub-span will come from the same LOD, and that
> allows it to short-cut some expensive step in the LOD calculation.  Perhaps
> the same improvement isn't seen on Ivy Bridge because Ivy Bridge's sample_d
> processing logic is less sophisticated, so it's unable to perform the
> optimization.  If this is the case, then conditioning the optimization on
> brw->is_haswell (as you've done) makes sense.
This is also the theory I have, and my experiments could not rule it
out.  The question I have is: if the LODs of all pixels were calculated
in parallel, could the shortcut help this much?  I don't have enough
hardware knowledge to know the answer, or even whether that is the
right question to ask.

>
> Another possible explanation for the Haswell vs Ivy Bridge difference is
> that perhaps Ivy Bridge, being a lower-performing chip, has other
> bottlenecks that make the optimization irrelevant for this particular
> benchmark, but potentially still useful for other benchmarks.  For instance,
> maybe when this benchmark executes on Ivy Bridge, the texture that's being
> sampled from is located in sufficiently distant memory that optimizing the
> sample_d's memory access

Re: [Mesa-dev] [PATCH] i965/hsw: compute DDX in a subspan based only on top row

2013-09-16 Thread Chia-I Wu
On Mon, Sep 16, 2013 at 3:50 AM, Mark Mueller  wrote:
>
>
>
> On Fri, Sep 13, 2013 at 2:15 PM, Paul Berry  wrote:
>>
>> On 12 September 2013 22:06, Chia-I Wu  wrote:
>>>
>>> From: Chia-I Wu 
>>>
>>> Consider only the top-left and top-right pixels to approximate DDX in a
>>> 2x2
>>> subspan, unless the application or the user requests a more accurate
>>> approximation.  This results in a less accurate approximation.  However,
>>> it
>>> improves the performance of Xonotic with Ultra settings by 24.3879% +/-
>>> 0.832202% (at 95.0% confidence) on Haswell.  No noticeable image quality
>>> difference observed.
>>>
>>> No piglit gpu.tests regressions (tested with v1)
>>>
>>> I failed to come up with an explanation for the performance difference,
>>> as the
>>> change does not affect Ivy Bridge.  If anyone has the insight, please
>>> kindly
>>> enlighten me.  Performance differences may also be observed on other
>>> games
>>> that call textureGrad and dFdx.
>>>
>>> v2: Honor GL_FRAGMENT_SHADER_DERIVATIVE_HINT and add a drirc option.
>>> Update
>>> comments.
>>
>>
>> I'm not entirely comfortable making a change that has a known negative
>> impact on computational accuracy (even one that leads to such an impressive
>> performance improvement) when we don't have any theories as to why the
>> performance improvement happens, or why the improvement doesn't apply to Ivy
>> Bridge.  In my experience, making changes to the codebase without
>> understanding why they improve things almost always leads to improvements
>> that are brittle, since it's likely that the true source of the improvement
>> is a coincidence that will be wiped out by some future change (or won't be
>> relevant to client programs other than this particular benchmark).  Having a
>> theory as to why the performance improvement happens would help us be
>> confident that we're applying the right fix under the right circumstances.
> There's another angle to approach this and that is to develop a simple test
> case that will show the different results across a range of computational
> accuracy and run the test on proprietary drivers for the same hardware to
> determine what settings they are using.
Yes, I have a little test.  On Windows, rendering with texture2D()
versus textureGrad() does not make a noticeable difference in
performance.  But I am not sure how to change the dFdx() accuracy from
the shaders.  On Linux, I did that by modifying the driver.

>
>>
>>
>> For example, here's one theory as to why we might be seeing an
>> improvement: perhaps Haswell's sample_d processing is smart enough to
>> realize that when all the gradient values within a sub-span are the same,
>> that means that all of the sampling for the sub-span will come from the same
>> LOD, and that allows it to short-cut some expensive step in the LOD
>> calculation.  Perhaps the same improvement isn't seen on Ivy Bridge because
>> Ivy Bridge's sample_d processing logic is less sophisticated, so it's unable
>> to perform the optimization.  If this is the case, then conditioning the
>> optimization on brw->is_haswell (as you've done) makes sense.
>>
>> Another possible explanation for the Haswell vs Ivy Bridge difference is
>> that perhaps Ivy Bridge, being a lower-performing chip, has other
>> bottlenecks that make the optimization irrelevant for this particular
>> benchmark, but potentially still useful for other benchmarks.  For instance,
>> maybe when this benchmark executes on Ivy Bridge, the texture that's being
>> sampled from is located in sufficiently distant memory that optimizing the
>> sample_d's memory accesses makes no difference, since the bottleneck is the
>> speed with which the texture can be read into cache, rather than the speed
>> of operation of sample_d.  If this explanation is correct, then it might be
>> worth applying the optimization to both Ivy Bridge and Haswell (and perhaps
>> Sandy Bridge as well), since it might conceivably benefit those other chips
>> when running applications that place less cache pressure on the chip.
>
>
> This scenario is where I'd place my bets, especially given that the numbers
> are based on Xonotic. I benchmarked this patch using Xonotic on Bay Trail as
> is and by replacing !brw->is_haswell with !brw->is_baytrail. With ultra and
> ultimate levels at medium and high resolutions, the results were all
> essentially the same at comparable resolutions and quality levels.
Isn't Bay Trail based on Ivy Bridge?
>
> I don't see any justification to tie this change to just Haswell hardware.
> There's all sorts of reasons why doing that sounds like a big mistake. In
> fact, another _explanation_ to add to your list is maybe there's another
> is_haswell test elsewhere in the driver that is responsible for the
> performance anomaly.
I had a look at all the 'if (brw->is_haswell)' paths but nothing special
stood out to me.  It could just as well be some place that is missing an
'if (brw->is_haswell)'.

>
>>
>> Another possibile explanation is that Haswell has a bug in

Re: [Mesa-dev] [PATCH] i965/hsw: compute DDX in a subspan based only on top row

2013-09-16 Thread Chia-I Wu
On Mon, Sep 16, 2013 at 4:12 PM, Chia-I Wu  wrote:
> On Mon, Sep 16, 2013 at 3:50 AM, Mark Mueller  wrote:
>>
>>
>>
>> On Fri, Sep 13, 2013 at 2:15 PM, Paul Berry  wrote:
>>>
>>> On 12 September 2013 22:06, Chia-I Wu  wrote:

 From: Chia-I Wu 

 Consider only the top-left and top-right pixels to approximate DDX in a
 2x2
 subspan, unless the application or the user requests a more accurate
 approximation.  This results in a less accurate approximation.  However,
 it
 improves the performance of Xonotic with Ultra settings by 24.3879% +/-
 0.832202% (at 95.0% confidence) on Haswell.  No noticeable image quality
 difference observed.

 No piglit gpu.tests regressions (tested with v1)

 I failed to come up with an explanation for the performance difference,
 as the
 change does not affect Ivy Bridge.  If anyone has the insight, please
 kindly
 enlighten me.  Performance differences may also be observed on other
 games
 that call textureGrad and dFdx.

 v2: Honor GL_FRAGMENT_SHADER_DERIVATIVE_HINT and add a drirc option.
 Update
 comments.
>>>
>>>
>>> I'm not entirely comfortable making a change that has a known negative
>>> impact on computational accuracy (even one that leads to such an impressive
>>> performance improvement) when we don't have any theories as to why the
>>> performance improvement happens, or why the improvement doesn't apply to Ivy
>>> Bridge.  In my experience, making changes to the codebase without
>>> understanding why they improve things almost always leads to improvements
>>> that are brittle, since it's likely that the true source of the improvement
>>> is a coincidence that will be wiped out by some future change (or won't be
>>> relevant to client programs other than this particular benchmark).  Having a
>>> theory as to why the performance improvement happens would help us be
>>> confident that we're applying the right fix under the right circumstances.
>> There's another angle to approach this and that is to develop a simple test
>> case that will show the different results across a range of computational
>> accuracy and run the test on proprietary drivers for the same hardware to
>> determine what settings they are using.
> Yes, I have a little test.  On Windows, rendering with texture2D()
> versus textureGrad() does not make a noticeable difference in
> performance.  But I am not sure how to change the dFdx() accuracy from
> the shaders.  On Linux, I did that by modifying the driver.
>
>>
>>>
>>>
>>> For example, here's one theory as to why we might be seeing an
>>> improvement: perhaps Haswell's sample_d processing is smart enough to
>>> realize that when all the gradient values within a sub-span are the same,
>>> that means that all of the sampling for the sub-span will come from the same
>>> LOD, and that allows it to short-cut some expensive step in the LOD
>>> calculation.  Perhaps the same improvement isn't seen on Ivy Bridge because
>>> Ivy Bridge's sample_d processing logic is less sophisticated, so it's unable
>>> to perform the optimization.  If this is the case, then conditioning the
>>> optimization on brw->is_haswell (as you've done) makes sense.
>>>
>>> Another possible explanation for the Haswell vs Ivy Bridge difference is
>>> that perhaps Ivy Bridge, being a lower-performing chip, has other
>>> bottlenecks that make the optimization irrelevant for this particular
>>> benchmark, but potentially still useful for other benchmarks.  For instance,
>>> maybe when this benchmark executes on Ivy Bridge, the texture that's being
>>> sampled from is located in sufficiently distant memory that optimizing the
>>> sample_d's memory accesses makes no difference, since the bottleneck is the
>>> speed with which the texture can be read into cache, rather than the speed
>>> of operation of sample_d.  If this explanation is correct, then it might be
>>> worth applying the optimization to both Ivy Bridge and Haswell (and perhaps
>>> Sandy Bridge as well), since it might conceivably benefit those other chips
>>> when running applications that place less cache pressure on the chip.
>>
>>
>> This scenario is where I'd place my bets, especially given that the numbers
>> are based on Xonotic. I benchmarked this patch using Xonotic on Bay Trail as
>> is and by replacing !brw->is_haswell with !brw->is_baytrail. With ultra and
>> ultimate levels at medium and high resolutions, the results were all
>> essentially the same at comparable resolutions and quality levels.
> Isn't Bay Trail based on Ivy Bridge?
For Bay Trail, this might help you

  http://lists.freedesktop.org/archives/mesa-dev/2013-September/044288.html

if you are interested.

>>
>> I don't see any justification to tie this change to just Haswell hardware.
>> There's all sorts of reasons why doing that sounds like a big mistake. In
>> fact, another _explanation_ to add to your list is maybe there's another
>> is_haswell test elsewhe

[Mesa-dev] [PATCH] egl: add EGL_WAYLAND_Y_INVERTED_WL attribute

2013-09-16 Thread Stanislav Vorobiov
This patch enables querying of a wl_buffer's orientation.  It is
associated with weston commit bfbb8e59fadda112fcdb0bf0a0ed2c0b6c1e1923
("gl_renderer: Use EGL_WAYLAND_Y_INVERTED_WL to query wl_buffer's
orientation").

Stanislav Vorobiov (1):
  egl: add EGL_WAYLAND_Y_INVERTED_WL attribute

 docs/specs/WL_bind_wayland_display.spec |   17 +
 include/EGL/eglmesaext.h|2 ++
 2 files changed, 19 insertions(+)

-- 
1.7.9.5



[Mesa-dev] [PATCH] egl: add EGL_WAYLAND_Y_INVERTED_WL attribute

2013-09-16 Thread Stanislav Vorobiov
This enables querying of wl_buffer's orientation
---
 docs/specs/WL_bind_wayland_display.spec |   17 +
 include/EGL/eglmesaext.h|2 ++
 2 files changed, 19 insertions(+)

diff --git a/docs/specs/WL_bind_wayland_display.spec b/docs/specs/WL_bind_wayland_display.spec
index 8f0083c..47cf5cd 100644
--- a/docs/specs/WL_bind_wayland_display.spec
+++ b/docs/specs/WL_bind_wayland_display.spec
@@ -76,6 +76,10 @@ New Tokens
 EGL_TEXTURE_Y_UV_WL 0x31D8
 EGL_TEXTURE_Y_XUXV_WL   0x31D9
 
+Accepted in the  parameter of eglQueryWaylandBufferWL:
+
+EGL_WAYLAND_Y_INVERTED_WL   0x31DB
+
 
 Additions to the EGL 1.4 Specification:
 
@@ -157,6 +161,16 @@ Additions to the EGL 1.4 Specification:
 Further, eglQueryWaylandBufferWL accepts attributes EGL_WIDTH and
 EGL_HEIGHT to query the width and height of the wl_buffer.
 
+Also, eglQueryWaylandBufferWL may accept the
+EGL_WAYLAND_Y_INVERTED_WL attribute to query the orientation of a
+wl_buffer. If EGL_WAYLAND_Y_INVERTED_WL is supported,
+eglQueryWaylandBufferWL returns EGL_TRUE, and value is a boolean
+that indicates whether the wl_buffer is y-inverted. If
+EGL_WAYLAND_Y_INVERTED_WL is not supported,
+eglQueryWaylandBufferWL returns EGL_FALSE; in that case the
+wl_buffer should be treated as if the value of
+EGL_WAYLAND_Y_INVERTED_WL were EGL_TRUE.
+
 Issues
 
 Revision History
@@ -177,3 +191,6 @@ Revision History
 Change eglQueryWaylandBufferWL to take a resource pointer to the
 buffer instead of a pointer to a struct wl_buffer, as the latter has
 been deprecated. (Ander Conselvan de Oliveira)
+Version 6, September 16, 2013
+Add EGL_WAYLAND_Y_INVERTED_WL attribute to allow querying
+wl_buffer's orientation.
diff --git a/include/EGL/eglmesaext.h b/include/EGL/eglmesaext.h
index e0eae28..1f07d4c 100644
--- a/include/EGL/eglmesaext.h
+++ b/include/EGL/eglmesaext.h
@@ -115,6 +115,8 @@ typedef EGLDisplay (EGLAPIENTRYP PFNEGLGETDRMDISPLAYMESA) (int fd);
 #define EGL_WAYLAND_BUFFER_WL  0x31D5 /* eglCreateImageKHR target */
 #define EGL_WAYLAND_PLANE_WL   0x31D6 /* eglCreateImageKHR target */
 
+#define EGL_WAYLAND_Y_INVERTED_WL  0x31DB /* eglQueryWaylandBufferWL attribute */
+
 #define EGL_TEXTURE_Y_U_V_WL0x31D7
 #define EGL_TEXTURE_Y_UV_WL 0x31D8
 #define EGL_TEXTURE_Y_XUXV_WL   0x31D9
-- 
1.7.9.5
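
(For illustration, a usage sketch of the attribute added above;
egl_display and buffer_resource are assumed to be an initialized
EGLDisplay and the buffer's wl_resource pointer, per revision 5 of this
spec:)

  /* Sketch: query a wl_buffer's orientation; when the attribute is not
   * supported the buffer must be treated as y-inverted, per the spec. */
  EGLint y_inverted = EGL_TRUE;
  if (!eglQueryWaylandBufferWL(egl_display, buffer_resource,
                               EGL_WAYLAND_Y_INVERTED_WL, &y_inverted))
     y_inverted = EGL_TRUE;   /* attribute not supported */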



Re: [Mesa-dev] [PATCH v3] wayland: Add support for eglSwapInterval

2013-09-16 Thread Pekka Paalanen
On Fri, 13 Sep 2013 17:04:38 +0100
Neil Roberts  wrote:

> Here is another version of the patch which brings back the blocking
> when there are no buffers available to cope with the situation where
> the compositor isn't immediately releasing buffers. Maybe we could
> leave the decision about whether to increase the buffer count to 4 as
> a separate patch.

Hi Neil,

yes, I think this approach is how it should work. There are a few
details commented below.

> 
> -- >8 --
> 
> The Wayland EGL platform now respects the eglSwapInterval value. The value is
> clamped to either 0 or 1 because it is difficult (and probably not useful) to
> sync to more than 1 redraw.
> 
> The main change is that if the swap interval is 0 then instead of installing a
> frame callback it will just call the display sync method and throttle itself
> to that. When the application is not running fullscreen the compositor is
> likely to release the previous buffer immediately so this gives the
> application the best chance of reusing the buffer.
> 
> If there are no buffers available then instead of returning with an error,
> get_back_bo will now block until a buffer becomes available. This is necessary
> if the compositor is not releasing buffers immediately. As there are only
> three buffers, this could actually mean that the client ends up throttled to
> the vblank anyway because Weston can hold on to three buffers when the client
> is fullscreen. We could fix this by increasing the buffer count to 4 or
> changing Weston and KMS to allow cancelling a pending buffer swap, but for now
> this patch ignores that problem.

A good explanation, cool.
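
(For context, the interval under discussion is the one a client requests
with the standard EGL call; a minimal usage sketch, with egl_dpy and
egl_surf assumed to be an initialized display and window surface:)

  /* Sketch: request unthrottled swaps; with this patch the Wayland
   * platform clamps the requested interval to 0 or 1. */
  eglSwapInterval(egl_dpy, 0);
  eglSwapBuffers(egl_dpy, egl_surf);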

> This also moves the vblank configuration defines from platform_x11.c to the
> common egl_dri2.h header so they can be shared by both platforms.

This little part could be a separate patch, so it could go in already.

> ---
>  src/egl/drivers/dri2/egl_dri2.h |   7 ++
>  src/egl/drivers/dri2/platform_wayland.c | 159 
> 
>  src/egl/drivers/dri2/platform_x11.c |   6 --
>  3 files changed, 147 insertions(+), 25 deletions(-)
> 
> diff --git a/src/egl/drivers/dri2/egl_dri2.h b/src/egl/drivers/dri2/egl_dri2.h
> index fba5f81..cc657ba 100644
> --- a/src/egl/drivers/dri2/egl_dri2.h
> +++ b/src/egl/drivers/dri2/egl_dri2.h
> @@ -175,6 +175,7 @@ struct dri2_egl_surface
> intdx;
> intdy;
> struct wl_callback*frame_callback;
> +   struct wl_callback*throttle_callback;
> int format;
>  #endif
>  
> @@ -221,6 +222,12 @@ struct dri2_egl_image
> __DRIimage *dri_image;
>  };
>  
> +/* From xmlpool/options.h, user exposed so should be stable */
> +#define DRI_CONF_VBLANK_NEVER 0
> +#define DRI_CONF_VBLANK_DEF_INTERVAL_0 1
> +#define DRI_CONF_VBLANK_DEF_INTERVAL_1 2
> +#define DRI_CONF_VBLANK_ALWAYS_SYNC 3
> +
>  /* standard typecasts */
>  _EGL_DRIVER_STANDARD_TYPECASTS(dri2_egl)
>  _EGL_DRIVER_TYPECAST(dri2_egl_image, _EGLImage, obj)
> diff --git a/src/egl/drivers/dri2/platform_wayland.c 
> b/src/egl/drivers/dri2/platform_wayland.c
> index ffc5959..6ee6ffb 100644
> --- a/src/egl/drivers/dri2/platform_wayland.c
> +++ b/src/egl/drivers/dri2/platform_wayland.c
> @@ -180,8 +180,16 @@ dri2_create_window_surface(_EGLDriver *drv, _EGLDisplay 
> *disp,
>  _EGLConfig *conf, EGLNativeWindowType window,
>  const EGLint *attrib_list)
>  {
> -   return dri2_create_surface(drv, disp, EGL_WINDOW_BIT, conf,
> +   struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
> +   _EGLSurface *surf;
> +
> +   surf = dri2_create_surface(drv, disp, EGL_WINDOW_BIT, conf,
> window, attrib_list);
> +
> +   if (surf != NULL)
> +  drv->API.SwapInterval(drv, disp, surf, 
> dri2_dpy->default_swap_interval);
> +
> +   return surf;
>  }
>  
>  /**
> @@ -216,6 +224,8 @@ dri2_destroy_surface(_EGLDriver *drv, _EGLDisplay *disp, 
> _EGLSurface *surf)
>  
> if (dri2_surf->frame_callback)
>wl_callback_destroy(dri2_surf->frame_callback);
> +   if (dri2_surf->throttle_callback)
> +  wl_callback_destroy(dri2_surf->throttle_callback);
>  
> if (dri2_surf->base.Type == EGL_WINDOW_BIT) {
>dri2_surf->wl_win->private = NULL;
> @@ -261,24 +271,46 @@ get_back_bo(struct dri2_egl_surface *dri2_surf, 
> __DRIbuffer *buffer)
> __DRIimage *image;
> int i, name, pitch;
>  
> -   /* There might be a buffer release already queued that wasn't processed */
> -   wl_display_dispatch_queue_pending(dri2_dpy->wl_dpy, dri2_dpy->wl_queue);
> +   if (dri2_surf->throttle_callback == NULL) {
> +  /* There might be a buffer release already queued that wasn't 
> processed */
> +  wl_display_dispatch_queue_pending(dri2_dpy->wl_dpy, 
> dri2_dpy->wl_queue);
> +   } else {
> +  /* If we aren't throttling to the frame callbacks then the compositor
> +   * may have sent a release event after the last attach so we'll wai

Re: [Mesa-dev] [PATCH 3/3] mesa: add missing error checks in _mesa_GetObject[Ptr]Label()

2013-09-16 Thread Timothy Arceri
I tested on Catalyst and it allows both negative and zero bufSize
values without raising a GL_INVALID_VALUE error.


I'm not sure if I'm allowed to review patches, but for what it's worth,
for the other two patches:


Reviewed-by: Timothy Arceri 

On 16/09/13 15:37, Timothy Arceri wrote:

Hi Brian,

Maybe it's just another oversight in the spec, but I thought I'd point out
that the spec doesn't actually say to test for this in the get label
functions.  I assumed this was because label can be NULL, in which case a
bufSize of 0 would be valid.  I haven't checked what the Catalyst drivers
do yet.

Tim




- Original Message -
From: Brian Paul 
To: mesa-dev@lists.freedesktop.org
Cc:
Sent: Sunday, 15 September 2013 2:16 AM
Subject: [Mesa-dev] [PATCH 3/3] mesa: add missing error checks in _mesa_GetObject[Ptr]Label()

---
src/mesa/main/objectlabel.c |   12 
1 file changed, 12 insertions(+)

diff --git a/src/mesa/main/objectlabel.c b/src/mesa/main/objectlabel.c
index bfe9ba2..c373a46 100644
--- a/src/mesa/main/objectlabel.c
+++ b/src/mesa/main/objectlabel.c
@@ -256,6 +256,12 @@ _mesa_GetObjectLabel(GLenum identifier, GLuint name, 
GLsizei bufSize,
 GET_CURRENT_CONTEXT(ctx);
 char **labelPtr;

+   if (bufSize <= 0) {
+  _mesa_error(ctx, GL_INVALID_VALUE, "glGetObjectLabel(bufSize = %d)",
+  bufSize);
+  return;
+   }
+
 labelPtr = get_label_pointer(ctx, identifier, name, "glGetObjectLabel");
 if (!labelPtr)
return;
@@ -288,6 +294,12 @@ _mesa_GetObjectPtrLabel(const void *ptr, GLsizei bufSize, 
GLsizei *length,
 char **labelPtr;
 struct gl_sync_object *const syncObj = (struct gl_sync_object *) ptr;

+   if (bufSize <= 0) {
+  _mesa_error(ctx, GL_INVALID_VALUE, "glGetObjectPtrLabel(bufSize = %d)",
+  bufSize);
+  return;
+   }
+
 if (!_mesa_validate_sync(ctx, syncObj)) {
_mesa_error(ctx, GL_INVALID_VALUE, "glGetObjectPtrLabel (not a valid sync 
object)");
return;
--
1.7.10.4



[Mesa-dev] a newbie asking newbie questions

2013-09-16 Thread Rogovin, Kevin
Hello all,

I am new to Mesa development (and in particular the i965 driver). I am 
currently trying to gain an understanding of Mesa's implementation with mostly 
an eye on (just) the i965 driver. Some questions:

There are some docs in docs/; how up to date are those documents? In
particular I saw this (from User Topics/Shading Language in the sidebar):

a. Shading language programs are compiled into low-level programs very
similar to those of GL_ARB_vertex/fragment_program.
b. All vector types (vec2, vec3, vec4, bvec2, etc) currently occupy full
float[4] registers.
c. Float constants and variables are packed so that up to four floats can
occupy one program parameter/register.
d. All function calls are inlined.
e. Shaders which use too many registers will not compile.
f. The quality of generated code is pretty good, register usage is fair.
g. Shader error detection and reporting of errors (InfoLog) is not very
good yet.
h. The ftransform() function doesn't necessarily match the results of
fixed-function transformation.

My questions are:

1.  For item (a), I thought Mesa was using a custom IR that sort of
looked like LISP; is that correct? Is (a) even correct? For that matter,
where does the assembly-style shader magic fit in? [Even though the asm
interface is old, it is used in common user-oriented games, like Doom 3
for example.] Additionally, there are extensions out there that update
the asm interface for GL3 and GL4 features. Are those extensions already
implemented in Mesa?

2.  My question for item (b) mostly concerns the i965 driver. First a
disclaimer: I am quite *new*, so some of my starting points might be
utterly wrong. My understanding of the i965 so far is that it strongly
prefers to work with SIMD commands (I guess operating on vec4's
typically). One idea is that compiling a fragment shader would break all
operations down to scalar operations, and then the shader would be
re-assembled to process 4 fragments at a time by "combining" the shader
commands of 4 fragments from 4 scalar ops into one vec4 op. Is this
already done? [I remember seeing a slide from Ian R. saying that the
i965 driver draws 8 (or was it 4?) pixels "at a time".]

3.  Related to (d), but not exactly the same: how are branches and
(non-unrollable) loops handled? In particular, nested loops?


There is also some doxygen stuff found in doxygen/; some of it (for
example glapi.doxy) seems to be suffering from bit rot (glapi.doxy points
to a directory that does not even exist anymore). On a related note, where
is the information about the dispatch table?

Also, is src/mesa/state_tracker part of Mesa main or Gallium stuff? Taking a 
quick, non-careful look, it looks like state tracking is handled by bits 
running around in mesa/main, is that correct?

What is the general rule of thumb for code that belongs in mesa/main vs.
further down in drivers/dri/XXX (for me, XXX = i965)?

Best Regards
-Kevin Rogovin

-
Intel Finland Oy
Registered Address: PL 281, 00181 Helsinki 
Business Identity Code: 0357606 - 4 
Domiciled in Helsinki 

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


Re: [Mesa-dev] gallium-bind-sampler-states branch

2013-09-16 Thread Roland Scheidegger
On 09/14/2013 18:24, Brian Paul wrote:
> On 09/12/2013 09:06 PM, Chia-I Wu wrote:
>> Hi Brian,
>>
>> On Fri, Sep 13, 2013 at 8:46 AM, Brian Paul  wrote:
>>>
>>> I just pushed a gallium-bind-sampler-states branch to my git repo at
>>> git://people.freedesktop.org/~brianp/mesa
>>>
>>> It replaces the four
>>> pipe_context::bind_fragment/vertex/geometry/compute_sampler_states()
>>> functions with a single bind_sampler_states() function:
>>>
>>>   void (*bind_sampler_states)(struct pipe_context *,
>>>   unsigned shader, unsigned start_slot,
>>>   unsigned num_samplers, void **samplers);
>>>
>>> At this point start_slot is always zero (at least for non-compute
>>> shaders).
>>> And as the updated gallium docs explain, at some point calls to
>>> bind_sampler_states() will be used to updated sub-ranges, but that never
>>> happens currently.
>>>
>>> I've updated all the drivers, state trackers, utils, etc.
>>>
>>> I've tested the svga, llvmpipe and softpipe drivers.  'make check' and a
>>> texture subset of piglit pass w/out regressions.  I'd appreciate it
>>> if other
>>> driver developers would test their favorite driver.
>> For ilo, the new code does not follow the doc and unbinds samplers not
>> in range.
> 
> I think that's OK.  The CSO module (used by the state tracker) currently
> always calls pipe_context::bind_sampler_states() with start=0 and count
> such that it sets/replaces all samplers, never a sub-range.  That
> could/should change in the future.
> 
> See single_sampler_done() in cso_context.c.

Not all state trackers use cso though. While this can be fixed later, I
would prefer it if the docs stated that this isn't really meant to stay
this way, and that it should be changed in cso in the future and
certainly fixed in the drivers.

Roland
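
(To make the intended contract concrete, here is a minimal sketch of a
driver-side bind_sampler_states that honors sub-ranges as the docs
describe; my_context and MY_DIRTY_SAMPLERS are hypothetical
driver-private names, not code from the branch:)

  /* Sketch: update only slots [start_slot, start_slot + num_samplers);
   * slots outside the range are left untouched, not unbound. */
  static void
  my_bind_sampler_states(struct pipe_context *pipe, unsigned shader,
                         unsigned start_slot, unsigned num_samplers,
                         void **samplers)
  {
     struct my_context *ctx = my_context(pipe);

     for (unsigned i = 0; i < num_samplers; i++)
        ctx->samplers[shader][start_slot + i] = samplers[i];

     ctx->dirty |= MY_DIRTY_SAMPLERS;
  }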



> 
> 
>> Is it fine if I implement the new bind_sampler_states as a helper
>> function on master branch, so that you hook it up to
>> pipe_context::bind_sampler_states in your branch and remove the old
>> ones?
> 
> I'm not quite sure that I understand what you mean.  Can you elaborate?
> 
> -Brian
> 


Re: [Mesa-dev] [PATCH 3/3] mesa: add missing error checks in _mesa_GetObject[Ptr]Label()

2013-09-16 Thread Brian Paul

Hmm, check out the man page at http://www.opengl.org/sdk/docs/man/xhtml

In the description it basically says that bufSize can be zero, but in
the "Errors" section it says bufSize=0 is a GL_INVALID_VALUE.


I think you're right about bufSize=0 being OK since one might want to 
just query the length of the label without actually getting it.
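
(As a concrete sketch of that use case, with buf being a hypothetical
buffer object name: pass a NULL label to get just the length. With the
revised check, only a negative bufSize would raise an error.)

  /* Sketch: query only the length of the label, without copying it. */
  GLsizei length = 0;
  glGetObjectLabel(GL_BUFFER, buf, 0, &length, NULL);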


Most GL functions with GLsizei parameters error-check them for negative 
or <=0 and raise GL_INVALID_VALUE.


I'll revise my patch for < 0 and file a spec bug report.

-Brian


On 09/16/2013 03:40 AM, Timothy Arceri wrote:

I tested on Catalyst and it allows both negative and zero bufSize
values without raising a GL_INVALID_VALUE error.

I'm not sure if I'm allowed to review patches, but for what it's worth,
for the other two patches:

Reviewed-by: Timothy Arceri 

On 16/09/13 15:37, Timothy Arceri wrote:

Hi Brian,

Maybe it's just another oversight in the spec, but I thought I'd point
out that the spec doesn't actually say to test for this in the get
label functions. I assumed this was because label can be NULL, in which
case a bufSize of 0 would be valid. I haven't checked what the
Catalyst drivers do yet.

Tim




- Original Message -
From: Brian Paul 
To: mesa-dev@lists.freedesktop.org
Cc:
Sent: Sunday, 15 September 2013 2:16 AM
Subject: [Mesa-dev] [PATCH 3/3] mesa: add missing error checks in
_mesa_GetObject[Ptr]Label()

---
src/mesa/main/objectlabel.c |   12 
1 file changed, 12 insertions(+)

diff --git a/src/mesa/main/objectlabel.c b/src/mesa/main/objectlabel.c
index bfe9ba2..c373a46 100644
--- a/src/mesa/main/objectlabel.c
+++ b/src/mesa/main/objectlabel.c
@@ -256,6 +256,12 @@ _mesa_GetObjectLabel(GLenum identifier, GLuint
name, GLsizei bufSize,
 GET_CURRENT_CONTEXT(ctx);
 char **labelPtr;

+   if (bufSize <= 0) {
+  _mesa_error(ctx, GL_INVALID_VALUE, "glGetObjectLabel(bufSize =
%d)",
+  bufSize);
+  return;
+   }
+
 labelPtr = get_label_pointer(ctx, identifier, name,
"glGetObjectLabel");
 if (!labelPtr)
return;
@@ -288,6 +294,12 @@ _mesa_GetObjectPtrLabel(const void *ptr, GLsizei
bufSize, GLsizei *length,
 char **labelPtr;
 struct gl_sync_object *const syncObj = (struct gl_sync_object *)
ptr;

+   if (bufSize <= 0) {
+  _mesa_error(ctx, GL_INVALID_VALUE,
"glGetObjectPtrLabel(bufSize = %d)",
+  bufSize);
+  return;
+   }
+
 if (!_mesa_validate_sync(ctx, syncObj)) {
_mesa_error(ctx, GL_INVALID_VALUE, "glGetObjectPtrLabel (not
a valid sync object)");
return;
--
1.7.10.4



Re: [Mesa-dev] a newbie asking newbie questions

2013-09-16 Thread Eric Anholt
"Rogovin, Kevin"  writes:

> Hello all,
>
> I am new to Mesa development (and in particular the i965 driver). I am
> currently trying to gain an understanding of Mesa's implementation
> with mostly an eye on (just) the i965 driver. Some questions:
>
> There are some docs in docs, how up to date are those documents? In
> particular I saw this: ( from User Topics/Shading Language on the side
> bar)
>
>
> a. Shading language programs are compiled into low-level programs
> very similar to those of GL_ARB_vertex/fragment_program.
> b. All vector types (vec2, vec3, vec4, bvec2, etc) currently occupy
> full float[4] registers.
> c. Float constants and variables are packed so that up to four floats
> can occupy one program parameter/register.
> d. All function calls are inlined.
> e. Shaders which use too many registers will not compile.
> f. The quality of generated code is pretty good, register usage is fair.
> g. Shader error detection and reporting of errors (InfoLog) is not
> very good yet.
> h. The ftransform() function doesn't necessarily match the results of
> fixed-function transformation.
>
> My questions are:
>
> 1. For item (a), I thought Mesa was using a custom IR that sort of
> looked like LISP; is that correct? Is (a) even correct? For that
> matter, where does the assembly-style shader magic fit in? [Even
> though the asm interface is old, it is used in common user-oriented
> games, like Doom 3 for example.] Additionally, there are extensions out
> there that update the asm interface for GL3 and GL4 features. Are
> those extensions already implemented in Mesa?

It doesn't particularly "look like" a language, because you don't ever
store text of it except for the builtin functions library (which is
about to change).  It's a tree-structured IR, with all the types you
know from GLSL (vec2s, for example).  This makes certain types of
pattern matching easy (min(max(a, 0), 1) -> saturate(a), for example),
but makes a bunch of other things hard (def/use chain tracking).
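
(Purely to illustrate the kind of pattern matching Eric mentions - the
names ir_node, is_op, is_const and new_unop below are invented for this
sketch and are NOT Mesa's real GLSL IR API:)

  /* Sketch: recognize min(max(a, 0.0), 1.0) in the tree and rewrite it
   * as saturate(a); return the node unchanged when there is no match. */
  static struct ir_node *
  try_fold_saturate(struct ir_node *n)
  {
     if (is_op(n, OP_MIN) && is_const(n->src[1], 1.0f) &&
         is_op(n->src[0], OP_MAX) && is_const(n->src[0]->src[1], 0.0f))
        return new_unop(OP_SATURATE, n->src[0]->src[0]);

     return n;
  }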

The assembly shaders get put into the same driver backend compiler
infrastructure, see brw_fs_fp.cpp (asm shaders) and brw_fs_visitor.cpp
(glsl shaders).

> disclaimer: I am quite *new* so some of my starting points might be
> utterly wrong. My understanding of the i965 so far is that it strongly
> prefers to work with SIMD commands (I guess operating on vec4's
> typically). One idea is that compiling of fragment shaders would break
> all operations down to scalar operations and then the shader would be
> re-assembled to process 4 fragments at a time by "combining" the
> shader commands of 4 fragments from 4 scalar ops into one vec4 op. Is
> this already done? [I remember seeing a slide from Ian R. saying that
> the i965 driver draws 8(or was it 4) pixels "at a time"].

It's not a preference question.  The registers are 8 floats wide.
Vertex shaders get invoked 2 vertices at a time, with a register
containing these values:

.   +--+--+--+--+--+--+--+--+
.   | v0.x | v0.y | v0.z | v0.w | v1.x | v1.y | v1.z | v1.w |
.   +--+--+--+--+--+--+--+--+

while these 8 pixels in screen space:

.   +++++
.   | p0 | p1 | p2 | p3 |
.   +++++
.   | p4 | p5 | p6 | p7 |
.   +++++

are loaded in fragment shader registers as:

.   +--+--+--+--+--+--+--+--+
.   | p0.x | p1.x | p4.x | p5.x | p2.x | p3.x | p6.x | p7.x |
.   +--+--+--+--+--+--+--+--+

Note how one register just holds a single channel ('.x' here) of a
vector.  A vec4 would take up 4 registers, and to do value0.xyzw *
value1.xyzw, you'd emit 4 MULs.

I suppose you could go out of your way to move things around such that
you made it work like the vertex shader did, but it would be a massive
overhead for no win.  The fragment shader mode is ideal, unless you're
only lighting up a couple of pixels.

To see how this ends up working in practice, I'd recommend looking at
some similar shader code between the VS and FS (perhaps from some piglit
tests?) under INTEL_DEBUG=vs,fs.

(You'll also find references to "16-wide" dispatch.  That's a mode where
you make each channel take up 2 registers of 8 pixels each, and the HW
decodes each instruction as an operation on both registers)

> 3. Related to (d), but not exactly the same: how are branches and
> (non-unrollable) loops handled? In particular, nested loops?

We have structured control flow all the way down to the hardware,
because the hardware handles the divergent control flow when one pixel
takes on side of an "if" and another pixel takes the other.  (And loops,
too).

> There is also some doxygen stuff found in doxygen/; some of it (for
> example glapi.doxy) seems to be suffering from bit rot (glapi.doxy
> points to a directory that does not even exist anymore).

Nobody regularly generates doxygen output.  I'd like to set it up to
automatically run 

Re: [Mesa-dev] gallium-bind-sampler-states branch

2013-09-16 Thread Brian Paul

On 09/15/2013 09:31 AM, Chia-I Wu wrote:

On Sun, Sep 15, 2013 at 12:24 AM, Brian Paul  wrote:

On 09/12/2013 09:06 PM, Chia-I Wu wrote:


Hi Brian,

On Fri, Sep 13, 2013 at 8:46 AM, Brian Paul  wrote:



I just pushed a gallium-bind-sampler-states branch to my git repo at
git://people.freedesktop.org/~brianp/mesa

It replaces the four
pipe_context::bind_fragment/vertex/geometry/compute_sampler_states()
functions with a single bind_sampler_states() function:

   void (*bind_sampler_states)(struct pipe_context *,
   unsigned shader, unsigned start_slot,
   unsigned num_samplers, void **samplers);

At this point start_slot is always zero (at least for non-compute
shaders).
And as the updated gallium docs explain, at some point calls to
bind_sampler_states() will be used to updated sub-ranges, but that never
happens currently.

I've updated all the drivers, state trackers, utils, etc.

I've tested the svga, llvmpipe and softpipe drivers.  'make check' and a
texture subset of piglit pass w/out regressions.  I'd appreciate it if
other
driver developers would test their favorite driver.


For ilo, the new code does not follow the doc and unbinds samplers not in
range.



I think that's OK.  The CSO module (used by the state tracker) currently
always calls pipe_context::bind_sampler_states() with start=0 and count such
that it sets/replaces all samplers, never a sub-range.  That could/should
change in the future.

See single_sampler_done() in cso_context.c.




Is it fine if I implement the new bind_sampler_states as a helper
function on master branch, so that you hook it up to
pipe_context::bind_sampler_states in your branch and remove the old
ones?



I'm not quite sure that I understand what you mean.  Can you elaborate?

There is already ilo_bind_sampler_states that does what
pipe_context::bind_sampler_states expects, except that the function
returns a bool.  I can make it return void so that, in your branch,
you can initialize pipe_context::bind_sampler_states to it instead of
adding ilo_bind_sampler_states2.


Sure, feel free to refactor as you see fit.  I didn't do that in the 
first place because the existing ilo_bind_X_sampler_states() functions 
are calling ilo_bind_sampler_states() twice and setting dirty flags.  I 
didn't feel comfortable rewriting all that and likely breaking it so I 
just wrapped things up with ilo_bind_sampler_states2().


-Brian



Re: [Mesa-dev] gallium-bind-sampler-states branch

2013-09-16 Thread Chia-I Wu
On Tue, Sep 17, 2013 at 12:09 AM, Brian Paul  wrote:
> On 09/15/2013 09:31 AM, Chia-I Wu wrote:
>>
>> On Sun, Sep 15, 2013 at 12:24 AM, Brian Paul  wrote:
>>>
>>> On 09/12/2013 09:06 PM, Chia-I Wu wrote:


 Hi Brian,

 On Fri, Sep 13, 2013 at 8:46 AM, Brian Paul  wrote:
>
>
>
> I just pushed a gallium-bind-sampler-states branch to my git repo at
> git://people.freedesktop.org/~brianp/mesa
>
> It replaces the four
> pipe_context::bind_fragment/vertex/geometry/compute_sampler_states()
> functions with a single bind_sampler_states() function:
>
>void (*bind_sampler_states)(struct pipe_context *,
>unsigned shader, unsigned start_slot,
>unsigned num_samplers, void **samplers);
>
> At this point start_slot is always zero (at least for non-compute
> shaders).
> And as the updated gallium docs explain, at some point calls to
> bind_sampler_states() will be used to updated sub-ranges, but that
> never
> happens currently.
>
> I've updated all the drivers, state trackers, utils, etc.
>
> I've tested the svga, llvmpipe and softpipe drivers.  'make check' and
> a
> texture subset of piglit pass w/out regressions.  I'd appreciate it if
> other
> driver developers would test their favorite driver.


 For ilo, the new code does not follow the doc and unbinds samplers not
 in
 range.
>>>
>>>
>>>
>>> I think that's OK.  The CSO module (used by the state tracker) currently
>>> always calls pipe_context::bind_sampler_states() with start=0 and count
>>> such
>>> that it sets/replaces all samplers, never a sub-range.  That could/should
>>> change in the future.
>>>
>>> See single_sampler_done() in cso_context.c.
>>>
>>>
>>>
 Is it fine if I implement the new bind_sampler_states as a helper
 function on master branch, so that you hook it up to
 pipe_context::bind_sampler_states in your branch and remove the old
 ones?
>>>
>>>
>>>
>>> I'm not quite sure that I understand what you mean.  Can you elaborate?
>>
>> There is already ilo_bind_sampler_states that does what
>> pipe_context::bind_sampler_states expects, except that the function
>> returns a bool.  I can make it return void so that, in your branch,
>> you can initialize pipe_context::bind_sampler_states to it instead of
>> adding ilo_bind_sampler_states2.
>
>
> Sure, feel free to refactor as you see fit.  I didn't do that in the first
> place because the existing ilo_bind_X_sampler_states() functions are calling
> ilo_bind_sampler_states() twice and setting dirty flags.  I didn't feel
> comfortable rewriting all that and likely breaking it so I just wrapped
> things up with ilo_bind_sampler_states2().
I've pushed the change.  It is called twice because
ilo_bind_X_sampler_states() needs to bind the samplers in the range,
and unbind the samplers that are not.

It's great to see the interface change.  Thanks for the work.  When
you get to work on consolidating set_X_sampler_views, I believe there
is also ilo_set_sampler_views that you can hook up to :P


>
> -Brian
>



-- 
o...@lunarg.com


Re: [Mesa-dev] gallium-bind-sampler-states branch

2013-09-16 Thread Roland Scheidegger
On 09/16/2013 18:18, Brian Paul wrote:
> On 09/16/2013 08:56 AM, Roland Scheidegger wrote:
>> On 09/14/2013 18:24, Brian Paul wrote:
>>> On 09/12/2013 09:06 PM, Chia-I Wu wrote:
 Hi Brian,

 On Fri, Sep 13, 2013 at 8:46 AM, Brian Paul  wrote:
>
> I just pushed a gallium-bind-sampler-states branch to my git repo at
> git://people.freedesktop.org/~brianp/mesa
>
> It replaces the four
> pipe_context::bind_fragment/vertex/geometry/compute_sampler_states()
> functions with a single bind_sampler_states() function:
>
>void (*bind_sampler_states)(struct pipe_context *,
>unsigned shader, unsigned start_slot,
>unsigned num_samplers, void
> **samplers);
>
> At this point start_slot is always zero (at least for non-compute
> shaders).
> And as the updated gallium docs explain, at some point calls to
> bind_sampler_states() will be used to updated sub-ranges, but that
> never
> happens currently.
>
> I've updated all the drivers, state trackers, utils, etc.
>
> I've tested the svga, llvmpipe and softpipe drivers.  'make check'
> and a
> texture subset of piglit pass w/out regressions.  I'd appreciate it
> if other
> driver developers would test their favorite driver.
 For ilo, the new code does not follow the doc and unbinds samplers not
 in range.
>>>
>>> I think that's OK.  The CSO module (used by the state tracker) currently
>>> always calls pipe_context::bind_sampler_states() with start=0 and count
>>> such that it sets/replaces all samplers, never a sub-range.  That
>>> could/should change in the future.
>>>
>>> See single_sampler_done() in cso_context.c.
>>
>> Not all state trackers use cso though.
> 
> AFAICT, all src/gallium/state_trackers/ use CSO, except for clover (and
> I did miss a function call change there).  I didn't look at any
> out-of-tree state trackers.
Yes, I was thinking about out-of-tree state trackers.

> 
> 
>> While this can be fixed later, I
>> think I'd prefer if the docs would state this isn't really meant to be
>> that way and should be changed in the future in cso and certainly fixed
>> in the drivers.
> 
> As I mentioned earlier, I updated the docs to explain how things current
> work and how it should work in the future:
> 
> """
> * :ref:`Sampler`: Texture sampler states are bound separately for fragment,
>   vertex, geometry and compute shaders with the ``bind_sampler_states``
>   function.  The ``start`` and ``num_samplers`` parameters indicate a range
>   of samplers to change.  NOTE: at this time, start is always zero and
>   the CSO module will always replace all samplers at once (no sub-ranges).
>   This may change in the future.
> """
> 
> Is that what you mean?

Yes, though I don't particularly like the wording - it is not obvious
that it is really a bug if a driver can't cope with start != 0, and the
"start is always zero" guarantee is just something cso (and some other
state trackers) provide for now, not something the interface itself
requires. No biggie though.

Roland


Re: [Mesa-dev] [PATCH 1/4] i965: Refactor Gen7+ SURFACE_STATE setup for buffer surfaces.

2013-09-16 Thread Eric Anholt
Kenneth Graunke  writes:

> This was an embarrassingly large amount of copy-and-pasted code,
> and it wasn't particularly simple code either.  By factoring it out
> into a helper function, we consolidate the complexity.
>
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/gen7_wm_surface_state.c | 144 
> +-
>  1 file changed, 58 insertions(+), 86 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c 
> b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> index 37e3174..8413308 100644
> --- a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> @@ -224,6 +224,37 @@ gen7_check_surface_setup(uint32_t *surf, bool 
> is_render_target)
> }
>  }
>  
> +static void
> +gen7_emit_buffer_surface_state(struct brw_context *brw,
> +   uint32_t *out_offset,
> +   drm_intel_bo *bo,
> +   unsigned buffer_offset,
> +   unsigned surface_format,
> +   unsigned buffer_size,
> +   unsigned pitch,
> +   unsigned mocs)
> +{
> +   uint32_t *surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE,
> +8 * 4, 32, out_offset);
> +   memset(surf, 0, 8 * 4);
> +
> +   surf[0] = BRW_SURFACE_BUFFER << BRW_SURFACE_TYPE_SHIFT |
> + surface_format << BRW_SURFACE_FORMAT_SHIFT |
> + BRW_SURFACE_RC_READ_WRITE;
> +   surf[1] = bo->offset + buffer_offset; /* reloc */
> +   surf[2] = SET_FIELD(buffer_size & 0x7f, GEN7_SURFACE_WIDTH) |
> + SET_FIELD((buffer_size >> 7) & 0x3fff, GEN7_SURFACE_HEIGHT);
> +   surf[3] = SET_FIELD((buffer_size >> 21) & 0x3f, BRW_SURFACE_DEPTH) |
> + (pitch - 1);
> +   surf[4] = 0;
> +   surf[5] = SET_FIELD(mocs, GEN7_SURFACE_MOCS);
> +
> +   /* Emit relocation to surface contents */
> +   drm_intel_bo_emit_reloc(brw->batch.bo, *out_offset + 4,
> +   bo, buffer_offset, I915_GEM_DOMAIN_SAMPLER, 0);

I think this will segfault if BO is NULL

> +
> +   gen7_check_surface_setup(surf, false /* is_render_target */);
> +}
>  
>  static void
>  gen7_update_buffer_texture_surface(struct gl_context *ctx,
> @@ -237,39 +268,23 @@ gen7_update_buffer_texture_surface(struct gl_context 
> *ctx,
> drm_intel_bo *bo = intel_obj ? intel_obj->buffer : NULL;
> gl_format format = tObj->_BufferObjectFormat;
>  
> -   uint32_t *surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE,
> -8 * 4, 32, surf_offset);
> -   memset(surf, 0, 8 * 4);
> -
> uint32_t surface_format = brw_format_for_mesa_format(format);
> if (surface_format == 0 && format != MESA_FORMAT_RGBA_FLOAT32) {
>_mesa_problem(NULL, "bad format %s for texture buffer\n",
>  _mesa_get_format_name(format));
> }
>  
> -   surf[0] = BRW_SURFACE_BUFFER << BRW_SURFACE_TYPE_SHIFT |
> - surface_format << BRW_SURFACE_FORMAT_SHIFT |
> - BRW_SURFACE_RC_READ_WRITE;
> -
> -   if (bo) {
> -  surf[1] = bo->offset; /* reloc */
> -
> -  drm_intel_bo_emit_reloc(brw->batch.bo,
> -   *surf_offset + 4,
> -   bo, 0,
> -   I915_GEM_DOMAIN_SAMPLER, 0);
> -
> -  int texel_size = _mesa_get_format_bytes(format);
> -  int w = intel_obj->Base.Size / texel_size;
> -
> -  /* note that these differ from GEN6 */
> -  surf[2] = SET_FIELD(w & 0x7f, GEN7_SURFACE_WIDTH) | /* bits 6:0 of 
> size */
> -SET_FIELD((w >> 7) & 0x3fff, GEN7_SURFACE_HEIGHT); /* 20:7 */
> -  surf[3] = SET_FIELD((w >> 21) & 0x3f, BRW_SURFACE_DEPTH) | /* bits 
> 26:21 */
> -(texel_size - 1);
> -   }
> -
> -   gen7_check_surface_setup(surf, false /* is_render_target */);
> +   int texel_size = _mesa_get_format_bytes(format);
> +   int w = intel_obj ? intel_obj->Base.Size / texel_size : 0;
> +
> +   gen7_emit_buffer_surface_state(brw,
> +  surf_offset,
> +  bo,
> +  0,
> +  surface_format,
> +  w,
> +  texel_size,
> +  0 /* mocs */);

But here you might pass in a NULL bo.  Same for patch 4.  I'm confused
that this didn't produce a piglit regression.

>  }
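
(A sketch of the guard Eric is asking for in the shared helper - only
emit the relocation, and a real address, when a bo is actually present;
this is an illustration, not the final patch:)

     surf[1] = bo ? bo->offset + buffer_offset : 0; /* reloc */
     if (bo) {
        drm_intel_bo_emit_reloc(brw->batch.bo, *out_offset + 4,
                                bo, buffer_offset,
                                I915_GEM_DOMAIN_SAMPLER, 0);
     }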

I think the savings is well worth it in patches 1, 3, and 4, but I'm not
a fan of patch 2.  I've got a lot of branches outstanding where I'm
trying to finally finish off mt->first_level and the tile offset usage,
and sharing code between renderbuffers and textures is just going to
make that harder to do.  (I'm already really tempted to split gen4/5
from gen6 RB setup, since I think I could land gen4/5 without
regressing, while gen6 

Re: [Mesa-dev] [PATCH 11/15] i965/fs: When >64 input components, order them to match prev pipeline stage.

2013-09-16 Thread Paul Berry
On 15 September 2013 21:16, Kenneth Graunke  wrote:

> On 09/03/2013 04:18 PM, Paul Berry wrote:
> > Since the SF/SBE stage is only capable of performing arbitrary
> > reorderings of 16 varying slots, we can't arrange the fragment shader
> > inputs in an arbitrary order if there are more than 16 input varying
> > slots in use.  We need to make sure that slots 16-31 match the
> > corresponding outputs of the previous pipeline stage.
> >
> > The easiest way to accomplish this is to just make all varying slots
> > match up with the previous pipeline stage.
> > ---
> >  src/mesa/drivers/dri/i965/brw_fs.cpp | 42
> ++--
> >  src/mesa/drivers/dri/i965/brw_wm.c   |  3 ++-
> >  2 files changed, 38 insertions(+), 7 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > index 7950d5f6..8d73a0f 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > @@ -1237,11 +1237,40 @@ fs_visitor::calculate_urb_setup()
> > int urb_next = 0;
> > /* Figure out where each of the incoming setup attributes lands. */
> > if (brw->gen >= 6) {
> > -  for (unsigned int i = 0; i < VARYING_SLOT_MAX; i++) {
> > -  if (fp->Base.InputsRead & BRW_FS_VARYING_INPUT_MASK &
> > - BITFIELD64_BIT(i)) {
> > - c->prog_data.urb_setup[i] = urb_next++;
> > -  }
> > +  if (_mesa_bitcount_64(fp->Base.InputsRead &
> > +BRW_FS_VARYING_INPUT_MASK) <= 16) {
> > + /* The SF/SBE pipeline stage can do arbitrary rearrangement of
> the
> > +  * first 16 varying inputs, so we can put them wherever we
> want.
> > +  * Just put them in order.
> > +  */
>
> It might be nice to have a comment saying why this is useful (as opposed
> to always following the previous stage's ordering).  As I understand it,
> this is useful when the VS outputs a bunch of varyings but the FS only
> uses a couple of them---we can pack them into the first few slots,
> saving space.
>

Also, it allows us to avoid recompiling the fragment shader when the vertex
(or geometry) shader changes.  I've added this comment to clarify:

  * This is useful because it means that (a) inputs not used by the
  * fragment shader won't take up valuable register space, and (b) we
  * won't have to recompile the fragment shader if it gets paired with
  * a different vertex (or geometry) shader.



>
> > + for (unsigned int i = 0; i < VARYING_SLOT_MAX; i++) {
> > +if (fp->Base.InputsRead & BRW_FS_VARYING_INPUT_MASK &
> > +BITFIELD64_BIT(i)) {
> > +   c->prog_data.urb_setup[i] = urb_next++;
> > +}
> > + }
> > +  } else {
> > + /* We have enough input varyings that the SF/SBE pipeline
> stage can't
> > +  * arbitrarily rearrange them to suit our whim; we have to put
> them
> > +  * in an order that matches the output of the previous
> pipeline stage
> > +  * (geometry or vertex shader).
> > +  */
> > + struct brw_vue_map prev_stage_vue_map;
> > + brw_compute_vue_map(brw, &prev_stage_vue_map,
> > + c->key.input_slots_valid);
> > + int first_slot = 2 * BRW_SF_URB_ENTRY_READ_OFFSET;
> > + assert(prev_stage_vue_map.num_slots <= first_slot + 32);
> > + for (int slot = first_slot; slot <
> prev_stage_vue_map.num_slots;
> > +  slot++) {
> > +int varying = prev_stage_vue_map.slot_to_varying[slot];
> > +if (varying != BRW_VARYING_SLOT_COUNT &&
>
> It wasn't immediately obvious to me why varying would be
> BRW_VARYING_SLOT_COUNT.  But, on further inspection, I see this is the
> value that gets assigned to empty slots.
>
> Up to you whether you want to add a small comment.
>

Good point.  I've added this comment:

/* Note that varying == BRW_VARYING_SLOT_COUNT when a slot is
 * unused.
 */



>
> > +(fp->Base.InputsRead & BRW_FS_VARYING_INPUT_MASK &
> > + BITFIELD64_BIT(varying))) {
> > +   c->prog_data.urb_setup[varying] = slot - first_slot;
> > +   urb_next = MAX2(urb_next, slot + 1);
> > +}
> > + }
> > + urb_next = prev_stage_vue_map.num_slots - first_slot;
>
> Huh?  It looks like you're setting urb_next in the loop, then clobbering
> it immediately after the loop.  This should probably be fixed.
>

Oops, I meant to delete the first reference to urb_next entirely, and I
guess I forgot.  Fixed, thanks.
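With that deletion, the loop reads roughly like this (a sketch
reconstructed from the quoted hunk; the committed version may differ in
detail):

   for (int slot = first_slot; slot < prev_stage_vue_map.num_slots; slot++) {
      int varying = prev_stage_vue_map.slot_to_varying[slot];
      /* varying == BRW_VARYING_SLOT_COUNT when a slot is unused. */
      if (varying != BRW_VARYING_SLOT_COUNT &&
          (fp->Base.InputsRead & BRW_FS_VARYING_INPUT_MASK &
           BITFIELD64_BIT(varying))) {
         c->prog_data.urb_setup[varying] = slot - first_slot;
      }
   }
   urb_next = prev_stage_vue_map.num_slots - first_slot;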


>
> >}
> > } else {
> >/* FINISHME: The sf doesn't map VS->FS inputs for us very well. */
> > @@ -3149,7 +3178,8 @@ brw_fs_precompile(struct gl_context *ctx, struct gl_shader_program *prog)
> >key.iz_lookup |= IZ_DEPTH_WRITE_ENABLE_BIT;
> > }
> >
> > -   if (brw->gen < 6)
> > +   if (brw

Re: [Mesa-dev] [PATCH 1/4] i965: Refactor Gen7+ SURFACE_STATE setup for buffer surfaces.

2013-09-16 Thread Kenneth Graunke
On 09/16/2013 09:38 AM, Eric Anholt wrote:
> Kenneth Graunke  writes:
> 
>> This was an embarrassingly large amount of copy and pasted code,
>> and it wasn't particularly simple code either.  By factoring it out
>> into a helper function, we consolidate the complexity.
>>
>> Signed-off-by: Kenneth Graunke 
>> ---
>>  src/mesa/drivers/dri/i965/gen7_wm_surface_state.c | 144 +-
>>  1 file changed, 58 insertions(+), 86 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
>> index 37e3174..8413308 100644
>> --- a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
>> +++ b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
>> @@ -224,6 +224,37 @@ gen7_check_surface_setup(uint32_t *surf, bool is_render_target)
>> }
>>  }
>>  
>> +static void
>> +gen7_emit_buffer_surface_state(struct brw_context *brw,
>> +   uint32_t *out_offset,
>> +   drm_intel_bo *bo,
>> +   unsigned buffer_offset,
>> +   unsigned surface_format,
>> +   unsigned buffer_size,
>> +   unsigned pitch,
>> +   unsigned mocs)
>> +{
>> +   uint32_t *surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE,
>> +8 * 4, 32, out_offset);
>> +   memset(surf, 0, 8 * 4);
>> +
>> +   surf[0] = BRW_SURFACE_BUFFER << BRW_SURFACE_TYPE_SHIFT |
>> + surface_format << BRW_SURFACE_FORMAT_SHIFT |
>> + BRW_SURFACE_RC_READ_WRITE;
>> +   surf[1] = bo->offset + buffer_offset; /* reloc */
>> +   surf[2] = SET_FIELD(buffer_size & 0x7f, GEN7_SURFACE_WIDTH) |
>> + SET_FIELD((buffer_size >> 7) & 0x3fff, GEN7_SURFACE_HEIGHT);
>> +   surf[3] = SET_FIELD((buffer_size >> 21) & 0x3f, BRW_SURFACE_DEPTH) |
>> + (pitch - 1);
>> +   surf[4] = 0;
>> +   surf[5] = SET_FIELD(mocs, GEN7_SURFACE_MOCS);
>> +
>> +   /* Emit relocation to surface contents */
>> +   drm_intel_bo_emit_reloc(brw->batch.bo, *out_offset + 4,
>> +   bo, buffer_offset, I915_GEM_DOMAIN_SAMPLER, 0);
> 
> I think this will segfault if BO is NULL

Clearly.  I don't know how I missed that...thanks for catching it.

[snip]

> But here you might pass in a NULL bo.  Same for patch 4.  I'm confused
> that this didn't produce a piglit regression.

Fixed in both patches.  I'm not sure why it didn't cause regressions.
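
The guard would look something like this (a sketch; the actual commit may
differ):

   if (bo) {
      drm_intel_bo_emit_reloc(brw->batch.bo, *out_offset + 4,
                              bo, buffer_offset,
                              I915_GEM_DOMAIN_SAMPLER, 0);
   }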

>>  }
> 
> I think the savings is well worth it in patches 1, 3, and 4, but I'm not
> a fan of patch 2.  I've got a lot of branches outstanding where I'm
> trying to finally finish off mt->first_level and the tile offset usage,
> and sharing code between renderbuffers and textures is just going to
> make that harder to do.  (I'm already really tempted to split gen4/5
> from gen6 RB setup, since I think I could land gen4/5 without
> regressing, while gen6 is the unstable hw)
> 

Hmm.  I'd really like to try and share some code, but I don't want to
get in the way of the work that you're doing to kill off tile offsets.
I think I'll drop patch 2 for now, and maybe try again once the code
isn't in as much flux.

--Ken
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Could use your help with a bug involving piglit, waffle, and egl.

2013-09-16 Thread Paul Berry
(CC'ing Mesa-dev since this is a Mesa bug).

On 16 September 2013 11:00, Chad Versace wrote:

> CC'ing Piglit.
>
>
> On 09/14/2013 08:54 AM, Paul Berry wrote:
>
>> I'm investigating a failure in spec/OES_fixed_point/attribute-arrays,
>> specifically the command line
>> "/home/pberry/.platform/piglit-mesa/piglit/build/bin/oes_fixed_point-attribute-arrays -auto -fbo".
>> It's segfaulting during piglit/waffle initialization due to
>> Mesa accessing freed memory.  This only started happening for me recently,
>> however I suspect that's because the access to freed memory makes it a
>> heisenbug, and the root cause has probably been around for a long time.
>>
>> What's interesting about this test is that it's a GLES1 test being run in
>> -fbo mode, which means that piglit first starts initializing things
>> assuming it's going to run the test with fbo's, then at some point it
>> figures out that it can't (because fbo's are unsupported), so it tears down
>> its configuration and starts a new configuration to test using a window.
>>
>> While establishing the new configuration, waffle calls eglMakeCurrent().
>> Deep in the bowels of Mesa's implementation of this function, it decides
>> that it needs to flush the context that was previously current.  But that
>> context refers to data structures that were freed when piglit tore down its
>> old configuration (specifically, it refers to brw->bufmgr, which was freed
>> in response to a call to eglTerminate()).
>>
>> I've been studying the egl calls made by piglit/waffle during this test and
>> I believe they look like this (I may be missing a few but I think I found
>> most of them):
>>
>> Initial setup:
>> - eglGetDisplay()
>> - eglInitialize() (causes intel_init_bufmgr() to be called, which creates
>> bufmgr 1)
>> - eglQueryString()
>> - eglChooseConfig()
>> - eglBindAPI()
>> - eglCreateContext() (causes brwCreateContext() to be called, which
>>   creates context 1)
>> - eglGetConfigAttrib()
>> - eglCreateWindowSurface()
>> - eglMakeCurrent()
>> Initial teardown:
>> - eglDestroySurface()
>> - eglDestroyContext() (interestingly, does not cause intelDestroyContext to
>> be called, perhaps because the context is still current?)
>> - eglTerminate() (causes intelDestroyScreen() to be called, which frees
>> bufmgr 1)
>> Second setup:
>> - eglGetDisplay()
>> - eglInitialize() (causes intel_init_bufmgr() to be called, which creates
>> bufmgr 2)
>> - eglQueryString()
>> - eglChooseConfig()
>> - eglBindAPI()
>> - eglCreateContext() (causes brwCreateContext() to be called, which
>>   creates context 2)
>> - eglGetConfigAttrib()
>> - eglCreateWindowSurface()
>> - eglMakeCurrent() (at this point Mesa tries to flush context 1, which
>> causes a segfault because it then tries to access bufmgr 1, which
>> has already been freed)
>>
>> So, my questions are:
>> - Does it look like piglit/waffle is making an allowed sequence of EGL
>> calls?  (In other words, is the bug in Mesa or piglit/waffle?)
>> - If the bug is in Mesa, what should be happening instead?  I assume that
>> at some point Mesa should have made the current context non-current (and
>> destroyed it, perhaps), but I'm not sure when that should have happened,
>> nor what code should have been responsible for doing it.
>>
>> Thanks in advance, Chad.  I hope you're enjoying your business travel!
>>
>
> The sequence of EGL calls is legal. The bug is in Mesa. After discovering
> the bug many months ago, I posted a test to the Piglit list, but it was
> ignored and then forgotten. (Gerrit please!) I'll repost the test in the
> next day or so after rebasing it.
>
> See the comments [1] in my test to see why the sequence of EGL calls is
> legal.
>
> [1] http://cgit.freedesktop.org/~chadversary/piglit/tree/tests/egl/spec/egl-1.4/egl-terminate-then-unbind-context.c?h=egl-terminate-then-unbind#n26
>
> If I correctly understand the EGL spec quote above, the call to
> eglMakeCurrent that currently segfaults should instead flush the
> queued-to-be-destroyed context and then promptly destroy it.
>

Yeah, after consultation with ajax on IRC I agree.  In point of fact, this
is what eglMakeCurrent is trying to do--it's just not succeeding because
the act of flushing the context causes it to refer to the already-freed
bufmgr.

It sounds like what we need to do in Mesa is add some sort of mechanism to
make sure eglTerminate() doesn't destroy the bufmgr until after the context
using it is no longer current (e.g. reference counting).
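
Something along these lines might work (names are hypothetical, not Mesa's
actual structures):

   struct intel_screen_shared {
      drm_intel_bufmgr *bufmgr;
      int refcount;   /* one ref per live context, plus one for the display */
   };

   static void
   intel_screen_shared_unref(struct intel_screen_shared *shared)
   {
      if (--shared->refcount == 0) {
         drm_intel_bufmgr_destroy(shared->bufmgr);
         free(shared);
      }
   }

eglTerminate() would then drop only the display's reference, and the last
context to stop being current would drop the final reference and actually
free the bufmgr.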
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] gallium-bind-sampler-states branch

2013-09-16 Thread Brian Paul

On 09/16/2013 08:56 AM, Roland Scheidegger wrote:

On 14.09.2013 18:24, Brian Paul wrote:

On 09/12/2013 09:06 PM, Chia-I Wu wrote:

Hi Brian,

On Fri, Sep 13, 2013 at 8:46 AM, Brian Paul  wrote:


I just pushed a gallium-bind-sampler-states branch to my git repo at
git://people.freedesktop.org/~brianp/mesa

It replaces the four
pipe_context::bind_fragment/vertex/geometry/compute_sampler_states()
functions with a single bind_sampler_states() function:

   void (*bind_sampler_states)(struct pipe_context *,
                               unsigned shader, unsigned start_slot,
                               unsigned num_samplers, void **samplers);

At this point start_slot is always zero (at least for non-compute
shaders).
And as the updated gallium docs explain, at some point calls to
bind_sampler_states() will be used to update sub-ranges, but that never
happens currently.

I've updated all the drivers, state trackers, utils, etc.

I've tested the svga, llvmpipe and softpipe drivers.  'make check' and a
texture subset of piglit pass w/out regressions.  I'd appreciate it if
other driver developers would test their favorite driver.

For ilo, the new code does not follow the doc and unbinds samplers not
in range.


I think that's OK.  The CSO module (used by the state tracker) currently
always calls pipe_context::bind_sampler_states() with start=0 and count
such that it sets/replaces all samplers, never a sub-range.  That
could/should change in the future.

See single_sampler_done() in cso_context.c.
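
For illustration, the difference between what CSO emits today and the
sub-range form the interface allows (slot numbers here are made up):

   /* Today: CSO always replaces the full range, starting at slot 0. */
   pipe->bind_sampler_states(pipe, PIPE_SHADER_FRAGMENT,
                             0, nr_samplers, samplers);

   /* Eventually: update only a sub-range, e.g. one sampler at slot 2. */
   pipe->bind_sampler_states(pipe, PIPE_SHADER_FRAGMENT,
                             2, 1, &samplers[2]);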


Not all state trackers use cso though.


AFAICT, all src/gallium/state_trackers/ use CSO, except for clover (and 
I did miss a function call change there).  I didn't look at any 
out-of-tree state trackers.



> While this can be fixed later, I
> think I'd prefer if the docs would state this isn't really meant to be
> that way and should be changed in the future in cso and certainly fixed
> in the drivers.

As I mentioned earlier, I updated the docs to explain how things currently
work and how they should work in the future:


"""
* :ref:`Sampler`: Texture sampler states are bound separately for fragment,
  vertex, geometry and compute shaders with the ``bind_sampler_states``
  function.  The ``start`` and ``num_samplers`` parameters indicate a range
  of samplers to change.  NOTE: at this time, start is always zero and
  the CSO module will always replace all samplers at once (no sub-ranges).
  This may change in the future.
"""

Is that what you mean?

-Brian

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 69437] Composite Bypass no longer works

2013-09-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=69437

U. Artie Eoff  changed:

   What       |Removed                            |Added
   -----------+-----------------------------------+-----------------------------------
   Assignee   |wayland-bugs@lists.freedesktop.org |mesa-dev@lists.freedesktop.org
   CC         |                                   |chad.vers...@linux.intel.com,
              |                                   |wayland-bugs@lists.freedesktop.org
   Product    |Wayland                            |Mesa
   Component  |weston                             |Other

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/3] glsl: Drop shader_bit_encoding version checks.

2013-09-16 Thread Paul Berry
On 13 September 2013 11:25, Kenneth Graunke  wrote:

> We now set the ARB_shader_bit_encoding flag for versions that support
> this functionality, so we don't need to double check it here.
>
> Signed-off-by: Kenneth Graunke 
> Cc: Ian Romanick 
> ---
>  src/glsl/builtin_functions.cpp | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/src/glsl/builtin_functions.cpp
> b/src/glsl/builtin_functions.cpp
> index c468bd5..b020a7c 100644
> --- a/src/glsl/builtin_functions.cpp
> +++ b/src/glsl/builtin_functions.cpp
> @@ -182,8 +182,7 @@ shader_texture_lod_and_rect(const _mesa_glsl_parse_state *state)
>  static bool
>  shader_bit_encoding(const _mesa_glsl_parse_state *state)
>  {
> -   return state->is_version(330, 300) ||
> -  state->ARB_shader_bit_encoding_enable ||
> +   return state->ARB_shader_bit_encoding_enable ||
>state->ARB_gpu_shader5_enable;
>  }
>
> --
> 1.8.3.4
>

I'm not a huge fan of this approach.  We're currently inconsistent in Mesa
as to how we handle GLSL extensions that were adopted into later versions
of GLSL (or backports from later versions of GLSL).  For these extensions:

ARB_draw_instanced
ARB_fragment_coord_conventions
ARB_gpu_shader5
ARB_shader_texture_lod
ARB_shading_language_420pack
ARB_shading_language_packing
ARB_texture_cube_map_array
ARB_texture_multisample
OES_texture_3D

we use what I'll call the "enable means explicitly enabled" style, which
means we only set the "_enable" flag if the shader contains text like
"#extension ARB_foo: enable"; if the extension's functionality is provided
by the given version of GLSL, we fold that into the if-test that enables
the functionality--e.g.:

   if (state->ARB_draw_instanced_enable || state->is_version(140, 300))
  add_system_value(SYSTEM_VALUE_INSTANCE_ID, int_t, "gl_InstanceID");

But for these extensions:

ARB_explicit_attrib_location
ARB_texture_rectangle

we use what I'll call the "enable means available" style, which means that
we set the "_enable" flag when processing the version directive (or in the
_mesa_glsl_parse_state constructor), to indicate that the functionality is
available, whether or not the user explicitly requested it.

(Note: for ARB_uniform_buffer_object it looks like we do some of each
style!)


Personally I'd prefer to see us consistently adopt the "enable means
explicitly enabled" style (to me, the "enable means available" style feels
like we're fibbing to ourselves).  An additional advantage of the "enable
means explicitly enabled" style is that it allows us to handle cases (such
as in ARB_draw_instanced) where the extension differs slightly from the
functionality that was eventually folded into GLSL.  Another advantage is
that if a client-supplied shader does something silly like "#extension
ARB_draw_instanced: disable" in a GLSL 1.40+ shader, we won't accidentally
disable built-in functionality (although in practice this is extremely
unlikely to ever crop up).

If it becomes too onerous to add the "|| state->is_version(...)" checks all
over the place we can always make some inline functions, e.g.:

struct _mesa_glsl_parse_state {
   ...
   bool is_draw_instanced_available() const
   {
      return this->ARB_draw_instanced_enable || this->is_version(140, 300);
   }
   ...
};
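
A call site would then read, e.g. (mirroring the gl_InstanceID example
above):

   if (state->is_draw_instanced_available())
      add_system_value(SYSTEM_VALUE_INSTANCE_ID, int_t, "gl_InstanceID");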
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/4] i965: Refactor Gen7+ SURFACE_STATE setup for buffer surfaces.

2013-09-16 Thread Paul Berry
On 13 September 2013 23:10, Kenneth Graunke  wrote:

> This was an embarrassingly large amount of copy and pasted code,
> and it wasn't particularly simple code either.  By factoring it out
> into a helper function, we consolidate the complexity.
>
> Signed-off-by: Kenneth Graunke 
>

I really like what you're doing here.  A few minor comments:


> ---
>  src/mesa/drivers/dri/i965/gen7_wm_surface_state.c | 144 +-
>  1 file changed, 58 insertions(+), 86 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> index 37e3174..8413308 100644
> --- a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> @@ -224,6 +224,37 @@ gen7_check_surface_setup(uint32_t *surf, bool is_render_target)
> }
>  }
>
> +static void
> +gen7_emit_buffer_surface_state(struct brw_context *brw,
> +   uint32_t *out_offset,
> +   drm_intel_bo *bo,
> +   unsigned buffer_offset,
> +   unsigned surface_format,
> +   unsigned buffer_size,
> +   unsigned pitch,
> +   unsigned mocs)
> +{
> +   uint32_t *surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE,
> +8 * 4, 32, out_offset);
> +   memset(surf, 0, 8 * 4);
> +
> +   surf[0] = BRW_SURFACE_BUFFER << BRW_SURFACE_TYPE_SHIFT |
> + surface_format << BRW_SURFACE_FORMAT_SHIFT |
> + BRW_SURFACE_RC_READ_WRITE;
> +   surf[1] = bo->offset + buffer_offset; /* reloc */
> +   surf[2] = SET_FIELD(buffer_size & 0x7f, GEN7_SURFACE_WIDTH) |
> + SET_FIELD((buffer_size >> 7) & 0x3fff, GEN7_SURFACE_HEIGHT);
> +   surf[3] = SET_FIELD((buffer_size >> 21) & 0x3f, BRW_SURFACE_DEPTH) |
>

These three instances of buffer_size should be (buffer_size - 1).  I think
that there was a pre-existing bug in gen7_update_buffer_texture_surface()
where it wasn't subtracting 1 where it should.  Probably we should fix the
bug as a pre-patch.
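
Concretely, the corrected encoding would be (a sketch of the suggested
size-minus-one fix applied to the quoted code):

   surf[2] = SET_FIELD((buffer_size - 1) & 0x7f, GEN7_SURFACE_WIDTH) |
             SET_FIELD(((buffer_size - 1) >> 7) & 0x3fff, GEN7_SURFACE_HEIGHT);
   surf[3] = SET_FIELD(((buffer_size - 1) >> 21) & 0x3f, BRW_SURFACE_DEPTH) |
             (pitch - 1);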


> + (pitch - 1);
> +   surf[4] = 0;
>

Technically this line is unnecessary given the memset() above.  I'm
quibbling, though--it's hard to imagine this making a significant
performance difference :)


> +   surf[5] = SET_FIELD(mocs, GEN7_SURFACE_MOCS);
> +
> +   /* Emit relocation to surface contents */
> +   drm_intel_bo_emit_reloc(brw->batch.bo, *out_offset + 4,
> +   bo, buffer_offset, I915_GEM_DOMAIN_SAMPLER, 0);
> +
> +   gen7_check_surface_setup(surf, false /* is_render_target */);
> +}
>
>  static void
>  gen7_update_buffer_texture_surface(struct gl_context *ctx,
> @@ -237,39 +268,23 @@ gen7_update_buffer_texture_surface(struct gl_context *ctx,
> drm_intel_bo *bo = intel_obj ? intel_obj->buffer : NULL;
> gl_format format = tObj->_BufferObjectFormat;
>
> -   uint32_t *surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE,
> -8 * 4, 32, surf_offset);
> -   memset(surf, 0, 8 * 4);
> -
> uint32_t surface_format = brw_format_for_mesa_format(format);
> if (surface_format == 0 && format != MESA_FORMAT_RGBA_FLOAT32) {
>_mesa_problem(NULL, "bad format %s for texture buffer\n",
>  _mesa_get_format_name(format));
> }
>
> -   surf[0] = BRW_SURFACE_BUFFER << BRW_SURFACE_TYPE_SHIFT |
> - surface_format << BRW_SURFACE_FORMAT_SHIFT |
> - BRW_SURFACE_RC_READ_WRITE;
> -
> -   if (bo) {
> -  surf[1] = bo->offset; /* reloc */
> -
> -  drm_intel_bo_emit_reloc(brw->batch.bo,
> - *surf_offset + 4,
> - bo, 0,
> - I915_GEM_DOMAIN_SAMPLER, 0);
> -
> -  int texel_size = _mesa_get_format_bytes(format);
> -  int w = intel_obj->Base.Size / texel_size;
> -
> -  /* note that these differ from GEN6 */
> -  surf[2] = SET_FIELD(w & 0x7f, GEN7_SURFACE_WIDTH) | /* bits 6:0 of size */
> -SET_FIELD((w >> 7) & 0x3fff, GEN7_SURFACE_HEIGHT); /* 20:7 */
> -  surf[3] = SET_FIELD((w >> 21) & 0x3f, BRW_SURFACE_DEPTH) | /* bits 26:21 */
> -(texel_size - 1);
> -   }
> -
> -   gen7_check_surface_setup(surf, false /* is_render_target */);
> +   int texel_size = _mesa_get_format_bytes(format);
> +   int w = intel_obj ? intel_obj->Base.Size / texel_size : 0;
> +
> +   gen7_emit_buffer_surface_state(brw,
> +  surf_offset,
> +  bo,
> +  0,
> +  surface_format,
> +  w,
> +  texel_size,
> +  0 /* mocs */);
>  }
>
>  static void
> @@ -371,38 +386,15 @@ gen7_create_constant_surface(struct brw_context *brw,
>  

Re: [Mesa-dev] [PATCH 3/3] i965: Use gen7_upload_constant_state for 3DSTATE_CONSTANT_PS as well.

2013-09-16 Thread Paul Berry
On 13 September 2013 16:05, Kenneth Graunke  wrote:

> Now we use gen7_upload_constant_state() for all three shader stages.
>
> Signed-off-by: Kenneth Graunke 
> Cc: Paul Berry 
>

Series is:

Reviewed-by: Paul Berry 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 69437] Composite Bypass no longer works

2013-09-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=69437

--- Comment #1 from Kristian Høgsberg  ---
Created attachment 85941
  --> https://bugs.freedesktop.org/attachment.cgi?id=85941&action=edit
Best fix so far

Here's the best fix so far.  Not entirely happy with it, but it's not terrible.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 69437] Composite Bypass no longer works

2013-09-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=69437

U. Artie Eoff  changed:

   What       |Removed                            |Added
   -----------+-----------------------------------+-----------------------------------
   Assignee   |mesa-dev@lists.freedesktop.org     |wayland-bugs@lists.freedesktop.org
   QA Contact |                                   |mesa-dev@lists.freedesktop.org
   Component  |Other                              |EGL/Wayland

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/4] i965: Share Gen7+ SURFACE_STATE setup for textures and renderbuffers.

2013-09-16 Thread Paul Berry
On 13 September 2013 23:10, Kenneth Graunke  wrote:

> The SURFACE_STATE entries for textures and renderbuffers share almost
> all of the same fields.  Only a couple are specific to one or the other.
>
> Thus, it makes sense to have a single shared function that takes care of
> all the bit-shifting required to assemble the SURFACE_STATE structure.
>
> This removes a lot of complicated cut and pasted code.
>
> One change is that we now specify cube face enables for render targets,
> but as far as I can tell this is harmless.
>
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/gen7_wm_surface_state.c | 210 ++
>  1 file changed, 99 insertions(+), 111 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> index 8413308..8f95abe 100644
> --- a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> @@ -257,6 +257,70 @@ gen7_emit_buffer_surface_state(struct brw_context *brw,
>  }
>
>  static void
> +gen7_emit_image_surface_state(struct brw_context *brw,
> +  uint32_t *out_offset,
> +  const struct intel_mipmap_tree *mt,
> +  unsigned bo_offset,
> +  unsigned surface_type,
> +  unsigned surface_format,
> +  bool is_array,
> +  unsigned depth,
> +  unsigned min_array_element,
> +  unsigned rt_view_extent,
> +  unsigned mocs,
> +  unsigned mip_count,
> +  int swizzle,
> +  bool is_render_target)
> +{
> +   uint32_t *surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE,
> +8 * 4, 32, out_offset);
> +   surf[0] = surface_type << BRW_SURFACE_TYPE_SHIFT |
> + surface_format << BRW_SURFACE_FORMAT_SHIFT |
> + gen7_surface_tiling_mode(mt->region->tiling) |
> + BRW_SURFACE_CUBEFACE_ENABLES |
> + (mt->align_h == 4 ? GEN7_SURFACE_VALIGN_4 : GEN7_SURFACE_VALIGN_2) |
> + (mt->align_w == 8 ? GEN7_SURFACE_HALIGN_8 : GEN7_SURFACE_HALIGN_4) |
> + (is_array ? GEN7_SURFACE_IS_ARRAY : 0) |
> + (mt->array_spacing_lod0 ? GEN7_SURFACE_ARYSPC_LOD0
> + : GEN7_SURFACE_ARYSPC_FULL);
> +   surf[1] = mt->region->bo->offset + bo_offset; /* reloc */
> +   surf[2] = SET_FIELD(mt->logical_width0 - 1, GEN7_SURFACE_WIDTH) |
> + SET_FIELD(mt->logical_height0 - 1, GEN7_SURFACE_HEIGHT);
> +   surf[3] = SET_FIELD(depth - 1, BRW_SURFACE_DEPTH) |
> + (mt->region->pitch - 1);
> +   surf[4] = min_array_element << GEN7_SURFACE_MIN_ARRAY_ELEMENT_SHIFT |
> + rt_view_extent << GEN7_SURFACE_RENDER_TARGET_VIEW_EXTENT_SHIFT |
> + gen7_surface_msaa_bits(mt->num_samples, mt->msaa_layout);
> +   surf[5] = SET_FIELD(mocs, GEN7_SURFACE_MOCS) | mip_count;
> +
> +   if (mt->mcs_mt) {
> +  gen7_set_surface_mcs_info(brw, surf, *out_offset, mt->mcs_mt, true);
> +   } else {
> +  surf[6] = 0;
> +   }
> +
> +   surf[7] = mt->fast_clear_color_value;
> +
> +   if (brw->is_haswell) {
> +  surf[7] |=
> + SET_FIELD(swizzle_to_scs(GET_SWZ(swizzle, 0)), GEN7_SURFACE_SCS_R) |
> + SET_FIELD(swizzle_to_scs(GET_SWZ(swizzle, 1)), GEN7_SURFACE_SCS_G) |
> + SET_FIELD(swizzle_to_scs(GET_SWZ(swizzle, 2)), GEN7_SURFACE_SCS_B) |
> + SET_FIELD(swizzle_to_scs(GET_SWZ(swizzle, 3)), GEN7_SURFACE_SCS_A);
> +   }
> +
> +   uint32_t read_domain =
> +  is_render_target ? I915_GEM_DOMAIN_RENDER : I915_GEM_DOMAIN_SAMPLER;
> +
> +   /* Emit relocation to surface contents */
> +   drm_intel_bo_emit_reloc(brw->batch.bo, *out_offset + 4,
> +  mt->region->bo, bo_offset,
> +   read_domain, 0);
> +
> +   gen7_check_surface_setup(surf, is_render_target);
> +}
> +
> +static void
>  gen7_update_buffer_texture_surface(struct gl_context *ctx,
> unsigned unit,
> uint32_t *surf_offset)
> @@ -305,43 +369,14 @@ gen7_update_texture_surface(struct gl_context *ctx,
>return;
> }
>
> -   uint32_t *surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE,
> -8 * 4, 32, surf_offset);
> -   memset(surf, 0, 8 * 4);
> -
> -   uint32_t tex_format = translate_tex_format(brw,
> +   bool is_array = mt->logical_depth0 > 1 && tObj->Target != GL_TEXTURE_3D;
> +   unsigned mip_count = intelObj->_MaxLevel - intel_image->mt->first_level;
> +   uint32_t brw_format = translate_tex_format(brw,
>mt->format,
>   

Re: [Mesa-dev] [PATCH 1/4] i965: Refactor Gen7+ SURFACE_STATE setup for buffer surfaces.

2013-09-16 Thread Paul Berry
On 16 September 2013 09:38, Eric Anholt  wrote:

> Kenneth Graunke  writes:
>
> > This was an embarrassingly large amount of copy and pasted code,
> > and it wasn't particularly simple code either.  By factoring it out
> > into a helper function, we consolidate the complexity.
> >
> > Signed-off-by: Kenneth Graunke 
> > ---
> >  src/mesa/drivers/dri/i965/gen7_wm_surface_state.c | 144 +-
> >  1 file changed, 58 insertions(+), 86 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> > index 37e3174..8413308 100644
> > --- a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> > +++ b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> > @@ -224,6 +224,37 @@ gen7_check_surface_setup(uint32_t *surf, bool is_render_target)
> > }
> >  }
> >
> > +static void
> > +gen7_emit_buffer_surface_state(struct brw_context *brw,
> > +   uint32_t *out_offset,
> > +   drm_intel_bo *bo,
> > +   unsigned buffer_offset,
> > +   unsigned surface_format,
> > +   unsigned buffer_size,
> > +   unsigned pitch,
> > +   unsigned mocs)
> > +{
> > +   uint32_t *surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE,
> > +8 * 4, 32, out_offset);
> > +   memset(surf, 0, 8 * 4);
> > +
> > +   surf[0] = BRW_SURFACE_BUFFER << BRW_SURFACE_TYPE_SHIFT |
> > + surface_format << BRW_SURFACE_FORMAT_SHIFT |
> > + BRW_SURFACE_RC_READ_WRITE;
> > +   surf[1] = bo->offset + buffer_offset; /* reloc */
> > +   surf[2] = SET_FIELD(buffer_size & 0x7f, GEN7_SURFACE_WIDTH) |
> > + SET_FIELD((buffer_size >> 7) & 0x3fff, GEN7_SURFACE_HEIGHT);
> > +   surf[3] = SET_FIELD((buffer_size >> 21) & 0x3f, BRW_SURFACE_DEPTH) |
> > + (pitch - 1);
> > +   surf[4] = 0;
> > +   surf[5] = SET_FIELD(mocs, GEN7_SURFACE_MOCS);
> > +
> > +   /* Emit relocation to surface contents */
> > +   drm_intel_bo_emit_reloc(brw->batch.bo, *out_offset + 4,
> > +   bo, buffer_offset, I915_GEM_DOMAIN_SAMPLER, 0);
>
> I think this will segfault if BO is NULL
>
> > +
> > +   gen7_check_surface_setup(surf, false /* is_render_target */);
> > +}
> >
> >  static void
> >  gen7_update_buffer_texture_surface(struct gl_context *ctx,
> > @@ -237,39 +268,23 @@ gen7_update_buffer_texture_surface(struct gl_context *ctx,
> > drm_intel_bo *bo = intel_obj ? intel_obj->buffer : NULL;
> > gl_format format = tObj->_BufferObjectFormat;
> >
> > -   uint32_t *surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE,
> > -8 * 4, 32, surf_offset);
> > -   memset(surf, 0, 8 * 4);
> > -
> > uint32_t surface_format = brw_format_for_mesa_format(format);
> > if (surface_format == 0 && format != MESA_FORMAT_RGBA_FLOAT32) {
> >_mesa_problem(NULL, "bad format %s for texture buffer\n",
> >  _mesa_get_format_name(format));
> > }
> >
> > -   surf[0] = BRW_SURFACE_BUFFER << BRW_SURFACE_TYPE_SHIFT |
> > - surface_format << BRW_SURFACE_FORMAT_SHIFT |
> > - BRW_SURFACE_RC_READ_WRITE;
> > -
> > -   if (bo) {
> > -  surf[1] = bo->offset; /* reloc */
> > -
> > -  drm_intel_bo_emit_reloc(brw->batch.bo,
> > -   *surf_offset + 4,
> > -   bo, 0,
> > -   I915_GEM_DOMAIN_SAMPLER, 0);
> > -
> > -  int texel_size = _mesa_get_format_bytes(format);
> > -  int w = intel_obj->Base.Size / texel_size;
> > -
> > -  /* note that these differ from GEN6 */
> > -  surf[2] = SET_FIELD(w & 0x7f, GEN7_SURFACE_WIDTH) | /* bits 6:0 of size */
> > -SET_FIELD((w >> 7) & 0x3fff, GEN7_SURFACE_HEIGHT); /* 20:7 */
> > -  surf[3] = SET_FIELD((w >> 21) & 0x3f, BRW_SURFACE_DEPTH) | /* bits 26:21 */
> > -(texel_size - 1);
> > -   }
> > -
> > -   gen7_check_surface_setup(surf, false /* is_render_target */);
> > +   int texel_size = _mesa_get_format_bytes(format);
> > +   int w = intel_obj ? intel_obj->Base.Size / texel_size : 0;
> > +
> > +   gen7_emit_buffer_surface_state(brw,
> > +  surf_offset,
> > +  bo,
> > +  0,
> > +  surface_format,
> > +  w,
> > +  texel_size,
> > +  0 /* mocs */);
>
> But here you might pass in a NULL bo.  Same for patch 4.  I'm confused
> that this didn't produce a piglit regression.
>
> >  }
>
> I think the savings is well worth it in patches 1, 3, and 4, but I'm not
> a fan of patch 2.  I've got a lot of branches outstanding where I'm
> trying to finally finish off mt->f

Re: [Mesa-dev] [PATCH 3/4] i965: Remove MIPLAYOUT_BELOW from Gen4-6 constant buffer surface state.

2013-09-16 Thread Paul Berry
On 13 September 2013 23:10, Kenneth Graunke  wrote:

> Specifying a miptree layout makes no sense for constant buffers.
>

You might want to mention in the commit message that there's no functional
change since BRW_SURFACE_MIPMAPLAYOUT_BELOW == 0.

In any case, the patch is:

Reviewed-by: Paul Berry 


>
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> index 25db2e0..8d87786 100644
> --- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> @@ -318,7 +318,6 @@ brw_create_constant_surface(struct brw_context *brw,
>   6 * 4, 32, out_offset);
>
> surf[0] = (BRW_SURFACE_BUFFER << BRW_SURFACE_TYPE_SHIFT |
> - BRW_SURFACE_MIPMAPLAYOUT_BELOW << BRW_SURFACE_MIPLAYOUT_SHIFT |
>   BRW_SURFACEFORMAT_R32G32B32A32_FLOAT << BRW_SURFACE_FORMAT_SHIFT);
>
> if (brw->gen >= 6)
> --
> 1.8.3.4
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/4] i965: Refactor Gen4-6 SURFACE_STATE setup for buffer surfaces.

2013-09-16 Thread Paul Berry
On 13 September 2013 23:10, Kenneth Graunke  wrote:

> This was an embarrassingly large amount of copy and pasted code,
> and it wasn't particularly simple code either.  By factoring it out
> into a helper function, we consolidate the complexity.
>

I believe the off-by-one error with buffer_size that I mentioned in patch 1
applies to this patch too.

With that fixed, the patch is:

Reviewed-by: Paul Berry 
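
For concreteness, the same size-minus-one fix in this patch's Gen4-6 field
layout would be (sketch only):

   surf[2] = ((buffer_size - 1) & 0x7f) << BRW_SURFACE_WIDTH_SHIFT |
             (((buffer_size - 1) >> 7) & 0x1fff) << BRW_SURFACE_HEIGHT_SHIFT;
   surf[3] = (((buffer_size - 1) >> 20) & 0x7f) << BRW_SURFACE_DEPTH_SHIFT |
             (pitch - 1) << BRW_SURFACE_PITCH_SHIFT;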


>
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 98 +---
>  1 file changed, 37 insertions(+), 61 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> index 8d87786..bbe7803 100644
> --- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> @@ -190,6 +190,36 @@ brw_get_texture_swizzle(const struct gl_context *ctx,
>  swizzles[GET_SWZ(t->_Swizzle, 3)]);
>  }
>
> +static void
> +gen4_emit_buffer_surface_state(struct brw_context *brw,
> +   uint32_t *out_offset,
> +   drm_intel_bo *bo,
> +   unsigned buffer_offset,
> +   unsigned surface_format,
> +   unsigned buffer_size,
> +   unsigned pitch)
> +{
> +   uint32_t *surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE,
> +6 * 4, 32, out_offset);
> +   memset(surf, 0, 6 * 4);
> +
> +   surf[0] = BRW_SURFACE_BUFFER << BRW_SURFACE_TYPE_SHIFT |
> + surface_format << BRW_SURFACE_FORMAT_SHIFT |
> + (brw->gen >= 6 ? BRW_SURFACE_RC_READ_WRITE : 0);
> +   surf[1] = bo->offset + buffer_offset; /* reloc */
> +   surf[2] = (buffer_size & 0x7f) << BRW_SURFACE_WIDTH_SHIFT |
> + ((buffer_size >> 7) & 0x1fff) << BRW_SURFACE_HEIGHT_SHIFT;
> +   surf[3] = ((buffer_size >> 20) & 0x7f) << BRW_SURFACE_DEPTH_SHIFT |
> + (pitch - 1) << BRW_SURFACE_PITCH_SHIFT;
> +
> +   /* Emit relocation to surface contents.  The 965 PRM, Volume 4, section
> +* 5.1.2 "Data Cache" says: "the data cache does not exist as a separate
> +* physical cache.  It is mapped in hardware to the sampler cache."
> +*/
> +   drm_intel_bo_emit_reloc(brw->batch.bo, *out_offset + 4,
> +   bo, buffer_offset,
> +   I915_GEM_DOMAIN_SAMPLER, 0);
> +}
>
>  static void
>  brw_update_buffer_texture_surface(struct gl_context *ctx,
> @@ -198,49 +228,22 @@ brw_update_buffer_texture_surface(struct gl_context *ctx,
>  {
> struct brw_context *brw = brw_context(ctx);
> struct gl_texture_object *tObj = ctx->Texture.Unit[unit]._Current;
> -   uint32_t *surf;
> struct intel_buffer_object *intel_obj =
>intel_buffer_object(tObj->BufferObject);
> drm_intel_bo *bo = intel_obj ? intel_obj->buffer : NULL;
> gl_format format = tObj->_BufferObjectFormat;
> uint32_t brw_format = brw_format_for_mesa_format(format);
> int texel_size = _mesa_get_format_bytes(format);
> +   int w = intel_obj ? intel_obj->Base.Size / texel_size : 0;
>
> if (brw_format == 0 && format != MESA_FORMAT_RGBA_FLOAT32) {
>_mesa_problem(NULL, "bad format %s for texture buffer\n",
> _mesa_get_format_name(format));
> }
>
> -   surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE,
> - 6 * 4, 32, surf_offset);
> -
> -   surf[0] = (BRW_SURFACE_BUFFER << BRW_SURFACE_TYPE_SHIFT |
> - (brw_format_for_mesa_format(format) <<
> BRW_SURFACE_FORMAT_SHIFT));
> -
> -   if (brw->gen >= 6)
> -  surf[0] |= BRW_SURFACE_RC_READ_WRITE;
> -
> -   if (bo) {
> -  surf[1] = bo->offset; /* reloc */
> -
> -  /* Emit relocation to surface contents. */
> -  drm_intel_bo_emit_reloc(brw->batch.bo,
> - *surf_offset + 4,
> - bo, 0, I915_GEM_DOMAIN_SAMPLER, 0);
> -
> -  int w = intel_obj->Base.Size / texel_size;
> -  surf[2] = ((w & 0x7f) << BRW_SURFACE_WIDTH_SHIFT |
> -((w >> 7) & 0x1fff) << BRW_SURFACE_HEIGHT_SHIFT);
> -  surf[3] = (((w >> 20) & 0x7f) << BRW_SURFACE_DEPTH_SHIFT |
> -(texel_size - 1) << BRW_SURFACE_PITCH_SHIFT);
> -   } else {
> -  surf[1] = 0;
> -  surf[2] = 0;
> -  surf[3] = 0;
> -   }
> -
> -   surf[4] = 0;
> -   surf[5] = 0;
> +   gen4_emit_buffer_surface_state(brw, surf_offset, bo, 0,
> +  brw_format,
> +  w, texel_size);
>  }
>
>  static void
> @@ -311,37 +314,10 @@ brw_create_constant_surface(struct brw_context *brw,
>  {
> uint32_t stride = dword_pitch ? 4 : 16;
> uint32_t elements = ALIGN(size, stride) / stride;
> -   const GLint w = elements - 1;
> -   uint32_t *surf;
> -
> -   surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE,
> -