[Mesa-dev] [PATCH 2/5] i965: Convert loop to memcpy in brw_vec4_upload_binding_table().

2013-09-17 Thread Kenneth Graunke
This is probably more efficient.  At any rate, it's less code.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_vs_surface_state.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vs_surface_state.c 
b/src/mesa/drivers/dri/i965/brw_vs_surface_state.c
index 985fd67..216ff47 100644
--- a/src/mesa/drivers/dri/i965/brw_vs_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_vs_surface_state.c
@@ -156,9 +156,6 @@ brw_vec4_upload_binding_table(struct brw_context *brw,
   struct brw_stage_state *stage_state,
   const struct brw_vec4_prog_data *prog_data)
 {
-   uint32_t *bind;
-   int i;
-
if (INTEL_DEBUG & DEBUG_SHADER_TIME) {
   gen7_create_shader_time_surface(brw, 
&stage_state->surf_offset[SURF_INDEX_VEC4_SHADER_TIME]);
}
@@ -173,14 +170,14 @@ brw_vec4_upload_binding_table(struct brw_context *brw,
   return;
}
 
-   bind = brw_state_batch(brw, AUB_TRACE_BINDING_TABLE,
- sizeof(uint32_t) * entries,
- 32, &stage_state->bind_bo_offset);
+   size_t table_size_in_bytes = entries * sizeof(uint32_t);
+
+   uint32_t *bind = brw_state_batch(brw, AUB_TRACE_BINDING_TABLE,
+table_size_in_bytes, 32,
+&stage_state->bind_bo_offset);
 
/* BRW_NEW_SURFACES and BRW_NEW_*_CONSTBUF */
-   for (i = 0; i < entries; i++) {
-  bind[i] = stage_state->surf_offset[i];
-   }
+   memcpy(bind, stage_state->surf_offset, table_size_in_bytes);
 
brw->state.dirty.brw |= brw_new_binding_table;
 }
-- 
1.8.3.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/5] i965: Update comments in brw_vec4_upload_binding_table().

2013-09-17 Thread Kenneth Graunke
The first comment was a bit stale; there are more kinds of surfaces than
textures and pull constants.

The second was a leftover "to do" comment for something I already did.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_vs_surface_state.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vs_surface_state.c 
b/src/mesa/drivers/dri/i965/brw_vs_surface_state.c
index dbf26f4..985fd67 100644
--- a/src/mesa/drivers/dri/i965/brw_vs_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_vs_surface_state.c
@@ -163,9 +163,7 @@ brw_vec4_upload_binding_table(struct brw_context *brw,
   gen7_create_shader_time_surface(brw, 
&stage_state->surf_offset[SURF_INDEX_VEC4_SHADER_TIME]);
}
 
-   /* Skip making a binding table if we don't use textures or pull
-* constants.
-*/
+   /* If there are no surfaces, skip making the binding table altogether. */
const unsigned entries = prog_data->binding_table_size;
if (entries == 0) {
   if (stage_state->bind_bo_offset != 0) {
@@ -175,9 +173,6 @@ brw_vec4_upload_binding_table(struct brw_context *brw,
   return;
}
 
-   /* Might want to calculate nr_surfaces first, to avoid taking up so much
-* space for the binding table.
-*/
bind = brw_state_batch(brw, AUB_TRACE_BINDING_TABLE,
  sizeof(uint32_t) * entries,
  32, &stage_state->bind_bo_offset);
-- 
1.8.3.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/5] i965: Use brw_upload_binding_table() for the pixel shader as well.

2013-09-17 Thread Kenneth Graunke
This is not quite the same: brw_upload_binding_table() also has code to
early-return if there are no entries, while the existing code did not.

The PS binding table is unlikely to be empty since it will have at least
one color buffer.  If it ever is empty, early returning seems wise.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 23 +--
 1 file changed, 5 insertions(+), 18 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c 
b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
index 25db2e0..962407f 100644
--- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
@@ -863,25 +863,12 @@ const struct brw_tracked_state brw_wm_ubo_surfaces = {
 static void
 brw_upload_wm_binding_table(struct brw_context *brw)
 {
-   uint32_t *bind;
-   int i;
+   struct brw_stage_state *stage_state = &brw->wm.base;
 
-   if (INTEL_DEBUG & DEBUG_SHADER_TIME) {
-  gen7_create_shader_time_surface(brw, 
&brw->wm.base.surf_offset[SURF_INDEX_WM_SHADER_TIME]);
-   }
-
-   /* CACHE_NEW_WM_PROG */
-   unsigned entries = brw->wm.prog_data->binding_table_size;
-   bind = brw_state_batch(brw, AUB_TRACE_BINDING_TABLE,
- sizeof(uint32_t) * entries,
- 32, &brw->wm.base.bind_bo_offset);
-
-   /* BRW_NEW_SURFACES */
-   for (i = 0; i < entries; i++) {
-  bind[i] = brw->wm.base.surf_offset[i];
-   }
-
-   brw->state.dirty.brw |= BRW_NEW_PS_BINDING_TABLE;
+   /* BRW_NEW_SURFACES and CACHE_NEW_WM_PROG */
+   brw_upload_binding_table(brw, BRW_NEW_PS_BINDING_TABLE, stage_state,
+brw->wm.prog_data->binding_table_size,
+SURF_INDEX_WM_SHADER_TIME);
 }
 
 const struct brw_tracked_state brw_wm_binding_table = {
-- 
1.8.3.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/5] i965: Generalize brw_vec4_upload_binding_table() beyond vec4 stages.

2013-09-17 Thread Kenneth Graunke
Instead of passing in a brw_vec4_prog_data structure, we can simply
pass the one field it needs: the number of entries in the binding table.

We also need to pass in the shader time surface index rather than
hardcoding SURF_INDEX_VEC4_SHADER_TIME.

Since the resulting function is stage-agnostic, this patch removes
"vec4_" from the name.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_gs_surface_state.c |  5 +++--
 src/mesa/drivers/dri/i965/brw_state.h|  9 +
 src/mesa/drivers/dri/i965/brw_vs_surface_state.c | 21 +++--
 3 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_gs_surface_state.c 
b/src/mesa/drivers/dri/i965/brw_gs_surface_state.c
index bae6015..ad4c003 100644
--- a/src/mesa/drivers/dri/i965/brw_gs_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_gs_surface_state.c
@@ -106,8 +106,9 @@ brw_gs_upload_binding_table(struct brw_context *brw)
const struct brw_vec4_prog_data *prog_data = &brw->gs.prog_data->base;
 
/* BRW_NEW_SURFACES and BRW_NEW_GS_CONSTBUF */
-   brw_vec4_upload_binding_table(brw, BRW_NEW_GS_BINDING_TABLE, stage_state,
- prog_data);
+   brw_upload_binding_table(brw, BRW_NEW_GS_BINDING_TABLE, stage_state,
+prog_data->binding_table_size,
+SURF_INDEX_VEC4_SHADER_TIME);
 }
 
 const struct brw_tracked_state brw_gs_binding_table = {
diff --git a/src/mesa/drivers/dri/i965/brw_state.h 
b/src/mesa/drivers/dri/i965/brw_state.h
index 14f5feb..ec64328 100644
--- a/src/mesa/drivers/dri/i965/brw_state.h
+++ b/src/mesa/drivers/dri/i965/brw_state.h
@@ -240,10 +240,11 @@ brw_upload_vec4_pull_constants(struct brw_context *brw,
struct brw_stage_state *stage_state,
const struct brw_vec4_prog_data *prog_data);
 void
-brw_vec4_upload_binding_table(struct brw_context *brw,
-  GLbitfield brw_new_binding_table,
-  struct brw_stage_state *stage_state,
-  const struct brw_vec4_prog_data *prog_data);
+brw_upload_binding_table(struct brw_context *brw,
+ GLbitfield brw_new_binding_table,
+ struct brw_stage_state *stage_state,
+ unsigned binding_table_entries,
+ int shader_time_surf_index);
 
 /* gen7_vs_state.c */
 void
diff --git a/src/mesa/drivers/dri/i965/brw_vs_surface_state.c 
b/src/mesa/drivers/dri/i965/brw_vs_surface_state.c
index 216ff47..6fbe8eb 100644
--- a/src/mesa/drivers/dri/i965/brw_vs_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_vs_surface_state.c
@@ -151,18 +151,18 @@ const struct brw_tracked_state brw_vs_ubo_surfaces = {
 
 
 void
-brw_vec4_upload_binding_table(struct brw_context *brw,
-  GLbitfield brw_new_binding_table,
-  struct brw_stage_state *stage_state,
-  const struct brw_vec4_prog_data *prog_data)
+brw_upload_binding_table(struct brw_context *brw,
+ GLbitfield brw_new_binding_table,
+ struct brw_stage_state *stage_state,
+ unsigned binding_table_entries,
+ int shader_time_surf_index)
 {
if (INTEL_DEBUG & DEBUG_SHADER_TIME) {
-  gen7_create_shader_time_surface(brw, 
&stage_state->surf_offset[SURF_INDEX_VEC4_SHADER_TIME]);
+  gen7_create_shader_time_surface(brw, 
&stage_state->surf_offset[shader_time_surf_index]);
}
 
/* If there are no surfaces, skip making the binding table altogether. */
-   const unsigned entries = prog_data->binding_table_size;
-   if (entries == 0) {
+   if (binding_table_entries == 0) {
   if (stage_state->bind_bo_offset != 0) {
 brw->state.dirty.brw |= brw_new_binding_table;
 stage_state->bind_bo_offset = 0;
@@ -170,7 +170,7 @@ brw_vec4_upload_binding_table(struct brw_context *brw,
   return;
}
 
-   size_t table_size_in_bytes = entries * sizeof(uint32_t);
+   size_t table_size_in_bytes = binding_table_entries * sizeof(uint32_t);
 
uint32_t *bind = brw_state_batch(brw, AUB_TRACE_BINDING_TABLE,
 table_size_in_bytes, 32,
@@ -195,8 +195,9 @@ brw_vs_upload_binding_table(struct brw_context *brw)
const struct brw_vec4_prog_data *prog_data = &brw->vs.prog_data->base;
 
/* BRW_NEW_SURFACES and BRW_NEW_VS_CONSTBUF */
-   brw_vec4_upload_binding_table(brw, BRW_NEW_VS_BINDING_TABLE, stage_state,
- prog_data);
+   brw_upload_binding_table(brw, BRW_NEW_VS_BINDING_TABLE, stage_state,
+prog_data->binding_table_size,
+SURF_INDEX_VEC4_SHADER_TIME);
 }
 
 const struct brw_tracked_state brw_vs_binding_table = {
-- 
1.8.3.4

___
mesa-dev mailing 

[Mesa-dev] [PATCH 5/5] i965: Move binding table code to a new file, brw_binding_tables.c.

2013-09-17 Thread Kenneth Graunke
The code to upload the binding tables for each stage was scattered
across brw_{vs,gs,wm}_surface_state.c and brw_misc_state.c, which also
contain a lot of code to populate individual SURFACE_STATE structures.

This patch brings all the binding table upload code together, and splits
it out from the code which fills in SURFACE_STATE entries.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/Makefile.sources   |   1 +
 src/mesa/drivers/dri/i965/brw_binding_tables.c   | 242 +++
 src/mesa/drivers/dri/i965/brw_gs_surface_state.c |  34 
 src/mesa/drivers/dri/i965/brw_misc_state.c   |  66 ---
 src/mesa/drivers/dri/i965/brw_vs_surface_state.c |  62 --
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c |  25 ---
 6 files changed, 243 insertions(+), 187 deletions(-)
 create mode 100644 src/mesa/drivers/dri/i965/brw_binding_tables.c

diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
b/src/mesa/drivers/dri/i965/Makefile.sources
index 07c1053..a2b164d 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -27,6 +27,7 @@ i965_FILES = \
intel_tex_subimage.c \
intel_tex_validate.c \
intel_upload.c \
+   brw_binding_tables.c \
brw_blorp.cpp \
brw_blorp_blit.cpp \
brw_blorp_clear.cpp \
diff --git a/src/mesa/drivers/dri/i965/brw_binding_tables.c 
b/src/mesa/drivers/dri/i965/brw_binding_tables.c
new file mode 100644
index 000..9d15bac
--- /dev/null
+++ b/src/mesa/drivers/dri/i965/brw_binding_tables.c
@@ -0,0 +1,242 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+/**
+ * \file brw_binding_tables.c
+ *
+ * State atoms which upload the "binding table" for each shader stage.
+ *
+ * Binding tables map a numeric "surface index" to the SURFACE_STATE structure
+ * for a currently bound surface.  This allows SEND messages (such as sampler
+ * or data port messages) to refer to a particular surface by number, rather
+ * than by pointer.
+ *
+ * The binding table is stored as a (sparse) array of SURFACE_STATE entries;
+ * surface indexes are simply indexes into the array.  The ordering of the
+ * entries is entirely left up to software; see the SURF_INDEX_* macros in
+ * brw_context.h to see our current layout.
+ */
+
+#include "main/mtypes.h"
+
+#include "brw_context.h"
+#include "brw_defines.h"
+#include "brw_state.h"
+#include "intel_batchbuffer.h"
+
+/**
+ * Upload a shader stage's binding table as indirect state.
+ *
+ * This copies brw_stage_state::surf_offset[] into the indirect state section
+ * of the batchbuffer (allocated by brw_state_batch()).
+ */
+void
+brw_upload_binding_table(struct brw_context *brw,
+ GLbitfield brw_new_binding_table,
+ struct brw_stage_state *stage_state,
+ unsigned binding_table_entries,
+ int shader_time_surf_index)
+{
+   if (INTEL_DEBUG & DEBUG_SHADER_TIME) {
+  gen7_create_shader_time_surface(brw, 
&stage_state->surf_offset[shader_time_surf_index]);
+   }
+
+   /* If there are no surfaces, skip making the binding table altogether. */
+   if (binding_table_entries == 0) {
+  if (stage_state->bind_bo_offset != 0) {
+ brw->state.dirty.brw |= brw_new_binding_table;
+ stage_state->bind_bo_offset = 0;
+  }
+  return;
+   }
+
+   size_t table_size_in_bytes = binding_table_entries * sizeof(uint32_t);
+
+   uint32_t *bind = brw_state_batch(brw, AUB_TRACE_BINDING_TABLE,
+table_size_in_bytes, 32,
+&stage_state->bind_bo_offset);
+
+   /* BRW_NEW_SURFACES and BRW_NEW_*_CONSTBUF */
+   memcpy(bind, stage_state->surf_offset, table_size_in_bytes);
+
+   brw->state.dirty.brw |= brw_new_binding_table;
+}
+
+/**
+ * State atoms which upload 

Re: [Mesa-dev] [PATCH] egl: add EGL_WAYLAND_Y_INVERTED_WL attribute

2013-09-17 Thread Kristian Høgsberg
On Mon, Sep 16, 2013 at 01:02:46PM +0400, Stanislav Vorobiov wrote:
> This enables querying of wl_buffer's orientation

Thanks, committed.  I followed up with a commit to also add
EGL_TEXTURE_FORMAT to the new table of valid eglQueryWaylandBufferWL
attributes.

Kristian

> ---
>  docs/specs/WL_bind_wayland_display.spec |   17 +
>  include/EGL/eglmesaext.h|2 ++
>  2 files changed, 19 insertions(+)
> 
> diff --git a/docs/specs/WL_bind_wayland_display.spec 
> b/docs/specs/WL_bind_wayland_display.spec
> index 8f0083c..47cf5cd 100644
> --- a/docs/specs/WL_bind_wayland_display.spec
> +++ b/docs/specs/WL_bind_wayland_display.spec
> @@ -76,6 +76,10 @@ New Tokens
>  EGL_TEXTURE_Y_UV_WL 0x31D8
>  EGL_TEXTURE_Y_XUXV_WL   0x31D9
>  
> +Accepted in the  parameter of eglQueryWaylandBufferWL:
> +
> +EGL_WAYLAND_Y_INVERTED_WL   0x31DB
> +
>  
>  Additions to the EGL 1.4 Specification:
>  
> @@ -157,6 +161,16 @@ Additions to the EGL 1.4 Specification:
>  Further, eglQueryWaylandBufferWL accepts attributes EGL_WIDTH and
>  EGL_HEIGHT to query the width and height of the wl_buffer.
>  
> +Also, eglQueryWaylandBufferWL may accept
> +EGL_WAYLAND_Y_INVERTED_WL attribute to query orientation of
> +wl_buffer. If EGL_WAYLAND_Y_INVERTED_WL is supported
> +eglQueryWaylandBufferWL returns EGL_TRUE and value is a boolean
> +that tells if wl_buffer is y-inverted or not. If
> +EGL_WAYLAND_Y_INVERTED_WL is not supported
> +eglQueryWaylandBufferWL returns EGL_FALSE, in that case
> +wl_buffer should be treated as if value of
> +EGL_WAYLAND_Y_INVERTED_WL was EGL_TRUE.
> +
>  Issues
>  
>  Revision History
> @@ -177,3 +191,6 @@ Revision History
>  Change eglQueryWaylandBufferWL to take a resource pointer to the
>  buffer instead of a pointer to a struct wl_buffer, as the latter has
>  been deprecated. (Ander Conselvan de Oliveira)
> +Version 6, September 16, 2013
> +Add EGL_WAYLAND_Y_INVERTED_WL attribute to allow specifying
> +wl_buffer's orientation.
> diff --git a/include/EGL/eglmesaext.h b/include/EGL/eglmesaext.h
> index e0eae28..1f07d4c 100644
> --- a/include/EGL/eglmesaext.h
> +++ b/include/EGL/eglmesaext.h
> @@ -115,6 +115,8 @@ typedef EGLDisplay (EGLAPIENTRYP PFNEGLGETDRMDISPLAYMESA) 
> (int fd);
>  #define EGL_WAYLAND_BUFFER_WL0x31D5 /* eglCreateImageKHR 
> target */
>  #define EGL_WAYLAND_PLANE_WL 0x31D6 /* eglCreateImageKHR target */
>  
> +#define EGL_WAYLAND_Y_INVERTED_WL0x31DB /* eglQueryWaylandBufferWL 
> attribute */
> +
>  #define EGL_TEXTURE_Y_U_V_WL0x31D7
>  #define EGL_TEXTURE_Y_UV_WL 0x31D8
>  #define EGL_TEXTURE_Y_XUXV_WL   0x31D9
> -- 
> 1.7.9.5
> 
> ___
> wayland-devel mailing list
> wayland-de...@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/wayland-devel
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965: Fix brw_vs_prog_data_compare to actually check field members.

2013-09-17 Thread Kenneth Graunke
&a and &b are the address of the local stack variables, not the actual
structures.  Instead of comparing the fields of a and b, we compared
...some stack memory.

Caught by Valgrind on Piglit's glsl-lod-bias test (among many others).

Signed-off-by: Kenneth Graunke 
Cc: mesa-sta...@lists.freedesktop.org
---
 src/mesa/drivers/dri/i965/brw_vs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vs.c 
b/src/mesa/drivers/dri/i965/brw_vs.c
index d5909a5..f5bf590 100644
--- a/src/mesa/drivers/dri/i965/brw_vs.c
+++ b/src/mesa/drivers/dri/i965/brw_vs.c
@@ -188,7 +188,7 @@ brw_vs_prog_data_compare(const void *in_a, const void *in_b,
 
/* Compare the rest of the struct. */
const unsigned offset = sizeof(struct brw_vec4_prog_data);
-   if (memcmp(((char *) &a) + offset, ((char *) &b) + offset,
+   if (memcmp(((char *) a) + offset, ((char *) b) + offset,
   sizeof(struct brw_vs_prog_data) - offset)) {
   return false;
}
-- 
1.8.3.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965: Fix brw_gs_prog_data_compare to actually check field members.

2013-09-17 Thread Kenneth Graunke
&a and &b are the address of the local stack variables, not the actual
structures.  Instead of comparing the fields of a and b, we compared
...some stack memory.

Not a candidate for stable since GS code doesn't exist in 9.2.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_vec4_gs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs.c 
b/src/mesa/drivers/dri/i965/brw_vec4_gs.c
index c526107..0219915 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs.c
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs.c
@@ -313,7 +313,7 @@ brw_gs_prog_data_compare(const void *in_a, const void *in_b,
 
/* Compare the rest of the struct. */
const unsigned offset = sizeof(struct brw_vec4_prog_data);
-   if (memcmp(((char *) &a) + offset, ((char *) &b) + offset,
+   if (memcmp(((char *) a) + offset, ((char *) b) + offset,
   sizeof(struct brw_gs_prog_data) - offset)) {
   return false;
}
-- 
1.8.3.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] glsl: Delete builtin_builder::shader when destroying built-ins.

2013-09-17 Thread Kenneth Graunke
I would use _mesa_delete_shader, but it's declared static, and we don't
really need any of the stuff in it anyway.

This fixes a memory leak caught by Valgrind.

Signed-off-by: Kenneth Graunke 
---
 src/glsl/builtin_functions.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
index 528af0d..dbd9d53 100644
--- a/src/glsl/builtin_functions.cpp
+++ b/src/glsl/builtin_functions.cpp
@@ -578,6 +578,9 @@ builtin_builder::release()
 {
ralloc_free(mem_ctx);
mem_ctx = NULL;
+
+   ralloc_free(shader);
+   shader = NULL;
 }
 
 void
-- 
1.8.3.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] a newbie asking newbie questions

2013-09-17 Thread Rogovin, Kevin
Hello,

 Thank you for the very fast answers, some more questions:


> It's not a preference question.  The registers are 8 floats wide.
> Vertex shaders get invoked 2 vertices at a time, with a register containing 
> these values:
>
> .   +--+--+--+--+--+--+--+--+
> .   | v0.x | v0.y | v0.z | v0.w | v1.x | v1.y | v1.z | v1.w |
> .   +--+--+--+--+--+--+--+--+

This seems best to me: run two vertices in each invocation with the hopes that 
the
shader compiler will merge (multiple) float, vec2 and maybe even vec3 
operations into 
vec4 operations (does it)?


> while these 8 pixels in screen space:
> 
> .   +++++
> .   | p0 | p1 | p2 | p3 |
> .   +++++
> .   | p4 | p5 | p6 | p7 |
> .   +++++
>
> are loaded in fragment shader registers as:
>
> .   +--+--+--+--+--+--+--+--+
>.   | p0.x | p1.x | p4.x | p5.x | p2.x | p3.x | p6.x | p7.x |
> .   +--+--+--+--+--+--+--+--+
>
> Note how one register just holds a single channel ('.x' here) of a vector.  A 
> vec4 would take up 4 registers, and to do value0.xyzw * value1.xyzw, you'd 
> emit 4 MULs.

This is exactly what I was trying to ask/say about the fragment shader running, 
i.e. n-fragments are processed with 1 n-SIMD command (for i965, n=8),
sighs my e-mail communications leave something to be desired. 
Some questions:
 1) do the fragments need to be in a 4x2 block, or can it be two separate 2x2 
blocks?
 2) for tiny triangles for fragment shaders that do not require dFdx, dFdy or 
fwidth, can the fragments be totally scattered?

Along further lines, for non-dependent texture lookups, are there code lines 
where the derivatives are computed
analytically so that selecting the correct LOD does not require to process 
fragments in 2x2 (or larger) blocks? Or does
the i965 hardware sampler interface does not allow this kind of madness? 

>> On a related note, where are the beans about the dispatch table?
>I don't know this one (or particularly what you're asking, I guess).

Viewing docs/index.html, on the side panel "Developer Topics --> GL Dispatch" 
there is text (broken into sections "1. Complexity of GL Dispatch", "2. 
Overview of Mesa's Implementation"  and "3. Optimizations " describing how 
different GL contexts for the same hardware can do different things for the 
same GL function and that mesa has stubs which in turn call the "real"  
function. The documents go on to talk about various ways the function tables 
are filled and accessed across separate threads. My questions are:
 0) is that information text still accurate? In particular, the directory 
src/glapi is gone from Mesa (atleast what I git cloned) and I thought that was 
the location of it.
 1) where/how does the i965 driver fill that table, if it exists?
 
Along similar lines, I see that some of the code in src/mesa/main performs 
various checks of various API calls and at times has some conditions dependent 
on what context type it is, which kind of contradicts the idea of different 
context have different dispatch tables [sort of, since the functions might just 
be the driver magick, where as the stub is validate and then call driver 
magick]. 

-Kevin
-
Intel Finland Oy
Registered Address: PL 281, 00181 Helsinki 
Business Identity Code: 0357606 - 4 
Domiciled in Helsinki 

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] a newbie asking newbie questions

2013-09-17 Thread Kenneth Graunke
On 09/17/2013 05:13 AM, Rogovin, Kevin wrote:
> Hello,
> 
>  Thank you for the very fast answers, some more questions:
> 
> 
>> It's not a preference question.  The registers are 8 floats wide.
>> Vertex shaders get invoked 2 vertices at a time, with a register containing 
>> these values:
>>
>> .   +--+--+--+--+--+--+--+--+
>> .   | v0.x | v0.y | v0.z | v0.w | v1.x | v1.y | v1.z | v1.w |
>> .   +--+--+--+--+--+--+--+--+
> 
> This seems best to me: run two vertices in each invocation with the hopes 
> that the
> shader compiler will merge (multiple) float, vec2 and maybe even vec3 
> operations into 
> vec4 operations (does it)?

Not as well as it should.  There's a lot of room for improvement in our
SIMD4x2/vector backend.  We haven't spent a ton of effort optimizing it
since vertex shaders have rarely been the bottleneck in application
performance.

>> while these 8 pixels in screen space:
>>
>> .   +++++
>> .   | p0 | p1 | p2 | p3 |
>> .   +++++
>> .   | p4 | p5 | p6 | p7 |
>> .   +++++
>>
>> are loaded in fragment shader registers as:
>>
>> .   +--+--+--+--+--+--+--+--+
>> .   | p0.x | p1.x | p4.x | p5.x | p2.x | p3.x | p6.x | p7.x |
>> .   +--+--+--+--+--+--+--+--+
>>
>> Note how one register just holds a single channel ('.x' here) of a vector.  
>> A vec4 would take up 4 registers, and to do value0.xyzw * value1.xyzw, you'd 
>> emit 4 MULs.
> 
> This is exactly what I was trying to ask/say about the fragment shader 
> running, i.e. n-fragments are processed with 1 n-SIMD command (for i965, n=8),
> sighs my e-mail communications leave something to be desired. 
> Some questions:
>  1) do the fragments need to be in a 4x2 block, or can it be two separate 2x2 
> blocks?

The GPU processes two separate 2x2 blocks of pixels, which may actually
not be anywhere near each other.

>  2) for tiny triangles for fragment shaders that do not require dFdx, dFdy or 
> fwidth, can the fragments be totally scattered?

Nope, the pixel shader always works on 2x2 blocks.

> Along further lines, for non-dependent texture lookups, are there code lines 
> where the derivatives are computed
> analytically so that selecting the correct LOD does not require to process 
> fragments in 2x2 (or larger) blocks? Or does
> the i965 hardware sampler interface does not allow this kind of madness? 
> 
>>> On a related note, where are the beans about the dispatch table?
>> I don't know this one (or particularly what you're asking, I guess).
> 
> Viewing docs/index.html, on the side panel "Developer Topics --> GL
> Dispatch" there is text (broken into sections "1. Complexity of GL
> Dispatch", "2. Overview of Mesa's Implementation" and "3. Optimizations
> " describing how different GL contexts for the same hardware can do
> different things for the same GL function and that mesa has stubs which
> in turn call the "real" function. The documents go on to talk about
> various ways the function tables are filled and accessed across separate
> threads. My questions are:
>  0) is that information text still accurate? In particular, the directory 
> src/glapi is gone from Mesa (atleast what I git cloned) and I thought that 
> was the location of it.
>  1) where/how does the i965 driver fill that table, if it exists?
>  
> Along similar lines, I see that some of the code in src/mesa/main performs 
> various checks of various API calls and at times has some conditions 
> dependent on what context type it is, which kind of contradicts the idea of 
> different context have different dispatch tables [sort of, since the 
> functions might just be the driver magick, where as the stub is validate and 
> then call driver magick]. 
> 
> -Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] a newbie asking newbie questions

2013-09-17 Thread Paul Berry
On 17 September 2013 05:13, Rogovin, Kevin  wrote:

> Hello,
>
>  Thank you for the very fast answers, some more questions:
>
>
> > It's not a preference question.  The registers are 8 floats wide.
> > Vertex shaders get invoked 2 vertices at a time, with a register
> containing these values:
> >
> > .   +--+--+--+--+--+--+--+--+
> > .   | v0.x | v0.y | v0.z | v0.w | v1.x | v1.y | v1.z | v1.w |
> > .   +--+--+--+--+--+--+--+--+
>
> This seems best to me: run two vertices in each invocation with the hopes
> that the
> shader compiler will merge (multiple) float, vec2 and maybe even vec3
> operations into
> vec4 operations (does it)?
>

>
> > while these 8 pixels in screen space:
> >
> > .   +++++
> > .   | p0 | p1 | p2 | p3 |
> > .   +++++
> > .   | p4 | p5 | p6 | p7 |
> > .   +++++
> >
> > are loaded in fragment shader registers as:
> >
> > .   +--+--+--+--+--+--+--+--+
> >.   | p0.x | p1.x | p4.x | p5.x | p2.x | p3.x | p6.x | p7.x |
> > .   +--+--+--+--+--+--+--+--+
> >
> > Note how one register just holds a single channel ('.x' here) of a
> vector.  A vec4 would take up 4 registers, and to do value0.xyzw *
> value1.xyzw, you'd emit 4 MULs.
>
> This is exactly what I was trying to ask/say about the fragment shader
> running, i.e. n-fragments are processed with 1 n-SIMD command (for i965,
> n=8),
> sighs my e-mail communications leave something to be desired.
> Some questions:
>  1) do the fragments need to be in a 4x2 block, or can it be two separate
> 2x2 blocks?
>
 2) for tiny triangles for fragment shaders that do not require dFdx, dFdy
> or fwidth, can the fragments be totally scattered?
>

> Along further lines, for non-dependent texture lookups, are there code
> lines where the derivatives are computed
> analytically so that selecting the correct LOD does not require to process
> fragments in 2x2 (or larger) blocks? Or does
> the i965 hardware sampler interface does not allow this kind of madness?
>

We don't do any such optimizations in the Mesa/i965 driver, and I suspect
it wouldn't help much if we did (the sampler hardware computes the
gradients from the input coordinates by taking advantage of the 2x2 block
arrangement, so the gradient computation is extremely cheap).


>
> >> On a related note, where are the beans about the dispatch table?
> >I don't know this one (or particularly what you're asking, I guess).
>
> Viewing docs/index.html, on the side panel "Developer Topics --> GL
> Dispatch" there is text (broken into sections "1. Complexity of GL
> Dispatch", "2. Overview of Mesa's Implementation"  and "3. Optimizations "
> describing how different GL contexts for the same hardware can do different
> things for the same GL function and that mesa has stubs which in turn call
> the "real"  function. The documents go on to talk about various ways the
> function tables are filled and accessed across separate threads. My
> questions are:
>  0) is that information text still accurate? In particular, the directory
> src/glapi is gone from Mesa (atleast what I git cloned) and I thought that
> was the location of it.
>  1) where/how does the i965 driver fill that table, if it exists?
>

Some of this documentation may be out of date--we often forget that it
exists, so we don't keep it very well updated.  If you find specific
errors, please feel free to submit patches to fix them.

The directory src/glapi is now src/mapi/glapi.

A lot of the code to fill in the dispatch table is in
src/mesa/main/api_exec.c, which is generated at compile time by
src/mapi/glapi/gen/gl_genexec.py from the .xml files in the
src/mapi/glapi/gen directory.  A handful of dispatch table functions aren't
populated by api_exec.c because they change dynamically depending on GL
state.  Functions that specify vertex attributes (e.g. glColor4f()) are set
up by install_vtxfmt() in src/mesa/main/vtxfmt.c.  Functions whose
behaviour needs to be saved in exec lists are set up by
_mesa_initialize_save_table() in src/mesa/main/dlist.c.

If you're new to Mesa, I'd recommend shying away from this dispatch code
for now, since it's fairly subtle and most people don't need to understand
it in order to contribute usefully to Mesa.  The rule of thumb is, if
you're looking for the implementation of the function glFoo(), grep the
source code for a function called _mesa_Foo().  If you find it, that's the
function you're looking for.  If you don't, then it's probably one of the
functions whose behaviour changes based on GL state, in which case
congratulations, you're one of the few people who actually need to
understand the dispatch code :)


>
> Along similar lines, I see that some of the code in src/mesa/main performs
> various checks of various API calls and at times has some conditions
> dependent on what context type it is, which kind of contradicts the idea of
> different 

Re: [Mesa-dev] [PATCH 03/24] i965: Initialize all member variables of vec4_instruction on construction.

2013-09-17 Thread Paul Berry
On 15 September 2013 00:10, Francisco Jerez  wrote:

> Ditto.  Otherwise some of its member variables are going to have
> uninitialized contents in cases where its memory is not allocated
> using rzalloc().
>

Nit pick: "ditto" implicitly refers to the commit message in patch 2.  I'd
prefer to see each commit message stand on its own, since commits often
wind up getting cherry-picked into alternate orders, or viewed in
isolation, so it won't always be obvious to the reader what "ditto" means.

With that fixed, the patch is:

Reviewed-by: Paul Berry 


> ---
>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 15 +++
>  1 file changed, 15 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> index 304636a..9770f13 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> @@ -38,7 +38,22 @@ vec4_instruction::vec4_instruction(vec4_visitor *v,
> this->src[0] = src0;
> this->src[1] = src1;
> this->src[2] = src2;
> +   this->saturate = false;
> +   this->force_writemask_all = false;
> +   this->no_dd_clear = false;
> +   this->no_dd_check = false;
> +   this->conditional_mod = BRW_CONDITIONAL_NONE;
> +   this->sampler = 0;
> +   this->texture_offset = 0;
> +   this->target = 0;
> +   this->shadow_compare = false;
> this->ir = v->base_ir;
> +   this->urb_write_flags = BRW_URB_WRITE_NO_FLAGS;
> +   this->header_present = false;
> +   this->mlen = 0;
> +   this->base_mrf = 0;
> +   this->offset = 0;
> +   this->ir = NULL;
> this->annotation = v->current_annotation;
>  }
>
> --
> 1.8.3.4
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/5] i965: Update comments in brw_vec4_upload_binding_table().

2013-09-17 Thread Eric Anholt
Kenneth Graunke  writes:

> The first comment was a bit stale; there are more kinds of surfaces than
> textures and pull constants.
>
> The second was a leftover "to do" comment for something I already did.

This series is:

Reviewed-by: Eric Anholt 


pgpv63Ha31zdu.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] a newbie asking newbie questions

2013-09-17 Thread Eric Anholt
"Rogovin, Kevin"  writes:

> Hello,
>
>  Thank you for the very fast answers, some more questions:
>
>
>> It's not a preference question.  The registers are 8 floats wide.
>> Vertex shaders get invoked 2 vertices at a time, with a register
>> containing these values:
>>
>> .   +--+--+--+--+--+--+--+--+
>> .   | v0.x | v0.y | v0.z | v0.w | v1.x | v1.y | v1.z | v1.w |
>> .   +--+--+--+--+--+--+--+--+
>
> This seems best to me: run two vertices in each invocation with the
> hopes that the shader compiler will merge (multiple) float, vec2 and
> maybe even vec3 operations into vec4 operations (does it)?

This is the worst, actually, since you're wasting channels that could
have got some work done if you're dealing with things smaller than vec4
(and you do that a lot).  That is, unless running fewer shader instances
at a time happens to prevent register spilling.  But it's the hardware's
choice, not ours.

>> while these 8 pixels in screen space:
>> 
>> .  +++++ .  | p0 | p1 | p2 | p3 | .
>> +++++ .  | p4 | p5 | p6 | p7 | .
>> +++++
>>
>> are loaded in fragment shader registers as:
>>
>> .  +--+--+--+--+--+--+--+--+ .  |
>>p0.x | p1.x | p4.x | p5.x | p2.x | p3.x | p6.x | p7.x | .
>>+--+--+--+--+--+--+--+--+
>>
>> Note how one register just holds a single channel ('.x' here) of a
>> vector.  A vec4 would take up 4 registers, and to do value0.xyzw *
>> value1.xyzw, you'd emit 4 MULs.
>
> This is exactly what I was trying to ask/say about the fragment shader
> running, i.e. n-fragments are processed with 1 n-SIMD command (for
> i965, n=8), sighs my e-mail communications leave something to be
> desired.  Some questions: 1) do the fragments need to be in a 4x2
> block, or can it be two separate 2x2 blocks?  2) for tiny triangles
> for fragment shaders that do not require dFdx, dFdy or fwidth, can the
> fragments be totally scattered?

This is all the hardware's choice, not ours.  And of course, any normal
texturing at all requires the implicit calculation derivatives to
determine LOD, so it's always 2x2 subspans.


pgpSHdso4nbHK.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] gallivm: some bits of seamless cube filtering implementation

2013-09-17 Thread Jose Fonseca
LGTM.

Jose

- Original Message -
> From: Roland Scheidegger 
> 
> Simply adjust wrap mode to clamp_to_edge. This is all that's needed for a
> correct implementation for nearest filtering, and it's way better than
> using repeat wrap for instance for linear filtering (though obviously this
> doesn't actually do seamless filtering).
> 
> v2: fix s/t wrap not r/s...
> ---
>  src/gallium/auxiliary/gallivm/lp_bld_sample.c |1 +
>  src/gallium/auxiliary/gallivm/lp_bld_sample.h |1 +
>  src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |   41
>  ++---
>  3 files changed, 29 insertions(+), 14 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.c
> b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
> index 9b0a92c..c775382 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_sample.c
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
> @@ -155,6 +155,7 @@ lp_sampler_static_sampler_state(struct
> lp_static_sampler_state *state,
> state->wrap_r= sampler->wrap_r;
> state->min_img_filter= sampler->min_img_filter;
> state->mag_img_filter= sampler->mag_img_filter;
> +   state->seamless_cube_map = sampler->seamless_cube_map;
>  
> if (sampler->max_lod > 0.0f) {
>state->min_mip_filter = sampler->min_mip_filter;
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.h
> b/src/gallium/auxiliary/gallivm/lp_bld_sample.h
> index e6b9f30..803a99e 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_sample.h
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.h
> @@ -114,6 +114,7 @@ struct lp_static_sampler_state
> unsigned lod_bias_non_zero:1;
> unsigned apply_min_lod:1;  /**< min_lod > 0 ? */
> unsigned apply_max_lod:1;  /**< max_lod < last_level ? */
> +   unsigned seamless_cube_map:1;
>  
> /* Hacks */
> unsigned force_nearest_s:1;
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
> b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
> index 7e98919..355e97d 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
> @@ -2123,8 +2123,21 @@ lp_build_sample_soa(struct gallivm_state *gallivm,
>debug_printf("  .min_mip_filter = %u\n",
>derived_sampler_state.min_mip_filter);
> }
>  
> -   min_img_filter = static_sampler_state->min_img_filter;
> -   mag_img_filter = static_sampler_state->mag_img_filter;
> +   if ((static_texture_state->target == PIPE_TEXTURE_CUBE ||
> +static_texture_state->target == PIPE_TEXTURE_CUBE_ARRAY) &&
> +   static_sampler_state->seamless_cube_map)
> +   {
> +  /*
> +   * Seamless filtering ignores wrap modes.
> +   * Setting to CLAMP_TO_EDGE is correct for nearest filtering, for
> +   * bilinear it's not correct but way better than using for instance
> repeat.
> +   */
> +  derived_sampler_state.wrap_s = PIPE_TEX_WRAP_CLAMP_TO_EDGE;
> +  derived_sampler_state.wrap_t = PIPE_TEX_WRAP_CLAMP_TO_EDGE;
> +   }
> +
> +   min_img_filter = derived_sampler_state.min_img_filter;
> +   mag_img_filter = derived_sampler_state.mag_img_filter;
>  
>  
> /*
> @@ -2260,16 +2273,16 @@ lp_build_sample_soa(struct gallivm_state *gallivm,
>LLVMValueRef ilevel0 = NULL, ilevel1 = NULL;
>boolean use_aos = util_format_fits_8unorm(bld.format_desc) &&
>  /* not sure this is strictly needed or simply
>  impossible */
> -static_sampler_state->compare_mode ==
> PIPE_TEX_COMPARE_NONE &&
> -
> lp_is_simple_wrap_mode(static_sampler_state->wrap_s);
> +derived_sampler_state.compare_mode ==
> PIPE_TEX_COMPARE_NONE &&
> +
> lp_is_simple_wrap_mode(derived_sampler_state.wrap_s);
>  
>use_aos &= bld.num_lods <= num_quads ||
> - static_sampler_state->min_img_filter ==
> -static_sampler_state->mag_img_filter;
> + derived_sampler_state.min_img_filter ==
> +derived_sampler_state.mag_img_filter;
>if (dims > 1) {
> - use_aos &= lp_is_simple_wrap_mode(static_sampler_state->wrap_t);
> + use_aos &= lp_is_simple_wrap_mode(derived_sampler_state.wrap_t);
>   if (dims > 2) {
> -use_aos &= lp_is_simple_wrap_mode(static_sampler_state->wrap_r);
> +use_aos &= lp_is_simple_wrap_mode(derived_sampler_state.wrap_r);
>   }
>}
>  
> @@ -2278,12 +2291,12 @@ lp_build_sample_soa(struct gallivm_state *gallivm,
>   debug_printf("%s: using floating point linear filtering for %s\n",
>__FUNCTION__, bld.format_desc->short_name);
>   debug_printf("  min_img %d  mag_img %d  mip %d  wraps %d  wrapt %d
>   wrapr %d\n",
> -  static_sampler_state->min_img_filter,
> -  static_sampler_state->mag_img_filter,
> -  static_sampler_state->min_mip_filter,
> - 

Re: [Mesa-dev] [PATCH] i965: Fix brw_gs_prog_data_compare to actually check field members.

2013-09-17 Thread Chad Versace

Whoops...

Series is
Reviewed-by: Chad Versace 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/3] util/u_blit: Implement util_blit_pixels via pipe_context::blit.

2013-09-17 Thread jfonseca
From: José Fonseca 

This removes a lot of code, but not everything, as util_blit_pixels_tex
is still useful when one needs to override pipe_sampler_view::swizzle_?.
---
 src/gallium/auxiliary/util/u_blit.c | 447 +++-
 1 file changed, 37 insertions(+), 410 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_blit.c 
b/src/gallium/auxiliary/util/u_blit.c
index e9bec4a..4ba71b9 100644
--- a/src/gallium/auxiliary/util/u_blit.c
+++ b/src/gallium/auxiliary/util/u_blit.c
@@ -57,29 +57,20 @@ struct blit_state
struct pipe_context *pipe;
struct cso_context *cso;
 
-   struct pipe_blend_state blend_write_color, blend_keep_color;
+   struct pipe_blend_state blend_write_color;
struct pipe_depth_stencil_alpha_state dsa_keep_depthstencil;
-   struct pipe_depth_stencil_alpha_state dsa_write_depthstencil;
-   struct pipe_depth_stencil_alpha_state dsa_write_depth;
-   struct pipe_depth_stencil_alpha_state dsa_write_stencil;
struct pipe_rasterizer_state rasterizer;
struct pipe_sampler_state sampler;
struct pipe_viewport_state viewport;
struct pipe_vertex_element velem[2];
-   enum pipe_texture_target internal_target;
 
void *vs;
void *fs[PIPE_MAX_TEXTURE_TYPES][TGSI_WRITEMASK_XYZW + 1];
-   void *fs_depthstencil[PIPE_MAX_TEXTURE_TYPES];
-   void *fs_depth[PIPE_MAX_TEXTURE_TYPES];
-   void *fs_stencil[PIPE_MAX_TEXTURE_TYPES];
 
struct pipe_resource *vbuf;  /**< quad vertices */
unsigned vbuf_slot;
 
float vertices[4][2][4];   /**< vertex/texcoords for quad */
-
-   boolean has_stencil_export;
 };
 
 
@@ -103,20 +94,6 @@ util_create_blit(struct pipe_context *pipe, struct 
cso_context *cso)
/* disabled blending/masking */
ctx->blend_write_color.rt[0].colormask = PIPE_MASK_RGBA;
 
-   /* depth stencil states */
-   ctx->dsa_write_depth.depth.enabled = 1;
-   ctx->dsa_write_depth.depth.writemask = 1;
-   ctx->dsa_write_depth.depth.func = PIPE_FUNC_ALWAYS;
-   ctx->dsa_write_stencil.stencil[0].enabled = 1;
-   ctx->dsa_write_stencil.stencil[0].func = PIPE_FUNC_ALWAYS;
-   ctx->dsa_write_stencil.stencil[0].fail_op = PIPE_STENCIL_OP_REPLACE;
-   ctx->dsa_write_stencil.stencil[0].zpass_op = PIPE_STENCIL_OP_REPLACE;
-   ctx->dsa_write_stencil.stencil[0].zfail_op = PIPE_STENCIL_OP_REPLACE;
-   ctx->dsa_write_stencil.stencil[0].valuemask = 0xff;
-   ctx->dsa_write_stencil.stencil[0].writemask = 0xff;
-   ctx->dsa_write_depthstencil.depth = ctx->dsa_write_depth.depth;
-   ctx->dsa_write_depthstencil.stencil[0] = ctx->dsa_write_stencil.stencil[0];
-
/* rasterizer */
ctx->rasterizer.cull_face = PIPE_FACE_NONE;
ctx->rasterizer.half_pixel_center = 1;
@@ -147,14 +124,6 @@ util_create_blit(struct pipe_context *pipe, struct 
cso_context *cso)
   ctx->vertices[i][1][3] = 1.0f; /* q */
}
 
-   if(pipe->screen->get_param(pipe->screen, PIPE_CAP_NPOT_TEXTURES))
-  ctx->internal_target = PIPE_TEXTURE_2D;
-   else
-  ctx->internal_target = PIPE_TEXTURE_RECT;
-
-   ctx->has_stencil_export =
-  pipe->screen->get_param(pipe->screen, PIPE_CAP_SHADER_STENCIL_EXPORT);
-
return ctx;
 }
 
@@ -178,18 +147,6 @@ util_destroy_blit(struct blit_state *ctx)
   }
}
 
-   for (i = 0; i < PIPE_MAX_TEXTURE_TYPES; i++) {
-  if (ctx->fs_depthstencil[i]) {
- pipe->delete_fs_state(pipe, ctx->fs_depthstencil[i]);
-  }
-  if (ctx->fs_depth[i]) {
- pipe->delete_fs_state(pipe, ctx->fs_depth[i]);
-  }
-  if (ctx->fs_stencil[i]) {
- pipe->delete_fs_state(pipe, ctx->fs_stencil[i]);
-  }
-   }
-
pipe_resource_reference(&ctx->vbuf, NULL);
 
FREE(ctx);
@@ -217,63 +174,6 @@ set_fragment_shader(struct blit_state *ctx, uint writemask,
 
 
 /**
- * Helper function to set the shader which writes depth and stencil.
- */
-static INLINE void
-set_depthstencil_fragment_shader(struct blit_state *ctx,
- enum pipe_texture_target pipe_tex)
-{
-   if (!ctx->fs_depthstencil[pipe_tex]) {
-  unsigned tgsi_tex = util_pipe_tex_to_tgsi_tex(pipe_tex, 0);
-
-  ctx->fs_depthstencil[pipe_tex] =
- util_make_fragment_tex_shader_writedepthstencil(ctx->pipe, tgsi_tex,
-  TGSI_INTERPOLATE_LINEAR);
-   }
-
-   cso_set_fragment_shader_handle(ctx->cso, ctx->fs_depthstencil[pipe_tex]);
-}
-
-
-/**
- * Helper function to set the shader which writes depth.
- */
-static INLINE void
-set_depth_fragment_shader(struct blit_state *ctx,
-  enum pipe_texture_target pipe_tex)
-{
-   if (!ctx->fs_depth[pipe_tex]) {
-  unsigned tgsi_tex = util_pipe_tex_to_tgsi_tex(pipe_tex, 0);
-
-  ctx->fs_depth[pipe_tex] =
- util_make_fragment_tex_shader_writedepth(ctx->pipe, tgsi_tex,
-  TGSI_INTERPOLATE_LINEAR);
-   }
-
-   cso_set_fragment_shader_handle(ctx->cso, ctx->fs_depth[pipe_tex]);
-}
-
-
-/**
- * Helper function to set the shader which writes stencil.
- */
-stati

[Mesa-dev] [PATCH 2/3] util/u_blit: Support blits from cubemaps.

2013-09-17 Thread jfonseca
From: José Fonseca 

By calling util_map_texcoords2d_onto_cubemap.

A new parameter for util_blit_pixels_tex is necessary, as
pipe_sampler_view::first_layer is always supposed to point to the first
face when sampling from cubemaps.
---
 src/gallium/auxiliary/util/u_blit.c | 34 +++---
 src/gallium/auxiliary/util/u_blit.h |  1 +
 2 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_blit.c 
b/src/gallium/auxiliary/util/u_blit.c
index 8cc080c..e9bec4a 100644
--- a/src/gallium/auxiliary/util/u_blit.c
+++ b/src/gallium/auxiliary/util/u_blit.c
@@ -46,6 +46,7 @@
 #include "util/u_math.h"
 #include "util/u_memory.h"
 #include "util/u_sampler.h"
+#include "util/u_texture.h"
 #include "util/u_simple_shaders.h"
 
 #include "cso_cache/cso_context.h"
@@ -143,7 +144,6 @@ util_create_blit(struct pipe_context *pipe, struct 
cso_context *cso)
/* init vertex data that doesn't change */
for (i = 0; i < 4; i++) {
   ctx->vertices[i][0][3] = 1.0f; /* w */
-  ctx->vertices[i][1][2] = 0.0f; /* r */
   ctx->vertices[i][1][3] = 1.0f; /* q */
}
 
@@ -327,6 +327,8 @@ get_next_slot( struct blit_state *ctx )
  */
 static unsigned
 setup_vertex_data_tex(struct blit_state *ctx,
+  unsigned src_target,
+  unsigned src_face,
   float x0, float y0, float x1, float y1,
   float s0, float t0, float s1, float t1,
   float z)
@@ -338,24 +340,37 @@ setup_vertex_data_tex(struct blit_state *ctx,
ctx->vertices[0][0][2] = z;
ctx->vertices[0][1][0] = s0; /*s*/
ctx->vertices[0][1][1] = t0; /*t*/
+   ctx->vertices[0][1][2] = 0;  /*r*/
 
ctx->vertices[1][0][0] = x1;
ctx->vertices[1][0][1] = y0;
ctx->vertices[1][0][2] = z;
ctx->vertices[1][1][0] = s1; /*s*/
ctx->vertices[1][1][1] = t0; /*t*/
+   ctx->vertices[1][1][2] = 0;  /*r*/
 
ctx->vertices[2][0][0] = x1;
ctx->vertices[2][0][1] = y1;
ctx->vertices[2][0][2] = z;
ctx->vertices[2][1][0] = s1;
ctx->vertices[2][1][1] = t1;
+   ctx->vertices[3][1][2] = 0;
 
ctx->vertices[3][0][0] = x0;
ctx->vertices[3][0][1] = y1;
ctx->vertices[3][0][2] = z;
ctx->vertices[3][1][0] = s0;
ctx->vertices[3][1][1] = t1;
+   ctx->vertices[3][1][2] = 0;
+
+   if (src_target == PIPE_TEXTURE_CUBE ||
+   src_target == PIPE_TEXTURE_CUBE_ARRAY) {
+  /* Map cubemap texture coordinates inplace. */
+  const unsigned stride = sizeof ctx->vertices[0] / sizeof 
ctx->vertices[0][0][0];
+  util_map_texcoords2d_onto_cubemap(src_face,
+&ctx->vertices[0][1][0], stride,
+&ctx->vertices[0][1][0], stride);
+   }
 
offset = get_next_slot( ctx );
 
@@ -770,6 +785,8 @@ util_blit_pixels(struct blit_state *ctx,
 
/* draw quad */
offset = setup_vertex_data_tex(ctx,
+  sampler_view->texture->target,
+  srcZ0 % 6,
   (float) dstX0 / dst_surface->width * 2.0f - 
1.0f,
   (float) dstY0 / dst_surface->height * 2.0f - 
1.0f,
   (float) dstX1 / dst_surface->width * 2.0f - 
1.0f,
@@ -811,16 +828,25 @@ util_blit_pixels(struct blit_state *ctx,
 
 
 /**
- * Copy pixel block from src texture to dst surface.
+ * Copy pixel block from src sampler view to dst surface.
+ *
  * The sampler view's first_level field indicates the source
  * mipmap level to use.
- * XXX need some control over blitting Z and/or stencil.
+ *
+ * The sampler view's first_layer indicate the layer to use, but for
+ * cube maps it must point to the first face.  Face is passed in src_face.
+ *
+ * The main advantage over util_blit_pixels is that it allows to specify 
swizzles in
+ * pipe_sampler_view::swizzle_?.
+ *
+ * But there is no control over blitting Z and/or stencil.
  */
 void
 util_blit_pixels_tex(struct blit_state *ctx,
  struct pipe_sampler_view *src_sampler_view,
  int srcX0, int srcY0,
  int srcX1, int srcY1,
+ unsigned src_face,
  struct pipe_surface *dst,
  int dstX0, int dstY0,
  int dstX1, int dstY1,
@@ -922,6 +948,8 @@ util_blit_pixels_tex(struct blit_state *ctx,
 
/* draw quad */
offset = setup_vertex_data_tex(ctx,
+  src_sampler_view->texture->target,
+  src_face,
   (float) dstX0 / dst->width * 2.0f - 1.0f,
   (float) dstY0 / dst->height * 2.0f - 1.0f,
   (float) dstX1 / dst->width * 2.0f - 1.0f,
diff --git a/src/gallium/auxiliary/util/u_blit.h 
b/src/gallium/auxiliary/util/u_blit.h
index 56ab030..bfcd1bb 100644
--- a/src/gallium/auxiliary/util/u_blit

Re: [Mesa-dev] [PATCH 04/24] ralloc: Unify overloads of the new operator and guarantee object destruction.

2013-09-17 Thread Paul Berry
On 15 September 2013 00:10, Francisco Jerez  wrote:

> This patch introduces a pair of helper functions providing a common
> implementation of the "new" and "delete" operators for all C++ classes
> that are allocated by ralloc via placement new.  The 'ralloc_new'
> helper function takes care of setting up an ralloc destructor callback
> that will call the appropriate destructor before the memory allocated
> to an object is released.
>
> Until now objects needed to call 'ralloc_set_destructor' explicitly
> with a pointer to a static method which in turn called the actual
> destructor in order to get something that should be transparent to
> them.  After this patch they'll only need to call 'ralloc_new' from
> the new operator and 'ralloc_delete' from the delete operator, turning
> all overloads of new and delete into one-liners.
> ---
>  src/glsl/ast.h | 26 +++--
>  src/glsl/glsl_parser_extras.h  |  9 +
>  src/glsl/glsl_symbol_table.cpp |  7 +---
>  src/glsl/glsl_symbol_table.h   | 23 +--
>  src/glsl/glsl_types.h  | 11 +-
>  src/glsl/ir_function_detect_recursion.cpp  | 11 +-
>  src/glsl/list.h| 22 ++-
>  src/glsl/loop_analysis.h   | 14 +--
>  src/glsl/ralloc.h  | 44
> ++
>  src/mesa/drivers/dri/i965/brw_cfg.h| 14 +--
>  src/mesa/drivers/dri/i965/brw_fs.h | 21 ++-
>  src/mesa/drivers/dri/i965/brw_fs_live_variables.h  |  7 +---
>  src/mesa/drivers/dri/i965/brw_vec4.h   | 21 ++-
>  .../drivers/dri/i965/brw_vec4_live_variables.h |  7 +---
>  src/mesa/program/ir_to_mesa.cpp|  7 +---
>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp |  7 +---
>  16 files changed, 77 insertions(+), 174 deletions(-)
>
> diff --git a/src/glsl/ast.h b/src/glsl/ast.h
> index 1c7fc63..26c4701 100644
> --- a/src/glsl/ast.h
> +++ b/src/glsl/ast.h
> @@ -53,19 +53,12 @@ public:
>  * easier to just ralloc_free 'ctx' (or any of its ancestors). */
> static void* operator new(size_t size, void *ctx)
> {
> -  void *node;
> -
> -  node = rzalloc_size(ctx, size);
> -  assert(node != NULL);
> -
> -  return node;
> +  return ralloc_new(size, ctx);
> }
>
> -   /* If the user *does* call delete, that's OK, we will just
> -* ralloc_free in that case. */
> -   static void operator delete(void *table)
> +   static void operator delete(void *p)
> {
> -  ralloc_free(table);
> +  ralloc_delete(p);
> }
>
> /**
> @@ -367,19 +360,12 @@ struct ast_type_qualifier {
>  * easier to just ralloc_free 'ctx' (or any of its ancestors). */
> static void* operator new(size_t size, void *ctx)
> {
> -  void *node;
> -
> -  node = rzalloc_size(ctx, size);
> -  assert(node != NULL);
> -
> -  return node;
> +  return ralloc_new(size, ctx);
> }
>
> -   /* If the user *does* call delete, that's OK, we will just
> -* ralloc_free in that case. */
> -   static void operator delete(void *table)
> +   static void operator delete(void *p)
> {
> -  ralloc_free(table);
> +  ralloc_delete(p);
> }
>
> union {
> diff --git a/src/glsl/glsl_parser_extras.h b/src/glsl/glsl_parser_extras.h
> index 2e2440a..6c2a63e 100644
> --- a/src/glsl/glsl_parser_extras.h
> +++ b/src/glsl/glsl_parser_extras.h
> @@ -77,17 +77,12 @@ struct _mesa_glsl_parse_state {
>  * easier to just ralloc_free 'ctx' (or any of its ancestors). */
> static void* operator new(size_t size, void *ctx)
> {
> -  void *mem = rzalloc_size(ctx, size);
> -  assert(mem != NULL);
> -
> -  return mem;
> +  return ralloc_new<_mesa_glsl_parse_state>(size, ctx);
> }
>
> -   /* If the user *does* call delete, that's OK, we will just
> -* ralloc_free in that case. */
> static void operator delete(void *mem)
> {
> -  ralloc_free(mem);
> +  ralloc_delete(mem);
> }
>
> /**
> diff --git a/src/glsl/glsl_symbol_table.cpp
> b/src/glsl/glsl_symbol_table.cpp
> index 4c96620..11fe06e 100644
> --- a/src/glsl/glsl_symbol_table.cpp
> +++ b/src/glsl/glsl_symbol_table.cpp
> @@ -30,15 +30,12 @@ public:
>  * easier to just ralloc_free 'ctx' (or any of its ancestors). */
> static void* operator new(size_t size, void *ctx)
> {
> -  void *entry = ralloc_size(ctx, size);
> -  assert(entry != NULL);
> -  return entry;
> +  return ralloc_new(size, ctx);
> }
>
> -   /* If the user *does* call delete, that's OK, we will just
> ralloc_free. */
> static void operator delete(void *entry)
> {
> -  ralloc_free(entry);
> +  ralloc_delete(entry);
> }
>
> bool add_interface(const glsl_type *i, enum ir_variable_mode mode)
> diff --git a/src/glsl/glsl_symbol_table.h b/src/gl

Re: [Mesa-dev] [PATCH 3/4] i965: Remove MIPLAYOUT_BELOW from Gen4-6 constant buffer surface state.

2013-09-17 Thread Kenneth Graunke
On 09/16/2013 05:16 PM, Paul Berry wrote:
> On 13 September 2013 23:10, Kenneth Graunke  > wrote:
> 
> Specifying a miptree layout makes no sense for constant buffers.
> 
> 
> You might want to mention in the commit message that there's no
> functional change since BRW_SURFACE_MIPMAPLAYOUT_BELOW == 0.
> 
> In any case, the patch is:
> 
> Reviewed-by: Paul Berry  >

Hah.  I didn't even realize that.  Good catch :)

--Ken

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/4] i965: Refactor Gen7+ SURFACE_STATE setup for buffer surfaces.

2013-09-17 Thread Kenneth Graunke
On 09/16/2013 02:55 PM, Paul Berry wrote:
> On 13 September 2013 23:10, Kenneth Graunke  > wrote:
> 
> This was an embarassingly large amount of copy and pasted code,
> and it wasn't particularly simple code either.  By factoring it out
> into a helper function, we consolidate the complexity.
> 
> Signed-off-by: Kenneth Graunke  >
> 
> 
> I really like what you're doing here.  A few minor comments:

Thanks!

> ---
>  src/mesa/drivers/dri/i965/gen7_wm_surface_state.c | 144
> +-
>  1 file changed, 58 insertions(+), 86 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> index 37e3174..8413308 100644
> --- a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> @@ -224,6 +224,37 @@ gen7_check_surface_setup(uint32_t *surf, bool
> is_render_target)
> }
>  }
> 
> +static void
> +gen7_emit_buffer_surface_state(struct brw_context *brw,
> +   uint32_t *out_offset,
> +   drm_intel_bo *bo,
> +   unsigned buffer_offset,
> +   unsigned surface_format,
> +   unsigned buffer_size,
> +   unsigned pitch,
> +   unsigned mocs)
> +{
> +   uint32_t *surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE,
> +8 * 4, 32, out_offset);
> +   memset(surf, 0, 8 * 4);
> +
> +   surf[0] = BRW_SURFACE_BUFFER << BRW_SURFACE_TYPE_SHIFT |
> + surface_format << BRW_SURFACE_FORMAT_SHIFT |
> + BRW_SURFACE_RC_READ_WRITE;
> +   surf[1] = bo->offset + buffer_offset; /* reloc */
> +   surf[2] = SET_FIELD(buffer_size & 0x7f, GEN7_SURFACE_WIDTH) |
> + SET_FIELD((buffer_size >> 7) & 0x3fff,
> GEN7_SURFACE_HEIGHT);
> +   surf[3] = SET_FIELD((buffer_size >> 21) & 0x3f, BRW_SURFACE_DEPTH) |
> 
> 
> These three instances of buffer_size should be (buffer_size - 1).  I
> think that there was a pre-existing bug in
> gen7_update_buffer_texture_surface() where it wasn't subtracting 1 where
> it should.  Probably we should fix the bug as a pre-patch.

Yeah, I think there's a pre-existing bug here.  I'll fix that in a patch
at the start of the series.

> 
> + (pitch - 1);
> +   surf[4] = 0;
> 
> 
> Technically this line is unnecessary given the memset() above.  I'm
> quibbling, though--it's hard to imagine this making a significant
> performance difference :)

True enough :) Removed.

[snip]
> @@ -371,38 +386,15 @@ gen7_create_constant_surface(struct
> brw_context *brw,
>  {
> uint32_t stride = dword_pitch ? 4 : 16;
> uint32_t elements = ALIGN(size, stride) / stride;
> -   const GLint w = elements - 1;
> 
> -   uint32_t *surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE,
> -8 * 4, 32, out_offset);
> -   memset(surf, 0, 8 * 4);
> -
> -   surf[0] = BRW_SURFACE_BUFFER << BRW_SURFACE_TYPE_SHIFT |
> - BRW_SURFACEFORMAT_R32G32B32A32_FLOAT <<
> BRW_SURFACE_FORMAT_SHIFT |
> - BRW_SURFACE_RC_READ_WRITE;
> -
> -   assert(bo);
> -   surf[1] = bo->offset + offset; /* reloc */
> -
> -   /* note that these differ from GEN6 */
> -   surf[2] = SET_FIELD(w & 0x7f, GEN7_SURFACE_WIDTH) |
> - SET_FIELD((w >> 7) & 0x3fff, GEN7_SURFACE_HEIGHT);
> -   surf[3] = SET_FIELD((w >> 21) & 0x3f, BRW_SURFACE_DEPTH) |
> - (stride - 1);
> -
> -   if (brw->is_haswell) {
> -  surf[7] = SET_FIELD(HSW_SCS_RED,   GEN7_SURFACE_SCS_R) |
> -SET_FIELD(HSW_SCS_GREEN, GEN7_SURFACE_SCS_G) |
> -SET_FIELD(HSW_SCS_BLUE,  GEN7_SURFACE_SCS_B) |
> -SET_FIELD(HSW_SCS_ALPHA, GEN7_SURFACE_SCS_A);
> -   }
> 
> 
> I don't see this Haswell-specific code in
> gen7_emit_buffer_surface_state().  Is that a problem?

I don't think those matter for buffer surfaces - I was just overzealous
when I added them originally.  We already dropped it for some buffer
surfaces.

Still, it's definitely worth testing.

> 
> -
> -   drm_intel_bo_emit_reloc(brw->batch.bo ,
> -  *out_offset + 4,
> -  bo, offset,
> -  I915_GEM_DOMAIN_SAMPLER, 0);
> -
> -   gen7_check_surface_setup(surf, false /* is_render_target */);
> +   gen7_emit_buffer_surface_state(brw,
> +  out_offset,
> +  bo,
> +  offset,
>

Re: [Mesa-dev] [PATCH 08/24] glsl: Add new atomic_uint built-in GLSL type.

2013-09-17 Thread Paul Berry
On 15 September 2013 00:10, Francisco Jerez  wrote:

> ---
>  src/glsl/ast_to_hir.cpp  |  1 +
>  src/glsl/builtin_type_macros.h   |  2 ++
>  src/glsl/builtin_types.cpp   |  6 ++
>  src/glsl/glsl_types.cpp  |  2 ++
>  src/glsl/glsl_types.h| 14 ++
>  src/glsl/ir_clone.cpp|  1 +
>  src/glsl/link_uniform_initializers.cpp   |  1 +
>  src/glsl/tests/uniform_initializer_utils.cpp |  3 +++
>  src/mesa/program/ir_to_mesa.cpp  |  2 ++
>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp   |  1 +
>  10 files changed, 33 insertions(+)
>
> diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
> index 2316cf8..fcca5df 100644
> --- a/src/glsl/ast_to_hir.cpp
> +++ b/src/glsl/ast_to_hir.cpp
> @@ -902,6 +902,7 @@ do_comparison(void *mem_ctx, int operation, ir_rvalue
> *op0, ir_rvalue *op1)
> case GLSL_TYPE_VOID:
> case GLSL_TYPE_SAMPLER:
> case GLSL_TYPE_INTERFACE:
> +   case GLSL_TYPE_ATOMIC_UINT:
>/* I assume a comparison of a struct containing a sampler just
> * ignores the sampler present in the type.
> */
> diff --git a/src/glsl/builtin_type_macros.h
> b/src/glsl/builtin_type_macros.h
> index fec38da..263fd83 100644
> --- a/src/glsl/builtin_type_macros.h
> +++ b/src/glsl/builtin_type_macros.h
> @@ -110,6 +110,8 @@ DECL_TYPE(sampler2DRectShadow,
>  GL_SAMPLER_2D_RECT_SHADOW,GLSL_SAMPLER
>
>  DECL_TYPE(samplerExternalOES, GL_SAMPLER_EXTERNAL_OES,
>  GLSL_SAMPLER_DIM_EXTERNAL, 0, 0, GLSL_TYPE_FLOAT)
>
> +DECL_TYPE(atomic_uint, GL_UNSIGNED_INT_ATOMIC_COUNTER,
> GLSL_TYPE_ATOMIC_UINT, 1, 1)
> +
>  STRUCT_TYPE(gl_DepthRangeParameters)
>  STRUCT_TYPE(gl_PointParameters)
>  STRUCT_TYPE(gl_MaterialParameters)
> diff --git a/src/glsl/builtin_types.cpp b/src/glsl/builtin_types.cpp
> index 722eda2..8311a91 100644
> --- a/src/glsl/builtin_types.cpp
> +++ b/src/glsl/builtin_types.cpp
> @@ -203,6 +203,8 @@ const static struct builtin_type_versions {
> T(sampler2DRectShadow, 140, 999)
>
> T(struct_gl_DepthRangeParameters,  110, 100)
> +
> +   T(atomic_uint, 130, 999)
>

This should be the GLSL version in which the type became available without
the use of an extension, so it should be 420, not 130.


>  };
>
>  const glsl_type *const deprecated_types[] = {
> @@ -284,5 +286,9 @@ _mesa_glsl_initialize_types(struct
> _mesa_glsl_parse_state *state)
> if (state->OES_texture_3D_enable) {
>add_type(symbols, glsl_type::sampler3D_type);
> }
> +
> +   if (state->ARB_shader_atomic_counters_enable) {
> +  add_type(symbols, glsl_type::atomic_uint_type);
> +   }
>  }
>  /** @} */
> diff --git a/src/glsl/glsl_types.cpp b/src/glsl/glsl_types.cpp
> index 3c396dd..e1fe153 100644
> --- a/src/glsl/glsl_types.cpp
> +++ b/src/glsl/glsl_types.cpp
> @@ -586,6 +586,7 @@ glsl_type::component_slots() const
>return this->length * this->fields.array->component_slots();
>
> case GLSL_TYPE_SAMPLER:
> +   case GLSL_TYPE_ATOMIC_UINT:
> case GLSL_TYPE_VOID:
> case GLSL_TYPE_ERROR:
>break;
> @@ -874,6 +875,7 @@ glsl_type::count_attribute_slots() const
>return this->length * this->fields.array->count_attribute_slots();
>
> case GLSL_TYPE_SAMPLER:
> +   case GLSL_TYPE_ATOMIC_UINT:
> case GLSL_TYPE_VOID:
> case GLSL_TYPE_ERROR:
>break;
> diff --git a/src/glsl/glsl_types.h b/src/glsl/glsl_types.h
> index acdf48f..d0274e6 100644
> --- a/src/glsl/glsl_types.h
> +++ b/src/glsl/glsl_types.h
> @@ -53,6 +53,7 @@ enum glsl_base_type {
> GLSL_TYPE_FLOAT,
> GLSL_TYPE_BOOL,
> GLSL_TYPE_SAMPLER,
> +   GLSL_TYPE_ATOMIC_UINT,
> GLSL_TYPE_STRUCT,
> GLSL_TYPE_INTERFACE,
> GLSL_TYPE_ARRAY,
> @@ -434,6 +435,19 @@ struct glsl_type {
> }
>
> /**
> +* Return the amount of atomic counter storage required for a type.
> +*/
> +   unsigned atomic_size() const
> +   {
> +  if (base_type == GLSL_TYPE_ATOMIC_UINT)
> + return ATOMIC_COUNTER_SIZE;
> +  else if (is_array())
> + return length * element_type()->atomic_size();
> +  else
> + return 0;
> +   }
>

Can atomic counters appear inside structs?  If so, we probably need an
is_record() case here.  If not, it would be nice to have a comment
explaining why it's unnecessary.


> +
> +   /**
>  * Query the full type of a matrix row
>  *
>  * \return
> diff --git a/src/glsl/ir_clone.cpp b/src/glsl/ir_clone.cpp
> index fb303b0..b70b7db 100644
> --- a/src/glsl/ir_clone.cpp
> +++ b/src/glsl/ir_clone.cpp
> @@ -385,6 +385,7 @@ ir_constant::clone(void *mem_ctx, struct hash_table
> *ht) const
> }
>
> case GLSL_TYPE_SAMPLER:
> +   case GLSL_TYPE_ATOMIC_UINT:
> case GLSL_TYPE_VOID:
> case GLSL_TYPE_ERROR:
> case GLSL_TYPE_INTERFACE:
> diff --git a/src/glsl/link_uniform_initializers.cpp
> b/src/glsl/link_uniform_initializers.cpp
> index 3f66710..786a

[Mesa-dev] [PATCH 1/3] vega: Use pipe_context::blit instead of util_blit_pixels_tex.

2013-09-17 Thread jfonseca
From: José Fonseca 

Only compile-tested but it seems straightforward.
---
 src/gallium/state_trackers/vega/vg_context.c | 44 
 src/gallium/state_trackers/vega/vg_context.h |  2 --
 2 files changed, 26 insertions(+), 20 deletions(-)

diff --git a/src/gallium/state_trackers/vega/vg_context.c 
b/src/gallium/state_trackers/vega/vg_context.c
index c636188..46c7d96 100644
--- a/src/gallium/state_trackers/vega/vg_context.c
+++ b/src/gallium/state_trackers/vega/vg_context.c
@@ -41,7 +41,6 @@
 #include "cso_cache/cso_context.h"
 
 #include "util/u_memory.h"
-#include "util/u_blit.h"
 #include "util/u_sampler.h"
 #include "util/u_surface.h"
 #include "util/u_format.h"
@@ -138,8 +137,6 @@ struct vg_context * vg_create_context(struct pipe_context 
*pipe,
ctx->sc = shaders_cache_create(ctx);
ctx->shader = shader_create(ctx);
 
-   ctx->blit = util_create_blit(ctx->pipe, ctx->cso_context);
-
return ctx;
 }
 
@@ -147,7 +144,6 @@ void vg_destroy_context(struct vg_context *ctx)
 {
struct pipe_resource **cbuf = &ctx->mask.cbuf;
 
-   util_destroy_blit(ctx->blit);
renderer_destroy(ctx->renderer);
shaders_cache_destroy(ctx->sc);
shader_destroy(ctx->shader);
@@ -443,23 +439,35 @@ static void vg_prepare_blend_texture(struct vg_context 
*ctx,
  struct pipe_sampler_view *src)
 {
struct st_framebuffer *stfb = ctx->draw_buffer;
-   struct pipe_surface *surf;
-   struct pipe_surface surf_tmpl;
+   struct pipe_context *pipe = ctx->pipe;
+   struct pipe_blit_info info;
 
vg_context_update_blend_texture_view(ctx, stfb->width, stfb->height);
 
-   u_surface_default_template(&surf_tmpl, stfb->blend_texture_view->texture);
-   surf = ctx->pipe->create_surface(ctx->pipe,
-stfb->blend_texture_view->texture,
-&surf_tmpl);
-   if (surf) {
-  util_blit_pixels_tex(ctx->blit,
-   src, 0, 0, stfb->width, stfb->height,
-   surf, 0, 0, stfb->width, stfb->height,
-   0.0, PIPE_TEX_MIPFILTER_NEAREST);
-
-  pipe_surface_reference(&surf, NULL);
-   }
+   memset(&info, 0, sizeof info);
+   info.dst.resource = stfb->blend_texture_view->texture;
+   info.dst.level = 0;
+   info.dst.box.x = 0;
+   info.dst.box.y = 0;
+   info.dst.box.z = 0;
+   info.dst.box.width = stfb->width;
+   info.dst.box.height = stfb->height;
+   info.dst.box.depth = 1;
+   info.dst.format = stfb->blend_texture_view->format;
+   info.src.resource = src->texture;
+   info.src.level = src->u.tex.first_level;
+   info.src.box.x = 0;
+   info.src.box.y = 0;
+   info.src.box.z = src->u.tex.first_layer;
+   info.src.box.width = stfb->width;
+   info.src.box.height = stfb->height;
+   info.src.box.depth = 1;
+   info.src.format = src->format;
+   info.mask = PIPE_MASK_RGBA;
+   info.filter = PIPE_TEX_MIPFILTER_NEAREST;
+   info.scissor_enable = 0;
+
+   pipe->blit(pipe, &info);
 }
 
 struct pipe_sampler_view *vg_prepare_blend_surface(struct vg_context *ctx)
diff --git a/src/gallium/state_trackers/vega/vg_context.h 
b/src/gallium/state_trackers/vega/vg_context.h
index d61c7cf..5a2a322 100644
--- a/src/gallium/state_trackers/vega/vg_context.h
+++ b/src/gallium/state_trackers/vega/vg_context.h
@@ -129,8 +129,6 @@ struct vg_context
struct pipe_sampler_state blend_sampler;
struct vg_paint *default_paint;
 
-   struct blit_state *blit;
-
int32_t draw_stamp;
 };
 
-- 
1.8.1.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 00/24] Support for ARB_shader_atomic_counters.

2013-09-17 Thread Francisco Jerez
Francisco Jerez  writes:

> This patch series implements support for the
> ARB_shader_atomic_counters extension, which is part of GL core since
> GL 4.2.  It includes patches adding support for the new APIs and GLSL
> language features, and working back-end code for Intel Gen7 hardware
> -- Ivy Bridge should work with these patches alone, Haswell is going
> to need a small kernel change I'll probably submit for review during
> the next week.
>
> The series depends on Ken's surface state tidying patches [1] and
> on patches 1-4, which are seemingly unrelated fixes.
>
> There's also a series of ~30 unit tests for this extension I will send
> to the piglit mailing list soon.
>
> Thanks.
>
> [1] http://lists.freedesktop.org/archives/mesa-dev/2013-September/044691.html
>

I've pushed this and Ken's patch series to a branch here [2].  I'll try
to keep it in sync with master until we decide they can go in.

Thanks.

[2] http://cgit.freedesktop.org/~currojerez/mesa/log/?h=atomic-counters 

> [PATCH 01/24] mesa: Fix misplaced includes of "main/uniforms.h".
> [PATCH 02/24] glsl: Initialize all member variables of _mesa_glsl_parse_state 
> on construction.
> [PATCH 03/24] i965: Initialize all member variables of vec4_instruction on 
> construction.
> [PATCH 04/24] ralloc: Unify overloads of the new operator and guarantee 
> object destruction.
> [PATCH 05/24] glapi: Add support for ARB_shader_atomic_counters.
> [PATCH 06/24] mesa: Add support for ARB_shader_atomic_counters.
> [PATCH 07/24] glsl: Add extension enables for ARB_shader_atomic_counters.
> [PATCH 08/24] glsl: Add new atomic_uint built-in GLSL type.
> [PATCH 09/24] glsl: Add IR node for atomic operations.
> [PATCH 10/24] glsl: Implement parser support for atomic counters.
> [PATCH 11/24] glsl: Add built-in functions and constants required for 
> ARB_shader_atomic_counters.
> [PATCH 12/24] glsl: Add predicate to determine if an IR node has side effects.
> [PATCH 13/24] glsl: Linker support for ARB_shader_atomic_counters.
> [PATCH 14/24] i965: Define vtbl method that initializes an untyped R/W 
> surface.
> [PATCH 15/24] i965: Implement ABO surface state emission.
> [PATCH 16/24] i965/gen7: Implement code generation for untyped atomic 
> instructions.
> [PATCH 17/24] i965/gen7: Implement code generation for untyped surface read 
> instructions.
> [PATCH 18/24] i965: Add a 'has_side_effects' back-end instruction predicate.
> [PATCH 19/24] i965: Handle the 'atomic_uint' GLSL type.
> [PATCH 20/24] i965: Add brw_reg constructors taking a dynamically determined 
> vector width.
> [PATCH 21/24] i965/gen7: Handle atomic instructions from the FS back-end.
> [PATCH 22/24] i965/gen7: Handle atomic instructions from the VEC4 back-end.
> [PATCH 23/24] i965/gen7: Expose ARB_shader_atomic_counters.
> [PATCH 24/24] i965: Simplify the shader time code by using atomic counter 
> helpers.


pgpsyhdb8k90H.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] winsys/sw/xlib: fix compile error in xlib_sw_winsys.c.

2013-09-17 Thread Gaetan Nadon
xlib_sw_winsys.h:5:22: fatal error: X11/Xlib.h: No such file or directory

The compiler cannot find the Xlib.h in the installed system headers.
All supplied include directives point to inside the mesa module.
The X11_CFLAGS variable is undefined (not defined in config.status).

It appears the intent was to use X11_INCLUDES defined in configure.ac.

The Xlib.h file is not installed on my workstation. It is supplied in
the libx11-dev package. This allows an X developer control over which
version of this file is used for X development.

Signed-off-by: Gaetan Nadon 
---
 src/gallium/winsys/sw/xlib/Makefile.am |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/winsys/sw/xlib/Makefile.am 
b/src/gallium/winsys/sw/xlib/Makefile.am
index 59da37d..ea6b742 100644
--- a/src/gallium/winsys/sw/xlib/Makefile.am
+++ b/src/gallium/winsys/sw/xlib/Makefile.am
@@ -24,7 +24,7 @@ include $(top_srcdir)/src/gallium/Automake.inc
 
 AM_CPPFLAGS = \
$(GALLIUM_CFLAGS) \
-   $(X11_CFLAGS)
+   $(X11_INCLUDES)
 
 noinst_LTLIBRARIES = libws_xlib.la
 
-- 
1.7.9.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965: Fix writemask != 0 assertions on Sandybridge.

2013-09-17 Thread Kenneth Graunke
This fixes myriads of regressions since commit 169f9c030c16d1247a3a7629
("i965: Add an assertion that writemask != NULL for non-ARFs.").

On Sandybridge, our control flow handling (such as brw_IF) does:

   brw_set_dest(p, insn, brw_imm_w(0));
   insn->bits1.branch_gen6.jump_count = 0;

This results in a IMM destination with zero for the writemask.  IMM
destinations are rather bizarre (I'm guessing the file is irrelevant),
but the code has been working for ages, so I'm loathe to change it.

Fixes glxgears on Sandybridge.

Signed-off-by: Kenneth Graunke 
Cc: Eric Anholt 
---
 src/mesa/drivers/dri/i965/brw_eu_emit.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index cce8752..7ed3df0 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -126,8 +126,10 @@ brw_set_dest(struct brw_compile *p, struct brw_instruction 
*insn,
   else {
 insn->bits1.da16.dest_subreg_nr = dest.subnr / 16;
 insn->bits1.da16.dest_writemask = dest.dw1.bits.writemask;
- assert(dest.dw1.bits.writemask != 0 ||
-dest.file == BRW_ARCHITECTURE_REGISTER_FILE);
+ if (dest.file == BRW_GENERAL_REGISTER_FILE ||
+ dest.file == BRW_MESSAGE_REGISTER_FILE) {
+assert(dest.dw1.bits.writemask != 0);
+ }
 /* From the Ivybridge PRM, Vol 4, Part 3, Section 5.2.4.1:
  *Although Dst.HorzStride is a don't care for Align16, HW needs
  *this to be programmed as "01".
-- 
1.8.3.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] glx: fix compile error in egl_glx.c.

2013-09-17 Thread Gaetan Nadon
egl_glx.c:40:22: fatal error: X11/Xlib.h: No such file or directory

The compiler cannot find the Xlib.h in the installed system headers.
All supplied include directives point to inside the mesa module.
The X11_CFLAGS variable is undefined (not defined in config.status).

It appears the intent was to use X11_INCLUDES defined in configure.ac.

The Xlib.h file is not installed on my workstation. It is supplied in
the libx11-dev package. This allows an X developer control over which
version of this file is used for X development.

Signed-off-by: Gaetan Nadon 
---
 src/egl/drivers/glx/Makefile.am |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/egl/drivers/glx/Makefile.am b/src/egl/drivers/glx/Makefile.am
index 6db95b4..5dd5228 100644
--- a/src/egl/drivers/glx/Makefile.am
+++ b/src/egl/drivers/glx/Makefile.am
@@ -23,7 +23,7 @@ AM_CFLAGS = \
-I$(top_srcdir)/include \
-I$(top_srcdir)/src/egl/main \
$(VISIBILITY_CFLAGS) \
-   $(X11_CFLAGS) \
+   $(X11_INCLUDES) \
$(DEFINES)
 
 noinst_LTLIBRARIES = libegl_glx.la
-- 
1.7.9.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/3] i965: Refactor Gen7+ SURFACE_STATE setup for buffer surfaces.

2013-09-17 Thread Kenneth Graunke
This was an embarassingly large amount of copy and pasted code,
and it wasn't particularly simple code either.  By factoring it out
into a helper function, we consolidate the complexity.

v2: Properly NULL-check bo.  Caught by Eric Anholt.
v3: Do the subtraction by 1 in gen7_emit_buffer_surface_state, rather
than making callers do it.  This makes the buffer_size parameter
the actual size of the buffer.  Suggested by Paul Berry.

Signed-off-by: Kenneth Graunke 
Cc: Paul Berry 
Cc: Eric Anholt 
---
 src/mesa/drivers/dri/i965/gen7_wm_surface_state.c | 146 +-
 1 file changed, 60 insertions(+), 86 deletions(-)

Tested on Ivybridge.

diff --git a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c 
b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
index c38843f..6938b1a 100644
--- a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
@@ -224,6 +224,39 @@ gen7_check_surface_setup(uint32_t *surf, bool 
is_render_target)
}
 }
 
+static void
+gen7_emit_buffer_surface_state(struct brw_context *brw,
+   uint32_t *out_offset,
+   drm_intel_bo *bo,
+   unsigned buffer_offset,
+   unsigned surface_format,
+   unsigned buffer_size,
+   unsigned pitch,
+   unsigned mocs)
+{
+   uint32_t *surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE,
+8 * 4, 32, out_offset);
+   memset(surf, 0, 8 * 4);
+
+   surf[0] = BRW_SURFACE_BUFFER << BRW_SURFACE_TYPE_SHIFT |
+ surface_format << BRW_SURFACE_FORMAT_SHIFT |
+ BRW_SURFACE_RC_READ_WRITE;
+   surf[1] = (bo ? bo->offset : 0) + buffer_offset; /* reloc */
+   surf[2] = SET_FIELD((buffer_size - 1) & 0x7f, GEN7_SURFACE_WIDTH) |
+ SET_FIELD(((buffer_size - 1) >> 7) & 0x3fff, GEN7_SURFACE_HEIGHT);
+   surf[3] = SET_FIELD(((buffer_size - 1) >> 21) & 0x3f, BRW_SURFACE_DEPTH) |
+ (pitch - 1);
+
+   surf[5] = SET_FIELD(mocs, GEN7_SURFACE_MOCS);
+
+   /* Emit relocation to surface contents */
+   if (bo) {
+  drm_intel_bo_emit_reloc(brw->batch.bo, *out_offset + 4,
+  bo, buffer_offset, I915_GEM_DOMAIN_SAMPLER, 0);
+   }
+
+   gen7_check_surface_setup(surf, false /* is_render_target */);
+}
 
 static void
 gen7_update_buffer_texture_surface(struct gl_context *ctx,
@@ -237,39 +270,23 @@ gen7_update_buffer_texture_surface(struct gl_context *ctx,
drm_intel_bo *bo = intel_obj ? intel_obj->buffer : NULL;
gl_format format = tObj->_BufferObjectFormat;
 
-   uint32_t *surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE,
-8 * 4, 32, surf_offset);
-   memset(surf, 0, 8 * 4);
-
uint32_t surface_format = brw_format_for_mesa_format(format);
if (surface_format == 0 && format != MESA_FORMAT_RGBA_FLOAT32) {
   _mesa_problem(NULL, "bad format %s for texture buffer\n",
 _mesa_get_format_name(format));
}
 
-   surf[0] = BRW_SURFACE_BUFFER << BRW_SURFACE_TYPE_SHIFT |
- surface_format << BRW_SURFACE_FORMAT_SHIFT |
- BRW_SURFACE_RC_READ_WRITE;
-
-   if (bo) {
-  surf[1] = bo->offset; /* reloc */
-
-  drm_intel_bo_emit_reloc(brw->batch.bo,
- *surf_offset + 4,
- bo, 0,
- I915_GEM_DOMAIN_SAMPLER, 0);
-
-  int texel_size = _mesa_get_format_bytes(format);
-  int w = (intel_obj->Base.Size / texel_size) - 1;
-
-  /* note that these differ from GEN6 */
-  surf[2] = SET_FIELD(w & 0x7f, GEN7_SURFACE_WIDTH) | /* bits 6:0 of size 
*/
-SET_FIELD((w >> 7) & 0x3fff, GEN7_SURFACE_HEIGHT); /* 20:7 */
-  surf[3] = SET_FIELD((w >> 21) & 0x3f, BRW_SURFACE_DEPTH) | /* bits 26:21 
*/
-(texel_size - 1);
-   }
-
-   gen7_check_surface_setup(surf, false /* is_render_target */);
+   int texel_size = _mesa_get_format_bytes(format);
+   int w = intel_obj ? intel_obj->Base.Size / texel_size : 0;
+
+   gen7_emit_buffer_surface_state(brw,
+  surf_offset,
+  bo,
+  0,
+  surface_format,
+  w,
+  texel_size,
+  0 /* mocs */);
 }
 
 static void
@@ -371,38 +388,15 @@ gen7_create_constant_surface(struct brw_context *brw,
 {
uint32_t stride = dword_pitch ? 4 : 16;
uint32_t elements = ALIGN(size, stride) / stride;
-   const GLint w = elements - 1;
 
-   uint32_t *surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE,
-8 * 4, 32, out_offset);
-   memset(surf, 0, 8 * 4);
-
-   surf[0] = BRW_SURFACE_BUFFER << BRW_SURFACE_TYPE_SHIFT |
- BRW_SURFACEFORMA

[Mesa-dev] [PATCH 3/3] i965: Refactor Gen4-6 SURFACE_STATE setup for buffer surfaces.

2013-09-17 Thread Kenneth Graunke
This was an embarassingly large amount of copy and pasted code,
and it wasn't particularly simple code either.  By factoring it out
into a helper function, we consolidate the complexity.

v2: Properly NULL-check bo.  Caught by Eric Anholt.
v3: Do the subtraction by 1 in gen7_emit_buffer_surface_state, rather
than making callers do it.  This makes the buffer_size parameter
the actual size of the buffer.  Suggested by Paul Berry.

Signed-off-by: Kenneth Graunke 
Cc: Paul Berry 
Cc: Eric Anholt 
---
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 100 +--
 1 file changed, 39 insertions(+), 61 deletions(-)

Actually tested on SNB this time.  No Piglit regressions.

diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c 
b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
index 0078161..56ec8b1 100644
--- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
@@ -190,6 +190,38 @@ brw_get_texture_swizzle(const struct gl_context *ctx,
 swizzles[GET_SWZ(t->_Swizzle, 3)]);
 }
 
+static void
+gen4_emit_buffer_surface_state(struct brw_context *brw,
+   uint32_t *out_offset,
+   drm_intel_bo *bo,
+   unsigned buffer_offset,
+   unsigned surface_format,
+   unsigned buffer_size,
+   unsigned pitch)
+{
+   uint32_t *surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE,
+6 * 4, 32, out_offset);
+   memset(surf, 0, 6 * 4);
+
+   surf[0] = BRW_SURFACE_BUFFER << BRW_SURFACE_TYPE_SHIFT |
+ surface_format << BRW_SURFACE_FORMAT_SHIFT |
+ (brw->gen >= 6 ? BRW_SURFACE_RC_READ_WRITE : 0);
+   surf[1] = (bo ? bo->offset : 0) + buffer_offset; /* reloc */
+   surf[2] = (buffer_size & 0x7f) << BRW_SURFACE_WIDTH_SHIFT |
+ ((buffer_size >> 7) & 0x1fff) << BRW_SURFACE_HEIGHT_SHIFT;
+   surf[3] = ((buffer_size >> 20) & 0x7f) << BRW_SURFACE_DEPTH_SHIFT |
+ (pitch - 1) << BRW_SURFACE_PITCH_SHIFT;
+
+   /* Emit relocation to surface contents.  The 965 PRM, Volume 4, section
+* 5.1.2 "Data Cache" says: "the data cache does not exist as a separate
+* physical cache.  It is mapped in hardware to the sampler cache."
+*/
+   if (bo) {
+  drm_intel_bo_emit_reloc(brw->batch.bo, *out_offset + 4,
+  bo, buffer_offset,
+  I915_GEM_DOMAIN_SAMPLER, 0);
+   }
+}
 
 static void
 brw_update_buffer_texture_surface(struct gl_context *ctx,
@@ -198,49 +230,22 @@ brw_update_buffer_texture_surface(struct gl_context *ctx,
 {
struct brw_context *brw = brw_context(ctx);
struct gl_texture_object *tObj = ctx->Texture.Unit[unit]._Current;
-   uint32_t *surf;
struct intel_buffer_object *intel_obj =
   intel_buffer_object(tObj->BufferObject);
drm_intel_bo *bo = intel_obj ? intel_obj->buffer : NULL;
gl_format format = tObj->_BufferObjectFormat;
uint32_t brw_format = brw_format_for_mesa_format(format);
int texel_size = _mesa_get_format_bytes(format);
+   int w = intel_obj ? intel_obj->Base.Size / texel_size : 0;
 
if (brw_format == 0 && format != MESA_FORMAT_RGBA_FLOAT32) {
   _mesa_problem(NULL, "bad format %s for texture buffer\n",
_mesa_get_format_name(format));
}
 
-   surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE,
- 6 * 4, 32, surf_offset);
-
-   surf[0] = (BRW_SURFACE_BUFFER << BRW_SURFACE_TYPE_SHIFT |
- (brw_format_for_mesa_format(format) << BRW_SURFACE_FORMAT_SHIFT));
-
-   if (brw->gen >= 6)
-  surf[0] |= BRW_SURFACE_RC_READ_WRITE;
-
-   if (bo) {
-  surf[1] = bo->offset; /* reloc */
-
-  /* Emit relocation to surface contents. */
-  drm_intel_bo_emit_reloc(brw->batch.bo,
- *surf_offset + 4,
- bo, 0, I915_GEM_DOMAIN_SAMPLER, 0);
-
-  int w = (intel_obj->Base.Size / texel_size) - 1;
-  surf[2] = ((w & 0x7f) << BRW_SURFACE_WIDTH_SHIFT |
-((w >> 7) & 0x1fff) << BRW_SURFACE_HEIGHT_SHIFT);
-  surf[3] = (((w >> 20) & 0x7f) << BRW_SURFACE_DEPTH_SHIFT |
-(texel_size - 1) << BRW_SURFACE_PITCH_SHIFT);
-   } else {
-  surf[1] = 0;
-  surf[2] = 0;
-  surf[3] = 0;
-   }
-
-   surf[4] = 0;
-   surf[5] = 0;
+   gen4_emit_buffer_surface_state(brw, surf_offset, bo, 0,
+  brw_format,
+  w, texel_size);
 }
 
 static void
@@ -311,37 +316,10 @@ brw_create_constant_surface(struct brw_context *brw,
 {
uint32_t stride = dword_pitch ? 4 : 16;
uint32_t elements = ALIGN(size, stride) / stride;
-   const GLint w = elements - 1;
-   uint32_t *surf;
-
-   surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE,
- 6 * 4, 32, out_offset);
-
-   su

Re: [Mesa-dev] [PATCH 08/24] glsl: Add new atomic_uint built-in GLSL type.

2013-09-17 Thread Paul Berry
On 17 September 2013 12:18, Paul Berry  wrote:

> On 15 September 2013 00:10, Francisco Jerez  wrote:
>
>>
>> /**
>> +* Return the amount of atomic counter storage required for a type.
>> +*/
>> +   unsigned atomic_size() const
>> +   {
>> +  if (base_type == GLSL_TYPE_ATOMIC_UINT)
>> + return ATOMIC_COUNTER_SIZE;
>> +  else if (is_array())
>> + return length * element_type()->atomic_size();
>> +  else
>> + return 0;
>> +   }
>>
>
> Can atomic counters appear inside structs?  If so, we probably need an
> is_record() case here.  If not, it would be nice to have a comment
> explaining why it's unnecessary.
>

I also notice that most of your uses of atomic_size() in this patch series
are merely to see whether the type contains any atomics.  You might
consider adding a contains_atomic() function (like we've already done with
contains_sampler() and contains_integer()) just for code clarity.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/3] i965: Fix off by one errors in texture buffer size calculations.

2013-09-17 Thread Kenneth Graunke
The value that's split into width/height/depth needs to be the size of
the buffer minus one.  This makes it consistent with the constant buffer
and shader time SURFACE_STATE setup code.

Signed-off-by: Kenneth Graunke 
Cc: Paul Berry 
Cc: Eric Anholt 
---
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c  | 2 +-
 src/mesa/drivers/dri/i965/gen7_wm_surface_state.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c 
b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
index 8d87786..0078161 100644
--- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
@@ -228,7 +228,7 @@ brw_update_buffer_texture_surface(struct gl_context *ctx,
  *surf_offset + 4,
  bo, 0, I915_GEM_DOMAIN_SAMPLER, 0);
 
-  int w = intel_obj->Base.Size / texel_size;
+  int w = (intel_obj->Base.Size / texel_size) - 1;
   surf[2] = ((w & 0x7f) << BRW_SURFACE_WIDTH_SHIFT |
 ((w >> 7) & 0x1fff) << BRW_SURFACE_HEIGHT_SHIFT);
   surf[3] = (((w >> 20) & 0x7f) << BRW_SURFACE_DEPTH_SHIFT |
diff --git a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c 
b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
index 37e3174..c38843f 100644
--- a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
@@ -260,7 +260,7 @@ gen7_update_buffer_texture_surface(struct gl_context *ctx,
  I915_GEM_DOMAIN_SAMPLER, 0);
 
   int texel_size = _mesa_get_format_bytes(format);
-  int w = intel_obj->Base.Size / texel_size;
+  int w = (intel_obj->Base.Size / texel_size) - 1;
 
   /* note that these differ from GEN6 */
   surf[2] = SET_FIELD(w & 0x7f, GEN7_SURFACE_WIDTH) | /* bits 6:0 of size 
*/
-- 
1.8.3.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 09/24] glsl: Add IR node for atomic operations.

2013-09-17 Thread Paul Berry
On 15 September 2013 00:10, Francisco Jerez  wrote:

> Add a subclass of ir_rvalue that represents an atomic operation on
> some ir_variable.  Also define a new IR visitor method, and implement
> IR builder, printer and reader support for it.
>

I don't think this approach is going to be reliable.  A lot of optimization
passes assume that ir_rvalues are free of side effects, so they can freely
duplicate, consolidate, dead-code-eliminate, or reorder them.  It's not
safe to do any of those things to atomic operations, because it may change
the meaning of the program.

IMHO, the best way to represent atomic operations in the IR would be to use
LLVM-style intrinsics (we've discussed this idea previously here:
http://lists.freedesktop.org/archives/mesa-dev/2013-August/043768.html).
The basic idea is that in the IR, they would be treated as ir_call nodes,
but they wouldn't be subject to linking or inlining, so they would stay in
the IR tree all the way to the back-end, which would translate them into
their native back-end representation.  The advantage of this approach is
that all of our optimization passes already assume that function calls can
have side effects, and know how to track function arguments and return
values when computing things like variable lifetimes, so the optimization
logic wouldn't have to change at all.

If we don't go the intrinsics route, an alternative would be to represent
atomic operations as a subtype of ir_instruction.  The disadvantage of that
is that it means we'll have to teach all the optimization passes about
atomic operations, which is a lot of tedious work that's easy to get wrong.


> ---
>  src/glsl/ir.cpp|  2 +-
>  src/glsl/ir.h  | 42
> ++
>  src/glsl/ir_builder.cpp|  7 +
>  src/glsl/ir_builder.h  |  2 ++
>  src/glsl/ir_clone.cpp  | 11 +++
>  src/glsl/ir_constant_expression.cpp|  7 +
>  src/glsl/ir_hierarchical_visitor.cpp   | 16 ++
>  src/glsl/ir_hierarchical_visitor.h |  2 ++
>  src/glsl/ir_hv_accept.cpp  | 14 +
>  src/glsl/ir_print_visitor.cpp  | 20 
>  src/glsl/ir_print_visitor.h|  1 +
>  src/glsl/ir_reader.cpp | 38
> +++
>  src/glsl/ir_rvalue_visitor.cpp | 18 +++
>  src/glsl/ir_rvalue_visitor.h   |  3 ++
>  src/glsl/ir_visitor.h  |  2 ++
>  src/mesa/drivers/dri/i965/brw_fs.h |  1 +
>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   |  5 +++
>  src/mesa/drivers/dri/i965/brw_vec4.h   |  1 +
>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp |  5 +++
>  src/mesa/program/ir_to_mesa.cpp|  7 +
>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp |  7 +
>  21 files changed, 210 insertions(+), 1 deletion(-)
>
> diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp
> index 1b17999..83bcda2 100644
> --- a/src/glsl/ir.cpp
> +++ b/src/glsl/ir.cpp
> @@ -1565,7 +1565,7 @@ ir_swizzle::variable_referenced() const
>  ir_variable::ir_variable(const struct glsl_type *type, const char *name,
>  ir_variable_mode mode)
> : max_array_access(0), read_only(false), centroid(false),
> invariant(false),
> - mode(mode), interpolation(INTERP_QUALIFIER_NONE)
> + mode(mode), interpolation(INTERP_QUALIFIER_NONE), atomic()
>  {
> this->ir_type = ir_type_variable;
> this->type = type;
> diff --git a/src/glsl/ir.h b/src/glsl/ir.h
> index 2637b40..c4b4677 100644
> --- a/src/glsl/ir.h
> +++ b/src/glsl/ir.h
> @@ -83,6 +83,7 @@ enum ir_node_type {
> ir_type_texture,
> ir_type_emit_vertex,
> ir_type_end_primitive,
> +   ir_type_atomic,
> ir_type_max /**< maximum ir_type enum number, for validation */
>  };
>
> @@ -547,6 +548,14 @@ public:
> int binding;
>
> /**
> +* Location an atomic counter is stored at.
> +*/
> +   struct {
> +  int buffer_index;
> +  int offset;
> +   } atomic;
> +
> +   /**
>  * Built-in state that backs this uniform
>  *
>  * Once set at variable creation, \c state_slots must remain invariant.
> @@ -2085,6 +2094,39 @@ public:
> virtual ir_visitor_status accept(ir_hierarchical_visitor *);
>  };
>
> +enum ir_atomic_opcode {
> +   ir_atomic_read,
> +   ir_atomic_inc,
> +   ir_atomic_dec
> +};
> +
> +class ir_atomic : public ir_rvalue {
> +public:
> +   ir_atomic(enum ir_atomic_opcode op, ir_dereference *location = NULL)
> +  : op(op), location(location)
> +   {
> +  this->type = glsl_type::get_instance(GLSL_TYPE_UINT, 1, 1);
> +  this->ir_type = ir_type_atomic;
> +   }
> +
> +   virtual ir_atomic *clone(void *mem_ctx, struct hash_table *) const;
> +
> +   virtual ir_constant *constant_expression_value(struct hash_table
> *variable_context = NULL);
> +
> +   vi

Re: [Mesa-dev] [PATCH 10/24] glsl: Implement parser support for atomic counters.

2013-09-17 Thread Paul Berry
On 15 September 2013 00:10, Francisco Jerez  wrote:

> ---
>  src/glsl/ast.h| 15 ++
>  src/glsl/ast_to_hir.cpp   | 68
> +--
>  src/glsl/ast_type.cpp | 13 +++--
>  src/glsl/glsl_lexer.ll|  2 +-
>  src/glsl/glsl_parser.yy   | 13 +++--
>  src/glsl/glsl_parser_extras.h | 10 +++
>  6 files changed, 114 insertions(+), 7 deletions(-)
>
> diff --git a/src/glsl/ast.h b/src/glsl/ast.h
> index 26c4701..8a5d3fc 100644
> --- a/src/glsl/ast.h
> +++ b/src/glsl/ast.h
> @@ -405,6 +405,12 @@ struct ast_type_qualifier {
>*/
>   unsigned explicit_binding:1;
>
> + /**
> +  * Flag set if GL_ARB_shader_atomic counter "offset" layout
> +  * qualifier is used.
> +  */
> + unsigned explicit_offset:1;
> +
>   /** \name Layout qualifiers for GL_AMD_conservative_depth */
>   /** \{ */
>   unsigned depth_any:1;
> @@ -468,6 +474,15 @@ struct ast_type_qualifier {
> int binding;
>
> /**
> +* Offset specified via GL_ARB_shader_atomic_counter's "offset"
> +* keyword.
> +*
> +* \note
> +* This field is only valid if \c explicit_offset is set.
> +*/
> +   int offset;
> +
> +   /**
>  * Return true if and only if an interpolation qualifier is present.
>  */
> bool has_interpolation() const;
> diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
> index fcca5df..7edbee4 100644
> --- a/src/glsl/ast_to_hir.cpp
> +++ b/src/glsl/ast_to_hir.cpp
> @@ -1197,6 +1197,9 @@ ast_expression::hir(exec_list *instructions,
>   !state->check_version(120, 300, &loc,
> "array comparisons forbidden")) {
>  error_emitted = true;
> +  } else if ((op[0]->type->atomic_size() ||
> op[1]->type->atomic_size())) {
> +_mesa_glsl_error(&loc, state, "atomic counter comparisons
> forbidden");
> +error_emitted = true;
>

Do we have spec text to back this up?  I looked around and couldn't find
anything.  It seems like doing equality comparisons on atomic counters is
ill-defined, though (do two counters compare equal if they have equal
values, or if they point to the same counter?  Both possibilities seem
dodgy).  Maybe we should file a spec bug so we can get clarification from
khronos about what's intended.

Note that we currently permit other comparisons that are similarly dodgy
(e.g. comparing samplers).  It would be nice to get clarification from
khronos about this too.


>}
>
>if (error_emitted) {
> @@ -1952,10 +1955,19 @@ validate_binding_qualifier(struct
> _mesa_glsl_parse_state *state,
>
>   return false;
>}
> +   } else if (var->type->atomic_size()) {
> +  if (unsigned(qual->binding) >= ctx->Const.MaxAtomicBufferBindings) {
> + _mesa_glsl_error(loc, state, "layout(binding = %d) exceeds the "
> +  " maximum number of atomic counter buffer
> bindings"
> +  "(%d)", qual->binding,
> +  ctx->Const.MaxAtomicBufferBindings);
> +
> + return false;
> +  }
> } else {
>_mesa_glsl_error(loc, state,
> "the \"binding\" qualifier only applies to uniform
> "
> -   "blocks, samplers, or arrays of samplers");
> +   "blocks, samplers, atomic counters, or arrays
> thereof");
>return false;
> }
>
> @@ -1983,7 +1995,7 @@ apply_type_qualifier_to_variable(const struct
> ast_type_qualifier *qual,
> }
>
> if (qual->flags.q.constant || qual->flags.q.attribute
> -   || qual->flags.q.uniform
> +   || (qual->flags.q.uniform && var->type !=
> glsl_type::atomic_uint_type)
> || (qual->flags.q.varying && (state->target == fragment_shader)))
>var->read_only = 1;
>

I'm not entirely convinced this is right.  An atomic counter, like a
sampler, is a handle to an underlying object, not the underlying object
itself.  The handle should be considered read-only even if the object it
refers to is mutable.  That way we still prohibit

uniform atomic_uint foo;
uniform atomic_uint bar;
void main() {
   foo = bar;
}


>
> @@ -2225,6 +2237,35 @@ apply_type_qualifier_to_variable(const struct
> ast_type_qualifier *qual,
>var->binding = qual->binding;
> }
>
> +   if (var->type->atomic_size()) {
> +  if (var->mode == ir_var_uniform) {
> + if (var->explicit_binding) {
> +_mesa_glsl_parse_state::atomic_counter_binding &binding =
> +   state->atomic_counter_bindings[var->binding];
> +
> +if (binding.next_offset % ATOMIC_COUNTER_SIZE)
> +   _mesa_glsl_error(loc, state,
> +"misaligned atomic counter offset");
> +
> +if (binding.offsets.count(binding.next_offset))
> +   _mesa_glsl_error(loc, state,
> +"atomic co

Re: [Mesa-dev] [PATCH 0/6] Support for 10 bpc EGLSurface

2013-09-17 Thread Chad Versace

On 09/15/2013 12:16 AM, Kristian Høgsberg wrote:

Hi,

This little series adds support for creating EGLSurfaces with color buffers
using the ARGB2101010 pixel format.  We the new KMS addFB2 ioctl we can
create KMS framebuffers with that format and this series ends up adding
the pixel format to gbm so we can generate buffers with that format.

The first two patches make sure we don't advertise ARGB2101010 configs that
you can use with an ARGB X window.  The X visual to EGL config
matching just compares visual depth and EGL config buffer size, and they're
both 32 bits for those two pixel formats.  Unless we match on the
pixel layout, we will advertise EGLConfigs with 10 bpc that you can use
with a ARGB X window.

With this patch series, I can run weston on KMS in 10 bpc, but anything that
uses gbm will benefit from this.  We also add support for 10 bpc
GLX pixmaps and pbuffers.

Kristian


(CC'ing Ian to alert him about the EGLConfig sort order.)

This series looks good to me.
Reviewed-by: Chad Versace 

Something to note is that eglChooseConfig(r=8, g=8, b=8) sorts the
RGBA1010102 EGLConfigs *before* the RGBA configs. I confirmed
this on GBM. The EGL spec requires that behavior, but I find it
surprising.

Someone else found it surprising too and complained
loudly to Khronos, resulting in Footnote 8 on page 26 of the
EGL 1.4 2013-02-11 spec:

"[fn8] This rule places configs with deeper color buffers first in the list 
returned by eglChooseConfig.
Applications may find this counterintuitive if they expect configs with smaller 
buffer sizes to be
returned first."

Luckily for now, this surprise will hurt no one on X because X/EGL advertises 
no RGBA1010102 configs.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 12/24] glsl: Add predicate to determine if an IR node has side effects.

2013-09-17 Thread Paul Berry
On 15 September 2013 00:10, Francisco Jerez  wrote:

> And fix the dead code elimination pass so atomic writes aren't
> optimized out in cases where the return value isn't used by the
> program.
>

As I mentioned in my comments on patch 9, I'd prefer if we went with a
different approach where we don't add ir_rvalue nodes that have side
effects at all, but instead stick with our current system where only
ir_instruction nodes can have side effects.

However, if we do wind up going with this approach, we'll need to make
has_side_effects() recurse through the expression tree, so that it can see
that expressions like (atomicCounterIncrement(c) + 1) and
foo[atomicCounterIncrement(c)] also have side effects.

We would also need to modify several other optimization passes to check for
side effects:

- opt_dead_code_local (which does the same job as opt_dead_code, but within
basic blocks)

- opt_tree_grafting - as is, this would try to transform:

uint x = atomicCounterIncrement(c);
uint y = atomicCounterIncrement(c);
uvec3 v = uvec3(y, x, y);

Into:

uint y = atomicCounterIncrement(c);
uvec3 v = uvec3(y, atomicCounterIncrement(c), y);

Which is not equivalent.

- lower_variable_index_to_cond_assign, lower_vec_index_to_cond_assign, and
lower_vector_insert (which sometimes clone arbitrary rvalues)

And possibly others I've missed.


There's also at least one contrived case which is propably not worth fixing:

- opt_flip_matrices (in principle this might try to rewrite
gl_TextureMatrix[atomicCounterIncrement(c)] *
foo[atomicCounterIncrement(c)] to foo[atomicCounterIncrement(c)] *
gl_TextureMatrixTranspose(c), which is technically not the same operation).


> ---
>  src/glsl/ir.h  | 16 
>  src/glsl/opt_dead_code.cpp |  3 ++-
>  2 files changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/src/glsl/ir.h b/src/glsl/ir.h
> index c4b4677..4f506a3 100644
> --- a/src/glsl/ir.h
> +++ b/src/glsl/ir.h
> @@ -139,6 +139,17 @@ public:
> virtual class ir_jump *  as_jump() { return
> NULL; }
> /*@}*/
>
> +   /**
> +* Determine if an IR instruction has side effects other than its
> +* returned value(s).  Optimization passes are expected to be
> +* especially careful with reordering or removing these, unless
> +* they know what they are doing.
> +*/
> +   virtual bool has_side_effects() const
> +   {
> +  return false;
> +   }
> +
>  protected:
> ir_instruction()
> {
> @@ -2120,6 +2131,11 @@ public:
>
> virtual ir_visitor_status accept(ir_hierarchical_visitor *);
>
> +   virtual bool has_side_effects() const
> +   {
> +  return true;
> +   }
> +
> /** Kind of atomic instruction. */
> enum ir_atomic_opcode op;
>
> diff --git a/src/glsl/opt_dead_code.cpp b/src/glsl/opt_dead_code.cpp
> index b65e5c2..fd05034 100644
> --- a/src/glsl/opt_dead_code.cpp
> +++ b/src/glsl/opt_dead_code.cpp
> @@ -81,7 +81,8 @@ do_dead_code(exec_list *instructions, bool
> uniform_locations_assigned)
>   */
>  if (entry->var->mode != ir_var_function_out &&
>  entry->var->mode != ir_var_function_inout &&
> - entry->var->mode != ir_var_shader_out) {
> + entry->var->mode != ir_var_shader_out &&
> + !entry->assign->rhs->has_side_effects()) {
> entry->assign->remove();
> progress = true;
>
> --
> 1.8.3.4
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 00/24] Support for ARB_shader_atomic_counters.

2013-09-17 Thread Paul Berry
On 15 September 2013 00:10, Francisco Jerez  wrote:

> This patch series implements support for the
> ARB_shader_atomic_counters extension, which is part of GL core since
> GL 4.2.  It includes patches adding support for the new APIs and GLSL
> language features, and working back-end code for Intel Gen7 hardware
> -- Ivy Bridge should work with these patches alone, Haswell is going
> to need a small kernel change I'll probably submit for review during
> the next week.
>
> The series depends on Ken's surface state tidying patches [1] and
> on patches 1-4, which are seemingly unrelated fixes.
>
> There's also a series of ~30 unit tests for this extension I will send
> to the piglit mailing list soon.
>
> Thanks.
>
> [1]
> http://lists.freedesktop.org/archives/mesa-dev/2013-September/044691.html
>
> [PATCH 01/24] mesa: Fix misplaced includes of "main/uniforms.h".
> [PATCH 02/24] glsl: Initialize all member variables of
> _mesa_glsl_parse_state on construction.
> [PATCH 03/24] i965: Initialize all member variables of vec4_instruction on
> construction.
> [PATCH 04/24] ralloc: Unify overloads of the new operator and guarantee
> object destruction.
> [PATCH 05/24] glapi: Add support for ARB_shader_atomic_counters.
> [PATCH 06/24] mesa: Add support for ARB_shader_atomic_counters.
> [PATCH 07/24] glsl: Add extension enables for ARB_shader_atomic_counters.
> [PATCH 08/24] glsl: Add new atomic_uint built-in GLSL type.
> [PATCH 09/24] glsl: Add IR node for atomic operations.
> [PATCH 10/24] glsl: Implement parser support for atomic counters.
> [PATCH 11/24] glsl: Add built-in functions and constants required for
> ARB_shader_atomic_counters.
> [PATCH 12/24] glsl: Add predicate to determine if an IR node has side
> effects.
> [PATCH 13/24] glsl: Linker support for ARB_shader_atomic_counters.
> [PATCH 14/24] i965: Define vtbl method that initializes an untyped R/W
> surface.
> [PATCH 15/24] i965: Implement ABO surface state emission.
> [PATCH 16/24] i965/gen7: Implement code generation for untyped atomic
> instructions.
> [PATCH 17/24] i965/gen7: Implement code generation for untyped surface
> read instructions.
> [PATCH 18/24] i965: Add a 'has_side_effects' back-end instruction
> predicate.
> [PATCH 19/24] i965: Handle the 'atomic_uint' GLSL type.
> [PATCH 20/24] i965: Add brw_reg constructors taking a dynamically
> determined vector width.
> [PATCH 21/24] i965/gen7: Handle atomic instructions from the FS back-end.
> [PATCH 22/24] i965/gen7: Handle atomic instructions from the VEC4 back-end.
> [PATCH 23/24] i965/gen7: Expose ARB_shader_atomic_counters.
> [PATCH 24/24] i965: Simplify the shader time code by using atomic counter
> helpers.
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>

I'm halfway through the series and I'm out of time for code review today.
I sent comments on patches 3, 4, 8, 9, 10, and 12.  Patches 1, 2, 5, 7, and
11 are:

Reviewed-by: Paul Berry 

Patch 6 deserves the attention of someone who is more familiar with the
Mesa front end than I am, so on that one I'm just going to say:

Acked-by: Paul Berry 

I'll try to get to patches 13-24 as soon as I can.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 03/24] i965: Initialize all member variables of vec4_instruction on construction.

2013-09-17 Thread Chad Versace

Patches 1-3 are
Reviewed-by: Chad Versace 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/hsw: compute DDX in a subspan based only on top row

2013-09-17 Thread Mark Mueller
On Mon, Sep 16, 2013 at 1:31 AM, Chia-I Wu  wrote:

> On Mon, Sep 16, 2013 at 4:12 PM, Chia-I Wu  wrote:
> > On Mon, Sep 16, 2013 at 3:50 AM, Mark Mueller 
> wrote:
> >>
> >>
> >>
> >> On Fri, Sep 13, 2013 at 2:15 PM, Paul Berry 
> wrote:
> >>>
> >>> On 12 September 2013 22:06, Chia-I Wu  wrote:
> 
>  From: Chia-I Wu 
>
> >>
> >>
> >> This scenario is where I'd place my bets, especially given that the
> numbers
> >> are based on Xonotic. I benchmarked this patch using Xonotic on Bay
> Trail as
> >> is and by replacing !brw->is_haswell with !brw->is_baytrail. With ultra
> and
> >> ultimate levels at medium and high resolutions, the results were all
> >> essentially the same at comparable resolutions and quality levels.
> > Isn't Bay Trail based on Ivy Bridge?
> For Bay Trail, this might help you
>
>
> http://lists.freedesktop.org/archives/mesa-dev/2013-September/044288.html
>
> if you are interested.
>

Testing with Bay Trail shows no performance improvement with this patch.
Most likely there are one or more CPU bottlenecks on Bay Tail that hide a
majority of the performance gains of this change.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 04/24] ralloc: Unify overloads of the new operator and guarantee object destruction.

2013-09-17 Thread Chad Versace

I like this patch. It removes a lot of boilerplate. And, it makes object
destruction easier to reason about, making it easier to reason about tricky
destruction bugs.

On 09/17/2013 11:59 AM, Paul Berry wrote:

On 15 September 2013 00:10, Francisco Jerez  wrote:


This patch introduces a pair of helper functions providing a common
implementation of the "new" and "delete" operators for all C++ classes
that are allocated by ralloc via placement new.  The 'ralloc_new'
helper function takes care of setting up an ralloc destructor callback
that will call the appropriate destructor before the memory allocated
to an object is released.

Until now objects needed to call 'ralloc_set_destructor' explicitly
with a pointer to a static method which in turn called the actual
destructor in order to get something that should be transparent to
them.  After this patch they'll only need to call 'ralloc_new' from
the new operator and 'ralloc_delete' from the delete operator, turning
all overloads of new and delete into one-liners.
---
  src/glsl/ast.h | 26 +++--
  src/glsl/glsl_parser_extras.h  |  9 +
  src/glsl/glsl_symbol_table.cpp |  7 +---
  src/glsl/glsl_symbol_table.h   | 23 +--
  src/glsl/glsl_types.h  | 11 +-
  src/glsl/ir_function_detect_recursion.cpp  | 11 +-
  src/glsl/list.h| 22 ++-
  src/glsl/loop_analysis.h   | 14 +--
  src/glsl/ralloc.h  | 44
++
  src/mesa/drivers/dri/i965/brw_cfg.h| 14 +--
  src/mesa/drivers/dri/i965/brw_fs.h | 21 ++-
  src/mesa/drivers/dri/i965/brw_fs_live_variables.h  |  7 +---
  src/mesa/drivers/dri/i965/brw_vec4.h   | 21 ++-
  .../drivers/dri/i965/brw_vec4_live_variables.h |  7 +---
  src/mesa/program/ir_to_mesa.cpp|  7 +---
  src/mesa/state_tracker/st_glsl_to_tgsi.cpp |  7 +---
  16 files changed, 77 insertions(+), 174 deletions(-)





diff --git a/src/mesa/drivers/dri/i965/brw_cfg.h
b/src/mesa/drivers/dri/i965/brw_cfg.h
index 95a18e9..f8037a9 100644
--- a/src/mesa/drivers/dri/i965/brw_cfg.h
+++ b/src/mesa/drivers/dri/i965/brw_cfg.h
@@ -41,12 +41,7 @@ class bblock_t {
  public:
 static void* operator new(size_t size, void *ctx)
 {
-  void *node;
-
-  node = rzalloc_size(ctx, size);
-  assert(node != NULL);
-
-  return node;
+  return ralloc_new(size, ctx);
 }

 bblock_link *make_list(void *mem_ctx);


Post-patch, it's safe for bblock_t to lack an override of operator delete
only because it's destructor is a no-op. However, it remains a no-op only if
the exec_list destructor remains a no-op. I'd like to see bblock_t 
future-proofed
now against any double-destructor bugs by overriding operator delete.


@@ -70,12 +65,7 @@ class cfg_t {
  public:
 static void* operator new(size_t size, void *ctx)
 {
-  void *node;
-
-  node = rzalloc_size(ctx, size);
-  assert(node != NULL);
-
-  return node;
+  return ralloc_new(size, ctx);
 }



I'm worried about this one--it introduces a behavioural change.
Previously, if a cfg_t object was reclaimed through the standard ralloc
mechanism, cfg_t::~cfg_t() would *not* be called.  Now it will be.  Since
cfg_t::~cfg_t() calls ralloc_free(mem_ctx), and since cfg_t's mem_ctx is
usually (always?) the same mem_ctx that the cfg_t is contained in, I think
that will result in double-freeing of the mem_ctx.

However, looking further into the users of cfg_t, it looks like they all
pass it a "borrowed" mem_ctx (in other words, the caller always retains
responsibility for freeing the mem_ctx.  So I believe we have a
pre-existing double-free bug.  The correct solution is probably to get rid
of the cfg_t destructor entirely.


Whatever you do with cfg_t's destructor, I'd like to an override of operator
delete for the same reason I stated for bblock_t.

Actually, for each class that this patch touches, I want to see an override
of operator delete if one does not already exist, lest a future someone 
introduces
a bug by changing that class's destructor from trivial to non-trivial but 
forgets
to override delete as well.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Fix writemask != 0 assertions on Sandybridge.

2013-09-17 Thread Eric Anholt
Kenneth Graunke  writes:

> This fixes myriads of regressions since commit 169f9c030c16d1247a3a7629
> ("i965: Add an assertion that writemask != NULL for non-ARFs.").
>
> On Sandybridge, our control flow handling (such as brw_IF) does:
>
>brw_set_dest(p, insn, brw_imm_w(0));
>insn->bits1.branch_gen6.jump_count = 0;
>
> This results in a IMM destination with zero for the writemask.  IMM
> destinations are rather bizarre (I'm guessing the file is irrelevant),
> but the code has been working for ages, so I'm loathe to change it.
>
> Fixes glxgears on Sandybridge.

The specs were explicit about "this has to be imm word", for mysterious
reasons.

Reviewed-by: Eric Anholt 


pgpB3YnoMfSk3.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/3] vega: Use pipe_context::blit instead of util_blit_pixels_tex.

2013-09-17 Thread Marek Olšák
On Tue, Sep 17, 2013 at 8:32 PM,   wrote:
> From: José Fonseca 
>
> Only compile-tested but it seems straightforward.
> ---
>  src/gallium/state_trackers/vega/vg_context.c | 44 
> 
>  src/gallium/state_trackers/vega/vg_context.h |  2 --
>  2 files changed, 26 insertions(+), 20 deletions(-)
>
> diff --git a/src/gallium/state_trackers/vega/vg_context.c 
> b/src/gallium/state_trackers/vega/vg_context.c
> index c636188..46c7d96 100644
> --- a/src/gallium/state_trackers/vega/vg_context.c
> +++ b/src/gallium/state_trackers/vega/vg_context.c
> @@ -41,7 +41,6 @@
>  #include "cso_cache/cso_context.h"
>
>  #include "util/u_memory.h"
> -#include "util/u_blit.h"
>  #include "util/u_sampler.h"
>  #include "util/u_surface.h"
>  #include "util/u_format.h"
> @@ -138,8 +137,6 @@ struct vg_context * vg_create_context(struct pipe_context 
> *pipe,
> ctx->sc = shaders_cache_create(ctx);
> ctx->shader = shader_create(ctx);
>
> -   ctx->blit = util_create_blit(ctx->pipe, ctx->cso_context);
> -
> return ctx;
>  }
>
> @@ -147,7 +144,6 @@ void vg_destroy_context(struct vg_context *ctx)
>  {
> struct pipe_resource **cbuf = &ctx->mask.cbuf;
>
> -   util_destroy_blit(ctx->blit);
> renderer_destroy(ctx->renderer);
> shaders_cache_destroy(ctx->sc);
> shader_destroy(ctx->shader);
> @@ -443,23 +439,35 @@ static void vg_prepare_blend_texture(struct vg_context 
> *ctx,
>   struct pipe_sampler_view *src)
>  {
> struct st_framebuffer *stfb = ctx->draw_buffer;
> -   struct pipe_surface *surf;
> -   struct pipe_surface surf_tmpl;
> +   struct pipe_context *pipe = ctx->pipe;
> +   struct pipe_blit_info info;
>
> vg_context_update_blend_texture_view(ctx, stfb->width, stfb->height);
>
> -   u_surface_default_template(&surf_tmpl, stfb->blend_texture_view->texture);
> -   surf = ctx->pipe->create_surface(ctx->pipe,
> -stfb->blend_texture_view->texture,
> -&surf_tmpl);
> -   if (surf) {
> -  util_blit_pixels_tex(ctx->blit,
> -   src, 0, 0, stfb->width, stfb->height,
> -   surf, 0, 0, stfb->width, stfb->height,
> -   0.0, PIPE_TEX_MIPFILTER_NEAREST);
> -
> -  pipe_surface_reference(&surf, NULL);
> -   }
> +   memset(&info, 0, sizeof info);
> +   info.dst.resource = stfb->blend_texture_view->texture;
> +   info.dst.level = 0;
> +   info.dst.box.x = 0;
> +   info.dst.box.y = 0;
> +   info.dst.box.z = 0;
> +   info.dst.box.width = stfb->width;
> +   info.dst.box.height = stfb->height;
> +   info.dst.box.depth = 1;
> +   info.dst.format = stfb->blend_texture_view->format;
> +   info.src.resource = src->texture;
> +   info.src.level = src->u.tex.first_level;
> +   info.src.box.x = 0;
> +   info.src.box.y = 0;
> +   info.src.box.z = src->u.tex.first_layer;
> +   info.src.box.width = stfb->width;
> +   info.src.box.height = stfb->height;
> +   info.src.box.depth = 1;
> +   info.src.format = src->format;
> +   info.mask = PIPE_MASK_RGBA;
> +   info.filter = PIPE_TEX_MIPFILTER_NEAREST;

This should be PIPE_TEX_FILTER_NEAREST, because pipe->blit doesn't do
mip filtering. Not that it matters much.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/3] i965: Fix off by one errors in texture buffer size calculations.

2013-09-17 Thread Eric Anholt
Kenneth Graunke  writes:

> The value that's split into width/height/depth needs to be the size of
> the buffer minus one.  This makes it consistent with the constant buffer
> and shader time SURFACE_STATE setup code.
>
> Signed-off-by: Kenneth Graunke 
> Cc: Paul Berry 
> Cc: Eric Anholt 
> ---
>  src/mesa/drivers/dri/i965/brw_wm_surface_state.c  | 2 +-
>  src/mesa/drivers/dri/i965/gen7_wm_surface_state.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c 
> b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> index 8d87786..0078161 100644
> --- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> @@ -228,7 +228,7 @@ brw_update_buffer_texture_surface(struct gl_context *ctx,
> *surf_offset + 4,
> bo, 0, I915_GEM_DOMAIN_SAMPLER, 0);
>  
> -  int w = intel_obj->Base.Size / texel_size;
> +  int w = (intel_obj->Base.Size / texel_size) - 1;
>surf[2] = ((w & 0x7f) << BRW_SURFACE_WIDTH_SHIFT |
>((w >> 7) & 0x1fff) << BRW_SURFACE_HEIGHT_SHIFT);
>surf[3] = (((w >> 20) & 0x7f) << BRW_SURFACE_DEPTH_SHIFT |
> diff --git a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c 
> b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> index 37e3174..c38843f 100644
> --- a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> @@ -260,7 +260,7 @@ gen7_update_buffer_texture_surface(struct gl_context *ctx,
> I915_GEM_DOMAIN_SAMPLER, 0);
>  
>int texel_size = _mesa_get_format_bytes(format);
> -  int w = intel_obj->Base.Size / texel_size;
> +  int w = (intel_obj->Base.Size / texel_size) - 1;

This is all:

Reviewed-by: Eric Anholt 


pgplZ7MYD3syt.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/3] util/u_blit: Implement util_blit_pixels via pipe_context::blit.

2013-09-17 Thread Marek Olšák
Isn't u_blit a candidate for removal considering it has no user in Mesa?

In any case, for the series:

Reviewed-by: Marek Olšák 

Marek

On Tue, Sep 17, 2013 at 8:33 PM,   wrote:
> From: José Fonseca 
>
> This removes a lot of code, but not everything, as util_blit_pixels_tex
> is still useful when one needs to override pipe_sampler_view::swizzle_?.
> ---
>  src/gallium/auxiliary/util/u_blit.c | 447 
> +++-
>  1 file changed, 37 insertions(+), 410 deletions(-)
>
> diff --git a/src/gallium/auxiliary/util/u_blit.c 
> b/src/gallium/auxiliary/util/u_blit.c
> index e9bec4a..4ba71b9 100644
> --- a/src/gallium/auxiliary/util/u_blit.c
> +++ b/src/gallium/auxiliary/util/u_blit.c
> @@ -57,29 +57,20 @@ struct blit_state
> struct pipe_context *pipe;
> struct cso_context *cso;
>
> -   struct pipe_blend_state blend_write_color, blend_keep_color;
> +   struct pipe_blend_state blend_write_color;
> struct pipe_depth_stencil_alpha_state dsa_keep_depthstencil;
> -   struct pipe_depth_stencil_alpha_state dsa_write_depthstencil;
> -   struct pipe_depth_stencil_alpha_state dsa_write_depth;
> -   struct pipe_depth_stencil_alpha_state dsa_write_stencil;
> struct pipe_rasterizer_state rasterizer;
> struct pipe_sampler_state sampler;
> struct pipe_viewport_state viewport;
> struct pipe_vertex_element velem[2];
> -   enum pipe_texture_target internal_target;
>
> void *vs;
> void *fs[PIPE_MAX_TEXTURE_TYPES][TGSI_WRITEMASK_XYZW + 1];
> -   void *fs_depthstencil[PIPE_MAX_TEXTURE_TYPES];
> -   void *fs_depth[PIPE_MAX_TEXTURE_TYPES];
> -   void *fs_stencil[PIPE_MAX_TEXTURE_TYPES];
>
> struct pipe_resource *vbuf;  /**< quad vertices */
> unsigned vbuf_slot;
>
> float vertices[4][2][4];   /**< vertex/texcoords for quad */
> -
> -   boolean has_stencil_export;
>  };
>
>
> @@ -103,20 +94,6 @@ util_create_blit(struct pipe_context *pipe, struct 
> cso_context *cso)
> /* disabled blending/masking */
> ctx->blend_write_color.rt[0].colormask = PIPE_MASK_RGBA;
>
> -   /* depth stencil states */
> -   ctx->dsa_write_depth.depth.enabled = 1;
> -   ctx->dsa_write_depth.depth.writemask = 1;
> -   ctx->dsa_write_depth.depth.func = PIPE_FUNC_ALWAYS;
> -   ctx->dsa_write_stencil.stencil[0].enabled = 1;
> -   ctx->dsa_write_stencil.stencil[0].func = PIPE_FUNC_ALWAYS;
> -   ctx->dsa_write_stencil.stencil[0].fail_op = PIPE_STENCIL_OP_REPLACE;
> -   ctx->dsa_write_stencil.stencil[0].zpass_op = PIPE_STENCIL_OP_REPLACE;
> -   ctx->dsa_write_stencil.stencil[0].zfail_op = PIPE_STENCIL_OP_REPLACE;
> -   ctx->dsa_write_stencil.stencil[0].valuemask = 0xff;
> -   ctx->dsa_write_stencil.stencil[0].writemask = 0xff;
> -   ctx->dsa_write_depthstencil.depth = ctx->dsa_write_depth.depth;
> -   ctx->dsa_write_depthstencil.stencil[0] = 
> ctx->dsa_write_stencil.stencil[0];
> -
> /* rasterizer */
> ctx->rasterizer.cull_face = PIPE_FACE_NONE;
> ctx->rasterizer.half_pixel_center = 1;
> @@ -147,14 +124,6 @@ util_create_blit(struct pipe_context *pipe, struct 
> cso_context *cso)
>ctx->vertices[i][1][3] = 1.0f; /* q */
> }
>
> -   if(pipe->screen->get_param(pipe->screen, PIPE_CAP_NPOT_TEXTURES))
> -  ctx->internal_target = PIPE_TEXTURE_2D;
> -   else
> -  ctx->internal_target = PIPE_TEXTURE_RECT;
> -
> -   ctx->has_stencil_export =
> -  pipe->screen->get_param(pipe->screen, PIPE_CAP_SHADER_STENCIL_EXPORT);
> -
> return ctx;
>  }
>
> @@ -178,18 +147,6 @@ util_destroy_blit(struct blit_state *ctx)
>}
> }
>
> -   for (i = 0; i < PIPE_MAX_TEXTURE_TYPES; i++) {
> -  if (ctx->fs_depthstencil[i]) {
> - pipe->delete_fs_state(pipe, ctx->fs_depthstencil[i]);
> -  }
> -  if (ctx->fs_depth[i]) {
> - pipe->delete_fs_state(pipe, ctx->fs_depth[i]);
> -  }
> -  if (ctx->fs_stencil[i]) {
> - pipe->delete_fs_state(pipe, ctx->fs_stencil[i]);
> -  }
> -   }
> -
> pipe_resource_reference(&ctx->vbuf, NULL);
>
> FREE(ctx);
> @@ -217,63 +174,6 @@ set_fragment_shader(struct blit_state *ctx, uint 
> writemask,
>
>
>  /**
> - * Helper function to set the shader which writes depth and stencil.
> - */
> -static INLINE void
> -set_depthstencil_fragment_shader(struct blit_state *ctx,
> - enum pipe_texture_target pipe_tex)
> -{
> -   if (!ctx->fs_depthstencil[pipe_tex]) {
> -  unsigned tgsi_tex = util_pipe_tex_to_tgsi_tex(pipe_tex, 0);
> -
> -  ctx->fs_depthstencil[pipe_tex] =
> - util_make_fragment_tex_shader_writedepthstencil(ctx->pipe, tgsi_tex,
> -  TGSI_INTERPOLATE_LINEAR);
> -   }
> -
> -   cso_set_fragment_shader_handle(ctx->cso, ctx->fs_depthstencil[pipe_tex]);
> -}
> -
> -
> -/**
> - * Helper function to set the shader which writes depth.
> - */
> -static INLINE void
> -set_depth_fragment_shader(struct blit_state *ctx,
> -  enum pipe_texture_target pipe_tex)
> -{
> -   if (!ctx->fs_depth[pip

Re: [Mesa-dev] [PATCH 3/3] glsl: Drop shader_bit_encoding version checks.

2013-09-17 Thread Ian Romanick
On 09/16/2013 04:20 PM, Paul Berry wrote:
> On 13 September 2013 11:25, Kenneth Graunke  > wrote:
> 
> We now set the ARB_shader_bit_encoding flag for versions that support
> this functionality, so we don't need to double check it here.
> 
> Signed-off-by: Kenneth Graunke  >
> Cc: Ian Romanick mailto:i...@freedesktop.org>>
> ---
>  src/glsl/builtin_functions.cpp | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/src/glsl/builtin_functions.cpp
> b/src/glsl/builtin_functions.cpp
> index c468bd5..b020a7c 100644
> --- a/src/glsl/builtin_functions.cpp
> +++ b/src/glsl/builtin_functions.cpp
> @@ -182,8 +182,7 @@ shader_texture_lod_and_rect(const
> _mesa_glsl_parse_state *state)
>  static bool
>  shader_bit_encoding(const _mesa_glsl_parse_state *state)
>  {
> -   return state->is_version(330, 300) ||
> -  state->ARB_shader_bit_encoding_enable ||
> +   return state->ARB_shader_bit_encoding_enable ||
>state->ARB_gpu_shader5_enable;
>  }
> 
> --
> 1.8.3.4
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org 
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
> 
> I'm not a huge fan of this approach.  We're currently inconsistent in
> Mesa as to how we handle GLSL extensions that were adopted into later
> versions of GLSL (or backports from later versions of GLSL).  For these
> extensions:
> 
> ARB_draw_instanced
> ARB_fragment_coord_conventions
> ARB_gpu_shader5
> ARB_shader_texture_lod

This extension is slightly special because of lolz.

> ARB_shading_language_420pack
> ARB_shading_language_packing
> ARB_texture_cube_map_array
> ARB_texture_multisample
> OES_texture_3D
> 
> we use what I'll call the "enable means explicitly enabled" style, which
> means we only set the "_enable" flag if the shader contains text like
> "#extension ARB_foo: enable"; if the extension's functionality is
> provided by the given version of GLSL, we fold that into the if-test
> that enables the functionality--e.g.:
> 
>if (state->ARB_draw_instanced_enable || state->is_version(140, 300))
>   add_system_value(SYSTEM_VALUE_INSTANCE_ID, int_t, "gl_InstanceID");
> 
> But for these extensions:
> 
> ARB_explicit_attrib_location
> ARB_texture_rectangle
> 
> we use what I'll call the "enable means available" style, which means
> that we set the "_enable" flag when processing the version directive (or
> in the _mesa_glsl_parse_state constructor), to indicate that the
> functionality is available, whether or not the user explicitly requested
> it or not.
> 
> (Note: for ARB_uniform_buffer_object it looks like we do some of each
> style!)
> 
> 
> Personally I'd prefer to see us consistently adopt the "enable means
> explicitly enabled" style (to me, the "enable means available" style
> feels like we're fibbing to ourselves).  An additional advantage of the
> "enable means explicitly enabled" style is that it allows us to handle
> cases (such as in ARB_draw_instanced) where the extension differs
> slightly from the functionality that was eventually folded into GLSL. 
> Another advantage is that if a client-supplied shader does something
> silly like "#extension ARB_draw_instanced: disable" in a GLSL 1.40+
> shader, we won't accidentally disable built-in functionality (although
> in practice this is extremely unlikely to ever crop up).
> 
> If it becomes too onerous to add the "|| state->is_version(...)" checks
> all over the place we can always make some inline functions, e.g.:
> 
> struct _mesa_glsl_parse_state {
>...
>bool is_draw_instanced_available() const
>{
>   return this->ARB_draw_instanced_enable || this->is_version(140, 300);
>}
>...
> };

If we /is/has/, then I like this approach.  I also think we should use
it for all extension functionality in the compiler.  We currently have a
problem that a dumb shader like the following will do the wrong thing:

#version 330
#extension GL_ARB_explicit_attrib_location: disable

layout(location=0) in vec4 v;

It's not a big deal, but it has bugged me for years.

> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/6] Support for 10 bpc EGLSurface

2013-09-17 Thread Kristian Høgsberg
On Tue, Sep 17, 2013 at 1:49 PM, Chad Versace
 wrote:
> On 09/15/2013 12:16 AM, Kristian Høgsberg wrote:
>>
>> Hi,
>>
>> This little series adds support for creating EGLSurfaces with color
>> buffers
>> using the ARGB2101010 pixel format.  We the new KMS addFB2 ioctl we can
>> create KMS framebuffers with that format and this series ends up adding
>> the pixel format to gbm so we can generate buffers with that format.
>>
>> The first two patches make sure we don't advertise ARGB2101010 configs
>> that
>> you can use with an ARGB X window.  The X visual to EGL config
>> matching just compares visual depth and EGL config buffer size, and
>> they're
>> both 32 bits for those two pixel formats.  Unless we match on the
>> pixel layout, we will advertise EGLConfigs with 10 bpc that you can use
>> with a ARGB X window.
>>
>> With this patch series, I can run weston on KMS in 10 bpc, but anything
>> that
>> uses gbm will benefit from this.  We also add support for 10 bpc
>> GLX pixmaps and pbuffers.
>>
>> Kristian
>
>
> (CC'ing Ian to alert him about the EGLConfig sort order.)
>
> This series looks good to me.
> Reviewed-by: Chad Versace 
>
> Something to note is that eglChooseConfig(r=8, g=8, b=8) sorts the
> RGBA1010102 EGLConfigs *before* the RGBA configs. I confirmed
> this on GBM. The EGL spec requires that behavior, but I find it
> surprising.
>
> Someone else found it surprising too and complained
> loudly to Khronos, resulting in Footnote 8 on page 26 of the
> EGL 1.4 2013-02-11 spec:
>
> "[fn8] This rule places configs with deeper color buffers first in the list
> returned by eglChooseConfig.
> Applications may find this counterintuitive if they expect configs with
> smaller buffer sizes to be
> returned first."

Yup, I hit that with weston too.  It's not surprising that it works
that way, but I think a lot of applications assume that there will be
no 10 bit configs.  For gbm, I'm thinking that we may want to put the
GBM format code into the EGL_NATIVE_VISUAL_ID config attribute so you
can choose a config, then read out the id and use it when creating the
GBM surface.  Or alternatively, pick a GBM format first and then look
for a config where the visual ID matches.

> Luckily for now, this surprise will hurt no one on X because X/EGL
> advertises no RGBA1010102 configs.

It does and GLX does too with these patches.  The configs are only
usable with pixmaps or pbuffers, not windows (unless the window visual
is 10 bit too).  This could still break things under X where an
application may pick an EGLConfig or GLXFBConfig and get a 10 bit
config and then use that to create a GLXPixmap, then later try to
XCopyArea or such from the pixmap.  It may be safer to disable this
under X...

Kristian
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 10/24] glsl: Implement parser support for atomic counters.

2013-09-17 Thread Francisco Jerez
Paul Berry  writes:

> On 15 September 2013 00:10, Francisco Jerez  wrote:
>[...]
>> +  } else if ((op[0]->type->atomic_size() ||
>> op[1]->type->atomic_size())) {
>> +_mesa_glsl_error(&loc, state, "atomic counter comparisons
>> forbidden");
>> +error_emitted = true;
>>
>
> Do we have spec text to back this up?  I looked around and couldn't find
> anything.  It seems like doing equality comparisons on atomic counters is
> ill-defined, though (do two counters compare equal if they have equal
> values, or if they point to the same counter?  Both possibilities seem
> dodgy).  Maybe we should file a spec bug so we can get clarification from
> khronos about what's intended.
>

It's implied by the same wording you quoted below, "Except for array
indexing, structure field selection, and parenthesis, counters are not
allowed to be operands in expressions.".

> Note that we currently permit other comparisons that are similarly dodgy
> (e.g. comparing samplers).  It would be nice to get clarification from
> khronos about this too.
>
>[...]
>> @@ -1983,7 +1995,7 @@ apply_type_qualifier_to_variable(const struct
>> ast_type_qualifier *qual,
>> }
>>
>> if (qual->flags.q.constant || qual->flags.q.attribute
>> -   || qual->flags.q.uniform
>> +   || (qual->flags.q.uniform && var->type !=
>> glsl_type::atomic_uint_type)
>> || (qual->flags.q.varying && (state->target == fragment_shader)))
>>var->read_only = 1;
>>
>
> I'm not entirely convinced this is right.  An atomic counter, like a
> sampler, is a handle to an underlying object, not the underlying object
> itself.  The handle should be considered read-only even if the object it
> refers to is mutable.  That way we still prohibit
>
I guess the question is if we want atomic ir_variables represent the
handle of an atomic counter or the atomic counter itself.  The latter
seemed to make more sense to me but you're right that the spec favours
the opposite point of view, I will fix that.

> uniform atomic_uint foo;
> uniform atomic_uint bar;
> void main() {
>foo = bar;
> }
>
>
>>[...]
>> @@ -4475,6 +4533,12 @@ ast_process_structure_or_interface_block(exec_list
>> *instructions,
>>   "uniform in non-default uniform block
>> contains sampler");
>>   }
>>
>> + if (field_type->atomic_size()) {
>> +YYLTYPE loc = decl_list->get_location();
>> +_mesa_glsl_error(&loc, state, "atomic counter in structure or
>> "
>> + "uniform block");
>> + }
>> +
>>
>
> Are you sure this is not allowed?  I can't find any spec text to back this
> up.  All I found was this text from ARB_shader_atomic_counters:
>
> "Except for array indexing, structure field selection, and parenthesis,
> counters are not allowed to be operands in expressions."
>
> Which would seem to imply that atomic counters *are* allowed as structure
> fields.
>

That's a tricky question, my impression when I first read this extension
was that it was allowed, but the spec seemed to be inconsistent in that
regard and it leaves a number of questions unanswered.

The wording you're quoting doesn't apply directly to atomic counters as
they are defined by the GL 4.2 spec, it refers to opaque types in
general:

"Except for array indexing, structure member selection, and parentheses,
 opaque variables are not allowed to be operands in expressions; such
 use results in a compile-time error."

Later on the spec requires atomic counter uniform declarations to have a
layout() specification tying them to some specific binding point, which
seems to be forbidden in structure member declarations:

"[Structure] member declarators may contain precision qualifiers, but
 use of any other qualifier results in a compile-time error."

One could imagine that the structure declaration syntax could be relaxed
in cases where a structure contains atomic counters to allow binding and
offset qualifiers on the whole structure, like:

| struct S {
|   
|   atomic_uint x;
| };
|
| layout(binding=X, offset=Y) uniform S foo;
|

The spec doesn't specify this syntax nor how atomic counters contained
in the structure would be supposed to get their bindings and offsets
assigned.  It seems it would be problematic because of the way the
binding and offset qualifiers have different meanings for different type
declarations -- E.g. what if you declare a structure containing an
atomic counter and an image2D?  Are they supposed to get the same
binding even though they refer to completely different name spaces?  Is
the binding number supposed to be post-incremented for each declaration
like is the case for image and sampler arrays, or stay constant as in
atomic counter arrays?

To summarize, the spec doesn't seem to provide any reasonable way to
declare structures with atomic counters even though it provides the
syntax for atomic structure member selection...

For uniform blocks OTOH it's quite clear that they are not all

Re: [Mesa-dev] [PATCH 04/24] ralloc: Unify overloads of the new operator and guarantee object destruction.

2013-09-17 Thread Francisco Jerez
Paul Berry  writes:

> On 15 September 2013 00:10, Francisco Jerez  wrote:
>[...]
>> @@ -70,12 +65,7 @@ class cfg_t {
>>  public:
>> static void* operator new(size_t size, void *ctx)
>> {
>> -  void *node;
>> -
>> -  node = rzalloc_size(ctx, size);
>> -  assert(node != NULL);
>> -
>> -  return node;
>> +  return ralloc_new(size, ctx);
>> }
>
> I'm worried about this one--it introduces a behavioural change.
> Previously, if a cfg_t object was reclaimed through the standard ralloc
> mechanism, cfg_t::~cfg_t() would *not* be called.  Now it will be.  Since
> cfg_t::~cfg_t() calls ralloc_free(mem_ctx), and since cfg_t's mem_ctx is
> usually (always?) the same mem_ctx that the cfg_t is contained in, I think
> that will result in double-freeing of the mem_ctx.

That's really bad...  It looks like you are right, cfg_t sort of relies
on its destructor not being called in that case...

> However, looking further into the users of cfg_t, it looks like they all
> pass it a "borrowed" mem_ctx (in other words, the caller always retains
> responsibility for freeing the mem_ctx.  So I believe we have a
> pre-existing double-free bug.  The correct solution is probably to get rid
> of the cfg_t destructor entirely.

Probably this hasn't been a problem until now because cfg_t creates its
own ralloc context as a child of its allocator's context, the former is
what's being passed to ralloc_free() from ~cfg_t(), there was no
possibility for a double-free in all cases where cfg_t's destructor used
to end up getting called.

It looks like it might be useful to keep cfg_t's own allocation context
around because its children objects seem to have a shorter lifetime than
its parent context(s), removing the destructor would keep them alive for
a longer while until the parent is destroyed...  I think that now that
destructors are going to be called by ralloc it doesn't make sense
anymore to the have cfg_t's allocation context be a child of its
parent's allocation context, it should probably take care of destroying
it from its destructor as any other memory resource, to ensure all
objects it creates are destroyed timely.

>>[...]
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_live_variables.h
>> b/src/mesa/drivers/dri/i965/brw_fs_live_variables.h
>> index 1cde5f4..74542e5 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_live_variables.h
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_live_variables.h
>> @@ -55,12 +55,7 @@ class fs_live_variables {
>>  public:
>> static void* operator new(size_t size, void *ctx)
>> {
>> -  void *node;
>> -
>> -  node = rzalloc_size(ctx, size);
>> -  assert(node != NULL);
>> -
>> -  return node;
>> +  return ralloc_new(size, ctx);
>> }
>
> I'm worried about this one too, for similar reasons.  I believe in this
> case the appropriate solution is the same: get rid of the fs_live_variables
> destructor entirely.
>>
>> fs_live_variables(fs_visitor *v, cfg_t *cfg);
>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.h
>> b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.h
>> index 438a03e..168da40 100644
>> --- a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.h
>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.h
>> @@ -54,12 +54,7 @@ class vec4_live_variables {
>>  public:
>> static void* operator new(size_t size, void *ctx)
>> {
>> -  void *node;
>> -
>> -  node = rzalloc_size(ctx, size);
>> -  assert(node != NULL);
>> -
>> -  return node;
>> +  return ralloc_new(size, ctx);
>> }
>
> The exact same situation exists here (yay code duplication).

These last two cases are only apparent problems because both objects
only seem to be allocated from the stack, the overloaded new operators
don't seem to be used at all.  I think for these two I'm going to remove
the unused new operators and fix the constructors to stop allocating
From its parent's context, if that sounds reasonable to you.

> Other than these double-free issues, I really like what you've done here.
> It's a substantial reduction in boilerplate code that's easy to get wrong.

Thank you for your very useful review. :)


pgpfFwiP8pJx0.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/6] Support for 10 bpc EGLSurface

2013-09-17 Thread Chad Versace

On 09/17/2013 04:20 PM, Kristian Høgsberg wrote:

On Tue, Sep 17, 2013 at 1:49 PM, Chad Versace
 wrote:

On 09/15/2013 12:16 AM, Kristian Høgsberg wrote:


Hi,

This little series adds support for creating EGLSurfaces with color
buffers
using the ARGB2101010 pixel format.  We the new KMS addFB2 ioctl we can
create KMS framebuffers with that format and this series ends up adding
the pixel format to gbm so we can generate buffers with that format.

The first two patches make sure we don't advertise ARGB2101010 configs
that
you can use with an ARGB X window.  The X visual to EGL config
matching just compares visual depth and EGL config buffer size, and
they're
both 32 bits for those two pixel formats.  Unless we match on the
pixel layout, we will advertise EGLConfigs with 10 bpc that you can use
with a ARGB X window.

With this patch series, I can run weston on KMS in 10 bpc, but anything
that
uses gbm will benefit from this.  We also add support for 10 bpc
GLX pixmaps and pbuffers.

Kristian



(CC'ing Ian to alert him about the EGLConfig sort order.)

This series looks good to me.
Reviewed-by: Chad Versace 

Something to note is that eglChooseConfig(r=8, g=8, b=8) sorts the
RGBA1010102 EGLConfigs *before* the RGBA configs. I confirmed
this on GBM. The EGL spec requires that behavior, but I find it
surprising.

Someone else found it surprising too and complained
loudly to Khronos, resulting in Footnote 8 on page 26 of the
EGL 1.4 2013-02-11 spec:

"[fn8] This rule places configs with deeper color buffers first in the list
returned by eglChooseConfig.
Applications may find this counterintuitive if they expect configs with
smaller buffer sizes to be
returned first."


Yup, I hit that with weston too.  It's not surprising that it works
that way, but I think a lot of applications assume that there will be
no 10 bit configs.  For gbm, I'm thinking that we may want to put the
GBM format code into the EGL_NATIVE_VISUAL_ID config attribute so you
can choose a config, then read out the id and use it when creating the
GBM surface.  Or alternatively, pick a GBM format first and then look
for a config where the visual ID matches.


Storing the GBM format in EGL_NATIVE_VISUAL_ID sounds like a good idea.




Luckily for now, this surprise will hurt no one on X because X/EGL
advertises no RGBA1010102 configs.


It does and GLX does too with these patches.  The configs are only
usable with pixmaps or pbuffers, not windows (unless the window visual
is 10 bit too).  This could still break things under X where an
application may pick an EGLConfig or GLXFBConfig and get a 10 bit
config and then use that to create a GLXPixmap, then later try to
XCopyArea or such from the pixmap.  It may be safer to disable this
under X...


I don't think the existence of the 1010102 GLXFBConfigs will
cause any problems, because glxinfo shows that they are sorted
after the  configs.

I'm unable to find any 1010102 EGLConfigs under X/EGL. Here's the proof,
where I print the result of eglGetConfigs. (And yes, I restarted the Xserver
after installing Mesa).

  i:  id   r   g   b   a
-
  0:  14   8   8   8   8
  1:  18   8   8   8   8
  2:  36   8   8   8   8
  3:  50   8   8   8   8
  4:  52   8   8   8   8
  5:  54   8   8   8   8
  6:  56   8   8   8   8
  7:  78   8   8   8   8
  8:  82   8   8   8   8
  9: 100   8   8   8   8
 10: 114   8   8   8   8
 11: 116   8   8   8   8
 12: 118   8   8   8   8
 13: 120   8   8   8   8

Ditto for Weston. I see no 1010102 configs. Am I doing something wrong?
GBM was able to find one, because according to the Weston log it's
using a 1010102 EGLConfig.

  i:  id   r   g   b   a
-
  0:   7   8   8   8   8
  1:   9   8   8   8   8
  2:  18   8   8   8   8
  3:  25   8   8   8   8
  4:  26   8   8   8   8
  5:  27   8   8   8   8
  6:  28   8   8   8   8
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] Gallium debugging help

2013-09-17 Thread James Simmons

Hello.

I'm the main developer from the Openchrome project and we have 
reached the point were work is being started for the Gallium driver for 
VIA hardware. To help develope a stable simple start I decided to go for
bare bones libgbm support using a gallium backend. So like any start my
simple libgbm app fails at the start. Well to figure it out I need to 
debug my gallium driver. So I started to read up on how to debug gallium 
and so I attempted to use the trace pipe driver. So the command I used
was

GALLIUM_TRACE=~/gallium.trace;./gbmtest

but no XML file was generated. What am I doing wrong? How do I properly 
debug? Thanks for your help. Once I get this working I will push to 
master.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/3] util/u_blit: Implement util_blit_pixels via pipe_context::blit.

2013-09-17 Thread Zack Rusin
The entire series looks good to me.

Reviewed-by: Zack Rusin 

- Original Message -
> From: José Fonseca 
> 
> This removes a lot of code, but not everything, as util_blit_pixels_tex
> is still useful when one needs to override pipe_sampler_view::swizzle_?.
> ---
>  src/gallium/auxiliary/util/u_blit.c | 447
>  +++-
>  1 file changed, 37 insertions(+), 410 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/util/u_blit.c
> b/src/gallium/auxiliary/util/u_blit.c
> index e9bec4a..4ba71b9 100644
> --- a/src/gallium/auxiliary/util/u_blit.c
> +++ b/src/gallium/auxiliary/util/u_blit.c
> @@ -57,29 +57,20 @@ struct blit_state
> struct pipe_context *pipe;
> struct cso_context *cso;
>  
> -   struct pipe_blend_state blend_write_color, blend_keep_color;
> +   struct pipe_blend_state blend_write_color;
> struct pipe_depth_stencil_alpha_state dsa_keep_depthstencil;
> -   struct pipe_depth_stencil_alpha_state dsa_write_depthstencil;
> -   struct pipe_depth_stencil_alpha_state dsa_write_depth;
> -   struct pipe_depth_stencil_alpha_state dsa_write_stencil;
> struct pipe_rasterizer_state rasterizer;
> struct pipe_sampler_state sampler;
> struct pipe_viewport_state viewport;
> struct pipe_vertex_element velem[2];
> -   enum pipe_texture_target internal_target;
>  
> void *vs;
> void *fs[PIPE_MAX_TEXTURE_TYPES][TGSI_WRITEMASK_XYZW + 1];
> -   void *fs_depthstencil[PIPE_MAX_TEXTURE_TYPES];
> -   void *fs_depth[PIPE_MAX_TEXTURE_TYPES];
> -   void *fs_stencil[PIPE_MAX_TEXTURE_TYPES];
>  
> struct pipe_resource *vbuf;  /**< quad vertices */
> unsigned vbuf_slot;
>  
> float vertices[4][2][4];   /**< vertex/texcoords for quad */
> -
> -   boolean has_stencil_export;
>  };
>  
>  
> @@ -103,20 +94,6 @@ util_create_blit(struct pipe_context *pipe, struct
> cso_context *cso)
> /* disabled blending/masking */
> ctx->blend_write_color.rt[0].colormask = PIPE_MASK_RGBA;
>  
> -   /* depth stencil states */
> -   ctx->dsa_write_depth.depth.enabled = 1;
> -   ctx->dsa_write_depth.depth.writemask = 1;
> -   ctx->dsa_write_depth.depth.func = PIPE_FUNC_ALWAYS;
> -   ctx->dsa_write_stencil.stencil[0].enabled = 1;
> -   ctx->dsa_write_stencil.stencil[0].func = PIPE_FUNC_ALWAYS;
> -   ctx->dsa_write_stencil.stencil[0].fail_op = PIPE_STENCIL_OP_REPLACE;
> -   ctx->dsa_write_stencil.stencil[0].zpass_op = PIPE_STENCIL_OP_REPLACE;
> -   ctx->dsa_write_stencil.stencil[0].zfail_op = PIPE_STENCIL_OP_REPLACE;
> -   ctx->dsa_write_stencil.stencil[0].valuemask = 0xff;
> -   ctx->dsa_write_stencil.stencil[0].writemask = 0xff;
> -   ctx->dsa_write_depthstencil.depth = ctx->dsa_write_depth.depth;
> -   ctx->dsa_write_depthstencil.stencil[0] =
> ctx->dsa_write_stencil.stencil[0];
> -
> /* rasterizer */
> ctx->rasterizer.cull_face = PIPE_FACE_NONE;
> ctx->rasterizer.half_pixel_center = 1;
> @@ -147,14 +124,6 @@ util_create_blit(struct pipe_context *pipe, struct
> cso_context *cso)
>ctx->vertices[i][1][3] = 1.0f; /* q */
> }
>  
> -   if(pipe->screen->get_param(pipe->screen, PIPE_CAP_NPOT_TEXTURES))
> -  ctx->internal_target = PIPE_TEXTURE_2D;
> -   else
> -  ctx->internal_target = PIPE_TEXTURE_RECT;
> -
> -   ctx->has_stencil_export =
> -  pipe->screen->get_param(pipe->screen, PIPE_CAP_SHADER_STENCIL_EXPORT);
> -
> return ctx;
>  }
>  
> @@ -178,18 +147,6 @@ util_destroy_blit(struct blit_state *ctx)
>}
> }
>  
> -   for (i = 0; i < PIPE_MAX_TEXTURE_TYPES; i++) {
> -  if (ctx->fs_depthstencil[i]) {
> - pipe->delete_fs_state(pipe, ctx->fs_depthstencil[i]);
> -  }
> -  if (ctx->fs_depth[i]) {
> - pipe->delete_fs_state(pipe, ctx->fs_depth[i]);
> -  }
> -  if (ctx->fs_stencil[i]) {
> - pipe->delete_fs_state(pipe, ctx->fs_stencil[i]);
> -  }
> -   }
> -
> pipe_resource_reference(&ctx->vbuf, NULL);
>  
> FREE(ctx);
> @@ -217,63 +174,6 @@ set_fragment_shader(struct blit_state *ctx, uint
> writemask,
>  
>  
>  /**
> - * Helper function to set the shader which writes depth and stencil.
> - */
> -static INLINE void
> -set_depthstencil_fragment_shader(struct blit_state *ctx,
> - enum pipe_texture_target pipe_tex)
> -{
> -   if (!ctx->fs_depthstencil[pipe_tex]) {
> -  unsigned tgsi_tex = util_pipe_tex_to_tgsi_tex(pipe_tex, 0);
> -
> -  ctx->fs_depthstencil[pipe_tex] =
> - util_make_fragment_tex_shader_writedepthstencil(ctx->pipe,
> tgsi_tex,
> -  TGSI_INTERPOLATE_LINEAR);
> -   }
> -
> -   cso_set_fragment_shader_handle(ctx->cso, ctx->fs_depthstencil[pipe_tex]);
> -}
> -
> -
> -/**
> - * Helper function to set the shader which writes depth.
> - */
> -static INLINE void
> -set_depth_fragment_shader(struct blit_state *ctx,
> -  enum pipe_texture_target pipe_tex)
> -{
> -   if (!ctx->fs_depth[pipe_tex]) {
> -  unsigned tgsi_tex = util_pipe_tex_to_tgsi_te

Re: [Mesa-dev] [PATCH 3/3] glsl: Drop shader_bit_encoding version checks.

2013-09-17 Thread Kenneth Graunke
On 09/16/2013 02:20 PM, Paul Berry wrote:
> On 13 September 2013 11:25, Kenneth Graunke  > wrote:
> 
> We now set the ARB_shader_bit_encoding flag for versions that support
> this functionality, so we don't need to double check it here.
> 
> Signed-off-by: Kenneth Graunke  >
> Cc: Ian Romanick mailto:i...@freedesktop.org>>
> ---
>  src/glsl/builtin_functions.cpp | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/src/glsl/builtin_functions.cpp
> b/src/glsl/builtin_functions.cpp
> index c468bd5..b020a7c 100644
> --- a/src/glsl/builtin_functions.cpp
> +++ b/src/glsl/builtin_functions.cpp
> @@ -182,8 +182,7 @@ shader_texture_lod_and_rect(const
> _mesa_glsl_parse_state *state)
>  static bool
>  shader_bit_encoding(const _mesa_glsl_parse_state *state)
>  {
> -   return state->is_version(330, 300) ||
> -  state->ARB_shader_bit_encoding_enable ||
> +   return state->ARB_shader_bit_encoding_enable ||
>state->ARB_gpu_shader5_enable;
>  }
> 
> --
> 1.8.3.4
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org 
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
> 
> I'm not a huge fan of this approach.  We're currently inconsistent in
> Mesa as to how we handle GLSL extensions that were adopted into later
> versions of GLSL (or backports from later versions of GLSL).  For these
> extensions:
> 
> ARB_draw_instanced
> ARB_fragment_coord_conventions
> ARB_gpu_shader5
> ARB_shader_texture_lod
> ARB_shading_language_420pack
> ARB_shading_language_packing
> ARB_texture_cube_map_array
> ARB_texture_multisample
> OES_texture_3D
> 
> we use what I'll call the "enable means explicitly enabled" style, which
> means we only set the "_enable" flag if the shader contains text like
> "#extension ARB_foo: enable"; if the extension's functionality is
> provided by the given version of GLSL, we fold that into the if-test
> that enables the functionality--e.g.:
> 
>if (state->ARB_draw_instanced_enable || state->is_version(140, 300))
>   add_system_value(SYSTEM_VALUE_INSTANCE_ID, int_t, "gl_InstanceID");
> 
> But for these extensions:
> 
> ARB_explicit_attrib_location
> ARB_texture_rectangle
> 
> we use what I'll call the "enable means available" style, which means
> that we set the "_enable" flag when processing the version directive (or
> in the _mesa_glsl_parse_state constructor), to indicate that the
> functionality is available, whether or not the user explicitly requested
> it or not.
> 
> (Note: for ARB_uniform_buffer_object it looks like we do some of each
> style!)
> 
> 
> Personally I'd prefer to see us consistently adopt the "enable means
> explicitly enabled" style (to me, the "enable means available" style
> feels like we're fibbing to ourselves).  An additional advantage of the
> "enable means explicitly enabled" style is that it allows us to handle
> cases (such as in ARB_draw_instanced) where the extension differs
> slightly from the functionality that was eventually folded into GLSL. 
> Another advantage is that if a client-supplied shader does something
> silly like "#extension ARB_draw_instanced: disable" in a GLSL 1.40+
> shader, we won't accidentally disable built-in functionality (although
> in practice this is extremely unlikely to ever crop up).
> 
> If it becomes too onerous to add the "|| state->is_version(...)" checks
> all over the place we can always make some inline functions, e.g.:
> 
> struct _mesa_glsl_parse_state {
>...
>bool is_draw_instanced_available() const
>{
>   return this->ARB_draw_instanced_enable || this->is_version(140, 300);
>}
>...
> };

I also like this style a lot better than mashing the extension enables.
 Now that you, Ian, and I are all in agreement, I'll plan on writing
some new patches that switch to this style.

--Ken
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/hsw: compute DDX in a subspan based only on top row

2013-09-17 Thread Chia-I Wu
On Wed, Sep 18, 2013 at 6:27 AM, Mark Mueller  wrote:
> On Mon, Sep 16, 2013 at 1:31 AM, Chia-I Wu  wrote:
>>
>> On Mon, Sep 16, 2013 at 4:12 PM, Chia-I Wu  wrote:
>> > On Mon, Sep 16, 2013 at 3:50 AM, Mark Mueller 
>> > wrote:
>> >>
>> >>
>> >>
>> >> On Fri, Sep 13, 2013 at 2:15 PM, Paul Berry 
>> >> wrote:
>> >>>
>> >>> On 12 September 2013 22:06, Chia-I Wu  wrote:
>> 
>>  From: Chia-I Wu 
>>
>> >>
>> >>
>> >> This scenario is where I'd place my bets, especially given that the
>> >> numbers
>> >> are based on Xonotic. I benchmarked this patch using Xonotic on Bay
>> >> Trail as
>> >> is and by replacing !brw->is_haswell with !brw->is_baytrail. With ultra
>> >> and
>> >> ultimate levels at medium and high resolutions, the results were all
>> >> essentially the same at comparable resolutions and quality levels.
>> > Isn't Bay Trail based on Ivy Bridge?
>> For Bay Trail, this might help you
>>
>>
>> http://lists.freedesktop.org/archives/mesa-dev/2013-September/044288.html
>>
>> if you are interested.
>
>
> Testing with Bay Trail shows no performance improvement with this patch.
> Most likely there are one or more CPU bottlenecks on Bay Tail that hide a
> majority of the performance gains of this change.
And no performance lost?  It could also be

 - the gain from SIMD16 was even out by the math ops
 - the lowering did not kick in because of one of the conditional checks
 - the game did not run in Ultra or Ultimate mode

I think the discussion belongs to that other thread.

>
>



-- 
o...@lunarg.com
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/3] util/u_blit: Support blits from cubemaps.

2013-09-17 Thread Roland Scheidegger
Am 17.09.2013 20:32, schrieb jfons...@vmware.com:
> From: José Fonseca 
> 
> By calling util_map_texcoords2d_onto_cubemap.
> 
> A new parameter for util_blit_pixels_tex is necessary, as
> pipe_sampler_view::first_layer is always supposed to point to the first
> face when sampling from cubemaps.
> ---
>  src/gallium/auxiliary/util/u_blit.c | 34 +++---
>  src/gallium/auxiliary/util/u_blit.h |  1 +
>  2 files changed, 32 insertions(+), 3 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/util/u_blit.c 
> b/src/gallium/auxiliary/util/u_blit.c
> index 8cc080c..e9bec4a 100644
> --- a/src/gallium/auxiliary/util/u_blit.c
> +++ b/src/gallium/auxiliary/util/u_blit.c
> @@ -46,6 +46,7 @@
>  #include "util/u_math.h"
>  #include "util/u_memory.h"
>  #include "util/u_sampler.h"
> +#include "util/u_texture.h"
>  #include "util/u_simple_shaders.h"
>  
>  #include "cso_cache/cso_context.h"
> @@ -143,7 +144,6 @@ util_create_blit(struct pipe_context *pipe, struct 
> cso_context *cso)
> /* init vertex data that doesn't change */
> for (i = 0; i < 4; i++) {
>ctx->vertices[i][0][3] = 1.0f; /* w */
> -  ctx->vertices[i][1][2] = 0.0f; /* r */
>ctx->vertices[i][1][3] = 1.0f; /* q */
> }
>  
> @@ -327,6 +327,8 @@ get_next_slot( struct blit_state *ctx )
>   */
>  static unsigned
>  setup_vertex_data_tex(struct blit_state *ctx,
> +  unsigned src_target,
> +  unsigned src_face,
>float x0, float y0, float x1, float y1,
>float s0, float t0, float s1, float t1,
>float z)
> @@ -338,24 +340,37 @@ setup_vertex_data_tex(struct blit_state *ctx,
> ctx->vertices[0][0][2] = z;
> ctx->vertices[0][1][0] = s0; /*s*/
> ctx->vertices[0][1][1] = t0; /*t*/
> +   ctx->vertices[0][1][2] = 0;  /*r*/
>  
> ctx->vertices[1][0][0] = x1;
> ctx->vertices[1][0][1] = y0;
> ctx->vertices[1][0][2] = z;
> ctx->vertices[1][1][0] = s1; /*s*/
> ctx->vertices[1][1][1] = t0; /*t*/
> +   ctx->vertices[1][1][2] = 0;  /*r*/
>  
> ctx->vertices[2][0][0] = x1;
> ctx->vertices[2][0][1] = y1;
> ctx->vertices[2][0][2] = z;
> ctx->vertices[2][1][0] = s1;
> ctx->vertices[2][1][1] = t1;
> +   ctx->vertices[3][1][2] = 0;
>  
> ctx->vertices[3][0][0] = x0;
> ctx->vertices[3][0][1] = y1;
> ctx->vertices[3][0][2] = z;
> ctx->vertices[3][1][0] = s0;
> ctx->vertices[3][1][1] = t1;
> +   ctx->vertices[3][1][2] = 0;
> +
> +   if (src_target == PIPE_TEXTURE_CUBE ||
> +   src_target == PIPE_TEXTURE_CUBE_ARRAY) {
> +  /* Map cubemap texture coordinates inplace. */
> +  const unsigned stride = sizeof ctx->vertices[0] / sizeof 
> ctx->vertices[0][0][0];
> +  util_map_texcoords2d_onto_cubemap(src_face,
> +&ctx->vertices[0][1][0], stride,
> +&ctx->vertices[0][1][0], stride);
> +   }
>  
> offset = get_next_slot( ctx );
>  
> @@ -770,6 +785,8 @@ util_blit_pixels(struct blit_state *ctx,
>  
> /* draw quad */
> offset = setup_vertex_data_tex(ctx,
> +  sampler_view->texture->target,
> +  srcZ0 % 6,
>(float) dstX0 / dst_surface->width * 2.0f 
> - 1.0f,
>(float) dstY0 / dst_surface->height * 2.0f 
> - 1.0f,
>(float) dstX1 / dst_surface->width * 2.0f 
> - 1.0f,
> @@ -811,16 +828,25 @@ util_blit_pixels(struct blit_state *ctx,
>  
>  
>  /**
> - * Copy pixel block from src texture to dst surface.
> + * Copy pixel block from src sampler view to dst surface.
> + *
>   * The sampler view's first_level field indicates the source
>   * mipmap level to use.
> - * XXX need some control over blitting Z and/or stencil.
> + *
> + * The sampler view's first_layer indicate the layer to use, but for
> + * cube maps it must point to the first face.  Face is passed in src_face.
> + *
> + * The main advantage over util_blit_pixels is that it allows to specify 
> swizzles in
> + * pipe_sampler_view::swizzle_?.
> + *
> + * But there is no control over blitting Z and/or stencil.
>   */
>  void
>  util_blit_pixels_tex(struct blit_state *ctx,
>   struct pipe_sampler_view *src_sampler_view,
>   int srcX0, int srcY0,
>   int srcX1, int srcY1,
> + unsigned src_face,
>   struct pipe_surface *dst,
>   int dstX0, int dstY0,
>   int dstX1, int dstY1,
> @@ -922,6 +948,8 @@ util_blit_pixels_tex(struct blit_state *ctx,
>  
> /* draw quad */
> offset = setup_vertex_data_tex(ctx,
> +  src_sampler_view->texture->target,
> +  src_face,
>(float) dstX0 / dst->width * 2.0f - 1.0f,
> 

Re: [Mesa-dev] [PATCH] i965/hsw: compute DDX in a subspan based only on top row

2013-09-17 Thread Chia-I Wu
Hi Paul,

On Mon, Sep 16, 2013 at 3:46 PM, Chia-I Wu  wrote:
> On Sat, Sep 14, 2013 at 5:15 AM, Paul Berry  wrote:
>> On 12 September 2013 22:06, Chia-I Wu  wrote:
>>>
>>> From: Chia-I Wu 
>>>
>>> Consider only the top-left and top-right pixels to approximate DDX in a
>>> 2x2
>>> subspan, unless the application or the user requests a more accurate
>>> approximation.  This results in a less accurate approximation.  However,
>>> it
>>> improves the performance of Xonotic with Ultra settings by 24.3879% +/-
>>> 0.832202% (at 95.0% confidence) on Haswell.  No noticeable image quality
>>> difference observed.
>>>
>>> No piglit gpu.tests regressions (tested with v1)
>>>
>>> I failed to come up with an explanation for the performance difference, as
>>> the
>>> change does not affect Ivy Bridge.  If anyone has the insight, please
>>> kindly
>>> enlighten me.  Performance differences may also be observed on other games
>>> that call textureGrad and dFdx.
>>>
>>> v2: Honor GL_FRAGMENT_SHADER_DERIVATIVE_HINT and add a drirc option.
>>> Update
>>> comments.
>>
>>
>> I'm not entirely comfortable making a change that has a known negative
>> impact on computational accuracy (even one that leads to such an impressive
>> performance improvement) when we don't have any theories as to why the
>> performance improvement happens, or why the improvement doesn't apply to Ivy
>> Bridge.  In my experience, making changes to the codebase without
>> understanding why they improve things almost always leads to improvements
>> that are brittle, since it's likely that the true source of the improvement
>> is a coincidence that will be wiped out by some future change (or won't be
>> relevant to client programs other than this particular benchmark).  Having a
>> theory as to why the performance improvement happens would help us be
>> confident that we're applying the right fix under the right circumstances.
> That is how I feel as I've mentioned.  I am really glad to have the
> discussion.  I have done some experiments actually.  It is just that
> those experiments only tell me what theories are likely to be wrong.
> They could not tell me if a theory is right.
Do the experiments make sense to you?  What other experiments do you
want to see conducted?

It could be hard to get direct proof without knowing the internal working..

>
> So I have a micro benchmark that draws a 256x256 texture to a 512x512
> window, using texture2D() or textureGrad().  It can also set up the
> vertex buffer such that the texture is rotated around the z-axis by 15
> degrees.
>
> On Haswell, when the texture is not rotated, rendering with
> textureGrad() is less than 1% slower than rendering with texture2D().
> The slowdown could be from the additional instructions to calculate
> DDX/DDY or the extra message payload.  Whether this patch is applied
> or not does not make any difference.
>
> When the texture is rotated, rendering with textureGrad() is ~3%
> slower than rendering with texture2D() without the patch.  With the
> patch, the difference is down to less than 1% again.
>
> Computing LOD in the shader results in ~17% slowdown comparing to 
> textureGrad().
>
> As a better way to control the result of DDX, I also hacked the driver
> so that DDX produced the values I specified for each pixel.  When not
> all pixels in the subspan have the same gradient, rendering is ~6%
> slower comparing to when all pixels in the subspan have the same
> gradient.
>
> The weird thing is, in SIMD8 mode, two subspans are processed at the
> same time.  When all pixels in one of the subspan have the same
> gradient, whether the pixels in the other subspan have the same
> gradient or not does not matter.
>
> As for Ivy Bridge, rendering with textureGrad() is always
> significantly slower than rendering with texture2D().  Computing LOD
> in the shader results in another ~4% slowdown comparing to
> textureGrad().
>
>>
>> For example, here's one theory as to why we might be seeing an improvement:
>> perhaps Haswell's sample_d processing is smart enough to realize that when
>> all the gradient values within a sub-span are the same, that means that all
>> of the sampling for the sub-span will come from the same LOD, and that
>> allows it to short-cut some expensive step in the LOD calculation.  Perhaps
>> the same improvement isn't seen on Ivy Bridge because Ivy Bridge's sample_d
>> processing logic is less sophisticated, so it's unable to perform the
>> optimization.  If this is the case, then conditioning the optimization on
>> brw->is_haswell (as you've done) makes sense.
> This is also the theory I have and my experiments could not rule it
> out.  The question I have is, if LODs of all pixels were calculated
> parallely, could the short cut help this much?  I don't have enough
> knowledge in hardware to know the answer or even to know this is a
> question or not.
>
>>
>> Another possible explanation for the Haswell vs Ivy Bridge difference is
>> that perhaps Ivy Bridge, being