Re: [Mesa-dev] RFC: Haswell resource streamer/hw-generated binding tables (v2)

2013-10-14 Thread Abdiel Janulgue
On Friday, October 11, 2013 11:39:53 AM Eric Anholt wrote:

> 
> As I understand it, the thing that you think will make this eventually
> actually improve performance is state flagging that indicates which
> individual surfaces need updating.  Since that should improve
> performance even in the non-resource-streamer case, it can be pursued
> independently.

These are exactly the comments I need. Thanks anyway,
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH V2 0/7] ARB_gpu_shader5 textureGather*, Part 2

2013-10-14 Thread Chris Forbes
This series adds new textureGatherOffset overloads, and support for
nonconstant offsets.

V2: - Update for general churn + send-from-grf
- Reinstate assert on nonconstant offset if ARB_gpu_shader5 not supported.
  This ought not to actually get hit as the GLSL frontend insists on a
  constant expression.



[Mesa-dev] [PATCH V2 1/7] glsl: Add ARB_gpu_shader5 textureGatherOffset signatures

2013-10-14 Thread Chris Forbes
- gsampler2DRect
- optional `comp` parameter

Signed-off-by: Chris Forbes 
Reviewed-by: Ian Romanick 
---
 src/glsl/builtin_functions.cpp | 16 
 1 file changed, 16 insertions(+)

diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
index d40888d..aa40876 100644
--- a/src/glsl/builtin_functions.cpp
+++ b/src/glsl/builtin_functions.cpp
@@ -1919,6 +1919,22 @@ builtin_builder::create_builtins()
 _texture(ir_tg4, texture_gather, glsl_type::vec4_type, 
glsl_type::sampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET),
 _texture(ir_tg4, texture_gather, glsl_type::ivec4_type, 
glsl_type::isampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET),
 _texture(ir_tg4, texture_gather, glsl_type::uvec4_type, 
glsl_type::usampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET),
+
+_texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, 
glsl_type::sampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET),
+_texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, 
glsl_type::isampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET),
+_texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, 
glsl_type::usampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET),
+
+_texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, 
glsl_type::sampler2D_type, glsl_type::vec2_type, TEX_OFFSET | TEX_COMPONENT),
+_texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, 
glsl_type::isampler2D_type, glsl_type::vec2_type, TEX_OFFSET | TEX_COMPONENT),
+_texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, 
glsl_type::usampler2D_type, glsl_type::vec2_type, TEX_OFFSET | TEX_COMPONENT),
+
+_texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, 
glsl_type::sampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET | 
TEX_COMPONENT),
+_texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, 
glsl_type::isampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET | 
TEX_COMPONENT),
+_texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, 
glsl_type::usampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET | 
TEX_COMPONENT),
+
+_texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, 
glsl_type::sampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET | 
TEX_COMPONENT),
+_texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, 
glsl_type::isampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET | 
TEX_COMPONENT),
+_texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, 
glsl_type::usampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET | 
TEX_COMPONENT),
 NULL);
 
F(dFdx)
-- 
1.8.4



[Mesa-dev] [PATCH V2 2/7] glsl: relax const offset requirement for textureGatherOffset

2013-10-14 Thread Chris Forbes
Prior to ARB_gpu_shader5 / GLSL 4.0, the offset is required to be
a constant expression.

With that extension, it is relaxed to be an arbitrary expression.
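
For illustration only (not part of the patch): a minimal standalone
sketch of how the two availability predicates split the overloads.
ParseState is a hypothetical stand-in for _mesa_glsl_parse_state, and the
body of gpu_shader5() is an assumption, since that predicate is not shown
in this mail.

```cpp
#include <cassert>

// Hypothetical stand-in for _mesa_glsl_parse_state; only the fields used here.
struct ParseState {
   unsigned glsl_version;            // e.g. 330 or 400
   bool ARB_texture_gather_enable;
   bool ARB_gpu_shader5_enable;

   bool is_version(unsigned required) const { return glsl_version >= required; }
};

// Constant-offset signatures: ARB_texture_gather without GLSL 4.00 or
// ARB_gpu_shader5, mirroring the predicate added by the patch.
static bool texture_gather_only(const ParseState &s)
{
   return !s.is_version(400) &&
          !s.ARB_gpu_shader5_enable &&
          s.ARB_texture_gather_enable;
}

// Nonconstant-offset signatures (assumed definition: GLSL 4.00 or the extension).
static bool gpu_shader5(const ParseState &s)
{
   return s.is_version(400) || s.ARB_gpu_shader5_enable;
}
```

Under these assumptions at most one of the two predicates holds for a
given shader, so a call never sees both the constant-offset and the
nonconstant-offset overload of the same signature.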

Signed-off-by: Chris Forbes 
Reviewed-by: Ian Romanick 
---
 src/glsl/builtin_functions.cpp | 61 --
 1 file changed, 41 insertions(+), 20 deletions(-)

diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
index aa40876..db6a0a9 100644
--- a/src/glsl/builtin_functions.cpp
+++ b/src/glsl/builtin_functions.cpp
@@ -277,6 +277,17 @@ texture_gather(const _mesa_glsl_parse_state *state)
   state->ARB_gpu_shader5_enable;
 }
 
+/* Only ARB_texture_gather but not GLSL 4.0 or ARB_gpu_shader5.
+ * used for relaxation of const offset requirements.
+ */
+static bool
+texture_gather_only(const _mesa_glsl_parse_state *state)
+{
+   return !state->is_version(400, 0) &&
+  !state->ARB_gpu_shader5_enable &&
+  state->ARB_texture_gather_enable;
+}
+
 /* Desktop GL or OES_standard_derivatives + fragment shader only */
 static bool
 fs_oes_derivatives(const _mesa_glsl_parse_state *state)
@@ -495,6 +506,7 @@ private:
 #define TEX_PROJECT 1
 #define TEX_OFFSET  2
 #define TEX_COMPONENT 4
+#define TEX_OFFSET_NONCONST 8
 
ir_function_signature *_texture(ir_texture_opcode opcode,
builtin_available_predicate avail,
@@ -1912,29 +1924,37 @@ builtin_builder::create_builtins()
 NULL);
 
add_function("textureGatherOffset",
-_texture(ir_tg4, texture_gather, glsl_type::vec4_type, 
glsl_type::sampler2D_type, glsl_type::vec2_type, TEX_OFFSET),
-_texture(ir_tg4, texture_gather, glsl_type::ivec4_type, 
glsl_type::isampler2D_type, glsl_type::vec2_type, TEX_OFFSET),
-_texture(ir_tg4, texture_gather, glsl_type::uvec4_type, 
glsl_type::usampler2D_type, glsl_type::vec2_type, TEX_OFFSET),
+_texture(ir_tg4, texture_gather_only, glsl_type::vec4_type, 
glsl_type::sampler2D_type, glsl_type::vec2_type, TEX_OFFSET),
+_texture(ir_tg4, texture_gather_only, glsl_type::ivec4_type, 
glsl_type::isampler2D_type, glsl_type::vec2_type, TEX_OFFSET),
+_texture(ir_tg4, texture_gather_only, glsl_type::uvec4_type, 
glsl_type::usampler2D_type, glsl_type::vec2_type, TEX_OFFSET),
+
+_texture(ir_tg4, texture_gather_only, glsl_type::vec4_type, 
glsl_type::sampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET),
+_texture(ir_tg4, texture_gather_only, glsl_type::ivec4_type, 
glsl_type::isampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET),
+_texture(ir_tg4, texture_gather_only, glsl_type::uvec4_type, 
glsl_type::usampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET),
+
+_texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, 
glsl_type::sampler2D_type, glsl_type::vec2_type, TEX_OFFSET_NONCONST),
+_texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, 
glsl_type::isampler2D_type, glsl_type::vec2_type, TEX_OFFSET_NONCONST),
+_texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, 
glsl_type::usampler2D_type, glsl_type::vec2_type, TEX_OFFSET_NONCONST),
 
-_texture(ir_tg4, texture_gather, glsl_type::vec4_type, 
glsl_type::sampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET),
-_texture(ir_tg4, texture_gather, glsl_type::ivec4_type, 
glsl_type::isampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET),
-_texture(ir_tg4, texture_gather, glsl_type::uvec4_type, 
glsl_type::usampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET),
+_texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, 
glsl_type::sampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET_NONCONST),
+_texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, 
glsl_type::isampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET_NONCONST),
+_texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, 
glsl_type::usampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET_NONCONST),
 
-_texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, 
glsl_type::sampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET),
-_texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, 
glsl_type::isampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET),
-_texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, 
glsl_type::usampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET),
+_texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, 
glsl_type::sampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET_NONCONST),
+_texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, 
glsl_type::isampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET_NONCONST),
+_texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, 
glsl_type::usampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET_NONCONST),
 
-_texture(ir_tg4, gpu_shader5, glsl_type::v

[Mesa-dev] [PATCH V2 3/7] i965: add missing tg4 case in brw_instruction_name

2013-10-14 Thread Chris Forbes
Signed-off-by: Chris Forbes 
Reviewed-by: Ian Romanick 
---
 src/mesa/drivers/dri/i965/brw_shader.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
b/src/mesa/drivers/dri/i965/brw_shader.cpp
index 61c4bf5..19500d1 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -437,6 +437,8 @@ brw_instruction_name(enum opcode op)
   return "txb";
case SHADER_OPCODE_TXF_MS:
   return "txf_ms";
+   case SHADER_OPCODE_TG4:
+  return "tg4";
 
case FS_OPCODE_DDX:
   return "ddx";
-- 
1.8.4



[Mesa-dev] [PATCH V2 4/7] i965: Add SHADER_OPCODE_TG4_OFFSET for gather with nonconstant offsets.

2013-10-14 Thread Chris Forbes
The generator code ends up clearer this way than if we had to sniff
via the message length. Implemented via the gather4_po message in
hardware, which is present in Gen7 and later.

Signed-off-by: Chris Forbes 
Reviewed-by: Ian Romanick 
---
 src/mesa/drivers/dri/i965/brw_defines.h  | 1 +
 src/mesa/drivers/dri/i965/brw_fs.cpp | 1 +
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp   | 8 +++-
 src/mesa/drivers/dri/i965/brw_shader.cpp | 5 -
 src/mesa/drivers/dri/i965/brw_vec4.cpp   | 1 +
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 7 ++-
 6 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index c1e7f31..f1ea736 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -771,6 +771,7 @@ enum opcode {
SHADER_OPCODE_TXF_MS,
SHADER_OPCODE_LOD,
SHADER_OPCODE_TG4,
+   SHADER_OPCODE_TG4_OFFSET,
 
SHADER_OPCODE_SHADER_TIME_ADD,
 
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index e5d6e4b..a0e4624 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -753,6 +753,7 @@ fs_visitor::implied_mrf_writes(fs_inst *inst)
case SHADER_OPCODE_TXF:
case SHADER_OPCODE_TXF_MS:
case SHADER_OPCODE_TG4:
+   case SHADER_OPCODE_TG4_OFFSET:
case SHADER_OPCODE_TXL:
case SHADER_OPCODE_TXS:
case SHADER_OPCODE_LOD:
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 746c873..d1a9370 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -436,6 +436,10 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg 
dst, struct brw_reg src
  assert(brw->gen >= 6);
  msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4;
  break;
+  case SHADER_OPCODE_TG4_OFFSET:
+ assert(brw->gen >= 7);
+ msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO;
+ break;
   default:
 assert(!"not reached");
 break;
@@ -550,7 +554,8 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg 
dst, struct brw_reg src
   }
}
 
-   uint32_t surface_index = inst->opcode == SHADER_OPCODE_TG4
+   uint32_t surface_index = (inst->opcode == SHADER_OPCODE_TG4 ||
+  inst->opcode == SHADER_OPCODE_TG4_OFFSET)
   ? SURF_INDEX_GATHER_TEXTURE(inst->sampler)
   : SURF_INDEX_TEXTURE(inst->sampler);
 
@@ -1501,6 +1506,7 @@ fs_generator::generate_code(exec_list *instructions)
   case SHADER_OPCODE_TXS:
   case SHADER_OPCODE_LOD:
   case SHADER_OPCODE_TG4:
+  case SHADER_OPCODE_TG4_OFFSET:
 generate_tex(inst, dst, src[0]);
 break;
   case FS_OPCODE_DDX:
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
b/src/mesa/drivers/dri/i965/brw_shader.cpp
index 19500d1..6b37f58 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -439,6 +439,8 @@ brw_instruction_name(enum opcode op)
   return "txf_ms";
case SHADER_OPCODE_TG4:
   return "tg4";
+   case SHADER_OPCODE_TG4_OFFSET:
+  return "tg4_offset";
 
case FS_OPCODE_DDX:
   return "ddx";
@@ -535,7 +537,8 @@ backend_instruction::is_tex()
opcode == SHADER_OPCODE_TXL ||
opcode == SHADER_OPCODE_TXS ||
opcode == SHADER_OPCODE_LOD ||
-   opcode == SHADER_OPCODE_TG4);
+   opcode == SHADER_OPCODE_TG4 ||
+   opcode == SHADER_OPCODE_TG4_OFFSET);
 }
 
 bool
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 149a1a0..b0688c1 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -271,6 +271,7 @@ vec4_visitor::implied_mrf_writes(vec4_instruction *inst)
case SHADER_OPCODE_TXF_MS:
case SHADER_OPCODE_TXS:
case SHADER_OPCODE_TG4:
+   case SHADER_OPCODE_TG4_OFFSET:
   return inst->header_present ? 1 : 0;
default:
   assert(!"not reached");
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index 67af0dd..cb83231 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -311,6 +311,9 @@ vec4_generator::generate_tex(vec4_instruction *inst,
   case SHADER_OPCODE_TG4:
  msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4;
  break;
+  case SHADER_OPCODE_TG4_OFFSET:
+ msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO;
+ break;
   default:
 assert(!"should not get here: invalid VS texture opcode");
 break;
@@ -385,7 +388,8 @@ vec4_generator::generate_tex(vec4_instruction *inst,
   break;
}
 
-   uint32_t surface_index = inst->opcode == SHADER_OPCODE_TG4
+   uint32_t surface_index = (in

[Mesa-dev] [PATCH V2 5/7] i965: relax brw_texture_offset assert

2013-10-14 Thread Chris Forbes
Some texturing ops are about to have nonconstant offset support; the
offset in the header in these cases should be zero.
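
For illustration only (not part of the patch): a standalone sketch of the
relaxed behaviour. ConstOffset is a hypothetical stand-in for ir_constant,
and the 4-bit field packing is an assumption made for the example; the
real packing code is outside the context shown in the diff below.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ir_constant: up to three integer components.
struct ConstOffset {
   unsigned count;
   int value[3];
};

// Returns the header offset bits, or 0 when the offset is not a constant
// (nullptr). In that case the caller passes the offset in the payload.
static uint32_t texture_offset_bits(bool has_gpu_shader5, const ConstOffset *offset)
{
   // Without ARB_gpu_shader5 the frontend guarantees a constant offset.
   assert(offset != nullptr || has_gpu_shader5);
   if (!offset)
      return 0;

   // Assumed packing: 4-bit two's-complement fields, U in bits 11:8,
   // V in bits 7:4, R in bits 3:0.
   uint32_t bits = 0;
   for (unsigned i = 0; i < offset->count && i < 3; i++) {
      const unsigned shift = 4 * (2 - i);
      bits |= (static_cast<uint32_t>(offset->value[i]) & 0xFu) << shift;
   }
   return bits;
}
```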

Signed-off-by: Chris Forbes 
Reviewed-by: Ian Romanick 
---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   | 2 +-
 src/mesa/drivers/dri/i965/brw_shader.cpp   | 9 +++--
 src/mesa/drivers/dri/i965/brw_shader.h | 2 +-
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 2 +-
 4 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index e659203..3fde443 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -1595,7 +1595,7 @@ fs_visitor::visit(ir_texture *ir)
}
 
if (ir->offset != NULL && ir->op != ir_txf)
-  inst->texture_offset = brw_texture_offset(ir->offset->as_constant());
+  inst->texture_offset = brw_texture_offset(ctx, 
ir->offset->as_constant());
 
if (ir->op == ir_tg4)
   inst->texture_offset |= gather_channel(ir, sampler) << 16; // M0.2:16-17
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
b/src/mesa/drivers/dri/i965/brw_shader.cpp
index 6b37f58..5da1b0f 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -368,9 +368,14 @@ brw_math_function(enum opcode op)
 }
 
 uint32_t
-brw_texture_offset(ir_constant *offset)
+brw_texture_offset(struct gl_context *ctx, ir_constant *offset)
 {
-   assert(offset != NULL);
+   /* If the driver does not support GL_ARB_gpu_shader5, the offset
+* must be constant.
+*/
+   assert(offset != NULL || ctx->Extensions.ARB_gpu_shader5);
+
+   if (!offset) return 0;  /* nonconstant offset; caller will handle it. */
 
signed char offsets[3];
for (unsigned i = 0; i < offset->type->vector_elements; i++)
diff --git a/src/mesa/drivers/dri/i965/brw_shader.h 
b/src/mesa/drivers/dri/i965/brw_shader.h
index 4dbd38d..3feee69 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.h
+++ b/src/mesa/drivers/dri/i965/brw_shader.h
@@ -72,7 +72,7 @@ public:
void dump_instructions();
 };
 
-uint32_t brw_texture_offset(ir_constant *offset);
+uint32_t brw_texture_offset(struct gl_context *ctx, ir_constant *offset);
 
 #endif /* __cplusplus */
 
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 0cf8277..bd9c9e9 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -2286,7 +2286,7 @@ vec4_visitor::visit(ir_texture *ir)
inst->shadow_compare = ir->shadow_comparitor != NULL;
 
if (use_texture_offset)
-  inst->texture_offset = brw_texture_offset(ir->offset->as_constant());
+  inst->texture_offset = brw_texture_offset(ctx, 
ir->offset->as_constant());
 
/* Stuff the channel select bits in the top of the texture offset */
if (ir->op == ir_tg4)
-- 
1.8.4



[Mesa-dev] [PATCH V2 6/7] i965/fs: add support for gather4 with nonconstant offsets

2013-10-14 Thread Chris Forbes
Signed-off-by: Chris Forbes 
---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 46 +---
 1 file changed, 42 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 3fde443..fe4741d 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -1250,11 +1250,12 @@ fs_visitor::emit_texture_gen7(ir_texture *ir, fs_reg 
dst, fs_reg coordinate,
   next.reg_offset++;
}
 
+   bool has_nonconstant_offset = ir->offset && !ir->offset->as_constant();
+
/* Set up the LOD info */
switch (ir->op) {
case ir_tex:
case ir_lod:
-   case ir_tg4:
   break;
case ir_txb:
   emit(MOV(next, lod));
@@ -1348,10 +1349,43 @@ fs_visitor::emit_texture_gen7(ir_texture *ir, fs_reg 
dst, fs_reg coordinate,
  next.reg_offset++;
   }
   break;
+   case ir_tg4:
+  if (has_nonconstant_offset) {
+ /* More crazy intermixing */
+ ir->offset->accept(this);
+ fs_reg offset_value = this->result;
+
+ for (int i = 0; i < 2; i++) { /* u, v */
+emit(MOV(next, coordinate));
+coordinate.reg_offset++;
+next.reg_offset++;
+ }
+
+ for (int i = 0; i < 2; i++) { /* offu, offv */
+emit(MOV(next.retype(BRW_REGISTER_TYPE_D), offset_value));
+offset_value.reg_offset++;
+next.reg_offset++;
+ }
+
+ if (ir->coordinate->type->vector_elements == 3) { /* r if present */
+emit(MOV(next, coordinate));
+coordinate.reg_offset++;
+next.reg_offset++;
+ }
+  }
+  else {
+ /* just do the usual thing */
+ for (int i = 0; i < ir->coordinate->type->vector_elements; i++) {
+emit(MOV(next, coordinate));
+coordinate.reg_offset++;
+next.reg_offset++;
+ }
+  }
+  break;
}
 
/* Set up the coordinate (except for cases where it was done above) */
-   if (ir->op != ir_txd && ir->op != ir_txs && ir->op != ir_txf && ir->op != 
ir_txf_ms && ir->op != ir_query_levels) {
+   if (ir->op != ir_txd && ir->op != ir_txs && ir->op != ir_txf && ir->op != 
ir_txf_ms && ir->op != ir_query_levels && ir->op != ir_tg4) {
   for (int i = 0; i < ir->coordinate->type->vector_elements; i++) {
 emit(MOV(next, coordinate));
 coordinate.reg_offset++;
@@ -1371,14 +1405,18 @@ fs_visitor::emit_texture_gen7(ir_texture *ir, fs_reg 
dst, fs_reg coordinate,
case ir_txs: inst = emit(SHADER_OPCODE_TXS, dst, payload); break;
case ir_query_levels: inst = emit(SHADER_OPCODE_TXS, dst, payload); break;
case ir_lod: inst = emit(SHADER_OPCODE_LOD, dst, payload); break;
-   case ir_tg4: inst = emit(SHADER_OPCODE_TG4, dst, payload); break;
+   case ir_tg4:
+  if (has_nonconstant_offset)
+ inst = emit(SHADER_OPCODE_TG4_OFFSET, dst, payload);
+  else
+ inst = emit(SHADER_OPCODE_TG4, dst, payload);
+  break;
}
inst->base_mrf = -1;
if (reg_width == 2)
   inst->mlen = next.reg_offset * reg_width - header_present;
else
   inst->mlen = next.reg_offset * reg_width;
-
inst->header_present = header_present;
inst->regs_written = 4;
 
-- 
1.8.4
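
For illustration only (not part of the patch): the payload ordering that
the new ir_tg4 case produces, written as a standalone function. The slot
names follow the comments in the diff (u, v, offu, offv, r); the function
name and the string representation are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Order of the per-parameter payload registers for a gather4 message.
// coord_components is 2 (u, v) or 3 (u, v, r).
static std::vector<std::string>
gather4_payload(unsigned coord_components, bool has_nonconstant_offset)
{
   static const char *const coord_names[] = { "u", "v", "r" };
   std::vector<std::string> payload;

   if (has_nonconstant_offset) {
      // gather4_po: the offsets sit between (u, v) and the optional r.
      payload.push_back("u");
      payload.push_back("v");
      payload.push_back("offu");
      payload.push_back("offv");
      if (coord_components == 3)
         payload.push_back("r");
   } else {
      // Plain gather4: just the coordinate, in order.
      for (unsigned i = 0; i < coord_components && i < 3; i++)
         payload.push_back(coord_names[i]);
   }
   return payload;
}
```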



[Mesa-dev] [PATCH V2 7/7] i965/vs: add support for gather4 with nonconstant offsets

2013-10-14 Thread Chris Forbes
Signed-off-by: Chris Forbes 
---
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index bd9c9e9..d6c565a 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -2202,6 +2202,13 @@ vec4_visitor::visit(ir_texture *ir)
   shadow_comparitor = this->result;
}
 
+   bool has_nonconstant_offset = ir->offset && !ir->offset->as_constant();
+   src_reg offset_value;
+   if (has_nonconstant_offset) {
+  ir->offset->accept(this);
+  offset_value = src_reg(this->result);
+   }
+
const glsl_type *lod_type = NULL, *sample_index_type = NULL;
src_reg lod, dPdx, dPdy, sample_index;
switch (ir->op) {
@@ -2259,7 +2266,10 @@ vec4_visitor::visit(ir_texture *ir)
   inst = new(mem_ctx) vec4_instruction(this, SHADER_OPCODE_TXS);
   break;
case ir_tg4:
-  inst = new(mem_ctx) vec4_instruction(this, SHADER_OPCODE_TG4);
+  if (has_nonconstant_offset)
+ inst = new(mem_ctx) vec4_instruction(this, SHADER_OPCODE_TG4_OFFSET);
+  else
+ inst = new(mem_ctx) vec4_instruction(this, SHADER_OPCODE_TG4);
   break;
case ir_query_levels:
   inst = new(mem_ctx) vec4_instruction(this, SHADER_OPCODE_TXS);
@@ -2395,6 +2405,10 @@ vec4_visitor::visit(ir_texture *ir)
emit(MOV(dst_reg(MRF, param_base + 2, type, WRITEMASK_XYZ), dPdy));
inst->mlen += 2;
 }
+  } else if (ir->op == ir_tg4 && has_nonconstant_offset) {
+ emit(MOV(dst_reg(MRF, param_base + 1, glsl_type::ivec2_type, 
WRITEMASK_XY),
+  offset_value));
+ inst->mlen++;
   }
}
 
-- 
1.8.4
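
For illustration only (not part of the patch): the opcode choice made in
the visitor, reduced to a standalone sketch. Expr is a hypothetical
stand-in for the IR rvalue type; only the as_constant() behaviour relied
on by the patch is modelled.

```cpp
#include <cassert>

enum Opcode { SHADER_OPCODE_TG4, SHADER_OPCODE_TG4_OFFSET };

// Hypothetical stand-in for an IR rvalue: as_constant() returns the node
// itself when it is a compile-time constant and nullptr otherwise.
struct Expr {
   bool is_constant;
   const Expr *as_constant() const { return is_constant ? this : nullptr; }
};

// True only when an offset is present and is not a constant expression.
static bool has_nonconstant_offset(const Expr *offset)
{
   return offset && !offset->as_constant();
}

// Constant (or absent) offsets keep using TG4; the offset then travels in
// the message header. Nonconstant offsets need the separate opcode.
static Opcode gather_opcode(const Expr *offset)
{
   return has_nonconstant_offset(offset) ? SHADER_OPCODE_TG4_OFFSET
                                         : SHADER_OPCODE_TG4;
}
```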



[Mesa-dev] [PATCH V2 0/5] ARB_gpu_shader5 textureGather*, Part 3

2013-10-14 Thread Chris Forbes
Adds infrastructure for separate reference Z in texturing functions,
and support for shadow comparitors with textureGather*.

V2: - General churn, send-from-grf rebase, etc
- Make it actually work (Thanks Eric for pointing out that it didn't)



[Mesa-dev] [PATCH V2 1/5] glsl: Add support for separate reference Z for shadow samplers

2013-10-14 Thread Chris Forbes
ARB_gpu_shader5's textureGather*() functions which take shadow samplers
have a separate `refz` parameter rather than adding it to the
coordinate.

Signed-off-by: Chris Forbes 
Reviewed-by: Eric Anholt 
Reviewed-by: Kenneth Graunke 
---
 src/glsl/builtin_functions.cpp | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
index db6a0a9..ef8b7bb 100644
--- a/src/glsl/builtin_functions.cpp
+++ b/src/glsl/builtin_functions.cpp
@@ -3383,11 +3383,21 @@ builtin_builder::_texture(ir_texture_opcode opcode,
if (flags & TEX_PROJECT)
   tex->projector = swizzle(P, coord_type->vector_elements - 1, 1);
 
-   /* The shadow comparitor is normally in the Z component, but a few types
-* have sufficiently large coordinates that it's in W.
-*/
-   if (sampler_type->sampler_shadow)
-  tex->shadow_comparitor = swizzle(P, MAX2(coord_size, SWIZZLE_Z), 1);
+   if (sampler_type->sampler_shadow) {
+  if (opcode == ir_tg4) {
+ /* gather has refz as a separate parameter, immediately after the
+  * coordinate
+  */
+ ir_variable *refz = in_var(glsl_type::float_type, "refz");
+ sig->parameters.push_tail(refz);
+ tex->shadow_comparitor = var_ref(refz);
+  } else {
+ /* The shadow comparitor is normally in the Z component, but a few 
types
+  * have sufficiently large coordinates that it's in W.
+  */
+ tex->shadow_comparitor = swizzle(P, MAX2(coord_size, SWIZZLE_Z), 1);
+  }
+   }
 
if (opcode == ir_txl) {
   ir_variable *lod = in_var(glsl_type::float_type, "lod");
-- 
1.8.4
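
For illustration only (not part of the patch): where the shadow comparitor
comes from after this change, as a standalone sketch. The enum and struct
are hypothetical; coord_size is assumed to be the number of coordinate
components the sampler itself consumes.

```cpp
#include <algorithm>
#include <cassert>

enum TexOpcode { ir_tex, ir_txl, ir_tg4 };
enum { SWIZZLE_X = 0, SWIZZLE_Y, SWIZZLE_Z, SWIZZLE_W };

struct ComparitorSource {
   bool separate_param;  // true: taken from the extra `refz` parameter
   int component;        // otherwise: component of the coordinate P
};

static ComparitorSource shadow_comparitor_source(TexOpcode opcode, int coord_size)
{
   // Gather takes refz as its own parameter, right after the coordinate.
   if (opcode == ir_tg4)
      return { true, -1 };

   // Everything else: normally Z, but W when the coordinate is large
   // enough to occupy Z itself.
   return { false, std::max(coord_size, int(SWIZZLE_Z)) };
}
```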



[Mesa-dev] [PATCH V2 2/5] glsl: Add new textureGather[Offset]() overloads for shadow samplers

2013-10-14 Thread Chris Forbes
Signed-off-by: Chris Forbes 
Reviewed-by: Eric Anholt 
Reviewed-by: Kenneth Graunke 
---
 src/glsl/builtin_functions.cpp | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
index ef8b7bb..deedddb 100644
--- a/src/glsl/builtin_functions.cpp
+++ b/src/glsl/builtin_functions.cpp
@@ -1921,6 +1921,12 @@ builtin_builder::create_builtins()
 _texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, 
glsl_type::samplerCubeArray_type, glsl_type::vec4_type, TEX_COMPONENT),
 _texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, 
glsl_type::isamplerCubeArray_type, glsl_type::vec4_type, TEX_COMPONENT),
 _texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, 
glsl_type::usamplerCubeArray_type, glsl_type::vec4_type, TEX_COMPONENT),
+
+_texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, 
glsl_type::sampler2DShadow_type, glsl_type::vec2_type),
+_texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, 
glsl_type::sampler2DArrayShadow_type, glsl_type::vec3_type),
+_texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, 
glsl_type::samplerCubeShadow_type, glsl_type::vec3_type),
+_texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, 
glsl_type::samplerCubeArrayShadow_type, glsl_type::vec4_type),
+_texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, 
glsl_type::sampler2DRectShadow_type, glsl_type::vec2_type),
 NULL);
 
add_function("textureGatherOffset",
@@ -1955,6 +1961,10 @@ builtin_builder::create_builtins()
 _texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, 
glsl_type::sampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET_NONCONST | 
TEX_COMPONENT),
 _texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, 
glsl_type::isampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET_NONCONST | 
TEX_COMPONENT),
 _texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, 
glsl_type::usampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET_NONCONST | 
TEX_COMPONENT),
+
+_texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, 
glsl_type::sampler2DShadow_type, glsl_type::vec2_type, TEX_OFFSET_NONCONST),
+_texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, 
glsl_type::sampler2DArrayShadow_type, glsl_type::vec3_type, 
TEX_OFFSET_NONCONST),
+_texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, 
glsl_type::sampler2DRectShadow_type, glsl_type::vec2_type, TEX_OFFSET_NONCONST),
 NULL);
 
F(dFdx)
-- 
1.8.4



[Mesa-dev] [PATCH V2 3/5] i965: Add Gen7 gather4_c and gather4_po_c message types

2013-10-14 Thread Chris Forbes
Signed-off-by: Chris Forbes 
Reviewed-by: Eric Anholt 
Reviewed-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_defines.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index f1ea736..c0caba6 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1050,7 +1050,9 @@ enum brw_message_target {
 #define GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4  8
 #define GEN5_SAMPLER_MESSAGE_LOD 9
 #define GEN5_SAMPLER_MESSAGE_SAMPLE_RESINFO  10
+#define GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_C    16
 #define GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO   17
+#define GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO_C 18
 #define HSW_SAMPLER_MESSAGE_SAMPLE_DERIV_COMPARE 20
 #define GEN7_SAMPLER_MESSAGE_SAMPLE_LD_MCS   29
 #define GEN7_SAMPLER_MESSAGE_SAMPLE_LD2DMS   30
-- 
1.8.4



[Mesa-dev] [PATCH V2 4/5] i965/vs: Add support for shadow comparitors with gather4

2013-10-14 Thread Chris Forbes
gather4_c's argument layout is straightforward -- refz just goes on the
end.

gather4_po_c's layout, however, is not -- the array index is replaced with refz.

Signed-off-by: Chris Forbes 
---
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 12 ++--
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp   |  7 ++-
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index cb83231..dcd493d 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -309,10 +309,18 @@ vec4_generator::generate_tex(vec4_instruction *inst,
 msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_RESINFO;
 break;
   case SHADER_OPCODE_TG4:
- msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4;
+ if (inst->shadow_compare) {
+msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_C;
+ } else {
+msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4;
+ }
  break;
   case SHADER_OPCODE_TG4_OFFSET:
- msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO;
+ if (inst->shadow_compare) {
+msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO_C;
+ } else {
+msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO;
+ }
  break;
   default:
 assert(!"should not get here: invalid VS texture opcode");
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index d6c565a..2ab5d95 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -2341,7 +2341,7 @@ vec4_visitor::visit(ir_texture *ir)
   src_reg(0)));
   }
   /* Load the shadow comparitor */
-  if (ir->shadow_comparitor && ir->op != ir_txd) {
+  if (ir->shadow_comparitor && ir->op != ir_txd && (ir->op != ir_tg4 || 
!has_nonconstant_offset)) {
 emit(MOV(dst_reg(MRF, param_base + 1, ir->shadow_comparitor->type,
  WRITEMASK_X),
  shadow_comparitor));
@@ -2406,6 +2406,11 @@ vec4_visitor::visit(ir_texture *ir)
inst->mlen += 2;
 }
   } else if (ir->op == ir_tg4 && has_nonconstant_offset) {
+ if (ir->shadow_comparitor) {
+emit(MOV(dst_reg(MRF, param_base, ir->shadow_comparitor->type, 
WRITEMASK_W),
+ shadow_comparitor));
+ }
+
  emit(MOV(dst_reg(MRF, param_base + 1, glsl_type::ivec2_type, 
WRITEMASK_XY),
   offset_value));
  inst->mlen++;
-- 
1.8.4
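
For illustration only (not part of the patch): the message selection and
the refz placement from the diff above, as a standalone sketch. The
message values are the ones defined in patch 3/5; the shortened enum
names, RefzSlot and both function names are hypothetical.

```cpp
#include <cassert>

// Values as defined in brw_defines.h by patch 3/5 of this series.
enum SamplerMessage {
   GATHER4      = 8,
   GATHER4_C    = 16,
   GATHER4_PO   = 17,
   GATHER4_PO_C = 18,
};

static SamplerMessage gather4_message(bool nonconstant_offset, bool shadow_compare)
{
   if (nonconstant_offset)
      return shadow_compare ? GATHER4_PO_C : GATHER4_PO;
   return shadow_compare ? GATHER4_C : GATHER4;
}

// Where the vec4 visitor writes refz, relative to param_base.
struct RefzSlot {
   int reg;       // register offset from param_base
   char channel;  // 'x', 'y', 'z' or 'w'
};

static RefzSlot refz_slot(bool nonconstant_offset)
{
   // gather4_po_c: refz takes the W channel of the coordinate register,
   // since the next register carries the offsets in XY.
   if (nonconstant_offset)
      return { 0, 'w' };
   // gather4_c: refz goes in the X channel of the following register.
   return { 1, 'x' };
}
```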



[Mesa-dev] [PATCH V2 5/5] i965/fs: Add support for shadow comparitors with gather4

2013-10-14 Thread Chris Forbes
Note that gather4_po_c's parameters are too long for SIMD16. It might be
worth emitting 2xSIMD8 messages in this case at some point.

Signed-off-by: Chris Forbes 
---
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 15 ---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   |  3 +++
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index d1a9370..8c0e361 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -433,12 +433,21 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg 
dst, struct brw_reg src
  msg_type = GEN5_SAMPLER_MESSAGE_LOD;
  break;
   case SHADER_OPCODE_TG4:
- assert(brw->gen >= 6);
- msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4;
+ if (inst->shadow_compare) {
+assert(brw->gen >= 7);
+msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_C;
+ } else {
+assert(brw->gen >= 6);
+msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4;
+ }
  break;
   case SHADER_OPCODE_TG4_OFFSET:
  assert(brw->gen >= 7);
- msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO;
+ if (inst->shadow_compare) {
+msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO_C;
+ } else {
+msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO;
+ }
  break;
   default:
 assert(!"not reached");
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index fe4741d..242634c 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -1351,6 +1351,9 @@ fs_visitor::emit_texture_gen7(ir_texture *ir, fs_reg dst, 
fs_reg coordinate,
   break;
case ir_tg4:
   if (has_nonconstant_offset) {
+ if (ir->shadow_comparitor && dispatch_width == 16)
+fail("Gen7 does not support gather4_po_c in SIMD16 mode.");
+
  /* More crazy intermixing */
  ir->offset->accept(this);
  fs_reg offset_value = this->result;
-- 
1.8.4



[Mesa-dev] [PATCH V2 0/4] ARB_gpu_shader5 textureGather*, Part 4

2013-10-14 Thread Chris Forbes
Adds support for textureGatherOffsets() [which takes an array of texel offsets].
This isn't directly supported on i965, so we lower it to 4x 
textureGatherOffset().

V2: - Rebase, etc.



[Mesa-dev] [PATCH V2 1/4] glsl: add support for texture functions with offset arrays

2013-10-14 Thread Chris Forbes
This is needed for textureGatherOffsets()
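
A minimal sketch of how the new flag bit sits alongside the existing ones, assuming the values in the #defines touched by this patch. describe_offset_kind() is a hypothetical helper made up for illustration; it is not part of Mesa.

```cpp
#include <string>

// Flag values copied from the #defines in this patch.
constexpr unsigned TEX_OFFSET          = 2;
constexpr unsigned TEX_COMPONENT       = 4;
constexpr unsigned TEX_OFFSET_NONCONST = 8;
constexpr unsigned TEX_OFFSET_ARRAY    = 16;

// Each flag occupies its own bit, so flags can be OR'd together freely.
static_assert((TEX_OFFSET_ARRAY & (TEX_OFFSET | TEX_COMPONENT |
                                   TEX_OFFSET_NONCONST)) == 0,
              "TEX_OFFSET_ARRAY must not overlap the other flags");

// Hypothetical helper: reports which kind of offset parameter a
// signature built with these flags would take.
std::string describe_offset_kind(unsigned flags)
{
   if (flags & TEX_OFFSET_ARRAY)
      return "ivec2[4] offsets";
   if (flags & TEX_OFFSET_NONCONST)
      return "non-constant offset";
   if (flags & TEX_OFFSET)
      return "constant offset";
   return "no offset";
}
```

With this, TEX_OFFSET_ARRAY | TEX_COMPONENT still reports an offsets array, since the component bit is tested independently.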

Signed-off-by: Chris Forbes 
---
 src/glsl/builtin_functions.cpp | 9 +
 1 file changed, 9 insertions(+)

diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
index deedddb..1b23677 100644
--- a/src/glsl/builtin_functions.cpp
+++ b/src/glsl/builtin_functions.cpp
@@ -507,6 +507,7 @@ private:
 #define TEX_OFFSET  2
 #define TEX_COMPONENT 4
 #define TEX_OFFSET_NONCONST 8
+#define TEX_OFFSET_ARRAY 16
 
ir_function_signature *_texture(ir_texture_opcode opcode,
builtin_available_predicate avail,
@@ -3432,6 +3433,14 @@ builtin_builder::_texture(ir_texture_opcode opcode,
   tex->offset = var_ref(offset);
}
 
+   if (flags & TEX_OFFSET_ARRAY) {
+  ir_variable *offsets =
+ new(mem_ctx) 
ir_variable(glsl_type::get_array_instance(glsl_type::ivec2_type, 4),
+  "offsets", ir_var_const_in);
+  sig->parameters.push_tail(offsets);
+  tex->offset = var_ref(offsets);
+   }
+
if (opcode == ir_tg4) {
   if (flags & TEX_COMPONENT) {
  ir_variable *component =
-- 
1.8.4



[Mesa-dev] [PATCH V2 2/4] glsl: add signatures for textureGatherOffsets()

2013-10-14 Thread Chris Forbes
Signed-off-by: Chris Forbes 
---
 src/glsl/builtin_functions.cpp | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
index 1b23677..45fff4c 100644
--- a/src/glsl/builtin_functions.cpp
+++ b/src/glsl/builtin_functions.cpp
@@ -1968,6 +1968,36 @@ builtin_builder::create_builtins()
 _texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, 
glsl_type::sampler2DRectShadow_type, glsl_type::vec2_type, TEX_OFFSET_NONCONST),
 NULL);
 
+   add_function("textureGatherOffsets",
+_texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, 
glsl_type::sampler2D_type, glsl_type::vec2_type, TEX_OFFSET_ARRAY),
+_texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, 
glsl_type::isampler2D_type, glsl_type::vec2_type, TEX_OFFSET_ARRAY),
+_texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, 
glsl_type::usampler2D_type, glsl_type::vec2_type, TEX_OFFSET_ARRAY),
+
+_texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, 
glsl_type::sampler2D_type, glsl_type::vec2_type, TEX_OFFSET_ARRAY | 
TEX_COMPONENT),
+_texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, 
glsl_type::isampler2D_type, glsl_type::vec2_type, TEX_OFFSET_ARRAY | 
TEX_COMPONENT),
+_texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, 
glsl_type::usampler2D_type, glsl_type::vec2_type, TEX_OFFSET_ARRAY | 
TEX_COMPONENT),
+
+_texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, 
glsl_type::sampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET_ARRAY),
+_texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, 
glsl_type::isampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET_ARRAY),
+_texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, 
glsl_type::usampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET_ARRAY),
+
+_texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, 
glsl_type::sampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET_ARRAY | 
TEX_COMPONENT),
+_texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, 
glsl_type::isampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET_ARRAY | 
TEX_COMPONENT),
+_texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, 
glsl_type::usampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET_ARRAY | 
TEX_COMPONENT),
+
+_texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, 
glsl_type::sampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET_ARRAY),
+_texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, 
glsl_type::isampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET_ARRAY),
+_texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, 
glsl_type::usampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET_ARRAY),
+
+_texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, 
glsl_type::sampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET_ARRAY | 
TEX_COMPONENT),
+_texture(ir_tg4, gpu_shader5, glsl_type::ivec4_type, 
glsl_type::isampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET_ARRAY | 
TEX_COMPONENT),
+_texture(ir_tg4, gpu_shader5, glsl_type::uvec4_type, 
glsl_type::usampler2DRect_type, glsl_type::vec2_type, TEX_OFFSET_ARRAY | 
TEX_COMPONENT),
+
+_texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, 
glsl_type::sampler2DShadow_type, glsl_type::vec2_type, TEX_OFFSET_ARRAY),
+_texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, 
glsl_type::sampler2DArrayShadow_type, glsl_type::vec3_type, TEX_OFFSET_ARRAY),
+_texture(ir_tg4, gpu_shader5, glsl_type::vec4_type, 
glsl_type::sampler2DRectShadow_type, glsl_type::vec2_type, TEX_OFFSET_ARRAY),
+NULL);
+
F(dFdx)
F(dFdy)
F(fwidth)
-- 
1.8.4



[Mesa-dev] [PATCH V2 3/4] i965: Add asserts to ensure that ir_tg4 offset arrays are lowered

2013-10-14 Thread Chris Forbes
We don't have a message that does 4 independent offsets; a lowering
pass needs to lower it to 4 normal gather4s before reaching this
point.

Signed-off-by: Chris Forbes 
---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   | 3 +++
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 242634c..8e423f6 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -1567,6 +1567,9 @@ fs_visitor::visit(ir_texture *ir)
/* Should be lowered by do_lower_texture_projection */
assert(!ir->projector);
 
+   /* Should be lowered */
+   assert(!ir->offset || !ir->offset->type->is_array());
+
/* Generate code to compute all the subexpression trees.  This has to be
 * done before loading any values into MRFs for the sampler message since
 * generating these values may involve SEND messages that need the MRFs.
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 2ab5d95..cf87b5c 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -2186,6 +2186,9 @@ vec4_visitor::visit(ir_texture *ir)
/* Should be lowered by do_lower_texture_projection */
assert(!ir->projector);
 
+   /* Should be lowered */
+   assert(!ir->offset || !ir->offset->type->is_array());
+
/* Generate code to compute all the subexpression trees.  This has to be
 * done before loading any values into MRFs for the sampler message since
 * generating these values may involve SEND messages that need the MRFs.
-- 
1.8.4



[Mesa-dev] [PATCH V2 4/4] i965: Add lowering pass for splitting textureGatherOffsets

2013-10-14 Thread Chris Forbes
Rewrites textureGatherOffsets(s, p, offsets) into

   gvec4(
  textureGatherOffset(s, p, offsets[0]).w,
  textureGatherOffset(s, p, offsets[1]).w,
  textureGatherOffset(s, p, offsets[2]).w,
  textureGatherOffset(s, p, offsets[3]).w
  )
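
The rewrite above can be modelled on the CPU. This is only a sketch, not the IR pass itself: it assumes a toy single-channel texture with integer texel coordinates and clamp-to-edge addressing, and the names Texture, gather_offset and gather_offsets_lowered are made up for illustration. It relies on the GLSL gather component order (x = i0j1, y = i1j1, z = i1j0, w = i0j0), so taking .w from each shifted gather selects the i0j0 texel for that offset.

```cpp
#include <array>
#include <cassert>
#include <vector>

struct ivec2 { int x, y; };

// Toy single-channel 2D texture, row-major, clamp-to-edge (an assumption).
struct Texture {
   int width, height;
   std::vector<int> texels;

   int fetch(int x, int y) const {
      assert(width > 0 && height > 0);
      x = x < 0 ? 0 : (x >= width  ? width  - 1 : x);
      y = y < 0 ? 0 : (y >= height ? height - 1 : y);
      return texels[y * width + x];
   }
};

// Model of textureGatherOffset(): the 2x2 footprint at (x, y) + off,
// returned in GLSL component order (i0j1, i1j1, i1j0, i0j0).
std::array<int, 4> gather_offset(const Texture &t, int x, int y, ivec2 off)
{
   const int bx = x + off.x, by = y + off.y;
   return { t.fetch(bx,     by + 1),
            t.fetch(bx + 1, by + 1),
            t.fetch(bx + 1, by),
            t.fetch(bx,     by) };
}

// Model of the lowered textureGatherOffsets(): four single-offset
// gathers, keeping only the .w (index 3) component of each.
std::array<int, 4> gather_offsets_lowered(const Texture &t, int x, int y,
                                          const std::array<ivec2, 4> &offsets)
{
   std::array<int, 4> result{};
   for (int i = 0; i < 4; i++)
      result[i] = gather_offset(t, x, y, offsets[i])[3];
   return result;
}
```

Each result component ends up being the texel at the base position plus the corresponding offset, which is what textureGatherOffsets() is specified to return.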

Signed-off-by: Chris Forbes 
---
 src/mesa/drivers/dri/i965/Makefile.sources |  1 +
 src/mesa/drivers/dri/i965/brw_context.h|  1 +
 .../drivers/dri/i965/brw_lower_offset_array.cpp| 93 ++
 src/mesa/drivers/dri/i965/brw_shader.cpp   |  1 +
 4 files changed, 96 insertions(+)
 create mode 100644 src/mesa/drivers/dri/i965/brw_lower_offset_array.cpp

diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
b/src/mesa/drivers/dri/i965/Makefile.sources
index b8e83ef..816ecc1 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -68,6 +68,7 @@ i965_FILES = \
brw_gs_surface_state.c \
brw_interpolation_map.c \
brw_lower_texture_gradients.cpp \
+   brw_lower_offset_array.cpp \
brw_misc_state.c \
brw_object_purgeable.c \
brw_performance_monitor.c \
diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 5725ef6..84ff552 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1731,6 +1731,7 @@ brw_program_reloc(struct brw_context *brw, uint32_t 
state_offset,
 bool brw_do_cubemap_normalize(struct exec_list *instructions);
 bool brw_lower_texture_gradients(struct brw_context *brw,
  struct exec_list *instructions);
+bool brw_do_lower_offset_arrays(struct exec_list *instructions);
 
 struct opcode_desc {
 char*name;
diff --git a/src/mesa/drivers/dri/i965/brw_lower_offset_array.cpp 
b/src/mesa/drivers/dri/i965/brw_lower_offset_array.cpp
new file mode 100644
index 000..c512406
--- /dev/null
+++ b/src/mesa/drivers/dri/i965/brw_lower_offset_array.cpp
@@ -0,0 +1,93 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file brw_lower_offset_array.cpp
+ *
+ * IR lower pass to decompose ir_texture ir_tg4 with an array of offsets
+ * into four ir_tg4s with a single ivec2 offset, select the .w component of 
each,
+ * and return those four values packed into a gvec4.
+ *
+ * \author Chris Forbes 
+ */
+
+#include "glsl/glsl_types.h"
+#include "glsl/ir.h"
+#include "program/prog_instruction.h" /* For WRITEMASK_* */
+
+class brw_lower_offset_array_visitor : public ir_hierarchical_visitor {
+public:
+   brw_lower_offset_array_visitor()
+   {
+  progress = false;
+   }
+
+   ir_visitor_status visit_leave(ir_texture *ir);
+
+   bool progress;
+};
+
+ir_visitor_status
+brw_lower_offset_array_visitor::visit_leave(ir_texture *ir)
+{
+   if (ir->op != ir_tg4 || !ir->offset || !ir->offset->type->is_array())
+  return visit_continue;
+
+   void *mem_ctx = ralloc_parent(ir);
+
+   ir_variable *var = new (mem_ctx) ir_variable(ir->type, "result", 
ir_var_auto);
+   base_ir->insert_before(var);
+
+   for (int i = 0; i < 4; i++) {
+  ir_texture *tex = ir->clone(mem_ctx, NULL);
+  tex->offset = new (mem_ctx) ir_dereference_array(tex->offset,
+new (mem_ctx) ir_constant(i));
+
+  ir_assignment *assign = new (mem_ctx) ir_assignment(
+new (mem_ctx) ir_dereference_variable(var),
+new (mem_ctx) ir_swizzle(tex, 3, 0, 0, 0, 1)); /* .w */
+
+  assign->write_mask = 1 << i;
+
+  base_ir->insert_before(assign);
+   }
+
+   base_ir->replace_with(new (mem_ctx) ir_dereference_variable(var));
+
+   progress = true;
+   return visit_continue;
+}
+
+extern "C" {
+
+bool
+brw_do_lower_offset_arrays(exec_list *instructions)
+{
+   brw_lower_offset_array_visitor v;
+
+   visit_list_elements(&v, instructions);
+
+   return v.progress;
+}
+
+}
diff

[Mesa-dev] [PATCH V2 0/3] ARB_gpu_shader5 textureGather*, Part 5

2013-10-14 Thread Chris Forbes
This is the fifth (and final) piece of ARB_gpu_shader5 textureGather* support.

It turns out that unnormalized texcoords and texture offsets don't mix.
This series adds yet another lowering pass, so that *sampler2DRect work
with gather.

2-3/3 expand the lowering pass to cover integer coordinates too, and use it
for texelFetchOffset. This removes a bit of duplication.



[Mesa-dev] [PATCH V2 1/3] i965: Add lowering pass to fold offset into unnormalized coords

2013-10-14 Thread Chris Forbes
It turns out that nonzero offsets with gsampler2DRect don't work -- they
just return garbage. Work around this by folding the offset into the
coord.

Done as an IR pass rather than yet another hack in the visitors because
it's clear what's going on this way. Can possibly reuse this to replace
the existing txf coord+offset hacks.
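
A rough sketch of the arithmetic behind the workaround, outside of any IR machinery. With unnormalized (texel-space) coordinates one unit is one texel, so an integer texel offset can simply be converted to float and added to the coordinate. The vec2/ivec2 structs and fold_offset() are stand-ins made up for illustration.

```cpp
struct vec2  { float x, y; };
struct ivec2 { int x, y; };

// Folds an integer texel offset into an unnormalized coordinate.
// This mirrors "coordinate + i2f(offset)"; it would not be valid for
// normalized coordinates, where the offset would first have to be
// scaled by the texture size.
vec2 fold_offset(vec2 coord, ivec2 offset)
{
   return { coord.x + static_cast<float>(offset.x),
            coord.y + static_cast<float>(offset.y) };
}
```

After folding, the sampler message is sent with no offset at all, which sidesteps the broken hardware path.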

Signed-off-by: Chris Forbes 
---
 src/mesa/drivers/dri/i965/Makefile.sources |  1 +
 src/mesa/drivers/dri/i965/brw_context.h|  1 +
 .../dri/i965/brw_lower_unnormalized_offset.cpp | 84 ++
 src/mesa/drivers/dri/i965/brw_shader.cpp   |  1 +
 4 files changed, 87 insertions(+)
 create mode 100644 src/mesa/drivers/dri/i965/brw_lower_unnormalized_offset.cpp

diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
b/src/mesa/drivers/dri/i965/Makefile.sources
index 816ecc1..09cb2c1 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -69,6 +69,7 @@ i965_FILES = \
brw_interpolation_map.c \
brw_lower_texture_gradients.cpp \
brw_lower_offset_array.cpp \
+   brw_lower_unnormalized_offset.cpp \
brw_misc_state.c \
brw_object_purgeable.c \
brw_performance_monitor.c \
diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 84ff552..c062648 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1732,6 +1732,7 @@ bool brw_do_cubemap_normalize(struct exec_list 
*instructions);
 bool brw_lower_texture_gradients(struct brw_context *brw,
  struct exec_list *instructions);
 bool brw_do_lower_offset_arrays(struct exec_list *instructions);
+bool brw_do_lower_unnormalized_offset(struct exec_list *instructions);
 
 struct opcode_desc {
 char*name;
diff --git a/src/mesa/drivers/dri/i965/brw_lower_unnormalized_offset.cpp 
b/src/mesa/drivers/dri/i965/brw_lower_unnormalized_offset.cpp
new file mode 100644
index 000..733c289
--- /dev/null
+++ b/src/mesa/drivers/dri/i965/brw_lower_unnormalized_offset.cpp
@@ -0,0 +1,84 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \file brw_lower_unnormalized_offset.cpp
+ *
+ * IR lower pass to convert a texture offset into an adjusted coordinate,
+ * for use with unnormalized coordinates. At least the gather4* messages
+ * on Ivybridge and Haswell make a mess with nonzero offsets.
+ *
+ * \author Chris Forbes 
+ */
+
+#include "glsl/glsl_types.h"
+#include "glsl/ir.h"
+#include "program/prog_instruction.h" /* For WRITEMASK_* */
+
+class brw_lower_unnormalized_offset_visitor : public ir_hierarchical_visitor {
+public:
+   brw_lower_unnormalized_offset_visitor()
+   {
+  progress = false;
+   }
+
+   ir_visitor_status visit_leave(ir_texture *ir);
+
+   bool progress;
+};
+
+ir_visitor_status
+brw_lower_unnormalized_offset_visitor::visit_leave(ir_texture *ir)
+{
+   if (ir->sampler->type->sampler_dimensionality != GLSL_SAMPLER_DIM_RECT ||
+   !ir->offset || ir->op != ir_tg4)
+  return visit_continue;
+
+   void *mem_ctx = ralloc_parent(ir);
+
+   ir->coordinate = new (mem_ctx) ir_expression(
+ ir_binop_add,
+ ir->coordinate,
+ new (mem_ctx) ir_expression(
+ir_unop_i2f,
+ir->offset));
+   ir->offset = NULL;
+
+
+   progress = true;
+   return visit_continue;
+}
+
+extern "C" {
+
+bool
+brw_do_lower_unnormalized_offset(exec_list *instructions)
+{
+   brw_lower_unnormalized_offset_visitor v;
+
+   visit_list_elements(&v, instructions);
+
+   return v.progress;
+}
+
+}
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
b/src/mesa/drivers/dri/i965/brw_shader.cpp
index 5090d22..8816c24 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -169,6 +169,7 @@ brw_lin

[Mesa-dev] [PATCH V2 2/3] i965: Generalize coord+offset lowering pass for ir_txf

2013-10-14 Thread Chris Forbes
ir_txf expects an ivec* coordinate, which may be larger than ivec2;
shuffle things around so that this will work.
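
To illustrate the size mismatch: for a 2D array texture the texelFetchOffset coordinate is an ivec3 (x, y, layer) while the offset is only an ivec2. A minimal sketch of the masked add, assuming coordinates padded to four ints; add_txf_offset() is a hypothetical name, and the mask is computed the same way as the write_mask in the patch.

```cpp
#include <array>
#include <cassert>

// Adds an n_offset-component integer offset to the leading components
// of an integer coordinate, leaving the remaining components (for
// example the array layer) untouched.
std::array<int, 4> add_txf_offset(std::array<int, 4> coord,
                                  const std::array<int, 4> &offset,
                                  int n_offset)
{
   assert(n_offset >= 0 && n_offset <= 4);

   // One bit per component that receives the offset.
   const unsigned write_mask = (1u << n_offset) - 1;

   for (int i = 0; i < 4; i++) {
      if (write_mask & (1u << i))
         coord[i] += offset[i];
   }
   return coord;
}
```

For coord = (5, 7, 3) with offset = (1, -2), only x and y change and the layer index stays at 3.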

Signed-off-by: Chris Forbes 
---
 .../dri/i965/brw_lower_unnormalized_offset.cpp | 51 ++
 1 file changed, 42 insertions(+), 9 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_lower_unnormalized_offset.cpp 
b/src/mesa/drivers/dri/i965/brw_lower_unnormalized_offset.cpp
index 733c289..9106726 100644
--- a/src/mesa/drivers/dri/i965/brw_lower_unnormalized_offset.cpp
+++ b/src/mesa/drivers/dri/i965/brw_lower_unnormalized_offset.cpp
@@ -50,20 +50,53 @@ public:
 ir_visitor_status
 brw_lower_unnormalized_offset_visitor::visit_leave(ir_texture *ir)
 {
-   if (ir->sampler->type->sampler_dimensionality != GLSL_SAMPLER_DIM_RECT ||
-   !ir->offset || ir->op != ir_tg4)
+   if (!ir->offset)
   return visit_continue;
 
+   if (ir->op == ir_tg4) {
+  if (ir->sampler->type->sampler_dimensionality != GLSL_SAMPLER_DIM_RECT)
+ return visit_continue;
+   }
+   else if (ir->op != ir_txf) {
+  return visit_continue;
+   }
+
void *mem_ctx = ralloc_parent(ir);
 
-   ir->coordinate = new (mem_ctx) ir_expression(
- ir_binop_add,
- ir->coordinate,
- new (mem_ctx) ir_expression(
-ir_unop_i2f,
-ir->offset));
-   ir->offset = NULL;
+   if (ir->op == ir_txf) {
+  ir_variable *var = new (mem_ctx) ir_variable(
+ir->coordinate->type, "coordinate", ir_var_auto);
+  base_ir->insert_before(var);
 
+  ir_assignment *assign = new (mem_ctx) ir_assignment(
+new (mem_ctx) ir_dereference_variable(var),
+ir->coordinate,
+NULL);
+  base_ir->insert_before(assign);
+
+  assign = new (mem_ctx) ir_assignment(
+new (mem_ctx) ir_dereference_variable(var),
+new (mem_ctx) ir_expression(
+   ir_binop_add,
+   new (mem_ctx) ir_swizzle(
+  new (mem_ctx) ir_dereference_variable(var),
+  0, 1, 2, 3, ir->offset->type->vector_elements),
+   ir->offset),
+NULL);
+  assign->write_mask = (1 << ir->offset->type->vector_elements) - 1;
+  base_ir->insert_before(assign);
+
+  ir->coordinate = new (mem_ctx) ir_dereference_variable(var);
+   } else {
+  ir->coordinate = new (mem_ctx) ir_expression(
+ir_binop_add,
+ir->coordinate,
+new (mem_ctx) ir_expression(
+   ir_unop_i2f,
+   ir->offset));
+   }
+
+   ir->offset = NULL;
 
progress = true;
return visit_continue;
-- 
1.8.4



[Mesa-dev] [PATCH V2 3/3] i965: Remove ir_txf coord+offset special case in visitors

2013-10-14 Thread Chris Forbes
Just let it be handled by the lowering pass.

Signed-off-by: Chris Forbes 
---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   | 56 ++
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 25 ++--
 2 files changed, 16 insertions(+), 65 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 8e423f6..545f23a 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -1092,34 +1092,19 @@ fs_visitor::emit_texture_gen5(ir_texture *ir, fs_reg 
dst, fs_reg coordinate,
const int vector_elements =
   ir->coordinate ? ir->coordinate->type->vector_elements : 0;
 
-   if (ir->offset != NULL && ir->op == ir_txf) {
-  /* It appears that the ld instruction used for txf does its
-   * address bounds check before adding in the offset.  To work
-   * around this, just add the integer offset to the integer texel
-   * coordinate, and don't put the offset in the header.
+   if (ir->offset) {
+  /* The offsets set up by the ir_texture visitor are in the
+   * m1 header, so we can't go headerless.
*/
-  ir_constant *offset = ir->offset->as_constant();
-  for (int i = 0; i < vector_elements; i++) {
-emit(ADD(fs_reg(MRF, base_mrf + mlen + i * reg_width, coordinate.type),
-  coordinate,
-  offset->value.i[i]));
-coordinate.reg_offset++;
-  }
-   } else {
-  if (ir->offset) {
-/* The offsets set up by the ir_texture visitor are in the
- * m1 header, so we can't go headerless.
- */
-header_present = true;
-mlen++;
-base_mrf--;
-  }
+  header_present = true;
+  mlen++;
+  base_mrf--;
+   }
 
-  for (int i = 0; i < vector_elements; i++) {
-emit(MOV(fs_reg(MRF, base_mrf + mlen + i * reg_width, coordinate.type),
-  coordinate));
-coordinate.reg_offset++;
-  }
+   for (int i = 0; i < vector_elements; i++) {
+  emit(MOV(fs_reg(MRF, base_mrf + mlen + i * reg_width, coordinate.type),
+   coordinate));
+  coordinate.reg_offset++;
}
mlen += vector_elements * reg_width;
 
@@ -1228,7 +1213,6 @@ fs_visitor::emit_texture_gen7(ir_texture *ir, fs_reg dst, 
fs_reg coordinate,
 {
int reg_width = dispatch_width / 8;
bool header_present = false;
-   int offsets[3];
 
fs_reg payload = fs_reg(this, glsl_type::float_type);
fs_reg next = payload;
@@ -1301,22 +1285,8 @@ fs_visitor::emit_texture_gen7(ir_texture *ir, fs_reg 
dst, fs_reg coordinate,
   next.reg_offset++;
   break;
case ir_txf:
-  /* It appears that the ld instruction used for txf does its
-   * address bounds check before adding in the offset.  To work
-   * around this, just add the integer offset to the integer texel
-   * coordinate, and don't put the offset in the header.
-   */
-  if (ir->offset) {
-ir_constant *offset = ir->offset->as_constant();
-offsets[0] = offset->value.i[0];
-offsets[1] = offset->value.i[1];
-offsets[2] = offset->value.i[2];
-  } else {
-memset(offsets, 0, sizeof(offsets));
-  }
-
   /* Unfortunately, the parameters for LD are intermixed: u, lod, v, r. */
-  emit(ADD(next.retype(BRW_REGISTER_TYPE_D), coordinate, offsets[0]));
+  emit(MOV(next.retype(BRW_REGISTER_TYPE_D), coordinate));
   coordinate.reg_offset++;
   next.reg_offset++;
 
@@ -1324,7 +1294,7 @@ fs_visitor::emit_texture_gen7(ir_texture *ir, fs_reg dst, 
fs_reg coordinate,
   next.reg_offset++;
 
   for (int i = 1; i < ir->coordinate->type->vector_elements; i++) {
-emit(ADD(next.retype(BRW_REGISTER_TYPE_D), coordinate, offsets[i]));
+emit(MOV(next.retype(BRW_REGISTER_TYPE_D), coordinate));
 coordinate.reg_offset++;
 next.reg_offset++;
   }
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index cf87b5c..af061c5 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -2317,28 +2317,9 @@ vec4_visitor::visit(ir_texture *ir)
   int coord_mask = (1 << ir->coordinate->type->vector_elements) - 1;
   int zero_mask = 0xf & ~coord_mask;
 
-  if (ir->offset && ir->op == ir_txf) {
-/* It appears that the ld instruction used for txf does its
- * address bounds check before adding in the offset.  To work
- * around this, just add the integer offset to the integer
- * texel coordinate, and don't put the offset in the header.
- */
-ir_constant *offset = ir->offset->as_constant();
-assert(offset);
-
-for (int j = 0; j < ir->coordinate->type->vector_elements; j++) {
-   src_reg src = coordinate;
-   src.swizzle = BRW_SWIZZLE4(BRW_GET_SWZ(src.swizzle, j),
-  

[Mesa-dev] [Bug 70123] Freeze caused by 'winsys/radeon: remove cs_queue_empty' commit

2013-10-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=70123

--- Comment #10 from Jeff Blake  ---
Created attachment 87588
  --> https://bugs.freedesktop.org/attachment.cgi?id=87588&action=edit
thread apply all bt

With that patch things are still freezing up, see the attachment for the gdb
output (which still has just one thread).

The command which causes the freeze is :-

compton -b --backend glx --config /dev/null

(The '--config /dev/null' is suggested by compton's maintainer to force
everything to its defaults when troubleshooting; using my own config
file and changing the glx-related config settings within it seems to have no
effect on whatever is at fault.)

In my openbox autostart script the following line invokes compton in the
background without any problems at all :-

compton --backend glx --config /dev/null &

If the backend is changed to xrender then things run fine whether the -b switch
is used or not.

So compton crashes when using the -b switch to daemonise in conjunction with
the glx backend, and omitting the switch works around the problem.

Perhaps not unexpectedly, if I revert the following commits then things run
fine.

8bc7673ef874faa95d43c255c7fc631c2d2160c0 radeon/winsys: fix handling in
radeon_drm_cs_flush v2
0653c66ef40ac553f91b29bbda7f59f7ce6948fa winsys/radeon: remove cs_queue_empty

I'm starting to wonder if this is a bug in compton.

-- 
You are receiving this mail because:
You are the assignee for the bug.


[Mesa-dev] [Bug 70123] Freeze caused by 'winsys/radeon: remove cs_queue_empty' commit

2013-10-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=70123

--- Comment #11 from Christian König  ---
(In reply to comment #10)
> The command which causes the freeze is :-
> 
> compton -b --backend glx --config /dev/null

Thanks for this, I can reproduce the problem now.

Not sure if that's an issue in compton or not, but it's definitely a bit odd.



[Mesa-dev] [Bug 70378] fatal error: xmlpool/options.h: No such file or directory

2013-10-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=70378

--- Comment #10 from Kai  ---
(In reply to comment #7)
> Created attachment 87497 [details] [review]
> swrast: add correct include for out-of-tree builds
> 
> A slightly more overzealous version, explicitly including the builddir for
> classic swrast on top of the reported gallium/swrast.

The patch works; you can have my Tested-by.



[Mesa-dev] texelFetch*Offset from EXT_gpu_shader4

2013-10-14 Thread Mike Lothian
Hi

I'm going to try adding support for texelFetch*Offset to try to get
anti-aliasing working in the Unigine demos.

I'm not sure if anything more will be required, but I just wanted to check
that no one else is working on this.

Thanks

Mike


Re: [Mesa-dev] [PATCH] i965: Fast texture upload now supports all levels

2013-10-14 Thread Chad Versace

On 10/13/2013 08:33 PM, Ian Romanick wrote:

On 10/13/2013 01:50 PM, Frank Henigman wrote:

On Fri, Oct 11, 2013 at 10:00 PM, Chad Versace
 wrote:

On 10/11/2013 10:17 AM, Courtney Goeltzenleuchter wrote:


Support all levels of a supported texture format.
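
The offset adjustment in this patch can be pictured with a small sketch. It assumes that every mip level lives at some (x, y) position inside one shared surface, which is what the patch queries via intel_miptree_get_image_offset(). The LevelOffset and Region structs, the level table and to_miptree_coords() are hypothetical stand-ins, not the driver's real data structures.

```cpp
#include <cassert>
#include <vector>

// Hypothetical position of one mip level inside the shared surface.
struct LevelOffset { unsigned x, y; };

// Sub-image region as passed to glTexSubImage2D, relative to its level.
struct Region { unsigned xoffset, yoffset, width, height; };

// Translates a level-relative region into surface coordinates by
// adding the level's position, as the patch does for nonzero levels.
Region to_miptree_coords(Region r, const std::vector<LevelOffset> &levels,
                         unsigned level)
{
   assert(level < levels.size());
   if (level != 0) {
      r.xoffset += levels[level].x;
      r.yoffset += levels[level].y;
   }
   return r;
}
```

The width and height are left alone; only the origin of the copy moves.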
---
   src/mesa/drivers/dri/i965/intel_tex_subimage.c | 13 +++--
   1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_tex_subimage.c
b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
index 4aec05d..5e46760 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_subimage.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
@@ -541,14 +541,13 @@ intel_texsubimage_tiled_memcpy(struct gl_context *
ctx,
  uint32_t cpp;
  mem_copy_fn mem_copy = NULL;

-   /* This fastpath is restricted to specific texture types: level 0 of
+   /* This fastpath is restricted to specific texture types:
   * a 2D BGRA, RGBA, L8 or A8 texture. It could be generalized to
support
   * more types.
   */
  if (!brw->has_llc ||
  type != GL_UNSIGNED_BYTE ||
  texImage->TexObject->Target != GL_TEXTURE_2D ||
-   texImage->Level != 0 ||
  pixels == NULL ||
  _mesa_is_bufferobj(packing->BufferObj) ||
  packing->Alignment > 4 ||
@@ -616,6 +615,16 @@ intel_texsubimage_tiled_memcpy(struct gl_context *
ctx,
  DBG("%s: level=%d offset=(%d,%d) (w,h)=(%d,%d)\n",
  __FUNCTION__, texImage->Level, xoffset, yoffset, width, height);

+   /* Adjust x and y offset based on miplevel
+*/
+   if (texImage->Level) {
+  GLuint xlevel, ylevel;
+  intel_miptree_get_image_offset(image->mt, texImage->Level, 0,
+  &xlevel, &ylevel);
+  xoffset += xlevel;
+  yoffset += ylevel;
+   }
+
  linear_to_tiled(
 xoffset * cpp, (xoffset + width) * cpp,
 yoffset, yoffset + height,



Usually when we commit performance patches like this, we state in the
commit message what the observed relative performance gain was.

What gain did you see? Hardware? Benchmark? Kernel version? How many
runs?


We could quote from my patch, as this is just opening more paths into that code.
Or do you think this calls for different testing?


I think what Chad is asking is whether there's some information like
"Improves load time of application XYZ by 12.3 +/- 4.5%" or similar.

In the past, we've had problems with patches that just make vague claims
of "improves performance" when we later find critical bugs in those
patches... can we just revert the code, or is it going to ruin the
performance of... something?

For reference, see commit 329cd6a9b and this thread from mesa-dev:

http://lists.freedesktop.org/archives/mesa-dev/2013-June/040811.html


Ian read my mind correctly. The commit message should say "Improves XYZ of
application ABC by 10.3+-1.2%", as well as state the hardware at a minimum,
and kernel version too if you're feeling gracious.

In the future, if someone discovers that this patch introduces a bug, the commit
message's performance claim will prevent that someone from simply reverting the
code.



[Mesa-dev] [Bug 70435] [regression] Fast Texture Upload optimization results in corrupt rendering.

2013-10-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=70435

U. Artie Eoff  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|glTexSubImage corrupted     |[regression] Fast Texture
                   |rendering                   |Upload optimization results
                   |                            |in corrupt rendering.



[Mesa-dev] [Bug 70411] glInvalidateFramebuffer fails with GL_INVALID_ENUM

2013-10-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=70411

Brian Paul  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #1 from Brian Paul  ---
Thanks!  I'll commit this shortly.



Re: [Mesa-dev] [PATCH] swrast: add correct include for out-of-tree builds

2013-10-14 Thread Brian Paul

On 10/12/2013 10:29 AM, Emil Velikov wrote:

The xmlpool/options.h file was not accessible when building
out-of-tree leading to failure.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70378
Reported-by: Fabio Pedretti 
Tested-by: Andre Heider 
Signed-off-by: Emil Velikov 
---
  src/gallium/targets/dri-swrast/Makefile.am | 1 +
  src/mesa/drivers/dri/swrast/Makefile.am| 1 +
  2 files changed, 2 insertions(+)

diff --git a/src/gallium/targets/dri-swrast/Makefile.am 
b/src/gallium/targets/dri-swrast/Makefile.am
index 5d2f146..6b629df 100644
--- a/src/gallium/targets/dri-swrast/Makefile.am
+++ b/src/gallium/targets/dri-swrast/Makefile.am
@@ -33,6 +33,7 @@ AM_CPPFLAGS = \
-I$(top_srcdir)/src/gallium/winsys \
-I$(top_srcdir)/src/mesa \
-I$(top_srcdir)/src/mapi \
+   -I$(top_builddir)/src/mesa/drivers/dri/common \
-DGALLIUM_RBUG \
-DGALLIUM_TRACE \
-DGALLIUM_SOFTPIPE \
diff --git a/src/mesa/drivers/dri/swrast/Makefile.am 
b/src/mesa/drivers/dri/swrast/Makefile.am
index 9652583..c51ad2d 100644
--- a/src/mesa/drivers/dri/swrast/Makefile.am
+++ b/src/mesa/drivers/dri/swrast/Makefile.am
@@ -30,6 +30,7 @@ AM_CFLAGS = \
-I$(top_srcdir)/src/mapi \
-I$(top_srcdir)/src/mesa/ \
-I$(top_srcdir)/src/mesa/drivers/dri/common \
+   -I$(top_builddir)/src/mesa/drivers/dri/common \
$(DEFINES) \
$(VISIBILITY_CFLAGS)




Reviewed-by: Brian Paul 



Re: [Mesa-dev] [PATCH] llvmpipe: increase fs shader variant instruction cache limit by factor 4

2013-10-14 Thread Jose Fonseca


- Original Message -
> Am 11.10.2013 16:21, schrieb Brian Paul:
> > On 10/11/2013 07:11 AM, srol...@vmware.com wrote:
> >> From: Roland Scheidegger 
> >>
> >> The previous limit of 128*1024 was reported on IRC to cause frequent
> >> recompiles in some apps due to shader variant thrashing, leading to
> >> noticeable lags.
> >> Note that the LP_MAX_SHADER_VARIANTS limit (1024) was more or less
> >> impossible to reach, since even simple fragment shaders without
> >> texturing (glxgears) used more than twice 128 instructions, hence the
> >> instruction limit would always have been reached first (excluding
> >> things like trivial shaders not writing color). Even with the new limit
> >> it is VERY likely the instruction limit is hit first.
> >> Should help with such lags due to recompiles (though other shader types
> >> have their own limits, LP_MAX_SETUP_VARIANTS and
> >> DRAW_MAX_SHADER_VARIANTS; the latter in particular seems a bit small
> >> (128)).
> >> ---
> >>   src/gallium/drivers/llvmpipe/lp_limits.h |2 +-
> >>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/src/gallium/drivers/llvmpipe/lp_limits.h
> >> b/src/gallium/drivers/llvmpipe/lp_limits.h
> >> index af31b35..8cfab5a 100644
> >> --- a/src/gallium/drivers/llvmpipe/lp_limits.h
> >> +++ b/src/gallium/drivers/llvmpipe/lp_limits.h
> >> @@ -79,7 +79,7 @@
> >>* Max number of instructions (for all fragment shaders combined per
> >> context)
> >>* that will be kept around.
> >>*/
> >> -#define LP_MAX_SHADER_INSTRUCTIONS (128*1024)
> >> +#define LP_MAX_SHADER_INSTRUCTIONS (512*LP_MAX_SHADER_VARIANTS)
> >>
> >>   /**
> >>* Max number of setup variants that will be kept around.
> >>
> > 

Looks good to me.

> > Reviewed-by: Brian Paul 
> > 
> > Maybe the comment on LP_MAX_SHADER_INSTRUCTIONS should indicate that
> > these are LLVM IR instructions, not TGSI instructions, not machine
> > instructions.  I had to dig for a while to find that.
> 
> Oh yes that's a good idea. I think ideally we'd just count compiled size
> of the shaders but IIRC that was actually difficult to do.
> 


Agreed.

Jose
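
To make the arithmetic in the commit message concrete, here is a small
sketch of how the two limits interact. Only LP_MAX_SHADER_VARIANTS (1024)
and the two instruction limits come from the patch; the helper
variants_that_fit and the average per-variant instruction counts (300 and
600) are made up for illustration.

```c
#include <assert.h>

/* Value taken from the commit message quoted above. */
#define LP_MAX_SHADER_VARIANTS 1024

/* Hypothetical helper: how many variants of a given average size can be
 * cached before either the instruction limit or the variant limit is hit.
 */
static unsigned
variants_that_fit(unsigned max_instructions, unsigned instrs_per_variant)
{
   assert(instrs_per_variant > 0);

   unsigned by_instructions = max_instructions / instrs_per_variant;
   return by_instructions < LP_MAX_SHADER_VARIANTS
             ? by_instructions
             : LP_MAX_SHADER_VARIANTS;
}
```

With an assumed 600 IR instructions per variant, the old limit holds
131072 / 600 = 218 variants and the new one 524288 / 600 = 873, so the
instruction limit is still the one that is reached first.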


Re: [Mesa-dev] [PATCH] i965: Fast texture upload now supports all levels

2013-10-14 Thread Courtney Goeltzenleuchter
Does anyone know of a test that measures frame 0 time? Or texture upload
speed?

For Smokin' Guns, I tried measuring the overall time, but an improved frame
0 time has difficulty standing out of a 2607 frame test.

I may have to create something. Suggestions for an appropriate framework?

Thanks,
 Courtney


On Mon, Oct 14, 2013 at 8:32 AM, Chad Versace
wrote:

> On 10/13/2013 08:33 PM, Ian Romanick wrote:
>
>> On 10/13/2013 01:50 PM, Frank Henigman wrote:
>>
>>> On Fri, Oct 11, 2013 at 10:00 PM, Chad Versace
>>>  wrote:
>>>
>>>> On 10/11/2013 10:17 AM, Courtney Goeltzenleuchter wrote:
>>>>
>>>>>
>>>>> Support all levels of a supported texture format.
>>>>> ---
>>>>>  src/mesa/drivers/dri/i965/intel_tex_subimage.c | 13 +++--
>>>>>  1 file changed, 11 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/src/mesa/drivers/dri/i965/intel_tex_subimage.c
>>>>> b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
>>>>> index 4aec05d..5e46760 100644
>>>>> --- a/src/mesa/drivers/dri/i965/intel_tex_subimage.c
>>>>> +++ b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
>>>>> @@ -541,14 +541,13 @@ intel_texsubimage_tiled_memcpy(struct gl_context * ctx,
>>>>>   uint32_t cpp;
>>>>>   mem_copy_fn mem_copy = NULL;
>>>>>
>>>>> -   /* This fastpath is restricted to specific texture types: level 0 of
>>>>> +   /* This fastpath is restricted to specific texture types:
>>>>>    * a 2D BGRA, RGBA, L8 or A8 texture. It could be generalized to support
>>>>>    * more types.
>>>>>    */
>>>>>   if (!brw->has_llc ||
>>>>>   type != GL_UNSIGNED_BYTE ||
>>>>>   texImage->TexObject->Target != GL_TEXTURE_2D ||
>>>>> -   texImage->Level != 0 ||
>>>>>   pixels == NULL ||
>>>>>   _mesa_is_bufferobj(packing->BufferObj) ||
>>>>>   packing->Alignment > 4 ||
>>>>> @@ -616,6 +615,16 @@ intel_texsubimage_tiled_memcpy(struct gl_context * ctx,
>>>>>   DBG("%s: level=%d offset=(%d,%d) (w,h)=(%d,%d)\n",
>>>>>   __FUNCTION__, texImage->Level, xoffset, yoffset, width, height);
>>>>>
>>>>> +   /* Adjust x and y offset based on miplevel
>>>>> +    */
>>>>> +   if (texImage->Level) {
>>>>> +  GLuint xlevel, ylevel;
>>>>> +  intel_miptree_get_image_offset(image->mt, texImage->Level, 0,
>>>>> +  &xlevel, &ylevel);
>>>>> +  xoffset += xlevel;
>>>>> +  yoffset += ylevel;
>>>>> +   }
>>>>> +
>>>>>   linear_to_tiled(
>>>>>  xoffset * cpp, (xoffset + width) * cpp,
>>>>>  yoffset, yoffset + height,
>>>>>
>>>>>
>>>> Usually when we commit performance patches like this, we state in the
>>>> commit message what the observed relative performance gain was.
>>>>
>>>> What gain did you see? Hardware? Benchmark? Kernel version? How many
>>>> runs?
>>>>
>>>
>>> We could quote from my patch, as this is just opening more paths into
>>> that code.
>>> Or do you think this calls for different testing?
>>>
>>
>> I think what Chad is asking is whether there's some information like
>> "Improves load time of application XYZ 12.3+-4.5%" or similar.
>>
>> In the past, we've had problems with patches that just make vague claims
>> of "improves performance" when we later find critical bugs in those
>> patches... can we just revert the code, or is it going to ruin the
>> performance of... something?
>>
>> For reference, see commit 329cd6a9b and this thread from mesa-dev:
>>
>> http://lists.freedesktop.org/archives/mesa-dev/2013-June/040811.html
>>
>
> Ian read my mind correctly. The commit message should say "Improves XYZ of
> application ABC by 10.3+-1.2%", as well as state the hardware at a minimum,
> and kernel version too if you're feeling gracious.
>
> In the future, if someone discovers that this patch introduces a bug, the
> commit message's performance claim will prevent that someone from simply
> reverting the code.
>
>


-- 
Courtney Goeltzenleuchter
LunarG


Re: [Mesa-dev] texelFetch*Offset from EXT_gpu_shader4

2013-10-14 Thread Ian Romanick
On 10/14/2013 06:40 AM, Mike Lothian wrote:
> Hi
> 
> I'm going to try adding in support for texelFetch*Offset to try and get
> anti aliasing working in the Unigine demos

texelFetchOffset is already supported by GLSL 1.30.

There's a lot more to EXT_gpu_shader4 than just these built in
functions.  That includes a bunch of garbage in the parser.  We're not
interested in adding (and maintaining) support for EXT_gpu_shader4 in Mesa.

> I'm not sure if anything more will be required but I just wanted to
> check no one else is working on this
> 
> Thanks
> 
> Mike



[Mesa-dev] [PATCH] swrast: add correct include for out-of-tree builds

2013-10-14 Thread Emil Velikov
The xmlpool/options.h file was not accessible when building
out-of-tree leading to failure.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70378
Reported-by: Fabio Pedretti 
Tested-by: Fabio Pedretti 
Tested-by: Andre Heider 
Signed-off-by: Emil Velikov 
Reviewed-by: Brian Paul 
---
 src/gallium/targets/dri-swrast/Makefile.am | 1 +
 src/mesa/drivers/dri/swrast/Makefile.am| 1 +
 2 files changed, 2 insertions(+)

diff --git a/src/gallium/targets/dri-swrast/Makefile.am 
b/src/gallium/targets/dri-swrast/Makefile.am
index 5d2f146..6b629df 100644
--- a/src/gallium/targets/dri-swrast/Makefile.am
+++ b/src/gallium/targets/dri-swrast/Makefile.am
@@ -33,6 +33,7 @@ AM_CPPFLAGS = \
-I$(top_srcdir)/src/gallium/winsys \
-I$(top_srcdir)/src/mesa \
-I$(top_srcdir)/src/mapi \
+   -I$(top_builddir)/src/mesa/drivers/dri/common \
-DGALLIUM_RBUG \
-DGALLIUM_TRACE \
-DGALLIUM_SOFTPIPE \
diff --git a/src/mesa/drivers/dri/swrast/Makefile.am 
b/src/mesa/drivers/dri/swrast/Makefile.am
index 9652583..c51ad2d 100644
--- a/src/mesa/drivers/dri/swrast/Makefile.am
+++ b/src/mesa/drivers/dri/swrast/Makefile.am
@@ -30,6 +30,7 @@ AM_CFLAGS = \
-I$(top_srcdir)/src/mapi \
-I$(top_srcdir)/src/mesa/ \
-I$(top_srcdir)/src/mesa/drivers/dri/common \
+   -I$(top_builddir)/src/mesa/drivers/dri/common \
$(DEFINES) \
$(VISIBILITY_CFLAGS)
 
-- 
1.8.4



Re: [Mesa-dev] RFC: Haswell resource streamer/hw-generated binding tables (v2)

2013-10-14 Thread Abdiel Janulgue
On Friday, October 11, 2013 11:39:53 AM Eric Anholt wrote:
> As a general rule, we don't land code whose purpose is performance
> improvement if it doesn't actually improve performance.  If more work is
> needed to make it actually improve performance, then we wait until then.
> 
> As I understand it, the thing that you think will make this eventually
> actually improve performance is state flagging that indicates which
> individual surfaces need updating.  Since that should improve
> performance even in the non-resource-streamer case, it can be pursued
> independently.


One optimization idea that I had in mind a few months ago was to find a way to
reduce emission of surface state objects. Currently, we rebuild surface states
every time we generate binding tables. The idea is basically to relocate the
indirect surface state objects to a buffer object separate from the command
batch. Using the resource streamer, we can then publish the deltas when the
indices referring to them need to be changed.

So whenever a surface needs to be used, instead of rebuilding the whole
binding table structure, the driver can essentially say on a per-slot basis
"hey, a surface got activated, but it was previously bound to index 10; let's
rebind it to index 12".

This potentially reduces the CPU overhead of generating and uploading binding
tables. I did a previous experiment and found that this approach reduced
generation of surface states by as much as 99%. What do you think?
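
To illustrate the per-slot delta idea, here is a toy sketch in C. It is not
the resource streamer command format and not i965 driver code: bt_delta,
bt_publish_deltas and the use of plain 32-bit values for surface state
offsets are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One binding-table update: "slot N now points at this surface state". */
struct bt_delta {
   uint16_t slot;
   uint32_t surface;   /* hypothetical offset of the surface state object */
};

/* Compare the wanted bindings against what was last published, record only
 * the slots that changed, and bring the published copy up to date.
 * Returns the number of deltas written to out (at most num_slots).
 */
static size_t
bt_publish_deltas(uint32_t *published, const uint32_t *wanted,
                  size_t num_slots, struct bt_delta *out)
{
   assert(num_slots <= UINT16_MAX);

   size_t count = 0;
   for (size_t i = 0; i < num_slots; i++) {
      if (published[i] != wanted[i]) {
         out[count].slot = (uint16_t)i;
         out[count].surface = wanted[i];
         count++;
         published[i] = wanted[i];
      }
   }
   return count;
}
```

When nothing changed between two draws, the function emits zero deltas,
which is the case the proposal is trying to make cheap.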

Abdiel


[Mesa-dev] [PATCH] mesa: consolidate cube width=height error checking

2013-10-14 Thread Brian Paul
Instead of checking width==height in four places, just do it in
_mesa_legal_texture_dimensions() where we do the other width, height,
depth checks.  Similarly, move the check that cube map array depth is
a multiple of 6.

This change also fixes some missing cube dimension checks for the
glTexStorage[23]D() functions.

Remove width==height assertion in _mesa_get_tex_max_num_levels() since
that's called before the other size checks for glTexStorage.

Cc: "9.2" 
---
 src/mesa/main/teximage.c |   43 +--
 1 file changed, 5 insertions(+), 38 deletions(-)

diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c
index e6cae00..ea2f15b 100644
--- a/src/mesa/main/teximage.c
+++ b/src/mesa/main/teximage.c
@@ -1112,7 +1112,6 @@ _mesa_get_tex_max_num_levels(GLenum target, GLsizei 
width, GLsizei height,
case GL_TEXTURE_CUBE_MAP_ARRAY:
case GL_PROXY_TEXTURE_CUBE_MAP:
case GL_PROXY_TEXTURE_CUBE_MAP_ARRAY:
-  ASSERT(width == height);
   size = width;
   break;
case GL_TEXTURE_2D:
@@ -1447,6 +1446,8 @@ _mesa_legal_texture_dimensions(struct gl_context *ctx, 
GLenum target,
case GL_PROXY_TEXTURE_CUBE_MAP_ARB:
   maxSize = 1 << (ctx->Const.MaxCubeTextureLevels - 1);
   maxSize >>= level;
+  if (width != height)
+ return GL_FALSE;
   if (width < 2 * border || width > 2 * border + maxSize)
  return GL_FALSE;
   if (height < 2 * border || height > 2 * border + maxSize)
@@ -1500,7 +1501,9 @@ _mesa_legal_texture_dimensions(struct gl_context *ctx, 
GLenum target,
  return GL_FALSE;
   if (height < 2 * border || height > 2 * border + maxSize)
  return GL_FALSE;
-  if (depth < 1 || depth > ctx->Const.MaxArrayTextureLayers)
+  if (depth < 1 || depth > ctx->Const.MaxArrayTextureLayers || depth % 6)
+ return GL_FALSE;
+  if (width != height)
  return GL_FALSE;
   if (level >= ctx->Const.MaxCubeTextureLevels)
  return GL_FALSE;
@@ -1991,27 +1994,6 @@ texture_error_check( struct gl_context *ctx,
   }
}
 
-   if ((target == GL_PROXY_TEXTURE_CUBE_MAP_ARB ||
-_mesa_is_cube_face(target)) && width != height) {
-  _mesa_error(ctx, GL_INVALID_VALUE,
-  "glTexImage2D(cube width != height)");
-  return GL_TRUE;
-   }
-
-   if ((target == GL_PROXY_TEXTURE_CUBE_MAP_ARRAY ||
-target == GL_TEXTURE_CUBE_MAP_ARRAY) && width != height) {
-  _mesa_error(ctx, GL_INVALID_VALUE,
-  "glTexImage3D(cube array width != height)");
-  return GL_TRUE;
-   }
-
-   if ((target == GL_PROXY_TEXTURE_CUBE_MAP_ARRAY ||
-target == GL_TEXTURE_CUBE_MAP_ARRAY) && (depth % 6)) {
-  _mesa_error(ctx, GL_INVALID_VALUE,
-  "glTexImage3D(cube array depth not multiple of 6)");
-  return GL_TRUE;
-   }
-
/* Check internalFormat */
if (_mesa_base_tex_format(ctx, internalFormat) < 0) {
   _mesa_error(ctx, GL_INVALID_VALUE,
@@ -2243,14 +2225,6 @@ compressed_texture_error_check(struct gl_context *ctx, 
GLint dimensions,
   goto error;
}
 
-   /* For cube map, width must equal height */
-   if ((target == GL_PROXY_TEXTURE_CUBE_MAP_ARB ||
-_mesa_is_cube_face(target)) && width != height) {
-  reason = "width != height";
-  error = GL_INVALID_VALUE;
-  goto error;
-   }
-
/* check image size in bytes */
if (expectedSize != imageSize) {
   /* Per GL_ARB_texture_compression:  GL_INVALID_VALUE is generated [...]
@@ -2596,13 +2570,6 @@ copytexture_error_check( struct gl_context *ctx, GLuint 
dimensions,
   }
}
 
-   if ((target == GL_PROXY_TEXTURE_CUBE_MAP_ARB ||
-_mesa_is_cube_face(target)) && width != height) {
-  _mesa_error(ctx, GL_INVALID_VALUE,
-  "glTexImage2D(cube width != height)");
-  return GL_TRUE;
-   }
-
if (_mesa_is_compressed_format(ctx, internalFormat)) {
   if (!target_can_be_compressed(ctx, target, internalFormat)) {
  _mesa_error(ctx, GL_INVALID_ENUM,
-- 
1.7.10.4
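
The consolidation described in the commit message can be pictured as one
predicate that all entry points share. The following is a simplified,
stand-alone sketch, not the body of _mesa_legal_texture_dimensions();
legal_cube_dimensions and its parameter list are hypothetical, and the
limits passed in the usage are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified version of the shared cube-map size check:
 * faces must be square and within range, and cube map arrays additionally
 * need a layer count that is a positive multiple of 6.
 */
static bool
legal_cube_dimensions(int width, int height, int depth, int border,
                      int max_size, int max_layers, bool is_array)
{
   assert(max_size > 0 && max_layers > 0);

   if (width != height)
      return false;
   if (width < 2 * border || width > 2 * border + max_size)
      return false;
   if (is_array && (depth < 1 || depth > max_layers || depth % 6 != 0))
      return false;
   return true;
}
```

Keeping the test in one place means a new entry point such as
glTexStorage2D() picks it up without repeating the comparison.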



Re: [Mesa-dev] [PATCH] i965: Fast texture upload now supports all levels

2013-10-14 Thread Mark Mueller
Hi Courtney,
I've been doing similar work but using Xonotic as the benchmark. Here is
how I've been estimating upload times: first I picked a rough texture
count, like 300, as a metric. In intelTexImage I use clock_gettime to
determine the elapsed time between the loading of texture 0 and texture
300. Then I use _mesa_debug to output the results to a log file by setting
these env variables:

export MESA_DEBUG=1;
export MESA_LOG_FILE=`pwd`/mesa.log
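
A minimal sketch of that kind of measurement, assuming a POSIX system with
CLOCK_MONOTONIC. The upload_timer structure and its functions are
hypothetical and are not the instrumentation added to intelTexImage; the
sketch only shows the clock_gettime bookkeeping.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical bookkeeping for timing the first `target` texture uploads. */
struct upload_timer {
   unsigned target;        /* e.g. 300 uploads */
   unsigned seen;          /* uploads observed so far */
   struct timespec start;  /* time of the first upload */
   int64_t elapsed_ns;     /* valid once seen == target, -1 before that */
};

static int64_t
timespec_diff_ns(const struct timespec *from, const struct timespec *to)
{
   return (int64_t)(to->tv_sec - from->tv_sec) * 1000000000LL +
          (int64_t)(to->tv_nsec - from->tv_nsec);
}

static void
upload_timer_init(struct upload_timer *t, unsigned target)
{
   assert(target >= 1);
   t->target = target;
   t->seen = 0;
   t->elapsed_ns = -1;
}

/* Call once per upload; returns true when the target-th upload is reached. */
static bool
upload_timer_tick(struct upload_timer *t)
{
   struct timespec now;
   clock_gettime(CLOCK_MONOTONIC, &now);

   if (t->seen == 0)
      t->start = now;
   t->seen++;

   if (t->seen == t->target) {
      t->elapsed_ns = timespec_diff_ns(&t->start, &now);
      return true;
   }
   return false;
}
```

In a driver, the value of elapsed_ns would be written to the log once the
tick function returns true.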

Cheers,
Mark



On Mon, Oct 14, 2013 at 9:00 AM, Courtney Goeltzenleuchter <
court...@lunarg.com> wrote:

> Does anyone know of a test that measures frame 0 time? Or texture upload
> speed?
>
> For Smokin' Guns, I tried measuring the overall time, but an improved
> frame 0 time has difficulty standing out of a 2607 frame test.
>
> I may have to create something. Suggestions for an appropriate framework?
>
> Thanks,
>  Courtney
>
>
> On Mon, Oct 14, 2013 at 8:32 AM, Chad Versace <
> chad.vers...@linux.intel.com> wrote:
>
>> On 10/13/2013 08:33 PM, Ian Romanick wrote:
>>
>>> On 10/13/2013 01:50 PM, Frank Henigman wrote:
>>>
>>>> On Fri, Oct 11, 2013 at 10:00 PM, Chad Versace
>>>>  wrote:
>>>>
>>>>> On 10/11/2013 10:17 AM, Courtney Goeltzenleuchter wrote:
>>>>>
>>>>>>
>>>>>> Support all levels of a supported texture format.
>>>>>> ---
>>>>>>  src/mesa/drivers/dri/i965/intel_tex_subimage.c | 13 +++--
>>>>>>  1 file changed, 11 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/src/mesa/drivers/dri/i965/intel_tex_subimage.c
>>>>>> b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
>>>>>> index 4aec05d..5e46760 100644
>>>>>> --- a/src/mesa/drivers/dri/i965/intel_tex_subimage.c
>>>>>> +++ b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
>>>>>> @@ -541,14 +541,13 @@ intel_texsubimage_tiled_memcpy(struct gl_context * ctx,
>>>>>>   uint32_t cpp;
>>>>>>   mem_copy_fn mem_copy = NULL;
>>>>>>
>>>>>> -   /* This fastpath is restricted to specific texture types: level 0 of
>>>>>> +   /* This fastpath is restricted to specific texture types:
>>>>>>    * a 2D BGRA, RGBA, L8 or A8 texture. It could be generalized to support
>>>>>>    * more types.
>>>>>>    */
>>>>>>   if (!brw->has_llc ||
>>>>>>   type != GL_UNSIGNED_BYTE ||
>>>>>>   texImage->TexObject->Target != GL_TEXTURE_2D ||
>>>>>> -   texImage->Level != 0 ||
>>>>>>   pixels == NULL ||
>>>>>>   _mesa_is_bufferobj(packing->BufferObj) ||
>>>>>>   packing->Alignment > 4 ||
>>>>>> @@ -616,6 +615,16 @@ intel_texsubimage_tiled_memcpy(struct gl_context * ctx,
>>>>>>   DBG("%s: level=%d offset=(%d,%d) (w,h)=(%d,%d)\n",
>>>>>>   __FUNCTION__, texImage->Level, xoffset, yoffset, width, height);
>>>>>>
>>>>>> +   /* Adjust x and y offset based on miplevel
>>>>>> +    */
>>>>>> +   if (texImage->Level) {
>>>>>> +  GLuint xlevel, ylevel;
>>>>>> +  intel_miptree_get_image_offset(image->mt, texImage->Level, 0,
>>>>>> +  &xlevel, &ylevel);
>>>>>> +  xoffset += xlevel;
>>>>>> +  yoffset += ylevel;
>>>>>> +   }
>>>>>> +
>>>>>>   linear_to_tiled(
>>>>>>  xoffset * cpp, (xoffset + width) * cpp,
>>>>>>  yoffset, yoffset + height,
>>>>>>
>>>>>>
>>>>> Usually when we commit performance patches like this, we state in the
>>>>> commit message what the observed relative performance gain was.
>>>>>
>>>>> What gain did you see? Hardware? Benchmark? Kernel version? How many
>>>>> runs?
>>>>>
>>>>
>>>> We could quote from my patch, as this is just opening more paths into
>>>> that code.
>>>> Or do you think this calls for different testing?
>>>>
>>>
>>> I think what Chad is asking is whether there's some information like
>>> "Improves load time of application XYZ 12.3+-4.5%" or similar.
>>>
>>> In the past, we've had problems with patches that just make vague claims
>>> of "improves performance" when we later find critical bugs in those
>>> patches... can we just revert the code, or is it going to ruin the
>>> performance of... something?
>>>
>>> For reference, see commit 329cd6a9b and this thread from mesa-dev:
>>>
>>> http://lists.freedesktop.org/archives/mesa-dev/2013-June/040811.html
>>>
>>
>> Ian read my mind correctly. The commit message should say "Improves XYZ of
>> application ABC by 10.3+-1.2%", as well as state the hardware at a
>> minimum, and kernel version too if you're feeling gracious.
>>
>> In the future, if someone discovers that this patch introduces a bug, the
>> commit message's performance claim will prevent that someone from simply
>> reverting the code.
>>
>>
>
>
> --
> Courtney Goeltzenleuchter
> LunarG
>
>

[Mesa-dev] [PATCH 0/8] Implement GL_ARB_sample_shading on Intel hardware

2013-10-14 Thread Anuj Phogat
The patches listed below implement the GL_ARB_sample_shading extension
on Intel hardware >= gen6. I verified the implementation with a
number of piglit tests, currently under review on the piglit mailing
list. I observed no piglit or gles3 CTS regressions with these patches
on SNB & IVB.
These patches can also be found on my github branch:
https://github.com/aphogat/mesa.git branch: sample-shading-5

Anuj Phogat (8):
  mesa: Add infrastructure for GL_ARB_sample_shading
  mesa: Add new functions and enums required by GL_ARB_sample_shading
  mesa: Pass number of samples as a program state variable
  glsl: Add new builtins required by GL_ARB_sample_shading
  i965: Implement FS backend for ARB_sample_shading
  i965/gen6: Enable the features required for GL_ARB_sample_shading
  i965/gen7: Enable the features required for GL_ARB_sample_shading
  i965: Enable ARB_sample_shading on intel hardware >= gen6

 src/glsl/builtin_variables.cpp   |  11 +++
 src/glsl/glcpp/glcpp-parse.y |   3 +
 src/glsl/glsl_parser_extras.cpp  |   1 +
 src/glsl/glsl_parser_extras.h|   2 +
 src/glsl/link_varyings.cpp   |   2 +
 src/glsl/standalone_scaffolding.cpp  |   1 +
 src/mapi/glapi/gen/ARB_sample_shading.xml|  19 +
 src/mapi/glapi/gen/GL3x.xml  |   5 ++
 src/mapi/glapi/gen/gl_API.xml|   2 +-
 src/mesa/drivers/dri/i965/brw_fs.cpp | 109 +++
 src/mesa/drivers/dri/i965/brw_fs.h   |   4 +
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp |  23 ++
 src/mesa/drivers/dri/i965/brw_wm.h   |   1 +
 src/mesa/drivers/dri/i965/gen6_wm_state.c|  67 +++-
 src/mesa/drivers/dri/i965/gen7_wm_state.c|  70 -
 src/mesa/drivers/dri/i965/intel_extensions.c |   1 +
 src/mesa/main/enable.c   |  16 
 src/mesa/main/extensions.c   |   1 +
 src/mesa/main/get.c  |   4 +
 src/mesa/main/get_hash_params.py |   3 +
 src/mesa/main/mtypes.h   |  10 ++-
 src/mesa/main/multisample.c  |  13 
 src/mesa/main/multisample.h  |   2 +
 src/mesa/main/tests/dispatch_sanity.cpp  |   4 +-
 src/mesa/program/prog_print.c|   5 ++
 src/mesa/program/prog_statevars.c|  11 +++
 src/mesa/program/prog_statevars.h|   2 +
 27 files changed, 382 insertions(+), 10 deletions(-)
 create mode 100644 src/mapi/glapi/gen/ARB_sample_shading.xml

-- 
1.8.1.4



[Mesa-dev] [PATCH 1/8] mesa: Add infrastructure for GL_ARB_sample_shading

2013-10-14 Thread Anuj Phogat
This patch implements the common support code required for the
GL_ARB_sample_shading extension.

Signed-off-by: Anuj Phogat 
---
 src/glsl/glcpp/glcpp-parse.y| 3 +++
 src/glsl/glsl_parser_extras.cpp | 1 +
 src/glsl/glsl_parser_extras.h   | 2 ++
 src/glsl/standalone_scaffolding.cpp | 1 +
 src/mesa/main/extensions.c  | 1 +
 src/mesa/main/mtypes.h  | 1 +
 6 files changed, 9 insertions(+)

diff --git a/src/glsl/glcpp/glcpp-parse.y b/src/glsl/glcpp/glcpp-parse.y
index 02100ab..5141bdd 100644
--- a/src/glsl/glcpp/glcpp-parse.y
+++ b/src/glsl/glcpp/glcpp-parse.y
@@ -1249,6 +1249,9 @@ glcpp_parser_create (const struct gl_extensions 
*extensions, int api)
  if (extensions->ARB_shading_language_420pack)
 add_builtin_define(parser, "GL_ARB_shading_language_420pack", 
1);
 
+ if (extensions->ARB_sample_shading)
+add_builtin_define(parser, "GL_ARB_sample_shading", 1);
+
  if (extensions->EXT_shader_integer_mix)
 add_builtin_define(parser, "GL_EXT_shader_integer_mix", 1);
 
diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp
index f1cabf4..1be533e 100644
--- a/src/glsl/glsl_parser_extras.cpp
+++ b/src/glsl/glsl_parser_extras.cpp
@@ -523,6 +523,7 @@ static const _mesa_glsl_extension 
_mesa_glsl_supported_extensions[] = {
EXT(AMD_vertex_shader_layer,true,  false, 
AMD_vertex_shader_layer),
EXT(EXT_shader_integer_mix, true,  true,  
EXT_shader_integer_mix),
EXT(ARB_texture_gather, true,  false, ARB_texture_gather),
+   EXT(ARB_sample_shading, true,  false, ARB_sample_shading),
 };
 
 #undef EXT
diff --git a/src/glsl/glsl_parser_extras.h b/src/glsl/glsl_parser_extras.h
index 26841f5..020148e 100644
--- a/src/glsl/glsl_parser_extras.h
+++ b/src/glsl/glsl_parser_extras.h
@@ -313,6 +313,8 @@ struct _mesa_glsl_parse_state {
bool AMD_vertex_shader_layer_warn;
bool ARB_shading_language_420pack_enable;
bool ARB_shading_language_420pack_warn;
+   bool ARB_sample_shading_enable;
+   bool ARB_sample_shading_warn;
bool EXT_shader_integer_mix_enable;
bool EXT_shader_integer_mix_warn;
/*@}*/
diff --git a/src/glsl/standalone_scaffolding.cpp 
b/src/glsl/standalone_scaffolding.cpp
index 7a1cf68..cbff6d1 100644
--- a/src/glsl/standalone_scaffolding.cpp
+++ b/src/glsl/standalone_scaffolding.cpp
@@ -97,6 +97,7 @@ void initialize_context_to_defaults(struct gl_context *ctx, 
gl_api api)
ctx->Extensions.ARB_explicit_attrib_location = true;
ctx->Extensions.ARB_fragment_coord_conventions = true;
ctx->Extensions.ARB_gpu_shader5 = true;
+   ctx->Extensions.ARB_sample_shading = true;
ctx->Extensions.ARB_shader_bit_encoding = true;
ctx->Extensions.ARB_shader_stencil_export = true;
ctx->Extensions.ARB_shader_texture_lod = true;
diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
index 2507fdf..9e908c0 100644
--- a/src/mesa/main/extensions.c
+++ b/src/mesa/main/extensions.c
@@ -200,6 +200,7 @@ static const struct extension extension_table[] = {
{ "GL_EXT_polygon_offset",  o(dummy_true),  
GLL,1995 },
{ "GL_EXT_provoking_vertex",o(EXT_provoking_vertex),
GL, 2009 },
{ "GL_EXT_rescale_normal",  o(dummy_true),  
GLL,1997 },
+   { "GL_ARB_sample_shading",  o(ARB_sample_shading),  
GL, 2009 },
{ "GL_EXT_secondary_color", o(dummy_true),  
GLL,1999 },
{ "GL_EXT_separate_shader_objects", 
o(EXT_separate_shader_objects), GLL,2008 },
{ "GL_EXT_separate_specular_color", o(dummy_true),  
GLL,1997 },
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 15893ec..053514d 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -3198,6 +3198,7 @@ struct gl_extensions
GLboolean ARB_occlusion_query;
GLboolean ARB_occlusion_query2;
GLboolean ARB_point_sprite;
+   GLboolean ARB_sample_shading;
GLboolean ARB_seamless_cube_map;
GLboolean ARB_shader_bit_encoding;
GLboolean ARB_shader_stencil_export;
-- 
1.8.1.4



[Mesa-dev] [PATCH 3/8] mesa: Pass number of samples as a program state variable

2013-10-14 Thread Anuj Phogat
The number of samples is required in the fragment shader program by the new
GLSL builtin uniform "gl_NumSamples".

Signed-off-by: Anuj Phogat 
---
 src/mesa/program/prog_statevars.c | 11 +++
 src/mesa/program/prog_statevars.h |  2 ++
 2 files changed, 13 insertions(+)

diff --git a/src/mesa/program/prog_statevars.c 
b/src/mesa/program/prog_statevars.c
index 145c07c..8f798da 100644
--- a/src/mesa/program/prog_statevars.c
+++ b/src/mesa/program/prog_statevars.c
@@ -349,6 +349,9 @@ _mesa_fetch_state(struct gl_context *ctx, const 
gl_state_index state[],
  }
   }
   return;
+   case STATE_NUM_SAMPLES:
+  ((int *)value)[0] = ctx->DrawBuffer->Visual.samples;
+  return;
case STATE_DEPTH_RANGE:
   value[0] = ctx->Viewport.Near; /* near   */
   value[1] = ctx->Viewport.Far;  /* far*/
@@ -665,6 +668,9 @@ _mesa_program_state_flags(const gl_state_index 
state[STATE_LENGTH])
case STATE_PROGRAM_MATRIX:
   return _NEW_TRACK_MATRIX;
 
+   case STATE_NUM_SAMPLES:
+  return _NEW_MULTISAMPLE;
+
case STATE_DEPTH_RANGE:
   return _NEW_VIEWPORT;
 
@@ -852,6 +858,9 @@ append_token(char *dst, gl_state_index k)
case STATE_TEXENV_COLOR:
   append(dst, "texenv");
   break;
+   case STATE_NUM_SAMPLES:
+  append(dst, "num.samples");
+  break;
case STATE_DEPTH_RANGE:
   append(dst, "depth.range");
   break;
@@ -1027,6 +1036,8 @@ _mesa_program_state_string(const gl_state_index 
state[STATE_LENGTH])
   break;
case STATE_FOG_COLOR:
   break;
+   case STATE_NUM_SAMPLES:
+  break;
case STATE_DEPTH_RANGE:
   break;
case STATE_FRAGMENT_PROGRAM:
diff --git a/src/mesa/program/prog_statevars.h 
b/src/mesa/program/prog_statevars.h
index ec22b73..c3081c4 100644
--- a/src/mesa/program/prog_statevars.h
+++ b/src/mesa/program/prog_statevars.h
@@ -103,6 +103,8 @@ typedef enum gl_state_index_ {
 
STATE_TEXENV_COLOR,
 
+   STATE_NUM_SAMPLES,
+
STATE_DEPTH_RANGE,
 
STATE_VERTEX_PROGRAM,
-- 
1.8.1.4



[Mesa-dev] [PATCH 2/8] mesa: Add new functions and enums required by GL_ARB_sample_shading

2013-10-14 Thread Anuj Phogat
New functions added by GL_ARB_sample_shading:
glMinSampleShadingARB()

New enums:
GL_SAMPLE_SHADING_ARB
GL_MIN_SAMPLE_SHADING_VALUE_ARB
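
For orientation, the extension specification relates these two pieces of
state to the number of samples that must be shaded independently: the value
passed to glMinSampleShadingARB() is clamped to [0, 1], and at least
max(ceil(value * samples), 1) unique samples are shaded per fragment. The
sketch below reproduces that arithmetic in plain C; min_shaded_samples and
clamp01 are hypothetical helpers, not functions from this series.

```c
#include <assert.h>

/* Clamp to [0, 1], as the extension does for the minimum shading value. */
static float
clamp01(float v)
{
   return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}

/* Hypothetical helper: max(ceil(value * num_samples), 1), computed without
 * libm so the sketch has no extra link dependency.
 */
static unsigned
min_shaded_samples(float min_sample_shading_value, unsigned num_samples)
{
   assert(num_samples >= 1);

   float scaled = clamp01(min_sample_shading_value) * (float)num_samples;
   unsigned n = (unsigned)scaled;      /* truncates toward zero */
   if ((float)n < scaled)
      n++;                             /* round up to get the ceiling */
   return n > 1 ? n : 1;
}
```

A value of 1.0 therefore requests full per-sample shading, while 0.0 still
leaves one shaded sample per fragment.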

Signed-off-by: Anuj Phogat 
---
 src/mapi/glapi/gen/ARB_sample_shading.xml | 19 +++
 src/mapi/glapi/gen/GL3x.xml   |  5 +
 src/mapi/glapi/gen/gl_API.xml |  2 +-
 src/mesa/main/enable.c| 16 
 src/mesa/main/get.c   |  4 
 src/mesa/main/get_hash_params.py  |  3 +++
 src/mesa/main/mtypes.h|  2 ++
 src/mesa/main/multisample.c   | 13 +
 src/mesa/main/multisample.h   |  2 ++
 src/mesa/main/tests/dispatch_sanity.cpp   |  4 ++--
 10 files changed, 67 insertions(+), 3 deletions(-)
 create mode 100644 src/mapi/glapi/gen/ARB_sample_shading.xml

diff --git a/src/mapi/glapi/gen/ARB_sample_shading.xml 
b/src/mapi/glapi/gen/ARB_sample_shading.xml
new file mode 100644
index 000..a87a517
--- /dev/null
+++ b/src/mapi/glapi/gen/ARB_sample_shading.xml
@@ -0,0 +1,19 @@
+
+
+
+
+
+
+
+
+
+   
+   
+
+   
+  
+   
+
+
+
+
diff --git a/src/mapi/glapi/gen/GL3x.xml b/src/mapi/glapi/gen/GL3x.xml
index 5078f7b..4ec4749 100644
--- a/src/mapi/glapi/gen/GL3x.xml
+++ b/src/mapi/glapi/gen/GL3x.xml
@@ -630,6 +630,11 @@
 
   
 
+  
+
+  
+
+
 
 
 
diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
index 48fce36..8919852 100644
--- a/src/mapi/glapi/gen/gl_API.xml
+++ b/src/mapi/glapi/gen/gl_API.xml
@@ -8187,7 +8187,7 @@
 http://www.w3.org/2001/XInclude"/>
 http://www.w3.org/2001/XInclude"/>
 
-
+http://www.w3.org/2001/XInclude"/>
 http://www.w3.org/2001/XInclude"/>
 http://www.w3.org/2001/XInclude"/>
 
diff --git a/src/mesa/main/enable.c b/src/mesa/main/enable.c
index 5e2fd80..83c59ee 100644
--- a/src/mesa/main/enable.c
+++ b/src/mesa/main/enable.c
@@ -802,6 +802,15 @@ _mesa_set_enable(struct gl_context *ctx, GLenum cap, 
GLboolean state)
  ctx->Multisample.SampleCoverageInvert = state;
  break;
 
+  /*GL_ARB_sample_shading*/
+  case GL_SAMPLE_SHADING_ARB:
+ CHECK_EXTENSION(ARB_sample_shading, cap);
+ if (ctx->Multisample.SampleShading == state)
+return;
+ FLUSH_VERTICES(ctx, _NEW_MULTISAMPLE);
+ ctx->Multisample.SampleShading = state;
+ break;
+
   /* GL_IBM_rasterpos_clip */
   case GL_RASTER_POSITION_UNCLIPPED_IBM:
  if (ctx->API != API_OPENGL_COMPAT)
@@ -1594,6 +1603,13 @@ _mesa_IsEnabled( GLenum cap )
  CHECK_EXTENSION(ARB_texture_multisample);
  return ctx->Multisample.SampleMask;
 
+  /* ARB_sample_shading */
+  case GL_SAMPLE_SHADING_ARB:
+ if (!_mesa_is_desktop_gl(ctx))
+goto invalid_enum_error;
+ CHECK_EXTENSION(ARB_sample_shading);
+ return ctx->Multisample.SampleShading;
+
   default:
  goto invalid_enum_error;
}
diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
index 89b3bf0..c52133e 100644
--- a/src/mesa/main/get.c
+++ b/src/mesa/main/get.c
@@ -894,6 +894,10 @@ find_custom_value(struct gl_context *ctx, const struct 
value_desc *d, union valu
  _mesa_problem(ctx, "driver doesn't implement GetTimestamp");
   }
   break;
+   /* GL_ARB_sample_shading */
+   case GL_MIN_SAMPLE_SHADING_VALUE_ARB:
+ v->value_float = ctx->Multisample.MinSampleShadingValue;
+  break;
}
 }
 
diff --git a/src/mesa/main/get_hash_params.py b/src/mesa/main/get_hash_params.py
index 9c54af0..0d7effb 100644
--- a/src/mesa/main/get_hash_params.py
+++ b/src/mesa/main/get_hash_params.py
@@ -83,6 +83,9 @@ descriptor=[
   [ "SAMPLE_BUFFERS_ARB", "BUFFER_INT(Visual.sampleBuffers), 
extra_new_buffers" ],
   [ "SAMPLES_ARB", "BUFFER_INT(Visual.samples), extra_new_buffers" ],
 
+# GL_ARB_sample_shading
+  [ "MIN_SAMPLE_SHADING_VALUE_ARB", 
"CONTEXT_FLOAT(Multisample.MinSampleShadingValue), NO_EXTRA" ],
+
 # GL_SGIS_generate_mipmap
   [ "GENERATE_MIPMAP_HINT_SGIS", "CONTEXT_ENUM(Hint.GenerateMipmap), NO_EXTRA" 
],
 
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 053514d..5520e86 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -872,6 +872,8 @@ struct gl_multisample_attrib
GLboolean SampleCoverage;
GLfloat SampleCoverageValue;
GLboolean SampleCoverageInvert;
+   GLboolean SampleShading;
+   GLfloat MinSampleShadingValue;
 
/* ARB_texture_multisample / GL3.2 additions */
GLboolean SampleMask;
diff --git a/src/mesa/main/multisample.c b/src/mesa/main/multisample.c
index bd97c50..892525e 100644
--- a/src/mesa/main/multisample.c
+++ b/src/mesa/main/multisample.c
@@ -119,6 +119,19 @@ _mesa_SampleMaski(GLuint index, GLbitfield mask)
ctx->Multisample.SampleMaskValue = mask;
 }
 
+/**
+ * Called via glMinSampleShadingARB
+ */
+void GLAPIENTRY
+_mesa_MinSampleShading(GLclampf value)
+{
+   GET_CURRENT_CONTEXT(ctx);
+
+   FLUSH_VERTICES(ctx, 0);
+
+   ctx->Multisample.MinSampleShadi

[Mesa-dev] [PATCH 4/8] glsl: Add new builtins required by GL_ARB_sample_shading

2013-10-14 Thread Anuj Phogat
New builtins added by GL_ARB_sample_shading:
in vec2 gl_SamplePosition
in int gl_SampleID
in int gl_NumSamples
out int gl_SampleMask[]

Signed-off-by: Anuj Phogat 
---
 src/glsl/builtin_variables.cpp | 11 +++
 src/glsl/link_varyings.cpp |  2 ++
 src/mesa/main/mtypes.h |  7 ++-
 src/mesa/program/prog_print.c  |  5 +
 4 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/src/glsl/builtin_variables.cpp b/src/glsl/builtin_variables.cpp
index ae0a03f..c886840 100644
--- a/src/glsl/builtin_variables.cpp
+++ b/src/glsl/builtin_variables.cpp
@@ -30,6 +30,9 @@
 #include "program/prog_statevars.h"
 #include "program/prog_instruction.h"
 
+static struct gl_builtin_uniform_element gl_NumSamples_elements[] = {
+   {NULL, {STATE_NUM_SAMPLES, 0, 0}, SWIZZLE_XYZW}
+};
 
 static struct gl_builtin_uniform_element gl_DepthRange_elements[] = {
{"near", {STATE_DEPTH_RANGE, 0, 0}, SWIZZLE_},
@@ -236,6 +239,7 @@ static struct gl_builtin_uniform_element 
gl_NormalMatrix_elements[] = {
 #define STATEVAR(name) {#name, name ## _elements, Elements(name ## _elements)}
 
 static const struct gl_builtin_uniform_desc _mesa_builtin_uniform_desc[] = {
+   STATEVAR(gl_NumSamples),
STATEVAR(gl_DepthRange),
STATEVAR(gl_ClipPlane),
STATEVAR(gl_Point),
@@ -613,6 +617,7 @@ builtin_variable_generator::generate_constants()
 void
 builtin_variable_generator::generate_uniforms()
 {
+   add_uniform(int_t, "gl_NumSamples");
add_uniform(type("gl_DepthRangeParameters"), "gl_DepthRange");
add_uniform(array(vec4_t, VERT_ATTRIB_MAX), "gl_CurrentAttribVertMESA");
add_uniform(array(vec4_t, VARYING_SLOT_MAX), "gl_CurrentAttribFragMESA");
@@ -789,6 +794,12 @@ builtin_variable_generator::generate_fs_special_vars()
   if (state->AMD_shader_stencil_export_warn)
  var->warn_extension = "GL_AMD_shader_stencil_export";
}
+
+   if (state->ARB_sample_shading_enable) {
+  add_input(VARYING_SLOT_SAMPLE_ID, int_t, "gl_SampleID");
+  add_input(VARYING_SLOT_SAMPLE_POS, vec2_t, "gl_SamplePosition");
+  add_output(FRAG_RESULT_SAMPLE_MASK, array(int_t, 1), "gl_SampleMask");
+   }
 }
 
 
diff --git a/src/glsl/link_varyings.cpp b/src/glsl/link_varyings.cpp
index 4ba6d8a..3595a58 100644
--- a/src/glsl/link_varyings.cpp
+++ b/src/glsl/link_varyings.cpp
@@ -938,6 +938,8 @@ is_varying_var(GLenum shaderType, const ir_variable *var)
   case VARYING_SLOT_POS:
   case VARYING_SLOT_FACE:
   case VARYING_SLOT_PNTC:
+  case VARYING_SLOT_SAMPLE_ID:
+  case VARYING_SLOT_SAMPLE_POS:
  return false;
   default:
  return true;
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 5520e86..65ec829 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -236,6 +236,8 @@ typedef enum
VARYING_SLOT_LAYER, /* Appears as VS or GS output */
VARYING_SLOT_FACE, /* FS only */
VARYING_SLOT_PNTC, /* FS only */
+   VARYING_SLOT_SAMPLE_ID, /* FS only */
+   VARYING_SLOT_SAMPLE_POS, /* FS only */
VARYING_SLOT_VAR0, /* First generic varying slot */
VARYING_SLOT_MAX = VARYING_SLOT_VAR0 + MAX_VARYING
 } gl_varying_slot;
@@ -272,6 +274,8 @@ typedef enum
 #define VARYING_BIT_FACE BITFIELD64_BIT(VARYING_SLOT_FACE)
 #define VARYING_BIT_PNTC BITFIELD64_BIT(VARYING_SLOT_PNTC)
 #define VARYING_BIT_VAR(V) BITFIELD64_BIT(VARYING_SLOT_VAR0 + (V))
+#define VARYING_BIT_SAMPLE_ID BITFIELD64_BIT(VARYING_SLOT_SAMPLE_ID)
+#define VARYING_BIT_SAMPLE_POS BITFIELD64_BIT(VARYING_SLOT_SAMPLE_POS)
 /*@}*/
 
 
@@ -306,12 +310,13 @@ typedef enum
 * register is written.  No FRAG_RESULT_DATAn will be written.
 */
FRAG_RESULT_COLOR = 2,
+   FRAG_RESULT_SAMPLE_MASK = 3,
 
/* FRAG_RESULT_DATAn are the per-render-target (GLSL gl_FragData[n]
 * or ARB_fragment_program fragment.color[n]) color results.  If
 * any are written, FRAG_RESULT_COLOR will not be written.
 */
-   FRAG_RESULT_DATA0 = 3,
+   FRAG_RESULT_DATA0 = 4,
FRAG_RESULT_MAX = (FRAG_RESULT_DATA0 + MAX_DRAW_BUFFERS)
 } gl_frag_result;
 
diff --git a/src/mesa/program/prog_print.c b/src/mesa/program/prog_print.c
index cf85213..0c56ae6 100644
--- a/src/mesa/program/prog_print.c
+++ b/src/mesa/program/prog_print.c
@@ -150,6 +150,8 @@ arb_input_attrib_string(GLint index, GLenum progType)
   "fragment.(twenty)", /* VARYING_SLOT_LAYER */
   "fragment.(twenty-one)", /* VARYING_SLOT_FACE */
   "fragment.(twenty-two)", /* VARYING_SLOT_PNTC */
+  "fragment.(twenty-three)", /* VARYING_SLOT_SAMPLE_ID */
+  "fragment.(twenty-four)", /* VARYING_SLOT_SAMPLE_POS */
   "fragment.varying[0]",
   "fragment.varying[1]",
   "fragment.varying[2]",
@@ -274,6 +276,8 @@ arb_output_attrib_string(GLint index, GLenum progType)
   "result.(twenty)", /* VARYING_SLOT_LAYER */
   "result.(twenty-one)", /* VARYING_SLOT_FACE */
   "result.(twenty-two)", /* VARYING_SLOT_PNTC */
+  "result.(twenty-three)", /* VARYING_SLOT_SAMPLE_ID */
+  "result.(twenty-four

[Mesa-dev] [PATCH 5/8] i965: Implement FS backend for ARB_sample_shading

2013-10-14 Thread Anuj Phogat
Implement the FS backend for new builtins added by the extension:
in vec2 gl_SamplePosition
in int gl_SampleID
in int gl_NumSamples
out int gl_SampleMask[]

Signed-off-by: Anuj Phogat 
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 109 +++
 src/mesa/drivers/dri/i965/brw_fs.h   |   4 +
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp |  23 ++
 src/mesa/drivers/dri/i965/brw_wm.h   |   1 +
 4 files changed, 137 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index e5d6e4b..e4f7745 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1115,6 +1115,109 @@ fs_visitor::emit_frontfacing_interpolation(ir_variable 
*ir)
return reg;
 }
 
+void
+fs_visitor::compute_sample_position(fs_reg dst, fs_reg int_sample_pos)
+{
+   int num_samples = ctx->DrawBuffer->Visual.samples;
+   assert(num_samples >= 0 && num_samples <= 8);
+
+   /* From arb_sample_shading specification:
+* "When rendering to a non-multisample buffer, or if multisample
+*  rasterization is disabled, gl_SamplePosition will always be
+*  (0.5, 0.5)."
+*/
+   if (!ctx->Multisample.Enabled || num_samples == 0) {
+  emit(BRW_OPCODE_MOV, dst, fs_reg(0.5f));
+   }
+   else {
+  /* For num_samples = {4, 8} */
+  emit(BRW_OPCODE_MOV, dst, int_sample_pos);
+  emit(BRW_OPCODE_MUL, dst, dst, fs_reg(1 / 16.0f));
+   }
+}
+
+fs_reg *
+fs_visitor::emit_samplepos_interpolation(ir_variable *ir)
+{
+   assert(brw->gen >= 6);
+
+   this->current_annotation = "compute sample position";
+   fs_reg *reg = new(this->mem_ctx) fs_reg(this, ir->type);
+   fs_reg pos = *reg;
+   fs_reg int_sample_x = fs_reg(this, glsl_type::int_type);
+   fs_reg int_sample_y = fs_reg(this, glsl_type::int_type);
+
+   /* WM will be run in MSDISPMODE_PERSAMPLE. So, only SIMD8 mode will be
+* enabled. The X, Y sample positions come in as bytes in the thread payload.
+* Sample IDs and sample positions remain same for all four slots in a
+* subspan. So, read the positions using vstride=2, width=4, hstride=0.
+*/
+   emit(BRW_OPCODE_AND, int_sample_x,
+fs_reg(stride(retype(brw_vec1_grf(c->sample_pos_reg, 0),
+ BRW_REGISTER_TYPE_D), 2, 4, 0)),
+fs_reg(brw_imm_d(0xff)));
+
+   /* Compute gl_SamplePosition.x */
+   compute_sample_position(pos, int_sample_x);
+   pos.reg_offset++;
+
+   emit(BRW_OPCODE_SHR, int_sample_y,
+fs_reg(stride(retype(brw_vec1_grf(c->sample_pos_reg, 0),
+ BRW_REGISTER_TYPE_D), 2, 4, 0)),
+fs_reg(8));
+   emit(BRW_OPCODE_AND, int_sample_y, int_sample_y, fs_reg(brw_imm_d(0xff)));
+
+   /* Compute gl_SamplePosition.y */
+   compute_sample_position(pos, int_sample_y);
+   return reg;
+}
+
+fs_reg *
+fs_visitor::emit_sampleid_interpolation(ir_variable *ir)
+{
+   assert(brw->gen >= 6);
+   bool multisampled_fbo = ctx->DrawBuffer->Visual.samples > 1;
+
+   this->current_annotation = "compute sample id";
+   fs_reg *reg = new(this->mem_ctx) fs_reg(this, ir->type);
+
+   if (multisampled_fbo && ctx->Multisample.Enabled) {
+  fs_reg t1 = fs_reg(this, glsl_type::int_type);
+  fs_reg t2 = fs_reg(this, glsl_type::int_type);
+  t2.type = BRW_REGISTER_TYPE_UW;
+
+  /* The WM will be run in MSDISPMODE_PERSAMPLE with num_samples = 8.
+   * Therefore, subspan 0 will represent sample N (where N is 0, 2, 4
+   * or 6), subspan 1 will represent sample 1, 3, 5 or 7.  We can find
+   * the value of N by looking at R0.0 bits 7:6 ("Starting Sample Pair
+   * Index (SSPI)") and multiplying by two (since samples are always
+   * delivered in pairs). That is, we compute 2*((R0.0 & 0xc0) >> 6)
+   * == (R0.0 & 0xc0) >> 5.
+   *
+   * Then we need to add N to the sequence (0, 0, 0, 0, 1, 1, 1, 1),
+   * which we compute by populating a temporary variable with the
+   * sequence (0, 1), and then reading from it using vstride=1,
+   * width=4, hstride=0.
+   * Same holds true for num_samples = 4.
+   */
+  emit(BRW_OPCODE_AND, t1,
+   fs_reg(retype(brw_vec1_grf(0, 0), BRW_REGISTER_TYPE_D)),
+   fs_reg(brw_imm_d(0xc0)));
+  emit(BRW_OPCODE_SHR, t1, t1, fs_reg(5));
+  emit(BRW_OPCODE_MOV, t2, fs_reg(brw_imm_v(0x)));
+  emit(BRW_OPCODE_ADD, *reg, t1, t2);
+   }
+   else {
+  /* As per GL_ARB_sample_shading specification:
+   * "When rendering to a non-multisample buffer, or if multisample
+   *  rasterization is disabled, gl_SampleID will always be zero."
+   */
+  emit(BRW_OPCODE_MOV, *reg, fs_reg(0));
+   }
+
+   return reg;
+}
+
 fs_reg
 fs_visitor::fix_math_operand(fs_reg src)
 {
@@ -2966,7 +3069,13 @@ fs_visitor::setup_payload_gen6()
  c->nr_payload_regs++;
   }
}
+
/* R31: MSAA position offsets. */
+   if (fp->Base.InputsRead & VARYING_BIT_SAMPLE_POS) {
+  c->sample_pos_reg = c->nr_pa
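
For anyone checking the arithmetic in the comments of this patch, here is a
minimal CPU-side sketch in plain C. It assumes the layout the comments
describe: X in bits 7:0 and Y in bits 15:8 of the payload word, both in
1/16-pixel units, and the Starting Sample Pair Index in R0.0 bits 7:6. The
helper names (decode_sample_pos, sample_id_for_slot) are made up for
illustration and are not part of the driver; only the multisampled path is
modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a decoded gl_SamplePosition. */
struct sample_pos { float x, y; };

/* Decode one packed sample-position word, assuming X lives in bits 7:0
 * and Y in bits 15:8, both in units of 1/16 pixel. */
static struct sample_pos
decode_sample_pos(uint32_t packed)
{
   struct sample_pos p;
   p.x = (float)(packed & 0xff) * (1 / 16.0f);
   p.y = (float)((packed >> 8) & 0xff) * (1 / 16.0f);
   return p;
}

/* Sample ID seen by SIMD8 channel `slot` (0..7), assuming R0.0 bits 7:6
 * hold the Starting Sample Pair Index. Channels 0-3 (subspan 0) get N,
 * channels 4-7 (subspan 1) get N + 1. */
static unsigned
sample_id_for_slot(uint32_t r0_0, unsigned slot)
{
   assert(slot < 8);
   unsigned n = (r0_0 & 0xc0) >> 5;   /* same as 2 * ((r0_0 & 0xc0) >> 6) */
   return n + slot / 4;
}
```

With SSPI = 1 (R0.0 = 0x40) this gives N = 2, so the eight channels see
sample IDs 2, 2, 2, 2, 3, 3, 3, 3, which matches the sequence the comment
describes.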

[Mesa-dev] [PATCH 6/8] i965/gen6: Enable the features required for GL_ARB_sample_shading

2013-10-14 Thread Anuj Phogat
- Enable GEN6_WM_MSDISPMODE_PERSAMPLE, GEN6_WM_POSOFFSET_SAMPLE,
  GEN6_WM_OMASK_TO_RENDER_TARGET as per extension's specification.
- Don't enable GEN6_WM_16_DISPATCH_ENABLE when GEN6_WM_MSDISPMODE_PERSAMPLE
  is enabled. Refer to SNB PRM Vol. 2, Part 1, Page 279 for details.

Signed-off-by: Anuj Phogat 
---
 src/mesa/drivers/dri/i965/gen6_wm_state.c | 67 +--
 1 file changed, 64 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_wm_state.c 
b/src/mesa/drivers/dri/i965/gen6_wm_state.c
index c96a107..4bc25d6 100644
--- a/src/mesa/drivers/dri/i965/gen6_wm_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_wm_state.c
@@ -103,6 +103,7 @@ upload_wm_state(struct brw_context *brw)
 
/* _NEW_BUFFERS */
bool multisampled_fbo = ctx->DrawBuffer->Visual.samples > 1;
+   bool msdispmode_persample = false;
 
 /* CACHE_NEW_WM_PROG */
if (brw->wm.prog_data->nr_params == 0) {
@@ -156,7 +157,17 @@ upload_wm_state(struct brw_context *brw)
 
/* CACHE_NEW_WM_PROG */
dw5 |= GEN6_WM_8_DISPATCH_ENABLE;
-   if (brw->wm.prog_data->prog_offset_16)
+   msdispmode_persample =
+  ctx->Multisample.Enabled &&
+  (ctx->Multisample.SampleShading ||
+   brw->fragment_program->Base.InputsRead & VARYING_BIT_SAMPLE_ID ||
+   brw->fragment_program->Base.InputsRead & VARYING_BIT_SAMPLE_POS);
+
+   /* In case of non-1x (i.e. 4x, 8x) multisampling with MSDISPMODE_PERSAMPLE,
+* only one of SIMD8 and SIMD16 should be enabled.
+*/
+   if (brw->wm.prog_data->prog_offset_16 &&
+   !(multisampled_fbo && msdispmode_persample))
   dw5 |= GEN6_WM_16_DISPATCH_ENABLE;
 
/* CACHE_NEW_WM_PROG | _NEW_COLOR */
@@ -185,7 +196,10 @@ upload_wm_state(struct brw_context *brw)
 
/* _NEW_COLOR, _NEW_MULTISAMPLE */
if (fp->program.UsesKill || ctx->Color.AlphaEnabled ||
-   ctx->Multisample.SampleAlphaToCoverage)
+   ctx->Multisample.SampleAlphaToCoverage ||
+   (ctx->Multisample.SampleShading &&
+(fp->program.Base.OutputsWritten &
+ BITFIELD64_BIT(FRAG_RESULT_SAMPLE_MASK
   dw5 |= GEN6_WM_KILL_ENABLE;
 
if (brw_color_buffer_write_enabled(brw) ||
@@ -193,6 +207,19 @@ upload_wm_state(struct brw_context *brw)
   dw5 |= GEN6_WM_DISPATCH_ENABLE;
}
 
+   /* From the SNB PRM, volume 2 part 1, page 278:
+* "This bit is inserted in the PS payload header and made available to
+* the DataPort (either via the message header or via header bypass) to
+* indicate that oMask data (one or two phases) is included in Render
+* Target Write messages. If present, the oMask data is used to mask off
+* samples."
+* TODO: [DevSNB:A0] This bit must be disabled in A Step.
+*/
+   if (ctx->Extensions.ARB_sample_shading &&
+   (brw->fragment_program->Base.OutputsWritten &
+BITFIELD64_BIT(FRAG_RESULT_SAMPLE_MASK)))
+  dw5 |= GEN6_WM_OMASK_TO_RENDER_TARGET;
+
/* CACHE_NEW_WM_PROG */
dw6 |= brw->wm.prog_data->num_varying_inputs <<
   GEN6_WM_NUM_SF_OUTPUTS_SHIFT;
@@ -202,12 +229,46 @@ upload_wm_state(struct brw_context *brw)
  dw6 |= GEN6_WM_MSRAST_ON_PATTERN;
   else
  dw6 |= GEN6_WM_MSRAST_OFF_PIXEL;
-  dw6 |= GEN6_WM_MSDISPMODE_PERPIXEL;
+
+  /* From arb_sample_shading specification:
+   * "If MULTISAMPLE or SAMPLE_SHADING_ARB is disabled, sample shading
+   *  has no effect."
+   *
+   * "Using gl_SampleID in a fragment shader causes the entire shader
+   *  to be evaluated per-sample."
+   * "Using gl_SamplePosition in a fragment shader causes the entire
+   *  shader to be evaluated per-sample."
+   *
+   *  I interpret the above four lines as: enable sample shading
+   *  if the fragment shader uses gl_SampleID or gl_SamplePosition.
+   */
+  if (msdispmode_persample)
+ dw6 |= GEN6_WM_MSDISPMODE_PERSAMPLE;
+  else
+ dw6 |= GEN6_WM_MSDISPMODE_PERPIXEL;
} else {
   dw6 |= GEN6_WM_MSRAST_OFF_PIXEL;
   dw6 |= GEN6_WM_MSDISPMODE_PERSAMPLE;
}
 
+   /* _NEW_MULTISAMPLE */
+   /* From the SNB PRM, volume 2 part 1, page 281:
+* "If the PS kernel does not need the Position XY Offsets
+* to compute a Position XY value, then this field should be
+* programmed to POSOFFSET_NONE."
+*
+* "SW Recommendation: If the PS kernel needs the Position Offsets
+* to compute a Position XY value, this field should match Position
+* ZW Interpolation Mode to ensure a consistent position.xyzw
+* computation."
+* We only require XY sample offsets. So, this recommendation doesn't
+* look useful at the moment. We might need this in future.
+*/
+   if (brw->fragment_program->Base.InputsRead & VARYING_BIT_SAMPLE_POS)
+  dw6 |= GEN6_WM_POSOFFSET_SAMPLE;
+   else
+  dw6 |= GEN6_WM_POSOFFSET_NONE;
+
BEGIN_BATCH(9);
OUT_BATCH(_3DSTATE_WM << 16 | (9 - 2));
OUT_BATCH(brw->wm.base.prog_offset);
-- 
1.8.1.4
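
The dispatch decision this patch makes can be summarised as a pair of
predicates. The following is only a sketch in plain C: struct wm_inputs and
both function names are hypothetical, and each field simply stands in for
the context or program state named in its comment.

```c
#include <stdbool.h>

/* Hypothetical flattened view of the state upload_wm_state() consults. */
struct wm_inputs {
   bool multisample_enabled;   /* ctx->Multisample.Enabled */
   bool sample_shading;        /* ctx->Multisample.SampleShading */
   bool reads_sample_id;       /* InputsRead & VARYING_BIT_SAMPLE_ID */
   bool reads_sample_pos;      /* InputsRead & VARYING_BIT_SAMPLE_POS */
   bool multisampled_fbo;      /* DrawBuffer->Visual.samples > 1 */
   bool has_simd16_program;    /* prog_data->prog_offset_16 != 0 */
};

/* Per-sample dispatch is wanted only while multisampling is enabled and
 * either sample shading is on or the shader reads a per-sample input. */
static bool
wants_persample_dispatch(const struct wm_inputs *in)
{
   return in->multisample_enabled &&
          (in->sample_shading || in->reads_sample_id || in->reads_sample_pos);
}

/* SIMD16 is kept off when a multisampled FBO is shaded per sample, so
 * that only the SIMD8 kernel is enabled in that case. */
static bool
simd16_dispatch_allowed(const struct wm_inputs *in)
{
   return in->has_simd16_program &&
          !(in->multisampled_fbo && wants_persample_dispatch(in));
}
```

Note that with multisampling disabled, SAMPLE_SHADING_ARB alone does not
switch the dispatch mode, which follows the "has no effect" wording quoted
from the spec in the patch.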


[Mesa-dev] [PATCH 7/8] i965/gen7: Enable the features required for GL_ARB_sample_shading

2013-10-14 Thread Anuj Phogat
- Enable GEN7_WM_MSDISPMODE_PERSAMPLE, GEN7_WM_POSOFFSET_SAMPLE,
  GEN7_WM_OMASK_TO_RENDER_TARGET as per extension's specification.
- Don't enable GEN7_WM_16_DISPATCH_ENABLE when GEN7_WM_MSDISPMODE_PERSAMPLE
  is enabled. Refer to IVB PRM Vol. 2, Part 1, Page 288 for details.

Signed-off-by: Anuj Phogat 
---
 src/mesa/drivers/dri/i965/gen7_wm_state.c | 70 +--
 1 file changed, 67 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen7_wm_state.c 
b/src/mesa/drivers/dri/i965/gen7_wm_state.c
index a2046c3..0267e0e 100644
--- a/src/mesa/drivers/dri/i965/gen7_wm_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_wm_state.c
@@ -82,9 +82,15 @@ upload_wm_state(struct brw_context *brw)
   GEN7_WM_BARYCENTRIC_INTERPOLATION_MODE_SHIFT;
 
/* _NEW_COLOR, _NEW_MULTISAMPLE */
+   /* Enable if the pixel shader kernel generates and outputs oMask.
+*/
if (fp->program.UsesKill || ctx->Color.AlphaEnabled ||
-   ctx->Multisample.SampleAlphaToCoverage)
+   ctx->Multisample.SampleAlphaToCoverage ||
+   (ctx->Multisample.SampleShading &&
+(fp->program.Base.OutputsWritten &
+ BITFIELD64_BIT(FRAG_RESULT_SAMPLE_MASK {
   dw1 |= GEN7_WM_KILL_ENABLE;
+   }
 
/* _NEW_BUFFERS */
if (brw_color_buffer_write_enabled(brw) || writes_depth ||
@@ -97,7 +103,25 @@ upload_wm_state(struct brw_context *brw)
  dw1 |= GEN7_WM_MSRAST_ON_PATTERN;
   else
  dw1 |= GEN7_WM_MSRAST_OFF_PIXEL;
-  dw2 |= GEN7_WM_MSDISPMODE_PERPIXEL;
+  /* From arb_sample_shading specification:
+   * "If MULTISAMPLE or SAMPLE_SHADING_ARB is disabled, sample shading
+   *  has no effect."
+   *
+   * "Using gl_SampleID in a fragment shader causes the entire shader
+   *  to be evaluated per-sample."
+   * "Using gl_SamplePosition in a fragment shader causes the entire
+   *  shader to be evaluated per-sample."
+   *
+   *  I interpret the above four lines as: enable sample shading
+   *  if the fragment shader uses gl_SampleID or gl_SamplePosition.
+   */
+  if (ctx->Multisample.Enabled &&
+  (ctx->Multisample.SampleShading ||
+   brw->fragment_program->Base.InputsRead & VARYING_BIT_SAMPLE_ID ||
+   brw->fragment_program->Base.InputsRead & VARYING_BIT_SAMPLE_POS))
+ dw2 |= GEN7_WM_MSDISPMODE_PERSAMPLE;
+  else
+ dw2 |= GEN7_WM_MSDISPMODE_PERPIXEL;
} else {
   dw1 |= GEN7_WM_MSRAST_OFF_PIXEL;
   dw2 |= GEN7_WM_MSDISPMODE_PERSAMPLE;
@@ -129,6 +153,8 @@ upload_ps_state(struct brw_context *brw)
uint32_t dw2, dw4, dw5;
const int max_threads_shift = brw->is_haswell ?
   HSW_PS_MAX_THREADS_SHIFT : IVB_PS_MAX_THREADS_SHIFT;
+   bool msdispmode_persample = false;
+   bool multisampled_fbo = ctx->DrawBuffer->Visual.samples > 1;
 
/* BRW_NEW_PS_BINDING_TABLE */
BEGIN_BATCH(2);
@@ -169,6 +195,34 @@ upload_ps_state(struct brw_context *brw)
if (brw->wm.prog_data->nr_params > 0)
   dw4 |= GEN7_PS_PUSH_CONSTANT_ENABLE;
 
+   /* From the IVB PRM, volume 2 part 1, page 287:
+* "This bit is inserted in the PS payload header and made available to
+* the DataPort (either via the message header or via header bypass) to
+* indicate that oMask data (one or two phases) is included in Render
+* Target Write messages. If present, the oMask data is used to mask off
+* samples."
+*/
+   if (ctx->Extensions.ARB_sample_shading &&
+   (brw->fragment_program->Base.OutputsWritten &
+BITFIELD64_BIT(FRAG_RESULT_SAMPLE_MASK)))
+  dw4 |= GEN7_PS_OMASK_TO_RENDER_TARGET;
+
+   /* From the IVB PRM, volume 2 part 1, page 287:
+* "If the PS kernel does not need the Position XY Offsets to
+* compute a Position Value, then this field should be programmed
+* to POSOFFSET_NONE."
+* "SW Recommendation: If the PS kernel needs the Position Offsets
+* to compute a Position XY value, this field should match Position
+* ZW Interpolation Mode to ensure a consistent position.xyzw
+* computation."
+* We only require XY sample offsets. So, this recommendation doesn't
+* look useful at the moment. We might need this in future.
+*/
+   if (brw->fragment_program->Base.InputsRead & VARYING_BIT_SAMPLE_POS)
+  dw4 |= GEN7_PS_POSOFFSET_SAMPLE;
+   else
+  dw4 |= GEN7_PS_POSOFFSET_NONE;
+
/* CACHE_NEW_WM_PROG | _NEW_COLOR
 *
 * The hardware wedges if you have this bit set but don't turn on any dual
@@ -185,7 +239,17 @@ upload_ps_state(struct brw_context *brw)
   dw4 |= GEN7_PS_ATTRIBUTE_ENABLE;
 
dw4 |= GEN7_PS_8_DISPATCH_ENABLE;
-   if (brw->wm.prog_data->prog_offset_16)
+   msdispmode_persample =
+  ctx->Multisample.Enabled &&
+  (ctx->Multisample.SampleShading ||
+   brw->fragment_program->Base.InputsRead & VARYING_BIT_SAMPLE_ID ||
+   brw->fragment_program->Base.InputsRead & VARYING_BIT_SAMPLE_POS);
+
+   /* In case of non 1x (i.e 4x, 8x) multisampling with MDI

[Mesa-dev] [PATCH 8/8] i965: Enable ARB_sample_shading on intel hardware >= gen6

2013-10-14 Thread Anuj Phogat
Signed-off-by: Anuj Phogat 
---
 src/mesa/drivers/dri/i965/intel_extensions.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c 
b/src/mesa/drivers/dri/i965/intel_extensions.c
index 6f024b4..0d71617 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -148,6 +148,7 @@ intelInitExtensions(struct gl_context *ctx)
   ctx->Extensions.OES_depth_texture_cube_map = true;
   ctx->Extensions.ARB_shading_language_packing = true;
   ctx->Extensions.ARB_texture_multisample = true;
+  ctx->Extensions.ARB_sample_shading = true;
 
   /* Test if the kernel has the ioctl. */
   if (drm_intel_reg_read(brw->bufmgr, TIMESTAMP, &dummy) == 0)
-- 
1.8.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] Easy learning project for someone looking to get into Mesa development

2013-10-14 Thread Ian Romanick
Here's a project for anyone looking to get into Mesa development that
should be easy for anyone with decent C programming skills to tackle:
GL_ARB_texture_mirror_clamp_to_edge.

http://www.opengl.org/registry/specs/ARB/texture_mirror_clamp_to_edge.txt

This extension is effectively a subset of another extension
(GL_ATI_texture_mirror_once) that Mesa already supports.  Some hardware
with drivers in Mesa supports the ATI extension, and some hardware can
only support the ARB extension.

I think this should be as easy as:

1. Add a flag to gl_extensions (src/mesa/main/mtypes.h) for the new
extension.  Please keep the list alphabetized.

2. Add the extension string to extension_table (src/mesa/main/extensions.c).

3. Update any place that checks the old extension flag to possibly also
check the new extension flag.  Some places will, some won't.

4. Update the piglit tests that exercise the ATI extension to also
exercise the ARB extension.

5. Update docs/GL3.txt. :)

I think all of the Gallium drivers that can support the ARB extension
already support the ATI extension, so there shouldn't be any extra work
there.  I'm not 100% positive on that, though.
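
As a rough illustration of step 3, here is a sketch in plain C of what
"check either flag" could look like for a wrap-mode validity test. It
assumes, as described above, that the ARB extension covers only the
mirror-clamp-to-edge mode while the ATI extension covers both modes.
Everything here is a stand-in: struct ext_flags mimics two fields of
gl_extensions, the ARB flag name is a guess at what the new field would be
called, and the enum values are local placeholders rather than the real GL
tokens.

```c
#include <stdbool.h>

/* Local placeholder wrap modes (not the real GL enum values). */
enum wrap_mode {
   WRAP_MIRROR_CLAMP,           /* ATI extension only */
   WRAP_MIRROR_CLAMP_TO_EDGE    /* ATI or ARB extension */
};

/* Hypothetical subset of struct gl_extensions. */
struct ext_flags {
   bool ARB_texture_mirror_clamp_to_edge;   /* new flag, name assumed */
   bool ATI_texture_mirror_once;
};

/* Return true if `wrap` is legal given the enabled extensions. */
static bool
wrap_mode_is_legal(const struct ext_flags *e, enum wrap_mode wrap)
{
   switch (wrap) {
   case WRAP_MIRROR_CLAMP:
      return e->ATI_texture_mirror_once;
   case WRAP_MIRROR_CLAMP_TO_EDGE:
      return e->ATI_texture_mirror_once ||
             e->ARB_texture_mirror_clamp_to_edge;
   default:
      return false;   /* other wrap modes are outside this sketch */
   }
}
```

A driver that sets only the ARB flag would then accept the clamp-to-edge
mode and keep rejecting the plain mirror-clamp mode.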


Re: [Mesa-dev] [PATCH 2/2] i965/fs: Fix type of header register for sampler messages

2013-10-14 Thread Eric Anholt
Chris Forbes  writes:

> Previously this was float, which caused the copy from g0 to mangle
> everything.

If we face a choice of types for a raw mov, we should choose float --
it's higher performance on IVB (they can get dispatched twice as fast,
when there's a hyperthread waiting to dispatch a float op).

Patch 1/2 is

Reviewed-by: Eric Anholt 




Re: [Mesa-dev] RFC: Haswell resource streamer/hw-generated binding tables (v2)

2013-10-14 Thread Eric Anholt
Abdiel Janulgue  writes:

> On Friday, October 11, 2013 11:39:53 AM Eric Anholt wrote:
>> As a general rule, we don't land code whose purpose is performance
>> improvement if it doesn't actually improve performance.  If more work is
>> needed to make it actually improve performance, then we wait until then.
>> 
>> As I understand it, the thing that you think will make this eventually
>> actually improve performance is state flagging that indicates which
>> individual surfaces need updating.  Since that should improve
>> performance even in the non-resource-streamer case, it can be pursued
>> independently.
>
>
> One optimization idea that I had in mind a few months ago was to find a way
> to reduce emission of surface state objects. Currently, we rebuild surface
> states every time we generate binding tables. The idea is to basically
> relocate the surface state indirect state objects to a separate buffer
> object from the command batch. Using the resource streamer, we can then
> publish the deltas when indices referring to them need to be changed.
>
> So whenever a surface needs to be used, instead of rebuilding the whole
> binding table structure the driver can essentially say on a per-slot basis
> "hey, a surface got activated but it was previously bound to index 10, let's
> rebind it to index 12".
>
> This potentially reduces the CPU overhead of generating and uploading binding
> tables. I did a previous experiment and found that it reduced generation
> of surface states by as much as 99% with this approach. What do you think?

This has the downside that new batches implicitly reference the surfaces
that were referenced by old batches.  Imagine a video player that's
uploading a new frame to a new BO every time -- until the surface cache
BO wraps, the old BO stays referenced and app memory usage just goes up
and up.  The workaround is to have a hash table that you consult at BO
free time to tell you which relocations to rip out of the surface cache.

There's a bunch of overhead you get into when doing this, which is why
we moved away from surface state caching originally.
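
To make the bookkeeping concrete, here is a toy sketch in plain C of the
"rip out at BO free time" step. It is not driver code: the struct, the
function names and the fixed slot count are all invented for illustration,
and it uses a linear scan over a small array where a real implementation
would use the hash table mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 8

/* Hypothetical surface-state cache: each slot remembers which BO
 * (identified by a nonzero handle) its relocation points at. */
struct surface_cache {
   uint32_t bo_handle[CACHE_SLOTS];   /* 0 means "slot unused" */
};

/* Record that `slot` now holds surface state referencing `bo`. */
static void
surface_cache_bind(struct surface_cache *c, size_t slot, uint32_t bo)
{
   c->bo_handle[slot] = bo;
}

/* Call when `bo` is freed: clear every slot that still references it
 * and return how many slots were cleared. */
static unsigned
surface_cache_forget_bo(struct surface_cache *c, uint32_t bo)
{
   unsigned dropped = 0;

   if (bo == 0)
      return 0;   /* 0 marks unused slots, never a real BO */

   for (size_t i = 0; i < CACHE_SLOTS; i++) {
      if (c->bo_handle[i] == bo) {
         c->bo_handle[i] = 0;
         dropped++;
      }
   }
   return dropped;
}
```

Even in this toy form, every BO free pays for a lookup, which is the kind of
overhead referred to above.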




Re: [Mesa-dev] [PATCH] i965: Fast texture upload now supports all levels

2013-10-14 Thread Eric Anholt
Courtney Goeltzenleuchter  writes:

> Does anyone know of a test that measures frame 0 time? Or texture upload
> speed?
>
> For Smokin' Guns, I tried measuring the overall time, but an improved frame
> 0 time has difficulty standing out of a 2607 frame test.
>
> I may have to create something. Suggestions for an appropriate framework?

Run an apitrace replay on a trace trimmed to frame ?  Sure, apitrace isn't real-world benchmarking
for fps, but it should get at the "how much did we cut off of load
time", assuming that texture load time isn't swamped by apitrace
decompression.




Re: [Mesa-dev] [PATCH] i965: Fast texture upload now supports all levels

2013-10-14 Thread Eric Anholt
Courtney Goeltzenleuchter  writes:

> Does anyone know of a test that measures frame 0 time? Or texture upload
> speed?
>
> For Smokin' Guns, I tried measuring the overall time, but an improved frame
> 0 time has difficulty standing out of a 2607 frame test.
>
> I may have to create something. Suggestions for an appropriate framework?

Oh, and what some of us are using to actually do the stats:

http://anholt.net/compare-perf/




Re: [Mesa-dev] [PATCH 1/8] mesa: Add infrastructure for GL_ARB_sample_shading

2013-10-14 Thread Matt Turner
On Mon, Oct 14, 2013 at 10:12 AM, Anuj Phogat  wrote:
> diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
> index 2507fdf..9e908c0 100644
> --- a/src/mesa/main/extensions.c
> +++ b/src/mesa/main/extensions.c
> @@ -200,6 +200,7 @@ static const struct extension extension_table[] = {
> { "GL_EXT_polygon_offset",  o(dummy_true),
>   GLL,1995 },
> { "GL_EXT_provoking_vertex",o(EXT_provoking_vertex),  
>   GL, 2009 },
> { "GL_EXT_rescale_normal",  o(dummy_true),
>   GLL,1997 },
> +   { "GL_ARB_sample_shading",  o(ARB_sample_shading),
>   GL, 2009 },

This should go in the ARB extension list above rather than with the
EXT extensions.


Re: [Mesa-dev] [PATCH 2/8] mesa: Add new functions and enums required by GL_ARB_sample_shading

2013-10-14 Thread Matt Turner
On Mon, Oct 14, 2013 at 10:12 AM, Anuj Phogat  wrote:
> diff --git a/src/mapi/glapi/gen/GL3x.xml b/src/mapi/glapi/gen/GL3x.xml
> index 5078f7b..4ec4749 100644
> --- a/src/mapi/glapi/gen/GL3x.xml
> +++ b/src/mapi/glapi/gen/GL3x.xml
> @@ -630,6 +630,11 @@
>  
>
>
> +  
> +
> +  
> +
> +
>  
>
>  

Does this need to be inside a new  since
MinSampleShading is new in GL 4.0?

> diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
> index 48fce36..8919852 100644
> --- a/src/mapi/glapi/gen/gl_API.xml
> +++ b/src/mapi/glapi/gen/gl_API.xml
> @@ -8187,7 +8187,7 @@
>   xmlns:xi="http://www.w3.org/2001/XInclude"/>
>   xmlns:xi="http://www.w3.org/2001/XInclude"/>
>
> -
> + xmlns:xi="http://www.w3.org/2001/XInclude"/>
>   xmlns:xi="http://www.w3.org/2001/XInclude"/>
>   xmlns:xi="http://www.w3.org/2001/XInclude"/>
>  
> diff --git a/src/mesa/main/enable.c b/src/mesa/main/enable.c
> index 5e2fd80..83c59ee 100644
> --- a/src/mesa/main/enable.c
> +++ b/src/mesa/main/enable.c
> @@ -802,6 +802,15 @@ _mesa_set_enable(struct gl_context *ctx, GLenum cap, 
> GLboolean state)
>   ctx->Multisample.SampleCoverageInvert = state;
>   break;
>
> +  /*GL_ARB_sample_shading*/
> +  case GL_SAMPLE_SHADING_ARB:
> + CHECK_EXTENSION(ARB_sample_shading, cap);
> + if (ctx->Multisample.SampleShading == state)
> +return;
> + FLUSH_VERTICES(ctx, _NEW_MULTISAMPLE);
> + ctx->Multisample.SampleShading = state;
> + break;
> +
>/* GL_IBM_rasterpos_clip */
>case GL_RASTER_POSITION_UNCLIPPED_IBM:
>   if (ctx->API != API_OPENGL_COMPAT)
> @@ -1594,6 +1603,13 @@ _mesa_IsEnabled( GLenum cap )
>   CHECK_EXTENSION(ARB_texture_multisample);
>   return ctx->Multisample.SampleMask;
>
> +  /* ARB_sample_shading */
> +  case GL_SAMPLE_SHADING_ARB:
> + if (!_mesa_is_desktop_gl(ctx))
> +goto invalid_enum_error;
> + CHECK_EXTENSION(ARB_sample_shading);
> + return ctx->Multisample.SampleShading;
> +
>default:
>   goto invalid_enum_error;
> }
> diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
> index 89b3bf0..c52133e 100644
> --- a/src/mesa/main/get.c
> +++ b/src/mesa/main/get.c
> @@ -894,6 +894,10 @@ find_custom_value(struct gl_context *ctx, const struct 
> value_desc *d, union valu
>   _mesa_problem(ctx, "driver doesn't implement GetTimestamp");
>}
>break;
> +   /* GL_ARB_sample_shading */
> +   case GL_MIN_SAMPLE_SHADING_VALUE_ARB:
> + v->value_float = ctx->Multisample.MinSampleShadingValue;
> +  break;
> }
>  }
>
> diff --git a/src/mesa/main/get_hash_params.py 
> b/src/mesa/main/get_hash_params.py
> index 9c54af0..0d7effb 100644
> --- a/src/mesa/main/get_hash_params.py
> +++ b/src/mesa/main/get_hash_params.py
> @@ -83,6 +83,9 @@ descriptor=[
>[ "SAMPLE_BUFFERS_ARB", "BUFFER_INT(Visual.sampleBuffers), 
> extra_new_buffers" ],
>[ "SAMPLES_ARB", "BUFFER_INT(Visual.samples), extra_new_buffers" ],
>
> +# GL_ARB_sample_shading
> +  [ "MIN_SAMPLE_SHADING_VALUE_ARB", 
> "CONTEXT_FLOAT(Multisample.MinSampleShadingValue), NO_EXTRA" ],
> +
>  # GL_SGIS_generate_mipmap
>[ "GENERATE_MIPMAP_HINT_SGIS", "CONTEXT_ENUM(Hint.GenerateMipmap), 
> NO_EXTRA" ],
>
> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
> index 053514d..5520e86 100644
> --- a/src/mesa/main/mtypes.h
> +++ b/src/mesa/main/mtypes.h
> @@ -872,6 +872,8 @@ struct gl_multisample_attrib
> GLboolean SampleCoverage;
> GLfloat SampleCoverageValue;
> GLboolean SampleCoverageInvert;
> +   GLboolean SampleShading;
> +   GLfloat MinSampleShadingValue;
>
> /* ARB_texture_multisample / GL3.2 additions */
> GLboolean SampleMask;
> diff --git a/src/mesa/main/multisample.c b/src/mesa/main/multisample.c
> index bd97c50..892525e 100644
> --- a/src/mesa/main/multisample.c
> +++ b/src/mesa/main/multisample.c
> @@ -119,6 +119,19 @@ _mesa_SampleMaski(GLuint index, GLbitfield mask)
> ctx->Multisample.SampleMaskValue = mask;
>  }
>
> +/**
> + * Called via glMinSampleShadingARB
> + */
> +void GLAPIENTRY
> +_mesa_MinSampleShading(GLclampf value)
> +{
> +   GET_CURRENT_CONTEXT(ctx);
> +
> +   FLUSH_VERTICES(ctx, 0);
> +
> +   ctx->Multisample.MinSampleShadingValue = (GLfloat) CLAMP(value, 0.0, 1.0);
> +   ctx->NewState |= _NEW_MULTISAMPLE;
> +}
>
>  /**
>   * Helper for checking a requested sample count against the limit
> diff --git a/src/mesa/main/multisample.h b/src/mesa/main/multisample.h
> index 66848d2..7441d3e 100644
> --- a/src/mesa/main/multisample.h
> +++ b/src/mesa/main/multisample.h
> @@ -44,6 +44,8 @@ _mesa_GetMultisamplefv(GLenum pname, GLuint index, GLfloat* 
> val);
>  extern void GLAPIENTRY
>  _mesa_SampleMaski(GLuint index, GLbitfield mask);
>
> +extern void GLAPIENTRY
> +_mesa_MinSampleShading(GLclampf value);
>
>  extern GLenum
>  _mesa_check_sample_count(struct gl_context *ctx, GLenum target,
> diff --git a/src/mesa/m

Re: [Mesa-dev] [PATCH 2/2] i965/fs: Fix type of header register for sampler messages

2013-10-14 Thread Chris Forbes
OK, so this needs to lose the retype() on both sides.

On Tue, Oct 15, 2013 at 6:44 AM, Eric Anholt  wrote:
> Chris Forbes  writes:
>
>> Previously this was float, which caused the copy from g0 to mangle
>> everything.
>
> If we face a choice of types for a raw mov, we should choose float --
> it's higher performance on IVB (they can get dispatched twice as fast,
> when there's a hyperthread waiting to dispatch a float op)
>
> Patch 1/2 is
>
> Reviewed-by: Eric Anholt 


[Mesa-dev] [PATCH] r600g/sb: fix issue with DCE between GVN and GCM (v2)

2013-10-14 Thread Vadim Girlin
We can't perform DCE using the liveness pass between GVN and GCM
because it relies on the correct schedule, but GVN doesn't care about
preserving correctness - it's rescheduled later by GCM.

This patch makes dce_cleanup pass perform simple DCE
between GVN and GCM instead of relying on liveness pass.
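
For readers unfamiliar with the idea, here is a minimal sketch of use-count-based DCE that never consults instruction order. It is a toy model, not the r600g/sb data structures: `Inst` and `simple_dce` are hypothetical names, and the real pass works on the values' use lists built by def_use.

```cpp
#include <cassert>
#include <vector>

// Toy instruction: writes one value, reads several, and may have side effects.
struct Inst {
    int dst;                // value id written, or -1 for none
    std::vector<int> srcs;  // value ids read
    bool has_side_effect;   // e.g. an export or a store; never removed
    bool dead;              // set once the instruction has been eliminated
};

// Removes instructions whose result is never read. Only use counts are
// consulted, so the outcome does not depend on the order of `prog`.
// Returns the number of instructions eliminated.
int simple_dce(std::vector<Inst>& prog, int num_values) {
    std::vector<int> uses(num_values, 0);
    for (const Inst& i : prog)
        for (int s : i.srcs)
            ++uses[s];

    int removed = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        for (Inst& i : prog) {
            if (i.dead || i.has_side_effect || i.dst < 0 || uses[i.dst] > 0)
                continue;
            i.dead = true;
            ++removed;
            changed = true;
            // Killing this instruction releases its operands, which may
            // leave their producers unused as well.
            for (int s : i.srcs)
                --uses[s];
        }
    }
    return removed;
}
```

Because nothing here depends on where an instruction sits in the list, the sketch stays valid even when the schedule is temporarily incorrect, which is the property the commit message relies on.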

Fixes https://bugs.freedesktop.org/show_bug.cgi?id=70088

Signed-off-by: Vadim Girlin 
---
 src/gallium/drivers/r600/sb/sb_core.cpp| 10 --
 src/gallium/drivers/r600/sb/sb_dce_cleanup.cpp | 22 ++
 src/gallium/drivers/r600/sb/sb_pass.h  |  7 +--
 src/gallium/drivers/r600/sb/sb_shader.h| 12 
 4 files changed, 39 insertions(+), 12 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_core.cpp 
b/src/gallium/drivers/r600/sb/sb_core.cpp
index b5dd88e..9fd9d9a 100644
--- a/src/gallium/drivers/r600/sb/sb_core.cpp
+++ b/src/gallium/drivers/r600/sb/sb_core.cpp
@@ -184,6 +184,8 @@ int r600_sb_bytecode_process(struct r600_context *rctx,
SB_RUN_PASS(psi_ops,1);
 
SB_RUN_PASS(liveness,   0);
+
+   sh->dce_flags = DF_REMOVE_DEAD | DF_EXPAND;
SB_RUN_PASS(dce_cleanup,0);
SB_RUN_PASS(def_use,0);
 
@@ -201,9 +203,10 @@ int r600_sb_bytecode_process(struct r600_context *rctx,
 
SB_RUN_PASS(gvn,1);
 
-   SB_RUN_PASS(liveness,   0);
+   SB_RUN_PASS(def_use,1);
+
+   sh->dce_flags = DF_REMOVE_DEAD | DF_REMOVE_UNUSED;
SB_RUN_PASS(dce_cleanup,1);
-   SB_RUN_PASS(def_use,0);
 
SB_RUN_PASS(ra_split,   0);
SB_RUN_PASS(def_use,0);
@@ -217,6 +220,9 @@ int r600_sb_bytecode_process(struct r600_context *rctx,
sh->compute_interferences = true;
SB_RUN_PASS(liveness,   0);
 
+   sh->dce_flags = DF_REMOVE_DEAD;
+   SB_RUN_PASS(dce_cleanup,1);
+
SB_RUN_PASS(ra_coalesce,1);
SB_RUN_PASS(ra_init,1);
 
diff --git a/src/gallium/drivers/r600/sb/sb_dce_cleanup.cpp 
b/src/gallium/drivers/r600/sb/sb_dce_cleanup.cpp
index f879395..79aef91 100644
--- a/src/gallium/drivers/r600/sb/sb_dce_cleanup.cpp
+++ b/src/gallium/drivers/r600/sb/sb_dce_cleanup.cpp
@@ -56,7 +56,8 @@ bool dce_cleanup::visit(cf_node& n, bool enter) {
else
cleanup_dst(n);
} else {
-   if (n.bc.op_ptr->flags & (CF_CLAUSE | CF_BRANCH | CF_LOOP))
+   if ((sh.dce_flags & DF_EXPAND) &&
+   (n.bc.op_ptr->flags & (CF_CLAUSE | CF_BRANCH | 
CF_LOOP)))
n.expand();
}
return true;
@@ -107,19 +108,20 @@ bool dce_cleanup::visit(region_node& n, bool enter) {
 }
 
 void dce_cleanup::cleanup_dst(node& n) {
-   cleanup_dst_vec(n.dst);
+   if (!cleanup_dst_vec(n.dst) && remove_unused &&
+   !n.dst.empty() && !(n.flags & NF_DONT_KILL) && n.parent)
+   n.remove();
 }
 
 bool dce_cleanup::visit(container_node& n, bool enter) {
-   if (enter) {
+   if (enter)
cleanup_dst(n);
-   } else {
-
-   }
return true;
 }
 
-void dce_cleanup::cleanup_dst_vec(vvec& vv) {
+bool dce_cleanup::cleanup_dst_vec(vvec& vv) {
+   bool alive = false;
+
for (vvec::iterator I = vv.begin(), E = vv.end(); I != E; ++I) {
value* &v = *I;
if (!v)
@@ -128,9 +130,13 @@ void dce_cleanup::cleanup_dst_vec(vvec& vv) {
if (v->gvn_source && v->gvn_source->is_dead())
v->gvn_source = NULL;
 
-   if (v->is_dead())
+   if (v->is_dead() || (remove_unused && !v->is_rel() && !v->uses))
v = NULL;
+   else
+   alive = true;
}
+
+   return alive;
 }
 
 } // namespace r600_sb
diff --git a/src/gallium/drivers/r600/sb/sb_pass.h 
b/src/gallium/drivers/r600/sb/sb_pass.h
index 95d2a20..a3f8515 100644
--- a/src/gallium/drivers/r600/sb/sb_pass.h
+++ b/src/gallium/drivers/r600/sb/sb_pass.h
@@ -119,9 +119,12 @@ public:
 class dce_cleanup : public vpass {
using vpass::visit;
 
+   bool remove_unused;
+
 public:
 
-   dce_cleanup(shader &s) : vpass(s) {}
+   dce_cleanup(shader &s) : vpass(s),
+   remove_unused(s.dce_flags & DF_REMOVE_UNUSED) {}
 
virtual bool visit(node &n, bool enter);
virtual bool visit(alu_group_node &n, bool enter);
@@ -135,7 +138,7 @@ public:
 private:
 
void cleanup_dst(node &n);
-   void cleanup_dst_vec(vvec &vv);
+   bool cleanup_dst_vec(vvec &vv);
 
 };
 
diff --git a/src/gallium/drivers/r600/sb/sb_shader.h 
b/src/gallium/drivers/r600/sb/sb_shader.h
index e515d31..7955bba 100644
--- a/src/gallium/drivers/r600/sb/sb_shader.h
+++ b

Re: [Mesa-dev] EXT_image_dma_buf_import FD ownership

2013-10-14 Thread Kristian Høgsberg
On Fri, Oct 11, 2013 at 3:29 PM, John Sheu  wrote:
> Hello folks:
>
> About the ownership of dmabuf file descriptors that are passed into EGL.
> I'm looking in particular at this blurb from the spec:
>
>* If  is EGL_LINUX_DMA_BUF_EXT and eglCreateImageKHR fails,
>  EGL does not retain ownership of the file descriptor and it is the
>  responsibility of the application to close it."
>
> My take on this is that this is different from most users of dmabuf, or even
> file descriptors in general.  For example, mmap() doesn't own the descriptor
> passed to it; and more specifically to dmabufs, neither does (say)
> DRM_IOCTL_PRIME_HANDLE_TO_FD, or any of the V4L2 entry points that support
> dmabuf (e.g. VIDIOC_QBUF).  They all increment the refcount on the
> descriptor, not own it.
>
> Since we're still iterating drafts on the EXT_image_dma_buf_import spec --
> I'd like to see the spec specify that EGL has the similar behavior of taking
> a reference, but not owning the descriptor.
>
> As far as I see, in Mesa, only the Intel stack has implemented
> EXT_image_dma_buf_import, and the change would be fairly trivial (removing
> dri2_take_dma_buf_ownership), since it eventually just passes the FD down to
> DRM_IOCTL_PRIME_HANDLE_TO_FD.  And as far as I'm aware, the piglit
> conformance tests are the only present consumers.  (Maybe the Wayland folks
> have something to say about this too.)

We don't use the EXT_image_dma_buf_import extension; we hide all the
details in a Wayland-specific extension.  But I do agree that the fd
life-cycle semantics in the dma_buf extensions are non-standard and it
would be cleaner for it to not take ownership of the fd.
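
To make the "take a reference, don't take ownership" convention concrete, here is a small sketch using plain POSIX dup(). `ImportedBuffer`, `import_buffer` and `release_buffer` are hypothetical names for illustration only; this is not the EGL or Mesa implementation.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical importer: it duplicates the descriptor for its own use and
// never closes the caller's copy.
struct ImportedBuffer {
    int fd;  // private duplicate, or -1 if the import failed
};

ImportedBuffer import_buffer(int caller_fd) {
    ImportedBuffer buf;
    // dup() yields a new descriptor for the same open file description,
    // so the underlying object stays alive until both copies are closed.
    buf.fd = dup(caller_fd);
    return buf;
}

void release_buffer(ImportedBuffer& buf) {
    if (buf.fd >= 0) {
        close(buf.fd);
        buf.fd = -1;
    }
}
```

With this shape the caller always closes its own descriptor, whether or not the import succeeded, so there is no failure-path special case to remember.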

Kristian

> Hopefully the extension's still at a
> stage where we can fix up this inconsistency.
>
> -John Sheu
>


Re: [Mesa-dev] [PATCH] i965: Fast texture upload now supports all levels

2013-10-14 Thread Chad Versace

On 10/14/2013 10:54 AM, Eric Anholt wrote:
> Courtney Goeltzenleuchter  writes:
>
>> Does anyone know of a test that measures frame 0 time? Or texture upload
>> speed?
>>
>> For Smokin' Guns, I tried measuring the overall time, but an improved frame
>> 0 time has difficulty standing out of a 2607 frame test.
>>
>> I may have to create something. Suggestions for an appropriate framework?
>
> Run an apitrace replay on a trace trimmed to frame <... textures are
> first used>?  Sure, apitrace isn't real-world benchmarking
> for fps, but it should get at the "how much did we cut off of load
> time", assuming that texture load time isn't swamped by apitrace
> decompression.


I believe Frank has a microbenchmark for measuring texture
upload time. Maybe he can share it with us. But, data obtained from real
apps is always preferred.



[Mesa-dev] [Bug 70435] [regression] Fast Texture Upload optimization results in corrupt rendering.

2013-10-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=70435

Chad Versace  changed:

   What    |Removed              |Added
   Assignee|i...@freedesktop.org |chad.vers...@linux.intel.com



Re: [Mesa-dev] [PATCH] i965: Fast texture upload now supports all levels

2013-10-14 Thread Courtney Goeltzenleuchter
On Mon, Oct 14, 2013 at 12:43 PM, Chad Versace  wrote:

> On 10/14/2013 10:54 AM, Eric Anholt wrote:
>
>> Courtney Goeltzenleuchter  writes:
>>
>>  Does anyone know of a test that measures frame 0 time? Or texture upload
>>> speed?
>>>
>>> For Smokin' Guns, I tried measuring the overall time, but an improved
>>> frame
>>> 0 time has difficulty standing out of a 2607 frame test.
>>>
>>> I may have to create something. Suggestions for an appropriate framework?
>>>
>>
>> Run an apitrace replay on a trace trimmed to frame <... textures are first used>?  Sure, apitrace isn't real-world benchmarking
>> for fps, but it should get at the "how much did we cut off of load
>> time", assuming that texture load time isn't swamped by apitrace
>> decompression.
>>
>
> I believe Frank has a microbenchmark for measuring texture
> upload time. Maybe he can share it with us. But, data obtained from real
> apps is always preferred.
>
>
Frank referenced mesa demos teximage as his benchmark. Right now that only
does level 0 and a small selection of texture formats. I thought I'd take a
look at extending it to cover the added formats from my patches.

I like the apitrace idea as well, that ties it (loosely) to a real app.

Thanks for the suggestions folks!
Courtney


-- 
Courtney Goeltzenleuchter
LunarG


[Mesa-dev] [Bug 70435] [regression] Fast Texture Upload optimization results in corrupt rendering.

2013-10-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=70435

Chad Versace  changed:

   What    |Removed                      |Added
   Status  |NEW                          |ASSIGNED
   Assignee|chad.vers...@linux.intel.com |i...@freedesktop.org

--- Comment #1 from Chad Versace  ---
It's no surprise that this patch caused a regression. I wish Piglit had total
and complete coverage of everything :(

Artie, on which hardware did you see the regression? Please provide the GPU's
pci id. And how many channels of RAM?



[Mesa-dev] [Bug 70435] [regression] Fast Texture Upload optimization results in corrupt rendering.

2013-10-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=70435

Chad Versace  changed:

   What    |Removed              |Added
   Assignee|i...@freedesktop.org |chad.vers...@linux.intel.com



[Mesa-dev] [Bug 70435] [regression] Fast Texture Upload optimization results in corrupt rendering.

2013-10-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=70435

--- Comment #2 from U. Artie Eoff  ---
Created attachment 87619
  --> https://bugs.freedesktop.org/attachment.cgi?id=87619&action=edit
lspci -vvnn



[Mesa-dev] [Bug 70435] [regression] Fast Texture Upload optimization results in corrupt rendering.

2013-10-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=70435

--- Comment #3 from U. Artie Eoff  ---
(In reply to comment #1)
> It's no surprise that this patch caused a regression. I wish Piglit had
> total and complete coverage of everything :(
> 
> Artie, on which hardware did you see the regression? Please provide the
> GPU's pci id. And how many channels of RAM?

Intel Ivybridge hardware... see my attached lspci output for
details.



Re: [Mesa-dev] Easy learning project for someone looking to get into Mesa development

2013-10-14 Thread Marek Olšák
I'd just add an alias "GL_ARB_texture_mirror_clamp_to_edge" ->
o(ATI_texture_mirror_once), so it would just be a one-liner change to
extensions.c.

Marek

On Mon, Oct 14, 2013 at 7:25 PM, Ian Romanick  wrote:
> Here's a project for anyone looking to get into Mesa development that
> should be easy for anyone with decent C programming skills to tackle:
> GL_ARB_texture_mirror_clamp_to_edge.
>
> http://www.opengl.org/registry/specs/ARB/texture_mirror_clamp_to_edge.txt
>
> This extension is effectively a subset of another extension
> (GL_ATI_texture_mirror_once) that Mesa already supports.  Some hardware
> with drivers in Mesa support the ATI extension, and some hardware can
> only support the ARB extension.
>
> I think this should be as easy as:
>
> 1. Add a flag to gl_extensions (src/mesa/main/mtypes.h) for the new
> extension.  Please keep the list alphabetized.
>
> 2. Add the extension string to extension_table (src/mesa/main/extensions.c).
>
> 3. Update any place that checks the old extension flag to possibly also
> check the new extension flag.  Some places will, some won't.
>
> 4. Update the piglit tests that exercise the ATI extension to also
> exercise the ARB extension.
>
> 5. Update docs/GL3.txt. :)
>
> I think all of the Gallium drivers that can support the ARB extension
> already support the ATI extension, so there shouldn't be any extra work
> there.  I'm not 100% positive on that, though.


Re: [Mesa-dev] Easy learning project for someone looking to get into Mesa development

2013-10-14 Thread Kenneth Graunke
On 10/14/2013 12:22 PM, Marek Olšák wrote:
> I'd just add an alias "GL_ARB_texture_mirror_clamp_to_edge" ->
> o(ATI_texture_mirror_once), so it would just be a one-liner change to
> extensions.c.
> 
> Marek

They're not the same though - the new ARB extension only supports
MIRROR_CLAMP_TO_EDGE, while the ATI_texture_mirror_once additionally
supports MIRROR_CLAMP.

--Ken
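
A rough sketch of what that difference means for wrap-mode validation follows. The two enum values are the ones from the extension specs (the ARB token shares its value with the ATI one); `ExtensionFlags` and `mirror_wrap_mode_supported` are hypothetical stand-ins, not Mesa's actual structures.

```cpp
#include <cassert>

// Token values from the extension specifications.
const unsigned MIRROR_CLAMP_ATI     = 0x8742;
const unsigned MIRROR_CLAMP_TO_EDGE = 0x8743;  // same value in ATI and ARB

// Hypothetical stand-in for the per-context extension flags.
struct ExtensionFlags {
    bool ATI_texture_mirror_once;
    bool ARB_texture_mirror_clamp_to_edge;
};

// Returns true if `wrap` is accepted given the enabled extensions.
bool mirror_wrap_mode_supported(const ExtensionFlags& ext, unsigned wrap) {
    switch (wrap) {
    case MIRROR_CLAMP_TO_EDGE:
        // Provided by either extension.
        return ext.ATI_texture_mirror_once ||
               ext.ARB_texture_mirror_clamp_to_edge;
    case MIRROR_CLAMP_ATI:
        // Only the ATI extension provides this mode.
        return ext.ATI_texture_mirror_once;
    default:
        return false;
    }
}
```

A plain alias would make both cases depend on the same flag, so a driver exposing only the ARB extension would wrongly accept MIRROR_CLAMP.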


Re: [Mesa-dev] [PATCH 2/2] i965/fs: Fix type of header register for sampler messages

2013-10-14 Thread Kenneth Graunke
On 10/14/2013 10:44 AM, Eric Anholt wrote:
> Chris Forbes  writes:
> 
>> Previously this was float, which caused the copy from g0 to mangle
>> everything.
> 
> If we face a choice of types for a raw mov, we should choose float --
> it's higher performance on IVB (they can get dispatched twice as fast,
> when there's a hyperthread waiting to dispatch a float op)
> 
> Patch 1/2 is
> 
> Reviewed-by: Eric Anholt 

Using float for raw MOVs is unsafe for things exposed by
ARB_fragment_program, ARB_vertex_program, or fixed-function vertex
processing.  We still use ALT mode there so we get proper 0^0=1 handling
for POW.

We should really use IEEE everywhere and emit special code for POW in
ARB_vp/fp.  But that's above and beyond fixing a regression.
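
To illustrate why 0^0 needs care under IEEE rules, here is a small sketch. It assumes POW is expanded as exp2(y * log2(x)), which is an assumption made for the example and not a statement about what the hardware or the i965 backend actually does; both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Naive expansion: under IEEE rules log2(0) is -inf and 0 * -inf is NaN,
// so 0^0 comes out as NaN rather than 1.
float pow_naive(float x, float y) {
    return std::exp2(y * std::log2(x));
}

// Same expansion with an explicit check so that x^0 is 1 for any x,
// roughly the kind of special-case code mentioned above.
float pow_special_cased(float x, float y) {
    if (y == 0.0f)
        return 1.0f;
    return std::exp2(y * std::log2(x));
}
```

The extra compare is the cost of running IEEE everywhere; whether that is acceptable for ARB_vp/fp is the open question.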



[Mesa-dev] [Bug 69437] Composite Bypass no longer works

2013-10-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=69437

--- Comment #3 from U. Artie Eoff  ---
Thanks, I confirmed it fixes the issue on master... waiting on 9.2 backport.



Re: [Mesa-dev] Easy learning project for someone looking to get into Mesa development

2013-10-14 Thread Ian Romanick
On 10/14/2013 12:22 PM, Marek Olšák wrote:
> I'd just add an alias "GL_ARB_texture_mirror_clamp_to_edge" ->
> o(ATI_texture_mirror_once), so it would just be a one-liner change to
> extensions.c.

Like I said:

>> Some hardware
>> with drivers in Mesa support the ATI extension, and some hardware can
>> only support the ARB extension.

:)

> Marek
> 
> On Mon, Oct 14, 2013 at 7:25 PM, Ian Romanick  wrote:
>> Here's a project for anyone looking to get into Mesa development that
>> should be easy for anyone with decent C programming skills to tackle:
>> GL_ARB_texture_mirror_clamp_to_edge.
>>
>> http://www.opengl.org/registry/specs/ARB/texture_mirror_clamp_to_edge.txt
>>
>> This extension is effectively a subset of another extension
>> (GL_ATI_texture_mirror_once) that Mesa already supports.  Some hardware
>> with drivers in Mesa support the ATI extension, and some hardware can
>> only support the ARB extension.
>>
>> I think this should be as easy as:
>>
>> 1. Add a flag to gl_extensions (src/mesa/main/mtypes.h) for the new
>> extension.  Please keep the list alphabetized.
>>
>> 2. Add the extension string to extension_table (src/mesa/main/extensions.c).
>>
>> 3. Update any place that checks the old extension flag to possibly also
>> check the new extension flag.  Some places will, some won't.
>>
>> 4. Update the piglit tests that exercise the ATI extension to also
>> exercise the ARB extension.
>>
>> 5. Update docs/GL3.txt. :)
>>
>> I think all of the Gallium drivers that can support the ARB extension
>> already support the ATI extension, so there shouldn't be any extra work
>> there.  I'm not 100% positive on that, though.


Re: [Mesa-dev] [PATCH 7/7] i965: Move the common binding table offset code to brw_shader.cpp.

2013-10-14 Thread Paul Berry
On 4 October 2013 15:44, Eric Anholt  wrote:

> Now that both vec4 and fs are dynamically assigning offsets, a lot of the
> code is the same.
> ---
>

Since next_binding_table_offset is only used to pass a value into
assign_common_binding_table_offsets(), I'd prefer to see it made into a
function argument rather than a class member.  That way it wouldn't be
necessary to grep through the code to verify that no one else uses it.
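
Something along these lines is what I have in mind. This is only a sketch: `BindingTable` is a cut-down stand-in for the real binding table struct, and the signature and name `assign_common_offsets` are hypothetical.

```cpp
#include <cassert>

// Simplified stand-in for the binding table layout in the patch; only the
// fields needed for the illustration are kept.
struct BindingTable {
    unsigned texture_start;
    unsigned ubo_start;
    unsigned pull_constants_start;
};

// The caller passes in the first free offset and gets back the next free
// offset, so no visitor member is needed to carry the value.
unsigned assign_common_offsets(BindingTable& bt, unsigned next,
                               unsigned num_textures, unsigned num_ubos) {
    bt.texture_start = next;
    next += num_textures;

    bt.ubo_start = next;
    next += num_ubos;

    bt.pull_constants_start = next;
    next += 1;

    return next;
}
```

The fs caller would reserve its render targets first and pass the resulting offset as `next`; the vec4 caller would just pass 0.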

With that changed, this patch is:

Reviewed-by: Paul Berry 

I already sent out a comment on patch 4/7.  The remainder of the series is:

Reviewed-by: Paul Berry 


>  src/mesa/drivers/dri/i965/brw_fs.cpp   | 33 ++
>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   |  3 ++
>  src/mesa/drivers/dri/i965/brw_shader.cpp   | 47
> ++
>  src/mesa/drivers/dri/i965/brw_shader.h |  5 +++
>  src/mesa/drivers/dri/i965/brw_vec4.cpp | 33 +-
>  src/mesa/drivers/dri/i965/brw_vec4.h   |  1 -
>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp |  2 ++
>  7 files changed, 61 insertions(+), 63 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 86ff378..13c6ddc 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -2943,37 +2943,10 @@ fs_visitor::setup_payload_gen6()
>  void
>  fs_visitor::assign_binding_table_offsets()
>  {
> -   int num_textures = _mesa_fls(fp->Base.SamplersUsed);
> -   int next = 0;
> +   c->prog_data.binding_table.render_target_start =
> next_binding_table_offset;
> +   next_binding_table_offset += c->key.nr_color_regions;
>
> -   c->prog_data.binding_table.render_target_start = next;
> -   next += c->key.nr_color_regions;
> -
> -   c->prog_data.base.binding_table.texture_start = next;
> -   next += num_textures;
> -
> -   if (shader) {
> -  c->prog_data.base.binding_table.ubo_start = next;
> -  next += shader->base.NumUniformBlocks;
> -   }
> -
> -   if (INTEL_DEBUG & DEBUG_SHADER_TIME) {
> -  c->prog_data.base.binding_table.shader_time_start = next;
> -  next++;
> -   }
> -
> -   if (fp->Base.UsesGather) {
> -  c->prog_data.base.binding_table.gather_texture_start = next;
> -  next += num_textures;
> -   }
> -
> -   /* This may or may not be used depending on how the compile goes. */
> -   c->prog_data.base.binding_table.pull_constants_start = next;
> -   next++;
> -
> -   assert(next < BRW_MAX_SURFACES);
> -
> -   /* c->prog_data.base.binding_table.size will be set by
> mark_surface_used. */
> +   assign_common_binding_table_offsets();
>  }
>
>  bool
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> index 8fa7f9d..aa76231 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> @@ -2617,8 +2617,10 @@ fs_visitor::fs_visitor(struct brw_context *brw,
> this->c = c;
> this->brw = brw;
> this->fp = fp;
> +   this->prog = &fp->Base;
> this->shader_prog = shader_prog;
> this->prog = &fp->Base;
> +   this->stage_prog_data = &c->prog_data.base;
> this->ctx = &brw->ctx;
> this->mem_ctx = ralloc_context(NULL);
> if (shader_prog)
> @@ -2651,6 +2653,7 @@ fs_visitor::fs_visitor(struct brw_context *brw,
>
> this->force_uncompressed_stack = 0;
> this->force_sechalf_stack = 0;
> +   this->next_binding_table_offset = 0;
>
> memset(&this->param_size, 0, sizeof(this->param_size));
>  }
> diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp
> b/src/mesa/drivers/dri/i965/brw_shader.cpp
> index 61c4bf5..b97bb5e 100644
> --- a/src/mesa/drivers/dri/i965/brw_shader.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
> @@ -578,3 +578,50 @@ backend_visitor::dump_instructions()
>dump_instruction(inst);
> }
>  }
> +
> +
> +/**
> + * Sets up the starting offsets for the groups of binding table entries
> + * common to all pipeline stages.
> + *
> + * Unused groups are initialized to 0xd0d0d0d0 to make it obvious that
> they're
> + * unused but also make sure that addition of small offsets to them will
> + * trigger some of our asserts that surface indices are <
> BRW_MAX_SURFACES.
> + */
> +void
> +backend_visitor::assign_common_binding_table_offsets()
> +{
> +   int num_textures = _mesa_fls(prog->SamplersUsed);
> +
> +   stage_prog_data->binding_table.texture_start =
> next_binding_table_offset;
> +   next_binding_table_offset += num_textures;
> +
> +   if (shader) {
> +  stage_prog_data->binding_table.ubo_start =
> next_binding_table_offset;
> +  next_binding_table_offset += shader->base.NumUniformBlocks;
> +   } else {
> +  stage_prog_data->binding_table.ubo_start = 0xd0d0d0d0;
> +   }
> +
> +   if (INTEL_DEBUG & DEBUG_SHADER_TIME) {
> +  stage_prog_data->binding_table.shader_time_start =
> next_binding_table_offset;
> +  next_binding_table_offset++;
> +   } else {
> +  stage_prog_data->binding_table.s

Re: [Mesa-dev] [Bug 69437] Composite Bypass no longer works

2013-10-14 Thread Carl Worth
bugzilla-dae...@freedesktop.org writes:
> --- Comment #2 from Kristian Høgsberg  ---
> Happy I waited to push this, I came up with a much better fix:
>
> commit 360a141f24a9d00891665b7fedb77ffb116944ca
> Author: Kristian Høgsberg 
...
> Cc: 9.2 

Hi Kristian,

I tried cherry-picking this to mesa's 9.2, but the conflicts didn't look
trivial (to me at least). I imagine they're likely more trivial to you.

Could you please cook up a version of this against 9.2 and send to
mesa-stable@ ?

Thanks,

-Carl




[Mesa-dev] [PATCH] i965/fs: In the pre-regalloc schedule, try harder at reducing reg pressure.

2013-10-14 Thread Eric Anholt
Previously, the best thing we had was to schedule the things unblocked by
the current instruction, on the hope that it would be consuming two values
at the end of their live intervals while only producing one new value.
Sometimes that wasn't the case.

Now, when an instruction is the first user of a GRF we schedule (i.e. it
will probably be the virtual_grf_def[] instruction after computing live
intervals again), penalize it by how many regs it would take up.  When an
instruction is the last user of a GRF we have to schedule (when it will
probably be the virtual_grf_end[] instruction), give it a boost by how
many regs it would free.
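
As a standalone illustration of that penalty/boost rule, here is a toy model. It is not the code in the patch: `PressureState` and `pressure_benefit` are hypothetical names, and registers are reduced to plain indices.

```cpp
#include <cassert>
#include <vector>

// Toy model of the heuristic: each instruction is described by the list of
// registers it touches (destination and sources alike).
struct PressureState {
    std::vector<int>  remaining_uses;  // unscheduled instructions touching each reg
    std::vector<bool> active;          // has a scheduled instruction touched it yet?
    std::vector<int>  size;            // hardware registers each one occupies
};

// Positive when scheduling the instruction is expected to free registers,
// negative when it is expected to make new ones live.
int pressure_benefit(const PressureState& st, const std::vector<int>& regs) {
    int benefit = 0;
    for (int r : regs) {
        if (st.remaining_uses[r] == 1)
            benefit += st.size[r];  // last user: the register dies here
        if (!st.active[r])
            benefit -= st.size[r];  // first user: the register becomes live
    }
    return benefit;
}
```

For instance, an instruction that is the last reader of a 1-register and a 2-register value while defining a new 1-register value scores 1 + 2 - 1 = 2, so it is preferred over one that only defines a new value (score -1).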

The new functions are made virtual (only 1 of 2 really needs to be
virtual) because I expect we'll soon lift the pre-regalloc scheduling
heuristic over to the vec4 backend.

shader-db:
total instructions in shared programs: 1512756 -> 1511604 (-0.08%)
instructions in affected programs: 10292 -> 9140 (-11.19%)
GAINED:121
LOST:  38

Improves tropics performance at my current settings by 4.50602% +/-
2.60694% (n=5).  No difference on Lightsmark (n=5).  No difference on
GLB2.7 (n=11).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70445
---
 .../drivers/dri/i965/brw_schedule_instructions.cpp | 125 ++---
 1 file changed, 111 insertions(+), 14 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp 
b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
index b24c38c..7cb0265 100644
--- a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
+++ b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
@@ -353,6 +353,13 @@ public:
   this->instructions_to_schedule = 0;
   this->post_reg_alloc = post_reg_alloc;
   this->time = 0;
+  if (!post_reg_alloc) {
+ this->remaining_grf_uses = rzalloc_array(mem_ctx, int, grf_count);
+ this->grf_active = rzalloc_array(mem_ctx, bool, grf_count);
+  } else {
+ this->remaining_grf_uses = NULL;
+ this->grf_active = NULL;
+  }
}
 
~instruction_scheduler()
@@ -377,6 +384,9 @@ public:
 */
virtual int issue_time(backend_instruction *inst) = 0;
 
+   virtual void mod_remaining_grf_uses(backend_instruction *inst, int mod) = 0;
+   virtual int get_grf_pressure_benefit(backend_instruction *inst) = 0;
+
void schedule_instructions(backend_instruction *next_block_header);
 
void *mem_ctx;
@@ -387,6 +397,17 @@ public:
int time;
exec_list instructions;
backend_visitor *bv;
+
+   /** Number of instructions left to schedule that reference each vgrf. */
+   int *remaining_grf_uses;
+
+   /**
+* Tracks whether each VGRF has had an instruction scheduled that uses it.
+*
+* This is used to estimate whether scheduling a new instruction will
+* increase register pressure.
+*/
+   bool *grf_active;
 };
 
 class fs_instruction_scheduler : public instruction_scheduler
@@ -398,6 +419,9 @@ public:
schedule_node *choose_instruction_to_schedule();
int issue_time(backend_instruction *inst);
fs_visitor *v;
+
+   void mod_remaining_grf_uses(backend_instruction *inst, int mod);
+   int get_grf_pressure_benefit(backend_instruction *inst);
 };
 
 fs_instruction_scheduler::fs_instruction_scheduler(fs_visitor *v,
@@ -408,6 +432,57 @@ 
fs_instruction_scheduler::fs_instruction_scheduler(fs_visitor *v,
 {
 }
 
+void
+fs_instruction_scheduler::mod_remaining_grf_uses(backend_instruction *be,
+ int mod)
+{
+   fs_inst *inst = (fs_inst *)be;
+
+   if (!remaining_grf_uses)
+  return;
+
+   if (inst->dst.file == GRF) {
+  remaining_grf_uses[inst->dst.reg] += mod;
+  if (mod < 0 && !grf_active[inst->dst.reg])
+ grf_active[inst->dst.reg] = true;
+   }
+
+   for (int i = 0; i < 3; i++) {
+  if (inst->src[i].file != GRF)
+ continue;
+
+  remaining_grf_uses[inst->src[i].reg] += mod;
+  if (mod < 0 && !grf_active[inst->src[i].reg])
+ grf_active[inst->src[i].reg] = true;
+   }
+}
+
+int
+fs_instruction_scheduler::get_grf_pressure_benefit(backend_instruction *be)
+{
+   fs_inst *inst = (fs_inst *)be;
+   int benefit = 0;
+
+   if (inst->dst.file == GRF) {
+  if (remaining_grf_uses[inst->dst.reg] == 1)
+ benefit += v->virtual_grf_sizes[inst->dst.reg];
+  if (!grf_active[inst->dst.reg])
+ benefit -= v->virtual_grf_sizes[inst->dst.reg];
+   }
+
+   for (int i = 0; i < 3; i++) {
+  if (inst->src[i].file != GRF)
+ continue;
+
+  if (remaining_grf_uses[inst->src[i].reg] == 1)
+ benefit += v->virtual_grf_sizes[inst->src[i].reg];
+  if (!grf_active[inst->src[i].reg])
+ benefit -= v->virtual_grf_sizes[inst->src[i].reg];
+   }
+
+   return benefit;
+}
+
 class vec4_instruction_scheduler : public instruction_scheduler
 {
 public:
@@ -416,6 +491,9 @@ public:
schedule_node *choose_instruction_to_schedule();
i

Re: [Mesa-dev] [PATCH 2/2] i965/fs: Fix type of header register for sampler messages

2013-10-14 Thread Eric Anholt
Kenneth Graunke  writes:

> On 10/14/2013 10:44 AM, Eric Anholt wrote:
>> Chris Forbes  writes:
>> 
>>> Previously this was float, which caused the copy from g0 to mangle
>>> everything.
>> 
>> If we face a choice of types for a raw mov, we should choose float --
>> it's higher performance on IVB (they can get dispatched twice as fast,
>> when there's a hyperthread waiting to dispatch a float op)
>> 
>> Patch 1/2 is
>> 
>> Reviewed-by: Eric Anholt 
>
> Using float for raw MOVs is unsafe for things exposed by
> ARB_fragment_program, ARB_vertex_program, or fixed-function vertex
> processing.  We still use ALT mode there so we get proper 0^0=1 handling
> for POW.
>
> We should really use IEEE everywhere and emit special code for POW in
> ARB_vp/fp.  But that's above and beyond fixing a regression.

Oh, right!  Thanks for the reminder.
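
As a rough illustration of why a float-typed move may not preserve a raw payload such as a message header, here is a small C model. The denormal-flushing rule is an assumption chosen for the sketch, not a description of what ALT mode actually does on the hardware, and the names mov_ud and mov_f_flush_denorm are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Integer-typed move: the 32 bits pass through unchanged. */
uint32_t
mov_ud(uint32_t bits)
{
   return bits;
}

/* Simplified model (assumption): a float-typed move that flushes
 * denormal inputs to a signed zero.  Any small integer stored in the
 * register looks like a denormal float, so it would be destroyed.
 */
uint32_t
mov_f_flush_denorm(uint32_t bits)
{
   uint32_t exponent = (bits >> 23) & 0xffu;
   uint32_t mantissa = bits & 0x007fffffu;

   if (exponent == 0 && mantissa != 0)
      return bits & 0x80000000u;   /* keep only the sign bit */

   return bits;
}
```

Under this model a header dword such as 0x00000123 survives the integer move but becomes 0 after the float move, while a normal float bit pattern such as 0x3f800000 is unaffected.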




Re: [Mesa-dev] [PATCH 2/8] mesa: Add new functions and enums required by GL_ARB_sample_shading

2013-10-14 Thread Anuj Phogat
On Mon, Oct 14, 2013 at 11:05 AM, Matt Turner  wrote:
> On Mon, Oct 14, 2013 at 10:12 AM, Anuj Phogat  wrote:
>> diff --git a/src/mapi/glapi/gen/GL3x.xml b/src/mapi/glapi/gen/GL3x.xml
>> index 5078f7b..4ec4749 100644
>> --- a/src/mapi/glapi/gen/GL3x.xml
>> +++ b/src/mapi/glapi/gen/GL3x.xml
>> @@ -630,6 +630,11 @@
>>  
>>
>>
>> +  
>> +
>> +  
>> +
>> +
>>  
>>
>>  
>
> Does this need to be inside a new  since
> MinSampleShading is new in GL 4.0?
>
That looks like the right thing to do.

>> diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
>> index 48fce36..8919852 100644
>> --- a/src/mapi/glapi/gen/gl_API.xml
>> +++ b/src/mapi/glapi/gen/gl_API.xml
>> @@ -8187,7 +8187,7 @@
>>  > xmlns:xi="http://www.w3.org/2001/XInclude"/>
>>  > xmlns:xi="http://www.w3.org/2001/XInclude"/>
>>
>> -
>> +> xmlns:xi="http://www.w3.org/2001/XInclude"/>
>>  > xmlns:xi="http://www.w3.org/2001/XInclude"/>
>>  > xmlns:xi="http://www.w3.org/2001/XInclude"/>
>>  
>> diff --git a/src/mesa/main/enable.c b/src/mesa/main/enable.c
>> index 5e2fd80..83c59ee 100644
>> --- a/src/mesa/main/enable.c
>> +++ b/src/mesa/main/enable.c
>> @@ -802,6 +802,15 @@ _mesa_set_enable(struct gl_context *ctx, GLenum cap, GLboolean state)
>>   ctx->Multisample.SampleCoverageInvert = state;
>>   break;
>>
>> +  /*GL_ARB_sample_shading*/
>> +  case GL_SAMPLE_SHADING_ARB:
>> + CHECK_EXTENSION(ARB_sample_shading, cap);
>> + if (ctx->Multisample.SampleShading == state)
>> +return;
>> + FLUSH_VERTICES(ctx, _NEW_MULTISAMPLE);
>> + ctx->Multisample.SampleShading = state;
>> + break;
>> +
>>/* GL_IBM_rasterpos_clip */
>>case GL_RASTER_POSITION_UNCLIPPED_IBM:
>>   if (ctx->API != API_OPENGL_COMPAT)
>> @@ -1594,6 +1603,13 @@ _mesa_IsEnabled( GLenum cap )
>>   CHECK_EXTENSION(ARB_texture_multisample);
>>   return ctx->Multisample.SampleMask;
>>
>> +  /* ARB_sample_shading */
>> +  case GL_SAMPLE_SHADING_ARB:
>> + if (!_mesa_is_desktop_gl(ctx))
>> +goto invalid_enum_error;
>> + CHECK_EXTENSION(ARB_sample_shading);
>> + return ctx->Multisample.SampleShading;
>> +
>>default:
>>   goto invalid_enum_error;
>> }
>> diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
>> index 89b3bf0..c52133e 100644
>> --- a/src/mesa/main/get.c
>> +++ b/src/mesa/main/get.c
>> @@ -894,6 +894,10 @@ find_custom_value(struct gl_context *ctx, const struct value_desc *d, union valu
>>   _mesa_problem(ctx, "driver doesn't implement GetTimestamp");
>>}
>>break;
>> +   /* GL_ARB_sample_shading */
>> +   case GL_MIN_SAMPLE_SHADING_VALUE_ARB:
>> + v->value_float = ctx->Multisample.MinSampleShadingValue;
>> +  break;
>> }
>>  }
>>
>> diff --git a/src/mesa/main/get_hash_params.py b/src/mesa/main/get_hash_params.py
>> index 9c54af0..0d7effb 100644
>> --- a/src/mesa/main/get_hash_params.py
>> +++ b/src/mesa/main/get_hash_params.py
>> @@ -83,6 +83,9 @@ descriptor=[
>>[ "SAMPLE_BUFFERS_ARB", "BUFFER_INT(Visual.sampleBuffers), extra_new_buffers" ],
>>[ "SAMPLES_ARB", "BUFFER_INT(Visual.samples), extra_new_buffers" ],
>>
>> +# GL_ARB_sample_shading
>> +  [ "MIN_SAMPLE_SHADING_VALUE_ARB", "CONTEXT_FLOAT(Multisample.MinSampleShadingValue), NO_EXTRA" ],
>> +
>>  # GL_SGIS_generate_mipmap
>>[ "GENERATE_MIPMAP_HINT_SGIS", "CONTEXT_ENUM(Hint.GenerateMipmap), NO_EXTRA" ],
>>
>> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
>> index 053514d..5520e86 100644
>> --- a/src/mesa/main/mtypes.h
>> +++ b/src/mesa/main/mtypes.h
>> @@ -872,6 +872,8 @@ struct gl_multisample_attrib
>> GLboolean SampleCoverage;
>> GLfloat SampleCoverageValue;
>> GLboolean SampleCoverageInvert;
>> +   GLboolean SampleShading;
>> +   GLfloat MinSampleShadingValue;
>>
>> /* ARB_texture_multisample / GL3.2 additions */
>> GLboolean SampleMask;
>> diff --git a/src/mesa/main/multisample.c b/src/mesa/main/multisample.c
>> index bd97c50..892525e 100644
>> --- a/src/mesa/main/multisample.c
>> +++ b/src/mesa/main/multisample.c
>> @@ -119,6 +119,19 @@ _mesa_SampleMaski(GLuint index, GLbitfield mask)
>> ctx->Multisample.SampleMaskValue = mask;
>>  }
>>
>> +/**
>> + * Called via glMinSampleShadingARB
>> + */
>> +void GLAPIENTRY
>> +_mesa_MinSampleShading(GLclampf value)
>> +{
>> +   GET_CURRENT_CONTEXT(ctx);
>> +
>> +   FLUSH_VERTICES(ctx, 0);
>> +
>> +   ctx->Multisample.MinSampleShadingValue = (GLfloat) CLAMP(value, 0.0, 1.0);
>> +   ctx->NewState |= _NEW_MULTISAMPLE;
>> +}
>>
>>  /**
>>   * Helper for checking a requested sample count against the limit
>> diff --git a/src/mesa/main/multisample.h b/src/mesa/main/multisample.h
>> index 66848d2..7441d3e 100644
>> --- a/src/mesa/main/multisample.h
>> +++ b/src/mesa/main/multisample.h
>> @@ -44,6 +44,8 @@ _mesa_GetMultisamplefv(GLenum pname, GLuint index, GLfloat* val);
>>  extern vo

[Mesa-dev] [PATCH] vbo: access VBO memory more efficiently when building display lists

2013-10-14 Thread Brian Paul
Use GL_MAP_INVALIDATE_RANGE, UNSYNCHRONIZED and FLUSH_EXPLICIT flags
when mapping VBOs during display list compilation.  This mirrors what
we do for immediate-mode VBO building in vbo_exec_vtx_map().

This improves performance for applications which interleave display
list compilation with execution.  For example:

glNewList(A);
glBegin/End prims;
glEndList();
glCallList(A);
glNewList(B);
glBegin/End prims;
glEndList();
glCallList(B);

Mesa's vbo module tries to combine the vertex data from lists A and B
into the same VBO when there's room.  Before, when we mapped the VBO for
building list B, we did so with GL_MAP_WRITE_BIT only.  Even though we
were writing to an unused part of the buffer, the map would stall until
the preceding drawing call finished.

Use the extra map flags and FlushMappedBufferRange() to avoid the stall.
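
The range arithmetic described above can be sketched in plain C without any GL objects. This is only an illustration under stated assumptions: struct store_layout and the function names are hypothetical stand-ins, and the sketch mirrors the offset/size/base-pointer computations in the diff below rather than calling any driver hook.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the vertex store: sizes only, no GL state. */
struct store_layout {
   size_t buffer_size;   /* total VBO size in bytes */
   size_t used_floats;   /* floats already written by earlier lists */
};

/* Byte offset of the first free float; the new map starts here. */
size_t
free_range_offset(const struct store_layout *s)
{
   return s->used_floats * sizeof(float);
}

/* Bytes still free after the region written by earlier lists. */
size_t
free_range_size(const struct store_layout *s)
{
   size_t offset = free_range_offset(s);
   assert(offset <= s->buffer_size);
   return s->buffer_size - offset;
}

/* Recover the address of the start of the whole buffer from the
 * pointer returned for the mapped free range (range - used).
 */
float *
whole_buffer_base(float *range, const struct store_layout *s)
{
   return range - s->used_floats;
}

/* Bytes written since the map began, i.e. the length to flush. */
size_t
flush_length(size_t new_used_floats, size_t map_offset)
{
   size_t end = new_used_floats * sizeof(float);
   assert(end >= map_offset);
   return end - map_offset;
}
```

For a 1024-byte buffer with 64 floats already used, the map would start at 64 * sizeof(float) and cover the rest of the buffer; if the list then grows the store to 100 floats, the flushed length is 36 * sizeof(float).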
---
 src/mesa/vbo/vbo_save_api.c |   39 +++
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/src/mesa/vbo/vbo_save_api.c b/src/mesa/vbo/vbo_save_api.c
index b5f9517..411c006 100644
--- a/src/mesa/vbo/vbo_save_api.c
+++ b/src/mesa/vbo/vbo_save_api.c
@@ -237,16 +237,31 @@ GLfloat *
 vbo_save_map_vertex_store(struct gl_context *ctx,
   struct vbo_save_vertex_store *vertex_store)
 {
+   const GLbitfield access = (GL_MAP_WRITE_BIT |
+  GL_MAP_INVALIDATE_RANGE_BIT |
+  GL_MAP_UNSYNCHRONIZED_BIT |
+  GL_MAP_FLUSH_EXPLICIT_BIT);
+
assert(vertex_store->bufferobj);
-   assert(!vertex_store->buffer);
+   assert(!vertex_store->buffer);  /* the buffer should not be mapped */
+
if (vertex_store->bufferobj->Size > 0) {
-  vertex_store->buffer =
- (GLfloat *) ctx->Driver.MapBufferRange(ctx, 0,
-vertex_store->bufferobj->Size,
-GL_MAP_WRITE_BIT,  /* not used */
-vertex_store->bufferobj);
-  assert(vertex_store->buffer);
-  return vertex_store->buffer + vertex_store->used;
+  /* Map the remaining free space in the VBO */
+  GLintptr offset = vertex_store->used * sizeof(GLfloat);
+  GLsizeiptr size = vertex_store->bufferobj->Size - offset;
+  GLfloat *range = (GLfloat *)
+ ctx->Driver.MapBufferRange(ctx, offset, size, access,
+vertex_store->bufferobj);
+  if (range) {
+ /* compute address of start of whole buffer (needed elsewhere) */
+ vertex_store->buffer = range - vertex_store->used;
+ assert(vertex_store->buffer);
+ return range;
+  }
+  else {
+ vertex_store->buffer = NULL;
+ return NULL;
+  }
}
else {
   /* probably ran out of memory for buffers */
@@ -260,6 +275,14 @@ vbo_save_unmap_vertex_store(struct gl_context *ctx,
 struct vbo_save_vertex_store *vertex_store)
 {
if (vertex_store->bufferobj->Size > 0) {
+  GLintptr offset = 0;
+  GLsizeiptr length = vertex_store->used * sizeof(GLfloat)
+ - vertex_store->bufferobj->Offset;
+
+  /* Explicitly flush the region we wrote to */
+  ctx->Driver.FlushMappedBufferRange(ctx, offset, length,
+ vertex_store->bufferobj);
+
   ctx->Driver.UnmapBuffer(ctx, vertex_store->bufferobj);
}
vertex_store->buffer = NULL;
-- 
1.7.10.4



Re: [Mesa-dev] [PATCH] build: remove forced -fno-rtti

2013-10-14 Thread Francisco Jerez
Alexander von Gluck IV  writes:

> * As discussed on the mailing list,
>   forced no-rtti breaks C++ public
>   API's such as the Haiku C++ libGL.so
> * -fno-rtti *can* be still set however
>   instead of blindly forcing -fno-rtti,
>   we can rely on the llvm-config
>   --cppflags output.
>   If the system llvm is built without
>   rtti (default), the no-rtti flag will be
>   present in llvm-config --cppflags
>   (which we pick up on)
>   If llvm is built with rtti
>   (REQUIRES_RTTI=1), then -fno-rtti is
>   removed from llvm-config --cppflags.
> * We could selectively add / remove rtti
>   from various components, however mixing
>   rtti and non-rtti code is tricky and
>   could introduce bugs.
> * This needs impact tested.

This looks like the right thing to do to me,

Reviewed-by: Francisco Jerez 

Thanks.

> ---
>  configure.ac  | 1 -
>  scons/llvm.py | 3 ---
>  src/gallium/auxiliary/Makefile.am | 6 --
>  3 files changed, 10 deletions(-)
>
> diff --git a/configure.ac b/configure.ac
> index 0d082d2..3335575 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -1943,7 +1943,6 @@ AM_CONDITIONAL(HAVE_LOADER_GALLIUM, test x$enable_gallium_loader = xyes)
>  AM_CONDITIONAL(HAVE_DRM_LOADER_GALLIUM, test x$enable_gallium_drm_loader = xyes)
>  AM_CONDITIONAL(HAVE_GALLIUM_COMPUTE, test x$enable_opencl = xyes)
>  AM_CONDITIONAL(HAVE_MESA_LLVM, test x$MESA_LLVM = x1)
> -AM_CONDITIONAL(LLVM_NEEDS_FNORTTI, test $LLVM_VERSION_INT -ge 302)
>  
>  AC_SUBST([ELF_LIB])
>  
> diff --git a/scons/llvm.py b/scons/llvm.py
> index 7cd609c..c1c3736 100644
> --- a/scons/llvm.py
> +++ b/scons/llvm.py
> @@ -195,9 +195,6 @@ def generate(env):
>  if llvm_version >= distutils.version.LooseVersion('3.1'):
>  components.append('mcjit')
>  
> -if llvm_version >= distutils.version.LooseVersion('3.2'):
> -env.Append(CXXFLAGS = ('-fno-rtti',))
> -
>  env.ParseConfig('llvm-config --libs ' + ' '.join(components))
>  env.ParseConfig('llvm-config --ldflags')
>  except OSError:
> diff --git a/src/gallium/auxiliary/Makefile.am b/src/gallium/auxiliary/Makefile.am
> index 670e124..2d2d8d4 100644
> --- a/src/gallium/auxiliary/Makefile.am
> +++ b/src/gallium/auxiliary/Makefile.am
> @@ -25,12 +25,6 @@ AM_CXXFLAGS += \
>   $(GALLIUM_CFLAGS) \
>   $(LLVM_CXXFLAGS)
>  
> -if LLVM_NEEDS_FNORTTI
> -
> -AM_CXXFLAGS += -fno-rtti
> -
> -endif
> -
>  libgallium_la_SOURCES += \
>   $(GALLIVM_SOURCES) \
>   $(GALLIVM_CPP_SOURCES)
> -- 
> 1.8.4
>




[Mesa-dev] [Bug 70471] New: undefined reference to `typeinfo for llvm::format_object_base'

2013-10-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=70471

  Priority: medium
Bug ID: 70471
  Keywords: regression
CC: curroje...@riseup.net, kallis...@unixzen.com
  Assignee: mesa-dev@lists.freedesktop.org
   Summary: undefined reference to `typeinfo for
llvm::format_object_base'
  Severity: blocker
Classification: Unclassified
OS: All
  Reporter: v...@freedesktop.org
  Hardware: x86-64 (AMD64)
Status: NEW
   Version: git
 Component: Other
   Product: Mesa

mesa: ce8eadb6e8adc24f675b364e0620dbf1c9e079a8 (master)

$ scons
[...]
  Linking build/linux-x86_64-debug/gallium/drivers/llvmpipe/lp_test_arit ...
build/linux-x86_64-debug/gallium/auxiliary/libgallium.a(lp_bld_debug.os):(.data.rel.ro._ZTIN4llvm14format_object1ImEE[_ZTIN4llvm14format_object1ImEE]+0x10):
undefined reference to `typeinfo for llvm::format_object_base'
build/linux-x86_64-debug/gallium/auxiliary/libgallium.a(lp_bld_debug.os):(.data.rel.ro._ZTI18BufferMemoryObject[_ZTI18BufferMemoryObject]+0x10):
undefined reference to `typeinfo for llvm::MemoryObject'
build/linux-x86_64-debug/gallium/auxiliary/libgallium.a(lp_bld_debug.os):(.data.rel.ro._ZTI17raw_debug_ostream[_ZTI17raw_debug_ostream]+0x10):
undefined reference to `typeinfo for llvm::raw_ostream'


ce8eadb6e8adc24f675b364e0620dbf1c9e079a8 is the first bad commit
commit ce8eadb6e8adc24f675b364e0620dbf1c9e079a8
Author: Alexander von Gluck IV 
Date:   Sat Oct 12 17:12:31 2013 +

build: remove forced -fno-rtti

* As discussed on the mailing list,
  forced no-rtti breaks C++ public
  API's such as the Haiku C++ libGL.so
* -fno-rtti *can* be still set however
  instead of blindly forcing -fno-rtti,
  we can rely on the llvm-config
  --cppflags output.
  If the system llvm is built without
  rtti (default), the no-rtti flag will be
  present in llvm-config --cppflags
  (which we pick up on)
  If llvm is built with rtti
  (REQUIRES_RTTI=1), then -fno-rtti is
  removed from llvm-config --cppflags.
* We could selectively add / remove rtti
  from various components, however mixing
  rtti and non-rtti code is tricky and
  could introduce missing symbols.
* This needs impact tested.

Reviewed-by: Francisco Jerez 

:100644 100644 c68e14b44c0bc24a74e4f5870562454ac4389846 309b49385ba2dfe16d3a55f98b181ce7ba9d0348 M	configure.ac
:040000 040000 d72fe30b21c27539e9f03703b0f53967321e4e47 b65a235d75fa650f6545b2416f35ee92468a66db M	scons
:040000 040000 02c9dba57101dfb777d34fb8393623c6dd2c923b f9810052c6cd076b6644f99aa86805939ec0e563 M	src
bisect run success

-- 
You are receiving this mail because:
You are the assignee for the bug.


[Mesa-dev] [Bug 70471] undefined reference to `typeinfo for llvm::format_object_base'

2013-10-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=70471

--- Comment #1 from Alexander von Gluck  ---
Odd. This looks like non-rtti code is getting linked against rtti code.

Was this a clean build of Mesa (i.e., running scons -c before running scons)?
I don't think the build system will pick up on changes in cflags / cppflags.

If this was a clean build, could you grab the output of llvm-config --cppflags?



[Mesa-dev] [Bug 70471] undefined reference to `typeinfo for llvm::format_object_base'

2013-10-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=70471

--- Comment #2 from Alexander von Gluck  ---
Just did a clean mesa build on my ArchLinux machine + LLVM 3.3.  No issues
seen.



[Mesa-dev] [Bug 70471] undefined reference to `typeinfo for llvm::format_object_base'

2013-10-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=70471

--- Comment #3 from Vinson Lee  ---
The build failure occurs with a clean build.

$ cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=13.04
DISTRIB_CODENAME=raring
DISTRIB_DESCRIPTION="Ubuntu 13.04"

$ llvm-config --cppflags
-I/usr/lib/llvm-3.2/include  -DNDEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS
-D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS

$ llvm-config --cflags
-I/usr/lib/llvm-3.2/include  -DNDEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS
-D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -g -O2 -fomit-frame-pointer -fPIC

$ llvm-config --cxxflags
-I/usr/lib/llvm-3.2/include  -DNDEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS
-D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -g -O2 -fomit-frame-pointer
-fvisibility-inlines-hidden -fno-exceptions -fno-rtti -fPIC
-Woverloaded-virtual -Wcast-qual

$ llvm-config --ldflags
-L/usr/lib/llvm-3.2/lib  -lpthread -lffi -ldl -lm
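
One thing the output above seems to show is that -fno-rtti is present in --cxxflags but absent from --cppflags on this system, which may be why a build that relies only on --cppflags ends up compiling with RTTI enabled. A minimal C sketch of that check follows; has_flag is a hypothetical helper written for this illustration and is not part of Mesa or LLVM.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if `flag` appears in `flags` as a whole token,
 * where tokens are separated by spaces or newlines.
 */
bool
has_flag(const char *flags, const char *flag)
{
   size_t len = strlen(flag);
   assert(len > 0);

   for (const char *p = strstr(flags, flag); p; p = strstr(p + 1, flag)) {
      bool starts = (p == flags) || p[-1] == ' ' || p[-1] == '\n';
      bool ends = p[len] == '\0' || p[len] == ' ' || p[len] == '\n';
      if (starts && ends)
         return true;
   }
   return false;
}
```

Fed the two strings printed above, has_flag(cppflags, "-fno-rtti") would be false and has_flag(cxxflags, "-fno-rtti") would be true.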



[Mesa-dev] [Bug 70471] undefined reference to `typeinfo for llvm::format_object_base'

2013-10-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=70471

--- Comment #4 from Alexander von Gluck  ---
hm.. do you get the same results using the ./configure and a make?



[Mesa-dev] [Bug 70471] undefined reference to `typeinfo for llvm::format_object_base'

2013-10-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=70471

--- Comment #5 from Vinson Lee  ---
(In reply to comment #4)
> hm.. do you get the same results using the ./configure and a make?

configure and make pass for me.
