[Mesa-dev] [PATCH 00/24] Support for ARB_shader_atomic_counters.
This patch series implements support for the ARB_shader_atomic_counters extension, which has been part of core GL since version 4.2. It includes patches adding support for the new API and GLSL language features, as well as working back-end code for Intel Gen7 hardware. Ivy Bridge should work with these patches alone; Haswell is going to need a small kernel change that I'll probably submit for review during the next week.

The series depends on Ken's surface state tidying patches [1] and on patches 1-4, which are seemingly unrelated fixes. There's also a series of ~30 unit tests for this extension that I will send to the piglit mailing list soon.

Thanks.

[1] http://lists.freedesktop.org/archives/mesa-dev/2013-September/044691.html

[PATCH 01/24] mesa: Fix misplaced includes of "main/uniforms.h".
[PATCH 02/24] glsl: Initialize all member variables of _mesa_glsl_parse_state on construction.
[PATCH 03/24] i965: Initialize all member variables of vec4_instruction on construction.
[PATCH 04/24] ralloc: Unify overloads of the new operator and guarantee object destruction.
[PATCH 05/24] glapi: Add support for ARB_shader_atomic_counters.
[PATCH 06/24] mesa: Add support for ARB_shader_atomic_counters.
[PATCH 07/24] glsl: Add extension enables for ARB_shader_atomic_counters.
[PATCH 08/24] glsl: Add new atomic_uint built-in GLSL type.
[PATCH 09/24] glsl: Add IR node for atomic operations.
[PATCH 10/24] glsl: Implement parser support for atomic counters.
[PATCH 11/24] glsl: Add built-in functions and constants required for ARB_shader_atomic_counters.
[PATCH 12/24] glsl: Add predicate to determine if an IR node has side effects.
[PATCH 13/24] glsl: Linker support for ARB_shader_atomic_counters.
[PATCH 14/24] i965: Define vtbl method that initializes an untyped R/W surface.
[PATCH 15/24] i965: Implement ABO surface state emission.
[PATCH 16/24] i965/gen7: Implement code generation for untyped atomic instructions.
[PATCH 17/24] i965/gen7: Implement code generation for untyped surface read instructions.
[PATCH 18/24] i965: Add a 'has_side_effects' back-end instruction predicate.
[PATCH 19/24] i965: Handle the 'atomic_uint' GLSL type.
[PATCH 20/24] i965: Add brw_reg constructors taking a dynamically determined vector width.
[PATCH 21/24] i965/gen7: Handle atomic instructions from the FS back-end.
[PATCH 22/24] i965/gen7: Handle atomic instructions from the VEC4 back-end.
[PATCH 23/24] i965/gen7: Expose ARB_shader_atomic_counters.
[PATCH 24/24] i965: Simplify the shader time code by using atomic counter helpers.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 02/24] glsl: Initialize all member variables of _mesa_glsl_parse_state on construction.
The _mesa_glsl_parse_state object relies on the memory allocator zeroing out its contents before it's initialized, which seems rather risky. One of the following commits will homogenize implementations of the new operator in a way that breaks this assumption, which would leave some of the member variables of this object uninitialized.
---
 src/glsl/glsl_parser_extras.cpp | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp
index cac5a18..772933f 100644
--- a/src/glsl/glsl_parser_extras.cpp
+++ b/src/glsl/glsl_parser_extras.cpp
@@ -55,7 +55,7 @@ static unsigned known_desktop_glsl_versions[] =
 _mesa_glsl_parse_state::_mesa_glsl_parse_state(struct gl_context *_ctx,
                                                GLenum target, void *mem_ctx)
-   : ctx(_ctx)
+   : ctx(_ctx), switch_state()
 {
    switch (target) {
    case GL_VERTEX_SHADER: this->target = vertex_shader; break;
@@ -66,10 +66,14 @@ _mesa_glsl_parse_state::_mesa_glsl_parse_state(struct gl_context *_ctx,
    this->scanner = NULL;
    this->translation_unit.make_empty();
    this->symbols = new(mem_ctx) glsl_symbol_table;
+
+   this->num_uniform_blocks = 0;
+   this->uniform_block_array_size = 0;
+   this->uniform_blocks = NULL;
+
    this->info_log = ralloc_strdup(mem_ctx, "");
    this->error = false;
    this->loop_nesting_ast = NULL;
-   this->switch_state.switch_nesting_ast = NULL;
 
    this->struct_specifier_depth = 0;
    this->num_builtins_to_link = 0;
@@ -105,6 +109,13 @@ _mesa_glsl_parse_state::_mesa_glsl_parse_state(struct gl_context *_ctx,
 
    this->Const.MaxDrawBuffers = ctx->Const.MaxDrawBuffers;
 
+   this->current_function = NULL;
+   this->toplevel_ir = NULL;
+   this->found_return = false;
+   this->all_invariant = false;
+   this->user_structures = NULL;
+   this->num_user_structures = 0;
+
    /* Populate the list of supported GLSL versions */
    /* FINISHME: Once the OpenGL 3.0 'forward compatible' context or
     * the OpenGL 3.2 Core context is supported, this logic will need
@@ -163,6 +174,7 @@ _mesa_glsl_parse_state::_mesa_glsl_parse_state(struct gl_context *_ctx,
 
    this->gs_input_prim_type_specified = false;
    this->gs_input_prim_type = GL_POINTS;
+   this->gs_input_size = 0;
    this->out_qualifier = new(this) ast_type_qualifier();
 }
 
--
1.8.3.4
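The hazard described in this patch can be reproduced in miniature. The sketch below uses hypothetical `parse_state` and `switch_info` types rather than Mesa's real class: it fills the storage with garbage bytes, as a non-zeroing allocator might leave it, and then constructs the object with placement new. Only explicitly initialized members have well-defined values; the `switch_state()` entry in the member-initializer list value-initializes a plain aggregate, which zeroes all of its fields, as the patch relies on.

```cpp
#include <cassert>
#include <cstring>
#include <new>

// Hypothetical plain aggregate, standing in for the parser's switch_state.
struct switch_info {
   void *test_var;
   bool is_fallthru;
};

// Hypothetical stand-in for a parser-state object; not Mesa's real class.
struct parse_state {
   switch_info switch_state;
   unsigned num_uniform_blocks;
   void *uniform_blocks;
   bool found_return;
   unsigned gs_input_size;

   // Every member is initialized explicitly. 'switch_state()' in the
   // member-initializer list value-initializes the aggregate, zeroing all
   // of its fields, so nothing depends on the allocator zeroing memory.
   parse_state()
      : switch_state(), num_uniform_blocks(0), uniform_blocks(nullptr),
        found_return(false), gs_input_size(0)
   {
   }
};

// Fills 'storage' with 0xAB bytes, as a non-zeroing allocator might leave
// it, then constructs a parse_state there with placement new.
parse_state *construct_in_dirty_storage(void *storage)
{
   std::memset(storage, 0xAB, sizeof(parse_state));
   return new (storage) parse_state;
}
```

Callers must pass storage that is suitably aligned and at least `sizeof(parse_state)` bytes, for example an `alignas(parse_state) unsigned char` buffer.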
[Mesa-dev] [PATCH 01/24] mesa: Fix misplaced includes of "main/uniforms.h".
Several C++ source files include "main/uniforms.h" from within an extern "C" block. This is unnecessary, because "uniforms.h" already checks for a C++ compiler and sets the right linkage itself, and it is incorrect, because the header includes other C++ headers ("glsl_types.h" and "ir_uniform.h") whose declarations are supposed to get C++ linkage.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp         | 2 +-
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 2 +-
 src/mesa/main/ff_fragment_shader.cpp         | 1 -
 src/mesa/program/ir_to_mesa.cpp              | 2 +-
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp   | 2 +-
 5 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index daa23b4..a98e7c7 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -35,7 +35,6 @@ extern "C" {
 #include "main/hash_table.h"
 #include "main/macros.h"
 #include "main/shaderobj.h"
-#include "main/uniforms.h"
 #include "main/fbobject.h"
 #include "program/prog_parameter.h"
 #include "program/prog_print.h"
@@ -47,6 +46,7 @@ extern "C" {
 #include "brw_wm.h"
 }
 #include "brw_fs.h"
+#include "main/uniforms.h"
 #include "glsl/glsl_types.h"
 
 void
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index d935c7b..0345329 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -33,7 +33,6 @@ extern "C" {
 
 #include "main/macros.h"
 #include "main/shaderobj.h"
-#include "main/uniforms.h"
 #include "program/prog_parameter.h"
 #include "program/prog_print.h"
 #include "program/prog_optimize.h"
@@ -45,6 +44,7 @@ extern "C" {
 #include "brw_wm.h"
 }
 #include "brw_fs.h"
+#include "main/uniforms.h"
 #include "glsl/glsl_types.h"
 #include "glsl/ir_optimization.h"
 
diff --git a/src/mesa/main/ff_fragment_shader.cpp b/src/mesa/main/ff_fragment_shader.cpp
index 86317ef..01edd3f 100644
--- a/src/mesa/main/ff_fragment_shader.cpp
+++ b/src/mesa/main/ff_fragment_shader.cpp
@@ -32,7 +32,6 @@ extern "C" {
 #include "imports.h"
 #include "mtypes.h"
 #include "main/context.h"
-#include "main/uniforms.h"
 #include "main/macros.h"
 #include "main/samplerobj.h"
 #include "program/program.h"
diff --git a/src/mesa/program/ir_to_mesa.cpp b/src/mesa/program/ir_to_mesa.cpp
index 510235c..8bc5412 100644
--- a/src/mesa/program/ir_to_mesa.cpp
+++ b/src/mesa/program/ir_to_mesa.cpp
@@ -44,11 +44,11 @@
 
 #include "main/mtypes.h"
 #include "main/shaderobj.h"
+#include "main/uniforms.h"
 #include "program/hash_table.h"
 
 extern "C" {
 #include "main/shaderapi.h"
-#include "main/uniforms.h"
 #include "program/prog_instruction.h"
 #include "program/prog_optimize.h"
 #include "program/prog_print.h"
diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index 1c9174c..ff1ebd5 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -43,11 +43,11 @@
 
 #include "main/mtypes.h"
 #include "main/shaderobj.h"
+#include "main/uniforms.h"
 #include "program/hash_table.h"
 
 extern "C" {
 #include "main/shaderapi.h"
-#include "main/uniforms.h"
 #include "program/prog_instruction.h"
 #include "program/prog_optimize.h"
 #include "program/prog_print.h"
--
1.8.3.4
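The reason the explicit wrapper is unnecessary is that a header meant for both languages can declare its own linkage. The single-file sketch below, with hypothetical names, shows the usual `__cplusplus` guard pattern. C++-only declarations, such as the template here (standing in for what "glsl_types.h" provides), stay outside the guard, because a template cannot have C language linkage. Wrapping the whole `#include` in `extern "C"` would incorrectly pull such declarations into C linkage.

```cpp
#include <cassert>

// ---- What a mixed C/C++ header typically looks like (hypothetical). ----

// C++-only declarations stay outside any extern "C" block: templates
// cannot have C language linkage.
template <typename T>
T twice(T x) { return x + x; }

#ifdef __cplusplus
extern "C" {
#endif

// Functions meant to be callable from C get C linkage from the header
// itself, so includers never need to wrap the #include in extern "C".
int example_get_uniform_count(void);

#ifdef __cplusplus
} /* extern "C" */
#endif

// ---- Definition, as it might appear in the matching source file. ----
extern "C" int example_get_uniform_count(void) { return twice(21); }
```

A C++ translation unit can include such a header directly. A C translation unit sees only the guarded declarations if the C++-only parts are themselves hidden behind `#ifdef __cplusplus`.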
[Mesa-dev] [PATCH 03/24] i965: Initialize all member variables of vec4_instruction on construction.
Ditto: otherwise some of its member variables are left with uninitialized contents whenever its memory is not allocated with rzalloc().
---
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 304636a..9770f13 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -38,7 +38,22 @@ vec4_instruction::vec4_instruction(vec4_visitor *v,
    this->src[0] = src0;
    this->src[1] = src1;
    this->src[2] = src2;
+   this->saturate = false;
+   this->force_writemask_all = false;
+   this->no_dd_clear = false;
+   this->no_dd_check = false;
+   this->conditional_mod = BRW_CONDITIONAL_NONE;
+   this->sampler = 0;
+   this->texture_offset = 0;
+   this->target = 0;
+   this->shadow_compare = false;
    this->ir = v->base_ir;
+   this->urb_write_flags = BRW_URB_WRITE_NO_FLAGS;
+   this->header_present = false;
+   this->mlen = 0;
+   this->base_mrf = 0;
+   this->offset = 0;
+   this->ir = NULL;
    this->annotation = v->current_annotation;
 }
 
--
1.8.3.4
[Mesa-dev] [PATCH 04/24] ralloc: Unify overloads of the new operator and guarantee object destruction.
This patch introduces a pair of helper functions providing a common implementation of the "new" and "delete" operators for all C++ classes that are allocated by ralloc via placement new. The 'ralloc_new' helper function takes care of setting up an ralloc destructor callback that will call the appropriate destructor before the memory allocated to an object is released.

Until now, objects had to call 'ralloc_set_destructor' explicitly with a pointer to a static method, which in turn called the actual destructor, to get behavior that should have been transparent to them. After this patch they'll only need to call 'ralloc_new' from the new operator and 'ralloc_delete' from the delete operator, turning all overloads of new and delete into one-liners.
---
 src/glsl/ast.h                                     | 26 +++--
 src/glsl/glsl_parser_extras.h                      |  9 +
 src/glsl/glsl_symbol_table.cpp                     |  7 +---
 src/glsl/glsl_symbol_table.h                       | 23 +--
 src/glsl/glsl_types.h                              | 11 +-
 src/glsl/ir_function_detect_recursion.cpp          | 11 +-
 src/glsl/list.h                                    | 22 ++-
 src/glsl/loop_analysis.h                           | 14 +--
 src/glsl/ralloc.h                                  | 44 ++
 src/mesa/drivers/dri/i965/brw_cfg.h                | 14 +--
 src/mesa/drivers/dri/i965/brw_fs.h                 | 21 ++-
 src/mesa/drivers/dri/i965/brw_fs_live_variables.h  |  7 +---
 src/mesa/drivers/dri/i965/brw_vec4.h               | 21 ++-
 .../drivers/dri/i965/brw_vec4_live_variables.h     |  7 +---
 src/mesa/program/ir_to_mesa.cpp                    |  7 +---
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp         |  7 +---
 16 files changed, 77 insertions(+), 174 deletions(-)

diff --git a/src/glsl/ast.h b/src/glsl/ast.h
index 1c7fc63..26c4701 100644
--- a/src/glsl/ast.h
+++ b/src/glsl/ast.h
@@ -53,19 +53,12 @@ public:
     * easier to just ralloc_free 'ctx' (or any of its ancestors). */
    static void* operator new(size_t size, void *ctx)
    {
-      void *node;
-
-      node = rzalloc_size(ctx, size);
-      assert(node != NULL);
-
-      return node;
+      return ralloc_new(size, ctx);
    }
 
-   /* If the user *does* call delete, that's OK, we will just
-    * ralloc_free in that case. */
-   static void operator delete(void *table)
+   static void operator delete(void *p)
    {
-      ralloc_free(table);
+      ralloc_delete(p);
    }
 
    /**
@@ -367,19 +360,12 @@ struct ast_type_qualifier {
     * easier to just ralloc_free 'ctx' (or any of its ancestors). */
    static void* operator new(size_t size, void *ctx)
    {
-      void *node;
-
-      node = rzalloc_size(ctx, size);
-      assert(node != NULL);
-
-      return node;
+      return ralloc_new(size, ctx);
    }
 
-   /* If the user *does* call delete, that's OK, we will just
-    * ralloc_free in that case. */
-   static void operator delete(void *table)
+   static void operator delete(void *p)
    {
-      ralloc_free(table);
+      ralloc_delete(p);
    }
 
    union {
diff --git a/src/glsl/glsl_parser_extras.h b/src/glsl/glsl_parser_extras.h
index 2e2440a..6c2a63e 100644
--- a/src/glsl/glsl_parser_extras.h
+++ b/src/glsl/glsl_parser_extras.h
@@ -77,17 +77,12 @@ struct _mesa_glsl_parse_state {
     * easier to just ralloc_free 'ctx' (or any of its ancestors). */
    static void* operator new(size_t size, void *ctx)
    {
-      void *mem = rzalloc_size(ctx, size);
-      assert(mem != NULL);
-
-      return mem;
+      return ralloc_new<_mesa_glsl_parse_state>(size, ctx);
    }
 
-   /* If the user *does* call delete, that's OK, we will just
-    * ralloc_free in that case. */
    static void operator delete(void *mem)
    {
-      ralloc_free(mem);
+      ralloc_delete(mem);
    }
 
    /**
diff --git a/src/glsl/glsl_symbol_table.cpp b/src/glsl/glsl_symbol_table.cpp
index 4c96620..11fe06e 100644
--- a/src/glsl/glsl_symbol_table.cpp
+++ b/src/glsl/glsl_symbol_table.cpp
@@ -30,15 +30,12 @@ public:
     * easier to just ralloc_free 'ctx' (or any of its ancestors). */
    static void* operator new(size_t size, void *ctx)
    {
-      void *entry = ralloc_size(ctx, size);
-      assert(entry != NULL);
-      return entry;
+      return ralloc_new(size, ctx);
    }
 
-   /* If the user *does* call delete, that's OK, we will just ralloc_free. */
    static void operator delete(void *entry)
    {
-      ralloc_free(entry);
+      ralloc_delete(entry);
    }
 
    bool add_interface(const glsl_type *i, enum ir_variable_mode mode)
diff --git a/src/glsl/glsl_symbol_table.h b/src/glsl/glsl_symbol_table.h
index 62d26b8..f850d9f 100644
--- a/src/glsl/glsl_symbol_table.h
+++ b/src/glsl/glsl_symbol_table.h
@@ -43,35 +43,16 @@ class symbol_table_entry;
  * type safe and some symbol table invariants.
  */
 struct glsl_symbol_table {
-private:
-   static void
-   _glsl_symbol_table_
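The mechanism described in the commit message, allocating from a parent context and registering a per-type destructor callback so that freeing the context also runs destructors, can be modeled without ralloc. The sketch below is a toy: `arena`, `arena_new` and `destroy_object` are hypothetical names standing in for an ralloc context, `ralloc_new` and the destructor callback, and the real helpers' signatures may differ. As in ralloc, objects are expected to die together with their context, so the toy `node` class defines no `operator delete`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Toy stand-in for a ralloc context: it owns raw blocks plus an optional
// destructor callback per block. Hypothetical; not Mesa's ralloc API.
struct arena {
   struct block { void *mem; void (*dtor)(void *); };
   std::vector<block> blocks;

   arena() = default;
   arena(const arena &) = delete;
   arena &operator=(const arena &) = delete;

   ~arena()
   {
      // Release blocks in reverse order, running each destructor first.
      for (auto it = blocks.rbegin(); it != blocks.rend(); ++it) {
         if (it->dtor)
            it->dtor(it->mem);
         std::free(it->mem);
      }
   }
};

// Destructor callback: runs T's destructor on a block of memory.
template <typename T>
void destroy_object(void *p) { static_cast<T *>(p)->~T(); }

// 'ralloc_new'-style helper: zeroed storage owned by 'ctx', with T's
// destructor registered to run when the context is released.
template <typename T>
void *arena_new(std::size_t size, arena *ctx)
{
   void *mem = std::calloc(1, size);
   if (!mem)
      throw std::bad_alloc();
   try {
      ctx->blocks.push_back({mem, &destroy_object<T>});
   } catch (...) {
      std::free(mem);
      throw;
   }
   return mem;
}

// With the helper, the class-specific operator new becomes a one-liner.
struct node {
   static int live;   // number of constructed, not yet destroyed nodes
   node() noexcept { ++live; }
   ~node() { --live; }

   static void *operator new(std::size_t size, arena *ctx)
   {
      return arena_new<node>(size, ctx);
   }
};
int node::live = 0;
```

Usage: `new (&a) node()` allocates from arena `a`. When `a` goes out of scope, `~node()` runs automatically, and no per-class static destructor trampoline is needed.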
[Mesa-dev] [PATCH 08/24] glsl: Add new atomic_uint built-in GLSL type.
---
 src/glsl/ast_to_hir.cpp                      |  1 +
 src/glsl/builtin_type_macros.h               |  2 ++
 src/glsl/builtin_types.cpp                   |  6 ++
 src/glsl/glsl_types.cpp                      |  2 ++
 src/glsl/glsl_types.h                        | 14 ++
 src/glsl/ir_clone.cpp                        |  1 +
 src/glsl/link_uniform_initializers.cpp       |  1 +
 src/glsl/tests/uniform_initializer_utils.cpp |  3 +++
 src/mesa/program/ir_to_mesa.cpp              |  2 ++
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp   |  1 +
 10 files changed, 33 insertions(+)

diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
index 2316cf8..fcca5df 100644
--- a/src/glsl/ast_to_hir.cpp
+++ b/src/glsl/ast_to_hir.cpp
@@ -902,6 +902,7 @@ do_comparison(void *mem_ctx, int operation, ir_rvalue *op0, ir_rvalue *op1)
    case GLSL_TYPE_VOID:
    case GLSL_TYPE_SAMPLER:
    case GLSL_TYPE_INTERFACE:
+   case GLSL_TYPE_ATOMIC_UINT:
       /* I assume a comparison of a struct containing a sampler just
        * ignores the sampler present in the type.
        */
diff --git a/src/glsl/builtin_type_macros.h b/src/glsl/builtin_type_macros.h
index fec38da..263fd83 100644
--- a/src/glsl/builtin_type_macros.h
+++ b/src/glsl/builtin_type_macros.h
@@ -110,6 +110,8 @@ DECL_TYPE(sampler2DRectShadow, GL_SAMPLER_2D_RECT_SHADOW,GLSL_SAMPLER
 
 DECL_TYPE(samplerExternalOES, GL_SAMPLER_EXTERNAL_OES, GLSL_SAMPLER_DIM_EXTERNAL, 0, 0, GLSL_TYPE_FLOAT)
 
+DECL_TYPE(atomic_uint, GL_UNSIGNED_INT_ATOMIC_COUNTER, GLSL_TYPE_ATOMIC_UINT, 1, 1)
+
 STRUCT_TYPE(gl_DepthRangeParameters)
 STRUCT_TYPE(gl_PointParameters)
 STRUCT_TYPE(gl_MaterialParameters)
diff --git a/src/glsl/builtin_types.cpp b/src/glsl/builtin_types.cpp
index 722eda2..8311a91 100644
--- a/src/glsl/builtin_types.cpp
+++ b/src/glsl/builtin_types.cpp
@@ -203,6 +203,8 @@ const static struct builtin_type_versions {
    T(sampler2DRectShadow,             140, 999)
 
    T(struct_gl_DepthRangeParameters,  110, 100)
+
+   T(atomic_uint,                     130, 999)
 };
 
 const glsl_type *const deprecated_types[] = {
@@ -284,5 +286,9 @@ _mesa_glsl_initialize_types(struct _mesa_glsl_parse_state *state)
    if (state->OES_texture_3D_enable) {
       add_type(symbols, glsl_type::sampler3D_type);
    }
+
+   if (state->ARB_shader_atomic_counters_enable) {
+      add_type(symbols, glsl_type::atomic_uint_type);
+   }
 }
 /** @} */
diff --git a/src/glsl/glsl_types.cpp b/src/glsl/glsl_types.cpp
index 3c396dd..e1fe153 100644
--- a/src/glsl/glsl_types.cpp
+++ b/src/glsl/glsl_types.cpp
@@ -586,6 +586,7 @@ glsl_type::component_slots() const
       return this->length * this->fields.array->component_slots();
 
    case GLSL_TYPE_SAMPLER:
+   case GLSL_TYPE_ATOMIC_UINT:
    case GLSL_TYPE_VOID:
    case GLSL_TYPE_ERROR:
       break;
@@ -874,6 +875,7 @@ glsl_type::count_attribute_slots() const
       return this->length * this->fields.array->count_attribute_slots();
 
    case GLSL_TYPE_SAMPLER:
+   case GLSL_TYPE_ATOMIC_UINT:
    case GLSL_TYPE_VOID:
    case GLSL_TYPE_ERROR:
       break;
diff --git a/src/glsl/glsl_types.h b/src/glsl/glsl_types.h
index acdf48f..d0274e6 100644
--- a/src/glsl/glsl_types.h
+++ b/src/glsl/glsl_types.h
@@ -53,6 +53,7 @@ enum glsl_base_type {
    GLSL_TYPE_FLOAT,
    GLSL_TYPE_BOOL,
    GLSL_TYPE_SAMPLER,
+   GLSL_TYPE_ATOMIC_UINT,
    GLSL_TYPE_STRUCT,
    GLSL_TYPE_INTERFACE,
    GLSL_TYPE_ARRAY,
@@ -434,6 +435,19 @@ struct glsl_type {
    }
 
    /**
+    * Return the amount of atomic counter storage required for a type.
+    */
+   unsigned atomic_size() const
+   {
+      if (base_type == GLSL_TYPE_ATOMIC_UINT)
+         return ATOMIC_COUNTER_SIZE;
+      else if (is_array())
+         return length * element_type()->atomic_size();
+      else
+         return 0;
+   }
+
+   /**
     * Query the full type of a matrix row
     *
     * \return
diff --git a/src/glsl/ir_clone.cpp b/src/glsl/ir_clone.cpp
index fb303b0..b70b7db 100644
--- a/src/glsl/ir_clone.cpp
+++ b/src/glsl/ir_clone.cpp
@@ -385,6 +385,7 @@ ir_constant::clone(void *mem_ctx, struct hash_table *ht) const
    }
 
    case GLSL_TYPE_SAMPLER:
+   case GLSL_TYPE_ATOMIC_UINT:
    case GLSL_TYPE_VOID:
    case GLSL_TYPE_ERROR:
    case GLSL_TYPE_INTERFACE:
diff --git a/src/glsl/link_uniform_initializers.cpp b/src/glsl/link_uniform_initializers.cpp
index 3f66710..786aaf0 100644
--- a/src/glsl/link_uniform_initializers.cpp
+++ b/src/glsl/link_uniform_initializers.cpp
@@ -69,6 +69,7 @@ copy_constant_to_storage(union gl_constant_value *storage,
          break;
       case GLSL_TYPE_ARRAY:
       case GLSL_TYPE_STRUCT:
+      case GLSL_TYPE_ATOMIC_UINT:
       case GLSL_TYPE_INTERFACE:
       case GLSL_TYPE_VOID:
       case GLSL_TYPE_ERROR:
diff --git a/src/glsl/tests/uniform_initializer_utils.cpp b/src/glsl/tests/uniform_initializer_utils.cpp
index a04f5dd..5e86c24 100644
--- a/src/glsl/tests/uniform_initializer_utils.cpp
+++ b/src/glsl/tests/uniform_initia
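The recursion in `atomic_size()` can be checked against a self-contained model. The sketch below uses a hypothetical `toy_type` in place of `glsl_type`. It assumes 4 bytes per counter (one 32-bit unsigned value) for the counter-size constant, since the actual value of `ATOMIC_COUNTER_SIZE` is not shown in this excerpt. Because the result is non-zero exactly when a type is or contains (through arrays) an atomic counter, callers such as the comparison check in ast_to_hir.cpp can also use it as a boolean test.

```cpp
#include <cassert>

// Assumed per-counter storage size in bytes (one 32-bit unsigned counter).
const unsigned TOY_ATOMIC_COUNTER_SIZE = 4;

enum toy_base_type { TOY_UINT, TOY_ATOMIC_UINT, TOY_ARRAY };

// Minimal stand-in for glsl_type: arrays point at their element type.
struct toy_type {
   toy_base_type base_type;
   unsigned length;           // number of elements (arrays only)
   const toy_type *element;   // element type (arrays only)

   bool is_array() const { return base_type == TOY_ARRAY; }

   // Same shape as the atomic_size() method added by the patch: counters
   // contribute their size, arrays multiply, everything else is zero.
   unsigned atomic_size() const
   {
      if (base_type == TOY_ATOMIC_UINT)
         return TOY_ATOMIC_COUNTER_SIZE;
      else if (is_array())
         return length * element->atomic_size();
      else
         return 0;
   }
};
```

For example, an `atomic_uint[2][3]` modeled this way needs 2 × 3 × 4 = 24 bytes under the assumed counter size.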
[Mesa-dev] [PATCH 07/24] glsl: Add extension enables for ARB_shader_atomic_counters.
---
 src/glsl/glsl_parser_extras.cpp | 1 +
 src/glsl/glsl_parser_extras.h   | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp
index 772933f..ff34864 100644
--- a/src/glsl/glsl_parser_extras.cpp
+++ b/src/glsl/glsl_parser_extras.cpp
@@ -530,6 +530,7 @@ static const _mesa_glsl_extension _mesa_glsl_supported_extensions[] = {
    EXT(ARB_gpu_shader5,                true,  false, ARB_gpu_shader5),
    EXT(AMD_vertex_shader_layer,        true,  false, AMD_vertex_shader_layer),
    EXT(EXT_shader_integer_mix,         true,  true,  EXT_shader_integer_mix),
+   EXT(ARB_shader_atomic_counters,     true,  false, ARB_shader_atomic_counters),
 };
 
 #undef EXT
diff --git a/src/glsl/glsl_parser_extras.h b/src/glsl/glsl_parser_extras.h
index 6c2a63e..4ffbf8f 100644
--- a/src/glsl/glsl_parser_extras.h
+++ b/src/glsl/glsl_parser_extras.h
@@ -312,6 +312,8 @@ struct _mesa_glsl_parse_state {
    bool ARB_shading_language_420pack_warn;
    bool EXT_shader_integer_mix_enable;
    bool EXT_shader_integer_mix_warn;
+   bool ARB_shader_atomic_counters_enable;
+   bool ARB_shader_atomic_counters_warn;
    /*@}*/
 
    /** Extensions supported by the OpenGL implementation. */
--
1.8.3.4
[Mesa-dev] [PATCH 05/24] glapi: Add support for ARB_shader_atomic_counters.
Add an XML file for the dispatch code generator, update the dispatch_sanity test, and add a stub definition for the new entry point.
---
 src/mapi/glapi/gen/ARB_shader_atomic_counters.xml | 47 +++
 src/mapi/glapi/gen/Makefile.am                    |  1 +
 src/mapi/glapi/gen/gl_API.xml                     |  2 +
 src/mesa/main/tests/dispatch_sanity.cpp           |  2 +-
 src/mesa/main/uniforms.c                          |  6 +++
 src/mesa/main/uniforms.h                          |  3 ++
 6 files changed, 60 insertions(+), 1 deletion(-)
 create mode 100644 src/mapi/glapi/gen/ARB_shader_atomic_counters.xml

diff --git a/src/mapi/glapi/gen/ARB_shader_atomic_counters.xml b/src/mapi/glapi/gen/ARB_shader_atomic_counters.xml
new file mode 100644
index 000..f3b74e9
--- /dev/null
+++ b/src/mapi/glapi/gen/ARB_shader_atomic_counters.xml
@@ -0,0 +1,47 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
diff --git a/src/mapi/glapi/gen/Makefile.am b/src/mapi/glapi/gen/Makefile.am
index d4fbd35..df96f3c 100644
--- a/src/mapi/glapi/gen/Makefile.am
+++ b/src/mapi/glapi/gen/Makefile.am
@@ -106,6 +106,7 @@ API_XML = \
 	ARB_robustness.xml \
 	ARB_sampler_objects.xml \
 	ARB_seamless_cube_map.xml \
+	ARB_shader_atomic_counters.xml \
 	ARB_sync.xml \
 	ARB_texture_buffer_object.xml \
 	ARB_texture_buffer_range.xml \
diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
index 71aa9a7..bd5bd4a 100644
--- a/src/mapi/glapi/gen/gl_API.xml
+++ b/src/mapi/glapi/gen/gl_API.xml
@@ -8308,6 +8308,8 @@
+http://www.w3.org/2001/XInclude"/>
+
 http://www.w3.org/2001/XInclude"/>
diff --git a/src/mesa/main/tests/dispatch_sanity.cpp b/src/mesa/main/tests/dispatch_sanity.cpp
index bea6e96..5d416f7 100644
--- a/src/mesa/main/tests/dispatch_sanity.cpp
+++ b/src/mesa/main/tests/dispatch_sanity.cpp
@@ -827,7 +827,7 @@ const struct function gl_core_functions_possible[] = {
    { "glDrawTransformFeedbackInstanced", 43, -1 },
    { "glDrawTransformFeedbackStreamInstanced", 43, -1 },
 // { "glGetInternalformativ", 43, -1 },                 // XXX: Add to xml
-// { "glGetActiveAtomicCounterBufferiv", 43, -1 },      // XXX: Add to xml
+   { "glGetActiveAtomicCounterBufferiv", 43, -1 },
 // { "glBindImageTexture", 43, -1 },                    // XXX: Add to xml
 // { "glMemoryBarrier", 43, -1 },                       // XXX: Add to xml
    { "glTexStorage1D", 43, -1 },
diff --git a/src/mesa/main/uniforms.c b/src/mesa/main/uniforms.c
index 1e6f7f4..07e7ea3 100644
--- a/src/mesa/main/uniforms.c
+++ b/src/mesa/main/uniforms.c
@@ -844,3 +844,9 @@ _mesa_get_uniform_name(const struct gl_uniform_storage *uni,
       *length += i;
    }
 }
+
+void GLAPIENTRY
+_mesa_GetActiveAtomicCounterBufferiv(GLuint program, GLuint bufferIndex,
+                                     GLenum pname, GLint *params)
+{
+}
diff --git a/src/mesa/main/uniforms.h b/src/mesa/main/uniforms.h
index 9223917..f7cac63 100644
--- a/src/mesa/main/uniforms.h
+++ b/src/mesa/main/uniforms.h
@@ -142,6 +142,9 @@ _mesa_UniformBlockBinding(GLuint program,
                           GLuint uniformBlockIndex,
                           GLuint uniformBlockBinding);
 void GLAPIENTRY
+_mesa_GetActiveAtomicCounterBufferiv(GLuint program, GLuint bufferIndex,
+                                     GLenum pname, GLint *params);
+void GLAPIENTRY
 _mesa_GetActiveUniformBlockiv(GLuint program,
                               GLuint uniformBlockIndex,
                               GLenum pname,
--
1.8.3.4
[Mesa-dev] [PATCH 09/24] glsl: Add IR node for atomic operations.
Add a subclass of ir_rvalue that represents an atomic operation on some ir_variable. Also define a new IR visitor method, and implement IR builder, printer and reader support for it.
---
 src/glsl/ir.cpp                                |  2 +-
 src/glsl/ir.h                                  | 42 ++
 src/glsl/ir_builder.cpp                        |  7 +
 src/glsl/ir_builder.h                          |  2 ++
 src/glsl/ir_clone.cpp                          | 11 +++
 src/glsl/ir_constant_expression.cpp            |  7 +
 src/glsl/ir_hierarchical_visitor.cpp           | 16 ++
 src/glsl/ir_hierarchical_visitor.h             |  2 ++
 src/glsl/ir_hv_accept.cpp                      | 14 +
 src/glsl/ir_print_visitor.cpp                  | 20
 src/glsl/ir_print_visitor.h                    |  1 +
 src/glsl/ir_reader.cpp                         | 38 +++
 src/glsl/ir_rvalue_visitor.cpp                 | 18 +++
 src/glsl/ir_rvalue_visitor.h                   |  3 ++
 src/glsl/ir_visitor.h                          |  2 ++
 src/mesa/drivers/dri/i965/brw_fs.h             |  1 +
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   |  5 +++
 src/mesa/drivers/dri/i965/brw_vec4.h           |  1 +
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp |  5 +++
 src/mesa/program/ir_to_mesa.cpp                |  7 +
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp     |  7 +
 21 files changed, 210 insertions(+), 1 deletion(-)

diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp
index 1b17999..83bcda2 100644
--- a/src/glsl/ir.cpp
+++ b/src/glsl/ir.cpp
@@ -1565,7 +1565,7 @@ ir_swizzle::variable_referenced() const
 ir_variable::ir_variable(const struct glsl_type *type, const char *name,
                          ir_variable_mode mode)
    : max_array_access(0), read_only(false), centroid(false), invariant(false),
-     mode(mode), interpolation(INTERP_QUALIFIER_NONE)
+     mode(mode), interpolation(INTERP_QUALIFIER_NONE), atomic()
 {
    this->ir_type = ir_type_variable;
    this->type = type;
diff --git a/src/glsl/ir.h b/src/glsl/ir.h
index 2637b40..c4b4677 100644
--- a/src/glsl/ir.h
+++ b/src/glsl/ir.h
@@ -83,6 +83,7 @@ enum ir_node_type {
    ir_type_texture,
    ir_type_emit_vertex,
    ir_type_end_primitive,
+   ir_type_atomic,
    ir_type_max /**< maximum ir_type enum number, for validation */
 };
 
@@ -547,6 +548,14 @@ public:
    int binding;
 
    /**
+    * Location an atomic counter is stored at.
+    */
+   struct {
+      int buffer_index;
+      int offset;
+   } atomic;
+
+   /**
     * Built-in state that backs this uniform
     *
     * Once set at variable creation, \c state_slots must remain invariant.
@@ -2085,6 +2094,39 @@ public:
    virtual ir_visitor_status accept(ir_hierarchical_visitor *);
 };
 
+enum ir_atomic_opcode {
+   ir_atomic_read,
+   ir_atomic_inc,
+   ir_atomic_dec
+};
+
+class ir_atomic : public ir_rvalue {
+public:
+   ir_atomic(enum ir_atomic_opcode op, ir_dereference *location = NULL)
+      : op(op), location(location)
+   {
+      this->type = glsl_type::get_instance(GLSL_TYPE_UINT, 1, 1);
+      this->ir_type = ir_type_atomic;
+   }
+
+   virtual ir_atomic *clone(void *mem_ctx, struct hash_table *) const;
+
+   virtual ir_constant *constant_expression_value(struct hash_table *variable_context = NULL);
+
+   virtual void accept(ir_visitor *v)
+   {
+      v->visit(this);
+   }
+
+   virtual ir_visitor_status accept(ir_hierarchical_visitor *);
+
+   /** Kind of atomic instruction. */
+   enum ir_atomic_opcode op;
+
+   /** Variable this atomic instruction operates on. */
+   ir_dereference *location;
+};
+
 /**
  * Apply a visitor to each IR node in a list
  */
diff --git a/src/glsl/ir_builder.cpp b/src/glsl/ir_builder.cpp
index 98b4322..35f075a 100644
--- a/src/glsl/ir_builder.cpp
+++ b/src/glsl/ir_builder.cpp
@@ -535,4 +535,11 @@ if_tree(operand condition,
    return result;
 }
 
+ir_atomic *
+atomic(ir_atomic_opcode op, deref counter)
+{
+   void *mem_ctx = ralloc_parent(counter.val);
+   return new(mem_ctx) ir_atomic(op, counter.val);
+}
+
 } /* namespace ir_builder */
diff --git a/src/glsl/ir_builder.h b/src/glsl/ir_builder.h
index 6a5f771..4a214fa 100644
--- a/src/glsl/ir_builder.h
+++ b/src/glsl/ir_builder.h
@@ -210,4 +210,6 @@ ir_if *if_tree(operand condition,
                ir_instruction *then_branch,
                ir_instruction *else_branch);
 
+ir_atomic *atomic(ir_atomic_opcode op, deref counter);
+
 } /* namespace ir_builder */
diff --git a/src/glsl/ir_clone.cpp b/src/glsl/ir_clone.cpp
index b70b7db..9475809 100644
--- a/src/glsl/ir_clone.cpp
+++ b/src/glsl/ir_clone.cpp
@@ -51,6 +51,8 @@ ir_variable::clone(void *mem_ctx, struct hash_table *ht) const
    var->location = this->location;
    var->index = this->index;
    var->binding = this->binding;
+   var->atomic.buffer_index = this->atomic.buffer_index;
+   var->atomic.offset = this->atomic.offset;
    var->warn_extension = this->warn_extension
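The new node follows the double-dispatch pattern used throughout the GLSL IR: `accept()` calls back into the visitor's overload for the concrete class, and `clone()` returns an independent copy of the node and its operand. The stripped-down model below uses hypothetical `toy_*` classes, with no ralloc contexts and no hash table of remapped variables. A plain string stands in for the `ir_dereference` operand, and the printer's output syntax is invented for the example.

```cpp
#include <cassert>
#include <string>

enum toy_atomic_opcode { toy_atomic_read, toy_atomic_inc, toy_atomic_dec };

struct toy_atomic;

// Visitor interface: one visit() overload per concrete node class.
struct toy_visitor {
   virtual ~toy_visitor() {}
   virtual void visit(toy_atomic *) = 0;
};

// Simplified analogue of ir_atomic: an opcode plus the counter it acts on.
struct toy_atomic {
   toy_atomic_opcode op;
   std::string location;   // stands in for the ir_dereference operand

   toy_atomic(toy_atomic_opcode op, const std::string &location)
      : op(op), location(location) {}

   // Double dispatch: the visitor's overload for this class is selected.
   void accept(toy_visitor *v) { v->visit(this); }

   // Returns a heap-allocated copy; the caller owns it.
   toy_atomic *clone() const { return new toy_atomic(op, location); }
};

// Printer in the spirit of ir_print_visitor (made-up output format).
struct toy_printer : toy_visitor {
   std::string out;

   void visit(toy_atomic *ir) override
   {
      static const char *const names[] = { "read", "inc", "dec" };
      out += "(atomic " + std::string(names[ir->op]) + " " +
             ir->location + ")";
   }
};
```

Visiting `toy_atomic(toy_atomic_inc, "counter")` with `toy_printer` appends `(atomic inc counter)` to `out`.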
[Mesa-dev] [PATCH 11/24] glsl: Add built-in functions and constants required for ARB_shader_atomic_counters.
---
 src/glsl/builtin_functions.cpp  | 33 +
 src/glsl/builtin_variables.cpp  | 15 +++
 src/glsl/glcpp/glcpp-parse.y    |  3 +++
 src/glsl/glsl_parser_extras.cpp |  6 ++
 src/glsl/glsl_parser_extras.h   |  7 +++
 5 files changed, 64 insertions(+)

diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
index 528af0d..7217166 100644
--- a/src/glsl/builtin_functions.cpp
+++ b/src/glsl/builtin_functions.cpp
@@ -300,6 +300,13 @@ tex3d_lod(const _mesa_glsl_parse_state *state)
 {
    return tex3d(state) && lod_exists_in_stage(state);
 }
+
+static bool
+shader_atomic_counters(const _mesa_glsl_parse_state *state)
+{
+   return state->ARB_shader_atomic_counters_enable;
+}
+
 /** @} */
 
 /**/
@@ -510,6 +517,10 @@ private:
    B1(findLSB)
    B1(findMSB)
    B1(fma)
+
+   ir_function_signature *_atomicOp(ir_atomic_opcode op,
+                                    builtin_available_predicate avail);
+
 #undef B0
 #undef B1
 #undef B2
@@ -1822,6 +1833,17 @@ builtin_builder::create_builtins()
    IU(findLSB)
    IU(findMSB)
    F(fma)
+
+   add_function("atomicCounter",
+                _atomicOp(ir_atomic_read, shader_atomic_counters),
+                NULL);
+   add_function("atomicCounterIncrement",
+                _atomicOp(ir_atomic_inc, shader_atomic_counters),
+                NULL);
+   add_function("atomicCounterDecrement",
+                _atomicOp(ir_atomic_dec, shader_atomic_counters),
+                NULL);
+
 #undef F
 #undef FI
 #undef FIU
@@ -3514,6 +3536,17 @@ builtin_builder::_fma(const glsl_type *type)
    return sig;
 }
 
+
+ir_function_signature *
+builtin_builder::_atomicOp(ir_atomic_opcode op,
+                           builtin_available_predicate avail)
+{
+   ir_variable *counter = in_var(glsl_type::atomic_uint_type, "counter");
+   MAKE_SIG(glsl_type::uint_type, avail, 1, counter);
+   body.emit(ret(atomic(op, counter)));
+   return sig;
+}
+
 /** @} */
 
 /**/
diff --git a/src/glsl/builtin_variables.cpp b/src/glsl/builtin_variables.cpp
index 6a808c0..49f0f42 100644
--- a/src/glsl/builtin_variables.cpp
+++ b/src/glsl/builtin_variables.cpp
@@ -555,6 +555,21 @@ builtin_variable_generator::generate_constants()
        */
       add_const("gl_MaxTextureCoords", state->Const.MaxTextureCoords);
    }
+
+   if (state->ARB_shader_atomic_counters_enable) {
+      add_const("gl_MaxVertexAtomicCounters",
+                state->Const.MaxVertexAtomicCounters);
+      add_const("gl_MaxGeometryAtomicCounters",
+                state->Const.MaxGeometryAtomicCounters);
+      add_const("gl_MaxFragmentAtomicCounters",
+                state->Const.MaxFragmentAtomicCounters);
+      add_const("gl_MaxCombinedAtomicCounters",
+                state->Const.MaxCombinedAtomicCounters);
+      add_const("gl_MaxAtomicCounterBindings",
+                state->Const.MaxAtomicBufferBindings);
+      add_const("gl_MaxTessControlAtomicCounters", 0);
+      add_const("gl_MaxTessEvaluationAtomicCounters", 0);
+   }
 }
 
 
diff --git a/src/glsl/glcpp/glcpp-parse.y b/src/glsl/glcpp/glcpp-parse.y
index 6eaa5f9..2b4e988 100644
--- a/src/glsl/glcpp/glcpp-parse.y
+++ b/src/glsl/glcpp/glcpp-parse.y
@@ -1248,6 +1248,9 @@ glcpp_parser_create (const struct gl_extensions *extensions, int api)
 
       if (extensions->EXT_shader_integer_mix)
          add_builtin_define(parser, "GL_EXT_shader_integer_mix", 1);
+
+      if (extensions->ARB_shader_atomic_counters)
+         add_builtin_define(parser, "GL_ARB_shader_atomic_counters", 1);
    }
 }
 
diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp
index ff34864..d27b600 100644
--- a/src/glsl/glsl_parser_extras.cpp
+++ b/src/glsl/glsl_parser_extras.cpp
@@ -109,6 +109,12 @@ _mesa_glsl_parse_state::_mesa_glsl_parse_state(struct gl_context *_ctx,
 
    this->Const.MaxDrawBuffers = ctx->Const.MaxDrawBuffers;
 
+   this->Const.MaxVertexAtomicCounters = ctx->Const.VertexProgram.MaxAtomicCounters;
+   this->Const.MaxGeometryAtomicCounters = ctx->Const.GeometryProgram.MaxAtomicCounters;
+   this->Const.MaxFragmentAtomicCounters = ctx->Const.FragmentProgram.MaxAtomicCounters;
+   this->Const.MaxCombinedAtomicCounters = ctx->Const.MaxCombinedAtomicCounters;
+   this->Const.MaxAtomicBufferBindings = ctx->Const.MaxAtomicBufferBindings;
+
    this->current_function = NULL;
    this->toplevel_ir = NULL;
    this->found_return = false;
diff --git a/src/glsl/glsl_parser_extras.h b/src/glsl/glsl_parser_extras.h
index d0e131a..f638d35 100644
--- a/src/glsl/glsl_parser_extras.h
+++ b/src/glsl/glsl_parser_extras.h
@@ -222,6 +222,13 @@ struct _mesa_glsl_parse_state {
       /* 3.00 ES */
       int MinProgramTexelOffset;
       int MaxProgramTexelOffset;
+
+      /* ARB_shader_atomic_counters */
+      unsigned
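For reference, the ARB_shader_atomic_counters specification defines the three built-ins registered above as follows. `atomicCounter()` returns the current counter value. `atomicCounterIncrement()` returns the value *prior to* the increment. `atomicCounterDecrement()` returns the value *resulting from* the decrement. The host-side C++ model below mirrors those return-value rules with `std::atomic<std::uint32_t>`. The `model_*` function names are hypothetical, and this is not how the i965 back end implements the operations.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Host-side models of the GLSL built-ins (hypothetical names).

// atomicCounter(): returns the current counter value.
std::uint32_t model_atomicCounter(const std::atomic<std::uint32_t> &c)
{
   return c.load();
}

// atomicCounterIncrement(): returns the value prior to the increment.
std::uint32_t model_atomicCounterIncrement(std::atomic<std::uint32_t> &c)
{
   return c.fetch_add(1);
}

// atomicCounterDecrement(): returns the value resulting from the
// decrement. fetch_sub() yields the old value, so subtract one more.
std::uint32_t model_atomicCounterDecrement(std::atomic<std::uint32_t> &c)
{
   return c.fetch_sub(1) - 1;
}
```

The asymmetry matters in practice: starting from 5, an increment followed by a decrement returns 5 both times.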
[Mesa-dev] [PATCH 12/24] glsl: Add predicate to determine if an IR node has side effects.
And fix the dead code elimination pass so atomic writes aren't optimized out in cases where the return value isn't used by the program. --- src/glsl/ir.h | 16 src/glsl/opt_dead_code.cpp | 3 ++- 2 files changed, 18 insertions(+), 1 deletion(-) diff --git a/src/glsl/ir.h b/src/glsl/ir.h index c4b4677..4f506a3 100644 --- a/src/glsl/ir.h +++ b/src/glsl/ir.h @@ -139,6 +139,17 @@ public: virtual class ir_jump * as_jump() { return NULL; } /*@}*/ + /** +* Determine if an IR instruction has side effects other than its +* returned value(s). Optimization passes are expected to be +* especially careful with reordering or removing these, unless +* they know what they are doing. +*/ + virtual bool has_side_effects() const + { + return false; + } + protected: ir_instruction() { @@ -2120,6 +2131,11 @@ public: virtual ir_visitor_status accept(ir_hierarchical_visitor *); + virtual bool has_side_effects() const + { + return true; + } + /** Kind of atomic instruction. */ enum ir_atomic_opcode op; diff --git a/src/glsl/opt_dead_code.cpp b/src/glsl/opt_dead_code.cpp index b65e5c2..fd05034 100644 --- a/src/glsl/opt_dead_code.cpp +++ b/src/glsl/opt_dead_code.cpp @@ -81,7 +81,8 @@ do_dead_code(exec_list *instructions, bool uniform_locations_assigned) */ if (entry->var->mode != ir_var_function_out && entry->var->mode != ir_var_function_inout && - entry->var->mode != ir_var_shader_out) { + entry->var->mode != ir_var_shader_out && + !entry->assign->rhs->has_side_effects()) { entry->assign->remove(); progress = true; -- 1.8.3.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
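The C sketch below shows why the dead code pass needs this predicate. The `instr` struct, the opcode names and `eliminate_dead_assignments()` are hypothetical simplifications of Mesa's `ir_instruction` hierarchy and `do_dead_code()`, not Mesa code. The rule is that an assignment whose destination is never read is dropped, unless its right-hand side has side effects. That exception presumably covers an `atomicCounterIncrement()` call whose return value the shader ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified IR: each instruction assigns the result of an
 * operation to a destination that may or may not be read later. */
enum op_kind { OP_ADD, OP_MUL, OP_ATOMIC_INC };

struct instr {
   enum op_kind op;
   bool result_used;   /* does anything read the destination? */
   bool removed;       /* set by the pass instead of unlinking the node */
};

/* Analogous to ir_instruction::has_side_effects(): the default answer is
 * "no", and atomic operations override it to "yes". */
static bool
has_side_effects(const struct instr *inst)
{
   return inst->op == OP_ATOMIC_INC;
}

/* Drop assignments whose result is unused, unless evaluating the
 * right-hand side has an observable effect.  Returns how many
 * instructions were removed. */
static size_t
eliminate_dead_assignments(struct instr *insts, size_t count)
{
   size_t removed = 0;

   for (size_t i = 0; i < count; i++) {
      if (!insts[i].result_used && !has_side_effects(&insts[i])) {
         insts[i].removed = true;
         removed++;
      }
   }
   return removed;
}
```

Without the `has_side_effects()` check, the unused atomic in a program like the one in the test below would be removed together with the dead addition.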
[Mesa-dev] [PATCH 10/24] glsl: Implement parser support for atomic counters.
--- src/glsl/ast.h| 15 ++ src/glsl/ast_to_hir.cpp | 68 +-- src/glsl/ast_type.cpp | 13 +++-- src/glsl/glsl_lexer.ll| 2 +- src/glsl/glsl_parser.yy | 13 +++-- src/glsl/glsl_parser_extras.h | 10 +++ 6 files changed, 114 insertions(+), 7 deletions(-) diff --git a/src/glsl/ast.h b/src/glsl/ast.h index 26c4701..8a5d3fc 100644 --- a/src/glsl/ast.h +++ b/src/glsl/ast.h @@ -405,6 +405,12 @@ struct ast_type_qualifier { */ unsigned explicit_binding:1; + /** + * Flag set if the GL_ARB_shader_atomic_counters "offset" layout + * qualifier is used. + */ + unsigned explicit_offset:1; + /** \name Layout qualifiers for GL_AMD_conservative_depth */ /** \{ */ unsigned depth_any:1; @@ -468,6 +474,15 @@ struct ast_type_qualifier { int binding; /** +* Offset specified via GL_ARB_shader_atomic_counters' "offset" +* keyword. +* +* \note +* This field is only valid if \c explicit_offset is set. +*/ + int offset; + + /** * Return true if and only if an interpolation qualifier is present. */ bool has_interpolation() const; diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp index fcca5df..7edbee4 100644 --- a/src/glsl/ast_to_hir.cpp +++ b/src/glsl/ast_to_hir.cpp @@ -1197,6 +1197,9 @@ ast_expression::hir(exec_list *instructions, !state->check_version(120, 300, &loc, "array comparisons forbidden")) { error_emitted = true; + } else if ((op[0]->type->atomic_size() || op[1]->type->atomic_size())) { +_mesa_glsl_error(&loc, state, "atomic counter comparisons forbidden"); +error_emitted = true; } if (error_emitted) { @@ -1952,10 +1955,19 @@ validate_binding_qualifier(struct _mesa_glsl_parse_state *state, return false; } + } else if (var->type->atomic_size()) { + if (unsigned(qual->binding) >= ctx->Const.MaxAtomicBufferBindings) { + _mesa_glsl_error(loc, state, "layout(binding = %d) exceeds the " + "maximum number of atomic counter buffer bindings " + "(%d)", qual->binding, + ctx->Const.MaxAtomicBufferBindings); + + return false; + } } else { _mesa_glsl_error(loc, state, "the \"binding\" 
qualifier only applies to uniform " - "blocks, samplers, or arrays of samplers"); + "blocks, samplers, atomic counters, or arrays thereof"); return false; } @@ -1983,7 +1995,7 @@ apply_type_qualifier_to_variable(const struct ast_type_qualifier *qual, } if (qual->flags.q.constant || qual->flags.q.attribute - || qual->flags.q.uniform + || (qual->flags.q.uniform && var->type != glsl_type::atomic_uint_type) || (qual->flags.q.varying && (state->target == fragment_shader))) var->read_only = 1; @@ -2225,6 +2237,35 @@ apply_type_qualifier_to_variable(const struct ast_type_qualifier *qual, var->binding = qual->binding; } + if (var->type->atomic_size()) { + if (var->mode == ir_var_uniform) { + if (var->explicit_binding) { +_mesa_glsl_parse_state::atomic_counter_binding &binding = + state->atomic_counter_bindings[var->binding]; + +if (binding.next_offset % ATOMIC_COUNTER_SIZE) + _mesa_glsl_error(loc, state, +"misaligned atomic counter offset"); + +if (binding.offsets.count(binding.next_offset)) + _mesa_glsl_error(loc, state, +"atomic counter offsets must be unique"); + +var->atomic.offset = binding.next_offset; +binding.offsets.insert(binding.next_offset); +binding.next_offset += var->type->atomic_size(); + + } else { +_mesa_glsl_error(loc, state, + "atomic counters require explicit binding point"); + } + } else if (var->mode != ir_var_function_in) { + _mesa_glsl_error(loc, state, "atomic counters may only be declared as " + "function parameters or uniform-qualified " + "global variables"); + } + } + /* Does the declaration use the deprecated 'attribute' or 'varying' * keywords? */ @@ -2725,6 +2766,18 @@ ast_declarator_list::hir(exec_list *instructions, (void) this->type->specifier->hir(instructions, state); decl_type = this->type->glsl_type(& type_name, state); + + /* An offset-qualified atomic counter declaration sets the default +* offset for the next declaration within the same atomic counter +* buffer. 
+*/ + if (decl_type && decl_type->atomic_size()) { + if (type->qualifier.flags.
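The offset bookkeeping in apply_type_qualifier_to_variable() can be sketched in plain C. `binding_state`, `assign_counter_offset()` and the MAX_* limits are hypothetical. COUNTER_SIZE assumes ATOMIC_COUNTER_SIZE is 4 bytes (one 32-bit counter). The sketch also assumes that a layout(offset = N) qualifier resets the running offset of its binding, as the comment in ast_declarator_list::hir() suggests. Like the quoted code, it only checks that each declaration's starting offset is aligned and not already taken. Overlapping arrays are therefore not detected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define COUNTER_SIZE 4    /* assumed: one 32-bit counter per slot */
#define MAX_BINDINGS 4    /* hypothetical limit for the sketch */
#define MAX_OFFSETS  64   /* hypothetical: byte offsets tracked per binding */

/* Plays the role of _mesa_glsl_parse_state::atomic_counter_binding:
 * the next free offset plus the set of offsets already handed out. */
struct binding_state {
   unsigned next_offset;
   bool used[MAX_OFFSETS];
};

/* Assign an offset to a counter declaration of `size` bytes at `binding`.
 * If `has_offset` is set, the layout(offset) value overrides the running
 * default.  Unlike the patch, which reports the error and carries on,
 * the sketch reports the error and returns -1. */
static int
assign_counter_offset(struct binding_state *bindings, unsigned binding,
                      bool has_offset, unsigned offset, unsigned size)
{
   assert(binding < MAX_BINDINGS);
   struct binding_state *b = &bindings[binding];

   if (has_offset)
      b->next_offset = offset;

   if (b->next_offset % COUNTER_SIZE) {
      fprintf(stderr, "misaligned atomic counter offset\n");
      return -1;
   }
   if (b->next_offset >= MAX_OFFSETS) {
      fprintf(stderr, "offset outside the range tracked by this sketch\n");
      return -1;
   }
   if (b->used[b->next_offset]) {
      fprintf(stderr, "atomic counter offsets must be unique\n");
      return -1;
   }

   int assigned = (int)b->next_offset;
   b->used[b->next_offset] = true;
   b->next_offset += size;   /* default offset for the next declaration */
   return assigned;
}
```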
[Mesa-dev] [PATCH 06/24] mesa: Add support for ARB_shader_atomic_counters.
This patch implements the common support code required by the ARB_shader_atomic_counters extension. It defines the data structures needed to track atomic counter buffer objects (hereafter "ABOs") associated with a given context or shader program. It also implements support for binding buffers to an ABO binding point and for querying the atomic counters and buffers declared by GLSL shaders.
--- src/glsl/ir_uniform.h| 7 + src/glsl/link_uniforms.cpp | 1 + src/mesa/main/bufferobj.c| 58 + src/mesa/main/config.h | 6 src/mesa/main/context.c | 9 ++ src/mesa/main/extensions.c | 1 + src/mesa/main/get.c | 40 ++ src/mesa/main/get_hash_params.py | 13 + src/mesa/main/mtypes.h | 59 ++ src/mesa/main/shaderapi.c| 6 src/mesa/main/uniform_query.cpp | 4 +++ src/mesa/main/uniforms.c | 62 +++- 12 files changed, 265 insertions(+), 1 deletion(-) diff --git a/src/glsl/ir_uniform.h b/src/glsl/ir_uniform.h index 8198c48..13faab7 100644 --- a/src/glsl/ir_uniform.h +++ b/src/glsl/ir_uniform.h @@ -166,6 +166,13 @@ struct gl_uniform_storage { bool row_major; /** @} */ + + /** +* Index within gl_shader_program::AtomicBuffers[] of the atomic +* counter buffer this uniform is stored in, or -1 if this is not +* an atomic counter. 
+*/ + int atomic_buffer_index; }; #ifdef __cplusplus diff --git a/src/glsl/link_uniforms.cpp b/src/glsl/link_uniforms.cpp index fa77157..e877468 100644 --- a/src/glsl/link_uniforms.cpp +++ b/src/glsl/link_uniforms.cpp @@ -452,6 +452,7 @@ private: this->uniforms[id].num_driver_storage = 0; this->uniforms[id].driver_storage = NULL; this->uniforms[id].storage = this->values; + this->uniforms[id].atomic_buffer_index = -1; if (this->ubo_block_index != -1) { this->uniforms[id].block_index = this->ubo_block_index; diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c index b22340f..8a5d617 100644 --- a/src/mesa/main/bufferobj.c +++ b/src/mesa/main/bufferobj.c @@ -102,6 +102,11 @@ get_buffer_target(struct gl_context *ctx, GLenum target) return &ctx->UniformBuffer; } break; + case GL_ATOMIC_COUNTER_BUFFER: + if (ctx->Extensions.ARB_shader_atomic_counters) { + return &ctx->AtomicBuffer; + } + break; default: return NULL; } @@ -2120,6 +2125,51 @@ bind_buffer_base_uniform_buffer(struct gl_context *ctx, set_ubo_binding(ctx, index, bufObj, 0, 0, GL_TRUE); } +static void +set_atomic_buffer_binding(struct gl_context *ctx, + unsigned index, + struct gl_buffer_object *bufObj, + GLintptr offset, + GLsizeiptr size, + const char *name) +{ + struct gl_atomic_buffer_binding *binding; + + if (index >= ctx->Const.MaxAtomicBufferBindings) { + _mesa_error(ctx, GL_INVALID_VALUE, "%s(index=%d)", name, index); + return; + } + + if (offset & (ATOMIC_COUNTER_SIZE - 1)) { + _mesa_error(ctx, GL_INVALID_VALUE, + "%s(offset misaligned %d/%d)", name, (int) offset, + ATOMIC_COUNTER_SIZE); + return; + } + + _mesa_reference_buffer_object(ctx, &ctx->AtomicBuffer, bufObj); + + binding = &ctx->AtomicBufferBindings[index]; + if (binding->BufferObject == bufObj && + binding->Offset == offset && + binding->Size == size) { + return; + } + + FLUSH_VERTICES(ctx, 0); + ctx->NewDriverState |= ctx->DriverFlags.NewAtomicBuffer; + + _mesa_reference_buffer_object(ctx, &binding->BufferObject, bufObj); + + if 
(bufObj == ctx->Shared->NullBufferObj) { + binding->Offset = -1; + binding->Size = -1; + } else { + binding->Offset = offset; + binding->Size = size; + } +} + void GLAPIENTRY _mesa_BindBufferRange(GLenum target, GLuint index, GLuint buffer, GLintptr offset, GLsizeiptr size) @@ -2157,6 +2207,10 @@ _mesa_BindBufferRange(GLenum target, GLuint index, case GL_UNIFORM_BUFFER: bind_buffer_range_uniform_buffer(ctx, index, bufObj, offset, size); return; + case GL_ATOMIC_COUNTER_BUFFER: + set_atomic_buffer_binding(ctx, index, bufObj, offset, size, +"glBindBufferRange"); + return; default: _mesa_error(ctx, GL_INVALID_ENUM, "glBindBufferRange(target)"); return; @@ -2216,6 +2270,10 @@ _mesa_BindBufferBase(GLenum target, GLuint index, GLuint buffer) case GL_UNIFORM_BUFFER: bind_buffer_base_uniform_buffer(ctx, index, bufObj); return; + case GL_ATOMIC_COUNTER_BUFFER: + set_atomic_buffer_binding(ctx, index, bufObj, 0, 0, +"glBindBufferBase
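The validation and change detection in set_atomic_buffer_binding() reduce to a few comparisons. This C sketch simplifies several things: `atomic_binding` and `bind_atomic_buffer()` are hypothetical, buffer name 0 stands in for the null buffer object, and COUNTER_SIZE assumes ATOMIC_COUNTER_SIZE is 4. Both error returns correspond to GL_INVALID_VALUE in the patch. The mask test `offset & (COUNTER_SIZE - 1)` is a valid alignment check only because the counter size is a power of two.

```c
#include <assert.h>

#define COUNTER_SIZE 4   /* assumed value of ATOMIC_COUNTER_SIZE */

struct atomic_binding {
   unsigned buffer;   /* 0 stands in for the null buffer object */
   long offset;
   long size;
};

/* Returns 1 if the binding changed (the driver would have to be told),
 * 0 if the requested binding was already in place, and -1 for the
 * conditions the patch reports as GL_INVALID_VALUE. */
static int
bind_atomic_buffer(struct atomic_binding *bindings, unsigned max_bindings,
                   unsigned index, unsigned buffer, long offset, long size)
{
   if (index >= max_bindings)
      return -1;
   if (offset & (COUNTER_SIZE - 1))
      return -1;   /* offset not a multiple of the counter size */

   struct atomic_binding *b = &bindings[index];
   if (b->buffer == buffer && b->offset == offset && b->size == size)
      return 0;    /* redundant rebind: nothing to flag */

   b->buffer = buffer;
   if (buffer == 0) {
      b->offset = -1;
      b->size = -1;
   } else {
      b->offset = offset;
      b->size = size;
   }
   return 1;
}
```

The early return on a redundant rebind is what lets the real function skip FLUSH_VERTICES() and the NewAtomicBuffer driver flag.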
[Mesa-dev] [PATCH 14/24] i965: Define vtbl method that initializes an untyped R/W surface.
And add Gen7 implementation. --- src/mesa/drivers/dri/i965/brw_context.h | 7 + src/mesa/drivers/dri/i965/gen7_wm_surface_state.c | 35 +++ 2 files changed, 37 insertions(+), 5 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 108e98c..3003d15 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -895,6 +895,13 @@ struct brw_context uint32_t *out_offset, bool dword_pitch); + void (*create_raw_surface)(struct brw_context *brw, + drm_intel_bo *bo, + uint32_t offset, + uint32_t size, + uint32_t *out_offset, + bool rw); + /** Upload a SAMPLER_STATE table. */ void (*upload_sampler_state_table)(struct brw_context *brw, struct gl_program *prog, diff --git a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c index 8f95abe..8b86387 100644 --- a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c +++ b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c @@ -232,7 +232,8 @@ gen7_emit_buffer_surface_state(struct brw_context *brw, unsigned surface_format, unsigned buffer_size, unsigned pitch, - unsigned mocs) + unsigned mocs, + bool rw) { uint32_t *surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE, 8 * 4, 32, out_offset); @@ -251,7 +252,8 @@ gen7_emit_buffer_surface_state(struct brw_context *brw, /* Emit relocation to surface contents */ drm_intel_bo_emit_reloc(brw->batch.bo, *out_offset + 4, - bo, buffer_offset, I915_GEM_DOMAIN_SAMPLER, 0); + bo, buffer_offset, I915_GEM_DOMAIN_SAMPLER, + (rw ? 
I915_GEM_DOMAIN_SAMPLER : 0)); gen7_check_surface_setup(surf, false /* is_render_target */); } @@ -348,7 +350,8 @@ gen7_update_buffer_texture_surface(struct gl_context *ctx, surface_format, w, texel_size, - 0 /* mocs */); + 0 /* mocs */, + false /* rw */); } static void @@ -429,7 +432,27 @@ gen7_create_constant_surface(struct brw_context *brw, BRW_SURFACEFORMAT_R32G32B32A32_FLOAT, elements - 1, stride, - 0 /* mocs */); + 0 /* mocs */, + false /* rw */); +} + +/** + * Create a raw surface for untyped R/W access. + */ +static void +gen7_create_raw_surface(struct brw_context *brw, drm_intel_bo *bo, +uint32_t offset, uint32_t size, +uint32_t *out_offset, bool rw) +{ + gen7_emit_buffer_surface_state(brw, + out_offset, + bo, + offset, + BRW_SURFACEFORMAT_RAW, + size - 1, + 1, + 0 /* mocs */, + true /* rw */); } /** @@ -445,7 +468,8 @@ gen7_create_shader_time_surface(struct brw_context *brw, uint32_t *out_offset) BRW_SURFACEFORMAT_RAW, brw->shader_time.bo->size - 1, 1, - 0 /* mocs */); + 0 /* mocs */, + true /* rw */); } static void @@ -570,4 +594,5 @@ gen7_init_vtable_surface_functions(struct brw_context *brw) brw->vtbl.update_null_renderbuffer_surface = gen7_update_null_renderbuffer_surface; brw->vtbl.create_constant_surface = gen7_create_constant_surface; + brw->vtbl.create_raw_surface = gen7_create_raw_surface; } -- 1.8.3.4
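The new hook follows the driver's usual pattern of generation-specific function pointers installed when the context is created. The C sketch below uses hypothetical types (`fake_context`, `surface_vtbl`) and a fake batch allocator. It only records what gen7_create_raw_surface() would encode: a 32-byte (8-dword) surface state, the buffer size stored as size - 1, and whether a writable relocation is requested. The sketch forwards `rw`, whereas the Gen7 implementation in the patch passes `true /* rw */` unconditionally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct fake_context;   /* hypothetical stand-in for struct brw_context */

/* Per-generation hooks, in the spirit of brw->vtbl. */
struct surface_vtbl {
   void (*create_raw_surface)(struct fake_context *ctx,
                              uint32_t offset, uint32_t size,
                              uint32_t *out_offset, bool rw);
};

struct fake_context {
   struct surface_vtbl vtbl;
   uint32_t next_state_offset;   /* fake batch allocator */
   uint32_t last_size_field;     /* the "size - 1" value that was encoded */
   bool last_writable;           /* whether a write domain was requested */
};

/* Hypothetical Gen7-style implementation: allocate an 8-dword surface
 * state and record the fields it would encode. */
static void
fake_gen7_create_raw_surface(struct fake_context *ctx,
                             uint32_t offset, uint32_t size,
                             uint32_t *out_offset, bool rw)
{
   (void) offset;                       /* would go into the relocation */
   *out_offset = ctx->next_state_offset;
   ctx->next_state_offset += 8 * 4;     /* 8 dwords of surface state */
   ctx->last_size_field = size - 1;     /* buffer size is stored minus one */
   ctx->last_writable = rw;
}

/* Installed once per context, like gen7_init_vtable_surface_functions(). */
static void
fake_gen7_init_vtable(struct fake_context *ctx)
{
   ctx->vtbl.create_raw_surface = fake_gen7_create_raw_surface;
}
```

Callers such as the ABO upload code then go through `vtbl.create_raw_surface` without knowing which hardware generation they run on.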
[Mesa-dev] [PATCH 13/24] glsl: Linker support for ARB_shader_atomic_counters.
--- src/glsl/Makefile.sources | 1 + src/glsl/link_atomics.cpp | 190 ++ src/glsl/linker.cpp | 15 src/glsl/linker.h | 7 ++ 4 files changed, 213 insertions(+) create mode 100644 src/glsl/link_atomics.cpp diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources index 2f7bfa1..197d081 100644 --- a/src/glsl/Makefile.sources +++ b/src/glsl/Makefile.sources @@ -47,6 +47,7 @@ LIBGLSL_FILES = \ $(GLSL_SRCDIR)/ir_validate.cpp \ $(GLSL_SRCDIR)/ir_variable_refcount.cpp \ $(GLSL_SRCDIR)/linker.cpp \ + $(GLSL_SRCDIR)/link_atomics.cpp \ $(GLSL_SRCDIR)/link_functions.cpp \ $(GLSL_SRCDIR)/link_interface_blocks.cpp \ $(GLSL_SRCDIR)/link_uniforms.cpp \ diff --git a/src/glsl/link_atomics.cpp b/src/glsl/link_atomics.cpp new file mode 100644 index 000..a623454 --- /dev/null +++ b/src/glsl/link_atomics.cpp @@ -0,0 +1,190 @@ +/* + * Copyright © 2013 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. 
+ */ + +#include +#include + +#include "ir.h" +#include "ir_uniform.h" +#include "linker.h" +#include "program/hash_table.h" + +namespace { + struct active_atomic_counter { + active_atomic_counter(unsigned id, ir_variable *var) : + id(id), var(var) {} + + unsigned id; + ir_variable *var; + }; + + typedef std::vector active_atomic_counters_t; + + struct active_atomic_buffer { + active_atomic_buffer() : stage_references(), size(0) {} + + active_atomic_counters_t counters; + unsigned stage_references[MESA_SHADER_TYPES]; + unsigned size; + }; + + typedef std::map active_atomic_buffers_t; + + /** +* Construct a collection of active_atomic_buffer structures +* indexed by binding point. Each entry includes a collection of +* active_atomic_counters that represent the set of atomic counters +* determined to be contained within the same buffer. +*/ + active_atomic_buffers_t + find_active_atomic_counters(struct gl_shader_program *prog) { + active_atomic_buffers_t abs; + + for (unsigned i = 0; i < MESA_SHADER_TYPES; ++i) { + struct gl_shader *sh = prog->_LinkedShaders[i]; + if (sh == NULL) +continue; + + foreach_list(node, sh->ir) { +ir_variable *var = ((ir_instruction *)node)->as_variable(); + +if (var && var->type->atomic_size()) { + unsigned id; + bool found = prog->UniformHash->get(id, var->name); + assert(found); + active_atomic_buffer &ab = abs[var->binding]; + + ab.counters.push_back(active_atomic_counter(id, var)); + ab.stage_references[i]++; + ab.size = std::max(ab.size, var->atomic.offset + + var->type->atomic_size()); +} + } + } + + return abs; + } +} + +void +link_assign_atomic_counter_resources(struct gl_shader_program *prog) +{ + active_atomic_buffers_t abs = find_active_atomic_counters(prog); + + prog->AtomicBuffers = rzalloc_array(prog, gl_active_atomic_buffer, + abs.size()); + prog->NumAtomicBuffers = abs.size(); + + unsigned i = 0; + for (active_atomic_buffers_t::iterator it = abs.begin(); +it != abs.end(); ++it, ++i) { + active_atomic_buffer &ab = 
it->second; + gl_active_atomic_buffer &mab = prog->AtomicBuffers[i]; + + /* Assign buffer-specific fields. */ + mab.Binding = it->first; + mab.MinimumSize = ab.size; + mab.Uniforms = rzalloc_array(prog->AtomicBuffers, GLuint, + ab.counters.size()); + mab.NumUniforms = ab.counters.size(); + + /* Assign counter-specific fields. */ + unsigned j = 0; + for (active_atomic_counters_t::iterator jt = ab.counters.begin(); +
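The core of find_active_atomic_counters() is a group-by on the binding point, with a running maximum of offset + size per group. The C sketch below uses a fixed-size array in place of std::map. `counter_decl`, `buffer_summary`, `summarize_atomic_buffers()` and MAX_BINDINGS are hypothetical, and the per-stage reference counting (`stage_references`) is left out.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BINDINGS 8   /* hypothetical */

struct counter_decl {
   unsigned binding;
   unsigned offset;   /* assigned by the compiler */
   unsigned size;     /* bytes: 4 per counter, more for arrays */
};

struct buffer_summary {
   unsigned num_counters;
   unsigned min_size;   /* bytes the bound buffer must provide */
};

/* Group counter declarations by binding point and record, per binding,
 * the minimum buffer size max(offset + size).  Returns the number of
 * bindings that hold at least one counter. */
static unsigned
summarize_atomic_buffers(const struct counter_decl *decls, size_t count,
                         struct buffer_summary out[MAX_BINDINGS])
{
   unsigned active = 0;

   for (unsigned b = 0; b < MAX_BINDINGS; b++)
      out[b].num_counters = out[b].min_size = 0;

   for (size_t i = 0; i < count; i++) {
      assert(decls[i].binding < MAX_BINDINGS);

      struct buffer_summary *s = &out[decls[i].binding];
      unsigned end = decls[i].offset + decls[i].size;

      if (s->num_counters++ == 0)
         active++;   /* first counter seen at this binding */
      if (end > s->min_size)
         s->min_size = end;
   }
   return active;
}
```

In the patch, the per-binding size ends up in gl_active_atomic_buffer::MinimumSize, and the number of active bindings ends up in NumAtomicBuffers.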
[Mesa-dev] [PATCH 16/24] i965/gen7: Implement code generation for untyped atomic instructions.
--- src/mesa/drivers/dri/i965/brw_defines.h | 2 + src/mesa/drivers/dri/i965/brw_eu.h | 9 + src/mesa/drivers/dri/i965/brw_eu_emit.c | 62 + src/mesa/drivers/dri/i965/brw_fs.cpp| 2 + src/mesa/drivers/dri/i965/brw_fs.h | 5 +++ src/mesa/drivers/dri/i965/brw_fs_emit.cpp | 21 ++ src/mesa/drivers/dri/i965/brw_vec4.cpp | 2 + src/mesa/drivers/dri/i965/brw_vec4.h| 5 +++ src/mesa/drivers/dri/i965/brw_vec4_emit.cpp | 22 ++ 9 files changed, 130 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index e9e0c4a..ccb4ce4 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -770,6 +770,8 @@ enum opcode { SHADER_OPCODE_SHADER_TIME_ADD, + SHADER_OPCODE_UNTYPED_ATOMIC, + FS_OPCODE_DDX, FS_OPCODE_DDY, FS_OPCODE_PIXEL_X, diff --git a/src/mesa/drivers/dri/i965/brw_eu.h b/src/mesa/drivers/dri/i965/brw_eu.h index 720bc74..212d916 100644 --- a/src/mesa/drivers/dri/i965/brw_eu.h +++ b/src/mesa/drivers/dri/i965/brw_eu.h @@ -422,6 +422,15 @@ void brw_CMP(struct brw_compile *p, struct brw_reg src0, struct brw_reg src1); +void +brw_untyped_atomic(struct brw_compile *p, + struct brw_reg dest, + struct brw_reg mrf, + GLuint atomic_op, + GLuint bind_table_index, + GLuint msg_length, + GLuint response_length); + /*** * brw_eu_util.c: */ diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c index cce8752..f39bf99 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -2465,6 +2465,68 @@ brw_svb_write(struct brw_compile *p, send_commit_msg); /* send_commit_msg */ } +static void +brw_set_dp_untyped_atomic_message(struct brw_compile *p, + struct brw_instruction *insn, + GLuint atomic_op, + GLuint bind_table_index, + GLuint msg_length, + GLuint response_length, + bool header_present) +{ + if (p->brw->is_haswell) { + brw_set_message_descriptor(p, insn, HSW_SFID_DATAPORT_DATA_CACHE_1, + msg_length, response_length, + 
header_present, false); + + + if (insn->header.access_mode == BRW_ALIGN_1) { + if (insn->header.execution_size != BRW_EXECUTE_16) +insn->bits3.ud |= 1 << 12; /* SIMD8 mode */ + + insn->bits3.gen7_dp.msg_type = +HSW_DATAPORT_DC_PORT1_UNTYPED_ATOMIC_OP; + } else { + insn->bits3.gen7_dp.msg_type = +HSW_DATAPORT_DC_PORT1_UNTYPED_ATOMIC_OP_SIMD4X2; + } + + } else { + brw_set_message_descriptor(p, insn, GEN7_SFID_DATAPORT_DATA_CACHE, + msg_length, response_length, + header_present, false); + + insn->bits3.gen7_dp.msg_type = GEN7_DATAPORT_DC_UNTYPED_ATOMIC_OP; + + if (insn->header.execution_size != BRW_EXECUTE_16) + insn->bits3.ud |= 1 << 12; /* SIMD8 mode */ + } + + if (response_length) + insn->bits3.ud |= 1 << 13; /* Return data expected */ + + insn->bits3.gen7_dp.binding_table_index = bind_table_index; + insn->bits3.ud |= atomic_op << 8; +} + +void +brw_untyped_atomic(struct brw_compile *p, + struct brw_reg dest, + struct brw_reg mrf, + GLuint atomic_op, + GLuint bind_table_index, + GLuint msg_length, + GLuint response_length) { + struct brw_instruction *insn = brw_next_insn(p, BRW_OPCODE_SEND); + + brw_set_dest(p, insn, retype(dest, BRW_REGISTER_TYPE_UD)); + brw_set_src0(p, insn, retype(mrf, BRW_REGISTER_TYPE_UD)); + brw_set_src1(p, insn, brw_imm_d(0)); + brw_set_dp_untyped_atomic_message( + p, insn, atomic_op, bind_table_index, msg_length, response_length, + insn->header.access_mode == BRW_ALIGN_1); +} + /** * This instruction is generated as a single-channel align1 instruction by * both the VS and FS stages when using INTEL_DEBUG=shader_time. diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index a98e7c7..4f1a665 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -738,6 +738,8 @@ fs_visitor::implied_mrf_writes(fs_inst *inst) return inst->mlen; case FS_OPCODE_SPILL: return 2; + case SHADER_OPCODE_UNTYPED_ATOMIC: + return 0; default: assert(!"not reached"); return inst->mlen; diff
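The descriptor bits that brw_set_dp_untyped_atomic_message() ORs in on the Ivy Bridge path can be collected in one helper. This C sketch is illustrative only. The helper name is hypothetical, and the message type, SFID and message/response lengths are omitted. The binding table index is assumed to occupy bits 7:0 (the patch writes it through the `gen7_dp.binding_table_index` bitfield), and the atomic opcode is assumed to fit in bits 11:8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the shifts used in the patch's IVB path. */
static uint32_t
untyped_atomic_desc_bits(unsigned atomic_op, unsigned bind_table_index,
                         bool simd16, unsigned response_length)
{
   uint32_t bits = 0;

   assert(bind_table_index <= 0xff);   /* assumed bits 7:0 */
   assert(atomic_op <= 0xf);           /* assumed bits 11:8 */

   bits |= bind_table_index;
   bits |= (uint32_t)atomic_op << 8;   /* atomic operation */
   if (!simd16)
      bits |= 1u << 12;                /* SIMD8 mode */
   if (response_length)
      bits |= 1u << 13;                /* return data expected */
   return bits;
}
```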
[Mesa-dev] [PATCH 15/24] i965: Implement ABO surface state emission.
The maximum number of atomic buffer objects per shader (BRW_MAX_ABO, currently 4) is somewhat arbitrary. It can easily be raised in the future if it turns out not to be enough.
--- src/mesa/drivers/dri/i965/brw_context.h | 17 +++-- src/mesa/drivers/dri/i965/brw_gs_surface_state.c | 19 ++ src/mesa/drivers/dri/i965/brw_state.h| 3 ++ src/mesa/drivers/dri/i965/brw_state_upload.c | 4 +++ src/mesa/drivers/dri/i965/brw_vs_surface_state.c | 19 ++ src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 44 6 files changed, 104 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 3003d15..3f2f297 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -157,6 +157,7 @@ enum brw_state_id { BRW_STATE_RASTERIZER_DISCARD, BRW_STATE_STATS_WM, BRW_STATE_UNIFORM_BUFFER, + BRW_STATE_ATOMIC_BUFFER, BRW_STATE_META_IN_PROGRESS, BRW_STATE_INTERPOLATION_MAP, BRW_STATE_PUSH_CONSTANT_ALLOCATION, @@ -195,6 +196,7 @@ enum brw_state_id { #define BRW_NEW_RASTERIZER_DISCARD (1 << BRW_STATE_RASTERIZER_DISCARD) #define BRW_NEW_STATS_WM (1 << BRW_STATE_STATS_WM) #define BRW_NEW_UNIFORM_BUFFER (1 << BRW_STATE_UNIFORM_BUFFER) +#define BRW_NEW_ATOMIC_BUFFER (1 << BRW_STATE_ATOMIC_BUFFER) #define BRW_NEW_META_IN_PROGRESS(1 << BRW_STATE_META_IN_PROGRESS) #define BRW_NEW_INTERPOLATION_MAP (1 << BRW_STATE_INTERPOLATION_MAP) #define BRW_NEW_PUSH_CONSTANT_ALLOCATION (1 << BRW_STATE_PUSH_CONSTANT_ALLOCATION) @@ -570,6 +572,12 @@ struct brw_gs_prog_data /** Max number of render targets in a shader */ #define BRW_MAX_DRAW_BUFFERS 8 +/** Max number of uniform buffer objects in a shader */ +#define BRW_MAX_UBO 12 + +/** Max number of atomic counter buffer objects in a shader */ +#define BRW_MAX_ABO 4 + /** * Max number of binding table entries used for stream output. 
* @@ -662,14 +670,16 @@ struct brw_gs_prog_data #define SURF_INDEX_FRAG_CONST_BUFFER (BRW_MAX_DRAW_BUFFERS + 1) #define SURF_INDEX_TEXTURE(t)(BRW_MAX_DRAW_BUFFERS + 2 + (t)) #define SURF_INDEX_WM_UBO(u) (SURF_INDEX_TEXTURE(BRW_MAX_TEX_UNIT) + u) -#define SURF_INDEX_WM_SHADER_TIME(SURF_INDEX_WM_UBO(12)) +#define SURF_INDEX_WM_ABO(a) (SURF_INDEX_WM_UBO(BRW_MAX_UBO) + a) +#define SURF_INDEX_WM_SHADER_TIME(SURF_INDEX_WM_ABO(BRW_MAX_ABO)) /** Maximum size of the binding table. */ #define BRW_MAX_WM_SURFACES (SURF_INDEX_WM_SHADER_TIME + 1) #define SURF_INDEX_VEC4_CONST_BUFFER (0) #define SURF_INDEX_VEC4_TEXTURE(t) (SURF_INDEX_VEC4_CONST_BUFFER + 1 + (t)) #define SURF_INDEX_VEC4_UBO(u) (SURF_INDEX_VEC4_TEXTURE(BRW_MAX_TEX_UNIT) + u) -#define SURF_INDEX_VEC4_SHADER_TIME (SURF_INDEX_VEC4_UBO(12)) +#define SURF_INDEX_VEC4_ABO(a) (SURF_INDEX_VEC4_UBO(BRW_MAX_UBO) + a) +#define SURF_INDEX_VEC4_SHADER_TIME (SURF_INDEX_VEC4_ABO(BRW_MAX_ABO)) #define BRW_MAX_VEC4_SURFACES(SURF_INDEX_VEC4_SHADER_TIME + 1) #define SURF_INDEX_GEN6_SOL_BINDING(t) (t) @@ -1456,6 +1466,9 @@ brw_update_sol_surface(struct brw_context *brw, void brw_upload_ubo_surfaces(struct brw_context *brw, struct gl_shader *shader, uint32_t *surf_offsets); +void brw_upload_abo_surfaces(struct brw_context *brw, + struct gl_shader_program *prog, + uint32_t *surf_offsets); /* brw_surface_formats.c */ bool brw_is_hiz_depth_format(struct brw_context *ctx, gl_format format); diff --git a/src/mesa/drivers/dri/i965/brw_gs_surface_state.c b/src/mesa/drivers/dri/i965/brw_gs_surface_state.c index bae6015..668bea5 100644 --- a/src/mesa/drivers/dri/i965/brw_gs_surface_state.c +++ b/src/mesa/drivers/dri/i965/brw_gs_surface_state.c @@ -88,6 +88,25 @@ const struct brw_tracked_state brw_gs_ubo_surfaces = { .emit = brw_upload_gs_ubo_surfaces, }; +static void +brw_upload_gs_abo_surfaces(struct brw_context *brw) +{ + struct gl_context *ctx = &brw->ctx; + struct gl_shader_program *prog = ctx->Shader.CurrentGeometryProgram; + + if (prog) + 
brw_upload_abo_surfaces( + brw, prog, &brw->gs.base.surf_offset[SURF_INDEX_VEC4_ABO(0)]); +} + +const struct brw_tracked_state brw_gs_abo_surfaces = { + .dirty = { + .mesa = _NEW_PROGRAM, + .brw = BRW_NEW_BATCH | BRW_NEW_UNIFORM_BUFFER, + .cache = 0, + }, + .emit = brw_upload_gs_abo_surfaces, +}; /** * Constructs the binding table for the WM surface state, which maps unit diff --git a/src/mesa/drivers/dri/i965/brw_state.h b/src/mesa/drivers/dri/i965/brw_state.h index 04c1a97..1d01406 100644 --- a/src/mesa/drivers/dri/i965/brw_state.h +++ b/src/mesa/drivers/dri/i965/brw_state.h @@ -71,7 +71,9 @@ extern const struct brw_tracked_state brw_vs_prog; extern const struct brw_tracked_state brw_vs_samplers; extern const struct brw_tracked
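The macros from this patch let the binding table layout be checked arithmetically. The C sketch repeats them with parenthesized arguments. BRW_MAX_TEX_UNIT is not part of the diff, so the value 16 used here is an assumption, and the concrete indices depend on it.

```c
#include <assert.h>

/* Values from the patch, except BRW_MAX_TEX_UNIT (assumed for the sketch). */
#define BRW_MAX_DRAW_BUFFERS 8
#define BRW_MAX_TEX_UNIT     16   /* assumption */
#define BRW_MAX_UBO          12
#define BRW_MAX_ABO          4

/* Fragment (WM) binding table: ABOs sit between UBOs and shader time. */
#define SURF_INDEX_TEXTURE(t)      (BRW_MAX_DRAW_BUFFERS + 2 + (t))
#define SURF_INDEX_WM_UBO(u)       (SURF_INDEX_TEXTURE(BRW_MAX_TEX_UNIT) + (u))
#define SURF_INDEX_WM_ABO(a)       (SURF_INDEX_WM_UBO(BRW_MAX_UBO) + (a))
#define SURF_INDEX_WM_SHADER_TIME  (SURF_INDEX_WM_ABO(BRW_MAX_ABO))
#define BRW_MAX_WM_SURFACES        (SURF_INDEX_WM_SHADER_TIME + 1)

/* VS/GS (vec4) binding table, same ordering. */
#define SURF_INDEX_VEC4_CONST_BUFFER (0)
#define SURF_INDEX_VEC4_TEXTURE(t)   (SURF_INDEX_VEC4_CONST_BUFFER + 1 + (t))
#define SURF_INDEX_VEC4_UBO(u)       (SURF_INDEX_VEC4_TEXTURE(BRW_MAX_TEX_UNIT) + (u))
#define SURF_INDEX_VEC4_ABO(a)       (SURF_INDEX_VEC4_UBO(BRW_MAX_UBO) + (a))
#define SURF_INDEX_VEC4_SHADER_TIME  (SURF_INDEX_VEC4_ABO(BRW_MAX_ABO))
#define BRW_MAX_VEC4_SURFACES        (SURF_INDEX_VEC4_SHADER_TIME + 1)
```

Under that assumption, the four ABO slots occupy indices 38 to 41 in the fragment binding table and 29 to 32 in the vec4 tables. The shader-time surface moves up by BRW_MAX_ABO entries in both.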
[Mesa-dev] [PATCH 18/24] i965: Add a 'has_side_effects' back-end instruction predicate.
Analogous to the GLSL IR predicate with the same name. This patch fixes the three dead code elimination passes and the VEC4/FS instruction scheduling passes so they leave instructions with side effects alone. At some point it might be interesting to have the instruction scheduler calculate the exact memory dependencies between atomic ops, but they're rare enough that it seems unlikely that it will make any practical difference. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 25 +- .../drivers/dri/i965/brw_schedule_instructions.cpp | 6 +- src/mesa/drivers/dri/i965/brw_shader.cpp | 11 ++ src/mesa/drivers/dri/i965/brw_shader.h | 7 ++ src/mesa/drivers/dri/i965/brw_vec4.cpp | 2 +- 5 files changed, 34 insertions(+), 17 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 4afe37b..453752c 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -1805,7 +1805,7 @@ fs_visitor::dead_code_eliminate() foreach_list_safe(node, &this->instructions) { fs_inst *inst = (fs_inst *)node; - if (inst->dst.file == GRF) { + if (inst->dst.file == GRF && !inst->has_side_effects()) { assert(this->virtual_grf_end[inst->dst.reg] >= pc); if (this->virtual_grf_end[inst->dst.reg] == pc) { inst->remove(); @@ -1943,31 +1943,26 @@ fs_visitor::dead_code_eliminate_local() get_dead_code_hash_entry(ht, inst->dst.reg, inst->dst.reg_offset); -if (inst->is_partial_write()) { - /* For a partial write, we can't remove any previous dead code -* candidate, since we're just modifying their result, but we can -* be dead code eliminiated ourselves. -*/ - if (entry) { - entry->data = inst; +if (entry) { + if (inst->is_partial_write()) { + /* For a partial write, we can't remove any previous dead code + * candidate, since we're just modifying their result. 
+ */ } else { - insert_dead_code_hash(ht, inst->dst.reg, inst->dst.reg_offset, -inst); - } -} else { - if (entry) { /* We're completely updating a channel, and there was a * previous write to the channel that wasn't read. Kill it! */ fs_inst *inst = (fs_inst *)entry->data; inst->remove(); progress = true; - _mesa_hash_table_remove(ht, entry); } + _mesa_hash_table_remove(ht, entry); +} + +if (!inst->has_side_effects()) insert_dead_code_hash(ht, inst->dst.reg, inst->dst.reg_offset, inst); -} } } } diff --git a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp index 5530683..a688336 100644 --- a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp +++ b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp @@ -562,7 +562,8 @@ fs_instruction_scheduler::calculate_deps() schedule_node *n = (schedule_node *)node; fs_inst *inst = (fs_inst *)n->inst; - if (inst->opcode == FS_OPCODE_PLACEHOLDER_HALT) + if (inst->opcode == FS_OPCODE_PLACEHOLDER_HALT || + inst->has_side_effects()) add_barrier_deps(n); /* read-after-write deps. */ @@ -795,6 +796,9 @@ vec4_instruction_scheduler::calculate_deps() schedule_node *n = (schedule_node *)node; vec4_instruction *inst = (vec4_instruction *)n->inst; + if (inst->has_side_effects()) + add_barrier_deps(n); + /* read-after-write deps. 
*/ for (int i = 0; i < 3; i++) { if (inst->src[i].file == GRF) { diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp index 53364a5..7a47c6c 100644 --- a/src/mesa/drivers/dri/i965/brw_shader.cpp +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp @@ -566,6 +566,17 @@ backend_instruction::is_control_flow() } } +bool +backend_instruction::has_side_effects() const +{ + switch (opcode) { + case SHADER_OPCODE_UNTYPED_ATOMIC: + return true; + default: + return false; + } +} + void backend_visitor::dump_instructions() { diff --git a/src/mesa/drivers/dri/i965/brw_shader.h b/src/mesa/drivers/dri/i965/brw_shader.h index 55769ff..88a5673 100644 --- a/src/mesa/drivers/dri/i965/brw_shader.h +++ b/src/mesa/drivers/dri/i965/brw_shader.h @@ -46,6 +46,13 @@ public: bool is_math(); bool is_control_flow(); + /** +* True if the instruction has side effec
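The reworked dead_code_eliminate_local() logic can be modelled with an array indexed by register instead of a hash table. The `inst` struct, NUM_REGS and `dce_local()` in this simplified C sketch are hypothetical. Reads are assumed to clear a pending candidate, which the quoted hunk does not show. The rules are:

- a later full write kills the previous unread write;
- a partial write merely forgets it;
- instructions with side effects are never recorded as candidates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_REGS 16   /* hypothetical register count */

struct inst {
   int dst;            /* register written */
   int src[2];         /* registers read, -1 if unused */
   bool partial;       /* writes only part of dst */
   bool side_effects;  /* e.g. an untyped atomic */
   bool removed;
};

/* Returns the number of instructions removed. */
static unsigned
dce_local(struct inst *insts, size_t count)
{
   struct inst *pending[NUM_REGS] = { NULL };   /* last unread write */
   unsigned progress = 0;

   for (size_t i = 0; i < count; i++) {
      struct inst *inst = &insts[i];
      assert(inst->dst >= 0 && inst->dst < NUM_REGS);

      /* A read makes the pending write to that register live. */
      for (int s = 0; s < 2; s++)
         if (inst->src[s] >= 0)
            pending[inst->src[s]] = NULL;

      struct inst *prev = pending[inst->dst];
      if (prev) {
         if (!inst->partial) {
            prev->removed = true;   /* overwritten before being read */
            progress++;
         }
         pending[inst->dst] = NULL;
      }

      /* Side-effecting instructions can never become candidates. */
      if (!inst->side_effects)
         pending[inst->dst] = inst;
   }
   return progress;
}
```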
[Mesa-dev] [PATCH 17/24] i965/gen7: Implement code generation for untyped surface read instructions.
--- src/mesa/drivers/dri/i965/brw_defines.h | 1 + src/mesa/drivers/dri/i965/brw_eu.h | 8 + src/mesa/drivers/dri/i965/brw_eu_emit.c | 56 + src/mesa/drivers/dri/i965/brw_fs.cpp| 1 + src/mesa/drivers/dri/i965/brw_fs.h | 4 +++ src/mesa/drivers/dri/i965/brw_fs_emit.cpp | 18 ++ src/mesa/drivers/dri/i965/brw_vec4.cpp | 1 + src/mesa/drivers/dri/i965/brw_vec4.h| 4 +++ src/mesa/drivers/dri/i965/brw_vec4_emit.cpp | 19 ++ 9 files changed, 112 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index ccb4ce4..a04d82c 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -771,6 +771,7 @@ enum opcode { SHADER_OPCODE_SHADER_TIME_ADD, SHADER_OPCODE_UNTYPED_ATOMIC, + SHADER_OPCODE_UNTYPED_SURFACE_READ, FS_OPCODE_DDX, FS_OPCODE_DDY, diff --git a/src/mesa/drivers/dri/i965/brw_eu.h b/src/mesa/drivers/dri/i965/brw_eu.h index 212d916..83d830d 100644 --- a/src/mesa/drivers/dri/i965/brw_eu.h +++ b/src/mesa/drivers/dri/i965/brw_eu.h @@ -431,6 +431,14 @@ brw_untyped_atomic(struct brw_compile *p, GLuint msg_length, GLuint response_length); +void +brw_untyped_surface_read(struct brw_compile *p, + struct brw_reg dest, + struct brw_reg mrf, + GLuint bind_table_index, + GLuint msg_length, + GLuint response_length); + /*** * brw_eu_util.c: */ diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c index f39bf99..7484649 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -2527,6 +2527,62 @@ brw_untyped_atomic(struct brw_compile *p, insn->header.access_mode == BRW_ALIGN_1); } +static void +brw_set_dp_untyped_surface_read_message(struct brw_compile *p, +struct brw_instruction *insn, +GLuint bind_table_index, +GLuint msg_length, +GLuint response_length, +bool header_present) +{ + const unsigned dispatch_width = + (insn->header.execution_size == BRW_EXECUTE_16 ? 
16 : 8); + const unsigned num_channels = response_length / (dispatch_width / 8); + + if (p->brw->is_haswell) { + brw_set_message_descriptor(p, insn, HSW_SFID_DATAPORT_DATA_CACHE_1, + msg_length, response_length, + header_present, false); + + insn->bits3.gen7_dp.msg_type = HSW_DATAPORT_DC_PORT1_UNTYPED_SURFACE_READ; + } else { + brw_set_message_descriptor(p, insn, GEN7_SFID_DATAPORT_DATA_CACHE, + msg_length, response_length, + header_present, false); + + insn->bits3.gen7_dp.msg_type = GEN7_DATAPORT_DC_UNTYPED_SURFACE_READ; + } + + if (insn->header.access_mode == BRW_ALIGN_1) { + if (dispatch_width == 16) + insn->bits3.ud |= 1 << 12; /* SIMD16 mode */ + else + insn->bits3.ud |= 2 << 12; /* SIMD8 mode */ + } + + insn->bits3.gen7_dp.binding_table_index = bind_table_index; + + /* Set mask of 32-bit channels to drop. */ + insn->bits3.ud |= (0xf & (0xf << num_channels)) << 8; +} + +void +brw_untyped_surface_read(struct brw_compile *p, + struct brw_reg dest, + struct brw_reg mrf, + GLuint bind_table_index, + GLuint msg_length, + GLuint response_length) +{ + struct brw_instruction *insn = next_insn(p, BRW_OPCODE_SEND); + + brw_set_dest(p, insn, retype(dest, BRW_REGISTER_TYPE_UD)); + brw_set_src0(p, insn, retype(mrf, BRW_REGISTER_TYPE_UD)); + brw_set_dp_untyped_surface_read_message( + p, insn, bind_table_index, msg_length, response_length, + insn->header.access_mode == BRW_ALIGN_1); +} + /** * This instruction is generated as a single-channel align1 instruction by * both the VS and FS stages when using INTEL_DEBUG=shader_time. 
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 4f1a665..4afe37b 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -739,6 +739,7 @@ fs_visitor::implied_mrf_writes(fs_inst *inst) case FS_OPCODE_SPILL: return 2; case SHADER_OPCODE_UNTYPED_ATOMIC: + case SHADER_OPCODE_UNTYPED_SURFACE_READ: return 0; default: assert(!"not reached"); diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index 27a47fa..dcd489c 100644 --
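The descriptor arithmetic in brw_set_dp_untyped_surface_read_message() is easier to follow with concrete numbers. The following standalone C sketch recomputes the two fields the patch ORs into bits3.ud: the SIMD mode starting at bit 12 (only set in align1 mode) and the mask of 32-bit channels to drop starting at bit 8. The helper names are hypothetical. The functions only mirror the expressions in the patch and are not part of the i965 driver.

```c
#include <assert.h>
#include <stdint.h>

/* SIMD mode field starting at bit 12, as the patch sets it for align1
 * untyped surface reads: 1 selects SIMD16, 2 selects SIMD8. */
static uint32_t untyped_read_simd_mode_bits(unsigned dispatch_width)
{
   assert(dispatch_width == 8 || dispatch_width == 16);
   return (dispatch_width == 16 ? 1u : 2u) << 12;
}

/* Mask of 32-bit channels to drop, starting at bit 8.  Each returned
 * channel occupies dispatch_width / 8 registers, so the response length
 * tells how many channels (counting from the first) are kept; the bits
 * of all remaining channels are set. */
static uint32_t untyped_read_channel_mask_bits(unsigned response_length,
                                               unsigned dispatch_width)
{
   assert(dispatch_width == 8 || dispatch_width == 16);
   const unsigned num_channels = response_length / (dispatch_width / 8);
   assert(num_channels >= 1 && num_channels <= 4);
   return (0xfu & (0xfu << num_channels)) << 8;
}
```

For example, a SIMD8 read of one counter (response length 1) keeps only the first channel and yields a drop mask of 0xe00, so G, B and A are dropped. A SIMD16 read with response length 2 gives the same mask, because each returned channel then spans two registers.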
[Mesa-dev] [PATCH 19/24] i965: Handle the 'atomic_uint' GLSL type.
--- src/mesa/drivers/dri/i965/brw_fs.cpp | 2 ++ src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 1 + src/mesa/drivers/dri/i965/brw_shader.cpp | 1 + src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 4 +++- 4 files changed, 7 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 453752c..c2b313b 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -481,6 +481,8 @@ fs_visitor::type_size(const struct glsl_type *type) * link time. */ return 0; + case GLSL_TYPE_ATOMIC_UINT: + return 0; case GLSL_TYPE_VOID: case GLSL_TYPE_ERROR: case GLSL_TYPE_INTERFACE: diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index aaadb1d..762832a 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -773,6 +773,7 @@ fs_visitor::emit_assignment_writes(fs_reg &l, fs_reg &r, break; case GLSL_TYPE_SAMPLER: + case GLSL_TYPE_ATOMIC_UINT: break; case GLSL_TYPE_VOID: diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp index 7a47c6c..caa03c8 100644 --- a/src/mesa/drivers/dri/i965/brw_shader.cpp +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp @@ -297,6 +297,7 @@ brw_type_for_base_type(const struct glsl_type *type) return brw_type_for_base_type(type->fields.array); case GLSL_TYPE_STRUCT: case GLSL_TYPE_SAMPLER: + case GLSL_TYPE_ATOMIC_UINT: /* These should be overridden with the type of the member when * dereferenced into. BRW_REGISTER_TYPE_UD seems like a likely * way to trip up if we don't. diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp index a13ce1b..a19686b 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp @@ -567,6 +567,8 @@ type_size(const struct glsl_type *type) * at link time. 
*/ return 1; + case GLSL_TYPE_ATOMIC_UINT: + return 0; case GLSL_TYPE_VOID: case GLSL_TYPE_ERROR: case GLSL_TYPE_INTERFACE: @@ -971,7 +973,7 @@ vec4_visitor::visit(ir_variable *ir) * ir_binop_ubo_load expressions and not ir_dereference_variable for UBO * variables, so no need for them to be in variable_ht. */ - if (ir->is_in_uniform_block()) + if (ir->is_in_uniform_block() || ir->type->atomic_size()) return; /* Track how big the whole uniform variable is, in case we need to put a -- 1.8.3.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 0/6] Support for 10 bpc EGLSurface
Hi, This little series adds support for creating EGLSurfaces with color buffers using the ARGB2101010 pixel format. With the new KMS addFB2 ioctl we can create KMS framebuffers with that format, and this series ends up adding the pixel format to gbm so we can generate buffers with that format. The first two patches make sure we don't advertise ARGB2101010 configs as compatible with an ARGB X window. The X visual to EGL config matching just compares visual depth and EGL config buffer size, and both are 32 bits for these two pixel formats. Unless we also match on the pixel layout, we will advertise 10 bpc EGLConfigs as usable with an ARGB X window. With this patch series, I can run weston on KMS in 10 bpc, and anything else that uses gbm will benefit as well. We also add support for 10 bpc GLX pixmaps and pbuffers. Kristian
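An ARGB2101010 pixel packs 2 bits of alpha and 10 bits each of red, green and blue into one 32-bit word. That is why its buffer size matches an 8 bpc ARGB pixel. The sketch below packs such a pixel in plain C. The bit positions follow the usual A:R:G:B 2:10:10:10 layout. The 8-to-10-bit expansion by bit replication is one common choice and is not something this series prescribes. The function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Pack 2-bit alpha and 10-bit red, green and blue into one 32-bit
 * ARGB2101010 pixel: alpha in bits 31:30, red in 29:20, green in 19:10,
 * blue in 9:0. */
static uint32_t pack_argb2101010(uint32_t a, uint32_t r, uint32_t g, uint32_t b)
{
   assert(a <= 0x3u && r <= 0x3ffu && g <= 0x3ffu && b <= 0x3ffu);
   return (a << 30) | (r << 20) | (g << 10) | b;
}

/* Widen an 8-bit channel to 10 bits by replicating its top bits into the
 * new low bits, so 0xff maps to full intensity (0x3ff) rather than 0x3fc. */
static uint32_t unorm8_to_unorm10(uint8_t c)
{
   return ((uint32_t)c << 2) | ((uint32_t)c >> 6);
}
```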
[Mesa-dev] [PATCH 1/6] egl_dri2: Match X11 visuals using rgba masks instead of depth
Matching visual depth against buffer size makes 8 bpc RGBA look the same as 10-bit RGB with 2-bit alpha: both have a buffer size of 32. Instead, build the rgba masks from the visual data and use them to find matching DRI configs. We need to keep the special case that allows us to match 24-bit visuals to DRI configs with buffer size 32. We do that by creating an alpha mask of "all the non-rgb bits" for 24-bit visuals and matching a second time with that. Signed-off-by: Kristian Høgsberg --- src/egl/drivers/dri2/platform_x11.c | 21 - 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/src/egl/drivers/dri2/platform_x11.c b/src/egl/drivers/dri2/platform_x11.c index ec76aec..d1ceb62 100644 --- a/src/egl/drivers/dri2/platform_x11.c +++ b/src/egl/drivers/dri2/platform_x11.c @@ -630,6 +630,7 @@ dri2_add_configs_for_visuals(struct dri2_egl_display *dri2_dpy, xcb_depth_iterator_t d; xcb_visualtype_t *visuals; int i, j, id; + unsigned int rgba_masks[4]; EGLint surface_type; EGLint config_attrs[] = { EGL_NATIVE_VISUAL_ID, 0, @@ -660,8 +661,26 @@ dri2_add_configs_for_visuals(struct dri2_egl_display *dri2_dpy, config_attrs[1] = visuals[i].visual_id; config_attrs[3] = visuals[i]._class; +rgba_masks[0] = visuals[i].red_mask; +rgba_masks[1] = visuals[i].green_mask; +rgba_masks[2] = visuals[i].blue_mask; +rgba_masks[3] = 0; dri2_add_config(disp, dri2_dpy->driver_configs[j], id++, - d.data->depth, surface_type, config_attrs, NULL); + 0, surface_type, config_attrs, rgba_masks); + +/* Allow a 24-bit RGB visual to match a 32-bit RGBA EGLConfig. + * Otherwise it will only match a 32-bit RGBA visual. On a + * composited window manager on X11, this will make all of the + * EGLConfigs with destination alpha get blended by the + * compositor. This is probably not what the application + * wants... especially on drivers that only have 32-bit RGBA + * EGLConfigs!
*/ +if (d.data->depth == 24) { + rgba_masks[3] = + ~(rgba_masks[0] | rgba_masks[1] | rgba_masks[2]); + dri2_add_config(disp, dri2_dpy->driver_configs[j], id++, + 0, surface_type, config_attrs, rgba_masks); +} } } -- 1.8.3.1
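The matching rule in this patch can be modelled in a few lines. This is a simplified sketch, not the dri2_add_config() code. The hypothetical masks_match() compares all four channel masks the way the memcmp() in dri2_add_config() does. fill_alpha_from_unused_bits() reproduces the 24-bit special case by treating every non-RGB bit as alpha, and it assumes a 32-bit unsigned int.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Simplified model of the mask test: a visual and a config match only
 * if their red, green, blue and alpha masks are all identical. */
static bool masks_match(const unsigned int visual[4],
                        const unsigned int config[4])
{
   return memcmp(visual, config, 4 * sizeof(unsigned int)) == 0;
}

/* The 24-bit special case: every bit not used by red, green or blue is
 * treated as alpha, so the visual can also match a 32-bit config with
 * destination alpha.  Assumes a 32-bit unsigned int. */
static void fill_alpha_from_unused_bits(unsigned int masks[4])
{
   masks[3] = ~(masks[0] | masks[1] | masks[2]);
}
```

With these helpers, consider a 24-bit visual with R/G/B masks 0x00ff0000, 0x0000ff00 and 0x000000ff. It fails to match an 8 bpc ARGB config on the first pass and matches it on the second pass. A 10 bpc config never matches because its red mask differs.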
[Mesa-dev] [PATCH 2/6] egl_dri2: Remove depth argument from dri2_add_config()
All callers now use the more correct rgba mask mechanism for selecting matching DRI configs. Even if depth and buffer size match, the color component layout can be different, or, in the case of ARGB and ARGB2101010, the color components can even have different sizes. Since anything that the depth check would reject is also rejected by the rgba mask comparison, the depth parameter is redundant and not specific enough. We should probably have removed it when the rgba masks argument was introduced, but better late than never. Signed-off-by: Kristian Høgsberg --- src/egl/drivers/dri2/egl_dri2.c | 12 +--- src/egl/drivers/dri2/egl_dri2.h | 2 +- src/egl/drivers/dri2/platform_android.c | 14 ++ src/egl/drivers/dri2/platform_drm.c | 2 +- src/egl/drivers/dri2/platform_wayland.c | 4 ++-- src/egl/drivers/dri2/platform_x11.c | 4 ++-- 6 files changed, 13 insertions(+), 25 deletions(-) diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c index 04ab564..86cbb20 100644 --- a/src/egl/drivers/dri2/egl_dri2.c +++ b/src/egl/drivers/dri2/egl_dri2.c @@ -116,7 +116,7 @@ dri2_match_config(const _EGLConfig *conf, const _EGLConfig *criteria) struct dri2_egl_config * dri2_add_config(_EGLDisplay *disp, const __DRIconfig *dri_config, int id, - int depth, EGLint surface_type, const EGLint *attr_list, + EGLint surface_type, const EGLint *attr_list, const unsigned int *rgba_masks) { struct dri2_egl_config *conf; @@ -200,16 +200,6 @@ dri2_add_config(_EGLDisplay *disp, const __DRIconfig *dri_config, int id, for (i = 0; attr_list[i] != EGL_NONE; i += 2) _eglSetConfigKey(&base, attr_list[i], attr_list[i+1]); - /* Allow a 24-bit RGB visual to match a 32-bit RGBA EGLConfig. Otherwise -* it will only match a 32-bit RGBA visual. On a composited window manager -* on X11, this will make all of the EGLConfigs with destination alpha get -* blended by the compositor. This is probably not what the application -* wants... especially on drivers that only have 32-bit RGBA EGLConfigs!
-*/ - if (depth > 0 && depth != base.BufferSize - && !(depth == 24 && base.BufferSize == 32)) - return NULL; - if (rgba_masks && memcmp(rgba_masks, dri_masks, sizeof(dri_masks))) return NULL; diff --git a/src/egl/drivers/dri2/egl_dri2.h b/src/egl/drivers/dri2/egl_dri2.h index fba5f81..4a39efb 100644 --- a/src/egl/drivers/dri2/egl_dri2.h +++ b/src/egl/drivers/dri2/egl_dri2.h @@ -246,7 +246,7 @@ dri2_lookup_egl_image(__DRIscreen *screen, void *image, void *data); struct dri2_egl_config * dri2_add_config(_EGLDisplay *disp, const __DRIconfig *dri_config, int id, - int depth, EGLint surface_type, const EGLint *attr_list, + EGLint surface_type, const EGLint *attr_list, const unsigned int *rgba_masks); _EGLImage * diff --git a/src/egl/drivers/dri2/platform_android.c b/src/egl/drivers/dri2/platform_android.c index ff41e83..2c20de7 100644 --- a/src/egl/drivers/dri2/platform_android.c +++ b/src/egl/drivers/dri2/platform_android.c @@ -547,14 +547,13 @@ droid_add_configs_for_visuals(_EGLDriver *drv, _EGLDisplay *dpy) struct dri2_egl_display *dri2_dpy = dri2_egl_display(dpy); const struct { int format; - int size; unsigned int rgba_masks[4]; } visuals[] = { - { HAL_PIXEL_FORMAT_RGBA_, 32, { 0xff, 0xff00, 0xff, 0xff00 } }, - { HAL_PIXEL_FORMAT_RGBX_, 32, { 0xff, 0xff00, 0xff, 0x0 } }, - { HAL_PIXEL_FORMAT_RGB_888, 24, { 0xff, 0xff00, 0xff, 0x0 } }, - { HAL_PIXEL_FORMAT_RGB_565, 16, { 0xf800, 0x7e0, 0x1f, 0x0 } }, - { HAL_PIXEL_FORMAT_BGRA_, 32, { 0xff, 0xff00, 0xff, 0xff00 } }, + { HAL_PIXEL_FORMAT_RGBA_, { 0xff, 0xff00, 0xff, 0xff00 } }, + { HAL_PIXEL_FORMAT_RGBX_, { 0xff, 0xff00, 0xff, 0x0 } }, + { HAL_PIXEL_FORMAT_RGB_888, { 0xff, 0xff00, 0xff, 0x0 } }, + { HAL_PIXEL_FORMAT_RGB_565, { 0xf800, 0x7e0, 0x1f, 0x0 } }, + { HAL_PIXEL_FORMAT_BGRA_, { 0xff, 0xff00, 0xff, 0xff00 } }, { 0, 0, { 0, 0, 0, 0 } } }; int count, i, j; @@ -576,8 +575,7 @@ droid_add_configs_for_visuals(_EGLDriver *drv, _EGLDisplay *dpy) continue; dri2_conf = dri2_add_config(dpy, dri2_dpy->driver_configs[j], - 
count + 1, visuals[i].size, surface_type, NULL, - visuals[i].rgba_masks); + count + 1, surface_type, NULL, visuals[i].rgba_masks); if (dri2_conf) { dri2_conf->base.NativeVisualID = visuals[i].format; dri2_conf->base.NativeVisualType = visuals[i].format; diff --git a/src/egl/drivers/dri2/platform_drm.c b/src/egl/drivers/dri2/platform_drm.c index 615648b..fb28bd9 100644 --- a/src/egl/drivers/dri2/platform_drm.c +++ b/src/egl/drivers/dri2/platform_drm.c @@ -478,7 +478,7 @@ dri2_ini
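The argument for dropping the depth parameter can be checked numerically. EGL's buffer size is the sum of the channel sizes, so an 8 bpc ARGB layout and ARGB2101010 both come out at 32 bits even though their masks differ. Identical masks always imply identical buffer sizes, so the mask comparison subsumes the depth check. The helper names in this sketch are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Number of bits set in a channel mask, i.e. the channel's size. */
static unsigned mask_bits(unsigned int mask)
{
   unsigned n = 0;
   for (; mask != 0; mask &= mask - 1)  /* clear the lowest set bit */
      n++;
   return n;
}

/* Buffer size of an RGBA layout: the sum of the four channel sizes. */
static unsigned buffer_size(const unsigned int masks[4])
{
   return mask_bits(masks[0]) + mask_bits(masks[1]) +
          mask_bits(masks[2]) + mask_bits(masks[3]);
}

/* The stricter test the patch keeps: identical channel masks. */
static bool same_layout(const unsigned int a[4], const unsigned int b[4])
{
   return memcmp(a, b, 4 * sizeof(unsigned int)) == 0;
}
```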
[Mesa-dev] [PATCH 3/6] dri/common: Add support for creating ARGB2101010 configs
This extends the common dri driver infrastructure with the ability to create __DRIconfigs for formats with 10 bits per color channel and 2 bits of alpha. A driver still has to support and request these formats, so this doesn't enable anything yet. Signed-off-by: Kristian Høgsberg --- src/mesa/drivers/dri/common/utils.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/src/mesa/drivers/dri/common/utils.c b/src/mesa/drivers/dri/common/utils.c index c9fc218..f3780d9 100644 --- a/src/mesa/drivers/dri/common/utils.c +++ b/src/mesa/drivers/dri/common/utils.c @@ -189,6 +189,10 @@ driCreateConfigs(gl_format format, { 0x00FF, 0xFF00, 0x00FF, 0x }, /* MESA_FORMAT_ARGB */ { 0x00FF, 0xFF00, 0x00FF, 0xFF00 }, + /* MESA_FORMAT_XRGB2101010_UNORM */ + { 0x3FF0, 0x000FFC00, 0x03FF, 0x }, + /* MESA_FORMAT_ARGB2101010 */ + { 0x3FF0, 0x000FFC00, 0x03FF, 0xC000 }, }; const uint32_t * masks; @@ -214,6 +218,12 @@ driCreateConfigs(gl_format format, case MESA_FORMAT_SARGB8: masks = masks_table[2]; break; + case MESA_FORMAT_XRGB2101010_UNORM: + masks = masks_table[3]; + break; + case MESA_FORMAT_ARGB2101010: + masks = masks_table[4]; + break; default: fprintf(stderr, "[%s:%u] Unknown framebuffer type %s (%d).\n", __FUNCTION__, __LINE__, -- 1.8.3.1
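The masks_table entries for the 10 bpc formats follow directly from the bit layout. Each channel mask is a run of ones as wide as the channel, shifted to the channel's position. Here is a small sketch with a hypothetical channel_mask() helper.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering `bits` consecutive bits starting at bit `shift`.
 * bits must stay below 32 so the left shift is well defined. */
static uint32_t channel_mask(unsigned bits, unsigned shift)
{
   assert(bits >= 1 && bits < 32 && shift + bits <= 32);
   return ((UINT32_C(1) << bits) - 1u) << shift;
}
```

channel_mask(10, 20), channel_mask(10, 10), channel_mask(10, 0) and channel_mask(2, 30) give the red, green, blue and alpha masks of ARGB2101010: 0x3ff00000, 0x000ffc00, 0x000003ff and 0xc0000000. Together they cover all 32 bits without overlapping.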
[Mesa-dev] [PATCH 4/6] i965: Create ARGB2101010 DRI configs
This commit enables ARGB2101010 system framebuffers (that is, DRI drawables) for the i965 drivers. This is done by generating DRI configs that advertise this color format as well as teaching intelCreateBuffer to pick the right color format when it sees such a DRI config. Signed-off-by: Kristian Høgsberg --- src/mesa/drivers/dri/i965/intel_screen.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c index eb6515e..a7d640c 100644 --- a/src/mesa/drivers/dri/i965/intel_screen.c +++ b/src/mesa/drivers/dri/i965/intel_screen.c @@ -904,6 +904,8 @@ intelCreateBuffer(__DRIscreen * driScrnPriv, if (mesaVis->redBits == 5) rgbFormat = MESA_FORMAT_RGB565; + else if (mesaVis->redBits == 10) + rgbFormat = MESA_FORMAT_ARGB2101010; else if (mesaVis->sRGBCapable) rgbFormat = MESA_FORMAT_SARGB8; else if (mesaVis->alphaBits == 0) @@ -1084,7 +1086,8 @@ intel_screen_make_configs(__DRIscreen *dri_screen) { static const gl_format formats[] = { MESA_FORMAT_RGB565, - MESA_FORMAT_ARGB + MESA_FORMAT_ARGB, + MESA_FORMAT_ARGB2101010 }; /* GLX_SWAP_COPY_OML is not supported due to page flipping. */ -- 1.8.3.1
[Mesa-dev] [PATCH 5/6] dri: Add __DRIimage support for the ARGB2101010 format
We add support for the ARGB2101010 color format to the DRI image extension, which allows DRI loaders to create a __DRIimage with this color format. Signed-off-by: Kristian Høgsberg --- include/GL/internal/dri_interface.h | 2 ++ src/mesa/drivers/dri/i965/intel_screen.c | 10 ++ 2 files changed, 12 insertions(+) diff --git a/include/GL/internal/dri_interface.h b/include/GL/internal/dri_interface.h index 709fece..33b41ea 100644 --- a/include/GL/internal/dri_interface.h +++ b/include/GL/internal/dri_interface.h @@ -964,6 +964,8 @@ struct __DRIdri2ExtensionRec { #define __DRI_IMAGE_FORMAT_R8 0x1006 /* Since version 5 */ #define __DRI_IMAGE_FORMAT_GR88 0x1007 #define __DRI_IMAGE_FORMAT_NONE 0x1008 +#define __DRI_IMAGE_FORMAT_XRGB2101010 0x1009 +#define __DRI_IMAGE_FORMAT_ARGB2101010 0x100a #define __DRI_IMAGE_USE_SHARE 0x0001 #define __DRI_IMAGE_USE_SCANOUT0x0002 diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c index a7d640c..25c9bcd 100644 --- a/src/mesa/drivers/dri/i965/intel_screen.c +++ b/src/mesa/drivers/dri/i965/intel_screen.c @@ -289,6 +289,12 @@ intel_allocate_image(int dri_format, void *loaderPrivate) case __DRI_IMAGE_FORMAT_XRGB: image->format = MESA_FORMAT_XRGB; break; +case __DRI_IMAGE_FORMAT_ARGB2101010: + image->format = MESA_FORMAT_ARGB2101010; + break; +case __DRI_IMAGE_FORMAT_XRGB2101010: + image->format = MESA_FORMAT_XRGB2101010_UNORM; + break; case __DRI_IMAGE_FORMAT_ARGB: image->format = MESA_FORMAT_ARGB; break; @@ -375,6 +381,10 @@ intel_dri_format(GLuint format) return __DRI_IMAGE_FORMAT_R8; case MESA_FORMAT_RG88: return __DRI_IMAGE_FORMAT_GR88; + case MESA_FORMAT_XRGB2101010_UNORM: + return __DRI_IMAGE_FORMAT_XRGB2101010; + case MESA_FORMAT_ARGB2101010: + return __DRI_IMAGE_FORMAT_ARGB2101010; } return MESA_FORMAT_NONE; -- 1.8.3.1
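The two switch statements in this patch must remain inverses of each other for the new formats. The sketch below models that round trip using the two __DRI_IMAGE_FORMAT_* values from the header hunk. The enum standing in for Mesa's gl_format and all function and macro names are hypothetical. In this sketch, the reverse mapping falls back to the DRI-side NONE code, so each function returns values from a single enum.

```c
#include <assert.h>

/* Values from the dri_interface.h hunk in this patch. */
#define SKETCH_DRI_IMAGE_FORMAT_NONE        0x1008
#define SKETCH_DRI_IMAGE_FORMAT_XRGB2101010 0x1009
#define SKETCH_DRI_IMAGE_FORMAT_ARGB2101010 0x100a

/* Hypothetical stand-in for the relevant gl_format enumerants. */
enum sketch_format {
   SKETCH_FORMAT_NONE,
   SKETCH_FORMAT_XRGB2101010,
   SKETCH_FORMAT_ARGB2101010
};

/* Loader-facing code to driver format (compare intel_allocate_image()). */
static enum sketch_format format_from_dri(int dri_format)
{
   switch (dri_format) {
   case SKETCH_DRI_IMAGE_FORMAT_XRGB2101010:
      return SKETCH_FORMAT_XRGB2101010;
   case SKETCH_DRI_IMAGE_FORMAT_ARGB2101010:
      return SKETCH_FORMAT_ARGB2101010;
   default:
      return SKETCH_FORMAT_NONE;
   }
}

/* Driver format back to the loader-facing code (compare intel_dri_format()). */
static int format_to_dri(enum sketch_format format)
{
   switch (format) {
   case SKETCH_FORMAT_XRGB2101010:
      return SKETCH_DRI_IMAGE_FORMAT_XRGB2101010;
   case SKETCH_FORMAT_ARGB2101010:
      return SKETCH_DRI_IMAGE_FORMAT_ARGB2101010;
   default:
      return SKETCH_DRI_IMAGE_FORMAT_NONE;
   }
}
```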
[Mesa-dev] [PATCH 6/6] gbm: Add support for gbm bos and surfaces using GBM_FORMAT_ARGB2101010
We can now add GBM support for the 10 bits/channel formats, which lets us create a gbm surface that we can use with KMS on display hardware that supports the format. Signed-off-by: Kristian Høgsberg --- src/gbm/backends/dri/gbm_dri.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/src/gbm/backends/dri/gbm_dri.c b/src/gbm/backends/dri/gbm_dri.c index f7da79c..067b7d3 100644 --- a/src/gbm/backends/dri/gbm_dri.c +++ b/src/gbm/backends/dri/gbm_dri.c @@ -548,6 +548,12 @@ gbm_dri_bo_create(struct gbm_device *gbm, case GBM_FORMAT_ABGR: dri_format = __DRI_IMAGE_FORMAT_ABGR; break; + case GBM_FORMAT_ARGB2101010: + dri_format = __DRI_IMAGE_FORMAT_ARGB2101010; + break; + case GBM_FORMAT_XRGB2101010: + dri_format = __DRI_IMAGE_FORMAT_XRGB2101010; + break; default: return NULL; } -- 1.8.3.1
[Mesa-dev] [PATCH 20/24] i965: Add brw_reg constructors taking a dynamically determined vector width.
The MRF variant is going to be used extensively by the next two patches to assemble untyped atomic and surface read messages easily. --- src/mesa/drivers/dri/i965/brw_reg.h | 24 1 file changed, 24 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_reg.h b/src/mesa/drivers/dri/i965/brw_reg.h index 6df3366..8d20ceb 100644 --- a/src/mesa/drivers/dri/i965/brw_reg.h +++ b/src/mesa/drivers/dri/i965/brw_reg.h @@ -281,6 +281,24 @@ brw_vec1_reg(unsigned file, unsigned nr, unsigned subnr) WRITEMASK_X); } +static inline struct brw_reg +brw_vecn_reg(unsigned width, unsigned file, unsigned nr, unsigned subnr) +{ + switch (width) { + case 1: + return brw_vec1_reg(file, nr, subnr); + case 2: + return brw_vec2_reg(file, nr, subnr); + case 4: + return brw_vec4_reg(file, nr, subnr); + case 8: + return brw_vec8_reg(file, nr, subnr); + case 16: + return brw_vec16_reg(file, nr, subnr); + default: + assert(!"Invalid register width"); + } +} static inline struct brw_reg retype(struct brw_reg reg, unsigned type) @@ -569,6 +587,12 @@ brw_message_reg(unsigned nr) return brw_vec8_reg(BRW_MESSAGE_REGISTER_FILE, nr, 0); } +static inline struct brw_reg +brw_uvec_mrf(unsigned width, unsigned nr, unsigned subnr) +{ + return retype(brw_vecn_reg(width, BRW_MESSAGE_REGISTER_FILE, nr, subnr), + BRW_REGISTER_TYPE_UD); +} /* This is almost always called with a numeric constant argument, so * make things easy to evaluate at compile time: -- 1.8.3.4
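brw_vecn_reg() dispatches from a run-time width to the fixed-width constructors. The sketch below shows the same dispatch shape for a hypothetical width encoding, log2 of the width. That encoding is an assumption here and is not taken from this patch. The default case returns a value, so every path returns even when assert() is compiled out with NDEBUG. A non-void function whose default case only asserts would fall off the end in that configuration.

```c
#include <assert.h>

/* Hypothetical helper: encode a register width (in elements) as log2 of
 * the width.  Returns -1 for widths with no encoding, so every path
 * returns a value regardless of whether assertions are enabled. */
static int width_to_log2_encoding(unsigned width)
{
   switch (width) {
   case 1:  return 0;
   case 2:  return 1;
   case 4:  return 2;
   case 8:  return 3;
   case 16: return 4;
   default: return -1;
   }
}
```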
[Mesa-dev] [PATCH 22/24] i965/gen7: Handle atomic instructions from the VEC4 back-end.
This can deal with all the 15 32-bit untyped atomic operations the hardware supports, but only INC and PREDEC are going to be exposed through the API for now. --- src/mesa/drivers/dri/i965/brw_vec4.h | 7 +++ src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 64 ++ 2 files changed, 71 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h index 37e1da0..7678925 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_vec4.h @@ -499,6 +499,13 @@ public: void emit_shader_time_write(enum shader_time_shader_type type, src_reg value); + void emit_untyped_atomic(unsigned atomic_op, unsigned surf_index, +unsigned offset, dst_reg dst, src_reg src0, +src_reg src1); + + void emit_untyped_surface_read(unsigned surf_index, unsigned offset, + dst_reg dst); + src_reg get_scratch_offset(vec4_instruction *inst, src_reg *reladdr, int reg_offset); src_reg get_pull_constant_offset(vec4_instruction *inst, diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp index a19686b..c3d4506 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp @@ -2464,8 +2464,72 @@ vec4_visitor::visit(ir_end_primitive *) } void +vec4_visitor::emit_untyped_atomic(unsigned atomic_op, unsigned surf_index, + unsigned offset, dst_reg dst, + src_reg src0, src_reg src1) +{ + unsigned mlen = 0; + + /* Set the atomic operation offset. */ + emit(MOV(brw_writemask(brw_uvec_mrf(8, mlen, 0), WRITEMASK_X), +src_reg(offset))); + mlen++; + + /* Set the atomic operation arguments. */ + if (src0.file != BAD_FILE) { + emit(MOV(brw_writemask(brw_uvec_mrf(8, mlen, 0), WRITEMASK_X), src0)); + mlen++; + } + + if (src1.file != BAD_FILE) { + emit(MOV(brw_writemask(brw_uvec_mrf(8, mlen, 0), WRITEMASK_X), src1)); + mlen++; + } + + /* Emit the instruction. 
*/ + vec4_instruction *inst = emit(SHADER_OPCODE_UNTYPED_ATOMIC, dst, + src_reg(atomic_op), src_reg(surf_index)); + inst->base_mrf = 0; + inst->mlen = mlen; +} + +void +vec4_visitor::emit_untyped_surface_read(unsigned surf_index, +unsigned offset, dst_reg dst) +{ + /* Set the surface read offset. */ + emit(MOV(brw_writemask(brw_uvec_mrf(8, 0, 0), WRITEMASK_X), +src_reg(offset))); + + /* Emit the instruction. */ + vec4_instruction *inst = emit(SHADER_OPCODE_UNTYPED_SURFACE_READ, + dst, src_reg(surf_index)); + inst->base_mrf = 0; + inst->mlen = 1; +} + +void vec4_visitor::visit(ir_atomic *ir) { + ir_variable *loc = ir->location->variable_referenced(); + unsigned surf_index = SURF_INDEX_VEC4_ABO(loc->atomic.buffer_index); + + result = src_reg(this, ir->type); + + switch (ir->op) { + case ir_atomic_read: + emit_untyped_surface_read(surf_index, loc->atomic.offset, +dst_reg(result)); + break; + case ir_atomic_inc: + emit_untyped_atomic(BRW_AOP_INC, surf_index, loc->atomic.offset, + dst_reg(result), src_reg(), src_reg()); + break; + case ir_atomic_dec: + emit_untyped_atomic(BRW_AOP_PREDEC, surf_index, loc->atomic.offset, + dst_reg(result), src_reg(), src_reg()); + break; + } } void -- 1.8.3.4
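The op mapping at the end of vec4_visitor::visit(ir_atomic *) follows from the GLSL semantics of the built-ins. atomicCounterIncrement() returns the counter value before the increment. atomicCounterDecrement() returns the value after the decrement. The sketch below models those semantics with C11 atomics on the CPU. The reading of PREDEC as "decrement, then return the new value" is an assumption based on the opcode name, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* atomicCounterIncrement(): returns the value *before* the increment,
 * which is what fetch-and-add returns (the patch uses BRW_AOP_INC). */
static unsigned counter_increment(atomic_uint *counter)
{
   return atomic_fetch_add(counter, 1u);
}

/* atomicCounterDecrement(): returns the value *after* the decrement
 * (the patch uses BRW_AOP_PREDEC).  fetch-and-sub returns the old
 * value, so subtract one locally to get the new one. */
static unsigned counter_decrement(atomic_uint *counter)
{
   return atomic_fetch_sub(counter, 1u) - 1u;
}

/* atomicCounter(): a plain read (an untyped surface read in the patch). */
static unsigned counter_read(atomic_uint *counter)
{
   return atomic_load(counter);
}
```

Starting from 5, an increment returns 5 and leaves 6 in the counter, and a following decrement returns 5.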
[Mesa-dev] [PATCH 21/24] i965/gen7: Handle atomic instructions from the FS back-end.
This can deal with all the 15 32-bit untyped atomic operations the hardware supports, but only INC and PREDEC are going to be exposed through the API for now. --- src/mesa/drivers/dri/i965/brw_fs.h | 7 +++ src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 83 2 files changed, 90 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index dcd489c..44930f7 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -392,6 +392,13 @@ public: void emit_shader_time_write(enum shader_time_shader_type type, fs_reg value); + void emit_untyped_atomic(unsigned atomic_op, unsigned surf_index, +unsigned offset, fs_reg dst, fs_reg src0, +fs_reg src1); + + void emit_untyped_surface_read(unsigned surf_index, unsigned offset, + fs_reg dst); + bool try_rewrite_rhs_to_dst(ir_assignment *ir, fs_reg dst, fs_reg src, diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 762832a..412d27a 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -2112,8 +2112,91 @@ fs_visitor::visit(ir_end_primitive *) } void +fs_visitor::emit_untyped_atomic(unsigned atomic_op, unsigned surf_index, +unsigned offset, fs_reg dst, fs_reg src0, +fs_reg src1) +{ + const unsigned operand_len = dispatch_width / 8; + unsigned mlen = 0; + + /* Initialize the sample mask in the message header. */ + emit(MOV(brw_uvec_mrf(8, mlen, 0), brw_imm_ud(0))) + ->force_writemask_all = true; + emit(MOV(brw_uvec_mrf(1, mlen, 7), +retype(brw_vec1_grf(1, 7), BRW_REGISTER_TYPE_UD))) + ->force_writemask_all = true; + mlen++; + + /* Set the atomic operation offset. */ + emit(MOV(brw_uvec_mrf(dispatch_width, mlen, 0), brw_imm_ud(offset))); + mlen += operand_len; + + /* Set the atomic operation arguments. 
*/ + if (src0.file != BAD_FILE) { + emit(MOV(brw_uvec_mrf(dispatch_width, mlen, 0), src0)); + mlen += operand_len; + } + + if (src1.file != BAD_FILE) { + emit(MOV(brw_uvec_mrf(dispatch_width, mlen, 0), src1)); + mlen += operand_len; + } + + /* Emit the instruction. */ + fs_inst inst(SHADER_OPCODE_UNTYPED_ATOMIC, dst, +fs_reg(atomic_op), fs_reg(surf_index)); + inst.base_mrf = 0; + inst.mlen = mlen; + emit(inst); +} + +void +fs_visitor::emit_untyped_surface_read(unsigned surf_index, unsigned offset, + fs_reg dst) +{ + const unsigned operand_len = dispatch_width / 8; + unsigned mlen = 0; + + /* Initialize the sample mask in the message header. */ + emit(MOV(brw_uvec_mrf(8, mlen, 0), brw_imm_ud(0))) + ->force_writemask_all = true; + emit(MOV(brw_uvec_mrf(1, mlen, 7), +retype(brw_vec1_grf(1, 7), BRW_REGISTER_TYPE_UD))) + ->force_writemask_all = true; + mlen++; + + /* Set the surface read offset. */ + emit(MOV(brw_uvec_mrf(dispatch_width, mlen, 0), brw_imm_ud(offset))); + mlen += operand_len; + + /* Emit the instruction. */ + fs_inst inst(SHADER_OPCODE_UNTYPED_SURFACE_READ, dst, fs_reg(surf_index)); + inst.base_mrf = 0; + inst.mlen = mlen; + emit(inst); +} + +void fs_visitor::visit(ir_atomic *ir) { + ir_variable *loc = ir->location->variable_referenced(); + unsigned surf_index = SURF_INDEX_WM_ABO(loc->atomic.buffer_index); + + result = fs_reg(this, ir->type); + + switch (ir->op) { + case ir_atomic_read: + emit_untyped_surface_read(surf_index, loc->atomic.offset, result); + break; + case ir_atomic_inc: + emit_untyped_atomic(BRW_AOP_INC, surf_index, loc->atomic.offset, + result, fs_reg(), fs_reg()); + break; + case ir_atomic_dec: + emit_untyped_atomic(BRW_AOP_PREDEC, surf_index, loc->atomic.offset, + result, fs_reg(), fs_reg()); + break; + } } fs_inst * -- 1.8.3.4
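The FS and VEC4 versions of these emitters build messages of different lengths. The FS version writes a header register with the sample mask, then one operand slot per address or source, each dispatch_width / 8 registers long. The VEC4 version in patch 22 has no header and writes each operand to a single register through the X channel. This sketch recomputes mlen for both cases. The function names are hypothetical.

```c
#include <assert.h>

/* FS back-end: one header register, then the offset and each source
 * operand, each taking dispatch_width / 8 registers. */
static unsigned fs_untyped_mlen(unsigned dispatch_width, unsigned num_srcs)
{
   assert(dispatch_width == 8 || dispatch_width == 16);
   const unsigned operand_len = dispatch_width / 8;
   return 1 + operand_len * (1 + num_srcs);
}

/* VEC4 back-end: no header; the offset and each source take one register. */
static unsigned vec4_untyped_mlen(unsigned num_srcs)
{
   return 1 + num_srcs;
}
```

An atomicCounterIncrement() compiled for SIMD16 has no sources, giving 1 + 2 = 3 registers. The VEC4 version needs a single register.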
[Mesa-dev] [PATCH 23/24] i965/gen7: Expose ARB_shader_atomic_counters.
--- src/mesa/drivers/dri/i965/brw_context.c | 10 ++ src/mesa/drivers/dri/i965/intel_extensions.c | 3 +++ 2 files changed, 13 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c index 4fcc9fb..2060764 100644 --- a/src/mesa/drivers/dri/i965/brw_context.c +++ b/src/mesa/drivers/dri/i965/brw_context.c @@ -229,6 +229,16 @@ brw_initialize_context_constants(struct brw_context *brw) ctx->Const.FragmentProgram.HighInt = ctx->Const.FragmentProgram.LowInt; ctx->Const.FragmentProgram.MediumInt = ctx->Const.FragmentProgram.LowInt; + if (brw->gen >= 7) { + ctx->Const.FragmentProgram.MaxAtomicCounters = MAX_ATOMIC_COUNTERS; + ctx->Const.VertexProgram.MaxAtomicCounters = MAX_ATOMIC_COUNTERS; + ctx->Const.GeometryProgram.MaxAtomicCounters = MAX_ATOMIC_COUNTERS; + ctx->Const.FragmentProgram.MaxAtomicBuffers = BRW_MAX_ABO; + ctx->Const.VertexProgram.MaxAtomicBuffers = BRW_MAX_ABO; + ctx->Const.GeometryProgram.MaxAtomicBuffers = BRW_MAX_ABO; + ctx->Const.MaxCombinedAtomicBuffers = 3 * BRW_MAX_ABO; + } + /* Gen6 converts quads to polygon in beginning of 3D pipeline, * but we're not sure how it's actually done for vertex order, * that affect provoking vertex decision. Always use last vertex diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c index aef7805..d4b6d47 100644 --- a/src/mesa/drivers/dri/i965/intel_extensions.c +++ b/src/mesa/drivers/dri/i965/intel_extensions.c @@ -169,4 +169,7 @@ intelInitExtensions(struct gl_context *ctx) ctx->Extensions.EXT_texture_compression_s3tc = true; ctx->Extensions.ANGLE_texture_compression_dxt = true; + + if (brw->gen >= 7) + ctx->Extensions.ARB_shader_atomic_counters = true; } -- 1.8.3.4
[Mesa-dev] [PATCH 24/24] i965: Simplify the shader time code by using atomic counter helpers.
--- src/mesa/drivers/dri/i965/brw_eu_emit.c | 25 --- src/mesa/drivers/dri/i965/brw_state.h | 2 -- src/mesa/drivers/dri/i965/brw_vs_surface_state.c | 4 +++- src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 4 +++- src/mesa/drivers/dri/i965/gen7_wm_surface_state.c | 17 --- 5 files changed, 10 insertions(+), 42 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c index 7484649..a6cc92a 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -2619,25 +2619,8 @@ void brw_shader_time_add(struct brw_compile *p, BRW_ARF_NULL, 0)); brw_set_src0(p, send, brw_vec1_reg(payload.file, payload.nr, 0)); - - uint32_t sfid, msg_type; - if (brw->is_haswell) { - sfid = HSW_SFID_DATAPORT_DATA_CACHE_1; - msg_type = HSW_DATAPORT_DC_PORT1_UNTYPED_ATOMIC_OP; - } else { - sfid = GEN7_SFID_DATAPORT_DATA_CACHE; - msg_type = GEN7_DATAPORT_DC_UNTYPED_ATOMIC_OP; - } - - bool header_present = false; - bool eot = false; - uint32_t mlen = 2; /* offset, value */ - uint32_t rlen = 0; - brw_set_message_descriptor(p, send, sfid, mlen, rlen, header_present, eot); - - send->bits3.ud |= msg_type << 14; - send->bits3.ud |= 0 << 13; /* no return data */ - send->bits3.ud |= 1 << 12; /* SIMD8 mode */ - send->bits3.ud |= BRW_AOP_ADD << 8; - send->bits3.ud |= surf_index << 0; + brw_set_dp_untyped_atomic_message(p, send, BRW_AOP_ADD, surf_index, + 2 /* message length */, + 0 /* response length */, + false /* header present */); } diff --git a/src/mesa/drivers/dri/i965/brw_state.h b/src/mesa/drivers/dri/i965/brw_state.h index 1d01406..6101aae 100644 --- a/src/mesa/drivers/dri/i965/brw_state.h +++ b/src/mesa/drivers/dri/i965/brw_state.h @@ -209,8 +209,6 @@ void gen7_set_surface_mcs_info(struct brw_context *brw, bool is_render_target); void gen7_check_surface_setup(uint32_t *surf, bool is_render_target); void gen7_init_vtable_surface_functions(struct brw_context *brw); -void gen7_create_shader_time_surface(struct 
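The lines removed from brw_shader_time_add() show the descriptor bits that were ORed in by hand and that brw_set_dp_untyped_atomic_message() presumably now sets instead. This sketch reassembles those bits from their fields, placing each at the position the removed code used. The helper name is hypothetical. The numeric values of BRW_AOP_ADD and of the message types are not shown in this thread, so the sketch takes them as parameters.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: assemble untyped-atomic descriptor fields in the
 * positions the removed brw_shader_time_add() code used: message type
 * from bit 14, return-data flag at bit 13 (0 meant "no return data"),
 * SIMD8 flag at bit 12, atomic opcode from bit 8 and binding table
 * index in bits 7:0. */
static uint32_t untyped_atomic_desc_bits(uint32_t msg_type, int return_data,
                                         int simd8, uint32_t atomic_op,
                                         uint32_t surf_index)
{
   assert(msg_type <= 0xfu && atomic_op <= 0xfu && surf_index <= 0xffu);
   return (msg_type << 14) |
          ((return_data ? 1u : 0u) << 13) |
          ((simd8 ? 1u : 0u) << 12) |
          (atomic_op << 8) |
          surf_index;
}
```

Routing all such messages through one helper keeps the shader-time path and the atomic counter path from drifting apart, which is the point of this patch.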
brw_context *brw, - uint32_t *out_offset); /* gen7_sol_state.c */ void gen7_upload_3dstate_so_decl_list(struct brw_context *brw, diff --git a/src/mesa/drivers/dri/i965/brw_vs_surface_state.c b/src/mesa/drivers/dri/i965/brw_vs_surface_state.c index 9f8c0f5..17fa30d 100644 --- a/src/mesa/drivers/dri/i965/brw_vs_surface_state.c +++ b/src/mesa/drivers/dri/i965/brw_vs_surface_state.c @@ -179,7 +179,9 @@ brw_vec4_upload_binding_table(struct brw_context *brw, int i; if (INTEL_DEBUG & DEBUG_SHADER_TIME) { - gen7_create_shader_time_surface(brw, &stage_state->surf_offset[SURF_INDEX_VEC4_SHADER_TIME]); + brw->vtbl.create_raw_surface( + brw, brw->shader_time.bo, 0, brw->shader_time.bo->size, + &stage_state->surf_offset[SURF_INDEX_VEC4_SHADER_TIME], true); } /* Skip making a binding table if we don't use textures or pull diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c index 48f351f..da7dac1 100644 --- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c +++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c @@ -886,7 +886,9 @@ brw_upload_wm_binding_table(struct brw_context *brw) int i; if (INTEL_DEBUG & DEBUG_SHADER_TIME) { - gen7_create_shader_time_surface(brw, &brw->wm.base.surf_offset[SURF_INDEX_WM_SHADER_TIME]); + brw->vtbl.create_raw_surface( + brw, brw->shader_time.bo, 0, brw->shader_time.bo->size, + &brw->wm.base.surf_offset[SURF_INDEX_WM_SHADER_TIME], true); } /* CACHE_NEW_WM_PROG */ diff --git a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c index 8b86387..6f5e670 100644 --- a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c +++ b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c @@ -455,23 +455,6 @@ gen7_create_raw_surface(struct brw_context *brw, drm_intel_bo *bo, true /* rw */); } -/** - * Create a surface for shader time. 
- */ -void -gen7_create_shader_time_surface(struct brw_context *brw, uint32_t *out_offset) -{ - gen7_emit_buffer_surface_state(brw, - out_offset, - brw->shader_time.bo, - 0, - BRW_SURFACEFORMAT_RAW, - brw->shader_time.bo->size - 1, - 1, - 0 /* mocs */, - true /* rw */); -} - static void gen7_update_null_renderbuffer_surface(struct brw_contex
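For reference, the removed brw_shader_time_add() code assembled the untyped-atomic message descriptor by hand before this patch moved that work into brw_set_dp_untyped_atomic_message(). Below is a minimal standalone sketch of the same bit packing. The struct and function names are hypothetical, the field positions are taken from the removed lines, and the field widths (4-bit atomic op, 8-bit binding table index) are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: packs the function-control fields that the removed
 * brw_shader_time_add() code OR-ed into send->bits3.ud one at a time.
 * Bit positions follow the removed lines; the field widths are assumptions.
 * Message length, response length and header-present bits (filled in by
 * brw_set_message_descriptor()) are not modelled here. */
struct untyped_atomic_desc {
   uint32_t msg_type;    /* shifted to bit 14 */
   bool     return_data; /* bit 13: 0 = no return data */
   bool     simd8;       /* bit 12: 1 = SIMD8 mode */
   uint32_t atomic_op;   /* bits 11:8, a BRW_AOP_* value */
   uint32_t surf_index;  /* bits 7:0, binding table index */
};

static uint32_t
pack_untyped_atomic_desc(const struct untyped_atomic_desc *d)
{
   assert(d->atomic_op < 16);
   assert(d->surf_index < 256);
   return d->msg_type << 14 |
          (uint32_t)d->return_data << 13 |
          (uint32_t)d->simd8 << 12 |
          d->atomic_op << 8 |
          d->surf_index;
}
```

In a call such as `pack_untyped_atomic_desc(&(struct untyped_atomic_desc){ 6, false, true, 7, 3 })`, the numbers are placeholders, not the real GEN7_DATAPORT_DC_UNTYPED_ATOMIC_OP or BRW_AOP_ADD values. Wrapping the field packing in one helper is what lets brw_shader_time_add() and the atomic-counter code share it.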
[Mesa-dev] [PATCH V3 00/11] ARB_texture_gather for i965 Gen7
This series adds support for ARB_texture_gather.

- Patches 1-2 add the core Mesa and GLSL compiler scaffolding for the extension.
- Patches 3-5 add basic support to the i965 driver.
- Patches 6-10 work around a hardware bug that causes incorrect gather4 sampling of R32G32_FLOAT surfaces.
- Patch 11 turns everything on.

Tested on Ivy Bridge GT2: no regressions, and all 1057 new ARB_texture_gather test cases pass.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH V3 01/11] mesa: add texture gather changes
From: Maxence Le Dore Reviewed-by: Kenneth Graunke --- src/mapi/glapi/gen/ARB_texture_gather.xml | 14 ++ src/mapi/glapi/gen/gl_API.xml | 2 +- src/mesa/main/context.c | 4 src/mesa/main/extensions.c| 1 + src/mesa/main/get.c | 1 + src/mesa/main/get_hash_params.py | 6 ++ src/mesa/main/mtypes.h| 6 ++ src/mesa/main/tests/enum_strings.cpp | 3 +++ 8 files changed, 36 insertions(+), 1 deletion(-) create mode 100644 src/mapi/glapi/gen/ARB_texture_gather.xml diff --git a/src/mapi/glapi/gen/ARB_texture_gather.xml b/src/mapi/glapi/gen/ARB_texture_gather.xml new file mode 100644 index 000..cd331ac --- /dev/null +++ b/src/mapi/glapi/gen/ARB_texture_gather.xml @@ -0,0 +1,14 @@ + + + + + + + + + + + + + + \ No newline at end of file diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml index 71aa9a7..b1dcf13 100644 --- a/src/mapi/glapi/gen/gl_API.xml +++ b/src/mapi/glapi/gen/gl_API.xml @@ -8189,7 +8189,7 @@ http://www.w3.org/2001/XInclude"/> - +http://www.w3.org/2001/XInclude"/> diff --git a/src/mesa/main/context.c b/src/mesa/main/context.c index d726d11..ab8137c 100644 --- a/src/mesa/main/context.c +++ b/src/mesa/main/context.c @@ -645,6 +645,10 @@ _mesa_init_constants(struct gl_context *ctx) ctx->Const.MinProgramTexelOffset = -8; ctx->Const.MaxProgramTexelOffset = 7; + /* GL_ARB_texture_gather */ + ctx->Const.MinProgramTextureGatherOffset = -8; + ctx->Const.MaxProgramTextureGatherOffset = 7; + /* GL_ARB_robustness */ ctx->Const.ResetStrategy = GL_NO_RESET_NOTIFICATION_ARB; diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c index 34615e3..337f3ee 100644 --- a/src/mesa/main/extensions.c +++ b/src/mesa/main/extensions.c @@ -142,6 +142,7 @@ static const struct extension extension_table[] = { { "GL_ARB_texture_env_crossbar", o(ARB_texture_env_crossbar),GLL,2001 }, { "GL_ARB_texture_env_dot3",o(ARB_texture_env_dot3), GLL,2001 }, { "GL_ARB_texture_float", o(ARB_texture_float), GL, 2004 }, + { "GL_ARB_texture_gather", o(ARB_texture_gather), GL, 2009 
}, { "GL_ARB_texture_mirrored_repeat", o(dummy_true), GLL,2001 }, { "GL_ARB_texture_multisample", o(ARB_texture_multisample), GL, 2009 }, { "GL_ARB_texture_non_power_of_two", o(ARB_texture_non_power_of_two),GL, 2003 }, diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c index 4f6f59a..f07455e 100644 --- a/src/mesa/main/get.c +++ b/src/mesa/main/get.c @@ -366,6 +366,7 @@ EXTRA_EXT(ARB_map_buffer_alignment); EXTRA_EXT(ARB_texture_cube_map_array); EXTRA_EXT(ARB_texture_buffer_range); EXTRA_EXT(ARB_texture_multisample); +EXTRA_EXT(ARB_texture_gather); static const int extra_ARB_color_buffer_float_or_glcore[] = { diff --git a/src/mesa/main/get_hash_params.py b/src/mesa/main/get_hash_params.py index 30855c3..987d4a0 100644 --- a/src/mesa/main/get_hash_params.py +++ b/src/mesa/main/get_hash_params.py @@ -718,6 +718,12 @@ descriptor=[ # GL_ARB_texture_cube_map_array [ "TEXTURE_BINDING_CUBE_MAP_ARRAY_ARB", "LOC_CUSTOM, TYPE_INT, TEXTURE_CUBE_ARRAY_INDEX, extra_ARB_texture_cube_map_array" ], + +# GL_ARB_texture_gather + [ "MIN_PROGRAM_TEXTURE_GATHER_OFFSET_ARB", "CONTEXT_INT(Const.MinProgramTextureGatherOffset), extra_ARB_texture_gather"], + [ "MAX_PROGRAM_TEXTURE_GATHER_OFFSET_ARB", "CONTEXT_INT(Const.MaxProgramTextureGatherOffset), extra_ARB_texture_gather"], + [ "MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS_ARB", "CONTEXT_INT(Const.MaxProgramTextureGatherComponents), extra_ARB_texture_gather"], + ]}, # Enums restricted to OpenGL Core profile diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h index 6d700ec..e24052f 100644 --- a/src/mesa/main/mtypes.h +++ b/src/mesa/main/mtypes.h @@ -2973,6 +2973,11 @@ struct gl_constants /** GL_EXT_gpu_shader4 */ GLint MinProgramTexelOffset, MaxProgramTexelOffset; + /** GL_ARB_texture_gather */ + GLint MinProgramTextureGatherOffset; + GLint MaxProgramTextureGatherOffset; + GLuint MaxProgramTextureGatherComponents; + /* GL_ARB_robustness */ GLenum ResetStrategy; @@ -3102,6 +3107,7 @@ struct gl_extensions GLboolean
ARB_texture_env_crossbar; GLboolean ARB_texture_env_dot3; GLboolean ARB_texture_float; + GLboolean ARB_texture_gather; GLboolean ARB_texture_multisample; GLboolean ARB_texture_non_power_of_two; GLboolean ARB_texture_query_lod; diff --git a/src/mesa/main/tests/enum_strings.cpp b/src/mesa/main/tests/enum_strings.cpp index 1dae60f..0c08be0 100644 --- a/src/mesa/main/tests/enum
[Mesa-dev] [PATCH V3 02/11] glsl: add texture gather changes
From: Maxence Le Dore V2 [Chris Forbes]: - Add new pattern, fixup parameter reading. V3: Rebase onto new builtins machinery Reviewed-by: Kenneth Graunke --- src/glsl/builtin_functions.cpp | 35 +++ src/glsl/glcpp/glcpp-parse.y| 3 +++ src/glsl/glsl_parser_extras.cpp | 1 + src/glsl/glsl_parser_extras.h | 2 ++ src/glsl/ir.cpp | 2 +- src/glsl/ir.h | 4 +++- src/glsl/ir_clone.cpp | 1 + src/glsl/ir_hv_accept.cpp | 1 + src/glsl/ir_print_visitor.cpp | 3 ++- src/glsl/ir_reader.cpp | 6 +- src/glsl/ir_rvalue_visitor.cpp | 1 + src/glsl/opt_tree_grafting.cpp | 1 + src/glsl/standalone_scaffolding.cpp | 1 + src/mesa/program/ir_to_mesa.cpp | 5 + 14 files changed, 62 insertions(+), 4 deletions(-) diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp index 528af0d..a7d454c 100644 --- a/src/glsl/builtin_functions.cpp +++ b/src/glsl/builtin_functions.cpp @@ -262,6 +262,13 @@ texture_query_lod(const _mesa_glsl_parse_state *state) state->ARB_texture_query_lod_enable; } +static bool +texture_gather(const _mesa_glsl_parse_state *state) +{ + return state->is_version(400, 0) || + state->ARB_texture_gather_enable; +} + /* Desktop GL or OES_standard_derivatives + fragment shader only */ static bool fs_oes_derivatives(const _mesa_glsl_parse_state *state) @@ -1807,6 +1814,34 @@ builtin_builder::create_builtins() _texture(ir_txd, shader_texture_lod_and_rect, glsl_type::vec4_type, glsl_type::sampler2DRectShadow_type, glsl_type::vec4_type, TEX_PROJECT), NULL); + add_function("textureGather", +_texture(ir_tg4, texture_gather, glsl_type::vec4_type, glsl_type::sampler2D_type, glsl_type::vec2_type), +_texture(ir_tg4, texture_gather, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec2_type), +_texture(ir_tg4, texture_gather, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec2_type), + +_texture(ir_tg4, texture_gather, glsl_type::vec4_type, glsl_type::sampler2DArray_type, glsl_type::vec3_type), +_texture(ir_tg4, texture_gather, glsl_type::ivec4_type, 
glsl_type::isampler2DArray_type, glsl_type::vec3_type), +_texture(ir_tg4, texture_gather, glsl_type::uvec4_type, glsl_type::usampler2DArray_type, glsl_type::vec3_type), + +_texture(ir_tg4, texture_gather, glsl_type::vec4_type, glsl_type::samplerCube_type, glsl_type::vec3_type), +_texture(ir_tg4, texture_gather, glsl_type::ivec4_type, glsl_type::isamplerCube_type, glsl_type::vec3_type), +_texture(ir_tg4, texture_gather, glsl_type::uvec4_type, glsl_type::usamplerCube_type, glsl_type::vec3_type), + +_texture(ir_tg4, texture_gather, glsl_type::vec4_type, glsl_type::samplerCubeArray_type, glsl_type::vec4_type), +_texture(ir_tg4, texture_gather, glsl_type::ivec4_type, glsl_type::isamplerCubeArray_type, glsl_type::vec4_type), +_texture(ir_tg4, texture_gather, glsl_type::uvec4_type, glsl_type::usamplerCubeArray_type, glsl_type::vec4_type), +NULL); + + add_function("textureGatherOffset", +_texture(ir_tg4, texture_gather, glsl_type::vec4_type, glsl_type::sampler2D_type, glsl_type::vec2_type, TEX_OFFSET), +_texture(ir_tg4, texture_gather, glsl_type::ivec4_type, glsl_type::isampler2D_type, glsl_type::vec2_type, TEX_OFFSET), +_texture(ir_tg4, texture_gather, glsl_type::uvec4_type, glsl_type::usampler2D_type, glsl_type::vec2_type, TEX_OFFSET), + +_texture(ir_tg4, texture_gather, glsl_type::vec4_type, glsl_type::sampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET), +_texture(ir_tg4, texture_gather, glsl_type::ivec4_type, glsl_type::isampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET), +_texture(ir_tg4, texture_gather, glsl_type::uvec4_type, glsl_type::usampler2DArray_type, glsl_type::vec3_type, TEX_OFFSET), +NULL); + F(dFdx) F(dFdy) F(fwidth) diff --git a/src/glsl/glcpp/glcpp-parse.y b/src/glsl/glcpp/glcpp-parse.y index 6eaa5f9..c7ad3e9 100644 --- a/src/glsl/glcpp/glcpp-parse.y +++ b/src/glsl/glcpp/glcpp-parse.y @@ -1248,6 +1248,9 @@ glcpp_parser_create (const struct gl_extensions *extensions, int api) if (extensions->EXT_shader_integer_mix) add_builtin_define(parser, 
"GL_EXT_shader_integer_mix", 1); + + if (extensions->ARB_texture_gather) +add_builtin_define(parser, "GL_ARB_texture_gather", 1); } } diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp index cac5a18..aca0f5c 100644 --- a/src/glsl/glsl_parser_extras.cpp +++ b/src/glsl/glsl_parse
[Mesa-dev] [PATCH V3 03/11] i965: add SHADER_OPCODE_TG4
Adds the Gen7 message IDs, a new SHADER_OPCODE_TG4 pseudo-op, and low-level support for emitting it via generate_tex().

V3: Updated for changes in master.

Signed-off-by: Chris Forbes
Reviewed-by: Kenneth Graunke
---
 src/mesa/drivers/dri/i965/brw_defines.h     | 3 +++
 src/mesa/drivers/dri/i965/brw_fs.cpp        | 1 +
 src/mesa/drivers/dri/i965/brw_fs_emit.cpp   | 5 +
 src/mesa/drivers/dri/i965/brw_shader.cpp    | 3 ++-
 src/mesa/drivers/dri/i965/brw_vec4.cpp      | 1 +
 src/mesa/drivers/dri/i965/brw_vec4_emit.cpp | 6 +-
 6 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index e9e0c4a..826942e 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -767,6 +767,7 @@ enum opcode {
    FS_OPCODE_TXB,
    SHADER_OPCODE_TXF_MS,
    SHADER_OPCODE_LOD,
+   SHADER_OPCODE_TG4,

    SHADER_OPCODE_SHADER_TIME_ADD,

@@ -1042,8 +1043,10 @@ enum brw_message_target {
 #define GEN5_SAMPLER_MESSAGE_SAMPLE_BIAS_COMPARE      5
 #define GEN5_SAMPLER_MESSAGE_SAMPLE_LOD_COMPARE       6
 #define GEN5_SAMPLER_MESSAGE_SAMPLE_LD                7
+#define GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4           8
 #define GEN5_SAMPLER_MESSAGE_LOD                      9
 #define GEN5_SAMPLER_MESSAGE_SAMPLE_RESINFO           10
+#define GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO        17
 #define HSW_SAMPLER_MESSAGE_SAMPLE_DERIV_COMPARE      20
 #define GEN7_SAMPLER_MESSAGE_SAMPLE_LD_MCS            29
 #define GEN7_SAMPLER_MESSAGE_SAMPLE_LD2DMS            30
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index daa23b4..eda600a 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -725,6 +725,7 @@ fs_visitor::implied_mrf_writes(fs_inst *inst)
    case SHADER_OPCODE_TXD:
    case SHADER_OPCODE_TXF:
    case SHADER_OPCODE_TXF_MS:
+   case SHADER_OPCODE_TG4:
    case SHADER_OPCODE_TXL:
    case SHADER_OPCODE_TXS:
    case SHADER_OPCODE_LOD:
diff --git a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
index bfb3d33..a706f4a 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
@@ -431,6 +431,10 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src
       case SHADER_OPCODE_LOD:
          msg_type = GEN5_SAMPLER_MESSAGE_LOD;
          break;
+      case SHADER_OPCODE_TG4:
+         assert(brw->gen >= 6);
+         msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4;
+         break;
       default:
          assert(!"not reached");
          break;
@@ -1386,6 +1390,7 @@ fs_generator::generate_code(exec_list *instructions)
       case SHADER_OPCODE_TXL:
       case SHADER_OPCODE_TXS:
       case SHADER_OPCODE_LOD:
+      case SHADER_OPCODE_TG4:
          generate_tex(inst, dst, src[0]);
          break;
       case FS_OPCODE_DDX:
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp
index 53364a5..7a10995 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -531,7 +531,8 @@ backend_instruction::is_tex()
            opcode == SHADER_OPCODE_TXF_MS ||
            opcode == SHADER_OPCODE_TXL ||
            opcode == SHADER_OPCODE_TXS ||
-           opcode == SHADER_OPCODE_LOD);
+           opcode == SHADER_OPCODE_LOD ||
+           opcode == SHADER_OPCODE_TG4);
 }

 bool
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 2c1f541..75c3d34 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -270,6 +270,7 @@ vec4_visitor::implied_mrf_writes(vec4_instruction *inst)
    case SHADER_OPCODE_TXF:
    case SHADER_OPCODE_TXF_MS:
    case SHADER_OPCODE_TXS:
+   case SHADER_OPCODE_TG4:
       return inst->header_present ? 1 : 0;
    default:
       assert(!"not reached");
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_emit.cpp b/src/mesa/drivers/dri/i965/brw_vec4_emit.cpp
index 6916134..6bdffb3 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_emit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_emit.cpp
@@ -308,6 +308,9 @@ vec4_generator::generate_tex(vec4_instruction *inst,
       case SHADER_OPCODE_TXS:
          msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_RESINFO;
          break;
+      case SHADER_OPCODE_TG4:
+         msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4;
+         break;
       default:
          assert(!"should not get here: invalid VS texture opcode");
          break;
@@ -361,7 +364,7 @@ vec4_generator::generate_tex(vec4_instruction *inst,
          brw_MOV(p,
                  retype(brw_vec1_reg(BRW_MESSAGE_REGISTER_FILE,
                                      inst->base_mrf, 2),
                         BRW_REGISTER_TYPE_UD),
-                 brw_imm_uw(inst->texture_offset));
+                 brw_imm_ud(inst->texture_offset));
          brw_pop_insn_state(p);
       } else if (inst->header_present) {
          /* Set up an implied move
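Patch 03 also widens the immediate that loads inst->texture_offset into the message header from brw_imm_uw() to brw_imm_ud(). Presumably this is because patches 04 and 05 OR the gather4 channel select into bits 16-17 of that same value, and a 16-bit immediate cannot hold those bits. Here is a minimal sketch of the truncation, using hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration: what survives of texture_offset when it is
 * loaded as a 16-bit (UW) versus a 32-bit (UD) immediate. */
static uint32_t
header_dword_as_uw(uint32_t texture_offset)
{
   return (uint16_t)texture_offset; /* bits 31:16 are discarded */
}

static uint32_t
header_dword_as_ud(uint32_t texture_offset)
{
   return texture_offset;           /* all 32 bits are kept */
}
```

With no texel offset and channel select 2 (blue), the value is 0x20000. A UW immediate would send 0, which silently selects the red channel.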
[Mesa-dev] [PATCH V3 04/11] i965/fs: Add support for ir_tg4
Lowers ir_tg4 (from textureGather and textureGatherOffset builtins) to SHADER_OPCODE_TG4. The usual post-sampling swizzle workaround can't work for ir_tg4, so avoid doing that: * For R/G/B/A swizzles use the hardware channel select (lives in the same dword in the header as the texel offset), and then don't do anything afterward in the shader. * For 0/1 swizzles blast the appropriate constant over all the output channels instead of sampling. V2: Avoid duplicating header enabling block V3: Avoid sampling at all, for degenerate swizzles. Signed-off-by: Chris Forbes --- src/mesa/drivers/dri/i965/brw_fs.h | 1 + src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 62 ++-- 2 files changed, 60 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index cb4ac3b..f3a1eeb 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -241,6 +241,7 @@ public: void visit(ir_emit_vertex *); void visit(ir_end_primitive *); + uint32_t gather_channel(ir_texture *ir, int sampler); void swizzle_result(ir_texture *ir, fs_reg orig_val, int sampler); bool can_do_source_mods(fs_inst *inst); diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index d935c7b..be7aed7 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -1163,6 +1163,12 @@ fs_visitor::emit_texture_gen5(ir_texture *ir, fs_reg dst, fs_reg coordinate, case ir_lod: inst = emit(SHADER_OPCODE_LOD, dst); break; + case ir_tg4: + inst = emit(SHADER_OPCODE_TG4, dst); + break; + default: + fail("unrecognized texture opcode"); + break; } inst->base_mrf = base_mrf; inst->mlen = mlen; @@ -1187,9 +1193,12 @@ fs_visitor::emit_texture_gen7(ir_texture *ir, fs_reg dst, fs_reg coordinate, bool header_present = false; int offsets[3]; - if (ir->offset && ir->op != ir_txf) { - /* The offsets set up by the ir_texture visitor are in the + if (ir->op == 
ir_tg4 || (ir->offset && ir->op != ir_txf)) { + /* * The offsets set up by the ir_texture visitor are in the * m1 header, so we can't go headerless. + * + * * ir_tg4 needs to place its channel select in the header, + * for interaction with ARB_texture_swizzle */ header_present = true; mlen++; @@ -1205,6 +1214,7 @@ fs_visitor::emit_texture_gen7(ir_texture *ir, fs_reg dst, fs_reg coordinate, switch (ir->op) { case ir_tex: case ir_lod: + case ir_tg4: break; case ir_txb: emit(MOV(fs_reg(MRF, base_mrf + mlen), lod)); @@ -1319,6 +1329,7 @@ fs_visitor::emit_texture_gen7(ir_texture *ir, fs_reg dst, fs_reg coordinate, case ir_txf_ms: inst = emit(SHADER_OPCODE_TXF_MS, dst); break; case ir_txs: inst = emit(SHADER_OPCODE_TXS, dst); break; case ir_lod: inst = emit(SHADER_OPCODE_LOD, dst); break; + case ir_tg4: inst = emit(SHADER_OPCODE_TG4, dst); break; } inst->base_mrf = base_mrf; inst->mlen = mlen; @@ -1446,6 +1457,24 @@ fs_visitor::visit(ir_texture *ir) */ int texunit = fp->Base.SamplerUnits[sampler]; + if (ir->op == ir_tg4) { + /* When tg4 is used with the degenerate ZERO/ONE swizzles, don't bother + * emitting anything other than setting up the constant result. + */ + int swiz = GET_SWZ(c->key.tex.swizzles[sampler], 0); + if (swiz == SWIZZLE_ZERO || swiz == SWIZZLE_ONE) { + + fs_reg res = fs_reg(this, glsl_type::vec4_type); + this->result = res; + + for (int i=0; i<4; i++) { +emit(MOV(res, fs_reg(swiz == SWIZZLE_ZERO ?
0.0f : 1.0f))); +res.reg_offset++; + } + return; + } + } + /* Should be lowered by do_lower_texture_projection */ assert(!ir->projector); @@ -1473,6 +1502,7 @@ fs_visitor::visit(ir_texture *ir) switch (ir->op) { case ir_tex: case ir_lod: + case ir_tg4: break; case ir_txb: ir->lod_info.bias->accept(this); @@ -1495,6 +1525,8 @@ fs_visitor::visit(ir_texture *ir) ir->lod_info.sample_index->accept(this); sample_index = this->result; break; + default: + assert(!"Unrecognized texture opcode"); }; /* Writemasking doesn't eliminate channels on SIMD8 texture @@ -1519,6 +1551,9 @@ fs_visitor::visit(ir_texture *ir) if (ir->offset != NULL && ir->op != ir_txf) inst->texture_offset = brw_texture_offset(ir->offset->as_constant()); + if (ir->op == ir_tg4) + inst->texture_offset |= gather_channel(ir, sampler) << 16; // M0.2:16-17 + inst->sampler = sampler; if (ir->shadow_comparitor) @@ -1539,6 +1574,24 @@ fs_visitor::visit(ir_texture *ir) } /** + * Set up the gather channel based on the swizzle, for gather4. + */ +uint32_t +fs_visitor::gather_channel(ir_texture *ir, int sampler)
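The header dword that carries the gather4 state (M0.2 in the patch comment) packs the texel offsets together with the channel select at bits 16-17. The sketch below assumes that brw_texture_offset() encodes each offset as a 4-bit two's-complement field (U in bits 11:8, V in 7:4, R in 3:0). Only the channel-select position comes from this patch, and the function name is hypothetical. The asserted range of -8..7 matches the MinProgramTextureGatherOffset/MaxProgramTextureGatherOffset defaults from patch 01.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of the gather4 header dword: three 4-bit signed texel
 * offsets (assumed layout) plus the 2-bit channel select at bits 17:16. */
static uint32_t
pack_gather4_header(int u, int v, int r, uint32_t channel)
{
   assert(u >= -8 && u <= 7 && v >= -8 && v <= 7 && r >= -8 && r <= 7);
   assert(channel <= 3); /* 0 = R, 1 = G, 2 = B, 3 = A */

   uint32_t bits = 0;
   bits |= ((uint32_t)u & 0xf) << 8; /* keep the low 4 bits (two's complement) */
   bits |= ((uint32_t)v & 0xf) << 4;
   bits |= ((uint32_t)r & 0xf) << 0;
   bits |= channel << 16;            /* what "texture_offset |= ... << 16" adds */
   return bits;
}
```

For example, an offset of (-8, 7) that gathers the green channel packs to 0x10870.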
[Mesa-dev] [PATCH V3 05/11] i965/vs: Add support for ir_tg4
Pretty much the same as the FS case: the channel select goes in the header.

V2: Less mangling.
V3: Avoid sampling at all, for degenerate swizzles.

Signed-off-by: Chris Forbes
---
 src/mesa/drivers/dri/i965/brw_vec4.h           |  1 +
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 46 --
 2 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
index f0ab53d..4b0e132 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -500,6 +500,7 @@ public:
    void emit_pack_half_2x16(dst_reg dst, src_reg src0);
    void emit_unpack_half_2x16(dst_reg dst, src_reg src0);

+   uint32_t gather_channel(ir_texture *ir, int sampler);
    void swizzle_result(ir_texture *ir, src_reg orig_val, int sampler);

    void emit_ndc_computation();
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 304636a..f23b235 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -2128,6 +2128,19 @@ vec4_visitor::visit(ir_texture *ir)
    int sampler =
       _mesa_get_sampler_uniform_value(ir->sampler, shader_prog, prog);

+   /* When tg4 is used with the degenerate ZERO/ONE swizzles, don't bother
+    * emitting anything other than setting up the constant result.
+    */
+   if (ir->op == ir_tg4) {
+      int swiz = GET_SWZ(key->tex.swizzles[sampler], 0);
+      if (swiz == SWIZZLE_ZERO || swiz == SWIZZLE_ONE) {
+         dst_reg result(this, ir->type);
+         this->result = src_reg(result);
+         emit(MOV(result, src_reg(swiz == SWIZZLE_ONE ? 1.0f : 0.0f)));
+         return;
+      }
+   }
+
    /* Should be lowered by do_lower_texture_projection */
    assert(!ir->projector);

@@ -2177,6 +2190,7 @@ vec4_visitor::visit(ir_texture *ir)
       break;
    case ir_txb:
    case ir_lod:
+   case ir_tg4:
       break;
    }

@@ -2198,18 +2212,23 @@ vec4_visitor::visit(ir_texture *ir)
    case ir_txs:
       inst = new(mem_ctx) vec4_instruction(this, SHADER_OPCODE_TXS);
       break;
+   case ir_tg4:
+      inst = new(mem_ctx) vec4_instruction(this, SHADER_OPCODE_TG4);
+      break;
    case ir_txb:
       assert(!"TXB is not valid for vertex shaders.");
       break;
    case ir_lod:
       assert(!"LOD is not valid for vertex shaders.");
       break;
+   default:
+      assert(!"Unrecognized tex op");
    }

    bool use_texture_offset = ir->offset != NULL && ir->op != ir_txf;

    /* Texel offsets go in the message header; Gen4 also requires headers. */
-   inst->header_present = use_texture_offset || brw->gen < 5;
+   inst->header_present = use_texture_offset || brw->gen < 5 || ir->op == ir_tg4;
    inst->base_mrf = 2;
    inst->mlen = inst->header_present + 1; /* always at least one */
    inst->sampler = sampler;
@@ -2220,6 +2239,10 @@ vec4_visitor::visit(ir_texture *ir)
    if (use_texture_offset)
       inst->texture_offset = brw_texture_offset(ir->offset->as_constant());

+   /* Stuff the channel select bits in the top of the texture offset */
+   if (ir->op == ir_tg4)
+      inst->texture_offset |= gather_channel(ir, sampler)<<16;
+
    /* MRF for the first parameter */
    int param_base = inst->base_mrf + inst->header_present;

@@ -2344,6 +2367,24 @@ vec4_visitor::visit(ir_texture *ir)
    swizzle_result(ir, src_reg(inst->dst), sampler);
 }

+/**
+ * Set up the gather channel based on the swizzle, for gather4.
+ */
+uint32_t
+vec4_visitor::gather_channel(ir_texture *ir, int sampler)
+{
+   int swiz = GET_SWZ(key->tex.swizzles[sampler], 0 /* red */);
+   switch (swiz) {
+      case SWIZZLE_X: return 0;
+      case SWIZZLE_Y: return 1;
+      case SWIZZLE_Z: return 2;
+      case SWIZZLE_W: return 3;
+      default:
+         /* zero, one swizzles */
+         return 0;
+   }
+}
+
 void
 vec4_visitor::swizzle_result(ir_texture *ir, src_reg orig_val, int sampler)
 {
@@ -2353,11 +2394,12 @@ vec4_visitor::swizzle_result(ir_texture *ir, src_reg orig_val, int sampler)
    dst_reg swizzled_result(this->result);

    if (ir->op == ir_txs || ir->type == glsl_type::float_type
-                        || s == SWIZZLE_NOOP) {
+                        || s == SWIZZLE_NOOP || ir->op == ir_tg4) {
       emit(MOV(swizzled_result, orig_val));
       return;
    }

+
    int zero_mask = 0, one_mask = 0, copy_mask = 0;
    int swizzle[4] = {0};

--
1.8.4
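Both gather_channel() implementations look only at the red selector of the texture's ARB_texture_swizzle state, because textureGather() returns the first component of each of the four texels. Below is a standalone sketch of the decision made in patches 04 and 05: either sample a hardware channel, or skip sampling and write a constant for ZERO/ONE. The SWIZZLE_* values and the GET_SWZ/MAKE_SWIZZLE4 macros are modelled on Mesa's definitions but redefined here as assumptions. struct gather_plan and plan_gather() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed Mesa-style swizzle encoding: four 3-bit selectors in 12 bits. */
enum { SWIZZLE_X, SWIZZLE_Y, SWIZZLE_Z, SWIZZLE_W, SWIZZLE_ZERO, SWIZZLE_ONE };
#define GET_SWZ(swz, idx)         (((swz) >> ((idx) * 3)) & 0x7)
#define MAKE_SWIZZLE4(a, b, c, d) ((a) | (b) << 3 | (c) << 6 | (d) << 9)

/* Hypothetical summary of what the visitor emits for one textureGather(). */
struct gather_plan {
   int      use_constant; /* 1: skip sampling, MOV a constant into the result */
   float    constant;     /* 0.0f or 1.0f when use_constant is set */
   uint32_t channel;      /* hardware channel select (0..3) otherwise */
};

static struct gather_plan
plan_gather(unsigned tex_swizzle)
{
   struct gather_plan p = { 0, 0.0f, 0 };
   unsigned swiz = GET_SWZ(tex_swizzle, 0); /* only the red selector matters */

   switch (swiz) {
   case SWIZZLE_X: case SWIZZLE_Y: case SWIZZLE_Z: case SWIZZLE_W:
      p.channel = swiz;  /* X..W map directly to channels 0..3 */
      break;
   case SWIZZLE_ZERO:
   case SWIZZLE_ONE:
      p.use_constant = 1;
      p.constant = (swiz == SWIZZLE_ONE) ? 1.0f : 0.0f;
      break;
   default:
      break;             /* unused encodings: fall back to channel 0 */
   }
   return p;
}
```

For a texture swizzled as (B, G, R, A), plan_gather() selects hardware channel 2. A swizzle whose red selector is ONE produces a constant 1.0 without touching the sampler.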
[Mesa-dev] [PATCH V3 06/11] i965: w/a for gather4 green RG32F
Signed-off-by: Chris Forbes
---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   | 4
 src/mesa/drivers/dri/i965/brw_program.h        | 5 +
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 4
 src/mesa/drivers/dri/i965/brw_wm.c             | 9 +
 4 files changed, 22 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index be7aed7..6e1a3f5 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -1580,6 +1580,10 @@ uint32_t
 fs_visitor::gather_channel(ir_texture *ir, int sampler)
 {
    int swiz = GET_SWZ(c->key.tex.swizzles[sampler], 0 /* red */);
+   if (c->key.tex.gather_channel_quirk_mask & (1tex.gather_channel_quirk_mask & (1 yuvtex_mask);
    found |= key_debug(brw, "GL_MESA_ycbcr UV swapping\n",
                       old_key->yuvtex_swap_mask, key->yuvtex_swap_mask);
+   found |= key_debug(brw, "gather channel quirk on any texture unit",
+                      old_key->gather_channel_quirk_mask, key->gather_channel_quirk_mask);
    return found;
 }
@@ -342,6 +345,12 @@ brw_populate_sampler_prog_key_data(struct gl_context *ctx,
             if (sampler->WrapR == GL_CLAMP)
                key->gl_clamp_mask[2] |= 1 << s;
          }
+
+         /* gather4's channel select for green from RG32F is broken */
+         if (brw->gen >= 7) {
+            if (img->InternalFormat == GL_RG32F && GET_SWZ(t->_Swizzle, 0) == 1)
+               key->gather_channel_quirk_mask |= 1 << s;
+         }
       }
    }
 }
--
1.8.4
[Mesa-dev] [PATCH V3 07/11] i965: Add BRW_SURFACEFORMAT_R32G32_FLOAT_LD, required for IVB gather4 w/a
Using gather4 to fetch the GREEN channel of a surface with format R32G32_FLOAT doesn't work correctly on IVB.

Workaround from the bspec:
- use R32G32_FLOAT_LD (0x97) instead, for gather4 only;
- select the BLUE channel to read GREEN.

Signed-off-by: Chris Forbes
---
 src/mesa/drivers/dri/i965/brw_defines.h         | 1 +
 src/mesa/drivers/dri/i965/brw_surface_formats.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index 826942e..f2269f7 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -309,6 +309,7 @@
 #define BRW_SURFACEFORMAT_R16G16B16A16_USCALED           0x094
 #define BRW_SURFACEFORMAT_R32G32_SSCALED                 0x095
 #define BRW_SURFACEFORMAT_R32G32_USCALED                 0x096
+#define BRW_SURFACEFORMAT_R32G32_FLOAT_LD                0x097
 #define BRW_SURFACEFORMAT_R32G32_SFIXED                  0x0A0
 #define BRW_SURFACEFORMAT_R64_PASSTHRU                   0x0A1
 #define BRW_SURFACEFORMAT_B8G8R8A8_UNORM                 0x0C0
diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c b/src/mesa/drivers/dri/i965/brw_surface_formats.c
index 0d8d805..8666336 100644
--- a/src/mesa/drivers/dri/i965/brw_surface_formats.c
+++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c
@@ -110,6 +110,7 @@ const struct surface_format_info surface_formats[] = {
    SF( Y,  x,  x,  x,  Y,  x,  Y,  x,  x,   BRW_SURFACEFORMAT_R16G16B16A16_UINT)
    SF( Y,  Y,  x,  x,  Y,  Y,  Y,  x,  x,   BRW_SURFACEFORMAT_R16G16B16A16_FLOAT)
    SF( Y, 50,  x,  x,  Y,  Y,  Y,  Y,  x,   BRW_SURFACEFORMAT_R32G32_FLOAT)
+   SF( Y, 70,  x,  x,  Y,  Y,  Y,  Y,  x,   BRW_SURFACEFORMAT_R32G32_FLOAT_LD)
    SF( Y,  x,  x,  x,  Y,  x,  Y,  Y,  x,   BRW_SURFACEFORMAT_R32G32_SINT)
    SF( Y,  x,  x,  x,  Y,  x,  Y,  Y,  x,   BRW_SURFACEFORMAT_R32G32_UINT)
    SF( Y, 50,  Y,  x,  x,  x,  x,  x,  x,   BRW_SURFACEFORMAT_R32_FLOAT_X8X24_TYPELESS)
--
1.8.4
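The workaround above amounts to a per-sampler channel remap that applies only to gather4, while the surface is bound as R32G32_FLOAT_LD. The shader-side change in patch 06 is partly lost in this archive, so what follows is only a sketch of the behaviour described here. When the sampler's bit is set in gather_channel_quirk_mask (set for GL_RG32F textures whose red swizzle selector is green), a request for GREEN (channel 1) is redirected to BLUE (channel 2). apply_gather_quirk() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of the IVB RG32F gather4 workaround: for flagged
 * samplers, read GREEN through the BLUE channel of the _LD surface format. */
static uint32_t
apply_gather_quirk(uint32_t channel, uint32_t quirk_mask, unsigned sampler)
{
   if ((quirk_mask & (1u << sampler)) && channel == 1 /* green */)
      return 2; /* blue */
   return channel;
}
```

Because the quirk bit lives in the program key, a texture that changes format or swizzle triggers a shader recompile rather than a runtime branch.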
[Mesa-dev] [PATCH V3 08/11] i965: make room in the binding table for a full alternate set of surface_states
In the worst case, *every* texture unit uses a format that needs overriding.

Signed-off-by: Chris Forbes
---
 src/mesa/drivers/dri/i965/brw_context.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 108e98c..3cf418f 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -661,14 +661,16 @@ struct brw_gs_prog_data
 #define SURF_INDEX_DRAW(d)           (d)
 #define SURF_INDEX_FRAG_CONST_BUFFER (BRW_MAX_DRAW_BUFFERS + 1)
 #define SURF_INDEX_TEXTURE(t)        (BRW_MAX_DRAW_BUFFERS + 2 + (t))
-#define SURF_INDEX_WM_UBO(u)         (SURF_INDEX_TEXTURE(BRW_MAX_TEX_UNIT) + u)
+#define SURF_INDEX_GATHER_TEXTURE(t) (SURF_INDEX_TEXTURE(BRW_MAX_TEX_UNIT) + t)
+#define SURF_INDEX_WM_UBO(u)         (SURF_INDEX_GATHER_TEXTURE(BRW_MAX_TEX_UNIT) + u)
 #define SURF_INDEX_WM_SHADER_TIME    (SURF_INDEX_WM_UBO(12))
 /** Maximum size of the binding table. */
 #define BRW_MAX_WM_SURFACES          (SURF_INDEX_WM_SHADER_TIME + 1)

 #define SURF_INDEX_VEC4_CONST_BUFFER (0)
 #define SURF_INDEX_VEC4_TEXTURE(t)   (SURF_INDEX_VEC4_CONST_BUFFER + 1 + (t))
-#define SURF_INDEX_VEC4_UBO(u)       (SURF_INDEX_VEC4_TEXTURE(BRW_MAX_TEX_UNIT) + u)
+#define SURF_INDEX_VEC4_GATHER_TEXTURE(t) (SURF_INDEX_VEC4_TEXTURE(BRW_MAX_TEX_UNIT) + t)
+#define SURF_INDEX_VEC4_UBO(u)       (SURF_INDEX_VEC4_GATHER_TEXTURE(BRW_MAX_TEX_UNIT) + u)
 #define SURF_INDEX_VEC4_SHADER_TIME  (SURF_INDEX_VEC4_UBO(12))
 #define BRW_MAX_VEC4_SURFACES        (SURF_INDEX_VEC4_SHADER_TIME + 1)

--
1.8.4
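To make the cost of the extra slots concrete, the binding-table macros from this patch can be evaluated under the assumed values BRW_MAX_DRAW_BUFFERS = 8 and BRW_MAX_TEX_UNIT = 16 (check brw_context.h for the values in your tree). The macros are adapted from the diff, with their arguments parenthesized, which the patch itself does not do.

```c
#include <assert.h>

/* Assumed constants for this era of the i965 driver. */
#define BRW_MAX_DRAW_BUFFERS 8
#define BRW_MAX_TEX_UNIT     16

/* Fragment-shader binding table after the patch. */
#define SURF_INDEX_TEXTURE(t)        (BRW_MAX_DRAW_BUFFERS + 2 + (t))
#define SURF_INDEX_GATHER_TEXTURE(t) (SURF_INDEX_TEXTURE(BRW_MAX_TEX_UNIT) + (t))
#define SURF_INDEX_WM_UBO(u)         (SURF_INDEX_GATHER_TEXTURE(BRW_MAX_TEX_UNIT) + (u))
#define SURF_INDEX_WM_SHADER_TIME    (SURF_INDEX_WM_UBO(12))
#define BRW_MAX_WM_SURFACES          (SURF_INDEX_WM_SHADER_TIME + 1)

/* VS/GS (vec4) binding table after the patch. */
#define SURF_INDEX_VEC4_CONST_BUFFER      (0)
#define SURF_INDEX_VEC4_TEXTURE(t)        (SURF_INDEX_VEC4_CONST_BUFFER + 1 + (t))
#define SURF_INDEX_VEC4_GATHER_TEXTURE(t) (SURF_INDEX_VEC4_TEXTURE(BRW_MAX_TEX_UNIT) + (t))
#define SURF_INDEX_VEC4_UBO(u)            (SURF_INDEX_VEC4_GATHER_TEXTURE(BRW_MAX_TEX_UNIT) + (u))
#define SURF_INDEX_VEC4_SHADER_TIME       (SURF_INDEX_VEC4_UBO(12))
#define BRW_MAX_VEC4_SURFACES             (SURF_INDEX_VEC4_SHADER_TIME + 1)
```

Under these assumptions, texture unit t's gather surface sits exactly BRW_MAX_TEX_UNIT entries after its normal surface. The fragment table grows from 39 to 55 entries and the vec4 table from 30 to 46.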
[Mesa-dev] [PATCH V3 09/11] i965: Emit a second set of SURFACE_STATE for gather4 from textures.
This allows us to use a different surface format for gather4, which is required for R32G32_FLOAT to work on Gen7. Signed-off-by: Chris Forbes --- src/mesa/drivers/dri/i965/brw_context.h | 3 +- src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 57 ++- src/mesa/drivers/dri/i965/gen7_wm_surface_state.c | 5 +- 3 files changed, 52 insertions(+), 13 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 3cf418f..214bcac 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -883,7 +883,8 @@ struct brw_context void (*update_texture_surface)(struct gl_context *ctx, unsigned unit, - uint32_t *surf_offset); + uint32_t *surf_offset, + uint32_t (*tex_format_override)(uint32_t format)); void (*update_renderbuffer_surface)(struct brw_context *brw, struct gl_renderbuffer *rb, bool layered, diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c index 25db2e0..10810dd 100644 --- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c +++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c @@ -246,7 +246,8 @@ brw_update_buffer_texture_surface(struct gl_context *ctx, static void brw_update_texture_surface(struct gl_context *ctx, unsigned unit, - uint32_t *surf_offset) + uint32_t *surf_offset, + uint32_t (*tex_format_override)(uint32_t)) { struct brw_context *brw = brw_context(ctx); struct gl_texture_object *tObj = ctx->Texture.Unit[unit]._Current; @@ -265,14 +266,17 @@ brw_update_texture_surface(struct gl_context *ctx, surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE, 6 * 4, 32, surf_offset); + uint32_t tex_format = translate_tex_format(brw, + mt->format, + tObj->DepthMode, + sampler->sRGBDecode); + if (tex_format_override) + tex_format = tex_format_override(tex_format); + surf[0] = (translate_tex_target(tObj->Target) << BRW_SURFACE_TYPE_SHIFT | BRW_SURFACE_MIPMAPLAYOUT_BELOW << BRW_SURFACE_MIPLAYOUT_SHIFT | 
BRW_SURFACE_CUBEFACE_ENABLES | - (translate_tex_format(brw, -mt->format, - tObj->DepthMode, - sampler->sRGBDecode) << - BRW_SURFACE_FORMAT_SHIFT)); + tex_format << BRW_SURFACE_FORMAT_SHIFT); surf[1] = intelObj->mt->region->bo->offset + intelObj->mt->offset; /* reloc */ @@ -736,7 +740,8 @@ const struct brw_tracked_state gen6_renderbuffer_surfaces = { static void update_stage_texture_surfaces(struct brw_context *brw, const struct gl_program *prog, - uint32_t *surf_offset) + uint32_t *surf_offset, + uint32_t (*tex_format_override)(uint32_t)) { if (!prog) return; @@ -753,13 +758,22 @@ update_stage_texture_surfaces(struct brw_context *brw, /* _NEW_TEXTURE */ if (ctx->Texture.Unit[unit]._ReallyEnabled) { -brw->vtbl.update_texture_surface(ctx, unit, surf_offset + s); +brw->vtbl.update_texture_surface(ctx, unit, surf_offset + s, tex_format_override); } } } } +static uint32_t +gather_format_override(uint32_t format) { + if (format == BRW_SURFACEFORMAT_R32G32_FLOAT) + return BRW_SURFACEFORMAT_R32G32_FLOAT_LD; + else + return format; +} + + /** * Construct SURFACE_STATE objects for enabled textures. */ @@ -778,13 +792,34 @@ brw_update_texture_surfaces(struct brw_context *brw) /* _NEW_TEXTURE */ update_stage_texture_surfaces(brw, vs, brw->vs.base.surf_offset + - SURF_INDEX_VEC4_TEXTURE(0)); + SURF_INDEX_VEC4_TEXTURE(0), + NULL); update_stage_texture_surfaces(brw, gs, brw->gs.base.surf_offset + - SURF_INDEX_VEC4_TEXTURE(0)); + SURF_INDEX_VEC4_TEXTURE(0), + NULL); update_stage_texture_surfaces(brw, fs, brw->wm.base.surf_offset + - SURF_INDEX_TEXTURE(0)); + SURF_INDEX_TEXTURE(0), + NULL); + + /* emit alterna
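Patch 09 threads an optional format-override callback through update_texture_surface(), so that the same texture can get a second SURFACE_STATE with a different format. Below is a minimal standalone sketch of that pattern. gather_format_override() mirrors the patch. choose_surface_format() is a hypothetical stand-in for the relevant lines of brw_update_texture_surface(). The value 0x097 comes from patch 07; the R32G32_FLOAT value 0x085 is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BRW_SURFACEFORMAT_R32G32_FLOAT    0x085 /* assumed value */
#define BRW_SURFACEFORMAT_R32G32_FLOAT_LD 0x097 /* from patch 07 */

typedef uint32_t (*tex_format_override_fn)(uint32_t format);

/* Same logic as gather_format_override() in the patch. */
static uint32_t
gather_format_override(uint32_t format)
{
   if (format == BRW_SURFACEFORMAT_R32G32_FLOAT)
      return BRW_SURFACEFORMAT_R32G32_FLOAT_LD;
   return format;
}

/* Apply the optional override after format translation; NULL keeps the
 * translated format, as for the normal (non-gather) surfaces. */
static uint32_t
choose_surface_format(uint32_t translated, tex_format_override_fn override)
{
   return override ? override(translated) : translated;
}
```

Passing NULL for the normal texture surfaces and gather_format_override for the gather slots means every non-RG32F format simply produces a duplicate SURFACE_STATE. That duplication is the cost of keeping a full alternate set.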
[Mesa-dev] [PATCH V3 10/11] i965: use gather slots in the binding table for gather4.
Signed-off-by: Chris Forbes --- src/mesa/drivers/dri/i965/brw_fs_emit.cpp | 8 ++-- src/mesa/drivers/dri/i965/brw_vec4_emit.cpp | 8 ++-- 2 files changed, 12 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp index a706f4a..8389760 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp @@ -522,11 +522,15 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src src = retype(brw_vec8_grf(0, 0), BRW_REGISTER_TYPE_UW); } + uint32_t surface_index = inst->opcode == SHADER_OPCODE_TG4 + ? SURF_INDEX_GATHER_TEXTURE(inst->sampler) + : SURF_INDEX_TEXTURE(inst->sampler); + brw_SAMPLE(p, retype(dst, BRW_REGISTER_TYPE_UW), inst->base_mrf, src, - SURF_INDEX_TEXTURE(inst->sampler), + surface_index, inst->sampler, msg_type, rlen, @@ -535,7 +539,7 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src simd_mode, return_format); - mark_surface_used(SURF_INDEX_TEXTURE(inst->sampler)); + mark_surface_used(surface_index); } diff --git a/src/mesa/drivers/dri/i965/brw_vec4_emit.cpp b/src/mesa/drivers/dri/i965/brw_vec4_emit.cpp index 6bdffb3..00efb10 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_emit.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_emit.cpp @@ -385,11 +385,15 @@ vec4_generator::generate_tex(vec4_instruction *inst, break; } + uint32_t surface_index = inst->opcode == SHADER_OPCODE_TG4 + ? 
SURF_INDEX_VEC4_GATHER_TEXTURE(inst->sampler) + : SURF_INDEX_VEC4_TEXTURE(inst->sampler); + brw_SAMPLE(p, dst, inst->base_mrf, src, - SURF_INDEX_VEC4_TEXTURE(inst->sampler), + surface_index, inst->sampler, msg_type, 1, /* response length */ @@ -398,7 +402,7 @@ vec4_generator::generate_tex(vec4_instruction *inst, BRW_SAMPLER_SIMD_MODE_SIMD4X2, return_format); - mark_surface_used(SURF_INDEX_VEC4_TEXTURE(inst->sampler)); + mark_surface_used(surface_index); } void -- 1.8.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
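Patch 10/11 makes both generators choose a different binding-table entry when the opcode is `SHADER_OPCODE_TG4`. It then passes that same index to `mark_surface_used()`, so the binding table is sized for the slot that is actually sampled. The sketch below models that selection with a made-up layout that places the gather slots after the regular texture slots. The layout constants, `fake_opcode`, `fake_prog_data`, `fake_mark_surface_used()` and `fake_surface_index()` are all hypothetical, chosen for illustration only.

```c
#include <stdint.h>

/* Hypothetical binding-table layout (NOT the real i965 values): regular
 * texture surfaces first, then one gather surface per sampler. */
#define FAKE_MAX_SAMPLERS 16
#define FAKE_TEXTURE_BASE 1
#define FAKE_GATHER_BASE  (FAKE_TEXTURE_BASE + FAKE_MAX_SAMPLERS)

#define FAKE_SURF_INDEX_TEXTURE(s)        (FAKE_TEXTURE_BASE + (s))
#define FAKE_SURF_INDEX_GATHER_TEXTURE(s) (FAKE_GATHER_BASE + (s))

enum fake_opcode { FAKE_OP_TEX, FAKE_OP_TXD, FAKE_OP_TG4 };

struct fake_prog_data {
   uint32_t binding_table_size;
};

/* Grow the binding table so that it covers every surface the shader uses. */
static void
fake_mark_surface_used(struct fake_prog_data *pd, uint32_t surf_index)
{
   if (pd->binding_table_size < surf_index + 1)
      pd->binding_table_size = surf_index + 1;
}

/* Mirrors the selection in generate_tex(): gather4 (TG4) gets its own slot. */
static uint32_t
fake_surface_index(enum fake_opcode op, unsigned sampler)
{
   return op == FAKE_OP_TG4 ? FAKE_SURF_INDEX_GATHER_TEXTURE(sampler)
                            : FAKE_SURF_INDEX_TEXTURE(sampler);
}
```

Computing the index once and reusing it for both the SEND and the bookkeeping avoids the mismatch the original code would have had if only one of the two call sites had been updated.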
[Mesa-dev] [PATCH V3 11/11] i965: Enable ARB_texture_gather on Gen7
Signed-off-by: Chris Forbes --- src/mesa/drivers/dri/i965/brw_context.c | 1 + src/mesa/drivers/dri/i965/intel_extensions.c | 4 2 files changed, 5 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c index 4fcc9fb..96d1ff4 100644 --- a/src/mesa/drivers/dri/i965/brw_context.c +++ b/src/mesa/drivers/dri/i965/brw_context.c @@ -176,6 +176,7 @@ brw_initialize_context_constants(struct brw_context *brw) ctx->Const.MaxColorTextureSamples = 8; ctx->Const.MaxDepthTextureSamples = 8; ctx->Const.MaxIntegerSamples = 8; + ctx->Const.MaxProgramTextureGatherComponents = 4; } ctx->Const.MinLineWidth = 1.0; diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c index aef7805..13758a7 100644 --- a/src/mesa/drivers/dri/i965/intel_extensions.c +++ b/src/mesa/drivers/dri/i965/intel_extensions.c @@ -160,6 +160,10 @@ intelInitExtensions(struct gl_context *ctx) ctx->Extensions.EXT_shader_integer_mix = ctx->Const.GLSLVersion >= 130; } + if (brw->gen >= 7) { + ctx->Extensions.ARB_texture_gather = true; + } + if (ctx->API == API_OPENGL_CORE) ctx->Extensions.ARB_base_instance = true; if (ctx->API != API_OPENGL_CORE) -- 1.8.4
[Mesa-dev] [PATCH] i965: Fix cube array coordinate normalization
Hardware requires the magnitude of the largest component to not exceed 1; brw_cubemap_normalize ensures that this is the case. Unfortunately, we would previously multiply the array index for cube arrays by the normalization factor. The incorrect array index would then cause the sampler to attempt to access either the wrong cube, or memory outside the cube surface entirely, resulting in garbage rendering or in the worst case, hangs. Alter the normalization pass to only multiply the .xyz components. Fixes broken rendering in the arb_texture_cube_map_array-cubemap piglit, which was recently adjusted to provoke this behavior. Signed-off-by: Chris Forbes Cc: "9.2" mesa-sta...@lists.freedesktop.org --- src/mesa/drivers/dri/i965/brw_cubemap_normalize.cpp | 16 +++- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_cubemap_normalize.cpp b/src/mesa/drivers/dri/i965/brw_cubemap_normalize.cpp index 9e149f9..edb3a2d 100644 --- a/src/mesa/drivers/dri/i965/brw_cubemap_normalize.cpp +++ b/src/mesa/drivers/dri/i965/brw_cubemap_normalize.cpp @@ -32,6 +32,7 @@ #include "glsl/glsl_types.h" #include "glsl/ir.h" +#include "program/prog_instruction.h" /* For WRITEMASK_* */ class brw_cubemap_normalize_visitor : public ir_hierarchical_visitor { public: @@ -88,11 +89,16 @@ brw_cubemap_normalize_visitor::visit_leave(ir_texture *ir) glsl_type::float_type, expr, NULL); - deref = new(mem_ctx) ir_dereference_variable(var); - ir->coordinate = new(mem_ctx) ir_expression(ir_binop_mul, - ir->coordinate->type, - deref, - expr); + /* coordinate.xyz *= expr */ + assign = new(mem_ctx) ir_assignment( + new(mem_ctx) ir_dereference_variable(var), + new(mem_ctx) ir_expression(ir_binop_mul, + ir->coordinate->type, +new(mem_ctx) ir_dereference_variable(var), +expr)); + assign->write_mask = WRITEMASK_XYZ; + base_ir->insert_before(assign); + ir->coordinate = new(mem_ctx) ir_dereference_variable(var); progress = true; return visit_continue; -- 1.8.4
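The fix stops the array layer (.w) from being scaled along with the direction vector. The sketch below models the same arithmetic on the CPU. It assumes the normalization factor is the reciprocal of the largest magnitude among x, y and z, because the excerpt does not show how `expr` is built. `cube_array_normalize()` and its helpers are hypothetical names. The real pass emits GLSL IR with `write_mask = WRITEMASK_XYZ` rather than running C code.

```c
/* Small helpers, so that the sketch does not depend on libm. */
static float abs_f(float v) { return v < 0.0f ? -v : v; }
static float max_f(float a, float b) { return a > b ? a : b; }

/* coord = {x, y, z, layer}. Scale only .xyz so that the largest |x|,|y|,|z|
 * becomes 1; the array layer in coord[3] is deliberately left untouched.
 * Before the fix, all four components were multiplied, corrupting the layer. */
static void
cube_array_normalize(float coord[4])
{
   float m = max_f(abs_f(coord[0]), max_f(abs_f(coord[1]), abs_f(coord[2])));
   if (m == 0.0f)
      return; /* degenerate direction: nothing meaningful to normalize */

   float rcp = 1.0f / m;
   for (int i = 0; i < 3; i++) /* equivalent of WRITEMASK_XYZ */
      coord[i] *= rcp;
}
```

For example, {2, -4, 1, 3} becomes {0.5, -1, 0.25, 3}: the direction is rescaled, and the sampler still reads cube number 3.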
Re: [Mesa-dev] [PATCH] vdpau/decode: Check max width and max height.
Reviewed and committed. Thanks, Christian. - Original Message - From: "Rico Schüller" To: mesa-dev@lists.freedesktop.org CC: "Rico Schüller" Sent: Saturday, 14 September 2013 20:27:07 Subject: [Mesa-dev] [PATCH] vdpau/decode: Check max width and max height. --- src/gallium/state_trackers/vdpau/decode.c | 20 1 file changed, 20 insertions(+) diff --git a/src/gallium/state_trackers/vdpau/decode.c b/src/gallium/state_trackers/vdpau/decode.c index 47ca229..b144b83 100644 --- a/src/gallium/state_trackers/vdpau/decode.c +++ b/src/gallium/state_trackers/vdpau/decode.c @@ -51,6 +51,7 @@ vlVdpDecoderCreate(VdpDevice device, vlVdpDecoder *vldecoder; VdpStatus ret; bool supported; + uint32_t maxwidth, maxheight; if (!decoder) return VDP_STATUS_INVALID_POINTER; @@ -84,6 +85,25 @@ vlVdpDecoderCreate(VdpDevice device, return VDP_STATUS_INVALID_DECODER_PROFILE; } + maxwidth = screen->get_video_param + ( + screen, + templat.profile, + PIPE_VIDEO_ENTRYPOINT_BITSTREAM, + PIPE_VIDEO_CAP_MAX_WIDTH + ); + maxheight = screen->get_video_param + ( + screen, + templat.profile, + PIPE_VIDEO_ENTRYPOINT_BITSTREAM, + PIPE_VIDEO_CAP_MAX_HEIGHT + ); + if (width > maxwidth || height > maxheight) { + pipe_mutex_unlock(dev->mutex); + return VDP_STATUS_INVALID_SIZE; + } + vldecoder = CALLOC(1,sizeof(vlVdpDecoder)); if (!vldecoder) { pipe_mutex_unlock(dev->mutex); -- 1.8.3.1
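The new check queries the per-profile limits through `get_video_param()` with `PIPE_VIDEO_CAP_MAX_WIDTH` and `PIPE_VIDEO_CAP_MAX_HEIGHT`, and rejects the request with `VDP_STATUS_INVALID_SIZE`. Because the check runs while `dev->mutex` is held, the early return has to unlock first. A minimal sketch of just the comparison logic follows. `fake_screen`, `fake_get_video_param()`, `check_decoder_size()` and the status and capability enums are hypothetical stand-ins for the Gallium and VDPAU types.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the VDPAU status codes used in the patch. */
typedef enum { FAKE_STATUS_OK = 0, FAKE_STATUS_INVALID_SIZE } fake_status;

/* Hypothetical stand-ins for PIPE_VIDEO_CAP_MAX_WIDTH / _MAX_HEIGHT. */
enum fake_cap { FAKE_CAP_MAX_WIDTH, FAKE_CAP_MAX_HEIGHT };

struct fake_screen {
   uint32_t max_width, max_height;
};

/* Plays the role of screen->get_video_param() for one fixed profile. */
static uint32_t
fake_get_video_param(const struct fake_screen *s, enum fake_cap cap)
{
   return cap == FAKE_CAP_MAX_WIDTH ? s->max_width : s->max_height;
}

static fake_status
check_decoder_size(const struct fake_screen *s, uint32_t width, uint32_t height)
{
   uint32_t maxwidth = fake_get_video_param(s, FAKE_CAP_MAX_WIDTH);
   uint32_t maxheight = fake_get_video_param(s, FAKE_CAP_MAX_HEIGHT);

   /* Exceeding either limit is an error; a size equal to the limit is accepted. */
   if (width > maxwidth || height > maxheight)
      return FAKE_STATUS_INVALID_SIZE;
   return FAKE_STATUS_OK;
}
```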
Re: [Mesa-dev] gallium-bind-sampler-states branch
On Sun, Sep 15, 2013 at 12:24 AM, Brian Paul wrote: > On 09/12/2013 09:06 PM, Chia-I Wu wrote: >> >> Hi Brian, >> >> On Fri, Sep 13, 2013 at 8:46 AM, Brian Paul wrote: >>> >>> >>> I just pushed a gallium-bind-sampler-states branch to my git repo at >>> git://people.freedesktop.org/~brianp/mesa >>> >>> It replaces the four >>> pipe_context::bind_fragment/vertex/geometry/compute_sampler_states() >>> functions with a single bind_sampler_states() function: >>> >>> void (*bind_sampler_states)(struct pipe_context *, >>> unsigned shader, unsigned start_slot, >>> unsigned num_samplers, void **samplers); >>> >>> At this point start_slot is always zero (at least for non-compute >>> shaders). >>> And as the updated gallium docs explain, at some point calls to >>> bind_sampler_states() will be used to update sub-ranges, but that never >>> happens currently. >>> >>> I've updated all the drivers, state trackers, utils, etc. >>> >>> I've tested the svga, llvmpipe and softpipe drivers. 'make check' and a >>> texture subset of piglit pass w/out regressions. I'd appreciate it if >>> other >>> driver developers would test their favorite driver. >> >> For ilo, the new code does not follow the doc and unbinds samplers not in >> range. > > > I think that's OK. The CSO module (used by the state tracker) currently > always calls pipe_context::bind_sampler_states() with start=0 and count such > that it sets/replaces all samplers, never a sub-range. That could/should > change in the future. > > See single_sampler_done() in cso_context.c. > > > >> Is it fine if I implement the new bind_sampler_states as a helper >> function on master branch, so that you hook it up to >> pipe_context::bind_sampler_states in your branch and remove the old >> ones? > > > I'm not quite sure that I understand what you mean. Can you elaborate? There is already ilo_bind_sampler_states that does what pipe_context::bind_sampler_states expects, except that the function returns a bool.
I can make it return void so that, in your branch, you can initialize pipe_context::bind_sampler_states to it instead of adding ilo_bind_sampler_states2. > > -Brian > -- o...@lunarg.com
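According to the thread, the documented semantics of the unified `bind_sampler_states(ctx, shader, start_slot, num_samplers, samplers)` replace only the slots in `[start_slot, start_slot + num_samplers)`. ilo instead unbinds the slots outside that range. That difference is harmless today because the CSO module always binds from slot 0 with a full count. The sketch below shows the sub-range behavior a driver would need once partial updates are issued. `fake_context`, `fake_bind_sampler_states()` and the size constants are hypothetical and are not part of the Gallium or ilo code.

```c
#include <stddef.h>

#define FAKE_SHADER_STAGES 4
#define FAKE_MAX_SAMPLERS  16

/* Hypothetical per-context sampler table. */
struct fake_context {
   void *samplers[FAKE_SHADER_STAGES][FAKE_MAX_SAMPLERS];
   unsigned num_samplers[FAKE_SHADER_STAGES];
};

static void
fake_bind_sampler_states(struct fake_context *ctx, unsigned shader,
                         unsigned start_slot, unsigned num_samplers,
                         void **samplers)
{
   /* Replace only the requested sub-range; slots outside it are kept.
    * The caller must ensure start_slot + num_samplers <= FAKE_MAX_SAMPLERS. */
   for (unsigned i = 0; i < num_samplers; i++)
      ctx->samplers[shader][start_slot + i] = samplers[i];

   /* Track how many slots are in use (highest bound slot + 1). */
   unsigned n = FAKE_MAX_SAMPLERS;
   while (n > 0 && ctx->samplers[shader][n - 1] == NULL)
      n--;
   ctx->num_samplers[shader] = n;
}
```

With a start=0, full-count caller such as the CSO module today, this behaves identically to the "unbind everything else" approach. The two diverge only for partial updates.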
[Mesa-dev] [PATCH] xdemos/corender: remove obsolete function gethostbyname
From 911cdbcf90440d748c185ac53fa6dded7f3da17c Mon Sep 17 00:00:00 2001 From: David Heidelberger Date: Mon, 9 Sep 2013 21:48:36 +0200 Subject: [PATCH] xdemos/corender: remove obsolete function gethostbyname This patch removes the unused and obsolete function gethostbyname(). This also fixes the following runtime assertion failure: corender: ipc.c:85: CreatePort: Assertion `hp' failed. --- src/xdemos/ipc.c | 12 1 file changed, 12 deletions(-) diff --git a/src/xdemos/ipc.c b/src/xdemos/ipc.c index c872d16..d6920b5 100644 --- a/src/xdemos/ipc.c +++ b/src/xdemos/ipc.c @@ -67,7 +67,6 @@ CreatePort(int *port) { char hostname[1000]; struct sockaddr_in servaddr; -struct hostent *hp; int so_reuseaddr = 1; int tcp_nodelay = 1; int sock, k; @@ -80,16 +79,10 @@ CreatePort(int *port) k = gethostname(hostname, 1000); assert(k == 0); -/* get hostent info */ -hp = gethostbyname(hostname); -assert(hp); - /* initialize the servaddr struct */ memset(&servaddr, 0, sizeof(servaddr) ); servaddr.sin_family = AF_INET; servaddr.sin_port = htons((unsigned short) (*port)); -memcpy((char *) &servaddr.sin_addr, hp->h_addr, - sizeof(servaddr.sin_addr)); /* deallocate when we exit */ k = setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, @@ -155,7 +148,6 @@ int Connect(const char *hostname, int port) { struct sockaddr_in servaddr; -struct hostent *hp; int sock, k; int tcp_nodelay = 1; @@ -164,13 +156,9 @@ Connect(const char *hostname, int port) sock = socket(AF_INET, SOCK_STREAM, 0); assert(sock >= 0); -hp = gethostbyname(hostname); -assert(hp); - memset(&servaddr, 0, sizeof(servaddr)); servaddr.sin_family = AF_INET; servaddr.sin_port = htons((unsigned short) port); -memcpy((char *) &servaddr.sin_addr, hp->h_addr, sizeof(servaddr.sin_addr)); k = connect(sock, (struct sockaddr *) &servaddr, sizeof(servaddr)); if (k != 0) { -- 1.8.2.1
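With the lookup removed, `servaddr.sin_addr` keeps the all-zero value left by `memset()` (INADDR_ANY). For a code path that does need to resolve `hostname`, the POSIX replacement for the obsolete `gethostbyname()` is `getaddrinfo()`. The following is a minimal sketch of that approach. It is not what the patch does, and `resolve_ipv4()` is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: resolve an IPv4 address for host:port into *out.
 * Returns 0 on success or a nonzero getaddrinfo() error code. */
static int
resolve_ipv4(const char *host, unsigned short port, struct sockaddr_in *out)
{
   struct addrinfo hints, *res = NULL;
   int err;

   memset(&hints, 0, sizeof(hints));
   hints.ai_family = AF_INET;       /* IPv4 only, like the original code */
   hints.ai_socktype = SOCK_STREAM;

   err = getaddrinfo(host, NULL, &hints, &res);
   if (err != 0)
      return err;

   /* With AF_INET, ai_addr points at a struct sockaddr_in. */
   memcpy(out, res->ai_addr, sizeof(*out));
   out->sin_port = htons(port);
   freeaddrinfo(res);
   return 0;
}
```

Unlike `gethostbyname()`, `getaddrinfo()` is reentrant and reports failures through its return value, which `gai_strerror()` can turn into a message, instead of returning NULL.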
Re: [Mesa-dev] [PATCH] i965/hsw: compute DDX in a subspan based only on top row
On Fri, Sep 13, 2013 at 2:15 PM, Paul Berry wrote: > On 12 September 2013 22:06, Chia-I Wu wrote: > >> From: Chia-I Wu >> >> Consider only the top-left and top-right pixels to approximate DDX in a >> 2x2 >> subspan, unless the application or the user requests a more accurate >> approximation. This results in a less accurate approximation. However, >> it >> improves the performance of Xonotic with Ultra settings by 24.3879% +/- >> 0.832202% (at 95.0% confidence) on Haswell. No noticeable image quality >> difference observed. >> >> No piglit gpu.tests regressions (tested with v1) >> >> I failed to come up with an explanation for the performance difference, >> as the >> change does not affect Ivy Bridge. If anyone has the insight, please >> kindly >> enlighten me. Performance differences may also be observed on other games >> that call textureGrad and dFdx. >> >> v2: Honor GL_FRAGMENT_SHADER_DERIVATIVE_HINT and add a drirc option. >> Update >> comments. >> > > I'm not entirely comfortable making a change that has a known negative > impact on computational accuracy (even one that leads to such an impressive > performance improvement) when we don't have any theories as to why the > performance improvement happens, or why the improvement doesn't apply to > Ivy Bridge. In my experience, making changes to the codebase without > understanding why they improve things almost always leads to improvements > that are brittle, since it's likely that the true source of the improvement > is a coincidence that will be wiped out by some future change (or won't be > relevant to client programs other than this particular benchmark). Having > a theory as to why the performance improvement happens would help us be > confident that we're applying the right fix under the right circumstances. 
> There's another angle to approach this and that is to develop a simple test case that will show the different results across a range of computational accuracy and run the test on proprietary drivers for the same hardware to determine what settings they are using. > > For example, here's one theory as to why we might be seeing an > improvement: perhaps Haswell's sample_d processing is smart enough to > realize that when all the gradient values within a sub-span are the same, > that means that all of the sampling for the sub-span will come from the > same LOD, and that allows it to short-cut some expensive step in the LOD > calculation. Perhaps the same improvement isn't seen on Ivy Bridge because > Ivy Bridge's sample_d processing logic is less sophisticated, so it's > unable to perform the optimization. If this is the case, then conditioning > the optimization on brw->is_haswell (as you've done) makes sense. > > Another possible explanation for the Haswell vs Ivy Bridge difference is > that perhaps Ivy Bridge, being a lower-performing chip, has other > bottlenecks that make the optimization irrelevant for this particular > benchmark, but potentially still useful for other benchmarks. For > instance, maybe when this benchmark executes on Ivy Bridge, the texture > that's being sampled from is located in sufficiently distant memory that > optimizing the sample_d's memory accesses makes no difference, since the > bottleneck is the speed with which the texture can be read into cache, > rather than the speed of operation of sample_d. If this explanation is > correct, then it might be worth applying the optimization to both Ivy > Bridge and Haswell (and perhaps Sandy Bridge as well), since it might > conceivably benefit those other chips when running applications that place > less cache pressure on the chip. > This scenario is where I'd place my bets, especially given that the numbers are based on Xonotic. 
I benchmarked this patch using Xonotic on Bay Trail as is and by replacing !brw->is_haswell with !brw->is_baytrail. With ultra and ultimate levels at medium and high resolutions, the results were all essentially the same at comparable resolutions and quality levels. I don't see any justification to tie this change to just Haswell hardware. There are all sorts of reasons why doing that sounds like a big mistake. In fact, another _explanation_ to add to your list is that maybe there's another is_haswell test elsewhere in the driver that is responsible for the performance anomaly. > Another possible explanation is that Haswell has a bug in its sample_d > logic which causes it to be slow under some conditions, and this > lower-accuracy DDX computation happens to work around it. If that's the > case, we might want to consider not using sample_d at all on Haswell, and > instead calculating the LOD in the shader and using sample_l instead. If > this is the correct explanation, then that might let us have faster > performance without sacrificing DDX accuracy. > > A final possible explanation for the performance improvement is that > perhaps for some reason sample_d performs more optimally when the DDX and > DDY computations have si
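The patch under discussion approximates DDX for each 2x2 subspan from only the top-left and top-right pixels, instead of computing a separate difference for each row. The arithmetic difference can be modeled on the CPU as follows. The `p[row][col]` layout and the function names are hypothetical, and the real change lives in the i965 code generator, not in C like this.

```c
/* p[row][col] holds one attribute value per pixel of a 2x2 subspan, with
 * row 0 as the top row. out[row][col] receives each pixel's DDX estimate. */

/* Per-row DDX: each row uses its own right-minus-left difference. */
static void
ddx_per_row(float p[2][2], float out[2][2])
{
   for (int r = 0; r < 2; r++) {
      float d = p[r][1] - p[r][0];
      out[r][0] = out[r][1] = d;
   }
}

/* Top-row-only DDX: all four pixels share the top-right minus top-left value. */
static void
ddx_top_row_only(float p[2][2], float out[2][2])
{
   float d = p[0][1] - p[0][0];
   for (int r = 0; r < 2; r++)
      out[r][0] = out[r][1] = d;
}
```

The two agree whenever both rows change at the same horizontal rate. They diverge only when that rate differs between the top and bottom rows, which is the accuracy the patch trades away and why v2 honors GL_FRAGMENT_SHADER_DERIVATIVE_HINT.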
Re: [Mesa-dev] [PATCH 05/15] i965/sf: Consolidate common code for setting up gen6-7 attribute overrides.
On 09/03/2013 04:18 PM, Paul Berry wrote: > --- > src/mesa/drivers/dri/i965/brw_state.h | 9 +- > src/mesa/drivers/dri/i965/gen6_sf_state.c | 153 > +- > src/mesa/drivers/dri/i965/gen7_sf_state.c | 64 + > 3 files changed, 97 insertions(+), 129 deletions(-) This is fantastic! Patches 1-5 are: Reviewed-by: Kenneth Graunke
Re: [Mesa-dev] [PATCH 11/15] i965/fs: When >64 input components, order them to match prev pipeline stage.
On 09/03/2013 04:18 PM, Paul Berry wrote: > Since the SF/SBE stage is only capable of performing arbitrary > reorderings of 16 varying slots, we can't arrange the fragment shader > inputs in an arbitrary order if there are more than 16 input varying > slots in use. We need to make sure that slots 16-31 match the > corresponding outputs of the previous pipeline stage. > > The easiest way to accomplish this is to just make all varying slots > match up with the previous pipeline stage. > --- > src/mesa/drivers/dri/i965/brw_fs.cpp | 42 > ++-- > src/mesa/drivers/dri/i965/brw_wm.c | 3 ++- > 2 files changed, 38 insertions(+), 7 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp > b/src/mesa/drivers/dri/i965/brw_fs.cpp > index 7950d5f6..8d73a0f 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp > @@ -1237,11 +1237,40 @@ fs_visitor::calculate_urb_setup() > int urb_next = 0; > /* Figure out where each of the incoming setup attributes lands. */ > if (brw->gen >= 6) { > - for (unsigned int i = 0; i < VARYING_SLOT_MAX; i++) { > - if (fp->Base.InputsRead & BRW_FS_VARYING_INPUT_MASK & > - BITFIELD64_BIT(i)) { > - c->prog_data.urb_setup[i] = urb_next++; > - } > + if (_mesa_bitcount_64(fp->Base.InputsRead & > +BRW_FS_VARYING_INPUT_MASK) <= 16) { > + /* The SF/SBE pipeline stage can do arbitrary rearrangement of the > + * first 16 varying inputs, so we can put them wherever we want. > + * Just put them in order. > + */ It might be nice to have a comment saying why this is useful (as opposed to always following the previous stage's ordering). As I understand it, this is useful when the VS outputs a bunch of varyings but the FS only uses a couple of them---we can pack them into the first few slots, saving space. 
> + for (unsigned int i = 0; i < VARYING_SLOT_MAX; i++) { > +if (fp->Base.InputsRead & BRW_FS_VARYING_INPUT_MASK & > +BITFIELD64_BIT(i)) { > + c->prog_data.urb_setup[i] = urb_next++; > +} > + } > + } else { > + /* We have enough input varyings that the SF/SBE pipeline stage > can't > + * arbitrarily rearrange them to suit our whim; we have to put them > + * in an order that matches the output of the previous pipeline > stage > + * (geometry or vertex shader). > + */ > + struct brw_vue_map prev_stage_vue_map; > + brw_compute_vue_map(brw, &prev_stage_vue_map, > + c->key.input_slots_valid); > + int first_slot = 2 * BRW_SF_URB_ENTRY_READ_OFFSET; > + assert(prev_stage_vue_map.num_slots <= first_slot + 32); > + for (int slot = first_slot; slot < prev_stage_vue_map.num_slots; > + slot++) { > +int varying = prev_stage_vue_map.slot_to_varying[slot]; > +if (varying != BRW_VARYING_SLOT_COUNT && It wasn't immediately obvious to me why varying would be BRW_VARYING_SLOT_COUNT. But, on further inspection, I see this is the value that gets assigned to empty slots. Up to you whether you want to add a small comment. > +(fp->Base.InputsRead & BRW_FS_VARYING_INPUT_MASK & > + BITFIELD64_BIT(varying))) { > + c->prog_data.urb_setup[varying] = slot - first_slot; > + urb_next = MAX2(urb_next, slot + 1); > +} > + } > + urb_next = prev_stage_vue_map.num_slots - first_slot; Huh? It looks like you're setting urb_next in the loop, then clobbering it immediately after the loop. This should probably be fixed. >} > } else { >/* FINISHME: The sf doesn't map VS->FS inputs for us very well. */ > @@ -3149,7 +3178,8 @@ brw_fs_precompile(struct gl_context *ctx, struct > gl_shader_program *prog) >key.iz_lookup |= IZ_DEPTH_WRITE_ENABLE_BIT; > } > > - if (brw->gen < 6) > + if (brw->gen < 6 || _mesa_bitcount_64(fp->Base.InputsRead & > + BRW_FS_VARYING_INPUT_MASK) > 16) Could this be simplified to: if (brw->gen < 6 || c->prog_data.num_varying_inputs > 16) Or are these values different? 
>key.input_slots_valid = fp->Base.InputsRead | VARYING_BIT_POS; > > key.clamp_fragment_color = ctx->API == API_OPENGL_COMPAT; > diff --git a/src/mesa/drivers/dri/i965/brw_wm.c > b/src/mesa/drivers/dri/i965/brw_wm.c > index 3df2b7d..3e59880 100644 > --- a/src/mesa/drivers/dri/i965/brw_wm.c > +++ b/src/mesa/drivers/dri/i965/brw_wm.c > @@ -466,7 +466,8 @@ static void brw_wm_populate_key( struct brw_context *brw, >(ctx->Multisample.SampleAlphaToCoverage || ctx->Color.AlphaEnabled); > > /* BRW_NEW_VUE_MAP_GEOM_OUT */ > - if (brw->gen < 6) > + if (brw->gen < 6 || _mesa_bitcount_64(fp->program.Base.InputsRead & > +
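Patch 11/15 gives `calculate_urb_setup()` two layouts. With at most 16 varying inputs, SF/SBE can reorder freely, so the inputs are packed in varying order. With more than 16, each input keeps the slot the previous stage's VUE map assigned, offset by `first_slot`, and empty VUE slots are marked with `BRW_VARYING_SLOT_COUNT`. The sketch below models both branches. `calc_urb_setup()`, `popcount64()`, `FAKE_NUM_VARYINGS` and `FAKE_EMPTY_SLOT` are hypothetical names, and the real `first_slot` is `2 * BRW_SF_URB_ENTRY_READ_OFFSET`. Following the review comment, `urb_next` is assigned once after the loop, which matches the value the patch ends up with after the in-loop updates are overwritten.

```c
#include <stdint.h>

#define FAKE_NUM_VARYINGS 64                /* stand-in for VARYING_SLOT_MAX */
#define FAKE_EMPTY_SLOT   FAKE_NUM_VARYINGS /* like BRW_VARYING_SLOT_COUNT */

static unsigned
popcount64(uint64_t v)
{
   unsigned n = 0;
   for (; v; v &= v - 1) /* clear the lowest set bit each iteration */
      n++;
   return n;
}

/* Fills urb_setup[varying] with a slot index (-1 if the FS does not read
 * that varying) and returns the number of URB slots used (urb_next). */
static int
calc_urb_setup(uint64_t inputs_read, const int *slot_to_varying,
               int num_slots, int first_slot, int urb_setup[FAKE_NUM_VARYINGS])
{
   int urb_next = 0;
   for (int i = 0; i < FAKE_NUM_VARYINGS; i++)
      urb_setup[i] = -1;

   if (popcount64(inputs_read) <= 16) {
      /* SF/SBE can rearrange freely: pack the inputs in varying order. */
      for (int i = 0; i < FAKE_NUM_VARYINGS; i++)
         if (inputs_read & (UINT64_C(1) << i))
            urb_setup[i] = urb_next++;
   } else {
      /* Too many inputs to rearrange: mirror the previous stage's VUE map. */
      for (int slot = first_slot; slot < num_slots; slot++) {
         int varying = slot_to_varying[slot];
         if (varying != FAKE_EMPTY_SLOT &&
             (inputs_read & (UINT64_C(1) << varying)))
            urb_setup[varying] = slot - first_slot;
      }
      urb_next = num_slots - first_slot;
   }
   return urb_next;
}
```

The packed branch saves URB space when the VS writes many varyings but the FS reads only a few. The mirrored branch accepts possibly unused slots in exchange for matching the hardware's fixed ordering beyond slot 16.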
Re: [Mesa-dev] [PATCH 15/15] i965/gen6+: Support 128 varying components.
On 09/03/2013 04:18 PM, Paul Berry wrote: > GL 3.2 requires us to support 128 varying components for geometry > shader outputs and fragment shader inputs, and 64 varying components > otherwise. But there's no hardware limitation that restricts us to 64 > varying components, and core Mesa doesn't currently allow different > stages to have different maximum values, so just go ahead and enable > 128 varying components for all stages. This gets us better test > coverage anyway. > > Even though we are only working on GL 3.2 support for gen7 right now, > gen6 also supports 128 varying components, so go ahead and switch it > on there too. > --- > src/mesa/drivers/dri/i965/brw_context.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/brw_context.c > b/src/mesa/drivers/dri/i965/brw_context.c > index 2321076..3c1e409 100644 > --- a/src/mesa/drivers/dri/i965/brw_context.c > +++ b/src/mesa/drivers/dri/i965/brw_context.c > @@ -247,6 +247,9 @@ brw_initialize_context_constants(struct brw_context *brw) > ctx->Const.DisableGLSLLineContinuations = >driQueryOptionb(&brw->optionCache, "disable_glsl_line_continuations"); > > + if (brw->gen >= 6) > + ctx->Const.MaxVarying = 32; > + > /* We want the GLSL compiler to emit code that uses condition codes */ > for (int i = 0; i < MESA_SHADER_TYPES; i++) { >ctx->ShaderCompilerOptions[i].MaxIfDepth = brw->gen < 6 ? 16 : > UINT_MAX; > Nice work, Paul. Other than my small nits on patch 11, this series is: Reviewed-by: Kenneth Graunke
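Assuming, as elsewhere in Mesa, that `ctx->Const.MaxVarying` counts vec4 slots rather than scalar components, the value 32 corresponds to the 128 components mentioned in the commit message. The GL 3.2 minimum of 64 components corresponds to the 16 slots that SF/SBE can freely rearrange. That link is why patch 11 has to handle fragment shaders with more than 16 input slots:

$$
\text{MaxVarying} \times 4 = 32 \times 4 = 128 \ \text{components}, \qquad \frac{64 \ \text{components}}{4} = 16 \ \text{vec4 slots}
$$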
Re: [Mesa-dev] [PATCH 3/3] mesa: add missing error checks in _mesa_GetObject[Ptr]Label()
Hi Brian, Maybe it's just another oversight in the spec, but I thought I'd point out that the spec doesn't actually say to test for this in the get label functions. I assumed this was because label can be NULL, in which case a bufSize of 0 would be valid. I haven't checked what the Catalyst drivers do yet. Tim >- Original Message - >From: Brian Paul >To: mesa-dev@lists.freedesktop.org >Cc: >Sent: Sunday, 15 September 2013 2:16 AM >Subject: [Mesa-dev] [PATCH 3/3] mesa: add missing error checks in >_mesa_GetObject[Ptr]Label() > >--- >src/mesa/main/objectlabel.c | 12 >1 file changed, 12 insertions(+) > >diff --git a/src/mesa/main/objectlabel.c b/src/mesa/main/objectlabel.c >index bfe9ba2..c373a46 100644 >--- a/src/mesa/main/objectlabel.c >+++ b/src/mesa/main/objectlabel.c >@@ -256,6 +256,12 @@ _mesa_GetObjectLabel(GLenum identifier, GLuint name, >GLsizei bufSize, > GET_CURRENT_CONTEXT(ctx); > char **labelPtr; > >+ if (bufSize <= 0) { >+ _mesa_error(ctx, GL_INVALID_VALUE, "glGetObjectLabel(bufSize = %d)", >+ bufSize); >+ return; >+ } >+ > labelPtr = get_label_pointer(ctx, identifier, name, "glGetObjectLabel"); > if (!labelPtr) > return; >@@ -288,6 +294,12 @@ _mesa_GetObjectPtrLabel(const void *ptr, GLsizei bufSize, >GLsizei *length, > char **labelPtr; > struct gl_sync_object *const syncObj = (struct gl_sync_object *) ptr; > >+ if (bufSize <= 0) { >+ _mesa_error(ctx, GL_INVALID_VALUE, "glGetObjectPtrLabel(bufSize = %d)", >+ bufSize); >+ return; >+ } >+ > if (!_mesa_validate_sync(ctx, syncObj)) { > _mesa_error(ctx, GL_INVALID_VALUE, "glGetObjectPtrLabel (not a valid >sync object)"); > return; >-- >1.7.10.4
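Tim's reading is that a zero `bufSize` can be legitimate when `label` is NULL and the caller only wants the length. Brian's patch rejects `bufSize <= 0` unconditionally. The following is a minimal sketch of copy logic under Tim's reading. `copy_label()` is a hypothetical helper, not Mesa's implementation. Two choices here are assumptions taken from Tim's argument rather than from the spec text: only negative sizes are rejected, and a NULL `label` reports the full length.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper. Returns 0 on success, or -1 where the GL entry
 * point would raise GL_INVALID_VALUE. *length excludes the terminator. */
static int
copy_label(const char *src, int bufSize, int *length, char *label)
{
   size_t src_len = src ? strlen(src) : 0;
   size_t n = 0;

   if (bufSize < 0)
      return -1;

   if (label == NULL) {
      /* Length-only query: bufSize is irrelevant, so 0 is acceptable. */
      if (length)
         *length = (int) src_len;
      return 0;
   }

   if (bufSize > 0) {
      /* Copy at most bufSize - 1 characters and always NUL-terminate. */
      n = src_len < (size_t) bufSize - 1 ? src_len : (size_t) bufSize - 1;
      memcpy(label, src ? src : "", n);
      label[n] = '\0';
   }
   if (length)
      *length = (int) n;
   return 0;
}
```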