[Mesa-dev] [PATCH 3/3] glsl: fix typos in comments "transfor" -> "transform"

2018-11-21 Thread Jose Maria Casanova Crespo
--- src/compiler/glsl/ir.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/compiler/glsl/ir.h b/src/compiler/glsl/ir.h index e09f053b77c..c3f5f1f7b05 100644 --- a/src/compiler/glsl/ir.h +++ b/src/compiler/glsl/ir.h @@ -773,17 +773,17 @@ public: unsigned is_xfb_

[Mesa-dev] [PATCH 2/3] glsl: TCS outputs can not be transform feedback candidates on GLES

2018-11-21 Thread Jose Maria Casanova Crespo
Fixes: KHR-GLES*.core.tessellation_shader.single.xfb_captures_data_from_correct_stage Cc: mesa-sta...@lists.freedesktop.org --- I think this patch and the previous one should be squashed, or their order interchanged, before landing. I'm sending them split because it allows exposing the incorrect behavio

[Mesa-dev] [PATCH 1/3] glsl: XFB TSC per-vertex output varyings match as not declared as arrays

2018-11-21 Thread Jose Maria Casanova Crespo
A recent change in OpenGL CTS ("Use non-arrayed varying name for TCS blocks") to the KHR-GL*.tessellation_shader.single.xfb_captures_data_from_correct_stage tests changed how per-vertex Tessellation Control Shader output varyings are named in transform feedback using interface block as "BLOCK_INOUT.value"

[Mesa-dev] [PATCH v5 1/2] intel/fs: New methods dst_write_pattern and src_read_pattern at fs_inst

2018-07-29 Thread Jose Maria Casanova Crespo
For an instruction's register source/destination, these new methods return the read/write byte pattern of the 32-byte GRF as an unsigned int. The returned pattern takes into account the exec_size of the instruction, the type bitsize, the register stride and a relative offset inside the register. The
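
The byte pattern described in this snippet can be illustrated with a minimal sketch. The function name `grf_byte_pattern`, its parameters and its signature are hypothetical and are not taken from the patch; the sketch also assumes the stride is counted in elements, the offset in bytes, and that only bytes falling inside the first 32-byte GRF are recorded.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration, not Mesa's dst_write_pattern/src_read_pattern.
 * Returns a 32-bit mask in which bit N is set when byte N of a 32-byte GRF
 * is touched by the access.
 *
 * exec_size: number of channels
 * type_size: size of each element in bytes
 * stride:    distance between consecutive channels, in elements (assumed)
 * offset:    starting byte inside the register (assumed)
 */
uint32_t
grf_byte_pattern(unsigned exec_size, unsigned type_size,
                 unsigned stride, unsigned offset)
{
   assert(type_size >= 1 && type_size <= 8);

   uint32_t pattern = 0;
   for (unsigned ch = 0; ch < exec_size; ch++) {
      unsigned first = offset + ch * stride * type_size;
      for (unsigned b = 0; b < type_size; b++) {
         unsigned byte = first + b;
         /* Bytes beyond the first GRF are ignored in this sketch. */
         if (byte < 32)
            pattern |= UINT32_C(1) << byte;
      }
   }
   return pattern;
}
```

Under these assumptions, a SIMD8 access to 32-bit elements with stride 1 covers the whole register (0xFFFFFFFF), while a SIMD8 access to 16-bit elements with stride 2 touches only every other word (0x33333333), which is the kind of partial write the liveness patches in this series are concerned with.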

[Mesa-dev] [PATCH v4 1/2] intel/fs: New methods dst_write_pattern and src_read_pattern at fs_inst

2018-07-27 Thread Jose Maria Casanova Crespo
For an instruction's register source/destination, these new methods return the read/write byte pattern of the 32-byte GRF as an unsigned int. The returned pattern takes into account the exec_size of the instruction, the type bitsize, the register stride and a relative offset inside the register. The

[Mesa-dev] [PATCH 2/2] intel/compiler: implement 8-bit constant load

2018-07-27 Thread Jose Maria Casanova Crespo
From: Iago Toral Quiroga --- src/intel/compiler/brw_fs_nir.cpp | 5 + 1 file changed, 5 insertions(+) diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp index 2c8595b9730..6e9a5829d3b 100644 --- a/src/intel/compiler/brw_fs_nir.cpp +++ b/src/intel/compiler/br

[Mesa-dev] [PATCH 0/2] intel/compiler: Enable 8-bit constants

2018-07-27 Thread Jose Maria Casanova Crespo
New VK-GL-CTS tests that use the VK_KHR_8bit_storage extension use 32-bit constants that are converted to 8-bit and then stored in a storage buffer. Although 8-bit constants are not enabled by VK_KHR_8bit_storage, nir_opt_constant_folding already optimizes the 32 -> 8 integer conversion to an 8-bit

[Mesa-dev] [PATCH 1/2] intel/compiler: add setup_imm_(u)b helpers

2018-07-27 Thread Jose Maria Casanova Crespo
From: Iago Toral Quiroga The hardware doesn't support byte immediates, so similar to setup_imm_df() for doubles, these helpers work by loading the constant value into a VGRF. --- src/intel/compiler/brw_fs.h | 6 ++ src/intel/compiler/brw_fs_nir.cpp | 16 2 files chang

[Mesa-dev] [PATCH 2/2] intel/fs: Write multiple 8/16-bit components with byte_scattered_write

2018-07-25 Thread Jose Maria Casanova Crespo
We also pack in the same byte_scattered_write message the maximum number of 8/16-bit components. Comments have been rewritten to adapt them to the 8-bit case. --- src/intel/compiler/brw_fs_nir.cpp | 66 ++- 1 file changed, 38 insertions(+), 28 deletions(-) diff --git

[Mesa-dev] [PATCH 1/2] intel/fs: Read multiple 8/16-bit components with byte_scattered_read

2018-07-25 Thread Jose Maria Casanova Crespo
We used the byte_scattered_read message because it allows reading from offsets that are not 32-bit aligned. We were reading one component for each message. Using a 32-bit bitsize read at byte_scattered_read we can read up to two 16-bit components or four 8-bit components with only one message per iteration
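
The packing arithmetic in this snippet (two 16-bit or four 8-bit components per 32-bit read) can be sketched as follows. The helper `unpack_dword_components` is hypothetical and is not part of the patch; it also assumes that component 0 occupies the lowest bits of the dword.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration: split one 32-bit value, as returned by a single
 * 32-bit read, into its 8-, 16- or 32-bit components.
 * Assumption: component 0 lives in the least significant bits.
 * Returns the number of components written to out[]. */
unsigned
unpack_dword_components(uint32_t dword, unsigned bit_size, uint32_t out[4])
{
   assert(bit_size == 8 || bit_size == 16 || bit_size == 32);

   unsigned count = 32 / bit_size;
   uint32_t mask = bit_size == 32 ? UINT32_MAX
                                  : (UINT32_C(1) << bit_size) - 1;

   for (unsigned i = 0; i < count; i++)
      out[i] = (dword >> (i * bit_size)) & mask;

   return count;
}
```

With the dword 0x44332211, an 8-bit unpack yields the four components 0x11, 0x22, 0x33, 0x44, and a 16-bit unpack yields the two components 0x2211 and 0x4433, so one message replaces four or two single-component reads respectively.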

[Mesa-dev] [PATCH v3 1/2] intel/fs: New methods dst_write_pattern and src_read_pattern at fs_inst

2018-07-23 Thread Jose Maria Casanova Crespo
For an instruction's register source/destination, these new methods return the read/write byte pattern of the 32-byte GRF as an unsigned int. The returned pattern takes into account the exec_size of the instruction, the type bitsize, the register stride and a relative offset inside the register. The

[Mesa-dev] [PATCH v2 2/2] intel/fs: Improve liveness range calculation for partial writes

2018-07-19 Thread Jose Maria Casanova Crespo
We use the information of the registers read/write patterns to improve variable liveness analysis avoiding extending the liveness range of a variable to the beginning of the block so it always reaches the beginning of the shader. This optimization analyses inside each block if a partial write defi

[Mesa-dev] [PATCH v2 1/2] intel/fs: New methods dst_write_pattern and src_read_pattern at fs_inst

2018-07-19 Thread Jose Maria Casanova Crespo
For an instruction's register source/destination, these new methods return the read/write byte pattern of the 32-byte GRF as an unsigned int. The returned pattern takes into account the exec_size of the instruction, the type bitsize, the register stride and a relative offset inside the register. The

[Mesa-dev] [PATCH 2/2] intel/fs: Improve liveness range calculation for partial writes

2018-07-13 Thread Jose Maria Casanova Crespo
We use the information of the registers read/write patterns to improve variable liveness analysis avoiding extending the liveness range of a variable to the beginning of the block so it always reaches the beginning of the shader. This optimization analyses inside each block that if a partial write

[Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst

2018-07-13 Thread Jose Maria Casanova Crespo
For a register source/destination of an instruction the function returns the read/write byte pattern of a 32-byte register as an unsigned int. The returned pattern takes into account the exec_size of the instruction, the type bitsize, the stride and if the register is source or destination. The o

[Mesa-dev] [PATCH 0/2] intel/fs: Liveness range improvements with partial writes

2018-07-13 Thread Jose Maria Casanova Crespo
ound yet a case where I see any improvements in the generated code and I have still pending to deal with an important increase in compilation time in my WIP solution. Jose Maria Casanova Crespo (2): intel/fs: New method for register_byte_use_pattern for fs_inst intel/fs: Improve liveness range c

[Mesa-dev] [PATCH] i965/fs: unspills shoudn't use grf127 as dest since Gen8+

2018-07-11 Thread Jose Maria Casanova Crespo
At 232ed8980217dd65ab0925df28156f565b94b2e5 "i965/fs: Register allocator shoudn't use grf127 for sends dest" we didn't take into account the case of SEND instructions that are not send_from_grf. But since Gen7+ although the backend still uses MRFs internally for sends they are finally assigned to a

[Mesa-dev] [PATCH 9/9] anv: Enable SPV_KHR_8bit_storage and VK_KHR_8bit_storage

2018-07-08 Thread Jose Maria Casanova Crespo
Enables SPV_KHR_8bit_storage and VK_KHR_8bit_storage on gen 8+ using the VK_KHR_get_physical_device_properties2 functionality to expose if the extension is supported or not. Reviewed-by: Jason Ekstrand --- src/intel/vulkan/anv_device.c | 11 +++ src/intel/vulkan/anv_extensions.py |

[Mesa-dev] [PATCH 8/9] spirv/nir: Add support for SPV_KHR_8bit_storage

2018-07-08 Thread Jose Maria Casanova Crespo
Reviewed-by: Jason Ekstrand --- src/compiler/shader_info.h| 1 + src/compiler/spirv/spirv_to_nir.c | 6 ++ 2 files changed, 7 insertions(+) diff --git a/src/compiler/shader_info.h b/src/compiler/shader_info.h index 8c58ee285ec..3b95d5962c0 100644 --- a/src/compiler/shader_info.h +++

[Mesa-dev] [PATCH 6/9] i965/fs: Enable store_ssbo for 8-bit types.

2018-07-08 Thread Jose Maria Casanova Crespo
v2: Update comment according to this patch. (Jason Ekstrand) Reviewed-by: Jason Ekstrand --- src/intel/compiler/brw_fs_nir.cpp | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp index 4155b2ed

[Mesa-dev] [PATCH 7/9] spirv: Include headers and grammar for SPV_KHR_8bit_storage

2018-07-08 Thread Jose Maria Casanova Crespo
Update to headers and grammar to ff684ffc6a35d2a58f0f63108877d0064ea33feb --- src/compiler/spirv/spirv.core.grammar.json | 44 ++ src/compiler/spirv/spirv.h | 3 ++ 2 files changed, 40 insertions(+), 7 deletions(-) diff --git a/src/compiler/spirv/spirv.core.gr

[Mesa-dev] [PATCH 3/9] i965: Support for 8-bit base types in helper functions

2018-07-08 Thread Jose Maria Casanova Crespo
Reviewed-by: Jason Ekstrand --- src/intel/compiler/brw_fs_nir.cpp | 11 ++- src/intel/compiler/brw_nir.c | 4 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp index 02ac92e62f1..83ed9575f80 100

[Mesa-dev] [PATCH 1/9] intel/compiler: grf127 can not be dest when src and dest overlap in send

2018-07-08 Thread Jose Maria Casanova Crespo
Implement at brw_eu_validate the restriction from Intel Broadwell PRM, vol 07, section "Instruction Set Reference", subsection "EUISA Instructions", Send Message (page 990): "r127 must not be used for return address when there is a src and dest overlap in send instruction." v2: Style fixes (Matt

[Mesa-dev] [PATCH 4/9] i965/fs: Enable conversions to 8-bit integers

2018-07-08 Thread Jose Maria Casanova Crespo
Reviewed-by: Jason Ekstrand --- src/intel/compiler/brw_fs_nir.cpp | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp index 83ed9575f80..4155b2ed996 100644 --- a/src/intel/compiler/brw_fs_nir.cpp +++ b/src/intel/compiler/brw

[Mesa-dev] [PATCH 2/9] i965/fs: Register allocator shoudn't use grf127 for sends dest

2018-07-08 Thread Jose Maria Casanova Crespo
Since Gen8+, the Intel PRM states that "r127 must not be used for return address when there is a src and dest overlap in send instruction." This patch implements this restriction by creating a new grf127_send_hack_node at the register allocator. This node has a fixed assignment to grf127. For vgrf that ar

[Mesa-dev] [PATCH 0/9] anv: Enable VK_KHR_8bit_storage

2018-07-08 Thread Jose Maria Casanova Crespo
: dEQP-VK.spirv_assembly.instruction.*.8bit_storage.* Jose Maria Casanova Crespo (9): intel/compiler: grf127 can not be dest when src and dest overlap in send i965/fs: Register allocator shoudn't use grf127 for sends dest i965: Support for 8-bit base types in helper functions

[Mesa-dev] [PATCH 5/9] intel/compiler: relax brw_eu_validate for byte raw movs

2018-07-08 Thread Jose Maria Casanova Crespo
When the destination is a BYTE type, allow raw movs even if the stride is not an exact multiple of destination type and exec type, execution type is Word and its size is 2. This restriction was only allowing stride==2 destinations for 8-bit types. Reviewed-by: Jason Ekstrand --- src/intel/compiler/

[Mesa-dev] [PATCH] anv: finish the binding_table_pool on destroyDevice when use_softpin

2018-06-28 Thread Jose Maria Casanova Crespo
Running VK-CTS in batch execution mode was raising the VK_ERROR_INITIALIZATION_FAILED error in multiple tests. But when the same failing tests were run in isolation they always passed. createDevice and destroyDevice were called before and after every test. Because the binding_table_pool was never clo

[Mesa-dev] [PATCH 01/14] intel/fs: general 8/16/32/64-bit shuffle_src_to_dst function (v2)

2018-06-14 Thread Jose Maria Casanova Crespo
This new function takes care of shuffling/unshuffling components of a particular bit-size in components with a different bit-size. If source type size is smaller than destination type size the operation needed is a component shuffle. The opposite case would be an unshuffle. Component units are measur

[Mesa-dev] [PATCH] intel/fs: use uint type for per_slot_offset at GS

2018-06-12 Thread Jose Maria Casanova Crespo
This helps us to compact original instruction: mul(8) g3<1>D g6<8,8,1>UD 0x0006UD { align1 1Q }; So now we emit: mul(8) g3<1>UD g6<8,8,1>UD 0x0006UD { align1 1Q compacted }; --- src/intel/compiler/brw_fs_visitor.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git

[Mesa-dev] [PATCH 13/14] intel/compiler: use new shuffle_32bit_write for all 64-bit storage writes

2018-06-09 Thread Jose Maria Casanova Crespo
--- src/intel/compiler/brw_fs_nir.cpp | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp index 2521f3c001b..833fad4247a 100644 --- a/src/intel/compiler/brw_fs_nir.cpp +++ b/src/intel/compiler/brw_

[Mesa-dev] [PATCH 14/14] intel/compiler: shuffle_64bit_data_for_32bit_write is not used anymore

2018-06-09 Thread Jose Maria Casanova Crespo
--- src/intel/compiler/brw_fs.h | 4 src/intel/compiler/brw_fs_nir.cpp | 32 --- 2 files changed, 36 deletions(-) diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h index 1f86f17ccbb..17b1368d522 100644 --- a/src/intel/compiler/brw_fs.h

[Mesa-dev] [PATCH 09/14] intel/compiler: Use shuffle_from_32bit_read at VS load_input

2018-06-09 Thread Jose Maria Casanova Crespo
shuffle_from_32bit_read manages 32-bit reads to 32-bit destination in the same way as the previous loop, so now we just call the new function for all bitsizes, also simplifying the 64-bit load_input. --- src/intel/compiler/brw_fs_nir.cpp | 12 ++-- 1 file changed, 2 insertions(+), 10 dele

[Mesa-dev] [PATCH 10/14] intel/compiler: shuffle_from_32bit_read at load_per_vertex_input at TCS/TES

2018-06-09 Thread Jose Maria Casanova Crespo
Previously, the shuffle function had a source/destination overlap that needs to be avoided to use shuffle_from_32bit_read. We can use the destination of the removed MOVs as the shuffle destination. This change also avoids the internal MOVs done by the previous shuffle to deal with possible overlap

[Mesa-dev] [PATCH 12/14] intel/compiler: shuffle_32bit_load_result_to_64bit_data is not used anymore

2018-06-09 Thread Jose Maria Casanova Crespo
--- src/intel/compiler/brw_fs.h | 5 --- src/intel/compiler/brw_fs_nir.cpp | 53 --- 2 files changed, 58 deletions(-) diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h index d72164ae0b6..1f86f17ccbb 100644 --- a/src/intel/compiler/brw_fs.h +

[Mesa-dev] [PATCH 11/14] intel/compiler: use shuffle_from_32bit_read for 64-bit FS load_input

2018-06-09 Thread Jose Maria Casanova Crespo
The previous use of shuffle_32bit_load_result_to_64bit_data had a source/destination overlap for 64-bit. Now a temporary destination is used for 64-bit cases in order to use shuffle_from_32bit_read, which doesn't handle src/dst overlaps. --- src/intel/compiler/brw_fs_nir.cpp | 8 1 file changed, 4

[Mesa-dev] [PATCH 04/14] intel/compiler: Use shuffle_from_32bit_read to read 16-bit SSBO

2018-06-09 Thread Jose Maria Casanova Crespo
Using shuffle_from_32bit_read instead of 16-bit shuffle functions avoids the need to retype. At the same time, the new functions are ready for 8-bit type SSBO reads. --- src/intel/compiler/brw_fs_nir.cpp | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/src/intel/compiler/brw_fs_

[Mesa-dev] [PATCH 06/14] intel/compiler: remove old 16-bit shuffle/unshuffle functions

2018-06-09 Thread Jose Maria Casanova Crespo
--- src/intel/compiler/brw_fs.h | 11 -- src/intel/compiler/brw_fs_nir.cpp | 62 --- 2 files changed, 73 deletions(-) diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h index 779170ecc95..d72164ae0b6 100644 --- a/src/intel/compiler/brw_fs.

[Mesa-dev] [PATCH 01/14] intel/compiler: general 8/16/32/64-bit shuffle_src_to_dst function

2018-06-09 Thread Jose Maria Casanova Crespo
This new function takes care of shuffling/unshuffling components of a particular bit-size in components with a different bit-size. If source type size is smaller than destination type size the operation needed is a component shuffle. The opposite case would be an unshuffle. The operation allows to sk

[Mesa-dev] [PATCH 03/14] intel/compiler: use shuffle_from_32bit_read at VARYING_PULL_CONSTANT_LOAD

2018-06-09 Thread Jose Maria Casanova Crespo
shuffle_from_32bit_read can manage the shuffle/unshuffle needed for different 8/16/32/64 bit-sizes at VARYING PULL CONSTANT LOAD. To get the specific component the first_component parameter is used. In the case of the previous 16-bit shuffle, the shuffle operation was generating unneeded MOVs wh

[Mesa-dev] [PATCH 07/14] intel/compiler: shuffle_from_32bit_read for 64-bit do_untyped_vector_read

2018-06-09 Thread Jose Maria Casanova Crespo
do_untyped_vector_read is used at load_ssbo and load_shared. The previous MOVs are removed because shuffle_from_32bit_read can handle storing the shuffle results in the expected destination just using the proper offset. --- src/intel/compiler/brw_fs_nir.cpp | 12 ++-- 1 file changed, 2 in

[Mesa-dev] [PATCH 08/14] intel/compiler: enable shuffle_from_32bit_read at 64-bit gs_input_load

2018-06-09 Thread Jose Maria Casanova Crespo
This implementation avoids two unneeded MOVs for each 64-bit component. One was done in the old shuffle, to avoid cases of src/dst overlap but this is not the case. And the removed MOV was already being done in the shuffle. Copy propagation wasn't able to remove them because shuffle destinat

[Mesa-dev] [PATCH 05/14] intel/compiler: Use shuffle_from_32bit_write for 16-bits store_ssbo

2018-06-09 Thread Jose Maria Casanova Crespo
--- src/intel/compiler/brw_fs_nir.cpp | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp index ef7895262b8..a54935f7049 100644 --- a/src/intel/compiler/brw_fs_nir.cpp +++ b/src/intel/compiler/brw_fs_nir.

[Mesa-dev] [PATCH 00/14] intel/compiler: unshuffle/shuffle functions refactoring

2018-06-09 Thread Jose Maria Casanova Crespo
as Cc: Iago Toral Jose Maria Casanova Crespo (14): intel/compiler: general 8/16/32/64-bit shuffle_src_to_dst function intel/compiler: new shuffle_for_32bit_write and shuffle_from_32bit_read intel/compiler: use shuffle_from_32bit_read at VARYING_PULL_CONSTANT_LOAD intel/c

[Mesa-dev] [PATCH 02/14] intel/compiler: new shuffle_for_32bit_write and shuffle_from_32bit_read

2018-06-09 Thread Jose Maria Casanova Crespo
These new shuffle functions deal with the shuffle/unshuffle operations needed for read/write operations using 32-bit components when the read/written components have a different bit-size (8, 16, 64-bits). Shuffle from 32-bit to 32-bit becomes a simple MOV. As the new function shuffle_src_to_dst ta

[Mesa-dev] [PATCH v2 7.5/18] intel/compiler: support negate and abs of half float immediates

2018-05-02 Thread Jose Maria Casanova Crespo
--- src/intel/compiler/brw_shader.cpp | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/src/intel/compiler/brw_shader.cpp b/src/intel/compiler/brw_shader.cpp index 284c2e8233c..537defd05d9 100644 --- a/src/intel/compiler/brw_shader.cpp +++ b/src/intel/compiler/brw_shader.c

[Mesa-dev] [PATCH v3] intel/compiler: fix 16-bit int brw_negate_immediate and brw_abs_immediate

2018-05-02 Thread Jose Maria Casanova Crespo
From Intel Skylake PRM, vol 07, "Immediate" section (page 768): "For a word, unsigned word, or half-float immediate data, software must replicate the same 16-bit immediate value to both the lower word and the high word of the 32-bit immediate field in a GEN instruction." This fixes the int16/uint

[Mesa-dev] [PATCH v3] intel/compiler: fix brw_imm_w for negative 16-bit integers

2018-05-02 Thread Jose Maria Casanova Crespo
16-bit immediates need to replicate the 16-bit immediate value in both words of the 32-bit value. Care is needed to avoid sign-extension, which the previous implementation was not handling properly. For example, with the previous implementation, storing the value -3 would generate imm.d
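
The sign-extension pitfall mentioned in this snippet can be shown with a small sketch. Both function names are hypothetical; `replicate_w_sign_extended` is only one plausible way the problem can arise and is not claimed to be the code the patch replaced.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration of the pitfall: the int16_t is widened to
 * 32 bits with sign extension before being OR-ed into the low word, so the
 * sign bits overwrite the replicated high word. */
uint32_t
replicate_w_sign_extended(int16_t w)
{
   uint32_t wide = (uint32_t)(int32_t)w;   /* -3 becomes 0xFFFFFFFD */
   return (wide << 16) | wide;
}

/* Hypothetical illustration of the intended result: truncate to 16 unsigned
 * bits first, then copy that value into both words. */
uint32_t
replicate_w(int16_t w)
{
   uint32_t u = (uint16_t)w;               /* -3 becomes 0x0000FFFD */
   assert(u <= 0xFFFFu);
   return (u << 16) | u;
}
```

For the value -3, the sign-extending variant produces 0xFFFFFFFD, whereas replicating the truncated 16-bit pattern gives 0xFFFDFFFD, with 0xFFFD in both the low and the high word as the quoted requirement asks for.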

[Mesa-dev] [PATCH 2/2] i965/fs: Register allocator shoudn't use grf127 for sends dest (v2)

2018-04-18 Thread Jose Maria Casanova Crespo
Since Gen8+, the Intel PRM states that "r127 must not be used for return address when there is a src and dest overlap in send instruction." This patch implements this restriction by creating a new grf127_send_hack_node at the register allocator. This node has a fixed assignment to grf127. For vgrf that ar

[Mesa-dev] [PATCH] i965/fs: retype offset_reg to UD at load_ssbo

2018-04-18 Thread Jose Maria Casanova Crespo
All operations with offset_reg at do_vector_read are done with UD type. So copy propagation was not working through the generated MOVs: mov(8) vgrf9:UD, vgrf7:D This change allows removing the MOV generated for reading the first components for 16-bit and 64-bit ssbo reads with non-constant offset

[Mesa-dev] [PATCH 2/2] i965/fs: Register allocator shoudn't use grf127 for sends dest

2018-04-11 Thread Jose Maria Casanova Crespo
Since Gen8+, the Intel PRM states that "r127 must not be used for return address when there is a src and dest overlap in send instruction." This patch implements this restriction by creating new register allocator classes that are copies of the normal classes. These new classes exclude in their set of re

[Mesa-dev] [PATCH 1/2] intel/compiler: grf127 can not be dest when src and dest overlap in send

2018-04-11 Thread Jose Maria Casanova Crespo
Implement at brw_eu_validate the restriction from Intel Broadwell PRM, vol 07, section "Instruction Set Reference", subsection "EUISA Instructions", Send Message (page 990): "r127 must not be used for return address when there is a src and dest overlap in send instruction." Cc: Jason Ekstrand Cc

[Mesa-dev] [PATCH] nir/search: Include 8 and 16-bit support in construct_value

2018-03-01 Thread Jose Maria Casanova Crespo
--- src/compiler/nir/nir_search.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/src/compiler/nir/nir_search.c b/src/compiler/nir/nir_search.c index c7c52ae320d..28b36b2b863 100644 --- a/src/compiler/nir/nir_search.c +++ b/src/compiler/nir/nir_search.c @@ -525,6 +525,9 @@ con

[Mesa-dev] [PATCH v2 3/8] i965/fs: Support 16-bit do_read_vector with VK_KHR_relaxed_block_layout (v3)

2018-02-28 Thread Jose Maria Casanova Crespo
16-bit load_ubo/ssbo operations that call do_untyped_read_vector don't guarantee that offsets are multiple of 4-bytes as required by untyped_read message. This happens for example in the case of f16mat3x3 when VK_KHR_relaxed_block_layout is enabled. Vector reads when we have non-constant off

[Mesa-dev] [PATCH v2 7/8] spirv/i965/anv: Relax push constant offset assertions being 32-bit aligned

2018-02-27 Thread Jose Maria Casanova Crespo
The introduction of 16-bit types with VK_KHR_16bit_storage implies that push constant offsets could be multiples of 2 bytes. Some assertions are updated so offsets should be just multiple of size of the base type but in some cases we can not assume it as doubles aren't aligned to 8 bytes in some ca

[Mesa-dev] [PATCH v2 8/8] anv: Enable VK_KHR_16bit_storage for PushConstant

2018-02-27 Thread Jose Maria Casanova Crespo
Enables storagePushConstant16 features of VK_KHR_16bit_storage for Gen8+. --- src/intel/vulkan/anv_device.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c index a7b586c79c7..7c8b768c589 100644 --- a/src/intel/vulkan

[Mesa-dev] [PATCH v2 6/8] spirv: Calculate properly 16-bit vector sizes

2018-02-27 Thread Jose Maria Casanova Crespo
Range in 16-bit push constants load was being calculated wrongly using 4-bytes per element instead of 2-bytes as it should be. v2: Use glsl_get_bit_size instead of if statement (Jason Ekstrand) Reviewed-by: Jason Ekstrand --- src/compiler/spirv/vtn_variables.c | 7 ++- 1 file changed, 2
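
The size calculation this snippet describes (2 bytes per 16-bit element rather than a fixed 4) can be sketched as below. The helper `vector_size_bytes` is hypothetical; the patch itself refers to `glsl_get_bit_size`, which is not called here so that the sketch stays self-contained.

```c
#include <assert.h>

/* Hypothetical illustration: byte size of a vector derived from the
 * element bit size instead of assuming 4 bytes per element. */
unsigned
vector_size_bytes(unsigned num_components, unsigned bit_size)
{
   assert(bit_size % 8 == 0);
   return num_components * (bit_size / 8);
}
```

A three-component 16-bit vector then occupies 6 bytes, where a fixed 4 bytes per element would have reported 12.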

[Mesa-dev] [PATCH v2 2/8] i965/fs: shuffle_32bit_load_result_to_16bit_data now skips components

2018-02-27 Thread Jose Maria Casanova Crespo
This helper, used to load 16-bit components from 32-bit reads, now allows skipping components with the new parameter first_component. The semantics now skip components until we reach the first_component, and then reads the number of components passed to the function. All previous uses of the helper a

[Mesa-dev] [PATCH v2 5/8] anv: Enable VK_KHR_16bit_storage for SSBO and UBO

2018-02-27 Thread Jose Maria Casanova Crespo
Enables storageBuffer16BitAccess and uniformAndStorageBuffer16BitAccess features of VK_KHR_16bit_storage for Gen8+. --- src/intel/vulkan/anv_device.c | 5 +++-- src/intel/vulkan/anv_extensions.py | 2 +- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/src/intel/vulkan/anv_devi

[Mesa-dev] [PATCH v2 1/8] isl/i965/fs: SSBO/UBO buffers need size padding if not multiple of 32-bit

2018-02-27 Thread Jose Maria Casanova Crespo
The surfaces that back the GPU buffers have a boundary check that considers accesses to partial dwords out-of-bounds. For example, buffers with 1 or 3 16-bit elements have size 2 or 6, and the last two bytes would always be read as 0 or writes to them ignored. The introduction of 16-b
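
The padding named in the subject line amounts to rounding the buffer size up to the next multiple of 4 bytes. A minimal sketch, assuming a hypothetical helper `pad_to_dword` that is not taken from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration: round a buffer size in bytes up to the next
 * multiple of 4, so that the last partial dword falls inside the surface
 * bounds. */
uint32_t
pad_to_dword(uint32_t size_bytes)
{
   assert(size_bytes <= UINT32_MAX - 3);   /* avoid wrap-around */
   return (size_bytes + 3u) & ~UINT32_C(3);
}
```

For the sizes mentioned in the snippet, 2 bytes pads to 4 and 6 bytes pads to 8, while a size that is already a multiple of 4 is left unchanged.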

[Mesa-dev] [PATCH v2 0/8] anv: VK_KHR_16bit_storage enabling SSBO/UBO/PushConstant

2018-02-27 Thread Jose Maria Casanova Crespo
both series has been force-pushed at [2] [1] https://lists.freedesktop.org/archives/mesa-dev/2018-February/186544.html [2] https://github.com/Igalia/mesa/tree/wip/VK_KHR_16bit_storage-rc5 Cc: Jason Ekstrand Jose Maria Casanova Crespo (8): isl/i965/fs: SSBO/UBO buffers need size padding i

[Mesa-dev] [PATCH v2 3/8] i965/fs: Support 16-bit do_read_vector with VK_KHR_relaxed_block_layout

2018-02-27 Thread Jose Maria Casanova Crespo
16-bit load_ubo/ssbo operations that call do_untyped_read_vector don't guarantee that offsets are multiple of 4-bytes as required by untyped_read message. This happens for example in the case of f16mat3x3 when VK_KHR_relaxed_block_layout is enabled. Vector reads when we have non-constant off

[Mesa-dev] [PATCH v2 4/8] i965/fs: Support 16-bit store_ssbo with VK_KHR_relaxed_block_layout

2018-02-27 Thread Jose Maria Casanova Crespo
Restrict the use of untyped_surface_write with 16-bit pairs in ssbo to the cases where we can guarantee that offset is multiple of 4. Taking into account that VK_KHR_relaxed_block_layout is available in ANV we can only guarantee that when we have a constant offset that is multiple of 4. For non co

[Mesa-dev] [PATCH 1/7] isl/i965/fs: SSBO/UBO buffers need size padding if not multiple of 32-bit (v2)

2018-02-26 Thread Jose Maria Casanova Crespo
The surfaces that back the GPU buffers have a boundary check that considers accesses to partial dwords out-of-bounds. For example, buffers with 1 or 3 16-bit elements have size 2 or 6, and the last two bytes would always be read as 0 or writes to them ignored. The introduction of 16-bi

[Mesa-dev] [PATCH 6/7] spirv/i965/anv: Relax push constant offset assertions being 32-bit aligned (v2)

2018-02-26 Thread Jose Maria Casanova Crespo
The introduction of 16-bit types with VK_KHR_16bit_storage implies that push constant offsets could be multiples of 2 bytes. Some assertions are updated so offsets should be just multiple of size of the base type but in some cases we can not assume it as doubles aren't aligned to 8 bytes in some ca

[Mesa-dev] [PATCH 5/7] spirv: Calculate properly 16-bit vector sizes (v2)

2018-02-23 Thread Jose Maria Casanova Crespo
Range in 16-bit push constants load was being calculated wrongly using 4-bytes per element instead of 2-bytes as it should be. v2: Use glsl_get_bit_size instead of if statement (Jason Ekstrand) --- src/compiler/spirv/vtn_variables.c | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-)

[Mesa-dev] [PATCH v5 04/14] anv/cmd_buffer: Add a padding to the vertex buffer

2018-02-23 Thread Jose Maria Casanova Crespo
half_inputs_read to inputs_read_16bit. v3: Rebase minor changes (Chema Casanova) Signed-off-by: Jose Maria Casanova Crespo Signed-off-by: Alejandro Piñeiro --- src/intel/vulkan/anv_device.c | 9 + src/intel/vulkan/genX_cmd_buffer.c | 20 ++-- 2 files changed, 27

[Mesa-dev] [PATCH v5 08/14] anv: Enable VK_KHR_16bit_storage for input/output

2018-02-23 Thread Jose Maria Casanova Crespo
Enables storageInputOutput16 feature of VK_KHR_16bit_storage for Gen8+. --- src/intel/vulkan/anv_device.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c index 1756cf5324..c183ea8437 100644 --- a/src/intel/vulkan/anv

[Mesa-dev] [PATCH v5 07/14] i965/fs: Enable Render Target Write for 16-bit outputs

2018-02-23 Thread Jose Maria Casanova Crespo
ld be packed (Jason Ekstrand) Remove not necessary alignment operation for 16-bit to 32-bit conversion (Chema Casanova) Signed-off-by: Jose Maria Casanova Crespo Signed-off-by: Eduardo Lima --- src/intel/compiler/brw_fs_nir.cpp | 48 +++ 1 file cha

[Mesa-dev] [PATCH v5 06/14] i965/fs: Support 16-bit types at load_input and store_output

2018-02-23 Thread Jose Maria Casanova Crespo
Enables the support of 16-bit types on load_input and store_outputs intrinsics intra-stages. The approach was based on re-using the 32-bit URB read and writes between stages, shuffling pairs of 16-bit values into 32-bit values at load_store intrinsic and un-shuffling the values at load_inputs. v2

[Mesa-dev] [PATCH v5 14/14] i965/fs: Enable 16-bit render target write on SKL and CHV

2018-02-23 Thread Jose Maria Casanova Crespo
messages do not support UNIT formats." where UNIT is a typo for UINT. v2: Removed use of stride = 2 on sources (Jason Ekstrand) Signed-off-by: Jose Maria Casanova Crespo Signed-off-by: Eduardo Lima --- src/intel/compiler/brw_fs_nir.cpp | 46 +++ 1 file ch

[Mesa-dev] [PATCH v5 09/14] i965/fs: Include support for SEND data_format bit for Render Targets

2018-02-23 Thread Jose Maria Casanova Crespo
example: on brw_inst.h). Signed-off-by: Jose Maria Casanova Crespo Signed-off-by: Eduardo Lima Signed-off-by: Alejandro Piñeiro --- src/intel/compiler/brw_eu.h | 6 -- src/intel/compiler/brw_eu_emit.c | 25 - src/intel/compiler/brw_fs.c

[Mesa-dev] [PATCH v5 02/14] i965/compiler: includes 16-bit vertex input

2018-02-23 Thread Jose Maria Casanova Crespo
Includes the info about 16-bit vertex inputs coming from nir on brw VS prog data, as we already do with 64-bit vertex input. v2: Renamed half_inputs_read to inputs_read_16bit (Jason Ekstrand) --- src/intel/compiler/brw_compiler.h | 1 + src/intel/compiler/brw_vec4.cpp | 1 + 2 files changed, 2

[Mesa-dev] [PATCH v5 05/14] i965/fs: Unpack 16-bit from 32-bit components in VS load_input

2018-02-23 Thread Jose Maria Casanova Crespo
The VS load input for 16-bit values receives pairs of 16-bit values packed in 32-bit values. Because of the adjusted format used at: anv/pipeline: Use 32-bit surface formats for 16-bit formats v2: Removed use of stride = 2 on 16-bit sources (Jason Ekstrand) v3: Fix coding style and typo (Topi Po

[Mesa-dev] [PATCH v5 12/14] i965/fs: 16-bit source payloads always use 1 register

2018-02-23 Thread Jose Maria Casanova Crespo
Render Target Message's payloads for 16bit values fit in only one register. From Intel PRM vol07, page 249 "Render Target Messages" / "Message Data Payloads" "The half precision Render Target Write messages have data payloads that can pack a full SIMD16 payload into 1 register instead of

[Mesa-dev] [PATCH v5 03/14] anv/pipeline: Use 32-bit surface formats for 16-bit formats

2018-02-23 Thread Jose Maria Casanova Crespo
(example: use *R32* for *R16G16*). v2: Always use UINT surface format variants. (Topi Pohjolainen) Renamed half_inputs_read to inputs_read_16bit (Jason Ekstrand) Reword commit log (Jason Ekstrand) v3: Rebase minor changes (Chema Casanova) Signed-off-by: Jose Maria Casanova Crespo Signed

[Mesa-dev] [PATCH v5 11/14] i965/fs: Mark 16-bit outputs on FS store_output

2018-02-23 Thread Jose Maria Casanova Crespo
Maria Casanova Crespo Signed-off-by: Eduardo Lima --- src/intel/compiler/brw_fs_nir.cpp | 25 ++--- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp index 03ee1d1e09..1688a9a3d8 100644 --- a/src

[Mesa-dev] [PATCH v5 13/14] i965/fs: Use half_precision data_format on 16-bit fb writes

2018-02-23 Thread Jose Maria Casanova Crespo
From: Alejandro Piñeiro --- src/intel/compiler/brw_fs_visitor.cpp | 6 ++ 1 file changed, 6 insertions(+) diff --git a/src/intel/compiler/brw_fs_visitor.cpp b/src/intel/compiler/brw_fs_visitor.cpp index 7a5f6451f2..c3bc024095 100644 --- a/src/intel/compiler/brw_fs_visitor.cpp +++ b/src/int

[Mesa-dev] [PATCH v5 10/14] i965/disasm: Show half-precision data_format on rt_writes

2018-02-23 Thread Jose Maria Casanova Crespo
--- src/intel/compiler/brw_disasm.c | 4 1 file changed, 4 insertions(+) diff --git a/src/intel/compiler/brw_disasm.c b/src/intel/compiler/brw_disasm.c index 429ed78140..2def79f1d5 100644 --- a/src/intel/compiler/brw_disasm.c +++ b/src/intel/compiler/brw_disasm.c @@ -1676,6 +1676,10 @@ brw_d

[Mesa-dev] [PATCH v5 01/14] compiler: Mark when input/ouput attribute at VS uses 16-bit

2018-02-23 Thread Jose Maria Casanova Crespo
New shader attribute to mark when a location has a 16-bit value. This patch includes support on mesa glsl and nir. v2: Remove use of is_half_slot as it is a duplicate of is_16bit (Topi Pohjolainen) Renamed half_inputs_read to inputs_read_16bit (Jason Ekstrand) --- src/compiler/glsl_types.h

[Mesa-dev] [PATCH v5 00/14] VK_KHR_16bit_storage input/output support for gen8+

2018-02-23 Thread Jose Maria Casanova Crespo
is in some cases for BSW/CHV. Cc: Jason Ekstrand Cc: Topi Pohjolainen Alejandro Piñeiro (3): anv/pipeline: Use 32-bit surface formats for 16-bit formats anv/cmd_buffer: Add a padding to the vertex buffer i965/fs: Use half_precision data_format on 16-bit fb writes Jose Maria Casanova Cres

[Mesa-dev] [PATCH 4/7] anv: Enable VK_KHR_16bit_storage for SSBO and UBO

2018-02-23 Thread Jose Maria Casanova Crespo
Enables storageBuffer16BitAccess and uniformAndStorageBuffer16BitAccess features of VK_KHR_16bit_storage for Gen8+. --- src/intel/vulkan/anv_device.c | 5 +++-- src/intel/vulkan/anv_extensions.py | 2 +- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/src/intel/vulkan/anv_devi

[Mesa-dev] [PATCH 7/7] anv: Enable VK_KHR_16bit_storage for PushConstant

2018-02-23 Thread Jose Maria Casanova Crespo
Enables storagePushConstant16 features of VK_KHR_16bit_storage for Gen8+. --- src/intel/vulkan/anv_device.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c index a7b586c79c..7c8b768c58 100644 --- a/src/intel/vulkan/a

[Mesa-dev] [PATCH 2/7] i965/fs: Support 16-bit do_read_vector with VK_KHR_relaxed_block_layout

2018-02-23 Thread Jose Maria Casanova Crespo
16-bit load_ubo/ssbo operations that call do_untyped_read_vector don't guarantee that offsets are a multiple of 4 bytes, as required by the untyped_read message. This happens for example on 16-bit scalar arrays and in the case of f16vec3 when VK_KHR_relaxed_block_layout is enabled. Vectors reads w
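A minimal sketch of why 16-bit scalar arrays break the 4-byte offset requirement. Both helpers are hypothetical and are not the driver's code; they only model the byte offset of element `index` in a tightly packed array of 16-bit values and test it for dword alignment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: byte offset of element `index` in a tightly
 * packed array of 16-bit (2-byte) scalars starting at `base_offset`. */
static inline uint32_t f16_element_offset(uint32_t base_offset, uint32_t index)
{
    return base_offset + index * 2u;
}

/* Hypothetical helper: true when the offset is a multiple of 4 bytes. */
static inline bool is_dword_aligned(uint32_t byte_offset)
{
    return (byte_offset & 3u) == 0u;
}
```

With a base offset of 0, every odd element (offsets 2, 6, 10, ...) fails the alignment check, which is the case the patch description refers to.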

[Mesa-dev] [PATCH 6/7] spirv/i965/anv: Relax push constant offset assertions being 32-bit aligned

2018-02-23 Thread Jose Maria Casanova Crespo
The introduction of 16-bit types with VK_KHR_16bit_storage implies that push constant offsets could be a multiple of 2 bytes. Some assertions are relaxed so offsets can be a multiple of 4 bytes or a multiple of the size of the base type. For 16-bit types, the push constant offset takes into account the int

[Mesa-dev] [PATCH 3/7] i965/fs: Support 16-bit store_ssbo with VK_KHR_relaxed_block_layout

2018-02-23 Thread Jose Maria Casanova Crespo
Restrict the use of untyped_surface_write with 16-bit pairs in SSBOs to the cases where we can guarantee that the offset is a multiple of 4. Since VK_KHR_relaxed_block_layout is available in ANV, we can only guarantee that when we have a constant offset that is a multiple of 4. For non co

[Mesa-dev] [PATCH 5/7] spirv: Calculate properly 16-bit vector sizes

2018-02-23 Thread Jose Maria Casanova Crespo
The range of 16-bit push constant loads was being calculated incorrectly, using 4 bytes per element instead of 2 bytes. --- src/compiler/spirv/vtn_variables.c | 4 1 file changed, 4 insertions(+) diff --git a/src/compiler/spirv/vtn_variables.c b/src/compiler/spirv/vtn_variables.c ind
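The size error described above comes down to deriving the element size from the bit size instead of assuming 4 bytes. The following is a sketch with a hypothetical function name, not the code in vtn_variables.c.

```c
/* Hypothetical helper: size in bytes of a vector with `num_components`
 * elements of `bit_size` bits each (bit_size assumed to be a multiple of 8). */
static inline unsigned vector_range_bytes(unsigned num_components,
                                          unsigned bit_size)
{
    return num_components * (bit_size / 8u);
}
```

For a four-component 16-bit vector this gives 8 bytes, where a hard-coded 4 bytes per element would have given 16.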

[Mesa-dev] [PATCH 1/7] anv/spirv: SSBO/UBO buffers needs padding size is not multiple of 32-bits

2018-02-23 Thread Jose Maria Casanova Crespo
The surfaces that back the GPU buffers have a boundary check that treats accesses to partial dwords as out-of-bounds. For example, in basic 16-bit cases with buffers of size 2 or 6, the last two bytes will always be read as 0 or writes to them ignored. The introduction of 16-
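The padding idea can be illustrated by rounding a buffer size up to the next multiple of 4 bytes. The helper name `pad_to_dword` is hypothetical and this is only a sketch of the arithmetic, not the change made in the driver.

```c
#include <stdint.h>

/* Hypothetical helper: round a size in bytes up to the next multiple
 * of 4 so that the last partial dword becomes a full one. */
static inline uint32_t pad_to_dword(uint32_t size_bytes)
{
    return (size_bytes + 3u) & ~3u;
}
```

For the sizes mentioned above, 2 becomes 4 and 6 becomes 8, so the trailing two bytes fall inside a complete dword.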

[Mesa-dev] [PATCH 0/7] anv: VK_KHR_16bit_storage enabling SSBO/UBO/PushConstant

2018-02-23 Thread Jose Maria Casanova Crespo
ainen Jose Maria Casanova Crespo (7): anv/spirv: SSBO/UBO buffers needs padding size is not multiple of 32-bits i965/fs: Support 16-bit do_read_vector with VK_KHR_relaxed_block_layout i965/fs: Support 16-bit store_ssbo with VK_KHR_relaxed_block_layout anv: Enable VK_KHR_16bit_storag

[Mesa-dev] [PATCH v4 23/44] i965/fs: Enables 16-bit load_ubo with sampler (v2)

2017-12-05 Thread Jose Maria Casanova Crespo
message that needs one message for each component and is supposed to be slower. v2: (Jason Ekstrand) - Simplify component selection and unshuffling for different bitsizes - Remove SKL optimization of reading only two 32-bit components when reading 16-bits types. Reviewed-by: Jose Maria

[Mesa-dev] [PATCH v4 28/44] i965/fs: Use untyped_surface_read for 16-bit load_ssbo (v2)

2017-12-05 Thread Jose Maria Casanova Crespo
SSBO loads were using byte_scattered read messages as they allow reading 16-bit size components. byte_scattered messages can only operate one component at a time so we needed to emit as many messages as components. But for vec2 and vec4 of 16-bit, being multiple of 32-bit we can use the untyped_su
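The trade-off described above can be modelled with two small helpers. Both names are hypothetical and the logic is only an illustration of the reasoning in the commit message: byte_scattered needs one message per component, while a vector of 16-bit components whose total size is a multiple of 32 bits (vec2 and vec4) is a candidate for a single untyped read.

```c
#include <stdbool.h>

/* Hypothetical helper: byte_scattered reads handle one component at a
 * time, so the message count equals the component count. */
static inline unsigned byte_scattered_message_count(unsigned num_components)
{
    return num_components;
}

/* Hypothetical helper: true when a vector of 16-bit components has a
 * total size that is a multiple of 32 bits (4 bytes). */
static inline bool fits_whole_dwords_16bit(unsigned num_components)
{
    return (num_components * 2u) % 4u == 0u;
}
```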

[Mesa-dev] [PATCH v4 41/44] i965/fs: Use half_precision data_format on 16-bit fb writes

2017-11-29 Thread Jose Maria Casanova Crespo
From: Alejandro Piñeiro --- src/intel/compiler/brw_fs_visitor.cpp | 6 ++ 1 file changed, 6 insertions(+) diff --git a/src/intel/compiler/brw_fs_visitor.cpp b/src/intel/compiler/brw_fs_visitor.cpp index 481d9c51e7..01e75ff7fc 100644 --- a/src/intel/compiler/brw_fs_visitor.cpp +++ b/src/int

[Mesa-dev] [PATCH v4 44/44] anv: Enable VK_KHR_16bit_storage for push_constant

2017-11-29 Thread Jose Maria Casanova Crespo
Enables storagePushConstant16 feature of VK_KHR_16bit_storage for Gen8+. --- src/intel/vulkan/anv_device.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c index 26c0ace1ca..5b6032d794 100644 --- a/src/intel/vulkan/an

[Mesa-dev] [PATCH v4 42/44] i965/fs: Enable 16-bit render target write on SKL and CHV

2017-11-29 Thread Jose Maria Casanova Crespo
messages do not support UNIT formats." where UNIT is a typo for UINT. v2: Removed use of stride = 2 on sources (Jason Ekstrand) Signed-off-by: Jose Maria Casanova Crespo Signed-off-by: Eduardo Lima --- src/intel/compiler/brw_fs_nir.cpp | 46 +++ 1 file ch

[Mesa-dev] [PATCH v4 43/44] i965/fs: Support push constants of 16-bit types

2017-11-29 Thread Jose Maria Casanova Crespo
We enable the use of 16-bit values in push constants by modifying the assign_constant_locations function to work with 16-bit types. The API to access buffers in Vulkan uses multiples of 4 bytes for offsets and sizes. The current accounting of uniforms based on 4-byte slots will work for 16-bit values i

[Mesa-dev] [PATCH v4 39/44] i965/fs: Mark 16-bit outputs on FS store_output

2017-11-29 Thread Jose Maria Casanova Crespo
Maria Casanova Crespo Signed-off-by: Eduardo Lima --- src/intel/compiler/brw_fs_nir.cpp | 25 ++--- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp index fb138de76a..04d1e3bbf7 100644 --- a/src

[Mesa-dev] [PATCH v4 40/44] i965/fs: 16-bit source payloads always use 1 register

2017-11-29 Thread Jose Maria Casanova Crespo
Render Target Message's payloads for 16bit values fit in only one register. From Intel PRM vol07, page 249 "Render Target Messages" / "Message Data Payloads" "The half precision Render Target Write messages have data payloads that can pack a full SIMD16 payload into 1 register instead of

[Mesa-dev] [PATCH v4 38/44] i965/disasm: Show half-precision data_format on rt_writes

2017-11-29 Thread Jose Maria Casanova Crespo
--- src/intel/compiler/brw_disasm.c | 4 1 file changed, 4 insertions(+) diff --git a/src/intel/compiler/brw_disasm.c b/src/intel/compiler/brw_disasm.c index 1a94ed3954..c752e15331 100644 --- a/src/intel/compiler/brw_disasm.c +++ b/src/intel/compiler/brw_disasm.c @@ -1676,6 +1676,10 @@ brw_d

[Mesa-dev] [PATCH v4 33/44] i965/fs: Unpack 16-bit from 32-bit components in VS load_input

2017-11-29 Thread Jose Maria Casanova Crespo
The VS load input for 16-bit values receives pairs of 16-bit values packed in 32-bit values. Because of the adjusted format used at: anv/pipeline: Use 32-bit surface formats for 16-bit formats v2: Removed use of stride = 2 on 16-bit sources (Jason Ekstrand) v3: Fix coding style and typo (Topi Po

[Mesa-dev] [PATCH v4 30/44] i965/compiler: includes 16-bit vertex input

2017-11-29 Thread Jose Maria Casanova Crespo
Includes the info about 16-bit vertex inputs coming from nir on brw VS prog data, as we already do with 64-bit vertex input. v2: Renamed half_inputs_read to inputs_read_16bit (Jason Ekstrand) --- src/intel/compiler/brw_compiler.h | 1 + src/intel/compiler/brw_vec4.cpp | 1 + 2 files changed, 2
