---
src/compiler/glsl/ir.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/compiler/glsl/ir.h b/src/compiler/glsl/ir.h
index e09f053b77c..c3f5f1f7b05 100644
--- a/src/compiler/glsl/ir.h
+++ b/src/compiler/glsl/ir.h
@@ -773,17 +773,17 @@ public:
unsigned is_xfb_
Fixes:
KHR-GLES*.core.tessellation_shader.single.xfb_captures_data_from_correct_stage
Cc: mesa-sta...@lists.freedesktop.org
---
I think this patch and the previous one should be squashed, or have their
order swapped, before landing. I'm sending them split because it allows exposing
the incorrect behavio
A recent change in the OpenGL CTS ("Use non-arrayed varying name for TCS blocks")
to the KHR-GL*.tessellation_shader.single.xfb_captures_data_from_correct_stage
tests changed how per-vertex Tessellation Control Shader output
varyings are named in transform feedback when using an interface block, to "BLOCK_INOUT.value"
These new methods return, for an instruction's register source/destination,
the read/write byte pattern of the 32-byte GRF as an unsigned int.
The returned pattern takes into account the exec_size of the instruction,
the type bitsize, the register stride and a relative offset inside the
register.
The
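As a rough illustration of the byte pattern described above, the sketch below builds a 32-bit mask in which bit N is set when byte N of one 32-byte GRF is touched. The function name grf_byte_pattern and its parameters are hypothetical and are not the methods added by the patch; the sketch also assumes a single register and ignores any bytes that would fall past byte 31.

```c
#include <assert.h>
#include <stdint.h>

#define GRF_SIZE_BYTES 32u

/* Hypothetical helper, not the Mesa method: returns a mask where bit N
 * is set if byte N of a single 32-byte GRF is read or written.
 *   exec_size    - number of channels of the instruction
 *   type_bytes   - size of the register type in bytes (1, 2, 4 or 8)
 *   stride       - distance between channels, in elements (0 = scalar)
 *   offset_bytes - byte offset of the first element inside the GRF
 */
static uint32_t
grf_byte_pattern(unsigned exec_size, unsigned type_bytes,
                 unsigned stride, unsigned offset_bytes)
{
   assert(type_bytes >= 1 && type_bytes <= 8);

   uint32_t pattern = 0;
   /* With stride 0 every channel touches the same element. */
   unsigned channels = stride == 0 ? 1 : exec_size;

   for (unsigned ch = 0; ch < channels; ch++) {
      unsigned first = offset_bytes + ch * stride * type_bytes;
      for (unsigned b = 0; b < type_bytes; b++) {
         unsigned byte = first + b;
         /* Bytes beyond this GRF are ignored in this simplified sketch. */
         if (byte < GRF_SIZE_BYTES)
            pattern |= UINT32_C(1) << byte;
      }
   }
   return pattern;
}
```

For example, a SIMD8 16-bit region with stride 2 starting at offset 0 touches bytes 0-1, 4-5, 8-9 and so on, which gives the mask 0x33333333.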
These new methods return, for an instruction's register source/destination,
the read/write byte pattern of the 32-byte GRF as an unsigned int.
The returned pattern takes into account the exec_size of the instruction,
the type bitsize, the register stride and a relative offset inside the
register.
The
From: Iago Toral Quiroga
---
src/intel/compiler/brw_fs_nir.cpp | 5 +
1 file changed, 5 insertions(+)
diff --git a/src/intel/compiler/brw_fs_nir.cpp
b/src/intel/compiler/brw_fs_nir.cpp
index 2c8595b9730..6e9a5829d3b 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/br
New VK-GL-CTS tests that use the VK_KHR_8bit_storage extension use
32-bit constants that are converted to 8-bit and then stored in a
storage buffer.
Although 8-bit constants are not enabled by VK_KHR_8bit_storage,
nir_opt_constant_folding already optimizes the 32 -> 8 integer
conversion to an 8-bit
From: Iago Toral Quiroga
The hardware doesn't support byte immediates, so similar to setup_imm_df()
for doubles, these helpers work by loading the constant value into a
VGRF.
---
src/intel/compiler/brw_fs.h | 6 ++
src/intel/compiler/brw_fs_nir.cpp | 16
2 files chang
We also pack in the same byte_scattered_write message the maximum
number of 8/16-bit components.
Comments have been rewritten to adapt them to the 8-bit case.
---
src/intel/compiler/brw_fs_nir.cpp | 66 ++-
1 file changed, 38 insertions(+), 28 deletions(-)
diff --git
We used the byte_scattered_read message because it allows reading from
offsets that are not 32-bit aligned. We were reading one component per
message.
Using a 32-bit bitsize read with byte_scattered_read we can read up to two
16-bit components or four 8-bit components with only one message per
iteration
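The saving described above is simple arithmetic, sketched below. The helper names are hypothetical and are not Mesa functions, and the sketch assumes the first component starts at the beginning of a 32-bit read, so offset alignment is not modelled.

```c
#include <assert.h>

/* Hypothetical helpers for illustration; not Mesa functions. */

/* Number of components of the given bit size that fit in one 32-bit read. */
static unsigned
components_per_32bit_read(unsigned bit_size)
{
   assert(bit_size == 8 || bit_size == 16 || bit_size == 32);
   return 32 / bit_size;
}

/* Messages needed to read num_components, assuming the first component
 * starts at the beginning of a 32-bit read (alignment is ignored here). */
static unsigned
messages_needed(unsigned num_components, unsigned bit_size)
{
   unsigned per_msg = components_per_32bit_read(bit_size);
   /* Round up: a partially used 32-bit read still costs one message. */
   return (num_components + per_msg - 1) / per_msg;
}
```

Under these assumptions, four 8-bit components need one message instead of four, and four 16-bit components need two.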
These new methods return, for an instruction's register source/destination,
the read/write byte pattern of the 32-byte GRF as an unsigned int.
The returned pattern takes into account the exec_size of the instruction,
the type bitsize, the register stride and a relative offset inside the
register.
The
We use the information of the registers read/write patterns
to improve variable liveness analysis avoiding extending the
liveness range of a variable to the beginning of the block so
it always reaches the beginning of the shader.
This optimization analyses inside each block if a partial write
defi
These new methods return, for an instruction's register source/destination,
the read/write byte pattern of the 32-byte GRF as an unsigned int.
The returned pattern takes into account the exec_size of the instruction,
the type bitsize, the register stride and a relative offset inside the
register.
The
We use the information of the registers read/write patterns
to improve variable liveness analysis avoiding extending the
liveness range of a variable to the beginning of the block so
it always reaches the beginning of the shader.
This optimization analyses inside each block that if a partial write
For a register source/destination of an instruction, the function returns
the read/write byte pattern of a 32-byte register as an unsigned int.
The returned pattern takes into account the exec_size of the instruction,
the type bitsize, the stride and whether the register is a source or a destination.
The o
ound yet a case where I
see any improvements in the generated code, and I still need to
deal with a significant increase in compilation time in my WIP solution.
Jose Maria Casanova Crespo (2):
intel/fs: New method for register_byte_use_pattern for fs_inst
intel/fs: Improve liveness range c
At 232ed8980217dd65ab0925df28156f565b94b2e5 "i965/fs: Register allocator
shoudn't use grf127 for sends dest" we didn't take into account the case
of SEND instructions that are not send_from_grf. But since Gen7+, although
the backend still uses MRFs internally for sends, they are finally assigned
to a
Enables SPV_KHR_8bit_storage and VK_KHR_8bit_storage on Gen8+
using the VK_KHR_get_physical_device_properties2 functionality
to expose if the extension is supported or not.
Reviewed-by: Jason Ekstrand
---
src/intel/vulkan/anv_device.c | 11 +++
src/intel/vulkan/anv_extensions.py |
Reviewed-by: Jason Ekstrand
---
src/compiler/shader_info.h| 1 +
src/compiler/spirv/spirv_to_nir.c | 6 ++
2 files changed, 7 insertions(+)
diff --git a/src/compiler/shader_info.h b/src/compiler/shader_info.h
index 8c58ee285ec..3b95d5962c0 100644
--- a/src/compiler/shader_info.h
+++
v2: Update comment according to this patch. (Jason Ekstrand)
Reviewed-by: Jason Ekstrand
---
src/intel/compiler/brw_fs_nir.cpp | 15 ---
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/src/intel/compiler/brw_fs_nir.cpp
b/src/intel/compiler/brw_fs_nir.cpp
index 4155b2ed
Update to headers and grammar to ff684ffc6a35d2a58f0f63108877d0064ea33feb
---
src/compiler/spirv/spirv.core.grammar.json | 44 ++
src/compiler/spirv/spirv.h | 3 ++
2 files changed, 40 insertions(+), 7 deletions(-)
diff --git a/src/compiler/spirv/spirv.core.gr
Reviewed-by: Jason Ekstrand
---
src/intel/compiler/brw_fs_nir.cpp | 11 ++-
src/intel/compiler/brw_nir.c | 4
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/src/intel/compiler/brw_fs_nir.cpp
b/src/intel/compiler/brw_fs_nir.cpp
index 02ac92e62f1..83ed9575f80 100
Implement at brw_eu_validate the restriction from Intel Broadwell PRM,
vol 07, section "Instruction Set Reference", subsection "EUISA
Instructions", Send Message (page 990):
"r127 must not be used for return address when there is a src and
dest overlap in send instruction."
v2: Style fixes (Matt
Reviewed-by: Jason Ekstrand
---
src/intel/compiler/brw_fs_nir.cpp | 2 ++
1 file changed, 2 insertions(+)
diff --git a/src/intel/compiler/brw_fs_nir.cpp
b/src/intel/compiler/brw_fs_nir.cpp
index 83ed9575f80..4155b2ed996 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw
Since Gen8+, the Intel PRM states that "r127 must not be used for return
address when there is a src and dest overlap in send instruction."
This patch implements this restriction by creating a new grf127_send_hack_node
in the register allocator. This node has a fixed assignment to grf127.
For vgrf that ar
:
dEQP-VK.spirv_assembly.instruction.*.8bit_storage.*
Jose Maria Casanova Crespo (9):
intel/compiler: grf127 can not be dest when src and dest overlap in
send
i965/fs: Register allocator shoudn't use grf127 for sends dest
i965: Support for 8-bit base types in helper functions
When the destination is a BYTE type, allow raw movs
even if the stride is not an exact multiple of the destination
type and exec type: the execution type is Word and its size is 2.
This restriction was only allowing stride==2 destinations
for 8-bit types.
Reviewed-by: Jason Ekstrand
---
src/intel/compiler/
Running VK-CTS in batch execution mode was raising the
VK_ERROR_INITIALIZATION_FAILED error in multiple tests. But when the
same failing tests were run in isolation they always passed.
createDevice and destroyDevice were called before and after every
test. Because the binding_table_pool was never clo
This new function takes care of shuffling/unshuffling components of a
particular bit-size into components with a different bit-size.
If the source type size is smaller than the destination type size, the operation
needed is a component shuffle. The opposite case would be an unshuffle.
Component units are measur
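To make the shuffle/unshuffle terminology concrete, here is a scalar sketch for the 16-bit and 32-bit case only. The function names, and the choice of placing the even component in the low 16 bits, are assumptions made for illustration; the real function operates on SIMD registers in the i965 backend rather than on plain arrays.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model; not the Mesa shuffle functions. */

/* Shuffle: pack n 16-bit components into 32-bit components.
 * Assumption: the even component goes in the low half of each dword.
 * dst must hold (n + 1) / 2 elements. */
static void
shuffle_16_to_32(const uint16_t *src, unsigned n, uint32_t *dst)
{
   assert(src && dst);
   for (unsigned i = 0; i < n; i++) {
      if (i % 2 == 0)
         dst[i / 2] = src[i];                     /* low half, clears high */
      else
         dst[i / 2] |= (uint32_t)src[i] << 16;    /* high half */
   }
}

/* Unshuffle: extract n 16-bit components from 32-bit components,
 * skipping components until first_component is reached. */
static void
unshuffle_32_to_16(const uint32_t *src, unsigned first_component,
                   unsigned n, uint16_t *dst)
{
   assert(src && dst);
   for (unsigned i = 0; i < n; i++) {
      unsigned c = first_component + i;
      dst[i] = (uint16_t)(src[c / 2] >> (16 * (c % 2)));
   }
}
```

Packing the three values 0x1111, 0x2222, 0x3333 yields the dwords 0x22221111 and 0x00003333 in this model; unshuffling two components starting at component 1 returns 0x2222 and 0x3333.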
This helps us to compact the original instruction:
mul(8) g3<1>D g6<8,8,1>UD 0x0006UD { align1 1Q };
So now we emit:
mul(8) g3<1>UD g6<8,8,1>UD 0x0006UD { align1 1Q compacted };
---
src/intel/compiler/brw_fs_visitor.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git
---
src/intel/compiler/brw_fs_nir.cpp | 13 ++---
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/src/intel/compiler/brw_fs_nir.cpp
b/src/intel/compiler/brw_fs_nir.cpp
index 2521f3c001b..833fad4247a 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_
---
src/intel/compiler/brw_fs.h | 4
src/intel/compiler/brw_fs_nir.cpp | 32 ---
2 files changed, 36 deletions(-)
diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h
index 1f86f17ccbb..17b1368d522 100644
--- a/src/intel/compiler/brw_fs.h
shuffle_from_32bit_read handles 32-bit reads to a 32-bit destination
in the same way as the previous loop, so now we just call the new
function for all bitsizes, which also simplifies the 64-bit load_input.
---
src/intel/compiler/brw_fs_nir.cpp | 12 ++--
1 file changed, 2 insertions(+), 10 dele
Previously, the shuffle function had a source/destination overlap that
needs to be avoided to use shuffle_from_32bit_read. We can use the
destination of the removed MOVs as the shuffle destination.
This change also avoids the internal MOVs done by the previous shuffle
to deal with possible overlap
---
src/intel/compiler/brw_fs.h | 5 ---
src/intel/compiler/brw_fs_nir.cpp | 53 ---
2 files changed, 58 deletions(-)
diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h
index d72164ae0b6..1f86f17ccbb 100644
--- a/src/intel/compiler/brw_fs.h
+
The previous use of shuffle_32bit_load_result_to_64bit_data
had a source/destination overlap for 64-bit. Now a temporary destination
is used for 64-bit cases, so that shuffle_from_32bit_read, which doesn't
handle src/dst overlaps, can be used.
---
src/intel/compiler/brw_fs_nir.cpp | 8
1 file changed, 4
Using shuffle_from_32bit_read instead of the 16-bit shuffle functions
avoids the need to retype. At the same time, the new functions are
ready for 8-bit type SSBO reads.
---
src/intel/compiler/brw_fs_nir.cpp | 6 ++
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/src/intel/compiler/brw_fs_
---
src/intel/compiler/brw_fs.h | 11 --
src/intel/compiler/brw_fs_nir.cpp | 62 ---
2 files changed, 73 deletions(-)
diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h
index 779170ecc95..d72164ae0b6 100644
--- a/src/intel/compiler/brw_fs.
This new function takes care of shuffling/unshuffling components of a
particular bit-size into components with a different bit-size.
If the source type size is smaller than the destination type size, the operation
needed is a component shuffle. The opposite case would be an unshuffle.
The operation allows to sk
shuffle_from_32bit_read can manage the shuffle/unshuffle needed
for different 8/16/32/64 bit-sizes at VARYING PULL CONSTANT LOAD.
To get the specific component the first_component parameter is used.
In the case of the previous 16-bit shuffle, the shuffle operation was
generating unneeded MOVs wh
do_untyped_vector_read is used at load_ssbo and load_shared.
The previous MOVs are removed because shuffle_from_32bit_read
can handle storing the shuffle results in the expected destination
just using the proper offset.
---
src/intel/compiler/brw_fs_nir.cpp | 12 ++--
1 file changed, 2 in
This implementation avoids two unneeded MOVs for each 64-bit
component. One was done in the old shuffle to avoid cases of
src/dst overlap, but this is not the case here. The other removed MOV
was already being done in the shuffle.
Copy propagation wasn't able to remove them because shuffle
destinat
---
src/intel/compiler/brw_fs_nir.cpp | 7 ++-
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/src/intel/compiler/brw_fs_nir.cpp
b/src/intel/compiler/brw_fs_nir.cpp
index ef7895262b8..a54935f7049 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.
as
Cc: Iago Toral
Jose Maria Casanova Crespo (14):
intel/compiler: general 8/16/32/64-bit shuffle_src_to_dst function
intel/compiler: new shuffle_for_32bit_write and shuffle_from_32bit_read
intel/compiler: use shuffle_from_32bit_read at VARYING_PULL_CONSTANT_LOAD
intel/c
These new shuffle functions deal with the shuffle/unshuffle operations
needed for read/write operations using 32-bit components when the
read/written components have a different bit-size (8, 16, 64-bits).
Shuffle from 32-bit to 32-bit becomes a simple MOV.
As the new function shuffle_src_to_dst ta
---
src/intel/compiler/brw_shader.cpp | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/src/intel/compiler/brw_shader.cpp
b/src/intel/compiler/brw_shader.cpp
index 284c2e8233c..537defd05d9 100644
--- a/src/intel/compiler/brw_shader.cpp
+++ b/src/intel/compiler/brw_shader.c
From Intel Skylake PRM, vol 07, "Immediate" section (page 768):
"For a word, unsigned word, or half-float immediate data,
software must replicate the same 16-bit immediate value to both
the lower word and the high word of the 32-bit immediate field
in a GEN instruction."
This fixes the int16/uint
16-bit immediates need to replicate the 16-bit immediate value
in both words of the 32-bit value. Care is needed
to avoid sign-extension, which the previous implementation was
not handling properly.
For example, with the previous implementation, storing the value
-3 would generate imm.d
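The replication rule can be sketched in a few lines of C. The function name replicate_imm16 is hypothetical and this is not the Mesa helper; the point is only that the value must be zero-extended from 16 bits before being copied into the high word, otherwise a negative value such as -3 would fill the high word with sign bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not the Mesa implementation: builds the 32-bit
 * immediate field for a signed 16-bit value by replicating it into
 * both the low and the high word. */
static uint32_t
replicate_imm16(int16_t value)
{
   /* Cast through uint16_t first so the value is zero-extended,
    * not sign-extended, when widened to 32 bits. */
   uint32_t word = (uint16_t)value;
   uint32_t imm = word | (word << 16);

   /* Both words must carry the same 16-bit pattern. */
   assert((imm >> 16) == (imm & 0xFFFFu));
   return imm;
}
```

With this sketch, -3 (0xFFFD as a 16-bit pattern) becomes 0xFFFDFFFD and 3 becomes 0x00030003.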
Since Gen8+, the Intel PRM states that "r127 must not be used for return
address when there is a src and dest overlap in send instruction."
This patch implements this restriction by creating a new grf127_send_hack_node
in the register allocator. This node has a fixed assignment to grf127.
For vgrf that ar
All operations with offset_reg at do_vector_read are done
with UD type. So copy propagation was not working through
the generated MOVs:
mov(8) vgrf9:UD, vgrf7:D
This change allows removing the MOV generated for reading the
first components for 16-bit and 64-bit ssbo reads with
non-constant offset
Since Gen8+, the Intel PRM states that "r127 must not be used for
return address when there is a src and dest overlap in send
instruction."
This patch implements this restriction by creating new register allocator
classes that are copies of the normal classes. These new classes
exclude in their set of re
Implement at brw_eu_validate the restriction from Intel Broadwell PRM, vol 07,
section "Instruction Set Reference", subsection "EUISA Instructions", Send
Message (page 990):
"r127 must not be used for return address when there is a src and dest overlap
in send instruction."
Cc: Jason Ekstrand
Cc
---
src/compiler/nir/nir_search.c | 15 +++
1 file changed, 15 insertions(+)
diff --git a/src/compiler/nir/nir_search.c b/src/compiler/nir/nir_search.c
index c7c52ae320d..28b36b2b863 100644
--- a/src/compiler/nir/nir_search.c
+++ b/src/compiler/nir/nir_search.c
@@ -525,6 +525,9 @@ con
16-bit load_ubo/ssbo operations that call do_untyped_read_vector don't
guarantee that offsets are multiples of 4 bytes, as required by the untyped_read
message. This happens, for example, in the case of f16mat3x3 when
VK_KHR_relaxed_block_layout is enabled.
Vector reads when we have non-constant off
The introduction of 16-bit types with VK_KHR_16bit_storage implies that
push constant offsets could be multiples of 2 bytes. Some assertions are
updated so offsets should be just a multiple of the size of the base type, but
in some cases we cannot assume it, as doubles aren't aligned to 8 bytes
in some ca
Enables the storagePushConstant16 feature of VK_KHR_16bit_storage for Gen8+.
---
src/intel/vulkan/anv_device.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index a7b586c79c7..7c8b768c589 100644
--- a/src/intel/vulkan
The range of 16-bit push constant loads was being calculated
incorrectly, using 4 bytes per element instead of 2 bytes as it
should be.
v2: Use glsl_get_bit_size instead of if statement
(Jason Ekstrand)
Reviewed-by: Jason Ekstrand
---
src/compiler/spirv/vtn_variables.c | 7 ++-
1 file changed, 2
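The range fix in this patch comes down to deriving the element size from the type's bit size instead of assuming 4 bytes. A minimal sketch, using a hypothetical helper that takes the bit size as a plain parameter (it does not call glsl_get_bit_size or any other Mesa function):

```c
#include <assert.h>

/* Hypothetical helper, not the vtn_variables.c code: byte range covered
 * by a push constant load of num_components elements of bit_size bits. */
static unsigned
push_constant_range_bytes(unsigned num_components, unsigned bit_size)
{
   assert(bit_size == 8 || bit_size == 16 ||
          bit_size == 32 || bit_size == 64);
   /* Element size follows the type, rather than a fixed 4 bytes. */
   return num_components * (bit_size / 8);
}
```

For a three-component 16-bit load this gives 6 bytes, where a fixed 4 bytes per element would have given 12.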
This helper used to load 16bit components from 32-bits read now allows
skipping components with the new parameter first_component. The semantics
now skip components until we reach the first_component, and then reads the
number of components passed to the function.
All previous uses of the helper a
Enables the storageBuffer16BitAccess and uniformAndStorageBuffer16BitAccess
features of VK_KHR_16bit_storage for Gen8+.
---
src/intel/vulkan/anv_device.c | 5 +++--
src/intel/vulkan/anv_extensions.py | 2 +-
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/src/intel/vulkan/anv_devi
The surfaces that back the GPU buffers have a boundary check that
treats accesses to partial dwords as out-of-bounds.
For example, buffers with 1 or 3 16-bit elements have size 2 or 6, and the
last two bytes would always be read as 0 or writes to them ignored.
The introduction of 16-b
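The padding this series refers to can be pictured as rounding the buffer size up to the next multiple of 4 bytes, so the last 16-bit element no longer sits in a partial dword. The helper below is a hypothetical sketch of that rounding and is not the isl/anv code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not the isl/anv code: round a buffer size in
 * bytes up to the next multiple of 4 (one dword). */
static uint64_t
pad_to_dword(uint64_t size_bytes)
{
   /* Guard against overflow in the addition below. */
   assert(size_bytes <= UINT64_MAX - 3);
   return (size_bytes + 3) & ~(uint64_t)3;
}
```

With this rounding, the 2-byte and 6-byte buffers from the example become 4 and 8 bytes, while sizes that are already a multiple of 4 are unchanged.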
both series have been force-pushed at [2]
[1] https://lists.freedesktop.org/archives/mesa-dev/2018-February/186544.html
[2] https://github.com/Igalia/mesa/tree/wip/VK_KHR_16bit_storage-rc5
Cc: Jason Ekstrand
Jose Maria Casanova Crespo (8):
isl/i965/fs: SSBO/UBO buffers need size padding i
16-bit load_ubo/ssbo operations that call do_untyped_read_vector don't
guarantee that offsets are multiples of 4 bytes, as required by the untyped_read
message. This happens, for example, in the case of f16mat3x3 when
VK_KHR_relaxed_block_layout is enabled.
Vector reads when we have non-constant off
Restrict the use of untyped_surface_write with 16-bit pairs in
ssbo to the cases where we can guarantee that offset is multiple
of 4.
Taking into account that VK_KHR_relaxed_block_layout is available
in ANV we can only guarantee that when we have a constant offset
that is multiple of 4. For non co
The surfaces that back the GPU buffers have a boundary check that
treats accesses to partial dwords as out-of-bounds.
For example, buffers with 1 or 3 16-bit elements have size 2 or 6, and the
last two bytes would always be read as 0 or writes to them ignored.
The introduction of 16-bi
The introduction of 16-bit types with VK_KHR_16bit_storage implies that
push constant offsets could be multiples of 2 bytes. Some assertions are
updated so offsets should be just a multiple of the size of the base type, but
in some cases we cannot assume it, as doubles aren't aligned to 8 bytes
in some ca
The range of 16-bit push constant loads was being calculated
incorrectly, using 4 bytes per element instead of 2 bytes as it
should be.
v2: Use glsl_get_bit_size instead of if statement
(Jason Ekstrand)
---
src/compiler/spirv/vtn_variables.c | 7 ++-
1 file changed, 2 insertions(+), 5 deletions(-)
half_inputs_read to inputs_read_16bit.
v3: Rebase minor changes (Chema Casanova)
Signed-off-by: Jose Maria Casanova Crespo
Signed-off-by: Alejandro Piñeiro
---
src/intel/vulkan/anv_device.c | 9 +
src/intel/vulkan/genX_cmd_buffer.c | 20 ++--
2 files changed, 27
Enables storageInputOutput16 feature of VK_KHR_16bit_storage
for Gen8+.
---
src/intel/vulkan/anv_device.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 1756cf5324..c183ea8437 100644
--- a/src/intel/vulkan/anv
ld be packed (Jason Ekstrand)
Remove unnecessary alignment operation for 16-bit to
32-bit conversion (Chema Casanova)
Signed-off-by: Jose Maria Casanova Crespo
Signed-off-by: Eduardo Lima
---
src/intel/compiler/brw_fs_nir.cpp | 48 +++
1 file cha
Enables support for 16-bit types in the load_input and
store_outputs intrinsics between stages.
The approach is based on re-using the 32-bit URB reads
and writes between stages, shuffling pairs of 16-bit values into
32-bit values at load_store intrinsic and un-shuffling the values
at load_inputs.
v2
messages do not support UNIT
formats." where UNIT is a typo for UINT.
v2: Removed use of stride = 2 on sources (Jason Ekstrand)
Signed-off-by: Jose Maria Casanova Crespo
Signed-off-by: Eduardo Lima
---
src/intel/compiler/brw_fs_nir.cpp | 46 +++
1 file ch
example: on
brw_inst.h).
Signed-off-by: Jose Maria Casanova Crespo
Signed-off-by: Eduardo Lima
Signed-off-by: Alejandro Piñeiro
---
src/intel/compiler/brw_eu.h | 6 --
src/intel/compiler/brw_eu_emit.c | 25 -
src/intel/compiler/brw_fs.c
Includes the info about 16-bit vertex inputs coming from nir on brw VS
prog data, as we already do with 64-bit vertex input.
v2: Renamed half_inputs_read to inputs_read_16bit (Jason Ekstrand)
---
src/intel/compiler/brw_compiler.h | 1 +
src/intel/compiler/brw_vec4.cpp | 1 +
2 files changed, 2
The VS load input for 16-bit values receives pairs of 16-bit values
packed in 32-bit values. Because of the adjusted format used at:
anv/pipeline: Use 32-bit surface formats for 16-bit formats
v2: Removed use of stride = 2 on 16-bit sources (Jason Ekstrand)
v3: Fix coding style and typo (Topi Po
Render Target Message's payloads for 16bit values fit in only one
register.
From Intel PRM vol07, page 249 "Render Target Messages" / "Message
Data Payloads"
"The half precision Render Target Write messages have data payloads
that can pack a full SIMD16 payload into 1 register instead of
(example: use *R32* for *R16G16*).
v2: Always use UINT surface format variants. (Topi Pohjolainen)
Renamed half_inputs_read to inputs_read_16bit (Jason Ekstrand)
Reword commit log (Jason Ekstrand)
v3: Rebase minor changes (Chema Casanova)
Signed-off-by: Jose Maria Casanova Crespo
Signed
Maria Casanova Crespo
Signed-off-by: Eduardo Lima
---
src/intel/compiler/brw_fs_nir.cpp | 25 ++---
1 file changed, 14 insertions(+), 11 deletions(-)
diff --git a/src/intel/compiler/brw_fs_nir.cpp
b/src/intel/compiler/brw_fs_nir.cpp
index 03ee1d1e09..1688a9a3d8 100644
--- a/src
From: Alejandro Piñeiro
---
src/intel/compiler/brw_fs_visitor.cpp | 6 ++
1 file changed, 6 insertions(+)
diff --git a/src/intel/compiler/brw_fs_visitor.cpp
b/src/intel/compiler/brw_fs_visitor.cpp
index 7a5f6451f2..c3bc024095 100644
--- a/src/intel/compiler/brw_fs_visitor.cpp
+++ b/src/int
---
src/intel/compiler/brw_disasm.c | 4
1 file changed, 4 insertions(+)
diff --git a/src/intel/compiler/brw_disasm.c b/src/intel/compiler/brw_disasm.c
index 429ed78140..2def79f1d5 100644
--- a/src/intel/compiler/brw_disasm.c
+++ b/src/intel/compiler/brw_disasm.c
@@ -1676,6 +1676,10 @@ brw_d
New shader attribute to mark when a location has 16-bit
value. This patch includes support on mesa glsl and nir.
v2: Remove use of is_half_slot as it is a duplicate of is_16bit
(Topi Pohjolainen)
Renamed half_inputs_read to inputs_read_16bit (Jason Ekstrand)
---
src/compiler/glsl_types.h
is in some cases for BSW/CHV.
Cc: Jason Ekstrand
Cc: Topi Pohjolainen
Alejandro Piñeiro (3):
anv/pipeline: Use 32-bit surface formats for 16-bit formats
anv/cmd_buffer: Add a padding to the vertex buffer
i965/fs: Use half_precision data_format on 16-bit fb writes
Jose Maria Casanova Cres
Enables the storageBuffer16BitAccess and uniformAndStorageBuffer16BitAccess
features of VK_KHR_16bit_storage for Gen8+.
---
src/intel/vulkan/anv_device.c | 5 +++--
src/intel/vulkan/anv_extensions.py | 2 +-
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/src/intel/vulkan/anv_devi
Enables the storagePushConstant16 feature of VK_KHR_16bit_storage for Gen8+.
---
src/intel/vulkan/anv_device.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index a7b586c79c..7c8b768c58 100644
--- a/src/intel/vulkan/a
16-bit load_ubo/ssbo operations that call do_untyped_read_vector don't
guarantee that offsets are multiples of 4 bytes, as required by the untyped_read
message. This happens, for example, on 16-bit scalar arrays and in the case
of f16vec3 when VK_KHR_relaxed_block_layout is enabled.
Vectors reads w
The introduction of 16-bit types with VK_KHR_16bit_storage implies that
push constant offsets could be multiples of 2 bytes. Some assertions are
relaxed so offsets can be a multiple of 4 bytes or a multiple of the size of the
base type.
For 16-bit types, the push constant offset takes into account the
int
Restrict the use of untyped_surface_write with 16-bit pairs in
ssbo to the cases where we can guarantee that offset is multiple
of 4.
Taking into account that VK_KHR_relaxed_block_layout is available
in ANV we can only guarantee that when we have a constant offset
that is multiple of 4. For non co
The range of 16-bit push constant loads was being calculated
incorrectly, using 4 bytes per element instead of 2 bytes as it
should be.
---
src/compiler/spirv/vtn_variables.c | 4
1 file changed, 4 insertions(+)
diff --git a/src/compiler/spirv/vtn_variables.c
b/src/compiler/spirv/vtn_variables.c
ind
The surfaces that back the GPU buffers have a boundary check that
treats accesses to partial dwords as out-of-bounds.
For example, in basic 16-bit cases of buffers with size 2 or 6, the
last two bytes will always be read as 0 and writes to them are ignored.
The introduction of 16-
ainen
Jose Maria Casanova Crespo (7):
anv/spirv: SSBO/UBO buffers needs padding size is not multiple of
32-bits
i965/fs: Support 16-bit do_read_vector with
VK_KHR_relaxed_block_layout
i965/fs: Support 16-bit store_ssbo with VK_KHR_relaxed_block_layout
anv: Enable VK_KHR_16bit_storag
message
that needs one message for each component and is supposed to be
slower.
v2: (Jason Ekstrand)
- Simplify component selection and unshuffling for different bitsizes
- Remove SKL optimization of reading only two 32-bit components when
reading 16-bits types.
Reviewed-by: Jose Maria
SSBO loads were using byte_scattered read messages as they allow
reading 16-bit size components. byte_scattered messages can only
operate one component at a time so we needed to emit as many messages
as components.
But for vec2 and vec4 of 16-bit, being multiple of 32-bit we can use the
untyped_su
From: Alejandro Piñeiro
---
src/intel/compiler/brw_fs_visitor.cpp | 6 ++
1 file changed, 6 insertions(+)
diff --git a/src/intel/compiler/brw_fs_visitor.cpp
b/src/intel/compiler/brw_fs_visitor.cpp
index 481d9c51e7..01e75ff7fc 100644
--- a/src/intel/compiler/brw_fs_visitor.cpp
+++ b/src/int
Enables storagePushConstant16 feature of VK_KHR_16bit_storage
for Gen8+.
---
src/intel/vulkan/anv_device.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 26c0ace1ca..5b6032d794 100644
--- a/src/intel/vulkan/an
messages do not support UNIT
formats." where UNIT is a typo for UINT.
v2: Removed use of stride = 2 on sources (Jason Ekstrand)
Signed-off-by: Jose Maria Casanova Crespo
Signed-off-by: Eduardo Lima
---
src/intel/compiler/brw_fs_nir.cpp | 46 +++
1 file ch
We enable the use of 16-bit values in push constants
modifying the assign_constant_locations function to work
with 16-bit types.
The API to access buffers in Vulkan uses multiples of 4 bytes for
offsets and sizes. The current accounting of uniforms based on 4-byte
slots will work for 16-bit values i
Maria Casanova Crespo
Signed-off-by: Eduardo Lima
---
src/intel/compiler/brw_fs_nir.cpp | 25 ++---
1 file changed, 14 insertions(+), 11 deletions(-)
diff --git a/src/intel/compiler/brw_fs_nir.cpp
b/src/intel/compiler/brw_fs_nir.cpp
index fb138de76a..04d1e3bbf7 100644
--- a/src
Render Target Message's payloads for 16bit values fit in only one
register.
From Intel PRM vol07, page 249 "Render Target Messages" / "Message
Data Payloads"
"The half precision Render Target Write messages have data payloads
that can pack a full SIMD16 payload into 1 register instead of
---
src/intel/compiler/brw_disasm.c | 4
1 file changed, 4 insertions(+)
diff --git a/src/intel/compiler/brw_disasm.c b/src/intel/compiler/brw_disasm.c
index 1a94ed3954..c752e15331 100644
--- a/src/intel/compiler/brw_disasm.c
+++ b/src/intel/compiler/brw_disasm.c
@@ -1676,6 +1676,10 @@ brw_d
The VS load input for 16-bit values receives pairs of 16-bit values
packed in 32-bit values. Because of the adjusted format used at:
anv/pipeline: Use 32-bit surface formats for 16-bit formats
v2: Removed use of stride = 2 on 16-bit sources (Jason Ekstrand)
v3: Fix coding style and typo (Topi Po
Includes the info about 16-bit vertex inputs coming from nir on brw VS
prog data, as we already do with 64-bit vertex input.
v2: Renamed half_inputs_read to inputs_read_16bit (Jason Ekstrand)
---
src/intel/compiler/brw_compiler.h | 1 +
src/intel/compiler/brw_vec4.cpp | 1 +
2 files changed, 2