URL:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=2e75d71c1faa737ef3290ff1e9cb4851762fa381
Author: Ian Romanick <[email protected]>
Date: Wed Nov 15 10:48:02 2023 -0800
intel/cmat: Generate better code for nir_intrinsic_cmat_insert
When the source destination index is a constant, we can avoid generating
a lot of the intermediate code. At the very least, this makes initial
NIR dumps much easier to read.
v2: Simplify tracking of dst_index. Suggested by Caio.
Suggested-by: Caio
Reviewed-by: Caio Oliveira <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
URL:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=c6d44284aa633569a58200d00015b3e6d80a465a
Author: Ian Romanick <[email protected]>
Date: Wed Aug 2 13:36:33 2023 -0700
intel/dev: Enable VK_KHR_cooperative_matrix on all Gfx9+ GPUs
Gfx12.5 (DG2) will use DPAS instructions to accelerate the
implementation. Earlier platforms will use equivalent discrete
instructions (basically subgroup operations). Gfx12 (Tigerlake) will use
DP4A for 8-bit integer matrix multiplication. Older platforms, which
lack DP4A, will use a suboptimal instruction sequence. There is plenty
of room for improvement here.
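
As a rough illustration of what a DP4A-style operation computes, and of the kind of discrete sequence a platform without it has to spell out, here is a minimal standalone C sketch. The names sext8 and dp4a_s8 are hypothetical and are not taken from the Mesa sources. The signed-by-signed variant is an assumption, and accumulator overflow is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 8 bits of v without relying on
 * implementation-defined narrowing conversions. */
static int32_t sext8(uint32_t v)
{
    int32_t r = (int32_t)(v & 0xffu);
    return r >= 128 ? r - 256 : r;
}

/* Hypothetical reference model: treat a and b as four packed signed
 * 8-bit lanes, multiply lane by lane, and add the products to a
 * 32-bit accumulator. Hardware with DP4A does this in one
 * instruction; without it, each lane must be extracted and
 * multiplied separately, as the loop does here. */
static int32_t dp4a_s8(uint32_t a, uint32_t b, int32_t acc)
{
    assert(acc > INT32_MIN / 2 && acc < INT32_MAX / 2); /* keep the sketch overflow-free */
    for (unsigned i = 0; i < 4; i++)
        acc += sext8(a >> (8 * i)) * sext8(b >> (8 * i));
    return acc;
}
```

One row-times-column step of an 8-bit integer matrix multiply is then a chain of such calls, each consuming four elements of the K dimension.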
DG2 (Gfx12.5) gets the following results from the CTS:
    Test run totals:
      Passed: 1642/13982 (11.7%)
      Failed: 0/13982 (0.0%)
      Not supported: 12340/13982 (88.3%)
      Warnings: 0/13982 (0.0%)
      Waived: 0/13982 (0.0%)
On DG2 (Gfx12.5) with forced lowering, Raptor Lake (Gfx12) and Ice Lake
(Gfx11):
    Test run totals:
      Passed: 1662/13982 (11.9%)
      Failed: 0/13982 (0.0%)
      Not supported: 12320/13982 (88.1%)
      Warnings: 0/13982 (0.0%)
      Waived: 0/13982 (0.0%)
The difference in the number of tests run is due to
saturatingAccumulation not being set on DG2 when DPAS is used. There is
a comment in "intel/dev: Advertise integer configs with
saturatingAccumulation too" that explains how this could be added should
the need arise.
v2: Prefix type names with INTEL_CMAT_. Suggested by Lionel.
Reviewed-by: Caio Oliveira <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
URL:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=8ea032b78ee3257fd9398db8b79cdf9ca5ff4a36
Author: Ian Romanick <[email protected]>
Date: Fri Oct 20 18:24:25 2023 -0700
intel/dev: Advertise integer configs with saturatingAccumulation too
VUID-RuntimeSpirv-saturatingAccumulation-08983 says:
    For OpCooperativeMatrixMulAddKHR, the SaturatingAccumulation
    cooperative matrix operand must be present if and only if
    VkCooperativeMatrixPropertiesKHR::saturatingAccumulation is VK_TRUE.
As a result, we have to advertise integer configs both with and without
this flag set.
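
The consequence of that rule can be sketched in standalone C: every integer configuration ends up listed twice, once per value of the flag. The struct and function names below are hypothetical stand-ins, not the driver's types or the Vulkan structures. Listing non-integer configurations only once is an assumption based on the commit title.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one advertised configuration. */
struct cmat_config {
    unsigned m, n, k;
    bool is_integer;
    bool saturating_accumulation;
};

/* Copy configs from in[] to out[], emitting each integer config twice:
 * once with saturating_accumulation false and once with it true.
 * Returns the number of entries written. */
static size_t expand_configs(const struct cmat_config *in, size_t n_in,
                             struct cmat_config *out, size_t out_cap)
{
    size_t n_out = 0;
    for (size_t i = 0; i < n_in; i++) {
        /* Every config is listed once without saturation. */
        assert(n_out < out_cap);
        out[n_out] = in[i];
        out[n_out].saturating_accumulation = false;
        n_out++;

        /* Integer configs get a second entry with the flag set. */
        if (in[i].is_integer) {
            assert(n_out < out_cap);
            out[n_out] = in[i];
            out[n_out].saturating_accumulation = true;
            n_out++;
        }
    }
    return n_out;
}
```

A shader that uses the SaturatingAccumulation operand matches the second entry, and one that omits it matches the first.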
v2: Prefix type names with INTEL_CMAT_. Suggested by Lionel.
Reviewed-by: Caio Oliveira <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
URL:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=f952dd510e4e83639f77259baaa61ff25c918305
Author: Ian Romanick <[email protected]>
Date: Tue Aug 1 10:38:14 2023 -0700
anv: Select the SIMD mode very early when cooperative matrices are used
The commit is a little ugly. The definition of anv_fixup_subgroup_size
is moved before the added call site. In addition, the bit starting at
the "Cooperative matrix extension requires..." comment is added.
v2: Dramatic simplification of SIMD selection. Suggested by Caio.
Reviewed-by: Caio Oliveira <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
URL:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=511f91e307c98326185ec69570b0c6eee2c36cab
Author: Ian Romanick <[email protected]>
Date: Tue Aug 8 09:32:40 2023 -0700
anv: Lower indirect derefs again after lowering cooperative matrices
The cooperative matrix lowering can generate a lot of indirect array
accesses, and these need to be eliminated.
Reviewed-by: Caio Oliveira <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
URL:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=b741a9a851ca3747aa92ce0d6611b488c6e0e07b
Author: Ian Romanick <[email protected]>
Date: Mon Sep 25 09:16:55 2023 -0700
anv: Set PIPELINE_SELECT systolic mode enable flag
Set the flag on compute shaders when the application has enabled the
cooperative matrix feature. We might still want to enable this only when
DPAS is actually used. The current method is based on many suggestions
from Lionel.
Reviewed-by: Lionel Landwerlin <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
URL:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=7bfbeb79a75a04c3a7baa0e230a5bd4efa0976c4
Author: Ian Romanick <[email protected]>
Date: Fri Sep 22 16:17:18 2023 -0700
anv: Set COMPUTE_WALKER systolic mode enable flag
Reviewed-by: Lionel Landwerlin <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
URL:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=67739b02de08e97128673f05bf1a525047873d3e
Author: Ian Romanick <[email protected]>
Date: Mon Oct 30 11:06:24 2023 -0700
anv: Add anv_physical_device::has_cooperative_matrix
This flag tracks whether or not cooperative matrices are fully enabled
on the physical device (i.e., both the configs exist and the environment
variable is set). This is mainly to support a later commit "anv: Set
PIPELINE_SELECT systolic mode enable flag."
This could be squashed into "anv: Implement VK_KHR_cooperative_matrix."
I left it separate because we might go back to the previous method.
v3: Don't hide the extension behind an environment variable
(ANV_COOPERATIVE_MATRIX) now that we have a better solution for setting
PIPELINE_SELECT.
Reviewed-by: Caio Oliveira <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
URL:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=0a6f8b40bfdf39faaf1ff7def741faf612cf5706
Author: Caio Oliveira <[email protected]>
Date: Tue Jun 13 19:48:16 2023 -0700
anv: Implement VK_KHR_cooperative_matrix
v2: Rebase on moving lowering pass to src/intel/compiler.
v3: Don't hide the extension behind an environment variable
(ANV_COOPERATIVE_MATRIX) now that we have a better solution for setting
PIPELINE_SELECT.
v4: Prefix type names with INTEL_CMAT_. Suggested by Lionel. Also rebase
on f99e43d606e ("anv: switch to use runtime physical device properties
infrastructure").
Reviewed-by: Ian Romanick <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
URL:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=ff16458478eec50b04190f58802dde5d4d3e99d7
Author: Caio Oliveira <[email protected]>
Date: Fri Jun 16 16:47:45 2023 -0700
intel/dev: Add cooperative matrix configuration information
v2: Prefix type names with INTEL_CMAT_. Suggested by Lionel.
Reviewed-by: Ian Romanick <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
URL:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=6b14da33ad3aa8a30ed5e479eace8bc6470095a7
Author: Ian Romanick <[email protected]>
Date: Mon Oct 9 13:54:38 2023 -0700
intel/fs: nir: Add nir_intrinsic_dpas_intel
v2: Fix parameter order in nir_intrinsic_dpas_intel to DPAS conversion.
v3: Fix float16 destination DPAS on DG2.
v4: Use nir_component_mask(...) instead of 0xffff. Suggested by Caio.
v5: Rebase on !26323.
Reviewed-by: Caio Oliveira <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
URL:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=3756f605586fb2dcf53d892606152ecc5ce1ad1d
Author: Ian Romanick <[email protected]>
Date: Tue Oct 10 15:35:46 2023 -0700
intel/fs: DPAS lowering
Implements integer dot product lowering both with and without
DP4A. Implements half-float dot product lowering.
There are a couple FINISHME comments describing future optimizations.
v2: Add a brw_compiler::lower_dpas flag to track when the lowering
should be applied.
v3: Use is_null() instead of checking file != ARF. Suggested by Caio.
Reviewed-by: Caio Oliveira <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
URL:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=3cb96255397747ecef3f824064ca0afba349c50d
Author: Ian Romanick <[email protected]>
Date: Mon Oct 16 14:22:51 2023 -0700
intel/fs: Fix scoreboarding for DPAS
v2: Remove all mention of DPASW. Suggested by Curro and Caio.
Reviewed-by: Francisco Jerez <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
URL:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=eb1f19d7bf194574b984033754a301d1407f24d5
Author: Ian Romanick <[email protected]>
Date: Mon Sep 25 17:40:01 2023 -0700
intel/compiler: Validation for DPAS instructions
v2: s/regiser/register/g in messages. Noticed by Caio. Add more context
to the sub-byte precision error message. Suggested by Caio.
Reviewed-by: Caio Oliveira <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
URL:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=1c92dad5cb7f5d46dfaf56d2f9ce0203c2fbefbe
Author: Ian Romanick <[email protected]>
Date: Mon Oct 9 16:31:41 2023 -0700
intel/disasm: Disassembly support for DPAS
v2: Fix regioning in src[012]_dpas_3src. Noticed by Caio. Treat DPAS as
unordered. Suggested by Curro.
Reviewed-by: Caio Oliveira <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
URL:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=e666872c751bedd1e4c2e1231644c14ed18639e7
Author: Ian Romanick <[email protected]>
Date: Wed Sep 20 12:42:24 2023 -0700
intel/compiler: Initial bits for DPAS instruction
v2: Add brw_ir_performance.cpp and brw_fs_generator.cpp changes. Fix
overlapping register allocation (via has_source_and_destination_hazard). Fix
incorrect destination register file encoding.
v3: Prevent lower_regioning from trying to "fix" DPAS sources.
v4: Add instruction latency information for scheduling and perf
estimates.
v5: Remove all mention of DPASW. Suggested by Curro and Caio. Update
the comment in fs_inst::has_source_and_destination_hazard. Suggested
by Caio.
v6: Add some comments near the src2 calculation in
fs_inst::size_read. Suggested by Caio.
Reviewed-by: Caio Oliveira <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
URL:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=3a35f8b29bb9b6a92f98e8bb897bd444a54ca255
Author: Ian Romanick <[email protected]>
Date: Tue Oct 3 11:25:36 2023 -0700
intel/cmat: Lower cmat_load and cmat_store
v2: Add support for non-constant stride.
v3: Explain B matrices (a little bit) in
get_slice_type_from_desc. Suggested by Caio.
Reviewed-by: Caio Oliveira <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
URL:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=502be565da052e91adfa596945d5d55f7565a203
Author: Ian Romanick <[email protected]>
Date: Fri Jul 21 16:06:48 2023 -0700
intel/cmat: Add lowering for cmat_bitcast
v2: Use nir_component_mask(...) instead of 0xffff. Assert that source
and destination are same size. Both suggested by Caio.
Reviewed-by: Caio Oliveira <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
URL:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=7303315a8b5d16dc269359e19a8edcee4af99823
Author: Ian Romanick <[email protected]>
Date: Fri Jul 14 11:34:44 2023 -0700
intel/cmat: Enable packed formats for scalar ops
v2: Use nir_pack_bits and nir_unpack_bits to simplify coop_scalar
handling. This saved 13 lines of code.
v3: Allow packing factor 2 and packing factor 1 elements to be stored in
16-bit integers.
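
To illustrate the unpack, operate, repack pattern that packed formats imply for scalar ops, here is a minimal standalone C sketch. It only mirrors the idea behind nir_pack_bits and nir_unpack_bits. It does not use the NIR builder API, and the function names and the 32-bit container are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Pack 32/bits elements (low `bits` bits of each) into one 32-bit word,
 * element 0 in the least significant position. */
static uint32_t pack_bits32(const uint32_t *elems, unsigned bits)
{
    assert(bits == 8 || bits == 16);
    const unsigned count = 32 / bits;
    const uint32_t mask = (1u << bits) - 1u;
    uint32_t packed = 0;
    for (unsigned i = 0; i < count; i++)
        packed |= (elems[i] & mask) << (i * bits);
    return packed;
}

/* Inverse of pack_bits32: split one 32-bit word into 32/bits elements. */
static void unpack_bits32(uint32_t packed, unsigned bits, uint32_t *elems)
{
    assert(bits == 8 || bits == 16);
    const unsigned count = 32 / bits;
    const uint32_t mask = (1u << bits) - 1u;
    for (unsigned i = 0; i < count; i++)
        elems[i] = (packed >> (i * bits)) & mask;
}

/* Hypothetical scalar op: add s to every packed element. Each lane
 * wraps independently because pack_bits32 masks it back to `bits`. */
static uint32_t add_scalar_packed(uint32_t packed, unsigned bits, uint32_t s)
{
    uint32_t e[4];
    unpack_bits32(packed, bits, e);
    for (unsigned i = 0; i < 32 / bits; i++)
        e[i] += s;
    return pack_bits32(e, bits);
}
```

Operating on the unpacked elements keeps a carry out of one lane from leaking into its neighbor, which a plain 32-bit add on the packed word would not guarantee.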
Reviewed-by: Caio Oliveira <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
URL:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=26c4acd8ee58239dadb0dcaf59703c7510ebbb9a
Author: Ian Romanick <[email protected]>
Date: Thu Jul 13 11:08:54 2023 -0700
intel/cmat: Enable packed formats for binary ops
v2: Use nir_pack_bits and nir_unpack_bits to simplify coop_binary
handling. This saved 13 lines of code.
v3: Allow packing factor 2 and packing factor 1 elements to be stored in
16-bit integers.
Reviewed-by: Caio Oliveira <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
URL:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=0d314eb3ccdbbc9c050c9432ee3713da5a9853c7
Author: Ian Romanick <[email protected]>
Date: Thu Jul 13 11:05:16 2023 -0700
intel/cmat: Enable packed formats for unary, length, and construct
With this, a minimum test case passes:
    void main()
    {
        coopmat<float16_t, gl_ScopeSubgroup, M, N, gl_MatrixUseA> matA;
        coopmat<float, gl_ScopeSubgroup, M, N, gl_MatrixUseA> matR;
        matA = coopmat<float16_t, gl_ScopeSubgroup, M, N,
                       gl_MatrixUseA>(2.0);
        matR = coopmat<float, gl_ScopeSubgroup, M, N, gl_MatrixUseA>(matA);
        coopMatStore(matR, result, 0, N,
                     gl_CooperativeMatrixLayoutRowMajor);
    }
v2: Use nir_vec instead of explicit nir_vec{2,4}. Also fixes a typo in
one of the 4x8 cases.
v3: Use nir_pack_bits and nir_unpack_bits to dramatically simplify
coop_unary handling. This saved 67 lines of code.
v4: Allow packing factor 2 and packing factor 1 elements to be stored in
16-bit integers.
v5: Massive update to the comment in lower_cooperative_matrix_unary_op
with some suggestions from Caio. Add a comment and assertion around
`nir_def *v[4]`. Suggested by Caio.
Reviewed-by: Caio Oliveira <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
URL:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=75388a71c932db7114a6980ef818b6f50236d6f9
Author: Ian Romanick <[email protected]>
Date: Thu Jun 29 18:21:44 2023 -0700
intel/cmat: Add lowering for cmat_insert and cmat_extract
v2: Use nir_component_mask(...) instead of 0xffff. Suggested by Caio.
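
For context on the v2 note, the point of a helper over a literal such as 0xffff is that the mask follows the vector width instead of hard-coding sixteen components. The standalone C sketch below shows the computation under that assumption. component_mask is a hypothetical re-implementation for illustration, not a copy of the NIR helper, and the limit of 16 components is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in: build a write mask with one bit set per
 * vector component, e.g. 4 components -> 0xf, 16 -> 0xffff. */
static uint32_t component_mask(unsigned num_components)
{
    assert(num_components >= 1 && num_components <= 16);
    return (1u << num_components) - 1u;
}
```

With a 16-component slice this yields the same 0xffff as the literal, while narrower slices get a mask that matches their actual width.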
Reviewed-by: Caio Oliveira <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
URL:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=a2ded5b26cbaa7ee5f433f046b5f2c559329740e
Author: Ian Romanick <[email protected]>
Date: Wed Jul 12 17:50:17 2023 -0700
intel/cmat: Update get_slice_type for packed slices
Also splits off another function, get_slice_type_from_desc, that will be
used in future commits.
v2: Allow packing factor 2 and packing factor 1 elements to be stored in
16-bit integers.
v3: Use glsl_base_type_get_bit_size.
v4: Adjust packing so that a single row fills an entire GRF.
v5: Add comment for get_packing_factor and some other cleanups
there. s/cooperative_matrix/cmat/. Tighten the validation of len in
get_slice_type_from_desc. All suggested by Caio.
Reviewed-by: Caio Oliveira <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
URL:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=dba6451ce8113b7f81df95897d666d37ae5b8cee
Author: Caio Oliveira <[email protected]>
Date: Tue Jun 13 19:45:49 2023 -0700
intel/cmat: Add pass to lower cooperative matrix to subgroup operations
This is just the skeleton of the implementation. Future commits will
fill it all in.
v2: Move to src/intel/compiler
v3 (idr): Use vecN instead of array[N] for slice type.
v4 (idr): Refactor lower_cooperative_matrix_load and
lower_cooperative_matrix_store into a single function.
v5 (idr): Remove old, verbose debug logging. Assert that entry is not
NULL in get_coop_type_for_slice. Use nir_component_mask(...) instead of
0xffff. s/cooperative_matrix/cmat/. All suggested by Caio.
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Caio Oliveira <[email protected]>
I put both R-b on this because, at this point, we've each done equal
parts authoring and reviewing.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>