Issue |
145559
|
Summary |
# Explore options for lowering of vector.contract to FEAT_I8MM, unified between Neon and SVE
|
Labels |
mlir,
mlir:neon,
mlir:sve
|
Assignees |
|
Reporter |
momchil-velikov
|
MLIR contains two patterns for lowering of `vector.contract` to FEAT_I8MM "instructions":
* `LowerContractionToNeonI8MMPattern`
* `LowerContractionToSVEI8MMPattern`
It may be possible and beneficial to develop a unified pattern, able to generate code for either Neon or SVE.
There are some differences in the functionality between the patterns:
* The Neon pattern handles "arbitrary"[1] indexing maps, whereas the SVE pattern handles only the usual "identities + transposed RHS" one.
* The Neon pattern can handle input operands of types `iN, N <= 8`, whereas the SVE pattern only handles `i8`.
* The Neon pattern requires the left-hand side, right-hand side, and accumulator/output to be
evenly divisible into `2x8`, `8x2`, and `2x2` tiles, respectively, and additionally supports a one-dimensional left-hand side.
The SVE pattern requires the left-hand side, right-hand side, and accumulator/output to have shapes `<Mx8>`+, `<8x[N]>`, and
`<Mx[N]>`, respectively, with `M` and `N` even. Notably, the `K` dimension is fixed to 8 and only the `N` dimension is allowed to be scalable.
[1] "Arbitrary" in the sense that the pattern does not impose explicit requirements on the maps and
handles them in a generic manner; however, the pattern does not trigger for `<4x8> * <8x4>` with
canonical/textbook matrix multiplication maps, whereas it does trigger for `<4x8> * <4x8>` with maps for a
transposed right-hand side. It is unclear whether this is a bug or by design.
(Also, the indexing maps are not entirely "arbitrary"; they need to make sense in the context of `vector.contract`.)
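To make the shape constraints listed above easier to compare, here is a minimal Python sketch that models them in terms of the logical `M`, `N`, and `K` contraction sizes. The function names are hypothetical and this is not the code of either MLIR pattern; it also assumes that a one-dimensional left-hand side is treated as a single row, and that `n_base` is the fixed multiplier of the scalable `[N]` dimension.

```python
def neon_shapes_ok(m: int, n: int, k: int, lhs_is_1d: bool = False) -> bool:
    """Model of the Neon constraints: LHS, RHS and accumulator must split
    evenly into 2x8, 8x2 and 2x2 tiles (hypothetical helper)."""
    if min(m, n, k) <= 0:
        return False
    if lhs_is_1d:
        # Assumption: a 1-D LHS behaves like a single row, so only N and K
        # need to be tileable.
        return n % 2 == 0 and k % 8 == 0
    return m % 2 == 0 and n % 2 == 0 and k % 8 == 0


def sve_shapes_ok(m: int, n_base: int, k: int) -> bool:
    """Model of the SVE constraints: <Mx8> * <8x[N]> -> <Mx[N]>, with M and N
    even, K fixed to 8, and N the only scalable dimension (hypothetical helper)."""
    if min(m, n_base, k) <= 0:
        return False
    return m % 2 == 0 and n_base % 2 == 0 and k == 8


# K = 16 fits the Neon model but not the SVE one, which fixes K to 8.
print(neon_shapes_ok(4, 4, 16), sve_shapes_ok(4, 4, 16))
```

Under this model every shape accepted by the SVE check (ignoring scalability) is also accepted by the Neon check, which is one way to see where the two patterns would have to converge.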
Before any unification, it would be nice if the functionality of both patterns converged to a common point.
A. Indexing maps
* Restrict the Neon indexing maps
This is straightforward.
* Support "arbitrary" indexing maps with SVE (are there any variants other than straight and transposed?)
This is a bit more involved, but still doable, under the assumption that one would need at most one extra transpose op
to accommodate the data layout expected by the FEAT_I8MM instructions.
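For reference, the data layout in question can be illustrated with a small pure-Python model of a single FEAT_I8MM matrix multiply-accumulate (in the style of `smmla`): it takes a `2x8` left-hand side tile and the right-hand side tile stored as `2x8`, i.e. already transposed, and accumulates into a `2x2` tile. The helper names are hypothetical, and the sketch ignores element widths and overflow.

```python
def mmla_2x8(acc, lhs, rhs_t):
    """acc (2x2) += lhs (2x8) * transpose(rhs_t), where rhs_t is the 8x2
    right-hand side tile stored transposed as 2x8."""
    return [
        [acc[i][j] + sum(lhs[i][k] * rhs_t[j][k] for k in range(8))
         for j in range(2)]
        for i in range(2)
    ]


def transpose(mat):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*mat)]


def matmul(a, b):
    """Textbook matrix multiplication, used only as a reference."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


lhs = [[1, 2, 3, 4, 5, 6, 7, 8], [-1, -2, -3, -4, -5, -6, -7, -8]]
rhs = [[k, k + 1] for k in range(8)]  # canonical 8x2 (K x N) layout
zero = [[0, 0], [0, 0]]

# With canonical maps, one transpose of the RHS brings the data into the
# layout the instruction expects.
assert mmla_2x8(zero, lhs, transpose(rhs)) == matmul(lhs, rhs)
```

With a right-hand side that is already transposed, the tile can be passed through directly, which matches the "identities + transposed RHS" case that both patterns handle today.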
B. "Small" integer types (`i4`, `i6`, etc)
* It does not seem reasonable to remove this from Neon.
* It should not be a problem to add this to SVE. It may or may not expose the need to add codegen elsewhere (e.g. sign-/zero-extend with scalable vector types).
C. Input/output shapes
This needs a lot of thought. The restriction `K == 8` is fairly fundamental to the SVE pattern and provides a number of adjacency guarantees (in the context of FEAT_I8MM). It won't be easy to lift that restriction.
In the context of tiled matrix multiplication (where the operands to the `vector.contract` do not represent the whole matrix, but just tiles of a bigger one), the ability to have tile dimensions that are many multiples of 8 is unlikely to be very valuable. Even an 8x8 output tile would require 16 SIMD registers; bigger tiles may exceed the number of available registers and introduce spills in something that is likely to be an inner loop.
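The register estimate above can be reproduced with a short calculation. This is a rough sketch, assuming 128-bit vectors, `i32` accumulators (one `2x2` accumulator tile per register), and the 32 SIMD registers available on AArch64; the function name is hypothetical, and registers needed for the input operands are not counted.

```python
AARCH64_SIMD_REGISTERS = 32  # V0-V31 (or Z0-Z31 for SVE)


def accumulator_registers(m: int, n: int, vector_bits: int = 128,
                          acc_bits: int = 32) -> int:
    """Number of vector registers needed to hold an MxN accumulator tile."""
    elems_per_register = vector_bits // acc_bits
    # Ceiling division, in case the tile does not fill the last register.
    return -(-(m * n) // elems_per_register)


for m, n in [(2, 2), (8, 8), (16, 16)]:
    regs = accumulator_registers(m, n)
    fits = "fits" if regs <= AARCH64_SIMD_REGISTERS else "exceeds the register file"
    print(f"{m}x{n} output tile: {regs} registers ({fits})")
```

An 8x8 output tile already takes half of the register file for accumulators alone, and a 16x16 tile would need 64 registers, so it could not stay in registers across the inner loop.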
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs