[Bug target/96342] [SVE] Add support for "omp declare simd"

yangyang305 at huawei dot com via Gcc-bugs Wed, 21 Oct 2020 01:38:31 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342


--- Comment #3 from yangyang <yangyang305 at huawei dot com> ---
Hi,
    Sorry for the slow reply. After studying the specification of SVE "omp
declare simd" and GCC's current implementation of "omp declare simd", I have
developed a rough plan for making GCC generate SVE functions for "omp
declare simd". However, there are still some uncertainties in the plan which
might need further discussion.

    The work is mainly composed of three parts: generating SVE functions
for "omp declare simd" in pass_omp_simd_clone, supporting the SVE PCS for
non-builtin types, and generating calls to SVE vectorized functions in
pass_vect. I plan to finish this work in the following five steps, each of
which corresponds to a patch:

Part 1) Change the type of the field "simdlen" of struct cgraph_simd_clone from
unsigned int to poly_uint64 and adapt the related code, since the length might
be variable for the SVE cases.

PR96342-part1-v1.patch
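
To illustrate why a plain unsigned int is not enough for "simdlen", here is a
minimal sketch in C of a two-coefficient polynomial value of the form
c0 + c1 * x, where x is only known at run time. The type toy_poly_u64 and its
helpers are hypothetical names for illustration and are not GCC's poly_uint64
class. The convention x = (vector bits / 128) - 1 is an assumption chosen to
match the [16, 16] notation used later in this comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model (hypothetical, not GCC code): value = c0 + c1 * x,
   where x is a runtime indeterminate.  */
typedef struct {
    uint64_t c0;
    uint64_t c1;
} toy_poly_u64;

/* True when the value does not depend on x, i.e. a fixed simdlen.  */
static bool toy_poly_is_constant(toy_poly_u64 p)
{
    return p.c1 == 0;
}

/* Evaluate the value for one concrete x.  Assumed convention:
   x = (SVE vector bits / 128) - 1.  */
static uint64_t toy_poly_eval(toy_poly_u64 p, uint64_t x)
{
    return p.c0 + p.c1 * x;
}

/* Number of 32-bit lanes in one scalable vector:
   (128 + 128x) / 32 = 4 + 4x.  */
static toy_poly_u64 toy_simdlen_for_32bit_lanes(void)
{
    toy_poly_u64 p = { 4, 4 };
    return p;
}
```

With this model, a VLA clone of a function taking float arguments would carry
the simdlen {4, 4}, which evaluates to 4 lanes on a 128-bit implementation
(x = 0) and 8 lanes on a 256-bit one (x = 1), while a fixed-length clone
carries a value whose second coefficient is 0.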

Part 2) During debugging, I found that all the calls to the interface
simd_clone_subparts need to be replaced with calls to TYPE_VECTOR_SUBPARTS
due to the introduction of SVE simdclones. So I plan to complete all the
replacements in one patch.

PR96342-part1-v2.patch

Part 3) Add generation of VLA SVE (vector length agnostic, without
"simdlen") functions for "omp declare simd" and skip the VLS (vector length
specific) ones, specifically:

a) In aarch64_simd_clone_compute_vecsize_and_simdlen, add 1 to “count” when
TARGET_SVE is specified.

b) Add a bool field “always_masked” to struct cgraph_simd_clone to mark
simdclones that are always masked, and skip generating the notinbranch version
when always_masked is true. In aarch64_simd_clone_compute_vecsize_and_simdlen,
set it to true when processing SVE simdclones.

c) In aarch64_simd_clone_compute_vecsize_and_simdlen, set “vecsize_mangle” to
‘s’ and “vec_bits” to BITS_PER_SVE_VECTOR when processing VLA SVE simdclones.
Report an "unsupported" warning when processing VLS SVE simdclones.

d) Adjust simd_clone_mangle.

e) Support SVE masking: For SVE vector functions, masked signatures are
generated by adding an svbool_t mask (corresponding to a predicate register) as
the last parameter. Since aarch64 GCC currently doesn’t support multi-type
simdclones, the input predicate works for all the types, so GCC doesn’t need to
make any special adjustment. For now, I plan to follow the current scheme:
transform the input predicate into a bool array with [16, 16] elements (since
the input predicate always has a mode of VNx16BImode) and use the active
elements to build the branch. The following gimple stmts are expected to be
generated:

MEM <vector([16,16]) <signed-boolean:1>> [(<signed-boolean:1> *)&mask.34] =
mask.37_17(D);
…
_9 = iter.38_6 * 4;
_8 = mask.34[_9];
if (_8 == 0)
…

The number 4 in _9 = iter.38_6 * 4; comes from arg_unit_size / mask_unit_size.
To do this, set “clonei->mask_mode” to VNx16BImode when processing SVE
simdclones in aarch64_simd_clone_compute_vecsize_and_simdlen. Then, when
processing cgraph_simd_clone->mask_mode in common code, add special treatment
if cgraph_simd_clone->mask_mode != VOIDmode and cgraph_simd_clone->mask_mode is
a vector mode, which corresponds to the SVE cases (it’s OK to do so since, in
current GCC, cgraph_simd_clone->mask_mode != VOIDmode holds only when the mask
is passed in integer argument(s)).

f) In pass_expand, the types of the arguments and the return type use the SVE
PCS only when an “SVE type” attribute has been added to their tree nodes. For
now, GCC only has a mechanism for adding the attribute to SVE builtin types, so
I plan to define a new hook that adds the attribute to the argument and return
types of generated simdclones when needed. In addition, the related processing
functions are planned to be moved from aarch64-sve-builtins.cc to aarch64.c.
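
Steps c) to e) can be made concrete with a small C model. This is a sketch
under assumptions, not the planned GCC code. The helper names
toy_mangle_vla_sve and toy_count_active are hypothetical. The name layout
_ZGV<isa><mask><len><params>_<name>, with 's' for SVE, 'M' for masked and 'x'
for a scalable length, follows my reading of the AArch64 vector function ABI.
The predicate is modelled as one flag per byte of the vector, so an element of
arg_unit_size bytes is controlled by the flag at index
iter * (arg_unit_size / mask_unit_size).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Build the name of a masked VLA SVE clone, e.g. "_ZGVsMxv_f" for a
   function f with one vector parameter.  Hypothetical helper; the
   layout is an assumption based on the AArch64 vector function ABI.  */
static int toy_mangle_vla_sve(char *buf, size_t size,
                              const char *params, const char *name)
{
    return snprintf(buf, size, "_ZGVsMx%s_%s", params, name);
}

/* Count the active lanes of a predicate stored as a bool array with
   one flag per mask unit.  Only the flag at the start of each element
   is consulted, which is where the "* 4" for 4-byte arguments comes
   from.  */
static size_t toy_count_active(const bool *pred, size_t lanes,
                               size_t arg_unit_size, size_t mask_unit_size)
{
    size_t stride = arg_unit_size / mask_unit_size;
    size_t active = 0;

    for (size_t iter = 0; iter < lanes; iter++)
        if (pred[iter * stride])
            active++;
    return active;
}
```

For a float argument, arg_unit_size is 4 and mask_unit_size is 1, so lane iter
reads flag iter * 4, matching the expected statement _9 = iter.38_6 * 4.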

Part 4) Add generation of VLS SVE functions for "omp declare simd". The
specification says: “When using a simdlen(len) clause, the compiler expects a
VLS vector version of the function that is tuned for a specific implementation
of SVE.” Therefore I think GCC is supposed to generate the VLS SVE functions
for "omp declare simd" only when the number of bits in an SVE vector register
of the target is specified and coincides with the simdlen clause,
specifically:

a) In aarch64_simd_clone_compute_vecsize_and_simdlen, when processing VLS SVE
simdclones, if the number of bits in an SVE vector register is specified and
coincides with the simdlen clause, set “clonei->vecsize_mangle”,
“clonei->mask_mode”, and “clonei->always_masked” and calculate the “vec_bits”,
otherwise report a warning and return NULL.

b) In this case, the field "simdlen" is a constant, so using build_vector_type
to build the vector type will produce an Advanced SIMD version instead of an SVE
version, which seems to be wrong. I plan to add a new hook that does some
special treatment to build an SVE vector type when processing VLS SVE
simdclones and calls build_vector_type directly in other cases.
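
The acceptance test in step a) could look roughly like the following C sketch.
It assumes that "coincides with the simdlen clause" means simdlen multiplied by
the element width equals the specified vector width in bits, and that an
unspecified width is represented by 0. Both are assumptions for illustration,
and toy_vls_clone_ok is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decide whether a VLS SVE clone would be generated in this model.
   sve_vector_bits: vector width fixed for the target, or 0 when the
   width is not specified (assumed encoding).
   simdlen: value of the simdlen clause.
   elt_bits: width in bits of the element type used by the clone.  */
static bool toy_vls_clone_ok(uint64_t sve_vector_bits, uint64_t simdlen,
                             uint64_t elt_bits)
{
    if (sve_vector_bits == 0)
        return false;   /* Width unknown: warn and generate no clone.  */

    /* The clause must describe exactly one full vector.  */
    return simdlen * elt_bits == sve_vector_bits;
}
```

Under these assumptions, simdlen(8) on a float function is accepted when the
vector width is fixed at 256 bits and rejected when the width is 128 bits or
unspecified.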

Part 5) Generate calls to SVE vectorized functions in pass_vect,
specifically:

a) Define a new hook that returns true if the target supports variable vector
length simdclones, and make the aarch64 implementation return true if
TARGET_SVE. In vectorizable_simd_clone_call, continue analyzing when the hook
returns true instead of directly returning false.

b) Adjust the calculation of badness.

c) Generate the mask.

    Since I have not done enough debugging yet, the detailed implementation
plans for Part 5) b) and Part 5) c) have not been developed.

    For now, I’m working on Part 3) and Part 4). I think it’s necessary to
propose the plan for review and see whether there are any suggestions, since
there are many detailed design choices that I’m not sure are the best ways to
do this. Any comments?

    In addition, I have finished the first two patches and attached them to
this PR. Is it necessary to send the patches to the GCC patches mailing list
for review?
