https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342
--- Comment #3 from yangyang <yangyang305 at huawei dot com> ---
Hi,

Sorry for the slow reply.

After studying the specification of SVE "omp declare simd" and GCC's current implementation of "omp declare simd", I have developed a rough plan for making GCC generate SVE functions for "omp declare simd". However, there are still some uncertainties in the plan that might need further discussion.

The work is mainly composed of three parts: generating SVE functions for "omp declare simd" in pass_omp_simd_clone, supporting the SVE PCS for non-builtin types, and generating calls to SVE vectorized functions in pass_vect. I plan to finish this work in the following five steps, each of which corresponds to a patch:

Part 1) Change the type of the field "simdlen" of struct cgraph_simd_clone from unsigned int to poly_uint64, with the related adaptations, since the length might be variable in the SVE cases.
PR96342-part1-v1.patch

Part 2) During debugging, I found that all the calls to the interface simd_clone_subparts need to be replaced with calls to TYPE_VECTOR_SUBPARTS due to the introduction of SVE simdclones, so I plan to complete all the replacements in one patch.
PR96342-part1-v2.patch

Part 3) Add the generation of VLA SVE (vector length agnostic, without "simdlen") functions for "omp declare simd" and skip the VLS (vector length specific) ones. Specifically:

a) In aarch64_simd_clone_compute_vecsize_and_simdlen, add 1 to “count” when TARGET_SVE is specified.

b) Add a bool field “always_masked” to struct cgraph_simd_clone to mark simdclones that are always masked, and skip generating the notinbranch version when always_masked is true. In aarch64_simd_clone_compute_vecsize_and_simdlen, set it to true when processing SVE simdclones.

c) In aarch64_simd_clone_compute_vecsize_and_simdlen, set “vecsize_mangle” to ‘s’ and “vec_bits” to BITS_PER_SVE_VECTOR when processing VLA SVE simdclones. Report an "unsupported" warning when processing VLS SVE simdclones.
d) Adjust simd_clone_mangle.

e) Support SVE masking: for SVE vector functions, masked signatures are generated by adding an svbool_t mask (which corresponds to a predicate register) as the last parameter. Since aarch64 GCC currently doesn’t support multi-type simdclones, the input predicate works for all the types, and GCC doesn’t need to make any special adjustment. For now, I plan to follow the current scheme: transform the input predicate into a bool array with [16, 16] elements (since the input predicate always has mode VNx16BImode) and use the active elements to build the branch. The following gimple stmts are expected to be generated:

    MEM <vector([16,16]) <signed-boolean:1>> [(<signed-boolean:1> *)&mask.34] = mask.37_17(D);
    …
    _9 = iter.38_6 * 4;
    _8 = mask.34[_9];
    if (_8 == 0)
    …

The number 4 in _9 = iter.38_6 * 4; comes from arg_unit_size / mask_unit_size.

To implement this, set “clonei->mask_mode” to VNx16BImode when processing SVE simdclones in aarch64_simd_clone_compute_vecsize_and_simdlen. Then, where the common code processes cgraph_simd_clone->mask_mode, add special treatment for the case where cgraph_simd_clone->mask_mode != VOIDmode and cgraph_simd_clone->mask_mode is a vector mode, which corresponds to the SVE cases. (It’s OK to do so, since in current GCC cgraph_simd_clone->mask_mode != VOIDmode holds only when the mask is passed in integer argument(s).)

f) In pass_expand, argument and return types use the SVE PCS only when a “SVE type” attribute has been added to their tree nodes. For now, GCC only has a mechanism for adding such attributes to SVE builtin types, so I plan to define a new hook that adds the attribute to the argument and return types of the generated simdclones if needed. In addition, I plan to move the related processing functions from aarch64-sve-builtins.cc to aarch64.c.

Part 4) Add the generation of VLS SVE functions for "omp declare simd".
The specification says: “When using a simdlen(len) clause, the compiler expects a VLS vector version of the function that is tuned for a specific implementation of SVE.” Therefore I think GCC is supposed to generate the VLS SVE functions for "omp declare simd" only when the number of bits in an SVE vector register of the target is specified and coincides with the simdlen clause. Specifically:

a) In aarch64_simd_clone_compute_vecsize_and_simdlen, when processing VLS SVE simdclones, if the number of bits in an SVE vector register is specified and coincides with the simdlen clause, set “clonei->vecsize_mangle”, “clonei->mask_mode”, and “clonei->always_masked” and calculate “vec_bits”; otherwise report a warning and return NULL.

b) In this case, the field "simdlen" is a constant, so using build_vector_type to build the vector type will produce an Advanced SIMD version instead of an SVE version, which seems to be wrong. I plan to add a new hook that does some special treatment to build an SVE version of the vector type when processing VLS SVE simdclones, and calls build_vector_type directly in other cases.

Part 5) Generate calls to SVE vectorized functions in pass_vect. Specifically:

a) Define a new hook that returns true if the target supports variable vector length simdclones, and make the aarch64 implementation return true if TARGET_SVE. In vectorizable_simd_clone_call, continue analyzing instead of directly returning false.

b) Adjust the calculation of badness.

c) Generate the mask.

Since I have not done enough debugging yet, the detailed implementation plans for Part 5) b) and Part 5) c) have not been developed.

For now, I’m working on Part 3) and Part 4). I think it’s necessary to propose the plan for review, since there are many detailed design decisions where I’m not sure whether I have chosen the best approach. Any comments or suggestions?

In addition, I have finished the first two patches and attached them to this PR.
Is it necessary to send the patches to the GCC patches mailing list for review?
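
To illustrate the index computation in Part 3) e), here is a small scalar model in C of what the masked clone body is expected to do. It is only a sketch under stated assumptions: the names mask_index and masked_clone_model are made up for this example and are not GCC functions, the lane count is passed in by the caller instead of being variable-length, and the function body (x + 1) is arbitrary. It assumes the predicate carries one flag per byte of the vector, so with 4-byte elements the flag for lane iter sits at index iter * 4, matching arg_unit_size / mask_unit_size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: index of the predicate flag that governs lane
   ITER, given the element size and the number of bytes covered by one
   predicate flag (both in bytes).  */
size_t
mask_index (size_t iter, size_t arg_unit_size, size_t mask_unit_size)
{
  /* The element size must be a whole multiple of the mask unit.  */
  assert (mask_unit_size != 0 && arg_unit_size % mask_unit_size == 0);
  return iter * (arg_unit_size / mask_unit_size);
}

/* Hypothetical scalar model of a masked clone of
   "int f (int x) { return x + 1; }".  MASK holds one flag per byte of
   the vector, i.e. 4 * NLANES flags for 4-byte elements.  */
void
masked_clone_model (const int32_t *in, int32_t *out,
                    const bool *mask, size_t nlanes)
{
  for (size_t iter = 0; iter < nlanes; iter++)
    {
      /* Corresponds to "_9 = iter * 4; _8 = mask[_9];".  */
      size_t idx = mask_index (iter, sizeof (int32_t), 1);
      if (!mask[idx])
        continue; /* Inactive lane: leave out[iter] untouched.  */
      out[iter] = in[iter] + 1;
    }
}
```

With 4 lanes the mask array has 16 flags, and only flags 0, 4, 8 and 12 are consulted; the flags in between are ignored for 4-byte elements.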