Richard Sandiford <richard.sandif...@arm.com> writes:
> Richard Biener <richard.guent...@gmail.com> writes:
>> On December 14, 2019 11:43:48 AM GMT+01:00, Richard Sandiford
>> <richard.sandif...@arm.com> wrote:
>>> Richard Biener <richard.guent...@gmail.com> writes:
>>>> On December 13, 2019 10:12:40 AM GMT+01:00, Richard Sandiford
>>>> <richard.sandif...@arm.com> wrote:
>>>>> Richard Biener <richard.guent...@gmail.com> writes:
>>>>>>>>> The AArch64 port emits an error if calls pass values of SVE type
>>>>>>>>> to an unprototyped function.  To do that we need to know whether
>>>>>>>>> the value really is an SVE type rather than a plain vector.
>>>>>>>>>
>>>>>>>>> For varargs the ABI is the same for 256 bits+.  But we'll have
>>>>>>>>> the same problem there once we support -msve-vector-bits=128,
>>>>>>>>> since the layout of SVE and Advanced SIMD vectors differs for
>>>>>>>>> big-endian.
>>>>>>>>
>>>>>>>> But then why don't you have different modes?
>>>>>>>
>>>>>>> Yeah, true, modes will probably help for the Advanced SIMD/SVE
>>>>>>> difference.  But from a vector value POV, a vector of 4 ints is
>>>>>>> a vector of 4 ints, so even distinguishing based on the mode is
>>>>>>> artificial.
>>>>>>
>>>>>> True.
>>>>>>
>>>>>>> SVE is AFAIK the first target to have different modes for
>>>>>>> potentially the "same" vector type, and I had to add new
>>>>>>> infrastructure to allow targets to define multiple modes of the
>>>>>>> same size.  So the fact that gimple distinguishes otherwise
>>>>>>> identical vectors based on mode is a relatively recent thing.
>>>>>>> AFAIK it just fell out in the wash rather than being deliberately
>>>>>>> planned.  It happens to be convenient in this context, but it
>>>>>>> hasn't been important until now.
>>>>>>>
>>>>>>> The hook doesn't seem any worse than distinguishing based on the
>>>>>>> mode.  Another way to avoid this would have been to define
>>>>>>> separate SVE modes for the predefined vectors.  The big downside
>>>>>>> of that is that we'd end up doubling the number of SVE patterns.
>>>>>>>
>>>>>>> Extra on-the-side metadata is going to be easy to drop
>>>>>>> accidentally, and this is something we need for correctness
>>>>>>> rather than optimisation.
>>>>>>
>>>>>> Still selecting the ABI during call expansion only and based on
>>>>>> value types at that point is fragile.
>>>>>
>>>>> Agreed.  But it's fragile in general, not just for this case.
>>>>> Changing something as fundamental as that would be a lot of work
>>>>> and seems likely to introduce accidental ABI breakage.
>>>>>
>>>>>> The frontends are in charge of specifying the actual argument type
>>>>>> and at that point the target may fix the ABI.  The ABI can be
>>>>>> recorded in the call's fntype, either via its TYPE_ARG_TYPES or in
>>>>>> more awkward ways for varargs functions (in full generality that
>>>>>> would mean attaching varargs ABI meta to each call).
>>>>>>
>>>>>> The alternative is to have an actual argument type vector
>>>>>> associated with each call.
>>>>>
>>>>> I think multiple pieces of gimple code would then have to cope with
>>>>> that as a special case.  E.g. if:
>>>>>
>>>>>   void foo (int, ...);
>>>>>
>>>>>   type1 a;
>>>>>   b = VIEW_CONVERT_EXPR<type2> (a);
>>>>>   if (a)
>>>>>     foo (1, a);
>>>>>   else
>>>>>     foo (1, b);
>>>>>
>>>>> gets converted to:
>>>>>
>>>>>   if (a)
>>>>>     foo (1, a);
>>>>>   else
>>>>>     foo (1, a);
>>>>>
>>>>> on the basis that type1 and type2 are "the same" despite having
>>>>> different calling conventions, we have to be sure that the calls
>>>>> are not treated as equivalent:
>>>>>
>>>>>   foo (1, a);
>>>>>
>>>>> Things like IPA clones would also need to handle this specially.
>>>>> Anything that generates new calls based on old ones will need
>>>>> to copy this information too.
>>>>>
>>>>> This also sounds like it would be fragile and seems a bit too
>>>>> invasive for stage 3.
>>>>
>>>> But we are already relying on this to work (fntype non-propagation)
>>>> because function pointer conversions are dropped on the floor.
>>>>
>>>> The real change would be introducing (per call) fntype for calls to
>>>> unprototyped functions and somehow dealing with varargs.
>>>
>>> It looks like this itself relies on useless_type_conversion_p,
>>> is that right?  E.g. we have things like:
>>>
>>>   bool
>>>   func_checker::compare_gimple_call (gcall *s1, gcall *s2)
>>>   {
>>>     ...
>>>     tree fntype1 = gimple_call_fntype (s1);
>>>     tree fntype2 = gimple_call_fntype (s2);
>>>     if ((fntype1 && !fntype2)
>>>         || (!fntype1 && fntype2)
>>>         || (fntype1 && !types_compatible_p (fntype1, fntype2)))
>>>       return return_false_with_msg ("call function types are not compatible");
>>>
>>> and useless_type_conversion_p has:
>>>
>>>   else if ((TREE_CODE (inner_type) == FUNCTION_TYPE
>>>             || TREE_CODE (inner_type) == METHOD_TYPE)
>>>            && TREE_CODE (inner_type) == TREE_CODE (outer_type))
>>>     {
>>>       tree outer_parm, inner_parm;
>>>
>>>       /* If the return types are not compatible bail out.  */
>>>       if (!useless_type_conversion_p (TREE_TYPE (outer_type),
>>>                                       TREE_TYPE (inner_type)))
>>>         return false;
>>>
>>>       /* Method types should belong to a compatible base class.  */
>>>       if (TREE_CODE (inner_type) == METHOD_TYPE
>>>           && !useless_type_conversion_p (TYPE_METHOD_BASETYPE (outer_type),
>>>                                          TYPE_METHOD_BASETYPE (inner_type)))
>>>         return false;
>>>
>>>       /* A conversion to an unprototyped argument list is ok.  */
>>>       if (!prototype_p (outer_type))
>>>         return true;
>>>
>>>       /* If the unqualified argument types are compatible the conversion
>>>          is useless.  */
>>>       if (TYPE_ARG_TYPES (outer_type) == TYPE_ARG_TYPES (inner_type))
>>>         return true;
>>>
>>>       for (outer_parm = TYPE_ARG_TYPES (outer_type),
>>>            inner_parm = TYPE_ARG_TYPES (inner_type);
>>>            outer_parm && inner_parm;
>>>            outer_parm = TREE_CHAIN (outer_parm),
>>>            inner_parm = TREE_CHAIN (inner_parm))
>>>         if (!useless_type_conversion_p
>>>                (TYPE_MAIN_VARIANT (TREE_VALUE (outer_parm)),
>>>                 TYPE_MAIN_VARIANT (TREE_VALUE (inner_parm))))
>>>           return false;
>>>
>>> So it looks like we'd still need to distinguish the vector types in
>>> useless_type_conversion_p even if we went the fntype route.  The
>>> difference is that the fntype route would give us the option of only
>>> distinguishing the vectors for return and argument types and not in
>>> general.
>>>
>>> But if we are going to have to distinguish the vectors here anyway
>>> in some form, could we go with the patch as-is for stage 3 and leave
>>> restricting this to just return and argument types as a follow-on
>>> optimisation?
>>
>> How does this get around the LTO canonical type merging machinery?
>> That is, how are those types streamed and how are they identified by
>> the backend?
>> Just by means of being pointer equal to some statically built type in
>> the backend?  Or does the type have some attribute on it or on the
>> component?  How does the middle end build a related type with the same
>> ABI, like a vector with half the number of elements?
>
> Hmm...
>
> At the moment it's based on pointer equality between the
> TYPE_MAIN_VARIANT and statically-built types.  We predefine the only
> available SVE "ABI types" and there's no way to create "new" ones.
>
> But you're right that that doesn't work for LTO -- in general, not
> just for this conversion patch -- because no streamed types end up as
> ABI types.  So we'll need an attribute after all, with the ABI
> decisions keyed off that rather than TYPE_MAIN_VARIANT pointer
> equality.  Will fix...

Now fixed :-)
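
(For concreteness, here's a minimal sketch of the distinction the hook
has to protect -- the function names are made up, but the PCS behaviour
matches what the new tests below check.  With -msve-vector-bits=256,
both argument types here end up with the same mode and the same lane
layout, yet they follow different calling conventions:

  #include <arm_sve.h>

  typedef int32_t int32x8_t __attribute__ ((vector_size (32)));

  void sve_callee (svint32_t);   /* SVE PCS: argument passed in z0.  */
  void gnu_callee (int32x8_t);   /* Base PCS: 32-byte GNU vector
                                    arguments are passed by reference.  */

so a VIEW_CONVERT_EXPR between the two types must not be treated as
"useless" and folded away at call boundaries.)
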
> Once that's fixed, the fact that we use SET_TYPE_STRUCTURAL_EQUALITY
> for the ABI types means that the types remain distinct from "normal"
> vector types even for TYPE_CANONICAL purposes, since:
>
>   As a special case, if TYPE_CANONICAL is NULL_TREE, and thus
>   TYPE_STRUCTURAL_EQUALITY_P is true, then it cannot
>   be used for comparison against other types.  Instead, the type is
>   said to require structural equality checks, described in
>   TYPE_STRUCTURAL_EQUALITY_P.
>   [...]
>   #define TYPE_CANONICAL(NODE) (TYPE_CHECK (NODE)->type_common.canonical)
>
>   /* Indicates that the type node requires structural equality
>      checks.  The compiler will need to look at the composition of the
>      type to determine whether it is equal to another type, rather than
>      just comparing canonical type pointers.  For instance, we would need
>      to look at the return and parameter types of a FUNCTION_TYPE
>      node.  */
>   #define TYPE_STRUCTURAL_EQUALITY_P(NODE) (TYPE_CANONICAL (NODE) == NULL_TREE)
>
> We also have:
>
>   /* Return true if get_alias_set cares about TYPE_CANONICAL of given type.
>      We don't define the types for pointers, arrays and vectors.  The reason
>      is that pointers are handled specially: ptr_type_node accesses conflict
>      with accesses to all other pointers.  This is done by alias.c.
>      Because alias sets of arrays and vectors are the same as types of their
>      elements, we can't compute canonical type either.  Otherwise we could go
>      from void *[10] to int *[10] (because they are equivalent for canonical
>      type machinery) and get wrong TBAA.  */
>
>   inline bool
>   canonical_type_used_p (const_tree t)
>   {
>     return !(POINTER_TYPE_P (t)
>              || TREE_CODE (t) == ARRAY_TYPE
>              || TREE_CODE (t) == VECTOR_TYPE);
>   }
>
> So with the attribute added (needed anyway), the patch does seem to
> work for LTO too.

Given the above, is the patch OK?  I agree it isn't very elegant, but
at the moment we have no choice but to distinguish the vector types at
some point during gimple.

Thanks,
Richard


2020-01-07  Richard Sandiford  <richard.sandif...@arm.com>

gcc/
	* target.def (compatible_vector_types_p): New target hook.
	* hooks.h (hook_bool_const_tree_const_tree_true): Declare.
	* hooks.c (hook_bool_const_tree_const_tree_true): New function.
	* doc/tm.texi.in (TARGET_COMPATIBLE_VECTOR_TYPES_P): New hook.
	* doc/tm.texi: Regenerate.
	* gimple-expr.c: Include target.h.
	(useless_type_conversion_p): Use targetm.compatible_vector_types_p.
	* config/aarch64/aarch64.c (aarch64_compatible_vector_types_p): New
	function.
	(TARGET_COMPATIBLE_VECTOR_TYPES_P): Define.
	* config/aarch64/aarch64-sve-builtins.cc
	(gimple_folder::convert_pred): Use the original predicate if it
	already has a suitable type.

gcc/testsuite/
	* gcc.target/aarch64/sve/pcs/gnu_vectors_1.c: New test.
	* gcc.target/aarch64/sve/pcs/gnu_vectors_2.c: Likewise.

Index: gcc/target.def
===================================================================
--- gcc/target.def	2020-01-06 12:57:55.753930730 +0000
+++ gcc/target.def	2020-01-07 10:24:01.546344751 +0000
@@ -3411,6 +3411,29 @@ must have move patterns for this mode.",
 hook_bool_mode_false)
 
 DEFHOOK
+(compatible_vector_types_p,
+ "Return true if there is no target-specific reason for treating\n\
+vector types @var{type1} and @var{type2} as distinct types.  The caller\n\
+has already checked for target-independent reasons, meaning that the\n\
+types are known to have the same mode, to have the same number of elements,\n\
+and to have what the caller considers to be compatible element types.\n\
+\n\
+The main reason for defining this hook is to reject pairs of types\n\
+that are handled differently by the target's calling convention.\n\
+For example, when a new @var{N}-bit vector architecture is added\n\
+to a target, the target may want to handle normal @var{N}-bit\n\
+@code{VECTOR_TYPE} arguments and return values in the same way as\n\
+before, to maintain backwards compatibility.  However, it may also\n\
+provide new, architecture-specific @code{VECTOR_TYPE}s that are passed\n\
+and returned in a more efficient way.  It is then important to maintain\n\
+a distinction between the ``normal'' @code{VECTOR_TYPE}s and the new\n\
+architecture-specific ones.\n\
+\n\
+The default implementation returns true, which is correct for most targets.",
+ bool, (const_tree type1, const_tree type2),
+ hook_bool_const_tree_const_tree_true)
+
+DEFHOOK
 (vector_alignment,
  "This hook can be used to define the alignment for a vector of type\n\
 @var{type}, in order to comply with a platform ABI.  The default is to\n\
Index: gcc/hooks.h
===================================================================
--- gcc/hooks.h	2020-01-06 12:57:54.749937335 +0000
+++ gcc/hooks.h	2020-01-07 10:24:01.542344777 +0000
@@ -45,6 +45,7 @@ extern bool hook_bool_uint_uint_mode_fal
 extern bool hook_bool_uint_mode_true (unsigned int, machine_mode);
 extern bool hook_bool_tree_false (tree);
 extern bool hook_bool_const_tree_false (const_tree);
+extern bool hook_bool_const_tree_const_tree_true (const_tree, const_tree);
 extern bool hook_bool_tree_true (tree);
 extern bool hook_bool_const_tree_true (const_tree);
 extern bool hook_bool_gsiptr_false (gimple_stmt_iterator *);
Index: gcc/hooks.c
===================================================================
--- gcc/hooks.c	2020-01-06 12:57:54.745937361 +0000
+++ gcc/hooks.c	2020-01-07 10:24:01.542344777 +0000
@@ -313,6 +313,12 @@ hook_bool_const_tree_false (const_tree)
 }
 
 bool
+hook_bool_const_tree_const_tree_true (const_tree, const_tree)
+{
+  return true;
+}
+
+bool
 hook_bool_tree_true (tree)
 {
   return true;
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	2020-01-06 12:57:53.657944518 +0000
+++ gcc/doc/tm.texi.in	2020-01-07 10:24:01.542344777 +0000
@@ -3365,6 +3365,8 @@ stack.
 
 @hook TARGET_VECTOR_MODE_SUPPORTED_P
 
+@hook TARGET_COMPATIBLE_VECTOR_TYPES_P
+
 @hook TARGET_ARRAY_MODE
 
 @hook TARGET_ARRAY_MODE_SUPPORTED_P
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	2020-01-06 12:57:53.649944570 +0000
+++ gcc/doc/tm.texi	2020-01-07 10:24:01.542344777 +0000
@@ -4324,6 +4324,27 @@ insns involving vector mode @var{mode}.
 must have move patterns for this mode.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_COMPATIBLE_VECTOR_TYPES_P (const_tree @var{type1}, const_tree @var{type2})
+Return true if there is no target-specific reason for treating
+vector types @var{type1} and @var{type2} as distinct types.  The caller
+has already checked for target-independent reasons, meaning that the
+types are known to have the same mode, to have the same number of elements,
+and to have what the caller considers to be compatible element types.
+
+The main reason for defining this hook is to reject pairs of types
+that are handled differently by the target's calling convention.
+For example, when a new @var{N}-bit vector architecture is added
+to a target, the target may want to handle normal @var{N}-bit
+@code{VECTOR_TYPE} arguments and return values in the same way as
+before, to maintain backwards compatibility.  However, it may also
+provide new, architecture-specific @code{VECTOR_TYPE}s that are passed
+and returned in a more efficient way.  It is then important to maintain
+a distinction between the ``normal'' @code{VECTOR_TYPE}s and the new
+architecture-specific ones.
+
+The default implementation returns true, which is correct for most targets.
+@end deftypefn
+
 @deftypefn {Target Hook} opt_machine_mode TARGET_ARRAY_MODE (machine_mode @var{mode}, unsigned HOST_WIDE_INT @var{nelems})
 Return the mode that GCC should use for an array that has
 @var{nelems} elements, with each element having mode @var{mode}.
Index: gcc/gimple-expr.c
===================================================================
--- gcc/gimple-expr.c	2020-01-06 12:58:10.545833431 +0000
+++ gcc/gimple-expr.c	2020-01-07 10:24:01.542344777 +0000
@@ -37,6 +37,7 @@ Software Foundation; either version 3, o
 #include "tree-pass.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "target.h"
 
 /* ----- Type related -----  */
 
@@ -147,10 +148,12 @@ useless_type_conversion_p (tree outer_ty
 
   /* Recurse for vector types with the same number of subparts.  */
   else if (TREE_CODE (inner_type) == VECTOR_TYPE
-	   && TREE_CODE (outer_type) == VECTOR_TYPE
-	   && TYPE_PRECISION (inner_type) == TYPE_PRECISION (outer_type))
-    return useless_type_conversion_p (TREE_TYPE (outer_type),
-				      TREE_TYPE (inner_type));
+	   && TREE_CODE (outer_type) == VECTOR_TYPE)
+    return (known_eq (TYPE_VECTOR_SUBPARTS (inner_type),
+		      TYPE_VECTOR_SUBPARTS (outer_type))
+	    && useless_type_conversion_p (TREE_TYPE (outer_type),
+					  TREE_TYPE (inner_type))
+	    && targetm.compatible_vector_types_p (inner_type, outer_type));
 
   else if (TREE_CODE (inner_type) == ARRAY_TYPE
 	   && TREE_CODE (outer_type) == ARRAY_TYPE)
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	2020-01-07 10:18:06.572651552 +0000
+++ gcc/config/aarch64/aarch64.c	2020-01-07 10:24:01.538344801 +0000
@@ -2098,6 +2098,15 @@ aarch64_fntype_abi (const_tree fntype)
   return default_function_abi;
 }
 
+/* Implement TARGET_COMPATIBLE_VECTOR_TYPES_P.  */
+
+static bool
+aarch64_compatible_vector_types_p (const_tree type1, const_tree type2)
+{
+  return (aarch64_sve::builtin_type_p (type1)
+	  == aarch64_sve::builtin_type_p (type2));
+}
+
 /* Return true if we should emit CFI for register REGNO.  */
 
 static bool
@@ -22099,6 +22108,9 @@ #define TARGET_USE_BLOCKS_FOR_CONSTANT_P
 #undef TARGET_VECTOR_MODE_SUPPORTED_P
 #define TARGET_VECTOR_MODE_SUPPORTED_P aarch64_vector_mode_supported_p
 
+#undef TARGET_COMPATIBLE_VECTOR_TYPES_P
+#define TARGET_COMPATIBLE_VECTOR_TYPES_P aarch64_compatible_vector_types_p
+
 #undef TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT
 #define TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT \
   aarch64_builtin_support_vector_misalignment
Index: gcc/config/aarch64/aarch64-sve-builtins.cc
===================================================================
--- gcc/config/aarch64/aarch64-sve-builtins.cc	2020-01-07 10:21:17.575410530 +0000
+++ gcc/config/aarch64/aarch64-sve-builtins.cc	2020-01-07 10:24:01.534344828 +0000
@@ -2265,9 +2265,13 @@ tree
 gimple_folder::convert_pred (gimple_seq &stmts, tree vectype,
 			     unsigned int argno)
 {
-  tree predtype = truth_type_for (vectype);
   tree pred = gimple_call_arg (call, argno);
-  return gimple_build (&stmts, VIEW_CONVERT_EXPR, predtype, pred);
+  if (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (pred)),
+		TYPE_VECTOR_SUBPARTS (vectype)))
+    return pred;
+
+  return gimple_build (&stmts, VIEW_CONVERT_EXPR,
+		       truth_type_for (vectype), pred);
 }
 
 /* Return a pointer to the address in a contiguous load or store,
Index: gcc/testsuite/gcc.target/aarch64/sve/pcs/gnu_vectors_1.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/pcs/gnu_vectors_1.c	2020-01-07 10:24:01.546344751 +0000
@@ -0,0 +1,99 @@
+/* { dg-options "-O -msve-vector-bits=256 -fomit-frame-pointer" } */
+
+#include <arm_sve.h>
+
+typedef float16_t float16x16_t __attribute__((vector_size (32)));
+typedef float32_t float32x8_t __attribute__((vector_size (32)));
+typedef float64_t float64x4_t __attribute__((vector_size (32)));
+typedef int8_t int8x32_t __attribute__((vector_size (32)));
+typedef int16_t int16x16_t __attribute__((vector_size (32)));
+typedef int32_t int32x8_t __attribute__((vector_size (32)));
+typedef int64_t int64x4_t __attribute__((vector_size (32)));
+typedef uint8_t uint8x32_t __attribute__((vector_size (32)));
+typedef uint16_t uint16x16_t __attribute__((vector_size (32)));
+typedef uint32_t uint32x8_t __attribute__((vector_size (32)));
+typedef uint64_t uint64x4_t __attribute__((vector_size (32)));
+
+void float16_callee (float16x16_t);
+void float32_callee (float32x8_t);
+void float64_callee (float64x4_t);
+void int8_callee (int8x32_t);
+void int16_callee (int16x16_t);
+void int32_callee (int32x8_t);
+void int64_callee (int64x4_t);
+void uint8_callee (uint8x32_t);
+void uint16_callee (uint16x16_t);
+void uint32_callee (uint32x8_t);
+void uint64_callee (uint64x4_t);
+
+void
+float16_caller (void)
+{
+  float16_callee (svdup_f16 (1.0));
+}
+
+void
+float32_caller (void)
+{
+  float32_callee (svdup_f32 (2.0));
+}
+
+void
+float64_caller (void)
+{
+  float64_callee (svdup_f64 (3.0));
+}
+
+void
+int8_caller (void)
+{
+  int8_callee (svindex_s8 (0, 1));
+}
+
+void
+int16_caller (void)
+{
+  int16_callee (svindex_s16 (0, 2));
+}
+
+void
+int32_caller (void)
+{
+  int32_callee (svindex_s32 (0, 3));
+}
+
+void
+int64_caller (void)
+{
+  int64_callee (svindex_s64 (0, 4));
+}
+
+void
+uint8_caller (void)
+{
+  uint8_callee (svindex_u8 (1, 1));
+}
+
+void
+uint16_caller (void)
+{
+  uint16_callee (svindex_u16 (1, 2));
+}
+
+void
+uint32_caller (void)
+{
+  uint32_callee (svindex_u32 (1, 3));
+}
+
+void
+uint64_caller (void)
+{
+  uint64_callee (svindex_u64 (1, 4));
+}
+
+/* { dg-final { scan-assembler-times {\tst1b\tz[0-9]+\.b, p[0-7], \[x0\]} 2 } } */
+/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.h, p[0-7], \[x0\]} 3 } } */
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7], \[x0\]} 3 } } */
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7], \[x0\]} 3 } } */
+/* { dg-final { scan-assembler-times {\tadd\tx0, sp, #?16\n} 11 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/pcs/gnu_vectors_2.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/pcs/gnu_vectors_2.c	2020-01-07 10:24:01.546344751 +0000
@@ -0,0 +1,99 @@
+/* { dg-options "-O -msve-vector-bits=256 -fomit-frame-pointer" } */
+
+#include <arm_sve.h>
+
+typedef float16_t float16x16_t __attribute__((vector_size (32)));
+typedef float32_t float32x8_t __attribute__((vector_size (32)));
+typedef float64_t float64x4_t __attribute__((vector_size (32)));
+typedef int8_t int8x32_t __attribute__((vector_size (32)));
+typedef int16_t int16x16_t __attribute__((vector_size (32)));
+typedef int32_t int32x8_t __attribute__((vector_size (32)));
+typedef int64_t int64x4_t __attribute__((vector_size (32)));
+typedef uint8_t uint8x32_t __attribute__((vector_size (32)));
+typedef uint16_t uint16x16_t __attribute__((vector_size (32)));
+typedef uint32_t uint32x8_t __attribute__((vector_size (32)));
+typedef uint64_t uint64x4_t __attribute__((vector_size (32)));
+
+void float16_callee (svfloat16_t);
+void float32_callee (svfloat32_t);
+void float64_callee (svfloat64_t);
+void int8_callee (svint8_t);
+void int16_callee (svint16_t);
+void int32_callee (svint32_t);
+void int64_callee (svint64_t);
+void uint8_callee (svuint8_t);
+void uint16_callee (svuint16_t);
+void uint32_callee (svuint32_t);
+void uint64_callee (svuint64_t);
+
+void
+float16_caller (float16x16_t arg)
+{
+  float16_callee (arg);
+}
+
+void
+float32_caller (float32x8_t arg)
+{
+  float32_callee (arg);
+}
+
+void
+float64_caller (float64x4_t arg)
+{
+  float64_callee (arg);
+}
+
+void
+int8_caller (int8x32_t arg)
+{
+  int8_callee (arg);
+}
+
+void
+int16_caller (int16x16_t arg)
+{
+  int16_callee (arg);
+}
+
+void
+int32_caller (int32x8_t arg)
+{
+  int32_callee (arg);
+}
+
+void
+int64_caller (int64x4_t arg)
+{
+  int64_callee (arg);
+}
+
+void
+uint8_caller (uint8x32_t arg)
+{
+  uint8_callee (arg);
+}
+
+void
+uint16_caller (uint16x16_t arg)
+{
+  uint16_callee (arg);
+}
+
+void
+uint32_caller (uint32x8_t arg)
+{
+  uint32_callee (arg);
+}
+
+void
+uint64_caller (uint64x4_t arg)
+{
+  uint64_callee (arg);
+}
+
+/* { dg-final { scan-assembler-times {\tld1b\tz0\.b, p[0-7]/z, \[x0\]} 2 } } */
+/* { dg-final { scan-assembler-times {\tld1h\tz0\.h, p[0-7]/z, \[x0\]} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz0\.s, p[0-7]/z, \[x0\]} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz0\.d, p[0-7]/z, \[x0\]} 3 } } */
+/* { dg-final { scan-assembler-not {\tst1[bhwd]\t} } } */
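
(Postscript, to make the connection to the useless_type_conversion_p
change concrete -- a sketch rather than part of the patch, reusing the
int32x8_t typedef from the tests above and assuming -msve-vector-bits=256:

  svint32_t sve_val;
  int32x8_t gnu_val = (int32x8_t) sve_val;  /* VIEW_CONVERT_EXPR in gimple */

Previously the vector case in useless_type_conversion_p only compared
TYPE_PRECISION and the element types, so this conversion was considered
useless and the two values could be used interchangeably, including at
call boundaries.  With the hook, aarch64 makes the two types distinct
because exactly one of them satisfies aarch64_sve::builtin_type_p, so
the conversion survives and calls keep their correct PCS handling.)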