On 15/11/2019 15:05, Francesco Petrogalli wrote:
> Thank you Szabolcs for working on this.
>
>> On Nov 14, 2019, at 2:23 PM, Szabolcs Nagy <szabolcs.n...@arm.com> wrote:
>>
>> Sorry v2 had a bug.
>>
>> v2: added documentation and tests.
>> v3: fixed expand_simd_clones so different isa variants are actually
>> generated.
>>
>> GCC currently supports two ways to declare the availability of vector
>> variants of a scalar function:
>>
>>   #pragma omp declare simd
>>   void f (void);
>>
>> and
>>
>>   __attribute__ ((simd))
>>   void f (void);
>>
>> However these declare a set of symbols that are different simd variants
>> of f, so a library either provides definitions for all those symbols or
>> it cannot use these declarations. (The set of declared symbols can be
>> narrowed down with additional omp clauses, but not enough to allow
>> declaring a single symbol. OpenMP 5 has a declare variant feature that
>> allows declaring more specific simd variants, but it is complicated and
>> still requires gcc or vendor extension for unambiguous declarations.)
>>
>
> It is not just that it is complicated, it is also a good idea to make math
> function vectorization orthogonal to OpenMP.
> This is needed in clang too, thank you for shaping a solution. It would be
> good if we could come up with a common solution!
>
>> This patch extends the gcc specific simd attribute such that it can
>> specify a single vector variant of simple scalar functions (functions
>> that only take and return scalar integer or floating type values):
>>
>>
>>
>> where mask is "inbranch" or "notinbranch" like now, simdlen is an int
>> with the same meaning as in omp declare simd and simdabi is a string
>> specifying the call ABI (which the intel vector ABI calls ISA). The
>> name is optional and allows a library to use a different symbol name
>> than what the vector ABI specifies.
>>
>
> Can we also handle the simplest case of linear (just with
> step = OpenMP default = 1)? It will be useful for `sincos`.
>
> OpenMP `linear` uses parameter names; in the attribute, which applies to
> declarations with unnamed parameters, we could use positions.
>
> Also, can we avoid making the attribute a varargs attribute, and instead
> require all arguments to be present? We could use dummy values when the
> descriptor is not needed (e.g. `no_linear`).
>
> I would also require the name to be specified, with the additional
> requirement to make the declaration of name visible in the same compilation
> unit.
>
> For example, on Arm, mapping `sincos` and `exp` to unmasked vector versions
> with 2 lanes would be:
>
> ```
> void _ZGVnN2vl8l8_sincos(float64x2_t, double *, double *);
>
> void sincos(double, double*, double *)
> __attribute__(simd(notinbranch,2,"simd",_ZGVnN2vl8l8_sincos, linear(2,3)));
>
> void _ZGVnN2v_exp(float64x2_t);
>
> void exp(double) __attribute__(simd(notinbranch,2,"simd",_ZGVnN2v_exp,
> no_linear));
> ```

note that exp returns double.

the simdabi is "n" instead of "simd", it is the 'ISA character' from the
vector abi, to make it clear how it is tied to the vector abi name mangling.

pre-declaration of the simd symbol is not required, since the current simd
attr (and omp pragma) works that way (e.g. it means simd types need not be
visible where it is used, which may be tricky if they need target specific
header inclusion).

notinbranch is a string in quotes, to avoid macro name collision in
standard headers. the same issue applies to linear, so if we add such
argument it must not use bare 'linear' identifier, e.g. it could be an
int list like

  simd("notinbranch", 2, "n", {2,3}, "vsincos")

or we could just specify the vector abi name and reverse engineer the
requirements from that:

  simd("", "_ZGVnN2vl8l8_sincos", "vsincos")

(first argument is just to disambiguate from the mask form)

i considered this name demangling more difficult to implement and error
prone, but it unambiguously specifies a single vector abi symbol for all
future vector abi extensions.

but there is a bigger problem with linear: currently the vectorizer only
considers vector clones if they are 'const' or the vectorization of a loop
was explicitly requested with omp simd pragma. so functions with pointer
arguments won't be used for vectorization. is that different in clang? or
expected to change somehow?