On Tue, 29 Sep 2020, Richard Sandiford wrote: > Tamar Christina <tamar.christ...@arm.com> writes: > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi > > index > > 2b46286943778e16d95b15def4299bcbf8db7eb8..71e226505b2619d10982b59a4ebbed73a70f29be > > 100644 > > --- a/gcc/doc/md.texi > > +++ b/gcc/doc/md.texi > > @@ -6132,6 +6132,17 @@ floating-point mode. > > > > This pattern is not allowed to @code{FAIL}. > > > > +@cindex @code{cadd@var{m}@var{n}3} instruction pattern > > +@item @samp{cadd@var{m}@var{n}3} > > +Perform a vector addition of complex numbers in operand 1 with operand 2 > > +rotated by @var{m} degrees around the argand plane and storing the result > > in > > +operand 0. The instruction must perform the operation on data loaded > > +contiguously into the vectors. > > Nitpicking, sorry, but I think it would be better to describe the > layout directly rather than in terms of loads, since the preceding > operation might not be a load.
So if we're at that and since GCC vectors do not have complex components can we formulate this in terms avoiding 'complex'? Isn't this an add of one vector to a vector with adjacant lanes swapped and possibly negated? Mentioning that this would match a complex add in case lanes happen to match up with complex real/imag parts is OK but the pattern should work equally well if there's no complex numbers involved? > I guess the main question is: what representation do we expect for > big-endian? A normal Advanced SIMD LDR would give this (for floats): > > MEMORY > +-----+-----+-----+-----+ > | r0 | i0 | r1 | i1 | > +-----+-----+-----+-----+ > | 0 | 1 | 2 | 3 | array numbering > +-----+-----+-----+-----+ > V V V V Advanced SIMD LDR > +-----+-----+-----+-----+ > | r0 | i0 | r1 | i1 | > +-----+-----+-----+-----+ > | 0 | 1 | 2 | 3 | GCC lane numbering > +-----+-----+-----+-----+ > | 3 | 2 | 1 | 0 | Arm lane numbering > +-----+-----+-----+-----+ > MSB REGISTER LSB > > but the FC* instructions put the imaginary parts in the more > significant lane, so the pairs of elements above would need > to be reversed: > > MEMORY > +-----+-----+-----+-----+ > | r0 | i0 | r1 | i1 | > +-----+-----+-----+-----+ > | 0 | 1 | 2 | 3 | array numbering > +-----+-----+-----+-----+ > \ / \ / > \ / \ / > X X Load and permute > / \ / \ > / \ / \ > +-----+-----+-----+-----+ > | i0 | r0 | i1 | r1 | > +-----+-----+-----+-----+ > | 0 | 1 | 2 | 3 | GCC lane numbering > +-----+-----+-----+-----+ > | 3 | 2 | 1 | 0 | Arm lane numbering > +-----+-----+-----+-----+ > MSB REGISTER LSB > > (Or the whole vector could be reversed.) > > We might decide that it just isn't worth doing this for Advanced SIMD. > But should the semantics of the optab be that: > > (1) GCC lane number 0 holds a real part, or > (2) the least significant lane holds a real part? > > With (1), it would be up to the target to hide the permute above. > With (2), the vectoriser would need to introduce the permute itself. > > I'm not sure there's a perfect answer even for Arm targets. (2) matches > the Advanced SIMD semantics. But for SVE, the register layout follows > LD1 rather than LDR, and the GCC and architectural lane numbering match up. > (1) would therefore be better than (2) for SVE (and so no permute would be > needed for either endianness on SVE). > > > +The operation is only supported for vector modes @var{n} and with > > +rotations @var{m} of 90 or 270. > > + > > +This pattern is not allowed to @code{FAIL}. > > + > > @cindex @code{ffs@var{m}2} instruction pattern > > @item @samp{ffs@var{m}2} > > Store into operand 0 one plus the index of the least significant 1-bit > > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def > > index > > 13e60828fcf5db6c5f15aae2bacd4cf04029e430..956a65a338c157b51de7e78a3fb005b5af78ef31 > > 100644 > > --- a/gcc/internal-fn.def > > +++ b/gcc/internal-fn.def > > @@ -275,6 +275,8 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, binary) > > DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, binary) > > DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary) > > DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary) > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary) > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary) > > > > /* FP scales. */ > > DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary) > > diff --git a/gcc/optabs.def b/gcc/optabs.def > > index > > 78409aa14537d259bf90277751aac00d452a0d3f..2bb0bf857977035bf562a77f5f6848e80edf936d > > 100644 > > --- a/gcc/optabs.def > > +++ b/gcc/optabs.def > > @@ -290,6 +290,8 @@ OPTAB_D (atan_optab, "atan$a2") > > OPTAB_D (atanh_optab, "atanh$a2") > > OPTAB_D (copysign_optab, "copysign$F$a3") > > OPTAB_D (xorsign_optab, "xorsign$F$a3") > > +OPTAB_D (cadd90_optab, "cadd90$a3") > > +OPTAB_D (cadd270_optab, "cadd270$a3") > > OPTAB_D (cos_optab, "cos$a2") > > OPTAB_D (cosh_optab, "cosh$a2") > > OPTAB_D (exp10_optab, "exp10$a2") > > diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c > > index > > 6453a5b1b6464dba833adc2c2a194db5e712bb79..b2b0ac62e9a69145470f41d2bac736dd970be735 > > 100644 > > --- a/gcc/tree-vect-slp-patterns.c > > +++ b/gcc/tree-vect-slp-patterns.c > > @@ -663,12 +663,94 @@ graceful_exit: > > } > > }; > > > > +class ComplexAddPattern : public ComplexPattern > > Another nitpick, sorry, but type names should be lower case rather than > CamelCase. > > Thanks, > Richard > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imend