On 7/10/2022 9:43 PM, liuhongt via Gcc-patches wrote:
The patch only handles load/store (including ctor/permutation, except
gather/scatter) for complex types; other operations don't need to be
handled since they will be lowered by pass cplxlower. (MASK_LOAD is not
supported for complex types, so there is no need to handle it either.)
Instead of supporting vector(2) _Complex double, this patch takes vector(4)
double as the vector type of _Complex double. Since the vectorizer originally
takes TYPE_VECTOR_SUBPARTS as nunits, which is not true for complex
types, the patch handles nunits/ncopies/vf specially for complex types.
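
To illustrate the layout idea, here is a minimal sketch in plain C (not
code from the patch). It relies on the C99 guarantee that a _Complex double
has the same representation as an array of two doubles, so copying n
complex elements is the same as copying 2*n doubles. The function names,
including complex_elems_per_vector, are hypothetical and only model the
"lanes / 2" adjustment that the description implies for nunits.

```c
#include <complex.h>
#include <stddef.h>

/* Scalar view: one _Complex double is loaded and stored per iteration. */
void copy_complex(double _Complex *restrict dst,
                  const double _Complex *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Same copy seen as plain doubles: each complex element occupies two
   adjacent doubles (real part first, imaginary part second), so a
   4-lane double vector would cover two complex elements. */
void copy_as_doubles(double _Complex *restrict dst,
                     const double _Complex *restrict src, size_t n)
{
    double *d = (double *)dst;
    const double *s = (const double *)src;
    for (size_t i = 0; i < 2 * n; i++)
        d[i] = s[i];
}

/* Hypothetical helper (an assumption, not a GCC function): number of
   complex elements held by a vector with `double_lanes` double lanes. */
size_t complex_elems_per_vector(size_t double_lanes)
{
    return double_lanes / 2;
}
```

Both copy functions produce identical results; the second just makes the
vector(4) double view of the data explicit.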
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Also tested the patch on SPEC2017 and found complex type vectorization
in 510/549 (but no performance impact).
No comment on the implementation. From a benchmarking standpoint you
might want to look at cam4 in speed, not rate mode. I'd bet you'd
want -ffast-math or -fcx-limited-range to avoid divdc3 and have those
calls expanded inline, which may give you a better crack at exposing
vectorization opportunities in there.
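
For illustration only, a minimal sketch of the kind of loop this affects
(not taken from cam4; the function names are made up). With default
options GCC typically implements the division in cdiv_array through the
libgcc routine __divdc3; with -fcx-limited-range (which -ffast-math also
enables) it may instead expand the textbook formula inline, roughly as
spelled out by hand in cdiv_limited_range.

```c
#include <complex.h>
#include <stddef.h>

/* Element-wise complex division; the candidate loop for vectorization. */
void cdiv_array(double _Complex *restrict out,
                const double _Complex *restrict num,
                const double _Complex *restrict den, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = num[i] / den[i];
}

/* Hand-written textbook division:
   (a + bi) / (c + di) = ((ac + bd) + (bc - ad)i) / (c*c + d*d).
   It has no overflow or NaN/Inf recovery, which is the trade-off that
   limited-range arithmetic accepts. */
double _Complex cdiv_limited_range(double _Complex x, double _Complex y)
{
    double a = creal(x), b = cimag(x);
    double c = creal(y), d = cimag(y);
    double denom = c * c + d * d;
    double re = (a * c + b * d) / denom;
    double im = (b * c - a * d) / denom;
    return re + im * I;
}
```

Comparing the assembly from "gcc -O2 -S" with and without
-fcx-limited-range should show whether the call to __divdc3 disappears.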
jeff