https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63351
--- Comment #3 from Kirill Yukhin <kyukhin at gcc dot gnu.org> ---
Hello,

For AVX-512F (zmm-s) we have a patch which enables this kind of
optimization using the combiner machinery: a new subst which allows a
`broadcasted' version of patterns. The combiner can then combine
(load-bcst + actual insn) into (actual insn w/ bcst-ed mem-op).

This patch generates embedded broadcasts for cases such as:

+/* { dg-options "-O3 -mavx512f" } */
+/* { dg-final { scan-assembler-times "vmulps\[ \\t\]+\[^\n\]*.*1to16.*%zmm\[0-9\]\[\\n\]" 1 } } */
+
+#define N 16
+
+float f1 (float *c1_p, float *c2_p)
+{
+
+  float a[N];
+  float b[N];
+  float c[N];
+  float c1 = *c1_p;
+  float c2 = *c2_p;
+  int i;
+
+  for (i = 0; i < N; i++)
+    {
+      a[i] = c1;
+      b[i] = c2;
+    }
+
+  for (i = 0; i < N; i++)
+    {
+      c[i] = a[i] * b[i];
+    }
+
+  return c[(int)(c1 + c2) % N];
+}

The patch has almost no impact on Spec2006 (one of the reasons is that
the combiner does not work across basic blocks).

For AVX-512VL ([xy]mm-s) such an optimization should also be applicable
once all the new patterns reach the trunk.
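
To illustrate the (load-bcst + actual insn) -> (actual insn w/ bcst-ed mem-op) combination described above, here is a rough scalar model in portable C. It is only a sketch of the semantics, not the patch itself and not what the compiler emits: the function names and the LANES constant are made up for illustration, and the comments about which instruction each loop stands for are an assumption about the intended mapping.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 16 /* assumed: one zmm register holds 16 floats */

/* Models the uncombined form: first broadcast the scalar from memory
   into a full temporary vector, then multiply lane by lane.  */
void
mul_separate_broadcast (float *dst, const float *src, const float *scalar_p)
{
  float bcst[LANES];
  size_t i;

  assert (dst != NULL && src != NULL && scalar_p != NULL);

  for (i = 0; i < LANES; i++)
    bcst[i] = *scalar_p;        /* stands for the load-broadcast insn */

  for (i = 0; i < LANES; i++)
    dst[i] = src[i] * bcst[i];  /* stands for the multiply insn */
}

/* Models the combined form: the multiply reads the scalar memory
   operand directly and applies it to every lane, which is what an
   embedded-broadcast ({1to16}) operand expresses.  */
void
mul_embedded_broadcast (float *dst, const float *src, const float *scalar_p)
{
  float s;
  size_t i;

  assert (dst != NULL && src != NULL && scalar_p != NULL);

  s = *scalar_p;                /* single scalar read, no temporary vector */
  for (i = 0; i < LANES; i++)
    dst[i] = src[i] * s;
}
```

Both functions compute the same result; the second simply has no separate broadcast step, which is the property the combiner relies on when it folds the two insns into one.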