https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79359
--- Comment #1 from Raphael C <drraph at gmail dot com> ---
In case it's of any help, here is an explanation of the assembly that ICC gives with -fp-model strict.

R = real and C = complex. Here "x" just means don't know or unused.

We start with xmm0 = {x, x, C, R}. The desired output is (R+iC)^2 = R^2 + 2RCi - C^2.

vmovsldup xmm1, xmm0              # xmm1 = { x, x, R, R }
vmovshdup xmm2, xmm0              # xmm2 = { x, x, C, C }
vshufps   xmm3, xmm0, xmm0, 177   # xmm3 = { x, x, R, C }
vmulps    xmm4, xmm1, xmm0        # xmm4 = { x, x, RC, RR }
vmulps    xmm5, xmm2, xmm3        # xmm5 = { x, x, RC, CC }
vaddsubps xmm0, xmm4, xmm5        # xmm0 = { x, x, 2RC, RR-CC }
ret
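
The lane comments above can be checked with a small scalar model in portable C. This is only a sketch, not the code ICC or GCC generates: the type `vec4` and the helper names (`movsldup`, `shufps`, `complex_square_model`, `self_check`, and so on) are made up for illustration, and each helper just mimics the documented per-lane behaviour of the instruction it is named after. It assumes the register comments list lanes from high to low, so {x, x, C, R} puts R in lane 0 and C in lane 1.

```c
#include <assert.h>

/* Four single-precision lanes; lane[0] is the lowest element of the register.
   The listing writes registers high-to-low, so {x, x, C, R} means
   lane[1] = C and lane[0] = R. (Hypothetical type, for illustration only.) */
typedef struct { float lane[4]; } vec4;

/* vmovsldup: duplicate the even-indexed lanes. */
static vec4 movsldup(vec4 a) {
    vec4 r = {{ a.lane[0], a.lane[0], a.lane[2], a.lane[2] }};
    return r;
}

/* vmovshdup: duplicate the odd-indexed lanes. */
static vec4 movshdup(vec4 a) {
    vec4 r = {{ a.lane[1], a.lane[1], a.lane[3], a.lane[3] }};
    return r;
}

/* vshufps: the two low result lanes are picked from a, the two high
   ones from b, using two immediate bits per lane. */
static vec4 shufps(vec4 a, vec4 b, unsigned imm) {
    vec4 r = {{ a.lane[imm & 3], a.lane[(imm >> 2) & 3],
                b.lane[(imm >> 4) & 3], b.lane[(imm >> 6) & 3] }};
    return r;
}

/* vmulps: lane-wise multiply. */
static vec4 mulps(vec4 a, vec4 b) {
    vec4 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

/* vaddsubps: subtract in even lanes, add in odd lanes. */
static vec4 addsubps(vec4 a, vec4 b) {
    vec4 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = (i % 2 == 0) ? a.lane[i] - b.lane[i]
                                 : a.lane[i] + b.lane[i];
    return r;
}

/* Same six steps as the listing, one helper call per instruction. */
static vec4 complex_square_model(vec4 xmm0) {
    vec4 xmm1 = movsldup(xmm0);           /* { x, x, R,   R     } */
    vec4 xmm2 = movshdup(xmm0);           /* { x, x, C,   C     } */
    vec4 xmm3 = shufps(xmm0, xmm0, 177);  /* { x, x, R,   C     } */
    vec4 xmm4 = mulps(xmm1, xmm0);        /* { x, x, RC,  RR    } */
    vec4 xmm5 = mulps(xmm2, xmm3);        /* { x, x, RC,  CC    } */
    return addsubps(xmm4, xmm5);          /* { x, x, 2RC, RR-CC } */
}

/* (3 + 2i)^2 = 5 + 12i; the unused upper lanes are set to zero here. */
static void self_check(void) {
    vec4 in = {{ 3.0f, 2.0f, 0.0f, 0.0f }};
    vec4 out = complex_square_model(in);
    assert(out.lane[0] == 5.0f);   /* real part: RR - CC */
    assert(out.lane[1] == 12.0f);  /* imaginary part: 2RC */
}
```

Calling `self_check()` from a test or from `main` exercises the whole sequence. The immediate 177 is 0b10110001, which swaps the two lanes of each pair, and that is how xmm3 ends up as { x, x, R, C }.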