------- Comment #3 from jamborm at gcc dot gnu dot org 2009-05-21 16:02 -------
With the new SRA, the optimized dump looks like:
D.6886_10 = {1, 1, 1, 1};
D.6887_11 = VIEW_CONVERT_EXPR<vector long long int>(D.6886_10);
D.6893_12 = VIEW_CONVERT_EXPR<vector int>(D.6887_11);
D.6891_14 = __builtin_ia32_pcmpeqd128 (D.6893_12, D.6893_12);
D.6890_15 = VIEW_CONVERT_EXPR<vector long long int>(D.6891_14);
D.6897_16 = VIEW_CONVERT_EXPR<vector char>(D.6890_15);
D.6896_17 = __builtin_ia32_pmovmskb128 (D.6897_16);
D.6933_21 = D.6896_17 != 65535;
return D.6933_21;
x is completely gone.
The (relevant) assembly output is:
main:
movdqa .LC0, %xmm0
pcmpeqd %xmm0, %xmm0
pmovmskb %xmm0, %eax
cmpl $65535, %eax
pushl %ebp
setne %al
movl %esp, %ebp
movzbl %al, %eax
popl %ebp
ret
So even though I don't really understand the SSE instructions, I
believe the new SRA does indeed help. I'll add a testcase checking
that x vanishes to the patch series, as I am finalizing the patch
set now.
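
For readers who, like me, do not read SSE fluently, here is a rough
scalar model in plain C of what the pcmpeqd / pmovmskb / cmpl sequence
computes. It is only a sketch: the model_* names are made up for
illustration and are not GCC builtins or intrinsics, and I am assuming
the {1, 1, 1, 1} constant is four 32-bit lanes. Since the register is
compared with itself, the lane values should not affect the outcome.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of pcmpeqd: compare four 32-bit lanes;
   an equal lane becomes all ones, an unequal lane becomes zero. */
static void model_pcmpeqd(const uint32_t a[4], const uint32_t b[4],
                          uint32_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (a[i] == b[i]) ? 0xFFFFFFFFu : 0u;
}

/* Hypothetical scalar model of pmovmskb: gather the most significant
   bit of each of the 16 bytes into bits 0..15 of the result.
   Bytes are numbered little-endian within each lane, as on x86. */
static int model_pmovmskb(const uint32_t v[4])
{
    int mask = 0;
    for (int byte = 0; byte < 16; byte++) {
        uint32_t lane = v[byte / 4];
        uint32_t b = (lane >> (8 * (byte % 4))) & 0xFFu;
        mask |= (int)(b >> 7) << byte;
    }
    return mask;
}

/* Mirrors the last two statements of the dump:
   D.6933_21 = D.6896_17 != 65535; return D.6933_21; */
static int model_result(void)
{
    const uint32_t ones[4] = {1, 1, 1, 1};  /* assumed lane layout */
    uint32_t eq[4];

    model_pcmpeqd(ones, ones, eq);          /* every lane equals itself */
    return model_pmovmskb(eq) != 65535;     /* all 16 mask bits set */
}
```

In this model every lane compares equal to itself, so all 16 mask bits
are set, the mask is 65535, and model_result() returns 0.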
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40122