http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48927
Summary: Issues with "enable" attribute and IRA register preferences
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: ubiz...@gmail.com

Trying to merge the *vec_concatv4si_1_avx and *vec_concatv4si_1 patterns using the "enable" attribute, the gcc.target/i386/pr36246.c test (scan-assembler-not for the movq insn) and the gcc.target/i386/pr36222-1.c test (scan-assembler-not for movdqa) failed with:

FAIL: gcc.target/i386/pr36222-1.c scan-assembler-not movdqa
FAIL: gcc.target/i386/pr36246.c scan-assembler-not movq

Following are the two patterns: the original (the first one) and the merged pattern (the second one). The separate AVX pattern is not relevant to this discussion.

(define_insn "*vec_concatv4si_old"
  [(set (match_operand:V4SI 0 "register_operand"       "=Y2,x,x")
        (vec_concat:V4SI
          (match_operand:V2SI 1 "register_operand"     " 0 ,0,0")
          (match_operand:V2SI 2 "nonimmediate_operand" " Y2,x,m")))]
  "0"
  "@
   punpcklqdq\t{%2, %0|%0, %2}
   movlhps\t{%2, %0|%0, %2}
   movhps\t{%2, %0|%0, %2}"
  [(set_attr "type" "sselog,ssemov,ssemov")
   (set_attr "mode" "TI,V4SF,V2SF")])

(define_insn "*vec_concatv4si"
  [(set (match_operand:V4SI 0 "register_operand"       "=Y2,x,x,x,x")
        (vec_concat:V4SI
          (match_operand:V2SI 1 "register_operand"     " 0 ,x,0,0,x")
          (match_operand:V2SI 2 "nonimmediate_operand" " Y2,x,x,m,m")))]
  "TARGET_SSE"
  "@
   punpcklqdq\t{%2, %0|%0, %2}
   vpunpcklqdq\t{%2, %1, %0|%0, %1, %2}
   movlhps\t{%2, %0|%0, %2}
   movhps\t{%2, %0|%0, %2}
   vmovhps\t{%2, %1, %0|%0, %1, %2}"
  [(set_attr "isa" "noavx,avx,noavx,noavx,avx")
   (set_attr "type" "sselog,sselog,ssemov,ssemov,ssemov")
   (set_attr "prefix" "orig,vex,orig,orig,vex")
   (set_attr "mode" "TI,TI,V4SF,V2SF,V2SF")])

The problem is that, for a non-AVX target, the merged pattern somehow changes register allocation preferences. Please note that all new constraints are disabled for a non-AVX target, so in theory nothing should differ. However, IRA shows certain differences; the diff between non-patched
(pr36246.c) and patched (pr36246_1.c) IRA dump files shows:

--- pr36246_1.c.190r.ira        2011-05-05 22:06:46.252582018 +0200
+++ pr36246.c.190r.ira  2011-05-05 21:50:07.831975984 +0200
@@ -100,10 +100,9 @@
   cp1:a1(r68)<->a5(r62)@125:shuffle
   cp2:a2(r69)<->a4(r65)@125:shuffle
   cp3:a2(r69)<->a3(r64)@125:shuffle
-  cp4:a0(r67)<->a2(r69)@125:shuffle
-  cp5:a0(r67)<->a1(r68)@125:shuffle
+  cp4:a0(r67)<->a2(r69)@1000:constraint
   regions=1, blocks=3, points=8
-    allocnos=7 (big 0), copies=6, conflicts=0, ranges=7
+    allocnos=7 (big 0), copies=5, conflicts=0, ranges=7

 **** Allocnos coloring:

@@ -140,11 +139,11 @@
       Popping a6(r63,l0)  -- assign reg 4
       Popping a5(r62,l0)  -- assign reg 5
       Popping a0(r67,l0)  -- assign reg 21
-      Popping a1(r68,l0)  -- assign reg 21
-      Popping a2(r69,l0)  -- assign reg 22
+      Popping a1(r68,l0)  -- assign reg 22
+      Popping a2(r69,l0)  -- assign reg 21
 Disposition:
     5:r62  l0   5    6:r63  l0   4    3:r64  l0   1    4:r65  l0   2
-    0:r67  l0  21    1:r68  l0  21    2:r69  l0  22
+    0:r67  l0  21    1:r68  l0  22    2:r69  l0  21

This results in different allocated registers, so the difference between the assembly files shows:

--- pr36246_1.s 2011-05-05 22:06:46.255582628 +0200
+++ pr36246.s   2011-05-05 21:50:07.833976438 +0200
@@ -1,4 +1,4 @@
-       .file   "pr36246_1.c"
+       .file   "pr36246.c"
        .text
        .p2align 4,,15
 .globl _mm_set_epi32
@@ -7,19 +7,17 @@
 .LFB0:
        .cfi_startproc
        movl    %esi, -12(%rsp)    # 23  *movsi_internal/2  [length = 4]
-       movd    -12(%rsp), %xmm0   # 24  *movsi_internal/12  [length = 6]
+       movd    -12(%rsp), %xmm1   # 24  *movsi_internal/12  [length = 6]
        movl    %edi, -12(%rsp)    # 25  *movsi_internal/2  [length = 4]
-       movd    -12(%rsp), %xmm1   # 26  *movsi_internal/12  [length = 6]
+       movd    -12(%rsp), %xmm0   # 26  *movsi_internal/12  [length = 6]
        movl    %ecx, -12(%rsp)    # 27  *movsi_internal/2  [length = 4]
-       punpckldq  %xmm1, %xmm0    # 9  *vec_concatv2si_sse2/1  [length = 4]
-       movd    -12(%rsp), %xmm1   # 28  *movsi_internal/12  [length = 6]
+       punpckldq  %xmm0, %xmm1    # 9  *vec_concatv2si_sse2/1  [length = 4]
+       movd    -12(%rsp), %xmm0   # 28  *movsi_internal/12  [length = 6]
        movl    %edx, -12(%rsp)    # 29  *movsi_internal/2  [length = 4]
        movd    -12(%rsp), %xmm2   # 30  *movsi_internal/12  [length = 6]
-       punpckldq  %xmm2, %xmm1    # 10  *vec_concatv2si_sse2/1  [length = 4]
-       movq    %xmm1, %xmm2       # 31  *movv2si_internal_rex64/10  [length = 4]
-       punpcklqdq  %xmm0, %xmm2   # 11  *vec_concatv4si/1  [length = 4]
-       movdqa  %xmm2, %xmm0       # 32  *movv4si_internal/2  [length = 4]
-       ret                        # 35  return_internal  [length = 1]
+       punpckldq  %xmm2, %xmm0    # 10  *vec_concatv2si_sse2/1  [length = 4]
+       punpcklqdq  %xmm1, %xmm0   # 11  *vec_concatv4si_1/1  [length = 4]
+       ret                        # 33  return_internal  [length = 1]
        .cfi_endproc
 .LFE0:
        .size   _mm_set_epi32, .-_mm_set_epi32

The extra movq and movdqa instructions in the patched output trigger the scan-assembler-not failures, which points to interference between the "enable" attribute and IRA. I believe the intention of the "enable" attribute is to keep separate patterns and merged patterns consistent in all stages of compilation.
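
To make the "nothing should differ" expectation concrete, here is a minimal Python sketch (not GCC code) that filters the merged pattern's alternatives by their "isa" tag and compares the result with the original pattern's constraints. The helper names are hypothetical, and the mapping of "noavx"/"avx" to enabled/disabled is an assumption about how the "isa" attribute feeds the attribute that GCC's machine descriptions spell "enabled". It models only alternative filtering, not IRA's cost or copy computation.

```python
# Constraint strings copied from the two patterns in this report,
# one string per operand (0, 1, 2).
OLD = ["=Y2,x,x", " 0 ,0,0", " Y2,x,m"]
MERGED = ["=Y2,x,x,x,x", " 0 ,x,0,0,x", " Y2,x,x,m,m"]
MERGED_ISA = "noavx,avx,noavx,noavx,avx"


def split_alternatives(constraint):
    """Split one operand's constraint string into per-alternative entries.

    The '=' output modifier and the padding spaces are dropped so that
    the two patterns can be compared alternative by alternative.
    """
    return [alt.strip() for alt in constraint.strip().lstrip("=").split(",")]


def is_enabled(isa_tag, target_avx):
    """Assumed mapping from an "isa" tag to whether the alternative is enabled."""
    if isa_tag == "avx":
        return target_avx
    if isa_tag == "noavx":
        return not target_avx
    return True  # any other tag is treated as always enabled


def enabled_view(operands, isa, target_avx):
    """Return each operand's constraints restricted to enabled alternatives."""
    tags = isa.split(",")
    keep = [i for i, tag in enumerate(tags) if is_enabled(tag, target_avx)]
    return [[split_alternatives(op)[i] for i in keep] for op in operands]


old_view = [split_alternatives(op) for op in OLD]
sse_view = enabled_view(MERGED, MERGED_ISA, target_avx=False)
avx_view = enabled_view(MERGED, MERGED_ISA, target_avx=True)

print("original pattern:      ", old_view)
print("merged, non-AVX target:", sse_view)
print("merged, AVX target:    ", avx_view)
```

Under these assumptions the non-AVX view of the merged pattern is identical to the original pattern, which is why any difference in the IRA dump looks like the disabled alternatives still being taken into account somewhere.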