2016-04-27 22:58 GMT+03:00 Uros Bizjak <ubiz...@gmail.com>:
> Hello!
>
> This RFC patch illustrates the idea of using STV pass to load/store
> any TImode constant using SSE insns. The testcase:
>
> --cut here--
> __int128 x;
>
> __int128 test_1 (void)
> {
>   x = (__int128) 0x00112233;
> }
>
> __int128 test_2 (void)
> {
>   x = ((__int128) 0x0011223344556677 << 64);
> }
>
> __int128 test_3 (void)
> {
>   x = ((__int128) 0x0011223344556677 << 64) + (__int128) 0x0011223344556677;
> }
> --cut here--
>
> currently compiles (-O2) on x86_64 to:
>
> test_1:
>         movq    $1122867, x(%rip)
>         movq    $0, x+8(%rip)
>         ret
>
> test_2:
>         xorl    %eax, %eax
>         movabsq $4822678189205111, %rdx
>         movq    %rax, x(%rip)
>         movq    %rdx, x+8(%rip)
>         ret
>
> test_3:
>         movabsq $4822678189205111, %rax
>         movabsq $4822678189205111, %rdx
>         movq    %rax, x(%rip)
>         movq    %rdx, x+8(%rip)
>         ret
>
> However, using the attached patch, we compile all tests to:
>
> test:
>         movdqa  .LC0(%rip), %xmm0
>         movaps  %xmm0, x(%rip)
>         ret
>
> Ilya, HJ - do you think new sequences are better, or - as suggested by
> Jakub - they are beneficial with STV pass, as we are now able to load
> any immediate value? A variant of this patch can also be used to load
> DImode values to 32bit STV pass.
>
> Uros.

Hi,

Why don't we get two movq instructions in all three cases now? Is it
because of the late split?

I wouldn't say an SSE load+store is always better than two movq
instructions, but it clearly can enable bigger chains for STV, which is
good. I think you should adjust the cost model to handle immediates so
that the pass can make a proper decision. Here is what I have in my
draft for DImode immediates:

@@ -3114,6 +3123,20 @@ scalar_chain::build (bitmap candidates, unsigned insn_uid)
   BITMAP_FREE (queue);
 }

+/* Return a cost of building a vector constant
+   instead of using a scalar one.  */
+
+int
+scalar_chain::vector_const_cost (rtx exp)
+{
+  gcc_assert (CONST_INT_P (exp));
+
+  if (const0_operand (exp, GET_MODE (exp))
+      || constm1_operand (exp, GET_MODE (exp)))
+    return COSTS_N_INSNS (1);
+  return ix86_cost->sse_load[1];
+}
+
 /* Compute a gain for chain conversion.  */

 int
@@ -3145,11 +3168,25 @@ scalar_chain::compute_convert_gain ()
               || GET_CODE (src) == IOR
               || GET_CODE (src) == XOR
               || GET_CODE (src) == AND)
-       gain += ix86_cost->add;
+       {
+         gain += ix86_cost->add;
+         if (CONST_INT_P (XEXP (src, 0)))
+           gain -= scalar_chain::vector_const_cost (XEXP (src, 0));
+         if (CONST_INT_P (XEXP (src, 1)))
+           gain -= scalar_chain::vector_const_cost (XEXP (src, 1));
+       }
       else if (GET_CODE (src) == COMPARE)
        {
          /* Assume comparison cost is the same.  */
        }
+      else if (GET_CODE (src) == CONST_INT)
+       {
+         if (REG_P (dst))
+           gain += COSTS_N_INSNS (2);
+         else if (MEM_P (dst))
+           gain += 2 * ix86_cost->int_store[2] - ix86_cost->sse_store[1];
+         gain -= scalar_chain::vector_const_cost (src);
+       }
       else
        gcc_unreachable ();
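
To make the arithmetic of the CONST_INT branch easier to follow, here is a standalone sketch of the same gain computation outside of GCC. Everything in it is an assumption for illustration: the `toy_costs` struct, the numbers in `example_costs`, and the `dest_kind` enum are hypothetical stand-ins, not the real `ix86_cost` tables or RTL predicates. The `costs_n_insns` helper mirrors GCC's `COSTS_N_INSNS (N)` scaling of `N * 4` as I understand it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the few ix86_cost fields the draft uses.
struct toy_costs {
  int int_store;  // cost of one scalar integer store (one half of the value)
  int sse_store;  // cost of one SSE store of the whole value
  int sse_load;   // cost of one SSE load of the whole value
};

// Made-up numbers, chosen only so the arithmetic is easy to follow.
constexpr toy_costs example_costs{4, 4, 8};

// Assumed to match GCC's COSTS_N_INSNS (N) scaling.
constexpr int costs_n_insns(int n) { return n * 4; }

// Hypothetical replacement for the REG_P / MEM_P checks on the destination.
enum class dest_kind { reg, mem };

// Cost of materializing the constant in a vector register: all-zeros and
// all-ones are assumed to take a single cheap insn, anything else is
// loaded from the constant pool.
constexpr int vector_const_cost(std::int64_t value, const toy_costs &c) {
  return (value == 0 || value == -1) ? costs_n_insns(1) : c.sse_load;
}

// Gain of converting "dst = constant" from two scalar moves to one vector
// move, following the shape of the CONST_INT branch in the draft.
constexpr int const_move_gain(std::int64_t value, dest_kind dst,
                              const toy_costs &c) {
  int gain = 0;
  if (dst == dest_kind::reg)
    gain += costs_n_insns(2);                  // two scalar moves saved
  else
    gain += 2 * c.int_store - c.sse_store;     // two stores replaced by one
  gain -= vector_const_cost(value, c);         // pay for the vector constant
  return gain;
}

// Small self-check with the made-up costs above.
inline void check_examples() {
  assert(const_move_gain(0, dest_kind::mem, example_costs) == 0);
  assert(const_move_gain(0x00112233, dest_kind::mem, example_costs) == -4);
}
```

With these made-up numbers, storing a non-trivial immediate to memory comes out as a small loss on its own, so converting it would only pay off when the rest of the chain brings enough gain. That is the kind of decision the cost model is meant to make.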