On 2022/11/23 00:44, Xi Ruoyao wrote:
While I still can't fully understand the immediate load issue and how
this patch fixes it, I've tested this patch (alongside the prefetch
instruction patch) with bootstrap-ubsan. And the compiled result of
imm-load1.c seems OK.
And it's doing the correct thing for the Glibc "improved generic string
functions" patch, producing some really tight loops now.
While debugging, I found that hoisting the immediate load instructions
out of the loop is done by the loop2_invariant pass. One of the
conditions for hoisting is that the destination register must not be
assigned more than once, and the sequence before this patch looked like
this:
(insn 12 11 13 3 (set (reg:DI 90)
        (const_int 16842752 [0x1010000])) "test.c":13:12 discrim 1 131 {*movdi_64bit}
     (nil))
(insn 13 12 14 3 (set (reg:DI 91)
        (ior:DI (reg:DI 90)
            (const_int 257 [0x101]))) "test.c":13:12 discrim 1 88 {iordi3}
     (expr_list:REG_DEAD (reg:DI 90)
        (expr_list:REG_EQUAL (const_int 16843009 [0x1010101])
            (nil))))
(insn 14 13 15 3 (set (reg:DI 91)
        (ior:DI (zero_extend:DI (subreg:SI (reg:DI 91) 0))
            (const_int 282578783305728 [0x1010100000000]))) "test.c":13:12 discrim 1 150 {lu32i_d}
     (expr_list:REG_EQUAL (const_int 282578800148737 [0x1010101010101])
        (nil)))
(insn 15 14 17 3 (set (reg:DI 91)
        (ior:DI (and:DI (reg:DI 91)
                (const_int 4503599627370495 [0xfffffffffffff]))
            (const_int 72057594037927936 [0x100000000000000]))) "test.c":13:12 discrim 1 151 {lu52i_d}
     (expr_list:REG_EQUAL (const_int 72340172838076673 [0x101010101010101])
        (nil)))
Therefore, the last two instructions (insns 14 and 15, which both
reassign reg 91) do not meet the hoisting conditions. Because of how our
instructions are implemented, I instead deferred splitting the immediate
load until after loop2_invariant, which avoids this problem.
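
For reference, the arithmetic of the four insns in the dump above can be
replayed in plain C. This is only a sketch for checking the constants: the
function name build_constant_like_dump is made up for illustration and is
not part of GCC, and the variables r90 and r91 simply stand in for the
pseudo registers 90 and 91. Each step is checked against the REG_EQUAL
note of the corresponding insn.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: replays insns 12..15 from the RTL dump and
   returns the final value of reg 91.  */
static uint64_t
build_constant_like_dump (void)
{
  /* insn 12 (*movdi_64bit): reg 90 = 0x1010000.  */
  uint64_t r90 = UINT64_C (0x1010000);

  /* insn 13 (iordi3): reg 91 = reg 90 | 0x101.  First set of reg 91.  */
  uint64_t r91 = r90 | UINT64_C (0x101);
  assert (r91 == UINT64_C (0x1010101));

  /* insn 14 (lu32i_d): zero-extend the low 32 bits of reg 91 and OR in
     the next bits.  Second set of reg 91, and it also reads reg 91.  */
  r91 = (uint64_t) (uint32_t) r91 | UINT64_C (0x1010100000000);
  assert (r91 == UINT64_C (0x1010101010101));

  /* insn 15 (lu52i_d): keep the low 52 bits and OR in the top bits.
     Third set of reg 91.  */
  r91 = (r91 & UINT64_C (0xfffffffffffff)) | UINT64_C (0x100000000000000);
  assert (r91 == UINT64_C (0x101010101010101));

  return r91;
}
```

The point of the sketch is that r91 is written three times, and the last
two writes also read its previous value, which matches the reason given
above for insns 14 and 15 not being hoisted.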