https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87104
--- Comment #12 from pipcet at gmail dot com ---
(In reply to pipcet from comment #11)
> (insn 7 6 8 2 (set (reg:CCZ 17 flags)
> (compare:CCZ (and:DI (not:DI (reg/v:DI 86 [ i ]))
> (const_int 12 [0xc]))
> (const_int 0 [0]))) "h17.c":4 15 {*cmpdi_1}
> (expr_list:REG_DEAD (reg:DI 88)
>
> Surely we should be dealing with a canonical form instead? Who's
> generating this non-canonical expression, and why?
simplify-rtx.c, it turns out, because it "canonicalizes" (x & y) = y to (~x &
y) = 0. I think that's strange, but we can work around it.
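As a quick sanity check on that rewrite, here is a minimal C sketch (not GCC code; the function names are made up for illustration) that compares the two forms, written with C's ==, over all 8-bit operand pairs. Both forms are purely bitwise, so the 8-bit result carries over to wider modes such as DI.

```c
#include <assert.h>

/* Original form: every bit set in y is also set in x. */
static int subset_cmp(unsigned x, unsigned y)
{
    return (x & y) == y;
}

/* Form produced by the rewrite: no bit of y is missing from x. */
static int subset_canon(unsigned x, unsigned y)
{
    return (~x & y) == 0;
}

/* Count the 8-bit (x, y) pairs on which the two forms disagree. */
static unsigned count_mismatches(void)
{
    unsigned n = 0;
    for (unsigned x = 0; x < 256; x++)
        for (unsigned y = 0; y < 256; y++)
            if (subset_cmp(x, y) != subset_canon(x, y))
                n++;
    return n;
}

/* Hypothetical helper: aborts if the two forms ever differ. */
static void check_equivalence(void)
{
    assert(count_mismatches() == 0);
}
```

Calling check_equivalence() from a test program is enough to confirm that the two comparisons always agree, so the rewrite changes only the shape of the RTL, not its meaning.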
I'm testing these three approaches head-to-head:
1. canonicalize to (x-y) & z = 0
2. don't canonicalize, but add a define_insn_and_split
3. original gcc
I'm compiling trunk Emacs with Paul's patch reverted, then
running "perf stat ./src/temacs --batch" in a loop and producing a histogram
of the cycles needed. It seems (1) and (2) beat (3) quite significantly (1.1%),
while (1) very narrowly beats (2) (< 0.1%). Both figures compare median cycle
counts, but the histograms look like the same curve shifted a little, so I'm
prepared to say it's a consistent effect.
The code looks good, and the slight difference between (1) and (2) makes sense,
because (2) generates:
        leal    -5(%rdi), %esi
        movq    %rdi, %rax
        andl    $7, %esi
        je      .L129
        ret
        .p2align 4,,10
        .p2align 3
.L129:
        movslq  suspicious_object_index(%rip), %rsi
        movl    $0, %ecx
whereas (1) realizes %rsi is zero at this point and skips the movl. (Looking at
this code, I do not understand why movl is used rather than the standard xorl,
though, so maybe this is another optimization opportunity.)
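For reference, the leal/andl/je sequence in the listing computes ((x - 5) & 7) == 0. The sketch below assumes that 5 and 7 play the role of a type tag and a low-bit tag mask (my reading of the constants, not something the listing states) and checks that the subtract-then-mask test agrees with the more obvious mask-then-compare test. TAG, TAG_MASK and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants, read off the leal -5 / andl $7 pair in the listing. */
enum { TAG = 5, TAG_MASK = 7 };

/* Mask the low bits, then compare against the tag. */
static int has_tag_masked(uint64_t x)
{
    return (x & TAG_MASK) == TAG;
}

/* Subtract the tag first, then test that the low bits are all zero.
   Unsigned wraparound for x < TAG is well defined and harmless here. */
static int has_tag_subtracted(uint64_t x)
{
    return ((x - TAG) & TAG_MASK) == 0;
}

/* Count disagreements over a range covering every low-bit pattern
   many times over. */
static unsigned count_tag_mismatches(void)
{
    unsigned n = 0;
    for (uint64_t x = 0; x < 1024; x++)
        if (has_tag_masked(x) != has_tag_subtracted(x))
            n++;
    return n;
}
```

The two tests agree because the mask covers the low three bits: x - 5 is a multiple of 8 exactly when x is congruent to 5 modulo 8. The same argument does not automatically apply to masks that are not a contiguous run of low bits.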
So I think the performance difference is really significant for Emacs; my plan
is to test all three versions on other programs, make sure the code works for C
bitfields, and then submit it for inclusion. Is that okay?