Hi all,
On 13/01/16 01:40, Jim Wilson wrote:
On Tue, Jan 12, 2016 at 5:10 PM, Kugan
<kugan.vivekanandara...@linaro.org> wrote:
Yes, making PROMOTE_MODE work the same way as
promote_function_mode in arm will fix this. Can you please point me to
the test cases that are regressing so that I can also start looking at them?
The info is in here
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65932
See the comments on gcc.target/arm/wmul-[123].c which no longer
generate smulbb etc instructions, which are 16x16=32 expanding
multiplies which are faster on some older parts that have them. They
are present in armv5e and higher architecture versions.
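
For readers who do not have the tests to hand, here is a minimal C sketch of
the kind of source pattern these instructions target. It is not the actual
content of wmul-[123].c; the function names are hypothetical and chosen only
for illustration. Both operands are 16 bits wide and the product is 32 bits
wide, which is the shape a compiler may map to smulbb, or to smlabb when the
product is accumulated.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 16x16=32 widening multiply.
   A 16-bit by 16-bit signed product always fits in 32 bits. */
int32_t widening_mul(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* Hypothetical helper: the same product added to an accumulator,
   the multiply-accumulate shape (smlabb candidate).
   Assumes the caller keeps acc small enough that the sum does not overflow. */
int32_t widening_mla(int32_t acc, int16_t a, int16_t b)
{
    return acc + (int32_t)a * (int32_t)b;
}

/* Small self-check on values that are easy to verify by hand. */
void check_widening(void)
{
    assert(widening_mul(-300, 200) == -60000);
    assert(widening_mla(10, -300, 200) == -59990);
}
```

Whether a given compiler version actually emits smulbb or smlabb for this
depends on the target options and on the costing issues discussed in this
thread.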
Kyrylo looked at this in November, but the situation looks even worse
now, as some of the redundant sign extends are gone even before the
first rtl pass. That may make it harder to get the smulbb
instructions back.
I did some more investigation on this yesterday.
The situation is actually not so bad.
The sign-extends are being removed from the arms of the multiply
(turning it into an mla rather than smlabb) in the RTL cse pass.
Something's going off in the costing logic in CSE because it generates
a simpler RTX (a multiply-add versus a multiply-add-extend) but one which
is more expensive. I think I've found the bug in there and I hope to post
a separate thread on that soon.
With the fix to CSE and a couple of arm backend rtx costing issues I can get
wmul-1.c and wmul-2.c to pass even with the change to promote_mode.
For wmul-3.c I get the sequence:
ldrsh r1, [ip, #2]!
ldrsh r4, [r0, #2]!
cmp r5, ip
mls r2, r1, r1, r2
mls lr, r1, r4, lr
instead of the previous:
ldrh ip, [lr, #2]!
ldrh r1, [r0, #2]!
cmp r6, lr
smulbb r5, ip, ip
smulbb r1, ip, r1
sub r2, r2, r5
sub r4, r4, r1
That is, two instructions shorter (no subs) but using the more expensive mls
instruction rather than smulbb. Is the new sequence preferable?
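
To make the comparison concrete, here is a rough C model of why the two
sequences above compute the same value. It is only a sketch under stated
assumptions: the function names are hypothetical, registers are modelled as
plain integers, and the accumulator is assumed not to overflow. The old
sequence zero-extends on load (ldrh) and lets smulbb sign-extend the bottom
halfwords; the new one sign-extends on load (ldrsh) and uses a full-width
multiply-subtract (mls).

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the bottom 16 bits of a 32-bit register value. */
int32_t sext16(uint32_t reg)
{
    return (int32_t)((reg & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

/* Model of the old sequence: ldrh, ldrh, smulbb, sub. */
int32_t old_sequence(int32_t acc, uint16_t mem_a, uint16_t mem_b)
{
    uint32_t ra = mem_a;                     /* ldrh: zero-extending load */
    uint32_t rb = mem_b;
    int32_t prod = sext16(ra) * sext16(rb);  /* smulbb: signed 16x16=32 */
    return acc - prod;                       /* sub */
}

/* Model of the new sequence: ldrsh, ldrsh, mls. */
int32_t new_sequence(int32_t acc, uint16_t mem_a, uint16_t mem_b)
{
    int32_t ra = sext16(mem_a);              /* ldrsh: sign-extending load */
    int32_t rb = sext16(mem_b);
    return acc - ra * rb;                    /* mls: acc - ra * rb */
}

/* Compare the two models on a few halfword values, including extremes. */
void check_equivalence(void)
{
    static const uint16_t samples[] = { 0u, 1u, 3u, 0x7FFFu, 0x8000u, 0xFFFEu };
    const int n = (int)(sizeof samples / sizeof samples[0]);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            assert(old_sequence(1000, samples[i], samples[j])
                   == new_sequence(1000, samples[i], samples[j]));
}
```

So the results match; the open question is purely one of cost, i.e. whether
saving the two subs outweighs using mls instead of smulbb on the cores we
care about.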
I hope to post the backend cost bug fixes soon and an RFC for the cse issue.
Thanks,
Kyrill
Jim