On Thu, Jan 21, 2021 at 2:25 AM Levy <ad...@levyhsu.com> wrote: > Added implementation for builtin overflow detection, new patterns are > listed below. >
For rv32 SImode and rv64 DImode, the unsigned add/sub and signed/unsigned mul patterns seem to give the same result as the default code generation. That has me wondering if we really need the patterns. For rv64 SImode, the signed add/sub patterns are generating worse code. Signed add overflow without the pattern is add a1,a0,a1 sext.w a0,a1 bne a1,a0,.L23 and with the pattern is mv a5,a0 addw a0,a0,a1 slt a5,a0,a5 slti a1,a1,0 bne a1,a5,.L27 ignoring the register allocation issue we have one more instruction using the pattern which is undesirable. This needs a fix. We could just use X instead of GPR. We could then optionally add rv64 SImode patterns. If we don't add them it looks like a bug, so I think we should add another pattern for this. For rv64 SImode, the unsigned add/sub patterns are generating better code. We have unnecessary zero extends without the patterns. This suggests that the unsigned add/sub patterns really are useful. Only 1 of the 3 expansions is useful, but if we only implement one of the 3 expansions that looks like a bug so I think this is OK as is. The signed/unsigned mul patterns only support rv32 SImode and rv64 DImode. The rv64 SImode support is missing. But even though there is no pattern for it we can still get worse code for it. An unpatched tree for signed mul gives mul a1,a0,a1 sext.w a0,a1 bne a1,a0,.L37 and a patched tree gives mul a0,a0,a1 sraiw a4,a0,31 srai a5,a0,32 bne a4,a5,.L43 On the other hand, an unpatched tree for unsigned mul gives slli a0,a0,32 slli a1,a1,32 srli a0,a0,32 srli a1,a1,32 mul a5,a0,a1 slli a4,a5,32 srli a4,a4,32 bne a5,a4,.L61 and a patched tree gives slli a0,a0,32 slli a1,a1,32 srli a0,a0,32 srli a1,a1,32 mul a0,a0,a1 srai a5,a0,32 bne a5,zero,.L69 which is one instruction shorter. In both cases, the difference is due to the hook to set min precision to 32. Presumably we need this hook for add/sub, so we need to add the missing rv64 SImode signed mul pattern. Since we have to have one of them, we may as well have all of them for completeness. There are a few minor style issues. In the define_expands, the patterns aren't formatted properly. An open parenthesis should be aligned to an open paren at the same depth on the line above. If you are an emacs user, you can use M-x emacs-lisp-mode and hit tab to align them. In a define_expand the constraints aren't useful and shouldn't be included. In a define_expand that always emits its own RTL the pattern isn't useful either. Only the operand modes, numbers, and predicates are useful. This is why some define_expands only list the operands and don't include the pattern. But including the pattern is OK, it just isn't required. So the only real fix we need here is to drop the constraints. I attached the testcase I used for testing purposes. Every function should be same size or smaller with the patch. Jim
#include <stdlib.h> int sub_add_p (int i, int j) { int k; return __builtin_add_overflow_p (i, j, k); } int sub_sub_p (int i, int j) { int k; return __builtin_sub_overflow_p (i, j, k); } int sub_mul_p (int i, int j) { int k; return __builtin_mul_overflow_p (i, j, k); } long sub_add_p_long (long i, long j) { long k; return __builtin_add_overflow_p (i, j, k); } long sub_sub_p_long (long i, long j) { long k; return __builtin_sub_overflow_p (i, j, k); } long sub_mul_p_long (long i, long j) { long k; return __builtin_mul_overflow_p (i, j, k); } int sub_add (int i, int j) { int k; if (__builtin_sadd_overflow (i, j, &k)) abort (); return k; } int sub_sub (int i, int j) { int k; if (__builtin_ssub_overflow (i, j, &k)) abort (); return k; } int sub_mul (int i, int j) { int k; if (__builtin_smul_overflow (i, j, &k)) abort (); return k; } int sub_uadd (int i, int j) { int k; if (__builtin_uadd_overflow (i, j, &k)) abort (); return k; } int sub_usub (int i, int j) { int k; if (__builtin_usub_overflow (i, j, &k)) abort (); return k; } int sub_umul (int i, int j) { int k; if (__builtin_umul_overflow (i, j, &k)) abort (); return k; } long sub_add_long (long i, long j) { long k; if (__builtin_saddl_overflow (i, j, &k)) abort (); return k; } long sub_sub_long (long i, long j) { long k; if (__builtin_ssubl_overflow (i, j, &k)) abort (); return k; } long sub_mul_long (long i, long j) { long k; if (__builtin_smull_overflow (i, j, &k)) abort (); return k; } long sub_uadd_long (long i, long j) { long k; if (__builtin_uaddl_overflow (i, j, &k)) abort (); return k; } long sub_usub_long (long i, long j) { long k; if (__builtin_usubl_overflow (i, j, &k)) abort (); return k; } long sub_umul_long (long i, long j) { long k; if (__builtin_umull_overflow (i, j, &k)) abort (); return k; }