Hi! The following testcase ICEs, because during try_combine of i3: (insn 18 17 19 2 (parallel [ (set (reg:CCO 17 flags) (eq:CCO (plus:OI (sign_extend:OI (reg:TI 96)) (const_int 1 [0x1])) (sign_extend:OI (plus:TI (reg:TI 96) (const_int 1 [0x1]))))) (set (reg:TI 98) (plus:TI (reg:TI 96) (const_int 1 [0x1]))) ]) "pr93376.c":8:10 223 {*addvti4_doubleword_1} (expr_list:REG_UNUSED (reg:TI 98) (expr_list:REG_DEAD (reg:TI 96) (nil)))) and i2: (insn 17 37 18 2 (set (reg:TI 96) (const_wide_int 0x7fffffffffffffffffffffffffffffff)) "pr93376.c":8:10 65 {*movti_internal} (nil)) the eq in there gets simplified into: (eq:CCO (const_wide_int 0x080000000000000000000000000000000) (const_wide_int 0x80000000000000000000000000000000)) and simplify-rtx.c tries to simplify it by simplifying MINUS of the two operands. Now, i386 defines MAX_BITSIZE_MODE_ANY_INT to 128, because OImode and XImode are used mainly as a placeholder for the vector modes; these new signed overflow patterns are an exception to that, but what they really need is just TImode precision + 1 (maybe 2 worst case) bits at any time.
wide-int.h defines WIDE_INT_MAX_ELTS in a way that it contains one more HWI above number of HWIs to cover WIDE_INT_MAX_ELTS, so on i386 that is 3 HWIs, meaning that TImode precision + 1/2 bits is still representable in there. Unfortunately, the way wi::sub_large is implemented, it needs not just those 3 HWIs, but one HWI above the maximum of the lengths of both operands, which means it buffer overflows, overwrites the following precision in wide_int_storage and ICEs later on. The need for 4 HWIs is only temporary, because canonize immediately after it canonicalizes it back to 3 HWIs only. Attached are two patches, the first one is admittedly a hack, but could be considered an optimization too, simply before overwriting the last HWI ({add,sub}_large seem to be the only one that have such len++) perform a cheap check if canonize won't optimize it away immediately and in such case don't store it. Yes, we are playing with fire because full OImode is 256 bits and we can't store that many, but we in the end only need 129 bits. The other patch is something suggested by Richard S., avoid using OImode for this and instead use a partial int mode that is smaller. This is still playing with fire because even the partial int mode is larger than MAX_BITSIZE_MODE_ANY_INT, but at least its precision fits into those 3 WIDE_INT_MAX_ELTS wide-int.h uses. The disadvantage of that approach is that it is more costly at compile time, various RA data structures contains number_of_modes^2 sized arrays, and RA initialization will want to compute at runtime various properties of each of the modes. I've tried to think about other ways how to represent signed overflow check in RTL, but didn't find any that would actually properly describe it and not be way too complicated for the patterns. So, my preference is the first patch, but Richard S. doesn't like that much. Both patches have been bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk (which one)? Jakub
2020-01-23 Jakub Jelinek <ja...@redhat.com> PR target/93376 * wide-int.cc (wi::add_large, wi::sub_large): Don't extend len by writing SIGN_MASK of the highest element that canonize would remove again. * gcc.dg/pr93376.c: New test. --- gcc/wide-int.cc.jj 2020-01-12 11:54:38.535381394 +0100 +++ gcc/wide-int.cc 2020-01-22 12:05:06.739486117 +0100 @@ -1155,8 +1155,14 @@ wi::add_large (HOST_WIDE_INT *val, const if (len * HOST_BITS_PER_WIDE_INT < prec) { - val[len] = mask0 + mask1 + carry; - len++; + HOST_WIDE_INT val_top = mask0 + mask1 + carry; + /* Don't extend len unnecessarily when canonize would shrink it + again immediately. */ + if (SIGN_MASK (val[len - 1]) != val_top) + { + val[len] = val_top; + len++; + } if (overflow) *overflow = (sgn == UNSIGNED && carry) ? wi::OVF_OVERFLOW : wi::OVF_NONE; @@ -1566,8 +1572,14 @@ wi::sub_large (HOST_WIDE_INT *val, const if (len * HOST_BITS_PER_WIDE_INT < prec) { - val[len] = mask0 - mask1 - borrow; - len++; + HOST_WIDE_INT val_top = mask0 - mask1 - borrow; + /* Don't extend len unnecessarily when canonize would shrink it + again immediately. */ + if (SIGN_MASK (val[len - 1]) != val_top) + { + val[len] = val_top; + len++; + } if (overflow) *overflow = (sgn == UNSIGNED && borrow) ? OVF_UNDERFLOW : OVF_NONE; } --- gcc/testsuite/gcc.dg/pr93376.c.jj 2020-01-22 12:16:40.231937796 +0100 +++ gcc/testsuite/gcc.dg/pr93376.c 2020-01-22 12:16:23.953190057 +0100 @@ -0,0 +1,20 @@ +/* PR target/93376 */ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-Og -finline-functions-called-once" } */ + +unsigned a, b, c; + +int +bar (int x) +{ + short s = __builtin_sub_overflow (~x, 0, &b); + a = __builtin_ffsll (~x); + return __builtin_add_overflow_p (-(unsigned __int128) a, s, + (unsigned __int128) 0); +} + +void +foo (void) +{ + c = bar (0); +}
2020-01-23 Jakub Jelinek <ja...@redhat.com> PR target/93376 * config/i386/i386-modes.def (POImode): New mode. * config/i386/i386.md (DPWI): New mode attribute. (addv<mode>4, subv<mode>4): Use <DPWI> instead of <DWI>. (QWI): Rename to... (QPWI): ... this. Use POI instead of OI for TImode. (*addv<dwi>4_doubleword, *addv<dwi>4_doubleword_1, *subv<dwi>4_doubleword, *subv<dwi>4_doubleword_1): Use <QPWI> instead of <QWI>. * gcc.dg/pr93376.c: New test. --- gcc/config/i386/i386-modes.def.jj 2020-01-12 11:54:36.323414766 +0100 +++ gcc/config/i386/i386-modes.def 2020-01-22 14:10:44.781367495 +0100 @@ -107,6 +107,13 @@ INT_MODE (XI, 64); PARTIAL_INT_MODE (HI, 16, P2QI); PARTIAL_INT_MODE (SI, 32, P2HI); +/* Mode used for signed overflow checking of TImode. As + MAX_BITSIZE_MODE_ANY_INT is only 128, wide-int.h reserves only that + plus HOST_BITS_PER_WIDE_INT bits in wide_int etc., so OImode is too + large. For the overflow checking we actually need just 1 or 2 bits + beyond TImode precision. */ +PARTIAL_INT_MODE (OI, 192, POI); + /* Keep the OI and XI modes from confusing the compiler into thinking that these modes could actually be used for computation. They are only holders for vectors during data movement. */ --- gcc/config/i386/i386.md.jj 2020-01-22 14:03:52.597593584 +0100 +++ gcc/config/i386/i386.md 2020-01-22 14:31:22.661691613 +0100 @@ -6054,15 +6054,18 @@ (define_insn "*addqi_ext_2" [(set_attr "type" "alu") (set_attr "mode" "QI")]) +;; Like DWI, but use POImode instead of OImode. +(define_mode_attr DPWI [(QI "HI") (HI "SI") (SI "DI") (DI "TI") (TI "POI")]) + ;; Add with jump on overflow. (define_expand "addv<mode>4" [(parallel [(set (reg:CCO FLAGS_REG) (eq:CCO - (plus:<DWI> - (sign_extend:<DWI> + (plus:<DPWI> + (sign_extend:<DPWI> (match_operand:SWIDWI 1 "nonimmediate_operand")) (match_dup 4)) - (sign_extend:<DWI> + (sign_extend:<DPWI> (plus:SWIDWI (match_dup 1) (match_operand:SWIDWI 2 "<general_hilo_operand>"))))) @@ -6078,7 +6081,7 @@ (define_expand "addv<mode>4" if (CONST_SCALAR_INT_P (operands[2])) operands[4] = operands[2]; else - operands[4] = gen_rtx_SIGN_EXTEND (<DWI>mode, operands[2]); + operands[4] = gen_rtx_SIGN_EXTEND (<DPWI>mode, operands[2]); }) (define_insn "*addv<mode>4" @@ -6123,17 +6126,17 @@ (define_insn "addv<mode>4_1" (const_string "<MODE_SIZE>")))]) ;; Quad word integer modes as mode attribute. -(define_mode_attr QWI [(SI "TI") (DI "OI")]) +(define_mode_attr QPWI [(SI "TI") (DI "POI")]) (define_insn_and_split "*addv<dwi>4_doubleword" [(set (reg:CCO FLAGS_REG) (eq:CCO - (plus:<QWI> - (sign_extend:<QWI> + (plus:<QPWI> + (sign_extend:<QPWI> (match_operand:<DWI> 1 "nonimmediate_operand" "%0,0")) - (sign_extend:<QWI> + (sign_extend:<QPWI> (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "r<di>,o"))) - (sign_extend:<QWI> + (sign_extend:<QPWI> (plus:<DWI> (match_dup 1) (match_dup 2))))) (set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r") (plus:<DWI> (match_dup 1) (match_dup 2)))] @@ -6172,11 +6175,11 @@ (define_insn_and_split "*addv<dwi>4_doub (define_insn_and_split "*addv<dwi>4_doubleword_1" [(set (reg:CCO FLAGS_REG) (eq:CCO - (plus:<QWI> - (sign_extend:<QWI> + (plus:<QPWI> + (sign_extend:<QPWI> (match_operand:<DWI> 1 "nonimmediate_operand" "%0")) - (match_operand:<QWI> 3 "const_scalar_int_operand" "")) - (sign_extend:<QWI> + (match_operand:<QPWI> 3 "const_scalar_int_operand" "")) + (sign_extend:<QPWI> (plus:<DWI> (match_dup 1) (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "<di>"))))) @@ -6570,11 +6573,11 @@ (define_insn "*subsi_2_zext" (define_expand "subv<mode>4" [(parallel [(set (reg:CCO FLAGS_REG) (eq:CCO - (minus:<DWI> - (sign_extend:<DWI> + (minus:<DPWI> + (sign_extend:<DPWI> (match_operand:SWIDWI 1 "nonimmediate_operand")) (match_dup 4)) - (sign_extend:<DWI> + (sign_extend:<DPWI> (minus:SWIDWI (match_dup 1) (match_operand:SWIDWI 2 "<general_hilo_operand>"))))) @@ -6590,7 +6593,7 @@ (define_expand "subv<mode>4" if (CONST_SCALAR_INT_P (operands[2])) operands[4] = operands[2]; else - operands[4] = gen_rtx_SIGN_EXTEND (<DWI>mode, operands[2]); + operands[4] = gen_rtx_SIGN_EXTEND (<DPWI>mode, operands[2]); }) (define_insn "*subv<mode>4" @@ -6637,12 +6640,12 @@ (define_insn "subv<mode>4_1" (define_insn_and_split "*subv<dwi>4_doubleword" [(set (reg:CCO FLAGS_REG) (eq:CCO - (minus:<QWI> - (sign_extend:<QWI> + (minus:<QPWI> + (sign_extend:<QPWI> (match_operand:<DWI> 1 "nonimmediate_operand" "0,0")) - (sign_extend:<QWI> + (sign_extend:<QPWI> (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "r<di>,o"))) - (sign_extend:<QWI> + (sign_extend:<QPWI> (minus:<DWI> (match_dup 1) (match_dup 2))))) (set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r") (minus:<DWI> (match_dup 1) (match_dup 2)))] @@ -6679,11 +6682,11 @@ (define_insn_and_split "*subv<dwi>4_doub (define_insn_and_split "*subv<dwi>4_doubleword_1" [(set (reg:CCO FLAGS_REG) (eq:CCO - (minus:<QWI> - (sign_extend:<QWI> + (minus:<QPWI> + (sign_extend:<QPWI> (match_operand:<DWI> 1 "nonimmediate_operand" "0")) - (match_operand:<QWI> 3 "const_scalar_int_operand" "")) - (sign_extend:<QWI> + (match_operand:<QPWI> 3 "const_scalar_int_operand" "")) + (sign_extend:<QPWI> (minus:<DWI> (match_dup 1) (match_operand:<DWI> 2 "x86_64_hilo_general_operand" "<di>"))))) --- gcc/testsuite/gcc.dg/pr93376.c.jj 2020-01-22 14:04:57.955604949 +0100 +++ gcc/testsuite/gcc.dg/pr93376.c 2020-01-22 14:04:57.955604949 +0100 @@ -0,0 +1,20 @@ +/* PR target/93376 */ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-Og -finline-functions-called-once" } */ + +unsigned a, b, c; + +int +bar (int x) +{ + short s = __builtin_sub_overflow (~x, 0, &b); + a = __builtin_ffsll (~x); + return __builtin_add_overflow_p (-(unsigned __int128) a, s, + (unsigned __int128) 0); +} + +void +foo (void) +{ + c = bar (0); +}