Fixed bit-rot on the wide-int branch.
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	(revision 201884)
+++ gcc/optabs.c	(working copy)
@@ -867,7 +867,8 @@ expand_subword_shift (enum machine_mode outof_input,
 				      const1_rtx, 0, unsignedp, methods);
       if (shift_mask == BITS_PER_WORD - 1)
 	{
-	  tmp = immed_wide_int_const (wide_int::minus_one (op1_mode), op1_mode);
+	  tmp = immed_wide_int_const
+	    (wide_int::minus_one (GET_MODE_PRECISION (op1_mode)), op1_mode);
 	  tmp = simplify_expand_binop (op1_mode, xor_optab, op1, tmp,
 				       0, true, methods);
 	}
Index: gcc/recog.c
===================================================================
--- gcc/recog.c	(revision 201884)
+++ gcc/recog.c	(working copy)
@@ -1187,8 +1187,7 @@ const_scalar_int_operand (rtx op, enum m
 	  /* Multiword partial int.  */
 	  HOST_WIDE_INT x
 	    = CONST_WIDE_INT_ELT (op, CONST_WIDE_INT_NUNITS (op) - 1);
-	  return (wide_int::sext (x, prec & (HOST_BITS_PER_WIDE_INT - 1))
-		  == x);
+	  return (sext_hwi (x, prec & (HOST_BITS_PER_WIDE_INT - 1)) == x);
 	}
     }
   return 1;
patch for wide-int branch
Cleaned-up version of convert_modes that handles all constants in a uniform manner.  This is clean on x86-64.  Will test on other platforms tomorrow.

kenny

Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 201884)
+++ gcc/expr.c	(working copy)
@@ -710,64 +710,32 @@ convert_modes (enum machine_mode mode, e
   if (mode == oldmode)
     return x;
 
-  /* There is one case that we must handle specially: If we are
-     converting a CONST_INT into a mode whose size is larger than
-     HOST_BITS_PER_WIDE_INT and we are to interpret the constant as
-     unsigned, gen_lowpart will do the wrong if the constant appears
-     negative.  What we want to do is make the high-order word of the
-     constant zero, not all ones.  */
-
-  if (unsignedp && GET_MODE_CLASS (mode) == MODE_INT
-      && GET_MODE_BITSIZE (mode) > HOST_BITS_PER_WIDE_INT
-      && CONST_INT_P (x) && INTVAL (x) < 0)
+  if (CONST_SCALAR_INT_P (x)
+      && GET_MODE_CLASS (mode) == MODE_INT
+      && (oldmode == VOIDmode || GET_MODE_CLASS (oldmode) == MODE_INT))
     {
-      wide_int val = std::make_pair (x, mode);
-      /* We need to zero extend VAL.  */
-      if (oldmode != VOIDmode)
-	val = val.zext (GET_MODE_PRECISION (oldmode));
-
-      return immed_wide_int_const (val, mode);
+      wide_int w = std::make_pair (x, mode);
+      /* If the caller did not tell us the old mode, then there is
+	 not much to do with respect to canonization.  */
+      if (oldmode != VOIDmode
+	  && GET_MODE_PRECISION (mode) > GET_MODE_PRECISION (oldmode))
+	w = w.ext (GET_MODE_PRECISION (oldmode), unsignedp ? UNSIGNED : SIGNED);
+      return immed_wide_int_const (w, mode);
     }
 
   /* We can do this with a gen_lowpart if both desired and current modes
      are integer, and this is either a constant integer, a register, or a
-     non-volatile MEM.  Except for the constant case where MODE is no
-     wider than HOST_BITS_PER_WIDE_INT, we must be narrowing the operand.  */
+     non-volatile MEM.  */
+  if (GET_MODE_CLASS (mode) == MODE_INT
+      && GET_MODE_CLASS (oldmode) == MODE_INT
+      && GET_MODE_PRECISION (mode) <= GET_MODE_PRECISION (oldmode)
+      && ((MEM_P (x) && !MEM_VOLATILE_P (x) && direct_load[(int) mode])
+	  || (REG_P (x)
+	      && (!HARD_REGISTER_P (x)
+		  || HARD_REGNO_MODE_OK (REGNO (x), mode))
+	      && TRULY_NOOP_TRUNCATION_MODES_P (mode, GET_MODE (x)))))
-  if ((CONST_INT_P (x)
-       && GET_MODE_PRECISION (mode) <= HOST_BITS_PER_WIDE_INT)
-      || (GET_MODE_CLASS (mode) == MODE_INT
-	  && GET_MODE_CLASS (oldmode) == MODE_INT
-	  && (CONST_SCALAR_INT_P (x)
-	      || (GET_MODE_PRECISION (mode) <= GET_MODE_PRECISION (oldmode)
-		  && ((MEM_P (x) && ! MEM_VOLATILE_P (x)
-		       && direct_load[(int) mode])
-		      || (REG_P (x)
-			  && (! HARD_REGISTER_P (x)
-			      || HARD_REGNO_MODE_OK (REGNO (x), mode))
-			  && TRULY_NOOP_TRUNCATION_MODES_P (mode,
-							    GET_MODE (x)))))))
-    {
-      /* ?? If we don't know OLDMODE, we have to assume here that
-	 X does not need sign- or zero-extension.  This may not be
-	 the case, but it's the best we can do.  */
-      if (CONST_INT_P (x) && oldmode != VOIDmode
-	  && GET_MODE_PRECISION (mode) > GET_MODE_PRECISION (oldmode))
-	{
-	  HOST_WIDE_INT val = INTVAL (x);
-
-	  /* We must sign or zero-extend in this case.  Start by
-	     zero-extending, then sign extend if we need to.  */
-	  val &= GET_MODE_MASK (oldmode);
-	  if (! unsignedp
	      && val_signbit_known_set_p (oldmode, val))
-	    val |= ~GET_MODE_MASK (oldmode);
-
-	  return gen_int_mode (val, mode);
-	}
-
-      return gen_lowpart (mode, x);
-    }
+    return gen_lowpart (mode, x);
 
   /* Converting from integer constant into mode is always equivalent to an
      subreg operation.  */
Re: wide-int branch now up for public comment and review
On 08/23/2013 11:02 AM, Richard Sandiford wrote:

Hi Kenny, this is the first time I've looked at the implementation of wide-int.h (rather than just looking at the rtl changes, which as you know I like in general), so FWIW here are some comments on wide-int.h.  I expect a lot of them overlap with Richard B.'s comments.  I also expect many of them are going to be annoying, sorry, but this first one definitely will.  The coding conventions say that functions should be defined outside the class: http://gcc.gnu.org/codingconventions.html and that opening braces should be on their own line, so most of the file needs to be reformatted.  I went through and made that change with the patch below, in the process of reading through.  I also removed "SGN must be SIGNED or UNSIGNED." because it seemed redundant when those are the only two values available.  The patch fixes a few other coding standard problems and typos, but I've not made any actual code changes (or at least, I didn't mean to).

I had started the file with the functions outside of the class, and Mike had started putting them in the class, so I went with putting them in the class, since many of them were one-liners and having them out of line just doubled the size of everything.  However, we did not look at the coding conventions, and that really settles the argument.  Thanks for doing this.

Does it look OK to install?

You can check it in.

I'm still unsure about these "infinite" precision types, but I understand the motivation and I have no objections.  However:

* Code that does widening conversions.  The canonical way that this is performed is to sign or zero extend the input value to the max width based on the sign of the type of the source and then to truncate that value to the target type.  This is in preference to using the sign of the target type to extend the value directly (which gets the wrong value for the conversion of large unsigned numbers to larger signed types).

I don't understand this particular reason.
Using the sign of the source type is obviously right, but why does that mean we need "infinite" precision, rather than just doubling the precision of the source?

In a sense, "infinite" does not really mean infinite; it really just means large enough so that you never lose any information from the top.  For widening, all that "infinite" really needs to be is one bit larger than the destination type.  We could have had an ABI where you specified the precision of every operation based on the precisions of the inputs.  Instead, for these kinds of operations, we decided to sniff the port and determine a fixed width that was large enough for everything the port needed.  We call that number infinite.  This sort of follows the convention that double-int used, where infinite was 128 bits, but with our design/implementation we (hopefully) have no bugs because the sizes of the datatypes never run into implementation limits.

* When a constant that has an integer type is converted to a wide-int it comes in with precision 0.  For these constants the top bit does accurately reflect the sign of that constant; this is an exception to the normal rule that the signedness is not represented.  When used in a binary operation, the wide-int implementation properly extends these constants so that they properly match the other operand of the computation.  This allows you to write:

  tree t = ...
  wide_int x = t + 6;

assuming t is an int_cst.

This seems dangerous.  Not all code that uses "unsigned HOST_WIDE_INT" actually wants it to be an unsigned value.  Some code uses it to avoid the undefinedness of signed overflow.  So these overloads could lead to us accidentally zero-extending what's conceptually a signed value without any obvious indication that that's happening.  Also, hex constants are unsigned int, but it doesn't seem safe to assume that 0x8000 was meant to be zero-extended.
I realise the same thing can happen if you mix "unsigned int" with HOST_WIDE_INT, but the point is that you shouldn't really do that in general, whereas we're defining these overloads precisely so that a mixture can be used.  I'd prefer some explicit indication of the sign, at least for anything other than plain "int" (so that the compiler will complain about uses of "unsigned int" and above).

There is a part of me that finds this scary and a part of me that feels that the concern is largely theoretical.  It does make it much easier to read and understand the code to be able to write "t + 6" rather than "wide_int (t) + wide_int::from_uhwi (6)", but of course you lose some control over how 6 is converted.

Note that the bits above the precision are not defined and the algorithms used here are careful not to depend on their value.  In particular, values that come in from r
Re: wide-int branch now up for public comment and review
On 08/24/2013 08:05 AM, Richard Sandiford wrote:

Richard Sandiford writes: I wonder how easy it would be to restrict this use of "zero precision" (i.e. flexible precision) to those cases where primitive types like "int" are used as template arguments to operators, and require a precision when constructing a wide_int.  I wouldn't have expected "real" precision 0 (from zero-width bitfields or whatever) to need any special handling compared to precision 1 or 2.

I tried the last bit -- requiring a precision when constructing a wide_int -- and it seemed surprisingly easy.  What do you think of the attached?  Most of the forced knock-on changes seem like improvements, but the java part is a bit ugly.  I also went with "wide_int (0, prec).cmp" for now, although I'd like to add static cmp, cmps and cmpu alongside leu_p, etc., if that's OK.  It would then be possible to write "wide_int::cmp (0, ...)" and avoid the wide_int construction altogether.  I wondered whether you might also want to get rid of the build_int_cst* functions, but that still looks a long way off, so I hope using them in these two places doesn't seem too bad.  This is just an incremental step.  I've also only run it through a subset of the testsuite so far, but full tests are in progress...

So I am going to make two high-level comments here and then I am going to leave the ultimate decision to the community.  (1) I am mildly in favor of leaving the prec 0 stuff the way that it is.  (2) My guess is that richi will also favor this.  My justification for (2) is that he had a lot of comments about this before he went on leave, and this is substantially the way that it was when he left.  Also, remember that one of his biggest dislikes was having to specify precisions.  However, this question is really bigger than this branch, which is why I hope others will join in, because it really comes down to how we want the compiler to look when it is fully converted to C++.
It has taken me a while to get used to writing and reading code like this, where there is a lot of C++ magic going on behind the scenes.  And with that magic comes the responsibility of the programmer to get it right.  There were/are a lot of people in the GCC community that did not want to go down the C++ pathway for exactly this reason.  However, I am being converted.

The rest of my comments are small comments about the patch, because some of it should be done no matter how the decision is made.  It is perfectly fine to add the static versions of the cmp functions, and the usage of those functions in this patch looks perfectly reasonable.

Thanks, Richard

Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	2013-08-24 12:11:08.085684013 +0100
+++ gcc/fold-const.c	2013-08-24 01:00:00.0 +0100
@@ -8865,15 +8865,16 @@ pointer_may_wrap_p (tree base, tree offs
   if (bitpos < 0)
     return true;
 
+  int precision = TYPE_PRECISION (TREE_TYPE (base));
   if (offset == NULL_TREE)
-    wi_offset = wide_int::zero (TYPE_PRECISION (TREE_TYPE (base)));
+    wi_offset = wide_int::zero (precision);
   else if (TREE_CODE (offset) != INTEGER_CST || TREE_OVERFLOW (offset))
     return true;
   else
     wi_offset = offset;
 
   bool overflow;
-  wide_int units = wide_int::from_shwi (bitpos / BITS_PER_UNIT);
+  wide_int units = wide_int::from_shwi (bitpos / BITS_PER_UNIT, precision);
   total = wi_offset.add (units, UNSIGNED, &overflow);
   if (overflow)
     return true;

So this is a part of the code that really should have been using addr_wide_int rather than wide_int.  It is doing address arithmetic with bit positions.  Because of this, the calculations should have been done with a precision of 3 + what comes out of the type.  The addr_wide_int has a fixed precision that is guaranteed to be large enough for any address math on the machine.
Index: gcc/gimple-ssa-strength-reduction.c
===================================================================
--- gcc/gimple-ssa-strength-reduction.c	2013-08-24 12:11:08.085684013 +0100
+++ gcc/gimple-ssa-strength-reduction.c	2013-08-24 01:00:00.0 +0100
@@ -777,7 +777,6 @@ restructure_reference (tree *pbase, tree
 {
   tree base = *pbase, offset = *poffset;
   max_wide_int index = *pindex;
-  wide_int bpu = BITS_PER_UNIT;
   tree mult_op0, t1, t2, type;
   max_wide_int c1, c2, c3, c4;
 
@@ -786,7 +785,7 @@ restructure_reference (tree *pbase, tree
       || TREE_CODE (base) != MEM_REF
       || TREE_CODE (offset) != MULT_EXPR
       || TREE_CODE (TREE_OPERAND (offset, 1)) != INTEGER_CST
-      || !index.umod_floor (bpu).zero_p ())
+      || !index.umod_floor (BITS_PER_UNIT).zero_p ())
     return false;
 
   t1 = TREE_OPERAND (base, 0);
@@ -822,7 +821,7 @@ restructure_reference (tree *pbase, tree
       c2 = 0;
     }
 
-  c4 = index.udiv_floor (bpu);
+  c4 = index.udiv_floor (BITS_PER_UNIT);

this
Re: wide-int branch now up for public comment and review
On 08/24/2013 02:16 PM, Florian Weimer wrote:

On 08/13/2013 10:57 PM, Kenneth Zadeck wrote: 1) The 4 files that hold the wide-int code itself.  You have seen a lot of this code before except for the infinite precision templates.  Also the classes are more C++ than C in their flavor.  In particular, the integration with trees is very tight in that an int-cst or regular integers can be the operands of any wide-int operation.

Are any of these conversions lossy?  Maybe some of these constructors should be made explicit?

It depends; there is nothing wrong with lossy conversions as long as you know what you are doing.
Re: wide-int branch now up for public comment and review
Fixed with the enclosed patch.

On 08/23/2013 11:02 AM, Richard Sandiford wrote:

  /* Return true if THIS is negative based on the interpretation of SGN.
     For UNSIGNED, this is always false.  This is correct even if
     precision is 0.  */
  inline bool
  wide_int::neg_p (signop sgn) const

It seems odd that you have to pass SIGNED here.  I assume you were doing it so that the caller is forced to confirm signedness in the cases where a tree type is involved, but:

* neg_p kind of implies signedness anyway
* you don't require this for minus_one_p, so the interface isn't consistent
* at the rtl level signedness isn't a property of the "type" (mode), so it seems strange to add an extra hoop there

Index: gcc/ada/gcc-interface/decl.c
===================================================================
--- gcc/ada/gcc-interface/decl.c	(revision 201967)
+++ gcc/ada/gcc-interface/decl.c	(working copy)
@@ -7479,7 +7479,7 @@ annotate_value (tree gnu_size)
 	  tree op1 = TREE_OPERAND (gnu_size, 1);
 	  wide_int signed_op1
 	    = wide_int::from_tree (op1).sforce_to_size (TYPE_PRECISION (sizetype));
-	  if (signed_op1.neg_p (SIGNED))
+	  if (signed_op1.neg_p ())
 	    {
 	      op1 = wide_int_to_tree (sizetype, -signed_op1);
 	      pre_op1 = annotate_value (build1 (NEGATE_EXPR, sizetype, op1));
Index: gcc/c-family/c-ada-spec.c
===================================================================
--- gcc/c-family/c-ada-spec.c	(revision 201967)
+++ gcc/c-family/c-ada-spec.c	(working copy)
@@ -2197,7 +2197,7 @@ dump_generic_ada_node (pretty_printer *b
 	{
 	  wide_int val = node;
 	  int i;
-	  if (val.neg_p (SIGNED))
+	  if (val.neg_p ())
 	    {
 	      pp_minus (buffer);
 	      val = -val;
Index: gcc/config/sparc/sparc.c
===================================================================
--- gcc/config/sparc/sparc.c	(revision 201967)
+++ gcc/config/sparc/sparc.c	(working copy)
@@ -10624,7 +10624,7 @@ sparc_fold_builtin (tree fndecl, int n_a
 	      overall_overflow |= overall_overflow;
 	      tmp = e0.add (tmp, SIGNED, &overflow);
 	      overall_overflow |= overall_overflow;
-	      if (tmp.neg_p (SIGNED))
+	      if (tmp.neg_p ())
 		{
 		  tmp = tmp.neg (&overflow);
 		  overall_overflow |= overall_overflow;
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 201967)
+++ gcc/expr.c	(working copy)
@@ -6718,7 +6718,7 @@ get_inner_reference (tree exp, HOST_WIDE
   if (offset)
     {
       /* Avoid returning a negative bitpos as this may wreak havoc later.  */
-      if (bit_offset.neg_p (SIGNED))
+      if (bit_offset.neg_p ())
	{
	  addr_wide_int mask
	    = addr_wide_int::mask (BITS_PER_UNIT == 8
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	(revision 201967)
+++ gcc/fold-const.c	(working copy)
@@ -183,13 +183,13 @@ div_if_zero_remainder (const_tree arg1,
	 precision by 1 bit, iff the top bit is set.  */
      if (sgn == UNSIGNED)
	{
-	  if (warg1.neg_p (SIGNED))
+	  if (warg1.neg_p ())
	    warg1 = warg1.force_to_size (warg1.get_precision () + 1, sgn);
	  sgn = SIGNED;
	}
      else
	{
-	  if (warg2.neg_p (SIGNED))
+	  if (warg2.neg_p ())
	    warg2 = warg2.force_to_size (warg2.get_precision () + 1, sgn2);
	}
    }
@@ -979,7 +979,7 @@ int_const_binop_1 (enum tree_code code,
    case RSHIFT_EXPR:
    case LSHIFT_EXPR:
-      if (arg2.neg_p (SIGNED))
+      if (arg2.neg_p ())
	{
	  arg2 = -arg2;
	  if (code == RSHIFT_EXPR)
@@ -999,7 +999,7 @@ int_const_binop_1 (enum tree_code code,
    case RROTATE_EXPR:
    case LROTATE_EXPR:
-      if (arg2.neg_p (SIGNED))
+      if (arg2.neg_p ())
	{
	  arg2 = -arg2;
	  if (code == RROTATE_EXPR)
@@ -7180,7 +7180,7 @@ fold_plusminus_mult_expr (location_t loc
  /* As we canonicalize A - 2 to A + -2 get rid of that sign for
     the purpose of this canonicalization.  */
  if (TYPE_SIGN (TREE_TYPE (arg1)) == SIGNED
-      && wide_int (arg1).neg_p (SIGNED)
+      && wide_int (arg1).neg_p ()
      && negate_expr_p (arg1)
      && code == PLUS_EXPR)
    {
@@ -12323,7 +12323,7 @@ fold_binary_loc (location_t loc,
	  && TYPE_SIGN (type) == SIGNED
	  && TREE_CODE (arg1) == INTEGER_CST
	  && !TREE_OVERFLOW (arg1)
-	  && wide_int (arg1).neg_p (SIGNED)
+	  && wide_int (arg1).neg_p ()
	  && !TYPE_OVERFLOW_TRAPS (type)
	  /* Avoid this transformation if C is INT_MIN, i.e. C == -C.  */
	  && !sign_bit_p (arg1, arg1))
Index: gcc/gimple-ssa-strength-reduction.c
===================================================================
--- gcc/gimple-ssa-strength-reduction.c	(revision 201967)
+++ gcc/gimple-ssa-strength-reduction.c	(working copy)
@@ -1824,7 +1824,7 @@ cand_abs_increment (slsr_cand_t c)
 {
   max_wide_int increment = cand_increment (c);
 
-  if (!address_arithmetic_p && increment.neg_p (SIGNED))
+  if (!address_arithmetic_p && increment.neg_p ())
     increment = -increment;
 
   return increment;
@@ -1872,7 +1872,7 @@
Re: wide-int branch now up for public comment and review
The patch goes for (1) but (2) seems better to me, for a few reasons:

* As above, constants coming from rtl are already in the right form, so if you create a wide_int from an rtx and only query it, no explicit extension is needed.

* Things like logical operations and right shifts naturally preserve the sign-extended form, so only a subset of write operations need to take special measures.

Right now the internals of wide-int do not keep the bits above the precision clean.  As you point out, this could be fixed by changing lshift, add, sub, mul, div (and anything else I have forgotten about) and removing the code that cleans this up on exit.  I actually do not really care which way we go here, but if we do go with keeping the bits above the precision clean inside wide-int, we are going to have to clean the bits in the constructors from rtl, or fix some/a lot of bugs.  But if you want to go with the stay-clean plan you have to start clean, so at the rtl level this means copying; and it is the not-copying trick that pushed me in the direction we went.

At the tree level, this is not an issue.  There are no constructors for tree-csts that do not have a type, and before we copy the rep from the wide-int to the tree, we clean the top bits.  (BTW - if I had to guess what the bug is with the missing messages on the mips port, it would be because the front ends HAD a bad habit of creating constants that did not fit into a type and then later checking to see if there were any interesting bits above the precision in the int-cst.  This now does not work because we clean out those top bits on construction, but it would not surprise me if we missed the fixed-point constant path.)

So at the tree level we could easily go either way here, but there is a cost at the rtl level to doing (2).

TBH, I think we should do (2) and fix whatever bugs you saw with invalid rtx constants.

Luckily (or perhaps unluckily) if you try the test it fails pretty quickly - building gcclib on the x86-64.
I have enclosed a patch to check this.  You can try it yourself and see if you really think this is the right path.  Good luck; I fear you may need it.  On the other hand, if it is just a few quick bugs, then I will agree that (2) is the right path.

kenny

Index: gcc/wide-int.cc
===================================================================
--- gcc/wide-int.cc	(revision 201968)
+++ gcc/wide-int.cc	(working copy)
@@ -171,6 +171,10 @@ wide_int_ro::from_rtx (const rtx_mode_t
     case CONST_INT:
       result.val[0] = INTVAL (x);
       result.len = 1;
+
+      if (prec != HOST_BITS_PER_WIDE_INT)
+	gcc_assert (result.val[0] == sext_hwi (result.val[0], prec));
+
 #ifdef DEBUG_WIDE_INT
       debug_wh ("wide_int:: %s = from_rtx ("HOST_WIDE_INT_PRINT_HEX")\n",
 		result, INTVAL (x));
Re: wide-int branch now up for public comment and review
On 08/25/2013 02:42 AM, Richard Sandiford wrote:

Kenneth Zadeck writes: On 08/24/2013 08:05 AM, Richard Sandiford wrote: Richard Sandiford writes: I wonder how easy it would be to restrict this use of "zero precision" (i.e. flexible precision) to those cases where primitive types like "int" are used as template arguments to operators, and require a precision when constructing a wide_int.  I wouldn't have expected "real" precision 0 (from zero-width bitfields or whatever) to need any special handling compared to precision 1 or 2.

I tried the last bit -- requiring a precision when constructing a wide_int -- and it seemed surprisingly easy.  What do you think of the attached?  Most of the forced knock-on changes seem like improvements, but the java part is a bit ugly.  I also went with "wide_int (0, prec).cmp" for now, although I'd like to add static cmp, cmps and cmpu alongside leu_p, etc., if that's OK.  It would then be possible to write "wide_int::cmp (0, ...)" and avoid the wide_int construction altogether.  I wondered whether you might also want to get rid of the build_int_cst* functions, but that still looks a long way off, so I hope using them in these two places doesn't seem too bad.  This is just an incremental step.  I've also only run it through a subset of the testsuite so far, but full tests are in progress...

So I am going to make two high-level comments here and then I am going to leave the ultimate decision to the community.  (1) I am mildly in favor of leaving the prec 0 stuff the way that it is.  (2) My guess is that richi will also favor this.  My justification for (2) is that he had a lot of comments about this before he went on leave, and this is substantially the way that it was when he left.  Also, remember that one of his biggest dislikes was having to specify precisions.

Hmm, but you seem to be talking about zero precision in general.  (I'm going to call it "flexible precision" to avoid confusion with the zero-width bitfield stuff.)
I have tried to purge the zero-width bitfield case from my mind.  It was an ugly incident in the conversion.

Whereas this patch is specifically about constructing flexible-precision _wide_int_ objects.  I think wide_int objects should always have a known, fixed precision.

This is where we differ.  I do not.  The top-level idea is really motivated by richi, but I have come to appreciate his criticism.  Much of the time, the specification of the precision is simply redundant and it glops up the code.

Note that fixed_wide_ints can still use primitive types in the same way as before, since there the precision is inherent to the fixed_wide_int.  The templated operators also work in the same way as before.  Only the construction of wide_int proper is affected.  As it stands you have various wide_int operators that cannot handle two flexible-precision inputs.  This means that innocent-looking code like:

  extern wide_int foo (wide_int);
  wide_int bar () { return foo (0); }

ICEs when combined with equally innocent-looking code like:

  wide_int foo (wide_int x) { return x + 1; }

So in practice you have to know when calling a function whether any paths in that function will try applying an operator with a primitive type.  If so, you need to specify a precision when constructing the wide_int argument.  If not you can leave it out.  That seems really unclean.

My wife, who is a lawyer, likes to quote an old British chancellor: "hard cases make bad law".  The fact that you occasionally have to specify one should not be justification for throwing out the entire thing.

The point of this template stuff is to avoid constructing wide_int objects from primitive integers wherever possible.  And I think the fairly small size of the patch shows that you've succeeded in doing that.  But I think we really should specify a precision in the handful of cases where a wide_int does still need to be constructed directly from a primitive type.
Thanks, Richard

As I said earlier, let's see what others in the community feel about this.
wide-int branch
Cleaned up code to get around the tree-vrp issue.  Added some code that Richard is going to play with to see how hard it would be to clean up rtl constants.

kenny

Index: gcc/wide-int.cc
===================================================================
--- gcc/wide-int.cc	(revision 201968)
+++ gcc/wide-int.cc	(working copy)
@@ -171,6 +171,12 @@ wide_int_ro::from_rtx (const rtx_mode_t
     case CONST_INT:
       result.val[0] = INTVAL (x);
       result.len = 1;
+
+#if 0
+      if (prec != HOST_BITS_PER_WIDE_INT)
+	gcc_assert (result.val[0] == sext_hwi (result.val[0], prec));
+#endif
+
 #ifdef DEBUG_WIDE_INT
       debug_wh ("wide_int:: %s = from_rtx ("HOST_WIDE_INT_PRINT_HEX")\n",
 		result, INTVAL (x));
Index: gcc/wide-int.h
===================================================================
--- gcc/wide-int.h	(revision 201968)
+++ gcc/wide-int.h	(working copy)
@@ -1843,7 +1843,6 @@ wide_int_ro::zforce_to_size (unsigned in
 inline HOST_WIDE_INT
 wide_int_ro::sign_mask () const
 {
-  int i = len - 1;
   if (precision < HOST_BITS_PER_WIDE_INT)
     {
       /* We don't allow a int:0 inside a struct to get this far,
@@ -1853,14 +1852,13 @@ wide_int_ro::sign_mask () const
	      >> (HOST_BITS_PER_WIDE_INT - 1));
     }
 
-  /* VRP appears to be badly broken and this is a very ugly fix.  */
-  if (i >= 0)
-    return val[i] >> (HOST_BITS_PER_WIDE_INT - 1);
-
-  gcc_unreachable ();
-#if 0
-  return val[len - 1] >> (HOST_BITS_PER_WIDE_INT - 1);
-#endif
+  /* TREE_VRP is not able to see that it is not possible for len to be
+     0.  So without this test, it warns about this which causes
+     bootstrap failures.  */
+  if (len < 1)
+    gcc_unreachable ();
+  else
+    return val[len - 1] >> (HOST_BITS_PER_WIDE_INT - 1);
 }
 
 /* Return THIS & C.  */
Re: wide-int branch now up for public comment and review
On 08/25/2013 06:55 PM, Mike Stump wrote: On Aug 25, 2013, at 12:26 AM, Richard Sandiford wrote:

(2) Adding a new namespace, wi, for the operators.  So far this just contains the previously-static comparison functions and whatever else was needed to avoid cross-dependencies between wi and wide_int_ro (except for the debug routines).

It seems reasonable; I don't see anything I object to.  Seems like most of the time the code is shorter (though, you use wi, which is fairly short).  It doesn't seem any more complex, though knowing how to spell the operation, wide_int:: vs. wi::, is confusing on the client side.  I'm torn between this and the nice things that come with the patch.

(3) Removing the comparison member functions and using the static ones everywhere.

I'd love to have richi weigh in (or someone else that wants to play the role of C++ coding expert)... I'd defer to them...

The idea behind using a namespace rather than static functions is that it makes it easier to separate the core, tree and rtx bits.

Being able to separate core, tree and rtx bits gets a +1 in my book.  I do understand the beauty of this.

IMO wide-int.h shouldn't know about trees and rtxes, and all routines related to them should be in tree.h and rtl.h instead.  But using static functions means that you have to declare everything in one place.  Also, it feels odd for wide_int to be both an object and a home of static functions that don't always operate on wide_ints, e.g. when comparing a CONST_INT against 16.

Yes, though, does wi feel odd being a home for comparing a CONST_INT and 16?  :-)

On the other hand, how else are you going to do this?  I have not seen anyone sign up to make an OO version of rtl, and even if they did that, the consts are just a small part of it.  I agree that it is odd, but then again, it is actually nice to have a largely similar interface for trees and rtl.

I realise I'm probably not being helpful here.

Iterating on how we want the code to look is reasonable.
Prettying it up where it needs it is good.  Indeed, if the code is as you like, and as richi likes, well, then our mission is just about complete.  :-)  For this patch, I'd love to defer to richi (or someone that has a stronger opinion than I do) to say, better, worse...
wide-int branch updated
Fixed fits_uhwi_p.  Tested on x86-64.

kenny

Index: gcc/wide-int.h
===================================================================
--- gcc/wide-int.h	(revision 201985)
+++ gcc/wide-int.h	(working copy)
@@ -1650,7 +1650,7 @@ wide_int_ro::fits_shwi_p () const
 inline bool
 wide_int_ro::fits_uhwi_p () const
 {
-  return len == 1 || (len == 2 && val[1] == 0);
+  return (len == 1 && val[0] >= 0) || (len == 2 && val[1] == 0);
 }
 
 /* Return the signed or unsigned min of THIS and C.  */
wide-int branch updated.
Removed all knowledge of SHIFT_COUNT_TRUNCATED from wide-int; both Richard Biener and Richard Sandiford had commented negatively about this.  Fixed bug with wide-int::fits_uhwi_p.

kenny

Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	(revision 201985)
+++ gcc/fold-const.c	(working copy)
@@ -992,9 +992,9 @@ int_const_binop_1 (enum tree_code code,
	/* It's unclear from the C standard whether shifts can overflow.
	   The following code ignores overflow; perhaps a C standard
	   interpretation ruling is needed.  */
-	res = op1.rshift (arg2, sign, GET_MODE_BITSIZE (TYPE_MODE (type)), TRUNC);
+	res = op1.rshift (arg2, sign, GET_MODE_BITSIZE (TYPE_MODE (type)));
      else
-	res = op1.lshift (arg2, GET_MODE_BITSIZE (TYPE_MODE (type)), TRUNC);
+	res = op1.lshift (arg2, GET_MODE_BITSIZE (TYPE_MODE (type)));
      break;
 
    case RROTATE_EXPR:
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c	(revision 201985)
+++ gcc/simplify-rtx.c	(working copy)
@@ -3507,9 +3507,7 @@ rtx
 simplify_const_binary_operation (enum rtx_code code, enum machine_mode mode,
				 rtx op0, rtx op1)
 {
-#if TARGET_SUPPORTS_WIDE_INT == 0
   unsigned int width = GET_MODE_PRECISION (mode);
-#endif
 
   if (VECTOR_MODE_P (mode)
      && code != VEC_CONCAT
@@ -3787,40 +3785,45 @@ simplify_const_binary_operation (enum rt
	  break;
 
	case LSHIFTRT:
-	  if (wide_int (std::make_pair (op1, mode)).neg_p ())
-	    return NULL_RTX;
-
-	  result = wop0.rshiftu (pop1, bitsize, TRUNC);
-	  break;
-
	case ASHIFTRT:
-	  if (wide_int (std::make_pair (op1, mode)).neg_p ())
-	    return NULL_RTX;
-
-	  result = wop0.rshifts (pop1, bitsize, TRUNC);
-	  break;
-
	case ASHIFT:
-	  if (wide_int (std::make_pair (op1, mode)).neg_p ())
-	    return NULL_RTX;
-
-	  result = wop0.lshift (pop1, bitsize, TRUNC);
-	  break;
-
	case ROTATE:
-	  if (wide_int (std::make_pair (op1, mode)).neg_p ())
-	    return NULL_RTX;
-
-	  result = wop0.lrotate (pop1);
-	  break;
-
	case ROTATERT:
-	  if (wide_int (std::make_pair (op1, mode)).neg_p ())
-	    return NULL_RTX;
+	  {
+	    wide_int wop1 = pop1;
+	    if (wop1.neg_p ())
+	      return NULL_RTX;
 
-	  result = wop0.rrotate (pop1);
-	  break;
+	    if (SHIFT_COUNT_TRUNCATED)
+	      wop1 = wop1.umod_trunc (width);
 
+	    switch (code)
+	      {
+	      case LSHIFTRT:
+		result = wop0.rshiftu (wop1, bitsize);
+		break;
+
+	      case ASHIFTRT:
+		result = wop0.rshifts (wop1, bitsize);
+		break;
+
+	      case ASHIFT:
+		result = wop0.lshift (wop1, bitsize);
+		break;
+
+	      case ROTATE:
+		result = wop0.lrotate (wop1);
+		break;
+
+	      case ROTATERT:
+		result = wop0.rrotate (wop1);
+		break;
+
+	      default:
+		gcc_unreachable ();
+	      }
+	    break;
+	  }
 
	default:
	  return NULL_RTX;
	}
Index: gcc/wide-int.cc
===================================================================
--- gcc/wide-int.cc	(revision 201985)
+++ gcc/wide-int.cc	(working copy)
@@ -2404,7 +2404,12 @@ wide_int_ro::divmod_internal_2 (unsigned
 
 /* Do a truncating divide DIVISOR into DIVIDEND.  The result is the
-   same size as the operands.  SIGN is either SIGNED or UNSIGNED.  */
+   same size as the operands.  SIGN is either SIGNED or UNSIGNED.  If
+   COMPUTE_QUOTIENT is set, the quotient is computed and returned.  If
+   it is not set, the result is undefined.  If COMPUTE_REMAINDER is
+   set, the remaineder is returned in remainder.  If it is not set,
+   the remainder is undefined.  If OFLOW is not null, it is set to the
+   overflow value.  */
 wide_int_ro
 wide_int_ro::divmod_internal (bool compute_quotient,
			      const HOST_WIDE_INT *dividend,
@@ -2470,6 +2475,10 @@ wide_int_ro::divmod_internal (bool compu
   quotient.precision = dividend_prec;
   remainder->precision = dividend_prec;
 
+  /* Initialize the incoming overflow if it has been provided.  */
+  if (oflow)
+    *oflow = false;
+
   /* If overflow is set, just get out.  There will only be grief by
      continuing.  */
   if (overflow)
Index: gcc/wide-int.h
===================================================================
--- gcc/wide-int.h	(revision 201995)
+++ gcc/wide-int.h	(working copy)
@@ -259,19 +259,6 @@ const int addr_max_bitsize = 64;
 const int addr_max_precision
   = ((addr_max_bitsize + 4 + HOST_BITS_PER_WIDE_INT - 1)
     & ~(HOST_BITS_PER_WIDE_INT - 1));
 
-enum ShiftOp {
-  NONE,
-  /* There are two uses for the wide-int shifting functions.  The
-     first use is as an emulation of the target hardware.  The
-     second use is as service routines for other optimizations.  The
-     first case needs to be identified by passing TRUNC as the value
-     of ShiftOp so that shift amount is properly handled according to the
-     SHIFT_COUNT_TRUNCATED flag.  For the second case, the shift
-     amount is always truncated by the bytesize of the mode of
-     THIS.  */
-  TRUNC
-};
-
 /*
Re: wide-int branch updated
you are about an hour behind in reading your email. I had just committed a patch that is very close to this. On 08/27/2013 02:31 PM, Richard Sandiford wrote: Kenneth Zadeck writes: fixed fits_uhwi_p. tested on x86-64. kenny Index: gcc/wide-int.h === --- gcc/wide-int.h (revision 201985) +++ gcc/wide-int.h (working copy) @@ -1650,7 +1650,7 @@ wide_int_ro::fits_shwi_p () const inline bool wide_int_ro::fits_uhwi_p () const { - return len == 1 || (len == 2 && val[1] == 0); + return (len == 1 && val[0] >= 0) || (len == 2 && val[1] == 0); } With upper bits being undefined, it doesn't seem safe to check val[0] or val[1] like this. I was thinking along the lines of: inline bool wide_int_ro::fits_uhwi_p () const { if (precision <= HOST_BITS_PER_WIDE_INT) return true; if (len == 1) return val[0] >= 0; if (precision < HOST_BITS_PER_WIDE_INT * 2) return ((unsigned HOST_WIDE_INT) val[1] << (HOST_BITS_PER_WIDE_INT * 2 - precision)) == 0; return val[1] == 0; } Since we don't have a sign, everything HWI-sized or smaller fits in a uhwi without loss of precision. I've tested the above on x86_64-linux-gnu FWIW, in case it looks OK. Thanks, Richard
wide-int branch: cleaned up comparison functions.
Removed the redundant implementations of several comparison functions by just forwarding the oo version to the static version. Added static versions of cmp, cmpu and cmps. kenny Index: gcc/wide-int.h === --- gcc/wide-int.h (revision 202033) +++ gcc/wide-int.h (working copy) @@ -443,12 +443,18 @@ public: template int cmp (const T &, signop) const; + template + static int cmp (const T1 &, const T2 &, signop); template int cmps (const T &) const; + template + static int cmps (const T1 &, const T2 &); template int cmpu (const T &) const; + template + static int cmpu (const T1 &, const T2 &); bool only_sign_bit_p (unsigned int) const; bool only_sign_bit_p () const; @@ -1137,38 +1143,7 @@ template inline bool wide_int_ro::operator == (const T &c) const { - bool result; - HOST_WIDE_INT ws[WIDE_INT_MAX_ELTS]; - const HOST_WIDE_INT *s; - unsigned int cl; - unsigned int p1, p2; - - p1 = precision; - - s = to_shwi1 (ws, &cl, &p2, c); - check_precision (&p1, &p2, true, false); - - if (p1 == 0) -/* There are prec 0 types and we need to do this to check their - min and max values. */ -result = (len == cl) && (val[0] == s[0]); - else if (p1 < HOST_BITS_PER_WIDE_INT) -{ - unsigned HOST_WIDE_INT mask = ((HOST_WIDE_INT)1 << p1) - 1; - result = (val[0] & mask) == (s[0] & mask); -} - else if (p1 == HOST_BITS_PER_WIDE_INT) -result = val[0] == s[0]; - else -result = eq_p_large (val, len, p1, s, cl); - - if (result) -gcc_assert (len == cl); - -#ifdef DEBUG_WIDE_INT - debug_vwa ("wide_int_ro:: %d = (%s == %s)\n", result, *this, s, cl, p2); -#endif - return result; + return wide_int_ro::eq_p (*this, c); } /* Return true if C1 == C2. 
If both parameters have nonzero precisions, @@ -1219,31 +1194,7 @@ template inline bool wide_int_ro::lts_p (const T &c) const { - bool result; - HOST_WIDE_INT ws[WIDE_INT_MAX_ELTS]; - const HOST_WIDE_INT *s; - unsigned int cl; - unsigned int p1, p2; - - p1 = precision; - s = to_shwi1 (ws, &cl, &p2, c); - check_precision (&p1, &p2, false, true); - - if (p1 <= HOST_BITS_PER_WIDE_INT - && p2 <= HOST_BITS_PER_WIDE_INT) -{ - gcc_assert (cl != 0); - HOST_WIDE_INT x0 = sext_hwi (val[0], p1); - HOST_WIDE_INT x1 = sext_hwi (s[0], p2); - result = x0 < x1; -} - else -result = lts_p_large (val, len, p1, s, cl, p2); - -#ifdef DEBUG_WIDE_INT - debug_vwa ("wide_int_ro:: %d = (%s lts_p %s\n", result, *this, s, cl, p2); -#endif - return result; + return wide_int_ro::lts_p (*this, c); } /* Return true if C1 < C2 using signed comparisons. */ @@ -1283,30 +1234,7 @@ template inline bool wide_int_ro::ltu_p (const T &c) const { - bool result; - HOST_WIDE_INT ws[WIDE_INT_MAX_ELTS]; - const HOST_WIDE_INT *s; - unsigned int cl; - unsigned int p1, p2; - - p1 = precision; - s = to_shwi1 (ws, &cl, &p2, c); - check_precision (&p1, &p2, false, true); - - if (p1 <= HOST_BITS_PER_WIDE_INT - && p2 <= HOST_BITS_PER_WIDE_INT) -{ - unsigned HOST_WIDE_INT x0 = zext_hwi (val[0], p1); - unsigned HOST_WIDE_INT x1 = zext_hwi (s[0], p2); - result = x0 < x1; -} - else -result = ltu_p_large (val, len, p1, s, cl, p2); - -#ifdef DEBUG_WIDE_INT - debug_vwa ("wide_int_ro:: %d = (%s ltu_p %s)\n", result, *this, s, cl, p2); -#endif - return result; + return wide_int_ro::ltu_p (*this, c); } /* Return true if C1 < C2 using unsigned comparisons. */ @@ -1524,27 +1452,37 @@ wide_int_ro::ge_p (const T1 &c1, const T return geu_p (c1, c2); } -/* Returns -1 if THIS < C, 0 if THIS == C and 1 if A > C using +/* Returns -1 if THIS < C, 0 if THIS == C and 1 if THIS > C using signed compares. 
*/ template inline int wide_int_ro::cmps (const T &c) const { + return wide_int_ro::cmps (*this, c); +} + +/* Returns -1 if C1 < C2, 0 if C1 == C2 and 1 if C1 > C2 using + signed compares. */ +template +inline int +wide_int_ro::cmps (const T1 &c1, const T2 &c2) +{ int result; - HOST_WIDE_INT ws[WIDE_INT_MAX_ELTS]; - const HOST_WIDE_INT *s; - unsigned int cl; - unsigned int prec; + HOST_WIDE_INT ws1[WIDE_INT_MAX_ELTS]; + HOST_WIDE_INT ws2[WIDE_INT_MAX_ELTS]; + const HOST_WIDE_INT *s1, *s2; /* Returned data */ + unsigned int cl1, cl2; /* array lengths */ + unsigned int p1, p2; /* precisions */ - s = to_shwi1 (ws, &cl, &prec, c); - if (prec == 0) -prec = precision; + s1 = to_shwi1 (ws1, &cl1, &p1, c1); + s2 = to_shwi1 (ws2, &cl2, &p2, c2); + check_precision (&p1, &p2, false, true); - if (precision <= HOST_BITS_PER_WIDE_INT - && prec <= HOST_BITS_PER_WIDE_INT) + if (p1 <= HOST_BITS_PER_WIDE_INT + && p2 <= HOST_BITS_PER_WIDE_INT) { - HOST_WIDE_INT x0 = sext_hwi (val[0], precision); - HOST_WIDE_INT x1 = sext_hwi (s[0], prec); + HOST_WIDE_INT x0 = sext_hwi (s1[0], p1); + HOST_WIDE_INT x1 =
Re: wide-int branch updated.
On 08/28/2013 03:45 AM, Richard Biener wrote: On Tue, 27 Aug 2013, Kenneth Zadeck wrote: removed all knowledge of SHIFT_COUNT_TRUNCATED from wide-int; both Richard Biener and Richard Sandiford had commented negatively about this. fixed bug with wide-int::fits_uhwi_p. inline bool wide_int_ro::fits_uhwi_p () const { - return (len == 1 && val[0] >= 0) || (len == 2 && val[1] == 0); + return (precision <= HOST_BITS_PER_WIDE_INT) +|| (len == 1 && val[0] >= 0) +|| (len == 2 && (precision >= 2 * HOST_BITS_PER_WIDE_INT) && (val[1] == 0)) +|| (len == 2 && (sext_hwi (val[1], precision & (HOST_BITS_PER_WIDE_INT - 1)) == 0)); } it now gets scary ;) Still wrong for precision == 0? no, because anything that comes in at precision 0 is a canonized sign extended number already. the precision 0 just means that it is safe to be any precision. ;) I wonder what its semantics are ... in double_int we simply require high == 0 (thus, negative numbers are not allowed). with precision <= HOST_BITS_PER_WIDE_INT you allow negative numbers. Matching what double-int fits_uhwi does would be (len == 1 && ((signed HOST_WIDE_INT)val[0]) >= 0) it is signed so i am matching this part. || (len == 2 && val[1] == 0) so this does not work. say i had a precision 70 bit wide-int. The bits above the precision are undefined, so i have to clear it out. This is what the two lines at len 2 are for. However if the precision is greater than 2 hwi's then we can do something this simple. kenny (I don't remember off-hand the signedness of val[], but eventually you missed the conversion to signed) Now, what double-int does is supposed to match host_integerp (..., 1) which I think it does. Richard.
Re: wide-int branch updated
On 08/28/2013 03:50 AM, Richard Biener wrote: On Tue, 27 Aug 2013, Richard Sandiford wrote: Kenneth Zadeck writes: fixed fits_uhwi_p. tested on x86-64. kenny Index: gcc/wide-int.h === --- gcc/wide-int.h (revision 201985) +++ gcc/wide-int.h (working copy) @@ -1650,7 +1650,7 @@ wide_int_ro::fits_shwi_p () const inline bool wide_int_ro::fits_uhwi_p () const { - return len == 1 || (len == 2 && val[1] == 0); + return (len == 1 && val[0] >= 0) || (len == 2 && val[1] == 0); } With upper bits being undefined, it doesn't seem safe to check Err - we're back at upper bits being undefined? Ugh. Then having both fits_uhwi_p and fits_shwi_p doesn't make sense. yes, that is the problem. Richard is going to look into what it might take to make rtl always have the upper bits canonized as sign extended. if he/we can do this, then we will change everything so that it is always canonized above the precision. Everyone thinks this is a good plan, assuming you do not have to rewrite every back end to do it. val[0] or val[1] like this. I was thinking along the lines of: inline bool wide_int_ro::fits_uhwi_p () const { if (precision <= HOST_BITS_PER_WIDE_INT) return true; if (len == 1) return val[0] >= 0; if (precision < HOST_BITS_PER_WIDE_INT * 2) return ((unsigned HOST_WIDE_INT) val[1] << (HOST_BITS_PER_WIDE_INT * 2 - precision)) == 0; return val[1] == 0; } Since we don't have a sign, everything HWI-sized or smaller fits in a uhwi without loss of precision. Which then means fits_hwi_p () is simply if (precision <= HOST_BITS_PER_WIDE_INT) return true; zext (); // ick, modification in a predicate! stupid undefined bits! return len == 1 || (len == 2 && val[1] == 0); we do not ever do this. where are you looking. but if upper bits are undefined then the whole encoding and len business is broken. no it is not. btw - welcome back. we have missed you. Richard.
Re: wide-int branch now up for public comment and review
Note that the bits above the precision are not defined and the algorithms used here are careful not to depend on their value. In particular, values that come in from rtx constants may have random bits. Which is a red herring. It should be fixed. I cannot even believe that sentence given the uses of CONST_DOUBLE_LOW/CONST_DOUBLE_HIGH or INTVAL/UINTVAL. I don't see accesses masking out 'undefined' bits anywhere. Richi, you asked for the entire patch as a branch so that you could look at the whole picture. It is now time for you to do that. I understand it is very large and it will take some time for you to get your head around the whole thing. But remember this: The vast majority of the clients are dealing with intermediate code that has explicit promotions. Not only the rtl level, but the majority of the tree level takes inputs that have explicit precisions that have been matched and wants to see an explicit precision result. For this code, doing the fixed precision thing where you do not ever ask about what is behind the curtain is a very good match. However, there are parts of the compiler, all within the tree or gimple level, that do not have this view. For this, there are two templates that export an interface that behaves in a manner very similar to what double-int does when the precision was smaller than 128 bits. (we discovered a large number of bugs when using double int for timode because they did make an infinite precision assumption, but that is another story.) All numbers are converted to signed numbers that are extended based on their input type and the math is performed in a large enough field so that they never push near the end. We know what the end is because we sniff the port. At this point we looked at the pass we were converting and we used the appropriate implementation that matched the style of coding and the algorithm. i.e. we made no substantive changes. 
As mentioned in my earlier mail, i plan to change tree-ssa-ccp in the future to use the fixed precision form, but this is a change that is motivated by being able to find more constants with the proper representation, not by what is beautiful. But this is the only case where i think the rep should be substantially changed. I know you find the fixed precision stuff unappealing. But the truth is that you wanted us to make this patch so that it did as little damage to the way the compiler worked as possible, and given that so much of the compiler actually does fixed precision math, this is the path of least resistance. If it is reasonable to change the rtl, we may change that, but the truth is that the clients never see this so it is not as much of an issue as you are making it. Now that you can see all of the clients, you can see this for yourself. Kenny
Re: wide-int branch now up for public comment and review
On 08/28/2013 12:45 PM, Mike Stump wrote: On Aug 28, 2013, at 5:48 AM, Richard Biener wrote: Only if the precision is > HOST_BITS_PER_WIDE_INT. If the precision is HOST_BITS_PER_WIDE_INT then both are { -1U }. That wasn't my understanding on how things work. You are thinking about prec==0 numbers. These are useful and important for ease of use of the wide-int package. They allow one to do: wide_int w = …; w = w + 6; w = w - 3; w = w + (unsigned HOST_WIDE_INT)~0; and extend the constant out to the precision of the other side. This is a very narrow feature and not a general property of a wide_int. In general, signedness of a wide_int is an external feature of wide_int. We only permit prec==0 numbers for ease of use, and ease of use needs to track the sign, in the general case. Now, one is free to have a precision that allows the sign to be stored, this is available to the user, if they want. They merely are not forced to do this. For example, RTL largely doesn't want or need a sign. Right, that's what the constructors, from_* and to_* routines do. I wonder where the from_tree and to_tree ones are? tree t; wide_int w = t; wide_int_to_tree needs an additional type, so the spelling is not as short, out of necessity. i made wide_int_to_tree a function that lives in tree.[ch], not a member function of wide-int. This seemed to be consistent with the way other things were done. if you want it to be a member function, that is certainly doable. Are they from_double_int / wide_int_to_tree (what's wide_int_to_infinite_tree?) I think wide_int_to_infinite_tree is leftover junk. I removed it: diff --git a/gcc/wide-int.h b/gcc/wide-int.h index 86be20a..83c2170 100644 --- a/gcc/wide-int.h +++ b/gcc/wide-int.h @@ -4203,8 +4203,6 @@ wide_int_ro::to_shwi2 (HOST_WIDE_INT *s ATTRIBUTE_UNUSED, /* tree related routines. 
*/ extern tree wide_int_to_tree (tree type, const wide_int_ro &cst); -extern tree wide_int_to_infinite_tree (tree type, const wide_int_ro &cst, - unsigned int prec); extern tree force_fit_type_wide (tree, const wide_int_ro &, int, bool); /* real related routines. */
Re: wide-int branch now up for public comment and review
Note that the bits above the precision are not defined and the algorithms used here are careful not to depend on their value. In particular, values that come in from rtx constants may have random bits. Which is a red herring. It should be fixed. I cannot even believe that sentence given the uses of CONST_DOUBLE_LOW/CONST_DOUBLE_HIGH or INTVAL/UINTVAL. I don't see accesses masking out 'undefined' bits anywhere. I can agree with you that this could be fixed. But it is not necessary to fix it. The rtl level and most of the tree level has existed for a long time by doing math within the precision. you do not see the masking at the rtl level because the masking is not necessary. if you never look beyond the precision you just do not care. There is the issue with divide and mod, and quite frankly the code on the trunk scares me to death. my code at the rtl level makes sure that everything is clean since i can see if it is a udiv or a div; that is enough info to clean the value up. I have a feeling I'm rehashing a past debate, sorry, but rtx constants can't have random bits. The upper bits must be a sign extension of the value. There's exactly one valid rtx for each (value, mode) pair. If you saw something different then that sounds like a bug. The rules should already be fairly well enforced though, since something like (const_int 128) -- or (const_int 256) -- will not match a QImode operand. See. We're saved ;) this is richard's theory. There is a block of code at wide-int.cc:175 that is ifdefed out that checks to see if the rtl consts are canonical. if you bring it in, building the libraries fails very quickly. The million dollar question is: is this the only bug, or is this the first bug of a 1000 bugs. Your comment about not seeing any masking cuts both ways. 
So my guess is that there are a lot of failures and a large number of them will be in the ports. If richard is right then there will be changes. The internals of wide-int will be changed so that everything is maintained in canonical form rather than just doing the canonization on the outside. This will clean up things like fits_uhwi_p and a lot more of the wide-int internals. But i think that if you want to change the compiler to use infinite precision arithmetic, you really ought to have a better reason than that you think it is cleaner. Because it buys you nothing for most of the compiler and it is slower AND it is a much bigger change to the compiler than the one we want to make. This is probably the part of the representation that I disagree most with. There seem to be two main ways we could handle the extension to whole HWIs: (1) leave the stored upper bits undefined and extend them on read (2) keep the stored upper bits in extended form The patch goes for (1) but (2) seems better to me, for a few reasons: I agree whole-heartedly. my statement is above. if we can do 2 then we should. But I do not think that it is worth several people-years to do this. * As above, constants coming from rtl are already in the right form, so if you create a wide_int from an rtx and only query it, no explicit extension is needed. * Things like logical operations and right shifts naturally preserve the sign-extended form, so only a subset of write operations need to take special measures. * You have a public interface that exposes the underlying HWIs (which is fine with me FWIW), so it seems better to expose a fully-defined HWI rather than only a partially-defined HWI. E.g. zero_p is: HOST_WIDE_INT x; if (precision && precision < HOST_BITS_PER_WIDE_INT) x = sext_hwi (val[0], precision); else if (len == 0) { gcc_assert (precision == 0); return true; } else x = val[0]; return len == 1 && x == 0; but I think it really ought to be just: return len == 1 && val[0] == 0; Yes! 
But then - what value does keeping track of a 'precision' have in this case? It seems to me it's only a "convenient carrier" for wide_int x = wide-int-from-RTX (y); machine_mode saved_mode = mode-available? GET_MODE (y) : magic-mode; ... process x ... RTX = RTX-from-wide_int (x, saved_mode); that is, wide-int doesn't do anything with 'precision' but you can extract it later to not need to remember a mode you were interested in? Oh, and of course some operations require a 'precision', like rotate. As i have said, when the operands and result are all precision correct, as they are generally in both rtl and tree, then the default fixed precision interface is really very clean. I know you object to some implementation issues, but this is why we put the branch up, so you can see what you get from the other side. When the precision is 0, all the bits in the LEN elements of VE
Re: wide-int branch now up for public comment and review
On 08/29/2013 04:42 AM, Richard Biener wrote: On Wed, 28 Aug 2013, Kenneth Zadeck wrote: Note that the bits above the precision are not defined and the algorithms used here are careful not to depend on their value. In particular, values that come in from rtx constants may have random bits. Which is a red herring. It should be fixed. I cannot even believe that sentence given the uses of CONST_DOUBLE_LOW/CONST_DOUBLE_HIGH or INTVAL/UINTVAL. I don't see accesses masking out 'undefined' bits anywhere. I can agree with you that this could be fixed. But it is not necessary to fix it. The rtl level and most of the tree level has existed for a long time by doing math within the precision. you do not see the masking at the rtl level because the masking is not necessary. if you never look beyond the precision you just do not care. There is the issue with divide and mod, and quite frankly the code on the trunk scares me to death. my code at the rtl level makes sure that everything is clean since i can see if it is a udiv or a div; that is enough info to clean the value up. I have a feeling I'm rehashing a past debate, sorry, but rtx constants can't have random bits. The upper bits must be a sign extension of the value. There's exactly one valid rtx for each (value, mode) pair. If you saw something different then that sounds like a bug. The rules should already be fairly well enforced though, since something like (const_int 128) -- or (const_int 256) -- will not match a QImode operand. See. We're saved ;) this is richard's theory. There is a block of code at wide-int.cc:175 that is ifdefed out that checks to see if the rtl consts are canonical. if you bring it in, building the libraries fails very quickly. The million dollar question is: is this the only bug, or is this the first bug of a 1000 bugs. Your comment about not seeing any masking cuts both ways. 
There is very little code on the way in if you create ints with GEN_INT that makes sure someone has done the right thing on that side. So my guess is that there are a lot of failures and a large number of them will be in the ports. Well, clearly RTL code _expects_ constants to be properly extended. I can reproduce the bug and I simply guess that's a matter of mismatching modes here (or performing an implicit truncation without properly extending the constant). at /space/rguenther/src/svn/wide-int/gcc/combine.c:10086 10086 GEN_INT (count)); (gdb) l 10081 10082 mask_rtx = GEN_INT (nonzero_bits (varop, GET_MODE (varop))); 10083 10084 mask_rtx 10085 = simplify_const_binary_operation (code, result_mode, mask_rtx, 10086 GEN_INT (count)); uses of GEN_INT are frowned upon ... for exactly that reason - the mask_rtx is not a proper RTL constant for SImode. over time, the GEN_INTs will go away at the portable rtl level as more of the code is transitioned to use wide-int. The port story is not so good. Any port that uses TI or beyond will likely evolve to using wide-int for the math and unifying the cases between CONST_INT and CONST_WIDE_INT. (the wide-int constructor from rtl takes either and the constructor to rtl looks at the constant.) But a port like the mips that has no TI will likely never change. Btw, all this isn't a reason to not have a well-defined wide-int representation. You just have (as you have for trees) to properly canonize the representation at the time you convert from RTL constants to wide-int constants. The wide-int representation is completely well defined. i cannot see why you do not understand this. You do not need to look above the precision so there is nothing undefined!! I know this bothers you very badly and i will agree that if it is easy to clean up rtl to match this, then it would simplify the code somewhat. But it is not undefined or random. Furthermore, cleaning it will not change the abi so if rtl gets cleaner, we can change this. 
In the long run we want a uniform representation of constants so we can do zero-copying - but it looks like we now have three different representations - the tree one (sign or zero extended dependent on sign), RTL (garbage as you show) and wide-int (always sign-extended). You wanted this to look more like double-int; now that it does, you see the flaws in it. Do i ever get to win? I do not think anyone would have been served well by trying to make trees have only signed constants in the way that double-int did. I will admit that the implementation would be cleaner if we could just change everything else in the compiler to match a super clean wide-int design. I took the other approach and bottled as much of the ugliness behind the api and tried to keep the rest of the compiler
Re: wide-int branch now up for public comment and review
On 08/28/2013 05:08 AM, Richard Biener wrote: On Sun, 25 Aug 2013, Mike Stump wrote: On Aug 25, 2013, at 12:26 AM, Richard Sandiford wrote: (2) Adding a new namespace, wi, for the operators. So far this just contains the previously-static comparison functions and whatever else was needed to avoid cross-dependencies between wi and wide_int_ro (except for the debug routines). It seems reasonable; I don't see anything I object to. Seems like most of the time, the code is shorter (though, you use wi, which is fairly short). It doesn't seem any more complex, though, knowing how to spell the operation wide_int:: v wi:: is confusing on the client side. I'm torn between this and the nice things that come with the patch. (3) Removing the comparison member functions and using the static ones everywhere. I'd love to have richi weigh in (or someone else that wants to play the role of C++ coding expert)? I'd defer to them? Yeah - wi::lt (a, b) is much better than a.lt (b) IMHO. It mimics how the standard library works. The idea behind using a namespace rather than static functions is that it makes it easier to separate the core, tree and rtx bits. Being able to separate core, tree and rtx bits gets a +1 in my book. I do understand the beauty of this. Now, if you look back in discussions I wanted a storage abstraction anyway. Basically the interface is class wide_int_storage { int precision (); int len (); element_t get (unsigned); void set (unsigned, element_t); }; and wide_int is then templated like template class wide_int : public storage { }; where RTX / tree storage classes provide read-only access to their storage and an rvalue integer rep to its value. You can look at my example draft implementation I posted some months ago. But I'll gladly wiggle on the branch to make it more like above (easy step one: don't access the wide-int members directly but via accessor functions) you are of course welcome to do this. But there were two questions that i never got an answer to. 
1) how does doing this help in any way? Are there clients that will use this? 2) isn't this just going to slow wide-int down? It seems clear that there is a real performance cost in that now there is going to be an extra step of indirection on each access to the underlying data structures. I will point out we have cut down on the copying by sharing the underlying value from trees or rtl when the value is short lived. But given that most constants only take a single hwi to represent, getting rid of the copying seems like a very small payback for the increase in access costs. IMO wide-int.h shouldn't know about trees and rtxes, and all routines related to them should be in tree.h and rtl.h instead. But using static functions means that you have to declare everything in one place. Also, it feels odd for wide_int to be both an object and a home of static functions that don't always operate on wide_ints, e.g. when comparing a CONST_INT against 16. Indeed - in my sample the wide-int-rtx-storage and wide-int-tree-storage storage models were declared in rtl.h and tree.h and wide-int.h knew nothing about them. this could be done for us too, it has nothing to do with the storage model. Yes, though, does wi feel odd being a home for comparing a CONST_INT and 16? :-) I realise I'm probably not being helpful here. Iterating on how we want the code to look is reasonable. Prettying it up where it needs it, is good. Indeed, if the code is as you like, and as richi likes, well, then our mission is just about complete. :-) For this patch, I'd love to defer to richi (or someone that has a stronger opinion than I do) to say, better, worse? The comparisons? Better. richard s is making these static. Thanks, Richard.
wide-int branch: fixed mips regression
Fixed FAIL: gcc.dg/fixed-point/int-warning.c (test for warnings, line 12) on mips64-unknown-linux-gnu Index: gcc/tree.c === --- gcc/tree.c (revision 202088) +++ gcc/tree.c (working copy) @@ -8531,11 +8531,11 @@ bool int_fits_type_p (const_tree c, const_tree type) { tree type_low_bound, type_high_bound; - bool ok_for_low_bound, ok_for_high_bound, unsc; + bool ok_for_low_bound, ok_for_high_bound; wide_int wc, wd; + signop sgn_c = TYPE_SIGN (TREE_TYPE (c)); wc = c; - unsc = TYPE_UNSIGNED (TREE_TYPE (c)); retry: type_low_bound = TYPE_MIN_VALUE (type); @@ -8555,17 +8555,17 @@ retry: if (type_low_bound && TREE_CODE (type_low_bound) == INTEGER_CST) { wd = type_low_bound; - if (unsc != TYPE_UNSIGNED (TREE_TYPE (type_low_bound))) + if (sgn_c != TYPE_SIGN (TREE_TYPE (type_low_bound))) { - int c_neg = (!unsc && wc.neg_p ()); - int t_neg = (unsc && wd.neg_p ()); + int c_neg = (sgn_c == SIGNED && wc.neg_p ()); + int t_neg = (sgn_c == UNSIGNED && wd.neg_p ()); if (c_neg && !t_neg) return false; if ((c_neg || !t_neg) && wc.ltu_p (wd)) return false; } - else if (wc.lt_p (wd, TYPE_SIGN (TREE_TYPE (type_low_bound + else if (wc.lt_p (wd, sgn_c)) return false; ok_for_low_bound = true; } @@ -8576,17 +8576,17 @@ retry: if (type_high_bound && TREE_CODE (type_high_bound) == INTEGER_CST) { wd = type_high_bound; - if (unsc != TYPE_UNSIGNED (TREE_TYPE (type_high_bound))) + if (sgn_c != TYPE_SIGN (TREE_TYPE (type_high_bound))) { - int c_neg = (!unsc && wc.neg_p ()); - int t_neg = (unsc && wd.neg_p ()); + int c_neg = (sgn_c == SIGNED && wc.neg_p ()); + int t_neg = (sgn_c == UNSIGNED && wd.neg_p ()); if (t_neg && !c_neg) return false; if ((t_neg || !c_neg) && wc.gtu_p (wd)) return false; } - else if (wc.gt_p (wd, TYPE_SIGN (TREE_TYPE (type_high_bound + else if (wc.gt_p (wd, sgn_c)) return false; ok_for_high_bound = true; } @@ -8600,7 +8600,7 @@ retry: /* Perform some generic filtering which may allow making a decision even if the bounds are not constant. 
First, negative integers never fit in unsigned types, */ - if (TYPE_UNSIGNED (type) && !unsc && wc.neg_p ()) + if (TYPE_UNSIGNED (type) && sgn_c == SIGNED && wc.neg_p ()) return false; /* Second, narrower types always fit in wider ones. */ @@ -8608,7 +8608,7 @@ retry: return true; /* Third, unsigned integers with top bit set never fit signed types. */ - if (! TYPE_UNSIGNED (type) && unsc && wc.neg_p ()) + if (!TYPE_UNSIGNED (type) && sgn_c == UNSIGNED && wc.neg_p ()) return false; /* If we haven't been able to decide at this point, there nothing more we Index: gcc/fold-const.c === --- gcc/fold-const.c (revision 202088) +++ gcc/fold-const.c (working copy) @@ -1690,7 +1690,7 @@ fold_convert_const_int_from_fixed (tree /* Given a fixed-point constant, make new constant with new type, appropriately sign-extended or truncated. */ t = force_fit_type (type, wide_int::from_double_int (temp, - TYPE_PRECISION (type)), + HOST_BITS_PER_DOUBLE_INT), -1, (temp.is_negative () && (TYPE_UNSIGNED (type)
converting wide-int so that it uses its own type rather than hwi.
richi, on further thought, this is going to be a huge task. The problem is at the edges. right now we share the rep of the array with the tree-csts and rtl (though the rtl sharing may go away to support canonization). So if the hwi rep inside of wide int changes then we will have to implement copying with reblocking at that interface or push the type into there and change all of the fits_*hwi and to_*hwi interfaces to fit this different type. i think i can get at least as good and perhaps better test coverage by changing the rep of hwi for a port. There will also be fallout work there, but it will be productive, in that it is just changing code from only doing the hwi case to supporting all precisions. Kenny
Re: [RFC] Changes to the wide-int classes
My main discomfort is the double-int part. At this point in time the only usage of double-int left on the branch is because the fixed ints use it as their underlying representation. I think that rather than doing a lot of work to accommodate this, it would be better to just change the fixed representation. There is no place for exactly two HWIs in the machine independent parts of the compiler, and the truth is that there is no (except for the fixed stuff) use of double ints remaining in the ports. Even if, by sniffing the ports, we end up with a flavor of wide-int that has two wide-ints as its rep, it should not be called double-int or expose what its rep is. I do understand that richi had requested this in some of the early discussions but given that there are virtually no clients for it left, let's let this die. As for the rest of the patch, this is for the c++ experts to hash out. I have no problems with it but i am not an expert and so if the consensus likes it, then we can go in this direction. small bugs below this line. bottom of frag 3 of gcc/cp/init.c is wrong: you replaced rshift...lshift with lshift...lshift. i will finish reading this tomorrow, but i wanted to get some comments in for the early shift. i stopped reading at line 1275. kenny
double_int is used in unions and so needs to be a POD, whereas the fancy things we want to allow for wide_int and fixed_wide_int mean that they need to have constructors. The patch therefore keeps double_int as the basic storage class and defines double_int_ext as the wide-int class. All the double_int methods therefore need to be kept, but are now simple wrappers around the wi:: routines. double_int_ext and fixed_wide_int are assignment-compatible. This is just to show that it's possible though. It probably isn't very efficient...

* wide-int.h no longer includes tree.h, rtl.h or double-int.h. The rtx and machine_mode routines are now in rtl.h, and the tree-related ones are in tree.h. double-int.h now depends on wide-int.h, as described above.

* wide-int.h no longer includes tm.h. This is done by adding a new MAX_BITS_PER_UNIT to machmode.def, so that the definition of MAX_BITSIZE_MODE_ANY_MODE no longer relies on BITS_PER_UNIT. Although I think we usually assume that BITS_PER_UNIT is a constant, that wouldn't necessarily be true if we ever did support multi-target compilers in future. MAX_BITS_PER_UNIT is logically the maximum value of BITS_PER_UNIT for any compiled-in target and must be a constant.

* Precision 0 is no longer a special marker for primitive types like ints. It's only used for genuine 0-width integers.

* The wide-int classes are now relatively light-weight. All the real work is done by wi:: routines. There are still operator methods for addition, multiplication, etc., but they just forward to the associated wi:: routine. I also reluctantly kept and_not and or_not as operator-like methods for now, although I'd like to get rid of them and just keep the genuine operators. The problem is that I'd have liked the AND routine to be "wi::and", but of course that isn't possible with "and" being a keyword, so I went for "wi::bit_and" instead. Same for "not" and "wi::bit_not", and "or" and "wi::bit_or".
Then it seemed like the others should be bit_* too, and "wi::bit_and_not" just seems a bit unwieldy... Hmm, if we decide to forbid the use of "and" in gcc, perhaps we could #define it to something safe. But that would probably be too confusing. I'm sure those who like stepping through gcc with gdb are going to hate this patch already, without that to make things worse...

* fixed_wide_int now only has the number of HWIs required by N. This makes addr_wide_int significantly smaller.

* fixed_wide_int doesn't have a precision field; the precision is always taken directly from the template argument. This isn't a win sizewise, since the length and precision fitted snugly into a HWI slot, but it means that checks for particular precisions can be resolved at compile time. E.g. the fast single-HWI paths are now dropped when dealing with fixed_wide_ints.

* Each integer type is classified as one of: FLEXIBLE_PRECISION, CONST_PRECISION and VAR_PRECISION. FL
Re: [RFC] Changes to the wide-int classes
On 09/02/2013 05:35 AM, Richard Biener wrote: On Mon, 2 Sep 2013, Richard Sandiford wrote: Kenneth Zadeck writes: There is no place for exactly two HWIs in the machine independent parts of the compiler. I totally agree. In fact I was taking that so much for granted that I didn't even think to add a rider about it, sorry. I didn't mean to imply that we should keep double_int around. I think the reason for doing this is to prove that it can be done (so that the wide_int code isn't too big a change for the tree level) and to make it easier to merge the wide-int patches into trunk piecemeal if we need to. Note that changing the tree rep to non-double_int is easy. Also note that I only want 'double_int' to stay around to have a fast type that can work on two HWIs for the code that needs more precision than a HWI. The only important cases I know of are in get_inner_reference and get_ref_base_and_extent and friends. Those are heavily used (and the only double-int function callers that even show up in regular cc1 profiles). So if wide_int_fixed<2> ('2' better be replaced with 'number-that-gets-me-twice-target-sizetype-precision') works for those cases then fine (and we can drop double-ints).

This is what the addr_wide_int is for - it sniffs the port and is guaranteed to be big enough to hold the largest pointer, plus 4 bits (3 bits for bit position and 1 bit so that unsigned things come over with no loss); that is then rounded up to the next hwi. (I will admit that the sniffing code needs a little work, but that can be fixed without changing the api.) kenny
Re: [RFC] Changes to the wide-int classes
Richi, and the rest of the community: Richard Sandiford has proposed a set of patches that change the wide-int api in a significant way. We think that we really need some input from the community as to whether this is what we want the use of C++ in gcc to look like. There are, as I see it, two issues raised by these patches:

1) Visibility. In my original wide-int patch, there were things that were public and things that were private. In general, the implementation details were private, but the privacy was also used to enforce the read-only nature of the representation: a wide int is an object, and that object has a value that cannot be changed. In Richard's restructuring, everything is visible in the wi namespace, and the privacy of the internals can only be enforced by review of the maintainers. It is possible to restore the privacy of the original patches, but this seems to involve using a private class inside the namespace and using C++ friends. Enforcing the visibility adds extra code along the lines of:

namespace wi {
  class impl {
  private:
    add_large () { }
    friend add (…);
  };
}

that is, roughly 1 line per client that uses an internal routine (for the friend declaration) and 5 extra lines for the other stuff. Richard worries that using friends is not within the scope of acceptable C++ mechanisms to use within the gcc community. We are, of course, open to other suggestions. My personal view is that privacy is important and I do not want to let that go. In past comments, Richi asked that we remove all of the public method calls that modified the internals of a wide-int. We did this, and we are generally happy that we did. So the question is: is the privacy important, and if it is, what is the proper mechanism to implement it?

2) In my original patch, almost all of the operations used either operators or method calls. In Richard's patch, those method calls have all become regular function calls. This is not just a syntactic change. The wide-int operations are fairly generic.
The operations are not just defined on wide-ints but also on tree and rtx constants. In my patch, the tree and rtx arguments were not converted to wide-ints; instead, the code looked inside the representation of the tree and rtl and just operated on that. However, in my patch the first parameter had to be a wide-int in order to make the method call. By making everything a regular function call, Richard's patch avoids having to construct a wide-int object just to make the call. This seems like a big win, but it means that the wide-int code does not "look like" the double-int code anymore. I am in favor of this change, but I (we) fear that there will be pushback from those who wanted this to look more OO-like, as double-int currently does. Comments, suggestions, and questions are appreciated. thanks, kenny
Re: [wide-int] Fix mode choice in convert_modes
On 09/09/2013 09:56 AM, Richard Sandiford wrote: Richard Biener writes: On Mon, Sep 9, 2013 at 3:11 PM, Richard Sandiford wrote: Enabling the CONST_INT assert showed that this branch-local code was passing the wrong mode to std::make_pair. The mode of X is OLDMODE, and we're supposed to be converting it to MODE. In the case where OLDMODE is VOIDmode, I think we have to assume that every bit of the input is significant. Tested on x86_64-linux-gnu. OK to install? This and the other patches that I just committed to trunk are enough for a bootstrap and regression test to pass on x86_64-linux-gnu with the assertion for a canonical CONST_INT enabled. I think it would be better to leave the assertion out until after the merge though, so we can deal with any fallout separately. If consuming code assumes CONST_INTs are canonical we better leave it in though. Yeah, I think the patch Kenny's working on is going to canonise CONST_INTs via the scratch array, to avoid making any assumptions. I think we should keep that for the merge and only swap it for an assert later. Thanks, Richard yes, this is what my patch does now.
patch to make wide-int assume that everything is canonical beyond the precision?
I still have more testing to go on this, but it seems to be ok. My current problem is that the branch seems to have a fair number of failures, so I want to get that cleaned up before I do further testing and commit this. I left the old rtl constructors ifdefed out; we are likely to want them back soon, given how well Richard's work on cleaning up the rtl seems to have gone. Kenny

Index: gcc/rtl.h === --- gcc/rtl.h (revision 202389) +++ gcc/rtl.h (working copy) @@ -1422,6 +1422,7 @@ wi::int_traits ::get_precisi return GET_MODE_PRECISION (x.second); } +#if 0 inline wi::storage_ref wi::int_traits ::decompose (HOST_WIDE_INT *, unsigned int precision, @@ -1437,13 +1438,72 @@ wi::int_traits ::decompose ( return wi::storage_ref (&CONST_WIDE_INT_ELT (x.first, 0), CONST_WIDE_INT_NUNITS (x.first), precision); +#if TARGET_SUPPORTS_WIDE_INT != 0 case CONST_DOUBLE: return wi::storage_ref (&CONST_DOUBLE_LOW (x.first), 2, precision); +#endif default: gcc_unreachable (); } } +#else +/* For now, assume that the storage is not canonical, i.e. that there + are bits above the precision that are not all zeros or all ones. + If this is fixed in rtl, then we will not need the calls to + force_to_size.
*/ +inline wi::storage_ref +wi::int_traits ::decompose (HOST_WIDE_INT *scratch, + unsigned int precision, + const rtx_mode_t &x) +{ + int len; + int small_prec = precision & (HOST_BITS_PER_WIDE_INT - 1); + int blocks_needed = (precision + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT; + + gcc_checking_assert (precision == get_precision (x)); + switch (GET_CODE (x.first)) +{ +case CONST_INT: + len = 1; + if (small_prec) + scratch[0] = sext_hwi (INTVAL (x.first), precision); + else + scratch = &INTVAL (x.first); + break; + +case CONST_WIDE_INT: + len = CONST_WIDE_INT_NUNITS (x.first); + if (small_prec && blocks_needed == len - 1) + { + int i; + for (i = 0; i < len - 1; i++) + scratch[i] = CONST_WIDE_INT_ELT (x.first, i); + scratch[len - 1] = sext_hwi (CONST_WIDE_INT_ELT (x.first, i), small_prec); + } + else + scratch = &CONST_WIDE_INT_ELT (x.first, 0); + break; + +#if TARGET_SUPPORTS_WIDE_INT == 0 +case CONST_DOUBLE: + len = 2; + if (small_prec) + { + scratch[0] = CONST_DOUBLE_LOW (x.first); + scratch[1] = sext_hwi (CONST_DOUBLE_HIGH (x.first), small_prec); + } + else + scratch = &CONST_DOUBLE_LOW (x.first); + break; +#endif + +default: + gcc_unreachable (); +} + return wi::storage_ref (scratch, len, precision); +} +#endif namespace wi { Index: gcc/wide-int.cc === --- gcc/wide-int.cc (revision 202389) +++ gcc/wide-int.cc (working copy) @@ -48,6 +48,9 @@ static const HOST_WIDE_INT zeros[WIDE_IN (PREC ? (((PREC) + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT) : 1) #define SIGN_MASK(X) (((HOST_WIDE_INT)X) >> (HOST_BITS_PER_WIDE_INT - 1)) +/* Return the value a VAL[I] if I < LEN, otherwise, return 0 or -1 + based on the top existing bit of VAL. */ + static unsigned HOST_WIDE_INT safe_uhwi (const HOST_WIDE_INT *val, unsigned int len, unsigned int i) { @@ -304,10 +307,10 @@ wi::force_to_size (HOST_WIDE_INT *val, c if (precision > xprecision) { /* Expanding. 
*/ - unsigned int small_xprecision = xprecision % HOST_BITS_PER_WIDE_INT; - if (sgn == UNSIGNED) { + unsigned int small_xprecision = xprecision % HOST_BITS_PER_WIDE_INT; + if (small_xprecision && len == BLOCKS_NEEDED (xprecision)) val[len - 1] = zext_hwi (val[len - 1], small_xprecision); else if (val[len - 1] < 0) @@ -320,11 +323,6 @@ wi::force_to_size (HOST_WIDE_INT *val, c val[len++] = 0; } } - /* We have to do this because we cannot guarantee that there is - not trash in the top block of an uncompressed value. For a - compressed value, all the bits are significant. */ - else if (small_xprecision && len == BLOCKS_NEEDED (xprecision)) - val[len - 1] = sext_hwi (val[len - 1], small_xprecision); } else if (precision < xprecision) /* Contracting. */ @@ -352,27 +350,18 @@ selt (const HOST_WIDE_INT *a, unsigned i return 0; } - if (small_prec && index == blocks_needed - 1) -{ - /* The top block is partially outside of the precision. */ - if (sgn == SIGNED) - return sext_hwi (a[index], small_prec); - else - return zext_hwi (a[index], small_prec); -} - return a[index]; + if (sgn == UNSIGNED && small_prec && index == blocks_needed - 1) +return zext_hwi (a[index], small_prec); + else +return a[index]; } -/* Find the hignest bit represented in a wide int. This will in +/* Find the highest bit represented in a wide int. This will in general have the same value as the sign bit. */
patch to canonize small wide-ints.
Richi, this patch canonizes the bits above the precision for wide-ints with types or modes that are not a perfect multiple of HOST_BITS_PER_WIDE_INT. I expect that most of the changes in rtl.h will go away; in particular, when we decide that we can depend on Richard's patch to clean up rtl constants, the only thing that will be left will be the addition of the TARGET_SUPPORTS_WIDE_INT test.

I do believe that there is one more conserved force in the universe than what physicists generally consider: it is ugliness. There is a lot of truth and beauty in the patch, but in truth there are a lot of places where the ugliness is just moved someplace else. In the pushing-the-ugly-around dept, trees and wide-ints are not canonized the same way. I spent several days going down the road where I tried to have them be the same, but it got very ugly having 32-bit unsigned int csts have the upper 32 bits set. So now wide_int_to_tree and the wide-int constructors from tree-cst are more complex.

I think that I am in favor of this patch, especially in conjunction with Richard's cleanup, but only mildly. There is also some cleanup where Richard wanted the long lines addressed. Ok to commit to the wide-int branch?
kenny Index: gcc/emit-rtl.c === --- gcc/emit-rtl.c (revision 202389) +++ gcc/emit-rtl.c (working copy) @@ -579,8 +579,6 @@ immed_wide_int_const (const wide_int &v, if (len < 2 || prec <= HOST_BITS_PER_WIDE_INT) return gen_int_mode (v.elt (0), mode); - wide_int copy = v; - wi::clear_undef (copy, SIGNED); #if TARGET_SUPPORTS_WIDE_INT { unsigned int i; @@ -599,12 +597,12 @@ immed_wide_int_const (const wide_int &v, CWI_PUT_NUM_ELEM (value, len); for (i = 0; i < len; i++) - CONST_WIDE_INT_ELT (value, i) = copy.elt (i); + CONST_WIDE_INT_ELT (value, i) = v.elt (i); return lookup_const_wide_int (value); } #else - return immed_double_const (copy.elt (0), copy.elt (1), mode); + return immed_double_const (v.elt (0), v.elt (1), mode); #endif } Index: gcc/lto-streamer-in.c === --- gcc/lto-streamer-in.c (revision 202389) +++ gcc/lto-streamer-in.c (working copy) @@ -1273,7 +1273,7 @@ lto_input_tree_1 (struct lto_input_block for (i = 0; i < len; i++) a[i] = streamer_read_hwi (ib); result = wide_int_to_tree (type, wide_int::from_array - (a, len, TYPE_PRECISION (type), false)); + (a, len, TYPE_PRECISION (type))); streamer_tree_cache_append (data_in->reader_cache, result, hash); } else if (tag == LTO_tree_scc) Index: gcc/real.c === --- gcc/real.c (revision 202389) +++ gcc/real.c (working copy) @@ -2248,7 +2248,6 @@ real_from_integer (REAL_VALUE_TYPE *r, e /* Clear out top bits so elt will work with precisions that aren't a multiple of HOST_BITS_PER_WIDE_INT. 
*/ val = wide_int::from (val, len, sgn); - wi::clear_undef (val, sgn); len = len / HOST_BITS_PER_WIDE_INT; SET_REAL_EXP (r, len * HOST_BITS_PER_WIDE_INT + e); Index: gcc/rtl.h === --- gcc/rtl.h (revision 202389) +++ gcc/rtl.h (working copy) @@ -1422,6 +1422,7 @@ wi::int_traits ::get_precisi return GET_MODE_PRECISION (x.second); } +#if 0 inline wi::storage_ref wi::int_traits ::decompose (HOST_WIDE_INT *, unsigned int precision, @@ -1437,13 +1438,57 @@ wi::int_traits ::decompose ( return wi::storage_ref (&CONST_WIDE_INT_ELT (x.first, 0), CONST_WIDE_INT_NUNITS (x.first), precision); +#if TARGET_SUPPORTS_WIDE_INT != 0 case CONST_DOUBLE: return wi::storage_ref (&CONST_DOUBLE_LOW (x.first), 2, precision); +#endif default: gcc_unreachable (); } } +#else +/* For now, assume that the storage is not canonical, i.e. that there + are bits above the precision that are not all zeros or all ones. + If this is fixed in rtl, then we will not need the calls to + force_to_size. */ +inline wi::storage_ref +wi::int_traits ::decompose (HOST_WIDE_INT *scratch, + unsigned int precision, + const rtx_mode_t &x) +{ + int len; + int small_prec = precision & (HOST_BITS_PER_WIDE_INT - 1); + + gcc_checking_assert (precision == get_precision (x)); + switch (GET_CODE (x.first)) +{ +case CONST_INT: + len = 1; + if (small_prec) + scratch[0] = sext_hwi (INTVAL (x.first), precision); + else + scratch = &INTVAL (x.first); + break; + +case CONST_WIDE_INT: + len = CONST_WIDE_INT_NUNITS (x.first); + scratch = &CONST_WIDE_INT_ELT (x.first, 0); + break; + +#if TARGET_SUPPORTS_WIDE_INT == 0 +case CONST_DOUBLE: + len = 2; + scratch = &CONST_DOUBLE_LOW (x.first); + break; +#endif + +default: + gcc_unreachable ()
Re: [wide-int] Fix mode choice in convert_modes
this is fine with me. kenny On 09/18/2013 03:27 AM, Richard Sandiford wrote: Ping Richard Sandiford writes: Enabling the CONST_INT assert showed that this branch-local code was passing the wrong mode to std::make_pair. The mode of X is OLDMODE, and we're supposed to be converting it to MODE. In the case where OLDMODE is VOIDmode, I think we have to assume that every bit of the input is significant. Tested on x86_64-linux-gnu. OK to install? This and the other patches that I just committed to trunk are enough for a bootstrap and regression test to pass on x86_64-linux-gnu with the assertion for a canonical CONST_INT enabled. I think it would be better to leave the assertion out until after the merge though, so we can deal with any fallout separately. Thanks, Richard Index: gcc/expr.c === --- gcc/expr.c 2013-09-09 11:32:33.734617409 +0100 +++ gcc/expr.c 2013-09-09 11:45:27.381001353 +0100 @@ -714,13 +714,14 @@ convert_modes (enum machine_mode mode, e && GET_MODE_CLASS (mode) == MODE_INT && (oldmode == VOIDmode || GET_MODE_CLASS (oldmode) == MODE_INT)) { - wide_int w = std::make_pair (x, mode); /* If the caller did not tell us the old mode, then there is -not much to do with respect to canonization. */ - if (oldmode != VOIDmode - && GET_MODE_PRECISION (mode) > GET_MODE_PRECISION (oldmode)) - w = wi::ext (w, GET_MODE_PRECISION (oldmode), -unsignedp ? UNSIGNED : SIGNED); +not much to do with respect to canonization. We have to assume +that all the bits are significant. */ + if (oldmode == VOIDmode) + oldmode = MAX_MODE_INT; + wide_int w = wide_int::from (std::make_pair (x, oldmode), + GET_MODE_PRECISION (mode), + unsignedp ? UNSIGNED : SIGNED); return immed_wide_int_const (w, mode); }
Re: [wide-int] Fix LTO regression that I'd introduced
this looks fine to me. On 09/19/2013 02:56 PM, Richard Sandiford wrote: It turns out that gcc20's version of binutils is too old for the LTO plugin, so the tests I'd been running hadn't exercised it. This patch fixes a regression that Kenny pointed out. The problem was that build_int_cst and build_int_cst_type were using the signedness of the type to decide how the HWI should be extended, whereas they're supposed to use sign extension regardless. Tested on x86_64-linux-gnu, this time with trunk binutils. OK for wide-int? Thanks, Richard gcc/ * tree.h (wi::hwi): Delete. * tree.c (build_int_cst, build_int_cst_type): Use wi::shwi. (build_int_cstu): Use wi::uhwi. Index: gcc/tree.h === --- gcc/tree.h (revision 202746) +++ gcc/tree.h (working copy) @@ -5206,8 +5206,6 @@ namespace wi { - hwi_with_prec hwi (HOST_WIDE_INT, const_tree); - template bool fits_to_tree_p (const T &x, const_tree); @@ -5216,12 +5214,6 @@ wide_int from_mpz (const_tree, mpz_t, bool); } -inline wi::hwi_with_prec -wi::hwi (HOST_WIDE_INT val, const_tree type) -{ - return hwi_with_prec (val, TYPE_PRECISION (type), TYPE_SIGN (type)); -} - template bool wi::fits_to_tree_p (const T &x, const_tree type) Index: gcc/tree.c === --- gcc/tree.c (revision 202746) +++ gcc/tree.c (working copy) @@ -1056,13 +1056,13 @@ if (!type) type = integer_type_node; - return wide_int_to_tree (type, wi::hwi (low, type)); + return wide_int_to_tree (type, wi::shwi (low, TYPE_PRECISION (type))); } tree build_int_cstu (tree type, unsigned HOST_WIDE_INT cst) { - return wide_int_to_tree (type, wi::hwi (cst, type)); + return wide_int_to_tree (type, wi::uhwi (cst, TYPE_PRECISION (type))); } /* Create an INT_CST node with a LOW value sign extended to TYPE. 
*/ @@ -1071,7 +1071,7 @@ build_int_cst_type (tree type, HOST_WIDE_INT low) { gcc_assert (type); - return wide_int_to_tree (type, wi::hwi (low, type)); + return wide_int_to_tree (type, wi::shwi (low, TYPE_PRECISION (type))); } /* Constructs tree in type TYPE from with value given by CST. Signedness
wide-int: fix merge.
Mike started a merge from trunk to the wide-int branch because the previous merge had happened when bootstrap was broken for ppc. This patch fixes mike's partial merge and has been bootstrapped and tested on ppc. I will pull the patch down and test it on x86-64 tonight. kenny Index: gcc/postreload.c === --- gcc/postreload.c (revision 202811) +++ gcc/postreload.c (working copy) @@ -305,7 +305,7 @@ case ZERO_EXTEND: result = wide_int (std::make_pair (this_rtx, GET_MODE (src))); if (GET_MODE_PRECISION (GET_MODE (src)) > GET_MODE_PRECISION (word_mode)) - result = result.zext (GET_MODE_PRECISION (word_mode)); + result = wi::zext (result, GET_MODE_PRECISION (word_mode)); break; case SIGN_EXTEND: result = wide_int (std::make_pair (this_rtx, GET_MODE (src))); Index: gcc/fold-const.c === --- gcc/fold-const.c (revision 202811) +++ gcc/fold-const.c (working copy) @@ -9897,17 +9897,15 @@ /* Mask out the tz least significant bits of X of type TYPE where tz is the number of trailing zeroes in Y. */ -static double_int -mask_with_tz (tree type, double_int x, double_int y) +static wide_int +mask_with_tz (tree type, wide_int x, wide_int y) { - int tz = y.trailing_zeros (); - + int tz = wi::ctz (y); if (tz > 0) { - double_int mask; + wide_int mask; - mask = ~double_int::mask (tz); - mask = mask.ext (TYPE_PRECISION (type), TYPE_UNSIGNED (type)); + mask = wi::mask (tz, true, TYPE_PRECISION (type)); return mask & x; } return x; @@ -11276,7 +11274,7 @@ == INTEGER_CST) { tree t = TREE_OPERAND (TREE_OPERAND (arg0, 0), 1); - double_int masked = mask_with_tz (type, c3, tree_to_double_int (t)); + wide_int masked = mask_with_tz (type, c3, t); try_simplify = (masked != c1); } @@ -11670,32 +11668,14 @@ && TREE_CODE (arg0) == MULT_EXPR && TREE_CODE (TREE_OPERAND (arg0, 1)) == INTEGER_CST) { -<<< .working - int arg1tz = wi::ctz (TREE_OPERAND (arg0, 1)); - if (arg1tz > 0) - { - wide_int arg1mask, masked; - arg1mask = wi::mask (arg1tz, true, TYPE_PRECISION (type)); - masked = arg1mask & arg1; - if 
(masked == 0) - return omit_two_operands_loc (loc, type, build_zero_cst (type), - arg0, arg1); - else if (masked != arg1) - return fold_build2_loc (loc, code, type, op0, - wide_int_to_tree (type, masked)); - } -=== - double_int masked - = mask_with_tz (type, tree_to_double_int (arg1), - tree_to_double_int (TREE_OPERAND (arg0, 1))); + wide_int masked = mask_with_tz (type, arg1, TREE_OPERAND (arg0, 1)); - if (masked.is_zero ()) + if (masked == 0) return omit_two_operands_loc (loc, type, build_zero_cst (type), arg0, arg1); - else if (masked != tree_to_double_int (arg1)) + else if (masked != arg1) return fold_build2_loc (loc, code, type, op0, - double_int_to_tree (type, masked)); ->>> .merge-right.r202797 + wide_int_to_tree (type, masked)); } /* For constants M and N, if M == (1LL << cst) - 1 && (N & M) == M, Index: gcc/gimple-ssa-strength-reduction.c === --- gcc/gimple-ssa-strength-reduction.c (revision 202811) +++ gcc/gimple-ssa-strength-reduction.c (working copy) @@ -763,7 +763,7 @@ int (i * S). Otherwise, just return double int zero. */ -static double_int +static max_wide_int backtrace_base_for_ref (tree *pbase) { tree base_in = *pbase; @@ -771,19 +771,19 @@ STRIP_NOPS (base_in); if (TREE_CODE (base_in) != SSA_NAME) -return tree_to_double_int (integer_zero_node); +return 0; base_cand = base_cand_from_table (base_in); while (base_cand && base_cand->kind != CAND_PHI) { if (base_cand->kind == CAND_ADD - && base_cand->index.is_one () + && base_cand->index == 1 && TREE_CODE (base_cand->stride) == INTEGER_CST) { /* X = B + (1 * S), S is integer constant. 
*/ *pbase = base_cand->base_expr; - return tree_to_double_int (base_cand->stride); + return base_cand->stride; } else if (base_cand->kind == CAND_ADD && TREE_CODE (base_cand->stride) == INTEGER_CST @@ -800,7 +800,7 @@ base_cand = NULL; } - return tree_to_double_int (integer_zero_node); + return 0; } /* Look for the following pattern: Index: gcc/genpreds.c === --- gcc/genpreds.c (revision 202811) +++ gcc/genpreds.c (working copy) @@ -809,11 +809,7 @@ if (is_const_int || is_const_dbl) { enum rtx_code appropriate_code -#if TARGET_SUPPORTS_WIDE_INT - = is_const_int ? CONST_INT : CONST_WIDE_INT; -#else = is_const_int ? CONST_INT : CONST_DOUBLE; -#en
Re: patch to canonize small wide-ints.
On 09/24/2013 09:39 AM, Richard Biener wrote: On Tue, 17 Sep 2013, Kenneth Zadeck wrote: [...] Ok to commit to the wide-int branch? Looks good to me. I'll be doing a separate review of the to/from tree parts when I find time to do that, but that's unrelated to this patch. Thanks, Richard. As I said on irc, after I check in a cleaned up version of this (for ppc) I will change the rep for unsigned tree-csts so that they have an extra block of zeros if their top bit is set. This will clean that up somewhat. thanks kenny
Re: patch to canonize small wide-ints.
On 09/24/2013 09:51 AM, Richard Biener wrote: On Tue, 24 Sep 2013, Kenneth Zadeck wrote: On 09/24/2013 09:39 AM, Richard Biener wrote: On Tue, 17 Sep 2013, Kenneth Zadeck wrote: [...] Ok to commit to the wide-int branch? Looks good to me. I'll be doing a separate review of the to/from tree parts when I find time to do that, but that's unrelated to this patch. Thanks, Richard. As I said on irc, after I check in a cleaned up version of this (for ppc) I will change the rep for unsigned tree-csts so that they have an extra block of zeros if their top bit is set. This will clean that up somewhat. Yes, this will give wide-ints a sign (as I originally proposed). For RTL constants we're going to "lie" and make all constants negative that have their MSB set. Richard. It only sort of does.
Anyway, it seems to be mostly consistent. As I said on irc, I am pleasantly surprised at the lack of damage to the ports by assuming that the values are always canonized. Two large public ports and my private port required no changes.
wide-int: Re: patch to canonize small wide-ints.
I just committed the patch to do this as revision 202871. There are two changes from the earlier patch: 1) the addition of the frag in loop-doloop.c. This fixes an rtl canonization bug, but it is unique to the branch; it did not cause a problem on x86 but did on ppc. 2) the code in rtl.h for converting rtl to wide-ints is now much simpler. I now rely on Richard's patch, which has been merged into this branch, to canonize the rtl constants. I have an assert to make sure that canonization is never violated. I did not change the tree-cst constructor as Richard and I had talked about on irc. My next patch will be to change the rep of tree-cst so that unsigned types that have precisions of some multiple of HBPWI and have their top bit set will just carry an extra block of zeros. That will force a rewrite of this constructor. Kenny

On 09/24/2013 12:06 PM, Richard Sandiford wrote: Richard Biener writes: [...] Yes, this will give wide-ints a sign (as I originally proposed). No, nothing changes on that front. An N-bit wide_int has no sign. It looks the same whether it came from a signed tree constant, an unsigned tree constant, or an rtx (which also has no sign). All the posted patch does is change the internal representation of excess bits, which are operationally don't-cares. The semantics are just the same as before. And the point of the follow-up change that Kenny mentioned is to avoid a copy when treating an N-bit INTEGER_CST as a wider-than-N wide_int in cases where we're extending according to TYPE_SIGN. I.e. it's purely an optimisation. The resulting wide_int storage is exactly the same as before, we just do less work to get it. Thanks, Richard

Index: gcc/tree.c === --- gcc/tree.c (revision 202811) +++ gcc/tree.c (working copy) @@ -1112,7 +1112,6 @@ force_fit_type (tree type, const wide_in || (overflowable > 0 && sign == SIGNED)) { wide_int tmp = wide_int::from (cst, TYPE_PRECISION (type), sign); - wi::clear_undef (tmp, sign); int l = tmp.get_len (); tree t = make_int_cst (l); if (l > 1) @@ -1205,11 +1204,11 @@ wide_int_to_tree (tree type, const wide_ } wide_int cst = wide_int::from (pcst, prec, sgn); - /* The following call makes sure that all tree-cst's are canonical. - i.e.
it really does sign or zero extend the top block of the - value if the precision of the type is not an even multiple of the - size of an HWI. */ - wi::clear_undef (cst, sgn); + int len = int (cst.get_len ()); + int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1); + bool recanonize = sgn == UNSIGNED +&& (prec + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT == len +&& small_prec; switch (TREE_CODE (type)) { @@ -1291,18 +1290,31 @@ wide_int_to_tree (tree type, const wide_ t = TREE_VEC_ELT (TYPE_CACHED_VALUES (type), ix); if (t) { - /* Make sure no one is clobbering the shared constant. */ + /* Make sure no one is clobbering the shared constant. We + must be careful here because tree-csts and wide-ints are + not canonicalized
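The canonical-form test used in the recog.c hunk above (a multiword constant is acceptable only if its top block is already the sign extension of its significant bits) can be sketched with a standalone version of `sext_hwi`. This is a hedged illustration, assuming 64-bit host wide ints; it is not GCC's actual implementation, and `top_block_canonical_p` is a hypothetical helper name:

```c
#include <stdint.h>

/* Sign-extend the low PREC bits of X to a full 64-bit host wide int.
   PREC == 0 is treated as "already full width", matching the
   prec & (HOST_BITS_PER_WIDE_INT - 1) usage above.  Relies on
   arithmetic right shift of signed values, as GCC's own code does.  */
static int64_t
sext_hwi (int64_t x, int prec)
{
  if (prec <= 0 || prec >= 64)
    return x;
  int shift = 64 - prec;
  return (int64_t) ((uint64_t) x << shift) >> shift;
}

/* A multiword constant with total precision PREC is canonical only if
   its top block TOP is unchanged by sign-extending its partial bits.  */
static int
top_block_canonical_p (int64_t top, int prec)
{
  return sext_hwi (top, prec & 63) == top;
}
```

For a 65-bit constant, for example, only bit 0 of the top block is significant, so the top block must be either 0 or -1 to be canonical.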
Re: patch to fix constant math
i have already converted the vrp code, so i have some guess at where you are talking about. (of course correct me if i am wrong). the code that computes the range when two variables are multiplied together needs to do a multiplication that produces a result that is twice as wide as the inputs. my library is able to do that with one catch (and this is a big catch): the target has to have an integer mode that is twice as big as the mode of the operands. The issue is that wide-ints actually carry around the mode of the value in order to get the bitsize and precision of the operands (it does not have the type, because this code has to work on both the rtl and tree levels and i generally do not want the signedness anyway). my current code in vrp checks to see if such a mode exists and if it does, it produces the product. if the mode does not exist, it returns bottom. What this means is that for most (many or some) targets that have a TImode, the largest thing that vrp can discover ranges for is a DImode value. We could get around this by defining the next larger mode than what the target really needs but i wonder how much mileage you are going to get out of that with really large numbers. Of course you could have something else in mind. kenny On 10/03/2012 04:47 PM, Marc Glisse wrote: On Wed, 3 Oct 2012, Kenneth Zadeck wrote: The patch defines a new datatype, a 'wide_int' (defined in wide-int.[ch]), and this datatype will be used to perform all of the integer constant math in the compiler. Externally, wide-int is very similar to double-int except that it does not have the limitation that math must be done on exactly two HOST_WIDE_INTs. Internally, a wide_int is a structure that contains a fixed-size array of HOST_WIDE_INTs, a length field and a mode. The size of the array is determined at generation time by dividing the number of bits of the largest integer supported on the target by the number of bits in a HOST_WIDE_INT of the host.
Thus, with this format, any length of integer can be supported on any host. Hello, did you consider making the size of wide_int a template parameter, now that we are using C++? All with a convenient typedef or macro so it doesn't show. I am asking because in vrp I do some arithmetic that requires 2*N+1 bits where N is the size of double_int.
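The double-width product that the vrp range computation above needs can be formed without any wider host type by splitting each operand into half-width blocks. A minimal sketch under stated assumptions: unsigned 64-bit host wide ints, one output block pair; `umul_full` is a hypothetical name for the kind of full-multiply primitive being discussed, not wide-int's actual interface:

```c
#include <stdint.h>

/* Multiply two unsigned 64-bit values and return the full 128-bit
   product as a high/low pair, computed from 32-bit half-products.
   A wide-int style mul_full would do the same block-by-block for
   arbitrary lengths.  */
static void
umul_full (uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
  uint64_t al = a & 0xffffffffu, ah = a >> 32;
  uint64_t bl = b & 0xffffffffu, bh = b >> 32;

  uint64_t ll = al * bl;   /* partial products */
  uint64_t lh = al * bh;
  uint64_t hl = ah * bl;
  uint64_t hh = ah * bh;

  /* Accumulate the middle partial products, keeping the carries.  */
  uint64_t mid = (ll >> 32) + (lh & 0xffffffffu) + (hl & 0xffffffffu);
  *lo = (ll & 0xffffffffu) | (mid << 32);
  *hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
}
```

The point of the thread is that this works on any host for any target width, whereas code that relies on a native double-width type only works when the host happens to have one.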
Re: patch to fix constant math
Let me talk about the mode here first. What this interface/patch provides is a facility where the constant math that is done in optimizations is done exactly the way that it would be done on the target machine. What we have now is a compiler that only does this if it is convenient to do on the host. I admit that i care about this more than others right now, but if intel adds a couple more instructions to their vector units, other people will start to really care about this issue. If you take an OImode value with the current compiler and left shift it by 250, the middle end will say that the result is 0. This is just wrong!!! What this means is that the bitsize and precision of the operations need to be carried along when doing math. when wide-int checks for overflow on the multiply or add, it is not checking if the value overflowed in two HWIs, it is checking if the add overflowed in the mode of the types that are represented on the target. When we do shift, we are not doing a shift within two HWIs, we are truncating the shift value (if this is appropriate) according to the bitsize and shifting according to the precision. I think that an argument could be made that storing the mode should be changed to an explicit precision and bitsize. (A possible other option would be to store a tree type, but this would make the usage at the rtl level very cumbersome since types are rare.) Aside from the work, you would not get much push back. But the signedness is a different argument. At the rtl level, the signedness is a matter of context. (you could argue that this is a mistake and i would agree, but that is an even bigger change.) But more to the point, at the tree level, there are a surprising number of places where the operation desired does not follow the sign of the types that were used to construct the constants. Furthermore, not carrying the sign is more consistent with the double int code, which as you point out carries nothing.
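The OImode-shifted-by-250 example can be made concrete with a toy shift that works in blocks and respects the target precision rather than the host word size. This is a hedged sketch scaled down to 128 bits (two 64-bit blocks); `lshift_blocks` is a hypothetical stand-in for the wide-int shift, not GCC code:

```c
#include <stdint.h>

/* Left-shift a 128-bit value stored as two 64-bit blocks
   (val[0] low, val[1] high) by COUNT bits, then mask the result
   back to PREC bits.  The host word size never limits the target
   precision.  Assumes 0 <= COUNT < PREC <= 128.  */
static void
lshift_blocks (uint64_t val[2], unsigned int count, unsigned int prec)
{
  uint64_t lo = val[0], hi = val[1];
  if (count >= 64)
    {
      hi = lo << (count - 64);   /* old high bits shift out entirely */
      lo = 0;
    }
  else if (count > 0)
    {
      hi = (hi << count) | (lo >> (64 - count));
      lo = lo << count;
    }
  /* Truncate to the target precision.  */
  if (prec <= 64)
    {
      hi = 0;
      if (prec < 64)
        lo &= (UINT64_C (1) << prec) - 1;
    }
  else if (prec < 128)
    hi &= (UINT64_C (1) << (prec - 64)) - 1;
  val[0] = lo;
  val[1] = hi;
}
```

Shifting 1 left by 100 in a 128-bit precision lands the bit in the high block instead of collapsing to 0, which is exactly the behaviour the OImode << 250 case needs at 256 bits.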
As for splitting the patch into smaller pieces, i am all for it. I have done this twice already and i could get the const_scalar_int_p patch out quickly. But you do not get too far down that road before you are still left with a big patch. I could split out wide-int.* and just commit those files with no clients as a first step. My guess is that Richard Sandiford would appreciate that because, while he has carefully checked the rtl stuff, i think that the code inside wide-int is not in his comfort zone of things he would approve. As for your btw - noticed this last night. it is an artifact of the way i produced the patch and "responsible people have been sacked". However, it shows that you read the patch carefully, and i really appreciate that. i owe you a beer (not that you need another at this time of year). Kenny On 10/04/2012 08:48 AM, Richard Guenther wrote: On Wed, Oct 3, 2012 at 7:15 PM, Kenneth Zadeck wrote: The enclosed patch is the third of at least four patches that fix the problems associated with supporting integers on the target that are wider than two HOST_WIDE_INTs. While GCC claims to support OImode, and we have two public ports that make minor use of this mode, in practice, compilation that uses OImode commonly gives the wrong result or ICEs. We have a private port of GCC for an architecture that is further down the road to needing comprehensive OImode and we have discovered that this is unusable. We have decided to fix it in a general way so that it is most beneficial to the GCC community. It is our belief that we are just a little ahead of the X86 and the NEON and these patches will shortly be essential. The first two of these patches were primarily lexicographical and have already been committed. They transformed the uses of CONST_DOUBLE so that it is easy to tell what the intended usage is.
The underlying structures in the next two patches are very general: once they are added to the compiler, the compiler will be able to support targets with any size of integer from hosts of any size integer. The patch enclosed deals with the portable RTL parts of the compiler. The next patch, which is currently under construction, deals with the tree level. However, this patch can be put on the trunk as is, and it will alleviate many, but not all, of the current limitations in the rtl parts of the compiler. Some of the patch is conditional, depending on a port defining the symbol 'TARGET_SUPPORTS_WIDE_INT' to be non zero. Defining this symbol to be non zero is declaring that the port has been converted to use the new form of integer constants. However, the patch is completely backwards compatible to allow ports that do not need this immediately to convert at their leisure. The conversion process is not difficult, but it does require some knowledge of the port, so we are not volunteering to do this for all ports. OVERVIEW OF THE PATCH: The patch de
Re: patch to fix constant math
On 10/04/2012 09:17 AM, Marc Glisse wrote: On Wed, 3 Oct 2012, Mike Stump wrote: On Oct 3, 2012, at 1:47 PM, Marc Glisse wrote: did you consider making the size of wide_int a template parameter, now that we are using C++? All with a convenient typedef or macro so it doesn't show. I am asking because in vrp I do some arithmetic that requires 2*N+1 bits where N is the size of double_int. No, not really. I'd maybe answer it this way, we put in a type (singular) to support all integral constants in all languages on a port. Since we only needed 1, there was little need to templatize it. By supporting all integral constants in all languages, there is little need for more. If Ada, say, wanted a 2048 bit integer, then we just have it drop off the size it wants someplace and we would mix that in on a MAX(….) line, net result, the type we use would then directly support the needs of Ada. If vrp wanted 2x of all existing modes, we could simply change the MAX equation and essentially double it; if people need that. This comes at a cost, as the intermediate wide values are fixed size allocated (not variable); so these all would be larger. And this cost could be eliminated by having a template wide_int_ so only the places that need it actually use the extra size ;-) The space is not really an issue in most places since wide-ints tend to be short lived. i guess vrp is slightly different because it creates a lot at once. but then they go away. However the real question is what are you going to instantiate the template on? What we do is look at the target and determine the largest type that the target supports and build a wide int type that supports that. How are you going to do better? are you going to instantiate one for every type you see? are these going to be static or dynamic? The last line of this email seems to imply that you were planning to "know" that __int128 was the largest integer that any target or front end could support.
and then what do you do for the parts of the compiler that have operations that take things of two different types, like shift. The shift amount can be, and many times is, a shorter type than what is being shifted. Would these different length integers be represented with different instances from the same template? I am not a c++ programmer and so all of this is a little new to me, but given a perspective of the rest of the compiler, this does not seem like the right way to go. On Wed, 3 Oct 2012, Kenneth Zadeck wrote: i have already converted the vrp code, so i have some guess at where you are talking about. (of course correct me if i am wrong). the code that computes the range when two variables are multiplied together needs to do a multiplication that produces a result that is twice as wide as the inputs. Yes, exactly. my library is able to do that with one catch (and this is a big catch): the target has to have an integer mode that is twice as big as the mode of the operands. The issue is that wide-ints actually carry around the mode of the value in order to get the bitsize and precision of the operands (it does not have the type, because this code has to work on both the rtl and tree levels and i generally do not want the signedness anyway). my current code in vrp checks to see if such a mode exists and if it does, it produces the product. if the mode does not exist, it returns bottom. What this means is that for most (many or some) targets that have a TImode, the largest thing that vrp can discover ranges for is a DImode value. We could get around this by defining the next larger mode than what the target really needs but i wonder how much mileage you are going to get out of that with really large numbers. This will be for discussion when you submit that next patch, but currently VRP handles integers the same size as double_int. In particular, it handles __int128. I would be unhappy if introducing a larger bigint type in gcc made us regress there.
You are only happy now because you do not really understand the world around you. This is not what your code does. What your code does is that if the host is a 64 bit host you can handle __int128, and if your host is a 32 bit host you can handle an __int64. If you are building a cross compiler from a 32 bit host to a 64 bit target, your pass is either going to get the wrong answer, give up, or ICE. There are currently parts of gcc that do each of these three "solutions" and my patch gets rid of these because it does the math as the target does the math, no matter what the target is. The goal of my patch is to make gcc produce the same correct results no matter what types the target or host support. The last thing that we need is some optimization "knowing" what the limits of either of these are and hard coding that in a set of templates that have been statically instantiated.
Re: patch to fix constant math
Actually richi, this code is "correct" for some broken definition of correct. If all that is done is to convert the rtl parts of the compiler, then this code is the best you can do (of course an assertion that the length is not greater than 2 would be a useful addition). The follow-on patch converts the insides of a tree cst to look like a const wide int, i.e. an array of HWIs. When that happens, this code looks completely different. But if you only convert the rtl level, at some point there is going to be an impedance mismatch and it is buried here. I will point out that this is the fallout of trying to split things into a bunch of smaller patches that could in theory go in separately. kenny +/* Constructs tree in type TYPE from with value given by CST. Signedness + of CST is assumed to be the same as the signedness of TYPE. */ + +tree +wide_int_to_tree (tree type, const wide_int &cst) +{ + wide_int v; + if (TYPE_UNSIGNED (type)) +v = cst.zext (TYPE_PRECISION (type)); + else +v = cst.sext (TYPE_PRECISION (type)); + + return build_int_cst_wide (type, v.elt (0), v.elt (1)); +} is surely broken. A wide-int does not fit a double-int. How are you going to "fix" this? Thanks, Richard. kenny
Re: patch to fix constant math
On 10/04/2012 12:58 PM, Richard Guenther wrote: On Thu, Oct 4, 2012 at 3:55 PM, Kenneth Zadeck wrote: Let me talk about the mode here first. What this interface/patch provides is a facility where the constant math that is done in optimizations is done exactly the way that it would be done on the target machine. What we have now is a compiler that only does this if it is convenient to do on the host. I admit that i care about this more than others right now, but if intel adds a couple more instructions to their vector units, other people will start to really care about this issue. If you take an OImode value with the current compiler and left shift it by 250, the middle end will say that the result is 0. This is just wrong!!! What this means is that the bitsize and precision of the operations need to be carried along when doing math. when wide-int checks for overflow on the multiply or add, it is not checking if the value overflowed in two HWIs, it is checking if the add overflowed in the mode of the types that are represented on the target. When we do shift, we are not doing a shift within two HWIs, we are truncating the shift value (if this is appropriate) according to the bitsize and shifting according to the precision. I think that an argument could be made that storing the mode should be changed to an explicit precision and bitsize. (A possible other option would be to store a tree type, but this would make the usage at the rtl level very cumbersome since types are rare.) Aside from the work, you would not get much push back. But the signedness is a different argument. At the rtl level, the signedness is a matter of context. (you could argue that this is a mistake and i would agree, but that is an even bigger change.) But more to the point, at the tree level, there are a surprising number of places where the operation desired does not follow the sign of the types that were used to construct the constants.
Furthermore, not carrying the sign is more consistent with the double int code, which as you point out carries nothing. Well, on RTL the signedness is on the operation (you have sdiv and udiv, etc.). yes, there is a complete enough set of operations that allow you to specify the signedness where this matters. double-int tries to present a sign-less twos-complement entity of size 2 * HOST_BITS_PER_WIDE_INT. I think that is sensible and for obvious reasons should not change. Both tree and RTL rely on this. What we do not want is that up to TImode you get an internal representation done one way (twos-complement) and on OImode and larger you suddenly get subtly different behavior. That's a recipe for disaster. This is the main difference between double-int and wide-int. Wide int does the math the way the machine does it or the way the front end would expect it to be done. There is nothing about the host that is visible in the interfaces. I reiterate, our world is already bigger than 128 bits and the intel world is likely to be soon. Double int is stuck in a 64/128 bit world. these patches, which i admit are huge, are a way out of that box. I'd like to clean up the interface to double-int some more (now with the nice C++ stuff we have). double-int should be pure twos-complement, there should be no operations on double-ints that behave differently when done signed or unsigned, instead we have signed and unsigned versions of the operations (similar to how signedness is handled on the RTL level). With some trivial C++ fu you could have a double_sint and double_uint type that would get rid of the bool sign params we have to some functions (and then you could write double_sint >> n using operator notation). The problem is that size does matter. Wide int is effectively infinite precision twos complement. In practice, we can get by just looking at the bitsize and precision of the types/modes involved and this makes the implementation faster than true infinite precision.
I went down the road trying to fix all of the places where the compiler either ICEd or got the wrong answer. I showed this to Sandiford and he talked me out of it. He was right, it was a rat hole. It could have been a smaller patch, but there were places where it was clearly going to take monumental work just to be able to back out and say that you had nothing. The number of places in the compiler where you compare against the largest and smallest representation of an integer is not small and some of them are buried very deep down chains that were not designed to say "i cannot answer that question". I believe that i have all of the functionality of double int in wide int, it is just that the calls look different because there are not all of the interfaces that take two HWI's. As mentioned before, all of the places where the overflow is computed for the purpose of asking if this is ok in two hwi's is gone. I
Re: patch to fix constant math
There are a bunch of ways to skin the cat. 1) we can define the extra mode. 2) if we get rid of the mode inside the wide int and replace it with an explicit precision and bitsize, then we can just make the size of the buffer twice as big as the analysis of the modes indicates. 3) or we can leave your code in a form that uses 2 wide ints. my current patch (which i have not gotten working yet) changes this to use the mul_full call, but it could be changed. It is much simpler than the existing code. i do not see how templates offer any solution at all. the wide int code really needs to have something valid to indicate the length of the object, and if there is no mode that big, the code will ICE. my personal feeling is that range analysis is quite useful for small integers and not so much as the values get larger. The only really large integer constants that you are likely to find in real code are encryption keys, like the dvd decoding keys, and if the key is chosen well there should be no useful optimization that you can perform on the code. If this did not work on the largest modes, no one would ever notice. i.e. i would bet that you never make a useful transformation on any integer that would not fit in an int32. However, this is your pass, and i understand the principle of "never back down". Kenny On 10/04/2012 05:06 PM, Marc Glisse wrote: On Wed, 3 Oct 2012, Kenneth Zadeck wrote: i have already converted the vrp code, so i have some guess at where you are talking about. (of course correct me if i am wrong). the code that computes the range when two variables are multiplied together needs to do a multiplication that produces a result that is twice as wide as the inputs. my library is able to do that with one catch (and this is a big catch): the target has to have an integer mode that is twice as big as the mode of the operands.
The issue is that wide-ints actually carry around the mode of the value in order to get the bitsize and precision of the operands (it does not have the type, because this code has to work on both the rtl and tree levels and i generally do not want the signedness anyway). Ah, after reading the whole thread, now I understand that it is because wide_int carries a mode that it makes little sense making it a template (sorry that it took me so long when the information was in your first answer). I understand that it would be inconvenient (much longer code) to have a base_wide_int that does just the arithmetic and a wrapper that contains the mode as well. Your idea below to define dummy extra modes does bring the template idea back to the table though ;-) my current code in vrp checks to see if such a mode exists and if it does, it produces the product. if the mode does not exist, it returns bottom. What this means is that for most (many or some) targets that have a TImode, the largest thing that vrp can discover ranges for is a DImode value. We could get around this by defining the next larger mode than what the target really needs but i wonder how much mileage you are going to get out of that with really large numbers. The current wrapping multiplication code in vrp works with a pair of double_int, so it should keep working with a pair of wide_int. I see now why wide_int doesn't let us simplify the code, but it doesn't have to break.
Re: patch to fix constant math
richi, let me address a bunch of issues that are randomly spread through the thread. 1) unlike the double int's current relationship to int cst, we do not currently wrap a wide-int into a CONST_WIDE_INT nor (in the patch that you have not seen) do we wrap a wide-int into the int cst. wide-ints are designed to be short lived. With the exception of the vrp pass, they are almost always allocated on the stack and thrown away quickly. That way we can afford to grab a big buffer and play with it. The rep that lives in the CONST_WIDE_INT and the int cst is a garbage collected array of HWIs that is only large enough to hold the actual value. and it is almost always an array of length 1. This saves space and works because they are immutable. This means that there is almost no storage bloat by going with this representation no matter how large the widest integer is on the platform. 2) wide-int is really just a generalized version of double-int. The abi is different because it does not have all of those calls that take two HWIs explicitly and because i was building it all at once LOOKING BOTH AT THE TREE AND RTL levels, it has a richer set of calls. Aside from it carrying a mode around that it uses for precision and bitsize, it is really bare metal. My problem is that i am still working on getting the tree level stuff working, and phase 1 is closing soon and i wanted to get my foot in the door. 3) I think that everyone agrees that having CONST_DOUBLE and CONST_INT not carrying around modes was a big mistake. I toyed with the idea for a nanosecond of fixing that along with this, but as you point out the patches are already way too large and it would have meant that the burden on converting the ports would be larger.
BUT, as the note in the original email said, this patch moves significantly in the direction of being able to have a mode for the integers at the rtl level. It does this by no longer storing integer values in CONST_DOUBLEs, so there is no longer the test of VOIDmode to see what the CONST_DOUBLE contains. Also the patch contains a significant number of changes (in the machine independent rtl code) to the constructors that carry the mode, rather than using GEN_INT. 4) Having the mode (or at least the precision and bitsize) is crucial for the performance of wide-int. All of the interesting functions have quick out tests to use the bare metal operators if the precision fits a HWI. This way, 99% of the time, we are just doing C operators with a simple check. kenny On 10/05/2012 07:41 AM, Richard Guenther wrote: On Fri, Oct 5, 2012 at 1:24 PM, Richard Sandiford wrote: Richard Guenther writes: On Fri, Oct 5, 2012 at 11:55 AM, Richard Sandiford wrote: Richard Guenther writes: As far as the wide_ints recording a mode or precision goes: we're in the "lucky" position of having tried both options. Trees record the type (and thus precision) of all compile-time integer constants. RTL doesn't. And the RTL way is a right pain. It leads to interfaces in which rtx arguments often have to be accompanied by a mode. And it leads to clunky differences like those between simplify_unary_operation (which takes two mode arguments) and simplify_binary_operation (which with our current set of binary operations requires only one). But the issue here is not that double_int doesn't record a mode or precision (or a sign). The issue is that CONST_DOUBLE and CONST_INT don't! The _tree_ INTEGER_CST consists of a type and a double-int. I see double-int (and the proposed wide-int) as a common building-block used for kind of "arbitrary precision" (as far as the target needs) integer constants on both tree and RTL.
And having a common worker implementation requires it to operate on something different than tree types or RTL mode plus signedness. To put it another way: every wide_int operation needs to know the precision of its arguments and the precision of its result. That's not true. Every tree or RTL operation does, not every wide_int operation. double_int's are just twos-complement numbers of precision 2 * HOST_BITS_PER_WIDE_INT. wide_int's should be just twos-complement numbers of precision len * WHATEVER_HOST_COMPONENT_TYPE_IS_SUITABLE_FOR_A_FAST_IMPLEMENTATION. Operations on double_int and wide_int are bare-metal; nearly all of the time you should use routines that do proper sign-/zero-extending to a possibly smaller precision. When using bare-metal you are supposed to do that yourself. Which is why I was suggesting to add double_sint, double_uint, double_sintp (with precision), etc., wrappers and operations. Not storing the precision in the wide_int is putting the onus on the caller to keep track of the precision separately. But that's a matter of providing something high-level on top of the bare-metal double-int/wide-int (something shared between RTL and trees). Duplicating information in the bare-metal doesn't make sense (and no, INTE
Re: patch to fix constant math
On 10/05/2012 10:36 AM, Richard Guenther wrote: On Fri, Oct 5, 2012 at 4:14 PM, Richard Sandiford wrote: Richard Guenther writes: On Fri, Oct 5, 2012 at 3:18 PM, Richard Sandiford wrote: Richard Sandiford writes: How is CONST_WIDE_INT variable size? It's just the usual trailing variable-length array thing. Good. Do you get rid of CONST_DOUBLE (for integers) at the same time? Yeah. I initially thought it might be OK to keep them and have CONST_INT, integer CONST_DOUBLEs and CONST_WIDE_INT alongside each other. (The way the patch is structured means that the choice of whether to keep integer CONST_DOUBLEs can be changed very easily.) But Kenny convinced me it was a bad idea. Sorry to follow up on myself, but to clarify: I was talking about converted targets here. (As in, I originally thought even converted targets could continue to use integer CONST_DOUBLEs.) Unconverted targets continue to use CONST_DOUBLE. Why is it that not all targets are "converted"? What's the difficulty with that? I really do not like partially transitioning there. The problem is that CONST_DOUBLE as it exists today has two meanings: a floating-point meaning and an integer meaning. Ports that handle CONST_DOUBLEs are aware of this and expect the two things to have the same rtx code. Whereas in a converted port, the two things have different rtx codes, and the integers have a different representation from the current low/high pair. So to take the first hit in config/alpha/alpha.c, namely alpha_rtx_costs: case CONST_DOUBLE: if (x == CONST0_RTX (mode)) *total = 0; else if ((outer_code == PLUS && add_operand (x, VOIDmode)) || (outer_code == AND && and_operand (x, VOIDmode))) *total = 0; else if (add_operand (x, VOIDmode) || and_operand (x, VOIDmode)) *total = 2; else *total = COSTS_N_INSNS (2); return true; What could the patch do to make this work without modification? The middle two cases are for integers, but the first and last presumably apply to both integers and floats. 
I didn't say it does not require changes, just that the transition should be finished. Some ports have few CONST_DOUBLE uses (which is what I am grepping for), and if the max-wide-int-size matches that of the current CONST_DOUBLE there is little chance for code generation differences (in fact there should be none, correct?). In the current patch no target is converted (maybe that's just going to be 5/n?), I'd like to see at least one commonly tested target converted, and if we don't convert all targets, another commonly tested target to stay unconverted (just check gcc-testresults for which pair of targets would work). Definitely at the end of stage3 we should have zero unconverted targets (but I doubt converting them is a huge task - I have converted the VECTOR_CST representation as well and we've had the location + BLOCK merge and other changes affecting all targets). Richard. i will convert ppc if that is what it takes. david's office is 4 aisles away and mike has a lot of experience on ppc also. (this is unless richard is willing to do mips one afternoon.) I have done my two private ports but i understand that that should not count. Richard
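The "usual trailing variable-length array thing" that makes CONST_WIDE_INT variable size, together with the only-as-long-as-needed canonical length discussed elsewhere in the thread, can be sketched as follows. This is a toy layout under stated assumptions (64-bit HWIs, plain malloc); it is not GCC's actual rtx encoding, and all names are hypothetical:

```c
#include <stdint.h>
#include <stdlib.h>

/* A length field followed by a trailing array of blocks, allocated
   just large enough for the value.  */
typedef struct
{
  int len;           /* number of significant blocks */
  int64_t elt[];     /* blocks, least significant first */
} toy_wide_int;

/* Build a constant from LEN blocks, trimming top blocks that are
   just the sign extension of the block below, so a value that fits
   one block always ends up with len == 1.  */
static toy_wide_int *
toy_wide_int_create (const int64_t *blocks, int len)
{
  while (len > 1
         && blocks[len - 1] == (blocks[len - 2] < 0 ? -1 : 0))
    len--;
  toy_wide_int *w
    = (toy_wide_int *) malloc (sizeof (toy_wide_int)
                               + len * sizeof (int64_t));
  w->len = len;
  for (int i = 0; i < len; i++)
    w->elt[i] = blocks[i];
  return w;
}
```

Because the stored array is immutable and minimal, the "almost no storage bloat" claim holds however wide the widest target integer is: small values pay for one block regardless of the maximum width.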
Re: patch to fix constant math
richard s, there are two comments that i deferred to you that have the word richard in them. richi, thanks, i will start doing this now. kenny On 10/05/2012 09:49 AM, Richard Guenther wrote: On Fri, Oct 5, 2012 at 2:39 PM, Richard Guenther wrote: Ok, I see where you are going. Let me look at the patch again. * The introduction and use of CONST_SCALAR_INT_P could be split out (obvious and good) * DEF_RTL_EXPR(CONST_WIDE_INT, "const_wide_int", "", RTX_CONST_OBJ) defining that new RTX is orthogonal to providing the wide_int abstraction for operating on CONST_INT and CONST_DOUBLE, right? @@ -144,6 +144,7 @@ static bool prefer_and_bit_test (enum machine_mode mode, int bitnum) { bool speed_p; + wide_int mask = wide_int::set_bit_in_zero (bitnum, mode); set_bit_in_zero ... why use this instead of wide_int::zero (mode).set_bit (bitnum) that would match the old double_int_zero.set_bit interface and be more composed of primitives? adding something like this was just based on usage. We do this operation all over the place, why not make it concise and efficient. The api is very bottom up. I looked at what the clients were doing all over the place and added those functions. wide-int has both and_not and or_not. double-int only has and_not, but there are a lot of places where you do or_nots, so we have or_not also. if (and_test == 0) { @@ -164,8 +165,7 @@ prefer_and_bit_test (enum machine_mode m } /* Fill in the integers. */ - XEXP (and_test, 1) -= immed_double_int_const (double_int_zero.set_bit (bitnum), mode); + XEXP (and_test, 1) = immed_wide_int_const (mask); I suppose immed_wide_int_const may create a CONST_INT, a CONST_DOUBLE or a CONST_WIDE_INT? yes, on converted targets it builds const_ints and const_wide_ints and on unconverted targets it builds const_ints and const_doubles. The reason i did not want to convert the targets is that the code that lives in the targets generally wants to use the values to create constants and this code is very very very target specific.
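The `set_bit_in_zero` operation being debated above is small enough to write out in full. A minimal sketch over an array of 64-bit blocks, assuming a fixed four-block (256-bit) buffer; this is an illustration of the semantics, not wide-int's implementation:

```c
#include <stdint.h>
#include <string.h>

#define BLOCKS 4   /* enough blocks for a 256-bit value in this sketch */

/* Build the value 1 << BITNUM in an array of 64-bit blocks: exactly
   the composition "zero, then set one bit" that the two candidate
   interfaces spell differently.  BITNUM must be below 64 * BLOCKS.  */
static void
set_bit_in_zero (uint64_t val[BLOCKS], unsigned int bitnum)
{
  memset (val, 0, BLOCKS * sizeof (uint64_t));
  val[bitnum / 64] |= UINT64_C (1) << (bitnum % 64);
}
```

Whichever spelling wins, having a single fused primitive avoids materializing the all-zero value first, which matters when the operation appears "all over the place" as the reply says.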
+#if TARGET_SUPPORTS_WIDE_INT
+/* Returns 1 if OP is an operand that is a CONST_INT or CONST_WIDE_INT.  */
+int
+const_scalar_int_operand (rtx op, enum machine_mode mode)
+{

why is it necessary to condition anything on TARGET_SUPPORTS_WIDE_INT? It seems testing/compile coverage of this code will be bad ... Can't you simply map CONST_WIDE_INT to CONST_DOUBLE for targets

The accessors for the two are completely different. const_doubles "know" that there are exactly two hwi's. for const_wide_ints, you only know that the len is greater than 1. anything with len 1 would be CONST_INT. In a modern c++ world, const_int would be a subtype of const_wide_int, but that is a huge patch.

not supporting CONST_WIDE_INT (what does it mean to "support" CONST_WIDE_INT? Does a target that supports CONST_WIDE_INT no longer use CONST_DOUBLEs for integers?)

yes, that is exactly what it means.

+  if (!CONST_WIDE_INT_P (op))
+    return 0;

hm, that doesn't match the comment (CONST_INT is supposed to return 1 as well). The comment doesn't really tell what the function does it seems, I need some context here to reply.

+  int prec = GET_MODE_PRECISION (mode);
+  int bitsize = GET_MODE_BITSIZE (mode);
+
+  if (CONST_WIDE_INT_NUNITS (op) * HOST_BITS_PER_WIDE_INT > bitsize)
+    return 0;

its mode argument is not documented. And this doesn't even try to see if the constant would fit the mode anyway (like if it's zero). Not sure what the function is for.

I will upgrade the comments, they were inherited from the old version with the const_double changed to the const_wide_int

+  {
+    /* Multiword partial int.  */
+    HOST_WIDE_INT x
+      = CONST_WIDE_INT_ELT (op, CONST_WIDE_INT_NUNITS (op) - 1);
+    return (wide_int::sext (x, prec & (HOST_BITS_PER_WIDE_INT - 1))
+            == x);

err - so now we have wide_int with a mode but we pass in another mode and see if they have the same value? Same value as in signed value? Which probably is why you need to rule out different size constants above?
Because you don't know whether to sign or zero extend?

no, it is worse than that. I made the decision, which i think is correct, that we were not going to carry the mode inside const_wide_int. The justification was that we do not do it for const_int. There is a comment in the constructor for const_wide_int that says it would be so easy to just put this in. But, this is an old api that did not change. only the internals changed to support const_wide_int.

+/* Returns 1 if OP is an operand that is a CONST_WIDE_INT.  */
+int
+const_wide_int_operand (rtx op, enum machine_mode mode)
+{

similar comments, common code should be factored out at least. Instead of conditioning a whole set of new functions on supports-wide-int, please add cases to the existing functions to avoid diverging in pieces that both kinds of target support.
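The fits-the-mode test being debated above boils down to one check: the most significant HWI of a multiword constant must be a proper sign extension of the mode's leftover precision bits. A standalone sketch of that check (the function names and the 64-bit HWI are assumptions of this sketch, not GCC's actual code; the cast-then-shift idiom relies on arithmetic right shift of signed values, which holds on the hosts GCC supports):

```c
#include <assert.h>
#include <stdint.h>

#define HOST_BITS_PER_WIDE_INT 64
typedef int64_t HOST_WIDE_INT;

/* Sign-extend SRC from the low PREC bits (0 < PREC < 64).  */
static HOST_WIDE_INT
sext_hwi (HOST_WIDE_INT src, unsigned int prec)
{
  unsigned int shift = HOST_BITS_PER_WIDE_INT - prec;
  /* Shift up so the desired sign bit becomes the real sign bit,
     then arithmetic-shift back down.  */
  return (HOST_WIDE_INT) ((uint64_t) src << shift) >> shift;
}

/* Does TOP, the most significant element of a multiword constant,
   fit a mode of PREC bits?  Only the low PREC % 64 bits of TOP may
   carry information; the rest must be their sign extension.  */
static int
top_elt_fits_mode_p (HOST_WIDE_INT top, unsigned int prec)
{
  unsigned int rem = prec & (HOST_BITS_PER_WIDE_INT - 1);
  return rem == 0 || sext_hwi (top, rem) == top;
}
```

This mirrors the `sext_hwi (x, prec & (HOST_BITS_PER_WIDE_INT - 1)) == x` test in the recog.c hunk; the `rem == 0` guard stands in for the full-multiple-of-a-word case that the real predicate handles on a separate path.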
Re: patch to fix constant math
On 10/05/2012 01:29 PM, Richard Sandiford wrote:
>>+/* This is the maximal size of the buffer needed for dump.  */
>>+const int MAX = (MAX_BITSIZE_MODE_ANY_INT / 4
>>+                 + MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT + 32);
>>
>>I'd prefer a target macro for this, not anything based off
>>MAX_BITSIZE_MODE_ANY_INT just to allow no-op transition of a
>>target to wide-ints (which IMHO should be the default for all targets,
>>simplifying the patch).

Kenny didn't comment on this, but I disagree. Part of the process of "converting" a port is to wean it off any assumptions about the number of HWIs.

i also disagree. my patch is littered with places like this. consider:

@@ -5180,13 +4808,11 @@ static rtx
 simplify_immed_subreg (enum machine_mode outermode, rtx op,
                        enum machine_mode innermode, unsigned int byte)
 {
-  /* We support up to 512-bit values (for V8DFmode).  */
   enum {
-    max_bitsize = 512,
     value_bit = 8,
     value_mask = (1 << value_bit) - 1
   };
-  unsigned char value[max_bitsize / value_bit];
+  unsigned char value [MAX_BITSIZE_MODE_ANY_MODE/value_bit];
   int value_start;
   int i;
   int elem;

Would you want to have a target macro for this also? The code works no matter what. we do not need to be chasing this kind of problem all over the place. kenny
Re: patch to fix constant math - first small patch
this patch adds two groups of things to hwint.h that are needed for wide-int.[ch]. A new data type, the HOST_HALF_WIDE_INT (and all of it related macros). This type is defined to be exactly 1/2 the width of a HOST_WIDE_INT. This is used by the multiplication and division routines in wide-int.c. This is the type for the "digits" that are multiplied together or divided since you want to do the largest multiplication that you can do that yields a HOST_WIDE_INT as the result. The zext_hwi and sext_hwi are simple convenience functions on HWIs. These were moved from the wide-int.h in the original patch at the request of richi. ok for commit? Kenny 2012-10-5 Kenneth Zadeck * hwint.h (HOST_BITS_PER_HALF_WIDE_INT, HOST_HALF_WIDE_INT, HOST_HALF_WIDE_INT_PRINT, HOST_HALF_WIDE_INT_PRINT_C, HOST_HALF_WIDE_INT_PRINT_DEC, HOST_HALF_WIDE_INT_PRINT_DEC_C, HOST_HALF_WIDE_INT_PRINT_UNSIGNED, HOST_HALF_WIDE_INT_PRINT_HEX, HOST_HALF_WIDE_INT_PRINT_HEX_PURE): New symbols. (sext_hwi, zext_hwi): New functions. diff --git a/gcc/hwint.h b/gcc/hwint.h index ca47148..07c4748 100644 --- a/gcc/hwint.h +++ b/gcc/hwint.h @@ -77,6 +77,40 @@ extern char sizeof_long_long_must_be_8[sizeof(long long) == 8 ? 1 : -1]; # endif #endif +/* Print support for half a host wide int. 
*/ +#define HOST_BITS_PER_HALF_WIDE_INT (HOST_BITS_PER_WIDE_INT / 2) +#if HOST_BITS_PER_HALF_WIDE_INT == HOST_BITS_PER_LONG +# define HOST_HALF_WIDE_INT long +# define HOST_HALF_WIDE_INT_PRINT HOST_LONG_FORMAT +# define HOST_HALF_WIDE_INT_PRINT_C "L" +# define HOST_HALF_WIDE_INT_PRINT_DEC "%" HOST_HALF_WIDE_INT_PRINT "d" +# define HOST_HALF_WIDE_INT_PRINT_DEC_C HOST_HALF_WIDE_INT_PRINT_DEC HOST_HALF_WIDE_INT_PRINT_C +# define HOST_HALF_WIDE_INT_PRINT_UNSIGNED "%" HOST_HALF_WIDE_INT_PRINT "u" +# define HOST_HALF_WIDE_INT_PRINT_HEX "%#" HOST_HALF_WIDE_INT_PRINT "x" +# define HOST_HALF_WIDE_INT_PRINT_HEX_PURE "%" HOST_HALF_WIDE_INT_PRINT "x" +#elif HOST_BITS_PER_HALF_WIDE_INT == HOST_BITS_PER_INT +# define HOST_HALF_WIDE_INT int +# define HOST_HALF_WIDE_INT_PRINT "" +# define HOST_HALF_WIDE_INT_PRINT_C "" +# define HOST_HALF_WIDE_INT_PRINT_DEC "%" HOST_HALF_WIDE_INT_PRINT "d" +# define HOST_HALF_WIDE_INT_PRINT_DEC_C HOST_HALF_WIDE_INT_PRINT_DEC HOST_HALF_WIDE_INT_PRINT_C +# define HOST_HALF_WIDE_INT_PRINT_UNSIGNED "%" HOST_HALF_WIDE_INT_PRINT "u" +# define HOST_HALF_WIDE_INT_PRINT_HEX "%#" HOST_HALF_WIDE_INT_PRINT "x" +# define HOST_HALF_WIDE_INT_PRINT_HEX_PURE "%" HOST_HALF_WIDE_INT_PRINT "x" +#elif HOST_BITS_PER_HALF_WIDE_INT == HOST_BITS_PER_SHORT +# define HOST_HALF_WIDE_INT short +# define HOST_HALF_WIDE_INT_PRINT "h" +# define HOST_HALF_WIDE_INT_PRINT_C "" +# define HOST_HALF_WIDE_INT_PRINT_DEC "%" HOST_HALF_WIDE_INT_PRINT "d" +# define HOST_HALF_WIDE_INT_PRINT_DEC_C HOST_HALF_WIDE_INT_PRINT_DEC HOST_HALF_WIDE_INT_PRINT_C +# define HOST_HALF_WIDE_INT_PRINT_UNSIGNED "%" HOST_HALF_WIDE_INT_PRINT "u" +# define HOST_HALF_WIDE_INT_PRINT_HEX "%#" HOST_HALF_WIDE_INT_PRINT "x" +# define HOST_HALF_WIDE_INT_PRINT_HEX_PURE "%" HOST_HALF_WIDE_INT_PRINT "x" +#else +#error Please add support for HOST_HALF_WIDE_INT +#endif + + #define HOST_WIDE_INT_1 HOST_WIDE_INT_C(1) /* This is a magic identifier which allows GCC to figure out the type @@ -94,9 +128,13 @@ typedef 
HOST_WIDE_INT __gcc_host_wide_int__; # if HOST_BITS_PER_WIDE_INT == 64 # define HOST_WIDE_INT_PRINT_DOUBLE_HEX \ "0x%" HOST_LONG_FORMAT "x%016" HOST_LONG_FORMAT "x" +# define HOST_WIDE_INT_PRINT_PADDED_HEX \ + "%016" HOST_LONG_FORMAT "x" # else # define HOST_WIDE_INT_PRINT_DOUBLE_HEX \ "0x%" HOST_LONG_FORMAT "x%08" HOST_LONG_FORMAT "x" +# define HOST_WIDE_INT_PRINT_PADDED_HEX \ + "%08" HOST_LONG_FORMAT "x" # endif #else # define HOST_WIDE_INT_PRINT HOST_LONG_LONG_FORMAT @@ -104,6 +142,8 @@ typedef HOST_WIDE_INT __gcc_host_wide_int__; /* We can assume that 'long long' is at least 64 bits. */ # define HOST_WIDE_INT_PRINT_DOUBLE_HEX \ "0x%" HOST_LONG_LONG_FORMAT "x%016" HOST_LONG_LONG_FORMAT "x" +# define HOST_WIDE_INT_PRINT_PADDED_HEX \ +"%016" HOST_LONG_LONG_FORMAT "x" #endif /* HOST_BITS_PER_WIDE_INT == HOST_BITS_PER_LONG */ #define HOST_WIDE_INT_PRINT_DEC "%" HOST_WIDE_INT_PRINT "d" @@ -277,4 +317,32 @@ extern HOST_WIDE_INT pos_mul_hwi (HOST_WIDE_INT, HOST_WIDE_INT); extern HOST_WIDE_INT mul_hwi (HOST_WIDE_INT, HOST_WIDE_INT); extern HOST_WIDE_INT least_common_multiple (HOST_WIDE_INT, HOST_WIDE_INT);
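The sext_hwi and zext_hwi convenience functions described above can be sketched as follows — a minimal standalone version under the assumption of a 64-bit HWI. The guard against prec == HOST_BITS_PER_WIDE_INT avoids an undefined full-width shift, and the sign extension shifts through unsigned to stay clear of signed-overflow UB:

```c
#include <assert.h>
#include <stdint.h>

#define HOST_BITS_PER_WIDE_INT 64
typedef int64_t HOST_WIDE_INT;

/* Sign-extend SRC starting from the low PREC bits.  */
static HOST_WIDE_INT
sext_hwi (HOST_WIDE_INT src, unsigned int prec)
{
  if (prec == HOST_BITS_PER_WIDE_INT)
    return src;
  unsigned int shift = HOST_BITS_PER_WIDE_INT - prec;
  /* Move bit PREC-1 into the sign position, then arithmetic-shift
     back; right shift of a negative value is implementation-defined
     in ISO C but arithmetic on GCC's hosts.  */
  return (HOST_WIDE_INT) ((uint64_t) src << shift) >> shift;
}

/* Zero-extend SRC starting from the low PREC bits.  */
static uint64_t
zext_hwi (uint64_t src, unsigned int prec)
{
  if (prec == HOST_BITS_PER_WIDE_INT)
    return src;
  return src & (((uint64_t) 1 << prec) - 1);
}
```

For example, sext_hwi (0xFF, 8) yields -1 while zext_hwi of the same bit pattern yields 0xFF — exactly the distinction the recog.c predicate above depends on.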
Re: patch to fix constant math - second small patch
This patch adds machinery to genmodes.c so that the largest possible sizes of various data structures can be determined at gcc build time. These functions create 3 symbols that are available in insn-modes.h: MAX_BITSIZE_MODE_INT - the bitsize of the largest int. MAX_BITSIZE_MODE_PARTIAL_INT - the bitsize of the largest partial int. MAX_BITSIZE_MODE_ANY_INT - the largest bitsize of any kind of int.

Index: gcc/genmodes.c
===
--- gcc/genmodes.c (revision 191978)
+++ gcc/genmodes.c (working copy)
@@ -849,6 +849,38 @@ calc_wider_mode (void)
 #define print_closer() puts ("};")

+/* Compute the max bitsize of some of the classes of integers.  It may
+   be that there are needs for the other integer classes, and this
+   code is easy to extend.  */
+static void
+emit_max_int (void)
+{
+  unsigned int max, mmax;
+  struct mode_data *i;
+  int j;
+
+  puts ("");
+  for (max = 1, i = modes[MODE_INT]; i; i = i->next)
+    if (max < i->bytesize)
+      max = i->bytesize;
+  printf ("#define MAX_BITSIZE_MODE_INT %d*BITS_PER_UNIT\n", max);
+  mmax = max;
+  for (max = 1, i = modes[MODE_PARTIAL_INT]; i; i = i->next)
+    if (max < i->bytesize)
+      max = i->bytesize;
+  printf ("#define MAX_BITSIZE_MODE_PARTIAL_INT %d*BITS_PER_UNIT\n", max);
+  if (max > mmax)
+    mmax = max;
+  printf ("#define MAX_BITSIZE_MODE_ANY_INT %d*BITS_PER_UNIT\n", mmax);
+
+  mmax = 0;
+  for (j = 0; j < MAX_MODE_CLASS; j++)
+    for (i = modes[j]; i; i = i->next)
+      if (mmax < i->bytesize)
+        mmax = i->bytesize;
+  printf ("#define MAX_BITSIZE_MODE_ANY_MODE %d*BITS_PER_UNIT\n", mmax);
+}
+
 static void
 emit_insn_modes_h (void)
 {
@@ -913,6 +945,7 @@ enum machine_mode\n{");
 #endif
   printf ("#define CONST_MODE_IBIT%s\n", adj_ibit ? "" : " const");
   printf ("#define CONST_MODE_FBIT%s\n", adj_fbit ? "" : " const");
+  emit_max_int ();
   puts ("\
 \n\
 #endif /* insn-modes.h */");

2012-10-5 Kenneth Zadeck * genmodes.c (emit_max_int): New function. (emit_insn_modes_h): Added call to emit_max_int.
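The emit_max_int generator above is just a max-reduction over the mode table followed by a printf of the resulting #define. A toy mirror of that loop, writing into a buffer instead of stdout (the mode byte sizes here are hypothetical, not a real target's table):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy mode table: byte sizes of the integer modes a target might
   have (values hypothetical, e.g. QI..TI on a 64-bit target).  */
static const unsigned int int_mode_bytes[] = { 1, 2, 4, 8, 16 };

/* Mirror of genmodes' emit_max_int loop: find the widest mode and
   emit its bitsize as a preprocessor constant.  */
static void
emit_max_int (char *out, size_t outsz)
{
  unsigned int max = 1;
  for (size_t i = 0; i < sizeof int_mode_bytes / sizeof int_mode_bytes[0]; i++)
    if (max < int_mode_bytes[i])
      max = int_mode_bytes[i];
  snprintf (out, outsz, "#define MAX_BITSIZE_MODE_INT %u*BITS_PER_UNIT", max);
}
```

With the table above this emits a 16-byte (128-bit) maximum, which is the kind of constant the simplify_immed_subreg buffer sizing quoted earlier consumes.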
Re: patch to fix constant math - third small patch
This is the third patch in the series of patches to fix constant math. this one changes some predicates at the rtl level to use the new predicate CONST_SCALAR_INT_P. I did not include a few that were tightly intertwined with other changes. Not all of these changes are strictly mechanical. Richard, when reviewing this, had me make additional changes to remove what he thought were latent bugs at the rtl level. However, it appears that the bugs were not latent. I do not know what is going on here but i am smart enough to not look a gift horse in the mouth. All of this was done on the same machine with no changes and identical configs. It is an x86-64 with Ubuntu 12.04. ok for commit?

in the logs below, gbBaseline is a trunk from friday and the gbWide is the same revision but with my patches. Some of this like gfortran.dg/pr32627 is obviously flutter, but the rest does not appear to be.

=
heracles:~/gcc(13) gccBaseline/contrib/compare_tests gbBaseline/gcc/testsuite/gcc/gcc.log gbWide/gcc/testsuite/gcc/gcc.log
New tests that PASS:
gcc.dg/builtins-85.c scan-assembler mysnprintf
gcc.dg/builtins-85.c scan-assembler-not __chk_fail
gcc.dg/builtins-85.c (test for excess errors)
heracles:~/gcc(14) gccBaseline/contrib/compare_tests gbBaseline/gcc/testsuite/gfortran/gfortran.log gbWide/gcc/testsuite/gfortran/gfortran.log
New tests that PASS:
gfortran.dg/pr32627.f03 -O3 -fomit-frame-pointer -funroll-loops (test for excess errors)
gfortran.dg/pr32627.f03 -O3 -fomit-frame-pointer (test for excess errors)
gfortran.dg/pr32627.f03 -Os (test for excess errors)
gfortran.dg/pr32635.f -O0 execution test
gfortran.dg/pr32635.f -O0 (test for excess errors)
gfortran.dg/substr_6.f90 -O2 (test for excess errors)
Old tests that passed, that have disappeared: (Eeek!)
gfortran.dg/pr32627.f03 -O1 (test for excess errors)
gfortran.dg/pr32627.f03 -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions (test for excess errors)
gfortran.dg/pr32627.f03 -O3 -g (test for excess errors)
gfortran.dg/substring_equivalence.f90 -O (test for excess errors)
Using /home/zadeck/gcc/gccBaseline/gcc/testsuite/config/default.exp as tool-and-target-specific interface file.
=== g++ Summary ===
# of expected passes 49793
# of expected failures 284
# of unsupported tests 601
runtest completed at Fri Oct 5 16:10:20 2012
heracles:~/gcc(16) tail gbWide/gcc/testsuite/g++/g++.log
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /home/zadeck/gcc/gccWide/gcc/testsuite/config/default.exp as tool-and-target-specific interface file.
=== g++ Summary ===
# of expected passes 50472
# of expected failures 284
# of unsupported tests 613
runtest completed at Fri Oct 5 19:51:50 2012
Re: patch to fix constant math
On 10/07/2012 08:47 AM, Richard Guenther wrote: len seems redundant unless you want to optimize encoding. >>len == (precision + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT. > >that is exactly what we do. However, we are optimizing time, not space. >the value is always stored in compressed form, i.e the same representation >as is used inside the CONST_WIDE_INTs and INT_CSTs. this makes the >transformation between them very fast. And since we do this a lot, it >needs to be fast. So the len is the number of HWIs needed to represent the >value which is typically less than what would be implied by the precision. But doesn't this require a sign? Otherwise how do you encode TImode 0x? Would len be 1 here? (and talking about CONST_WIDE_INT vs CONST_INT, wouldn't CONST_INT be used anyway for all ints we can encode "small"? Or is it (hopefully) the goal to replace CONST_INT with CONST_WIDE_INT everywhere?) Or do you say wide_int is always "signed', thus TImode 0x needs len == 2 and an explicit zero upper HWI word? the compression of this has len ==2 with the explicit upper being 0. Or do you say wide_int is always "unsigned", thus TImode -1 needs len == 2? Noting that double-ints were supposed to be twos-complement (thus, 'unsigned') numbers having implicit non-zero bits sounds error-prone. That said - I don't see any value in optimizing storage for short-lived (as you say) wide-ints (apart from limiting it to the mode size). For example mutating operations on wide-ints are not really possible if you need to extend storage. the compression used is independent of whether it is sign or unsigned. but the compression "looks" signed. i.e. you do not represent the upper hwi if they would just be a sign extension of the top hwi that is represented. However, this does not imply anything about the math that was used to set those bits that way, and you can always go back to the full rep independent of the sign. 
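The compression rule described here — drop high HWIs that merely sign-extend the element below them, while keeping an explicit zero word when the element below looks negative — can be sketched as a standalone helper (hypothetical name; the real canonization lives in the wide-int code):

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t HOST_WIDE_INT;

/* Given the full LEN-element representation of a value in VAL
   (least significant element first), return the compressed length:
   a top element is dropped when it is exactly the sign extension
   (all zeros or all ones) of the element below it.  The result is
   always at least 1.  */
static unsigned int
canonical_len (const HOST_WIDE_INT *val, unsigned int len)
{
  while (len > 1
         && val[len - 1] == (val[len - 2] < 0 ? (HOST_WIDE_INT) -1 : 0))
    len--;
  return len;
}
```

This reproduces the cases in the discussion: a zero-extended all-ones low word keeps its explicit zero upper word (len stays 2), while a genuinely sign-extended value compresses to len 1 — and either way the full representation can be rebuilt without knowing the signedness of the math that produced it.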
I do not have any long term plans to merge CONST_INT into CONST_WIDE_INT. It would be a huge change in the ports and would be fairly space inefficient. My guess is that in real code, less than 1% of real constants will have a length greater than 1 even on a 32 bit host. CONST_INT is very space efficient. This could have been mentioned as part of Richard's response to your comment about the way we represent the CONST_WIDE_INTs. In practice, there are almost none of them anyway. In fact, you could argue that the tree level did it wrong (not that i am suggesting to change this). But it makes me think what was going on when the decision to make TYPE_PRECISION be an INT_CST rather than just a HWI was made. For that there is an implication that it could never take more than a HWI since no place in the code even checks TREE_INT_CST_HIGH for these. >>+ enum Op { >>+NONE, >> >>we don't have sth like this in double-int.h, why was the double-int >>mechanism >>not viable? > >i have chosen to use enums for things rather than passing booleans. But it's bad to use an enum with 4 values for a thing that can only possibly expect two. You cannot optimize tests as easily. Please retain the bool uns parameters instead. I am breaking it into two enums.
Re: patch to fix constant math
On 10/07/2012 09:19 AM, Richard Guenther wrote:
>In fact, you could argue that the tree level did it wrong (not that i am
>suggesting to change this). But it makes me think what was going on when
>the decision to make TYPE_PRECISION be an INT_CST rather than just a HWI was
>made. For that there is an implication that it could never take more than
>a HWI since no place in the code even checks TREE_INT_CST_HIGH for these.

Well - on the tree level we now always have two HWIs for all INTEGER_CSTs. If we can, based on the size of the underlying mode, reduce that to one HWI we already win something. If we add an explicit length to allow a smaller encoding for larger modes (tree_base conveniently has an available 'int' for this ...) then we'd win in more cases. Thus, is CONST_INT really necessarily better than optimized CONST_WIDE_INT storage?

i have to admit that looking at these data structures gives me a headache. This all looks like something that Rube Goldberg would have invented had he done object oriented design (Richard S did not know who Rube Goldberg was when i mentioned this name to him a few years ago since this is an American thing, but the british had their own equivalent and I assume the germans do too.). i did the first cut of changing the rtl level structure and Richard S threw up on it and suggested what is there now, which happily (for me) i was able to get mike to implement. mike also did the tree level version of the data structures for me. i will make sure he used that leftover length field. The bottom line is that you most likely just save the length, but that is a big percent of something this small. Both of the rtl ints have a mode, so if we can make that change later, it will be space neutral.
Re: patch to fix constant math - third small patch
yes, my bad. here it is with the patches. On 10/06/2012 11:55 AM, Kenneth Zadeck wrote: This is the third patch in the series of patches to fix constant math. this one changes some predicates at the rtl level to use the new predicate CONST_SCALAR_INT_P. I did not include a few that were tightly intertwined with other changes. Not all of these changes are strictly mechanical. Richard, when reviewing this, had me make additional changes to remove what he thought were latent bugs at the rtl level. However, it appears that the bugs were not latent. I do not know what is going on here but i am smart enough to not look a gift horse in the mouth. All of this was done on the same machine with no changes and identical configs. It is an x86-64 with Ubuntu 12.04. ok for commit?

in the logs below, gbBaseline is a trunk from friday and the gbWide is the same revision but with my patches. Some of this like gfortran.dg/pr32627 is obviously flutter, but the rest does not appear to be.

=
heracles:~/gcc(13) gccBaseline/contrib/compare_tests gbBaseline/gcc/testsuite/gcc/gcc.log gbWide/gcc/testsuite/gcc/gcc.log
New tests that PASS:
gcc.dg/builtins-85.c scan-assembler mysnprintf
gcc.dg/builtins-85.c scan-assembler-not __chk_fail
gcc.dg/builtins-85.c (test for excess errors)
heracles:~/gcc(14) gccBaseline/contrib/compare_tests gbBaseline/gcc/testsuite/gfortran/gfortran.log gbWide/gcc/testsuite/gfortran/gfortran.log
New tests that PASS:
gfortran.dg/pr32627.f03 -O3 -fomit-frame-pointer -funroll-loops (test for excess errors)
gfortran.dg/pr32627.f03 -O3 -fomit-frame-pointer (test for excess errors)
gfortran.dg/pr32627.f03 -Os (test for excess errors)
gfortran.dg/pr32635.f -O0 execution test
gfortran.dg/pr32635.f -O0 (test for excess errors)
gfortran.dg/substr_6.f90 -O2 (test for excess errors)
Old tests that passed, that have disappeared: (Eeek!)
gfortran.dg/pr32627.f03 -O1 (test for excess errors)
gfortran.dg/pr32627.f03 -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions (test for excess errors)
gfortran.dg/pr32627.f03 -O3 -g (test for excess errors)
gfortran.dg/substring_equivalence.f90 -O (test for excess errors)
Using /home/zadeck/gcc/gccBaseline/gcc/testsuite/config/default.exp as tool-and-target-specific interface file.
=== g++ Summary ===
# of expected passes 49793
# of expected failures 284
# of unsupported tests 601
runtest completed at Fri Oct 5 16:10:20 2012
heracles:~/gcc(16) tail gbWide/gcc/testsuite/g++/g++.log
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /home/zadeck/gcc/gccWide/gcc/testsuite/config/default.exp as tool-and-target-specific interface file.
=== g++ Summary ===
# of expected passes 50472
# of expected failures 284
# of unsupported tests 613
runtest completed at Fri Oct 5 19:51:50 2012

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 299150e..0404605 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -3633,9 +3633,8 @@ expand_debug_locations (void)
 gcc_assert (mode == GET_MODE (val)
 || (GET_MODE (val) == VOIDmode
- && (CONST_INT_P (val)
+ && (CONST_SCALAR_INT_P (val)
 || GET_CODE (val) == CONST_FIXED
-|| CONST_DOUBLE_AS_INT_P (val)
 || GET_CODE (val) == LABEL_REF)));
 }
diff --git a/gcc/combine.c b/gcc/combine.c
index 4e0a579..b531305 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -2617,16 +2617,19 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int *new_direct_jump_p,
 constant.
*/ if (i1 == 0 && (temp = single_set (i2)) != 0 - && (CONST_INT_P (SET_SRC (temp)) - || CONST_DOUBLE_AS_INT_P (SET_SRC (temp))) + && CONST_SCALAR_INT_P (SET_SRC (temp)) && GET_CODE (PATTERN (i3)) == SET - && (CONST_INT_P (SET_SRC (PATTERN (i3))) - || CONST_DOUBLE_AS_INT_P (SET_SRC (PATTERN (i3 + && CONST_SCALAR_INT_P (SET_SRC (PATTERN (i3))) && reg_subword_p (SET_DEST (PATTERN (i3)), SET_DEST (temp))) { rtx dest = SET_DEST (PATTERN (i3)); int offset = -1; int width = 0; + + /* There are not explicit tests to make sure that this is not a + float, but there is code here that would not be correct if it + was. */ + gcc_assert (GET_MODE_CLASS (GET_MODE (SET_SRC (temp))) != MODE_FLOAT); if (GET_CODE (dest) == ZERO_EXTRACT) { @@ -5102,8 +5105,7 @@ subst (rtx x, rtx from, rtx to, int in_dest, int in_cond, int unique_copy) if (GET_CODE (new_rtx) == CLOBBER && XEXP (new_rtx, 0) == const0_rtx) return new_rtx; - if (GET_CODE (x) == SUBREG - && (CONST_INT_P (new_rtx) || CONST_DOUBLE_AS_INT_P (new_rtx))) + if (GET_CODE (x) == SUBREG && CONST_SCALAR_INT_P (new_rtx)) { enum machine_mode mode = GET_MODE (x); @@ -7133,7 +7135,7 @@ make_extraction (
Re: patch to fix constant math - first small patch
is this ok to commit with this change? kenny On 10/05/2012 08:14 PM, Joseph S. Myers wrote: On Fri, 5 Oct 2012, Kenneth Zadeck wrote: +# define HOST_HALF_WIDE_INT_PRINT "h" This may cause problems on hosts not supporting %hd (MinGW?), and there's no real need for using "h" here given the promotion of short to int; you can just use "" (rather than e.g. needing special handling in xm-mingw32.h like is done for HOST_LONG_LONG_FORMAT).
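Joseph's point rests on C's default argument promotions: a short passed through varargs arrives as an int, so a plain "%d" already prints it correctly and the "h" modifier buys nothing while tripping up hosts whose printf lacks %hd. A small illustration (the helper name is this sketch's own, not a GCC function):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a short with a plain "%d".  The default argument promotions
   convert a short to int before it reaches the varargs of snprintf,
   so no "h" length modifier is required -- which sidesteps hosts
   (e.g. older MinGW runtimes) whose printf does not support %hd.  */
static int
format_short (char *buf, size_t n, short s)
{
  return snprintf (buf, n, "%d", s);
}
```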
Re: wide int patch #6: Replacement of hwi extraction from int-csts.
On 10/18/2012 06:22 AM, Richard Biener wrote: On Wed, Oct 17, 2012 at 11:47 PM, Kenneth Zadeck wrote: Richi, I apologize for the size of this patch, but it only does one very small thing, it is just that it does it all over the middle end. This patch introduces a new api for extracting a signed or unsigned hwi from an integer cst. There had been an abi for doing this, but it has some problems that were fixed here, and it was only used sometimes. The new abi consists of 6 functions, three for testing if the constant in the int cst will fit and three for actually pulling out the value. The ones for testing are tree_fits_uhwi_p, tree_fits_shwi_p, tree_fits_hwi_p. The first two of these are unsigned and signed versions, and the second one takes a boolean parameter which is true if the value is positive. This replaces the host_integerp which basically has the same functionality of tree_fits_hwi_p. The reason for changing this is that there were about 400 calls to host_integerp and all but 4 of them took a constant parameter. These names are chosen to be similar to the corresponding functions in wide-int and are more mnemonic than the existing name since they use the more readable u and s prefixes that a lot of other places do. On the accessing side, there is tree_to_uhwi, tree_to_shwi, and tree_to_hwi. The first two do checking when checking is enabled. The last does no checking.

Just a quick note here - the changelog mentions tree_low_cst (as new function!?) but not host_integerp. You should probably get rid of both, otherwise uses will creep back as people are familiar with them (I'm not sure these changes for consistency are always good ...)

i will fix this.

I don't like that tree_to_hwi does not do any checking. In fact I don't like that it _exists_, after all it has a return type whose signedness does not magically change. Unchecked conversions should use TREE_LOW_CST.

the idea is that when wide-int goes in, there is actually no TREE_INT_CST_LOW.
The concept of low and high goes out the door. the int-cst will have an array in it that is big enough to hold the value. so tree_to_hwi becomes a shorthand for just accessing the lower element of the array. you could argue that you should say tree_fits_uhwi_p followed by tree_to_uhwi (and the same for signed). This is an easy fix. it just seemed a little redundant.

I should also point out that about 2/3 of this patch is going to die as the rest of the wide int stuff goes in. But i did not want to submit a patch that only converted 1/3 of the cases. The truth is that most of these places where you are doing this conversion are just because the people were too lazy to do the math at the full precision of the double int.

Thus, my 2 cents before I really look at the patch (which will likely be next week only, so maybe you can do a followup with the above suggestions). Thanks, Richard.

it is expected that, normally, an unchecked access follows an explicit check, or, if there is no explicit check, one of the checked accessors is used instead. This is not the case for places that just want to look at the bottom bits of some large number, such as several places that compute alignment. These just use the naked unchecked access. There were a lot of places that either did no checking or did the checking inline. This patch tracks down all of those places and they now use the same abi. There are three places where i found what appear to be bugs in the code. These are places where the existing code did an unsigned check and then followed it with a signed access (or vice versa). These places are marked with comments that contain the string "THE NEXT LINE IS POSSIBLY WRONG". With some guidance from the reviewer, i will fix these and remove the unacceptable comments. Aside from that, this patch is very very boring because it just makes this one transformation. Look for interesting stuff in tree.h. Everything else is just forcing everywhere to use a single abi.
This patch could go on with a little work without the other 5 patches or it can wait for them. kenny
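The check-then-extract idiom the new api encourages can be illustrated with a toy stand-in for an INTEGER_CST. The struct and its fitting rules here are a simplified model of the array-of-HWIs representation described above, not GCC's actual tree layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t HOST_WIDE_INT;

/* Toy INTEGER_CST: up to two HWIs, low element first, with the
   sign-extended compressed representation (LEN is 1 or 2).  */
typedef struct
{
  HOST_WIDE_INT val[2];
  unsigned int len;
} int_cst;

/* Does the constant fit an unsigned HOST_WIDE_INT?  Either a single
   non-negative element, or two elements whose top is an explicit
   zero word marking a zero-extended value.  */
static bool
tree_fits_uhwi_p (const int_cst *t)
{
  return (t->len == 1 && t->val[0] >= 0)
         || (t->len == 2 && t->val[1] == 0);
}

/* Extract the value; callers are expected to have checked first,
   mirroring the checked accessors in the patch.  */
static uint64_t
tree_to_uhwi (const int_cst *t)
{
  assert (tree_fits_uhwi_p (t));
  return (uint64_t) t->val[0];
}
```

The point of the pattern is that the fits predicate, not the accessor's return type, decides signedness — which is exactly the distinction the thread argues tree_to_hwi blurs.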
Re: wide int patch #6: Replacement of hwi extraction from int-csts.
you know richi, i did not know who i was actually talking to. i said who is this richard beiner person and then i saw the email address. On 10/18/2012 08:58 AM, Richard Biener wrote: On Thu, Oct 18, 2012 at 2:52 PM, Kenneth Zadeck wrote: On 10/18/2012 06:22 AM, Richard Biener wrote: On Wed, Oct 17, 2012 at 11:47 PM, Kenneth Zadeck wrote: Richi, I apologize for the size of this patch, but it only does one very small thing, it is just that it does it all over the middle end. This patch introduces a new api for extracting a signed or unsigned hwi from an integer cst. There had been an abi for doing this, but it has some problems that were fixed here, and it was only used sometimes. The new abi consists of 6 functions, three for testing if the constant in the int cst will fit and three for actually pulling out the value. The ones for testing are tree_fits_uhwi_p, tree_fits_shwi_p, tree_fits_hwi_p. The first two of these are unsigned and signed versions, and the second one takes a boolean parameter which is true if the value is positive. This replaces the host_integerp which basically has the same functionality of tree_fits_hwi_p. The reason for changing this is that there were about 400 calls to host_integerp and all but 4 of them took a constant parameter. These names are chosen to be similar to the corresponding functions in wide-int and are more mnemonic than the existing name since they use the more readable u and s prefixes that a lot of other places do. On the accessing side, there is tree_to_uhwi, tree_to_shwi, and tree_to_hwi. The first two do checking when checking is enabled. The last does no checking.

Just a quick note here - the changelog mentions tree_low_cst (as new function!?) but not host_integerp. You should probably get rid of both, otherwise uses will creep back as people are familiar with them (I'm not sure these changes for consistency are always good ...)

i will fix this. these are bugs in the changelog, not the code. new changelog included.
I don't like that tree_to_hwi does not do any checking. In fact I don't like that it _exists_, after all it has a return type whose signedness does not magically change. Unchecked conversions should use TREE_LOW_CST.

the idea is that when wide-int goes in, there is actually no TREE_INT_CST_LOW. The concept of low and high goes out the door. the int-cst will have an array in it that is big enough to hold the value. so tree_to_hwi becomes a shorthand for just accessing the lower element of the array. you could argue that you should say tree_fits_uhwi_p followed by tree_to_uhwi (and the same for signed). This is an easy fix. it just seemed a little redundant.

Well, if you want raw access to the lower element (when do you actually want that, when not in wide-int.c/h?) ... you still need to care about the signedness of the result. And tree_fits_uhwi_p does not return the same as tree_fits_shwi_p all the time. I don't see any goodness in tree_to_hwi nor tree_fits_hwi really. Because if you just access the lower word then that still has a sign (either HOST_WIDE_INT or unsigned HOST_WIDE_INT). We should get rid of those places - can you enumerate them? I think you said it was just a few callers with variable signedness argument.

Note that tree_fits_hwi_p does check. it just takes a parameter to say if it wants signed or unsigned checking (it is the old host_integerp, repackaged). You really do need this function as it is for the 4 or 5 places it is called. The parameter to it is typically, but not always, the sign of the type of the int cst being passed to it. it is the tree_to_hwi that is unchecked. Most of the places are identified with comments. This survives the changelog. (i happen to be in the group of people that think changelogs are useless, and that we should do a better job of commenting the code.) I do not know if this is sloppiness or not, but the signedness that is checked rarely matches the sign of the variable to which the value is assigned.
I found this quite frustrating when i was making the patch but this kind of thing is common in code where the original writer "knew what (s)he was doing." Unless you are doing comparisons or shifting, the signedness of the target does not really make much difference. if you want me to change the sequences of explicit checking and unchecked access to explicit checking followed by a checked access, then i am happy to do this. kenny

Richard. I should also point out that about 2/3 of this patch is going to die as the rest of the wide int stuff goes in. But i did not want to submit a patch that only converted 1/3 of the cases. The truth is that most of these places where you are doing this conversion are just because the people were too lazy to do the math at the full precision of the double int. Thus, my 2 cents before I really look at the patch (which will likely be next week only, so maybe you can do a followup with the above suggestions).
Re: wide int patch #7: Replacement of INT_CST_LT and INT_CST_LT_UNSIGNED
   (((unsigned HOST_WIDE_INT) TREE_INT_CST_HIGH (A)		\
-    < (unsigned HOST_WIDE_INT) TREE_INT_CST_HIGH (B))		\
-   || (((unsigned HOST_WIDE_INT) TREE_INT_CST_HIGH (A)		\
-	== (unsigned HOST_WIDE_INT) TREE_INT_CST_HIGH (B))	\
-       && TREE_INT_CST_LOW (A) < TREE_INT_CST_LOW (B)))
-
 struct GTY(()) tree_int_cst {
   struct tree_typed typed;
   double_int int_cst;

@@ -5689,6 +5677,9 @@
 extern const_tree strip_invariant_refs (const_tree);
 extern tree lhd_gcc_personality (void);
 extern void assign_assembler_name_if_neeeded (tree);
 extern void warn_deprecated_use (tree, tree);
+extern HOST_WIDE_INT int_cst_sign_mask (const_tree, int);
+extern bool tree_int_cst_lts_p (const_tree, const_tree);
+extern bool tree_int_cst_ltu_p (const_tree, const_tree);

 /* In cgraph.c */

2012-10-18  Kenneth Zadeck

	* c-family/c-common.c (shorten_compare): Now uses tree_int_cst_ltu_p
	and tree_int_cst_lts_p.
	* cp/call.c (type_passed_as, convert_for_arg_passing): Ditto.
	* cp/class.c (walk_subobject_offsets, walk_subobject_offsets,
	end_of_class, include_empty_classes, layout_class_type): Ditto.
	* cp/decl.c (compute_array_index_type): Ditto.
	* fold-const.c (fold_relational_const): Ditto.
	* java/expr.c (build_field_ref): Ditto.
	* targhooks.c (default_cxx_get_cookie_size): Ditto.
	* tree-ssa-uninit.c (is_value_included_in): Ditto.
	* tree-vrp.c (operand_less_p): Ditto.
	* tree.c (tree_int_cst_lt): Ditto.
	(int_cst_sign_mask, tree_int_cst_lts_p)
	(tree_int_cst_ltu_p): New functions.
	* tree.h (INT_CST_LT, INT_CST_LT_UNSIGNED): Remove.
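The semantics of the removed INT_CST_LT / INT_CST_LT_UNSIGNED macros can be sketched on a stand-in two-word constant (hypothetical types, not GCC's): the high words compare with a *forced* signedness, ignoring any type information, while the low words always compare unsigned.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a two-word INTEGER_CST: low word is raw
// bits (compared unsigned), high word carries the forced sign choice.
struct tcst { uint64_t low; int64_t high; };

// INT_CST_LT_UNSIGNED analogue: high words compared as unsigned.
static bool lt_unsigned (const tcst &a, const tcst &b)
{
  return (uint64_t) a.high < (uint64_t) b.high
	 || ((uint64_t) a.high == (uint64_t) b.high && a.low < b.low);
}

// INT_CST_LT analogue: high words compared as signed.
static bool lt_signed (const tcst &a, const tcst &b)
{
  return a.high < b.high
	 || (a.high == b.high && a.low < b.low);
}
```

For the all-ones value the two orderings diverge: signed comparison puts it below zero, unsigned puts it above, which is why a type-directed tree_int_cst_lt is not a drop-in replacement here.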
Re: wide int patch #7: Replacement of INT_CST_LT and INT_CST_LT_UNSIGNED
On 10/19/2012 04:22 AM, Richard Biener wrote:

On Thu, Oct 18, 2012 at 7:32 PM, Kenneth Zadeck wrote:

This patch replaces all instances of INT_CST_LT and INT_CST_LT_UNSIGNED with the new functions tree_int_cst_lts_p and tree_int_cst_ltu_p. With the new implementation of int_cst these functions will be too big to do inline.

These new function names are extremely confusing given that we already have tree_int_cst_lt which does the right thing based on the signedness of the INTEGER_CST trees. The whole point of the macros was to be inlined and you break that. That is, for example

       if (unsignedp && unsignedp0)
	 {
-	   min_gt = INT_CST_LT_UNSIGNED (primop1, minval);
-	   max_gt = INT_CST_LT_UNSIGNED (primop1, maxval);
-	   min_lt = INT_CST_LT_UNSIGNED (minval, primop1);
-	   max_lt = INT_CST_LT_UNSIGNED (maxval, primop1);
+	   min_gt = tree_int_cst_ltu_p (primop1, minval);
+	   max_gt = tree_int_cst_ltu_p (primop1, maxval);
+	   min_lt = tree_int_cst_ltu_p (minval, primop1);
+	   max_lt = tree_int_cst_ltu_p (maxval, primop1);
	 }
       else
	 {
-	   min_gt = INT_CST_LT (primop1, minval);
-	   max_gt = INT_CST_LT (primop1, maxval);
-	   min_lt = INT_CST_LT (minval, primop1);
-	   max_lt = INT_CST_LT (maxval, primop1);
+	   min_gt = tree_int_cst_lts_p (primop1, minval);

... could have just been

	   min_gt = tree_int_cst_lt (primop1, minval);

without any sign check. So if you think you need to kill the inlined variants please use the existing tree_int_cst_lt instead.

no, they could not have. tree_int_cst_lt uses the signedness of the type to determine how to do the comparison. These two functions, as the macros they replace, force the comparison to be done independent of the signedness of the type. I do not know why we need to do this. I am just applying a plug-compatible replacement here. I did not write this code, but I do not think that i can just do as you say here.

Kenny

Thanks, Richard.

This is a small patch that has no prerequisites. Tested on x86-64.

kenny
Re: wide int patch #7: Replacement of INT_CST_LT and INT_CST_LT_UNSIGNED
On 10/19/2012 07:13 AM, Richard Biener wrote: On Fri, Oct 19, 2012 at 12:59 PM, Kenneth Zadeck wrote: On 10/19/2012 04:22 AM, Richard Biener wrote: On Thu, Oct 18, 2012 at 7:32 PM, Kenneth Zadeck wrote: This patch replaces all instances of INT_CST_LT and INT_CST_LT_UNSIGNED with the new functions tree_int_cst_lts_p and tree_int_cst_ltu_p. With the new implementation of int_cst these functions will be too big to do inline. These new function names are extremely confusing given that we already have tree_int_cst_lt which does the right thing based on the signedness of the INTEGER_CST trees. The whole point of the macros was to be inlined and you break that. That is, for example if (unsignedp && unsignedp0) { - min_gt = INT_CST_LT_UNSIGNED (primop1, minval); - max_gt = INT_CST_LT_UNSIGNED (primop1, maxval); - min_lt = INT_CST_LT_UNSIGNED (minval, primop1); - max_lt = INT_CST_LT_UNSIGNED (maxval, primop1); + min_gt = tree_int_cst_ltu_p (primop1, minval); + max_gt = tree_int_cst_ltu_p (primop1, maxval); + min_lt = tree_int_cst_ltu_p (minval, primop1); + max_lt = tree_int_cst_ltu_p (maxval, primop1); } else { - min_gt = INT_CST_LT (primop1, minval); - max_gt = INT_CST_LT (primop1, maxval); - min_lt = INT_CST_LT (minval, primop1); - max_lt = INT_CST_LT (maxval, primop1); + min_gt = tree_int_cst_lts_p (primop1, minval); ... could have just been min_gt = tree_int_cst_lt (primop1, minval); without any sign check. So if you think you need to kill the inlined variants please use the existing tree_int_cst_lt instead. no, they could not have. tree_int_cst_lt uses the signedness of the type to determine how to do the comparison.These two functions, as the macros they replace, force the comparison to be done independent of the signedness of the type. Well, looking at the surrounding code it's indeed non-obvious that this would be a 1:1 transform. But then they should not compare _trees_ but instead compare double-ints (or wide-ints). 
That said, I still think having a tree_int_cst_lt[us]_p function is wrong. tree INTEGER_CSTs have a sign. (apart from that opinion we have tree_int_cst_lt and you introduce tree_int_cst_ltu_p - consistent would be tree_int_cst_ltu).

This reply applies just as much to this patch as patch 6. I morally agree with you 100%. But the code does not agree with you. On patch 6, there are about 450 places where we want to take the lower hwi worth of bits out of an int cst. Of those, only 5 places use the function that allows the signedness to be passed in. The rest make the signedness decision based on the local code and completely ignore any information in the type. Of those 5 that do allow the signedness to be passed in, only three of them actually pass it in based on the signedness of the variable they are accessing.

I am sure that a lot of these are wrong. But i could not tell you which are and which are not. luckily, a lot of this will go away with the full wide-int code because i just do most of this math in the full precision so the issue never comes up. But after i am finished, there will still be a fair number of places that do this. (luckily, a large number of them are pulling the number out and comparing it to the precision of something, so this is likely to be harmless no matter how the code is written). But to a large extent, you are shooting the messenger here, and not the person who committed the crime. I will be happy to add some comments to point the clients of these to the one that looks at the type.

In looking over the patch, the only obvious ones that could be changed are the ones in tree-ssa-uninit.c and tree-vrp.c. The one in tree-vrp.c just looks like the person writing the code did not know about tree_int_cst_lt and wrote the check out himself. (i will fix this in the tree-vrp patch that i am working on now.) The one in tree-ssa-uninit.c looks correct.
But beyond that, the rest are in the front ends and so i think that this is as good as you get out of me.

Kenny

I do not know why we need to do this. I am just applying a plug-compatible replacement here. I did not write this code, but I do not think that i can just do as you say here.

So use the double-int interface in the places you substituted your new tree predicates. Yes, you'll have to touch that again when converting to wide-int - but if those places really want to ignore the sign of the tree they should not use a tree interface.

Richard.

Kenny

Thanks, Richard.

This is a small patch that has no prerequisites. Tested on x86-64.

kenny
Re: wide int patch #7: Replacement of INT_CST_LT and INT_CST_LT_UNSIGNED
On 10/19/2012 07:58 AM, Richard Biener wrote: On Fri, Oct 19, 2012 at 1:49 PM, Kenneth Zadeck wrote: On 10/19/2012 07:13 AM, Richard Biener wrote: On Fri, Oct 19, 2012 at 12:59 PM, Kenneth Zadeck wrote: On 10/19/2012 04:22 AM, Richard Biener wrote: On Thu, Oct 18, 2012 at 7:32 PM, Kenneth Zadeck wrote: This patch replaces all instances of INT_CST_LT and INT_CST_LT_UNSIGNED with the new functions tree_int_cst_lts_p and tree_int_cst_ltu_p. With the new implementation of int_cst these functions will be too big to do inline. These new function names are extremely confusing given that we already have tree_int_cst_lt which does the right thing based on the signedness of the INTEGER_CST trees. The whole point of the macros was to be inlined and you break that. That is, for example if (unsignedp && unsignedp0) { - min_gt = INT_CST_LT_UNSIGNED (primop1, minval); - max_gt = INT_CST_LT_UNSIGNED (primop1, maxval); - min_lt = INT_CST_LT_UNSIGNED (minval, primop1); - max_lt = INT_CST_LT_UNSIGNED (maxval, primop1); + min_gt = tree_int_cst_ltu_p (primop1, minval); + max_gt = tree_int_cst_ltu_p (primop1, maxval); + min_lt = tree_int_cst_ltu_p (minval, primop1); + max_lt = tree_int_cst_ltu_p (maxval, primop1); } else { - min_gt = INT_CST_LT (primop1, minval); - max_gt = INT_CST_LT (primop1, maxval); - min_lt = INT_CST_LT (minval, primop1); - max_lt = INT_CST_LT (maxval, primop1); + min_gt = tree_int_cst_lts_p (primop1, minval); ... could have just been min_gt = tree_int_cst_lt (primop1, minval); without any sign check. So if you think you need to kill the inlined variants please use the existing tree_int_cst_lt instead. no, they could not have. tree_int_cst_lt uses the signedness of the type to determine how to do the comparison.These two functions, as the macros they replace, force the comparison to be done independent of the signedness of the type. Well, looking at the surrounding code it's indeed non-obvious that this would be a 1:1 transform. 
But then they should not compare _trees_ but instead compare double-ints (or wide-ints). That said, I still think having a tree_int_cst_lt[us]_p function is wrong. tree INTEGER_CSTs have a sign. (apart from that opinion we have tree_int_cst_lt and you introduce tree_int_cst_ltu_p - consistent would be tree_int_cst_ltu). This reply applies just as much to this patch as patch 6. I morally agree with you 100%.But the code does not agree with you. On patch 6, there are about 450 places where we want to take the lower hwi worth of bits out of a int cst.Of those, only 5 places use the function that allows the signedness to be passed in. The rest make the signedness decision based on the local code and completely ignore any information in the type.Of those 5 that do allow the signedness to be passed in, only three of them actually pass it in based on the signedness of the variable they are accessing. I am sure that a lot of these are wrong. But i could not tell you which are and which are not. luckily, a lot of this will go away with the full wide-int code because i just do most of this math in the full precision so the issue never comes up. But after i am finished, there will still be a fair number of places that do this. (luckily, a large number of them are pulling the number out and comparing it to the precision of something, so this is likely to be harmless no matter how the code is written). But to a large extent, you are shooting the messenger here, and not person who committed the crime. I will be happy to add some comments to point the clients of these to the one that looks at the type. In looking over the patch, the only obvious ones that could be changed are the ones in tree-ssa-uninit.c and the tree-vrp.c. The one in tree-vrp.c just looks like that the person writing the code did not know about tree_int_cst_lt and wrote the check out our himself. (i will fix this in the tree-vrp patch that i am working on now. The one in tree-ssa-uniunit looks correct. 
But beyond that, the rest are in the front ends and so i think that this is as good as you get out of me.

Well, if you transform bogus (by moral standards) code into other bogus code, the whole point of your patch is to exchange the names of a set of macros / functions to another set of macros / functions. I see no point in that then. Leave broken code as-is. The more often you touch broken code and just mangle it in some way, the harder it gets to get to the point that would maybe reveal the real intent of the original code. Sorry for the harsh words, but to take the example of INT_CST_LT and INT_CST_LT_UNSIGNED -- those are remnants of a world before double_ints existed. All uses should have been replaced by double_int usage by now; replacing them with something tree-ish is the wrong way.
Re: patch to fix constant math - 4th patch - the wide-int class.
On 10/23/2012 10:12 AM, Richard Biener wrote:

On Tue, Oct 9, 2012 at 5:09 PM, Kenneth Zadeck wrote:

This patch implements the wide-int class. This is a more general version of the double-int class and is meant to be the eventual replacement for that class. The use of this class removes all dependencies of the host from the target compiler's integer math.

I have made all of the changes i agreed to in the earlier emails. In particular, this class internally maintains a bitsize and precision but not a mode. The class now is neutral about modes and tree-types. The functions that take modes or tree-types are just convenience functions that translate the parameters into bitsize and precision, and wherever there is a call that takes a mode, there is a corresponding call that takes a tree-type.

All of the little changes that richi suggested have also been made. The buffer size is now twice the size needed by the largest integer mode. This gives enough room for tree-vrp to do full multiplies on any type that the target supports.

Tested on x86-64. This patch depends on the first three patches. I am still waiting on final approval on the hwint.h patch. Ok to commit?

diff --git a/gcc/wide-int.h b/gcc/wide-int.h
new file mode 100644
index 000..efd2c01
--- /dev/null
+++ b/gcc/wide-int.h
...
+#ifndef GENERATOR_FILE

The whole file is guarded with that ... why? That is bound to be fragile once use of wide-int spreads? How do generator programs end up including this file if they don't need it at all?

This is so that wide-int can be included at the level of the generators. There is some stuff that needs to see this type that is done during the build phase that cannot see the types that are included in wide-int.h.

+#include "tree.h"
+#include "hwint.h"
+#include "options.h"
+#include "tm.h"
+#include "insn-modes.h"
+#include "machmode.h"
+#include "double-int.h"
+#include
+#include "insn-modes.h"
+

That's a lot of tree and rtl dependencies.
double-int.h avoids these by placing conversion routines in different headers or by only resorting to types in coretypes.h. Please try to reduce the above to a minimum.

+  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT];

are we sure this rounds properly? Consider a port with max byte mode size 4 on a 64bit host.

I do not believe that this can happen. The core compiler includes all modes up to TI mode, so by default we already go up to 128 bits.

I still would like to have the ability to provide specializations of wide_int for "small" sizes, thus ideally wide_int would be a template templated on the number of HWIs in val. Interface-wise wide_int<2> should be identical to double_int, thus we should be able to do

typedef wide_int<2> double_int;

If you want to go down this path after the patches get in, go for it. I see no use at all for this. This was not meant to be a plug-in replacement for double-int. The goal of this patch is to get the compiler to do the constant math the way that the target does it. Any such instantiation is by definition placing some predefined limit that some target may not want.

in double-int.h and replace its implementation with a specialization of wide_int. Due to a number of divergences (double_int is not a subset of wide_int) that doesn't seem easily possible (one reason is the ShiftOp and related enums you use). Of course wide_int is not a template either. For the hypothetical embedded target above we'd end up using wide_int<1>, an even more trivial specialization. I realize again this wide-int is not what your wide-int is (because you add a precision member). Still factoring out the "common"s of wide-int and double-int into a wide_int_raw <> template should be possible.

+class wide_int {
+  /* Internal representation.  */
+
+  /* VAL is set to a size that is capable of computing a full
+     multiplication on the largest mode that is represented on the
+     target.  The full multiplication is used by tree-vrp.
+     If
+     operations are added that require larger buffers, then VAL needs
+     to be changed.  */
+  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT];
+  unsigned short len;
+  unsigned int bitsize;
+  unsigned int precision;

The len, bitsize and precision members need documentation. At least one sounds redundant.

+ public:
+  enum ShiftOp {
+    NONE,

NONE is never a descriptive name ... I suppose this is for arithmetic vs. logical shifts? suggest something

+/* There are two uses for the wide-int shifting functions. The
+   first use is as an emulation of the target hardware. The
+   second use is as service routines for other optimizations. The
+   first case needs to be identified by passing TRUNC as the value
+   of ShiftOp so that shift amount is proper
Re: patch to fix constant math - 4th patch - the wide-int class.
On 10/23/2012 02:38 PM, Lawrence Crowl wrote:

On 10/23/12, Kenneth Zadeck wrote:

On 10/23/2012 10:12 AM, Richard Biener wrote:

+  inline bool minus_one_p () const;
+  inline bool zero_p () const;
+  inline bool one_p () const;
+  inline bool neg_p () const;

what's wrong with w == -1, w == 0, w == 1, etc.?

I would love to do this and you seem to be somewhat knowledgeable of c++. But i cannot for the life of me figure out how to do it.

Starting from the simple case, you write an operator ==.

as global operator: bool operator == (wide_int w, int i);
as member operator: bool wide_int::operator == (int i);

In the simple case,

bool operator == (wide_int w, int i)
{
  switch (i)
    {
    case -1: return w.minus_one_p ();
    case 0:  return w.zero_p ();
    case 1:  return w.one_p ();
    default: unexpected
    }
}

no, this seems wrong. you do not want to write code that can only fail at runtime unless there is a damn good reason to do that. say i have a TImode number, which must be represented in 4 ints on a 32 bit host (the same issue happens on 64 bit hosts, but the examples are simpler on 32 bit hosts) and i compare it to -1. The value that i am going to see as the argument of the function is going to have the value 0x. but the value that i have internally is 128 bits. do i take this and zero or sign extend it?

What would you have done with w.minus_one_p ()?

the code "knows" that -1 is a negative number and it knows the precision of w. That is enough information. So it logically builds a -1 that has enough bits to do the conversion.

in particular if someone wants to compare a number to 0xdeadbeef i have no idea what to do. I tried defining two different functions, one that took a signed and one that took an unsigned number, but then i wanted a cast in front of all the positive numbers.

This is where it does get tricky. For signed arguments, you should sign extend. For unsigned arguments, you should not. At present, we need multiple overloads to avoid type ambiguities.
bool operator == (wide_int w, long long int i);
bool operator == (wide_int w, unsigned long long int i);
inline bool operator == (wide_int w, long int i)
{ return w == (long long int) i; }
inline bool operator == (wide_int w, unsigned long int i)
{ return w == (unsigned long long int) i; }
inline bool operator == (wide_int w, int i)
{ return w == (long long int) i; }
inline bool operator == (wide_int w, unsigned int i)
{ return w == (unsigned long long int) i; }

(There is a proposal before the C++ committee to fix this problem.) Even so, there is room for potential bugs when wide_int does not carry around whether or not it is signed. The problem is that regardless of what the programmer thinks of the sign of the wide int, the comparison will use the sign of the int.

when they do we can revisit this. but i looked at this and i said the potential bugs were not worth the effort. If there is a way to do this, then i will do it, but it is going to have to work properly for things larger than a HOST_WIDE_INT.

The long-term solution, IMHO, is to carry the sign information around in either the type or the class data. (I prefer type, but with a mechanism to carry it as data when needed.) Such comparisons would then require consistency in signedness between the wide int and the plain int.

carrying the sign information is a non-starter. The rtl level does not have it and the middle end violates it more often than not. My view was to design this having looked at all of the usage. I have basically converted the whole compiler before i released the abi. I am still getting out the errors and breaking it up in reviewable-sized patches, but i knew very very well who my clients were before i wrote the abi.

I know that double-int does some of this and it does not carry around a notion of signedness either. is this just code that has not been fully tested or is there a trick in c++ that i am missing?

The double int class only provides == and !=, and only with other double ints.
Otherwise, it has the same value query functions that you do above. In the case of double int, the goal was to simplify use of the existing semantics. If you are changing the semantics, consider incorporating sign explicitly. i have, and it does not work.
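The distinction Lawrence describes -- sign-extend a signed argument, zero-extend an unsigned one -- can be shown on a minimal two-word value. This is a hypothetical sketch, not the wide_int ABI: a fixed 128-bit value with the comparison operand extended to its precision first.

```cpp
#include <cassert>
#include <cstdint>

// Minimal 128-bit value: two 64-bit words, low word first.
struct wi128 { uint64_t lo; uint64_t hi; };

// Comparing against a host integer requires extending the integer to
// the wide value's precision.  A signed argument sign-extends into
// the high word...
static bool eq (const wi128 &w, long long i)
{ return w.lo == (uint64_t) i && w.hi == (uint64_t) (i < 0 ? -1 : 0); }

// ...while an unsigned argument zero-extends.
static bool eq (const wi128 &w, unsigned long long i)
{ return w.lo == i && w.hi == 0; }
```

So `eq (w, -1LL)` matches the all-ones 128-bit value while `eq (w, ~0ULL)` -- the same 64-bit pattern read unsigned -- does not, which is exactly why the overload set (and the long/int forwarding overloads) is needed to pin down which extension applies.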
Re: patch to fix constant math - 4th patch - the wide-int class.
On 10/23/2012 04:25 PM, Lawrence Crowl wrote: On 10/23/12, Kenneth Zadeck wrote: On 10/23/2012 02:38 PM, Lawrence Crowl wrote: On 10/23/12, Kenneth Zadeck wrote: On 10/23/2012 10:12 AM, Richard Biener wrote: + inline bool minus_one_p () const; + inline bool zero_p () const; + inline bool one_p () const; + inline bool neg_p () const; what's wrong with w == -1, w == 0, w == 1, etc.? I would love to do this and you seem to be somewhat knowledgeable of c++. But i cannot for the life of me figure out how to do it. Starting from the simple case, you write an operator ==. as global operator: bool operator == (wide_int w, int i); as member operator: bool wide_int::operator == (int i); In the simple case, bool operator == (wide_int w, int i) { switch (i) { case -1: return w.minus_one_p (); case 0: return w.zero_p (); case 1: return w.one_p (); default: unexpected } } no, this seems wrong.you do not want to write code that can only fail at runtime unless there is a damn good reason to do that. Well, that's because it's the oversimplified case. :-) say i have a TImode number, which must be represented in 4 ints on a 32 bit host (the same issue happens on 64 bit hosts, but the examples are simpler on 32 bit hosts) and i compare it to -1. The value that i am going to see as the argument of the function is going have the value 0x. but the value that i have internally is 128 bits. do i take this and 0 or sign extend it? What would you have done with w.minus_one_p ()? the code "knows" that -1 is a negative number and it knows the precision of w. That is enough information. So it logically builds a -1 that has enough bits to do the conversion. And the code could also know that '-n' is a negative number and do the identical conversion. It would certainly be more difficult to write and get all the edge cases. I am not a c++ hacker. if someone wants to go there later, we can investigate this. but it seems like a can of worms right now. 
in particular if someone wants to compare a number to 0xdeadbeef i have no idea what to do. I tried defining two different functions, one that took a signed and one that took an unsigned number, but then i wanted a cast in front of all the positive numbers.

This is where it does get tricky. For signed arguments, you should sign extend. For unsigned arguments, you should not. At present, we need multiple overloads to avoid type ambiguities.

bool operator == (wide_int w, long long int i);
bool operator == (wide_int w, unsigned long long int i);
inline bool operator == (wide_int w, long int i)
{ return w == (long long int) i; }
inline bool operator == (wide_int w, unsigned long int i)
{ return w == (unsigned long long int) i; }
inline bool operator == (wide_int w, int i)
{ return w == (long long int) i; }
inline bool operator == (wide_int w, unsigned int i)
{ return w == (unsigned long long int) i; }

(There is a proposal before the C++ committee to fix this problem.) Even so, there is room for potential bugs when wide_int does not carry around whether or not it is signed. The problem is that regardless of what the programmer thinks of the sign of the wide int, the comparison will use the sign of the int.

when they do we can revisit this. but i looked at this and i said the potential bugs were not worth the effort.

I won't disagree. I was answering what I thought were questions on what was possible.

If there is a way to do this, then i will do it, but it is going to have to work properly for things larger than a HOST_WIDE_INT.

The long-term solution, IMHO, is to carry the sign information around in either the type or the class data. (I prefer type, but with a mechanism to carry it as data when needed.) Such comparisons would then require consistency in signedness between the wide int and the plain int.
carrying the sign information is a non-starter. The rtl level does not have it and the middle end violates it more often than not. My view was to design this having looked at all of the usage. I have basically converted the whole compiler before i released the abi. I am still getting out the errors and breaking it up in reviewable-sized patches, but i knew very very well who my clients were before i wrote the abi.

Okay.

I know that double-int does some of this and it does not carry around a notion of signedness either. is this just code that has not been fully tested or is there a trick in c++ that i am missing?

The double int class only provides == and !=, and only with other double ints. Otherwise, it has the same value query functions that you do above. In the case of double int, the goal was to simplify use of the existing semantics. If you are changing the semantics, consider incorporating sign explicitly.

i have, and it does not work.

Unfortunate. There is certainly a desire here not to le
Re: patch to fix constant math - 4th patch - the wide-int class.
On 10/25/2012 06:42 AM, Richard Biener wrote:

On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump wrote:

On Oct 24, 2012, at 2:43 AM, Richard Biener wrote:

On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck wrote:

On 10/23/2012 10:12 AM, Richard Biener wrote:

+  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT];

are we sure this rounds properly? Consider a port with max byte mode size 4 on a 64bit host.

I do not believe that this can happen. The core compiler includes all modes up to TI mode, so by default we already go up to 128 bits.

And mode bitsizes are always power-of-two? I suppose so.

Actually, no, they are not. Partial int modes can have bit sizes that are not power of two, and, if there isn't an int mode that is bigger, we'd want to round up the partial int bit size. Something like ((2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT) should do it.

I still would like to have the ability to provide specializations of wide_int for "small" sizes, thus ideally wide_int would be a template templated on the number of HWIs in val. Interface-wise wide_int<2> should be identical to double_int, thus we should be able to do

typedef wide_int<2> double_int;

If you want to go down this path after the patches get in, go for it. I see no use at all for this. This was not meant to be a plug-in replacement for double-int. The goal of this patch is to get the compiler to do the constant math the way that the target does it. Any such instantiation is by definition placing some predefined limit that some target may not want.

Well, what I don't really like is that we now have two implementations of functions that perform integer math on two-HWI sized integers. What I also don't like too much is that we have two different interfaces to operate on them! Can't you see how I come to not liking this? Especially the latter …

double_int is logically dead.
Refactoring wide-int and double-int is a waste of time, as the time is better spent removing double-int from the compiler. All the necessary semantics and code of double-int _has_ been refactored into wide-int already. Changing wide-int in any way to vend anything to double-int is wrong, as once double-int is removed, then all the api changes to make double-int share from wide-int are wasted and must then be removed. The path forward is the complete removal of double-int; it is wrong, has been wrong and always will be wrong, nothing can change that.

double_int, compared to wide_int, is fast and lean. I doubt we will get rid of it - you will make compile-time math a _lot_ slower. Just profile when you for example change get_inner_reference to use wide_ints. To be able to remove double_int in favor of wide_int requires _at least_ templating wide_int on 'len' and providing specializations for 1 and 2. It might be a non-issue for math that operates on trees or RTXen due to the allocation overhead we pay, but in recent years we transitioned important paths away from using tree math to using double_ints _for speed reasons_.

Richard.

i do not know why you believe this about the speed. double int always does synthetic math since you do everything at 128 bit precision. the thing about wide int is that since it does math to the precision's size, it almost never uses synthetic operations since the sizes for almost every instance can be done using the native math on the machine. almost every call has a check to see if the operation can be done natively. I seriously doubt that you are going to do TI mode math much faster than i do it and if you do, who cares. the number of calls does not affect the performance in any negative way and in fact is more efficient since common things that require more than one operation in double-int are typically done in a single operation.
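Mike's fix for the buffer size is plain ceiling division. A sketch with an assumed 64-bit host word; the names below stand in for the real MAX_BITSIZE_MODE_ANY_INT and HOST_BITS_PER_WIDE_INT values and are not GCC's:

```cpp
#include <cassert>

// Assumed host word width; on GCC this is HOST_BITS_PER_WIDE_INT.
constexpr unsigned host_bits = 64;

// Number of host words needed to hold BITS bits, rounding up so that
// a partial int mode whose size is not a multiple of the word size
// still gets a full last word.
constexpr unsigned words_for (unsigned bits)
{ return (bits + host_bits - 1) / host_bits; }

// wide_int's val[] must hold a full (2x) multiply of the widest mode.
constexpr unsigned max_mode_bits = 65;	// assumed widest partial int mode
constexpr unsigned val_len = words_for (2 * max_mode_bits);
```

With a 65-bit widest mode, plain division `2 * 65 / 64` would yield only 2 words and silently drop two bits of the product; the rounded-up form yields 3, which is the point of adding `host_bits - 1` before dividing.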
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
jakub, i am hoping to get the rest of my wide integer conversion posted by nov 5. I am under some adverse conditions here: hurricane sandy hit here pretty badly. my house is hooked up to a small generator, and no one has any power for miles around.

So far richi has promised to review them. he has sent some comments, but so far no reviews. Some time after i get the first round of them posted, i will do a second round that incorporates everyone's comments. But i would like a little slack here if possible. While this work is a show stopper for my private port, the patches address serious problems for many of the public ports, especially ones that have very flexible vector units. I believe that there is a significant set of latent problems currently with the existing ports that use ti mode that these patches will fix. However, i will do everything in my power to get the first round of the patches posted by the nov 5 deadline.

kenny

On 10/29/2012 01:56 PM, Jakub Jelinek wrote:

Status
======

I'd like to close the stage 1 phase of GCC 4.8 development on Monday, November 5th. If you have still patches for new features you'd like to see in GCC 4.8, please post them for review soon. Patches posted before the freeze, but reviewed shortly after the freeze, may still go in, further changes should be just bugfixes and documentation fixes.

Quality Data
============

Priority    #    Change from Last Report
--------   ---   -----------------------
P1          23   + 23
P2          77   +  8
P3          85   + 84
--------   ---
Total      185   +115

Previous Report
===============

http://gcc.gnu.org/ml/gcc/2012-03/msg00011.html

The next report will be sent by me again, announcing end of stage 1.
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
Richi, Let me explain to you what a broken api is. I have spent the last week screwing around with tree-vrp and as of last night i finally got it to work. In tree-vrp, it is clear that double-int is the precise definition of a broken api. The tree-vrp code uses an infinite-precision view of arithmetic. However, that infinite precision is implemented on top of a finite, CARVED IN STONE, base that is and will always be, without a patch like this, 128 bits on an x86-64. However, as was pointed out earlier, tree-vrp needs 2 * the size of a type + 1 bits to work correctly. Until yesterday i did not fully understand the significance of that 1 bit. what this means is that tree-vrp does not work on an x86-64 with __int128 variables. There are no checks in tree-vrp to back off when it sees something too large; tree-vrp simply gets the wrong answer. To me, this is a broken api and is GCC at its very worst. The patches that required this SHOULD HAVE NEVER GONE INTO GCC. What you have with my patches is someone who is willing to fix a large and complex problem that should have been fixed years ago. I understand that you do not like several aspects of the wide-int api and i am willing to make some of those improvements. However, what i am worried about is that you are in some ways really attached to the style of programming where everything is dependent on the size of a HWI. I will continue to push back on those comments but have been working the rest in as i have been going along. To answer your other question, it will be a significant problem if i cannot get these patches in. They are very prone to patch rot and my customer wants a product without many patches to the base code. Also, i fear that your real reason for wanting to wait is that you really do not like the fact that these patches get rid of double-int and that style of programming, and putting off that day serves no one well.
kenny

On 10/31/2012 05:59 AM, Richard Sandiford wrote: Richard Biener writes: On Tue, Oct 30, 2012 at 10:05 PM, Kenneth Zadeck wrote: jakub, i am hoping to get the rest of my wide integer conversion posted by nov 5. I am under some adverse conditions here: hurricane sandy hit here pretty badly. my house is hooked up to a small generator, and no one has any power for miles around. So far richi has promised to review them. he has sent some comments, but so far no reviews. Some time after i get the first round of them posted, i will do a second round that incorporates everyone's comments. But i would like a little slack here if possible. While this work is a show stopper for my private port, the patches address serious problems for many of the public ports, especially ones that have very flexible vector units. I believe that there is a significant set of latent problems currently with the existing ports that use ti mode that these patches will fix. However, i will do everything in my power to get the first round of the patches posted by the nov 5 deadline.

I suppose you are not going to merge your private port for 4.8 and thus the wide-int changes are not a show-stopper for you. That said, I considered the main conversion to be appropriate to be deferred for the next stage1. There is no advantage in disrupting the tree more at this stage.

I would like the wide_int class and rtl stuff to go in 4.8 though. IMO it's a significant improvement in its own right, and Kenny submitted it well before the deadline. Richard
Re: patch to fix constant math - 4th patch - the wide-int class.
On 10/31/2012 08:11 AM, Richard Biener wrote: On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford wrote: Richard Biener writes: On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford wrote: Richard Biener writes: On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck wrote: On 10/25/2012 06:42 AM, Richard Biener wrote: On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump wrote: On Oct 24, 2012, at 2:43 AM, Richard Biener wrote: On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck wrote: On 10/23/2012 10:12 AM, Richard Biener wrote:

+ HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT];

are we sure this rounds properly? Consider a port with max byte mode size 4 on a 64bit host.

I do not believe that this can happen. The core compiler includes all modes up to TI mode, so by default we already go up to 128 bits.

And mode bitsizes are always power-of-two? I suppose so.

Actually, no, they are not. Partial int modes can have bit sizes that are not power of two, and, if there isn't an int mode that is bigger, we'd want to round up the partial int bit size. Something like ((2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT) should do it.

I still would like to have the ability to provide specializations of wide_int for "small" sizes, thus ideally wide_int would be a template templated on the number of HWIs in val. Interface-wise wide_int<2> should be identical to double_int, thus we should be able to do typedef wide_int<2> double_int;

If you want to go down this path after the patches get in, go for it. I see no use at all for this. This was not meant to be a plug-in replacement for double int. The goal of this patch is to get the compiler to do the constant math the way that the target does it. Any such instantiation is by definition placing some predefined limit that some target may not want.

Well, what I don't really like is that we now have two implementations of functions that perform integer math on two-HWI sized integers.
What I also don't like too much is that we have two different interfaces to operate on them! Can't you see how I come to not liking this? Especially the latter … double_int is logically dead.

Refactoring wide-int and double-int is a waste of time, as the time is better spent removing double-int from the compiler. All the necessary semantics and code of double-int _has_ been refactored into wide-int already. Changing wide-int in any way to vend anything to double-int is wrong, as once double-int is removed, then all the api changes to make double-int share from wide-int are wasted and must then be removed. The path forward is the complete removal of double-int; it is wrong, has been wrong and always will be wrong, nothing can change that.

double_int, compared to wide_int, is fast and lean. I doubt we will get rid of it - you will make compile-time math a _lot_ slower. Just profile when you for example change get_inner_reference to use wide_ints. To be able to remove double_int in favor of wide_int requires _at least_ templating wide_int on 'len' and providing specializations for 1 and 2. It might be a non-issue for math that operates on trees or RTXen due to the allocation overhead we pay, but in recent years we transitioned important paths away from using tree math to using double_ints _for speed reasons_. Richard.

i do not know why you believe this about the speed. double int always does synthetic math since you do everything at 128 bit precision. the thing about wide int is that, since it does math to the precision's size, it almost never uses synthetic operations, since the sizes for almost every instance can be done using the native math on the machine. almost every call has a check to see if the operation can be done natively. I seriously doubt that you are going to do TI mode math much faster than i do it, and if you do, who cares.
the number of calls does not affect the performance in any negative way, and in fact it is more efficient, since common things that require more than one operation in double-int are typically done in a single operation.

Simple double-int operations like

  inline double_int
  double_int::and_not (double_int b) const
  {
    double_int result;
    result.low = low & ~b.low;
    result.high = high & ~b.high;
    return result;
  }

are always going to be faster than conditionally executing only one operation (but inside an offline function).

OK, this is really in reply to the 4.8 thing, but it felt more appropriate here. It's interesting that you gave this example, since before you were complaining about too many fused ops. Clearly this one could be removed in favour of separate and() and not() operations, but why not provide a fused one if there are clients who'll make use of it?

I was more concerned about fused operations that use precision or bitsize as input. That is for example + bool only_sign_
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
jakub my port has 256 bit integers. They are done by strapping together all of the elements of a vector unit. if one looks at where intel is going, they are doing exactly the same thing. The difference is that they like to add the operations one at a time rather than just do a clean implementation like we did. Soon they will get there, it is just a matter of time. i understand the tree-vrp code well enough to say that this operation does not work if you have timode, but i do not know how to translate that back into c to generate a test case. My patch to tree-vrp is adaptable in that it looks at the types in the program and adjusts its definition of infinite precision based on the code that it sees. I can point people to that code in tree-vrp and am happy to do that, but that is not my priority now. also, richi pointed out that there are places in the tree level constant propagators that require infinite precision, so he is really the person who both should know about this and generate proper tests. kenny

On 10/31/2012 09:55 AM, Jakub Jelinek wrote: On Wed, Oct 31, 2012 at 09:44:50AM -0400, Kenneth Zadeck wrote: The tree-vrp code uses an infinite-precision view of arithmetic. However, that infinite precision is implemented on top of a finite, CARVED IN STONE, base that is and will always be, without a patch like this, 128 bits on an x86-64. However, as was pointed out earlier, tree-vrp needs 2 * the size of a type + 1 bits to work correctly. Until yesterday i did not fully understand the significance of that 1 bit. what this means is that tree-vrp does not work on an x86-64 with __int128 variables. If you see a VRP bug, please file a PR with a testcase, or point to existing PR.
I agree with richi that it would be better to add a clean wide_int implementation for 4.9, rather than rushing something in, introducing lots of bugs, just for a port that hasn't been submitted; nor do I understand why >int128_t integer types are so crucial to your port, the vector support doesn't generally need very large integers, even if your vector unit is 256-bit, 512-bit or larger. Jakub
Re: patch to fix constant math - 4th patch - the wide-int class.
On 10/31/2012 09:54 AM, Richard Biener wrote: On Wed, Oct 31, 2012 at 2:30 PM, Richard Sandiford wrote: Richard Biener writes: But that means that wide_int has to model a P-bit operation as a "normal" len*HOST_WIDE_INT operation and then fix up the result after the fact, which seems unnecessarily convoluted. It does that right now. The operations are carried out in a loop over len HOST_WIDE_INT parts, the last HWI is then special-treated to account for precision/size. (yes, 'len' is also used as optimization - the fact that len ends up being mutable is another thing I dislike about wide-int. If wide-ints are cheap then all ops should be non-mutating (at least to 'len')). But the point of having a mutating len is that things like zero and -1 are common even for OImode values. So if you're doing something potentially expensive like OImode multiplication, why do it to the number of HOST_WIDE_INTs needed for an OImode value when the value we're processing has only one significant HOST_WIDE_INT? I don't propose doing that. I propose that no wide-int member function may _change_ its len (to something larger). Only that way you can avoid allocating wasted space for zero and -1. That way also the artificial limit on 2 * largest-int-mode-hwis goes. it is now 4x not 2x to accommodate the extra bit in tree-vrp. remember that the space burden is minimal. wide-ints are not persistent and there are never more than a handful at a time. I still don't see why a full-precision 2*HOST_WIDE_INT operation (or a full-precision X*HOST_WIDE_INT operation for any X) has any special meaning. Well, the same reason as a HOST_WIDE_INT variable has a meaning. We use it to constrain what we (efficiently) want to work on. For example CCP might iterate up to 2 * HOST_BITS_PER_WIDE_INT times when doing bit-constant-propagation in loops (for TImode integers on a x86_64 host). But what about targets with modes wider than TImode? Would double_int still be appropriate then?
If not, why does CCP have to use a templated type with a fixed number of HWIs (and all arithmetic done to a fixed number of HWIs) rather than one that can adapt to the runtime values, like wide_int can? Because nobody cares about accurate bit-tracking for modes larger than TImode. And because no convenient abstraction was available ;) yes, but tree-vrp does not even work for timode, and there are no tests to scale it back when it does see ti-mode. I understand that these can be added, but they so far have not been. I would also point out that i was corrected on this point by (i believe) lawrence. He points out that tree-vrp is still important for converting signed to unsigned for larger modes. Oh, and I don't necessarily see a use of double_int in its current form but for an integer representation on the host that is efficient to manipulate integer constants of a target dependent size. For example the target detail that we have partial integer modes with bitsize > precision and that the bits > precision apparently have a meaning when looking at the bit-representation of a constant should not be part of the base class of wide-int (I doubt it belongs to wide-int at all, but I guess you know more about the reason we track bitsize in addition to precision - I think it's abstraction at the wrong level, the tree level does fine without knowing about bitsize). TBH I'm uneasy about the bitsize thing too. I think bitsize is only tracked for shift truncation, and if so, I agree it makes sense to do that separately. So, can we please remove all traces of bitsize from wide-int then? But anyway, this whole discussion seems to have reached a stalemate. Or I suppose a de-facto rejection, since you're the only person in a position to approve the thing :-) There are many (silent) people that are able to approve the thing.
But the point is I have too many issues with the current patch that I'm unable to point at a specific thing I want Kenny to change after which the patch would be fine. So I rely on some guesswork from Kenny, given my advice - "leaner API", "less fused ops", "get rid of bitsize", "think of abstracting the core HWI[len] operation", "there should be no tree or RTL dependencies in the wide-int API" - to produce an updated variant. Which of course takes time, which of course crosses my vacation, which in the end means it isn't going to make 4.8 (I _do_ like the idea of not having a dependence on host properties for integer constant representation). Btw, a good hint at what a minimal wide-int API would look like is if you _just_ replace double-int users with it. Then you obviously have to implement only the double-int interface and conversion from/to double-int. Richard. Richard
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
I was not planning to do that mangling for 4.8. My primary justification for getting it in publicly now is that there are a large number of places where the current compiler (both at the tree and rtl levels) does not do optimization if the value is larger than a single hwi. My code generalizes all of these places so that they do the transformations independent of the size of the hwi. (in some cases at the rtl level, the transformations were only done on 32 bit or smaller types, but i have seen nothing like that at the tree level.) This provides benefits for cross compilers and for ports that support timode now. The fact that i have chosen to do it in such a way that we will never have this problem again is the part of the patch that richi seems to object to. We have patches that do the mangling for 256 for the front ends but we figured that we would post those for comments. These are likely to be controversial because they require extensions to the syntax to accept large constants. But there is no reason why the patches that fix the existing problems in a general way should not be considered for this release. Kenny

On 10/31/2012 10:27 AM, Jakub Jelinek wrote: On Wed, Oct 31, 2012 at 10:04:58AM -0400, Kenneth Zadeck wrote: if one looks at where intel is going, they are doing exactly the same thing. The difference is that they like to add the operations one at a time rather than just do a clean implementation like we did. Soon they will get there, it is just a matter of time. All I see on Intel is whole vector register shifts (and like on many other ports and/or/xor/andn could be considered whole register too). And, even if your port has 256-bit integer arithmetics, there is no mangling for __int256_t or similar, so I don't see how you can offer such data type as supported in the 4.8 timeframe. Jakub
Re: patch to fix constant math - 4th patch - the wide-int class.
On 10/31/2012 09:30 AM, Richard Sandiford wrote: Richard Biener writes: But that means that wide_int has to model a P-bit operation as a "normal" len*HOST_WIDE_INT operation and then fix up the result after the fact, which seems unnecessarily convoluted. It does that right now. The operations are carried out in a loop over len HOST_WIDE_INT parts, the last HWI is then special-treated to account for precision/size. (yes, 'len' is also used as optimization - the fact that len ends up being mutable is another thing I dislike about wide-int. If wide-ints are cheap then all ops should be non-mutating (at least to 'len')). But the point of having a mutating len is that things like zero and -1 are common even for OImode values. So if you're doing something potentially expensive like OImode multiplication, why do it to the number of HOST_WIDE_INTs needed for an OImode value when the value we're processing has only one significant HOST_WIDE_INT? I think with a little thought i can add some special constructors and get rid of the mutating aspects of the interface. I still don't see why a full-precision 2*HOST_WIDE_INT operation (or a full-precision X*HOST_WIDE_INT operation for any X) has any special meaning. Well, the same reason as a HOST_WIDE_INT variable has a meaning. We use it to constrain what we (efficiently) want to work on. For example CCP might iterate up to 2 * HOST_BITS_PER_WIDE_INT times when doing bit-constant-propagation in loops (for TImode integers on a x86_64 host). But what about targets with modes wider than TImode? Would double_int still be appropriate then? If not, why does CCP have to use a templated type with a fixed number of HWIs (and all arithmetic done to a fixed number of HWIs) rather than one that can adapt to the runtime values, like wide_int can? Oh, and I don't necessarily see a use of double_int in its current form but for an integer representation on the host that is efficient to manipulate integer constants of a target dependent size.
For example, the target detail that we have partial integer modes with bitsize > precision, and that the bits > precision apparently have a meaning when looking at the bit-representation of a constant, should not be part of the base class of wide-int. (I doubt it belongs to wide-int at all, but I guess you know more about the reason we track bitsize in addition to precision - I think it's abstraction at the wrong level; the tree level does fine without knowing about bitsize.)

TBH I'm uneasy about the bitsize thing too. I think bitsize is only tracked for shift truncation, and if so, I agree it makes sense to do that separately.

But anyway, this whole discussion seems to have reached a stalemate. Or I suppose a de-facto rejection, since you're the only person in a position to approve the thing :-)

Richard
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
Jakub, it is hard from all of the threads to actually distill what the real issues are here. So let me start from a clean slate and state them simply.

Richi has three primary objections:
1) that we can do all of this with a templated version of double-int.
2) that we should not be passing in a precision and bitsize into the interface.
3) that the interface is too large.

I have attached a fragment of my patch #5 to illustrate the main thrust of my patches and to illustrate the usefulness to gcc right now.

In the current trunk, we have code that does simplification when the mode fits in an HWI, and we have code that does the simplification if the mode fits in two HWIs. If the mode does not fit in two HWIs, the code does not do the simplification. Thus here, and in a large number of other places, we have two copies of the code. Richi wants there to be multiple template instantiations of double-int. This means that we are now going to have to have 3 copies of this code to support OImode on a 64-bit host and 4 copies on a 32-bit host.

Further note that there are not as many cases for the 2*HWI code as there are for the HWI case, and in general this is true throughout the compiler. (CLRSB is missing from the 2*HWI case in the patch.) We really did not write twice the code when we started supporting 2 HWIs; we added about 1.5 times the code (simplify-rtx is better than most of the rest of the compiler). I am using the rtl level as an example here because I have posted all of those patches, but the tree level is no better. I do not want to write this code a third time, and certainly not a fourth time.

Just fixing all of this is quite useful now: it fills in a lot of gaps in our transformations and it removes many edge-case crashes, because TImode really is lightly tested. However, this patch becomes crucial as the world gets larger.

Richi's second point is that we should be doing everything at "infinite precision" and not passing in an explicit bitsize and precision.
That works ok (sans the issues I raised with it in tree-vrp earlier) when the largest precision on the machine fits in a couple of HWIs. However, for targets that have large integers, or for cross compilers, this becomes expensive. The idea behind my set of patches is that for the transformations that can work this way, we do the math in the precision of the type or mode. In general this means that almost all of the math will be done quickly, even on targets that support really big integers.

For passes like tree-vrp, the math will be done at some multiple of the largest type seen in the actual program. The amount of the multiple is a function of the optimization, not the target or the host. Currently (on my home computer) the wide-int interface allows the optimization to go 4x the largest mode on the target. I can get rid of this bound at the expense of doing an alloca rather than stack-allocating a fixed-size structure. However, given the extremely heavy use of this interface, that does not seem like the best of tradeoffs.

The truth is that the vast majority of the compiler actually wants to see the math done the way that it is going to be done on the machine. Tree-vrp and the gimple constant prop do not, but I have made accommodations to handle both needs. I believe that the reason that double-int was never used at the rtl level is that it does not actually do the math in a way that is useful to the target.

Richi's third objection is that the interface is too large. I disagree. It was designed based on the actual usage of the interface. When I found places where I was writing the same code over and over again, I put it in a function as part of the interface. I later went back and optimized many of these because this is a very heavily used interface.

Richi has many other objections, but I have agreed to fix almost all of them, so I am not going to address them here. It really will be a huge burden to have to carry these patches until the next revision.
We are currently in stage 1 and I believe that the minor issues that Richi raises can be easily addressed.

kenny

@@ -1373,302 +1411,87 @@ simplify_const_unary_operation (enum rtx_code code, enum machine_mode mode,
       return CONST_DOUBLE_FROM_REAL_VALUE (d, mode);
     }

-  if (CONST_INT_P (op)
-      && width <= HOST_BITS_PER_WIDE_INT && width > 0)
+  if (CONST_SCALAR_INT_P (op) && width > 0)
     {
-      HOST_WIDE_INT arg0 = INTVAL (op);
-      HOST_WIDE_INT val;
+      wide_int result;
+      enum machine_mode imode = op_mode == VOIDmode ? mode : op_mode;
+      wide_int op0 = wide_int::from_rtx (op, imode);
+
+#if TARGET_SUPPORTS_WIDE_INT == 0
+      /* This assert keeps the simplification from producing a result
+         that cannot be represented in a CONST_DOUBLE but a lot of
+         upstream callers expect that this function never fails to
+         simplify something and so you if you a
Re: patch to canonize small wide-ints.
Richard and others,

This patch changes the way that unsigned trees are canonicalized. As with the last patch, I think that this patch is not that much of a step forward. While it is still true that the signed tree-csts match the representation of wide-ints, unsigned tree-csts do not and require conversion on the way in and out, as before. The only thing is that the conversion is different. Again, if Richard wants it in then I will commit it.

Kenny

Index: gcc/tree.c
===
--- gcc/tree.c (revision 203039)
+++ gcc/tree.c (working copy)
@@ -1187,10 +1187,10 @@ wide_int_to_tree (tree type, const wide_
   tree t;
   int ix = -1;
   int limit = 0;
-  int i;
+  unsigned int i;

   gcc_assert (type);
-  int prec = TYPE_PRECISION (type);
+  unsigned int prec = TYPE_PRECISION (type);
   signop sgn = TYPE_SIGN (type);

   /* Verify that everything is canonical.  */
@@ -1204,11 +1204,11 @@ wide_int_to_tree (tree type, const wide_
     }

   wide_int cst = wide_int::from (pcst, prec, sgn);
-  int len = int (cst.get_len ());
-  int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1);
+  unsigned int len = int (cst.get_len ());
+  unsigned int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1);
   bool recanonize = sgn == UNSIGNED
-    && (prec + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT == len
-    && small_prec;
+    && small_prec
+    && (prec + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT == len;

   switch (TREE_CODE (type))
     {
@@ -1235,7 +1235,7 @@ wide_int_to_tree (tree type, const wide_

     case INTEGER_TYPE:
     case OFFSET_TYPE:
-      if (TYPE_UNSIGNED (type))
+      if (TYPE_SIGN (type) == UNSIGNED)
        {
          /* Cache 0..N */
          limit = INTEGER_SHARE_LIMIT;
@@ -1294,7 +1294,7 @@ wide_int_to_tree (tree type, const wide_
         must be careful here because tree-csts and wide-ints are not
         canonicalized in the same way.  */
      gcc_assert (TREE_TYPE (t) == type);
-      gcc_assert (TREE_INT_CST_NUNITS (t) == len);
+      gcc_assert (TREE_INT_CST_NUNITS (t) == (int)len);
      if (recanonize)
        {
          len--;
@@ -1321,7 +1321,10 @@ wide_int_to_tree (tree type, const wide_
              TREE_VEC_ELT (TYPE_CACHED_VALUES (type), ix) = t;
            }
        }
-  else if (cst.get_len () == 1)
+  else if (cst.get_len () == 1
+          && (TYPE_SIGN (type) == SIGNED
+              || recanonize
+              || cst.elt (0) >= 0))
    {
      /* 99.99% of all int csts will fit in a single HWI.  Do that one
         efficiently.  */
@@ -1351,14 +1354,29 @@ wide_int_to_tree (tree type, const wide_
         for the gc to take care of.  There will not be enough of them
         to worry about.  */
      void **slot;
-      tree nt = make_int_cst (len);
-      TREE_INT_CST_NUNITS (nt) = len;
+      tree nt;
+      if (!recanonize
+         && TYPE_SIGN (type) == UNSIGNED
+         && cst.elt (len - 1) < 0)
+       {
+         unsigned int blocks_needed
+           = (prec + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT;
+
+         nt = make_int_cst (blocks_needed + 1);
+         for (i = len; i < blocks_needed; i++)
+           TREE_INT_CST_ELT (nt, i) = (HOST_WIDE_INT)-1;
+
+         TREE_INT_CST_ELT (nt, blocks_needed) = 0;
+       }
+      else
+       nt = make_int_cst (len);
      if (recanonize)
        {
          len--;
          TREE_INT_CST_ELT (nt, len) = zext_hwi (cst.elt (len), small_prec);
        }
-      for (int i = 0; i < len; i++)
+
+      for (i = 0; i < len; i++)
        TREE_INT_CST_ELT (nt, i) = cst.elt (i);
      TREE_TYPE (nt) = type;
@@ -10556,7 +10574,8 @@ widest_int_cst_value (const_tree x)
 #if HOST_BITS_PER_WIDEST_INT > HOST_BITS_PER_WIDE_INT
   gcc_assert (HOST_BITS_PER_WIDEST_INT >= HOST_BITS_PER_DOUBLE_INT);
-  gcc_assert (TREE_INT_CST_NUNITS (x) <= 2);
+  gcc_assert (TREE_INT_CST_NUNITS (x) <= 2
+             || (TREE_INT_CST_NUNITS (x) == 3 && TREE_INT_CST_ELT (x, 2) == 0));
   if (TREE_INT_CST_NUNITS (x) == 1)
     val = ((HOST_WIDEST_INT)val << HOST_BITS_PER_WIDE_INT) >> HOST_BITS_PER_WIDE_INT;
   else
@@ -10565,7 +10584,8 @@ widest_int_cst_value (const_tree x)
          << HOST_BITS_PER_WIDE_INT);
 #else
   /* Make sure the sign-extended value will fit in a HOST_WIDE_INT.  */
-  gcc_assert (TREE_INT_CST_NUNITS (x) == 1);
+  gcc_assert (TREE_INT_CST_NUNITS (x) == 1
+             || (TREE_INT_CST_NUNITS (x) == 2 && TREE_INT_CST_ELT (x, 1) == 0));
 #endif
   if (bits < HOST_BITS_PER_WIDEST_INT)

Index: gcc/tree.h
===
--- gcc/tree.h (revision 203039)
+++ gcc/tree.h (working copy)
@@ -3050,7 +3050,8 @@ cst_fits_shwi_p (const_tree x)
   if (TREE_CODE (x) != INTEGER_CST)
     return false;

-  return TREE_INT_CST_NUNITS (x) == 1;
+  return TREE_INT_CST_NUNITS (x) == 1
+        || (TREE_INT_CST_NUNITS (x) == 2 && TREE_INT_CST_ELT (x, 1) == 0);
 }

 /* Checks that X is integer constant that can be expressed in signed
@@ -3093,7 +3094,7 @@ tree_fits_uhwi_p (const_tree cst)
 /* For numbers of unsigned type that are longer than a HWI, if the top
    bit of the bottom word
Re: patch to canonize small wide-ints.
On 10/03/2013 10:29 AM, Richard Sandiford wrote:

Kenneth Zadeck writes:

+  /* Got to be careful of precision 0 values.  */
+  if (precision)
+    len = MIN (len, max_len);
+
   if (TYPE_SIGN (TREE_TYPE (x)) == UNSIGNED)
     {
-      if (precision < HOST_BITS_PER_WIDE_INT
-         && TYPE_SIGN (TREE_TYPE (x)) == UNSIGNED)
+      unsigned int small_prec = precision & (HOST_BITS_PER_WIDE_INT - 1);
+      if (small_prec)
        {
-         /* The rep of wide-int is signed, so if the value comes from
-            an unsigned int_cst, we have to sign extend it to make it
-            correct.  */
-         scratch[0] = sext_hwi (val[0], precision);
-         return wi::storage_ref (scratch, 1, precision);
+         /* We have to futz with this because the canonization for
+            short unsigned numbers in wide-int is different from the
+            canonized short unsigned numbers in the tree-cst.  */
+         if (len == max_len)
+           {
+             for (unsigned int i = 0; i < len - 1; i++)
+               scratch[i] = val[i];
+             scratch[len - 1] = sext_hwi (val[len - 1], precision);
+             return wi::storage_ref (scratch, len, precision);
+           }
        }
-      /* Otherwise we can use the constant as-is when not extending.  */
-      return wi::storage_ref (val, len, precision);
+
+      unsigned int xprecision = get_precision (x);
+      len = wi::force_to_size (scratch, val, len, xprecision, precision, UNSIGNED);
+      return wi::storage_ref (scratch, len, precision);
     }
-  /* Widen the constant according to its sign.  */
-  len = wi::force_to_size (scratch, val, len, xprecision, precision,
-                          TYPE_SIGN (TREE_TYPE (x)));
-  return wi::storage_ref (scratch, len, precision);
+
+  /* Signed and the rest of the unsigned cases are easy.  */
+  return wi::storage_ref (val, len, precision);

AFAICT, no unsigned cases fall through to the "rest of the unsigned cases" statement.

You are seeing the result of me being corner cased to death.

Changing the representation of unsigned constants is only worthwhile if we can avoid the force_to_size for some unsigned cases. I think we can avoid it for precision >= xprecision && !small_prec. Either we should take the hit of doing that comparison (but see below) or the change isn't worthwhile.

I think that it is something closer to precision >= xprecision + HOST_BITS_PER_WIDE_INT && ... The problem is that the tree cst may have one extra block beyond the precision. For my example, say HBPWI is 8, not 64. The uint16 constant 0xffff must be represented by 3 HWIs in the tree-cst as 00 ff ff. This is why the patch came out today rather than yesterday. I can fire up a run with this, if you want.

I was thinking that we should always be able to use the constant as-is for max_wide_int-based and addr_wide_int-based operations.

The small_prec again: you can get edge cased to death here. I think it would work for max because that really is bigger than anything else, but it is possible (though unlikely) to have something big converted to an address by truncation.

The small_prec case will get optimised away (because precision is a compile-time constant) and the force_to_size shouldn't be necessary (because "max" and "addr" should always be wide enough). Could we assert for precision >= xprecision instead of the force_to_size? Or are there cases in which we implicitly truncate tree constants for (variable-precision) wide_int-based operations?

Again, the max is safe by definition. The addr is inherently unsafe because I believe that some of the places where the values are converted to addr-wide-int come from tree-csts where the type comes from the source program.

Thanks, Richard
Re: patch to canonize unsigned tree-csts
I was thinking that we should always be able to use the constant as-is for max_wide_int-based and addr_wide_int-based operations.

The small_prec again: you can get edge cased to death here. I think it would work for max because that really is bigger than anything else, but it is possible (though unlikely) to have something big converted to an address by truncation.

But I'd have expected that conversion to be represented by an explicit CONVERT_EXPR or NOP_EXPR. It seems wrong to use addr_wide_int directly on something that isn't bit- or byte-address-sized. It'd be the C equivalent of int + long -> int rather than the expected int + long -> long. Same goes for wide_int. If we're doing arithmetic at a specific precision, it seems odd for one of the inputs to be wider and yet not have an explicit truncation.

You miss the second reason why we needed addr-wide-int. A large amount of the places where the addressing arithmetic is done are not "type safe". Only the gimple and rtl that is translated from the source code is really type safe. In passes like the strength reduction, where they just "grab things from all over", the addr-wide-int or the max-wide-int are safe-haven structures that are guaranteed to be large enough to not matter. So what I fear here is something like a very wide loop counter being grabbed into some address calculation.

Thanks, Richard
Re: patch to canonicalize tree-csts.
So here is a patch with the change. As before, bootstrapped and tested on x86-64.

On 10/03/2013 12:16 PM, Richard Sandiford wrote:

Kenneth Zadeck writes:

Changing the representation of unsigned constants is only worthwhile if we can avoid the force_to_size for some unsigned cases. I think we can avoid it for precision >= xprecision && !small_prec. Either we should take the hit of doing that comparison (but see below) or the change isn't worthwhile.

I think that it is something closer to precision >= xprecision + HOST_BITS_PER_WIDE_INT && ... The problem is that the tree cst may have one extra block beyond the precision.

Ah, OK.

I was thinking that we should always be able to use the constant as-is for max_wide_int-based and addr_wide_int-based operations.

The small_prec again: you can get edge cased to death here. I think it would work for max because that really is bigger than anything else, but it is possible (though unlikely) to have something big converted to an address by truncation.

But I'd have expected that conversion to be represented by an explicit CONVERT_EXPR or NOP_EXPR. It seems wrong to use addr_wide_int directly on something that isn't bit- or byte-address-sized. It'd be the C equivalent of int + long -> int rather than the expected int + long -> long. Same goes for wide_int. If we're doing arithmetic at a specific precision, it seems odd for one of the inputs to be wider and yet not have an explicit truncation.

Thanks, Richard

Index: gcc/tree.c
===
--- gcc/tree.c (revision 203039)
+++ gcc/tree.c (working copy)
@@ -1187,10 +1187,10 @@ wide_int_to_tree (tree type, const wide_
   tree t;
   int ix = -1;
   int limit = 0;
-  int i;
+  unsigned int i;

   gcc_assert (type);
-  int prec = TYPE_PRECISION (type);
+  unsigned int prec = TYPE_PRECISION (type);
   signop sgn = TYPE_SIGN (type);

   /* Verify that everything is canonical.  */
@@ -1204,11 +1204,11 @@ wide_int_to_tree (tree type, const wide_
     }

   wide_int cst = wide_int::from (pcst, prec, sgn);
-  int len = int (cst.get_len ());
-  int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1);
+  unsigned int len = int (cst.get_len ());
+  unsigned int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1);
   bool recanonize = sgn == UNSIGNED
-    && (prec + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT == len
-    && small_prec;
+    && small_prec
+    && (prec + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT == len;

   switch (TREE_CODE (type))
     {
@@ -1235,7 +1235,7 @@ wide_int_to_tree (tree type, const wide_

     case INTEGER_TYPE:
     case OFFSET_TYPE:
-      if (TYPE_UNSIGNED (type))
+      if (TYPE_SIGN (type) == UNSIGNED)
        {
          /* Cache 0..N */
          limit = INTEGER_SHARE_LIMIT;
@@ -1294,7 +1294,7 @@ wide_int_to_tree (tree type, const wide_
         must be careful here because tree-csts and wide-ints are not
         canonicalized in the same way.  */
      gcc_assert (TREE_TYPE (t) == type);
-      gcc_assert (TREE_INT_CST_NUNITS (t) == len);
+      gcc_assert (TREE_INT_CST_NUNITS (t) == (int)len);
      if (recanonize)
        {
          len--;
@@ -1321,7 +1321,10 @@ wide_int_to_tree (tree type, const wide_
              TREE_VEC_ELT (TYPE_CACHED_VALUES (type), ix) = t;
            }
        }
-  else if (cst.get_len () == 1)
+  else if (cst.get_len () == 1
+          && (TYPE_SIGN (type) == SIGNED
+              || recanonize
+              || cst.elt (0) >= 0))
    {
      /* 99.99% of all int csts will fit in a single HWI.  Do that one
         efficiently.  */
@@ -1351,14 +1354,29 @@ wide_int_to_tree (tree type, const wide_
         for the gc to take care of.  There will not be enough of them
         to worry about.  */
      void **slot;
-      tree nt = make_int_cst (len);
-      TREE_INT_CST_NUNITS (nt) = len;
+      tree nt;
+      if (!recanonize
+         && TYPE_SIGN (type) == UNSIGNED
+         && cst.elt (len - 1) < 0)
+       {
+         unsigned int blocks_needed
+           = (prec + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT;
+
+         nt = make_int_cst (blocks_needed + 1);
+         for (i = len; i < blocks_needed; i++)
+           TREE_INT_CST_ELT (nt, i) = (HOST_WIDE_INT)-1;
+
+         TREE_INT_CST_ELT (nt, blocks_needed) = 0;
+       }
+      else
+       nt = make_int_cst (len);
      if (recanonize)
        {
          len--;
          TREE_INT_CST_ELT (nt, len) = zext_hwi (cst.elt (len), small_prec);
        }
-      for (int i = 0; i < len; i++)
+
+      for (i = 0; i < len; i++)
        TREE_INT_CST_ELT (nt, i) = cst.elt (i);
      TREE_TYPE (nt) = type;
@@ -10556,7 +10574,8 @@ widest_int_cst_value (const_tree x)
 #if HOST_BITS_PER_WIDEST_INT > HOST_BITS_PER_WIDE_INT
   gcc_assert (HOST_BITS_PER_WIDEST_INT >= HOST_BITS_PER_DOUBLE_INT);
-  gcc_assert (TREE_INT_CST_NUNITS (x) <= 2);
+  gcc_assert (TREE_INT_CST_NUNITS (x) <= 2
+             || (TREE_INT_CST_NUNITS (x) == 3 && TREE_INT_CST_ELT (x
Re: patch to canonize unsigned tree-csts
On 10/04/2013 01:00 PM, Richard Sandiford wrote:

I was hoping Richard would weigh in here. In case not...

Kenneth Zadeck writes:

I was thinking that we should always be able to use the constant as-is for max_wide_int-based and addr_wide_int-based operations.

The small_prec again: you can get edge cased to death here. I think it would work for max because that really is bigger than anything else, but it is possible (though unlikely) to have something big converted to an address by truncation.

But I'd have expected that conversion to be represented by an explicit CONVERT_EXPR or NOP_EXPR. It seems wrong to use addr_wide_int directly on something that isn't bit- or byte-address-sized. It'd be the C equivalent of int + long -> int rather than the expected int + long -> long. Same goes for wide_int. If we're doing arithmetic at a specific precision, it seems odd for one of the inputs to be wider and yet not have an explicit truncation.

You miss the second reason why we needed addr-wide-int. A large amount of the places where the addressing arithmetic is done are not "type safe". Only the gimple and rtl that is translated from the source code is really type safe. In passes like the strength reduction, where they just "grab things from all over", the addr-wide-int or the max-wide-int are safe-haven structures that are guaranteed to be large enough to not matter. So what I fear here is something like a very wide loop counter being grabbed into some address calculation.

It still seems really dangerous to be implicitly truncating a wider type to addr_wide_int. It's not something we'd ever do in mainline, because uses of addr_wide_int are replacing uses of double_int, and double_int is our current maximum-width representation. Using addr_wide_int rather than max_wide_int is an optimisation. IMO part of implementing that optimisation should be to introduce explicit truncations whenever we try to use addr_wide_int to operate on inputs that might be wider than addr_wide_int.
So I still think the assert is the way to go.

addr_wide_int is not as much of an optimization as it is documentation of what you are doing - i.e. this is addressing arithmetic. My justification for putting it in was that we wanted a sort of an abstract type to say that this was not just user math, it was addressing arithmetic, and that the ultimate result is going to be slammed into a target pointer.

I was only using that as an example to try to indicate that I did not think that it was wrong if someone did truncate. In particular, would you want the assert to be that the value was truncated, or that the type of the value would allow numbers that would be truncated? I actually think neither. If a programmer uses a long long on a 32-bit machine for some index variable and slams that into a pointer, he either knows what he is doing or has made a mistake. Do you really think that the compiler should ICE?

Thanks, Richard
Re: patch to canonize unsigned tree-csts
I added the assertion that Richard requested and tested this on x86-64. Committed as revision 203602.

On 10/06/2013 05:13 AM, Richard Sandiford wrote:

Kenneth Zadeck writes:

On 10/04/2013 01:00 PM, Richard Sandiford wrote:

I was hoping Richard would weigh in here. In case not...

Kenneth Zadeck writes:

I was thinking that we should always be able to use the constant as-is for max_wide_int-based and addr_wide_int-based operations.

The small_prec again: you can get edge cased to death here. I think it would work for max because that really is bigger than anything else, but it is possible (though unlikely) to have something big converted to an address by truncation.

But I'd have expected that conversion to be represented by an explicit CONVERT_EXPR or NOP_EXPR. It seems wrong to use addr_wide_int directly on something that isn't bit- or byte-address-sized. It'd be the C equivalent of int + long -> int rather than the expected int + long -> long. Same goes for wide_int. If we're doing arithmetic at a specific precision, it seems odd for one of the inputs to be wider and yet not have an explicit truncation.

You miss the second reason why we needed addr-wide-int. A large amount of the places where the addressing arithmetic is done are not "type safe". Only the gimple and rtl that is translated from the source code is really type safe. In passes like the strength reduction, where they just "grab things from all over", the addr-wide-int or the max-wide-int are safe-haven structures that are guaranteed to be large enough to not matter. So what I fear here is something like a very wide loop counter being grabbed into some address calculation.

It still seems really dangerous to be implicitly truncating a wider type to addr_wide_int. It's not something we'd ever do in mainline, because uses of addr_wide_int are replacing uses of double_int, and double_int is our current maximum-width representation. Using addr_wide_int rather than max_wide_int is an optimisation. IMO part of implementing that optimisation should be to introduce explicit truncations whenever we try to use addr_wide_int to operate on inputs that might be wider than addr_wide_int. So I still think the assert is the way to go.

addr_wide_int is not as much of an optimization as it is documentation of what you are doing - i.e. this is addressing arithmetic. My justification for putting it in was that we wanted a sort of an abstract type to say that this was not just user math, it was addressing arithmetic, and that the ultimate result is going to be slammed into a target pointer. I was only using that as an example to try to indicate that I did not think that it was wrong if someone did truncate. In particular, would you want the assert to be that the value was truncated, or that the type of the value would allow numbers that would be truncated? I actually think neither.

I'm arguing for:

  gcc_assert (precision >= xprecision);

in wi::int_traits ::decompose.

IIRC one of the reasons for wanting addr_wide_int rather than wide_int was that we wanted a type that could handle both bit and byte sizes, and we wanted to be able to convert between bits and bytes seamlessly. That means that shifting is a valid operation for addr_wide_int. But if we also implicitly (and that's the key word) used addr_wide_int directly on tree constants that are wider than addr_wide_int, and say shifted the result right, the answer would be different from what you'd get if you did the shift in max_wide_int. That seems like new behaviour, since all address arithmetic is effectively done to maximum precision on mainline. It's just that the maximum on mainline is rather small. If code is looking through casts to see wider-than-addr_wide_int types, I think it's reasonable for that code to have to explicitly force the tree to addr_wide_int size, via addr_wide_int::from.

Leaving it implicit seems too subtle and also means that every caller to wi::int_traits ::decompose does a check that is usually unnecessary.

If a programmer uses a long long on a 32-bit machine for some index variable and slams that into a pointer, he either knows what he is doing or has made a mistake. Do you really think that the compiler should ICE?

No, I'm saying that passes that operate on addr_wide_ints while "grabbing trees from all over" (still not sure what that means in practice) should explicitly mark places where a truncation is deliberately allowed. Those places then guarantee that the dropped bits wouldn't affect any of the later calculations, which is something only the pass itself can know. We already forbid direct assignments like:

  addr_wide_int x = max_wide_int(...);

at compile time, for similar reasons.

Thanks, Richard

Index: gcc/tree.c
===
--- gcc/tree.c (revisio
Re: [wide-int] int_traits
On 10/16/2013 09:55 AM, Richard Biener wrote:

Speaking in terms of a patch:

Index: tree.h
===
--- tree.h (revision 203690)
+++ tree.h (working copy)
@@ -5184,14 +5184,11 @@ wi::int_traits ::decompose (
                / HOST_BITS_PER_WIDE_INT);
   unsigned int xprecision = get_precision (x);

-  gcc_assert (precision >= xprecision);
+  gcc_checking_assert (precision >= xprecision && precision != 0);

-  /* Got to be careful of precision 0 values.  */
-  if (precision)
-    len = MIN (len, max_len);
   if (TYPE_SIGN (TREE_TYPE (x)) == UNSIGNED)
     {
-      unsigned int small_prec = precision & (HOST_BITS_PER_WIDE_INT - 1);
+      unsigned int small_prec = xprecision & (HOST_BITS_PER_WIDE_INT - 1);
       if (small_prec)
        {
          /* We have to futz with this because the canonization for
@@ -5201,7 +5198,7 @@ wi::int_traits ::decompose (
            {
              for (unsigned int i = 0; i < len - 1; i++)
                scratch[i] = val[i];
-              scratch[len - 1] = sext_hwi (val[len - 1], precision);
+              scratch[len - 1] = sext_hwi (val[len - 1], small_prec);
              return wi::storage_ref (scratch, len, precision);
            }
        }

The precision 0 thing is gone as far as I understand?

There is still the case where the front ends create types with precision 0, and that causes a few odd cases, but the precision 0 that you refer to is gone.

Also sign-extending the upper HWI must be done with precision % HOST_BITS_PER_WIDE_INT, no? And it assumes that we only ever have to extend the uppermost HWI, thus 'len' is never too big.

Yes.

I note that we immediately return in the above case, so

  if (precision < xprecision + HOST_BITS_PER_WIDE_INT)
    {
      len = wi::force_to_size (scratch, val, len, xprecision, precision, UNSIGNED);
      return wi::storage_ref (scratch, len, precision);
    }

applies only when the desired precision is a multiple of a HWI.

Yes.

I assume it adds that extra zero word in case the MSB is set, right?

Yes.

I am confused about the condition written here and how we look at precision and not xprecision when deciding how to extend - given an 8-bit precision unsigned 0xff and precision == 64 we do not even consider sign-extending the value because we look at precision and not xprecision. Don't we need to look at xprecision here?

I do not think so. There are three cases here:

1) xprecision < precision: say xprecision = 8 and precision = 32. The number is between 0 and 255, and after canonicalization the number will be between 0 and 255.

2) xprecision == precision: not much to talk about here.

3) xprecision > precision: we need to lose the upper bits of the input, and it does not matter what they were. The only thing that we are concerned about is that the bits above what is now the sign bit pointed to by precision are matched in the bits above precision.

I do think we could change the "if (precision < xprecision + HWI)" to become "else if (...)", which will save a test. After all, an assert precision == xprecision does not work in this routine.

Quickly execute.exp tested only. To avoid code quality wreckage we have to implement a different way of allocating the 'scratch' storage of wide_int_ref_storage (well, ideally we wouldn't need it ...). I thought of simply allocating it via alloca (), but we need to do that in all callers that build a wide_int_ref (eventually via a hidden default arg). But as that's a template specialization of generic_wide_int ... so the option is to provide a function wrapper inside wi:: for this, like

I want Richard and Mike to be the people who respond to the next point. I am not a C++ person and all of this storage manager layer stuff gives me a headache.

  template
  wide_int_ref
  wi::ref (const T &t, HOST_WIDE_INT *scratch
           = XALLOCAVEC (HOST_WIDE_INT, WIDE_INT_MAX_ELTS))
  {
    return wide_int_ref (t, scratch);
  }

and amend the storage constructors to get a scratch argument. The reason for all this is that otherwise the wide_int_ref object escapes when we pass the scratch storage to the workers in int_traits .

Thanks, Richard.
powerpc support for TARGET_SUPPORTS_WIDE_INT.
This patch makes the PPC the first public target to use the CONST_WIDE_INT representation rather than CONST_DOUBLEs to represent larger integers. There are two parts to this patch. The changes to genrecog.c represent fixes that would be needed for any port that supports CONST_WIDE_INTs, and the rest are changes specific to the rs6000 port.

Bootstrapped and tested on a ppc64. Committed as revision 204710.

Kenny

Index: gcc/genrecog.c
===
--- gcc/genrecog.c (revision 203701)
+++ gcc/genrecog.c (working copy)
@@ -588,6 +588,7 @@ validate_pattern (rtx pattern, rtx insn,
          && GET_CODE (src) != PC
          && GET_CODE (src) != CC0
          && !CONST_INT_P (src)
+         && !CONST_WIDE_INT_P (src)
          && GET_CODE (src) != CALL)
        {
          const char *which;
@@ -772,13 +773,14 @@ add_to_sequence (rtx pattern, struct dec
         We can optimize the generated code a little if either
         (a) the predicate only accepts one code, or (b) the
-        predicate does not allow CONST_INT, in which case it
-        can match only if the modes match.  */
+        predicate does not allow CONST_INT or CONST_WIDE_INT,
+        in which case it can match only if the modes match.  */
      pred = lookup_predicate (pred_name);
      if (pred)
        {
          test->u.pred.data = pred;
-         allows_const_int = pred->codes[CONST_INT];
+         allows_const_int = (pred->codes[CONST_INT]
+                             || pred->codes[CONST_WIDE_INT]);
          if (was_code == MATCH_PARALLEL
              && pred->singleton != PARALLEL)
            message_with_line (pattern_lineno,

Index: gcc/recog.c
===
--- gcc/recog.c (revision 203701)
+++ gcc/recog.c (working copy)
@@ -1198,7 +1198,8 @@ const_scalar_int_operand (rtx op, enum m

 /* Returns 1 if OP is an operand that is a CONST_WIDE_INT of mode
    MODE.  This most likely is not as useful as
-   const_scalar_int_operand, but is here for consistancy.  */
+   const_scalar_int_operand since it does not accept CONST_INTs, but
+   is here for consistancy.  */
 int
 const_wide_int_operand (rtx op, enum machine_mode mode)
 {

Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 203701)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -19,7 +19,7 @@

 ;; Return 1 for anything except PARALLEL.
 (define_predicate "any_operand"
-  (match_code "const_int,const_double,const,symbol_ref,label_ref,subreg,reg,mem"))
+  (match_code "const_int,const_double,const_wide_int,const,symbol_ref,label_ref,subreg,reg,mem"))

 ;; Return 1 for any PARALLEL.
 (define_predicate "any_parallel_operand"
@@ -596,7 +596,7 @@ (define_predicate "easy_vector_constant_

 ;; Return 1 if operand is constant zero (scalars and vectors).
 (define_predicate "zero_constant"
-  (and (match_code "const_int,const_double,const_vector")
+  (and (match_code "const_int,const_double,const_wide_int,const_vector")
       (match_test "op == CONST0_RTX (mode)")))

 ;; Return 1 if operand is 0.0.
@@ -790,7 +790,7 @@ (define_predicate "logical_operand"

 ;; Return 1 if op is a constant that is not a logical operand, but could
 ;; be split into one.
 (define_predicate "non_logical_cint_operand"
-  (and (match_code "const_int,const_double")
+  (and (match_code "const_int,const_wide_int")
       (and (not (match_operand 0 "logical_operand"))
            (match_operand 0 "reg_or_logical_cint_operand"
@@ -1058,7 +1058,7 @@ (define_predicate "current_file_function

 ;; Return 1 if this operand is a valid input for a move insn.
 (define_predicate "input_operand"
   (match_code "symbol_ref,const,reg,subreg,mem,
-              const_double,const_vector,const_int")
+              const_double,const_wide_int,const_vector,const_int")
 {
   /* Memory is always valid.  */
   if (memory_operand (op, mode))
@@ -1071,8 +1071,7 @@ (define_predicate "input_operand"

   /* Allow any integer constant.  */
   if (GET_MODE_CLASS (mode) == MODE_INT
-      && (GET_CODE (op) == CONST_INT
-         || GET_CODE (op) == CONST_DOUBLE))
+      && CONST_SCALAR_INT_P (op))
     return 1;

   /* Allow easy vector constants.  */
@@ -,7 +1110,7 @@ (define_predicate "input_operand"

 ;; Return 1 if this operand is a valid input for a vsx_splat insn.
 (define_predicate "splat_input_operand"
   (match_code "symbol_ref,const,reg,subreg,mem,
-              const_double,const_vector,const_int")
+              const_double,const_wide_int,const_vector,const_int")
 {
   if (MEM_P (op))
     {

Index: gcc/config/rs6000/darwin.h
===
--- gcc/config/rs6000/darwin.h (revision 203701)
+++ gcc/config/rs6000/darwin.h (working copy)
@@ -421,3 +421,4 @@ do {                                          \
 /* So far, there is no rs6000_fold_builtin, if one is introduced, then
    this will need to be modified similar to the x86 case.  */
 #define TARGET_FOLD_BUILTIN SUBTARGET_FOLD_BUILTIN
+

Index: gcc/config/r
Re: [wide-int] int_traits
On 10/16/2013 02:45 PM, Richard Sandiford wrote: Kenneth Zadeck writes: I note that we immediately return in the above case, so if (precision < xprecision + HOST_BITS_PER_WIDE_INT) { len = wi::force_to_size (scratch, val, len, xprecision, precision, UNSIGNED); return wi::storage_ref (scratch, len, precision); } applies only when the desired precision is a multiple of a HWI. yes I assume it adds that extra zero word in case the MSB is set, right? yes I am confused about the condition written here and how we look at precision and not xprecision when deciding how to extend - given an 8-bit precision unsigned 0xff and precision == 64 we do not even consider sign-extending the value because we look at precision and not xprecision. Don't we need to look at xprecision here? I do not think so. There are three cases here: 1) xprecision < precision: say xprecision = 8 and precision = 32. The number is between 0 and 255 and after canonicalization the number will be between 0 and 255. 2) xprecision == precision: not much to talk about here. 3) xprecision > precision: We need to lose the upper bits of the input and it does not matter what they were. The only thing that we are concerned about is that the bits above what is now the sign bit pointed to by precision are matched in the bits above precision. (3) is gone with the assert though. you seem to be correct here. As mentioned in my message yesterday, I thought your new way of canonising unsigned tree constants meant that there was always an upper zero bit. Is that right? i believe this is correct. If so, xprecision < precision is a no-op, because the number always has the right form for wider precisions. The only difficult case is xprecision == precision, since then we need to peel off any upper -1 HWIs. say my HWI size is 8 bits (just to keep from typing a million 'f's). if i have a 16 bit unsigned number that is all 1s in the tree world it is 3 hwis 0x00 0xff 0xff.
but inside regular wide int, it would take 1 wide int whose value is 0xff. inside of max it would be the same as the tree, but then the test precision < xprecision + hbpwi never kicks in because precision is guaranteed to be huge. inside of addr_wide_int, i think we tank with the assertion. the case actually comes up on the ppc because they do a lot of 128 bit math. I think i got thru the x86-64 without noticing this. If that's right and xprecision == precision can use val with an adjusted len, then... After all an assert precision == xprecision does not work in this routine. Quickly execute.exp tested only. To avoid code quality wreckage we have to implement a different way of allocating the 'scratch' storage of wide_int_ref_storage (well, ideally we wouldn't need it ...). I thought of simply allocating it via alloca (), but we need to do that in all callers that build a wide_int_ref (eventually via a hidden default arg). But as that's a template specialization of generic_wide_int ... so the option is to provide a function wrapper inside wi:: for this, like I want richard and mike to be the people who respond to the next point. I am not a c++ person and all of this storage manager layer stuff gives me a headache. ...the use of the scratch array goes away completely for addr_wide_int and max_wide_int. If this code is on a hot path for wide_int then maybe we should simply require that wide_int operations involving trees have explicit extensions, like in rtl. (I.e. only do the implicit extension for addr_wide_int and max_wide_int.) i do not understand how this is true. I'm not opposed to going back to a separate scratch array if all else fails, but it's really ugly, so I'd like to look at the alternatives first... Thanks, Richard
Re: patch to canonize unsigned tree-csts
On 10/15/2013 02:30 PM, Richard Sandiford wrote: Richard Sandiford writes: if (small_prec) ; else if (precision == xprecision) while (len >= 0 && val[len - 1] == -1) len--; Err, len > 0 obviously. you were only close. Patch tested on ppc and committed as revision 203739. Index: gcc/tree.c === --- gcc/tree.c (revision 203701) +++ gcc/tree.c (working copy) @@ -1204,7 +1204,7 @@ wide_int_to_tree (tree type, const wide_ } wide_int cst = wide_int::from (pcst, prec, sgn); - unsigned int len = int (cst.get_len ()); + unsigned int len = cst.get_len (); unsigned int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1); bool recanonize = sgn == UNSIGNED && small_prec Index: gcc/tree.h === --- gcc/tree.h (revision 203701) +++ gcc/tree.h (working copy) @@ -5204,13 +5204,13 @@ wi::int_traits ::decompose ( scratch[len - 1] = sext_hwi (val[len - 1], precision); return wi::storage_ref (scratch, len, precision); } - } - - if (precision < xprecision + HOST_BITS_PER_WIDE_INT) - { - len = wi::force_to_size (scratch, val, len, xprecision, precision, UNSIGNED); - return wi::storage_ref (scratch, len, precision); - } + } + /* We have to futz here because a large unsigned int with +precision 128 may look (0x0 0x 0xF...) as a +tree-cst and as (0xF...) as a wide-int. */ + else if (precision == xprecision && len == max_len) +while (len > 1 && val[len - 1] == (HOST_WIDE_INT)-1) + len--; } /* Signed and the rest of the unsigned cases are easy. */
Re: [wide-int] int_traits
On 10/17/2013 04:46 AM, Richard Biener wrote: the case actually comes up on the ppc because they do a lot of 128 bit math. I think i got thru the x86-64 without noticing this. Well, it'd be suspicious if we're directly using 128-bit numbers in addr_wide_int. The justification for the assertion was that we should explicitly truncate to addr_wide_int when deliberately ignoring upper bits, beyond bit or byte address width. 128 bits definitely falls into that category on powerpc. My question is whether with 8-bit HWI 0x00 0xff 0xff is a valid wide-int value if it has precision 16. AFAIK that is what the code produces, but now Kenny says this is only for some kind of wide-ints but not all? That is, why is The issue is not that the rules are different between the different flavors of wide int, it is that the circumstances are different. The rule is that the only bits above the precision that exist are if the precision is not an even multiple of the HBPWI. In that case, the bits are always an extension of the sign bits. max_wide_int and addr_wide_int have large enough precisions so that it is impossible to ever generate an unsigned number on the target that is large enough to ever run against the precision. However, for the fixed precision case, you can, and at least on the ppc, you do, generate unsigned numbers that are big enough to have to be recanonicalized like this. inline wi::storage_ref wi::int_traits ::decompose (HOST_WIDE_INT *scratch, unsigned int precision, const_tree x) { unsigned int len = TREE_INT_CST_NUNITS (x); const HOST_WIDE_INT *val = (const HOST_WIDE_INT *) &TREE_INT_CST_ELT (x, 0); return wi::storage_ref (val, len, precision); } not a valid implementation together with making sure that the INTEGER_CST tree rep has that extra word of zeros if required? I would like to see us move in that direction (given that the tree rep of INTEGER_CST has transitioned to variable-length already).
Btw, code such as tree wide_int_to_tree (tree type, const wide_int_ref &pcst) { ... unsigned int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1); bool recanonize = sgn == UNSIGNED && small_prec && (prec + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT == len; definitely needs a comment. I would have thought _all_ unsigned numbers need re-canonicalization. Well, maybe only if we're forcing the extra zeros. It will get a comment. [This function shows another optimization issue: case BOOLEAN_TYPE: /* Cache false or true. */ limit = 2; if (wi::leu_p (cst, 1)) ix = cst.to_uhwi (); I would have expected cst <= 1 be optimized to cst.len == 1 && cst.val[0] <= 1. It expands to : MEM[(long int *)&D.50698 + 16B] = 1; MEM[(struct wide_int_ref_storage *)&D.50698] = &MEM[(struct wide_int_ref_storage *)&D.50698].scratch; MEM[(struct wide_int_ref_storage *)&D.50698 + 8B] = 1; MEM[(struct wide_int_ref_storage *)&D.50698 + 12B] = 32; _277 = MEM[(const struct wide_int_storage *)&cst + 260B]; if (_277 <= 64) goto ; else goto ; : xl_491 = zext_hwi (1, 32); // ok, checking enabled and thus out-of-line _494 = MEM[(const long int *)&cst]; _495 = (long unsigned int) _494; yl_496 = zext_hwi (_495, _277); _497 = xl_491 < yl_496; goto ; : _503 = wi::ltu_p_large (&MEM[(struct wide_int_ref_storage *)&D.50698].scratch, 1, 32, &MEM[(const struct wide_int_storage *)&cst].val, len_274, _277); this keeps D.50698 and cst un-SRAable - inline storage is problematic for this reason. But the representation should guarantee the compare with a low precision (32 bit) constant is evaluatable at compile-time if len of the larger value is > 1, no? 
: # _504 = PHI <_497(42), _503(43)> D.50698 ={v} {CLOBBER}; if (_504 != 0) goto ; else goto ; : pretmp_563 = MEM[(const struct wide_int_storage *)&cst + 256B]; goto (); : _65 = generic_wide_int::to_uhwi (&cst, 0); ix_66 = (int) _65; goto ; The question is whether we should try to optimize wide-int for such cases or simply not use wi::leu_p (cst, 1) but rather if (cst.fits_uhwi_p () == 1 && cst.to_uhwi () < 1) ? i find this ugly, but i see where you are coming from. The problem is that both you and i know that the len has to be 1, but the optimizer does not. This is a case where I think that we made a mistake getting rid of the wi::one_p, wi::zero_p and wi::minus_one_p. The internals of one_p were return (len == 1 && val[0] == 1) and i think that is much nicer than what you put there. On the other hand, it seems that a person more skilled than i am with c++ could specialize the comparisons with an integer constant, since i believe that that constant must fit in one hwi (I am a little concerned about large unsigned constants). Thanks, Richard
Re: [wide-int] int_traits
On 10/17/2013 07:49 AM, Richard Biener wrote: On Thu, 17 Oct 2013, Kenneth Zadeck wrote: On 10/17/2013 04:46 AM, Richard Biener wrote: the case actually comes up on the ppc because they do a lot of 128 bit math. I think i got thru the x86-64 without noticing this. Well, it'd be suspicious if we're directly using 128-bit numbers in addr_wide_int. The justification for the assertion was that we should explicitly truncate to addr_wide_int when deliberately ignoring upper bits, beyond bit or byte address width. 128 bits definitely falls into that category on powerpc. My question is whether with 8-bit HWI 0x00 0xff 0xff is a valid wide-int value if it has precision 16. AFAIK that is what the code produces, but now Kenny says this is only for some kind of wide-ints but not all? That is, why is The issue is not that the rules are different between the different flavors of wide int, it is that the circumstances are different. The rule is that the only bits above the precision that exist are if the precision is not an even multiple of the HBPWI. In that case, the bits are always an extension of the sign bits. max_wide_int and addr_wide_int have large enough precisions so that it is impossible to ever generate an unsigned number on the target that is large enough to ever run against the precision. However, for the fixed precision case, you can, and at least on the ppc, you do, generate unsigned numbers that are big enough to have to be recanonicalized like this. inline wi::storage_ref wi::int_traits ::decompose (HOST_WIDE_INT *scratch, unsigned int precision, const_tree x) { unsigned int len = TREE_INT_CST_NUNITS (x); const HOST_WIDE_INT *val = (const HOST_WIDE_INT *) &TREE_INT_CST_ELT (x, 0); return wi::storage_ref (val, len, precision); } not a valid implementation together with making sure that the INTEGER_CST tree rep has that extra word of zeros if required?
I would like to see us move in that direction (given that the tree rep of INTEGER_CST has transitioned to variable-length already). Btw, code such as tree wide_int_to_tree (tree type, const wide_int_ref &pcst) { ... unsigned int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1); bool recanonize = sgn == UNSIGNED && small_prec && (prec + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT == len; definitely needs a comment. I would have thought _all_ unsigned numbers need re-canonicalization. Well, maybe only if we're forcing the extra zeros. It will get a comment. [This function shows another optimization issue: case BOOLEAN_TYPE: /* Cache false or true. */ limit = 2; if (wi::leu_p (cst, 1)) ix = cst.to_uhwi (); I would have expected cst <= 1 be optimized to cst.len == 1 && cst.val[0] <= 1. It expands to : MEM[(long int *)&D.50698 + 16B] = 1; MEM[(struct wide_int_ref_storage *)&D.50698] = &MEM[(struct wide_int_ref_storage *)&D.50698].scratch; MEM[(struct wide_int_ref_storage *)&D.50698 + 8B] = 1; MEM[(struct wide_int_ref_storage *)&D.50698 + 12B] = 32; _277 = MEM[(const struct wide_int_storage *)&cst + 260B]; if (_277 <= 64) goto ; else goto ; : xl_491 = zext_hwi (1, 32); // ok, checking enabled and thus out-of-line _494 = MEM[(const long int *)&cst]; _495 = (long unsigned int) _494; yl_496 = zext_hwi (_495, _277); _497 = xl_491 < yl_496; goto ; : _503 = wi::ltu_p_large (&MEM[(struct wide_int_ref_storage *)&D.50698].scratch, 1, 32, &MEM[(const struct wide_int_storage *)&cst].val, len_274, _277); this keeps D.50698 and cst un-SRAable - inline storage is problematic for this reason. But the representation should guarantee the compare with a low precision (32 bit) constant is evaluatable at compile-time if len of the larger value is > 1, no? 
: # _504 = PHI <_497(42), _503(43)> D.50698 ={v} {CLOBBER}; if (_504 != 0) goto ; else goto ; : pretmp_563 = MEM[(const struct wide_int_storage *)&cst + 256B]; goto (); : _65 = generic_wide_int::to_uhwi (&cst, 0); ix_66 = (int) _65; goto ; The question is whether we should try to optimize wide-int for such cases or simply not use wi::leu_p (cst, 1) but rather if (cst.fits_uhwi_p () == 1 && cst.to_uhwi () < 1) ? i find this ugly, but i see where you are coming from. The problem is that both you and i know that the len has to be 1, but the optimizer does not. This is a case where I think that we made a mistake getting rid of the wi::one_p, wi::zero_p and wi::minus_one_p. The internals of one_p were return (len == 1 && val[0] == 1) and i think that is much nicer than what you put there. On the other hand, it seems that a person more skilled than i am with c++ could specia
Re: [wide-int] int_traits
On 10/17/2013 08:29 AM, Richard Biener wrote: On Thu, 17 Oct 2013, Richard Sandiford wrote: Richard Biener writes: The new tree representation can have a length greater than max_len for an unsigned tree constant that occupies a whole number of HWIs. The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00. When extended to max_wide_int the representation is the same. But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero. The MIN trims the length from 3 to 2 in the last case. Oh, so it was the tree rep that changed? _Why_ was it changed? We still cannot use it directly from wide-int and the extra word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()). It means that we can now use the tree's HWI array directly, without any copying, for addr_wide_int and max_wide_int. The only part of decompose () that does a copy is the small_prec case, which is trivially compiled out for addr_wide_int and max_wide_int. " 2) addr_wide_int. This is a fixed size representation that is guaranteed to be large enough to compute any bit or byte sized address calculation on the target. Currently the value is 64 + 4 bits rounded up to the next number even multiple of HOST_BITS_PER_WIDE_INT (but this can be changed when the first port needs more than 64 bits for the size of a pointer). This flavor can be used for all address math on the target. In this representation, the values are sign or zero extended based on their input types to the internal precision. All math is done in this precision and then the values are truncated to fit in the result type. Unlike most gimple or rtl intermediate code, it is not useful to perform the address arithmetic at the same precision in which the operands are represented because there has been no effort by the front ends to convert most addressing arithmetic to canonical types. In the addr_wide_int, all numbers are represented as signed numbers. 
There are enough bits in the internal representation so that no information is lost by representing them this way." so I guess from that that addr_wide_int.get_precision is always that "64 + 4 rounded up". Thus decompose gets that constant precision input and the extra zeros make the necessary extension always a no-op. Aha. it is until someone comes up with a port that this will not work for, then they will have to add some machinery to sniff the port and make this bigger. I am hoping to be retired by the time this happens. For max_wide_int the same rules apply, just its size is larger. Ok. So the reps are only canonical wide-int because we only ever use them with precision > xprecision (maybe we should assert that). It is now asserted for (as of a few days ago). Btw, we are not using them directly, but every time we actually build an addr_wide_int / max_wide_int we copy them anyway: /* Initialize the storage from integer X, in precision N. */ template template inline fixed_wide_int_storage ::fixed_wide_int_storage (const T &x) { /* Check for type compatibility. We don't want to initialize a fixed-width integer from something like a wide_int. */ WI_BINARY_RESULT (T, FIXED_WIDE_INT (N)) *assertion ATTRIBUTE_UNUSED; wide_int_ref xi (x, N); len = xi.len; for (unsigned int i = 0; i < len; ++i) val[i] = xi.val[i]; } it avoids a 2nd copy though, which shows nicely what was rummaging in my head for the last two days - that the int_traits <> abstraction was somehow at the wrong level - it should have been traits that are specific to the storage model? or the above should use int_traits<>::decompose manually with it always doing the copy (that would also optimize away one copy and eventually would make the extra zeros not necessary). this came in with richard's storage manager patch. In my older code, we tried and succeeded many times to just borrow the underlying rep. I think that richard needs to work this out.
I originally thought that extra zeros get rid of all copying from trees to all wide-int kinds. What's the reason again to not use my original proposed encoding of the MSB being the sign bit? RTL constants simply are all signed then. Just you have to also sign-extend in functions like lts_p as not all constants are sign-extended. But we can use both tree (with the now appended zero) and RTL constants representation unchanged. I am not following you here. In trees, the msb is effectively a sign bit, even for unsigned numbers because we add that extra block. but inside of wide int, we do not add extra blocks beyond the precision. That would be messy for a lot of other reasons.
Re: [wide-int] int_traits
On 10/17/2013 07:30 AM, Richard Biener wrote: On Thu, 17 Oct 2013, Richard Sandiford wrote: Richard Biener writes: On Thu, 17 Oct 2013, Richard Sandiford wrote: Kenneth Zadeck writes: As mentioned in my message yesterday, I thought your new way of canonising unsigned tree constants meant that there was always an upper zero bit. Is that right? i believe this is correct. If so, xprecision < precision is a no-op, because the number always has the right form for wider precisions. The only difficult case is xprecision == precision, since then we need to peel off any upper -1 HWIs. say my HWI size is 8 bits (just to keep from typing a million 'f's. if i have a 16 bit unsigned number that is all 1s in the tree world it is 3 hwis 0x00 0xff 0xff. but inside regular wide int, it would take 1 wide int whose value is 0xff. inside of max it would be the same as the tree, but then the test precision < xprecision + hbpwi never kicks in because precision is guaranteed to be huge. inside of addr_wide_int, i think we tank with the assertion. It should be OK for addr_wide_int too. The precision still fits 2 HWIs. The initial length is greater than the maximum length of an addr_wide_int, but your "len = MAX (len, max_len)" deals with that. It's len = MIN (len, max_len) Oops, yeah, I meant MIN, sorry. which looked suspicious to me, but with precision >= xprecision precision can only be zero if xprecision is zero which looked to me like it cannot happen - or rather it should be fixed. Despite the comment above the code, I don't think this MIN is there for the zero-precision case. I think it's there to handle the new tree representation. The new tree representation can have a length greater than max_len for an unsigned tree constant that occupies a whole number of HWIs. The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00. When extended to max_wide_int the representation is the same. But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero. 
The MIN trims the length from 3 to 2 in the last case. Oh, so it was the tree rep that changed? _Why_ was it changed? We still cannot use it directly from wide-int and the extra word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()). It was changed because you asked me to change it. However, you end up with special cases either way. the case actually comes up on the ppc because they do a lot of 128 bit math.I think i got thru the x86-64 without noticing this. Well, it'd be suspicious if we're directly using 128-bit numbers in addr_wide_int. The justification for the assertion was that we should explicitly truncate to addr_wide_int when deliberately ignoring upper bits, beyond bit or byte address width. 128 bits definitely falls into that category on powerpc. My question is whether with 8-bit HWI 0x00 0xff 0xff is a valid wide-int value if it has precision 16. No, for a 16-bit wide_int it should be 0xff. 0x00 0xff 0xff is correct for any wide_int wider than 16 bits though. AFAIK that is what the code produces, In which case? This is: precision == 16 xprecision == 16 len == 3 max_len == 2 The MIN trims the len to 2 and then the loop Kenny added trims it again to 1, so the "0x00 0xff 0xff" becomes "0xff". The "0x00 0xff" is still there in the array, but not used. but now Kenny says this is only for some kind of wide-ints but not all? That is, why is inline wi::storage_ref wi::int_traits ::decompose (HOST_WIDE_INT *scratch, unsigned int precision, const_tree x) { unsigned int len = TREE_INT_CST_NUNITS (x); const HOST_WIDE_INT *val = (const HOST_WIDE_INT *) &TREE_INT_CST_ELT (x, 0); return wi::storage_ref (val, len, precision); } not a valid implementation together with making sure that the INTEGER_CST tree rep has that extra word of zeros if required? 
The fundamental problem here is that we're trying to support two cases: (a) doing N-bit arithmetic in cases where the inputs have N bits (b) doing N-bit arithmetic in cases where the inputs have fewer than N bits and are extended according to TYPE_SIGN. Let's assume 32-bit HWIs. The 16-bit (4-hex-digit) constant 0x8000 is 0x8000 regardless of whether the type is signed or unsigned. But if it's extended to 32 bits you get two different numbers 0xffff8000 and 0x00008000, depending on the sign. So for one value of the "precision" parameter (i.e. xprecision), signed and unsigned constants produce the same number. But for another value of the "precision" parameter (those greater than xprecision), signed and unsigned constants produce different numbers. Yet at the moment the tree constant has a single representation. But a correctly extended one, up to its len! (as opposed to RTL) So I think the possibilities are: (1) Use the represent
Re: [wide-int] int_traits
On 10/17/2013 09:16 AM, Richard Biener wrote: On Thu, 17 Oct 2013, Kenneth Zadeck wrote: On 10/17/2013 08:29 AM, Richard Biener wrote: On Thu, 17 Oct 2013, Richard Sandiford wrote: Richard Biener writes: The new tree representation can have a length greater than max_len for an unsigned tree constant that occupies a whole number of HWIs. The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00. When extended to max_wide_int the representation is the same. But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero. The MIN trims the length from 3 to 2 in the last case. Oh, so it was the tree rep that changed? _Why_ was it changed? We still cannot use it directly from wide-int and the extra word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()). It means that we can now use the tree's HWI array directly, without any copying, for addr_wide_int and max_wide_int. The only part of decompose () that does a copy is the small_prec case, which is trivially compiled out for addr_wide_int and max_wide_int. " 2) addr_wide_int. This is a fixed size representation that is guaranteed to be large enough to compute any bit or byte sized address calculation on the target. Currently the value is 64 + 4 bits rounded up to the next number even multiple of HOST_BITS_PER_WIDE_INT (but this can be changed when the first port needs more than 64 bits for the size of a pointer). This flavor can be used for all address math on the target. In this representation, the values are sign or zero extended based on their input types to the internal precision. All math is done in this precision and then the values are truncated to fit in the result type. Unlike most gimple or rtl intermediate code, it is not useful to perform the address arithmetic at the same precision in which the operands are represented because there has been no effort by the front ends to convert most addressing arithmetic to canonical types. 
In the addr_wide_int, all numbers are represented as signed numbers. There are enough bits in the internal representation so that no information is lost by representing them this way." so I guess from that that addr_wide_int.get_precision is always that "64 + 4 rounded up". Thus decompose gets that constant precision input and the extra zeros make the necessary extension always a no-op. Aha. it is until someone comes up with a port that this will not work for, then they will have to add some machinery to sniff the port and make this bigger. I am hoping to be retired by the time this happens. For max_wide_int the same rules apply, just its size is larger. Ok. So the reps are only canonical wide-int because we only ever use them with precision > xprecision (maybe we should assert that). It is now asserted for (as of a few days ago). Btw, we are not using them directly, but every time we actually build an addr_wide_int / max_wide_int we copy them anyway: /* Initialize the storage from integer X, in precision N. */ template template inline fixed_wide_int_storage ::fixed_wide_int_storage (const T &x) { /* Check for type compatibility. We don't want to initialize a fixed-width integer from something like a wide_int. */ WI_BINARY_RESULT (T, FIXED_WIDE_INT (N)) *assertion ATTRIBUTE_UNUSED; wide_int_ref xi (x, N); len = xi.len; for (unsigned int i = 0; i < len; ++i) val[i] = xi.val[i]; } it avoids a 2nd copy though, which shows nicely what was rummaging in my head for the last two days - that the int_traits <> abstraction was somehow at the wrong level - it should have been traits that are specific to the storage model? or the above should use int_traits<>::decompose manually with it always doing the copy (that would also optimize away one copy and eventually would make the extra zeros not necessary).
this came in with richard's storage manager patch. In my older code, we tried and succeeded many times to just borrow the underlying rep. I think that richard needs to work this out. I originally thought that extra zeros get rid of all copying from trees to all wide-int kinds. What's the reason again to not use my original proposed encoding of the MSB being the sign bit? RTL constants simply are all signed then. Just you have to also sign-extend in functions like lts_p as not all constants are sign-extended. But we can use both tree (with the now appended zero) and RTL constants representation unchanged. I am not following you here. In trees, the msb is effectively a sign bit, even for unsigned numbers because we add that extra block. but inside of wide int, we do not add extra blocks beyond the precision. That would be messy for a lot of other reasons. Can you elaborate? It would make tree and RTX reps directly usable, only wide-int-to-tr
Re: [wide-int] int_traits
On 10/17/2013 09:48 AM, Richard Biener wrote: On Thu, 17 Oct 2013, Richard Sandiford wrote: Richard Biener writes: On Thu, 17 Oct 2013, Richard Sandiford wrote: Richard Biener writes: The new tree representation can have a length greater than max_len for an unsigned tree constant that occupies a whole number of HWIs. The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00. When extended to max_wide_int the representation is the same. But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero. The MIN trims the length from 3 to 2 in the last case. Oh, so it was the tree rep that changed? _Why_ was it changed? We still cannot use it directly from wide-int and the extra word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()). It means that we can now use the tree's HWI array directly, without any copying, for addr_wide_int and max_wide_int. The only part of decompose () that does a copy is the small_prec case, which is trivially compiled out for addr_wide_int and max_wide_int. " 2) addr_wide_int. This is a fixed size representation that is guaranteed to be large enough to compute any bit or byte sized address calculation on the target. Currently the value is 64 + 4 bits rounded up to the next number even multiple of HOST_BITS_PER_WIDE_INT (but this can be changed when the first port needs more than 64 bits for the size of a pointer). This flavor can be used for all address math on the target. In this representation, the values are sign or zero extended based on their input types to the internal precision. All math is done in this precision and then the values are truncated to fit in the result type. Unlike most gimple or rtl intermediate code, it is not useful to perform the address arithmetic at the same precision in which the operands are represented because there has been no effort by the front ends to convert most addressing arithmetic to canonical types. In the addr_wide_int, all numbers are represented as signed numbers. 
There are enough bits in the internal representation so that no information is lost by representing them this way." so I guess from that that addr_wide_int.get_precision is always that "64 + 4 rounded up". Thus decompose gets that constant precision input and the extra zeros make the necessary extension always a no-op. Aha. For max_wide_int the same rules apply, just its size is larger. Ok. So the reps are only canonical wide-int because we only ever use them with precision > xprecision (maybe we should assert that). No, we allow precision == xprecision for addr_wide_int and max_wide_int too. But all we do in that case is trim the length. Requiring precision > xprecision was option (5) from my message. Btw, we are not using them directly, but every time we actually build an addr_wide_int / max_wide_int we copy them anyway: /* Initialize the storage from integer X, in precision N. */ template <int N> template <typename T> inline fixed_wide_int_storage <N>::fixed_wide_int_storage (const T &x) { /* Check for type compatibility. We don't want to initialize a fixed-width integer from something like a wide_int. */ WI_BINARY_RESULT (T, FIXED_WIDE_INT (N)) *assertion ATTRIBUTE_UNUSED; wide_int_ref xi (x, N); len = xi.len; for (unsigned int i = 0; i < len; ++i) val[i] = xi.val[i]; } Are you saying that: max_wide_int x = (tree) y; should just copy a pointer? Then what about: No, it should do a copy. But only one - which it does now with the append-extra-zeros-in-tree-rep; I was just thinking how to avoid it when not doing that, which means adjusting the rep in the copy we need to do anyway, if fixed_wide_int_storage is really the only reason to enlarge the tree rep. max_wide_int z = x; Should that just copy a pointer too, or is it OK for the second constructor to do a copy? In most cases we only use the fixed_wide_int to store the result of the computation. 
We don't use the above constructor for things like: max_wide_int x = wi::add ((tree) y, (int) z); wi::add uses y and z without going through max_wide_int. Yes, I am aware of that. What's the reason again to not use my original proposed encoding of the MSB being the sign bit? RTL constants simply are all signed then. Just you have to also sign-extend in functions like lts_p as not all constants are sign-extended. But we can use both tree (with the now appended zero) and RTL constants representation unchanged. The answer's the same as always. RTL constants don't have a sign. Any time we extend an RTL constant, we need to say whether the extension should be sign extension or zero extension. So the natural model for RTL is that an SImode addition is done to SImode width, not some arbitrary wider width. RTL constants are sign-extended (whether you call them then "signed" is up to you). They have a precision. This is how they a
Re: [wide-int] int_traits
On 10/17/2013 10:05 AM, Richard Sandiford wrote: Richard Biener writes: What's the reason again to not use my original proposed encoding of the MSB being the sign bit? RTL constants simply are all signed then. Just you have to also sign-extend in functions like lts_p as not all constants are sign-extended. But we can use both tree (with the now appended zero) and RTL constants representation unchanged. The answer's the same as always. RTL constants don't have a sign. Any time we extend an RTL constant, we need to say whether the extension should be sign extension or zero extension. So the natural model for RTL is that an SImode addition is done to SImode width, not some arbitrary wider width. RTL constants are sign-extended (whether you call them then "signed" is up to you). They have a precision. This is how they are valid reps for wide-ints, and that doesn't change. I was saying that if we make not _all_ wide-ints sign-extended then we can use the tree rep as-is. We'd then have the wide-int rep being either zero or sign extended but not arbitrary random bits outside of the precision (same as the tree rep). OK, but does that have any practical value over leaving them arbitrary, as in Kenny's original implementation? Saying that upper bits can be either signs or zeros means that readers wouldn't be able to rely on one particular extension (so would have to do the work that they did in Kenny's implementation). At the same time it means that writers can't leave the upper bits in an arbitrary state, so would have to make sure that they are signs or zeros (presumably having a free choice of which). Thanks, Richard not so easy here. We would still need to clean things up when we convert back to tree-cst or rtl, at least for the short precisions. Otherwise, the parts of the compiler that look inside of trees, in particular the hash functions, have to clean the value themselves. 
As much as my arm still hurts from richi forcing me to change this, it is true that it seems to have been the right thing to do. the change for the wider unsigned trees perhaps should be backed out. kenny
Re: [wide-int] int_traits
On 10/18/2013 04:45 AM, Richard Biener wrote: On Thu, 17 Oct 2013, Mike Stump wrote: On Oct 17, 2013, at 1:46 AM, Richard Biener wrote: [This function shows another optimization issue: case BOOLEAN_TYPE: /* Cache false or true. */ limit = 2; if (wi::leu_p (cst, 1)) ix = cst.to_uhwi (); I would have expected cst <= 1 to be optimized to cst.len == 1 && cst.val[0] <= 1. So lts_p has the same problem, and is used just below. If we do: @@ -1461,12 +1470,22 @@ wi::ne_p (const T1 &x, const T2 &y) inline bool wi::lts_p (const wide_int_ref &x, const wide_int_ref &y) { - if (x.precision <= HOST_BITS_PER_WIDE_INT && y.precision <= HOST_BITS_PER_WIDE_INT) -return x.slow () < y.slow (); - else -return lts_p_large (x.val, x.len, x.precision, y.val, y.len, - y.precision); + // We optimize w < N, where N is 64 or fewer bits. + if (y.precision <= HOST_BITS_PER_WIDE_INT) +{ + // If it fits directly into a shwi, we can compare directly. + if (wi::fits_shwi_p (x)) + return x.slow () < y.slow (); + // If it doesn't fit, then it must be more negative than any + // value in y, and hence smaller. + if (neg_p (x, SIGNED)) + return true; + // If it is positive, then it must be larger than any value in y, + // and hence greater. + return false; +} + return lts_p_large (x.val, x.len, x.precision, y.val, y.len, + y.precision); } then for: bool MGEN(wide_int cst) { return wi::lts_p (cst, 100); } we generate: movl 520(%rsp), %eax cmpl $1, %eax je .L5283 subl $1, %eax movq 8(%rsp,%rax,8), %rax shrq $63, %rax ret .p2align 4,,10 .p2align 3 .L5283: cmpq $99, 8(%rsp) setle %al ret which as you can see, never calls lts_p_large, and hence, if that was the only reason the storage escapes, then it should be able to optimize better. In the above code, minimal, no function calls, and the 100 is propagated all the way down into the cmpq. Now, the reason why we should do it this way: most of the time, most types are 64 bits or less. When that is true, then it will never call into lts_p_large. 
If the above 100 is changed to l, from a parameter long l, then the code is the same except the last part is: cmpq 8(%rsp), %rdi setg %al ret If we have two wide_int's, then the code does this: .cfi_startproc subq $8, %rsp .cfi_def_cfa_offset 16 movl 1052(%rsp), %r9d movl 1048(%rsp), %r8d movl 532(%rsp), %edx movl 528(%rsp), %esi cmpl $64, %r9d ja .L5281 cmpl $1, %esi je .L5284 subl $1, %esi movq 16(%rsp,%rsi,8), %rax addq $8, %rsp .cfi_remember_state .cfi_def_cfa_offset 8 shrq $63, %rax ret .p2align 4,,10 .p2align 3 .L5281: .cfi_restore_state leaq 536(%rsp), %rcx leaq 16(%rsp), %rdi call _ZN2wi11lts_p_largeEPKljjS1_jj addq $8, %rsp .cfi_remember_state .cfi_def_cfa_offset 8 ret .p2align 4,,10 .p2align 3 .L5284: .cfi_restore_state movq 536(%rsp), %rax cmpq %rax, 16(%rsp) setl %al addq $8, %rsp .cfi_def_cfa_offset 8 ret .cfi_endproc which is still pretty nice. Does this look ok? Kenny, can you double check the cases? I think I have them right, but a double check would be good. That works for me. i talked to mike about this last night, but did not follow up with an email until now. The problem is that this code is wrong!!! He is working to fix that and so i would expect something from him later (tomorrow for those in europe). The problem is if the variable that comes in is an unsigned HWI that has the top bit set. In that case, the wide-int canonicalized version would have two hwi's, the top one being a 0 block, and so you cannot do the fast path test coming in on that even though its precision fits. the representation should guarantee the compare with a low precision (32 bit) constant is evaluatable at compile-time if len of the larger value is > 1, no? I agree, though, I didn't check the symmetric case, as constants and smaller values can be put on the right by convention. The question is whether we should try to optimize wide-int for such cases or simply not use wi::leu_p (cst, 1) but rather if (cst.fits_uhwi_p () == 1 && cst.to_uhwi () < 1) ? 
Since we can do an excellent job with the simple interface, I'd argue for the simple interface and the simple change to do the conditional better. I'd rather up the ante on meta programming to get any performance we want, retaining the s
Re: [wide-int] int_traits
Richi, Do you want me to back out the patch that changes the rep for unsigned tree-csts? kenny
Re: [wide-int] int_traits
On 10/19/2013 05:01 AM, Richard Sandiford wrote: Mike Stump writes: + // We optimize x < y, where y is 64 or fewer bits. + // We have to be careful to not allow comparison to a large positive + // unsigned value like 0x8000, those would be encoded + // with a y.len == 2. + if (y.precision <= HOST_BITS_PER_WIDE_INT + && y.len == 1) I don't get this. If y.precision <= HOST_BITS_PER_WIDE_INT then y.len must be 1. I realise that tree constants can be stored with TREE_INT_CST_NUNITS > TYPE_PRECISION / HOST_BITS_PER_WIDE_INT (so that extensions beyond TYPE_PRECISION are free). But the wide-int code is shielded from that by the ::decompose routine. A wide int never has len > precision / HOST_BITS_PER_WIDE_INT. Thanks, Richard I think that part of this is that neither mike nor i really understand how this stuff works anymore. in the old version where we had precision 0, we would wait to canonicalize a constant or a simple integer until we saw what the precision of the other operand was. That was what precision 0 meant. it was kind of like what you are now proposing with this new trait, but for the reason that we actually did not know what to do rather than some concern about escapement. What i do not understand is what you get when you bring in an integer variable that is an unsigned HOST_WIDE_INT with the top bit set. In the precision 0 days, if the prec of the other side was 64 or less, the incoming integer took 1 hwi and if the precision was larger, it took two hwis. The canonicalization happened late enough so that there was never a question. Here we are trying to do this at compile time to avoid the escape. 
This is why my emails about this have continued to talk about the unsigned HOST_WIDE_INT as a boundary case. It is clear that if the value is a constant, then you should be able to see at compile time if the top bit is set, but if the value is a simple integer variable, you should still be able to do the non-escaping test as long as the type is signed HOST_WIDE_INT or smaller. I think that mike certainly has not captured this yet. But those are the issues as i see them.
Re: [wide-int] Go back to having undefined excess bits on read
On 10/19/2013 02:41 PM, Mike Stump wrote: On Oct 19, 2013, at 7:22 AM, Richard Sandiford wrote: As discussed, this patch effectively goes back to your original idea of having excess upper bits in a HWI being undefined on read (at least as the default assumption). wide_int itself still ensures that the excess bits are stored as signs though. Or we could decide that it isn't worth the hassle and just leave excess upper bits as undefined on write too, which really is going back to your original model. :-) I don't get it. If the bits are undefined upon read, then, they should be undefined upon write. It doesn't make any sense to me to have them be undefined upon read and defined upon write. To me, one describes the semantics of the data, then once that is done, all else falls out from that. If no one is allowed to deviate their behavior upon a bit (that bit is don't care), then, trivially reads need to zap it out, and writes can write anything into it. no, they should not be undefined on write. if they are undefined on write, then it makes tree_fits* (the successor functions to host_integerp) much more complex. kenny
Re: [wide-int] int_traits
On 10/19/2013 10:18 AM, Richard Sandiford wrote: Kenneth Zadeck writes: On 10/19/2013 05:01 AM, Richard Sandiford wrote: Mike Stump writes: + // We optimize x < y, where y is 64 or fewer bits. + // We have to be careful to not allow comparison to a large positive + // unsigned value like 0x8000, those would be encoded + // with a y.len == 2. + if (y.precision <= HOST_BITS_PER_WIDE_INT + && y.len == 1) I don't get this. If y.precision <= HOST_BITS_PER_WIDE_INT then y.len must be 1. I realise that tree constants can be stored with TREE_INT_CST_NUNITS > TYPE_PRECISION / HOST_BITS_PER_WIDE_INT (so that extensions beyond TYPE_PRECISION are free). But the wide-int code is shielded from that by the ::decompose routine. A wide int never has len > precision / HOST_BITS_PER_WIDE_INT. Thanks, Richard I think that part of this is that neither mike nor i really understand how this stuff works anymore. in the old version where we had precision 0, we would wait to canonicalize a constant or a simple integer until we saw what the precision of the other operand was. That was what precision 0 meant. it was kind of like what you are now proposing with this new trait, but for the reason that we actually did not know what to do rather than some concern about escapement. What i do not understand is what you get when you bring in an integer variable that is an unsigned HOST_WIDE_INT with the top bit set. In the precision 0 days, if the prec of the other side was 64 or less, the incoming integer took 1 hwi and if the precision was larger, it took two hwis. The canonicalization happened late enough so that there was never a question. Ah, I think I know what you mean. The original implementation was: template <typename T> static inline const HOST_WIDE_INT * to_shwi1 (HOST_WIDE_INT *s, unsigned int *l, unsigned int *p, const T& x) { s[0] = x; if (signedp (x) || sizeof (T) < sizeof (HOST_WIDE_INT) || ! 
top_bit_set (x)) { *l = 1; } else { s[1] = 0; *l = 2; } *p = 0; return s; } void wide_int_ro::check_precision (unsigned int *p1, unsigned int *p2, bool check_equal ATTRIBUTE_UNUSED, bool check_zero ATTRIBUTE_UNUSED) { gcc_checking_assert ((!check_zero) || *p1 != 0 || *p2 != 0); if (*p1 == 0) *p1 = *p2; if (*p2 == 0) *p2 = *p1; gcc_checking_assert ((!check_equal) || *p1 == *p2); } /* Return true if C1 < C2 using signed comparisons. */ template <typename T1, typename T2> static inline bool lts_p (const T1 &c1, const T2 &c2) { bool result; HOST_WIDE_INT ws1[WIDE_INT_MAX_ELTS]; HOST_WIDE_INT ws2[WIDE_INT_MAX_ELTS]; const HOST_WIDE_INT *s1, *s2; /* Returned data */ unsigned int cl1, cl2; /* array lengths */ unsigned int p1, p2; /* precisions */ s1 = to_shwi1 (ws1, &cl1, &p1, c1); s2 = to_shwi1 (ws2, &cl2, &p2, c2); check_precision (&p1, &p2, false, true); if (p1 <= HOST_BITS_PER_WIDE_INT && p2 <= HOST_BITS_PER_WIDE_INT) { HOST_WIDE_INT x0 = sext_hwi (s1[0], p1); HOST_WIDE_INT x1 = sext_hwi (s2[0], p2); result = x0 < x1; } else result = lts_p_large (s1, cl1, p1, s2, cl2, p2); #ifdef DEBUG_WIDE_INT debug_vaa ("wide_int_ro:: %d = (%s lts_p %s\n", result, s1, cl1, p1, s2, cl2, p2); #endif return result; } you need to be careful about asserting too much from the old code. the time line was: 1) we developed the stuff on x86-64, 2) you did your patch, 3) we ported everything to ppc and our private port. i really only became very sensitive to this issue during step 3 because the x86 does not exhibit these bugs. So if you had a 128-bit wide_int and T == unsigned HOST_WIDE_INT, this lts_p would zero-extend the unsigned HOST_WIDE_INT to 128 bits and then do a signed comparison. The thing here is that the "check_equal" argument is false. So if instead you were comparing a 128-bit wide_int with a 64-bit tree constant, lts_p would treat that tree constant as a signed 64-bit number, even if it was TYPE_UNSIGNED. Similarly if you were comparing a 128-bit tree constant and a 64-bit tree constant. 
You also allowed a comparison of a 128-bit wide_int with a 64-bit rtx, again treating the 64-bit rtx as signed. I do not think that this is what check_equal meant because the 0 precision was a wild card. The 0 precision allowed the values to come in from simple vars and constants and be converted on the fly. However, as i said above, i am not sure how well this worked since the testing for wide stuff was so thin. So when doing the wi:: conversion, I'd interpreted the desired semantics for lts_p as being "treat bot
Re: [wide-int] Reduce the size of the scratch arrays
On 10/20/2013 06:30 AM, Richard Sandiford wrote: If yesterday's patch goes in, we'll only need the scratch array for HWI inputs. We can therefore shrink it to 2 HWIs. Tested on x86_64-linux-gnu. OK for wide-int? Thanks, Richard Index: gcc/wide-int.h === --- gcc/wide-int.h 2013-10-19 20:08:15.571987579 +0100 +++ gcc/wide-int.h 2013-10-19 20:09:12.299436972 +0100 @@ -778,7 +778,7 @@ struct wide_int_ref_storage : public wi: private: /* Scratch space that can be used when decomposing the original integer. It must live as long as this object. */ - HOST_WIDE_INT scratch[WIDE_INT_MAX_ELTS]; + HOST_WIDE_INT scratch[2]; public: template we now allow partial int precisions that are any size. this will break that. kenny
Re: [wide-int] Make trees more like rtxes
On 10/23/2013 08:13 AM, Richard Biener wrote: On Wed, 23 Oct 2013, Richard Sandiford wrote: Richard Biener writes: The patch does that by adding: wi::address (t) for when we want to extend tree t to addr_wide_int precision and: wi::extend (t) for when we want to extend it to max_wide_int precision. (Better names welcome.) These act just like addr_wide_int (t) and max_wide_int (t) would on current sources, except that they use the tree representation directly, so there's no copying. Good. Better names - ah well, wi::to_max_wide_int (t) and wi::to_addr_wide_int (t)? Btw, "addr_wide_int" is an odd name as it has at least the precision of the maximum _bit_ offset possible, right? So more like [bit_]offset_wide_int? Or just [bit_]offset_int? And then wi::to_offset (t) and wi::to_max (t)? offset_int, max_int, wi::to_offset and wi::to_max sound OK to me. Kenny? Mike? Most of the patch is mechanical and many of the "wi::address (...)"s and "wi::extend (...)"s reinstate "addr_wide_int (...)"s and "max_wide_int (...)"s from the initial implementation. Sorry for the run-around on this. One change I'd like to point out though is: @@ -7287,7 +7287,9 @@ native_encode_int (const_tree expr, unsi for (byte = 0; byte < total_bytes; byte++) { int bitpos = byte * BITS_PER_UNIT; - value = wi::extract_uhwi (expr, bitpos, BITS_PER_UNIT); + /* Extend EXPR according to TYPE_SIGN if the precision isn't a whole +number of bytes. */ + value = wi::extract_uhwi (wi::extend (expr), bitpos, BITS_PER_UNIT); if (total_bytes > UNITS_PER_WORD) { I think this preserves the existing trunk behaviour but I wasn't sure whether it was supposed to work like that or whether upper bits should be zero. I think the upper bits are undefined, the trunk native_interpret_int does result = double_int::from_buffer (ptr, total_bytes); return double_int_to_tree (type, result); where the call to double_int_to_tree re-extends according to the types precision and sign. wide_int_to_tree doesn't though? 
This is native_encode_int rather than native_interpret_int though. Yes, I was looking at the matched interpret variant though to see what we do. the wide_int_to_tree really needs to canonicalize the value before making it into a tree. the calls to tree_fits*_p (the successor to host_integerp) depend on this being clean. Otherwise, these functions will have to clean the short integers and they get called all over the place. AIUI it's used for VIEW_CONVERT_EXPRs, so I thought the upper bits might get used. Yeah, that might happen, but still relying on the upper bits in any way would be brittle here. Richard.
Re: [wide-int] More optimisations
On 10/29/2013 08:43 AM, Richard Sandiford wrote: Richard Biener writes: On Sun, 27 Oct 2013, Richard Sandiford wrote: This patch adds some more optimisations to the wi:: comparison functions. It uses the: #define CONSTANT(X) (__builtin_constant_p (X) && (X)) idiom that was mentioned before, except that I thought CONSTANT would be too easily confused with CONSTANT_P, so I went for CAN_TELL instead. Better names welcome. STATIC_CONSTANT_P similar to static_assert? Sounds good. The changes are: - Add a fast path to eq_p for when one of the inputs isn't sign-extended. This includes code to handle compile-time 0 specially. - Add the opposite optimisation to Mike's lts_p change, if we can tell at compile time that it applies. I think the cases Mike added should only be enabled when we can figure them out at compile-time, too. Well, the idea with most of these functions was to handle the simple "everything is one HWI" cases inline. For operations like addition this needs to be based on the precision, since that's the only cheap way of knowing whether the result is also a single HWI. But for logic operations like AND and OR, it can be based on whether the inputs are single HWIs, since single-HWI inputs give a single-HWI output. This catches more cases. lts_p is a bit like AND and OR in this respect. fits_shwi_p is the same as "len == 1", which is simple to check and is often true. If we restrict the optimisation to STATIC_CONSTANT_P we'll end up using the out-of-line comparison for all TREE_INT_CST_LTs, even the int-int ones that we currently handle inline. This has always been my way of looking at it. kenny Thanks, Richard
Re: [wide-int] Update main comment
On 10/29/2013 06:37 PM, Richard Sandiford wrote: This patch tries to update the main wide_int comment to reflect the current implementation. - bitsizetype is TImode on x86_64 and others, so I don't think it's necessarily true that all offset_ints are signed. (widest_int are though.) i am wondering if this is too conservative an interpretation. I believe that they are TImode because that is the next thing after DImode and so they wanted to accommodate the 3 extra bits. Certainly there is no x86 that is able to address more than 64 bits. aside from that all of this looks OK to me. - As discussed in the early threads, I think the first reason for using widest_int is bogus. Extensions should be done in the sign of the source regardless of which wide_int type you're using. Extending directly to another wide_int is fine. - offset_int now only contains the HWIs that it needs, rather than the full MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT. - offset_int and widest_int no longer store the precision. - Precision 0 is gone (both as a constant marker and for zero-width bitfields -- thanks Richard). Does it look OK? Thanks, Richard Index: gcc/wide-int.h === --- gcc/wide-int.h (revision 204183) +++ gcc/wide-int.h (working copy) @@ -70,28 +70,18 @@ been no effort by the front ends to convert most addressing arithmetic to canonical types. - In the offset_int, all numbers are represented as signed numbers. - There are enough bits in the internal representation so that no - infomation is lost by representing them this way. - 3) widest_int. This representation is an approximation of infinite precision math. However, it is not really infinite precision math as in the GMP library. It is really finite precision math where the precision is 4 times the size of the largest integer that the target port can represent. - Like the offset_ints, all numbers are inherently signed. 
+ widest_int is supposed to be wider than any number that it needs to + store, meaning that there is always at least one leading sign bit. + All widest_int values are therefore signed. There are several places in the GCC where this should/must be used: - * Code that does widening conversions. The canonical way that - this is performed is to sign or zero extend the input value to - the max width based on the sign of the type of the source and - then to truncate that value to the target type. This is in - preference to using the sign of the target type to extend the - value directly (which gets the wrong value for the conversion - of large unsigned numbers to larger signed types). - * Code that does induction variable optimizations. This code works with induction variables of many different types at the same time. Because of this, it ends up doing many different @@ -122,17 +112,17 @@ two, the default is the prefered representation. All three flavors of wide_int are represented as a vector of - HOST_WIDE_INTs. The vector contains enough elements to hold a - value of MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT which is - a derived for each host/target combination. The values are stored - in the vector with the least significant HOST_BITS_PER_WIDE_INT - bits of the value stored in element 0. + HOST_WIDE_INTs. The default and widest_int vectors contain enough elements + to hold a value of MAX_BITSIZE_MODE_ANY_INT bits. offset_int contains only + enough elements to hold ADDR_MAX_PRECISION bits. The values are stored + in the vector with the least significant HOST_BITS_PER_WIDE_INT bits + in element 0. - A wide_int contains three fields: the vector (VAL), precision and a - length (LEN). The length is the number of HWIs needed to - represent the value. For the widest_int and the offset_int, - the precision is a constant that cannot be changed. For the - default wide_int, the precision is set from the constructor. 
+ The default wide_int contains three fields: the vector (VAL), + the precision and a length (LEN). The length is the number of HWIs + needed to represent the value. widest_int and offset_int have a + constant precision that cannot be changed, so they only store the + VAL and LEN fields. Since most integers used in a compiler are small values, it is generally profitable to use a representation of the value that is @@ -143,75 +133,90 @@ as long as they can be reconstructed from the top bit that is being represented. + The precision and length of a wide_int are always greater than 0. + Any bits in a wide_int above the precision are sign-extended from the + most significant bit. For example, a 4-bit value 0x8 is represented as + VAL = { 0xf...fff8 }. However, as an o
Re: [wide-int] More optimisations
On 10/29/2013 08:43 AM, Richard Sandiford wrote: Richard Biener writes: On Sun, 27 Oct 2013, Richard Sandiford wrote: This patch adds some more optimisations to the wi:: comparison functions. It uses the: #define CONSTANT(X) (__builtin_constant_p (X) && (X)) idiom that was mentioned before, except that I thought CONSTANT would be too easily confused with CONSTANT_P, so I went for CAN_TELL instead. Better names welcome. STATIC_CONSTANT_P similar to static_assert? Sounds good. The changes are: - Add a fast path to eq_p for when one of the inputs isn't sign-extended. This includes code to handle compile-time 0 specially. - Add the opposite optimisation to Mike's lts_p change, if we can tell at compile time that it applies. I think the cases Mike added should only be enabled when we can figure them out at compile-time, too. Well, the idea with most of these functions was to handle the simple "everything is one HWI" cases inline. For operations like addition this needs to be based on the precision, since that's the only cheap way of knowing whether the result is also a single HWI. But for logic operations like AND and OR, it can be based on whether the inputs are single HWIs, since single-HWI inputs give a single-HWI output. This catches more cases. lts_p is a bit like AND and OR in this respect. fits_shwi_p is the same as "len == 1", which is simple to check and is often true. If we restrict the optimisation to STATIC_CONSTANT_P we'll end up using the out-of-line comparison for all TREE_INT_CST_LTs, even the int-int ones that we currently handle inline. Thanks, Richard yes, this is a very boring and mechanical patch and i found nothing to talk about. kenny