Richard Biener <[email protected]> writes:
> @@ -958,6 +961,12 @@ streamer_write_tree_header (struct output_block *ob, tree expr)
> streamer_write_uhwi (ob, BINFO_N_BASE_BINFOS (expr));
> else if (TREE_CODE (expr) == CALL_EXPR)
> streamer_write_uhwi (ob, call_expr_nargs (expr));
> + else if (CODE_CONTAINS_STRUCT (code, TS_INT_CST))
> + {
> + gcc_checking_assert (TREE_INT_CST_NUNITS (expr));
> + streamer_write_uhwi (ob, TREE_INT_CST_NUNITS (expr));
> + streamer_write_uhwi (ob, TREE_INT_CST_EXT_NUNITS (expr));
> + }
>
> isn't ext_nunits always either nunits or nunits + 1? So it should be
> sufficient to write a single bit in the tree_base bitpack and only
> write the nunits that are required for the actual allocation in
> write_tree_header, not both.
ext_nunits can be > nunits + 1. E.g. imagine a 256-bit all-1s unsigned
integer. As a 256-bit number it's simply { -1 }, with all other bits
being sign-extensions of the -1 HWI. As a >256-bit number it's
{ -1, -1, -1, -1, 0 }, assuming 64-bit HWIs.
> index 4a24c66..f3e0ffe 100644
> --- a/gcc/tree-vrp.c
> +++ b/gcc/tree-vrp.c
> @@ -1141,15 +1142,7 @@ operand_less_p (tree val, tree val2)
> {
> /* LT is folded faster than GE and others. Inline the common case. */
> if (TREE_CODE (val) == INTEGER_CST && TREE_CODE (val2) == INTEGER_CST)
> - {
> - if (TYPE_UNSIGNED (TREE_TYPE (val)))
> - return INT_CST_LT_UNSIGNED (val, val2);
> - else
> - {
> - if (INT_CST_LT (val, val2))
> - return 1;
> - }
> - }
> + return INT_CST_LT (val, val2);
> else
> {
> tree tcmp;
>
> Note that INT_CST_LT and INT_CST_LT_UNSIGNED were supposed to
> be very fast. Now you made them
>
> #define INT_CST_LT(A, B) (wi::lts_p (wi::to_widest (A), wi::to_widest (B)))
>
> which a) doesn't look the same to me and b) hides complexity.
Not sure what complexity you mean here: to_widest just reads the
constant in its "infinite-precision" form. I'm not sure:
> So why not write
>
> if (TREE_CODE (val) == INTEGER_CST && TREE_CODE (val2) == INTEGER_CST)
> if (TYPE_UNSIGNED (...))
> return wi::ltu_p (val, val2);
> else
> return wi::lts_p (val, val2);
>
> ?
...this would be faster in practice, since the fast path in lts_p is
length-based and can handle small integers quickly whatever their precision.
The second form means that VAL and VAL2 must have the same precision.
I don't know whether that's a good thing or not in this instance.
> @@ -6966,12 +6975,7 @@ tree_int_cst_lt (const_tree t1, const_tree t2)
> int
> tree_int_cst_compare (const_tree t1, const_tree t2)
> {
> - if (tree_int_cst_lt (t1, t2))
> - return -1;
> - else if (tree_int_cst_lt (t2, t1))
> - return 1;
> - else
> - return 0;
> + return wi::cmps (wi::to_widest (t1), wi::to_widest (t2));
> }
>
> /* Return true if T is an INTEGER_CST whose numerical value (extended
>
> How does this work for comparing UHWI_MAX to 0 for example? Looking
> at
>
> template <typename T1, typename T2>
> inline int
> wi::cmps (const T1 &x, const T2 &y)
> {
> unsigned int precision = get_binary_precision (x, y);
> WIDE_INT_REF_FOR (T1) xi (x, precision);
> WIDE_INT_REF_FOR (T2) yi (y, precision);
> if (precision <= HOST_BITS_PER_WIDE_INT)
> {
> HOST_WIDE_INT xl = xi.to_shwi ();
> HOST_WIDE_INT yl = yi.to_shwi ();
> if (xl < yl)
> return -1;
>
> this seems to happily return -1 instead of 1? Similar issues elsewhere
> where you change compares to unconditionally perform signed compares.
> (it's at least non-obvious to me).
>
> Ah, the to_widest () will make it XImode * 4 extended and thus
> get_precision returns 2048 (or so) so we don't take the shortcut
> (which means it's slow).
>
> Shouldn't the shortcuts be taken on 'len' == 1 rather than
> precision <= HWI?
I suppose we could use the same approach as for lts_p, if that seems
OK to you.
> Side-note: we have too many ways to compare trees / integers
Do you mean in wide-int, or in the tree routines? At the wide-int
level there are only really two ways: compare trees as sequences of bits
(ignoring their sign) and compare trees as "infinite-precision" numbers.
Thanks,
Richard