Re: [PATCH][v2] tree-optimization: Fold (type)X / (type)Y [PR103855]

Jeff Law via Gcc-patches Sat, 09 Jul 2022 11:19:54 -0700



On 3/16/2022 6:49 PM, Zhao Wei Liew via Gcc-patches wrote:

Thanks for the detailed review. I have uploaded patch v2 based on the review.

v1: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590604.html
Changes since v1:
1. Add patterns for the cases CST / (T)x and (T)x / CST as well. Fix
test regressions caused by those patterns.
2. Support signed integers except for the possible INT_MIN / -1 case.
3. Support cases where X and Y have different precisions.

On Tue, 22 Feb 2022 at 15:54, Richard Biener <richard.guent...@gmail.com> wrote:

+/* (type)X / (type)Y -> (type)(X / Y)
+   when the resulting type is at least precise as the original types
+   and when all the types are unsigned integral types. */
+(simplify
+ (trunc_div (convert @0) (convert @1))
+ (if (INTEGRAL_TYPE_P (type)
+      && INTEGRAL_TYPE_P (TREE_TYPE (@0))
+      && INTEGRAL_TYPE_P (TREE_TYPE (@1))
+      && TYPE_UNSIGNED (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0))
+      && TYPE_UNSIGNED (TREE_TYPE (@1))
+      && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
+      && TYPE_PRECISION (type) >= TYPE_PRECISION (TREE_TYPE (@0)))
+  (convert (trunc_div @0 @1))))

since you are requiring the types of @0 and @1 to match it's easier to write

      && types_match (TREE_TYPE(@0), TREE_TYPE (@1))

Thanks. In the new patch, TREE_TYPE (@0) and TREE_TYPE (@1) can now
have different precisions, so I believe that I can't use types_match()
anymore.

that allows you to elide checks on either @0 or @1.  I suppose the transform
does not work for signed types because of the -INT_MIN / -1 overflow case?
It might be possible to use expr_not_equal_to (@0, -INT_MIN) ||
expr_not_equal_to (@1, -1)
(correctly written, lookup the existing examples in match.pd for the X
% -Y transform)

That's right. I have made changes to support signed types except for
the INT_MIN / -1 case.

I'll note that as written the transform will not catch CST / (T)x or
(T)x / CST since
you'll not see conversions around constants.  I'm not sure whether
using (convert[12]? ...)
or writing special patterns with INTEGER_CST operands is more convenient.
There is int_fits_type_p to check whether a constant will fit in a
type without truncation
or sign change.

I see. I couldn't find an easy way to use (convert[12]? ...) in this
case so I added 2 other patterns for the CST / (T)x and (T)x / CST
cases.

When @0 and @1 do not have the same type there might still be a common type
that can be used and is smaller than 'type', it might be as simple as using
build_nonstandard_integer_type (MIN (@0-prec, @1-prec), 1 /*unsigned_p*/).

Thanks for the tip. I've made a similar change, but using either
TREE_TYPE (@0) or TREE_TYPE (@1) instead of
build_nonstandard_integer_type().

In the past there have been attempts to more globally narrow operations using
a new pass rather than using individual patterns.  So for more complicated cases
that might be the way to go.  There's now also the ISEL pass which does
pre-RTL expansion transforms that need some global context and for example
can look at SSA uses.

I had wanted to do something similar to handle all integral
trunc_divs, but I'm not sure where/how to start. It seems out of my
league at this moment. I'll gladly explore it after this change
though!

Just a couple general notes in case you want to poke further in this space.

Earlier you mentioned you were trying to do some of this work inexpand_divmod, but the operands had rtx types rather than tree types,thus you're unable to get at the range data.

In expand_expr_divmod there's treeop0, treeop1. So if they areSSA_NAMEs, then you can query for their range.

Richi mentioned attempts to globally narrow during a new pass. That's amuch broader chance, but has the potential to apply in a lot moreplaces. Narrowing early would potentially expose the narrowedstatements to the vectorizer which could be a win. Narrowing late islikely a win too, but it likely only helps the expansion phase generatenarrower operations. This is obviously a larger project than justhandling the division cases in match.pd. Tackling it is not (IMHO) arequirement to move forward.

Finally, narrowing isn't always a win. There have beenmicro-architectures were sub-word accesses are slower than wordaccesses. I'm not too worried about it for this patch, but it couldbecome more important if you were to look into a more general solution.

As far as the patch is concerned. At one point I was a bit worried thatexpr_not_equal_to wasn't right. But after digging a bit deeper into itsimplementation, it should do the right thing here. Note if you wantedto see how to get to underlying range data, you can find a simpleexample in that function.

Your new testcase needs to limit itself to targets where integers are atleast 32 bits wide. Something like this at the top of the new test:


/* { dg-require-effective-target int32plus } */

There are additional cases Andrew Pinski pointed out in BZ. Does yournew code handle all of them. It's OK if it doesn't, but obviously notoptimal. Consider adding his additional testcases, potentially xfailedif they're not handled yet.

You changed gcc.dg/ipa/pr91088.c and the ChangeLog indicates "fix a testregression". Can you say a bit more about what's going on there. It'slikely find, but we want to make sure that your change hasn'tcompromised the test.


Otherwise it looks pretty good.

jeff

Re: [PATCH][v2] tree-optimization: Fold (type)X / (type)Y [PR103855]

Reply via email to