On 3/16/2022 6:49 PM, Zhao Wei Liew via Gcc-patches wrote:
Thanks for the detailed review. I have uploaded patch v2 based on the review.
v1: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590604.html
Changes since v1:
1. Add patterns for the cases CST / (T)x and (T)x / CST as well. Fix
test regressions caused by those patterns.
2. Support signed integers except for the possible INT_MIN / -1 case.
3. Support cases where X and Y have different precisions.
On Tue, 22 Feb 2022 at 15:54, Richard Biener <richard.guent...@gmail.com> wrote:
+/* (type)X / (type)Y -> (type)(X / Y)
+ when the resulting type is at least precise as the original types
+ and when all the types are unsigned integral types. */
+(simplify
+ (trunc_div (convert @0) (convert @1))
+ (if (INTEGRAL_TYPE_P (type)
+ && INTEGRAL_TYPE_P (TREE_TYPE (@0))
+ && INTEGRAL_TYPE_P (TREE_TYPE (@1))
+ && TYPE_UNSIGNED (type)
+ && TYPE_UNSIGNED (TREE_TYPE (@0))
+ && TYPE_UNSIGNED (TREE_TYPE (@1))
+ && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
+ && TYPE_PRECISION (type) >= TYPE_PRECISION (TREE_TYPE (@0)))
+ (convert (trunc_div @0 @1))))
since you are requiring the types of @0 and @1 to match it's easier to write
&& types_match (TREE_TYPE(@0), TREE_TYPE (@1))
Thanks. In the new patch, TREE_TYPE (@0) and TREE_TYPE (@1) can now
have different precisions, so I believe that I can't use types_match()
anymore.
that allows you to elide checks on either @0 or @1. I suppose the transform
does not work for signed types because of the -INT_MIN / -1 overflow case?
It might be possible to use expr_not_equal_to (@0, -INT_MIN) ||
expr_not_equal_to (@1, -1)
(correctly written, lookup the existing examples in match.pd for the X
% -Y transform)
That's right. I have made changes to support signed types except for
the INT_MIN / -1 case.
I'll note that as written the transform will not catch CST / (T)x or
(T)x / CST since
you'll not see conversions around constants. I'm not sure whether
using (convert[12]? ...)
or writing special patterns with INTEGER_CST operands is more convenient.
There is int_fits_type_p to check whether a constant will fit in a
type without truncation
or sign change.
I see. I couldn't find an easy way to use (convert[12]? ...) in this
case so I added 2 other patterns for the CST / (T)x and (T)x / CST
cases.
When @0 and @1 do not have the same type there might still be a common type
that can be used and is smaller than 'type', it might be as simple as using
build_nonstandard_integer_type (MIN (@0-prec, @1-prec), 1 /*unsigned_p*/).
Thanks for the tip. I've made a similar change, but using either
TREE_TYPE (@0) or TREE_TYPE (@1) instead of
build_nonstandard_integer_type().
In the past there have been attempts to more globally narrow operations using
a new pass rather than using individual patterns. So for more complicated cases
that might be the way to go. There's now also the ISEL pass which does
pre-RTL expansion transforms that need some global context and for example
can look at SSA uses.
I had wanted to do something similar to handle all integral
trunc_divs, but I'm not sure where/how to start. It seems out of my
league at this moment. I'll gladly explore it after this change
though!
Just a couple general notes in case you want to poke further in this space.
Earlier you mentioned you were trying to do some of this work in
expand_divmod, but the operands had rtx types rather than tree types,
thus you're unable to get at the range data.
In expand_expr_divmod there's treeop0, treeop1. So if they are
SSA_NAMEs, then you can query for their range.
Richi mentioned attempts to globally narrow during a new pass. That's a
much broader chance, but has the potential to apply in a lot more
places. Narrowing early would potentially expose the narrowed
statements to the vectorizer which could be a win. Narrowing late is
likely a win too, but it likely only helps the expansion phase generate
narrower operations. This is obviously a larger project than just
handling the division cases in match.pd. Tackling it is not (IMHO) a
requirement to move forward.
Finally, narrowing isn't always a win. There have been
micro-architectures were sub-word accesses are slower than word
accesses. I'm not too worried about it for this patch, but it could
become more important if you were to look into a more general solution.
As far as the patch is concerned. At one point I was a bit worried that
expr_not_equal_to wasn't right. But after digging a bit deeper into its
implementation, it should do the right thing here. Note if you wanted
to see how to get to underlying range data, you can find a simple
example in that function.
Your new testcase needs to limit itself to targets where integers are at
least 32 bits wide. Something like this at the top of the new test:
/* { dg-require-effective-target int32plus } */
There are additional cases Andrew Pinski pointed out in BZ. Does your
new code handle all of them. It's OK if it doesn't, but obviously not
optimal. Consider adding his additional testcases, potentially xfailed
if they're not handled yet.
You changed gcc.dg/ipa/pr91088.c and the ChangeLog indicates "fix a test
regression". Can you say a bit more about what's going on there. It's
likely find, but we want to make sure that your change hasn't
compromised the test.
Otherwise it looks pretty good.
jeff