On Tue, Nov 9, 2021 at 5:25 PM Navid Rahimi <navidrah...@microsoft.com> wrote: > > Hi Richard, > > Thank you so much for your detailed feedback. I am attaching another version > of the patch which does include all the changes you mentioned. > > Bellow you can see my response to your feedbacks: > > > the canonical order of the plus is (plus:s (trunc_div ...) integer_onep) as > > constants come last - you can then remove the 'c' > Fixed. I was not aware of the canonical order. > > > you can use INTEGRAL_TYPE_P (type). > Fixed. Didn't know about "type" either. > > > this test is not necessary > Fixed. > > > But should we also optimize x * (2 + y/x) - y -> 2*x - y % x? So > > it looks like a conflict with the x * (1 + b) <-> x + x * b transform > > (fold_plusminus_mult)? That said, the special case of one > > definitely makes the result cheaper (one less operation). > For this special case, it does remove operation indeed. But I was not able to > reproduce it for any other constant [1]. If it was possible to do it with > other constants I would've changed the pattern to have be more general like > "x * (C + y/x) - y -> C*x - y % x". Basically anything other than 1 wasn't a > win. Regarding the "x * (1 + b) <-> x + x * b" transformation, it appears to > me when there is a "- y" at the end "x * (1 + b)", there is opportunity to > optimize. Without that "- y" I was not able to make any operation more > performant. Either direction, looks like same amount of computation. > > 1) https://compiler-explorer.com/z/dWsq7Tzf4 > > > Please move the pattern next to the most relatest which I think is > Fixed. > > > the return value of 1 is an unreliable way to fail, please instead call > > __builtin_abort (); > Fixed. > > > do we really need -O3 for this? I think that -O should be enough here? > We don't specifically need that. But I realized that the optimization can > happen in two different level at the compiler. It seems if you spread the > computation over multiple statement like: > int c = a/b; > int y = b * (1 + c); > return y - a; > > instead of : > return b * (1 + a / b) - a; > > Then you have to have at least -O1 to have it optimized. Granted, I am not > doing that in the testcase. In the new patch I am changing it to -O. Let me > know if you have any suggestions.
-O is fine, generally at -O0 we shouldn't expect such transforms to happen (but they still do, of course). The patch looks OK now. Thanks, Richard. > > Best wishes, > Navid. > > ________________________________________ > From: Richard Biener <richard.guent...@gmail.com> > Sent: Tuesday, November 9, 2021 02:36 > To: Navid Rahimi > Cc: gcc-patches@gcc.gnu.org > Subject: [EXTERNAL] Re: [PATCH] PR tree-optimization/102232 Adding a missing > pattern to match.pd > > On Tue, Nov 9, 2021 at 5:12 AM Navid Rahimi via Gcc-patches > <gcc-patches@gcc.gnu.org> wrote: > > > > Hi GCC community, > > > > This patch will add the missed pattern described in bug 102232 [1] to the > > match.pd. The testcase will test whether the multiplication and division > > has been removed from the code or not. The correctness proof for this > > pattern is here [2] in case anyone is curious. > > > > PR tree-optimization/102232 > > * match.pd (x * (1 + y / x) - y) -> (x - y % x): New optimization. > > +/* x * (1 + y / x) - y -> x - y % x */ > +(simplify > + (minus (mult:cs @0 (plus:cs integer_onep (trunc_div:s @1 @0))) @1) > > the canonical order of the plus is (plus:s (trunc_div ...) integer_onep) as > constants come last - you can then remove the 'c' > > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) > > you can use INTEGRAL_TYPE_P (type). > > + && types_match (@0, @1)) > > this test is not necessary > > + (minus @0 (trunc_mod @1 @0)))) > > But should we also optimize x * (2 + y/x) - y -> 2*x - y % x? So > it looks like a conflict with the x * (1 + b) <-> x + x * b transform > (fold_plusminus_mult)? That said, the special case of one > definitely makes the result cheaper (one less operation). > > Please move the pattern next to the most relatest which I think is > > /* X - (X / Y) * Y is the same as X % Y. */ > (simplify > (minus (convert1? @0) (convert2? (mult:c (trunc_div @@0 @@1) @1))) > (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) > (convert (trunc_mod @0 @1)))) > > +int > +main (void) > +{ > + // few randomly generated test cases > + if (foo (71856034, 238) != 212) > + { > + return 1; > > the return value of 1 is an unreliable way to fail, please instead call > __builtin_abort (); > > +/* { dg-options "-O3 -fdump-tree-optimized" } */ > > do we really need -O3 for this? I think that -O should be enough here? > > Thanks, > Richard. > > > * gcc.dg/tree-ssa/pr102232.c: testcase for this optimization. > > > > > > 1) > > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fbugzilla%2Fshow_bug.cgi%3Fid%3D102232&data=04%7C01%7Cnavidrahimi%40microsoft.com%7Cbc89643c6a14411d11ac08d9a36ce282%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637720510334697749%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Br7B1ShCT4ynLwypyM29LqaNF9GBJF3%2BIR6sZrDRTKM%3D&reserved=0 > > 2) > > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Falive2.llvm.org%2Fce%2Fz%2F2VScjD&data=04%7C01%7Cnavidrahimi%40microsoft.com%7Cbc89643c6a14411d11ac08d9a36ce282%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637720510334707741%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=FH0%2BwEHmaGjyclNFD2HrKMXtXDscPAlc9uUs3p6UMkw%3D&reserved=0 > > > > Best wishes, > > Navid.