Re: Fix PR85726 (div-div suboptimization) and a rant on match.pd :s-flag

Hans-Peter Nilsson Thu, 10 May 2018 04:04:00 -0700

> Date: Thu, 10 May 2018 10:33:39 +0200 (CEST)
> From: Marc Glisse <marc.gli...@inria.fr>

> On Thu, 10 May 2018, Hans-Peter Nilsson wrote:
> 
> > Replacing a division feeding a division helps only when the
> > second division is the only user, and "fusing" the divisions is
> 
> Well, that's not quite true.
> int x, y;
> void f(int n){
>    int c = 3 << 20;
>    x = n / c;
>    y = n / c;
> }
> 
> Here we can optimize the last division to y = 0. After your patch, we 
> likely need VRP to do that simplification.

Incorrect; the transformation can't match anything in that code
neither before or after my patch.  (Please adjust your example
to be true, I just couldn't correct it on my own to make sense.)

> There are probably more 
> complicated transformations this disables.

I'm providing an example from *real* code where the
transformation is bad (admittedly just for the div-div case).
Please provide the same in your counterargument.

> > downright bad if another user of the result of first division is
> > a modulus of the same value as the second division, forming a
> > divmod pair.  See the test-case, where for the tested
> > architectures (which all fail the test-case before the patch)
> > the div and mod are implemented using the high-part of a widened
> > multiplication and shift, emitted separately but combined as
> > late as in rtl, with the multiplicaton and shift re-used.  That
> > of course does not happen if later passes see (y / 48; y % 3).
> > While in match.pd, I noticed the corresponding mul-mul match,
> > which I believe should be guarded the same way.
> 
> Did you notice bad codegen because of the multiplication? You are only 
> adding a test for divisions. I am asking because I know such a change will 
> help some cases and hurt others...

I can take that part out for lack of evidence, but first I'll
argue that e.g. a multiplication by 5 won't be helped to be
transformed into a multiplication by 15; multiplying or dividing
by a larger constant is not by itself a simplification or
canonicalization.

> To simplify, the goal of :s is to avoid increasing the number of 
> instructions. Normally, the transformation output is smaller (or the same 
> size but cheaper, simpler, more canonical) than the input.

That may be the intent, but the second sentence is not generally
true.  Any such transformation must be carefully inspected to
have that property *on their own*; it's not true for the div-div
and mul-mul case for example.  After a quick look at uses of :s
in match.pd I'd say that's actually rare and for most codes not
true, carefully remembering we're talking about the context
where the intermediate still lives after the transformation.

> But if we can't 
> get rid of some of the input intermediate results, the size may still 
> increase. :s does test single_use, but it has a special case. If the 
> output (possibly after several rounds of resimplifications) is at most one 
> instruction, then it cannot be larger than the input (we are at least 
> getting rid of the instruction we called the simplification
> on),

(not true in the div-div or mul-mul case)

> so the 
> transformation does happen even if !single_use. Originally this was only 
> done when the simplification led to an SSA_NAME or a constant, IIRC.

Can you please provide an example?  I had a look at a couple of
the :s uses in match.pd imagining intermediate use of the :s-ed
operand and it didn't seem that they'd help where :s doesn't
mean single_use.

> Then people start wanting single_use restrictions to reduce register 
> pressure, reduce the size / number of constants, etc. And not all of those 
> want exactly the same conditions.
> 
> It is useful for high-level transformations to push the canonicalization 
> as far as possible, to notice equivalent quantities or constant bounds in 
> particular. So on a case by case basis, we use :s or single_use or 
> whatever...

Again, you're speaking as if match.pd naturally contains just
canonicalizations, but that's not true.

> If we use both y=x/3 and z=x/15 in the same function, should we make an 
> effort to detect it and rewrite to z=y/5?

I see what you did there. :)  I'd say it depends on how far
those calculations are from each other and if there's
intermediate use of y.

brgds, H-P

Re: Fix PR85726 (div-div suboptimization) and a rant on match.pd :s-flag

Reply via email to