On Tue, Mar 15, 2022 at 4:15 PM Joern Rennecke
<joern.renne...@embecosm.com> wrote:
>
> On 15/03/2022, Richard Biener <richard.guent...@gmail.com> wrote:
>
> > Why's this a new pass?  Every walk over all insns costs time.
>
> If should typically scan considerably less than all the insns.
>
> >  The pass
> > lacks any comments as to what CFG / stmt structure is matched.
>
> I've put a file in:
> config/riscv/tree-crc-doc.txt
>
> would this text be suitabe to put in a comment block in tree-crc.cc ?

Yes, that would be a better place I think.

> >  From
> > a quick look it seems like it first(?) statically matches a stmt sequence
> > without considering intermediate stmts, so matching should be quite
> > fragile.
>
> It might be fragile inasmuch as it won't match when things change, but
> the matching has remained effective for seven years and across two
> architecture families with varying word sizes.
> And with regards to matching only what it's supposed to match, I believe
> I have checked all the data dependencies and phis so that it's definitely
> calculating a CRC.
>
> >  Why not match (sub-)expressions with the help of match.pd?
>
> Can you match a loop with match.pd ?

No, this is why I said (sub-)expression.  I'm mainly talking about

      tmp = (tmp >> 1) | (tmp << (sizeof (tmp) * (8 /*CHAR_BIT*/) - 1));
      if ((long)tmp < 0)

for example - one key bit of the CRC seems to be a comparison, so
you'd match that with sth like

(match (crc_compare @0)
 (lt (bit_ior (rshift @0 integer_onep) (lshift @0 INTEGER_CST@1)) integer_zerop)
 (if (compare_tree_int (@1, TYPE_SIZE_UNIT (TREE_TYPE (@0)) * 8 - 1))))

where you can add alternative expression forms.  You can then use
the generated match function from the pass.  See for example the
tree-ssa-forwprop.cc use of gimple_ctz_table_index.

> > Any reason why you match CRC before early inlinig and thus even when
> > not optimizing?  Matching at least after early FRE/DCE/DSE would help
> > to get rid of abstraction and/or memory temporary uses.
>
> I haven't originally placed it there, but I believe benefits include:
> - Getting rid of loop without having to actively deleting it in the
> crc pass (this also
>   might be safer as we just have to make sure we're are computing the CRC, and
>   DCE will determine if there is any ancillary result that is left,
> and only delete the
>   loop if it's really dead.
> - The optimized function is available for inlining.

The canonical place to transform loops into builtins would be loop distribution.
Another place would be final value replacement since you basically replace
the reduction result with a call to the builtin, but I think
loop-distribution is
the better overall place.  See how we match strlen() there.

Richard.

Reply via email to