On Tue, Sep 5, 2017 at 5:26 AM, Prathamesh Kulkarni
<prathamesh.kulka...@linaro.org> wrote:
> Hi,
> I have attached revamped version of Kugan's original patch for type promotion
> (https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00472.html)
> rebased on r249469. The motivation of the pass is to minimize
> generation of subregs
> to avoid redundant zero/sign extensions by carrying out computations
> in PROMOTE_MODE
> as much as possible on tree-level.
>
> * Working of pass
> The pass is dominator-based, and tries to promote type of def to
> PROMOTE_MODE in each gimple stmt. Before beginning domwalk, all the
> default definitions are promoted to PROMOTE_MODE
> in promote_all_ssa_defined_with_nop (). The patch adds a new tree code
> SEXT_EXPR to represent sign-extension on tree level. CONVERT_EXPR is
> either replaced by an explicit sign or zero extension depending on the
> signedness of operands.
>
> The core of the pass is in following two routines:
> a) promote_ssa: This pass looks at the def of gimple_stmt, and
> promotes type of def to promoted_type in-place if possible. If it
> cannot promote def in-place, then
> it transforms:
> def = def_stmt
> to
> new_def = def_stmt
> def = convert_expr new_def
> where new_def is a clone of def, and type of def is set to promoted_type.
>
> b) fixup_use: The main intent is to "fix" uses of a promoted variable
> to preserve semantics
> of the code, for instance if the variable is used in stmt where it's
> original type is required.
> Another case is when def is not promoted by promote_ssa, but some uses
> could be promoted.
>
> promote_all_stmts () is the driver function that calls fixup_use and
> promote_ssa for each stmt
> within the basic block. The pass relies extensively on dom and vrp to
> remove redundancies generated by the pass and is thus enabled only if
> vrp is enabled.
>
> Issues:
> 1] Pass ordering: type-promote pass generates too many redundancies
> which can hamper other optimizations. One case I observed was on arm
> when it inhibited path splitting optimization because the number of
> stmts in basic block exceeded the value of
> param-max-jump-thread-duplication-stmts. So I placed the pass just
> before dom. I am not sure if this is a good approach. Maybe the pass
> itself
> should avoid generating redundant statements and not rely too much on
> dom and vrp ?
>
> 2] Redundant copies left after the pass: When it's safe, vrp optimzies
> _1 = op1 sext op2
> into
> _1 = op1
>
> which leaves redundant copies that are not optimized away at GIMPLE level.
> I placed pass_copy_prop after vrp to eliminate these copies but not sure if 
> that
> is ideal. Maybe we should do this during vrp itself as this is the
> only case that
> generates redundant copies ?
>
> 3] Backward phi's: Since we traverse in dominated order, fixup_use()
> gets called first on a ssa_name that is an argument of a backward-phi.
> If it promotes the type and later if promote_ssa() decides that the
> ssa_name should not be promoted, then we have to again walk the
> backward phi's to undo the "incorrect" promotion, which is done by
> emitting a convert_expr back to the original type from promoted_type.
> While I suppose it shouldn't affect correctness, it generates
> redundant casts and was wondering if there's a better approach to
> handle this issue.
>
> * SPEC2k6 benchmarking:
>
> Results of  benchmarking the patch for aarch64-linux-gnu cortex-a57:
>
> Improved:
> 401.bzip2          +2.11%
> 459.GemsFDTD  +1.42%
> 464.h264ref       +1.96%
> 471.omnetpp      +1.05%
> 481.wrf                +0.99%
>
> Regressed:
> 447.dealII            -1.34%
> 450.soplex          -1.54%
> 456.hmmer          -3.79%
> 482.sphinx3         -2.95%
>
> The remaining benchmarks didn't have much difference. However there
> was some noise
> in the above run, and I suppose the numbers are not precise.
> I will give another run with benchmarking. There was no significant
> difference in code-size with and without patch for SPEC2k6.
>
> * Validation
>
> The patch passes bootstrap+test on x86_64-unknown-linux-gnu,
> arm-linux-gnueabihf,
> aarch64-linux-gnu and ppc64le-linux-gnu. On arm-linux-gnueabihf, and
> aarch64-linux-gnu, there is following fallout:
>
> 1] gcc.dg/attr-alloc_size-11.c missing range info for signed char
> (test for warnings, line 50)
> The issue seems to be that we call reset_flow_sensitive_info(def) if
> def is promoted and that invalidates value range which is probably
> causing the regression.
>
> 2] gcc.dg/fold-narrowbopcst-1.c scan-tree-dump optimized " = _.* \\\\+ 156"
> This requires adjusting the scan-dump.
>
> On ppc64le-linux-gnu, I am observing several regressions in the
> testsuite. Most of these seem to be because type-promotion is
> interfering with other optimizations, especially widening_mul and
> bswap pass. Also observed fallout for some cases due to interference
> with strlen, tail-call and jump-threading passes. I suppose I am not
> seeing these on arm/aarch64 because ppc64 defines
> PROMOTE_MODE to be DImode and aarch64 defines it to be SImode ? I am
> working to fix performance regressions. However I would be grateful to
> receive feedback on the pass, if it's going in the right direction.

I'm glad that this type of optimization is being pursued.

Why does AArch64 define PROMOTE_MODE as SImode?  GCC ports for other
RISC targets mostly seem to use a 64-bit mode.  Maybe SImode is the
correct definition based on the current GCC optimization
infrastructure, but this seems like a change that should be applied to
all 64 bit RISC targets.

Thanks, David

Reply via email to