On Tue, Sep 5, 2017 at 5:26 AM, Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> wrote: > Hi, > I have attached revamped version of Kugan's original patch for type promotion > (https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00472.html) > rebased on r249469. The motivation of the pass is to minimize > generation of subregs > to avoid redundant zero/sign extensions by carrying out computations > in PROMOTE_MODE > as much as possible on tree-level. > > * Working of pass > The pass is dominator-based, and tries to promote type of def to > PROMOTE_MODE in each gimple stmt. Before beginning domwalk, all the > default definitions are promoted to PROMOTE_MODE > in promote_all_ssa_defined_with_nop (). The patch adds a new tree code > SEXT_EXPR to represent sign-extension on tree level. CONVERT_EXPR is > either replaced by an explicit sign or zero extension depending on the > signedness of operands. > > The core of the pass is in following two routines: > a) promote_ssa: This pass looks at the def of gimple_stmt, and > promotes type of def to promoted_type in-place if possible. If it > cannot promote def in-place, then > it transforms: > def = def_stmt > to > new_def = def_stmt > def = convert_expr new_def > where new_def is a clone of def, and type of def is set to promoted_type. > > b) fixup_use: The main intent is to "fix" uses of a promoted variable > to preserve semantics > of the code, for instance if the variable is used in stmt where it's > original type is required. > Another case is when def is not promoted by promote_ssa, but some uses > could be promoted. > > promote_all_stmts () is the driver function that calls fixup_use and > promote_ssa for each stmt > within the basic block. The pass relies extensively on dom and vrp to > remove redundancies generated by the pass and is thus enabled only if > vrp is enabled. > > Issues: > 1] Pass ordering: type-promote pass generates too many redundancies > which can hamper other optimizations. One case I observed was on arm > when it inhibited path splitting optimization because the number of > stmts in basic block exceeded the value of > param-max-jump-thread-duplication-stmts. So I placed the pass just > before dom. I am not sure if this is a good approach. Maybe the pass > itself > should avoid generating redundant statements and not rely too much on > dom and vrp ? > > 2] Redundant copies left after the pass: When it's safe, vrp optimzies > _1 = op1 sext op2 > into > _1 = op1 > > which leaves redundant copies that are not optimized away at GIMPLE level. > I placed pass_copy_prop after vrp to eliminate these copies but not sure if > that > is ideal. Maybe we should do this during vrp itself as this is the > only case that > generates redundant copies ? > > 3] Backward phi's: Since we traverse in dominated order, fixup_use() > gets called first on a ssa_name that is an argument of a backward-phi. > If it promotes the type and later if promote_ssa() decides that the > ssa_name should not be promoted, then we have to again walk the > backward phi's to undo the "incorrect" promotion, which is done by > emitting a convert_expr back to the original type from promoted_type. > While I suppose it shouldn't affect correctness, it generates > redundant casts and was wondering if there's a better approach to > handle this issue. > > * SPEC2k6 benchmarking: > > Results of benchmarking the patch for aarch64-linux-gnu cortex-a57: > > Improved: > 401.bzip2 +2.11% > 459.GemsFDTD +1.42% > 464.h264ref +1.96% > 471.omnetpp +1.05% > 481.wrf +0.99% > > Regressed: > 447.dealII -1.34% > 450.soplex -1.54% > 456.hmmer -3.79% > 482.sphinx3 -2.95% > > The remaining benchmarks didn't have much difference. However there > was some noise > in the above run, and I suppose the numbers are not precise. > I will give another run with benchmarking. There was no significant > difference in code-size with and without patch for SPEC2k6. > > * Validation > > The patch passes bootstrap+test on x86_64-unknown-linux-gnu, > arm-linux-gnueabihf, > aarch64-linux-gnu and ppc64le-linux-gnu. On arm-linux-gnueabihf, and > aarch64-linux-gnu, there is following fallout: > > 1] gcc.dg/attr-alloc_size-11.c missing range info for signed char > (test for warnings, line 50) > The issue seems to be that we call reset_flow_sensitive_info(def) if > def is promoted and that invalidates value range which is probably > causing the regression. > > 2] gcc.dg/fold-narrowbopcst-1.c scan-tree-dump optimized " = _.* \\\\+ 156" > This requires adjusting the scan-dump. > > On ppc64le-linux-gnu, I am observing several regressions in the > testsuite. Most of these seem to be because type-promotion is > interfering with other optimizations, especially widening_mul and > bswap pass. Also observed fallout for some cases due to interference > with strlen, tail-call and jump-threading passes. I suppose I am not > seeing these on arm/aarch64 because ppc64 defines > PROMOTE_MODE to be DImode and aarch64 defines it to be SImode ? I am > working to fix performance regressions. However I would be grateful to > receive feedback on the pass, if it's going in the right direction.
I'm glad that this type of optimization is being pursued. Why does AArch64 define PROMOTE_MODE as SImode? GCC ports for other RISC targets mostly seem to use a 64-bit mode. Maybe SImode is the correct definition based on the current GCC optimization infrastructure, but this seems like a change that should be applied to all 64 bit RISC targets. Thanks, David