On Thu, Oct 22, 2015 at 12:50 PM, Kugan <kugan.vivekanandara...@linaro.org> wrote: > > > On 21/10/15 23:45, Richard Biener wrote: >> On Tue, Oct 20, 2015 at 10:03 PM, Kugan >> <kugan.vivekanandara...@linaro.org> wrote: >>> >>> >>> On 07/09/15 12:53, Kugan wrote: >>>> >>>> This a new version of the patch posted in >>>> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00226.html. I have done >>>> more testing and spitted the patch to make it more easier to review. >>>> There are still couple of issues to be addressed and I am working on them. >>>> >>>> 1. AARCH64 bootstrap now fails with the commit >>>> 94f92c36a83d66a893c3bc6f00a038ba3dbe2a6f. simplify-rtx.c is mis-compiled >>>> in stage2 and fwprop.c is failing. It looks to me that there is a latent >>>> issue which gets exposed my patch. I can also reproduce this in x86_64 >>>> if I use the same PROMOTE_MODE which is used in aarch64 port. For the >>>> time being, I am using patch >>>> 0006-temporary-workaround-for-bootstrap-failure-due-to-co.patch as a >>>> workaround. This meeds to be fixed before the patches are ready to be >>>> committed. >>>> >>>> 2. vector-compare-1.c from c-c++-common/torture fails to assemble with >>>> -O3 -g Error: unaligned opcodes detected in executable segment. It works >>>> fine if I remove the -g. I am looking into it and needs to be fixed as >>>> well. >>> >>> Hi Richard, >>> >>> Now that stage 1 is going to close, I would like to get these patches >>> accepted for stage1. I will try my best to address your review comments >>> ASAP. >> >> Ok, can you make the whole patch series available so I can poke at the >> implementation a bit? Please state the revision it was rebased on >> (or point me to a git/svn branch the work resides on). >> > > Thanks. Please find the patched rebated against trunk@229156. I have > skipped the test-case readjustment patches.
Some quick observations. On x86_64 when building short bar (short y); int foo (short x) { short y = bar (x) + 15; return y; } with -m32 -O2 -mtune=pentiumpro (which ends up promoting HImode regs) I get <bb 2>: _1 = (int) x_10(D); _2 = (_1) sext (16); _11 = bar (_2); _5 = (int) _11; _12 = (unsigned int) _5; _6 = _12 & 65535; _7 = _6 + 15; _13 = (int) _7; _8 = (_13) sext (16); _9 = (_8) sext (16); return _9; which looks fine but the VRP optimization doesn't trigger for the redundant sext (ranges are computed correctly but the 2nd extension is not removed). This also makes me notice trivial match.pd patterns are missing, like for example (simplify (sext (sext@2 @0 @1) @3) (if (tree_int_cst_compare (@1, @3) <= 0) @2 (sext @0 @3))) as VRP doesn't run at -O1 we must rely on those to remove rendudant extensions, otherwise generated code might get worse compared to without the pass(?) I also notice that the 'short' argument does not get it's sign-extension removed as redundand either even though we have _1 = (int) x_8(D); Found new range for _1: [-32768, 32767] In the end I suspect that keeping track of the "simple" cases in the promotion pass itself (by keeping a lattice) might be a good idea (after we fix VRP to do its work). In some way whether the ABI guarantees promoted argument registers might need some other target hook queries. Now onto the 0002 patch. +static bool +type_precision_ok (tree type) +{ + return (TYPE_PRECISION (type) == 8 + || TYPE_PRECISION (type) == 16 + || TYPE_PRECISION (type) == 32); +} that's a weird function to me. You probably want TYPE_PRECISION (type) == GET_MODE_PRECISION (TYPE_MODE (type)) here? And guard that thing with POINTER_TYPE_P || INTEGRAL_TYPE_P? +/* Return the promoted type for TYPE. */ +static tree +get_promoted_type (tree type) +{ + tree promoted_type; + enum machine_mode mode; + int uns; + if (POINTER_TYPE_P (type) + || !INTEGRAL_TYPE_P (type) + || !type_precision_ok (type)) + return type; + + mode = TYPE_MODE (type); +#ifdef PROMOTE_MODE + uns = TYPE_SIGN (type); + PROMOTE_MODE (mode, uns, type); +#endif + uns = TYPE_SIGN (type); + promoted_type = lang_hooks.types.type_for_mode (mode, uns); + if (promoted_type + && (TYPE_PRECISION (promoted_type) > TYPE_PRECISION (type))) + type = promoted_type; I think what you want to verify is that TYPE_PRECISION (promoted_type) == GET_MODE_PRECISION (mode). And to not even bother with this simply use promoted_type = build_nonstandard_integer_type (GET_MODE_PRECISION (mode), uns); You use a domwalk but also might create new basic-blocks during it (insert_on_edge_immediate), that's a no-no, commit edge inserts after the domwalk. ssa_sets_higher_bits_bitmap looks unused and we generally don't free dominance info, so please don't do that. I fired off a bootstrap on ppc64-linux which fails building stage1 libgcc with /abuild/rguenther/obj/./gcc/xgcc -B/abuild/rguenther/obj/./gcc/ -B/usr/local/powerpc64-unknown-linux-gnu/bin/ -B/usr/local/powerpc64-unknown-linux-gnu/lib/ -isystem /usr/local/powerpc64-unknown-linux-gnu/include -isystem /usr/local/powerpc64-unknown-linux-gnu/sys-include -g -O2 -O2 -g -O2 -DIN_GCC -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -isystem ./include -fPIC -mlong-double-128 -mno-minimal-toc -g -DIN_LIBGCC2 -fbuilding-libgcc -fno-stack-protector -fPIC -mlong-double-128 -mno-minimal-toc -I. -I. -I../.././gcc -I../../../trunk/libgcc -I../../../trunk/libgcc/. -I../../../trunk/libgcc/../gcc -I../../../trunk/libgcc/../include -I../../../trunk/libgcc/../libdecnumber/dpd -I../../../trunk/libgcc/../libdecnumber -DHAVE_CC_TLS -o _divdi3.o -MT _divdi3.o -MD -MP -MF _divdi3.dep -DL_divdi3 -c ../../../trunk/libgcc/libgcc2.c \ -fexceptions -fnon-call-exceptions -fvisibility=hidden -DHIDE_EXPORTS In file included from ../../../trunk/libgcc/libgcc2.c:56:0: ../../../trunk/libgcc/libgcc2.c: In function ‘__divti3’: ../../../trunk/libgcc/libgcc2.h:193:20: internal compiler error: in expand_debug_locations, at cfgexpand.c:5277 as hinted at above a bootstrap on i?86 (yes, 32bit) with --with-tune=pentiumpro might be another good testing candidate. + FOR_EACH_SSA_TREE_OPERAND (def, stmt, iter, SSA_OP_USE | SSA_OP_DEF) + promote_def_and_uses (def); it looks like you are doing some redundant work by walking both defs and uses of each stmt. I'd say you should separate def and use processing and use FOR_EACH_SSA_USE_OPERAND (use, stmt, iter, SSA_OP_USE) promote_use (use); FOR_EACH_SSA_DEF_OPERAND (def, stmt, iter, SSA_OP_DEF) promote_def (def); this should make processing more efficient (memory local) compared to doing the split handling in promote_def_and_uses. I think it will be convenient to have a SSA name info structure where you can remember the original type a name was promoted from as well as whether it was promoted or not. This way adjusting debug uses should be "trivial": +static unsigned int +fixup_uses (tree use, tree promoted_type, tree old_type) +{ + gimple *stmt; + imm_use_iterator ui; + gimple_stmt_iterator gsi; + use_operand_p op; + + FOR_EACH_IMM_USE_STMT (stmt, ui, use) + { + bool do_not_promote = false; + switch (gimple_code (stmt)) + { + case GIMPLE_DEBUG: + { + gsi = gsi_for_stmt (stmt); + gsi_remove (&gsi, true); rather than doing the above you'd do sth like SET_USE (use, fold_convert (old_type, new_def)); update_stmt (stmt); note that while you may not be able to use promoted regs at all uses (like calls or asms) you can promote all defs, if only with a compensation statement after the original def. The SSA name info struct can be used to note down the actual SSA name holding the promoted def. The pass looks a lot better than last time (it's way smaller!) but still needs some improvements. There are some more fishy details with respect to how you allocate/change SSA names but I think those can be dealt with once the basic structure looks how I like it to be. Thanks, Richard. > > Thanks, > Kugan