On Tue, Oct 27, 2015 at 1:50 AM, kugan <kugan.vivekanandara...@linaro.org> wrote: > > > On 23/10/15 01:23, Richard Biener wrote: >> >> On Thu, Oct 22, 2015 at 12:50 PM, Kugan >> <kugan.vivekanandara...@linaro.org> wrote: >>> >>> >>> >>> On 21/10/15 23:45, Richard Biener wrote: >>>> >>>> On Tue, Oct 20, 2015 at 10:03 PM, Kugan >>>> <kugan.vivekanandara...@linaro.org> wrote: >>>>> >>>>> >>>>> >>>>> On 07/09/15 12:53, Kugan wrote: >>>>>> >>>>>> >>>>>> This a new version of the patch posted in >>>>>> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00226.html. I have done >>>>>> more testing and spitted the patch to make it more easier to review. >>>>>> There are still couple of issues to be addressed and I am working on >>>>>> them. >>>>>> >>>>>> 1. AARCH64 bootstrap now fails with the commit >>>>>> 94f92c36a83d66a893c3bc6f00a038ba3dbe2a6f. simplify-rtx.c is >>>>>> mis-compiled >>>>>> in stage2 and fwprop.c is failing. It looks to me that there is a >>>>>> latent >>>>>> issue which gets exposed my patch. I can also reproduce this in x86_64 >>>>>> if I use the same PROMOTE_MODE which is used in aarch64 port. For the >>>>>> time being, I am using patch >>>>>> 0006-temporary-workaround-for-bootstrap-failure-due-to-co.patch as a >>>>>> workaround. This meeds to be fixed before the patches are ready to be >>>>>> committed. >>>>>> >>>>>> 2. vector-compare-1.c from c-c++-common/torture fails to assemble with >>>>>> -O3 -g Error: unaligned opcodes detected in executable segment. It >>>>>> works >>>>>> fine if I remove the -g. I am looking into it and needs to be fixed as >>>>>> well. >>>>> >>>>> >>>>> Hi Richard, >>>>> >>>>> Now that stage 1 is going to close, I would like to get these patches >>>>> accepted for stage1. I will try my best to address your review comments >>>>> ASAP. >>>> >>>> >>>> Ok, can you make the whole patch series available so I can poke at the >>>> implementation a bit? Please state the revision it was rebased on >>>> (or point me to a git/svn branch the work resides on). >>>> >>> >>> Thanks. Please find the patched rebated against trunk@229156. I have >>> skipped the test-case readjustment patches. >> >> >> Some quick observations. On x86_64 when building > > > Hi Richard, > > Thanks for the review. > >> >> short bar (short y); >> int foo (short x) >> { >> short y = bar (x) + 15; >> return y; >> } >> >> with -m32 -O2 -mtune=pentiumpro (which ends up promoting HImode regs) >> I get >> >> <bb 2>: >> _1 = (int) x_10(D); >> _2 = (_1) sext (16); >> _11 = bar (_2); >> _5 = (int) _11; >> _12 = (unsigned int) _5; >> _6 = _12 & 65535; >> _7 = _6 + 15; >> _13 = (int) _7; >> _8 = (_13) sext (16); >> _9 = (_8) sext (16); >> return _9; >> >> which looks fine but the VRP optimization doesn't trigger for the >> redundant sext >> (ranges are computed correctly but the 2nd extension is not removed). >> >> This also makes me notice trivial match.pd patterns are missing, like >> for example >> >> (simplify >> (sext (sext@2 @0 @1) @3) >> (if (tree_int_cst_compare (@1, @3) <= 0) >> @2 >> (sext @0 @3))) >> >> as VRP doesn't run at -O1 we must rely on those to remove rendudant >> extensions, >> otherwise generated code might get worse compared to without the pass(?) > > > Do you think that we should enable this pass only when vrp is enabled. > Otherwise, even when we do the simple optimizations you mentioned below, we > might not be able to remove all the redundancies. > >> >> I also notice that the 'short' argument does not get it's sign-extension >> removed >> as redundand either even though we have >> >> _1 = (int) x_8(D); >> Found new range for _1: [-32768, 32767] >> > > I am looking into it. > >> In the end I suspect that keeping track of the "simple" cases in the >> promotion >> pass itself (by keeping a lattice) might be a good idea (after we fix VRP >> to do >> its work). In some way whether the ABI guarantees promoted argument >> registers might need some other target hook queries. >> >> Now onto the 0002 patch. >> >> +static bool >> +type_precision_ok (tree type) >> +{ >> + return (TYPE_PRECISION (type) == 8 >> + || TYPE_PRECISION (type) == 16 >> + || TYPE_PRECISION (type) == 32); >> +} >> >> that's a weird function to me. You probably want >> TYPE_PRECISION (type) == GET_MODE_PRECISION (TYPE_MODE (type)) >> here? And guard that thing with POINTER_TYPE_P || INTEGRAL_TYPE_P? >> > > I will change this. (I have a patch which I am testing with other changes > you have asked for) > > >> +/* Return the promoted type for TYPE. */ >> +static tree >> +get_promoted_type (tree type) >> +{ >> + tree promoted_type; >> + enum machine_mode mode; >> + int uns; >> + if (POINTER_TYPE_P (type) >> + || !INTEGRAL_TYPE_P (type) >> + || !type_precision_ok (type)) >> + return type; >> + >> + mode = TYPE_MODE (type); >> +#ifdef PROMOTE_MODE >> + uns = TYPE_SIGN (type); >> + PROMOTE_MODE (mode, uns, type); >> +#endif >> + uns = TYPE_SIGN (type); >> + promoted_type = lang_hooks.types.type_for_mode (mode, uns); >> + if (promoted_type >> + && (TYPE_PRECISION (promoted_type) > TYPE_PRECISION (type))) >> + type = promoted_type; >> >> I think what you want to verify is that TYPE_PRECISION (promoted_type) >> == GET_MODE_PRECISION (mode). >> And to not even bother with this simply use >> >> promoted_type = build_nonstandard_integer_type (GET_MODE_PRECISION (mode), >> uns); >> > > I am changing this too. > >> You use a domwalk but also might create new basic-blocks during it >> (insert_on_edge_immediate), that's a >> no-no, commit edge inserts after the domwalk. > > > I am sorry, I dont understand "commit edge inserts after the domwalk" Is > there a way to do this in the current implementation?
Yes, simply use gsi_insert_on_edge () and after the domwalk is done do gsi_commit_edge_inserts (). >> ssa_sets_higher_bits_bitmap looks unused and >> we generally don't free dominance info, so please don't do that. >> >> I fired off a bootstrap on ppc64-linux which fails building stage1 libgcc >> with >> >> /abuild/rguenther/obj/./gcc/xgcc -B/abuild/rguenther/obj/./gcc/ >> -B/usr/local/powerpc64-unknown-linux-gnu/bin/ >> -B/usr/local/powerpc64-unknown-linux-gnu/lib/ -isystem >> /usr/local/powerpc64-unknown-linux-gnu/include -isystem >> /usr/local/powerpc64-unknown-linux-gnu/sys-include -g -O2 -O2 -g >> -O2 -DIN_GCC -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual >> -Wno-format -Wstrict-prototypes -Wmissing-prototypes >> -Wold-style-definition -isystem ./include -fPIC -mlong-double-128 >> -mno-minimal-toc -g -DIN_LIBGCC2 -fbuilding-libgcc >> -fno-stack-protector -fPIC -mlong-double-128 -mno-minimal-toc -I. >> -I. -I../.././gcc -I../../../trunk/libgcc -I../../../trunk/libgcc/. >> -I../../../trunk/libgcc/../gcc -I../../../trunk/libgcc/../include >> -I../../../trunk/libgcc/../libdecnumber/dpd >> -I../../../trunk/libgcc/../libdecnumber -DHAVE_CC_TLS -o _divdi3.o >> -MT _divdi3.o -MD -MP -MF _divdi3.dep -DL_divdi3 -c >> ../../../trunk/libgcc/libgcc2.c \ >> -fexceptions -fnon-call-exceptions -fvisibility=hidden >> -DHIDE_EXPORTS >> In file included from ../../../trunk/libgcc/libgcc2.c:56:0: >> ../../../trunk/libgcc/libgcc2.c: In function ‘__divti3’: >> ../../../trunk/libgcc/libgcc2.h:193:20: internal compiler error: in >> expand_debug_locations, at cfgexpand.c:5277 >> > > I am testing on gcc computefarm. I will get it to bootstrap and will do the > regression testing before posting the next version. > >> as hinted at above a bootstrap on i?86 (yes, 32bit) with >> --with-tune=pentiumpro might be another good testing candidate. >> >> + FOR_EACH_SSA_TREE_OPERAND (def, stmt, iter, SSA_OP_USE | >> SSA_OP_DEF) >> + promote_def_and_uses (def); >> >> it looks like you are doing some redundant work by walking both defs >> and uses of each stmt. I'd say you should separate >> def and use processing and use >> >> FOR_EACH_SSA_USE_OPERAND (use, stmt, iter, SSA_OP_USE) >> promote_use (use); >> FOR_EACH_SSA_DEF_OPERAND (def, stmt, iter, SSA_OP_DEF) >> promote_def (def); >> > > Name promote_def_and_uses in my implementation is a bit confusing. It is > promoting the SSA_NAMEs. We only have to do that for the definitions if we > can do the SSA_NAMEs defined by parameters. > > I also have a bitmap to see if we have promoted a variable and avoid doing > it again. I will try to improve this. > > > >> this should make processing more efficient (memory local) compared to >> doing the split handling >> in promote_def_and_uses. >> >> I think it will be convenient to have a SSA name info structure where >> you can remember the original >> type a name was promoted from as well as whether it was promoted or >> not. This way adjusting >> debug uses should be "trivial": >> >> +static unsigned int >> +fixup_uses (tree use, tree promoted_type, tree old_type) >> +{ >> + gimple *stmt; >> + imm_use_iterator ui; >> + gimple_stmt_iterator gsi; >> + use_operand_p op; >> + >> + FOR_EACH_IMM_USE_STMT (stmt, ui, use) >> + { >> + bool do_not_promote = false; >> + switch (gimple_code (stmt)) >> + { >> + case GIMPLE_DEBUG: >> + { >> + gsi = gsi_for_stmt (stmt); >> + gsi_remove (&gsi, true); >> >> rather than doing the above you'd do sth like >> >> SET_USE (use, fold_convert (old_type, new_def)); >> update_stmt (stmt); >> > > We do have these information (original type a name was promoted from as well > as whether it was promoted or not). To make it easy to review, in the patch > that adds the pass,I am removing these debug stmts. But in patch 4, I am > trying to handle this properly. Maybe I should combine them. Yeah, it's a bit confusing otherwise. >> note that while you may not be able to use promoted regs at all uses >> (like calls or asms) you can promote all defs, if only with a compensation >> statement after the original def. The SSA name info struct can be used >> to note down the actual SSA name holding the promoted def. >> >> The pass looks a lot better than last time (it's way smaller!) but >> still needs some >> improvements. There are some more fishy details with respect to how you >> allocate/change SSA names but I think those can be dealt with once the >> basic structure looks how I like it to be. >> > > I will post an updated patch in a day or two. Thanks, Richard. > Thanks again, > Kugan