sext

Richard Biener Wed, 28 Oct 2015 08:46:28 -0700

On Tue, Oct 27, 2015 at 1:50 AM, kugan
<kugan.vivekanandara...@linaro.org> wrote:
>
>
> On 23/10/15 01:23, Richard Biener wrote:
>>
>> On Thu, Oct 22, 2015 at 12:50 PM, Kugan
>> <kugan.vivekanandara...@linaro.org> wrote:
>>>
>>>
>>>
>>> On 21/10/15 23:45, Richard Biener wrote:
>>>>
>>>> On Tue, Oct 20, 2015 at 10:03 PM, Kugan
>>>> <kugan.vivekanandara...@linaro.org> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 07/09/15 12:53, Kugan wrote:
>>>>>>
>>>>>>
>>>>>> This a new version of the patch posted in
>>>>>> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00226.html. I have done
>>>>>> more testing and spitted the patch to make it more easier to review.
>>>>>> There are still couple of issues to be addressed and I am working on
>>>>>> them.
>>>>>>
>>>>>> 1. AARCH64 bootstrap now fails with the commit
>>>>>> 94f92c36a83d66a893c3bc6f00a038ba3dbe2a6f. simplify-rtx.c is
>>>>>> mis-compiled
>>>>>> in stage2 and fwprop.c is failing. It looks to me that there is a
>>>>>> latent
>>>>>> issue which gets exposed my patch. I can also reproduce this in x86_64
>>>>>> if I use the same PROMOTE_MODE which is used in aarch64 port. For the
>>>>>> time being, I am using  patch
>>>>>> 0006-temporary-workaround-for-bootstrap-failure-due-to-co.patch as a
>>>>>> workaround. This meeds to be fixed before the patches are ready to be
>>>>>> committed.
>>>>>>
>>>>>> 2. vector-compare-1.c from c-c++-common/torture fails to assemble with
>>>>>> -O3 -g Error: unaligned opcodes detected in executable segment. It
>>>>>> works
>>>>>> fine if I remove the -g. I am looking into it and needs to be fixed as
>>>>>> well.
>>>>>
>>>>>
>>>>> Hi Richard,
>>>>>
>>>>> Now that stage 1 is going to close, I would like to get these patches
>>>>> accepted for stage1. I will try my best to address your review comments
>>>>> ASAP.
>>>>
>>>>
>>>> Ok, can you make the whole patch series available so I can poke at the
>>>> implementation a bit?  Please state the revision it was rebased on
>>>> (or point me to a git/svn branch the work resides on).
>>>>
>>>
>>> Thanks. Please find the patched rebated against trunk@229156. I have
>>> skipped the test-case readjustment patches.
>>
>>
>> Some quick observations.  On x86_64 when building
>
>
> Hi Richard,
>
> Thanks for the review.
>
>>
>> short bar (short y);
>> int foo (short x)
>> {
>>    short y = bar (x) + 15;
>>    return y;
>> }
>>
>> with -m32 -O2 -mtune=pentiumpro (which ends up promoting HImode regs)
>> I get
>>
>>    <bb 2>:
>>    _1 = (int) x_10(D);
>>    _2 = (_1) sext (16);
>>    _11 = bar (_2);
>>    _5 = (int) _11;
>>    _12 = (unsigned int) _5;
>>    _6 = _12 & 65535;
>>    _7 = _6 + 15;
>>    _13 = (int) _7;
>>    _8 = (_13) sext (16);
>>    _9 = (_8) sext (16);
>>    return _9;
>>
>> which looks fine but the VRP optimization doesn't trigger for the
>> redundant sext
>> (ranges are computed correctly but the 2nd extension is not removed).
>>
>> This also makes me notice trivial match.pd patterns are missing, like
>> for example
>>
>> (simplify
>>   (sext (sext@2 @0 @1) @3)
>>   (if (tree_int_cst_compare (@1, @3) <= 0)
>>    @2
>>    (sext @0 @3)))
>>
>> as VRP doesn't run at -O1 we must rely on those to remove rendudant
>> extensions,
>> otherwise generated code might get worse compared to without the pass(?)
>
>
> Do you think that we should enable this pass only when vrp is enabled.
> Otherwise, even when we do the simple optimizations you mentioned below, we
> might not be able to remove all the redundancies.
>
>>
>> I also notice that the 'short' argument does not get it's sign-extension
>> removed
>> as redundand either even though we have
>>
>> _1 = (int) x_8(D);
>> Found new range for _1: [-32768, 32767]
>>
>
> I am looking into it.
>
>> In the end I suspect that keeping track of the "simple" cases in the
>> promotion
>> pass itself (by keeping a lattice) might be a good idea (after we fix VRP
>> to do
>> its work).  In some way whether the ABI guarantees promoted argument
>> registers might need some other target hook queries.
>>
>> Now onto the 0002 patch.
>>
>> +static bool
>> +type_precision_ok (tree type)
>> +{
>> +  return (TYPE_PRECISION (type)  == 8
>> +         || TYPE_PRECISION (type) == 16
>> +         || TYPE_PRECISION (type) == 32);
>> +}
>>
>> that's a weird function to me.  You probably want
>> TYPE_PRECISION (type) == GET_MODE_PRECISION (TYPE_MODE (type))
>> here?  And guard that thing with POINTER_TYPE_P || INTEGRAL_TYPE_P?
>>
>
> I will change this. (I have a patch which I am testing with other changes
> you have asked for)
>
>
>> +/* Return the promoted type for TYPE.  */
>> +static tree
>> +get_promoted_type (tree type)
>> +{
>> +  tree promoted_type;
>> +  enum machine_mode mode;
>> +  int uns;
>> +  if (POINTER_TYPE_P (type)
>> +      || !INTEGRAL_TYPE_P (type)
>> +      || !type_precision_ok (type))
>> +    return type;
>> +
>> +  mode = TYPE_MODE (type);
>> +#ifdef PROMOTE_MODE
>> +  uns = TYPE_SIGN (type);
>> +  PROMOTE_MODE (mode, uns, type);
>> +#endif
>> +  uns = TYPE_SIGN (type);
>> +  promoted_type = lang_hooks.types.type_for_mode (mode, uns);
>> +  if (promoted_type
>> +      && (TYPE_PRECISION (promoted_type) > TYPE_PRECISION (type)))
>> +    type = promoted_type;
>>
>> I think what you want to verify is that TYPE_PRECISION (promoted_type)
>> == GET_MODE_PRECISION (mode).
>> And to not even bother with this simply use
>>
>> promoted_type = build_nonstandard_integer_type (GET_MODE_PRECISION (mode),
>> uns);
>>
>
> I am changing this too.
>
>> You use a domwalk but also might create new basic-blocks during it
>> (insert_on_edge_immediate), that's a
>> no-no, commit edge inserts after the domwalk.
>
>
> I am sorry, I dont understand "commit edge inserts after the domwalk" Is
> there a way to do this in the current implementation?


Yes, simply use gsi_insert_on_edge () and after the domwalk is done do
gsi_commit_edge_inserts ().

>> ssa_sets_higher_bits_bitmap looks unused and
>> we generally don't free dominance info, so please don't do that.
>>
>> I fired off a bootstrap on ppc64-linux which fails building stage1 libgcc
>> with
>>
>> /abuild/rguenther/obj/./gcc/xgcc -B/abuild/rguenther/obj/./gcc/
>> -B/usr/local/powerpc64-unknown-linux-gnu/bin/
>> -B/usr/local/powerpc64-unknown-linux-gnu/lib/ -isystem
>> /usr/local/powerpc64-unknown-linux-gnu/include -isystem
>> /usr/local/powerpc64-unknown-linux-gnu/sys-include    -g -O2 -O2  -g
>> -O2 -DIN_GCC    -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual
>> -Wno-format -Wstrict-prototypes -Wmissing-prototypes
>> -Wold-style-definition  -isystem ./include   -fPIC -mlong-double-128
>> -mno-minimal-toc -g -DIN_LIBGCC2 -fbuilding-libgcc
>> -fno-stack-protector   -fPIC -mlong-double-128 -mno-minimal-toc -I.
>> -I. -I../.././gcc -I../../../trunk/libgcc -I../../../trunk/libgcc/.
>> -I../../../trunk/libgcc/../gcc -I../../../trunk/libgcc/../include
>> -I../../../trunk/libgcc/../libdecnumber/dpd
>> -I../../../trunk/libgcc/../libdecnumber -DHAVE_CC_TLS  -o _divdi3.o
>> -MT _divdi3.o -MD -MP -MF _divdi3.dep -DL_divdi3 -c
>> ../../../trunk/libgcc/libgcc2.c \
>>            -fexceptions -fnon-call-exceptions -fvisibility=hidden
>> -DHIDE_EXPORTS
>> In file included from ../../../trunk/libgcc/libgcc2.c:56:0:
>> ../../../trunk/libgcc/libgcc2.c: In function ‘__divti3’:
>> ../../../trunk/libgcc/libgcc2.h:193:20: internal compiler error: in
>> expand_debug_locations, at cfgexpand.c:5277
>>
>
> I am testing on gcc computefarm. I will get it to bootstrap and will do the
> regression testing before posting the next version.
>
>> as hinted at above a bootstrap on i?86 (yes, 32bit) with
>> --with-tune=pentiumpro might be another good testing candidate.
>>
>> +      FOR_EACH_SSA_TREE_OPERAND (def, stmt, iter, SSA_OP_USE |
>> SSA_OP_DEF)
>> +       promote_def_and_uses (def);
>>
>> it looks like you are doing some redundant work by walking both defs
>> and uses of each stmt.  I'd say you should separate
>> def and use processing and use
>>
>>    FOR_EACH_SSA_USE_OPERAND (use, stmt, iter, SSA_OP_USE)
>>      promote_use (use);
>>    FOR_EACH_SSA_DEF_OPERAND (def, stmt, iter, SSA_OP_DEF)
>>      promote_def (def);
>>
>
> Name promote_def_and_uses in my implementation is a bit confusing. It is
> promoting the SSA_NAMEs. We only have to do that for the definitions if we
> can do the SSA_NAMEs defined by parameters.
>
> I also have a bitmap to see if we have promoted a variable and avoid doing
> it again. I will try to improve this.
>
>
>
>> this should make processing more efficient (memory local) compared to
>> doing the split handling
>> in promote_def_and_uses.
>>
>> I think it will be convenient to have a SSA name info structure where
>> you can remember the original
>> type a name was promoted from as well as whether it was promoted or
>> not.  This way adjusting
>> debug uses should be "trivial":
>>
>> +static unsigned int
>> +fixup_uses (tree use, tree promoted_type, tree old_type)
>> +{
>> +  gimple *stmt;
>> +  imm_use_iterator ui;
>> +  gimple_stmt_iterator gsi;
>> +  use_operand_p op;
>> +
>> +  FOR_EACH_IMM_USE_STMT (stmt, ui, use)
>> +    {
>> +      bool do_not_promote = false;
>> +      switch (gimple_code (stmt))
>> +       {
>> +       case GIMPLE_DEBUG:
>> +           {
>> +             gsi = gsi_for_stmt (stmt);
>> +             gsi_remove (&gsi, true);
>>
>> rather than doing the above you'd do sth like
>>
>>    SET_USE (use, fold_convert (old_type, new_def));
>>    update_stmt (stmt);
>>
>
> We do have these information (original type a name was promoted from as well
> as whether it was promoted or not). To make it easy to review, in the patch
> that adds the pass,I am removing these debug stmts. But in patch 4, I am
> trying to handle this properly. Maybe  I should combine them.

Yeah, it's a bit confusing otherwise.

>> note that while you may not be able to use promoted regs at all uses
>> (like calls or asms) you can promote all defs, if only with a compensation
>> statement after the original def.  The SSA name info struct can be used
>> to note down the actual SSA name holding the promoted def.
>>
>> The pass looks a lot better than last time (it's way smaller!) but
>> still needs some
>> improvements.  There are some more fishy details with respect to how you
>> allocate/change SSA names but I think those can be dealt with once the
>> basic structure looks how I like it to be.
>>
>
> I will post an updated patch in a day or two.

Thanks,
Richard.

> Thanks again,
> Kugan

Re: [0/7] Type promotion pass and elimination of zext/sext

Reply via email to