Re: [PATCH, ARM] correctly encode the CC reg data flow

Bernd Edlinger Wed, 06 Sep 2017 06:02:14 -0700

On 09/06/17 14:51, Richard Earnshaw (lists) wrote:
> On 06/09/17 13:44, Bernd Edlinger wrote:
>> On 09/04/17 21:54, Bernd Edlinger wrote:
>>> Hi Kyrill,
>>>
>>> Thanks for your review!
>>>
>>>
>>> On 09/04/17 15:55, Kyrill Tkachov wrote:
>>>> Hi Bernd,
>>>>
>>>> On 18/01/17 15:36, Bernd Edlinger wrote:
>>>>> On 01/13/17 19:28, Bernd Edlinger wrote:
>>>>>> On 01/13/17 17:10, Bernd Edlinger wrote:
>>>>>>> On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
>>>>>>>> On 18/12/16 12:58, Bernd Edlinger wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> this is related to PR77308, the follow-up patch will depend on this
>>>>>>>>> one.
>>>>>>>>>
>>>>>>>>> When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>>>>>>>>> before reload, a mis-compilation in libgcc function
>>>>>>>>> __gnu_satfractdasq
>>>>>>>>> was discovered, see [1] for more details.
>>>>>>>>>
>>>>>>>>> The reason seems to be that when the *arm_cmpdi_insn is directly
>>>>>>>>> followed by a *arm_cmpdi_unsigned instruction, both are split
>>>>>>>>> up into this:
>>>>>>>>>
>>>>>>>>>      [(set (reg:CC CC_REGNUM)
>>>>>>>>>            (compare:CC (match_dup 0) (match_dup 1)))
>>>>>>>>>       (parallel [(set (reg:CC CC_REGNUM)
>>>>>>>>>                       (compare:CC (match_dup 3) (match_dup 4)))
>>>>>>>>>                  (set (match_dup 2)
>>>>>>>>>                       (minus:SI (match_dup 5)
>>>>>>>>>                                (ltu:SI (reg:CC_C CC_REGNUM)
>>>>>>>>> (const_int
>>>>>>>>> 0))))])]
>>>>>>>>>
>>>>>>>>>      [(set (reg:CC CC_REGNUM)
>>>>>>>>>            (compare:CC (match_dup 2) (match_dup 3)))
>>>>>>>>>       (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>>>>>>>>>                  (set (reg:CC CC_REGNUM)
>>>>>>>>>                       (compare:CC (match_dup 0) (match_dup 1))))]
>>>>>>>>>
>>>>>>>>> The problem is that the reg:CC from the *subsi3_carryin_compare
>>>>>>>>> is not mentioning that the reg:CC is also dependent on the reg:CC
>>>>>>>>> from before.  Therefore the *arm_cmpsi_insn appears to be
>>>>>>>>> redundant and thus got removed, because the data values are
>>>>>>>>> identical.
>>>>>>>>>
>>>>>>>>> I think that applies to a number of similar pattern where data
>>>>>>>>> flow is happening through the CC reg.
>>>>>>>>>
>>>>>>>>> So this is a kind of correctness issue, and should be fixed
>>>>>>>>> independently from the optimization issue PR77308.
>>>>>>>>>
>>>>>>>>> Therefore I think the patterns need to specify the true
>>>>>>>>> value that will be in the CC reg, in order for cse to
>>>>>>>>> know what the instructions are really doing.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>>>>>> Is it OK for trunk?
>>>>>>>>>
>>>>>>>> I agree you've found a valid problem here, but I have some issues
>>>>>>>> with
>>>>>>>> the patch itself.
>>>>>>>>
>>>>>>>>
>>>>>>>> (define_insn_and_split "subdi3_compare1"
>>>>>>>>     [(set (reg:CC_NCV CC_REGNUM)
>>>>>>>>       (compare:CC_NCV
>>>>>>>>         (match_operand:DI 1 "register_operand" "r")
>>>>>>>>         (match_operand:DI 2 "register_operand" "r")))
>>>>>>>>      (set (match_operand:DI 0 "register_operand" "=&r")
>>>>>>>>       (minus:DI (match_dup 1) (match_dup 2)))]
>>>>>>>>     "TARGET_32BIT"
>>>>>>>>     "#"
>>>>>>>>     "&& reload_completed"
>>>>>>>>     [(parallel [(set (reg:CC CC_REGNUM)
>>>>>>>>              (compare:CC (match_dup 1) (match_dup 2)))
>>>>>>>>             (set (match_dup 0) (minus:SI (match_dup 1) (match_dup
>>>>>>>> 2)))])
>>>>>>>>      (parallel [(set (reg:CC_C CC_REGNUM)
>>>>>>>>              (compare:CC_C
>>>>>>>>                (zero_extend:DI (match_dup 4))
>>>>>>>>                (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>>>                     (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>>>             (set (match_dup 3)
>>>>>>>>              (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>>>                    (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>>>>>>>
>>>>>>>>
>>>>>>>> This pattern is now no-longer self consistent in that before the
>>>>>>>> split
>>>>>>>> the overall result for the condition register is in mode CC_NCV, but
>>>>>>>> afterwards it is just CC_C.
>>>>>>>>
>>>>>>>> I think CC_NCV is correct mode (the N, C and V bits all correctly
>>>>>>>> reflect the result of the 64-bit comparison), but that then
>>>>>>>> implies that
>>>>>>>> the cc mode of subsi3_carryin_compare is incorrect as well and
>>>>>>>> should in
>>>>>>>> fact also be CC_NCV.  Thinking about this pattern, I'm inclined to
>>>>>>>> agree
>>>>>>>> that CC_NCV is the correct mode for this operation
>>>>>>>>
>>>>>>>> I'm not sure if there are other consequences that will fall out from
>>>>>>>> fixing this (it's possible that we might need a change to
>>>>>>>> select_cc_mode
>>>>>>>> as well).
>>>>>>>>
>>>>>>> Yes, this is still a bit awkward...
>>>>>>>
>>>>>>> The N and V bit will be the correct result for the subdi3_compare1
>>>>>>> a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
>>>>>>> only gets the C bit correct, the expression for N and V is a different
>>>>>>> one.
>>>>>>>
>>>>>>> It probably works, because the subsi3_carryin_compare instruction sets
>>>>>>> more CC bits than the pattern does explicitly specify the value.
>>>>>>> We know the subsi3_carryin_compare also computes the NV bits, but
>>>>>>> it is
>>>>>>> hard to write down the correct rtl expression for it.
>>>>>>>
>>>>>>> In theory the pattern should describe everything correctly,
>>>>>>> maybe, like:
>>>>>>>
>>>>>>> set (reg:CC_C CC_REGNUM)
>>>>>>>       (compare:CC_C
>>>>>>>         (zero_extend:DI (match_dup 4))
>>>>>>>         (plus:DI (zero_extend:DI (match_dup 5))
>>>>>>>                  (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>> set (reg:CC_NV CC_REGNUM)
>>>>>>>       (compare:CC_NV
>>>>>>>        (match_dup 4))
>>>>>>>        (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM)
>>>>>>> (const_int 0)))
>>>>>>> set (match_dup 3)
>>>>>>>       (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>>>>>>>                 (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>>>>
>>>>>>>
>>>>>>> But I doubt that will work to set CC_REGNUM with two different modes
>>>>>>> in parallel?
>>>>>>>
>>>>>>> Another idea would be to invent a CC_CNV_NOOV mode, that implicitly
>>>>>>> defines C from the DImode result, and NV from the SImode result,
>>>>>>> similar to the CC_NOOVmode, that also leaves something open what
>>>>>>> bits it really defines?
>>>>>>>
>>>>>>>
>>>>>>> What do you think?
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> Bernd.
>>>>>> I think maybe the right solution is to invent a new CCmode
>>>>>> that defines C as if the comparison is done in DImode
>>>>>> but N and V as if the comparison is done in SImode.
>>>>>>
>>>>>> I thought maybe I would call it CC_NCV_CIC (CIC = Carry-In-Compare),
>>>>>> furthermore I think the CC_NOOV should be renamed to CC_NZ (because
>>>>>> only N and Z are set correctly), but in a different patch of course.
>>>>>>
>>>>>> Attached is a new version that implements the new CCmode.
>>>>>>
>>>>>> How do you like this new version?
>>>>>>
>>>>>> It seems to be able to build a cross-compiler at least.
>>>>>>
>>>>>> I will start a new bootstrap with this new patch, but that can take
>>>>>> some
>>>>>> time until I have definitive results.
>>>>>>
>>>>>> Is there still a chance that it can go into gcc-7 or should it wait
>>>>>> for the next stage1?
>>>>>>
>>>>>> Thanks
>>>>>> Bernd.
>>>>>
>>>>> I thought I should also look at where the subdi_compare1 amd the
>>>>> negdi2_compare patterns are used, and look if the caller is fine with
>>>>> not having all CC bits available.
>>>>>
>>>>> And indeed usubv<mode>4 turns out to be questionabe, because it
>>>>> emits gen_sub<mode>3_compare1 and uses arm_gen_unlikely_cbranch (LTU,
>>>>> CCmode) which is inconsistent when subdi3_compare1 no longer uses
>>>>> CCmode.
>>>>>
>>>>> To correct this, the branch should use CC_Cmode which is always defined.
>>>>>
>>>>> So I tried to test this pattern, with the following test programs,
>>>>> and found that the code actually improves when the branch uses CC_Cmode
>>>>> instead of CCmode, both for SImode and DImode data, which was a bit
>>>>> surprising.
>>>>>
>>>>> I used this test program to see how the usubv<mode>4 pattern works:
>>>>>
>>>>> cat test.c (DImode)
>>>>> unsigned long long x, y, z;
>>>>> int b;
>>>>> void test()
>>>>> {
>>>>>      b = __builtin_sub_overflow (y,z, &x);
>>>>> }
>>>>>
>>>>>
>>>>> unpatched code used 8 byte more stack than patched,
>>>>> because the DImode subtraction is effectively done twice.
>>>>>
>>>>> cat test1.c (SImode)
>>>>> unsigned long x, y, z;
>>>>> int b;
>>>>> void test()
>>>>> {
>>>>>      b = __builtin_sub_overflow (y,z, &x);
>>>>> }
>>>>>
>>>>> which generates (unpatched):
>>>>>            cmp     r3, r0
>>>>>            sub     ip, r3, r0
>>>>>
>>>>> instead of expected (patched):
>>>>>      subs    r3, r3, r2
>>>>>
>>>>>
>>>>> The condition is extracted by ifconversion and/or combine
>>>>> and complicates the resulting code instead of simplifying.
>>>>>
>>>>> I think this happens only when the branch and the subsi/di3_compare1
>>>>> is using the same CC mode.
>>>>>
>>>>> That does not happen when the CC modes disagree, as with the
>>>>> proposed patch.  All other uses of the pattern are already using
>>>>> CC_Cmode or CC_Vmode in the branch, and these do not change.
>>>>>
>>>>> Attached is an updated version of the patch, that happens to
>>>>> improve the code generation of the usubsi4 and usubdi4 pattern,
>>>>> as a side effect.
>>>>>
>>>>>
>>>>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>>>>> Is it OK for trunk?
>>>>
>>>> I'm very sorry it has taken so long to review.
>>>> I've been ramping up on the context recently now so I'll try to move
>>>> this along...
>>>>
>>>> This patch looks mostly ok to me from reading the patterns and the
>>>> discussion around it.
>>>> I have one concern:
>>>>
>>>>
>>>>    (define_insn_and_split "negdi2_compare"
>>>> -  [(set (reg:CC CC_REGNUM)
>>>> -    (compare:CC
>>>> +  [(set (reg:CC_NCV CC_REGNUM)
>>>> +    (compare:CC_NCV
>>>>          (const_int 0)
>>>>          (match_operand:DI 1 "register_operand" "0,r")))
>>>>       (set (match_operand:DI 0 "register_operand" "=r,&r")
>>>> @@ -4647,8 +4650,12 @@
>>>>               (compare:CC (const_int 0) (match_dup 1)))
>>>>              (set (match_dup 0) (minus:SI (const_int 0)
>>>>                           (match_dup 1)))])
>>>> -   (parallel [(set (reg:CC CC_REGNUM)
>>>> -           (compare:CC (const_int 0) (match_dup 3)))
>>>> +   (parallel [(set (reg:CC_NCV_CIC CC_REGNUM)
>>>> +           (compare:CC_NCV_CIC
>>>> +             (const_int 0)
>>>> +             (plus:DI
>>>> +               (zero_extend:DI (match_dup 3))
>>>> +               (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>>>>             (set (match_dup 2)
>>>>              (minus:SI
>>>>               (minus:SI (const_int 0) (match_dup 3))
>>>>
>>>>
>>>> I was somewhat concerned with having the first operand of the COMPARE
>>>> being a const_int 0 and the second being
>>>> a complex expression as the RTL canonicalization rules usually require
>>>> the complex operand going first if possible.
>>>> Reading the RTL rules in rtl.texi I see it says this:
>>>> "If one of the operands is a constant, it should be placed in the
>>>> second operand and the comparison code adjusted as appropriate."
>>>> So it seems that the pre-existing pattern that puts const_int 0 as the
>>>> first operand already breaks that rule.
>>>> I think we should fix that and update the use of condition code to a
>>>> GEU rather than LTU as well.
>>>>
>>>
>>
>> Well, the sentence before that one is even more explicit:
>>
>> "Normally, @var{x} and @var{y} must have the same mode.  Otherwise,
>> @code{compare} is valid only if the mode of @var{x} is in class
>> @code{MODE_INT} and @var{y} is a @code{const_int} or
>> @code{const_double} with mode @code{VOIDmode}."
>>
>> So because the const_int 0 has VOIDmode the comparison is done
>> in y-mode not x-mode.
>>
>> But unfortunately I see no way how to accomplish this,
>> because this assumes that the compare can be easily swapped
>> if the conditional instruction just uses one of GT/GE/LE/LT
>> or GTU/GEU/LEU/LTU.  But that is only the case for plain CCmode.
>>
>> And in this example we ask for "overflow", but while 0-X can
>> overflow X-0 simply can't.  And moreover there are non-symmetric
>> modes like CC_NCVmode which only support LT/GE/LTU/GEU but not
>> the swapped conditions GT/LE/GTU/LEU.
>>
>> I think the only solution would be to adjust the spec to
>> reflect the implementation:
>>
>> Index: rtl.texi
>> ===================================================================
>> --- rtl.texi (revision 251752)
>> +++ rtl.texi (working copy)
>> @@ -2252,6 +2252,13 @@
>>    If one of the operands is a constant, it should be placed in the
>>    second operand and the comparison code adjusted as appropriate.
>>
>> +There may be exceptions from this rule if the mode @var{m} carries
>> +not enough information for the swapped comparison operator, or
> 
> There may be exceptions _to_ ... if mode @var{m} does not carry enough...
> 
>> +if we ask for overflow from the subtraction.
> 
> Aren't we really trying to 'detect overflow' rather than 'ask' for it?
>


Yes :-), of course: it is used this way in a pattern that does
a negation and a conditional branch if a overflow is detected:

    emit_insn (gen_negdi2_compare (operands[0], operands[1]));
    arm_gen_unlikely_cbranch (NE, CC_Vmode, operands[2]);



>> That means, while
>> +0-X may overfow X-0 can never overflow.  Under these conditions
>> +a compare may have the constant expression at the left side.
> 
> In these circumstances the constant will be in the first operand .
> 
> (left and right don't really make sense for RTL).

Yes, thanks.  Corrected paragraph below:

Index: rtl.texi
===================================================================
--- rtl.texi    (Revision 251752)
+++ rtl.texi    (Arbeitskopie)
@@ -2252,6 +2252,13 @@
  If one of the operands is a constant, it should be placed in the
  second operand and the comparison code adjusted as appropriate.

+There may be exceptions from this rule if the mode @var{m} carries
+not enough information for the swapped comparison operator, or
+if we try to detect overflow from the subtraction.  That means, while
+0-X may overfow X-0 can never overflow.  Under these conditions
+a compare may have the constant expression at the first operand.
+Examples are the ARM negdi2_compare pattern and similar.
+
  A @code{compare} specifying two @code{VOIDmode} constants is not valid
  since there is no way to know in what mode the comparison is to be
  performed; the comparison must either be folded during the compilation


>> +Examples are the ARM negdi2_compare pattern and similar.
>> +
>>    A @code{compare} specifying two @code{VOIDmode} constants is not valid
>>    since there is no way to know in what mode the comparison is to be
>>    performed; the comparison must either be folded during the compilation
>>
>>
>>
>> Please advise.
>>
>> Thanks
>> Bernd.
>>
>>
>>>
>>> Hmmm...
>>>
>>> I think the compare is not a commutative operation, and swapping
>>> the arguments will imply a different value in the flags.
>>>
>>> So if I write
>>> (set (reg:CC_NCV CC_REGNUM)
>>>        (compare:CC_NCV
>>>          (const_int 0)
>>>          (reg:DI 123)))
>>>
>>> I have C,N,V set to the result of (0 - r123), C = usable for LTU or GEU,
>>> N,V = usable for LT, GE
>>>
>>> But if I write
>>> (set (reg:CC_NCV CC_REGNUM)
>>>        (compare:CC_NCV
>>>          (reg:DI 123)
>>>          (const_int 0)))
>>>
>>> I have C,N,V set to the result of (r123 - 0), but the expansion stays
>>> the same and the actual value in the flags is defined by the expansion.
>>> Of course there exists probably no matching expansion for that.
>>>
>>> Note that both LTU in the above hunk are in a parallel-stmt and operate
>>> on the flags from the previous pattern, so changing these to GEU
>>> will probably be wrong.
>>>
>>> Both (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)) in the negdi2_compare
>>> use the flags from the previous (set (reg:CC CC_REGNUM) (compare:CC
>>> (const_int 0) (match_dup 1)).
>>>
>>> One use of the resulting flags (I know of) is in negvdi3 where we
>>> have:
>>>
>>>     emit_insn (gen_negdi2_compare (operands[0], operands[1]));
>>>     arm_gen_unlikely_cbranch (NE, CC_Vmode, operands[2]);
>>>
>>> I think only 0-x can overflow while x-0 can never overflow.
>>>
>>> Of course the CC_NCV_CIC mode bends the definition of the RTL compare
>>> a lot and I guess if this pattern is created by a splitter, this can
>>> only be expanded by an exactly matching pattern, there is (hopefully)
>>> no way how combine could mess with this pattern due to the exotic
>>> CCmode.  So while I think it would work to swap only the notation of
>>> all CC_NCV_CIC patterns, _without_ changing the assembler-parts and the
>>> consuming statements, that would make it quite hard to follow for the
>>> human reader at least.
>>>
>>> What do you think?
>>>
>>>
>>> Bernd.
>

Re: [PATCH, ARM] correctly encode the CC reg data flow

Reply via email to