On Fri, Nov 29, 2013 at 12:46 PM, Yufeng Zhang <[email protected]> wrote:
> On 11/29/13 07:52, Bin.Cheng wrote:
>>
>> On Thu, Nov 28, 2013 at 8:06 PM, Bin.Cheng<[email protected]> wrote:
>>>
>>> On Thu, Nov 28, 2013 at 6:48 PM, Richard Earnshaw<[email protected]>
>>> wrote:
>>>>
>>>> On 18/09/13 10:15, bin.cheng wrote:
>>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: [email protected] [mailto:gcc-patches-
>>>>>> [email protected]] On Behalf Of bin.cheng
>>>>>> Sent: Monday, September 02, 2013 3:09 PM
>>>>>> To: Richard Earnshaw
>>>>>> Cc: [email protected]
>>>>>> Subject: RE: [PATCH ARM]Refine scaled address expression on ARM
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Richard Earnshaw
>>>>>>> Sent: Thursday, August 29, 2013 9:06 PM
>>>>>>> To: Bin Cheng
>>>>>>> Cc: [email protected]
>>>>>>> Subject: Re: [PATCH ARM]Refine scaled address expression on ARM
>>>>>>>
>>>>>>> On 28/08/13 08:00, bin.cheng wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> This patch refines scaled address expression on ARM. It supports
>>>>>>>> "base+index*scale" in arm_legitimate_address_outer_p. It also tries
>>>>>>>> to legitimize "base + index * scale + offset" with "reg<- base +
>>>>>>>> offset; reg
>>>>>>>> + index * scale" by introducing thumb2_legitimize_address. For now
>>>>>>>> + function
>>>>>>>> thumb2_legitimize_address is a kind of placeholder and just does the
>>>>>>>> mentioned transformation by calling to try_multiplier_address.
>>>>>>>> Hoping we can improve it in the future.
>>>>>>>>
>>>>>>>> With this patch:
>>>>>>>> 1) "base+index*scale" is recognized.
>>>>>>>
>>>>>>>
>>>>>>> That's because (PLUS (REG) (MULT (REG) (CONST))) is not canonical
>>>>>>> form.
>>>>>>> So this shouldn't be necessary. Can you identify where this
>>>>>>
>>>>>> non-canoncial form is being generated?
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Oh, for now ivopt constructs "index*scale" to test whether backend
>>>>>> supports scaled addressing mode, which is not valid on ARM, so I was
>>>>>> going
>>>>>> to construct "base + index*scale" instead. Since "base + index *
>>>>>> scale"
>>>>>
>>>>> is not
>>>>>>
>>>>>> canonical form, I will construct the canonical form and drop this part
>>>>>> of
>>>>>
>>>>> the
>>>>>>
>>>>>> patch.
>>>>>>
>>>>>> Is rest of this patch OK?
>>>>>>
>>>>> Hi Richard, I removed the part over which you concerned and created
>>>>> this
>>>>> updated patch.
>>>>>
>>>>> Is it OK?
>>>>>
>>>>> Thanks.
>>>>> bin
>>>>>
>>>>> 2013-09-18 Bin Cheng<[email protected]>
>>>>>
>>>>> * config/arm/arm.c (try_multiplier_address): New function.
>>>>> (thumb2_legitimize_address): New function.
>>>>> (arm_legitimize_address): Call try_multiplier_address and
>>>>> thumb2_legitimize_address.
>>>>>
>>>>>
>>>>> 6-arm-scaled_address-20130918.txt
>>>>>
>>>>>
>>>>> Index: gcc/config/arm/arm.c
>>>>> ===================================================================
>>>>> --- gcc/config/arm/arm.c (revision 200774)
>>>>> +++ gcc/config/arm/arm.c (working copy)
>>>>> @@ -6652,6 +6654,106 @@ legitimize_tls_address (rtx x, rtx reg)
>>>>> }
>>>>> }
>>>>>
>>>>> +/* Try to find address expression like base + index * scale + offset
>>>>> + in X. If we find one, force base + offset into register and
>>>>> + construct new expression reg + index * scale; return the new
>>>>> + address expression if it's valid. Otherwise return X. */
>>>>> +static rtx
>>>>> +try_multiplier_address (rtx x, enum machine_mode mode
>>>>> ATTRIBUTE_UNUSED)
>>>>> +{
>>>>> + rtx tmp, base_reg, new_rtx;
>>>>> + rtx base = NULL_RTX, index = NULL_RTX, scale = NULL_RTX, offset =
>>>>> NULL_RTX;
>>>>> +
>>>>> + gcc_assert (GET_CODE (x) == PLUS);
>>>>> +
>>>>> + /* Try to find and record base/index/scale/offset in X. */
>>>>> + if (GET_CODE (XEXP (x, 1)) == MULT)
>>>>> + {
>>>>> + tmp = XEXP (x, 0);
>>>>> + index = XEXP (XEXP (x, 1), 0);
>>>>> + scale = XEXP (XEXP (x, 1), 1);
>>>>> + if (GET_CODE (tmp) != PLUS)
>>>>> + return x;
>>>>> +
>>>>> + base = XEXP (tmp, 0);
>>>>> + offset = XEXP (tmp, 1);
>>>>> + }
>>>>> + else
>>>>> + {
>>>>> + tmp = XEXP (x, 0);
>>>>> + offset = XEXP (x, 1);
>>>>> + if (GET_CODE (tmp) != PLUS)
>>>>> + return x;
>>>>> +
>>>>> + base = XEXP (tmp, 0);
>>>>> + scale = XEXP (tmp, 1);
>>>>> + if (GET_CODE (base) == MULT)
>>>>> + {
>>>>> + tmp = base;
>>>>> + base = scale;
>>>>> + scale = tmp;
>>>>> + }
>>>>> + if (GET_CODE (scale) != MULT)
>>>>> + return x;
>>>>> +
>>>>> + index = XEXP (scale, 0);
>>>>> + scale = XEXP (scale, 1);
>>>>> + }
>>>>> +
>>>>> + if (CONST_INT_P (base))
>>>>> + {
>>>>> + tmp = base;
>>>>> + base = offset;
>>>>> + offset = tmp;
>>>>> + }
>>>>> +
>>>>> + if (CONST_INT_P (index))
>>>>> + {
>>>>> + tmp = index;
>>>>> + index = scale;
>>>>> + scale = tmp;
>>>>> + }
>>>>> +
>>>>> + /* ARM only supports constant scale in address. */
>>>>> + if (!CONST_INT_P (scale))
>>>>> + return x;
>>>>> +
>>>>> + if (GET_MODE (base) != SImode || GET_MODE (index) != SImode)
>>>>> + return x;
>>>>> +
>>>>> + /* Only register/constant are allowed in each part. */
>>>>> + if (!symbol_mentioned_p (base)
>>>>> +&& !symbol_mentioned_p (offset)
>>>>> +&& !symbol_mentioned_p (index)
>>>>> +&& !symbol_mentioned_p (scale))
>>>>> + {
>>>>
>>>>
>>>> It would be easier to do this at the top of the function --
>>>> if (symbol_mentioned_p (x))
>>>> return x;
>>>>
>>>>
>>>>> + /* Force "base+offset" into register and construct
>>>>> + "register+index*scale". Return the new expression
>>>>> + only if it's valid. */
>>>>> + tmp = gen_rtx_PLUS (SImode, base, offset);
>>>>> + base_reg = force_reg (SImode, tmp);
>>>>> + tmp = gen_rtx_fmt_ee (MULT, SImode, index, scale);
>>>>> + new_rtx = gen_rtx_PLUS (SImode, base_reg, tmp);
>>>>> + return new_rtx;
>>>>
>>>>
>>>> I can't help thinking that this is backwards. That is, you want to
>>>> split out the mult expression and use offset addressing in the addresses
>>>> itself. That's likely to lead to either better CSE, or more induction
>>>
>>> Thanks to your review.
>>>
>>> Actually base+offset is more likely loop invariant than scaled
>>> expression, just as reported in pr57540. The bug is observed in
>>> spec2k bzip/gzip, and hurts arm in hot loops. The loop induction
>>> variable doesn't matter that much comparing to invariant because we
>>> are in RTL now.
>>>
>>>> vars. Furthermore, scaled addressing modes are likely to be more
>>>> expensive than simple offset addressing modes on at least some cores.
>>>
>>> The purpose is to CSE (as you pointed out) or hoist loop invariant.
>>> As for addressing mode is concerned, Though we may guide the
>>> transformation once we have reliable address cost mode, we don't have
>>> the information if base+offset is invariant, so it's difficult to
>>> handle in backend, right?
>>>
>>> What do you think about this?
>>>
>>
>> Additionally, there is no induction variable issue here. The memory
>> reference we are facing during expand are not lowered by gimple IVOPT,
>> which means either it's outside loop, or doesn't have induction
>> variable address.
>>
>> Generally, there are three passes which lowers memory reference:
>> gimple strength reduction picks opportunity outside loop; gimple IVOPT
>> handles references with induction variable addresses; references not
>> handled by SLSR/IVOPT are handled by RTL expand, which makes bad
>> choice of address re-association. I think Yufeng also noticed the
>> problem and are trying with patch like:
>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02878.html
>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03339.html
>
>
> Yes, when it comes to addressing expression, the re-association in RTL
> expand is quite sensitive to the available address modes on the target and
> its address legitimization routines. Both patches I proposed try to make
> the RTL expansion more canonicalized on addressing expressions, especially
> on ARM. It is done by essentially enabling simplify_gen_binary () to work
> on a bigger RTX node.
>
>
>> After thinking twice, I some kind of think we should not re-associate
>> addresses during expanding, because of lacking of context information.
>> Take base + scaled_index + offset as an example in PR57540, we just
>> don't know if "base+offset" is loop invariant from either backend or
>> RTL expander.
>
>
> I'm getting less convinced by re-associating base with offset
> unconditionally. One counter example is
>
> typedef int arr_1[20];
> void foo (arr_1 a1, int i)
> {
> a1[i+10] = 1;
> }
>
> I'm experimenting a patch to get the immediate offset in the above example
> to be the last addend in the address computing (as mentioned in
> http://gcc.gnu.org/ml/gcc/2013-11/msg00581.html), aiming to get the
> following code-gen:
>
> add r1, r0, r1, asl #2
> mov r3, #1
> str r3, [r1, #40]
>
> With your patch applied, the effort will be reverted to
>
> add r0, r0, #40
> mov r3, #1
> str r3, [r0, r1, asl #2]
>
>
> I briefly looked into PR57540. I noticed that you are trying to tackle a
> loop invariant in a hot loop:
>
> .L5:
> add lr, sp, #2064 ////loop invariant
> add r2, r2, #1
> add r3, lr, r3, asl #2
> ldr r3, [r3, #-2064]
> cmp r3, #0
> bge .L5
> uxtb r2, r2
Why does RTL invariant motion not move it?
Richard.
> Looking into the example code:
>
> void
> foo ()
> {
> int parent [ 258 * 2 ];
> for (i = 1; i <= alphaSize; i++)
> {
> while (parent[k] >= 0)
> {
> k = parent[k];
> ...
> }
> ...
>
> The loop invariant is part of address computing for a stack variable. The
> immediate 2064 is the offset of the variable on the stack frame rather than
> what we normally expect, e.g. part of indexing; it is a back-end specific
> issue and there is nothing the mid-end can do (the mem_ref lowering cannot
> help either). One possible solution may be to force the base address of an
> array on stack to a REG, before the RTL expand does anything 'smart'. Is it
> something you think worth giving a try?
>
> Just my two cents.
>
> Thanks,
> Yufeng
>