> On 18 Nov 2025, at 11:48 pm, Richard Biener <[email protected]> 
> wrote:
>
> External email: Use caution opening links or attachments
>
>
> On Mon, Nov 17, 2025 at 1:03 AM Kugan Vivekanandarajah
> <[email protected]> wrote:
>>
>>
>>
>>> On 23 Oct 2025, at 8:37 pm, Richard Biener <[email protected]> 
>>> wrote:
>>>
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> On Wed, Oct 22, 2025 at 12:12 PM Kugan Vivekanandarajah
>>> <[email protected]> wrote:
>>>>
>>>>
>>>>
>>>>> On 22 Oct 2025, at 7:51 pm, Richard Biener <[email protected]> 
>>>>> wrote:
>>>>>
>>>>> External email: Use caution opening links or attachments
>>>>>
>>>>>
>>>>> On Tue, Oct 21, 2025 at 11:55 PM Kugan Vivekanandarajah
>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> Hi Richard,
>>>>>> Thanks for the review.
>>>>>>
>>>>>>> On 16 Oct 2025, at 11:06 pm, Richard Biener 
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>> External email: Use caution opening links or attachments
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 15, 2025 at 11:08 PM Kugan Vivekanandarajah
>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> This patch enables Loop Invariant Code Motion (LICM) to hoist loads 
>>>>>>>> that
>>>>>>>> alias with stores rom the loaded value.
>>>>>>>>
>>>>>>>> The pattern a[i] = a[0] is common in TSVC benchmarks (s293):
>>>>>>>>
>>>>>>>> for (int i = 0; i < N; i++)
>>>>>>>> a[i] = a[0];
>>>>>>>>
>>>>>>>> Previously, GCC conservatively rejected hoisting a[0] due to potential
>>>>>>>> aliasing when i==0. However, this is a "self write" - even when 
>>>>>>>> aliasing
>>>>>>>> occurs, we're writing back the same value, making hoisting safe.
>>>>>>>>
>>>>>>>> The optimization checks that:
>>>>>>>> 1. One reference is a load, the other is a store
>>>>>>>> 2. The stored SSA value equals the loaded SSA value
>>>>>>>> 3. Only simple cases with single accesses per reference
>>>>>>>>
>>>>>>>> This enables vectorization of these patterns by allowing the vectorizer
>>>>>>>> to see the hoisted loop-invariant value.
>>>>>>>>
>>>>>>>> With the patch, the loop now vectorizes and generates:
>>>>>>>>
>>>>>>>>     .L2:
>>>>>>>> -       ldr     s31, [x1]
>>>>>>>> -       str     s31, [x0], 4
>>>>>>>> -       cmp     x0, x2
>>>>>>>> +       str     q31, [x0], 16
>>>>>>>> +       cmp     x0, x1
>>>>>>>>     bne     .L2
>>>>>>>>
>>>>>>>> gcc/ChangeLog:
>>>>>>>
>>>>>>> So refs_independent_p has no particular oder on the arguments it
>>>>>>> receives and both could be stores.  It seems to me the special-casing
>>>>>>> would be better done in ref_indep_loop_p for the kind == lim_raw
>>>>>>> only.  Also
>>>>>>>
>>>>>>> +      if (((ref1->loaded && !ref1->stored && ref2->stored)
>>>>>>> +          || (ref2->loaded && !ref2->stored && ref1->stored))
>>>>>>> +         && is_self_write (ref1, ref2))
>>>>>>> +       {
>>>>>>>
>>>>>>> you seem to imply both orders are handled, but is_self_write only
>>>>>>> handles ref1 being the load.  Handing this upthread makes the
>>>>>>> order obvious.
>>>>>>>
>>>>>>> So can you see to move this up?  Otherwise I think this should be OK.
>>>>>>> I'll note this does not only handle the "obvious" case but also
>>>>>>
>>>>>> In the attached patched, I have moved based on the suggestion.
>>>>>> Bootstrappped and regression tested on aarch64-linux-gnu with no New 
>>>>>> regressions.
>>>>>> Is this OK?
>>>>>
>>>>> It seems you attached the old patch?
>>>>
>>>> Apologies. I sent the wrong patch. Here is the correct one.
>>>
>>> This seems to be an incremental patch of some sorts.  Applying some 
>>> "guessing"
>>> the overall thing looks OK, I'd probably try micro-optimizing this
>>> timing critical
>>> part a bit by pre-computing load_stmt outside of the refs_to_check iteration
>>> and only call is_self_write if that was successful (lim_raw && single
>>> access in loop).
>>>
>>> Please send an updated and complete patch.  You might want to double-check 
>>> it
>>> before posting ;)
>>>
>> Apologies for the mixup again. Here is the patch with added check for 
>> ref1->loaded && ref2->stored
>
> This is the original patch again ....
>
> So NACK.
>
> :/
>
> Please triple-check what you attach.

Oops, Sorry. No excuse for this stupidity from my end. Thanks again for your 
patience.
I have revised it. Hopefully I didn’t make any mistake this time.

Bootstrapped and regression tested on aarch64-linux-gnu with no New

Thanks,
Kugan


>
> Richard.
>
>> Bootstrapped and regression tests on aarch64-linuc-gnu with no new 
>> regressions.
>>
>> Thanks,
>> Kugan
>>
>>
>>> Thanks,
>>> Richard.
>>>
>>>> Thanks,
>>>> Kugan
>>>>
>>>>>
>>>>>> Thanks,
>>>>>> Kugan
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> void foo(int *p, int *q, int j)
>>>>>>> {
>>>>>>> for  (int i =0; i < N; ++i)
>>>>>>> p[i] = q[j];
>>>>>>> }
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>>     * tree-ssa-loop-im.cc (is_self_write): New.
>>>>>>>>     (refs_independent_p): Allow hoisting when aliasing references
>>>>>>>>     form a self write pattern.
>>>>>>>>
>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>
>>>>>>>>     * gcc.dg/vect/vect-licm-hoist-1.c: New.
>>>>>>>>     * gcc.dg/vect/vect-licm-hoist-2.c: Likewise.
>>>>>>>>
>>>>>>>> Bootstrappped and regression tested on aarch64-linux-gnu with no New
>>>>>>>> regressions.
>>>>>>>>
>>>>>>>> Is this Ok?
>>>>>>>>
>>>>>>>> Signed-off-by: Kugan Vivekanandarajah <[email protected]>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Kugan


Attachment: 0001-tree-optimization-v3-Allow-LICM-to-hoist-loads-in-se.patch
Description: 0001-tree-optimization-v3-Allow-LICM-to-hoist-loads-in-se.patch

Reply via email to