> On 18 Nov 2025, at 11:48 pm, Richard Biener <[email protected]> > wrote: > > External email: Use caution opening links or attachments > > > On Mon, Nov 17, 2025 at 1:03 AM Kugan Vivekanandarajah > <[email protected]> wrote: >> >> >> >>> On 23 Oct 2025, at 8:37 pm, Richard Biener <[email protected]> >>> wrote: >>> >>> External email: Use caution opening links or attachments >>> >>> >>> On Wed, Oct 22, 2025 at 12:12 PM Kugan Vivekanandarajah >>> <[email protected]> wrote: >>>> >>>> >>>> >>>>> On 22 Oct 2025, at 7:51 pm, Richard Biener <[email protected]> >>>>> wrote: >>>>> >>>>> External email: Use caution opening links or attachments >>>>> >>>>> >>>>> On Tue, Oct 21, 2025 at 11:55 PM Kugan Vivekanandarajah >>>>> <[email protected]> wrote: >>>>>> >>>>>> Hi Richard, >>>>>> Thanks for the review. >>>>>> >>>>>>> On 16 Oct 2025, at 11:06 pm, Richard Biener >>>>>>> <[email protected]> wrote: >>>>>>> >>>>>>> External email: Use caution opening links or attachments >>>>>>> >>>>>>> >>>>>>> On Wed, Oct 15, 2025 at 11:08 PM Kugan Vivekanandarajah >>>>>>> <[email protected]> wrote: >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> This patch enables Loop Invariant Code Motion (LICM) to hoist loads >>>>>>>> that >>>>>>>> alias with stores rom the loaded value. >>>>>>>> >>>>>>>> The pattern a[i] = a[0] is common in TSVC benchmarks (s293): >>>>>>>> >>>>>>>> for (int i = 0; i < N; i++) >>>>>>>> a[i] = a[0]; >>>>>>>> >>>>>>>> Previously, GCC conservatively rejected hoisting a[0] due to potential >>>>>>>> aliasing when i==0. However, this is a "self write" - even when >>>>>>>> aliasing >>>>>>>> occurs, we're writing back the same value, making hoisting safe. >>>>>>>> >>>>>>>> The optimization checks that: >>>>>>>> 1. One reference is a load, the other is a store >>>>>>>> 2. The stored SSA value equals the loaded SSA value >>>>>>>> 3. Only simple cases with single accesses per reference >>>>>>>> >>>>>>>> This enables vectorization of these patterns by allowing the vectorizer >>>>>>>> to see the hoisted loop-invariant value. >>>>>>>> >>>>>>>> With the patch, the loop now vectorizes and generates: >>>>>>>> >>>>>>>> .L2: >>>>>>>> - ldr s31, [x1] >>>>>>>> - str s31, [x0], 4 >>>>>>>> - cmp x0, x2 >>>>>>>> + str q31, [x0], 16 >>>>>>>> + cmp x0, x1 >>>>>>>> bne .L2 >>>>>>>> >>>>>>>> gcc/ChangeLog: >>>>>>> >>>>>>> So refs_independent_p has no particular oder on the arguments it >>>>>>> receives and both could be stores. It seems to me the special-casing >>>>>>> would be better done in ref_indep_loop_p for the kind == lim_raw >>>>>>> only. Also >>>>>>> >>>>>>> + if (((ref1->loaded && !ref1->stored && ref2->stored) >>>>>>> + || (ref2->loaded && !ref2->stored && ref1->stored)) >>>>>>> + && is_self_write (ref1, ref2)) >>>>>>> + { >>>>>>> >>>>>>> you seem to imply both orders are handled, but is_self_write only >>>>>>> handles ref1 being the load. Handing this upthread makes the >>>>>>> order obvious. >>>>>>> >>>>>>> So can you see to move this up? Otherwise I think this should be OK. >>>>>>> I'll note this does not only handle the "obvious" case but also >>>>>> >>>>>> In the attached patched, I have moved based on the suggestion. >>>>>> Bootstrappped and regression tested on aarch64-linux-gnu with no New >>>>>> regressions. >>>>>> Is this OK? >>>>> >>>>> It seems you attached the old patch? >>>> >>>> Apologies. I sent the wrong patch. Here is the correct one. >>> >>> This seems to be an incremental patch of some sorts. Applying some >>> "guessing" >>> the overall thing looks OK, I'd probably try micro-optimizing this >>> timing critical >>> part a bit by pre-computing load_stmt outside of the refs_to_check iteration >>> and only call is_self_write if that was successful (lim_raw && single >>> access in loop). >>> >>> Please send an updated and complete patch. You might want to double-check >>> it >>> before posting ;) >>> >> Apologies for the mixup again. Here is the patch with added check for >> ref1->loaded && ref2->stored > > This is the original patch again .... > > So NACK. > > :/ > > Please triple-check what you attach.
Oops, Sorry. No excuse for this stupidity from my end. Thanks again for your
patience.
I have revised it. Hopefully I didn’t make any mistake this time.
Bootstrapped and regression tested on aarch64-linux-gnu with no New
Thanks,
Kugan
>
> Richard.
>
>> Bootstrapped and regression tests on aarch64-linuc-gnu with no new
>> regressions.
>>
>> Thanks,
>> Kugan
>>
>>
>>> Thanks,
>>> Richard.
>>>
>>>> Thanks,
>>>> Kugan
>>>>
>>>>>
>>>>>> Thanks,
>>>>>> Kugan
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> void foo(int *p, int *q, int j)
>>>>>>> {
>>>>>>> for (int i =0; i < N; ++i)
>>>>>>> p[i] = q[j];
>>>>>>> }
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>> * tree-ssa-loop-im.cc (is_self_write): New.
>>>>>>>> (refs_independent_p): Allow hoisting when aliasing references
>>>>>>>> form a self write pattern.
>>>>>>>>
>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>
>>>>>>>> * gcc.dg/vect/vect-licm-hoist-1.c: New.
>>>>>>>> * gcc.dg/vect/vect-licm-hoist-2.c: Likewise.
>>>>>>>>
>>>>>>>> Bootstrappped and regression tested on aarch64-linux-gnu with no New
>>>>>>>> regressions.
>>>>>>>>
>>>>>>>> Is this Ok?
>>>>>>>>
>>>>>>>> Signed-off-by: Kugan Vivekanandarajah <[email protected]>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Kugan
0001-tree-optimization-v3-Allow-LICM-to-hoist-loads-in-se.patch
Description: 0001-tree-optimization-v3-Allow-LICM-to-hoist-loads-in-se.patch
