On Thu, Nov 27, 2025 at 2:15 AM Jeff Law <[email protected]> wrote:

>
>
> On 11/19/25 7:37 AM, Konstantinos Eleftheriou wrote:
> > Sometimes, `store_bit_field` copies the destination register into a new
> one,
> > which leads to the old register being used in the instructions that
> follow
> > the ones generated by `store_bit_field`, while the bit field insertion is
> > performed on the new register.
> >
> > This patch copies back the new destination register into the old one when
> > needed.
> >
> > gcc/ChangeLog:
> >
> >          * avoid-store-forwarding.cc (generate_bit_insert_sequence):
> >       Copy back the new destination register into the old one when
> needed.
> This sounds more like a bit in store_bit_field.  It's stated purpose is
> to store the source field into an object.  I'd really like to understand
> in detail the scenario in which str_rtx does not contain the right value
> after a call to store_bit_field.
>

We were getting regressions in these testcases w/o this patch:
- * gcc: gcc.target/aarch64/advsimd-intrinsics/bf16_vldN_lane_1.c -O3
-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer
-finline-functions execution test
- * gcc: gcc.target/aarch64/advsimd-intrinsics/bf16_vldN_lane_1.c -O3 -g
execution test
- * gcc: gcc.target/aarch64/vldN_lane_1.c execution test

For bf16_vldN_lane_1.c for example, this ASF transformation is triggered:
```
Store forwarding detected:
From: (insn 35 64 36 2 (set (mem/c:V4SI (plus:DI (reg/f:DI 64 sfp)
                (const_int -16 [0xfffffffffffffff0])) [0 MEM <char[1:32]>
[(void *)&vectors]+16 S16 A128])
        (reg:V4SI 143 [ MEM <char[1:32]> [(void *)data_22(D)]+16 ]))
"../testcase.c":21:20 discrim 1 1333 {*aarch64_simd_movv4si}
     (expr_list:REG_DEAD (reg:V4SI 143 [ MEM <char[1:32]> [(void
*)data_22(D)]+16 ])
        (nil)))
To: (insn 41 40 42 2 (set (reg:V4x4BF 60 v28)
        (mem/c:V4x4BF (plus:DI (reg/f:DI 64 sfp)
                (const_int -32 [0xffffffffffffffe0])) [2 vectors+0 S32
A128])) 4838 {*aarch64_movv4x4bf}
     (nil))
Store forwarding avoided with bit inserts:
With sequence:
  (insn 345 0 346 (set (reg:V4x4BF 185)
        (reg:V4x4BF 60 v28)) 4838 {*aarch64_movv4x4bf}
     (nil))
  (insn 346 345 347 (set (subreg:DI (reg:V4x4BF 185) 16)
        (subreg:DI (reg:V4SI 184) 0)) 110 {*movdi_aarch64}
     (nil))
  (insn 347 346 0 (set (subreg:DI (reg:V4x4BF 185) 24)
        (subreg:DI (reg:V4SI 184) 8)) 110 {*movdi_aarch64}
     (nil))
```
Here, str_rtx is v28, which is copied into 185. So, 185 is updated instead.
The next instruction after the sequence generated by store_bit_field is:
```
(insn 45 44 309 2 (set (reg:V4x4BF 60 v28)
        (unspec:V4x4BF [
                (mem:BLK (reg/f:DI 183 [ overwrite ]) [0  S8 A8])
                (reg:V4x4BF 60 v28)
                (const_int 3 [0x3])
            ] UNSPEC_LD4_LANE)) "./gcc/include/arm_neon.h":28361:10 4597
{aarch64_vec_load_lanesv4x4bf_lanev4bf}
     (nil))
```
This one uses v28, which holds the old value.

Konstantinos

>
> I can certainly believe we may have a problem here though -- many
> routines are defined as putting their result into a given target pseudo
> *if it's convenient* to do so, but leave it up to the caller to handle
> the case where that doesn't happen.  I could easily see something in the
> bowels of store_bit_field or its children getting semantics wrong
> because of that.
>
> Jeff
>

Reply via email to