On 11/27/25 1:56 AM, Konstantinos Eleftheriou wrote:
On Thu, Nov 27, 2025 at 2:15 AM Jeff Law <[email protected]
<mailto:[email protected]>> wrote:
On 11/19/25 7:37 AM, Konstantinos Eleftheriou wrote:
> Sometimes, `store_bit_field` copies the destination register into
a new one,
> which leads to the old register being used in the instructions
that follow
> the ones generated by `store_bit_field`, while the bit field
insertion is
> performed on the new register.
>
> This patch copies back the new destination register into the old
one when
> needed.
>
> gcc/ChangeLog:
>
> * avoid-store-forwarding.cc (generate_bit_insert_sequence):
> Copy back the new destination register into the old one
when needed.
This sounds more like a bit in store_bit_field. It's stated purpose is
to store the source field into an object. I'd really like to
understand
in detail the scenario in which str_rtx does not contain the right
value
after a call to store_bit_field.
We were getting regressions in these testcases w/o this patch:
# * gcc: gcc.target/aarch64/advsimd-intrinsics/bf16_vldN_lane_1.c -O3 -
fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-
functions execution test
# * gcc: gcc.target/aarch64/advsimd-intrinsics/bf16_vldN_lane_1.c -O3 -g
execution test
# * gcc: gcc.target/aarch64/vldN_lane_1.c execution test
For bf16_vldN_lane_1.c for example, this ASF transformation is triggered:
```
Store forwarding detected:
From: (insn 35 64 36 2 (set (mem/c:V4SI (plus:DI (reg/f:DI 64 sfp)
(const_int -16 [0xfffffffffffffff0])) [0 MEM
<char[1:32]> [(void *)&vectors]+16 S16 A128])
(reg:V4SI 143 [ MEM <char[1:32]> [(void *)data_22(D)]+16 ]))
"../testcase.c":21:20 discrim 1 1333 {*aarch64_simd_movv4si}
(expr_list:REG_DEAD (reg:V4SI 143 [ MEM <char[1:32]> [(void
*)data_22(D)]+16 ])
(nil)))
To: (insn 41 40 42 2 (set (reg:V4x4BF 60 v28)
(mem/c:V4x4BF (plus:DI (reg/f:DI 64 sfp)
(const_int -32 [0xffffffffffffffe0])) [2 vectors+0 S32
A128])) 4838 {*aarch64_movv4x4bf}
(nil))
Store forwarding avoided with bit inserts:
With sequence:
(insn 345 0 346 (set (reg:V4x4BF 185)
(reg:V4x4BF 60 v28)) 4838 {*aarch64_movv4x4bf}
(nil))
(insn 346 345 347 (set (subreg:DI (reg:V4x4BF 185) 16)
(subreg:DI (reg:V4SI 184) 0)) 110 {*movdi_aarch64}
(nil))
(insn 347 346 0 (set (subreg:DI (reg:V4x4BF 185) 24)
(subreg:DI (reg:V4SI 184) 8)) 110 {*movdi_aarch64}
(nil))
```
Here, str_rtx is v28, which is copied into 185. So, 185 is updated instead.
The next instruction after the sequence generated by store_bit_field is:
```
(insn 45 44 309 2 (set (reg:V4x4BF 60 v28)
(unspec:V4x4BF [
(mem:BLK (reg/f:DI 183 [ overwrite ]) [0 S8 A8])
(reg:V4x4BF 60 v28)
(const_int 3 [0x3])
] UNSPEC_LD4_LANE)) "./gcc/include/arm_neon.h":28361:10
4597 {aarch64_vec_load_lanesv4x4bf_lanev4bf}
(nil))
```
This one uses v28, which holds the old value.
Yea, this sounds like something in that path not honoring the "result is
generated in TARGET, if convenient to do so" semantics that many of
those expansion routines have.
For those kinds of scenarios, something should be testing that the
result returned by the expander function (or whatever function is in
question) is a different register than was passed in as the desired
destination. And in that case it should emit an insn to copy the value
from the return value into the actual target value.
As an example of these semantics, see do_store_flag. The net of having
these semantics for an API is that every caller has to check and emit
that extra insn. Also note this sometimes expands outwards. Continuing
with do_store_flag, expand_expr_real_2 calls it in such a way that
expand_expr_real_2 effectively has the same semantics as does
expand_expr_real_1, and so-on.
To be clear, the semantics are *awful*, but that's how things have
worked forever.
So I think the way forward is to trace through store_bit_field and its
children. At some point something is returning (reg 185) as a result
and the caller doesn't recognize that it needs to copy (reg 185) into
(reg 60). So if possible you might be able to key a conditional
breakpoint on generating (reg 185) and something in the call stack when
that happens is probably the culprit.
Essentially I suspect you're working around a latent bug somewhere.
jeff