On 12/16/2025 2:07 AM, Konstantinos Eleftheriou wrote:
On Mon, Dec 15, 2025 at 12:55 PM Konstantinos Eleftheriou
<[email protected]> wrote:
On Mon, Dec 15, 2025 at 5:54 AM Jeff Law <[email protected]>
wrote:
On 12/9/25 10:01 AM, Konstantinos Eleftheriou wrote:
> The call to `gen_lowpart` in `store_bit_field_1` might copy
the destination
> register into a new one, which may lead to wrong code
generation, as the bit
> insertions update the new register instead of updating
`str_rtx`.
>
> This patch copies back the new destination register into
`str_rtx` when needed.
>
> gcc/ChangeLog:
>
> * expmed.cc (store_bit_field_1): Copy back the new
destination
> register into `str_rtx` when needed.
Ugh. gen_lowpart would not be a routine I'd want to see in
this call
chain. Largely because it's hooked and thus the underlying
implementation changes depending on where in the pipeline we
are there's
also various helpers that tend to get used. Sigh.
Regardless something doesn't make sense with your patch. The
existing
code passes the return value from gen_lowpart (potentially the
new
pseudo) as the destination for store_integral_bit_field.
Based on the RTL that I provided in the previous thread, the code
seems to rely
on the fact that `gen_lowpart` will generate a subreg to convert
the vector mode
into an integral one, preparing the call to
'store_integral_bit_field'. That's why I chose
to copy the value back after the call to `store_integral_bit_field`.
It seems like the problem would be inside
store_integral_bit_field or
its children. store_integral_bit_field doesn't have the
problematic
semantics (it returns a bool, not an RTX, so by definition it
doesn't
have those problem semantics).
I'm thinking I really should throw this under the debugger to
understand
the dataflow here. How can I trigger the underlying issue?
I'm using 'gcc.target/aarch64/vldN_lane_1.c', compiled with `-O3
-fno-inline` (as in the testcase).
The RTL comes from the `test_vld4q_lane_u16` function.
Sorry, I made a mistake here. The sequence is from the
`test_vld4_lane_bf16` function in
'gcc.target/aarch64/advsimd-intrinsics/bf16_vldN_lane_1.c',
compiled with `-march=armv8.2-a+fp16 -march=armv8.2-a+bf16 -O3 -g
-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer
-finline-functions`.
So at the end of store_bit_field_1 we have this:
88 op0 = gen_lowpart (op0_mode.require (), op0);
889 }
890
891 return store_integral_bit_field (op0, op0_mode, ibitsize,
ibitnum,
892 bitregion_start, bitregion_end,
893 fieldmode, value, reverse,
fallback_p);
where the call to gen_lowpart may create the new pseudo. So isn't hte
solution to just copy the result op0 into the original op0? I don't
think any scanning of the RTL is advisable here. Something like
orig_op0 = op0
op0 = gen_lowpart (op0_mode.require (), op0);
}
retval = store_integral_bit_field (...)
if (op0 != orig_op0)
emit_move_insn (orig_op0, op0)
return retval;
orig_op0 = op0
op0 = gen_lowpart (op0_mode.require (), op0);
}
retval = store_integral_bit_field (...)
if (op0 != orig_op0)
emit_move_insn (orig_op0, op0)
return retval;
I suspect the emit_move_insn will need a bit of adjustment because the
modes are wrong, but I *think* that's the basic code structure we're
looking for. No need to scan the sequence or anything like that.
jeff