On Sat, Dec 27, 2025 at 12:15 AM Jeffrey Law <[email protected]> wrote:
> > On 12/16/2025 2:07 AM, Konstantinos Eleftheriou wrote: > > > > On Mon, Dec 15, 2025 at 12:55 PM Konstantinos Eleftheriou < > [email protected]> wrote: > >> >> On Mon, Dec 15, 2025 at 5:54 AM Jeff Law <[email protected]> wrote: >> >>> >>> >>> On 12/9/25 10:01 AM, Konstantinos Eleftheriou wrote: >>> > The call to `gen_lowpart` in `store_bit_field_1` might copy the >>> destination >>> > register into a new one, which may lead to wrong code generation, as >>> the bit >>> > insertions update the new register instead of updating `str_rtx`. >>> > >>> > This patch copies back the new destination register into `str_rtx` >>> when needed. >>> > >>> > gcc/ChangeLog: >>> > >>> > * expmed.cc (store_bit_field_1): Copy back the new destination >>> > register into `str_rtx` when needed. >>> Ugh. gen_lowpart would not be a routine I'd want to see in this call >>> chain. Largely because it's hooked and thus the underlying >>> implementation changes depending on where in the pipeline we are there's >>> also various helpers that tend to get used. Sigh. >>> >>> Regardless something doesn't make sense with your patch. The existing >>> code passes the return value from gen_lowpart (potentially the new >>> pseudo) as the destination for store_integral_bit_field. >> >> >> Based on the RTL that I provided in the previous thread, the code seems >> to rely >> on the fact that `gen_lowpart` will generate a subreg to convert the >> vector mode >> into an integral one, preparing the call to 'store_integral_bit_field'. >> That's why I chose >> to copy the value back after the call to `store_integral_bit_field`. >> >>> >>> >> It seems like the problem would be inside store_integral_bit_field or >>> its children. store_integral_bit_field doesn't have the problematic >>> semantics (it returns a bool, not an RTX, so by definition it doesn't >>> have those problem semantics). >>> >>> >>> I'm thinking I really should throw this under the debugger to understand >>> the dataflow here. How can I trigger the underlying issue? >>> >> >> I'm using 'gcc.target/aarch64/vldN_lane_1.c', compiled with `-O3 >> -fno-inline` (as in the testcase). >> The RTL comes from the `test_vld4q_lane_u16` function. >> >> Sorry, I made a mistake here. The sequence is from the > `test_vld4_lane_bf16` function in > 'gcc.target/aarch64/advsimd-intrinsics/bf16_vldN_lane_1.c', > compiled with `-march=armv8.2-a+fp16 -march=armv8.2-a+bf16 -O3 -g > -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer > -finline-functions`. > > > So at the end of store_bit_field_1 we have this: > > > 88 op0 = gen_lowpart (op0_mode.require (), op0); > 889 } > 890 > 891 return store_integral_bit_field (op0, op0_mode, ibitsize, > ibitnum, > 892 bitregion_start, bitregion_end, > 893 fieldmode, value, reverse, > fallback_p); > > > > where the call to gen_lowpart may create the new pseudo. So isn't hte > solution to just copy the result op0 into the original op0? I don't think > any scanning of the RTL is advisable here. Something like > The reason why I'm doing the scanning is that 'op0' might be a subreg of the original register. In that case, `REGNO (op0)` would be different than `REGNO (orig_op0)`, but we wouldn't need to do anything in that instance, as the destination register hasn't changed. Is there a way to do this without the scanning? > > orig_op0 = op0 > > op0 = gen_lowpart (op0_mode.require (), op0); > > } > > retval = store_integral_bit_field (...) > > if (op0 != orig_op0) > > emit_move_insn (orig_op0, op0) > > return retval; > > orig_op0 = op0 > > op0 = gen_lowpart (op0_mode.require (), op0); > > } > > retval = store_integral_bit_field (...) > > if (op0 != orig_op0) > > emit_move_insn (orig_op0, op0) > > return retval; > > > I suspect the emit_move_insn will need a bit of adjustment because the > modes are wrong, but I *think* that's the basic code structure we're > looking for. No need to scan the sequence or anything like that. > `convert_move` copies the value and handles the mode changes too. Konstantinos > jeff > > > > >
