On Sat, Dec 27, 2025 at 12:15 AM Jeffrey Law <[email protected]> wrote:

>
> On 12/16/2025 2:07 AM, Konstantinos Eleftheriou wrote:
>
>
>
> On Mon, Dec 15, 2025 at 12:55 PM Konstantinos Eleftheriou <
> [email protected]> wrote:
>
>>
>> On Mon, Dec 15, 2025 at 5:54 AM Jeff Law <[email protected]> wrote:
>>
>>>
>>>
>>> On 12/9/25 10:01 AM, Konstantinos Eleftheriou wrote:
>>> > The call to `gen_lowpart` in `store_bit_field_1` might copy the
>>> destination
>>> > register into a new one, which may lead to wrong code generation, as
>>> the bit
>>> > insertions update the new register instead of updating `str_rtx`.
>>> >
>>> > This patch copies back the new destination register into `str_rtx`
>>> when needed.
>>> >
>>> > gcc/ChangeLog:
>>> >
>>> >          * expmed.cc (store_bit_field_1): Copy back the new destination
>>> >       register into `str_rtx` when needed.
>>> Ugh.  gen_lowpart would not be a routine I'd want to see in this call
>>> chain.  Largely because it's hooked and thus the underlying
>>> implementation changes depending on where in the pipeline we are there's
>>> also various helpers that tend to get used.  Sigh.
>>>
>>> Regardless something doesn't make sense with your patch.  The existing
>>> code passes the return value from gen_lowpart (potentially the new
>>> pseudo) as the destination for store_integral_bit_field.
>>
>>
>> Based on the RTL that I provided in the previous thread, the code seems
>> to rely
>> on the fact that `gen_lowpart` will generate a subreg to convert the
>> vector mode
>> into an integral one, preparing the call to 'store_integral_bit_field'.
>> That's why I chose
>> to copy the value back after the call to `store_integral_bit_field`.
>>
>>>
>>>
>> It seems like the problem would be inside store_integral_bit_field or
>>> its children.  store_integral_bit_field doesn't have the problematic
>>> semantics (it returns a bool, not an RTX, so by definition it doesn't
>>> have those problem semantics).
>>>
>>>
>>> I'm thinking I really should throw this under the debugger to understand
>>> the dataflow here.  How can I trigger the underlying issue?
>>>
>>
>> I'm using 'gcc.target/aarch64/vldN_lane_1.c', compiled with `-O3
>> -fno-inline` (as in the testcase).
>> The RTL comes from the `test_vld4q_lane_u16` function.
>>
>> Sorry, I made a mistake here. The sequence is from the
> `test_vld4_lane_bf16` function in
> 'gcc.target/aarch64/advsimd-intrinsics/bf16_vldN_lane_1.c',
> compiled with `-march=armv8.2-a+fp16 -march=armv8.2-a+bf16 -O3 -g
> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer
> -finline-functions`.
>
>
> So at the end of store_bit_field_1 we have this:
>
>
> 88             op0 = gen_lowpart (op0_mode.require (), op0);
> 889         }
> 890
> 891       return store_integral_bit_field (op0, op0_mode, ibitsize,
> ibitnum,
> 892                                        bitregion_start, bitregion_end,
> 893                                        fieldmode, value, reverse,
> fallback_p);
>
>
>
> where the call to gen_lowpart may create the new pseudo.   So isn't hte
> solution to just copy the result op0 into the original op0?  I don't think
> any scanning of the RTL is advisable here.  Something like
>
The reason why I'm doing the scanning is that 'op0' might be a subreg of
the original register. In that case, `REGNO (op0)` would be different than
`REGNO (orig_op0)`,
but we wouldn't need to do anything in that instance, as the destination
register hasn't changed. Is there a way to do this without the scanning?

>
>       orig_op0 = op0
>
>       op0 = gen_lowpart (op0_mode.require (), op0);
>
>   }
>
> retval = store_integral_bit_field (...)
>
> if (op0 != orig_op0)
>
>   emit_move_insn (orig_op0, op0)
>
> return retval;
>
>       orig_op0 = op0
>
>       op0 = gen_lowpart (op0_mode.require (), op0);
>
>   }
>
> retval = store_integral_bit_field (...)
>
> if (op0 != orig_op0)
>
>   emit_move_insn (orig_op0, op0)
>
> return retval;
>
>
> I suspect the emit_move_insn will need a bit of adjustment because the
> modes are wrong, but I *think* that's the basic code structure we're
> looking for.  No need to scan the sequence or anything like that.
>
`convert_move` copies the value and handles the mode changes too.

Konstantinos

> jeff
>
>
>
>
>

Reply via email to