On 12/16/2025 2:07 AM, Konstantinos Eleftheriou wrote:


On Mon, Dec 15, 2025 at 12:55 PM Konstantinos Eleftheriou <[email protected]> wrote:


    On Mon, Dec 15, 2025 at 5:54 AM Jeff Law <[email protected]>
    wrote:



        On 12/9/25 10:01 AM, Konstantinos Eleftheriou wrote:
        > The call to `gen_lowpart` in `store_bit_field_1` might copy
        the destination
        > register into a new one, which may lead to wrong code
        generation, as the bit
        > insertions update the new register instead of updating
        `str_rtx`.
        >
        > This patch copies back the new destination register into
        `str_rtx` when needed.
        >
        > gcc/ChangeLog:
        >
        >          * expmed.cc (store_bit_field_1): Copy back the new
        destination
        >       register into `str_rtx` when needed.
        Ugh.  gen_lowpart would not be a routine I'd want to see in
        this call
        chain.  Largely because it's hooked and thus the underlying
        implementation changes depending on where in the pipeline we
        are there's
        also various helpers that tend to get used.  Sigh.

        Regardless something doesn't make sense with your patch.  The
        existing
        code passes the return value from gen_lowpart (potentially the
        new
        pseudo) as the destination for store_integral_bit_field.


    Based on the RTL that I provided in the previous thread, the code
    seems to rely
    on the fact that `gen_lowpart` will generate a subreg to convert
    the vector mode
    into an integral one, preparing the call to
    'store_integral_bit_field'. That's why I chose
    to copy the value back after the call to `store_integral_bit_field`.


        It seems like the problem would be inside
        store_integral_bit_field or
        its children.  store_integral_bit_field doesn't have the
        problematic
        semantics (it returns a bool, not an RTX, so by definition it
        doesn't
        have those problem semantics).


        I'm thinking I really should throw this under the debugger to
        understand
        the dataflow here.  How can I trigger the underlying issue?


    I'm using 'gcc.target/aarch64/vldN_lane_1.c', compiled with `-O3
    -fno-inline` (as in the testcase).
    The RTL comes from the `test_vld4q_lane_u16` function.

Sorry, I made a mistake here. The sequence is from the `test_vld4_lane_bf16` function in
'gcc.target/aarch64/advsimd-intrinsics/bf16_vldN_lane_1.c',
compiled with `-march=armv8.2-a+fp16 -march=armv8.2-a+bf16 -O3 -g -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions`.

So at the end of store_bit_field_1 we have this:


88             op0 = gen_lowpart (op0_mode.require (), op0);
889         }
890
891       return store_integral_bit_field (op0, op0_mode, ibitsize, ibitnum,
892                                        bitregion_start, bitregion_end,
893                                        fieldmode, value, reverse, fallback_p);


where the call to gen_lowpart may create the new pseudo.   So isn't hte solution to just copy the result op0 into the original op0?  I don't think any scanning of the RTL is advisable here. Something like


      orig_op0 = op0

      op0 = gen_lowpart (op0_mode.require (), op0);

  }

retval = store_integral_bit_field (...)

if (op0 != orig_op0)

  emit_move_insn (orig_op0, op0)

return retval;

      orig_op0 = op0

      op0 = gen_lowpart (op0_mode.require (), op0);

  }

retval = store_integral_bit_field (...)

if (op0 != orig_op0)

  emit_move_insn (orig_op0, op0)

return retval;


I suspect the emit_move_insn will need a bit of adjustment because the modes are wrong, but I *think* that's the basic code structure we're looking for.  No need to scan the sequence or anything like that.


jeff



Reply via email to