Great! Thanks for fixing it.

LGTM.



juzhe.zh...@rivai.ai
 
From: Joern Rennecke
Date: 2023-10-01 04:30
To: GCC Patches
CC: Jeff Law; 钟居哲
Subject: RFA: RISC-V: Make riscv_vector::legitimize_move adjust SRC in the 
caller. (Was: Remove mem-to-mem VLS move pattern[PR111566])
>On 9/27/23 03:38, juzhe.zh...@rivai.ai wrote:
>>>> Why add `can_create_pseudo_p ()` here? This will split after reload,
>>>> but we forbid that pattern between reload and split2?
>>
>> I have no idea. Some Fortran tests just need recognition of the
>> mem-to-mem pattern before RA.
>> I don't know the reason.
>But isn't that the key to understanding what's going on here?
 
Jeff Law:
>There is nothing special about Fortran here.  Whatever problem this is
>working around will almost certainly show up again in other,
>non-Fortran, contexts.
 
I also ran into the problem of the mov<mode>_mem_to_mem pattern
making ira combine the instructions output by my cpymem patch into
an unsplittable must-split pattern.  And just plain removing the mem-to-mem
pattern gives a newlib build failure.
The underlying problem is in the declaration of riscv_vector::legitimize_move.
The function gets passed a source and destination by value, and it either
emits (instructions for) a move and returns true, or does checks and/or
emits preparation statements, modifies its *copy of* src, and returns, so
the caller never sees the modified src.
IIRC, we don't want C++ pass-by-reference syntax in GCC source, so the
solution should be the tried-and-trusted method of passing an explicit pointer
to the rtl that we want modified.
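
The difference between the two calling conventions can be illustrated with a
minimal, self-contained sketch.  This is not the code from the patch: toy_rtx,
the two operands and all function names below are hypothetical stand-ins, and
the only assumption carried over from GCC is that an rtx is a pointer type.

```cpp
#include <cassert>

// Hypothetical stand-in for GCC's rtx, which is a pointer to an rtl object.
struct toy_rtx_def { const char *desc; };
typedef toy_rtx_def *toy_rtx;

static toy_rtx_def mem_operand = { "mem" };
static toy_rtx_def reg_operand = { "reg" };

// By value: the function can only change its own copy of SRC.
static bool
legitimize_by_value (toy_rtx dest, toy_rtx src)
{
  (void) dest;
  if (src == &mem_operand)
    src = &reg_operand;   // "forced into a register", but lost on return
  return false;           // nothing emitted; the caller emits the move
}

// By explicit pointer: the adjusted SRC is written back to the caller.
static bool
legitimize_by_pointer (toy_rtx dest, toy_rtx *src)
{
  (void) dest;
  if (*src == &mem_operand)
    *src = &reg_operand;  // the caller's variable now holds the register
  return false;
}

// Returns true if the caller still sees a mem source after the by-value call.
static bool
value_version_loses_update ()
{
  toy_rtx src = &mem_operand;
  legitimize_by_value (&mem_operand, src);
  return src == &mem_operand;
}

// Returns true if the caller sees the register source after the pointer call.
static bool
pointer_version_keeps_update ()
{
  toy_rtx src = &mem_operand;
  legitimize_by_pointer (&mem_operand, &src);
  return src == &reg_operand;
}
```

With the by-value variant, the caller goes on to emit a move from the
original mem source; with the pointer variant, it emits a move from the
adjusted source.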
 
I have attached a patch, regression tested for:
    riscv-sim
    riscv-sim/-march=rv32gcv_zfh/-mabi=ilp32d/-ftree-vectorize/--param=riscv-autovec-preference=scalable
    riscv-sim/-march=rv32imac/-mabi=ilp32
    riscv-sim/-march=rv64gcv_zfh_zvfh_zba_zbb_zbc_zicond_zicboz_zawrs/-mabi=lp64d/-ftree-vectorize/--param=riscv-autovec-preference=scalable
    riscv-sim/-march=rv64imac/-mabi=lp64
 
Incidentally, the optimization that the mov<mode>_mem_to_mem pattern made was
invalid, as it checked neither alignment nor whether the target supports
unaligned accesses with a fast hardware implementation.  I think this
optimization - with the appropriate check for hardware support - should be
put into the non-vector path of the cpymem expander, simply as a relaxation
of the alignment test for using scalar values spanning multiple addressable
units.
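
As a rough sketch of what such a relaxed alignment test could look like:
the helper below is hypothetical and is not taken from the cpymem expander;
its name and parameters are assumptions made purely for illustration.

```cpp
// Hypothetical helper: choose the widest scalar access size (in bytes)
// that a block copy may use.
//   align_bytes     known alignment of both source and destination
//   word_bytes      widest scalar access the target offers
//   fast_unaligned  true if the hardware handles unaligned accesses fast
static unsigned
copy_unit_bytes (unsigned align_bytes, unsigned word_bytes,
                 bool fast_unaligned)
{
  // Relaxed test: with fast hardware support, alignment no longer limits
  // the access width.
  if (fast_unaligned)
    return word_bytes;

  // Strict test: never use an access wider than the known alignment.
  return align_bytes < word_bytes ? align_bytes : word_bytes;
}
```

With byte-aligned operands on a 64-bit target, this picks 1-byte accesses
unless fast_unaligned is set, in which case it picks 8-byte accesses.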
