Great! Thanks for fixing it. LGTM.
juzhe.zh...@rivai.ai

From: Joern Rennecke
Date: 2023-10-01 04:30
To: GCC Patches
CC: Jeff Law; 钟居哲
Subject: RFA: RISC-V: Make riscv_vector::legitimize_move adjust SRC in the caller. (Was: Remove mem-to-mem VLS move pattern[PR111566])

>On 9/27/23 03:38, juzhe.zh...@rivai.ai wrote:
>>
>> Why add `can_create_pseudo_p ()` here? this will split after reload,
>>>>but we forbid that pattern between reload and split2?
>>
>> I have no ideal. Some fortran tests just need recognization of
>> mem-to-mem pattern before RA
>> I don't know the reason.
>But isn't that the key to understanding what's going on here?

Jeff Law:
>There is nothing special about Fortran here. Whatever problem this is
>working around will almost certainly show up again in other,
>non-Fortran, contexts.

I also ran into the problem of the mov<mode>_mem_to_mem pattern making IRA combine the instructions output by my cpymem patch into an unsplittable must-split pattern. And simply removing the mem-to-mem pattern causes a newlib build failure.

The underlying problem is in the declaration of riscv_vector::legitimize_move. The function is passed a source and a destination by value, and it either emits (instructions for) a move and returns true, or performs checks and/or preparation statements, modifies its *copy of* src, and returns. Because src is only a copy, that modification never reaches the caller, which still holds the original src. IIRC, we don't want C++ pass-by-reference syntax in GCC source, so the solution should be the tried-and-trusted method of passing an explicit pointer to the rtl that we want modified.
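To make the by-value problem concrete, here is a minimal, self-contained C++ sketch. It does not use GCC's real rtx type or the real riscv_vector::legitimize_move: `rtx_node`, `fresh_reg`, `legitimize_move_by_value` and `legitimize_move_by_pointer` are hypothetical stand-ins, and the mem-to-mem test is a simplified assumption. It only illustrates why reassigning a by-value rtx parameter is invisible to the caller, while writing through an `rtx *` is not.

```cpp
#include <cassert>

// Hypothetical stand-in for an RTL node; in GCC, rtx is a pointer to one.
struct rtx_node
{
  bool is_mem;  // true for a memory operand, false for a register
  int id;
};
typedef rtx_node *rtx;

// Hypothetical register that the legitimizer substitutes for a memory source.
static rtx_node fresh_reg = { false, 150 };

// Flawed shape: SRC arrives by value, so reassigning it only changes the
// local copy; the caller never sees the replacement.
bool legitimize_move_by_value (rtx dest, rtx src)
{
  if (dest->is_mem && src->is_mem)
    {
      // (A real implementation would emit a load into the register here.)
      src = &fresh_reg;  // Lost when the function returns.
      return false;      // Caller emits the move using its own, unchanged SRC.
    }
  return true;           // Pretend the move was emitted here.
}

// Fixed shape: the caller passes the address of its SRC, so the
// replacement is written back into the caller's variable.
bool legitimize_move_by_pointer (rtx dest, rtx *srcp)
{
  assert (srcp != nullptr && *srcp != nullptr);
  if (dest->is_mem && (*srcp)->is_mem)
    {
      *srcp = &fresh_reg;  // Visible to the caller.
      return false;
    }
  return true;
}
```

With the by-value variant, the caller's `src` still points at the memory operand afterwards, so it would go on to emit a mem-to-mem move; with the pointer variant, the caller sees the replacement register.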
I have attached a patch, regression-tested for:

riscv-sim
riscv-sim/-march=rv32gcv_zfh/-mabi=ilp32d/-ftree-vectorize/--param=riscv-autovec-preference=scalable
riscv-sim/-march=rv32imac/-mabi=ilp32
riscv-sim/-march=rv64gcv_zfh_zvfh_zba_zbb_zbc_zicond_zicboz_zawrs/-mabi=lp64d/-ftree-vectorize/--param=riscv-autovec-preference=scalable
riscv-sim/-march=rv64imac/-mabi=lp64

Incidentally, the optimization that mov<mode>_mem_to_mem performed was invalid, as it checked neither alignment nor whether the target supports unaligned accesses with a fast hardware implementation. I think this optimization, with the appropriate check for hardware support, should be moved into the non-vector path of the cpymem expander, simply as a relaxation of the alignment test for using scalar values that span multiple addressable units.
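As a rough illustration of the proposed relaxation, here is a hypothetical C++ sketch of how a scalar block-copy path might choose its chunk size. `target_info`, `scalar_copy_chunk` and `scalar_moves_needed` are invented for this example and are not GCC's cpymem expander code; the sketch assumes power-of-two alignments and word sizes. Without fast unaligned access, the chunk is capped by the known alignment; with it, the alignment cap is lifted.

```cpp
#include <cassert>

// Hypothetical target description; in GCC this information would come from
// the ISA/tuning settings, not from a struct like this.
struct target_info
{
  unsigned word_bytes;         // widest scalar move, e.g. 8 on rv64, 4 on rv32
  bool fast_unaligned_access;  // hardware handles misaligned scalar accesses quickly
};

// Widest scalar chunk (in bytes) usable when both source and destination are
// known to be aligned to KNOWN_ALIGN_BYTES (assumed to be a power of two).
unsigned scalar_copy_chunk (const target_info &t, unsigned known_align_bytes)
{
  assert (known_align_bytes != 0
          && (known_align_bytes & (known_align_bytes - 1)) == 0);
  if (t.fast_unaligned_access)
    return t.word_bytes;  // Relaxed: alignment no longer limits the width.
  return known_align_bytes < t.word_bytes ? known_align_bytes : t.word_bytes;
}

// Number of scalar moves needed to copy N bytes, finishing any remainder
// with successively halved chunks.
unsigned scalar_moves_needed (const target_info &t, unsigned n,
                              unsigned known_align_bytes)
{
  unsigned chunk = scalar_copy_chunk (t, known_align_bytes);
  unsigned moves = 0;
  while (n > 0)
    {
      while (chunk > n)
        chunk /= 2;
      moves += n / chunk;
      n %= chunk;
    }
  return moves;
}
```

For example, copying 16 byte-aligned bytes on a hypothetical 64-bit target takes 16 single-byte moves without fast unaligned access, but only two 8-byte moves with it.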