https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417
--- Comment #7 from Jim Wilson <wilson at gcc dot gnu.org> ---
That patch is basically correct.

I would suggest using gen_lowpart instead of gen_rtx_SUBREG as a minor cleanup. It does the same thing, and is shorter and easier to read.

There is one problem: you can't generate new pseudo registers during register allocation, or after register allocation is complete, so you need to disable this optimization in that case. You can do that by adding a check for can_create_pseudo_p (). This is already used explicitly in one place in riscv_legitimize_move and implicitly in some subfunctions, and is used in the arm.md movqi pattern.

The next part is testing the patch. We need some correctness testing; you can just run the gcc testsuite for that. We also need some code size/performance testing. I'd suggest compiling some code with and without the patch, comparing function sizes, looking for anything that got bigger with the patch, and checking whether it is a problem. I like to use the toolchain libraries like libc.a and libstdc++.a since they are being built anyway, but using a nice benchmark is also OK as long as it is big enough to stress the compiler.

If the patch passes testing, then we can look at expanding the scope to handle more modes, and also handle MEM dest in addition to REG dest.

Yes, we can discuss this Monday.
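
For the function size comparison, something like the following might help. It is only a rough sketch, not an established GCC workflow: it assumes you have saved the output of GNU nm run with -S --size-sort on the same library built without and with the patch, and the helper names parse_nm_sizes and grown_functions are made up for illustration.

```python
import re

# One symbol line of `nm -S` output: address, size, type letter, name.
_SYMBOL_LINE = re.compile(r"^([0-9a-fA-F]+)\s+([0-9a-fA-F]+)\s+(\S)\s+(\S+)$")


def parse_nm_sizes(nm_output):
    """Return {(member, symbol): size_in_bytes} for text-section symbols.

    Hypothetical helper: expects the text printed by `nm -S --size-sort`
    on an object file or archive.
    """
    sizes = {}
    member = ""
    for line in nm_output.splitlines():
        line = line.strip()
        if line.endswith(":"):
            # Archive member header such as "foo.o:".
            member = line[:-1]
            continue
        match = _SYMBOL_LINE.match(line)
        # 'T' and 't' mark global and local symbols in the text section.
        if match and match.group(3) in "tT":
            key = (member, match.group(4))
            sizes[key] = sizes.get(key, 0) + int(match.group(2), 16)
    return sizes


def grown_functions(before, after):
    """List (key, old_size, new_size) for functions that grew, largest growth first."""
    grown = [
        (key, before[key], after[key])
        for key in before.keys() & after.keys()
        if after[key] > before[key]
    ]
    # old - new is most negative for the largest growth, so it sorts first.
    grown.sort(key=lambda item: (item[1] - item[2], item[0]))
    return grown
```

You would read the two saved nm dumps into strings, pass each through parse_nm_sizes, and hand the results to grown_functions. Anything it reports still needs to be looked at by hand to decide whether the growth is actually a problem.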