https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104345
Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |roger at nextmovesoftware dot com
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2022-02-02
     Ever confirmed|0                           |1

--- Comment #1 from Roger Sayle <roger at nextmovesoftware dot com> ---
Hi Torsten,

Thanks for the bug report. The STORE_FLAG_VALUE=1 patch was one of a series intended to dramatically improve the quality of nvptx code. Alas, not all of them have been reviewed/approved yet, and it's likely that these later improvements address the quality regression you're seeing.

The other patches in the "nvptx Boolean" series are:

patchq3: nvptx: Expand QI mode operations using SI mode instructions.
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587999.html

patchq4: nvptx: Fix and use BI mode logic instructions (e.g. and.pred).
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588555.html

[and purely for reference, my other outstanding nvptx patches are]

patchn: nvptx: Improved support for HFMode including neghf2 and abshf2.
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587949.html

patchw: nvptx: Add support for 64-bit mul.hi (and other) instructions.
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588453.html

One other related patch: a middle-end SUBREG patch intended to improve code generation on nvptx is also pending at:

patchs: Simplify paradoxical subreg extensions of TRUNCATE
https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578848.html

My guess is that patchq3+patchq4 above should (hopefully) resolve this particular regression. If you could give them a spin on your system to see whether they reduce register pressure sufficiently for this case, that would be greatly appreciated. As you can read in the above postings, the total number of instructions/registers (after all of these changes) should be dramatically reduced.

I'll see what I can do from my end.