https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115458
--- Comment #11 from Jeffrey A. Law <law at gcc dot gnu.org> ---
Hmm. Are we somehow miscomputing the size of the mask operand, thinking it takes v0..v7 in the LMUL8 scenario? If so, that would mean this instruction consumes the entire vector register file. In isolation that should be manageable, but it may become problematic if we have to do any spilling.

I guess the first question is why we spilled. Does the .reload dump file give any good clues? Do we perhaps reuse the mask and end up with crossing lifetimes for the mask?
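The sketch below makes the register arithmetic behind the first question concrete. It rests on several assumptions. RVV has 32 vector registers. A correctly sized mask occupies v0 alone. The masked instruction's destination and two sources are all distinct (untied) LMUL=8 groups. That operand shape is a guess, and the helper names are hypothetical. This is not how GCC's register allocator models occupancy.

```python
# Assumption: RVV with 32 architectural vector registers, v0..v31.
NUM_VREGS = 32


def regs_for_operand(lmul: int, is_mask: bool, mask_mis_sized: bool = False) -> int:
    """Vector registers a single operand occupies.

    A mask operand needs one register (v0) whatever the data LMUL is;
    a data operand at LMUL=k occupies an aligned group of k registers.
    If the mask is mis-sized as an LMUL data mode, it takes a full group.
    """
    if is_mask and not mask_mis_sized:
        return 1
    return lmul


def regs_needed(lmul: int, data_operands: int, mask_mis_sized: bool) -> int:
    """Total registers for a hypothetical instruction with `data_operands`
    distinct LMUL-`lmul` operands plus one mask operand."""
    data = data_operands * regs_for_operand(lmul, is_mask=False)
    mask = regs_for_operand(lmul, is_mask=True, mask_mis_sized=mask_mis_sized)
    return data + mask


if __name__ == "__main__":
    # Assumed shape: destination plus two sources, all LMUL=8 and untied.
    for mis_sized in (False, True):
        used = regs_needed(lmul=8, data_operands=3, mask_mis_sized=mis_sized)
        print(f"mask mis-sized={mis_sized}: {used} of {NUM_VREGS} registers, "
              f"{NUM_VREGS - used} left for anything else")
```

Under these assumptions, a mask counted as one register leaves v1..v7 free for smaller-LMUL values. Counted as an LMUL8 group, the mask leaves nothing free, so any additional live vector value would have to be spilled.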