https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933
--- Comment #5 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to Segher Boessenkool from comment #4)
> Yes, timing suggests there is some SHL/LHS flush.
>
> On p9 and later we can use mtvsrdd instead of mtvsrd (moving two
> bytes into place at once), which reduces the number of moves from
> 16 to 8, and the number of merges from 15 to 7 (and reduces path
> length by 1).  This sounds like a no-brainer win with that :-)

Good idea, it looks better on P9.  One thing to double-check: there are
currently no instructions like vmrgob and vmrgoh, so do the merges you
mentioned, from vector bytes to vector short and from vector short to
vector word, need an artificial control vector?
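
For reference, the counts quoted from comment #4 are what a balanced pairwise merge tree gives: N registers need N-1 merges and log2(N) merge levels. A minimal sketch of that arithmetic follows. The helper names are hypothetical and not taken from the GCC sources, and it assumes the "path length" in comment #4 refers to the depth of the merge tree.

```c
#include <assert.h>

/* Hypothetical helpers, for illustration only (not GCC code). */

/* GPR->VSR moves needed to bring `elems` elements into vector
   registers when each move carries `per_move` elements
   (1 for mtvsrd, 2 for mtvsrdd in the scheme from comment #4). */
static int move_count(int elems, int per_move)
{
    assert(per_move > 0 && elems % per_move == 0);
    return elems / per_move;
}

/* Pairwise merges needed to combine `regs` registers into one:
   every merge removes exactly one register, so regs - 1. */
static int merge_count(int regs)
{
    assert(regs > 0);
    return regs - 1;
}

/* Depth of a balanced pairwise merge tree over `regs` registers.
   Assumes `regs` is a power of two. */
static int merge_depth(int regs)
{
    int depth = 0;

    assert(regs > 0 && (regs & (regs - 1)) == 0);
    while (regs > 1) {
        regs /= 2;   /* one level of merges halves the register count */
        depth++;
    }
    return depth;
}
```

For a 16-byte vector, move_count(16, 1) is 16 with merge_count(16) == 15 and merge_depth(16) == 4, while move_count(16, 2) is 8 with merge_count(8) == 7 and merge_depth(8) == 3, which matches the "16 to 8", "15 to 7" and "path length by 1" figures above.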