https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856
--- Comment #36 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #35)
> (In reply to Richard Biener from comment #33)
> > Created attachment 50308 [details]
> > patch
> >
> > I am testing the following.
>
> It FAILs
>
> FAIL: gcc.target/i386/avx512dq-concatv2di-1.c scan-assembler
> vpinsrq[^\\n\\r]*\\
> \\\$1[^\\n\\r]*%[re]si[^\\n\\r]*%xmm18[^\\n\\r]*%xmm19

That's exactly the case we're looking after.  V2DI concat from two GPRs.

> FAIL: gcc.target/i386/avx512dq-concatv2di-1.c scan-assembler
> vpinsrq[^\\n\\r]*\\\\\$1[^\\n\\r]*%rsi[^\\n\\r]*%xmm16[^\\n\\r]*%xmm17

This is, like below, a MEM case.

> FAIL: gcc.target/i386/avx512vl-concatv2di-1.c scan-assembler
> vmovhps[^\\n\\r]*%[re]si[^\\n\\r]*%xmm18[^\\n\\r]*%xmm19

This one is because nonimmediate_gr_operand also matches a MEM, in this case
we apply the peephole to

(insn 12 11 13 2 (set (reg/v:V2DI 55 xmm19 [ c ])
        (vec_concat:V2DI (reg:DI 54 xmm18 [91])
            (mem:DI (reg/v/f:DI 4 si [orig:86 y ] [86]) [1 *y_8(D)+0 S8 A64])))

latency-wise memory isn't any better than a GPR so the decision to split
is reasonable.

> I'll see how to update those next week.

So I updated the above to check for vpunpcklqdq instead.
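
For readers without the testsuite at hand, the two source shapes discussed
above (V2DI concat from two GPRs, and from a GPR plus a MEM) can be sketched
in C with GCC's vector extension.  This is an illustrative reduction only, not
the content of avx512dq-concatv2di-1.c: the typedef and function names are
hypothetical, and the real tests pin operands to specific registers (xmm16 to
xmm19), which this sketch does not attempt.  Which instruction gets emitted
depends on the target flags and on the patch under test.

```c
/* Hypothetical reduction; names are not taken from the GCC testsuite. */
typedef long long v2di __attribute__ ((vector_size (16)));

/* Both halves arrive in general-purpose registers (the GPR/GPR case). */
v2di
concat_gprs (long long x, long long y)
{
  return (v2di) { x, y };
}

/* The high half is loaded from memory (the MEM case). */
v2di
concat_gpr_mem (long long x, const long long *y)
{
  return (v2di) { x, *y };
}
```

Compiling this with -O2 and the relevant -mavx512* options and inspecting the
assembly should show which concat pattern each function ends up using.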