https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100696
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Blocks|                            |53947
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-05-20
             Target|                            |x86_64-*-*
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
We only have a widening multiplication pattern, not a separate high-part one.
When you increase N to 8 you'll see we need a VF of 8 to do

  <bb 2> [local count: 119292720]:
  vect__1.24_26 = MEM <vector(8) short int> [(short int *)&a];
  vect__3.27_23 = MEM <vector(8) short int> [(short int *)&b];
  vect_patt_29.28_22 = WIDEN_MULT_LO_EXPR <vect__3.27_23, vect__1.24_26>;
  vect_patt_29.28_21 = WIDEN_MULT_HI_EXPR <vect__3.27_23, vect__1.24_26>;
  vect__6.29_20 = vect_patt_29.28_22 >> 16;
  vect__6.29_19 = vect_patt_29.28_21 >> 16;
  vect__7.30_18 = VEC_PACK_TRUNC_EXPR <vect__6.29_20, vect__6.29_19>;
  MEM <vector(8) short int> [(short int *)&r] = vect__7.30_18;

resulting in

mulhi:
.LFB1:
        .cfi_startproc
        movdqa  b(%rip), %xmm0
        pmullw  a(%rip), %xmm0
        movdqa  %xmm0, %xmm1
        movdqa  b(%rip), %xmm2
        pmulhw  a(%rip), %xmm2
        punpcklwd       %xmm2, %xmm1
        punpckhwd       %xmm2, %xmm0
        psrad   $16, %xmm1
        psrad   $16, %xmm0
        pshufb  .LC0(%rip), %xmm1
        pshufb  .LC1(%rip), %xmm0
        por     %xmm1, %xmm0
        movaps  %xmm0, r(%rip)
        ret

for smulhrs there's a special pattern:

t.c:40:17: note:   widen_mult pattern recognized: patt_37 = _1 w* _3;
t.c:40:17: note:   vect_recog_mulhs_pattern: detected: _8 = _7 >> 1;
t.c:40:17: note:   created pattern stmt: patt_36 = .MULHRS (_1, _3);
t.c:40:17: note:   mult_high pattern recognized: patt_35 = (int) patt_36;
t.c:40:17: note:   extra pattern stmt: patt_36 = .MULHRS (_1, _3);
t.c:40:17: note:   vect_is_simple_use: operand _7 >> 1, type of def: internal
t.c:40:17: note:   vect_is_simple_use: operand .MULHRS (_1, _3), type of def: internal
t.c:40:17: note:   vect_recog_cast_forwprop_pattern: detected: _9 = (short int) _8;
t.c:40:17: note:   cast_forwprop pattern recognized: patt_34 = (short int) patt_36;

so we miss sth of that for the [u]mulhi cases.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
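
For readers without the testcase at hand, the scalar loops discussed in the comment above could look roughly like the sketch below. This is not the reporter's actual t.c, which is not quoted in this message: the array names a, b and r follow the dump, while ua, ub, ur, the fixed-width types and the exact function bodies are assumptions made for illustration. The smulhrs body uses the shift-by-14, add 1, shift-by-1 form that the .MULHRS detection on `_8 = _7 >> 1` suggests.

```c
#include <stdint.h>

#define N 8   /* eight 16-bit lanes fill one 128-bit SSE register */

int16_t  a[N],  b[N],  r[N];
uint16_t ua[N], ub[N], ur[N];   /* hypothetical unsigned counterparts */

/* Signed high-part multiply: keep the upper 16 bits of each 32-bit
   product.  Relies on arithmetic right shift of negative values, which
   is implementation-defined in ISO C but is what GCC provides.  */
void mulhi(void)
{
    for (int i = 0; i < N; i++)
        r[i] = (int16_t)(((int32_t)a[i] * b[i]) >> 16);
}

/* Unsigned high-part multiply: same shape, with zero extension.  */
void umulhi(void)
{
    for (int i = 0; i < N; i++)
        ur[i] = (uint16_t)(((uint32_t)ua[i] * ub[i]) >> 16);
}

/* Rounding variant: shift by 14, add 1, then shift by 1.  */
void smulhrs(void)
{
    for (int i = 0; i < N; i++)
        r[i] = (int16_t)(((((int32_t)a[i] * b[i]) >> 14) + 1) >> 1);
}
```

In mulhi and umulhi the only operation applied to the widened product is the shift by 16, so a dedicated high-part pattern could in principle map each loop to a single pmulhw or pmulhuw instead of the widen, shift and pack sequence shown in the assembly above.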