https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100696
Richard Biener <rguenth at gcc dot gnu.org> changed:
What             |Removed      |Added
----------------------------------------------------------------------------
Ever confirmed   |0            |1
Blocks           |             |53947
Status           |UNCONFIRMED  |NEW
Last reconfirmed |             |2021-05-20
Target           |             |x86_64-*-*
Keywords         |             |missed-optimization
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
We only have a widening multiplication pattern, not a separate high-part one.
When you increase N to 8 you'll see that we need a VF (vectorization factor)
of 8 to do:
<bb 2> [local count: 119292720]:
vect__1.24_26 = MEM <vector(8) short int> [(short int *)&a];
vect__3.27_23 = MEM <vector(8) short int> [(short int *)&b];
vect_patt_29.28_22 = WIDEN_MULT_LO_EXPR <vect__3.27_23, vect__1.24_26>;
vect_patt_29.28_21 = WIDEN_MULT_HI_EXPR <vect__3.27_23, vect__1.24_26>;
vect__6.29_20 = vect_patt_29.28_22 >> 16;
vect__6.29_19 = vect_patt_29.28_21 >> 16;
vect__7.30_18 = VEC_PACK_TRUNC_EXPR <vect__6.29_20, vect__6.29_19>;
MEM <vector(8) short int> [(short int *)&r] = vect__7.30_18;
resulting in
mulhi:
.LFB1:
.cfi_startproc
movdqa b(%rip), %xmm0
pmullw a(%rip), %xmm0
movdqa %xmm0, %xmm1
movdqa b(%rip), %xmm2
pmulhw a(%rip), %xmm2
punpcklwd %xmm2, %xmm1
punpckhwd %xmm2, %xmm0
psrad $16, %xmm1
psrad $16, %xmm0
pshufb .LC0(%rip), %xmm1
pshufb .LC1(%rip), %xmm0
por %xmm1, %xmm0
movaps %xmm0, r(%rip)
ret
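For reference, a minimal sketch of the kind of loop that would lead to the dump above. The original testcase is not quoted in this comment, so the loop body is an assumption reconstructed from the names visible in the dump (a, b, r, mulhi) and from the `>> 16` on the widened product. A single pmulhw already yields the high 16 bits of each signed 16x16 product, so the unpack/shift/shuffle sequence above is the missed optimization.

```c
/* Assumed shape of the testcase, with N raised to 8 as described above. */
#define N 8

short a[N], b[N], r[N];

/* Signed high-part multiply: widen to int, multiply, and keep the
   upper 16 bits of each 32-bit product.  */
void mulhi(void)
{
    for (int i = 0; i < N; i++)
        r[i] = (short)(((int)a[i] * (int)b[i]) >> 16);
}
```

Note that right-shifting a negative int is implementation-defined in ISO C; GCC implements it as an arithmetic shift, which is what this sketch relies on.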
For smulhrs there's a special pattern:
t.c:40:17: note: widen_mult pattern recognized: patt_37 = _1 w* _3;
t.c:40:17: note: vect_recog_mulhs_pattern: detected: _8 = _7 >> 1;
t.c:40:17: note: created pattern stmt: patt_36 = .MULHRS (_1, _3);
t.c:40:17: note: mult_high pattern recognized: patt_35 = (int) patt_36;
t.c:40:17: note: extra pattern stmt: patt_36 = .MULHRS (_1, _3);
t.c:40:17: note: vect_is_simple_use: operand _7 >> 1, type of def: internal
t.c:40:17: note: vect_is_simple_use: operand .MULHRS (_1, _3), type of def: internal
t.c:40:17: note: vect_recog_cast_forwprop_pattern: detected: _9 = (short int) _8;
t.c:40:17: note: cast_forwprop pattern recognized: patt_34 = (short int) patt_36;
So we are missing something like that for the [u]mulhi cases.
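To make the contrast concrete, here is a hedged sketch of the two scalar shapes involved. The source at t.c:40 is not quoted here, so both loops are assumptions: the first is inferred from the `_8 = _7 >> 1` statement that vect_recog_mulhs_pattern reports, using the usual rounding form (shift by 14, add 1, shift by 1) for 16-bit elements; the second is the unsigned plain high-part case. The names smulhrs, umulhi, ua, ub and ur are illustrative only.

```c
#define N 8

short a[N], b[N], r[N];
unsigned short ua[N], ub[N], ur[N];

/* Rounding, scaled high-part multiply (assumed shape of the loop that
   the dump above turns into .MULHRS).  */
void smulhrs(void)
{
    for (int i = 0; i < N; i++)
        r[i] = (short)(((((int)a[i] * (int)b[i]) >> 14) + 1) >> 1);
}

/* Unsigned plain high-part multiply: the upper 16 bits of the
   32-bit product, with no rounding step.  */
void umulhi(void)
{
    for (int i = 0; i < N; i++)
        ur[i] = (unsigned short)(((unsigned int)ua[i] * (unsigned int)ub[i]) >> 16);
}
```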
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations