https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100696

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Blocks|                            |53947
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-05-20
             Target|                            |x86_64-*-*
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
We only have a widening multiplication pattern, not a separate high-part
multiplication pattern.  When you increase N to 8 you'll see that we need a
vectorization factor (VF) of 8, which produces

  <bb 2> [local count: 119292720]:
  vect__1.24_26 = MEM <vector(8) short int> [(short int *)&a];
  vect__3.27_23 = MEM <vector(8) short int> [(short int *)&b];
  vect_patt_29.28_22 = WIDEN_MULT_LO_EXPR <vect__3.27_23, vect__1.24_26>;
  vect_patt_29.28_21 = WIDEN_MULT_HI_EXPR <vect__3.27_23, vect__1.24_26>;
  vect__6.29_20 = vect_patt_29.28_22 >> 16;
  vect__6.29_19 = vect_patt_29.28_21 >> 16;
  vect__7.30_18 = VEC_PACK_TRUNC_EXPR <vect__6.29_20, vect__6.29_19>;
  MEM <vector(8) short int> [(short int *)&r] = vect__7.30_18;

resulting in

mulhi:
.LFB1:
        .cfi_startproc
        movdqa  b(%rip), %xmm0
        pmullw  a(%rip), %xmm0
        movdqa  %xmm0, %xmm1
        movdqa  b(%rip), %xmm2
        pmulhw  a(%rip), %xmm2
        punpcklwd       %xmm2, %xmm1
        punpckhwd       %xmm2, %xmm0
        psrad   $16, %xmm1
        psrad   $16, %xmm0
        pshufb  .LC0(%rip), %xmm1
        pshufb  .LC1(%rip), %xmm0
        por     %xmm1, %xmm0
        movaps  %xmm0, r(%rip)
        ret
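
The original test case is not quoted in this comment, so the following is only
a sketch of what it presumably looks like, plus a per-lane model of the
instruction sequence above.  The names a, b, r and mulhi are taken from the
assembly; N as a macro, the int16_t element type and the helper lane_model are
assumptions made for illustration.

```c
#include <stdint.h>

#define N 8   /* assumed macro; the comment only says "increase N to 8" */

int16_t a[N], b[N], r[N];

/* Assumed shape of the loop: signed 16x16->32 multiply, keep the high half.
   Right-shifting a negative int is implementation-defined in C; GCC uses an
   arithmetic shift, which is what the vectorized code does as well.  */
void mulhi(void)
{
    for (int i = 0; i < N; i++)
        r[i] = (int16_t)(((int32_t)a[i] * (int32_t)b[i]) >> 16);
}

/* Hypothetical per-lane model of the emitted sequence.  */
int16_t lane_model(int16_t x, int16_t y)
{
    int32_t prod = (int32_t)x * (int32_t)y;
    uint16_t lo = (uint16_t)prod;          /* pmullw: low 16 bits   */
    int16_t hi = (int16_t)(prod >> 16);    /* pmulhw: high 16 bits  */

    /* punpcklwd/punpckhwd: rebuild the 32-bit product from both halves.
       The unsigned-to-signed conversion wraps on GCC.  */
    int32_t wide = (int32_t)(((uint32_t)(uint16_t)hi << 16) | lo);

    /* psrad $16, then pshufb/por keep the low 16 bits of each lane.  */
    return (int16_t)(wide >> 16);
}
```

In this model the value returned is always the same as hi, that is, the
pmulhw result on its own already holds what the remaining unpack, shift and
shuffle instructions end up recomputing.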

For smulhrs there is a special pattern, as the vectorizer dump shows:

t.c:40:17: note:   widen_mult pattern recognized: patt_37 = _1 w* _3;
t.c:40:17: note:   vect_recog_mulhs_pattern: detected: _8 = _7 >> 1;
t.c:40:17: note:   created pattern stmt: patt_36 = .MULHRS (_1, _3);
t.c:40:17: note:   mult_high pattern recognized: patt_35 = (int) patt_36;
t.c:40:17: note:   extra pattern stmt: patt_36 = .MULHRS (_1, _3);
t.c:40:17: note:   vect_is_simple_use: operand _7 >> 1, type of def: internal
t.c:40:17: note:   vect_is_simple_use: operand .MULHRS (_1, _3), type of def: internal
t.c:40:17: note:   vect_recog_cast_forwprop_pattern: detected: _9 = (short int) _8;
t.c:40:17: note:   cast_forwprop pattern recognized: patt_34 = (short int) patt_36;

So we are missing something along those lines for the [u]mulhi cases.
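
For comparison, here is a sketch of the two scalar idioms side by side.  The
source at t.c:40 is not quoted here, so the rounding form below is an
assumption: it is the usual 16-bit rounding multiply-high idiom, and only its
final ">> 1" is confirmed by the dump.  The function names and the shift
count of 14 are hypothetical.

```c
#include <stdint.h>

/* Plain multiply-high: keep the upper 16 bits of the 32-bit product.
   This is the form that is not matched by a dedicated pattern.  */
int16_t mulhi_scalar(int16_t x, int16_t y)
{
    return (int16_t)(((int32_t)x * (int32_t)y) >> 16);
}

/* Rounding, scaling multiply-high (assumed form).  The trailing ">> 1"
   corresponds to the "_8 = _7 >> 1" statement reported in the dump.  */
int16_t mulhrs_scalar(int16_t x, int16_t y)
{
    int32_t t = (((int32_t)x * (int32_t)y) >> 14) + 1;
    return (int16_t)(t >> 1);
}
```

The two differ only in the shift amounts and the rounding increment, which
suggests that a [u]mulhi pattern could be recognized in much the same way.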


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
