https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117734
Bug ID: 117734
Summary: Misses VNNI pmaddubsw qi->hi dot_prod
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---
Looking at mc_chroma in x264_r, which does
dst[x] = ( cA*src[x] + cB*src[x+1] + cC*srcp[x] + cD*srcp[x+1] + 32 ) >> 6;
with uchar src[]/dst[] and integer multiplies we manage to reduce the
multiplication precision to HImode but then do not see the opportunity
to use dot_prod for the QI->HI multiply and add.
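
For reference, a minimal reduced kernel built around the quoted expression might look like the sketch below. The function name, parameter names and signature are hypothetical and are not taken from the x264 sources; only the loop body comes from the report.

```c
#include <stdint.h>

/* Hypothetical reduced testcase: name and signature are illustrative only.
   The statement in the loop is the expression quoted in the report.
   Note that src and srcp are read at index x + 1, so they must hold at
   least width + 1 elements. */
void mc_chroma_row(uint8_t *dst, const uint8_t *src, const uint8_t *srcp,
                   int cA, int cB, int cC, int cD, int width)
{
    for (int x = 0; x < width; x++)
        dst[x] = (cA * src[x] + cB * src[x + 1]
                  + cC * srcp[x] + cD * srcp[x + 1] + 32) >> 6;
}
```

Each output element is the sum of two products of adjacent source bytes per input row, which is the shape a QI->HI multiply-and-add of neighbouring lanes would cover.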
One reason is that x86 doesn't seem to expose [us]dot_prodvNhiv2Nqi, which I
think VNNI provides.
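
As a reminder of what the instruction named in the summary computes, here is a scalar sketch of pmaddubsw as the Intel manual documents it (there it is listed under SSSE3): unsigned bytes from one operand are multiplied by signed bytes from the other, and each adjacent pair of products is added with signed saturation into one 16-bit lane. The helper names are made up for illustration, and whether a QI->HI dot_prod expander would be specified with exactly these semantics is an assumption.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate to the signed 16-bit range. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Scalar model of pmaddubsw for n output lanes (hypothetical helper).
   a holds 2*n unsigned bytes, b holds 2*n signed bytes.
   out[i] = sat16(a[2i] * b[2i] + a[2i+1] * b[2i+1]). */
void pmaddubsw_model(int16_t *out, const uint8_t *a, const int8_t *b, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = sat16((int32_t)a[2 * i] * b[2 * i]
                       + (int32_t)a[2 * i + 1] * b[2 * i + 1]);
}
```

Two points matter for the vectorizer: one multiplicand is signed, so the coefficients would have to fit in a signed char, and the pairing of adjacent lanes is fixed by the instruction.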
The vectorizer also does not consider demoting c[ABCD] to [us]char,
but maybe it could (range info is there). For this SLP opportunity
(i.e. not a reduction) the vectorizer also has the issue that dot_prod
doesn't specify which lanes are summed; we'd have to fix this.
This PR is about the missing patterns.