On Thu, Dec 30, 2021 at 3:45 PM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> This patch adds basic V2QImode infrastructure and V2QImode arithmetic
> operations (plus, minus and neg). The patched compiler can emit SSE
> vectorized QImode operations (e.g. PADDB) with partial QImode vector,
> and also synthesized double HI/LO QImode operations with integer registers.
>
> The testcase:
>
> typedef char __v2qi __attribute__ ((__vector_size__ (2)));
> __v2qi plus (__v2qi a, __v2qi b) { return a + b; };
>
> compiles with -O2 to:
>
> movl %edi, %edx
> movl %esi, %eax
> addb %sil, %dl
> addb %ah, %dh
> movl %edx, %eax
> ret
>
> which is much better than what the unpatched compiler produces:
>
> movl %edi, %eax
> movl %esi, %edx
> xorl %ecx, %ecx
> movb %dil, %cl
> movsbl %dh, %edx
> movsbl %ah, %eax
> addl %edx, %eax
> addb %sil, %cl
> movb %al, %ch
> movl %ecx, %eax
> ret
>
> The V2QImode vectorization does not require vector registers, so it can
> be enabled by default also for 32-bit targets without SSE.
>
> The patch also enables vectorized V2QImode sign/zero extends.
>
> The reason for RFC are several warning failures in
> Wstringop-overflow-*.[Cc] as a result of an unwanted vectorization. I
> tried to sprinkle vect_slp_v2qi_store_align xfails around, but
> unfortunately without success, since I have no idea about the details
> of these tests.
>
> I didn't want to introduce testsuite FAILs, so help with these failing
> tests is greatly appreciated.
This is now fixed in a separate patch.

> Anyway, the above example shows the potential of V2QImode
> vectorization. There are additional similar optimizations possible
> (e.g. shifts with GPRs) in addition to SSE instructions on partial
> V2QI vectors.
>
> Patch is bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>
> 2021-12-30 Uroš Bizjak <ubiz...@gmail.com>
>
> gcc/ChangeLog:
>
> PR target/103861
> * config/i386/i386.h (VALID_SSE2_REG_MODE): Add V2QImode.
> (VALID_INT_MODE_P): Ditto.
> * config/i386/i386.c (ix86_secondary_reload): Handle
> V2QImode reloads from SSE register to memory.
> (vector_mode_supported_p): Always return true for V2QImode.
> * config/i386/i386.md (*subqi_ext<mode>_2): New insn pattern.
> (*negqi_ext<mode>_2): Ditto.
> * config/i386/mmx.md (movv2qi): New expander.
> (movmisalignv2qi): Ditto.
> (*movv2qi_internal): New insn pattern.
> (*pushv2qi2): Ditto.
> (negv2qi2 and splitters): Ditto.
> (<plusminus:insn>v2qi3 and splitters): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> PR target/103861
> * gcc.dg/store_merging_18.c (dg-options): Add -fno-tree-vectorize.
> * gcc.dg/store_merging_29.c (dg-options): Ditto.
> * gcc.target/i386/pr103861.c: New test.
> * gcc.target/i386/pr92658-avx512vl.c (dg-final):
> Remove vpmovqb scan-assembler xfail.
> * gcc.target/i386/pr92658-sse4.c (dg-final):
> Remove pmovzxbq scan-assembler xfail.
> * gcc.target/i386/pr92658-sse4-2.c (dg-final):
> Remove pmovsxbq scan-assembler xfail.
> * gcc.target/i386/warn-vect-op-2.c (dg-warning): Adjust warnings.

Now pushed to master.

Uros.
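
For anyone who wants to try the new patterns, here is a minimal sketch along the lines of the quoted testcase, extended to the three operations the patch covers (plus, minus and neg). The type name v2qi, the helper v2qi_selfcheck and the test values are assumptions made for illustration; this is not the content of gcc.target/i386/pr103861.c. It relies only on the GCC vector extension (vector_size attribute and element subscripting) and keeps the values small, so no element overflows.

```c
#include <assert.h>

/* Two-element QImode vector, same shape as __v2qi in the quoted testcase.
   The name v2qi is chosen here for illustration only.  */
typedef char v2qi __attribute__ ((__vector_size__ (2)));

/* Element-wise operations; these are the candidates for the new
   V2QImode plus/minus/neg patterns.  */
v2qi plus (v2qi a, v2qi b) { return a + b; }
v2qi minus (v2qi a, v2qi b) { return a - b; }
v2qi neg (v2qi a) { return -a; }

/* Hypothetical helper: checks the element-wise results and returns 1
   when every assertion holds.  */
int
v2qi_selfcheck (void)
{
  v2qi a = { 10, 20 };
  v2qi b = { 3, 5 };

  v2qi s = plus (a, b);
  v2qi d = minus (a, b);
  v2qi z = plus (a, neg (a));	/* a + (-a) must be zero in both lanes.  */

  assert (s[0] == 13 && s[1] == 25);
  assert (d[0] == 7 && d[1] == 15);
  assert (z[0] == 0 && z[1] == 0);
  return 1;
}
```

Compiling this with -O2 -S on a patched compiler should make it possible to compare the generated code with the quoted listings. The exact instruction sequence is not reproduced here, since it depends on the compiler revision and target flags.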