On Thu, Aug 15, 2024 at 11:14 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch follows up on the previous patch to fix PR target/116275 by
> improving the code STV (ultimately) generates for highpart sign extensions
> like (x<<8)>>8.  The arithmetic right shift is able to take advantage of
> the available common subexpressions from the preceding left shift.
>
> Hence previously with -O2 -m32 -mavx -mno-avx512vl we'd generate:
>
>         vpsllq  $8, %xmm0, %xmm0
>         vpsrad  $8, %xmm0, %xmm1
>         vpsrlq  $8, %xmm0, %xmm0
>         vpblendw        $51, %xmm0, %xmm1, %xmm0
>
> But with improved splitting, we now generate three instructions:
>
>         vpslld  $8, %xmm1, %xmm0
>         vpsrad  $8, %xmm0, %xmm0
>         vpblendw        $51, %xmm1, %xmm0, %xmm0
>
> This patch also implements Uros' suggestion that the pre-reload
> splitter could introduce a new pseudo to hold the intermediate
> to potentially help reload with register allocation, which applies
> when not performing the above optimization, i.e. on TARGET_XOP.
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2024-08-15  Roger Sayle  <ro...@nextmovesoftware.com>
>             Uros Bizjak  <ubiz...@gmail.com>
>
> gcc/ChangeLog
>         * config/i386/i386.md (*extendv2di2_highpart_stv_noavx512vl): Split
>         to an improved implementation on !TARGET_XOP.  On TARGET_XOP, use
>         a new pseudo for the intermediate to simplify register allocation.
>
> gcc/testsuite/ChangeLog
>         * g++.target/i386/pr116275-2.C: New test case.
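
For readers following along, here is a rough scalar sketch of why the
three-instruction sequence is enough. It is not taken from the patch or
from pr116275-2.C; the function names sext56 and sext56_split are made
up for illustration, and the code assumes GCC's behaviour of arithmetic
right shift on signed values and modular unsigned-to-signed conversion
(both only guaranteed by the standard from C++20). The point is that
(x<<8)>>8 leaves the low 32 bits of each 64-bit lane untouched, so only
the high dword needs the shift pair, and the blend puts the original
low dword back.

```cpp
#include <cstdint>

// Hypothetical reference: the highpart sign extension (x << 8) >> 8,
// i.e. sign-extend the low 56 bits of a 64-bit value.
std::int64_t sext56(std::int64_t x) {
    // Shift left as unsigned so the intermediate cannot overflow.
    std::uint64_t shifted = static_cast<std::uint64_t>(x) << 8;
    // Arithmetic right shift (assumed; this is what GCC does).
    return static_cast<std::int64_t>(shifted) >> 8;
}

// Hypothetical decomposition mirroring the new sequence on one lane:
// shift only the high 32 bits, then recombine with the original low half.
std::int64_t sext56_split(std::int64_t x) {
    std::uint64_t ux = static_cast<std::uint64_t>(x);
    std::uint32_t lo = static_cast<std::uint32_t>(ux);        // kept as-is
    std::uint32_t hi = static_cast<std::uint32_t>(ux >> 32);
    // Corresponds to vpslld $8 followed by vpsrad $8 on the high dword.
    std::int32_t hi_ext = static_cast<std::int32_t>(hi << 8) >> 8;
    // Corresponds to vpblendw $51: low dword from the original value,
    // high dword from the shifted value.
    std::uint64_t result =
        (static_cast<std::uint64_t>(static_cast<std::uint32_t>(hi_ext)) << 32) | lo;
    return static_cast<std::int64_t>(result);
}
```

Both functions should agree for every input. The old sequence needed the
extra vpsrlq only because the 64-bit left shift had already disturbed
the low dword.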

LGTM.

Thanks,
Uros.