On Fri, Apr 26, 2013 at 5:50 PM, Jakub Jelinek <ja...@redhat.com> wrote:

> This patch fixes two wrong-code bugs with -mxop.
> One is that vpmacsdqh instruction can be only used for
> vec_widen_smult_odd_v4si
> but not vec_widen_umult_odd_v4si.  Consider we have
> unsigned V4SImode h* with arguments
> { 3, 3, 3, 3 } h* { 0xaaaaaaab, 0xaaaaaaab, 0xaaaaaaab, 0xaaaaaaab }
> (but not known at compile time).  If we use vpmacsdqh, it sign-extends
> the numbers and thus computes (3 * 0xffffffffaaaaaaabULL) >> 32,
> i.e. 0xffffffff, while we want (3 * 0xaaaaaaabULL) >> 32, i.e. 2.
>
> The second bug is in wrong shift count for immediate xop_rotr.
> We want element bitsize - immediate to transform the r>> immediate
> into r<< immediate, but (<ssescalarnum> * 8) is correct for that only
> for V4SImode - 32.  For V2DImode it is 16 instead of the desired
> 64, for V8HImode it is 64 instead of the desired 16 and for V16QImode
> it is 128 instead of the desired 8.
>
> Bootstrapped/regtested on x86_64-linux, configured --with-arch=bdver2,
> fixes:
>
> -FAIL: gcc.c-torture/execute/pr51581-1.c execution,  -O3 -fomit-frame-pointer
> -FAIL: gcc.c-torture/execute/pr51581-1.c execution,  -O3 -fomit-frame-pointer -funroll-loops
> -FAIL: gcc.c-torture/execute/pr51581-1.c execution,  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
> -FAIL: gcc.c-torture/execute/pr51581-1.c execution,  -O3 -g
> -FAIL: gcc.c-torture/execute/pr51581-2.c execution,  -O3 -fomit-frame-pointer
> -FAIL: gcc.c-torture/execute/pr51581-2.c execution,  -O3 -fomit-frame-pointer -funroll-loops
> -FAIL: gcc.c-torture/execute/pr51581-2.c execution,  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
> -FAIL: gcc.c-torture/execute/pr51581-2.c execution,  -O3 -g
> -FAIL: gcc.c-torture/execute/pr53645.c execution,  -O1
> -FAIL: gcc.c-torture/execute/pr53645.c execution,  -O2
> -FAIL: gcc.c-torture/execute/pr53645.c execution,  -O3 -fomit-frame-pointer
> -FAIL: gcc.c-torture/execute/pr53645.c execution,  -O3 -fomit-frame-pointer -funroll-loops
> -FAIL: gcc.c-torture/execute/pr53645.c execution,  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
> -FAIL: gcc.c-torture/execute/pr53645.c execution,  -O3 -g
> -FAIL: gcc.c-torture/execute/pr53645.c execution,  -Os
> -FAIL: gcc.c-torture/execute/pr53645.c execution,  -Og -g
> -FAIL: gcc.c-torture/execute/pr53645.c execution,  -O2 -flto -fno-use-linker-plugin -flto-partition=none
> -FAIL: gcc.c-torture/execute/pr53645.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
> -FAIL: gcc.c-torture/execute/pr56866.c execution,  -O3 -fomit-frame-pointer
> -FAIL: gcc.c-torture/execute/pr56866.c execution,  -O3 -fomit-frame-pointer -funroll-loops
> -FAIL: gcc.c-torture/execute/pr56866.c execution,  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
> -FAIL: gcc.c-torture/execute/pr56866.c execution,  -O3 -g
> -FAIL: gcc.dg/vect/pr51581-1.c execution test
> -FAIL: gcc.dg/vect/pr51581-2.c execution test
> -FAIL: gcc.dg/vect/pr51581-3.c execution test
> -FAIL: gcc.dg/vect/pr51581-1.c -flto execution test
> -FAIL: gcc.dg/vect/pr51581-2.c -flto execution test
> -FAIL: gcc.dg/vect/pr51581-3.c -flto execution test
> -FAIL: gcc.target/i386/avx-mul-1.c execution test
> -FAIL: gcc.target/i386/avx-pr51581-1.c execution test
> -FAIL: gcc.target/i386/avx-pr51581-2.c execution test
> -FAIL: gcc.target/i386/pr56866.c execution test
> -FAIL: gcc.target/i386/sse2-mul-1.c execution test
> -FAIL: gcc.target/i386/sse4_1-mul-1.c execution test
> -FAIL: gcc.target/i386/xop-mul-1.c execution test
>
> failures that appear with stock gcc just with the testsuite/
> part of the patch applied.  Ok for trunk/4.8 and partly for 4.7
> (the i386.c bug has been introduced in 2012-06-25 but the sse.md
> bug existed in 4.7 already)?
>
> 2013-04-26  Jakub Jelinek  <ja...@redhat.com>
>
>         PR target/56866
>         * config/i386/i386.c (ix86_expand_mul_widen_evenodd): Don't
>         use xop_pmacsdqh if uns_p.
>         * config/i386/sse.md (xop_rotr<mode>3): Fix up computation of
>         the immediate rotate count.
>
>         * gcc.c-torture/execute/pr56866.c: New test.
>         * gcc.target/i386/pr56866.c: New test.
>
> --- gcc/config/i386/i386.c.jj   2013-04-22 10:26:22.000000000 +0200
> +++ gcc/config/i386/i386.c      2013-04-26 10:28:51.793534370 +0200
> @@ -40841,7 +40841,7 @@ ix86_expand_mul_widen_evenodd (rtx dest,
>       the even slots.  For some cpus this is faster than a PSHUFD.  */
>    if (odd_p)
>      {
> -      if (TARGET_XOP && mode == V4SImode)
> +      if (TARGET_XOP && mode == V4SImode && !uns_p)

Please add a small comment on why !uns_p is needed here.

OK everywhere with the above addition.

Thanks,
Uros.
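
For reference, the arithmetic behind both bugs can be reproduced in scalar C. The following is only an illustrative sketch, not code from the patch or from the new tests: the helper names (umul_hi, smul_hi, rotl16, rotr16, correct_rotl_count, buggy_rotl_count) are made up for this example. It models one 32-bit lane of the widening multiply and one 16-bit lane of the rotate, and it assumes 128-bit vectors so that the element size is 128 / nelts. The signed conversions rely on two's complement wraparound, as GCC provides.

```c
#include <assert.h>
#include <stdint.h>

/* High 32 bits of an unsigned 32x32->64 multiply: the result the
   unsigned highpart multiply in the example needs.  */
static uint32_t umul_hi(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}

/* Same, but with both operands sign-extended first.  This models what
   a signed widening multiply computes when fed unsigned data.  */
static uint32_t smul_hi(uint32_t a, uint32_t b)
{
    int64_t p = (int64_t)(int32_t)a * (int64_t)(int32_t)b;
    return (uint32_t)((uint64_t)p >> 32);
}

/* Plain rotates on one 16-bit element (a V8HImode lane).  */
static uint16_t rotl16(uint16_t x, unsigned n)
{
    n &= 15;
    return (uint16_t)((x << n) | (x >> ((16 - n) & 15)));
}

static uint16_t rotr16(uint16_t x, unsigned n)
{
    n &= 15;
    return (uint16_t)((x >> n) | (x << ((16 - n) & 15)));
}

/* Left-rotate count equivalent to a right rotate by imm:
   element bitsize - imm, with the element size taken as 128 / nelts
   (assumption: 128-bit vector).  */
static unsigned correct_rotl_count(unsigned nelts, unsigned imm)
{
    return 128 / nelts - imm;
}

/* The base described as wrong in the mail: number of elements * 8.  */
static unsigned buggy_rotl_count(unsigned nelts, unsigned imm)
{
    return nelts * 8 - imm;
}

/* Hypothetical self-check; call it from main() to run the example.  */
static void check_examples(void)
{
    assert(umul_hi(3, 0xaaaaaaabu) == 2);
    assert(smul_hi(3, 0xaaaaaaabu) == 0xffffffffu);
    assert(rotl16(0x1234, correct_rotl_count(8, 3)) == rotr16(0x1234, 3));
    assert(buggy_rotl_count(4, 3) == correct_rotl_count(4, 3));
    assert(buggy_rotl_count(8, 3) != correct_rotl_count(8, 3));
}
```

Only the four-element case (V4SImode) makes the two count formulas agree, since 4 * 8 and 128 / 4 are both 32; for 2, 8 and 16 elements the bases are 16, 64 and 128 against the wanted 64, 16 and 8.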