On Fri, Apr 26, 2013 at 5:50 PM, Jakub Jelinek <ja...@redhat.com> wrote:

> This patch fixes two wrong-code bugs with -mxop.
> One is that the vpmacsdqh instruction can only be used for
> vec_widen_smult_odd_v4si but not vec_widen_umult_odd_v4si.  Consider we have
> unsigned V4SImode h* with arguments
> { 3, 3, 3, 3 } h* { 0xaaaaaaab, 0xaaaaaaab, 0xaaaaaaab, 0xaaaaaaab }
> (but not known at compile time).  If we use vpmacsdqh, it sign-extends
> the numbers and thus computes (3 * 0xffffffffaaaaaaabULL) >> 32,
> i.e. 0xffffffff, while we want (3 * 0xaaaaaaabULL) >> 32, i.e. 2.
>
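
For illustration only (not part of the patch): the arithmetic above can be
reproduced in plain C. The helper names below are hypothetical, and the
sign-extending helper merely models a signed widening multiply; it is not
the actual vpmacsdqh semantics in full. It assumes the usual two's
complement conversion that GCC applies for uint32_t -> int32_t.

```c
#include <stdint.h>

/* Unsigned highpart multiply: zero-extend both operands to 64 bits,
   multiply, keep the upper 32 bits.  This is what h* on unsigned
   V4SImode elements is supposed to compute per lane.  */
static uint32_t
umul_highpart (uint32_t a, uint32_t b)
{
  return (uint32_t) (((uint64_t) a * (uint64_t) b) >> 32);
}

/* Model of a sign-extending widening multiply (hypothetical helper):
   both operands are reinterpreted as signed 32-bit values before being
   widened.  The uint32_t -> int32_t conversion is implementation-defined
   in ISO C; GCC wraps modulo 2^32.  */
static uint32_t
smul_highpart (uint32_t a, uint32_t b)
{
  int64_t prod = (int64_t) (int32_t) a * (int64_t) (int32_t) b;
  return (uint32_t) ((uint64_t) prod >> 32);
}
```

With a = 3 and b = 0xaaaaaaab, umul_highpart gives 2 and smul_highpart
gives 0xffffffff, matching the two values in the description.
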
> The second bug is a wrong shift count for the immediate xop_rotr.
> We want element bitsize - immediate to transform r>> immediate into
> the equivalent r<< (bitsize - immediate), but (<ssescalarnum> * 8) is
> correct for that only for V4SImode, where it is 32.  For V2DImode it is
> 16 instead of the desired 64, for V8HImode it is 64 instead of the
> desired 16 and for V16QImode it is 128 instead of the desired 8.
>
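
A small sketch of the count computation, again purely illustrative: the
function names are made up, a 128-bit vector is assumed, and this is not
the code from sse.md. It contrasts the element count times 8 with the
element bit size, and shows a 16-bit rotate right done as a rotate left.

```c
#include <stdint.h>

/* Element bit size for a 128-bit vector with NUNITS elements
   (V16QI: 16, V8HI: 8, V4SI: 4, V2DI: 2).  */
static unsigned
element_bits (unsigned nunits)
{
  return 128 / nunits;
}

/* The value the old expression effectively used: number of elements
   times 8.  It equals element_bits only when nunits == 4.  */
static unsigned
buggy_bits (unsigned nunits)
{
  return nunits * 8;
}

/* Rotate a 16-bit element left by N bits (N taken modulo 16).  */
static uint16_t
rotl16 (uint16_t x, unsigned n)
{
  n &= 15;
  return (uint16_t) ((x << n) | (x >> ((16 - n) & 15)));
}

/* Rotate right by IMM expressed as rotate left by (16 - IMM),
   i.e. element bitsize minus the immediate.  */
static uint16_t
rotr16_via_rotl (uint16_t x, unsigned imm)
{
  return rotl16 (x, 16 - (imm & 15));
}
```

For V8HImode, element_bits (8) is 16 while buggy_bits (8) is 64, so the
old count would have been 64 - immediate rather than 16 - immediate.
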
> Bootstrapped/regtested on x86_64-linux, configured --with-arch=bdver2,
> fixes:
>
> -FAIL: gcc.c-torture/execute/pr51581-1.c execution,  -O3 -fomit-frame-pointer
> -FAIL: gcc.c-torture/execute/pr51581-1.c execution,  -O3 -fomit-frame-pointer -funroll-loops
> -FAIL: gcc.c-torture/execute/pr51581-1.c execution,  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
> -FAIL: gcc.c-torture/execute/pr51581-1.c execution,  -O3 -g
> -FAIL: gcc.c-torture/execute/pr51581-2.c execution,  -O3 -fomit-frame-pointer
> -FAIL: gcc.c-torture/execute/pr51581-2.c execution,  -O3 -fomit-frame-pointer -funroll-loops
> -FAIL: gcc.c-torture/execute/pr51581-2.c execution,  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
> -FAIL: gcc.c-torture/execute/pr51581-2.c execution,  -O3 -g
> -FAIL: gcc.c-torture/execute/pr53645.c execution,  -O1
> -FAIL: gcc.c-torture/execute/pr53645.c execution,  -O2
> -FAIL: gcc.c-torture/execute/pr53645.c execution,  -O3 -fomit-frame-pointer
> -FAIL: gcc.c-torture/execute/pr53645.c execution,  -O3 -fomit-frame-pointer -funroll-loops
> -FAIL: gcc.c-torture/execute/pr53645.c execution,  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
> -FAIL: gcc.c-torture/execute/pr53645.c execution,  -O3 -g
> -FAIL: gcc.c-torture/execute/pr53645.c execution,  -Os
> -FAIL: gcc.c-torture/execute/pr53645.c execution,  -Og -g
> -FAIL: gcc.c-torture/execute/pr53645.c execution,  -O2 -flto -fno-use-linker-plugin -flto-partition=none
> -FAIL: gcc.c-torture/execute/pr53645.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
> -FAIL: gcc.c-torture/execute/pr56866.c execution,  -O3 -fomit-frame-pointer
> -FAIL: gcc.c-torture/execute/pr56866.c execution,  -O3 -fomit-frame-pointer -funroll-loops
> -FAIL: gcc.c-torture/execute/pr56866.c execution,  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
> -FAIL: gcc.c-torture/execute/pr56866.c execution,  -O3 -g
> -FAIL: gcc.dg/vect/pr51581-1.c execution test
> -FAIL: gcc.dg/vect/pr51581-2.c execution test
> -FAIL: gcc.dg/vect/pr51581-3.c execution test
> -FAIL: gcc.dg/vect/pr51581-1.c -flto execution test
> -FAIL: gcc.dg/vect/pr51581-2.c -flto execution test
> -FAIL: gcc.dg/vect/pr51581-3.c -flto execution test
> -FAIL: gcc.target/i386/avx-mul-1.c execution test
> -FAIL: gcc.target/i386/avx-pr51581-1.c execution test
> -FAIL: gcc.target/i386/avx-pr51581-2.c execution test
> -FAIL: gcc.target/i386/pr56866.c execution test
> -FAIL: gcc.target/i386/sse2-mul-1.c execution test
> -FAIL: gcc.target/i386/sse4_1-mul-1.c execution test
> -FAIL: gcc.target/i386/xop-mul-1.c execution test
>
> failures that appear with stock gcc when only the testsuite/
> part of the patch is applied.  Ok for trunk/4.8 and partly for 4.7
> (the i386.c bug was introduced on 2012-06-25, but the sse.md
> bug existed in 4.7 already)?
>
> 2013-04-26  Jakub Jelinek  <ja...@redhat.com>
>
>         PR target/56866
>         * config/i386/i386.c (ix86_expand_mul_widen_evenodd): Don't
>         use xop_pmacsdqh if uns_p.
>         * config/i386/sse.md (xop_rotr<mode>3): Fix up computation of
>         the immediate rotate count.
>
>         * gcc.c-torture/execute/pr56866.c: New test.
>         * gcc.target/i386/pr56866.c: New test.
>
> --- gcc/config/i386/i386.c.jj   2013-04-22 10:26:22.000000000 +0200
> +++ gcc/config/i386/i386.c      2013-04-26 10:28:51.793534370 +0200
> @@ -40841,7 +40841,7 @@ ix86_expand_mul_widen_evenodd (rtx dest,
>       the even slots.  For some cpus this is faster than a PSHUFD.  */
>    if (odd_p)
>      {
> -      if (TARGET_XOP && mode == V4SImode)
> +      if (TARGET_XOP && mode == V4SImode && !uns_p)

Please add a small comment on why !uns_p is needed here.

OK everywhere with the above addition.

Thanks,
Uros.
