[Bug tree-optimization/56624] New: Vectorizer gives up on a group-access if it contains stores to the same location

2013-03-15 Thread michael.v.zolotukhin at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56624



             Bug #: 56624
           Summary: Vectorizer gives up on a group-access if it contains
                    stores to the same location
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: michael.v.zolotuk...@gmail.com
             Build: 4.8.0 20130313





Created attachment 29672

  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29672

Reproducer



GCC can't vectorize the following loop:

void foo (double *a)
{
  int i;
  for (i = 0; i < 100; i+=2)
    {
      a[i+1] = 2;
      a[i] = 3;
      a[i+1] = 2;
      a[i] = 3;
    }
}

The vectorizer reports the following:

note: === vect_analyze_data_ref_accesses ===
note: Detected interleaving of size 2
note: Two store stmts share the same dr.
note: not vectorized: complicated access pattern.



Obviously, in this case vectorization is possible, because the first pair of
stores has no effect.
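
A minimal sketch of why the transformation is legal: removing the first pair of stores by hand (which is what dead store elimination would be expected to do) leaves a loop with one store per location and the same final contents of a. The names foo_original and foo_dse are made up for this illustration and are not part of the attached reproducer.

```c
/* Hypothetical names; the attached reproducer only contains foo ().  */

/* The loop as reported: each location is stored twice per iteration.  */
void foo_original (double *a)
{
  int i;
  for (i = 0; i < 100; i += 2)
    {
      a[i+1] = 2;
      a[i] = 3;
      a[i+1] = 2;
      a[i] = 3;
    }
}

/* The same loop with the dead first pair of stores removed by hand.  */
void foo_dse (double *a)
{
  int i;
  for (i = 0; i < 100; i += 2)
    {
      a[i+1] = 2;
      a[i] = 3;
    }
}
```

Both functions leave the array in the same state, so a vectorizer that handles the second form could in principle handle the first.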



This test reproduces a similar problem encountered in SPEC2006 470.lbm: there,
if-conversion can produce stores to the same location, which stops the
vectorizer.

The test is attached. Command line to reproduce:
gcc group_access.c -O3 -c -ftree-vectorizer-verbose=15


[Bug tree-optimization/56625] New: After if-conversion vectorizer doesn't recognize similar stores

2013-03-15 Thread michael.v.zolotukhin at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56625



             Bug #: 56625
           Summary: After if-conversion vectorizer doesn't recognize
                    similar stores
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: michael.v.zolotuk...@gmail.com
                CC: kirill.yuk...@intel.com,
                    michael.v.zolotuk...@gmail.com
             Build: 4.8.0 20130313





Created attachment 29673

  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29673

Reproducer



In the following example there are two stores a[i]<-b[i] after if-conversion.

void foo (double a[], double b[])
{
  int i;
  for (i = 0; i < 100; i++)
    {
      if (a[i] == 0)
        a[i] = b[i]*4;
      else
        a[i] = b[i]*3;
    }
}

As the vectorizer knows nothing about dependencies between a and b, it needs a
runtime alias test. But in this example the vectorizer generates two runtime
tests instead of one:
note: mark for run-time aliasing test between *_11 and *_8
note: mark for run-time aliasing test between *_15 and *_8



The test is attached. Command line to reproduce:
gcc if-conv-runtime-tests.c -O3 -c -ftree-vectorizer-verbose=15 \
    -ftree-loop-if-convert-stores -fdump-tree-vect
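
To illustrate why one runtime test should be enough, here is a hand-written sketch in which the two branches are merged into a single store. The function names are hypothetical, and the second function is only a source-level picture of what merged stores would look like, not what GCC's if-conversion actually emits.

```c
/* Hypothetical names, for illustration only.  */

/* Source form: two stores to a[i], one per branch.  */
void foo_branches (double a[], double b[])
{
  int i;
  for (i = 0; i < 100; i++)
    {
      if (a[i] == 0)
        a[i] = b[i]*4;
      else
        a[i] = b[i]*3;
    }
}

/* Hand-merged form: one load of b[i] and one store to a[i] per iteration,
   so only a single a-versus-b overlap question remains.  */
void foo_single_store (double a[], double b[])
{
  int i;
  for (i = 0; i < 100; i++)
    {
      double factor = (a[i] == 0) ? 4 : 3;
      a[i] = b[i]*factor;
    }
}
```

Both stores in the branchy form write the same location and read the same b[i], which is why the two alias tests in the dump look redundant.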


[Bug tree-optimization/56624] Vectorizer gives up on a group-access if it contains stores to the same location

2013-03-15 Thread michael.v.zolotukhin at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56624



--- Comment #2 from Michael Zolotukhin  
2013-03-15 12:19:50 UTC ---

> Can you reproduce a testcase for that instead?  It doesn't make sense
> to handle code that should be optimized earlier (by DSE).  Is it from
> code like
>
>  if (cond)
>    a[i] = 3;
>  else
>    a[i] = 3;
>
> ?



Yes, originally it comes from code similar to your example, but that example
has one more problem, which hides the one described in this report. I've
submitted a separate bug with a test almost like yours (56625).


[Bug tree-optimization/56624] Vectorizer gives up on a group-access if it contains stores to the same location

2013-03-15 Thread michael.v.zolotukhin at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56624



--- Comment #3 from Michael Zolotukhin  
2013-03-15 12:26:46 UTC ---

Created attachment 29674

  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29674

Reproducer 2


[Bug tree-optimization/56624] Vectorizer gives up on a group-access if it contains stores to the same location

2013-03-15 Thread michael.v.zolotukhin at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56624



--- Comment #4 from Michael Zolotukhin  
2013-03-15 12:27:51 UTC ---

Sorry, it looks like a reproducer with an if can be made after all. Here it is:

void foo (long *a)
{
  int i;
  for (i = 0; i < 100; i+=2)
    {
      if (a[i] == 0)
        {
          a[i+1] = 2;
          a[i] = 3;
        }
      else
        {
          a[i+1] = 3;
          a[i] = 4;
        }
    }
}

In this example we have:
group_access2.c:4: note: === vect_analyze_data_ref_accesses ===
group_access2.c:4: note: READ_WRITE dependence in interleaving.
group_access2.c:4: note: not vectorized: complicated access pattern.
group_access2.c:4: note: bad data access.
group_access2.c:1: note: vectorized 0 loops in function.



The diagnostic is a bit different, but the root cause is the same, I guess.



The test is attached (reproducer 2).
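
For comparison, a hand-written sketch of the same loop with exactly one store per location, where the condition is read before either store. The names foo_if and foo_select are hypothetical. This only shows that such a form is semantically equivalent; it makes no claim about what GCC produces internally.

```c
/* Hypothetical names, for illustration only.  */

/* The reproducer's loop: each branch stores to a[i+1] and a[i].  */
void foo_if (long *a)
{
  int i;
  for (i = 0; i < 100; i += 2)
    {
      if (a[i] == 0)
        {
          a[i+1] = 2;
          a[i] = 3;
        }
      else
        {
          a[i+1] = 3;
          a[i] = 4;
        }
    }
}

/* One store per location; the condition is evaluated before any store.  */
void foo_select (long *a)
{
  int i;
  for (i = 0; i < 100; i += 2)
    {
      int is_zero = (a[i] == 0);
      a[i+1] = is_zero ? 2 : 3;
      a[i] = is_zero ? 3 : 4;
    }
}
```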


[Bug tree-optimization/54241] Routine hoist_adjacent_loads does not work properly after r189366

2012-08-13 Thread michael.v.zolotukhin at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54241

Richard Guenther  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |DUPLICATE

--- Comment #1 from Richard Guenther  2012-08-13 
12:39:10 UTC ---
.

*** This bug has been marked as a duplicate of bug 54240 ***

--- Comment #2 from Michael Zolotukhin  
2012-08-13 12:39:23 UTC ---
Created attachment 28004
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28004
test-case confirming the issue


[Bug tree-optimization/54240] Routine hoist_adjacent_loads does not work properly after r189366

2012-08-13 Thread michael.v.zolotukhin at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54240

--- Comment #3 from William J. Schmidt  2012-08-13 
14:14:59 UTC ---
Odd, I don't know.  I'll have to go back and look at the tests when I get a
moment and investigate that.  Peculiar.

--- Comment #4 from Michael Zolotukhin  
2012-08-13 14:15:08 UTC ---
Created attachment 28006
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28006
test-case confirming the issue


[Bug tree-optimization/50188] New: Optimizer doesn't take into account, that longjmp could lead to loops, which causes illegal code transformations.

2011-08-25 Thread michael.v.zolotukhin at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50188

 Bug #: 50188
   Summary: Optimizer doesn't take into account, that longjmp
could lead to loops, which causes illegal code
transformations.
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: michael.v.zolotuk...@gmail.com


Created attachment 25104
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25104
Bug reproducer.

setjmp/longjmp can form loops, like the following:
  int i = 0;
  (void) setjmp (env);
  i++;
  if (i < 10)
    longjmp (env, 0);

The optimizer removes the check 'if (i < 10)' and the 'i++' (seemingly as dead
code). In this example the loop becomes infinite after these transformations.
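
One rule from the C standard matters when reading this snippet: a non-volatile automatic variable that is modified between the setjmp call and the longjmp call has an indeterminate value after the jump (C11 7.13.2.1). The sketch below wraps the loop in a hypothetical function and declares i volatile, which makes the loop count well defined. It is an illustration only and is not the attached reproducer.

```c
#include <setjmp.h>

static jmp_buf env;

/* Hypothetical wrapper around the snippet from the report.  */
int count_to_ten (void)
{
  /* volatile: i is changed between setjmp and longjmp, so without the
     qualifier its value after the jump would be indeterminate.  */
  volatile int i = 0;
  (void) setjmp (env);
  i = i + 1;
  if (i < 10)
    longjmp (env, 0);   /* a value of 0 makes setjmp return 1 */
  return i;
}
```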


[Bug tree-optimization/50188] Optimizer doesn't take into account, that longjmp could lead to loops, which causes illegal code transformations.

2011-08-25 Thread michael.v.zolotukhin at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50188

--- Comment #4 from Michael Zolotukhin  
2011-08-25 20:21:10 UTC ---
If I understand the standard correctly, the behavior in this case isn't
undefined. Am I right?
If so, and if the behavior of the optimized code (the loop is infinite) is
correct, then the behavior of the unoptimized code isn't.


[Bug target/59363] [4.9 Regression] r203886 miscompiles git

2013-12-02 Thread michael.v.zolotukhin at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59363

--- Comment #6 from Michael Zolotukhin  
---
I wasn't able to reproduce the problem, though I got the same asm files as the
ones you showed.

However, both asm files look correct to me, and equivalent to each other.

Could the problem be in function xdi_diff?
Maybe it's compiled with some different flags?

I might be missing something, though. I'll continue digging into the problem.


[Bug target/59363] [4.9 Regression] r203886 miscompiles git

2013-12-02 Thread michael.v.zolotukhin at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59363

--- Comment #21 from Michael Zolotukhin  
---
Thanks, HJ! That seems to be the root cause of the failure.
Did I get it right that you are testing a patch fixing the issue?

A similar issue could be in expand_movmem_epilogue; I'll take a look.


[Bug target/51134] [4.7 Regression] x86 memset/memcpy expansion is broken

2011-11-22 Thread michael.v.zolotukhin at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51134

--- Comment #10 from Michael Zolotukhin  
2011-11-22 13:17:38 UTC ---
Created attachment 25882
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25882
Patch for new memset/memcpy implementation

(In reply to comment #9)
> Regressions caused by the new memset/memcpy expansion are
> 
> FAIL: 27_io/basic_filebuf/seekoff/wchar_t/1.cc execution test
> FAIL: 27_io/basic_filebuf/seekpos/wchar_t/9874.cc execution test
> FAIL: 27_io/manipulators/basefield/char/1.cc execution test
> FAIL: 27_io/manipulators/basefield/wchar_t/1.cc execution test
> FAIL: 27_io/objects/wchar_t/12.cc execution test
> FAIL: 27_io/objects/wchar_t/13.cc execution test
> FAIL: events run
> FAIL: gcc.target/i386/cleanup-1.c execution test
> FAIL: gcc.target/i386/cleanup-2.c execution test
> FAIL: gcc.target/i386/pr34077.c (internal compiler error)
> FAIL: gcc.target/i386/pr34077.c (internal compiler error)
> FAIL: gcc.target/i386/pr34077.c (test for excess errors)
> FAIL: gcc.target/i386/pr34077.c (test for excess errors)
> FAIL: gfortran.fortran-torture/execute/arrayarg.f90 execution,  -O2
> -fomit-frame-pointer -finline-functions -funroll-loops

With the attached patch, the failures on PR34077 and arrayarg.f90 are fixed;
the failures on the cleanup tests are probably caused by an incorrect testcase
(see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49503).


[Bug target/51134] [4.7 Regression] x86 memset/memcpy expansion is broken

2011-11-22 Thread michael.v.zolotukhin at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51134

--- Comment #12 from Michael Zolotukhin  
2011-11-22 16:54:35 UTC ---
> Do you have a patch for those C++ and Java regressions?

Which regressions do you mean exactly? I managed to fix the bootstraps
(with sse_loop enabled again), but there are still some failures, so I
haven't sent the patch.

Currently I don't have a fix that solves all the problems: the patch
attached to the previous message fixes some of them, but there are
other failures (in 27_io and in some specs2k). I'm continuing to debug
and hope to finish the fixes soon.



[Bug target/51387] New: Test vect.exp/vect-116.c fails on execution when compiled with -mavx2 on sde.

2011-12-01 Thread michael.v.zolotukhin at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51387

 Bug #: 51387
   Summary: Test vect.exp/vect-116.c fails on execution when
compiled with -mavx2 on sde.
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: michael.v.zolotuk...@gmail.com


Reproducer:
make check-gcc RUNTESTFLAGS="vect.exp=vect-116.c --target_board='sde-sim{-m32\
-mavx2,-mavx2}'"


[Bug target/51387] Test vect.exp/vect-116.c fails on execution when compiled with -mavx2 on sde.

2011-12-01 Thread michael.v.zolotukhin at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51387

Michael Zolotukhin  changed:

   What|Removed |Added

 CC||michael.v.zolotukhin at
   ||gmail dot com

--- Comment #1 from Michael Zolotukhin  
2011-12-02 05:53:25 UTC ---
I added prints to the test and saw that the result was incorrectly permuted
(bytes 8-15 and 16-23 should be swapped):

   i   C[i]   correct C[i]
   0      0      0
   1      1      1
   2      4      4
   3      9      9
   4     16     16
   5     25     25
   6     36     36
   7     49     49
   8      0     64
   9     33     81
  10     68    100
  11    105    121
  12    144    144
  13    185    169
  14    228    196
  15     17    225
  16     64      0
  17     81     33
  18    100     68
  19    121    105
  20    144    144
  21    169    185
  22    196    228
  23    225     17
  24     64     64
  25    113    113
  26    164    164
  27    217    217
  28     16     16
  29     73     73
  30    132    132
  31    193    193


Looks like there is a problem in function
expand_vec_perm_vpshufb2_vpermq_even_odd (config/i386/i386.c) in this code:

  /* Permute the V4DImode quarters using { 0, 2, 1, 3 } permutation.  */
  op = gen_lowpart (V4DImode, d->target);
  ior = gen_lowpart (V4DImode, ior);
  emit_insn (gen_avx2_permv4di_1 (op, ior, const0_rtx, const2_rtx,
                                  const1_rtx, GEN_INT (3)));

What do we need this permute for?
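
As a reading aid, here is a scalar model of what a { 0, 2, 1, 3 } permutation of 64-bit quarters does to a 32-byte vector. It is an assumption-based illustration and not GCC code; the function name is hypothetical. Applied to the expected values (the "correct C[i]" column, which matches i*i modulo 256), it swaps bytes 8-15 with bytes 16-23, which is the difference seen in the table above.

```c
#include <string.h>

/* Scalar model (not GCC code) of a { 0, 2, 1, 3 } permutation of the four
   64-bit quarters of a 32-byte vector: quarters 1 and 2 trade places,
   quarters 0 and 3 stay where they are.  */
void permute_quarters_0213 (const unsigned char in[32], unsigned char out[32])
{
  static const int sel[4] = { 0, 2, 1, 3 };
  int q;
  for (q = 0; q < 4; q++)
    memcpy (out + 8*q, in + 8*sel[q], 8);
}
```

The permutation is its own inverse, so applying it a second time restores the original order.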


[Bug middle-end/51508] New: Test vect.exp/fast-math-mgrid-resid.f fails when compiled with -mavx2.

2011-12-11 Thread michael.v.zolotukhin at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51508

 Bug #: 51508
   Summary: Test vect.exp/fast-math-mgrid-resid.f fails when
compiled with -mavx2.
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: michael.v.zolotuk...@gmail.com


This test checks the work of the predictive commoning phase, which fails when
AVX2 is available.


[Bug testsuite/51693] New XPASSes in vectorizer testsuite on powerpc64-suse-linux

2011-12-28 Thread michael.v.zolotukhin at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51693

--- Comment #1 from Michael Zolotukhin  
2011-12-28 11:08:36 UTC ---
I though that if {vect_aligned_arrays} isn't true, than arrays could
be aligned even after peeling - that's why I added such check.
Unfortunately, I can't reproduce these fails, as I have no PowerPC. By
the way, if arrays aren't aligned on Power, why does GCC produce such
messages - does it really try to peel something? Maybe we should just
refine the check?
Anyway, if everything is ok with the tests (in original version) and
with gcc itself - we could check not for vect_aligned_arrays, but for
AVX. Please check
http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01600.html and the
attached to that letter patch.

Thanks, Michael


On 28 December 2011 14:51, irar at il dot ibm.com
 wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51693
>
>             Bug #: 51693
>           Summary: New XPASSes in vectorizer testsuite on
>                    powerpc64-suse-linux
>    Classification: Unclassified
>           Product: gcc
>           Version: 4.7.0
>            Status: UNCONFIRMED
>          Severity: normal
>          Priority: P3
>         Component: testsuite
>        AssignedTo: unassig...@gcc.gnu.org
>        ReportedBy: i...@il.ibm.com
>                CC: michael.v.zolotuk...@gmail.com
>              Host: powerpc64-suse-linux
>            Target: powerpc64-suse-linux
>             Build: powerpc64-suse-linux
>
>
> Revision 182583 http://gcc.gnu.org/viewcvs?view=revision&revision=182583 
> caused
> several XPASSes on powerpc64-suse-linux:
>
> XPASS: gcc.dg/vect/vect-multitypes-1.c scan-tree-dump-times vect "Alignment of
> access forced using peeling" 2
> XPASS: gcc.dg/vect/vect-multitypes-1.c scan-tree-dump-times vect "Vectorizing
> an unaligned access" 4
> XPASS: gcc.dg/vect/vect-peel-3.c scan-tree-dump-times vect "Vectorizing an
> unaligned access" 1
> XPASS: gcc.dg/vect/vect-peel-3.c scan-tree-dump-times vect "Alignment of 
> access
> forced using peeling" 1
> XPASS: gcc.dg/vect/vect-multitypes-1.c -flto scan-tree-dump-times vect
> "Alignment of access forced using peeling" 2
> XPASS: gcc.dg/vect/vect-multitypes-1.c -flto scan-tree-dump-times vect
> "Vectorizing an unaligned access" 4
> XPASS: gcc.dg/vect/vect-peel-3.c -flto scan-tree-dump-times vect "Vectorizing
> an unaligned access" 1
> XPASS: gcc.dg/vect/vect-peel-3.c -flto scan-tree-dump-times vect "Alignment of
> access forced using peeling" 1
> XPASS: gcc.dg/vect/no-section-anchors-vect-69.c scan-tree-dump-times vect
> "Alignment of access forced using peeling" 2
>
> The reason is that {!vect_aligned_arrays} was added to xfail of the above
> checks, while vect_aligned_arrays is false for power.
>
> Changing that, i.e.:
> Index: ../../lib/target-supports.exp
> ===
> --- ../../lib/target-supports.exp       (revision 182703)
> +++ ../../lib/target-supports.exp       (working copy)
> @@ -3222,7 +3222,8 @@ proc check_effective_target_vect_aligned_arrays {
>                 set et_vect_aligned_arrays_saved 1
>            }
>        }
> -        if [istarget spu-*-*] {
> +        if {[istarget spu-*-*]
> +           || [istarget powerpc*-*-*] } {
>            set et_vect_aligned_arrays_saved 1
>        }
>     }
>
> fixes the XPASSes and doesn't cause any problems (on powerpc64-suse-linux), 
> but
> AFAIU arrays are not always vector aligned on power, so this is not a good
> idea, unless we change the definition of
> check_effective_target_vect_aligned_arrays.
>
> What was the purpose of adding {!vect_aligned_arrays} to these tests? If
> peeling is impossible on AVX because arrays are never vector aligned, maybe we
> need a new target check instead of vect_aligned_arrays?
>


[Bug testsuite/51693] New XPASSes in vectorizer testsuite on powerpc64-suse-linux

2011-12-28 Thread michael.v.zolotukhin at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51693

--- Comment #3 from Michael Zolotukhin  
2011-12-28 12:59:24 UTC ---
Created attachment 26195
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26195
AVX2 vect dump


[Bug testsuite/51693] New XPASSes in vectorizer testsuite on powerpc64-suse-linux

2011-12-28 Thread michael.v.zolotukhin at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51693

--- Comment #4 from Michael Zolotukhin  
2011-12-28 13:01:51 UTC ---
(In reply to comment #2)
> > I though that if {vect_aligned_arrays} isn't true, than arrays could
> > be aligned even after peeling - that's why I added such check.
> 
> Sorry, I don't understand this sentence. What do you mean by aligned after
> peeling? Could you please explain what exactly happens on AVX (a dump file 
> with
> -fdump-tree-vect-details would be the best thing).
Sorry, I made a typo. I meant "than arrays couldn't be aligned", at least not
without some runtime checks. I.e., we can't peel a compile-time-known number
of iterations and be sure that the array becomes aligned.

E.g., if we have an array IA of ints aligned to 16 bytes, and we have an access
IA[i+3], then peeling one iteration guarantees 16-byte alignment. But we don't
know how many iterations need to be peeled to reach 32-byte alignment (as
needed for AVX operations).
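
The arithmetic can be sketched as follows, assuming 4-byte ints. The helper peel_count and the example addresses are hypothetical. A 16-byte-aligned base may or may not also be 32-byte aligned, and the required peel count for 32-byte vectors differs between the two cases.

```c
/* Hypothetical helper: number of scalar iterations to peel so that the
   access at byte address addr, advancing elem_size bytes per iteration,
   becomes aligned to vec_size bytes.  Returns -1 if no peel count works.  */
int peel_count (unsigned long addr, unsigned elem_size, unsigned vec_size)
{
  unsigned k;
  for (k = 0; k < vec_size; k++)
    if ((addr + (unsigned long) k * elem_size) % vec_size == 0)
      return (int) k;
  return -1;
}
```

With IA at 0x1000 or at 0x1010 (both 16-byte aligned), the access IA[i+3] starts 12 bytes past the base. For 16-byte vectors the answer is one iteration in both cases; for 32-byte vectors it is five iterations for the first base and one for the second, so it cannot be chosen at compile time.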

> > Unfortunately, I can't reproduce these fails, as I have no PowerPC. By
> > the way, if arrays aren't aligned on Power, why does GCC produce such
> > messages - does it really try to peel something? 
> 
> The arrays in the tests are aligned. I said that I think that we can't promise
> that all the arrays are vector aligned on power. BTW, we can peel for unknown
> misalignment as well.

In this case we shouldn't add Power to vect_aligned_arrays, I guess.

> > Maybe we should just
> > refine the check?
> > Anyway, if everything is ok with the tests (in original version) and
> > with gcc itself - we could check not for vect_aligned_arrays, but for
> > AVX. Please check
> > http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01600.html and the
> > attached to that letter patch.
> 
> I think that everything was ok, but I don't think that using 
> vect_sizes_32B_16B
> is a good idea. I would really like to see an AVX vect dump for eg.
> vect-peel-3.c.

In vect-peel-3.c we actually assume that the vector length is 16 bytes. Here is
the loop body:
  suma += ia[i];
  sumb += ib[i+5];
  sumc += ic[i+1];
When the vector size is 16, peeling can make two of the three accesses
aligned, but when the vector size is 32 that's impossible. That's why using
vect_sizes_32B_16B might be correct here.
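
A small model of this claim, assuming 4-byte int elements and array bases that are aligned to the vector size (both are assumptions of this sketch, and the function name is hypothetical). It tries every peel count and reports the largest number of the three accesses that become aligned at once.

```c
/* Hypothetical model: ia, ib and ic are int arrays (4-byte elements) whose
   bases are aligned to vec_size; the loop reads ia[i], ib[i+5], ic[i+1].
   Returns the largest number of the three accesses that any single peel
   count makes aligned to vec_size bytes.  */
int max_aligned_accesses (unsigned vec_size)
{
  static const unsigned offset[3] = { 0, 5, 1 };  /* element offsets */
  unsigned peel, j;
  int best = 0;
  for (peel = 0; peel < vec_size / 4; peel++)
    {
      int aligned = 0;
      for (j = 0; j < 3; j++)
        if (((offset[j] + peel) * 4) % vec_size == 0)
          aligned++;
      if (aligned > best)
        best = aligned;
    }
  return best;
}
```

For 16-byte vectors, peeling three iterations aligns both ib[i+5] and ic[i+1]; for 32-byte vectors no peel count aligns more than one access.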

Also, I uploaded the dump you asked.

Michael

> Thanks,
> Ira
>


[Bug testsuite/51693] New XPASSes in vectorizer testsuite on powerpc64-suse-linux

2011-12-28 Thread michael.v.zolotukhin at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51693

--- Comment #6 from Michael Zolotukhin  
2011-12-28 16:19:54 UTC ---
(In reply to comment #5)
> > In vect-peel-3.c we actually assume that vector length is 16 byte. Here is 
> > the
> > loop body:
> >   suma += ia[i];
> >   sumb += ib[i+5];
> >   sumc += ic[i+1];
> > When vector-size is 16, then peeling can make two of three accesses aligned,
> > but when vector size is 32 that's impossible. That's why using
> > vect_sizes_32B_16B might be correct here.
> 
> Ah, now I understand. I was confused by vect_aligned_arrays, and it's
> irrelevant here, right?
Actually yes, you're right. I think, ideally, vect_aligned_arrays should
somehow be checked in such tests, as in them we assume that the array's
beginning is aligned, but that's not the root cause of the XPASSes.

> Yes, vector_sizes_32B_16B seems to be ok in that case.
The other two tests (vect-multitypes-1.c and no-section-anchors-vect-69.c) look
like they have the same problem. Are you OK with a similar fix for them too,
i.e. is the patch
http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01600/vec-tests-avx2_fixes-7.patch
OK for trunk?

Thanks, Michael


[Bug testsuite/49503] New: Incorrect stack alignment, produced by inline assembler in tests gcc.target/i386/cleanup-1.c and gcc.target/i386/cleanup-2.c

2011-06-22 Thread michael.v.zolotukhin at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49503

   Summary: Incorrect stack alignment, produced by inline
assembler in tests gcc.target/i386/cleanup-1.c and
gcc.target/i386/cleanup-2.c
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: michael.v.zolotuk...@gmail.com
CC: michael.v.zolotuk...@gmail.com
  Host: x64
Target: x64
 Build: 4.7.0 20110619


The tests contain asm-listing like this:
__asm (
  testl  %0, %0
  jnz    1f
  .subsection 1
  .type  _L_mutex_lock_%=, @function
_L_mutex_lock_%=:
1:  leaq   %1, %%rdi
2:  subq   $128, %%rsp
3:  call   bar
4:  addq   $128, %%rsp
5:  jmp    21f
...

As _L_mutex_lock is a function, GCC generates a prologue and epilogue for it -
in prologue stack alignment is performed (according to ABI64, stack should be
aligned to 128-bit).
Before a call, SP is assumed to be a multiple of 16; at function entry, once
the return address has been pushed onto the stack, (SP+8) is a multiple of 16.
GCC relies on these assumptions when generating the prologue for stack
alignment. But if JUMP is used instead of CALL, the 8-byte displacement doesn't
take place and the stack becomes unaligned, which violates the ABI.

To fix this error, JUMP should be replaced with CALL, or _L_mutex_lock
shouldn't be declared as a function.
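
The difference between the two kinds of control transfer can be put into a toy model. This is an illustration of the rule described above, not code from the tests, and all names in it are hypothetical.

```c
/* Toy model of the x86-64 rule described above; not taken from the tests.
   rsp_before is the stack pointer just before control is transferred.  */
enum transfer { VIA_CALL, VIA_JMP };

/* Stack pointer modulo 16 as seen by the first instruction of the target.
   CALL pushes an 8-byte return address; JMP leaves the stack untouched.  */
unsigned rsp_mod16_at_entry (unsigned long rsp_before, enum transfer how)
{
  unsigned long rsp = rsp_before;
  if (how == VIA_CALL)
    rsp -= 8;
  return (unsigned) (rsp % 16);
}
```

Starting from a 16-byte-aligned stack pointer, code entered by CALL sees a remainder of 8, which is what a prologue expects, while code entered by JMP sees a remainder of 0.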


[Bug testsuite/49503] Incorrect stack alignment, produced by inline assembler in tests gcc.target/i386/cleanup-1.c and gcc.target/i386/cleanup-2.c

2011-06-22 Thread michael.v.zolotukhin at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49503

--- Comment #2 from Michael Zolotukhin  
2011-06-22 17:50:34 UTC ---
(In reply to comment #1)
> (In reply to comment #0)
> > 
> > As _L_mutex_lock is a function, GCC generates a prologue and epilogue for 
> > it -
> > in prologue stack alignment is performed (according to ABI64, stack should 
> > be
> > aligned to 128-bit).
> 
> I didn't see any prologue and epilogue for _L_mutex_lock. Do you have
> a run-time testcase to show the problem?

I don't have run-time test for this fail, but here is a way to see the problem:

First, compile the test (it is in gcc/testsuite/gcc.target/i386):

gcc cleanup-1.c   -fexceptions -fnon-call-exceptions
-fasynchronous-unwind-tables -O0  -lm -g -fno-inline

Then, in a debugger, one can see that at each callq before foo (here 'before'
means 'higher in the call stack') RSP contains an aligned value, while after
foo (deeper in the call stack) it is unaligned.


Here is the corresponding assembler dump:

-Dump of assembler code for function foo:
   0x00400886 <+0>:     push   %rbp
   0x00400887 <+1>:     mov    %rsp,%rbp
   0x0040088a <+4>:     push   %r12
   0x0040088c <+6>:     push   %rbx
   0x0040088d <+7>:     sub    $0x98,%rsp            # It is the prologue,
   0x00400894 <+14>:    mov    %edi,-0x114(%rbp)     # that I've spoken
   0x0040089a <+20>:    mov    -0x114(%rbp),%ebx     # about.
   0x004008a0 <+26>:    lea    -0x110(%rbp),%r12     #
   0x004008a7 <+33>:    test   %ebx,%ebx
   0x004008a9 <+35>:    jne    0x400941 <_L_mutex_lock_216>
   0x004008af <+41>:    add    $0x98,%rsp
   0x004008b6 <+48>:    pop    %rbx
   0x004008b7 <+49>:    pop    %r12
   0x004008b9 <+51>:    leaveq
   0x004008ba <+52>:    retq
-End of assembler dump.

RTL dumps showed that these instructions appear in the "197r.pro_and_epilogue"
phase.

I'm not sure whether the prologue generation is incorrect; in my opinion the
problem is with such a nontrivial call of _L_mutex_lock.


[Bug testsuite/49503] Incorrect stack alignment, produced by inline assembler in tests gcc.target/i386/cleanup-1.c and gcc.target/i386/cleanup-2.c

2011-06-23 Thread michael.v.zolotukhin at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49503

--- Comment #4 from Michael Zolotukhin  
2011-06-23 11:42:03 UTC ---
Created attachment 24584
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24584
Test showing that cleanup-* tests are not quite correct.


[Bug testsuite/49503] Incorrect stack alignment, produced by inline assembler in tests gcc.target/i386/cleanup-1.c and gcc.target/i386/cleanup-2.c

2011-06-23 Thread michael.v.zolotukhin at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49503

--- Comment #5 from Michael Zolotukhin  
2011-06-23 11:44:17 UTC ---
(In reply to comment #3)
> (In reply to comment #2)
> > (In reply to comment #1)
> > > (In reply to comment #0)
> > > > 
> > > > As _L_mutex_lock is a function, GCC generates a prologue and epilogue 
> > > > for it -
> > > > in prologue stack alignment is performed (according to ABI64, stack 
> > > > should be
> > > > aligned to 128-bit).
> > > 
> > > I didn't see any prologue and epilogue for _L_mutex_lock. Do you have
> > > a run-time testcase to show the problem?
> > 
> > I don't have run-time test for this fail, but here is a way to see the 
> > problem:
> > 
> 
> It is very easy to check if stack alignment is correct at run-time.
> Please see how it is done in testcases under gcc.dg/torture/stackalign.

Please find the testcase attached.
It passes on 32 bits and fails on 64 bits (GCC 4.7.0 and 4.5.1 (Red Hat) were
checked).


[Bug testsuite/49503] Incorrect stack alignment, produced by inline assembler in tests gcc.target/i386/cleanup-1.c and gcc.target/i386/cleanup-2.c

2011-06-23 Thread michael.v.zolotukhin at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49503

--- Comment #7 from Michael Zolotukhin  
2011-06-23 16:00:36 UTC ---
Created attachment 24587
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24587
Fixed cleanup-regression.c


[Bug testsuite/49503] Incorrect stack alignment, produced by inline assembler in tests gcc.target/i386/cleanup-1.c and gcc.target/i386/cleanup-2.c

2011-06-23 Thread michael.v.zolotukhin at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49503

--- Comment #8 from Michael Zolotukhin  
2011-06-23 16:03:32 UTC ---
(In reply to comment #6)
> (In reply to comment #4)
> > Created attachment 24584 [details]
> > Test showing that cleanup-* tests are not quite correct.
> 
> Wrong test.  You don't know the alignment of buf.
> You should put an alignment attribute on buf.

I added a new test, with an alignment attribute.
The compiler can sometimes make the stack aligned (accidentally), and then the
bug doesn't show up.
To see the failure, try the following command line:
gcc cleanup-regression-2.c -O0 -lm -fno-inline -fexceptions
-fnon-call-exceptions -fasynchronous-unwind-tables && ./a.out
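
For readers without the attachment, a minimal sketch of what a run-time alignment check of this kind can look like. It assumes a C11 compiler and is not the attached cleanup-regression-2.c; the function name is hypothetical.

```c
#include <stdint.h>

/* Sketch of a run-time check, assuming a C11 compiler; this is not the
   attached test.  Returns 1 if a local object that requests 16-byte
   alignment really is placed at a 16-byte-aligned address.  */
int local_is_16_byte_aligned (void)
{
  _Alignas (16) volatile char buf[16];
  buf[0] = 0;   /* touch buf so it is not optimized away */
  return ((uintptr_t) buf % 16) == 0;
}
```

When the stack pointer is correctly aligned at function entry the check returns 1; calling such a function from a context with a misaligned stack is what would expose the problem.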