[Bug tree-optimization/56624] New: Vectorizer gives up on a group-access if it contains stores to the same location
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56624

Bug #: 56624
Summary: Vectorizer gives up on a group-access if it contains stores to the same location
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: michael.v.zolotuk...@gmail.com
Build: 4.8.0 20130313

Created attachment 29672
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29672
Reproducer

GCC can't vectorize the following loop:

void foo (double *a)
{
  int i;
  for (i = 0; i < 100; i+=2)
    {
      a[i+1] = 2;
      a[i] = 3;
      a[i+1] = 2;
      a[i] = 3;
    }
}

The vectorizer reports the following:

note: === vect_analyze_data_ref_accesses ===
note: Detected interleaving of size 2
note: Two store stmts share the same dr.
note: not vectorized: complicated access pattern.

In this case vectorization is clearly possible, because the first two stores have no effect. This test reproduces a similar problem encountered in Spec2006/470.lbm, where if-conversion can produce stores to the same location, which stops the vectorizer.

The test is attached. Command line to reproduce:

gcc group_access.c -O3 -c -ftree-vectorizer-verbose=15
[Bug tree-optimization/56625] New: After if-conversion vectorizer doesn't recognize similar stores
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56625

Bug #: 56625
Summary: After if-conversion vectorizer doesn't recognize similar stores
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: michael.v.zolotuk...@gmail.com
CC: kirill.yuk...@intel.com, michael.v.zolotuk...@gmail.com
Build: 4.8.0 20130313

Created attachment 29673
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29673
Reproducer

In the following example there are two stores a[i] <- b[i] after if-conversion:

void foo (double a[], double b[])
{
  int i;
  for (i = 0; i < 100; i++)
    {
      if (a[i] == 0)
        a[i] = b[i]*4;
      else
        a[i] = b[i]*3;
    }
}

Since the vectorizer knows nothing about dependencies between a and b, it needs a run-time test for them. But in this example the vectorizer generates two run-time tests instead of one:

note: mark for run-time aliasing test between *_11 and *_8
note: mark for run-time aliasing test between *_15 and *_8

The test is attached. Command line to reproduce:

gcc if-conv-runtime-tests.c -O3 -c -ftree-vectorizer-verbose=15 -ftree-loop-if-convert-stores -fdump-tree-vect
[Bug tree-optimization/56624] Vectorizer gives up on a group-access if it contains stores to the same location
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56624

--- Comment #2 from Michael Zolotukhin 2013-03-15 12:19:50 UTC ---
> Can you reproduce a testcase for that instead? It doesn't make sense
> to handle code that should be optimized earlier (by DSE). Is it from
> code like
>
>   if (cond)
>     a[i] = 3;
>   else
>     a[i] = 3;
>
> ?

Yes, originally it comes from code similar to your example, but that example has one more problem, which hides the one described in this report. I've filed another bug with a test almost like yours (56625).
[Bug tree-optimization/56624] Vectorizer gives up on a group-access if it contains stores to the same location
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56624 --- Comment #3 from Michael Zolotukhin 2013-03-15 12:26:46 UTC --- Created attachment 29674 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29674 Reproducer 2
[Bug tree-optimization/56624] Vectorizer gives up on a group-access if it contains stores to the same location
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56624

--- Comment #4 from Michael Zolotukhin 2013-03-15 12:27:51 UTC ---
Sorry, it turns out a reproducer with an if can be made after all; here it is:

void foo (long *a)
{
  int i;
  for (i = 0; i < 100; i+=2)
    {
      if (a[i] == 0)
        {
          a[i+1] = 2;
          a[i] = 3;
        }
      else
        {
          a[i+1] = 3;
          a[i] = 4;
        }
    }
}

In this example we get:

group_access2.c:4: note: === vect_analyze_data_ref_accesses ===
group_access2.c:4: note: READ_WRITE dependence in interleaving.
group_access2.c:4: note: not vectorized: complicated access pattern.
group_access2.c:4: note: bad data access.
group_access2.c:1: note: vectorized 0 loops in function.

The diagnostic is a bit different, but I guess the root cause is the same. The test is attached (reproducer 2).
[Bug tree-optimization/54241] Routine hoist_adjacent_loads does not work properly after r189366
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54241

Richard Guenther changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |DUPLICATE

--- Comment #1 from Richard Guenther 2012-08-13 12:39:10 UTC ---
.

*** This bug has been marked as a duplicate of bug 54240 ***

--- Comment #2 from Michael Zolotukhin 2012-08-13 12:39:23 UTC ---
Created attachment 28004
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28004
test-case confirming the issue
[Bug tree-optimization/54240] Routine hoist_adjacent_loads does not work properly after r189366
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54240 --- Comment #3 from William J. Schmidt 2012-08-13 14:14:59 UTC --- Odd, I don't know. I'll have to go back and look at the tests when I get a moment and investigate that. Peculiar. --- Comment #4 from Michael Zolotukhin 2012-08-13 14:15:08 UTC --- Created attachment 28006 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28006 test-case confirming the issue
[Bug tree-optimization/50188] New: Optimizer doesn't take into account, that longjmp could lead to loops, which causes illegal code transformations.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50188

Bug #: 50188
Summary: Optimizer doesn't take into account, that longjmp could lead to loops, which causes illegal code transformations.
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: michael.v.zolotuk...@gmail.com

Created attachment 25104
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25104
Bug reproducer.

Setjmp/longjmp can form loops, like the following:

  int i = 0;
  (void) setjmp (env);
  i++;
  if (i < 10)
    longjmp (env, 0);

The optimizer removes the check 'if (i < 10)' and the 'i++' (apparently as dead code). In this example, after these transformations the loop becomes infinite.
[Bug tree-optimization/50188] Optimizer doesn't take into account, that longjmp could lead to loops, which causes illegal code transformations.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50188

--- Comment #4 from Michael Zolotukhin 2011-08-25 20:21:10 UTC ---
If I understand the standard correctly, the behavior in this case isn't undefined. Am I right? If so, then if the behavior of the optimized code (an infinite loop) is correct, the behavior of the unoptimized code isn't.
[Bug target/59363] [4.9 Regression] r203886 miscompiles git
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59363

--- Comment #6 from Michael Zolotukhin ---
I wasn't able to reproduce the problem, though I got the same asm files as you showed. However, both asm files look correct to me and equivalent to each other. Could the problem be in the function xdi_diff? Maybe it's compiled with different flags? I might be missing something, though; I'll continue digging into the problem.
[Bug target/59363] [4.9 Regression] r203886 miscompiles git
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59363

--- Comment #21 from Michael Zolotukhin ---
Thanks, HJ! That seems to be the root cause of the failure. Did I understand correctly that you are testing a patch that fixes the issue? A similar issue could be in expand_movmem_epilogue; I'll take a look.
[Bug target/51134] [4.7 Regression] x86 memset/memcpy expansion is broken
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51134

--- Comment #10 from Michael Zolotukhin 2011-11-22 13:17:38 UTC ---
Created attachment 25882
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25882
Patch for new memset/memcpy implementation

(In reply to comment #9)
> Regressions caused by the new memset/memcpy expansion are
>
> FAIL: 27_io/basic_filebuf/seekoff/wchar_t/1.cc execution test
> FAIL: 27_io/basic_filebuf/seekpos/wchar_t/9874.cc execution test
> FAIL: 27_io/manipulators/basefield/char/1.cc execution test
> FAIL: 27_io/manipulators/basefield/wchar_t/1.cc execution test
> FAIL: 27_io/objects/wchar_t/12.cc execution test
> FAIL: 27_io/objects/wchar_t/13.cc execution test
> FAIL: events run
> FAIL: gcc.target/i386/cleanup-1.c execution test
> FAIL: gcc.target/i386/cleanup-2.c execution test
> FAIL: gcc.target/i386/pr34077.c (internal compiler error)
> FAIL: gcc.target/i386/pr34077.c (internal compiler error)
> FAIL: gcc.target/i386/pr34077.c (test for excess errors)
> FAIL: gcc.target/i386/pr34077.c (test for excess errors)
> FAIL: gfortran.fortran-torture/execute/arrayarg.f90 execution, -O2
> -fomit-frame-pointer -finline-functions -funroll-loops

With the attached patch, the failures on PR34077 and arrayarg.f90 are fixed; the failures in the cleanup tests are probably caused by an incorrect testcase (see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49503).
[Bug target/51134] [4.7 Regression] x86 memset/memcpy expansion is broken
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51134

--- Comment #12 from Michael Zolotukhin 2011-11-22 16:54:35 UTC ---
(In reply to comment #11 from H.J. Lu, 2011-11-22 16:34:54 UTC)
> Do you have a patch for those C++ and Java regressions?

Which regressions do you mean exactly? I managed to fix the bootstraps (with sse_loop enabled again), but there are still some failures, so I haven't sent the patch. Currently I don't have a fix that solves all the problems: the patch attached to my previous message fixes some of them, but there are other failures (in 27_io and in some SPEC2k tests). I'm continuing to debug and hope to finish the fixes soon.
[Bug target/51387] New: Test vect.exp/vect-116.c fails on execution when compiled with -mavx2 on sde.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51387

Bug #: 51387
Summary: Test vect.exp/vect-116.c fails on execution when compiled with -mavx2 on sde.
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: michael.v.zolotuk...@gmail.com

Reproducer:

make check-gcc RUNTESTFLAGS="vect.exp=vect-116.c --target_board='sde-sim{-m32\ -mavx2,-mavx2}'"
[Bug target/51387] Test vect.exp/vect-116.c fails on execution when compiled with -mavx2 on sde.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51387

Michael Zolotukhin changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |michael.v.zolotukhin at
                   |                            |gmail dot com

--- Comment #1 from Michael Zolotukhin 2011-12-02 05:53:25 UTC ---
I added prints to the test and saw that the result was incorrectly permuted: bytes 8-15 and 16-23 are swapped relative to the correct result.

  i   C[i]   correct C[i]
  0      0      0
  1      1      1
  2      4      4
  3      9      9
  4     16     16
  5     25     25
  6     36     36
  7     49     49
  8      0     64
  9     33     81
 10     68    100
 11    105    121
 12    144    144
 13    185    169
 14    228    196
 15     17    225
 16     64      0
 17     81     33
 18    100     68
 19    121    105
 20    144    144
 21    169    185
 22    196    228
 23    225     17
 24     64     64
 25    113    113
 26    164    164
 27    217    217
 28     16     16
 29     73     73
 30    132    132
 31    193    193

It looks like there is a problem in the function expand_vec_perm_vpshufb2_vpermq_even_odd (config/i386/i386.c), in this code:

  /* Permute the V4DImode quarters using { 0, 2, 1, 3 } permutation.  */
  op = gen_lowpart (V4DImode, d->target);
  ior = gen_lowpart (V4DImode, ior);
  emit_insn (gen_avx2_permv4di_1 (op, ior, const0_rtx, const2_rtx,
                                  const1_rtx, GEN_INT (3)));

What is this permute needed for?
[Bug middle-end/51508] New: Test vect.exp/fast-math-mgrid-resid.f fails when compiled with -mavx2.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51508

Bug #: 51508
Summary: Test vect.exp/fast-math-mgrid-resid.f fails when compiled with -mavx2.
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: michael.v.zolotuk...@gmail.com

This test checks the predictive commoning pass; the test fails when AVX2 is available.
[Bug testsuite/51693] New XPASSes in vectorizer testsuite on powerpc64-suse-linux
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51693

--- Comment #1 from Michael Zolotukhin 2011-12-28 11:08:36 UTC ---
I thought that if {vect_aligned_arrays} isn't true, then arrays could be aligned even after peeling; that's why I added such a check. Unfortunately, I can't reproduce these failures, as I have no PowerPC. By the way, if arrays aren't aligned on Power, why does GCC produce such messages? Does it really try to peel something? Maybe we should just refine the check?

Anyway, if everything is ok with the tests (in their original version) and with GCC itself, we could check not for vect_aligned_arrays, but for AVX. Please check http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01600.html and the patch attached to that message.

Thanks,
Michael

On 28 December 2011 14:51, irar at il dot ibm.com wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51693
>
> Bug #: 51693
> Summary: New XPASSes in vectorizer testsuite on powerpc64-suse-linux
> Classification: Unclassified
> Product: gcc
> Version: 4.7.0
> Status: UNCONFIRMED
> Severity: normal
> Priority: P3
> Component: testsuite
> AssignedTo: unassig...@gcc.gnu.org
> ReportedBy: i...@il.ibm.com
> CC: michael.v.zolotuk...@gmail.com
> Host: powerpc64-suse-linux
> Target: powerpc64-suse-linux
> Build: powerpc64-suse-linux
>
> Revision 182583 http://gcc.gnu.org/viewcvs?view=revision&revision=182583
> caused several XPASSes on powerpc64-suse-linux:
>
> XPASS: gcc.dg/vect/vect-multitypes-1.c scan-tree-dump-times vect "Alignment of access forced using peeling" 2
> XPASS: gcc.dg/vect/vect-multitypes-1.c scan-tree-dump-times vect "Vectorizing an unaligned access" 4
> XPASS: gcc.dg/vect/vect-peel-3.c scan-tree-dump-times vect "Vectorizing an unaligned access" 1
> XPASS: gcc.dg/vect/vect-peel-3.c scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> XPASS: gcc.dg/vect/vect-multitypes-1.c -flto scan-tree-dump-times vect "Alignment of access forced using peeling" 2
> XPASS: gcc.dg/vect/vect-multitypes-1.c -flto scan-tree-dump-times vect "Vectorizing an unaligned access" 4
> XPASS: gcc.dg/vect/vect-peel-3.c -flto scan-tree-dump-times vect "Vectorizing an unaligned access" 1
> XPASS: gcc.dg/vect/vect-peel-3.c -flto scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> XPASS: gcc.dg/vect/no-section-anchors-vect-69.c scan-tree-dump-times vect "Alignment of access forced using peeling" 2
>
> The reason is that {!vect_aligned_arrays} was added to xfail of the above
> checks, while vect_aligned_arrays is false for power.
>
> Changing that, i.e.:
>
> Index: ../../lib/target-supports.exp
> ===================================================================
> --- ../../lib/target-supports.exp (revision 182703)
> +++ ../../lib/target-supports.exp (working copy)
> @@ -3222,7 +3222,8 @@ proc check_effective_target_vect_aligned_arrays {
>                  set et_vect_aligned_arrays_saved 1
>              }
>          }
> -        if [istarget spu-*-*] {
> +        if {[istarget spu-*-*]
> +            || [istarget powerpc*-*-*] } {
>              set et_vect_aligned_arrays_saved 1
>          }
>      }
>
> fixes the XPASSes and doesn't cause any problems (on powerpc64-suse-linux), but
> AFAIU arrays are not always vector aligned on power, so this is not a good
> idea, unless we change the definition of
> check_effective_target_vect_aligned_arrays.
>
> What was the purpose of adding {!vect_aligned_arrays} to these tests? If
> peeling is impossible on AVX because arrays are never vector aligned, maybe we
> need a new target check instead of vect_aligned_arrays?
[Bug testsuite/51693] New XPASSes in vectorizer testsuite on powerpc64-suse-linux
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51693 --- Comment #3 from Michael Zolotukhin 2011-12-28 12:59:24 UTC --- Created attachment 26195 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26195 AVX2 vect dump
[Bug testsuite/51693] New XPASSes in vectorizer testsuite on powerpc64-suse-linux
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51693

--- Comment #4 from Michael Zolotukhin 2011-12-28 13:01:51 UTC ---
(In reply to comment #2)
> > I though that if {vect_aligned_arrays} isn't true, than arrays could
> > be aligned even after peeling - that's why I added such check.
>
> Sorry, I don't understand this sentence. What do you mean by aligned after
> peeling? Could you please explain what exactly happens on AVX (a dump file with
> -fdump-tree-vect-details would be the best thing).

Sorry, that was a typo. I meant "then arrays couldn't be aligned", at least not without run-time checks. That is, we can't peel a compile-time-known number of iterations and be sure that the array becomes aligned. For example, if we have an array IA of ints aligned to 16 bytes and an access IA[i+3], then peeling one iteration guarantees 16-byte alignment. But we don't know how many iterations need to be peeled to reach 32-byte alignment (as needed for AVX operations).

> > Unfortunately, I can't reproduce these fails, as I have no PowerPC. By
> > the way, if arrays aren't aligned on Power, why does GCC produce such
> > messages - does it really try to peel something?
>
> The arrays in the tests are aligned. I said that I think that we can't promise
> that all the arrays are vector aligned on power. BTW, we can peel for unknown
> misalignment as well.

In that case we shouldn't add Power to vector_aligned_arrays, I guess.

> > Maybe we should just
> > refine the check?
> > Anyway, if everything is ok with the tests (in original version) and
> > with gcc itself - we could check not for vect_aligned_arrays, but for
> > AVX. Please check
> > http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01600.html and the
> > attached to that letter patch.
>
> I think that everything was ok, but I don't think that using vect_sizes_32B_16B
> is a good idea. I would really like to see an AVX vect dump for eg.
> vect-peel-3.c.

In vect-peel-3.c we actually assume that the vector length is 16 bytes. Here is the loop body:

  suma += ia[i];
  sumb += ib[i+5];
  sumc += ic[i+1];

When the vector size is 16 bytes, peeling can make two of the three accesses aligned, but when the vector size is 32 bytes that's impossible. That's why using vector_sizes_32B_16B might be correct here. Also, I uploaded the dump you asked for.

Michael

> Thanks,
> Ira
[Bug testsuite/51693] New XPASSes in vectorizer testsuite on powerpc64-suse-linux
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51693

--- Comment #6 from Michael Zolotukhin 2011-12-28 16:19:54 UTC ---
(In reply to comment #5)
> > In vect-peel-3.c we actually assume that vector length is 16 byte. Here is the
> > loop body:
> >   suma += ia[i];
> >   sumb += ib[i+5];
> >   sumc += ic[i+1];
> > When vector-size is 16, then peeling can make two of three accesses aligned,
> > but when vector size is 32 that's impossible. That's why using
> > vector_sizes_32B_16B might be correct here.
>
> Ah, now I understand. I was confused by vect_aligned_arrays, and it's
> irrelevant here, right?

Yes, you're right. Ideally, I think vect_aligned_arrays should be checked somehow in such tests, since they assume that the beginning of each array is aligned, but that's not the root cause of the XPASSes.

> Yes, vector_sizes_32B_16B seems to be ok in that case.

The other two tests (vect-multitypes-1.c and no-section-anchors-vect-69.c) look like they have the same problem. Are you ok with a similar fix for them too, i.e. is the patch http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01600/vec-tests-avx2_fixes-7.patch ok for trunk?

Thanks,
Michael
[Bug testsuite/49503] New: Incorrect stack alignment, produced by inline assembler in tests gcc.target/i386/cleanup-1.c and gcc.target/i386/cleanup-2.c
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49503

Summary: Incorrect stack alignment, produced by inline assembler in tests gcc.target/i386/cleanup-1.c and gcc.target/i386/cleanup-2.c
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: testsuite
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: michael.v.zolotuk...@gmail.com
CC: michael.v.zolotuk...@gmail.com
Host: x64
Target: x64
Build: 4.7.0 20110619

The tests contain an asm listing like this:

__asm (
        testl %0, %0
        jnz 1f
        .subsection 1
        .type _L_mutex_lock_%=, @function
_L_mutex_lock_%=:
1:      leaq %1, %%rdi
2:      subq $128, %%rsp
3:      call bar
4:      addq $128, %%rsp
5:      jmp 21f
        ...

As _L_mutex_lock is a function, GCC generates a prologue and epilogue for it; in the prologue, stack alignment is performed (according to the x86-64 ABI, the stack should be 128-bit aligned). Before a call, SP is assumed to be a multiple of 16; at function entry, after the return address has been pushed onto the stack, SP+8 is a multiple of 16. GCC uses these assumptions when generating the prologue for stack alignment. But if a JMP is used instead of a CALL, the 8-byte displacement doesn't happen and the stack becomes unaligned, which violates the ABI. To fix this error, the JMP should be replaced with a CALL, or _L_mutex_lock shouldn't be declared as a function.
[Bug testsuite/49503] Incorrect stack alignment, produced by inline assembler in tests gcc.target/i386/cleanup-1.c and gcc.target/i386/cleanup-2.c
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49503

--- Comment #2 from Michael Zolotukhin 2011-06-22 17:50:34 UTC ---
(In reply to comment #1)
> (In reply to comment #0)
> >
> > As _L_mutex_lock is a function, GCC generates a prologue and epilogue for it -
> > in prologue stack alignment is performed (according to ABI64, stack should be
> > aligned to 128-bit).
>
> I didn't see any prologue and epilogue for _L_mutex_lock. Do you have
> a run-time testcase to show the problem?

I don't have a run-time test for this failure, but here is a way to see the problem. First, compile the test (it is in gcc/testsuite/gcc.target/i386):

gcc cleanup-1.c -fexceptions -fnon-call-exceptions -fasynchronous-unwind-tables -O0 -lm -g -fno-inline

Then, in a debugger, one can see that RSP holds an aligned value at each callq above foo in the call stack, and an unaligned value below foo (deeper in the call stack). Here is the corresponding assembler dump:

Dump of assembler code for function foo:
   0x00400886 <+0>:   push   %rbp
   0x00400887 <+1>:   mov    %rsp,%rbp
   0x0040088a <+4>:   push   %r12
   0x0040088c <+6>:   push   %rbx
   0x0040088d <+7>:   sub    $0x98,%rsp          # This is the prologue
   0x00400894 <+14>:  mov    %edi,-0x114(%rbp)   # I was talking
   0x0040089a <+20>:  mov    -0x114(%rbp),%ebx   # about.
   0x004008a0 <+26>:  lea    -0x110(%rbp),%r12   #
   0x004008a7 <+33>:  test   %ebx,%ebx
   0x004008a9 <+35>:  jne    0x400941 <_L_mutex_lock_216>
   0x004008af <+41>:  add    $0x98,%rsp
   0x004008b6 <+48>:  pop    %rbx
   0x004008b7 <+49>:  pop    %r12
   0x004008b9 <+51>:  leaveq
   0x004008ba <+52>:  retq
End of assembler dump.

RTL dumps showed that these instructions appear in the "197r.pro_and_epilogue" pass. I'm not sure the prologue generation is incorrect; in my opinion, the problem is the nontrivial way _L_mutex_lock is called.
[Bug testsuite/49503] Incorrect stack alignment, produced by inline assembler in tests gcc.target/i386/cleanup-1.c and gcc.target/i386/cleanup-2.c
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49503 --- Comment #4 from Michael Zolotukhin 2011-06-23 11:42:03 UTC --- Created attachment 24584 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24584 Test showing that cleanup-* tests are not quite correct.
[Bug testsuite/49503] Incorrect stack alignment, produced by inline assembler in tests gcc.target/i386/cleanup-1.c and gcc.target/i386/cleanup-2.c
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49503

--- Comment #5 from Michael Zolotukhin 2011-06-23 11:44:17 UTC ---
(In reply to comment #3)
> (In reply to comment #2)
> > (In reply to comment #1)
> > > (In reply to comment #0)
> > > >
> > > > As _L_mutex_lock is a function, GCC generates a prologue and epilogue for it -
> > > > in prologue stack alignment is performed (according to ABI64, stack should be
> > > > aligned to 128-bit).
> > >
> > > I didn't see any prologue and epilogue for _L_mutex_lock. Do you have
> > > a run-time testcase to show the problem?
> >
> > I don't have run-time test for this fail, but here is a way to see the
> > problem:
> >
>
> It is very easy to check if stack alignment is correct at run-time.
> Please see how it is done in testcases under gcc.dg/torture/stackalign.

Please find the testcase attached. It passes on 32-bit and fails on 64-bit (GCC 4.7.0 and 4.5.1 (RedHat) were checked).
[Bug testsuite/49503] Incorrect stack alignment, produced by inline assembler in tests gcc.target/i386/cleanup-1.c and gcc.target/i386/cleanup-2.c
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49503 --- Comment #7 from Michael Zolotukhin 2011-06-23 16:00:36 UTC --- Created attachment 24587 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24587 Fixed cleanup-regression.c
[Bug testsuite/49503] Incorrect stack alignment, produced by inline assembler in tests gcc.target/i386/cleanup-1.c and gcc.target/i386/cleanup-2.c
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49503

--- Comment #8 from Michael Zolotukhin 2011-06-23 16:03:32 UTC ---
(In reply to comment #6)
> (In reply to comment #4)
> > Created attachment 24584 [details]
> > Test showing that cleanup-* tests are not quite correct.
>
> Wrong test. You don't know the alignment of buf.
> You should put an alignment attribute on buf.

I added a new test with the aligned attribute. The compiler can sometimes leave the stack aligned by accident, in which case the bug doesn't show up. To see the failure, try the following command line:

gcc cleanup-regression-2.c -O0 -lm -fno-inline -fexceptions -fnon-call-exceptions -fasynchronous-unwind-tables && ./a.out