[Bug middle-end/57055] New: Incorrect CFG after transactional memory passes
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57055
Bug #: 57055  Summary: Incorrect CFG after transactional memory passes
Classification: Unclassified  Product: gcc  Version: 4.9.0  Status: UNCONFIRMED  Severity: normal  Priority: P3  Component: middle-end
AssignedTo: unassig...@gcc.gnu.org  ReportedBy: enkovich@gmail.com

The transactional memory passes do not set cfun->calls_setjmp to true and do not fix up the CFG accordingly after adding a __builtin__ITM_beginTransaction call that has the ECF_RETURNS_TWICE flag set. This leads to an inconsistency which may be revealed when the special call flags are recomputed. If I add a DCE pass after the transactional memory passes, the flags are recomputed and the CFG check fails because of call statements in the middle of a basic block. Thus a DCE pass after transactional memory causes ~250 new fails in 'make check'.

Tried on 'gcc version 4.9.0 20130422 (experimental) (GCC)'
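For reference, a minimal example of the kind of code that exercises this path (this testcase is mine, not taken from the report; it assumes -fgnu-tm): the TM lowering passes expand the __transaction_atomic block into a call to _ITM_beginTransaction, which, like setjmp, returns twice and therefore must start its own basic block once call flags are recomputed.

int x;
void f (void)
{
  __transaction_atomic { x++; }  /* lowered to a _ITM_beginTransaction call (returns twice) */
}

Compile with something like: gcc -fgnu-tm -O2 -S f.c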
[Bug rtl-optimization/50088] movzbl is generated instead of movl
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50088 --- Comment #8 from Ilya Enkovich 2011-08-16 06:55:34 UTC ---
(In reply to comment #4)
> Well, yes, I think the proposal was to spill/load the full SImode instead
> which would avoid both the partial dependency and the mismatched load/store
> size. No?

Yes, I think we should generate a full SImode spill/load.
[Bug rtl-optimization/50088] movzbl is generated instead of movl
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50088 --- Comment #9 from Ilya Enkovich 2011-08-16 07:28:33 UTC ---
(In reply to comment #5)
> It is for movqi. We can only safely replace movzbl with movl if
> the source is 4byte aligned. It should be a new backend option.

That should work. A better solution here would be to not generate movqi at all, but it was probably done intentionally and is profitable for some platforms. In that case we should generate movl for movqi.
[Bug rtl-optimization/50088] movzbl is generated instead of movl
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50088 --- Comment #13 from Ilya Enkovich 2011-08-17 09:07:20 UTC ---
(In reply to comment #12)
> Created attachment 25025 [details]
> A patch to use the same mode for shift count
>
> This is an untested patch to use the same mode for shift count.

We should find a solution for the general problem, not for its specific appearance in the reproducer. We may have the same issue with any other instruction consuming a byte register, so it is better to fix the source of the problem (which I suppose is in IRA) rather than introduce a workaround for each such instruction. BTW, I think you should not increase the size of immediate operands in your patch.
[Bug rtl-optimization/50088] movzbl is generated instead of movl
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50088 --- Comment #15 from Ilya Enkovich 2011-08-17 14:16:27 UTC ---
(In reply to comment #14)
> I think this problem is unique to x86 since some instructions have
> different sizes in register operands. In this example, shift count
> is CL regardless the source operand size. I am not sure how much RA
> can help here. By making register operands in shift instructions to
> have the same size (32bit or less), it may work for most cases.

We have a problem due to different sizes of the spill and the load generated by IRA for the same variable. I'm not sure that patching the shift instructions covers all cases in which IRA may do that.
[Bug target/50164] New: [IRA, 4.7 Regression] Performance degradation due to increased memory instructions count
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50164
Bug #: 50164  Summary: [IRA, 4.7 Regression] Performance degradation due to increased memory instructions count
Classification: Unclassified  Product: gcc  Version: 4.7.0  Status: UNCONFIRMED  Severity: normal  Priority: P3  Component: target
AssignedTo: unassig...@gcc.gnu.org  ReportedBy: enkovich@gmail.com

Created attachment 25083
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25083
Reproducer

The problem occurs with the -march=atom option on the following part of the test case:

  xc = (e_u8) (xc - xk);
  xm = (e_u8) (xm - xk);
  xy = (e_u8) (xy - xk);
  *EritePtr++ = xc;
  *EritePtr++ = xm;
  *EritePtr++ = xy;
  *EritePtr++ = xk;

xk has the most uses here; GCC 4.6 keeps it in a register, but GCC 4.7 keeps it on the stack, which leads to an increased number of memory instructions for that code. On Core i7, GCC 4.7 generates code x1.5 slower than GCC 4.6. On Atom it is ~10% slower.

GCC 4.6 info:
Configured with: /export/users/mstester/stability/svn/gcc-4_6-branch/configure --with-arch=corei7 --with-cpu=corei7 --enable-clocale=gnu --with-system-zlib --enable-shared --with-demangler-in-ld --enable-cloog-backend=isl --with-fpmath=sse --prefix=/export/users/mstester/stability/work/gcc-4_6-branch/64/install --enable-languages=c,c++,fortran
Thread model: posix
gcc version 4.6.2 20110822 (prerelease) (GCC)
COLLECT_GCC_OPTIONS='-O2' '-march=atom' '-m32' '-o' 'test.4.6' '-v'
/nfs/ims/proj/icl/gcc/gnu/compilers/gcc/gcc-4_6-branch/64/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.6.2/cc1 -quiet -v -imultilib 32 -iprefix /nfs/ims/proj/icl/gcc/gnu/compilers/gcc/gcc-4_6-branch/64/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.6.2/ test.c -quiet -dumpbase test.c -march=atom -m32 -auxbase test -O2 -version -o /tmp/ccM2NIHU.s
GNU C (GCC) version 4.6.2 20110822 (prerelease) (x86_64-unknown-linux-gnu) compiled by GNU C version 4.6.2 20110822 (prerelease), GMP version 4.3.1, MPFR version 2.4.2, MPC version 0.8.1

GCC 4.7 info:
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-master/configure --prefix=/export/gcc-master-build
Thread model: posix
gcc version 4.7.0 20110822 (experimental) (GCC)
COLLECT_GCC_OPTIONS='-O2' '-march=atom' '-m32' '-o' 'test.4.7' '-v'
/export/gcc-master-build/libexec/gcc/x86_64-unknown-linux-gnu/4.7.0/cc1 -quiet -v -imultilib 32 test.c -quiet -dumpbase test.c -march=atom -m32 -auxbase test -O2 -version -o /tmp/cc5DRHOU.s
GNU C (GCC) version 4.7.0 20110822 (experimental) (x86_64-unknown-linux-gnu) compiled by GNU C version 4.7.0 20110822 (experimental), GMP version 4.3.2, MPFR version 3.0.0, MPC version 0.8.3-dev
[Bug target/50164] [IRA, 4.7 Regression] Performance degradation due to increased memory instructions count
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50164 --- Comment #2 from Ilya Enkovich 2011-08-25 09:31:29 UTC ---
(In reply to comment #1)
> Yesterday I sent a patch
> http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01954.html which most probably
> solved the problem.
>
> Now I have code size 419 (gcc 4.6) vs 411 (gcc as of Aug 24) bytes for the
> test.

I tried it, but unfortunately it did not fix the regression. We still have xk on the stack and x1.5 more memory accesses in the GCC 4.7 assembly for the mentioned code part. GCC 4.6 produces bigger but faster code. The problem appears only when -march=atom is used; there is no degradation with the generic arch. I compared GCC 4.7 dumps for "-O2 -m32" and "-O2 -m32 -march=atom" and found that the RTL is the same before IRA and differs after IRA. How does -march=atom affect register allocation?
[Bug target/50164] [IRA, 4.7 Regression] Performance degradation due to increased memory instructions count
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50164
Ilya Enkovich changed:
           What    |Removed     |Added
             Status|WAITING     |RESOLVED
         Resolution|            |FIXED

--- Comment #5 from Ilya Enkovich 2011-08-29 07:01:38 UTC ---
(In reply to comment #4)
> On Atom with -m32 -O2 -march=atom,
>
> 1. GCC 4.6.1:
>
> ./4.6 64.16s user 0.01s system 99% cpu 1:04.18 total
>
> 2. GCC 4.7.0 20110819:
>
> ./0819 69.73s user 0.01s system 99% cpu 1:09.76 total
>
> 3. GCC 4.7.0 20110826:
>
> ./0826 64.30s user 0.02s system 99% cpu 1:04.33 total
>
> Has this problem been fixed?

Confirmed, the problem is gone.
[Bug target/50164] [IRA, 4.7 Regression] Performance degradation due to increased memory instructions count
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50164
Ilya Enkovich changed:
           What    |Removed     |Added
             Status|RESOLVED    |REOPENED
         Resolution|FIXED       |

--- Comment #6 from Ilya Enkovich 2011-08-29 07:39:09 UTC ---
It turned out the problem was fixed in the reproducer but not in the original test case. I'll prepare a fixed reproducer.
[Bug target/50164] [IRA, 4.7 Regression] Performance degradation due to increased memory instructions count
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50164
Ilya Enkovich changed:
                         What|Removed |Added
Attachment #25083 is obsolete|0       |1

--- Comment #7 from Ilya Enkovich 2011-08-30 10:40:50 UTC ---
Created attachment 25138
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25138
Fixed reproducer
[Bug target/50164] [IRA, 4.7 Regression] Performance degradation due to increased memory instructions count
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50164 --- Comment #8 from Ilya Enkovich 2011-08-30 10:50:44 UTC --- I attached a fixed reproducer. It is closer to the original test and has higher register pressure than the previous version. It has the same problem as the first reproducer. Reproduced with GCC 4.7.0 20110828 and options "-O2 -m32 -march=atom". Code becomes faster on both Atom (~10%) and Core (~35%) if I use just "-O2 -m32".
[Bug target/50164] [IRA, 4.7 Regression] Performance degradation due to increased memory instructions count
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50164
Ilya Enkovich changed:
           What    |Removed     |Added
             Status|REOPENED    |RESOLVED
         Resolution|            |FIXED

--- Comment #11 from Ilya Enkovich 2011-10-28 10:10:42 UTC ---
Initially the problem was caused by the movzbl cost value for Atom. The low cost of movzbl made IRA keep the frequently used byte value on the stack and assign a register to an int value instead. Changing the cost model resolves the problem, and it has been fixed in revision 17.
[Bug target/50962] Additional opportunity for AGU stall avoidance optimization for Atom processor
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50962 --- Comment #2 from Ilya Enkovich 2011-11-02 13:05:46 UTC --- Created attachment 25689 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25689 Proposed patch
[Bug target/50962] Additional opportunity for AGU stall avoidance optimization for Atom processor
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50962
Ilya Enkovich changed:
           What    |Removed     |Added
                 CC|            |enkovich.gnu at gmail dot com

--- Comment #3 from Ilya Enkovich 2011-11-02 13:06:07 UTC ---
The current optimization uses only splits to transform arithmetic into lea and vice versa. It does not work for moves because the corresponding lea template would be identical. We can instead check whether lea is required when the instruction is emitted. I have a patch to fix it; bootstrap and make check passed. I'm currently checking performance changes.
[Bug tree-optimization/60559] New: g++.dg/vect/pr60023.cc fails with -fno-tree-dce (ICE)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60559
Bug ID: 60559  Summary: g++.dg/vect/pr60023.cc fails with -fno-tree-dce (ICE)
Product: gcc  Version: 4.9.0  Status: UNCONFIRMED  Severity: normal  Priority: P3  Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org  Reporter: enkovich.gnu at gmail dot com

Test gcc/testsuite/g++.dg/vect/pr60023.cc fails with an ICE if executed with an additional -fno-tree-dce flag. As far as I can see, the problem is in a generated mask load which operates on integer types:

  int * _13;
  int _14;
  ...
  _14 = MASK_LOAD (_13, 0B, _ifc__37);

With DCE the LHS of this call is removed and the statement is then ignored in expand_MASK_LOAD. But without DCE we get an ICE because there is no proper code in the optab.

I use gcc (GCC) 4.9.0 20140317 (experimental).

gcc -O2 -ftree-vectorize -fno-vect-cost-model -msse2 -fdump-tree-vect-details -O3 -std=c++11 -fnon-call-exceptions -mavx2 -S -o pr60023.s pr60023.cc -fno-tree-dce
/export/users/ienkovic/gcc/gcc/testsuite/g++.dg/vect/pr60023.cc: In function 'void f1(int*, int*, int*)':
/export/users/ienkovic/gcc/gcc/testsuite/g++.dg/vect/pr60023.cc:14:17: internal compiler error: in maybe_gen_insn, at optabs.c:8250
     p[i] = q[i] + 1;
                 ^
0xc421d0 maybe_gen_insn(insn_code, unsigned int, expand_operand*)
        /gnumnt/msticlxl7_users/ienkovic/point-lookout/gcc-pl/gcc/optabs.c:8250
0xc42629 maybe_expand_insn(insn_code, unsigned int, expand_operand*)
        /gnumnt/msticlxl7_users/ienkovic/point-lookout/gcc-pl/gcc/optabs.c:8294
0xc426bd expand_insn(insn_code, unsigned int, expand_operand*)
        /gnumnt/msticlxl7_users/ienkovic/point-lookout/gcc-pl/gcc/optabs.c:8325
0xb27d95 expand_MASK_LOAD
        /gnumnt/msticlxl7_users/ienkovic/point-lookout/gcc-pl/gcc/internal-fn.c:837
0xb2807f expand_internal_call(gimple_statement_base*)
        /gnumnt/msticlxl7_users/ienkovic/point-lookout/gcc-pl/gcc/internal-fn.c:886
0x8f483f expand_call_stmt
        /gnumnt/msticlxl7_users/ienkovic/point-lookout/gcc-pl/gcc/cfgexpand.c:2190
0x8f815a expand_gimple_stmt_1
        /gnumnt/msticlxl7_users/ienkovic/point-lookout/gcc-pl/gcc/cfgexpand.c:3160
0x8f87a4 expand_gimple_stmt
        /gnumnt/msticlxl7_users/ienkovic/point-lookout/gcc-pl/gcc/cfgexpand.c:3312
0x8febbd expand_gimple_basic_block
        /gnumnt/msticlxl7_users/ienkovic/point-lookout/gcc-pl/gcc/cfgexpand.c:5152
0x9006a5 gimple_expand_cfg
        /gnumnt/msticlxl7_users/ienkovic/point-lookout/gcc-pl/gcc/cfgexpand.c:5731
0x900d20 execute
        /gnumnt/msticlxl7_users/ienkovic/point-lookout/gcc-pl/gcc/cfgexpand.c:5951
[Bug middle-end/64353] [5 Regression] ICE: in execute_todo, at passes.c:1986
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64353
Ilya Enkovich changed:
           What    |Removed     |Added
                 CC|            |enkovich.gnu at gmail dot com

--- Comment #5 from Ilya Enkovich ---
When we process function C::xx, the early_ipa_sra pass modifies C::i in ipa_modify_call_arguments by adding load statements. It marks function C::i as requiring SSA renaming for vops. Later we start processing C::i and get the ICE in execute_todo (pass->todo_flags_start) because it expects the update-SSA flags to be set for functions requiring such an update. Before r217125 it worked because C::i was not in SSA form at the moment of the load insertion. To fix it we may either call update_ssa from ipa_modify_call_arguments or add an update into todo_flags_start of fixup_cfg (we run it at the beginning of all GIMPLE pass lists anyway).

Possible fix (helps for the test, not fully tested):

diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index 01f4111..533dcfe 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -4054,6 +4054,8 @@ ipa_modify_call_arguments (struct cgraph_edge *cs, gcall *stmt,
 	      expr = create_tmp_reg (TREE_TYPE (expr));
 	      gimple_assign_set_lhs (tem, expr);
 	      gsi_insert_before (&gsi, tem, GSI_SAME_STMT);
+	      if (gimple_in_ssa_p (cfun))
+		update_ssa (TODO_update_ssa_only_virtuals);
 	    }
 	}
       else
[Bug middle-end/64353] [5 Regression] ICE: in execute_todo, at passes.c:1986
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64353 --- Comment #7 from Ilya Enkovich --- Right, the wrong const attribute means there is no VUSE on calls to the function, which leads to a # VUSE <.MEM> being generated for the added loads and requires an SSA update. We may actually call update_ssa only in the case of a missing VUSE, which would still allow the optimization for functions wrongly marked as const.
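To make the scenario concrete, here is a hypothetical example of the pattern (the names are mine, not from the PR): a function wrongly marked const that actually reads memory, so calls to it carry no VUSE while IPA-SRA still inserts loads at the call sites.

struct data { int field; };
extern struct data gd;

__attribute__((const)) int   /* wrong: the function reads global memory */
get_field (void)
{
  return gd.field;
}

int use (void)
{
  return get_field () + get_field ();  /* these calls get no VUSE because of the const attribute */
}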
[Bug target/64363] Unresolved labels with -fcheck-pointer-bounds and -mmpx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64363
Ilya Enkovich changed:
           What    |Removed     |Added
                 CC|            |enkovich.gnu at gmail dot com

--- Comment #1 from Ilya Enkovich ---
We copy the function in order to instrument it, but a static var is initialized using labels from the original function, and thus we get unresolved references. I suppose we would get the same problem with non-local gotos. It would probably be safer to simply not instrument such functions for now and get back to it in the next stage 1. A hypothetical example of the pattern is sketched below.
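Hypothetical illustration (this example is mine, not taken from the PR): a function whose static variable is initialized with label addresses. When the body is copied for instrumentation, the initializer still refers to the labels of the original body, leaving unresolved references.

int dispatch (int i)
{
  static void *tbl[] = { &&a, &&b };   /* labels-as-values initializer */
  goto *tbl[i & 1];
a:
  return 1;
b:
  return 2;
}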
[Bug target/64691] New: Suboptimal register allocation for bytes comparison on i386
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64691
Bug ID: 64691  Summary: Suboptimal register allocation for bytes comparison on i386
Product: gcc  Version: 5.0  Status: UNCONFIRMED  Severity: normal  Priority: P3  Component: target
Assignee: unassigned at gcc dot gnu.org  Reporter: enkovich.gnu at gmail dot com

This problem was originally found in the 256.bzip2 benchmark compiled by GCC 5.0 at -O2. There is a small loop with a byte comparison which turned out to be inefficient because the compared values were not allocated to registers allowing byte access. That caused additional copies and, as a result, a significant loop slowdown. The situation can be simulated on a small test if we restrict register usage.

>cat test.c
void test (unsigned char *p, unsigned char val)
{
  unsigned char tmp1, tmp2;
  int i;
  i = 0;
  tmp1 = p[0];
  while (val != tmp1)
    {
      i++;
      tmp2 = tmp1;
      tmp1 = p[i];
      p[i] = tmp2;
    }
  p[0]= tmp1;
}
>gcc -O2 -m32 -ffixed-ebx test.c -S

Here is the loop:

.L3:
        movzbl  (%eax), %ebp
        movl    %esi, %ecx
        movb    %dl, (%eax)
        addl    $1, %eax
        movl    %ebp, %edx
        cmpb    %dl, %cl
        jne     .L3

We have an extra register copy esi->ecx to perform the comparison. I suppose the easiest way to get better register allocation here would be to transform the QI comparison into an SI one to relax the register constraints.
[Bug jit/64722] On 2nd time libgccjit is run in-process on i686, generated code clobbers %ebx register
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64722
Ilya Enkovich changed:
           What    |Removed     |Added
                 CC|            |enkovich.gnu at gmail dot com

--- Comment #4 from Ilya Enkovich ---
(In reply to Jakub Jelinek from comment #3)
> But then wonder if/how target_reinit works for i?86 32-bit.
> Perhaps pic_offset_table_rtx should be cleared in init_emit_regs before
> computing it?
>   pic_offset_table_rtx = NULL_RTX;
>   if ((unsigned) PIC_OFFSET_TABLE_REGNUM != INVALID_REGNUM)
>     pic_offset_table_rtx = gen_raw_REG (Pmode, PIC_OFFSET_TABLE_REGNUM);

Clearing pic_offset_table_rtx here would mean PIC_OFFSET_TABLE_REGNUM transforms into EBX and pic_offset_table_rtx is initialized with EBX, which is not what we want. Probably we just shouldn't try to initialize pic_offset_table_rtx with a hard reg in case the target assumes a pseudo PIC reg?

diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index df85366..51ef3a5 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -5872,7 +5872,8 @@ init_emit_regs (void)
     = gen_raw_REG (Pmode, RETURN_ADDRESS_POINTER_REGNUM);
 #endif
 
-  if ((unsigned) PIC_OFFSET_TABLE_REGNUM != INVALID_REGNUM)
+  if (!targetm.use_pseudo_pic_reg ()
+      && (unsigned) PIC_OFFSET_TABLE_REGNUM != INVALID_REGNUM)
     pic_offset_table_rtx = gen_raw_REG (Pmode, PIC_OFFSET_TABLE_REGNUM);
   else
     pic_offset_table_rtx = NULL_RTX;
[Bug tree-optimization/64277] [4.9/5 Regression] Incorrect warning "array subscript is above array bounds"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64277
Ilya Enkovich changed:
           What    |Removed     |Added
                 CC|            |enkovich.gnu at gmail dot com

--- Comment #7 from Ilya Enkovich ---
Here is a reduced test case:

>cat test.c
int f1[10];
void foo(short a[], short m, unsigned short l)
{
  int i = l;
  for (i = i + 5; i < m; i++)
    f1[i] = a[i]++;
}
>gcc test.c -O3 -c -Wall
test.c: In function 'foo':
test.c:6:7: warning: array subscript is above array bounds [-Warray-bounds]
     f1[i] = a[i]++;
       ^
test.c:6:7: warning: array subscript is above array bounds [-Warray-bounds]
test.c:6:7: warning: array subscript is above array bounds [-Warray-bounds]
test.c:6:7: warning: array subscript is above array bounds [-Warray-bounds]
test.c:6:7: warning: array subscript is above array bounds [-Warray-bounds]

Here the loop is completely unrolled by 10 due to the size of f1. Later VRP complains that the last five produced iterations access above the array bounds. Used GCC 5.0.
[Bug jit/64722] On 2nd time libgccjit is run in-process on i686, generated code clobbers %ebx register
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64722 --- Comment #8 from Ilya Enkovich ---
(In reply to Jakub Jelinek from comment #5)
> Can you explain it? Usually when this function is called,
> pic_offset_table_rtx is NULL and your i386.h macro relies on that.
> When initializing default target during initialization it is NULL of course,
> and apparently even in target_reinit, when it is called with freshly cleared
> heap object for the non-default target.
> It is just when jit calls the initialization again without clearing all the
> variables...
> So I believe my proposed change is correct.
>
> In any case, perhaps jit shouldn't reinitialize everything all the time, at
> least if the compilation options don't change.

I misunderstood the places where init_emit_regs is called, so my fix is incorrect. It is still unclear to me how this initialization affects the generated code. IIRC we let pic_offset_table_rtx be EBX only because of the middle-end, which calls target hooks for code cost estimation. In that case we needed some valid PIC reg to generate RTL for the estimation, and EBX was used. But in target code pic_offset_table_rtx is initialized with a pseudo register, and the value set in init_emit_regs shouldn't matter.
[Bug jit/64722] On 2nd time libgccjit is run in-process on i686, generated code clobbers %ebx register
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64722 --- Comment #11 from Ilya Enkovich ---
(In reply to David Malcolm from comment #10)
> which led to investigating this code in ix86_conditional_register_usage:
> 4394	  j = PIC_OFFSET_TABLE_REGNUM;
> 4395	  if (j != INVALID_REGNUM)
> 4396	    fixed_regs[j] = call_used_regs[j] = 1;
> and line 4396 is bizarrely only called on the 2nd iteration, not the 1st,
> which led me to investigate "PIC_OFFSET_TABLE_REGNUM", and discover what
> appears to be the root cause, as described in comment #1.

Now I see. The problem is also in ix86_conditional_register_usage, which relies on the pic_offset_table_rtx value. As I said, the EBX value is used only to estimate costs for the middle-end. Thus we shouldn't fix the register here if a pseudo PIC register is used, and the correct code would be:

@@ -4388,7 +4388,7 @@ ix86_conditional_register_usage (void)
 
   /* The PIC register, if it exists, is fixed.  */
   j = PIC_OFFSET_TABLE_REGNUM;
-  if (j != INVALID_REGNUM)
+  if (j != INVALID_REGNUM && !ix86_use_pseudo_pic_reg ())
     fixed_regs[j] = call_used_regs[j] = 1;
 
   /* For 32-bit targets, squash the REX registers.  */
[Bug jit/64722] On 2nd time libgccjit is run in-process on i686, generated code clobbers %ebx register
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64722 --- Comment #14 from Ilya Enkovich ---
(In reply to David Malcolm from comment #13)
> Ilya: I can't speak to the correctness of the above code or patch, but
> r220044 fixes the original issue I ran into. Do you want me to keep this
> bug open, or should we track the above in a separate PR?

I think you may close this tracker if your issue is resolved. The change I mentioned is a minor fix I would like to have installed, but I'll handle it separately.
[Bug tree-optimization/64277] [4.9/5 Regression] Incorrect warning "array subscript is above array bounds"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64277 --- Comment #8 from Ilya Enkovich --- Created attachment 34569 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34569&action=edit patch to disable warnings for array references generated by cunroll
[Bug tree-optimization/64277] [4.9/5 Regression] Incorrect warning "array subscript is above array bounds"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64277 --- Comment #9 from Ilya Enkovich --- A nice solution for this problem would be a better estimation of the maximum number of loop iterations. Currently only the array size and the index step are used to get the maximum, ignoring the value range of the starting index (in the reduced test the starting index is at least 5, so only 10 - 5 = 5 iterations can stay within f1, yet the loop is unrolled by 10). Another way to solve the problem is to disable the warnings for code generated by cunroll in case it cannot compute the exact number of iterations. I attach a patch which does that. This bug is hit multiple times in an Android build with GCC 4.9; with this fix we have a clean Android build with GCC 4.9.
[Bug tree-optimization/64277] [4.9/5 Regression] Incorrect warning "array subscript is above array bounds"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64277 --- Comment #12 from Ilya Enkovich ---
(In reply to Richard Biener from comment #10)
> Ick - that will also paper over good warnings so I'd rather not do that.

I'm also worried about possibly removing good warnings. That's why I disable them only when cunroll speculates about the iteration count, and never for the first loop iteration. I agree that disabling warnings looks like a workaround. But it doesn't seem correct to complain about code that was generated by the compiler and is probably never executed. Whenever maxiter is used for a complete unroll, the following optimizations may improve the maxiter estimation, and thus we get compiler-generated dead code which may still produce warnings.
[Bug tree-optimization/64277] [4.9/5 Regression] Incorrect warning "array subscript is above array bounds"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64277 --- Comment #13 from Ilya Enkovich --- Ranges have to be used for the maxiter computation to get a consistent analysis in complete unroll and VRP. The following patch refines the maxiter estimation using ranges and avoids the warnings.

diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 919f5c0..14cce2a 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -2754,6 +2754,7 @@ record_nonwrapping_iv (struct loop *loop, tree base, tree step, gimple stmt,
 {
   tree niter_bound, extreme, delta;
   tree type = TREE_TYPE (base), unsigned_type;
+  tree orig_base = base;
 
   if (TREE_CODE (step) != INTEGER_CST || integer_zerop (step))
     return;
@@ -2777,7 +2778,14 @@ record_nonwrapping_iv (struct loop *loop, tree base, tree step, gimple stmt,
 
   if (tree_int_cst_sign_bit (step))
     {
+      wide_int min, max, highwi = high;
       extreme = fold_convert (unsigned_type, low);
+      if (TREE_CODE (orig_base) == SSA_NAME
+	  && !POINTER_TYPE_P (TREE_TYPE (orig_base))
+	  && SSA_NAME_RANGE_INFO (orig_base)
+	  && get_range_info (orig_base, &min, &max) == VR_RANGE
+	  && wi::gts_p (highwi, max))
+	base = wide_int_to_tree (unsigned_type, max);
       if (TREE_CODE (base) != INTEGER_CST)
 	base = fold_convert (unsigned_type, high);
       delta = fold_build2 (MINUS_EXPR, unsigned_type, base, extreme);
@@ -2785,8 +2793,15 @@ record_nonwrapping_iv (struct loop *loop, tree base, tree step, gimple stmt,
     }
   else
     {
+      wide_int min, max, lowwi = low;
       extreme = fold_convert (unsigned_type, high);
-      if (TREE_CODE (base) != INTEGER_CST)
+      if (TREE_CODE (orig_base) == SSA_NAME
+	  && !POINTER_TYPE_P (TREE_TYPE (orig_base))
+	  && SSA_NAME_RANGE_INFO (orig_base)
+	  && get_range_info (orig_base, &min, &max) == VR_RANGE
+	  && wi::gts_p (min, lowwi))
+	base = wide_int_to_tree (unsigned_type, min);
+      else if (TREE_CODE (base) != INTEGER_CST)
 	base = fold_convert (unsigned_type, low);
       delta = fold_build2 (MINUS_EXPR, unsigned_type, extreme, base);
     }
diff --git a/gcc/testsuite/gcc.dg/pr64277.c b/gcc/testsuite/gcc.dg/pr64277.c
new file mode 100644
index 000..0d5ef11
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr64277.c
@@ -0,0 +1,21 @@
+/* PR tree-optimization/64277 */
+/* { dg-do compile } */
+/* { dg-options "-O3 -Wall -Werror" } */
+
+
+int f1[10];
+void test1 (short a[], short m, unsigned short l)
+{
+  int i = l;
+  for (i = i + 5; i < m; i++)
+    f1[i] = a[i]++;
+}
+
+void test2 (short a[], short m, short l)
+{
+  int i;
+  if (m > 5)
+    m = 5;
+  for (i = m; i > l; i--)
+    f1[i] = a[i]++;
+}
[Bug middle-end/64805] Specific use of __attribute ((always_inline)) breaks MPX functionality with -fcheck-pointer-bounds -mmpx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64805
Ilya Enkovich changed:
           What    |Removed     |Added
                 CC|            |enkovich.gnu at gmail dot com

--- Comment #2 from Ilya Enkovich ---
This might have been introduced by the recent changes in the instrumentation of always_inline functions. Now we keep them alive longer and therefore get an inline of the original functionA into the original functionB. It causes an error in the verifier because the inliner clears all references and then calls the cgraph node verification, which expects an IPA_REF_CHKP reference to the instrumented functionB. I would like to keep the IPA_REF_CHKP check in the verifier because this ref is important for the reachability analysis. Thus we probably should rebuild the IPA_REF_CHKP reference in the inliner. I will test this patch:

diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index c0ff329..d341619 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -2464,6 +2464,13 @@ early_inliner (function *fun)
 #endif
       node->remove_all_references ();
 
+      /* Rebuild this reference because it dosn't depend on
+	 function's body and it's required to pass cgraph_node
+	 verification.  */
+      if (node->instrumented_version
+	  && !node->instrumentation_clone)
+	node->create_reference (node->instrumented_version, IPA_REF_CHKP, NULL);
+
       /* Even when not optimizing or not inlining inline always-inline
 	 functions.  */
       inlined = inline_always_inline_functions (node);
[Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317
Ilya Enkovich changed:
           What    |Removed     |Added
                 CC|            |enkovich.gnu at gmail dot com

--- Comment #3 from Ilya Enkovich ---
Created attachment 34675
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34675&action=edit
Another reproducer

I found one more reproducer for the problem. Here is the generated code for multiple 'output' calls inlined into 'test':

..
.L5:
        movl    28(%esp), %edx          <-- load PIC reg
        movl    %edi, (%ecx)
        leal    (%ebx,%eax), %ecx
        cmpl    %ecx, %ebp
        movl    %ebx, out_pos@GOTOFF(%edx)
        movl    val3@GOTOFF(%edx), %edi
        jnb     .L6
        movl    28(%esp), %ebx          <-- have value in EDX
        movl    out@GOTOFF(%ebx), %ebx
        leal    (%ebx,%eax), %ecx
.L6:
        movl    28(%esp), %edx          <-- NOP
        movl    %edi, (%ebx)
        leal    (%ecx,%eax), %ebx
        cmpl    %ebx, %ebp
        movl    %ecx, out_pos@GOTOFF(%edx)
        movl    val4@GOTOFF(%edx), %edi
        jnb     .L7
        movl    28(%esp), %ecx          <-- have value in EDX
        movl    out@GOTOFF(%ecx), %ecx
        leal    (%ecx,%eax), %ebx
.L7:
        movl    28(%esp), %edx          <-- NOP
        movl    %edi, (%ecx)
...

BTW, if I add __attribute__((noinline)) to the crc32 function, then the mentioned code becomes better and we don't have these two useless instructions in each function instance.

Compilation string: gcc -Ofast -funroll-loops -m32 -march=slm -fPIE test.i -S
Used compiler: gcc version 5.0.0 20150203 (experimental) (GCC)
[Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317 --- Comment #5 from Ilya Enkovich ---
(In reply to Jakub Jelinek from comment #4)
> Does #c2 fix this, or is #c3 an unrelated bugreport that still needs fixing?

The problem is still seen after the fix. I put the test here because of the same symptom. Should I open a new one?
[Bug rtl-optimization/64960] New: Inefficient address pre-computation in PIC mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64960
Bug ID: 64960  Summary: Inefficient address pre-computation in PIC mode
Product: gcc  Version: 5.0  Status: UNCONFIRMED  Severity: normal  Priority: P3  Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org  Reporter: enkovich.gnu at gmail dot com

After EBX was unfixed for the i386 PIC target, we may see addresses of static objects loaded from the GOT and placed on the stack for later use. This allows the PIC register to be reused for other purposes. But in cases when the PIC register is still used (e.g. for calls) it may cause inefficiency in the produced code. Here is an example:

>cat test.c
void f (int);
int val1, *val2, val3;
int test (int max)
{
  int i;
  for (i = 0; i < max; i++)
    {
      val1 += val2[i];
      f (val3);
    }
}
>gcc test.c -O2 -fPIE -S -m32 -ffixed-esi -ffixed-edi -ffixed-edx
>cat test.s
...
        movl    val1@GOT(%ebx), %eax    <-- may be removed
        xorl    %ebp, %ebp
        movl    %eax, 4(%esp)           <-- may be removed
        movl    val2@GOT(%ebx), %eax    <-- may be removed
        movl    %eax, 8(%esp)           <-- may be removed
        movl    val3@GOT(%ebx), %eax    <-- may be removed
        movl    %eax, 12(%esp)          <-- may be removed
.L3:
        movl    8(%esp), %eax           <-- equal to  movl val2@GOT(%ebx), %eax
        subl    $12, %esp
        movl    (%eax), %ecx
        movl    16(%esp), %eax          <-- equal to  movl val1@GOT(%ebx), %eax
        movl    (%ecx,%ebp,4), %ecx
        addl    %ecx, (%eax)
        addl    $1, %ebp
        movl    24(%esp), %eax          <-- equal to  movl val3@GOT(%ebx), %eax
        pushl   (%eax)
        call    f@PLT
        addl    $16, %esp
        cmpl    %ebp, 32(%esp)
        jne     .L3
...

Also, storing the value on the stack doesn't benefit from the static-object optimization performed by the linker, which transforms the movl from the GOT into an lea instruction. It would be useful to avoid the early address computation in case the PIC register is available at the address use. Here is the code generated by GCC 4.9:

        xorl    %ebp, %ebp
.L2:
        movl    val2@GOT(%ebx), %eax
        subl    $12, %esp
        movl    (%eax), %ecx
        movl    val1@GOT(%ebx), %eax
        movl    (%ecx,%ebp,4), %ecx
        addl    %ecx, (%eax)
        addl    $1, %ebp
        movl    val3@GOT(%ebx), %eax
        pushl   (%eax)
        call    f@PLT
        addl    $16, %esp
        cmpl    16(%esp), %ebp
        jne     .L2

Used gcc (GCC) 5.0.0 20150205 (experimental).
[Bug tree-optimization/65002] [5 Regression] ICE: Segmentation fault
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65002 --- Comment #6 from Ilya Enkovich --- Is this actually an ICE on valid code? The 'const' attribute seems incorrect here, similar to what we had in PR64353. The problem comes from the SSA inconsistency caused by the wrong attribute. Probably we should just ignore such cases in SRA, as was previously proposed for PR64353? Here is a possible patch (the SSA update at the start of fixup_cfg may then be removed):

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index ad9584e..7f78e68 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -4890,6 +4890,20 @@ some_callers_have_mismatched_arguments_p (struct cgraph_node *node,
   return false;
 }
 
+/* Return false if all callers have vuse attached to a call statement.  */
+
+static bool
+some_callers_have_no_vuse_p (struct cgraph_node *node,
+			     void *data ATTRIBUTE_UNUSED)
+{
+  struct cgraph_edge *cs;
+  for (cs = node->callers; cs; cs = cs->next_caller)
+    if (!cs->call_stmt || !gimple_vuse (cs->call_stmt))
+      return true;
+
+  return false;
+}
+
 /* Convert all callers of NODE.  */
 
 static bool
@@ -5116,6 +5130,15 @@ ipa_early_sra (void)
       goto simple_out;
     }
 
+  if (node->call_for_symbol_thunks_and_aliases
+      (some_callers_have_no_vuse_p, NULL, true))
+    {
+      if (dump_file)
+	fprintf (dump_file, "There are callers with no VUSE attached "
+		 "to a call stmt.\n");
+      goto simple_out;
+    }
+
   bb_dereferences = XCNEWVEC (HOST_WIDE_INT,
			      func_param_count
			      * last_basic_block_for_fn (cfun));
[Bug rtl-optimization/64317] [5 Regression] Ineffective allocation of PIC base register
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64317 --- Comment #11 from Ilya Enkovich ---
(In reply to Vladimir Makarov from comment #10)
> I guess it is easy to check by preventing pic pseudo generation.

The i386 back-end doesn't support a fixed PIC register any more. This test case demonstrates a performance regression in some EEMBC 1.1 tests caused by the pseudo PIC register introduction. It is unclear why RA decided to spill the PIC register. If we look at the loop's code, the PIC register is used in every line of code and seems to be the most used register. It also seems weird to me that the code for the first loop becomes much better (with no PIC reg fills) if we restrict inlining for the other one. How does the second loop affect allocation in the first one?
[Bug target/65044] ICE: SIGSEGV in contains_struct_check with -fsanitize=address -fcheck-pointer-bounds
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65044 --- Comment #1 from Ilya Enkovich --- The ICE occurs due to a NULL field attached to a constructor element used for the initialization of an internal asan structure. Overall, I don't think we should allow simultaneous usage of the Pointer Bounds Checker and Address Sanitizer. It was never investigated how they may conflict. There should be at least a problem with static objects: each instrumentation creates static objects to describe the existing ones, the newly created objects are then also described by each other, etc. I will prepare a patch to prevent checker usage with sanitizers.
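A rough sketch of the kind of option check meant here (the exact flag variables and the place to put it, e.g. somewhere in option processing, are my assumptions, not the actual committed fix):

  /* Reject the unsupported combination instead of silently miscompiling.  */
  if (flag_check_pointer_bounds && (flag_sanitize & SANITIZE_ADDRESS))
    {
      error ("-fcheck-pointer-bounds is not supported with Address Sanitizer");
      flag_check_pointer_bounds = 0;
    }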
[Bug target/65103] New: [i386] GOTOFF relocation is not propagated into address expression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65103
Bug ID: 65103  Summary: [i386] GOTOFF relocation is not propagated into address expression
Product: gcc  Version: 5.0  Status: UNCONFIRMED  Severity: normal  Priority: P3  Component: target
Assignee: unassigned at gcc dot gnu.org  Reporter: enkovich.gnu at gmail dot com

In PIC code there are multiple cases when a GOTOFF relocation is put into a register and then used in an address expression, instead of using the relocation directly in the address expression. Here is an example:

>cat test.c
typedef struct S
{
  int a;
  int sum;
  int delta;
} S;

S gs;

int global_opt (int max)
{
  while (gs.sum < max)
    gs.sum += gs.delta;
  return gs.a;
}
>gcc test.c -m32 -O2 -fPIE -S
>cat test.s
...
        pushl   %esi
        leal    gs@GOTOFF, %esi
        pushl   %ebx
        call    __x86.get_pc_thunk.bx
        addl    $_GLOBAL_OFFSET_TABLE_, %ebx
        movl    12(%esp), %edx
        movl    4(%esi,%ebx), %eax
        cmpl    %eax, %edx
        jle     .L4
        movl    8(%esi,%ebx), %ecx
.L3:
        addl    %ecx, %eax
        cmpl    %eax, %edx
        jg      .L3
        movl    %eax, 4(%esi,%ebx)
.L4:
        movl    gs@GOTOFF(%ebx), %eax
        popl    %ebx
        popl    %esi
        ret

A separate instruction to get gs@GOTOFF is generated at expand time. Later fwprop propagates this constant only into memory references with a zero offset and leaves the register usage in all the others.

Used compiler:
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc/configure --enable-languages=c,c++,fortran --disable-bootstrap --prefix=/export/users/ienkovic/ --disable-libsanitizer
Thread model: posix
gcc version 5.0.0 20150217 (experimental) (GCC)
[Bug target/65105] New: [i386] XMM registers are not used for 64bit computations on 32bit target
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65105
Bug ID: 65105  Summary: [i386] XMM registers are not used for 64bit computations on 32bit target
Product: gcc  Version: 5.0  Status: UNCONFIRMED  Severity: normal  Priority: P3  Component: target
Assignee: unassigned at gcc dot gnu.org  Reporter: enkovich.gnu at gmail dot com

XMM registers may be used for 64-bit operations on a 32-bit target. It should make code faster and free some GPRs. Here is an example test where GCC doesn't use XMM registers, and possible code with XMM usage:

>cat test.c
long long test1 (long long x, long long y, long long z)
{
  return ((x | z ) + (y & z) - z);
}
>cat test_xmm.s
	.file	"test.c"
	.text
	.globl	test1
test1:
	movq	4(%esp), %xmm2
	movq	20(%esp), %xmm1
	movq	12(%esp), %xmm0
	por	%xmm1, %xmm2
	pand	%xmm1, %xmm0
	paddq	%xmm0, %xmm2
	psubq	%xmm1, %xmm2
	movd	%xmm2, %eax
	psrlq	$32, %xmm2
	movd	%xmm2, %edx
	ret
[Bug target/65105] [i386] XMM registers are not used for 64bit computations on 32bit target
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65105 --- Comment #2 from Ilya Enkovich --- For this test I see the 'plus' and 'minus' ops have DI mode until RA and get GPR pairs:

(insn 12 35 13 2 (parallel [
            (set (reg:DI 0 ax [orig:98 D.1945 ] [98])
                (plus:DI (reg:DI 0 ax [orig:97 D.1945 ] [97])
                    (reg:DI 2 cx [orig:96 D.1945 ] [96])))
            (clobber (reg:CC 17 flags))
        ]) test.c:4 215 {*adddi3_doubleword}
     (nil))

(insn 13 12 18 2 (parallel [
            (set (reg:DI 0 ax [orig:95 D.1945 ] [95])
                (minus:DI (reg:DI 0 ax [orig:98 D.1945 ] [98])
                    (reg/v:DI 4 si [orig:94 z ] [94])))
            (clobber (reg:CC 17 flags))
        ]) test.c:4 259 {*subdi3_doubleword}
     (nil))

The 'ior' and 'and' ops use SI mode and subregs starting from expand.
[Bug target/65167] ICE: in assign_by_spills, at lra-assigns.c:1383 (unable to find a register to spill) with -O -fschedule-insns -fcheck-pointer-bounds -mmpx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65167 --- Comment #1 from Ilya Enkovich --- For call arguments we usually store the bounds passed in bounds tables first and then fill the bounds passed in registers. But with -fschedule-insns the order is changed and all hard registers are filled with values before the BNDSTX instructions. This is not a nice schedule because it requires additional spills. It seems LRA fails to spill a register when all of them are used to pass args. This situation didn't happen before because bound registers are the first case where we use all registers to pass args. Should LRA be able to spill/fill an initialized hard reg? Can it be fixed, or should we rather avoid such scheduling?
[Bug target/65167] ICE: in assign_by_spills, at lra-assigns.c:1383 (unable to find a register to spill) with -O -fschedule-insns -fcheck-pointer-bounds -mmpx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65167 --- Comment #4 from Ilya Enkovich --- ix86_function_arg_regno_p doesn't recognize bnd registers as args. Also, avoid_func_arg_motion doesn't work for BNDSTX because it is not a single set. This patch works for the reproducer:

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 71a5b22..acbe25f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -6068,6 +6068,9 @@ ix86_function_arg_regno_p (int regno)
   int i;
   const int *parm_regs;
 
+  if (TARGET_MPX && BND_REGNO_P (regno))
+    return true;
+
   if (!TARGET_64BIT)
     {
       if (TARGET_MACHO)
@@ -26846,6 +26849,16 @@ avoid_func_arg_motion (rtx_insn *first_arg, rtx_insn *insn)
   rtx set;
   rtx tmp;
 
+  /* Add anti dependencies for bounds stores.  */
+  if (INSN_P (insn)
+      && GET_CODE (PATTERN (insn)) == PARALLEL
+      && GET_CODE (XVECEXP (PATTERN (insn), 0, 0)) == UNSPEC
+      && XINT (XVECEXP (PATTERN (insn), 0, 0), 1) == UNSPEC_BNDSTX)
+    {
+      add_dependence (first_arg, insn, REG_DEP_ANTI);
+      return;
+    }
+
   set = single_set (insn);
   if (!set)
     return;

I will run testing for it.
[Bug target/65167] ICE: in assign_by_spills, at lra-assigns.c:1383 (unable to find a register to spill) with -O -fschedule-insns -fcheck-pointer-bounds -mmpx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65167 --- Comment #6 from Ilya Enkovich ---
(In reply to Uroš Bizjak from comment #5)
> (In reply to Ilya Enkovich from comment #4)
>
> > +  if (TARGET_MPX && BND_REGNO_P (regno))
>
> No need for TARGET_MPX check, there will be no bnd regs when this flag is
> cleared.

__builtin_apply_args stores all registers that might be used to pass arguments to a function. Without the target check it would always try to store bounds even when there are no instructions to do that.
[Bug lto/63555] New: ICE compiling simple test with SDB debugging info
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63555
Bug ID: 63555  Summary: ICE compiling simple test with SDB debugging info
Product: gcc  Version: 5.0  Status: UNCONFIRMED  Severity: normal  Priority: P3  Component: lto
Assignee: unassigned at gcc dot gnu.org  Reporter: enkovich.gnu at gmail dot com

I see an ICE when I try to compile a small test with -gcoff. The problem appears when we have a structure field and a static variable with the same name. Here is a reproducer:

typedef struct
{
  int *next;
} list;

int *next;

int main(int argc, char **argv)
{
  return 0;
}

>gcc -m64 -c -gcoff short.c
cc1: internal compiler error: in needed_p, at cgraphunit.c:237
0x7c1a8c symtab_node::needed_p()
        ../../gcc-pl/gcc/cgraphunit.c:236
0x7c3933 analyze_functions
        ../../gcc-pl/gcc/cgraphunit.c:936
0x7c7579 symbol_table::finalize_compilation_unit()
        ../../gcc-pl/gcc/cgraphunit.c:2288
0x627b77 c_write_global_declarations()
        ../../gcc-pl/gcc/c/c-decl.c:10431
Please submit a full bug report, with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.

Here is the failing assert:

  /* Double check that no one output the function into assembly file
     early.  */
  gcc_checking_assert (!DECL_ASSEMBLER_NAME_SET_P (decl)
		       || !TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (decl)));

During file parsing we have a call to sdbout_symbol for the structure type. It causes output of its field, and the field's name is marked as referenced. Later the variable analysis hits the assert because the variable's assembler name is shared with the structure field.
[Bug ipa/63664] New: ipa-icf pass fails with segfault
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63664
Bug ID: 63664  Summary: ipa-icf pass fails with segfault
Product: gcc  Version: 5.0  Status: UNCONFIRMED  Severity: normal  Priority: P3  Component: ipa
Assignee: unassigned at gcc dot gnu.org  Reporter: enkovich.gnu at gmail dot com

Created attachment 33825
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33825&action=edit
Reproducer

There is a segfault in the ipa-icf pass.

>g++ test.cpp -O2 -c
test.cpp:40:1: internal compiler error: Segmentation fault
 }
 ^
0xe87262 crash_signal
        ../../gcc-pl-ref/gcc/toplev.c:349
0x17000ad ipa_icf_gimple::func_checker::compatible_types_p(tree_node*, tree_node*, bool, bool)
        ../../gcc-pl-ref/gcc/ipa-icf-gimple.c:172
0x170035c ipa_icf_gimple::func_checker::compare_operand(tree_node*, tree_node*)
        ../../gcc-pl-ref/gcc/ipa-icf-gimple.c:220
0x17020a2 ipa_icf_gimple::func_checker::compare_tree_ssa_label(tree_node*, tree_node*)
        ../../gcc-pl-ref/gcc/ipa-icf-gimple.c:737
0x1702187 ipa_icf_gimple::func_checker::compare_gimple_label(gimple_statement_base*, gimple_statement_base*)
        ../../gcc-pl-ref/gcc/ipa-icf-gimple.c:755
0x1701c29 ipa_icf_gimple::func_checker::compare_bb(ipa_icf_gimple::sem_bb*, ipa_icf_gimple::sem_bb*)
        ../../gcc-pl-ref/gcc/ipa-icf-gimple.c:604
0x16f463b ipa_icf::sem_function::equals_private(ipa_icf::sem_item*, hash_map&)
        ../../gcc-pl-ref/gcc/ipa-icf.c:455
0x16f3e74 ipa_icf::sem_function::equals(ipa_icf::sem_item*, hash_map&)
        ../../gcc-pl-ref/gcc/ipa-icf.c:355
0x16f8687 ipa_icf::sem_item_optimizer::subdivide_classes_by_equality(bool)
        ../../gcc-pl-ref/gcc/ipa-icf.c:1771
0x16f7d93 ipa_icf::sem_item_optimizer::execute()
        ../../gcc-pl-ref/gcc/ipa-icf.c:1590
0x16fa221 ipa_icf_driver
        ../../gcc-pl-ref/gcc/ipa-icf.c:2320
0x16fa736 ipa_icf::pass_ipa_icf::execute(function*)
        ../../gcc-pl-ref/gcc/ipa-icf.c:2368
Please submit a full bug report, with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.

>g++ -v
Using built-in specs.
COLLECT_GCC=../../../gcc-ref-build/bin/g++
COLLECT_LTO_WRAPPER=/export/users/ienkovic/gcc-ref-build/libexec/gcc/x86_64-unknown-linux-gnu/5.0.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-ref/configure --prefix=/export/users/ienkovic/gcc-ref-build --enable-languages=c,c++,fortran --disable-bootstrap
Thread model: posix
gcc version 5.0.0 20141024 (experimental) (GCC)

The problem is that the compared labels have null types and ipa_icf_gimple::func_checker::compatible_types_p doesn't check for null types. Possible patch:

diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c
index 1369b74..afc0eeb 100644
--- a/gcc/ipa-icf-gimple.c
+++ b/gcc/ipa-icf-gimple.c
@@ -169,6 +169,11 @@ bool
 func_checker::compatible_types_p (tree t1, tree t2, bool compare_polymorphic,
				   bool first_argument)
 {
+  if (!t1 && !t2)
+    return true;
+  else if (!t1 || !t2)
+    return false;
+
   if (TREE_CODE (t1) != TREE_CODE (t2))
     return return_false_with_msg ("different tree types");

If we don't want labels to have a null type, then start_preparsed_function (decl.c:13607) has to be fixed.
[Bug rtl-optimization/63620] RELOAD lost SET_GOT dependency on Darwin
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63620 Ilya Enkovich changed: What|Removed |Added CC||enkovich.gnu at gmail dot com --- Comment #13 from Ilya Enkovich --- Created attachment 33841 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33841&action=edit Reproducer for Linux
[Bug middle-end/63766] [5 Regression] ICE: in gimple_predict_edge, at predict.c:578
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63766 --- Comment #2 from Ilya Enkovich --- The problem is caused by the fact that now all functions come to the local optimizations in SSA form. That affects the inline parameters computation and therefore the inlining order. During early SRA we call convert_callers_for_node, which recomputes inline parameters for functions in SSA form. Previously they were computed only for functions already processed by all early local passes. Now all functions are in SSA form, which means we may recompute inline parameters for a function not yet processed by the local optimizations. In this test we have a function marked as inlinable which is not yet processed in do_per_function_toporder called for local_optimization_passes. That allows this function to be inlined and removed as unreachable before it is actually processed (while it still sits in the order vector). Another cgraph_node created by SRA is allocated at the same slot as the removed one, and thus the same function is processed twice, which causes the ICE in the profiling pass. A solution here would be either to use another condition for the inline_parameters recomputation or to handle node removal in do_per_function_toporder by registering a proper node removal hook. I suppose the latter is better because it allows more early inlining. Here is a possible fix (works for the reproducer, not fully tested):

diff --git a/gcc/passes.c b/gcc/passes.c
index 5e91a79..4799efa 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -1609,6 +1609,19 @@ do_per_function (void (*callback) (function *, void *data), void *data)
 static int nnodes;
 static GTY ((length ("nnodes"))) cgraph_node **order;
 
+static void
+remove_cgraph_node_from_order (cgraph_node *node, void *)
+{
+  int i;
+
+  for (i = 0; i < nnodes; i++)
+    if (order[i] == node)
+      {
+	order[i] = NULL;
+	return;
+      }
+}
+
 /* If we are in IPA mode (i.e., current_function_decl is NULL), call
    function CALLBACK for every function in the call graph.  Otherwise,
    call CALLBACK on the current function.
@@ -1622,13 +1635,20 @@ do_per_function_toporder (void (*callback) (function *, void *data), void *data)
     callback (cfun, data);
   else
     {
+      cgraph_node_hook_list *hook;
       gcc_assert (!order);
       order = ggc_vec_alloc (symtab->cgraph_count);
       nnodes = ipa_reverse_postorder (order);
       for (i = nnodes - 1; i >= 0; i--)
	 order[i]->process = 1;
+      hook = symtab->add_cgraph_removal_hook (remove_cgraph_node_from_order,
+					      NULL);
       for (i = nnodes - 1; i >= 0; i--)
	 {
+	   /* Function could be inlined and removed as unreachable.  */
+	   if (!order[i])
+	     continue;
+
	   struct cgraph_node *node = order[i];
 
	   /* Allow possibly removed nodes to be garbage collected.  */
@@ -1637,6 +1657,7 @@ do_per_function_toporder (void (*callback) (function *, void *data), void *data)
	   if (node->has_gimple_body_p ())
	     callback (DECL_STRUCT_FUNCTION (node->decl), data);
	 }
+      symtab->remove_cgraph_removal_hook (hook);
     }
   ggc_free (order);
   order = NULL;
[Bug middle-end/63766] [5 Regression] ICE: in gimple_predict_edge, at predict.c:578
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63766
Ilya Enkovich changed:
           What    |Removed     |Added
                 CC|            |enkovich.gnu at gmail dot com

--- Comment #4 from Ilya Enkovich ---
(In reply to Richard Biener from comment #3)
> That's quadratic in the number of nodes and thus a no-go. Why not delay
> removing of unreachable nodes instead? If you go with the above then
> you need to change that data-structure used.

Delaying the removal of unreachable nodes would mean we perform all early optimization passes on nodes we will later remove. That should be much more expensive than having a hook iterating over the nodes vector. I may also add an order_idx field into the cgraph_node structure or create a local hash to map nodes to indexes (see the sketch below).
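A rough sketch of the local-hash alternative (untested and mine, not from the PR; it assumes GCC's hash_map API and would live next to do_per_function_toporder, with the map reaching the removal hook through its DATA argument):

  /* Build the node -> index map once, right after ipa_reverse_postorder,
     so the removal hook can clear a slot in O(1) instead of scanning ORDER.  */
  hash_map<cgraph_node *, int> order_index;
  for (i = nnodes - 1; i >= 0; i--)
    order_index.put (order[i], i);

  /* In the removal hook body:  */
  if (int *idx = order_index.get (node))
    order[*idx] = NULL;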
[Bug middle-end/63766] [5 Regression] ICE: in gimple_predict_edge, at predict.c:578
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63766 --- Comment #5 from Ilya Enkovich --- I forgot to mention the PR in the ChangeLog. The patch is in trunk: https://gcc.gnu.org/ml/gcc-cvs/2014-11/msg00707.html
[Bug other/63992] fcheck-pointer-bounds and friends are undocumented
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63992 --- Comment #1 from Ilya Enkovich --- It is part of an already approved patch (https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02317.html) which is waiting for the MPX runtime to be approved.
[Bug middle-end/63994] Ada bootstrap fails with -fcheck-pointer-bounds -mmpx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63994 --- Comment #1 from Ilya Enkovich --- What does "bootstrap with -fcheck-pointer-bounds -mmpx" mean? Are there any instructions on how to reproduce?
[Bug bootstrap/63995] Bootstrap error with -mmpx -fcheck-pointer-bounds
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63995 --- Comment #1 from Ilya Enkovich --- Created attachment 34052 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34052&action=edit reproducer
[Bug bootstrap/63995] Bootstrap error with -mmpx -fcheck-pointer-bounds
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63995 --- Comment #2 from Ilya Enkovich --- I had a successful bootstrap with instrumentation some time ago, but it's not performed regularly. We are extending regression testing for instrumentation now, and coverage should become better. This particular problem may be caused by multiple varpool_nodes for the same var. I will check it.
[Bug target/64056] [5 Regression] gcc.target/i386/chkp-strlen-4.c etc. FAIL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64056 Ilya Enkovich changed: What|Removed |Added CC||enkovich.gnu at gmail dot com --- Comment #1 from Ilya Enkovich --- I sent a patch (https://gcc.gnu.org/ml/gcc-patches/2014-11/msg03097.html) to add checks for mempcpy availability.
[Bug bootstrap/63995] Bootstrap error with -mmpx -fcheck-pointer-bounds
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63995 --- Comment #3 from Ilya Enkovich --- The patch removing the duplicated bounds symbols is in review. With this patch applied, bootstrap goes to the end, but there are lots of stage2/stage3 comparison errors. I looked into one of them, and the difference is caused by the '-gtoggle' option used for the stage2 build and not used for the stage3 build.
[Bug bootstrap/63995] Bootstrap error with -mmpx -fcheck-pointer-bounds
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63995 --- Comment #5 from Ilya Enkovich --- Created attachment 34112 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34112&action=edit -g0 problem reproducer
[Bug bootstrap/63995] Bootstrap error with -mmpx -fcheck-pointer-bounds
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63995 --- Comment #6 from Ilya Enkovich --- For the attached -g0 problem reproducer:

>gcc pr63995-2.c -c -O2 -mmpx -fcheck-pointer-bounds -g -o 1.o
>gcc pr63995-2.c -c -O2 -mmpx -fcheck-pointer-bounds -g0 -o 2.o
>objdump_pl -d 1.o >1.dump
>objdump_pl -d 2.o >2.dump
>diff 1.dump 2.dump
2c2
< 1.o:     file format elf64-x86-64
---
> 2.o:     file format elf64-x86-64
19,22c19,22
<   2b:	b8 03 00 00 00       	mov    $0x3,%eax
<   30:	f3 0f 1b 1c 07       	bndmk  (%rdi,%rax,1),%bnd3
<   35:	c7 44 24 10 ff ff ff	movl   $0x,0x10(%rsp)
<   3c:	ff
---
>   2b:	c7 44 24 10 ff ff ff	movl   $0x,0x10(%rsp)
>   32:	ff
>   33:	b8 03 00 00 00       	mov    $0x3,%eax
>   38:	f3 0f 1b 1c 07       	bndmk  (%rdi,%rax,1),%bnd3

The different instruction order is caused by a different order of GIMPLE statements after the chkpopt pass. I will prepare a fix for that.
[Bug bootstrap/63995] Bootstrap error with -mmpx -fcheck-pointer-bounds
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63995 --- Comment #7 from Ilya Enkovich --- In the chkpopt pass, calls to bndmk are moved down to their uses to decrease register pressure. Debug info introduces new uses and therefore affects the position where bndmk calls appear.

-g0 case:

  :
  r.field = -1;
  __bound_tmp.1_13 = __builtin_ia32_bndmk (&r, 4);
  test2.chkp (&r, __bound_tmp.1_13);

-g case:

  :
  # DEBUG c => &r
  __bound_tmp.1_13 = __builtin_ia32_bndmk (&r, 4);
  # DEBUG __chkp_bounds_of_c => NULL
  r.field = -1;
  test2.chkp (&r, __bound_tmp.1_13);

I will ignore debug statements when computing a new position for bounds loads/creation (BTW the debug statement seems to be damaged by gsi_move_before called for bndmk). Testing the following fix:

diff --git a/gcc/tree-chkp-opt.c b/gcc/tree-chkp-opt.c
index ff390d7..b8d5d0b 100644
--- a/gcc/tree-chkp-opt.c
+++ b/gcc/tree-chkp-opt.c
@@ -1175,7 +1175,9 @@ chkp_reduce_bounds_lifetime (void)
 
 	  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op)
 	    {
-	      if (dom_bb &&
+	      if (is_gimple_debug (use_stmt))
+		continue;
+	      else if (dom_bb &&
		  dominated_by_p (CDI_DOMINATORS,
				  dom_bb, gimple_bb (use_stmt)))
		{
[Bug bootstrap/63995] Bootstrap error with -mmpx -fcheck-pointer-bounds
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63995 --- Comment #8 from Ilya Enkovich --- With both patches applied, bootstrap is OK.
[Bug lto/64075] [5 Regression] ICE: in bp_pack_value, at data-streamer.h:106
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64075 --- Comment #4 from Ilya Enkovich ---
(In reply to H.J. Lu from comment #3)
> It was caused by r217655.

The problem was introduced earlier, when the function_code field in tree_function_decl was extended to 12 bits: the LTO streamers were not fixed accordingly. r217655 increased the BUILT_IN_COMPLEX_MUL_MIN value, which put it outside of 11 bits and revealed the problem. With this patch it compiles OK:

diff --git a/gcc/tree-streamer-in.c b/gcc/tree-streamer-in.c
index 99448dd..eb205ed 100644
--- a/gcc/tree-streamer-in.c
+++ b/gcc/tree-streamer-in.c
@@ -333,7 +333,7 @@ unpack_ts_function_decl_value_fields (struct bitpack_d *bp, tree expr)
   if (DECL_BUILT_IN_CLASS (expr) != NOT_BUILT_IN)
     {
       DECL_FUNCTION_CODE (expr) = (enum built_in_function) bp_unpack_value (bp,
-									    11);
+									    12);
       if (DECL_BUILT_IN_CLASS (expr) == BUILT_IN_NORMAL
	   && DECL_FUNCTION_CODE (expr) >= END_BUILTINS)
	 fatal_error ("machine independent builtin code out of range");
diff --git a/gcc/tree-streamer-out.c b/gcc/tree-streamer-out.c
index ad58b84..0d87cff 100644
--- a/gcc/tree-streamer-out.c
+++ b/gcc/tree-streamer-out.c
@@ -300,7 +300,7 @@ pack_ts_function_decl_value_fields (struct bitpack_d *bp, tree expr)
   bp_pack_value (bp, DECL_PURE_P (expr), 1);
   bp_pack_value (bp, DECL_LOOPING_CONST_OR_PURE_P (expr), 1);
   if (DECL_BUILT_IN_CLASS (expr) != NOT_BUILT_IN)
-    bp_pack_value (bp, DECL_FUNCTION_CODE (expr), 11);
+    bp_pack_value (bp, DECL_FUNCTION_CODE (expr), 12);
 }
[Bug middle-end/63994] Ada bootstrap fails with -fcheck-pointer-bounds -mmpx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63994 --- Comment #3 from Ilya Enkovich ---
(In reply to rguent...@suse.de from comment #2)
>
> TARGET_CFLAGS="-O2 -g -mmpx -fcheck-pointer-bounds" TARGET_CXXFLAGS="-O2
> -g -mmpx -fcheck-pointer-bounds" BOOT_CFLAGS="-O2 -g -mmpx
> -fcheck-pointer-bounds" /space/rguenther/src/svn/trunk/configure
> --enable-languages=all,obj-c++,ada,go
>
> make -j12 TARGET_CFLAGS="-O2 -g -mmpx -fcheck-pointer-bounds"
> TARGET_CXXFLAGS="-O2 -g -mmpx -fcheck-pointer-bounds" BOOT_CFLAGS="-O2 -g
> -mmpx -fcheck-pointer-bounds"

Building with these options, I see the Ada compiler is called with -fcheck-pointer-bounds. The option is in c-family/c.opt and shouldn't be passed to the Ada compiler. We should either not pass CFLAGS for Ada during the build or filter the language in the compiler.
[Bug middle-end/63994] Ada bootstrap fails with -fcheck-pointer-bounds -mmpx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63994 --- Comment #5 from Ilya Enkovich --- (In reply to rguent...@suse.de from comment #4)
> Any reason why non-C-family languages cannot use MPX?
>
> Richard.

There is no fundamental restriction. If someone wants to implement the Pointer Bounds Checker for some language, they need to define how programs in that language are instrumented and implement it in the compiler. Currently it is defined and implemented for the C family only.
[Bug lto/64075] [5 Regression] ICE: in bp_pack_value, at data-streamer.h:106
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64075 --- Comment #7 from Ilya Enkovich --- (In reply to Dmitry Gorbachev from comment #6)
> The patch works, thanks! But the committed test is incorrect, because the
> original, unpatched compiler, does not fail on it. It failed on functions
> __mulsc3, __muldc3, __mulxc3, __multc3, __divsc3, __divdc3, __divxc3, and
> __divtc3.

The committed test is what you attached as a reproducer, with the function renamed to 'test'. Why shouldn't it work? I used it to reproduce and debug the issue on today's trunk compiler.
[Bug middle-end/63994] Ada bootstrap fails with -fcheck-pointer-bounds -mmpx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63994 --- Comment #7 from Ilya Enkovich --- (In reply to rguent...@suse.de from comment #6)
> I see. I mainly wonder because of LTO which can combine TUs from C
> and Ada and because for example both Fortran and Ada define
> interoperability with C. All languages also share the common
> C runtime builtins.
>
> Richard.

It should be OK to mix instrumented and non-instrumented code. Instrumentation happens in early passes, before LTO streams out. Therefore we can compile a C file with '-fcheck-pointer-bounds -mmpx -flto -c', then compile a Fortran (or any other) file with '-c -flto', and finally pass the generated objects to LTO. It may be inconvenient to have to avoid '-fcheck-pointer-bounds' for non-C files when you work with mixed code. To handle that I may use langhooks and ignore '-fcheck-pointer-bounds' when it's not supported for the language being used.
[Bug target/64055] [5 regression] gnat.dg/derived_aggregate.adb FAILs on 32-bit i386
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64055 --- Comment #6 from Ilya Enkovich --- TREE_INT_CST_LOW (maxval) assumes an integer constant anyway, so we may use a simpler check. It fixes gnat.dg/derived_aggregate.adb.

diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c
index 0fb78cc..84886da 100644
--- a/gcc/tree-chkp.c
+++ b/gcc/tree-chkp.c
@@ -1568,7 +1568,9 @@ chkp_find_bound_slots_1 (const_tree type, bitmap have_bound,
       HOST_WIDE_INT esize = TREE_INT_CST_LOW (TYPE_SIZE (etype));
       unsigned HOST_WIDE_INT cur;

-      if (!maxval || integer_minus_onep (maxval))
+      if (!maxval
+	  || TREE_CODE (maxval) != INTEGER_CST
+	  || integer_minus_onep (maxval))
	return;

       for (cur = 0; cur <= TREE_INT_CST_LOW (maxval); cur++)
[Bug tree-optimization/64183] New: [5.0 Regression] Complete unroll doesn't happen for a while-loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64183 Bug ID: 64183 Summary: [5.0 Regression] Complete unroll doesn't happen for a while-loop Product: gcc Version: 5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: enkovich.gnu at gmail dot com

Created attachment 34189 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34189&action=edit Reproducer

There is a performance regression in DENMark after r218142. The regression happens because complete unrolling computes the maximum number of iterations for a while-loop in a different way. Reduced reproducer:

int bits;
unsigned int size;
int max_code;

void test ()
{
  int code = 0;

  while (code < max_code)
    code |= ((unsigned int) (size >> (--bits)));

  while (bits < (unsigned int)25)
    bits += 8;
}

Compilation string: gcc -std=c90 -m32 -O3 test.c -c -fdump-tree-cunroll-details

Dump before r218142:

Analyzing # of iterations of loop 2
  exit condition [(unsigned int) (prephitmp_33 + 8), + , 8] <= 24
  bounds on difference of bases: -4294967271 ... 24
...
Loop 2 iterates at most 4 times.

Dump after r218142:

Analyzing # of iterations of loop 2
  exit condition [(unsigned int) (prephitmp_36 + 8), + , 8] <= 24
  bounds on difference of bases: -4294967271 ... 24
...
Loop 2 iterates at most 536870911 times.

The while-loop condition has a signed/unsigned comparison, but I believe the original estimate of 4 iterations is correct.
[Bug tree-optimization/64183] [5.0 Regression] Complete unroll doesn't happen for a while-loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64183 --- Comment #2 from Ilya Enkovich --- (In reply to Richard Biener from comment #1)
> It works correctly for
>
> int bits;
>
> void
> test ()
> {
>   while (bits < (unsigned int)25)
>     bits += 8;
> }

Right. But the shift operator in the attached testcase somehow breaks it after r218142 adds a conversion to an unsigned type for the second shift operand.
[Bug target/64003] valgrind complains about get_attr_length_nobnd in insn-attrtab.c from i386.md
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64003 --- Comment #17 from Ilya Enkovich --- (In reply to Jorn Wolfgang Rennecke from comment #13)
> AFAICS, the length attribute was broken in r217125
> https://gcc.gnu.org/ml/gcc-cvs/2014-11/msg00133.html

If I understand the problem correctly, the root cause is the attempt to read the lengths of the following instructions while computing the length of a forward jump instruction. How come r217125 is to blame for that? It doesn't introduce such computations; it just renames the "length" attribute to "length_nobnd" for the mentioned jump patterns. Am I missing something here?
[Bug target/64003] valgrind complains about get_attr_length_nobnd in insn-attrtab.c from i386.md
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64003 --- Comment #21 from Ilya Enkovich --- (In reply to Jeffrey A. Law from comment #20)
> Ilya, it's the function call in this code I think:
>
> (cond [(eq_attr "length_nobnd" "!0")
>	 (plus (symbol_ref ("ix86_bnd_prefixed_insn_p (insn)"))
>	       (attr "length_nobnd"))
>
> You're calling out to ix86_bnd_prefixed_insn_p, and that's problematical for
> branch shortening if I'm understanding Joern's comments here and David's
> comments in the PA port correctly.

Then we have three problematic patterns, and the simplest way to handle it is to get rid of the ix86_bnd_prefixed_insn_p call in their length computation. I think the easiest way to do that is to have separate bnd and nobnd patterns for these instructions. The attached patch resolves the valgrind error for me. Is such an approach fine?
[Bug target/64003] valgrind complains about get_attr_length_nobnd in insn-attrtab.c from i386.md
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64003 --- Comment #22 from Ilya Enkovich --- Created attachment 34195 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34195&action=edit Proposed patch
[Bug target/64003] valgrind complains about get_attr_length_nobnd in insn-attrtab.c from i386.md
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64003 --- Comment #26 from Ilya Enkovich --- (In reply to rsand...@gcc.gnu.org from comment #25)
> If all you want to do is add 1 byte to the length to account for a prefix
> then it might be cleaner to use ADJUST_INSN_LENGTH. You could then keep
> the single nobnd patterns.

Currently the i386 target doesn't define ADJUST_INSN_LENGTH, so I prefer to keep it that way and have all length definitions explicit in the md file.
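For context, a minimal sketch of what the suggested (and here declined) alternative could look like; this is an assumption on my part, not the attached patch and not what was committed. ADJUST_INSN_LENGTH is the target macro that may adjust an insn's computed length in place after the normal length attribute has been evaluated:

/* Hypothetical i386.h fragment: add one byte for the BND (0xF2) prefix.
   This only inspects the insn itself, so it does not depend on the
   lengths of other insns during branch shortening.  */
#define ADJUST_INSN_LENGTH(INSN, LENGTH)	\
  do						\
    {						\
      if (ix86_bnd_prefixed_insn_p (INSN))	\
	(LENGTH) += 1;				\
    }						\
  while (0)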
[Bug bootstrap/63995] Bootstrap error with -mmpx -fcheck-pointer-bounds
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63995 --- Comment #12 from Ilya Enkovich --- As of r218506, bootstrap with BOOT_CFLAGS="-O2 -g -fcheck-pointer-bounds -mmpx" on x86_64-unknown-linux-gnu is OK.
[Bug tree-optimization/61734] New: Regression in ABS_EXPR recognition
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61734 Bug ID: 61734 Summary: Regression in ABS_EXPR recognition Product: gcc Version: 4.10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: enkovich.gnu at gmail dot com

Recently a performance regression occurred in tests heavily using ABS computation (observed on x86 and ARM targets). It is caused by missing ABS_EXPR recognition which results in sub-optimal code. Problem appeared after this commit:

commit 32ce9a5c4208411361402f60e672c4830da0bc8f
Author: ebotcazou
Date:   Tue May 27 19:54:46 2014 +

    * fold-const.c (fold_comparison): Clean up and extend X +- C1 CMP C2
    to X CMP C2 -+ C1 transformation to EQ_EXPR/NE_EXPR.
    Add X - Y CMP 0 to X CMP Y transformation.
    (fold_binary_loc) : Remove same transformations.

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@210979 138bc75d-0d04-0410-961f-82ee72b054a4

Here is a simple test (tested on linux-x86_64):

>cat test.i
unsigned long test (unsigned char a, unsigned char b, unsigned long sum)
{
  sum += ((a - b) > 0 ? (a - b) : -(a - b));
  return sum;
}

>gcc-exp-build/bin/gcc test.i -m32 -O2 -fdump-tree-gimple -c
>cat test.i.004t.gimple
test (unsigned char a, unsigned char b, long unsigned int sum)
{
  long unsigned int iftmp.0;
  int D.1720;
  int D.1721;
  int D.1724;
  int D.1726;
  long unsigned int D.1727;

  D.1720 = (int) a;
  D.1721 = (int) b;
  if (D.1720 > D.1721) goto ; else goto ;
  :
  D.1720 = (int) a;
  D.1721 = (int) b;
  D.1724 = D.1720 - D.1721;
  iftmp.0 = (long unsigned int) D.1724;
  goto ;
  :
  D.1721 = (int) b;
  D.1720 = (int) a;
  D.1726 = D.1721 - D.1720;
  iftmp.0 = (long unsigned int) D.1726;
  :
  sum = iftmp.0 + sum;
  D.1727 = sum;
  return D.1727;
}

With older compiler I have:

>gcc-ref-build/bin/gcc test.i -m32 -O2 -fdump-tree-gimple -c
>cat test.i.004t.gimple
test (unsigned char a, unsigned char b, long unsigned int sum)
{
  int D.1719;
  int D.1720;
  int D.1721;
  int D.1722;
  long unsigned int D.1723;
  long unsigned int D.1724;

  D.1719 = (int) a;
  D.1720 = (int) b;
  D.1721 = D.1719 - D.1720;
  D.1722 = ABS_EXPR ;
  D.1723 = (long unsigned int) D.1722;
  sum = D.1723 + sum;
  D.1724 = sum;
  return D.1724;
}

BTW both compilers generate ABS_EXPR when -O0 is used instead of -O2. Both compilers fail to generate ABS_EXPR when -m64 is used instead of -m32.
[Bug middle-end/61734] [4.10 Regression] Regression in ABS_EXPR recognition
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61734 --- Comment #10 from Ilya Enkovich --- Thanks for the fix! Is there any reason for ABS_EXPR detection not working on a 64-bit target for the same test? The only difference should be the size of the long type. How does it affect optimizations?
[Bug middle-end/61734] [4.10 Regression] Regression in ABS_EXPR recognition
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61734 --- Comment #12 from Ilya Enkovich --- Before your last fix, the 32-bit and 64-bit versions of .original looked similar except for the condition: we have (a - b > 0) for 64-bit and (a > b) for 32-bit.

64-bit version (before and after the patch):

{
  sum = ((int) a - (int) b > 0 ? (long unsigned int) ((int) a - (int) b) : (long unsigned int) ((int) b - (int) a)) + sum;
  return sum;
}

32-bit version (before the patch):

{
  sum = ((int) a > (int) b ? (long unsigned int) ((int) a - (int) b) : (long unsigned int) ((int) b - (int) a)) + sum;
  return sum;
}

It is not clear why such a difference exists, though.
[Bug lto/62034] New: ICE for big statically initialized arrays compiled with LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62034 Bug ID: 62034 Summary: ICE for big statically initialized arrays compiled with LTO Product: gcc Version: 4.10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: lto Assignee: unassigned at gcc dot gnu.org Reporter: enkovich.gnu at gmail dot com

Created attachment 33259 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33259&action=edit Reproducer

I get an ICE when I try to compile tests with a big amount of statically initialized data.

gcc --version
gcc (GCC) 4.10.0 20140806 (experimental)
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

gcc -flto test.c
gcc: internal compiler error: Segmentation fault (program lto1)
0x405c80 execute
	../../gcc-ref/gcc/gcc.c:2900
0x409fe9 do_spec_1
	../../gcc-ref/gcc/gcc.c:4704
0x40d475 process_brace_body
	../../gcc-ref/gcc/gcc.c:5987
0x40d2b1 handle_braces
	../../gcc-ref/gcc/gcc.c:5901
0x40bf9d do_spec_1
	../../gcc-ref/gcc/gcc.c:5358
0x40d475 process_brace_body
	../../gcc-ref/gcc/gcc.c:5987
0x40d2b1 handle_braces
	../../gcc-ref/gcc/gcc.c:5901
0x40bf9d do_spec_1
	../../gcc-ref/gcc/gcc.c:5358
0x40c38c do_spec_1
	../../gcc-ref/gcc/gcc.c:5473
0x40d475 process_brace_body
	../../gcc-ref/gcc/gcc.c:5987
0x40d2b1 handle_braces
	../../gcc-ref/gcc/gcc.c:5901
0x40bf9d do_spec_1
	../../gcc-ref/gcc/gcc.c:5358
0x409664 do_spec_2
	../../gcc-ref/gcc/gcc.c:4405
0x409582 do_spec(char const*)
	../../gcc-ref/gcc/gcc.c:4372
Please submit a full bug report, with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
lto-wrapper: fatal error: gcc-ref-build/bin/gcc returned 4 exit status
compilation terminated.
/usr/bin/ld: lto-wrapper failed
collect2: error: ld returned 1 exit status

The debugger shows that the problem appears when lto_input_tree tries to dig through a long run of SCC entries in the input stream. Each SCC entry pushes two new frames (lto_input_tree and lto_input_tree_1) onto the call stack. With many consecutive SCC entries the stack may grow too much (in my case the compiler segfaulted with ~600 000 entries on the call stack).

The attached test has a statically initialized array with a million elements. A bigger data set may be required to break the compiler if you use an increased stack size.

The problem appeared after this commit: https://gcc.gnu.org/ml/gcc-cvs/2014-07/msg00291.html

The following patch, removing the recursion, lets me compile my tests:

diff --git a/gcc/lto-streamer-in.c b/gcc/lto-streamer-in.c
index 698f926..25657da 100644
--- a/gcc/lto-streamer-in.c
+++ b/gcc/lto-streamer-in.c
@@ -1345,7 +1345,16 @@ lto_input_tree_1 (struct lto_input_block *ib, struct data_in *data_in,
 tree
 lto_input_tree (struct lto_input_block *ib, struct data_in *data_in)
 {
-  return lto_input_tree_1 (ib, data_in, streamer_read_record_start (ib), 0);
+  enum LTO_tags tag;
+
+  /* Skip SCC entries.  */
+  while ((tag = streamer_read_record_start (ib)) == LTO_tree_scc)
+    {
+      unsigned len, entry_len;
+      lto_input_scc (ib, data_in, &len, &entry_len);
+    }
+
+  return lto_input_tree_1 (ib, data_in, tag, 0);
 }

I have not fully tested this patch yet.
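For anyone without the attachment, a hypothetical generator for a test of this general shape is sketched below. The element type, the initializer values, and the exact size are assumptions based only on the description above; whether a given aggregate actually produces the long run of LTO_tree_scc records may depend on its exact contents.

/* Hypothetical generator: writes a C file with a large statically
   initialized array, similar in spirit to the attached reproducer.
   Usage: ./gen > test.c && gcc -flto test.c  */
#include <stdio.h>

int
main (void)
{
  int i;

  printf ("struct item { int a; int b; };\n");
  printf ("struct item arr[] = {\n");
  for (i = 0; i < 1000000; i++)
    printf ("  { %d, %d },\n", i, i + 1);
  printf ("};\n");
  printf ("int main (void) { return arr[0].a; }\n");
  return 0;
}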
[Bug middle-end/49959] New: ABS pattern is not recognized
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49959 Summary: ABS pattern is not recognized Product: gcc Version: 4.7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end AssignedTo: unassig...@gcc.gnu.org ReportedBy: enkovich@gmail.com

Created attachment 24900 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24900 Simple test where ABS pattern is not recognized

Here is an optimization opportunity for the ABS pattern recognizer, which does not catch all cases. Here is a simple test for ABS computation:

#define ABS(X) (((X)>0)?(X):-(X))

int test_abs(int *cur)
{
  unsigned long sad = 0;
  sad = ABS(cur[0]);
  return sad;
}

GIMPLE for this test is good (phase optimized):

test_abs (int * cur)
{
  int D.2783;
  int D.2782;

  :
  D.2782_3 = *cur_2(D);
  D.2783_4 = ABS_EXPR ;
  return D.2783_4;
}

Now make a minor change in the test:

#define ABS(X) (((X)>0)?(X):-(X))

int test_abs(int *cur)
{
  unsigned long sad = 0;
  sad += ABS(cur[0]);
  return sad;
}

GIMPLE becomes worse:

test_abs (int * cur)
{
  int D.2788;
  int D.2787;
  int D.2783;
  long unsigned int iftmp.0;

  :
  D.2783_4 = *cur_3(D);
  if (D.2783_4 > 0) goto ; else goto ;

  :
  iftmp.0_6 = (long unsigned int) D.2783_4;
  goto ;

  :
  D.2787_8 = -D.2783_4;
  iftmp.0_9 = (long unsigned int) D.2787_8;

  :
  # iftmp.0_1 = PHI
  D.2788_11 = (int) iftmp.0_1;
  return D.2788_11;
}

Compiler used for the tests:

Target: x86_64-unknown-linux-gnu
Configured with: ../gcc/configure --prefix=/export/gcc-build --enable-languages=c,c++,fortran
Thread model: posix
gcc version 4.7.0 20110707 (experimental) (GCC)
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O2' '-S' '-mtune=generic' '-march=x86-64'
[Bug rtl-optimization/50037] New: Unroll factor exceeds max trip count
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50037 Bug #: 50037 Summary: Unroll factor exceeds max trip count Classification: Unclassified Product: gcc Version: 4.7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization AssignedTo: unassig...@gcc.gnu.org ReportedBy: enkovich@gmail.com

Created attachment 24971 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24971 Reproducer

Here is a small loop which GCC unrolls inefficiently:

  for ( count = ((*(hdrptr)) & 0xf) * 2; count > 0; count--, addr++ )
    sum += *addr;

This loop has at most 30 iterations. If we use -O3 then this loop is vectorized; the resulting loop has at most 30 / 8 = 3 iterations. The vectorizer also generates prologue and epilogue loops, each with at most 7 iterations.

If we add the -funroll-loops option then each of the 3 loops generated by the vectorizer is unrolled with unroll factor 8. That creates a lot of code which is never executed and also decreases performance due to additional checks and branches.

Target: x86_64-unknown-linux-gnu
Configured with: ../gcc1/configure --prefix=/export/gcc-perf/install --enable-languages=c,c++,fortran
Thread model: posix
gcc version 4.7.0 20110615 (experimental) (GCC)
COLLECT_GCC_OPTIONS='-O3' '-funroll-loops' '-S' '-v' '-mtune=generic' '-march=x86-64'
/export/gcc-perf/install/libexec/gcc/x86_64-unknown-linux-gnu/4.7.0/cc1 -quiet -v unroll_test.c -quiet -dumpbase unroll_test.c -mtune=generic -march=x86-64 -auxbase unroll_test -O3 -version -funroll-loops -o unroll_test.s
GNU C (GCC) version 4.7.0 20110615 (experimental) (x86_64-unknown-linux-gnu)
	compiled by GNU C version 4.4.3, GMP version 4.3.1, MPFR version 2.4.2, MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
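A hypothetical self-contained version of the loop above is sketched here for convenience; the function name, parameter types, and surrounding code are assumptions, not the attached reproducer:

/* The trip count is derived from a 4-bit field times 2, so it can never
   exceed 30, yet each loop produced at -O3 -funroll-loops is still
   unrolled by a factor of 8.  */
unsigned int
checksum (const unsigned char *hdrptr, const unsigned char *addr)
{
  unsigned int sum = 0;
  int count;

  for (count = ((*(hdrptr)) & 0xf) * 2; count > 0; count--, addr++)
    sum += *addr;
  return sum;
}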
[Bug rtl-optimization/50037] Unroll factor exceeds max trip count
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50037 --- Comment #2 from Ilya Enkovich 2011-08-10 15:33:22 UTC --- I wouldn't blame the vectorizer here. The following loop is unrolled with unroll factor 8 even if the vectorizer is disabled:

  for ( count = ((*(hdrptr)) & 0x3) * 2; count > 0; count--, addr++ )
    sum += *addr;

BTW, prologue loops generated by the vectorizer also compute their iteration count using an 'AND' expression. Therefore we may frequently get prologue loops unrolled, which is never profitable with such a huge unroll factor.
[Bug rtl-optimization/50037] Unroll factor exceeds max trip count
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50037 --- Comment #8 from Ilya Enkovich 2011-08-15 09:06:18 UTC --- This patch did not work for me. Tried on the following loop (-O2 -funroll-loops):

  for ( count = ((*(hdrptr)) & 0x7); count > 0; count--, addr++ )
    sum += *addr;

There is no multiplication by 2, but we still get the same unrolling. I was also hoping this patch would prevent unrolling of the prologue loop generated by the vectorizer. It uses an '& 7' expression to compute its iteration count, but that loop also uses a MIN expression to limit the number of iterations and is still unrolled.
[Bug rtl-optimization/50088] New: movzbl is generated instead of movl
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50088 Bug #: 50088 Summary: movzbl is generated instead of movl Classification: Unclassified Product: gcc Version: 4.7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization AssignedTo: unassig...@gcc.gnu.org ReportedBy: enkovich@gmail.com

Created attachment 25016 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25016 Reproducer

When a spilled register is going to be used in a subreg expression, a short load is generated to fill the register. Example:

  movl %edx, 0x34(%esp)
  jz 0x1498
Block 34:
  movzxb 0x34(%esp), %ecx
  shl %cl, %eax

This is correct but may cause performance problems; I doubt there are situations where a zero-extended load is better than a natural one. On Atom processors (and probably some others) such situations cause stalls, because store forwarding does not work for a store/load pair using different access sizes. For example, EEMBC 2.0/huffde gets a ~6% performance improvement on Atom if we replace such a movzbl with movl.

The attached reproducer demonstrates fills performed via movzbl. Used compiler and options:

Target: x86_64-unknown-linux-gnu
Configured with: ../gcc1/configure --prefix=/export/users/gcc-perf/install --enable-languages=c,c++,fortran
Thread model: posix
gcc version 4.7.0 20110615 (experimental) (GCC)
COLLECT_GCC_OPTIONS='-O2' '-m32' '-S' '-v' '-mtune=generic' '-march=x86-64'
/export/users/gcc-perf/install/libexec/gcc/x86_64-unknown-linux-gnu/4.7.0/cc1 -quiet -v -imultilib 32 test_movzbl.c -quiet -dumpbase test_movzbl.c -m32 -mtune=generic -march=x86-64 -auxbase test_movzbl -O2 -version -o test_movzbl.s
GNU C (GCC) version 4.7.0 20110615 (experimental) (x86_64-unknown-linux-gnu)
	compiled by GNU C version 4.4.3, GMP version 4.3.1, MPFR version 2.4.2, MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
[Bug rtl-optimization/50088] movzbl is generated instead of movl
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50088 --- Comment #2 from Ilya Enkovich 2011-08-15 13:24:05 UTC --- Actually we do not need any zero extension here. The zero-extended load appears only after IRA, when we have to spill/fill the register. Here is the C code from the reproducer:

  n1 = (n1 + 1) & 15;
  s += arr[i] << n1;

RTL before IRA:

(insn 67 66 68 4 (parallel [
            (set (reg/v:SI 97 [ n1 ])
                (plus:SI (reg/v:SI 97 [ n1 ])
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) test_movzbl.c:18 249 {*addsi_1}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))

(insn 68 67 70 4 (parallel [
            (set (reg/v:SI 97 [ n1 ])
                (and:SI (reg/v:SI 97 [ n1 ])
                    (const_int 15 [0xf])))
            (clobber (reg:CC 17 flags))
        ]) test_movzbl.c:18 385 {*andsi_1}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))

(insn 70 68 71 4 (set (reg:SI 262)
        (mem:SI (reg:SI 224 [ ivtmp.52 ]) [2 MEM[base: D.2889_232, offset: 0B]+0 S4 A32])) test_movzbl.c:20 64 {*movsi_internal}
     (nil))

(insn 71 70 72 4 (parallel [
            (set (reg:SI 262)
                (ashift:SI (reg:SI 262)
                    (subreg:QI (reg/v:SI 97 [ n1 ]) 0)))
            (clobber (reg:CC 17 flags))
        ]) test_movzbl.c:20 502 {*ashlsi3_1}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (expr_list:REG_EQUAL (ashift:SI (mem:SI (reg:SI 224 [ ivtmp.52 ]) [2 MEM[base: D.2889_232, offset: 0B]+0 S4 A32])
                (subreg:QI (reg/v:SI 97 [ n1 ]) 0))
            (nil))))

IRA then introduces a fill for the shift instruction and uses a byte load for it:

(insn 155 70 71 4 (set (reg:QI 2 cx)
        (mem/c:QI (reg/f:SI 7 sp) [4 %sfp+-28 S1 A32])) test_movzbl.c:20 66 {*movqi_internal}
     (nil))

(insn 71 155 72 4 (parallel [
            (set (reg:SI 5 di [262])
                (ashift:SI (reg:SI 5 di [262])
                    (reg:QI 2 cx)))
            (clobber (reg:CC 17 flags))
        ]) test_movzbl.c:20 502 {*ashlsi3_1}
     (expr_list:REG_EQUAL (ashift:SI (mem:SI (reg:SI 0 ax [orig:224 ivtmp.52 ] [224]) [2 MEM[base: D.2889_232, offset: 0B]+0 S4 A32])
            (subreg:QI (mem/c:SI (reg/f:SI 7 sp) [4 %sfp+-28 S4 A32]) 0))
        (nil)))

The load for the shift is then emitted as movzbl.
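For completeness, a hypothetical sketch of the kind of loop the reproducer likely contains; only the two statements quoted above come from the report, while the function signature, types, and loop structure are assumptions, and whether n1 actually gets spilled depends on the surrounding register pressure:

/* Hypothetical reproducer shape (not the actual attachment): under
   -m32 -O2, register pressure can force n1 to be spilled as a 32-bit
   value and reloaded as an 8-bit shift count, producing the movl store /
   movzbl load pair discussed above.  */
unsigned int
test_movzbl (const unsigned int *arr, int len)
{
  unsigned int s = 0;
  unsigned int n1 = 0;
  int i;

  for (i = 0; i < len; i++)
    {
      n1 = (n1 + 1) & 15;
      s += arr[i] << n1;
    }
  return s;
}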