[PATCH] doc: discourage use of __attribute__((optimize())) in production code
Many developers are still using __attribute__((optimize())) in production code, although it quite broken. * doc/extend.texi (Common Function Attributes) [optimize]: Discourage use of the optimize attribute. diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 883d9b334ab5..a1359f5579ce 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -3021,10 +3021,8 @@ that affect more than one function. @xref{Function Specific Option Pragmas}, for details about the @samp{#pragma GCC optimize} pragma. -This can be used for instance to have frequently-executed functions -compiled with more aggressive optimization options that produce faster -and larger code, while other functions can be compiled with less -aggressive options. +This attribute should only be used for debugging purposes. It is not +suitable for production code. @item pure @cindex @code{pure} function attribute -- Markus
[PATCH] m68k testsuite TLC
'twas about to post a fix for an old m68k bug when I noticed the m68k target tests weren't clean. Two minor issues. First, 20090709-1.c. It's expecting a definition and two references to a string in the constant pool. However, one reference to that string was eliminated in favor of using an integer constant. This patch adjusts the expected output. Second, pr63347 includes , but doesn't really need it. Removed the #include and added "-w" to the options to keep it quiet about missing prototypes and such. Those two changes make m68k.exp run cleanly. Committing to the trunk. Jeff diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 9c37176..38d1dbe 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,9 @@ +2015-12-13 Jeff Law + + * gcc.target/m68k/pr63347.c: Remove #include add -w to + command line options. + * gcc.target/m68k/20090709-1.c: Adjust expected output. + 2015-12-12 David Edelsohn * g++.dg/opt/pr48549.C: XFAIL AIX. diff --git a/gcc/testsuite/gcc.target/m68k/20090709-1.c b/gcc/testsuite/gcc.target/m68k/20090709-1.c index fda05b7..ce835a1 100644 --- a/gcc/testsuite/gcc.target/m68k/20090709-1.c +++ b/gcc/testsuite/gcc.target/m68k/20090709-1.c @@ -1,11 +1,11 @@ /* { dg-do compile } */ /* There should be 3 occurrences of .LC0 in the code: one for the definition of "0", - one for use in test1() and one for use in test2(). + The occurrence in test1 is collapsed to an integer constant FIXME: At the moment m68k GCC does not optimize test1() to nop for some reason. */ -/* { dg-final { scan-assembler-times ".LC0" 3 } } */ +/* { dg-final { scan-assembler-times ".LC0" 2 } } */ void dummy(char *arg); diff --git a/gcc/testsuite/gcc.target/m68k/pr63347.c b/gcc/testsuite/gcc.target/m68k/pr63347.c index 1d23e9a..6396476 100644 --- a/gcc/testsuite/gcc.target/m68k/pr63347.c +++ b/gcc/testsuite/gcc.target/m68k/pr63347.c @@ -1,7 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -mcpu=5208" } */ - -#include +/* { dg-options "-O2 -mcpu=5208 -w" } */ void __attribute__ ((noinline)) oof()
[PATCH][PR target/19201] Peephole to improve clearing items in structure for m68k
Every release I try to dig through some old bug reports and clear them out of the system. This release is no different. I'm a bit surprised I didn't tackle bz19201 in prior releases given that Kazu had an almost correct patch attached to the BZ. I suspect this could be generalized to more than just clearing bytes. But I doubt it's worth any real effort as this is ultimately m68k specific. Essentially Kazu's patch rewrites a load+store sequence where the load feeds the store's address computation. He was missing a check to verify the that the target of the load was an address register. That in turn causes failures such as reported by Mikael Pettersson in c#15 when he tried including the patch in his builds. I was able to trigger the same kind of failure as Mikael using Kazu's patch and was able to confirm that adding the ADDRESS_REG_P check avoids those problems. I've got a bootstrap running, but it'll be, umm, a long time before it finishes (both for the control build and the build with the updated patch). The control build is into stage2 while the build with the patch is still building libraries for stage1. But it's well past the point where it failed with the incorrect initial patch. Installed on the trunk. jeff commit 42582e91448c6afcc162d95229e14aac23a950ed Author: Jeff Law Date: Sun Dec 13 06:05:03 2015 -0700 [PATCH][PR target/19201] Peephole to improve clearing items in structure for m68k * config/m68k/m68k.md (load feeding clear byte): New peephole2. * gcc.target/m68k/pr19201.c: New test. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index eee44a7..730c79f 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,7 @@ +2015-12-13 Kazu Kirata + + * config/m68k/m68k.md (load feeding clear byte): New peephole2. + 2015-12-13 Tom de Vries * tree-ssa-structalias.c (find_func_clobbers): Handle sizes and kinds diff --git a/gcc/config/m68k/m68k.md b/gcc/config/m68k/m68k.md index 1eaf58f..444515a 100644 --- a/gcc/config/m68k/m68k.md +++ b/gcc/config/m68k/m68k.md @@ -7601,3 +7601,36 @@ (include "cf.md") (include "sync.md") + +;; Convert +;; +;; move.l 4(%a0),%a0 +;; clr.b (%a0,%a1.l) +;; +;; into +;; +;; add.l 4(%a0),%a1 +;; clr.b (%a1) +;; +;; The latter is smaller. It is faster on all models except m68060. + +(define_peephole2 + [(set (match_operand:SI 0 "register_operand" "") + (mem:SI (plus:SI (match_operand:SI 1 "register_operand" "") +(match_operand:SI 2 "const_int_operand" "" + (set (mem:QI (plus:SI (match_operand:SI 3 "register_operand" "") +(match_operand:SI 4 "register_operand" ""))) + (const_int 0))] + "(optimize_size || !TUNE_68060) + && (operands[0] == operands[3] || operands[0] == operands[4]) + && ADDRESS_REG_P (operands[1]) + && ADDRESS_REG_P ((operands[0] == operands[3]) ? operands[4] : operands[3]) + && peep2_reg_dead_p (2, operands[3]) + && peep2_reg_dead_p (2, operands[4])" + [(set (match_dup 5) + (plus:SI (match_dup 5) +(mem:SI (plus:SI (match_dup 1) + (match_dup 2) + (set (mem:QI (match_dup 5)) + (const_int 0))] + "operands[5] = (operands[0] == operands[3]) ? operands[4] : operands[3];") diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 38d1dbe..3f0862e 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,5 +1,7 @@ 2015-12-13 Jeff Law + * gcc.target/m68k/pr19201.c: New test. + * gcc.target/m68k/pr63347.c: Remove #include add -w to command line options. * gcc.target/m68k/20090709-1.c: Adjust expected output. diff --git a/gcc/testsuite/gcc.target/m68k/pr19201.c b/gcc/testsuite/gcc.target/m68k/pr19201.c new file mode 100644 index 000..dd81857 --- /dev/null +++ b/gcc/testsuite/gcc.target/m68k/pr19201.c @@ -0,0 +1,13 @@ +/* { dg-options "-w -O2 -fomit-frame-pointer" } */ +/* { dg-final { scan-assembler-not "%a.,%\[ad\]..l" } } */ + +struct X { + char *a; + /* other members */ + int b; +}; + +void f (struct X *x) +{ + x->a[x->b] = 0; +}
Re: [PATCH, 4/16] Implement -foffload-alias
On 11/12/15 14:00, Richard Biener wrote: On Fri, 11 Dec 2015, Tom de Vries wrote: On 13/11/15 12:39, Jakub Jelinek wrote: We simply have some compiler internal interface between the caller and callee of the outlined regions, each interface in between those has its own structure type used to communicate the info; we can attach attributes on the fields, or some flags to indicate some properties interesting from aliasing POV. We don't really need to perform full IPA-PTA, perhaps it would be enough to a) record somewhere in cgraph the relationship in between such callers and callees (for offloading regions we already have "omp target entrypoint" attribute on the callee and a singler caller), tell LTO if possible not to split those into different partitions if easily possible, and then just for these pairs perform aliasing/points-to analysis in the caller and the result record using cliques/special attributes/whatever to the callee side, so that the callee (outlined OpenMP/OpenACC/Cilk+ region) can then improve its alias analysis. Hi, This work-in-progress patch allows me to use IPA PTA information in the kernels pass group. Since: - I'm running IPA PTA before ealias, and IPA PTA does not interpret restrict, and - compute_may_alias doesn't run if IPA PTA information is present I needed to convince ealias to do the restrict clique/base annotation. It would be more logical to fit IPA PTA after ealias, but one is an IPA pass, the other a regular one-function pass, so I would have to split the containing pass groups pass_all_early_optimizations and pass_local_optimization_passes. I'll give that a try now. I've tried this approach, but realized that this changes the order in which non-openacc functions are processed in the compiler, so I've abandoned this idea. Any comments? I don't think you want to run IPA PTA before early optimizations, it (and ealias) rely on some initial cleanup to do anything meaningful with well-spent ressources. The local PTA "hack" also looks more like a waste of resources, but well ... teaching IPA PTA to honor restrict might be an impossible task though I didn't think much about it other than handling it only for nonlocal_p functions (for others we should see all incoming args if IPA PTA works optimally). The restrict tags will leak all over the place of course and in the end no meaningful cliques may remain. This patch: - moves the kernels pass group to the first position in the pass list after ealias where we're back in ipa mode - inserts an new ipa pass to contain the gimple pass group called pass_oacc_ipa - inserts a version of ipa-pta before the pass group. Bootstrapped and reg-tested on x86_64. OK for stage3 trunk? Thanks, - Tom Add pass_oacc_ipa --- gcc/passes.def | 37 ++- gcc/testsuite/g++.dg/ipa/devirt-37.C| 10 gcc/testsuite/g++.dg/ipa/devirt-40.C| 4 +-- gcc/testsuite/g++.dg/tree-ssa/pr61034.C | 10 gcc/testsuite/gcc.dg/ipa/ipa-pta-13.c | 4 +-- gcc/testsuite/gcc.dg/ipa/ipa-pta-3.c| 4 +-- gcc/testsuite/gcc.dg/ipa/ipa-pta-4.c| 4 +-- gcc/tree-pass.h | 3 ++- gcc/tree-ssa-loop.c | 40 ++ gcc/tree-ssa-structalias.c | 44 + 10 files changed, 102 insertions(+), 58 deletions(-) diff --git a/gcc/passes.def b/gcc/passes.def index 43ce3d5..579dd63 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -88,24 +88,7 @@ along with GCC; see the file COPYING3. If not see /* pass_build_ealias is a dummy pass that ensures that we execute TODO_rebuild_alias at this point. */ NEXT_PASS (pass_build_ealias); - /* Pass group that runs when the function is an offloaded function - containing oacc kernels loops. Part 1. */ - NEXT_PASS (pass_oacc_kernels); - PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels) - NEXT_PASS (pass_ch); - POP_INSERT_PASSES () NEXT_PASS (pass_fre); - /* Pass group that runs when the function is an offloaded function - containing oacc kernels loops. Part 2. */ - NEXT_PASS (pass_oacc_kernels2); - PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels2) - /* We use pass_lim to rewrite in-memory iteration and reduction - variable accesses in loops into local variables accesses. */ - NEXT_PASS (pass_lim); - NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */); - NEXT_PASS (pass_dce); - NEXT_PASS (pass_expand_omp_ssa); - POP_INSERT_PASSES () NEXT_PASS (pass_merge_phi); NEXT_PASS (pass_dse); NEXT_PASS (pass_cd_dce); @@ -124,6 +107,26 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_rebuild_cgraph_edges); NEXT_PASS (pass_inline_parameters); POP_INSERT_PASSES () + + NEXT_PASS (pass_ipa_pta_oacc_kernels); + NEXT_PASS (pass_oacc_ipa); + PUSH_INSERT_PASSES_WITHIN (pass_oacc_ipa) + /* Pass group t
[PIING][PATCH, 9/16] Add pass_parallelize_loops_oacc_kernels
On 24/11/15 13:24, Tom de Vries wrote: On 16/11/15 12:59, Tom de Vries wrote: On 09/11/15 20:52, Tom de Vries wrote: On 09/11/15 16:35, Tom de Vries wrote: Hi, this patch series for stage1 trunk adds support to: - parallelize oacc kernels regions using parloops, and - map the loops onto the oacc gang dimension. The patch series contains these patches: 1Insert new exit block only when needed in transform_to_exit_first_loop_alt 2Make create_parallel_loop return void 3Ignore reduction clause on kernels directive 4Implement -foffload-alias 5Add in_oacc_kernels_region in struct loop 6Add pass_oacc_kernels 7Add pass_dominator_oacc_kernels 8Add pass_ch_oacc_kernels 9Add pass_parallelize_loops_oacc_kernels 10Add pass_oacc_kernels pass group in passes.def 11Update testcases after adding kernels pass group 12Handle acc loop directive 13Add c-c++-common/goacc/kernels-*.c 14Add gfortran.dg/goacc/kernels-*.f95 15Add libgomp.oacc-c-c++-common/kernels-*.c 16Add libgomp.oacc-fortran/kernels-*.f95 The first 9 patches are more or less independent, but patches 10-16 are intended to be committed at the same time. Bootstrapped and reg-tested on x86_64. Build and reg-tested with nvidia accelerator, in combination with a patch that enables accelerator testing (which is submitted at https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ). I'll post the individual patches in reply to this message. This patch adds pass_parallelize_loops_oacc_kernels. There's a number of things we do differently in parloops for oacc kernels: - in normal parloops, we generate code to choose between a parallel version of the loop, and a sequential (low iteration count) version. Since the code in oacc kernels region is supposed to run on the accelerator anyway, we skip this check, and don't add a low iteration count loop. - in normal parloops, we generate an #pragma omp parallel / GIMPLE_OMP_RETURN pair to delimit the region which will we split off into a thread function. Since the oacc kernels region is already split off, we don't add this pair. - we indicate the parallelization factor by setting the oacc function attributes - we generate an #pragma oacc loop instead of an #pragma omp for, and we add the gang clause - in normal parloops, we rewrite the variable accesses in the loop in terms into accesses relative to a thread function parameter. For the oacc kernels region, that rewrite has already been done at omp-lower, so we skip this. - we need to ensure that the entire kernels region can be run in parallel. The loop independence check is already present, so for oacc kernels we add a check between blocks outside the loop and the entire region. - we guard stores in the blocks outside the loop with gang_pos == 0. There's no need for each gang to write to a single location, we can do this in just one gang. (Typically this is the write of the final value of the iteration variable if that one is copied back to the host). Reposting with loop optimizer init added in pass_parallelize_loops_oacc_kernels::execute. Reposting with loop_optimizer_finalize,scev_initialize and scev_finalize added in pass_parallelize_loops_oacc_kernels::execute. Ping. Anything I can do to facilitate the review? Thanks, Tom
Re: [PATCH 6/6] [DJGPP] configure.ac: enable LTO
On 12/12/2015 09:40 AM, Andris Pavenis wrote: On 12/11/2015 12:32 AM, Jeff Law wrote: On 12/05/2015 10:25 AM, Andris Pavenis wrote: Patch enables LTO support for DJGPP in top level configure.ac Andris 2015-12-05 Andris Pavenis * configure.ac: enable LTO for *-*-msdosdjgpp OK once prereqs have gone in. Note you should to the autoconf dance to update the generated files. Mention them in your ChangeLog as * configure: Regenerated jeff Updated patch is in attachment. Andris PS. Somebody other should apply this and my other DJGPP related patches as I do not have SVN write access. I think the better solution is to get you write access :-) https://gcc.gnu.org/svnwrite.html You can list me as your sponsor. Cheers, Jeff
Re: [PATCH 6/6] [DJGPP] configure.ac: enable LTO
> You can list me as your sponsor. I've been wanting him to be a djgpp target/host maintainer for years anyway, so +1 from me :-)
Re: [PATCH] doc: discourage use of __attribute__((optimize())) in production code
Markus Trippelsdorf writes: > Many developers are still using __attribute__((optimize())) in > production code, although it quite broken. Wo reads documentation? @) If you want to discourage it better warn once at runtime. -Andi
[PATCH] rs6000: Fix a mistake in cstore_si_as_di (PR68865, PR68879)
convert_move does not know how to zero-extend a constant integer to the target mode -- simply because it does not know the target mode. As a result, 32-bit SImode with the high bit set would be effectively sign- extended instead of zero-extended. This patch fixes it. Is this okay for trunk? (bootstrap+regtest in progress, on powerpc64-linux). Segher 2015-12-14 Segher Boessenkool PR target/68865 PR target/68879 * rs6000.md (cstore_si_as_di): Force all operands into registers. --- gcc/config/rs6000/rs6000.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index e2dca1b..f962f04 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -10618,6 +10618,8 @@ (define_expand "cstore_si_as_di" int uns_flag = unsigned_comparison_operator (operands[1], VOIDmode) ? 1 : 0; enum rtx_code cond_code = signed_condition (GET_CODE (operands[1])); + operands[2] = force_reg (SImode, operands[2]); + operands[3] = force_reg (SImode, operands[3]); rtx op1 = gen_reg_rtx (DImode); rtx op2 = gen_reg_rtx (DImode); convert_move (op1, operands[2], uns_flag); -- 1.9.3
Re: [PATCH] rs6000: Fix a mistake in cstore_si_as_di (PR68865, PR68879)
On Mon, Dec 14, 2015 at 07:04:06AM +, Segher Boessenkool wrote: > convert_move does not know how to zero-extend a constant integer to the > target mode -- simply because it does not know the target mode. That last "target" should be "source", of course. Segher