[PATCH] doc: discourage use of __attribute__((optimize())) in production code

2015-12-13 Thread Markus Trippelsdorf
Many developers are still using __attribute__((optimize())) in
production code, although it quite broken. 

* doc/extend.texi (Common Function Attributes) [optimize]:
Discourage use of the optimize attribute.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 883d9b334ab5..a1359f5579ce 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3021,10 +3021,8 @@ that affect more than one function.
 @xref{Function Specific Option Pragmas}, for details about the
 @samp{#pragma GCC optimize} pragma.
 
-This can be used for instance to have frequently-executed functions
-compiled with more aggressive optimization options that produce faster
-and larger code, while other functions can be compiled with less
-aggressive options.
+This attribute should only be used for debugging purposes.  It is not
+suitable for production code.
 
 @item pure
 @cindex @code{pure} function attribute
-- 
Markus


[PATCH] m68k testsuite TLC

2015-12-13 Thread Jeff Law


'twas about to post a fix for an old m68k bug when I noticed the m68k 
target tests weren't clean.  Two minor issues.


First, 20090709-1.c.  It's expecting a definition and two references to 
a string in the constant pool.  However, one reference to that string 
was eliminated in favor of using an integer constant.  This patch 
adjusts the expected output.


Second, pr63347 includes , but doesn't really need it.  Removed 
the #include and added "-w" to the options to keep it quiet about 
missing prototypes and such.


Those two changes make m68k.exp run cleanly.  Committing to the trunk.

Jeff
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 9c37176..38d1dbe 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,9 @@
+2015-12-13  Jeff Law  
+
+   * gcc.target/m68k/pr63347.c: Remove #include  add -w to
+   command line options.
+   * gcc.target/m68k/20090709-1.c: Adjust expected output.
+
 2015-12-12  David Edelsohn  
 
* g++.dg/opt/pr48549.C: XFAIL AIX.
diff --git a/gcc/testsuite/gcc.target/m68k/20090709-1.c 
b/gcc/testsuite/gcc.target/m68k/20090709-1.c
index fda05b7..ce835a1 100644
--- a/gcc/testsuite/gcc.target/m68k/20090709-1.c
+++ b/gcc/testsuite/gcc.target/m68k/20090709-1.c
@@ -1,11 +1,11 @@
 /* { dg-do compile } */
 /* There should be 3 occurrences of .LC0 in the code:
one for the definition of "0",
-   one for use in test1() and
one for use in test2().
+   The occurrence in test1 is collapsed to an integer constant
FIXME: At the moment m68k GCC does not optimize test1() to nop
for some reason.  */
-/* { dg-final { scan-assembler-times ".LC0" 3 } } */
+/* { dg-final { scan-assembler-times ".LC0" 2 } } */
 
 void dummy(char *arg);
 
diff --git a/gcc/testsuite/gcc.target/m68k/pr63347.c 
b/gcc/testsuite/gcc.target/m68k/pr63347.c
index 1d23e9a..6396476 100644
--- a/gcc/testsuite/gcc.target/m68k/pr63347.c
+++ b/gcc/testsuite/gcc.target/m68k/pr63347.c
@@ -1,7 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mcpu=5208" } */
-
-#include 
+/* { dg-options "-O2 -mcpu=5208 -w" } */
 
 void __attribute__ ((noinline))
 oof()


[PATCH][PR target/19201] Peephole to improve clearing items in structure for m68k

2015-12-13 Thread Jeff Law


Every release I try to dig through some old bug reports and clear them 
out of the system.  This release is no different.


I'm a bit surprised I didn't tackle bz19201 in prior releases given that 
Kazu had an almost correct patch attached to the BZ.


I suspect this could be generalized to more than just clearing bytes. 
But I doubt it's worth any real effort as this is ultimately m68k specific.


Essentially Kazu's patch rewrites a load+store sequence where the load 
feeds the store's address computation.  He was missing a check to verify 
the that the target of the load was an address register.  That in turn 
causes failures such as reported by Mikael Pettersson in c#15 when he 
tried including the patch in his builds.


I was able to trigger the same kind of failure as Mikael using Kazu's 
patch and was able to confirm that adding the ADDRESS_REG_P check avoids 
those problems.


I've got a bootstrap running, but it'll be, umm, a long time before it 
finishes (both for the control build and the build with the updated 
patch).  The control build is into stage2 while the build with the patch 
is still building libraries for stage1.  But it's well past the point 
where it failed with the incorrect initial patch.


Installed on the trunk.

jeff



commit 42582e91448c6afcc162d95229e14aac23a950ed
Author: Jeff Law 
Date:   Sun Dec 13 06:05:03 2015 -0700

[PATCH][PR target/19201] Peephole to improve clearing items in structure 
for m68k
* config/m68k/m68k.md (load feeding clear byte): New peephole2.

* gcc.target/m68k/pr19201.c: New test.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index eee44a7..730c79f 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2015-12-13  Kazu Kirata  
+
+   * config/m68k/m68k.md (load feeding clear byte): New peephole2.
+
 2015-12-13  Tom de Vries  
 
* tree-ssa-structalias.c (find_func_clobbers): Handle sizes and kinds
diff --git a/gcc/config/m68k/m68k.md b/gcc/config/m68k/m68k.md
index 1eaf58f..444515a 100644
--- a/gcc/config/m68k/m68k.md
+++ b/gcc/config/m68k/m68k.md
@@ -7601,3 +7601,36 @@
 
 (include "cf.md")
 (include "sync.md")
+
+;; Convert
+;;
+;; move.l 4(%a0),%a0
+;; clr.b (%a0,%a1.l)
+;;
+;; into
+;;
+;; add.l 4(%a0),%a1
+;; clr.b (%a1)
+;;
+;; The latter is smaller.  It is faster on all models except m68060.
+
+(define_peephole2
+  [(set (match_operand:SI 0 "register_operand" "")
+   (mem:SI (plus:SI (match_operand:SI 1 "register_operand" "")
+(match_operand:SI 2 "const_int_operand" ""
+   (set (mem:QI (plus:SI (match_operand:SI 3 "register_operand" "")
+(match_operand:SI 4 "register_operand" "")))
+   (const_int 0))]
+  "(optimize_size || !TUNE_68060)
+   && (operands[0] == operands[3] || operands[0] == operands[4])
+   && ADDRESS_REG_P (operands[1])
+   && ADDRESS_REG_P ((operands[0] == operands[3]) ? operands[4] : operands[3])
+   && peep2_reg_dead_p (2, operands[3])
+   && peep2_reg_dead_p (2, operands[4])"
+  [(set (match_dup 5)
+   (plus:SI (match_dup 5)
+(mem:SI (plus:SI (match_dup 1)
+ (match_dup 2)
+   (set (mem:QI (match_dup 5))
+   (const_int 0))]
+  "operands[5] = (operands[0] == operands[3]) ? operands[4] : operands[3];")
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 38d1dbe..3f0862e 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,5 +1,7 @@
 2015-12-13  Jeff Law  
 
+   * gcc.target/m68k/pr19201.c: New test.
+
* gcc.target/m68k/pr63347.c: Remove #include  add -w to
command line options.
* gcc.target/m68k/20090709-1.c: Adjust expected output.
diff --git a/gcc/testsuite/gcc.target/m68k/pr19201.c 
b/gcc/testsuite/gcc.target/m68k/pr19201.c
new file mode 100644
index 000..dd81857
--- /dev/null
+++ b/gcc/testsuite/gcc.target/m68k/pr19201.c
@@ -0,0 +1,13 @@
+/* { dg-options "-w -O2 -fomit-frame-pointer" } */
+/* { dg-final { scan-assembler-not "%a.,%\[ad\]..l" } } */
+
+struct X { 
+  char *a; 
+  /* other members */ 
+  int b; 
+}; 
+ 
+void f (struct X *x) 
+{ 
+  x->a[x->b] = 0; 
+}


Re: [PATCH, 4/16] Implement -foffload-alias

2015-12-13 Thread Tom de Vries

On 11/12/15 14:00, Richard Biener wrote:

On Fri, 11 Dec 2015, Tom de Vries wrote:


On 13/11/15 12:39, Jakub Jelinek wrote:

We simply have some compiler internal interface between the caller and
callee of the outlined regions, each interface in between those has
its own structure type used to communicate the info;
we can attach attributes on the fields, or some flags to indicate some
properties interesting from aliasing POV.  We don't really need to perform
full IPA-PTA, perhaps it would be enough to a) record somewhere in cgraph
the relationship in between such callers and callees (for offloading regions
we already have "omp target entrypoint" attribute on the callee and a
singler caller), tell LTO if possible not to split those into different
partitions if easily possible, and then just for these pairs perform
aliasing/points-to analysis in the caller and the result record using
cliques/special attributes/whatever to the callee side, so that the callee
(outlined OpenMP/OpenACC/Cilk+ region) can then improve its alias analysis.


Hi,

This work-in-progress patch allows me to use IPA PTA information in the
kernels pass group.

Since:
-  I'm running IPA PTA before ealias, and IPA PTA does not interpret
restrict, and
- compute_may_alias doesn't run if IPA PTA information is present
I needed to convince ealias to do the restrict clique/base annotation.

It would be more logical to fit IPA PTA after ealias, but one is an IPA pass,
the other a regular one-function pass, so I would have to split the containing
pass groups pass_all_early_optimizations and pass_local_optimization_passes.
I'll give that a try now.



I've tried this approach, but realized that this changes the order in 
which non-openacc functions are processed in the compiler, so I've 
abandoned this idea.



Any comments?


I don't think you want to run IPA PTA before early
optimizations, it (and ealias) rely on some initial cleanup to
do anything meaningful with well-spent ressources.

The local PTA "hack" also looks more like a waste of resources, but well
... teaching IPA PTA to honor restrict might be an impossible task
though I didn't think much about it other than handling it only for
nonlocal_p functions (for others we should see all incoming args
if IPA PTA works optimally).  The restrict tags will leak all over
the place of course and in the end no meaningful cliques may remain.



This patch:
- moves the kernels pass group to the first position in the pass list
  after ealias where we're back in ipa mode
- inserts an new ipa pass to contain the gimple pass group called
  pass_oacc_ipa
- inserts a version of ipa-pta before the pass group.

Bootstrapped and reg-tested on x86_64.

OK for stage3 trunk?

Thanks,
- Tom

Add pass_oacc_ipa

---
 gcc/passes.def  | 37 ++-
 gcc/testsuite/g++.dg/ipa/devirt-37.C| 10 
 gcc/testsuite/g++.dg/ipa/devirt-40.C|  4 +--
 gcc/testsuite/g++.dg/tree-ssa/pr61034.C | 10 
 gcc/testsuite/gcc.dg/ipa/ipa-pta-13.c   |  4 +--
 gcc/testsuite/gcc.dg/ipa/ipa-pta-3.c|  4 +--
 gcc/testsuite/gcc.dg/ipa/ipa-pta-4.c|  4 +--
 gcc/tree-pass.h |  3 ++-
 gcc/tree-ssa-loop.c | 40 ++
 gcc/tree-ssa-structalias.c  | 44 +
 10 files changed, 102 insertions(+), 58 deletions(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index 43ce3d5..579dd63 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -88,24 +88,7 @@ along with GCC; see the file COPYING3.  If not see
 	  /* pass_build_ealias is a dummy pass that ensures that we
 	 execute TODO_rebuild_alias at this point.  */
 	  NEXT_PASS (pass_build_ealias);
-	  /* Pass group that runs when the function is an offloaded function
-	 containing oacc kernels loops.  Part 1.  */
-	  NEXT_PASS (pass_oacc_kernels);
-	  PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
-	  NEXT_PASS (pass_ch);
-	  POP_INSERT_PASSES ()
 	  NEXT_PASS (pass_fre);
-	  /* Pass group that runs when the function is an offloaded function
-	 containing oacc kernels loops.  Part 2.  */
-	  NEXT_PASS (pass_oacc_kernels2);
-	  PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels2)
-	  /* We use pass_lim to rewrite in-memory iteration and reduction
-		 variable accesses in loops into local variables accesses.  */
-	  NEXT_PASS (pass_lim);
-	  NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
-	  NEXT_PASS (pass_dce);
-	  NEXT_PASS (pass_expand_omp_ssa);
-	  POP_INSERT_PASSES ()
 	  NEXT_PASS (pass_merge_phi);
   NEXT_PASS (pass_dse);
 	  NEXT_PASS (pass_cd_dce);
@@ -124,6 +107,26 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_rebuild_cgraph_edges);
   NEXT_PASS (pass_inline_parameters);
   POP_INSERT_PASSES ()
+
+  NEXT_PASS (pass_ipa_pta_oacc_kernels);
+  NEXT_PASS (pass_oacc_ipa);
+  PUSH_INSERT_PASSES_WITHIN (pass_oacc_ipa)
+  /* Pass group t

[PIING][PATCH, 9/16] Add pass_parallelize_loops_oacc_kernels

2015-12-13 Thread Tom de Vries

On 24/11/15 13:24, Tom de Vries wrote:

On 16/11/15 12:59, Tom de Vries wrote:

On 09/11/15 20:52, Tom de Vries wrote:

On 09/11/15 16:35, Tom de Vries wrote:

Hi,

this patch series for stage1 trunk adds support to:
- parallelize oacc kernels regions using parloops, and
- map the loops onto the oacc gang dimension.

The patch series contains these patches:

  1Insert new exit block only when needed in
 transform_to_exit_first_loop_alt
  2Make create_parallel_loop return void
  3Ignore reduction clause on kernels directive
  4Implement -foffload-alias
  5Add in_oacc_kernels_region in struct loop
  6Add pass_oacc_kernels
  7Add pass_dominator_oacc_kernels
  8Add pass_ch_oacc_kernels
  9Add pass_parallelize_loops_oacc_kernels
 10Add pass_oacc_kernels pass group in passes.def
 11Update testcases after adding kernels pass group
 12Handle acc loop directive
 13Add c-c++-common/goacc/kernels-*.c
 14Add gfortran.dg/goacc/kernels-*.f95
 15Add libgomp.oacc-c-c++-common/kernels-*.c
 16Add libgomp.oacc-fortran/kernels-*.f95

The first 9 patches are more or less independent, but patches 10-16 are
intended to be committed at the same time.

Bootstrapped and reg-tested on x86_64.

Build and reg-tested with nvidia accelerator, in combination with a
patch that enables accelerator testing (which is submitted at
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).

I'll post the individual patches in reply to this message.


This patch adds pass_parallelize_loops_oacc_kernels.

There's a number of things we do differently in parloops for oacc
kernels:
- in normal parloops, we generate code to choose between a parallel
   version of the loop, and a sequential (low iteration count) version.
   Since the code in oacc kernels region is supposed to run on the
   accelerator anyway, we skip this check, and don't add a low iteration
   count loop.
- in normal parloops, we generate an #pragma omp parallel /
   GIMPLE_OMP_RETURN pair to delimit the region which will we split off
   into a thread function. Since the oacc kernels region is already
   split off, we don't add this pair.
- we indicate the parallelization factor by setting the oacc function
   attributes
- we generate an #pragma oacc loop instead of an #pragma omp for, and
   we add the gang clause
- in normal parloops, we rewrite the variable accesses in the loop in
   terms into accesses relative to a thread function parameter. For the
   oacc kernels region, that rewrite has already been done at omp-lower,
   so we skip this.
- we need to ensure that the entire kernels region can be run in
   parallel. The loop independence check is already present, so for oacc
   kernels we add a check between blocks outside the loop and the entire
   region.
- we guard stores in the blocks outside the loop with gang_pos == 0.
   There's no need for each gang to write to a single location, we can
   do this in just one gang. (Typically this is the write of the final
   value of the iteration variable if that one is copied back to the
   host).



Reposting with loop optimizer init added in
pass_parallelize_loops_oacc_kernels::execute.



Reposting with loop_optimizer_finalize,scev_initialize and scev_finalize
  added in pass_parallelize_loops_oacc_kernels::execute.



Ping.

Anything I can do to facilitate the review?

Thanks,
 Tom






Re: [PATCH 6/6] [DJGPP] configure.ac: enable LTO

2015-12-13 Thread Jeff Law

On 12/12/2015 09:40 AM, Andris Pavenis wrote:

On 12/11/2015 12:32 AM, Jeff Law wrote:

On 12/05/2015 10:25 AM, Andris Pavenis wrote:

Patch enables LTO support for DJGPP in top level configure.ac

Andris

2015-12-05 Andris Pavenis 

* configure.ac: enable LTO for *-*-msdosdjgpp

OK once prereqs have gone in.  Note you should to the autoconf dance
to update the generated files.  Mention them in your ChangeLog as
* configure: Regenerated

jeff



Updated patch is in attachment.

Andris

PS. Somebody other should apply this and my other DJGPP related patches
as I do not have SVN write access.

I think the better solution is to get you write access :-)

https://gcc.gnu.org/svnwrite.html

You can list me as your sponsor.

Cheers,
Jeff


Re: [PATCH 6/6] [DJGPP] configure.ac: enable LTO

2015-12-13 Thread DJ Delorie

> You can list me as your sponsor.

I've been wanting him to be a djgpp target/host maintainer for years
anyway, so +1 from me :-)


Re: [PATCH] doc: discourage use of __attribute__((optimize())) in production code

2015-12-13 Thread Andi Kleen
Markus Trippelsdorf  writes:

> Many developers are still using __attribute__((optimize())) in
> production code, although it quite broken. 

Wo reads documentation? @) If you want to discourage it better warn once
at runtime.

-Andi


[PATCH] rs6000: Fix a mistake in cstore_si_as_di (PR68865, PR68879)

2015-12-13 Thread Segher Boessenkool
convert_move does not know how to zero-extend a constant integer to the
target mode -- simply because it does not know the target mode.  As a
result, 32-bit SImode with the high bit set would be effectively sign-
extended instead of zero-extended.

This patch fixes it.  Is this okay for trunk?  (bootstrap+regtest in
progress, on powerpc64-linux).


Segher


2015-12-14  Segher Boessenkool  

PR target/68865
PR target/68879
* rs6000.md (cstore_si_as_di): Force all operands into registers.

---
 gcc/config/rs6000/rs6000.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index e2dca1b..f962f04 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -10618,6 +10618,8 @@ (define_expand "cstore_si_as_di"
   int uns_flag = unsigned_comparison_operator (operands[1], VOIDmode) ? 1 : 0;
   enum rtx_code cond_code = signed_condition (GET_CODE (operands[1]));
 
+  operands[2] = force_reg (SImode, operands[2]);
+  operands[3] = force_reg (SImode, operands[3]);
   rtx op1 = gen_reg_rtx (DImode);
   rtx op2 = gen_reg_rtx (DImode);
   convert_move (op1, operands[2], uns_flag);
-- 
1.9.3



Re: [PATCH] rs6000: Fix a mistake in cstore_si_as_di (PR68865, PR68879)

2015-12-13 Thread Segher Boessenkool
On Mon, Dec 14, 2015 at 07:04:06AM +, Segher Boessenkool wrote:
> convert_move does not know how to zero-extend a constant integer to the
> target mode -- simply because it does not know the target mode.

That last "target" should be "source", of course.


Segher