Re: [PR103302] skip multi-word pre-move clobber during lra

2021-12-15 Thread Alexandre Oliva via Gcc-patches
On Dec  9, 2021, Jeff Law  wrote:

>> I found a similar pattern of issuing clobbers for multi-word moves, but
>> not when reload_in_progress, in expr.c:emit_move_complex_parts.  I don't
>> have a testcase, but I'm tempted to propose '!lra_in_progress &&' for it
>> as well.  Can you think of any reason not to?

> The only reason I can think of is we're in stage3 :-)  It'd be a lot
> easier to green light that if we could trigger an issue.

I have not found the cycles to try to construct a testcase to trigger
the issue, but before moving on, I have regstrapped this on
x86_64-linux-gnu, so, at least for now, I propose it for the next
release cycle.  Ok to install then?


[PR103302] skip multi-part clobber during lra for complex parts too

From: Alexandre Oliva 

As with the earlier patch, avoid emitting clobbers that we used to
avoid during reload also during LRA, now when moving complex
multi-part values.  We don't have a testcase for this one.


for  gcc/ChangeLog

PR target/103302
* expr.c (emit_move_complex_parts): Skip clobbers during lra.
---
 gcc/expr.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/expr.c b/gcc/expr.c
index 0365625e7b835..30d1735ec29ce 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -3736,7 +3736,7 @@ emit_move_complex_parts (rtx x, rtx y)
   /* Show the output dies here.  This is necessary for SUBREGs
  of pseudos since we cannot track their lifetimes correctly;
  hard regs shouldn't appear here except as return values.  */
-  if (!reload_completed && !reload_in_progress
+  if (!reload_completed && !reload_in_progress && !lra_in_progress
   && REG_P (x) && !reg_overlap_mentioned_p (x, y))
 emit_clobber (x);
 


-- 
Alexandre Oliva, happy hacker                https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Re: [PATCH] testsuite: Robustify aarch64/simd tests against more aggressive DCE

2021-12-15 Thread Richard Sandiford via Gcc-patches
Marc Poulhiès via Gcc-patches  writes:
> Hello,
>
> We've observed that some aarch64 tests can fail if DCE is made more
> aggressive as it removes the builtin calls being tested for errors.
>
> This patch simply adds a LHS to these builtin calls to make sure DCE does
> not remove them at -O0.
>
> This patch has been tested on aarch64-elf. Ok to commit ?

The calls should still be diagnosed as incorrect, even if we don't
code-generate them.  The fact that we don't do that is a known bug
(in aarch64 code).

The new variables seem to be unused, so I think slightly stronger
DCE could remove the calls even after the patch.  Perhaps the containing
functions should take an int32x4_t *ptr or something, with the calls
assigning to different ptr[] indices.

I think it would be better to do that using new calls though,
and xfail the existing ones when they no longer work.  For example:

  /* { dg-error "lane -1 out of range 0 - 7" "" {target *-*-*} 0 } */
  vqdmlal_high_laneq_s16 (int32x4_a, int16x8_b, int16x8_c, -1);
  /* { dg-error "lane -1 out of range 0 - 7" "" {target *-*-*} 0 } */
  ptr[0] = vqdmlal_high_laneq_s16 (int32x4_a, int16x8_b, int16x8_c, -1);

That way we don't lose the existing tests.

Thanks,
Richard

>
> Thanks,
> Marc
>
> 2021-12-06  Marc Poulhiès  
>
> gcc/testsuite/
>
>   * gcc.target/aarch64/simd/vqdmlal_high_lane_s32_indices_1.c: Add
>   LHS to builtin calls.
>   * gcc.target/aarch64/simd/vqdmlal_high_lane_s16_indices_1.c:
>   Likewise.
>   * gcc.target/aarch64/simd/vqdmlal_high_lane_s32_indices_1.c:
>   Likewise.
>   * gcc.target/aarch64/simd/vqdmlal_high_laneq_s16_indices_1.c:
>   Likewise.
>   * gcc.target/aarch64/simd/vqdmlal_high_laneq_s32_indices_1.c:
>   Likewise.
>   * gcc.target/aarch64/simd/vqdmlal_lane_s16_indices_1.c: Likewise.
>   * gcc.target/aarch64/simd/vqdmlal_lane_s32_indices_1.c: Likewise.
>   * gcc.target/aarch64/simd/vqdmlal_laneq_s16_indices_1.c: Likewise.
>   * gcc.target/aarch64/simd/vqdmlal_laneq_s32_indices_1.c: Likewise.
>   * gcc.target/aarch64/simd/vqdmlalh_lane_s16_indices_1.c: Likewise.
>   * gcc.target/aarch64/simd/vqdmlals_lane_s32_indices_1.c: Likewise.
>   * gcc.target/aarch64/simd/vqdmlsl_high_lane_s16_indices_1.c:
>   Likewise.
>   * gcc.target/aarch64/simd/vqdmlsl_high_lane_s32_indices_1.c:
>   Likewise.
>   * gcc.target/aarch64/simd/vqdmlsl_high_laneq_s16_indices_1.c:
>   Likewise.
>   * gcc.target/aarch64/simd/vqdmlsl_high_laneq_s32_indices_1.c:
>   Likewise.
>   * gcc.target/aarch64/simd/vqdmlsl_lane_s16_indices_1.c: Likewise.
>   * gcc.target/aarch64/simd/vqdmlsl_lane_s32_indices_1.c: Likewise.
>   * gcc.target/aarch64/simd/vqdmlsl_laneq_s16_indices_1.c: Likewise.
>   * gcc.target/aarch64/simd/vqdmlsl_laneq_s32_indices_1.c: Likewise.
>   * gcc.target/aarch64/simd/vqdmlslh_lane_s16_indices_1.c: Likewise.
>   * gcc.target/aarch64/simd/vqdmlsls_lane_s32_indices_1.c: Likewise.
>   * gcc.target/aarch64/simd/vqdmulh_lane_s16_indices_1.c: Likewise.
>   * gcc.target/aarch64/simd/vqdmulh_lane_s32_indices_1.c: Likewise.
>   * gcc.target/aarch64/simd/vqdmulh_laneq_s16_indices_1.c: Likewise.
>   * gcc.target/aarch64/simd/vqdmulh_laneq_s32_indices_1.c: Likewise.
>   * gcc.target/aarch64/simd/vqdmulhh_lane_s16_indices_1.c: Likewise.
>   * gcc.target/aarch64/simd/vqdmulhq_lane_s16_indices_1.c: Likewise.
>   * gcc.target/aarch64/simd/vqdmulhq_lane_s32_indices_1.c: Likewise.
>   * gcc.target/aarch64/simd/vqdmulhq_laneq_s16_indices_1.c:
>   Likewise.
>   * gcc.target/aarch64/simd/vqdmulhq_laneq_s32_indices_1.c:
>   Likewise.
>   * gcc.target/aarch64/simd/vqdmulhs_lane_s32_indices_1.c: Likewise.
>   * gcc.target/aarch64/simd/vqdmull_high_lane_s16_indices_1.c:
>   Likewise.
>   * gcc.target/aarch64/simd/vqdmull_high_lane_s32_indices_1.c:
>   Likewise.
>   * gcc.target/aarch64/simd/vqdmull_high_laneq_s16_indices_1.c:
>   Likewise.
>   * gcc.target/aarch64/simd/vqdmull_high_laneq_s32_indices_1.c:
>   Likewise.
>   * gcc.target/aarch64/simd/vqdmull_lane_s16_indices_1.c: Likewise.
>   * gcc.target/aarch64/simd/vqdmull_lane_s32_indices_1.c: Likewise.
>   * gcc.target/aarch64/simd/vqdmull_laneq_s16_indices_1.c: Likewise.
>   * gcc.target/aarch64/simd/vqdmull_laneq_s32_indices_1.c: Likewise.
>   * gcc.target/aarch64/simd/vqdmullh_lane_s16_indices_1.c: Likewise.
>   * gcc.target/aarch64/simd/vqdmulls_lane_s32_indices_1.c: Likewise.
>   * gcc.target/aarch64/simd/vqrdmulh_lane_s16_indices_1.c: Likewise.
>   * gcc.target/aarch64/simd/vqrdmulh_lane_s32_indices_1.c: Likewise.
>   * gcc.target/aarch64/simd/vqrdmulh_laneq_s16_indices_1.c:
>   Likewise.
>   * gcc.target/aarch64/simd/vqrdmulh_laneq_s32_indices_1.c:
>   Likewise.
>   * gcc.target/aarch64/simd/vqrdmulhh_lane_s16_indices_1.c:
>   Likewise.
>   * gcc.target/aarch64/simd/

[PATCH] c++: don't ICE on NAMESPACE_DECL inside FUNCTION_DECL

2021-12-15 Thread Matthias Kretz
OK for trunk? This fixes several modules.exp failures for me.

── ✂ ──

Code like
  void swap() {
    namespace __variant = __detail::__variant;
    ...
  }
creates a NAMESPACE_DECL where the CP_DECL_CONTEXT is a FUNCTION_DECL.
DECL_TEMPLATE_INFO fails on NAMESPACE_DECL and therefore must be handled
first in the assertion.

Signed-off-by: Matthias Kretz 

gcc/cp/ChangeLog:

* module.cc (trees_out::get_merge_kind): NAMESPACE_DECLs also
cannot have a DECL_TEMPLATE_INFO.
---
 gcc/cp/module.cc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)


--
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
 stdₓ::simd
──
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 3b1b5ca0ac0..2b5a32695d2 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -10067,9 +10067,10 @@ trees_out::get_merge_kind (tree decl, depset *dep)
   tree ctx = CP_DECL_CONTEXT (decl);
   if (TREE_CODE (ctx) == FUNCTION_DECL)
 	{
-	  /* USING_DECLs cannot have DECL_TEMPLATE_INFO -- this isn't
-	 permitting them to have one.   */
+	  /* USING_DECLs and NAMESPACE_DECLs cannot have DECL_TEMPLATE_INFO --
+	 this isn't permitting them to have one.   */
 	  gcc_checking_assert (TREE_CODE (decl) == USING_DECL
+			   || TREE_CODE (decl) == NAMESPACE_DECL
 			   || !DECL_LANG_SPECIFIC (decl)
 			   || !DECL_TEMPLATE_INFO (decl));
 


Re: [patch, Fortran] Make REAL(KIND=16) detection more robust

2021-12-15 Thread FX via Gcc-patches
A gentle ping…


> On 7 Dec 2021, at 15:11, FX wrote:
> 
> Hi,
> 
> Right now, the logic in libgfortran for the detection of REAL(KIND=16) is in 
> kinds-override.h:
> 
> /* What are the C types corresponding to the real(kind=10) and
>   real(kind=16) types? We currently rely on the following assumptions:
> -- if real(kind=10) exists, i.e. if HAVE_GFC_REAL_10 is defined,
>then it is necessarily the "long double" type
> -- if real(kind=16) exists, then:
> * if HAVE_GFC_REAL_10, real(kind=16) is "__float128"
>* otherwise, real(kind=16) is "long double"
>   To allow to change this in the future, we create the
>   GFC_REAL_16_IS_FLOAT128 macro that is used throughout libgfortran.  */
> 
> 
> Well, this may not be true of all platforms, and it’s possible to have other 
> combinations. On the aarch64-apple-darwin port, I’m currently playing with 
> enabling a binary128 floating-point mode, and that target has double == long 
> double… so the assumptions above are not true.
> 
> Funnily, we already have more fine-grained logic in the mk-kinds-h.sh script, 
> where we actually check the Fortran kind corresponding to C’s long double. We 
> just have to use it, and emit the GFC_REAL_16_IS_FLOAT128 / 
> GFC_REAL_16_IS_LONG_DOUBLE macros there.
> 
> 
> Bootstrapped and regtested on x86_64-linux, checked that no symbols were 
> introduced or removed.
> (and tested on a port to aarch64-apple-darwin).
> 
> OK to commit?
> 
> FX
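
On the C side, one way to tell which of the cases above applies is to look at the precision of long double directly instead of inferring it from which Fortran kinds exist. The sketch below is illustrative only: it is not what mk-kinds-h.sh does, and the enum and function names are invented here. It assumes the usual significand widths (64 bits for x87 extended, 113 bits for binary128).

```c
#include <float.h>

/* Hypothetical classification of the C "long double" format.  */
enum ld_kind
{
  LD_SAME_AS_DOUBLE,   /* e.g. a target where double == long double */
  LD_X87_EXTENDED,     /* 64-bit significand */
  LD_BINARY128,        /* 113-bit significand */
  LD_OTHER             /* anything else, e.g. double-double */
};

/* Classify a format by the number of significand digits (base 2).  */
static enum ld_kind
classify_mant_dig (int mant_dig)
{
  if (mant_dig == DBL_MANT_DIG)
    return LD_SAME_AS_DOUBLE;
  if (mant_dig == 64)
    return LD_X87_EXTENDED;
  if (mant_dig == 113)
    return LD_BINARY128;
  return LD_OTHER;
}

/* Classification of the host compiler's own long double.  */
static enum ld_kind
long_double_kind (void)
{
  return classify_mant_dig (LDBL_MANT_DIG);
}
```

Under this model, a REAL(KIND=16) that exists while long_double_kind () returns LD_SAME_AS_DOUBLE cannot be long double, which is the combination the old assumptions did not cover.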


libgfortran.patch
Description: Binary data


Re: [patch, Fortran] IEEE support for aarch64-apple-darwin

2021-12-15 Thread FX via Gcc-patches
ping for that patch

(don’t mind the ChangeLog question, I’ve figured it out, will include proper 
ChangeLog in the commit)


> On 6 Dec 2021, at 17:32, FX wrote:
> 
> Hi everyone,
> 
> Since support for target aarch64-apple-darwin has been submitted for review, 
> it’s time to submit the Fortran part, i.e. enabling IEEE support on that 
> target.
> 
> The patch has been in use now for several months, in a developer branch 
> shipped by some distros on macOS (including Homebrew). It was authored more 
> than a year ago, but I figured it wasn’t relevant to submit until the target 
> was actually close to being in trunk: 
> https://github.com/iains/gcc-darwin-arm64/commit/b107973550d3d9a9ce9acc751adbbe2171d13736
> 
> Bootstrapped and tested on aarch64-apple-darwin20 (macOS Big Sur) and 
> aarch64-apple-darwin21 (macOS Monterey).
> 
> OK to merge?
> Can someone point me to the right way of formatting ChangeLogs and commit 
> entries, nowadays?
> 
> 
> Thanks,
> FX
> 
> 



Re: [PATCH] x86: PR target/103611: Splitter for DST:DI = (HI:SI<<32)|LO:SI.

2021-12-15 Thread Uros Bizjak via Gcc-patches
On Mon, Dec 13, 2021 at 3:10 PM Roger Sayle  wrote:
>
>
> A common idiom is to create a DImode value from the "concat" of two SImode
> values, using "(long long)hi << 32 | (long long)lo", where the operation
> may be ior, xor or plus.  On x86, with -m32, the high and low parts of
> a DImode register are actually different SImode registers (typically %edx
> and %eax) so ideally this idiom should reduce to two move instructions
> (or optimally, just clever register allocation).
>
> Unfortunately, GCC currently performs the IOR operation above on -m32,
> and worse allocates DImode registers (split to SImode register pairs)
> for both the zero extended HI and LO values.
>
> Hence, for test1 from the new test case below:
>
> typedef int __v4si __attribute__ ((__vector_size__ (16)));
> long long test1(__v4si v) {
>   unsigned int loVal = (unsigned int)v[0];
>   unsigned int hiVal = (unsigned int)v[1];
>   return (long long)(loVal) | ((long long)(hiVal) << 32);
> }
>
> we currently generate (with -m32 -O2 -msse4.1):
>
> test1:  subl    $28, %esp
>         pextrd  $1, %xmm0, %eax
>         pmovzxdq        %xmm0, %xmm1
>         movq    %xmm1, 8(%esp)
>         movl    %eax, %edx
>         movl    8(%esp), %eax
>         orl     12(%esp), %edx
>         addl    $28, %esp
>         orb     $0, %ah
>         ret
>
> with this patch we now generate:
>
> test1:  pextrd  $1, %xmm0, %edx
>         movd    %xmm0, %eax
>         ret
>
> The fix is to recognize and split the idiom (hi<<32)|zext(lo) prior
> to register allocation on !TARGET_64BIT, simplifying this sequence to
> "highpart(dst) = hi; lowpart(dst) = lo".
>
> The one minor complication is that sse.md's define_insn for
> *vec_extractv4si_0_zext_sse4 can sometimes interfere with this
> optimization.  It turns out that on !TARGET_64BIT, the zero_extend:DI
> following vec_select:SI isn't free, and this insn gets split back
> into multiple instructions during later passes, but too late to
> be optimized away by this patch/reload.  Hence the last hunk of
> this patch is to restrict *vec_extractv4si_0_zext_sse4 to TARGET_64BIT.
> Checking PR target/80286, where *vec_extractv4si_0_zext_sse4 was
> first added, this seems reasonable (but this patch has been tested
> both with and without this last change, if it's considered controversial).
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without "--target_board='unix{-m32}'"
> with no new failures.  OK for mainline?
>
>
> 2021-12-13  Roger Sayle  
>
> gcc/ChangeLog
> PR target/103611
> * config/i386/i386.md (any_or_plus): New code iterator.
> (define_split): Split (HI<<32)|zext(LO) into piece-wise
> move instructions on !TARGET_64BIT.
> * config/i386/sse.md (*vec_extractv4si_0_zext_sse4):
> Restrict to TARGET_64BIT.
>
> gcc/testsuite/ChangeLog
> PR target/103611
> * gcc.target/i386/pr103611-2.c: New test case.

OK with *vec_extractv4si_0_zext_sse4 change but please also change isa
attribute to:

  [(set_attr "isa" "*,*,avx512f")

Thanks,
Uros.

>
> Thanks in advance,
> Roger
> --
>
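
For reference, the concat idiom described in the quoted message can be written with any of the three operators, because the shifted high part and the zero-extended low part have no bits in common. The sketch below is not part of the patch or its test case; the function names are made up, and unsigned 64-bit arithmetic is used (instead of long long as in the message) so that the shift is well defined for every value of hi.

```c
#include <stdint.h>

/* Hypothetical helpers: build a 64-bit value from two 32-bit halves.  */
static uint64_t
concat_ior (uint32_t hi, uint32_t lo)
{
  return ((uint64_t) hi << 32) | (uint64_t) lo;
}

static uint64_t
concat_xor (uint32_t hi, uint32_t lo)
{
  return ((uint64_t) hi << 32) ^ (uint64_t) lo;
}

static uint64_t
concat_plus (uint32_t hi, uint32_t lo)
{
  /* No carry can occur: the low 32 bits of the shifted value are zero.  */
  return ((uint64_t) hi << 32) + (uint64_t) lo;
}
```

All three forms return the same value for every input, which is why a single code iterator over ior, xor and plus can cover them.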


Re: [PATCH][2/4][committed] aarch64: Add memmove expansion for +mops

2021-12-15 Thread Christophe Lyon via Gcc-patches
On Mon, Dec 13, 2021 at 3:31 PM Kyrylo Tkachov via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> Hi all,
>
> This second patch in the series adds an inline movmem expansion for
> TARGET_MOPS
> that emits the recommended sequence.
>
> A new param aarch64-mops-memmove-size-threshold is added to control the
> memmove size threshold
> for this expansion. Its default value is zero to be consistent with the
> current behaviour where
> we always emit a libcall, as we don't currently have a movmem inline
> expansion
> (we should add a compatible-everywhere inline expansion, but that's for
> the future), so we should
> always prefer to emit the MOPS sequence when available in lieu of a
> libcall.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Pushing to trunk.
> Thanks,
> Kyrill
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64.md (aarch64_movmemdi): Define.
> (movmemdi): Define.
> (unspec): Add UNSPEC_MOVMEM.
> * config/aarch64/aarch64.opt (aarch64-mops-memmove-size-threshold):
> New param.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/mops_2.c: New test.
>


Hi Kyrill,

The new test fails with -mabi=ilp32 too :-)

Thanks,

Christophe


Re: [PATCH][1/4][committed] aarch64: Add support for Armv8.8-a memory operations and memcpy expansion

2021-12-15 Thread Christophe Lyon via Gcc-patches
On Mon, Dec 13, 2021 at 3:29 PM Kyrylo Tkachov via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> Hi all,
>
> This patch adds the +mops architecture extension flag from the 2021 Arm
> Architecture extensions, Armv8.8-a.
> The +mops extensions introduce instructions to accelerate the memcpy,
> memset, memmove standard functions.
> The first patch here uses the instructions in the inline memcpy expansion.
> Further patches in the series will use similar instructions to inline
> memmove and memset.
>
> A new param, aarch64-mops-memcpy-size-threshold, is introduced to control
> the size threshold above which to
> emit the new sequence. Its default setting is 256 bytes, which is the same
> as the current threshold above
> which we'd emit a libcall.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Pushing to trunk.
> Thanks,
> Kyrill
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-option-extensions.def (mops): Define.
> * config/aarch64/aarch64.c (aarch64_expand_cpymem_mops): Define.
> (aarch64_expand_cpymem): Define.
> * config/aarch64/aarch64.h (AARCH64_FL_MOPS): Define.
> (AARCH64_ISA_MOPS): Define.
> (TARGET_MOPS): Define.
> (MOVE_RATIO): Adjust for TARGET_MOPS.
> * config/aarch64/aarch64.md ("unspec"): Add UNSPEC_CPYMEM.
> (aarch64_cpymemdi): New pattern.
> (cpymemdi): Adjust for TARGET_MOPS.
> * config/aarch64/aarch64.opt (aarch64-mops-memcpy-size-threshold):
> New param.
> * doc/invoke.texi (AArch64 Options): Document +mops.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/mops_1.c: New test.
>

Hi Kyrill,

And this test fails with -mabi=ilp32 too, sorry for the delay.

Thanks

Christophe


[PATCH] i386, fab: Optimize __atomic_{add,sub,and,or,xor}_fetch (x, y, z) {==,!=,<,<=,>,>=} 0 [PR98737]

2021-12-15 Thread Jakub Jelinek via Gcc-patches
On Wed, Jan 27, 2021 at 12:27:13PM +0100, Ulrich Drepper via Gcc-patches wrote:
> On 1/27/21 11:37 AM, Jakub Jelinek wrote:
> > Would equality comparison against 0 handle the most common cases?
> > 
> > The user can write it as
> > __atomic_sub_fetch (x, y, z) == 0
> > or
> > __atomic_fetch_sub (x, y, z) - y == 0
> > though, so the expansion code would need to be able to cope with both.
> 
> Please also keep !=0, <0, <=0, >0, and >=0 in mind.  They all can be
> useful and can be handled with the flags.

<= 0 and > 0 don't really work well with lock {add,sub,inc,dec}, x86 doesn't
have comparisons that would look solely at both SF and ZF and not at other
flags (and emitting two separate conditional jumps or two setcc insns and
oring them together looks awful).

But the rest can work.

Here is a patch that adds internal functions and optabs for these,
recognizes them at the same spot as e.g. .ATOMIC_BIT_TEST_AND* internal
functions (fold all builtins pass) and expands them appropriately (or for
the <= 0 and > 0 cases of +/- FAILs and lets the middle-end fall back).

So far I have handled just the op_fetch builtins, IMHO instead of handling
also __atomic_fetch_sub (x, y, z) - y == 0 etc. we should canonicalize
__atomic_fetch_sub (x, y, z) - y to __atomic_sub_fetch (x, y, z) (and vice
versa).
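
As a rough illustration of the source-level patterns under discussion (a sketch only, not taken from the patch or its tests; the function names are made up), the first two functions below are the two spellings of decrement-and-test mentioned above, and the third is a sign test on the result of an add.

```c
#include <stdbool.h>

/* Hypothetical example: drop one reference and report whether the
   count reached zero, written with the op_fetch builtin.  */
static bool
unref_op_fetch (int *count)
{
  return __atomic_sub_fetch (count, 1, __ATOMIC_SEQ_CST) == 0;
}

/* The same test written with the fetch_op builtin: the old value
   minus the operand equals the new value.  */
static bool
unref_fetch_op (int *count)
{
  return __atomic_fetch_sub (count, 1, __ATOMIC_SEQ_CST) - 1 == 0;
}

/* A sign test on the updated value.  */
static bool
add_is_negative (int *value, int delta)
{
  return __atomic_add_fetch (value, delta, __ATOMIC_SEQ_CST) < 0;
}
```

The sketch only shows what the user writes; whether a given compiler turns these into a single locked instruction followed by a flag test is not something it checks.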

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-12-15  Jakub Jelinek  

PR target/98737
* internal-fn.def (ATOMIC_ADD_FETCH_CMP_0, ATOMIC_SUB_FETCH_CMP_0,
ATOMIC_AND_FETCH_CMP_0, ATOMIC_OR_FETCH_CMP_0, ATOMIC_XOR_FETCH_CMP_0):
New internal fns.
* internal-fn.h (ATOMIC_OP_FETCH_CMP_0_EQ, ATOMIC_OP_FETCH_CMP_0_NE,
ATOMIC_OP_FETCH_CMP_0_LT, ATOMIC_OP_FETCH_CMP_0_LE,
ATOMIC_OP_FETCH_CMP_0_GT, ATOMIC_OP_FETCH_CMP_0_GE): New enumerators.
* internal-fn.c (expand_ATOMIC_ADD_FETCH_CMP_0,
expand_ATOMIC_SUB_FETCH_CMP_0, expand_ATOMIC_AND_FETCH_CMP_0,
expand_ATOMIC_OR_FETCH_CMP_0, expand_ATOMIC_XOR_FETCH_CMP_0): New
functions.
* optabs.def (atomic_add_fetch_cmp_0_optab,
atomic_sub_fetch_cmp_0_optab, atomic_and_fetch_cmp_0_optab,
atomic_or_fetch_cmp_0_optab, atomic_xor_fetch_cmp_0_optab): New
direct optabs.
* builtins.h (expand_ifn_atomic_op_fetch_cmp_0): Declare.
* builtins.c (expand_ifn_atomic_op_fetch_cmp_0): New function.
* tree-ssa-ccp.c: Include internal-fn.h.
(optimize_atomic_bit_test_and): Add . before internal fn call
in function comment.  Change return type from void to bool and
return true only if successfully replaced.
(optimize_atomic_op_fetch_cmp_0): New function.
(pass_fold_builtins::execute): Use optimize_atomic_op_fetch_cmp_0
for BUILT_IN_ATOMIC_{ADD,SUB,AND,OR,XOR}_FETCH_{1,2,4,8,16} and
BUILT_IN_SYNC_{ADD,SUB,AND,OR,XOR}_AND_FETCH_{1,2,4,8,16},
for *XOR* ones only if optimize_atomic_bit_test_and failed.
* config/i386/sync.md (atomic__fetch_cmp_0,
atomic__fetch_cmp_0): New define_expand patterns.
(atomic__fetch_cmp_0_1,
atomic__fetch_cmp_0_1): New define_insn patterns.

* gcc.target/i386/pr98737-1.c: New test.
* gcc.target/i386/pr98737-2.c: New test.
* gcc.target/i386/pr98737-3.c: New test.
* gcc.target/i386/pr98737-4.c: New test.
* gcc.target/i386/pr98737-5.c: New test.
* gcc.target/i386/pr98737-6.c: New test.
* gcc.target/i386/pr98737-7.c: New test.

--- gcc/internal-fn.def.jj  2021-11-30 13:26:09.323329485 +0100
+++ gcc/internal-fn.def 2021-12-13 12:12:10.947053554 +0100
@@ -403,6 +403,11 @@ DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_SET
 DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_COMPLEMENT, ECF_LEAF, NULL)
 DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_RESET, ECF_LEAF, NULL)
 DEF_INTERNAL_FN (ATOMIC_COMPARE_EXCHANGE, ECF_LEAF, NULL)
+DEF_INTERNAL_FN (ATOMIC_ADD_FETCH_CMP_0, ECF_LEAF, NULL)
+DEF_INTERNAL_FN (ATOMIC_SUB_FETCH_CMP_0, ECF_LEAF, NULL)
+DEF_INTERNAL_FN (ATOMIC_AND_FETCH_CMP_0, ECF_LEAF, NULL)
+DEF_INTERNAL_FN (ATOMIC_OR_FETCH_CMP_0, ECF_LEAF, NULL)
+DEF_INTERNAL_FN (ATOMIC_XOR_FETCH_CMP_0, ECF_LEAF, NULL)
 
 /* To implement [[fallthrough]].  */
 DEF_INTERNAL_FN (FALLTHROUGH, ECF_LEAF | ECF_NOTHROW, NULL)
--- gcc/internal-fn.h.jj  2021-11-30 13:26:09.324329471 +0100
+++ gcc/internal-fn.h   2021-12-13 19:17:03.491728748 +0100
@@ -240,4 +240,13 @@ extern void expand_SHUFFLEVECTOR (intern
 
 extern bool vectorized_internal_fn_supported_p (internal_fn, tree);
 
+enum {
+  ATOMIC_OP_FETCH_CMP_0_EQ = 0,
+  ATOMIC_OP_FETCH_CMP_0_NE = 1,
+  ATOMIC_OP_FETCH_CMP_0_LT = 2,
+  ATOMIC_OP_FETCH_CMP_0_LE = 3,
+  ATOMIC_OP_FETCH_CMP_0_GT = 4,
+  ATOMIC_OP_FETCH_CMP_0_GE = 5
+};
+
 #endif
--- gcc/internal-fn.c.jj  2021-12-02 19:41:52.635552695 +0100
+++ gcc/internal-fn.c   2021-12-13 12:19:51.504465053 +0100
@@ -3238,6 +3238,46 @@ expand_ATOMIC_COMPARE_EXCHANGE (internal
   expand_ifn_atomic_compare_exchange (

Re: [PATCH] gcc/diagnostic.c: make -Werror message more helpful

2021-12-15 Thread Richard Sandiford via Gcc-patches
Martin Sebor via Gcc-patches  writes:
> On 12/12/21 3:13 AM, Andrea Monaco via Gcc-patches wrote:
>> 
>> Hello.
>> 
>> 
>> I propose to make that message more verbose.  It sure would have helped
>> me once.  You don't always have a Web search available :)
>
> Warnings turned into errors have the [-Werror=...] tag at the end
> so I'm not sure I see when reiterating -Werror at the end of output
> would be helpful.  Can you explain the circumstances when it would
> have helped you?

Printing -Werror=foo might give the impression that that was the option
that was actually passed.  The message is the same if -Werror=foo was
passed and if -Werror was passed (or something in between, for groups
of options).

The final “all warnings being treated as errors” line is only printed
for -Werror, not -Werror=foo, so I think including -Werror there makes
sense, and is more consistent with the individual error messages.

So personally I think we should take the patch.

Thanks,
Richard

> For what it's worth, a change here that I think might be more useful
> is printing the number of diagnostics of each kind (e.g., 2 warnings
> and 5 errors found).
>
>> Andrea Monaco
>> 
>> 
>> 
>> diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
>> index 4ded1760705..8b67662390e 100644
>> --- a/gcc/diagnostic.c
>> +++ b/gcc/diagnostic.c
>> @@ -156,7 +156,7 @@ default_diagnostic_final_cb (diagnostic_context *context)
>> /* -Werror was given.  */
>> if (context->warning_as_error_requested)
>>  pp_verbatim (context->printer,
>> -_("%s: all warnings being treated as errors"),
>> +_("%s: all warnings being treated as errors (-Werror; disable with -Wno-error)"),
>
> If this change should move forward, -Werror needs to be quoted
> (e.g., passed as an argument to %qs or surrounded in a pair of
> %< and %> directives).  The "disable with -Wno-error" part
> is superfluous and would not be entirely accurate for warnings
> promoted to errors by #pragma GCC diagnostic (those cannot be
> demoted back to warnings by -Wno-error).
>
> Martin
>
>>   progname);
>> /* At least one -Werror= was given.  */
>> else
>> 


[committed] openmp: Avoid calling operand_equal_p on OMP_CLAUSEs [PR103704]

2021-12-15 Thread Jakub Jelinek via Gcc-patches
Hi!

On OMP_CLAUSEs we reuse TREE_TYPE as CP_OMP_CLAUSE_INFO in the C++ FE.
This confuses the hashing code that operand_equal_p does when checking.
There is really no reason to compare OMP_CLAUSEs against expressions
like captured this, they will never compare equal.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2021-12-15  Jakub Jelinek  

PR c++/103704
* semantics.c (finish_omp_target_clauses_r): For OMP_CLAUSEs
just walk subtrees.

* g++.dg/gomp/pr103704.C: New test.

--- gcc/cp/semantics.c.jj   2021-12-09 15:37:27.0 +0100
+++ gcc/cp/semantics.c  2021-12-14 15:00:04.163009167 +0100
@@ -9325,6 +9325,9 @@ finish_omp_target_clauses_r (tree *tp, i
   return NULL_TREE;
 }
 
+  if (TREE_CODE (t) == OMP_CLAUSE)
+return NULL_TREE;
+
   if (current_object)
 {
   tree this_expr = TREE_OPERAND (current_object, 0);
--- gcc/testsuite/g++.dg/gomp/pr103704.C.jj  2021-12-14 15:19:12.101684930 +0100
+++ gcc/testsuite/g++.dg/gomp/pr103704.C  2021-12-14 15:18:57.895887034 +0100
@@ -0,0 +1,19 @@
+// PR c++/103704
+// { dg-do compile }
+
+struct S { int a; };
+
+template <typename T>
+struct U : public T {
+  T a;
+  U ()
+  {
+#pragma omp target
+#pragma omp teams
+#pragma omp distribute private(a)
+for (int k = 0; k < 1; ++k)
+  ;
+  }
+};
+
+struct V : public U<S> { V () : U<S> () {} };

Jakub



[PATCH] i386: simplify cpu_feature handling

2021-12-15 Thread Martin Liška

On 12/14/21 17:12, Jakub Jelinek wrote:

I'd use INT_TYPE_SIZE - 1 instead of 31.  Otherwise LGTM.


Installed with that change, thanks.

Moreover, I'm suggesting a simplification:

The patch removes unneeded loops for cpu_features2 and CONVERT_EXPR
that can be simplified with NOP_EXPR.
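
The index arithmetic behind that simplification can be checked in isolation. The sketch below is a stand-alone model, not code from cpuinfo.h: the function names are invented, the word count is passed in as a parameter instead of using SIZE_OF_CPU_FEATURES, and features below 32 (which live in __cpu_features[0]) are assumed to be handled separately.

```c
/* Word index found by scanning, modelled on the loop being removed.
   Returns nwords when f is out of range (the original code calls
   gcc_unreachable at that point).  Requires f >= 32.  */
static unsigned
word_by_loop (unsigned f, unsigned nwords)
{
  for (unsigned i = 0; i < nwords; i++)
    if (f < 32 + 32 + i * 32)
      return i;
  return nwords;
}

/* Bit offset as computed by the old code once word i is known.  */
static unsigned
bit_by_loop (unsigned f, unsigned i)
{
  return (f - (32 + i * 32)) & 31;
}

/* Direct computation of the same word index.  Requires f >= 32.  */
static unsigned
word_direct (unsigned f)
{
  return (f - 32) / 32;
}

/* Direct computation of the same bit offset.  Requires f >= 32.  */
static unsigned
bit_direct (unsigned f)
{
  return (f - 32) % 32;
}
```

For every in-range feature number the two approaches agree, so the loop only ever recomputes a division and a remainder.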

It survives the i386.exp tests. May I install the patch after testing, or
is it stage1 material?

Thanks,
Martin

From 9fa89df81b3e6cb56f6ab59b0993168e7a048489 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Wed, 15 Dec 2021 10:54:23 +0100
Subject: [PATCH] i386: simplify cpu_feature handling

The patch removes unneeded loops for cpu_features2 and CONVERT_EXPR
that can be simplified with NOP_EXPR.

gcc/ChangeLog:

	* common/config/i386/cpuinfo.h (has_cpu_feature): Directly
	compute index in cpu_features2.
	(set_cpu_feature): Likewise.
	* config/i386/i386-builtins.c (fold_builtin_cpu): Also remove
	loop for cpu_features2 and use NOP_EXPRs.
---
 gcc/common/config/i386/cpuinfo.h | 50 +++-
 gcc/config/i386/i386-builtins.c  | 79 
 2 files changed, 67 insertions(+), 62 deletions(-)

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index bbf29bdb116..dd321920108 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -55,43 +55,49 @@ struct __processor_model2
 static inline int
 has_cpu_feature (struct __processor_model *cpu_model,
 		 unsigned int *cpu_features2,
-		 enum processor_features f)
+		 enum processor_features feature)
 {
-  unsigned int i;
+  unsigned index, offset;
+  unsigned f = feature;
+
   if (f < 32)
 {
   /* The first 32 features.  */
-  return cpu_model->__cpu_features[0] & (1U << (f & 31));
+  return cpu_model->__cpu_features[0] & (1U << f);
+}
+  else
+{
+  /* The rest of features.  cpu_features2[i] contains features from
+	 (32 + i * 32) to (31 + 32 + i * 32), inclusively.  */
+  f -= 32;
+  index = f / 32;
+  offset = f % 32;
+  return cpu_features2[index] & (1U << offset);
 }
-  /* The rest of features.  cpu_features2[i] contains features from
- (32 + i * 32) to (31 + 32 + i * 32), inclusively.  */
-  for (i = 0; i < SIZE_OF_CPU_FEATURES; i++)
-if (f < (32 + 32 + i * 32))
-return cpu_features2[i] & (1U << ((f - (32 + i * 32)) & 31));
-  gcc_unreachable ();
 }
 
 static inline void
 set_cpu_feature (struct __processor_model *cpu_model,
 		 unsigned int *cpu_features2,
-		 enum processor_features f)
+		 enum processor_features feature)
 {
-  unsigned int i;
+  unsigned index, offset;
+  unsigned f = feature;
+
   if (f < 32)
 {
   /* The first 32 features.  */
-  cpu_model->__cpu_features[0] |= (1U << (f & 31));
-  return;
+  cpu_model->__cpu_features[0] |= (1U << f);
+}
+  else
+{
+  /* The rest of features.  cpu_features2[i] contains features from
+	 (32 + i * 32) to (31 + 32 + i * 32), inclusively.  */
+  f -= 32;
+  index = f / 32;
+  offset = f % 32;
+  cpu_features2[index] |= (1U << offset);
 }
-  /* The rest of features.  cpu_features2[i] contains features from
- (32 + i * 32) to (31 + 32 + i * 32), inclusively.  */
-  for (i = 0; i < SIZE_OF_CPU_FEATURES; i++)
-if (f < (32 + 32 + i * 32))
-  {
-	cpu_features2[i] |= (1U << ((f - (32 + i * 32)) & 31));
-	return;
-  }
-  gcc_unreachable ();
 }
 
 /* Get the specific type of AMD CPU and return AMD CPU name.  Return
diff --git a/gcc/config/i386/i386-builtins.c b/gcc/config/i386/i386-builtins.c
index 4b30cf75c26..31e034e1bc9 100644
--- a/gcc/config/i386/i386-builtins.c
+++ b/gcc/config/i386/i386-builtins.c
@@ -2275,7 +2275,7 @@ fold_builtin_cpu (tree fndecl, tree *args)
   /* Check the value.  */
   final = build2 (EQ_EXPR, unsigned_type_node, ref,
 		  build_int_cstu (unsigned_type_node, field_val));
-  return build1 (CONVERT_EXPR, integer_type_node, final);
+  return build1 (NOP_EXPR, integer_type_node, final);
 }
   else if (fn_code == IX86_BUILTIN_CPU_SUPPORTS)
 {
@@ -2300,7 +2300,8 @@ fold_builtin_cpu (tree fndecl, tree *args)
 	  return integer_zero_node;
 	}
 
-  if (isa_names_table[i].feature >= 32)
+  unsigned feature = isa_names_table[i].feature;
+  if (feature >= INT_TYPE_SIZE)
 	{
 	  if (ix86_cpu_features2_var == nullptr)
 	{
@@ -2318,46 +2319,44 @@ fold_builtin_cpu (tree fndecl, tree *args)
 	  varpool_node::add (ix86_cpu_features2_var);
 	}
 
-	  for (unsigned int j = 0; j < SIZE_OF_CPU_FEATURES; j++)
-	if (isa_names_table[i].feature < (32 + 32 + j * 32))
-	  {
-		field_val = (1U << (isa_names_table[i].feature
-- (32 + j * 32)));
-		tree index = size_int (j);
-		array_elt = build4 (ARRAY_REF, unsigned_type_node,
-ix86_cpu_features2_var,
-index, NULL_TREE, NULL_TREE);
-		/* Return __cpu_features2[index] & field_val  */
-		final = build2 (BIT_AND_EXPR, unsigned_type_node,
-array_elt,
-build_int_cstu (unsigned_type_node,

Re: [PATCH] c++: processing_template_decl vs template depth [PR103408]

2021-12-15 Thread Jakub Jelinek via Gcc-patches
On Mon, Dec 13, 2021 at 04:28:26PM -0500, Patrick Palka via Gcc-patches wrote:
>   * g++.dg/concepts/diagnostic18.C: Expect a "constraints on a
>   non-templated function" error.
>   * g++.dg/cpp23/auto-fncast10.C: New test.

This test fails:
+FAIL: g++.dg/cpp23/auto-fncast11.C  -std=c++2b  (test for errors, line 19)
+FAIL: g++.dg/cpp23/auto-fncast11.C  -std=c++2b (test for excess errors)
because the regex in dg-error was missing an indefinite article.

Tested with
GXX_TESTSUITE_STDS=98,11,14,17,20,2b make check-g++ RUNTESTFLAGS="dg.exp=auto-fncast11.C"
and committed to trunk as obvious:

2021-12-15  Jakub Jelinek  

PR c++/103408
* g++.dg/cpp23/auto-fncast11.C: Fix expected diagnostic wording.

--- gcc/testsuite/g++.dg/cpp23/auto-fncast11.C.jj  2021-12-14 18:40:21.399133909 +0100
+++ gcc/testsuite/g++.dg/cpp23/auto-fncast11.C  2021-12-15 11:16:36.401592355 +0100
@@ -16,4 +16,4 @@ static_assert(requires { requires auto(a
 static_assert(!requires { requires auto(false); });
 static_assert(!requires { requires auto(auto(false)); });
 
-auto f() requires (auto(false)); // { dg-error "constraints on non-templated" }
+auto f() requires (auto(false)); // { dg-error "constraints on a non-templated" }


Jakub



Re: [PATCH] Verbose support in analyze_brprob_spec

2021-12-15 Thread Martin Liška

On 12/15/21 02:58, Xionghu Luo wrote:

Also add verbose argument support like analyze_brprob.py

contrib/ChangeLog:

* analyze_brprob_spec.py: Add verbose argument.
---
  contrib/analyze_brprob_spec.py | 1 +
  1 file changed, 1 insertion(+)

diff --git a/contrib/analyze_brprob_spec.py b/contrib/analyze_brprob_spec.py
index e621853ba4e..063bd11d99c 100755
--- a/contrib/analyze_brprob_spec.py
+++ b/contrib/analyze_brprob_spec.py
@@ -31,6 +31,7 @@ parser.add_argument('-s', '--sorting', dest = 'sorting',
  choices = ['branches', 'branch-hitrate', 'hitrate', 'coverage', 'name'],
  default = 'branches')
  parser.add_argument('-d', '--def-file', help = 'path to predict.def')
+parser.add_argument('-v', '--verbose', action = 'store_true', help = 'Print verbose informations')


Hello.

Is the argument properly passed to the invocation of the analyze_brprob.py script?
If so, then please install the patch.

Cheers,
Martin

  
  args = parser.parse_args()
  





Re: [PATCH] i386, fab: Optimize __atomic_{add,sub,and,or,xor}_fetch (x, y, z) {==,!=,<,<=,>,>=} 0 [PR98737]

2021-12-15 Thread Uros Bizjak via Gcc-patches
On Wed, Dec 15, 2021 at 10:23 AM Jakub Jelinek  wrote:
>
> On Wed, Jan 27, 2021 at 12:27:13PM +0100, Ulrich Drepper via Gcc-patches wrote:
> > On 1/27/21 11:37 AM, Jakub Jelinek wrote:
> > > Would equality comparison against 0 handle the most common cases?
> > >
> > > The user can write it as
> > > __atomic_sub_fetch (x, y, z) == 0
> > > or
> > > __atomic_fetch_sub (x, y, z) - y == 0
> > > though, so the expansion code would need to be able to cope with both.
> >
> > Please also keep !=0, <0, <=0, >0, and >=0 in mind.  They all can be
> > useful and can be handled with the flags.
>
> <= 0 and > 0 don't really work well with lock {add,sub,inc,dec}, x86 doesn't
> have comparisons that would look solely at both SF and ZF and not at other
> flags (and emitting two separate conditional jumps or two setcc insns and
> oring them together looks awful).
>
> But the rest can work.
>
> Here is a patch that adds internal functions and optabs for these,
> recognizes them at the same spot as e.g. .ATOMIC_BIT_TEST_AND* internal
> functions (fold all builtins pass) and expands them appropriately (or for
> the <= 0 and > 0 cases of +/- FAILs and lets the middle-end fall back).
>
> So far I have handled just the op_fetch builtins, IMHO instead of handling
> also __atomic_fetch_sub (x, y, z) - y == 0 etc. we should canonicalize
> __atomic_fetch_sub (x, y, z) - y to __atomic_sub_fetch (x, y, z) (and vice
> versa).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2021-12-15  Jakub Jelinek  
>
> PR target/98737
> * internal-fn.def (ATOMIC_ADD_FETCH_CMP_0, ATOMIC_SUB_FETCH_CMP_0,
> ATOMIC_AND_FETCH_CMP_0, ATOMIC_OR_FETCH_CMP_0, ATOMIC_XOR_FETCH_CMP_0):
> New internal fns.
> * internal-fn.h (ATOMIC_OP_FETCH_CMP_0_EQ, ATOMIC_OP_FETCH_CMP_0_NE,
> ATOMIC_OP_FETCH_CMP_0_LT, ATOMIC_OP_FETCH_CMP_0_LE,
> ATOMIC_OP_FETCH_CMP_0_GT, ATOMIC_OP_FETCH_CMP_0_GE): New enumerators.
> * internal-fn.c (expand_ATOMIC_ADD_FETCH_CMP_0,
> expand_ATOMIC_SUB_FETCH_CMP_0, expand_ATOMIC_AND_FETCH_CMP_0,
> expand_ATOMIC_OR_FETCH_CMP_0, expand_ATOMIC_XOR_FETCH_CMP_0): New
> functions.
> * optabs.def (atomic_add_fetch_cmp_0_optab,
> atomic_sub_fetch_cmp_0_optab, atomic_and_fetch_cmp_0_optab,
> atomic_or_fetch_cmp_0_optab, atomic_xor_fetch_cmp_0_optab): New
> direct optabs.
> * builtins.h (expand_ifn_atomic_op_fetch_cmp_0): Declare.
> * builtins.c (expand_ifn_atomic_op_fetch_cmp_0): New function.
> * tree-ssa-ccp.c: Include internal-fn.h.
> (optimize_atomic_bit_test_and): Add . before internal fn call
> in function comment.  Change return type from void to bool and
> return true only if successfully replaced.
> (optimize_atomic_op_fetch_cmp_0): New function.
> (pass_fold_builtins::execute): Use optimize_atomic_op_fetch_cmp_0
> for BUILT_IN_ATOMIC_{ADD,SUB,AND,OR,XOR}_FETCH_{1,2,4,8,16} and
> BUILT_IN_SYNC_{ADD,SUB,AND,OR,XOR}_AND_FETCH_{1,2,4,8,16},
> for *XOR* ones only if optimize_atomic_bit_test_and failed.
> * config/i386/sync.md (atomic__fetch_cmp_0,
> atomic__fetch_cmp_0): New define_expand patterns.
> (atomic__fetch_cmp_0_1,
> atomic__fetch_cmp_0_1): New define_insn patterns.
>
> * gcc.target/i386/pr98737-1.c: New test.
> * gcc.target/i386/pr98737-2.c: New test.
> * gcc.target/i386/pr98737-3.c: New test.
> * gcc.target/i386/pr98737-4.c: New test.
> * gcc.target/i386/pr98737-5.c: New test.
> * gcc.target/i386/pr98737-6.c: New test.
> * gcc.target/i386/pr98737-7.c: New test.

OK (with a small adjustment) for the x86 part.

Thanks,
Uros.
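
For readers following the thread, the two source-level spellings mentioned in the quoted discussion can be compared directly. This is a minimal sketch using the GCC/Clang __atomic builtins; the function names sub_fetch_is_zero and fetch_sub_is_zero are hypothetical, and the sketch says nothing about which instructions a given compiler version emits for either form.

```c
#include <stdbool.h>

/* Form 1: the op_fetch builtin returns the new value, which is then
   compared against zero.  */
bool
sub_fetch_is_zero (int *counter, int amount)
{
  return __atomic_sub_fetch (counter, amount, __ATOMIC_SEQ_CST) == 0;
}

/* Form 2: the fetch_op builtin returns the old value, so the new value
   is recomputed by hand before the comparison.  */
bool
fetch_sub_is_zero (int *counter, int amount)
{
  return __atomic_fetch_sub (counter, amount, __ATOMIC_SEQ_CST) - amount == 0;
}
```

Both functions have the same observable behaviour, which is why the quoted message suggests canonicalizing one form into the other rather than pattern-matching each separately.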

>
> --- gcc/internal-fn.def.jj  2021-11-30 13:26:09.323329485 +0100
> +++ gcc/internal-fn.def 2021-12-13 12:12:10.947053554 +0100
> @@ -403,6 +403,11 @@ DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_SET
>  DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_COMPLEMENT, ECF_LEAF, NULL)
>  DEF_INTERNAL_FN (ATOMIC_BIT_TEST_AND_RESET, ECF_LEAF, NULL)
>  DEF_INTERNAL_FN (ATOMIC_COMPARE_EXCHANGE, ECF_LEAF, NULL)
> +DEF_INTERNAL_FN (ATOMIC_ADD_FETCH_CMP_0, ECF_LEAF, NULL)
> +DEF_INTERNAL_FN (ATOMIC_SUB_FETCH_CMP_0, ECF_LEAF, NULL)
> +DEF_INTERNAL_FN (ATOMIC_AND_FETCH_CMP_0, ECF_LEAF, NULL)
> +DEF_INTERNAL_FN (ATOMIC_OR_FETCH_CMP_0, ECF_LEAF, NULL)
> +DEF_INTERNAL_FN (ATOMIC_XOR_FETCH_CMP_0, ECF_LEAF, NULL)
>
>  /* To implement [[fallthrough]].  */
>  DEF_INTERNAL_FN (FALLTHROUGH, ECF_LEAF | ECF_NOTHROW, NULL)
> --- gcc/internal-fn.h.jj    2021-11-30 13:26:09.324329471 +0100
> +++ gcc/internal-fn.h   2021-12-13 19:17:03.491728748 +0100
> @@ -240,4 +240,13 @@ extern void expand_SHUFFLEVECTOR (intern
>
>  extern bool vectorized_internal_fn_supported_p (internal_fn, tree);
>
> +enum {
> +  ATOMIC_OP_FETCH_CMP_0_EQ = 0,
> +  ATOMIC_OP_FETCH_CMP_0_NE = 1,
> +  ATOMIC_OP_FETCH_CMP_0_LT = 2,
> +  ATOMIC_OP_FETCH_CMP_0_LE = 3,
> + 

[PATCH][pushed] c++: Fix warning word splitting [PR103713]

2021-12-15 Thread Martin Liška

Fix warning word splitting.

Pushed as obvious.
Martin

PR c++/103713

gcc/cp/ChangeLog:

* tree.c (maybe_warn_parm_abi): Fix warning word splitting.
---
 gcc/cp/tree.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index f6f7927f293..284fb5f4b2a 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -4371,8 +4371,9 @@ maybe_warn_parm_abi (tree t, location_t loc)
"the calling convention for %qT, which was "
"accidentally changed in 8.1", t);
   else
-   w = warning_at (loc, OPT_Wabi, "%<-fabi-version=12%> (GCC 8.1) accident"
-   "ally changes the calling convention for %qT", t);
+   w = warning_at (loc, OPT_Wabi, "%<-fabi-version=12%> (GCC 8.1) "
+   "accidentally changes the calling convention for %qT",
+   t);
   if (w)
inform (location_of (t), " declared here");
   return;
--
2.34.1
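
A short sketch of why this kind of change is safe: adjacent string literals are concatenated by the compiler, so moving the break from the middle of a word to a word boundary does not alter the resulting message. The strings below are simplified stand-ins (the %< %> and %qT format directives are omitted), and the names split_msg, whole_msg and messages_identical are hypothetical. The benefit is presumably for people and tools that look at one literal at a time, for example when searching the sources for "accidentally".

```c
#include <string.h>

/* Old spelling: the word "accidentally" is split across two literals.  */
const char *const split_msg = "(GCC 8.1) accident"
                              "ally changes the calling convention";

/* New spelling: the break falls on a word boundary.  */
const char *const whole_msg = "(GCC 8.1) "
                              "accidentally changes the calling convention";

/* Return nonzero if both spellings produce the same string.  */
int
messages_identical (void)
{
  return strcmp (split_msg, whole_msg) == 0;
}
```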



Re: [PATCH]AArch64 Fix the AAPCs for new partial and full SIMD structure types [PR103094]

2021-12-15 Thread Richard Sandiford via Gcc-patches
Richard Sandiford via Gcc-patches  writes:
> Tamar Christina  writes:
>> Hi All,
>>
>> The new partial and full vector types added to AArch64, e.g.
>>
>> int8x8x2_t with mode V2x8QI are incorrectly being defined as being short
>> vectors and not being composite types.
>>
>> This causes the layout code to incorrectly conclude that the registers are
>> packed. i.e. for V2x8QI it thinks those 16-bytes are in the same registers.
>>
>> Because of this the code under !aarch64_composite_type_p is unreachable but also
>> lacked any extra checks to see that nregs is what we expected it to be.
>>
>> I have also updated aarch64_advsimd_full_struct_mode_p and 
>> aarch64_advsimd_partial_struct_mode_p to only consider vector types as struct
>> modes.  Otherwise types such as OImode and friends would qualify leading to
>> incorrect results.
>
> How easy would it be to fix the bug without doing this last bit?
> The idea was that OI, CI and XI should continue to be structure
> modes until we remove them.  aarch64_advsimd_partial_struct_mode_p
> and aarch64_advsimd_full_struct_mode_p are meant to be convenience
> wrappers and so they shouldn't make different decisions from the
> underlying aarch64_classify_vector_mode.
>
>>
>> This patch fixes up the issues and we now generate correct code.
>>
>> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>>
>> Ok for master?
>>
>> Thanks,
>> Tamar
>>
>>
>>
>> gcc/ChangeLog:
>>
>>  PR target/103094
>>  * config/aarch64/aarch64.c (aarch64_function_value, aarch64_layout_arg):
>>  Fix unreachable code for partial vectors and re-order switch to perform
>>  the simplest test first.
>>  (aarch64_short_vector_p): Mark as not short vectors.
>>  (aarch64_composite_type_p): Mark as composite types.
>>  (aarch64_advsimd_partial_struct_mode_p,
>>  aarch64_advsimd_full_struct_mode_p): Restrict to actual SIMD types.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  PR target/103094
>>  * gcc.target/aarch64/pr103094.c: New test.
>>
>> --- inline copy of patch -- 
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index fdf05505846721b02059df494d6395ae9423a8ef..d9104ddac3cdd44f7c2290b8725d05be4fd6468f 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -3055,15 +3055,17 @@ aarch64_advsimd_struct_mode_p (machine_mode mode)
>>  static bool
>>  aarch64_advsimd_partial_struct_mode_p (machine_mode mode)
>>  {
>> -  return (aarch64_classify_vector_mode (mode)
>> -  == (VEC_ADVSIMD | VEC_STRUCT | VEC_PARTIAL));
>> +  return VECTOR_MODE_P (mode)
>> + && (aarch64_classify_vector_mode (mode)
>> +== (VEC_ADVSIMD | VEC_STRUCT | VEC_PARTIAL));
>>  }
>>  
>>  /* Return true if MODE is an Advanced SIMD Q-register structure mode.  */
>>  static bool
>>  aarch64_advsimd_full_struct_mode_p (machine_mode mode)
>>  {
>> -  return (aarch64_classify_vector_mode (mode) == (VEC_ADVSIMD | VEC_STRUCT));
>> +  return VECTOR_MODE_P (mode)
>> + && (aarch64_classify_vector_mode (mode) == (VEC_ADVSIMD | VEC_STRUCT));
>>  }
>>  
>>  /* Return true if MODE is any of the data vector modes, including
>> @@ -6468,17 +6470,21 @@ aarch64_function_value (const_tree type, const_tree func,
>> NULL, false))
>>  {
>>gcc_assert (!sve_p);
>> -  if (!aarch64_composite_type_p (type, mode))
>> +  if (aarch64_advsimd_full_struct_mode_p (mode))
>> +{
>> +  gcc_assert (known_eq (exact_div (GET_MODE_SIZE (mode), 16), count));
>> +  return gen_rtx_REG (mode, V0_REGNUM);
>> +}
>> +  else if (aarch64_advsimd_partial_struct_mode_p (mode))
>> +{
>> +  gcc_assert (known_eq (exact_div (GET_MODE_SIZE (mode), 8), count));
>> +  return gen_rtx_REG (mode, V0_REGNUM);
>> +}
>> +  else if (!aarch64_composite_type_p (type, mode))
>>  {
>>gcc_assert (count == 1 && mode == ag_mode);
>>return gen_rtx_REG (mode, V0_REGNUM);
>>  }
>> -  else if (aarch64_advsimd_full_struct_mode_p (mode)
>> -   && known_eq (GET_MODE_SIZE (ag_mode), 16))
>> -return gen_rtx_REG (mode, V0_REGNUM);
>> -  else if (aarch64_advsimd_partial_struct_mode_p (mode)
>> -   && known_eq (GET_MODE_SIZE (ag_mode), 8))
>> -return gen_rtx_REG (mode, V0_REGNUM);
>>else
>>  {
>>int i;
>> @@ -6745,6 +6751,7 @@ aarch64_layout_arg (cumulative_args_t pcum_v, const function_arg_info &arg)
>>  /* No frontends can create types with variable-sized modes, so we
>> shouldn't be asked to pass or return them.  */
>>  size = GET_MODE_SIZE (mode).to_constant ();
>> +
>>size = ROUND_UP (size, UNITS_PER_WORD);
>>  
>>allocate_ncrn = (type) ? !(FLOAT_TYPE_P (type)) : !FLOAT_MODE_P (mode);
>> @@ -6769,17 +6776,21 @@ aarch64_layout_arg (cumulative_args_t pcum_v, const function_arg_info &arg)
>>if (nvrn + nregs <= NUM_FP_ARG_REGS)
>>  {
>>pcum->aapcs
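
The register-count checks added in the aarch64_function_value hunk above (exact_div of the mode size by 16 or by 8, compared against count) can be modelled with plain arithmetic. This is a rough sketch, not GCC code: struct_mode_nregs is a hypothetical helper, and the byte sizes are the ones implied by the thread (V2x8QI is 16 bytes made of two 8-byte vectors).

```c
#include <assert.h>

/* Hypothetical stand-in: number of vector registers occupied by an
   Advanced SIMD structure mode of MODE_SIZE bytes whose element vectors
   are VEC_SIZE bytes each (8 for D-register "partial" modes, 16 for
   Q-register "full" modes).  */
unsigned
struct_mode_nregs (unsigned mode_size, unsigned vec_size)
{
  assert (vec_size == 8 || vec_size == 16);
  /* Mirrors exact_div: the size must be a whole number of vectors.  */
  assert (mode_size % vec_size == 0);
  return mode_size / vec_size;
}
```

For a 16-byte partial structure mode this gives two registers, whereas treating the same 16 bytes as one packed short vector would give one, which is the misclassification the original message describes.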

[PATCH] testsuite: fix vect.exp ASAN errors

2021-12-15 Thread Martin Liška

The patch fixes a few ASAN errors (global variable access out of bounds).

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-simd-18.c: Fix ASAN error.
* gcc.dg/vect/vect-simd-19.c: Likewise.
* gcc.dg/vect/vect-simd-20.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/vect-simd-18.c | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-simd-19.c | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-simd-20.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-18.c b/gcc/testsuite/gcc.dg/vect/vect-simd-18.c
index b25f5a5cd31..cca350f5c21 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-18.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-18.c
@@ -17,7 +17,7 @@ foo (int s, int *p)
   return r;
 }
 
-int p[1 / 78];
+int p[1 / 78 + 1];
 
 int
 main ()
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-19.c b/gcc/testsuite/gcc.dg/vect/vect-simd-19.c
index a71dfa676d8..67e25c0e07e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-19.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-19.c
@@ -17,7 +17,7 @@ foo (int s, int m, int n, int *p)
   return r;
 }
 
-int p[1 / 78];
+int p[1 / 78 + 1];
 
 int
 main ()
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-20.c b/gcc/testsuite/gcc.dg/vect/vect-simd-20.c
index c85f05f61c6..35546ba5a23 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-20.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-20.c
@@ -18,7 +18,7 @@ foo (int s, int m, int n, int *p)
   return r;
 }
 
-int p[1 / 78 * 7];
+int p[(1 / 78 + 1) * 7];
 
 int
 main ()
--
2.34.1



Re: [PATCH] c++: processing_template_decl vs template depth [PR103408]

2021-12-15 Thread Patrick Palka via Gcc-patches
On Wed, 15 Dec 2021, Jakub Jelinek wrote:

> On Mon, Dec 13, 2021 at 04:28:26PM -0500, Patrick Palka via Gcc-patches wrote:
> > * g++.dg/concepts/diagnostic18.C: Expect a "constraints on a
> > non-templated function" error.
> > * g++.dg/cpp23/auto-fncast10.C: New test.
> 
> This test fails:
> +FAIL: g++.dg/cpp23/auto-fncast11.C  -std=c++2b  (test for errors, line 19)
> +FAIL: g++.dg/cpp23/auto-fncast11.C  -std=c++2b (test for excess errors)
> because the regex in dg-error was missing an indefinite article.
> 
> Tested with
> GXX_TESTSUITE_STDS=98,11,14,17,20,2b make check-g++ RUNTESTFLAGS="dg.exp=auto-fncast11.C"
> and committed to trunk as obvious:

Oops, thanks Jakub, I didn't realize we don't run the testsuite with
-std=c++23 yet.

I guess it'd be too expensive to add another std to the testing matrix
at this point, but I wonder if the test harness should at least run the
testcases inside cpp23/ with -std=c++23?  Something like the following
seems to work.

(And since -std=c++11 also isn't part of the default testing matrix
anymore, perhaps we could give the testcases inside cpp0x/ a similar
treatment too?)

-- >8 --

Subject: [PATCH] testsuite: run testcases in g++.dg/cpp23/ with -std=c++23

gcc/testsuite/ChangeLog:

* lib/g++-dg.exp (g++-dg-runtest): Add -std=c++23 to option_list
for testcases in cpp23/.
---
 gcc/testsuite/lib/g++-dg.exp | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/lib/g++-dg.exp b/gcc/testsuite/lib/g++-dg.exp
index fd06d278faa..79fe3db014e 100644
--- a/gcc/testsuite/lib/g++-dg.exp
+++ b/gcc/testsuite/lib/g++-dg.exp
@@ -38,6 +38,8 @@ proc g++-dg-runtest { testcases flags default-extra-flags } {
continue
}
 
+   set nshort [file tail [file dirname $test]]/[file tail $test]
+
# If the testcase specifies a standard, use that one.
# If not, run it under both standards, allowing GNU extensions
# if there's a dg-options line.
@@ -61,12 +63,13 @@ proc g++-dg-runtest { testcases flags default-extra-flags } {
} elseif { $x eq "impcx" } then { set x "23 -fimplicit-constexpr" }
lappend option_list "${std_prefix}$x"
}
+   if [string match "cpp23/*" $nshort] {
+   lappend option_list "${std_prefix}23"
+   }
} else {
set option_list { "" }
}
 
-   set nshort [file tail [file dirname $test]]/[file tail $test]
-
foreach flags_t $option_list {
verbose "Testing $nshort, $flags $flags_t" 1
dg-test $test "$flags $flags_t" ${default-extra-flags}
-- 
2.34.1.182.ge773545c7f



Re: [PATCH] rs6000: __builtin_darn[_raw] should be in [power9-64] (PR103624)

2021-12-15 Thread Bill Schmidt via Gcc-patches


On 12/14/21 8:23 PM, Segher Boessenkool wrote:
> On Tue, Dec 14, 2021 at 07:32:30AM -0600, Bill Schmidt wrote:
>> On 12/13/21 6:22 PM, Segher Boessenkool wrote:
>>> On Mon, Dec 13, 2021 at 02:37:43PM -0600, Bill Schmidt wrote:
>>>> On 12/13/21 10:54 AM, Segher Boessenkool wrote:
>>>>> On Mon, Dec 13, 2021 at 11:30:28AM -0500, David Edelsohn wrote:
>>>>>> On Mon, Dec 13, 2021 at 10:48 AM Bill Schmidt wrote:
>>>>>>> PR103624 observes that we get segfaults for the 64-bit darn builtins when compiled
>>>>>>> on a 32-bit architecture.  The old built-in infrastructure requires TARGET_64BIT, and
>>>>>>> this was missed in the new support.  Moving these two builtins from the [power9]
>>>>>>> stanza to the [power9-64] stanza solves the problem.
>>>>>>>
>>>>>>> Tested the fix on a powerpc-e300c3-linux-gnu cross.  Bootstrapped and tested on
>>>>>>> powerpc64le-linux-gnu with no regressions.  Is this okay for trunk?
>>>>>> Okay.
>>>>> No, as I said before this is not correct, not without a lot more
>>>>> explanation at least.  We should not copy errors in the old code into
>>>>> the new code.  That is negating one of the main advantages of
>>>>> reimplementing this in the first place!
>>>> Can you please be more specific?
>>>>
>>>> All I have from you before is "It should work for 32-bit though?"  I responded in the
>>>> bug report that __builtin_darn_32 was used for this purpose.  I haven't seen a
>>>> response to that.  What do you want to see happen?
>>> That of course does not work for _raw.
>>>
>>> These builtins should just return a "long", just like __builtin_ppc_mftb
>>> does.  All three of them.
>> Well, that seems wrong for __builtin_darn_32, which maps to an SImode pattern.
> That is Yet Another Bug, then.
>
> The insn returns a full register.  The patterns should use either :P or
> :GPR (the latter if SImode makes sense for it, so we could have that for
> all darn variants).  :DI and :SI never make sense for this.
>
>> So, I assume what you'd like to see is for the other two built-ins to return
>> long, and for the "&& TARGET_64BIT" to be removed from the darn_raw and darn
>> patterns?
> No, all builtins should work in either mode, and always return long.
> If the patterns are broken, the *patterns* should be fixed :-)


OK, thanks!  This is much clearer now.

I've opened an internal issue about the deficiencies of the darn patterns and
their associated built-ins.  In response to PR103624, I would like to start
with the existing patch to ensure the new support mirrors what we had before,
so we have that as a baseline.  We can then move on to fixing the larger
set of problems.  Is that a reasonable plan?

Thanks!
Bill

>
>>> Avoiding ICEs should not be a goal.  It should be a side effect of doing
>>> the right thing in the first place!
>> There's no reason to get snippy.  Given that you approved Kelvin's original
>> implementation of the darn patterns and built-in functions, I think I can be
>> forgiven for thinking that those were the desired semantics. :-)
> Sorry if I sound annoyed.  I am annoyed, but not with you.  Just with
> the world in general I suppose.
>
> With the new builtins representation it is much easier to spot problems,
> it is a great success already!
>
>
> Segher


[committed][nvptx] Add -mptx=7.0

2021-12-15 Thread Tom de Vries via Gcc-patches
Hi,

Add support for ptx isa version 7.0, required for the addition of -misa=sm_75
and -misa=sm_80.

Tested by setting the default ptx isa version to 7.0, and doing a build and
libgomp test run.

Committed to trunk.

Thanks,
- Tom

[nvptx] Add -mptx=7.0

gcc/ChangeLog:

* config/nvptx/nvptx-opts.h (enum ptx_version): Add PTX_VERSION_7_0.
* config/nvptx/nvptx.c (nvptx_file_start): Handle TARGET_PTX_7_0.
* config/nvptx/nvptx.h (TARGET_PTX_7_0): New macro.
* config/nvptx/nvptx.opt (ptx_version): Add 7.0.

---
 gcc/config/nvptx/nvptx-opts.h | 3 ++-
 gcc/config/nvptx/nvptx.c  | 4 +++-
 gcc/config/nvptx/nvptx.h  | 1 +
 gcc/config/nvptx/nvptx.opt| 3 +++
 4 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/gcc/config/nvptx/nvptx-opts.h b/gcc/config/nvptx/nvptx-opts.h
index f7371dc274c..396fe871163 100644
--- a/gcc/config/nvptx/nvptx-opts.h
+++ b/gcc/config/nvptx/nvptx-opts.h
@@ -30,7 +30,8 @@ enum ptx_isa
 enum ptx_version
 {
   PTX_VERSION_3_1,
-  PTX_VERSION_6_3
+  PTX_VERSION_6_3,
+  PTX_VERSION_7_0
 };
 
 #endif
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 445d7ce8cc9..51eef2b45b2 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -5404,7 +5404,9 @@ static void
 nvptx_file_start (void)
 {
   fputs ("// BEGIN PREAMBLE\n", asm_out_file);
-  if (TARGET_PTX_6_3)
+  if (TARGET_PTX_7_0)
+fputs ("\t.version\t7.0\n", asm_out_file);
+  else if (TARGET_PTX_6_3)
 fputs ("\t.version\t6.3\n", asm_out_file);
   else
 fputs ("\t.version\t3.1\n", asm_out_file);
diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index c3480cc1c26..92fd9d3b6d1 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -90,6 +90,7 @@
 #define TARGET_SM53 (ptx_isa_option >= PTX_ISA_SM53)
 
 #define TARGET_PTX_6_3 (ptx_version_option >= PTX_VERSION_6_3)
+#define TARGET_PTX_7_0 (ptx_version_option >= PTX_VERSION_7_0)
 
 /* Registers.  Since ptx is a virtual target, we just define a few
hard registers for special purposes and leave pseudos unallocated.
diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index 514f19d171e..04b45da9249 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -79,6 +79,9 @@ Enum(ptx_version) String(3.1) Value(PTX_VERSION_3_1)
 EnumValue
 Enum(ptx_version) String(6.3) Value(PTX_VERSION_6_3)
 
+EnumValue
+Enum(ptx_version) String(7.0) Value(PTX_VERSION_7_0)
+
 mptx=
 Target RejectNegative ToLower Joined Enum(ptx_version) Var(ptx_version_option) Init(PTX_VERSION_3_1)
 Specify the version of the ptx version to use.
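
The ordering of the checks in nvptx_file_start matters because each TARGET_PTX_* macro is a ">=" comparison, so it is also true for every later version. The following is a minimal standalone sketch of that selection logic; the enum mirrors the one in the patch, while AT_LEAST and version_directive are hypothetical names introduced for illustration.

```c
enum ptx_version { PTX_VERSION_3_1, PTX_VERSION_6_3, PTX_VERSION_7_0 };

/* Same shape as the TARGET_PTX_* macros: true for version V and for
   every version that comes after it in the enum.  */
#define AT_LEAST(opt, v) ((opt) >= (v))

/* Return the version string to print after ".version".  The newest
   version has to be tested first, since AT_LEAST (opt, 6.3) also holds
   when OPT is 7.0.  */
const char *
version_directive (enum ptx_version opt)
{
  if (AT_LEAST (opt, PTX_VERSION_7_0))
    return "7.0";
  else if (AT_LEAST (opt, PTX_VERSION_6_3))
    return "6.3";
  else
    return "3.1";
}
```

Adding a further version under this scheme means appending it to the end of the enum and placing its check at the top of the chain.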


Re: [PATCH] nvptx: Adds uses of -misa=sm_75 and -misa=sm_80

2021-12-15 Thread Tom de Vries via Gcc-patches
On 9/17/21 5:41 PM, Roger Sayle wrote:
> 
> This patch adds upon my previous patch to prototype HFmode support on
> nvptx, which includes adding new target macros TARGET_SM75 and TARGET_SM80.

I've moved those parts into this patch.

> Tobias Burnus has questioned "whether it makes sense to add those
> flags if no use is made of those flags".  I had hoped that it might
> be possible to split these patch submissions into smaller parts to
> assist the review process, but failing that, here's part 2, that
> adds support for __builtin_tanhf, HFmode exp2/tanh and also
> for HFmode min/max, controlled by TARGET_SM75 and TARGET_SM80 respectively.
> 
> The following has been tested on nvptx-none, hosted on x86_64-pc-linux-gnu
> (on top of my previous patch) with a "make" and "make -k check" with no
> new failures.  Please ignore the hunks in the git diff that were described
> in the previous patch (hopefully I'll be able to resume submitting
> patches sequentially in future).  Are both parts Ok for mainline?
> 
> 

Committed.

I've used mptx=7.0 in the test-cases, since that's required.

That doesn't become apparent though unless dg-do assemble is used
instead of dg-do compile, but I've left that as is for now.  To deal
with this properly will require adding some required target testing of
what is supported by ptxas, and then choosing between dg-do assemble and
compile based on what is supported, and that all looks involved enough
to treat as a separate issue.

Thanks,
- Tom

> 2020-09-17  Roger Sayle  
> 
> gcc/ChangeLog
>   * config/nvptx/nvptx.md (define_c_enum "unspec"): New UNSPEC_TANH.
>   (define_mode_iterator HSFM): New iterator for HFmode and SFmode.
>   (exp2hf2): New define_insn controlled by TARGET_SM75.
>   (tanh2): New define_insn controlled by TARGET_SM75.
>   (sminhf3, smaxhf3): New define_insns controlled by TARGET_SM80.
> 
> gcc/testsuite/ChangeLog
>   * gcc.target/nvptx/float16-2.c: New test case.
>   * gcc.target/nvptx/tanh-1.c: New test case.
> 
> Roger
> --
> 
> 
> -Original Message-
> From: Tobias Burnus  
> Sent: 17 September 2021 09:25
> To: Roger Sayle ; 'GCC Patches'
> ; Tom de Vries 
> Subject: Re: [PATCH] nvptx: Add (experimental) support for HFmode with
> -misa=sm_53
> 
> Hi Roger,
> 
> some more generic remarks not specific to using new ISA features.
> 
> On 17.09.21 00:53, Roger Sayle wrote:
> 
>> Whilst there I also added -misa=sm_75 and -misa=sm_80 which are points 
>> where other useful instructions were added to the ISA.
> 
> First, my impression was that already sm_70 added lots of useful stuff, but
> granted sm_75 adds some more. In any case, the question is whether it makes
> sense to add those flags if no use is made of those flags.
> 
> In particular, sm_80 is according to the following webpage only supported
> with PTX ISA 7.0 of CUDA 11.0. But GCC currently only supports
> -mptx=3.6 (default) and -mptx=6.3 (= CUDA 10).
> https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes
> 
> Note that you missed to update gcc/config/nvptx/t-omp-device for the new
> sm_*  and likewise the "-misa=@var{ISA-string}" section in
> gcc/gcc/doc/invoke.texi.
> 
> Additionally, I wonder whether the preprocessor macros __nvptx__,
> __nvptx_softstack__, __nvptx_unisimt__ and __PTX_SM__  should be documented
> somewhere as well. As all but one are related to command-line options, I
> wonder whether the respective section in invoke.texi would be a good place
> for them.
> 
> Tobias
> 
> -
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201,
> 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer:
> Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München;
> Registergericht München, HRB 106955
> 


Re: [PATCH] c++: processing_template_decl vs template depth [PR103408]

2021-12-15 Thread Jakub Jelinek via Gcc-patches
On Wed, Dec 15, 2021 at 08:58:45AM -0500, Patrick Palka wrote:
> Oops, thanks Jakub, I didn't realize we don't run the testsuite with
> -std=c++23 yet.
> 
> I guess it'd be too expensive to add another std to the testing matrix
> at this point, but I wonder if the test harness should at least run the
> testcases inside cpp23/ with -std=c++23?  Something like the following
> seems to work.
> 
> (And since -std=c++11 also isn't part of the default testing matrix
> anymore, perhaps we could give the testcases inside cpp0x/ a similar
> treatment too?)

I think it's up to Jason, but I'd say if we do it, we should do it for all those
language version subdirectories and make sure we only add those extra modes
temporarily (for that subdir files only) and only if they aren't already
present in the list we cycle through (to avoid running it e.g. with
-std=c++23 twice).

> Subject: [PATCH] testsuite: run testcases in g++.dg/cpp23/ with -std=c++23
> 
> gcc/testsuite/ChangeLog:
> 
>   * lib/g++-dg.exp (g++-dg-runtest): Add -std=c++23 to option_list
>   for testcases in cpp23/.
> ---
>  gcc/testsuite/lib/g++-dg.exp | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/testsuite/lib/g++-dg.exp b/gcc/testsuite/lib/g++-dg.exp
> index fd06d278faa..79fe3db014e 100644
> --- a/gcc/testsuite/lib/g++-dg.exp
> +++ b/gcc/testsuite/lib/g++-dg.exp
> @@ -38,6 +38,8 @@ proc g++-dg-runtest { testcases flags default-extra-flags } {
>   continue
>   }
>  
> + set nshort [file tail [file dirname $test]]/[file tail $test]
> +
>   # If the testcase specifies a standard, use that one.
>   # If not, run it under both standards, allowing GNU extensions
>   # if there's a dg-options line.
> @@ -61,12 +63,13 @@ proc g++-dg-runtest { testcases flags default-extra-flags } {
>   } elseif { $x eq "impcx" } then { set x "23 -fimplicit-constexpr" }
>   lappend option_list "${std_prefix}$x"
>   }
> + if [string match "cpp23/*" $nshort] {
> + lappend option_list "${std_prefix}23"
> + }
>   } else {
>   set option_list { "" }
>   }
>  
> - set nshort [file tail [file dirname $test]]/[file tail $test]
> -
>   foreach flags_t $option_list {
>   verbose "Testing $nshort, $flags $flags_t" 1
>   dg-test $test "$flags $flags_t" ${default-extra-flags}

Jakub
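
The two constraints raised in this reply (add the extra mode only for files in the matching subdirectory, and only if it is not already in the list) can be sketched as follows. This is an illustration in C rather than the Tcl used by g++-dg.exp; append_unique, add_dir_std and MAX_OPTIONS are hypothetical names, and POSIX fnmatch stands in for Tcl's glob-style "string match".

```c
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

#define MAX_OPTIONS 16

/* Append OPT to LIST (currently holding *N entries) unless it is
   already present or the list is full.  Return 1 if it was appended.  */
int
append_unique (const char *list[], size_t *n, const char *opt)
{
  for (size_t i = 0; i < *n; i++)
    if (strcmp (list[i], opt) == 0)
      return 0;
  if (*n >= MAX_OPTIONS)
    return 0;
  list[(*n)++] = opt;
  return 1;
}

/* Add the standard implied by the test's directory.  NSHORT has the
   form "dir/file", as computed in the patch.  Only cpp23/ is handled
   here; other language-version directories would need their own
   entries.  */
int
add_dir_std (const char *list[], size_t *n, const char *nshort)
{
  if (fnmatch ("cpp23/*", nshort, 0) == 0)
    return append_unique (list, n, "-std=c++23");
  return 0;
}
```

Because the option list is rebuilt for each test file, an entry added this way applies only to that file, and the duplicate check keeps the test from running twice under the same -std= setting.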



Re: [commited] jit: Support for global rvalue initialization and constructors

2021-12-15 Thread Antoni Boucher via Gcc-patches
Hi Petter.
I believe you have forgotten the line `global:` in the file
`gcc/jit/libgccjit.map`.
I'm not sure what this line does, but it is there for all other ABIs.
David: What do you think?
Regards.

On Tuesday, December 14, 2021 at 17:22 +, Petter Tomner via Jit wrote:
> Hi!
> 
> I have pushed the patch for rvalue initialization and ctors for
> libgccjit, for ABI 19.
> 
> Please see attached patch.
> 
> Regards,
> Petter
>   



[PATCH] rs6000: Refactor altivec_build_resolved_builtin

2021-12-15 Thread Bill Schmidt via Gcc-patches
Hi!

While replacing the built-in machinery, we agreed to defer some necessary
refactoring of the overload processing.  This patch cleans it up considerably.

I've put in one FIXME for an additional level of cleanup that should be done
independently.  The various helper functions (resolve_VEC_*) can be simplified
if we move the argument processing in altivec_resolve_overloaded_builtin
earlier.  But this requires making nontrivial changes to those functions that
will need careful review.  Let's do that in a later patch.

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
okay for trunk?

Thanks!
Bill


2021-12-09  Bill Schmidt  

gcc/
* config/rs6000/rs6000-c.c (resolution): New enum.
(resolve_VEC_MUL): New function.
(resolve_VEC_CMPNE): Likewise.
(resolve_VEC_ADDE_SUBE): Likewise.
(resolve_VEC_ADDEC_SUBEC): Likewise.
(resolve_VEC_SPLATS): Likewise.
(resolve_VEC_EXTRACT): Likewise.
(resolve_VEC_INSERT): Likewise.
(resolve_VEC_STEP): Likewise.
(find_instance): Likewise.
(altivec_resolve_overloaded_builtin): Many cleanups:  Call factored-out
functions.  Move variable declarations closer to uses.  Add commentary.
Remove unnecessary levels of braces.  Avoid use of gotos.  Change
misleading variable names.  Use switches over if-else-if chains.
---
 gcc/config/rs6000/rs6000-c.c | 1835 +++---
 1 file changed, 1004 insertions(+), 831 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index e0ebdeed548..45f485aab44 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -928,28 +928,847 @@ altivec_build_resolved_builtin (tree *args, int n, tree fntype, tree ret_type,
   return fold_convert (ret_type, call);
 }
 
+/* Enumeration of possible results from attempted overload resolution.
+   This is used by special-case helper functions to tell their caller
+   whether they succeeded and what still needs to be done.
+
+   unresolved = Still needs processing
+ resolved = Resolved (but may be an error_mark_node)
+  resolved_bad = An error that needs handling by the caller.  */
+
+enum resolution { unresolved, resolved, resolved_bad };
+
+/* Resolve an overloaded vec_mul call and return a tree expression for the
+   resolved call if successful.  NARGS is the number of arguments to the call.
+   ARGLIST contains the arguments.  RES must be set to indicate the status of
+   the resolution attempt.  LOC contains statement location information.  */
+
+static tree
+resolve_VEC_MUL (resolution *res, vec<tree, va_gc> *arglist, unsigned nargs,
+                 location_t loc)
+{
+  /* vec_mul needs to be special cased because there are no instructions for it
+     for the {un}signed char, {un}signed short, and {un}signed int types.  */
+  if (nargs != 2)
+    {
+      error ("builtin %qs only accepts 2 arguments", "vec_mul");
+      *res = resolved;
+      return error_mark_node;
+    }
+
+  tree arg0 = (*arglist)[0];
+  tree arg0_type = TREE_TYPE (arg0);
+  tree arg1 = (*arglist)[1];
+  tree arg1_type = TREE_TYPE (arg1);
+
+  /* Both arguments must be vectors and the types must be compatible.  */
+  if (TREE_CODE (arg0_type) != VECTOR_TYPE
+      || !lang_hooks.types_compatible_p (arg0_type, arg1_type))
+    {
+      *res = resolved_bad;
+      return error_mark_node;
+    }
+
+  switch (TYPE_MODE (TREE_TYPE (arg0_type)))
+    {
+    case E_QImode:
+    case E_HImode:
+    case E_SImode:
+    case E_DImode:
+    case E_TImode:
+      /* For scalar types just use a multiply expression.  */
+      *res = resolved;
+      return fold_build2_loc (loc, MULT_EXPR, TREE_TYPE (arg0), arg0,
+                              fold_convert (TREE_TYPE (arg0), arg1));
+    case E_SFmode:
+      {
+        /* For floats use the xvmulsp instruction directly.  */
+        *res = resolved;
+        tree call = rs6000_builtin_decls[RS6000_BIF_XVMULSP];
+        return build_call_expr (call, 2, arg0, arg1);
+      }
+    case E_DFmode:
+      {
+        /* For doubles use the xvmuldp instruction directly.  */
+        *res = resolved;
+        tree call = rs6000_builtin_decls[RS6000_BIF_XVMULDP];
+        return build_call_expr (call, 2, arg0, arg1);
+      }
+    /* Other types are errors.  */
+    default:
+      *res = resolved_bad;
+      return error_mark_node;
+    }
+}
+
+/* Resolve an overloaded vec_cmpne call and return a tree expression for the
+   resolved call if successful.  NARGS is the number of arguments to the call.
+   ARGLIST contains the arguments.  RES must be set to indicate the status of
+   the resolution attempt.  LOC contains statement location information.  */
+
+static tree
+resolve_VEC_CMPNE (resolution *res, vec<tree, va_gc> *arglist, unsigned nargs,
+                   location_t loc)
+{
+  /* vec_cmpne needs to be special cased because there are no instructions
+     for it (prior to power 9).  */
+  if (nargs != 2)

[PATCH] x86_64: Ignore zero width bitfields in ABI and issue -Wpsabi warning about C zero width bitfield ABI changes [PR102024]

2021-12-15 Thread Jakub Jelinek via Gcc-patches
On Mon, Nov 29, 2021 at 05:25:30AM -0700, H.J. Lu wrote:
> > I'd like to ping this patch, but perhaps first it would be nice to discuss
> > it in the x86-64 psABI group.
> > The current psABI doesn't seem to mention zero sized bitfields at all
> > explicitly, so perhaps theoretically they should be treated as INTEGER class,
> > but if they are at positions multiple of 64 bits, then it is unclear into
> > which eightbyte they should be considered, whether the previous one if any
> > or the next one if any.  I guess similar problem is for zero sized
> > structures, but those should according to algorithm have NO_CLASS and so it
> > doesn't really make a difference.  And, no compiler I'm aware of treats
> > the zero sized bitfields at 64 bit boundaries as INTEGER class, LLVM/ICC are
> > ignoring such bitfields everywhere, GCC ignores them at those boundaries
> > (and used to ignore them in C++ everywhere).  I guess my preferred solution
> > would be to say explicitly that zero sized bitfields are NO_CLASS.
> > I'm not a member of the google x86-64 psABI group, can somebody please raise
> > it there?
> 
> https://groups.google.com/g/x86-64-abi/c/OYeWs14WHQ4

Thanks.
I see nobody commented on Micha's post there.
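
For readers following along, here is a minimal sketch of the kind of struct under discussion. The struct and function names are made up for illustration, and the check only covers layout, which as far as I understand is not what changes: the open question is only how the argument classification (SSE vs. INTEGER eightbyte) treats the zero-width bit-field.

```c
#include <stddef.h>

/* A zero-width bit-field occupies no storage; it only forces the next
   member to start at a new allocation unit boundary.  */
struct with_zero_width
{
  float a;
  int : 0;
  float b;
};

/* The same struct without the zero-width bit-field, for comparison.  */
struct without_zero_width
{
  float a;
  float b;
};

/* Return 1 if the zero-width bit-field leaves size and member offsets
   unchanged on this target, 0 otherwise.  */
int
same_layout (void)
{
  return sizeof (struct with_zero_width) == sizeof (struct without_zero_width)
         && offsetof (struct with_zero_width, b)
            == offsetof (struct without_zero_width, b);
}
```

If both structs lay out identically, any difference in how they are passed can only come from the classification step, which is why it seems reasonable to ask the psABI to say explicitly that such bit-fields are NO_CLASS.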

Here is a patch that implements it in GCC, i.e. the C++ ABI doesn't change (at
least not relative to the past few releases) while the C ABI does for GCC:

2021-12-15  Jakub Jelinek  

PR target/102024
* config/i386/i386.c (classify_argument): Add zero_width_bitfields
argument, when seeing DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD bitfields,
always ignore them, when seeing other zero sized bitfields, either
set zero_width_bitfields to 1 and ignore it or if equal to 2 process
it.  Pass it to recursive calls.  Add wrapper
with old arguments and diagnose ABI differences for C structures
with zero width bitfields.  Formatting fixes.

* gcc.target/i386/pr102024.c: New test.
* g++.target/i386/pr102024.C: New test.

--- gcc/config/i386/i386.c.jj   2021-12-10 17:00:06.024369219 +0100
+++ gcc/config/i386/i386.c  2021-12-15 15:04:49.245148023 +0100
@@ -2065,7 +2065,8 @@ merge_classes (enum x86_64_reg_class cla
 
 static int
 classify_argument (machine_mode mode, const_tree type,
-  enum x86_64_reg_class classes[MAX_CLASSES], int bit_offset)
+  enum x86_64_reg_class classes[MAX_CLASSES], int bit_offset,
+  int &zero_width_bitfields)
 {
   HOST_WIDE_INT bytes
 = mode == BLKmode ? int_size_in_bytes (type) : (int) GET_MODE_SIZE (mode);
@@ -2123,6 +2124,16 @@ classify_argument (machine_mode mode, co
 misaligned integers.  */
  if (DECL_BIT_FIELD (field))
{
+ if (integer_zerop (DECL_SIZE (field)))
+   {
+ if (DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD (field))
+   continue;
+ if (zero_width_bitfields != 2)
+   {
+ zero_width_bitfields = 1;
+ continue;
+   }
+   }
  for (i = (int_bit_position (field)
+ (bit_offset % 64)) / 8 / 8;
   i < ((int_bit_position (field) + (bit_offset % 64))
@@ -2160,7 +2171,8 @@ classify_argument (machine_mode mode, co
  num = classify_argument (TYPE_MODE (type), type,
   subclasses,
   (int_bit_position (field)
-   + bit_offset) % 512);
+   + bit_offset) % 512,
+  zero_width_bitfields);
  if (!num)
return 0;
  pos = (int_bit_position (field)
@@ -2178,7 +2190,8 @@ classify_argument (machine_mode mode, co
  {
int num;
num = classify_argument (TYPE_MODE (TREE_TYPE (type)),
-TREE_TYPE (type), subclasses, bit_offset);
+TREE_TYPE (type), subclasses, bit_offset,
+zero_width_bitfields);
if (!num)
  return 0;
 
@@ -2211,7 +2224,7 @@ classify_argument (machine_mode mode, co
 
  num = classify_argument (TYPE_MODE (TREE_TYPE (field)),
   TREE_TYPE (field), subclasses,
-  bit_offset);
+  bit_offset, zero_width_bitfields);
  if (!num)
return 0;
  for (i = 0; i < num && i < words; i++)
@@ -2231,7 +2244,7 @@ classify_argument (machine_mode mode, co
 X86_64_SSEUP_CLASS, everything should 

Re: [PATCH v4 1/6] tree-object-size: Use trees and support negative offsets

2021-12-15 Thread Jakub Jelinek via Gcc-patches
On Wed, Dec 01, 2021 at 07:57:52PM +0530, Siddhesh Poyarekar wrote:

>  static inline bool
> -object_sizes_set (struct object_size_info *osi, unsigned varno,
> -   unsigned HOST_WIDE_INT val)
> +object_sizes_set (struct object_size_info *osi, unsigned varno, tree val,
> +   tree wholeval)
>  {
>int object_size_type = osi->object_size_type;
> -  if ((object_size_type & OST_MINIMUM) == 0)
> -{
> -  if (object_sizes[object_size_type][varno] < val)
> - return object_sizes_set_force (osi, varno, val);
> -}
> -  else
> -{
> -  if (object_sizes[object_size_type][varno] > val)
> - return object_sizes_set_force (osi, varno, val);
> -}
> -  return false;
> +  object_size osize = object_sizes[object_size_type][varno];
> +
> +  tree oldval = osize.size;
> +  tree old_wholeval = osize.wholesize;
> +
> +  enum tree_code code = object_size_type & OST_MINIMUM ? MIN_EXPR : MAX_EXPR;
> +
> +  val = size_binop (code, val, oldval);
> +  wholeval = size_binop (code, wholeval, old_wholeval);
> +
> +  object_sizes[object_size_type][varno].size = val;
> +  object_sizes[object_size_type][varno].wholesize = wholeval;
> +  return tree_int_cst_compare (oldval, val) != 0;

Shouldn't this also tree_int_cst_compare (old_wholeval, wholeval) ?

Otherwise LGTM.
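
To illustrate the point about the return value, here is a rough standalone sketch, using plain size_t values instead of trees. The struct and function names are hypothetical and this is not the code from the patch; it only shows a merge that reports a change when either the size or the whole size moved.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-variable object size record.  */
struct size_pair
{
  size_t size;
  size_t wholesize;
};

/* Pick the smaller value when computing a minimum, else the larger.  */
static size_t
pick (bool minimum, size_t a, size_t b)
{
  if (minimum)
    return a < b ? a : b;
  return a > b ? a : b;
}

/* Merge VAL and WHOLEVAL into *SLOT and return true if either field
   changed, so that a caller iterating to a fixed point does not stop
   early when only the whole size was updated.  */
bool
merge_sizes (struct size_pair *slot, size_t val, size_t wholeval,
             bool minimum)
{
  size_t new_size = pick (minimum, val, slot->size);
  size_t new_whole = pick (minimum, wholeval, slot->wholesize);
  bool changed = new_size != slot->size || new_whole != slot->wholesize;

  slot->size = new_size;
  slot->wholesize = new_whole;
  return changed;
}
```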

Jakub



Re: [PATCH v4 2/6] __builtin_dynamic_object_size: Recognize builtin

2021-12-15 Thread Jakub Jelinek via Gcc-patches
On Wed, Dec 01, 2021 at 07:57:53PM +0530, Siddhesh Poyarekar wrote:
> Recognize the __builtin_dynamic_object_size builtin and add paths in the
> object size path to deal with it, but treat it like
> __builtin_object_size for now.  Also add tests to provide the same
> testing coverage for the new builtin name.
> 
> gcc/ChangeLog:
> 
>   * builtins.def (BUILT_IN_DYNAMIC_OBJECT_SIZE): New builtin.
>   * tree-object-size.h: Move object size type bits enum from
>   tree-object-size.c and add new value OST_DYNAMIC.
>   * builtins.c (expand_builtin, fold_builtin_2): Handle it.
>   (fold_builtin_object_size): Handle new builtin and adjust for
>   change to compute_builtin_object_size.
>   * tree-object-size.c: Include builtins.h.
>   (compute_builtin_object_size): Adjust.
>   (early_object_sizes_execute_one,
>   dynamic_object_sizes_execute_one): New functions.
>   (object_sizes_execute): Rename insert_min_max_p argument to
>   early.  Handle BUILT_IN_DYNAMIC_OBJECT_SIZE and call the new
>   functions.
>   doc/extend.texi (__builtin_dynamic_object_size): Document new
>   builtin.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/ext/builtin-dynamic-object-size1.C: New test.
>   * g++.dg/ext/builtin-dynamic-object-size2.C: Likewise.
>   * gcc.dg/builtin-dynamic-alloc-size.c: Likewise.
>   * gcc.dg/builtin-dynamic-object-size-1.c: Likewise.
>   * gcc.dg/builtin-dynamic-object-size-10.c: Likewise.
>   * gcc.dg/builtin-dynamic-object-size-11.c: Likewise.
>   * gcc.dg/builtin-dynamic-object-size-12.c: Likewise.
>   * gcc.dg/builtin-dynamic-object-size-13.c: Likewise.
>   * gcc.dg/builtin-dynamic-object-size-14.c: Likewise.
>   * gcc.dg/builtin-dynamic-object-size-15.c: Likewise.
>   * gcc.dg/builtin-dynamic-object-size-16.c: Likewise.
>   * gcc.dg/builtin-dynamic-object-size-17.c: Likewise.
>   * gcc.dg/builtin-dynamic-object-size-18.c: Likewise.
>   * gcc.dg/builtin-dynamic-object-size-19.c: Likewise.
>   * gcc.dg/builtin-dynamic-object-size-2.c: Likewise.
>   * gcc.dg/builtin-dynamic-object-size-3.c: Likewise.
>   * gcc.dg/builtin-dynamic-object-size-4.c: Likewise.
>   * gcc.dg/builtin-dynamic-object-size-5.c: Likewise.
>   * gcc.dg/builtin-dynamic-object-size-6.c: Likewise.
>   * gcc.dg/builtin-dynamic-object-size-7.c: Likewise.
>   * gcc.dg/builtin-dynamic-object-size-8.c: Likewise.
>   * gcc.dg/builtin-dynamic-object-size-9.c: Likewise.
>   * gcc.dg/builtin-object-size-16.c: Adjust to allow inclusion
>   from builtin-dynamic-object-size-16.c.
>   * gcc.dg/builtin-object-size-17.c: Likewise.

Ok, thanks.
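
As a small illustration of the existing builtin that the new one is treated like for now, the sketch below queries the size of a local buffer with __builtin_object_size. The function name is made up; whether the answer is the exact size or the "unknown" value (size_t) -1 can depend on the compiler and options, so the example does not promise one or the other.

```c
#include <stddef.h>

/* Return what the compiler knows about the size of a 16-byte local
   buffer.  With type 0 the builtin yields (size_t) -1 when the size
   cannot be determined.  */
size_t
local_buffer_size (void)
{
  char buf[16];
  buf[0] = '\0';
  return __builtin_object_size (buf, 0);
}
```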

Jakub



PING [PATCH] enable -Winvalid-memory-order for C++ [PR99612]

2021-12-15 Thread Martin Sebor via Gcc-patches

Ping:
https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586402.html

Besides PR 99612 this also fixes the false positive reported
recently in PR 103372.

On 12/8/21 9:49 AM, Martin Sebor wrote:

Even with -Wno-system-headers enabled, the -Winvalid-memory-order
code tries to make sure calls to atomic functions with invalid
memory orders are diagnosed even though the C atomic functions
are defined as macros in the <stdatomic.h> system header.
The warning triggers at all optimization levels, including -O0.

Independently, the core diagnostic enhancements implemented earlier
this year (the warning group control) enable warnings for functions
defined in system headers that are inlined into user code.  This
was done for similar reason as above: because it's desirable to
diagnose invalid calls made from user code to system functions
(e.g., buffer overflows, invalid or mismatched deallocations,
etc.)

However, the C macro solution interferes with the core diagnostic
changes and prevents the invalid memory model warnings from being
issued for the same problems in C++.  In addition, because C++
atomics are ordinary (inline) functions that call the underlying
__atomic_xxx built-ins, the invalid memory orders can only be
detected with both inlining and constant propagation enabled.

The attached patch removes these limitations and enables
-Winvalid-memory-order to trigger even for C++ std::atomic,
(almost) just like it does in C, at all optimization levels
including -O0.
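
For context, a minimal C sketch of the kind of call the warning is about. The variable and function names are made up for illustration; the invalid call is left in a comment so that the example stays well-defined.

```c
#include <stdatomic.h>

static atomic_int shared_value;

/* Store VALUE with release ordering and read it back with acquire
   ordering.  Both orders are valid for their operations.  */
int
publish_and_read (int value)
{
  atomic_store_explicit (&shared_value, value, memory_order_release);

  /* A load only accepts relaxed, consume, acquire or seq_cst, so
       atomic_load_explicit (&shared_value, memory_order_release);
     is the sort of call -Winvalid-memory-order is meant to diagnose.  */
  return atomic_load_explicit (&shared_value, memory_order_acquire);
}
```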

To make that possible I had to move -Winvalid-memory-order from
builtins.c to a GIMPLE pass where it can use context-sensitive
range info at -O0, instead of relying on constant propagation
(only available at -O1 and above).  (Although the same approach
could be used to emit better object code for C++ atomics at -O0
(i.e., use the right memory order instead of dropping to seq_cst),
this patch doesn't do that.)

In addition to enabling the warning I've also enhanced it to
include the memory models involved in the diagnosed call (both
the problem ones and the viable alternatives).

Tested on x86_64-linux.

Jonathan, I CC you for two reasons: a) because this solution
is based on your (as well as my own) preference for handling
C++ system headers, and because of our discussion last week
of the false positives in std::string resulting from the same
choice there.

I don't expect this change to lead to the same fallout
because it's unlikely for GCC to synthesize invalid memory
orders out of thin air; and b) because the current solution
can only detect the problems in calls to atomic functions at
-O0 that are declared with attribute always_inline.  This
includes member functions defined in the enclosing atomic
class but not namespace-scope functions.  To make
the detection possible those would also have to be
always_inline.  If that's a change you'd like to see I can
look into making it happen.

Martin




Re: [PATCH] ipa: Careful processing ANCESTOR jump functions and NULL pointers (PR 103083)

2021-12-15 Thread Martin Jambor
On Mon, Dec 13 2021, Jan Hubicka wrote:
>> >>> +  || (only_for_nonzero && !src_lats->bits_lattice.known_nonzero_p ()))
>> >>> +{
>> >>> +  if (jfunc->bits)
>> >>> +return dest_lattice->meet_with (jfunc->bits->value,
>> >>> +jfunc->bits->mask, precision);
>> >>> +  else
>> >>> +return dest_lattice->set_to_bottom ();
>> >>> +}
>> >> I do not think you need to set to bottom here. For pointers, we
>> >> primarily track alignment and NULL is aligned - all you need to do is to
>> >> make sure that we do not believe that some bits are 1.
>> >
>> > I am not sure I understand, the testcase you wrote has all bits as zeros
>> > and still miscompiles?  It is primarily used for alignment but not only
>> > for that.
>
> Maybe I misunderstand the code.  But if you have only_for_nonzero and
> you do not know that src lattice is non-zero you will get
>  - if src is 0, then dest is 0
>  - if src is non-zero then dest is src+offset
>

or all known bits of src are zero but then there are unknown masked-out
bits with unknown value and in that case we just don't know.  The patch
does src+offset when there is a known non-zero bit but we can only set
dest to zero if all bits are known.  But as you pointed out, the
constant part of IPA-CP should catch this case, and now does, so I
decided not to handle it here.
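
To make the three cases concrete, here is a simplified standalone sketch of a value/mask encoding of known bits. All names are hypothetical and this is not the IPA-CP code; it assumes that a set mask bit means "unknown" and that value bits are only meaningful where the mask bit is clear.

```c
#include <stdbool.h>
#include <stdint.h>

/* Known-bits record: a 1 in MASK marks an unknown bit; VALUE holds the
   known bits and is 0 wherever MASK is 1.  */
struct known_bits
{
  uint64_t value;
  uint64_t mask;
};

enum source_state { SRC_KNOWN_ZERO, SRC_KNOWN_NONZERO, SRC_UNKNOWN };

/* Classify the source: non-zero if some known bit is 1, zero only if
   every bit is known and 0, otherwise nothing can be said.  */
enum source_state
classify_source (struct known_bits src)
{
  if ((src.value & ~src.mask) != 0)
    return SRC_KNOWN_NONZERO;
  if (src.mask == 0)
    return SRC_KNOWN_ZERO;
  return SRC_UNKNOWN;
}

/* Meet two records: a bit stays known only if it is known in both and
   both agree on its value.  */
struct known_bits
meet (struct known_bits a, struct known_bits b)
{
  struct known_bits r;
  r.mask = a.mask | b.mask | (a.value ^ b.value);
  r.value = a.value & ~r.mask;
  return r;
}
```

Only the SRC_KNOWN_NONZERO case lets the ancestor offset be applied safely; SRC_UNKNOWN is the "we just don't know" case described above.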

If you prefer, I can add a case for when both m_value and m_mask are zero
(well, in theory the latter would only need "precision" zeros?) and, if
so, simply meet the destination lattice with the source one.

Is that what you are asking for?

Thanks,

Martin


[PATCH 00/40] OpenACC "kernels" Improvements

2021-12-15 Thread Frederik Harwath
Hi,
this patch series implements the re-work of the OpenACC "kernels"
implementation that has been announced at the GNU Tools Track of this
year's Linux Plumbers Conference; see
https://linuxplumbersconf.org/event/11/contributions/998/.  Versions
of the patches have also been committed to the devel/omp/gcc-11 branch
recently.

The patch series contains middle-end changes that modify the "kernels"
loop handling to use Graphite for dependence analysis of loops in
"kernels" regions, as well as new optimizations and adjustments to
existing optimizations to support this analysis. A central step is
contained in the commit titled "openacc: Use Graphite for dependence
analysis in \"kernels\" regions" whose commit message also contains
further explanations. There are also front end changes (cf. the
patches by Sandra Loosemore) that prepare the loops in "kernels"
regions for the middle-end processing and which lift various
restrictions on "kernels" regions.  I have included some dependences
(the patches by Julian Brown) from the devel/omp/gcc-11 branch which
will be re-submitted independently for review.

I have bootstrapped the compiler on x86_64-linux-gnu and performed
comprehensive testing on a powerpc64le-linux-gnu target.  The patches
should apply cleanly on commit r12-4865 of the master branch.

I am aware that we cannot incorporate those patches into GCC at the
current development stage. I hope that we can discuss some of the
changes before they can be considered for inclusion in GCC during the
next stage 1.

Best regards,
Frederik


Andrew Stubbs (2):
  openacc: Add data optimization pass
  openacc: Add runtime alias checking for OpenACC kernels

Frederik Harwath (20):
  Fortran: Delinearize array accesses
  openacc: Move pass_oacc_device_lower after pass_graphite
  graphite: Extend SCoP detection dump output
  graphite: Rename isl_id_for_ssa_name
  graphite: Fix minor mistakes in comments
  Move compute_alias_check_pairs to tree-data-ref.c
  graphite: Add runtime alias checking
  openacc: Use Graphite for dependence analysis in "kernels" regions
  openacc: Add "can_be_parallel" flag info to "graph" dumps
  openacc: Remove unused partitioning in "kernels" regions
  Add function for printing a single OMP_CLAUSE
  openacc: Warn about "independent" "kernels" loops with
data-dependences
  openacc: Handle internal function calls in pass_lim
  openacc: Disable pass_pre on outlined functions analyzed by Graphite
  graphite: Tune parameters for OpenACC use
  graphite: Adjust scop loop-nest choice
  graphite: Accept loops without data references
  openacc: Enable reduction variable localization for "kernels"
  openacc: Check type for references in reduction lowering
  openacc: Adjust testsuite to new "kernels" handling

Julian Brown (4):
  Reference reduction localization
  Fix tree check failure with reduction localization
  Use more appropriate var in localize_reductions call
  Handle references in OpenACC "private" clauses

Sandra Loosemore (12):
  Kernels loops annotation: C and C++.
  Add -fno-openacc-kernels-annotate-loops option to more testcases.
  Kernels loops annotation: Fortran.
  Additional Fortran testsuite fixes for kernels loops annotation pass.
  Fix bug in processing of array dimensions in data clauses.
  Add a "combined" flag for "acc kernels loop" etc directives.
  Annotate inner loops in "acc kernels loop" directives (C/C++).
  Annotate inner loops in "acc kernels loop" directives (Fortran).
  Permit calls to builtins and intrinsics in kernels loops.
  Fix patterns in Fortran tests for kernels loop annotation.
  Clean up loop variable extraction in OpenACC kernels loop annotation.
  Relax some restrictions on the loop bound in kernels loop annotation.

Tobias Burnus (2):
  Fix for is_gimple_reg vars to 'data kernels'
  openacc: fix privatization of by-reference arrays

 gcc/Makefile.in   |   2 +
 gcc/c-family/c-common.h   |   1 +
 gcc/c-family/c-omp.c  | 915 +++--
 gcc/c-family/c.opt|   8 +
 gcc/c/c-decl.c|  28 +
 gcc/c/c-parser.c  |   3 +
 gcc/cfgloop.c |   1 +
 gcc/cfgloop.h |   6 +
 gcc/cfgloopmanip.c|   1 +
 gcc/common.opt|   9 +
 gcc/config/nvptx/nvptx.c  |   7 +
 gcc/cp/decl.c |  44 +
 gcc/cp/parser.c   |   3 +
 gcc/cp/semantics.c|   9 +
 gcc/doc/gimple.texi   |   2 +
 gcc/doc/invoke.texi   |  52 +-
 gcc/doc/passes.texi   |   6 +-
 gcc/expr.c|   1 +
 gcc/flag-types.h  |   1 +
 gcc/fortran/gfortran.h|   1 +
 gcc/fortran/lang.opt 

[PATCH 02/40] Add -fno-openacc-kernels-annotate-loops option to more testcases.

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

2020-03-27  Sandra Loosemore  

gcc/testsuite/
* c-c++-common/goacc/kernels-decompose-2.c: Add
-fno-openacc-kernels-annotate-loops.
---
 gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c
index cdf85d4bafae..0f2d2f0a757b 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c
@@ -1,5 +1,6 @@
 /* Test OpenACC 'kernels' construct decomposition.  */

+/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
 /* { dg-additional-options "-fopt-info-omp-all" } */
 /* { dg-additional-options "--param=openacc-kernels=decompose" }
 /* { dg-additional-options "-O2" } for 'parloops'.  */
--
2.33.0

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


[PATCH 01/40] Kernels loops annotation: C and C++.

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

This patch detects loops in kernels regions that are candidates for
parallelization, and adds "#pragma acc loop auto" annotations to them.
This annotation is controlled by the -fopenacc-kernels-annotate-loops
option, which is enabled by default.  -Wopenacc-kernels-annotate-loops
can be used to produce diagnostics about loops that cannot be annotated.
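
As a rough illustration of the kind of loop this is aimed at, consider the sketch below. The function name is made up, and the comment about where the annotation would go reflects my reading of the description above rather than the patch itself. Built without -fopenacc the pragma is simply ignored and the loop runs sequentially.

```c
/* Add two arrays element by element inside an OpenACC "kernels" region.  */
void
vector_add (const int *a, const int *b, int *c, int n)
{
#pragma acc kernels copyin (a[0:n], b[0:n]) copyout (c[0:n])
  {
    /* The loop carries no explicit "acc loop" directive.  A candidate
       loop like this one is where "#pragma acc loop auto" would be
       added when -fopenacc-kernels-annotate-loops is in effect.  */
    for (int i = 0; i < n; i++)
      c[i] = a[i] + b[i];
  }
}
```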

gcc/c-family/
* c-common.h (c_oacc_annotate_loops_in_kernels_regions): Declare.
* c-omp.c: Include tree-iterator.h
(enum annotation_state): New.
(struct annotation_info): New.
(do_not_annotate_loop): New.
(do_not_annotate_loop_nest): New.
(annotation_error): New.
(c_finish_omp_for_internal): Split from c_finish_omp_for.  Use
annotation_error function.  Code refactoring to avoid destructive
changes that cannot be undone in case of error.
(is_local_var): New.
(lang_specific_unwrap_initializer): New.
(annotate_for_loop): New.
(check_and_annotate_for_loop): New.
(annotate_loops_in_kernels_regions): New.
(c_oacc_annotate_loops_in_kernels_regions): New.
* c.opt (Wopenacc-kernels-annotate-loops): New.
(fopenacc-kernels-annotate-loops): New.

gcc/c/
* c-decl.c (c_unwrap_for_init): New.
(finish_function): Call c_oacc_annotate_loops_in_kernels_regions.

gcc/cp/
* decl.c (cp_unwrap_for_init): New.
(finish_function): Call c_oacc_annotate_loops_in_kernels_regions.

gcc/
* doc/invoke.texi (Option Summary): Add entries for
-Wopenacc-kernels-annotate-loops and
-fno-openacc-kernels-annotate-loops.
(Warning Options): Document -Wopenacc-kernels-annotate-loops.
(Optimization Options): Document -fno-openacc-kernels-annotate-loops.

gcc/testsuite/
* c-c++-common/goacc/classify-kernels-unparallelized.c: Add
-fno-openacc-kernels-annotate-loops option.
* c-c++-common/goacc/classify-kernels.c: Likewise.
* c-c++-common/goacc/kernels-counter-var-redundant-load.c: Likewise.
* c-c++-common/goacc/kernels-counter-vars-function-scope.c: Likewise.
* c-c++-common/goacc/kernels-double-reduction.c: Likewise.
* c-c++-common/goacc/kernels-double-reduction-n.c: Likewise.
* c-c++-common/goacc/kernels-loop-2.c: Likewise.
* c-c++-common/goacc/kernels-loop-3.c: Likewise.
* c-c++-common/goacc/kernels-loop-data-2.c: Likewise.
* c-c++-common/goacc/kernels-loop-data-enter-exit-2.c: Likewise.
* c-c++-common/goacc/kernels-loop-data-enter-exit.c: Likewise.
* c-c++-common/goacc/kernels-loop-data-update.c: Likewise.
* c-c++-common/goacc/kernels-loop-data.c: Likewise.
* c-c++-common/goacc/kernels-loop-g.c: Likewise.
* c-c++-common/goacc/kernels-loop-mod-not-zero.c: Likewise.
* c-c++-common/goacc/kernels-loop-n.c: Likewise.
* c-c++-common/goacc/kernels-loop-nest.c: Likewise.
* c-c++-common/goacc/kernels-loop.c: Likewise.
* c-c++-common/goacc/kernels-one-counter-var.c: Likewise.
* c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c:
Likewise.
* c-c++-common/goacc/kernels-reduction.c: Likewise.
* c-c++-common/goacc/kernels-loop-annotation-1.c: New.
* c-c++-common/goacc/kernels-loop-annotation-2.c: New.
* c-c++-common/goacc/kernels-loop-annotation-3.c: New.
* c-c++-common/goacc/kernels-loop-annotation-4.c: New.
* c-c++-common/goacc/kernels-loop-annotation-5.c: New.
* c-c++-common/goacc/kernels-loop-annotation-6.c: New.
* c-c++-common/goacc/kernels-loop-annotation-7.c: New.
* c-c++-common/goacc/kernels-loop-annotation-8.c: New.
* c-c++-common/goacc/kernels-loop-annotation-9.c: New.
* c-c++-common/goacc/kernels-loop-annotation-10.c: New.
* c-c++-common/goacc/kernels-loop-annotation-11.c: New.
* c-c++-common/goacc/kernels-loop-annotation-12.c: New.
* c-c++-common/goacc/kernels-loop-annotation-13.c: New.
* c-c++-common/goacc/kernels-loop-annotation-14.c: New.
* c-c++-common/goacc/kernels-loop-annotation-15.c: New.
* c-c++-common/goacc/kernels-loop-annotation-16.c: New.
* c-c++-common/goacc/kernels-loop-annotation-17.c: New.
---
 gcc/c-family/c-common.h   |   1 +
 gcc/c-family/c-omp.c  | 799 --
 gcc/c-family/c.opt|   8 +
 gcc/c/c-decl.c|  28 +
 gcc/cp/decl.c |  44 +
 gcc/doc/invoke.texi   |  32 +-
 .../goacc/classify-kernels-unparallelized.c   |   1 +
 .../c-c++-common/goacc/classify-kernels.c |   3 +-
 .../kernels-counter-var-redundant-load.c  |   1 +
 .../kernels-counter-vars-function-scope.c |   1 +
 .../goacc/kernels-double-reduction-n.c|   1 +
 .../goacc/kernels-doub

[PATCH 03/40] Kernels loops annotation: Fortran.

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

This patch implements the Fortran support for adding "#pragma acc loop auto"
annotations to loops in OpenACC kernels regions.  It implements the same
-fopenacc-kernels-annotate-loops and -Wopenacc-kernels-annotate-loops options
that were previously added (and documented) for the C/C++ front ends.

Co-Authored-By: Gergö Barany 

gcc/fortran/
* gfortran.h (gfc_oacc_annotate_loops_in_kernels_regions): Declare.
* lang.opt (Wopenacc-kernels-annotate-loops): New.
(fopenacc-kernels-annotate-loops): New.
* openmp.c: Include options.h.
(enum annotation_state, enum annotation_result): New.
(check_code_for_invalid_calls): New.
(check_expr_for_invalid_calls): New.
(check_for_invalid_calls): New.
(annotate_do_loop): New.
(annotate_do_loops_in_kernels): New.
(compute_goto_targets): New.
(gfc_oacc_annotate_loops_in_kernels_regions): New.
* parse.c (gfc_parse_file): Handle -fopenacc-kernels-annotate-loops.

gcc/testsuite/
* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Add
-fno-openacc-kernels-annotate-loops option.
* gfortran.dg/goacc/classify-kernels.f95: Likewise.
* gfortran.dg/goacc/common-block-3.f90: Likewise.
* gfortran.dg/goacc/kernels-loop-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-enter-exit.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-update.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-n.f95: Likewise.
* gfortran.dg/goacc/kernels-loop.f95: Likewise.
* gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95:
Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-1.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-2.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-3.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-4.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-5.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-6.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-7.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-8.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-9.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-10.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-11.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-12.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-13.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-14.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-15.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-16.f95: New.
---
 gcc/fortran/gfortran.h|   1 +
 gcc/fortran/lang.opt  |   8 +
 gcc/fortran/openmp.c  | 364 ++
 gcc/fortran/parse.c   |   9 +
 .../goacc/classify-kernels-unparallelized.f95 |   1 +
 .../gfortran.dg/goacc/classify-kernels.f95|   1 +
 .../gfortran.dg/goacc/common-block-3.f90  |   1 +
 .../gfortran.dg/goacc/kernels-loop-2.f95  |   1 +
 .../goacc/kernels-loop-annotation-1.f95   |  33 ++
 .../goacc/kernels-loop-annotation-10.f95  |  32 ++
 .../goacc/kernels-loop-annotation-11.f95  |  34 ++
 .../goacc/kernels-loop-annotation-12.f95  |  39 ++
 .../goacc/kernels-loop-annotation-13.f95  |  38 ++
 .../goacc/kernels-loop-annotation-14.f95  |  35 ++
 .../goacc/kernels-loop-annotation-15.f95  |  35 ++
 .../goacc/kernels-loop-annotation-16.f95  |  34 ++
 .../goacc/kernels-loop-annotation-2.f95   |  32 ++
 .../goacc/kernels-loop-annotation-3.f95   |  33 ++
 .../goacc/kernels-loop-annotation-4.f95   |  34 ++
 .../goacc/kernels-loop-annotation-5.f95   |  35 ++
 .../goacc/kernels-loop-annotation-6.f95   |  34 ++
 .../goacc/kernels-loop-annotation-7.f95   |  48 +++
 .../goacc/kernels-loop-annotation-8.f95   |  50 +++
 .../goacc/kernels-loop-annotation-9.f95   |  34 ++
 .../gfortran.dg/goacc/kernels-loop-data-2.f95 |   1 +
 .../goacc/kernels-loop-data-enter-exit-2.f95  |   1 +
 .../goacc/kernels-loop-data-enter-exit.f95|   1 +
 .../goacc/kernels-loop-data-update.f95|   1 +
 .../gfortran.dg/goacc/kernels-loop-data.f95   |   1 +
 .../gfortran.dg/goacc/kernels-loop-n.f95  |   1 +
 .../gfortran.dg/goacc/kernels-loop.f95|   1 +
 .../kernels-parallel-loop-data-enter-exit.f95 |   1 +
 32 files changed, 974 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-10.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1

[PATCH 04/40] Additional Fortran testsuite fixes for kernels loops annotation pass.

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

2020-03-27  Sandra Loosemore  

gcc/testsuite/
* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Adjust
line numbering.
* gfortran.dg/goacc/classify-kernels.f95: Likewise.
* gfortran.dg/goacc/kernels-decompose-2.f95: Add
-fno-openacc-kernels-annotate-loops.
---
 .../gfortran.dg/goacc/classify-kernels-unparallelized.f95| 5 +++--
 gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 | 5 +++--
 gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95  | 1 +
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95
index 2ceae2088070..00aac9aa94ea 100644
--- a/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95
@@ -23,8 +23,9 @@ program main

   call setup(a, b)

-  !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1)) ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
-  do i = 0, n - 1
+  !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1))
+  do i = 0, n - 1 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+  ! { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" "" { target *-*-* } 24 }
  c(i) = a(f (i)) + b(f (i))
   end do
   !$acc end kernels
diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95
index d061a241074b..ba815319abf2 100644
--- a/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95
@@ -19,8 +19,9 @@ program main

   call setup(a, b)

-  !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1)) ! { dg-message "optimized: assigned OpenACC gang loop parallelism" }
-  do i = 0, n - 1
+  !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1))
+  do i = 0, n - 1 ! { dg-message "optimized: assigned OpenACC gang loop parallelism" }
+  ! { dg-message "beginning .parloops. part in OpenACC .kernels. region" "" { target *-*-* } 20 }
  c(i) = a(i) + b(i)
   end do
   !$acc end kernels
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95
index 238482b91a49..04c998d11dad 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95
@@ -1,5 +1,6 @@
 ! Test OpenACC 'kernels' construct decomposition.

+! { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
 ! { dg-additional-options "-fopt-info-omp-all" }
 ! { dg-additional-options "--param=openacc-kernels=decompose" }
 ! { dg-additional-options "-O2" } for 'parloops'.
--
2.33.0


[PATCH 05/40] Fix bug in processing of array dimensions in data clauses.

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

The g++ front end wraps the array length and low_bound values in
NON_LVALUE_EXPR, causing the subsequent tests for INTEGER_CST to fail.
The test case c-c++-common/goacc/kernels-loop-annotation-1.c was
tickling this bug and giving bogus errors in g++ because it was falling
through to dynamic array code instead of recognizing the constant bounds.

This patch was posted upstream here
https://gcc.gnu.org/pipermail/gcc-patches/2020-March/542694.html
but not yet committed.  It may be that some other fix for this problem
is implemented on mainline instead; check before merging this patch.
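
To illustrate the failure mode described above, here is a minimal,
self-contained C sketch.  It does not use GCC's real tree API: the
toy_* names, the node layout and the "strip" flag are all assumptions
made up for illustration.  It only models the idea that a constant
hidden behind a wrapper node is not recognized as a constant until the
wrapper is peeled off, which is roughly what STRIP_NOPS does for real
trees.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for GCC tree codes; these are NOT the real definitions.  */
enum toy_code { TOY_INTEGER_CST, TOY_NON_LVALUE_EXPR, TOY_VAR };

struct toy_node
{
  enum toy_code code;
  struct toy_node *operand;  /* Wrapped node for TOY_NON_LVALUE_EXPR.  */
  long value;                /* Payload for TOY_INTEGER_CST.  */
};

/* Peel off wrapper nodes, in the spirit of STRIP_NOPS.  */
struct toy_node *
toy_strip_nops (struct toy_node *t)
{
  while (t != NULL && t->code == TOY_NON_LVALUE_EXPR)
    {
      assert (t->operand != NULL);  /* A wrapper always wraps something.  */
      t = t->operand;
    }
  return t;
}

/* Return nonzero if T is recognized as a constant bound.  With STRIP
   equal to zero this models the behavior before the patch, where the
   wrapper hides the constant.  */
int
toy_is_constant_bound (struct toy_node *t, int strip)
{
  if (strip)
    t = toy_strip_nops (t);
  return t != NULL && t->code == TOY_INTEGER_CST;
}
```

In this model, a constant 10 wrapped in TOY_NON_LVALUE_EXPR is rejected
when strip is 0 and accepted when strip is 1, mirroring the g++ case
that fell through to the dynamic array code.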

2020-03-31  Sandra Loosemore  

gcc/cp/
* semantics.c (handle_omp_array_sections_1): Call STRIP_NOPS
on length and low_bound.
(handle_omp_array_sections): Likewise.
---
 gcc/cp/semantics.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 2443d0327498..c2643d0a7a24 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -5145,6 +5145,10 @@ handle_omp_array_sections_1 (tree c, tree t, vec<tree> &types,
   if (length)
 length = mark_rvalue_use (length);
   /* We need to reduce to real constant-values for checks below.  */
+  if (length)
+STRIP_NOPS (length);
+  if (low_bound)
+STRIP_NOPS (low_bound);
   if (length)
 length = fold_simple (length);
   if (low_bound)
@@ -5457,6 +5461,11 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort)
  tree low_bound = TREE_PURPOSE (t);
  tree length = TREE_VALUE (t);

+ if (length)
+   STRIP_NOPS (length);
+ if (low_bound)
+   STRIP_NOPS (low_bound);
+
  i--;
  if (low_bound
  && TREE_CODE (low_bound) == INTEGER_CST
--
2.33.0



[PATCH 06/40] Add a "combined" flag for "acc kernels loop" etc directives.

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

2020-08-19  Sandra Loosemore  

gcc/
* tree.h (OACC_LOOP_COMBINED): New.

gcc/c/
* c-parser.c (c_parser_oacc_loop): Set OACC_LOOP_COMBINED.

gcc/cp/
* parser.c (cp_parser_oacc_loop): Set OACC_LOOP_COMBINED.

gcc/fortran/
* trans-openmp.c (gfc_trans_omp_do): Add combined parameter,
use it to set OACC_LOOP_COMBINED.  Update all call sites.
---
 gcc/c/c-parser.c   |  3 +++
 gcc/cp/parser.c|  3 +++
 gcc/fortran/trans-openmp.c | 34 +-
 gcc/tree.h |  5 +
 4 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 80dd61d599ef..1258b48693de 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -17371,6 +17371,7 @@ c_parser_oacc_loop (location_t loc, c_parser *parser, char *p_name,
omp_clause_mask mask, tree *cclauses, bool *if_p)
 {
   bool is_parallel = ((mask >> PRAGMA_OACC_CLAUSE_REDUCTION) & 1) == 1;
+  bool is_combined = (cclauses != NULL);

   strcat (p_name, " loop");
   mask |= OACC_LOOP_CLAUSE_MASK;
@@ -17389,6 +17390,8 @@ c_parser_oacc_loop (location_t loc, c_parser *parser, char *p_name,
   tree block = c_begin_compound_stmt (true);
   tree stmt = c_parser_omp_for_loop (loc, parser, OACC_LOOP, clauses, NULL,
 if_p);
+  if (stmt && stmt != error_mark_node)
+OACC_LOOP_COMBINED (stmt) = is_combined;
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 4c2075742d6a..c834d25b028f 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -44580,6 +44580,7 @@ cp_parser_oacc_loop (cp_parser *parser, cp_token *pragma_tok, char *p_name,
 omp_clause_mask mask, tree *cclauses, bool *if_p)
 {
   bool is_parallel = ((mask >> PRAGMA_OACC_CLAUSE_REDUCTION) & 1) == 1;
+  bool is_combined = (cclauses != NULL);

   strcat (p_name, " loop");
   mask |= OACC_LOOP_CLAUSE_MASK;
@@ -44598,6 +44599,8 @@ cp_parser_oacc_loop (cp_parser *parser, cp_token *pragma_tok, char *p_name,
   tree block = begin_omp_structured_block ();
   int save = cp_parser_begin_omp_structured_block (parser);
   tree stmt = cp_parser_omp_for_loop (parser, OACC_LOOP, clauses, NULL, if_p);
+  if (stmt && stmt != error_mark_node)
+OACC_LOOP_COMBINED (stmt) = is_combined;
   cp_parser_end_omp_structured_block (parser, save);
   add_stmt (finish_omp_structured_block (block));

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index e81c5588c53c..618e106791e5 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -4855,7 +4855,8 @@ typedef struct dovar_init_d {

 static tree
 gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
- gfc_omp_clauses *do_clauses, tree par_clauses)
+ gfc_omp_clauses *do_clauses, tree par_clauses,
+ bool combined)
 {
   gfc_se se;
   tree dovar, stmt, from, to, step, type, init, cond, incr, orig_decls;
@@ -5219,7 +5220,10 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
 case EXEC_OMP_DISTRIBUTE: stmt = make_node (OMP_DISTRIBUTE); break;
 case EXEC_OMP_LOOP: stmt = make_node (OMP_LOOP); break;
 case EXEC_OMP_TASKLOOP: stmt = make_node (OMP_TASKLOOP); break;
-case EXEC_OACC_LOOP: stmt = make_node (OACC_LOOP); break;
+case EXEC_OACC_LOOP:
+  stmt = make_node (OACC_LOOP);
+  OACC_LOOP_COMBINED (stmt) = combined;
+  break;
 default: gcc_unreachable ();
 }

@@ -5313,7 +5317,8 @@ gfc_trans_oacc_combined_directive (gfc_code *code)
 pblock = &block;
   else
 pushlevel ();
-  stmt = gfc_trans_omp_do (code, EXEC_OACC_LOOP, pblock, &loop_clauses, NULL);
+  stmt = gfc_trans_omp_do (code, EXEC_OACC_LOOP, pblock, &loop_clauses, NULL,
+  true);
   protected_set_expr_location (stmt, loc);
   if (TREE_CODE (stmt) != BIND_EXPR)
 stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0));
@@ -6151,7 +6156,7 @@ gfc_trans_omp_do_simd (gfc_code *code, stmtblock_t *pblock,
 omp_do_clauses
   = gfc_trans_omp_clauses (&block, &clausesa[GFC_OMP_SPLIT_DO], code->loc);
   body = gfc_trans_omp_do (code, EXEC_OMP_SIMD, pblock ? pblock : &block,
-  &clausesa[GFC_OMP_SPLIT_SIMD], omp_clauses);
+  &clausesa[GFC_OMP_SPLIT_SIMD], omp_clauses, false);
   if (pblock == NULL)
 {
   if (TREE_CODE (body) != BIND_EXPR)
@@ -6209,7 +6214,7 @@ gfc_trans_omp_parallel_do (gfc_code *code, bool is_loop, stmtblock_t *pblock,
 }
   stmt = gfc_trans_omp_do (code, is_loop ? EXEC_OMP_LOOP : EXEC_OMP_DO,
   new_pblock, &clausesa[GFC_OMP_SPLIT_DO],
-  omp_clauses);
+  omp_clauses, false);
   if (pblock == NULL)
 {
   if (TREE_CODE (stmt) != BIND_EXPR)
@@ -6496,7 +6501,8 @@ 

[PATCH 07/40] Annotate inner loops in "acc kernels loop" directives (C/C++).

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

Normally explicit loop directives in a kernels region inhibit
automatic annotation of other loops in the same nest, on the theory
that users have indicated they want manual control over that section
of code.  However, there seems to be an expectation in user code that
the combined "kernels loop" directive should still allow annotation of
inner loops.  This patch implements this behavior for C and C++.
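
As a rough illustration of the kind of user code this targets, the
sketch below uses the combined directive on the outer loop and leaves
the inner loop without any explicit "acc loop".  The function name and
the array layout are hypothetical and not taken from the new test
cases.  Without -fopenacc the pragma is simply ignored, so the function
also builds and runs as plain C.

```c
#include <assert.h>

/* Hypothetical example: double every element of an N x M matrix B,
   stored row-major, into A.  */
void
scale_rows (int n, int m, float *a, const float *b)
{
  int i, j;

  assert (n >= 0 && m >= 0);

#pragma acc kernels loop
  for (i = 0; i < n; i++)
    /* No explicit "acc loop" here.  Per the description above, this
       inner loop remains a candidate for automatic "acc loop auto"
       annotation because the outer directive is the combined form.  */
    for (j = 0; j < m; j++)
      a[i * m + j] = 2.0f * b[i * m + j];
}
```

Had the outer loop been written as a separate "#pragma acc loop" inside
a kernels region instead, the inner loop would be left alone, as
before.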

2020-08-19  Sandra Loosemore  

gcc/c-family/
* c-omp.c (annotate_loops_in_kernels_regions): Process inner
loops in combined "acc kernels loop" directives.

gcc/testsuite/
* c-c++-common/goacc/kernels-loop-annotation-18.c: New.
* c-c++-common/goacc/kernels-loop-annotation-19.c: New.
* c-c++-common/goacc/combined-directives.c: Adjust expected
patterns.
---
 gcc/c-family/c-omp.c  | 36 ---
 .../c-c++-common/goacc/combined-directives.c  |  2 +-
 .../goacc/kernels-loop-annotation-18.c| 18 ++
 .../goacc/kernels-loop-annotation-19.c| 19 ++
 4 files changed, 62 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-18.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-19.c

diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c
index fad50da8fbc4..30757877eafe 100644
--- a/gcc/c-family/c-omp.c
+++ b/gcc/c-family/c-omp.c
@@ -3477,18 +3477,30 @@ annotate_loops_in_kernels_regions (tree *nodeptr, int *walk_subtrees,
   /* Do not try to add automatic OpenACC annotations inside manually
 annotated loops.  Presumably, the user avoided doing it on
 purpose; for example, all available levels of parallelism may
-have been used up.  */
-  {
-   struct annotation_info nested_info
- = { NULL_TREE, NULL_TREE, false, as_explicit_annotation,
- node, info };
-   if (info->state >= as_in_kernels_region)
- do_not_annotate_loop_nest (info, as_explicit_annotation,
-node);
-   walk_tree (&OMP_BODY (node), annotate_loops_in_kernels_regions,
-  (void *) &nested_info, NULL);
-   *walk_subtrees = 0;
-  }
+have been used up.  However, assume that the combined construct
+"#pragma acc kernels loop" means to try to process the whole
+loop nest.
+Note that a single OACC_LOOP construct represents an entire set
+of collapsed loops so we do not have to deal explicitly with the
+collapse clause here, as the Fortran front end does.  */
+  if (info->state == as_in_kernels_region && OACC_LOOP_COMBINED (node))
+   {
+ walk_tree (&OMP_BODY (node), annotate_loops_in_kernels_regions,
+(void *) info, NULL);
+ *walk_subtrees = 0;
+   }
+  else
+   {
+ struct annotation_info nested_info
+   = { NULL_TREE, NULL_TREE, false, as_explicit_annotation,
+   node, info };
+ if (info->state >= as_in_kernels_region)
+   do_not_annotate_loop_nest (info, as_explicit_annotation,
+  node);
+ walk_tree (&OMP_BODY (node), annotate_loops_in_kernels_regions,
+(void *) &nested_info, NULL);
+ *walk_subtrees = 0;
+   }
   break;

 case FOR_STMT:
diff --git a/gcc/testsuite/c-c++-common/goacc/combined-directives.c b/gcc/testsuite/c-c++-common/goacc/combined-directives.c
index c2a3c57b48b8..2519f23d49f0 100644
--- a/gcc/testsuite/c-c++-common/goacc/combined-directives.c
+++ b/gcc/testsuite/c-c++-common/goacc/combined-directives.c
@@ -110,7 +110,7 @@ test ()
 // { dg-final { scan-tree-dump-times "acc loop worker" 2 "gimple" } }
 // { dg-final { scan-tree-dump-times "acc loop vector" 2 "gimple" } }
 // { dg-final { scan-tree-dump-times "acc loop seq" 2 "gimple" } }
-// { dg-final { scan-tree-dump-times "acc loop auto" 2 "gimple" } }
+// { dg-final { scan-tree-dump-times "acc loop auto" 6 "gimple" } }
 // { dg-final { scan-tree-dump-times "acc loop tile.2, 3" 2 "gimple" } }
 // { dg-final { scan-tree-dump-times "acc loop independent private.i" 2 "gimple" } }
 // { dg-final { scan-tree-dump-times "private.z" 2 "gimple" } }
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-18.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-18.c
new file mode 100644
index 000000000000..89ec6447625f
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-18.c
@@ -0,0 +1,18 @@
+/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-fdump-tree-original" } */
+/* { dg-do compile } */
+
+/* Test that "acc kernels loop" directive causes annotation of the entire
+   loop nest.  */
+
+void f (float *a, float *b)
+{
+#pragma acc kernels loop
+

[PATCH 08/40] Annotate inner loops in "acc kernels loop" directives (Fortran).

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

Normally explicit loop directives in a kernels region inhibit
automatic annotation of other loops in the same nest, on the theory
that users have indicated they want manual control over that section
of code.  However, there seems to be an expectation in user code that
the combined "kernels loop" directive should still allow annotation of
inner loops.  This patch implements this behavior in Fortran.

2020-08-19  Sandra Loosemore  

gcc/fortran/
* openmp.c (annotate_do_loops_in_kernels): Handle
EXEC_OACC_KERNELS_LOOP separately to permit annotation of inner
loops in a combined "acc kernels loop" directive.

gcc/testsuite/
* gfortran.dg/goacc/kernels-loop-annotation-18.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-19.f95: New.
* gfortran.dg/goacc/combined-directives.f90: Adjust expected
patterns.
* gfortran.dg/goacc/private-explicit-kernels-1.f95: Likewise.
* gfortran.dg/goacc/private-predetermined-kernels-1.f95:
Likewise.
---
 gcc/fortran/openmp.c  | 50 ++-
 .../gfortran.dg/goacc/combined-directives.f90 | 19 +--
 .../goacc/kernels-loop-annotation-18.f95  | 28 +++
 .../goacc/kernels-loop-annotation-19.f95  | 29 +++
 .../goacc/private-explicit-kernels-1.f95  |  7 ++-
 .../goacc/private-predetermined-kernels-1.f95 |  7 ++-
 6 files changed, 131 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-18.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-19.f95

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 243b5e0a9ac6..b0b68b494778 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -9272,7 +9272,6 @@ annotate_do_loops_in_kernels (gfc_code *code, gfc_code *parent,

case EXEC_OACC_PARALLEL_LOOP:
case EXEC_OACC_PARALLEL:
-   case EXEC_OACC_KERNELS_LOOP:
case EXEC_OACC_LOOP:
  /* Do not try to add automatic OpenACC annotations inside manually
 annotated loops.  Presumably, the user avoided doing it on
@@ -9317,6 +9316,55 @@ annotate_do_loops_in_kernels (gfc_code *code, gfc_code *parent,
}
  break;

+   case EXEC_OACC_KERNELS_LOOP:
+ /* This is a combined "acc kernels loop" directive.  We want to
+leave the outer loop alone but try to annotate any nested
+loops in the body.  The expected structure nesting here is
+  EXEC_OACC_KERNELS_LOOP
+EXEC_OACC_KERNELS_LOOP
+  EXEC_DO
+EXEC_DO
+  ...body...  */
+ if (code->block)
+   /* Might be empty?  */
+   {
+ gcc_assert (code->block->op == EXEC_OACC_KERNELS_LOOP);
+ gfc_omp_clauses *clauses = code->ext.omp_clauses;
+ int collapse = clauses->collapse;
+ gfc_expr_list *tile = clauses->tile_list;
+ gfc_code *inner = code->block->next;
+
+ gcc_assert (inner->op == EXEC_DO);
+ gcc_assert (inner->block->op == EXEC_DO);
+
+ /* We need to skip over nested loops covered by "collapse" or
+"tile" clauses.  "Tile" takes precedence
+(see gfc_trans_omp_do).  */
+ if (tile)
+   {
+ collapse = 0;
+ for (gfc_expr_list *el = tile; el; el = el->next)
+   collapse++;
+   }
+ if (clauses->orderedc)
+   collapse = clauses->orderedc;
+ if (collapse <= 0)
+   collapse = 1;
+ for (int i = 1; i < collapse; i++)
+   {
+ gcc_assert (inner->op == EXEC_DO);
+ gcc_assert (inner->block->op == EXEC_DO);
+ inner = inner->block->next;
+   }
+ if (inner)
+   /* Loop might have empty body?  */
+   annotate_do_loops_in_kernels (inner->block->next,
+ inner, goto_targets,
+ as_in_kernels_region);
+   }
+ walk_block = false;
+ break;
+
case EXEC_DO_WHILE:
case EXEC_DO_CONCURRENT:
  /* Traverse the body in a special state to allow EXIT statements
diff --git a/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90 b/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
index 956349204f4d..562a4e40cd7d 100644
--- a/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
@@ -139,10 +139,21 @@ end subroutine test

 ! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. collapse.2." 2 "gimple" } }
 ! { dg-final { scan-tree-dump-times "acc loop private.i. gang" 2 "gimple" } }
-! { dg-final { scan-tree-dump-times "acc loo

[PATCH 09/40] Permit calls to builtins and intrinsics in kernels loops.

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

This tweak to the OpenACC kernels loop annotation relaxes the
restrictions on function calls in the loop body.  Normally calls to
functions not explicitly marked with a parallelism attribute are not
permitted, but C/C++ builtins and Fortran intrinsics have known
semantics so we can generally permit those without restriction.  If
any turn out to be problematic, we can add code to recognize them,
either here or in the processing of the "auto" annotations.

2020-08-22  Sandra Loosemore  

gcc/c-family/
* c-omp.c (annotate_loops_in_kernels_regions): Test for
calls to builtins.

gcc/fortran/
* openmp.c (check_expr_for_invalid_calls): Check for intrinsic
functions.

gcc/testsuite/
* c-c++-common/goacc/kernels-loop-annotation-20.c: New.
* gfortran.dg/goacc/kernels-loop-annotation-20.f95: New.
---
 gcc/c-family/c-omp.c  | 10 ---
 gcc/fortran/openmp.c  |  9 ---
 .../goacc/kernels-loop-annotation-20.c| 23 
 .../goacc/kernels-loop-annotation-20.f95  | 26 +++
 4 files changed, 61 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-20.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-20.f95

diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c
index 30757877eafe..e7c27f45e888 100644
--- a/gcc/c-family/c-omp.c
+++ b/gcc/c-family/c-omp.c
@@ -3545,8 +3545,9 @@ annotate_loops_in_kernels_regions (tree *nodeptr, int *walk_subtrees,
   break;

 case CALL_EXPR:
-  /* Direct function calls to functions marked as OpenACC routines are
-allowed.  Reject indirect calls or calls to non-routines.  */
+  /* Direct function calls to builtins and functions marked as
+OpenACC routines are allowed.  Reject indirect calls or calls
+to non-routines.  */
   if (info->state >= as_in_kernels_loop)
{
  tree fn = CALL_EXPR_FN (node), fn_decl = NULL_TREE;
@@ -3560,8 +3561,9 @@ annotate_loops_in_kernels_regions (tree *nodeptr, int *walk_subtrees,
}
  if (fn_decl == NULL_TREE)
do_not_annotate_loop_nest (info, as_invalid_call, node);
- else if (!lookup_attribute ("oacc function",
- DECL_ATTRIBUTES (fn_decl)))
+ else if (!fndecl_built_in_p (fn_decl, BUILT_IN_NORMAL)
+  && !lookup_attribute ("oacc function",
+DECL_ATTRIBUTES (fn_decl)))
do_not_annotate_loop_nest (info, as_invalid_call, node);
}
   break;
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index b0b68b494778..d5d996e378d7 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -9156,9 +9156,12 @@ check_expr_for_invalid_calls (gfc_expr **exprp, int *walk_subtrees,
   switch (expr->expr_type)
 {
 case EXPR_FUNCTION:
-  if (expr->value.function.esym
- && (expr->value.function.esym->attr.oacc_routine_lop
- != OACC_ROUTINE_LOP_NONE))
+  /* Permit calls to Fortran intrinsic functions and to routines
+with an explicitly declared parallelism level.  */
+  if (expr->value.function.isym
+ || (expr->value.function.esym
+ && (expr->value.function.esym->attr.oacc_routine_lop
+ != OACC_ROUTINE_LOP_NONE)))
return 0;
   /* Else fall through.  */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-20.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-20.c
new file mode 100644
index 000000000000..5e3f02845713
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-20.c
@@ -0,0 +1,23 @@
+/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-fdump-tree-original" } */
+/* { dg-do compile } */
+
+/* Test that calls to built-in functions don't inhibit kernels loop
+   annotation.  */
+
+void foo (int n, int *input, int *out1, int *out2)
+{
+#pragma acc kernels
+  {
+int i;
+
+for (i = 0; i < n; i++)
+  {
+   out1[i] = __builtin_clz (input[i]);
+   out2[i] = __builtin_popcount (input[i]);
+  }
+  }
+}
+
+/* { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } } */
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-20.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-20.f95
new file mode 100644
index 000000000000..5169a0a1676d
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-20.f95
@@ -0,0 +1,26 @@
+! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-Wopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-do compile }
+
+! Test that a loop with calls to intri

[PATCH 10/40] Fix patterns in Fortran tests for kernels loop annotation.

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

Several of the Fortran tests for kernels loop annotation were failing
due to changes in the formatting of "acc loop" constructs in the dump
file.  Now the "auto" clause appears first, instead of after "private".

2020-08-23   Sandra Loosemore  

gcc/testsuite/
* gfortran.dg/goacc/kernels-loop-annotation-1.f95: Update
expected output.
* gfortran.dg/goacc/kernels-loop-annotation-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-3.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-4.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-5.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-6.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-7.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-8.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-11.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-12.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-13.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-14.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-15.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-16.f95: Likewise.
---
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-15.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-16.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-2.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-3.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-4.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-5.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-6.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-7.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-8.f95  | 2 +-
 14 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95
index 41f6307dbb17..42e751dbfb83 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95
@@ -30,4 +30,4 @@ subroutine f (a, b, c)
 !$acc end kernels
 end subroutine f

-! { dg-final { scan-tree-dump-times "acc loop private\\(.\\) auto" 3 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 3 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95
index d51482e4685d..6e2e2c41172b 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95
@@ -31,4 +31,4 @@ function f (a, b)

 end function f

-! { dg-final { scan-tree-dump-times "acc loop private.* auto" 0 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95
index 3c4956d70775..03c4234ce7cd 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95
@@ -36,4 +36,4 @@ function f (a, b)

 end function f

-! { dg-final { scan-tree-dump-times "acc loop private.* auto" 1 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95
index 3ec459f0a8df..6aeb3f2fe4d0 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95
@@ -35,4 +35,4 @@ function f (a, b)

 end function f

-! { dg-final { scan-tree-dump-times "acc loop private.* auto" 1 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95
index 91f431cca432..7d1cff64a3d9 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95
@@ -32,4 +32,4 @@ function f (a, b)

 end function f

-! { dg-final { scan-tree-dump-times "acc loop private.* auto" 0 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } }
diff --git a

[PATCH 11/40] Clean up loop variable extraction in OpenACC kernels loop annotation.

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

The code for identifying annotatable loops in OpenACC kernels regions
previously looked for the loop variable as the left-hand side of the
comparison in the loop end test.  However, front end optimizations
sometimes switch the sense of the comparison, making this method
unreliable.  In particular, it's ambiguous when both operands to the
end test comparison are local variables.

This patch reorders the loop processing to identify the loop variable
from the initializer, rather than the end test. The processing of the
end test then just checks that one of the operands to the comparison
matches the variable appearing in the initializer.  Much of the patch
is code refactoring, moving the initializer analysis out of
annotate_for_loop to check_and_annotate_for_loop so it can be
performed earlier.
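
The idea can be modelled outside the compiler with a small sketch.
Everything here is hypothetical: struct toy_cond and toy_loop_bound are
illustrative names, and operands are represented as plain strings
rather than trees.  The variable name is assumed to have been taken
from the initializer already, and the end test is then only checked for
containing that variable on either side.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of an end test "LHS <op> RHS"; operands are just names.  */
struct toy_cond
{
  const char *lhs;
  const char *rhs;
};

/* INIT_VAR is the variable found in the loop initializer.  Return the
   other operand of COND (the loop bound) if INIT_VAR appears on either
   side of the comparison, or NULL if it does not appear at all.  */
const char *
toy_loop_bound (const char *init_var, const struct toy_cond *cond)
{
  assert (init_var != NULL && cond != NULL);

  if (strcmp (cond->lhs, init_var) == 0)
    return cond->rhs;
  if (strcmp (cond->rhs, init_var) == 0)
    return cond->lhs;   /* Comparison was swapped, e.g. "n > i".  */
  return NULL;
}
```

With the initializer "i = 0", both "i < n" and the swapped form "n > i"
yield the bound "n" in this model, whereas a method that always takes
the left-hand operand as the loop variable would pick "n" in the second
case.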

2020-08-30  Sandra Loosemore  

gcc/c-family/
* c-omp.c (annotate_for_loop): Move initializer processing...
(check_and_annotate_for_loop): ... to here.  Allow the loop
variable as either operand to the condition.
---
 gcc/c-family/c-omp.c | 196 +--
 1 file changed, 98 insertions(+), 98 deletions(-)

diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c
index e7c27f45e888..e73fb5d01f7e 100644
--- a/gcc/c-family/c-omp.c
+++ b/gcc/c-family/c-omp.c
@@ -3174,86 +3174,26 @@ static tree (*lang_specific_unwrap_initializer) (tree);

 /* Try to annotate the given NODE, which must be a FOR_STMT, with a
"#pragma acc loop auto" annotation.  In practice, this means
-   building an OMP_FOR node for it.  PREV_STMT is the statement
-   immediately before the loop, which may be used as the loop's
-   initialization statement.  Annotating the loop may fail, in which
-   case INFO is used to record the cause of the failure and the
-   original loop remains unchanged.  This function returns the
-   transformed loop if the transformation succeeded, the original node
-   otherwise.  */
+   building an OMP_FOR node for it.  DECL and INIT are the
+   previously-verified iteration variable and initializer.  Annotating
+   the loop may fail, in which case INFO is used to record the cause
+   of the failure and the original loop remains unchanged.  This
+   function returns the transformed loop if the transformation
+   succeeded, the original node otherwise.  */

 static tree
-annotate_for_loop (tree node, tree_stmt_iterator *prev_tsi,
+annotate_for_loop (tree node, tree decl, tree init,
   struct annotation_info *info)
 {
   gcc_checking_assert (TREE_CODE (node) == FOR_STMT);

   location_t loc = EXPR_LOCATION (node);
   tree cond = FOR_COND (node);
+  tree incr = FOR_EXPR (node);
+
+  gcc_assert (decl);
   gcc_assert (cond);
-  tree decl = TREE_OPERAND (cond, 0);
   gcc_assert (decl && TREE_CODE (decl) == VAR_DECL);
-  tree init = FOR_INIT_STMT (node);
-  tree prev_stmt = NULL_TREE;
-  bool unlink_prev = false;
-  bool fix_decl = false;
-
-
-  /* Both the C and C++ front ends normally put the initializer in the
- statement list just before the FOR_STMT instead of in FOR_INIT_STMT.
- If FOR_INIT_STMT happens to exist but isn't a MODIFY_EXPR, bail out
- because the code below won't handle it.  */
-  if (init != NULL_TREE && TREE_CODE (init) != MODIFY_EXPR)
-{
-  do_not_annotate_loop (info, as_invalid_initializer, NULL_TREE);
-  return node;
-}
-
-  /* Examine the statement before the loop to see if it is a
- valid initializer.  It must be either a MODIFY_EXPR or VAR_DECL,
- possibly wrapped in language-specific structure.  */
-  if (init == NULL_TREE && prev_tsi != NULL)
-{
-  prev_stmt = tsi_stmt (*prev_tsi);
-
-  /* Call the language-specific hook to unwrap prev_stmt.  */
-  if (prev_stmt)
-   prev_stmt = (*lang_specific_unwrap_initializer) (prev_stmt);
-
-  /* See if we have a valid MODIFY_EXPR.  */
-  if (prev_stmt
- && TREE_CODE (prev_stmt) == MODIFY_EXPR
- && TREE_OPERAND (prev_stmt, 0) == decl
- && !TREE_SIDE_EFFECTS (TREE_OPERAND (prev_stmt, 1)))
-   {
- init = prev_stmt;
- unlink_prev = true;
-   }
-  else if (prev_stmt == decl
-  && !TREE_SIDE_EFFECTS (DECL_INITIAL (decl)))
-   {
- /* If the preceding statement is the declaration of the loop
-variable with its initialization, build an assignment
-expression for the loop's initializer.  */
- init = build2 (MODIFY_EXPR, TREE_TYPE (decl), decl,
-DECL_INITIAL (decl));
- /* We need to remove the initializer from the decl if we
-end up using the init we just built instead.  */
- fix_decl = true;
-   }
-}
-
-  if (init == NULL_TREE)
-/* There is nothing we can do to find the correct init statement for
-   this loop, but c_finish_omp_for insists on having one and would fail
-   otherwise.  In that case, we would just return node.  Do th

[PATCH 12/40] Relax some restrictions on the loop bound in kernels loop annotation.

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

OpenACC loop semantics require that the loop bound be computable
before entering the loop, rather than the C/C++ semantics where the
end test is evaluated on every iteration.  Formerly the kernels loop
annotator permitted only constants and variables not modified in the
loop body in the loop bound expression.  This patch relaxes those
restrictions somewhat to allow many forms of expressions involving
such constants and variables, including calls to constant functions.
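
The difference between the two semantics can be seen in plain C.  The
following sketch is only an illustration (the function names are made
up): the first function re-evaluates the end test on every iteration,
as C requires, while the second models a bound that is computed once
before the loop is entered.

```c
#include <assert.h>

/* C/C++ semantics: the end test "i < n" is re-evaluated on every
   iteration, so a store to n in the body changes the trip count.  */
int
trips_reevaluated (int n)
{
  int count = 0;

  assert (n >= 0);
  for (int i = 0; i < n; i++)
    {
      if (i == 2)
        n = 4;
      count++;
    }
  return count;
}

/* Model of a precomputed bound: the value of n is captured once before
   the loop, so the store in the body has no effect on the trip count.  */
int
trips_hoisted (int n)
{
  const int bound = n;
  int count = 0;

  assert (n >= 0);
  for (int i = 0; i < bound; i++)
    {
      if (i == 2)
        n = 4;
      count++;
    }
  (void) n;
  return count;
}
```

Called with 10, the first function runs 4 iterations and the second
runs 10.  The two only agree when the bound expression is not modified
in the loop body, which is why the annotator has to check for that
before hoisting.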

2020-08-30  Sandra Loosemore  

gcc/c-family/
* c-omp.c (end_test_ok_for_annotation_r): New.
(end_test_ok_for_annotation): New.
(check_and_annotate_for_loop): Use the new helper function.

gcc/testsuite/
* c-c++-common/goacc/kernels-loop-annotation-21.c: New.
* c-c++-common/goacc/kernels-loop-annotation-22.c: New.
---
 gcc/c-family/c-omp.c  | 120 --
 .../goacc/kernels-loop-annotation-21.c|  42 ++
 .../goacc/kernels-loop-annotation-22.c|  41 ++
 3 files changed, 194 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-21.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-22.c

diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c
index e73fb5d01f7e..dc63d304ca67 100644
--- a/gcc/c-family/c-omp.c
+++ b/gcc/c-family/c-omp.c
@@ -3165,6 +3165,116 @@ is_local_var (tree decl)
  && !TREE_ADDRESSABLE (decl));
 }

+/* EXP is a loop bound expression for a comparison against local
+   variable DECL.  Check whether this is potentially valid in an OpenACC loop
+   context, namely that it can be precomputed when entering the loop
+   construct per the OpenACC specification.  Local variables referenced
+   in both DECL and EXP that may not be modified in the body of the loop
+   are added to the list in INFO to be checked later.
+
+   FIXME: Ideally we would like to make this test permissive rather than
+   restrictive, and allow the later conversion of the "auto" attribute to
+   either "seq" or "independent" to make the determination using dataflow,
+   alias analysis, etc rather than a tree traversal.  But presently it does
+   not do that and always just hoists the loop bound expression.  So the
+   current implementation only considers expressions involving unmodified
+   local variables and constants, using a tree walk.  */
+
+static tree
+end_test_ok_for_annotation_r (tree *tp, int *walk_subtrees,
+ void *data)
+{
+  tree exp = *tp;
+  struct annotation_info *info = (struct annotation_info *) data;
+
+  switch (TREE_CODE_CLASS (TREE_CODE (exp)))
+{
+case tcc_constant:
+  /* Constants are trivially known to be invariant.  */
+  return NULL_TREE;
+
+case tcc_declaration:
+  if (is_local_var (exp))
+   {
+ tree t;
+ /* Add it to the list of variables that can't be modified in the
+loop, only if not already present.  */
+ for (t = info->vars; t && TREE_VALUE (t) != exp;
+  t = TREE_CHAIN (t))
+   ;
+ if (!t)
+   info->vars = tree_cons (NULL_TREE, exp, info->vars);
+ return NULL_TREE;
+   }
+  else if (TREE_CODE (exp) == VAR_DECL && TREE_READONLY (exp))
+   return NULL_TREE;
+  else if (TREE_CODE (exp) == FUNCTION_DECL)
+   return NULL_TREE;
+  break;
+
+case tcc_unary:
+case tcc_binary:
+case tcc_comparison:
+  /* Allow arithmetic expressions and comparisons provided
+that the operands are good.  */
+  return NULL_TREE;
+
+default:
+  /* Handle some special cases.  */
+  switch (TREE_CODE (exp))
+   {
+   case COND_EXPR:
+   case TRUTH_ANDIF_EXPR:
+   case TRUTH_ORIF_EXPR:
+   case TRUTH_AND_EXPR:
+   case TRUTH_OR_EXPR:
+   case TRUTH_XOR_EXPR:
+   case TRUTH_NOT_EXPR:
+ /* ?: and boolean operators are OK.  */
+ return NULL_TREE;
+
+   case CALL_EXPR:
+ /* Allow calls to constant functions with invariant operands.  */
+ {
+   tree fndecl = get_callee_fndecl (exp);
+   if (fndecl && TREE_READONLY (fndecl))
+ return NULL_TREE;
+ }
+ break;
+
+   case ADDR_EXPR:
+ /* We can expect addresses of things to be invariant.  */
+ return NULL_TREE;
+
+   default:
+ break;
+   }
+}
+
+  /* Reject anything else.  */
+  *walk_subtrees = 0;
+  return exp;
+}
+
+static bool
+end_test_ok_for_annotation (tree decl, tree exp,
+   struct annotation_info *info)
+{
+  /* Traversal returns NULL_TREE if all is well.  */
+  if (!walk_tree (&exp, end_test_ok_for_annotation_r, info, NULL))
+{
+  /* So far, so good.  Check the decl against any variables collected
+in the exp.  */
+  tree t;
+  for (t = info->vars; t; t = TREE_CHAIN (t))
+   if (TREE

[PATCH 15/40] graphite: Extend SCoP detection dump output

2021-12-15 Thread Frederik Harwath
Extend the dump output to make it easier for GCC developers to
understand why Graphite refuses to include a loop in a SCoP.

ChangeLog:

* graphite-scop-detection.c (scop_detection::can_represent_loop):
Output reason for failure to dump file.
(scop_detection::harmful_loop_in_region): Likewise.
(scop_detection::graphite_can_represent_expr): Likewise.
(scop_detection::stmt_has_simple_data_refs_p): Likewise.
(scop_detection::stmt_simple_for_scop_p): Likewise.
(print_sese_loop_numbers): New function.
(scop_detection::add_scop): Use from here to print loops in
rejected SCoP.
---
 gcc/graphite-scop-detection.c | 188 +-
 1 file changed, 165 insertions(+), 23 deletions(-)

diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index 3e729b159b09..46c470210d05 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -69,12 +69,27 @@ public:
 fprintf (output.dump_file, "%d", i);
 return output;
   }
+
   friend debug_printer &
   operator<< (debug_printer &output, const char *s)
   {
 fprintf (output.dump_file, "%s", s);
 return output;
   }
+
+  friend debug_printer &
+  operator<< (debug_printer &output, gimple* stmt)
+  {
+print_gimple_stmt (output.dump_file, stmt, 0, TDF_VOPS | TDF_MEMSYMS);
+return output;
+  }
+
+  friend debug_printer &
+  operator<< (debug_printer &output, tree t)
+  {
+print_generic_expr (output.dump_file, t, TDF_SLIM);
+return output;
+  }
 } dp;

 #define DEBUG_PRINT(args) do \
@@ -506,6 +521,24 @@ scop_detection::merge_sese (sese_l first, sese_l second) const
   return combined;
 }

+/* Print the loop numbers of the loops contained
+   in SESE to FILE. */
+
+static void
+print_sese_loop_numbers (FILE *file, sese_l sese)
+{
+  loop_p loop;
+  bool printed = false;
+  FOR_EACH_LOOP (loop, 0)
+  {
+if (loop_in_sese_p (loop, sese))
+  fprintf (file, "%d, ", loop->num);
+printed = true;
+  }
+  if (printed)
+fprintf (file, "\b\b");
+}
+
 /* Build scop outer->inner if possible.  */

 void
@@ -519,8 +552,13 @@ scop_detection::build_scop_depth (loop_p loop)
   if (! next
  || harmful_loop_in_region (next))
{
- if (s)
-   add_scop (s);
+  if (next)
+DEBUG_PRINT (
+dp << "[scop-detection] Discarding SCoP on loops ";
+print_sese_loop_numbers (dump_file, next);
+dp << " because of harmful loops\n";);
+  if (s)
+add_scop (s);
  build_scop_depth (loop);
  s = invalid_sese;
}
@@ -560,14 +598,62 @@ scop_detection::can_represent_loop (loop_p loop, sese_l scop)
   || !single_pred_p (loop->latch)
   || exit->src != single_pred (loop->latch)
   || !empty_block_p (loop->latch))
-return false;
+{
+  DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop shape unsupported.\n");
+  return false;
+}
+
+  bool edge_irreducible
+  = loop_preheader_edge (loop)->flags & EDGE_IRREDUCIBLE_LOOP;
+  if (edge_irreducible)
+{
+  DEBUG_PRINT (
+  dp << "[can_represent_loop-fail] Loop is not a natural loop.\n");
+  return false;
+}
+
+  bool niter_is_unconditional = number_of_iterations_exit (loop,
+  single_exit (loop),
+  &niter_desc, false);

-  return !(loop_preheader_edge (loop)->flags & EDGE_IRREDUCIBLE_LOOP)
-&& number_of_iterations_exit (loop, single_exit (loop), &niter_desc, false)
-&& niter_desc.control.no_overflow
-&& (niter = number_of_latch_executions (loop))
-&& !chrec_contains_undetermined (niter)
-&& graphite_can_represent_expr (scop, loop, niter);
+  if (!niter_is_unconditional)
+{
+  DEBUG_PRINT (
+  dp << "[can_represent_loop-fail] Loop niter not unconditional.\n"
+ << "Condition: " << niter_desc.assumptions << "\n");
+  return false;
+}
+
+  niter = number_of_latch_executions (loop);
+  if (!niter)
+{
+  DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop niter unknown.\n");
+  return false;
+}
+  if (!niter_desc.control.no_overflow)
+{
+  DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop niter can overflow.\n");
+  return false;
+}
+
+  bool undetermined_coefficients = chrec_contains_undetermined (niter);
+  if (undetermined_coefficients)
+{
+  DEBUG_PRINT (dp << "[can_represent_loop-fail] "
+  << "Loop niter chrec contains undetermined coefficients.\n");
+  return false;
+}
+
+  bool can_represent_expr = graphite_can_represent_expr (scop, loop, niter);
+  if (!can_represent_expr)
+{
+  DEBUG_PRINT (dp << "[can_represent_loop-fail] "
+  << "Loop niter expression cannot be represented: "
+  << niter << "\n");
+  return false;
+}
+
+  ret

[PATCH 13/40] Fortran: Delinearize array accesses

2021-12-15 Thread Frederik Harwath
The Fortran front end presently linearizes accesses to
multi-dimensional arrays by combining the indices for the various
dimensions into a series of explicit multiplies and adds, with
refactoring to allow CSE of invariant parts of the computation.
Unfortunately, this representation interferes with Graphite-based loop
optimizations.  By the time loop optimizations run, it is difficult to
recover the original multi-dimensional form of the access because
parts of it have already been optimized away or rewritten into a form
that is not easily recognizable.  It therefore seems better to have
the Fortran front end produce delinearized accesses to begin with: a
set of nested ARRAY_REFs, similar to the existing behavior of the C
and C++ front ends.  This is a long-standing problem that has
previously been discussed, e.g. in PR 14741 and PR 61000.

This patch is an initial implementation for explicit array accesses
only; it doesn't handle the accesses generated during scalarization of
whole-array or array-section operations, which follow a different code
path.
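
As a rough illustration of the difference between the two access
forms, here is a sketch in C rather than Fortran.  It uses row-major
order, and the names sum_linearized and sum_delinearized are made up
for this example; none of it is code from the patch.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 3, COLS = 4 };

/* Linearized form: both indices are folded by hand into a single
   offset, which hides the two-dimensional structure of the access.  */
static int
sum_linearized (int *a)
{
  assert (a != NULL);
  int sum = 0;
  for (size_t i = 0; i < ROWS; i++)
    for (size_t j = 0; j < COLS; j++)
      sum += a[i * COLS + j];
  return sum;
}

/* Delinearized form: one subscript per dimension, comparable to a
   set of nested ARRAY_REFs.  */
static int
sum_delinearized (int a[ROWS][COLS])
{
  assert (a != NULL);
  int sum = 0;
  for (size_t i = 0; i < ROWS; i++)
    for (size_t j = 0; j < COLS; j++)
      sum += a[i][j];
  return sum;
}
```

Both functions read the same elements; only the second keeps the
per-dimension subscripts visible to a polyhedral analysis.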

Co-Authored-By: Tobias Burnus 

gcc/ChangeLog:

* expr.c (get_inner_reference): Handle NOP_EXPR.

gcc/fortran/ChangeLog:

* lang.opt: Document -param=delinearize.
* trans-array.c: (get_class_array_vptr): New function.
(get_array_lbound): New function.
(get_array_ubound): New function.
(gfc_conv_array_ref): Implement main delinearization logic.
(build_array_ref): Adjust.

gcc/testsuite/ChangeLog:

* gfortran.dg/assumed_type_2.f90: Adjust test expectations.
* gfortran.dg/goacc/kernels-loop-inner.f95: Likewise.
* gfortran.dg/gomp/affinity-clause-1.f90: Likewise.
* gfortran.dg/graphite/block-2.f: Likewise.
* gfortran.dg/graphite/block-3.f90: Likewise.
* gfortran.dg/graphite/block-4.f90: Likewise.
* gfortran.dg/graphite/id-9.f: Likewise.
* gfortran.dg/inline_matmul_16.f90: Likewise.
* gfortran.dg/inline_matmul_24.f90: Likewise.
* gfortran.dg/no_arg_check_2.f90: Likewise.
* gfortran.dg/pr32921.f: Likewise.
* gfortran.dg/reassoc_4.f: Likewise.
* gfortran.dg/vect/fast-math-mgrid-resid.f: Likewise.
---
 gcc/expr.c|   1 +
 gcc/fortran/lang.opt  |   4 +
 gcc/fortran/trans-array.c | 321 +-
 gcc/testsuite/gfortran.dg/assumed_type_2.f90  |   6 +-
 .../gfortran.dg/goacc/kernels-loop-inner.f95  |   2 +-
 .../gfortran.dg/gomp/affinity-clause-1.f90|   2 +-
 gcc/testsuite/gfortran.dg/graphite/block-2.f  |   9 +-
 .../gfortran.dg/graphite/block-3.f90  |   2 +-
 .../gfortran.dg/graphite/block-4.f90  |   2 +-
 gcc/testsuite/gfortran.dg/graphite/id-9.f |   2 +-
 .../gfortran.dg/inline_matmul_16.f90  |   2 +
 .../gfortran.dg/inline_matmul_24.f90  |   2 +-
 gcc/testsuite/gfortran.dg/no_arg_check_2.f90  |   6 +-
 gcc/testsuite/gfortran.dg/pr32921.f   |   2 +-
 gcc/testsuite/gfortran.dg/reassoc_4.f |   2 +-
 .../gfortran.dg/vect/fast-math-mgrid-resid.f  |   1 +
 16 files changed, 270 insertions(+), 96 deletions(-)

diff --git a/gcc/expr.c b/gcc/expr.c
index eb33643bd770..188905b4fe4d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -7759,6 +7759,7 @@ get_inner_reference (tree exp, poly_int64_pod *pbitsize,
  break;

case VIEW_CONVERT_EXPR:
+   case NOP_EXPR:
  break;

case MEM_REF:
diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt
index a202c04c4a25..25c5a5a32c41 100644
--- a/gcc/fortran/lang.opt
+++ b/gcc/fortran/lang.opt
@@ -521,6 +521,10 @@ fdefault-real-16
 Fortran Var(flag_default_real_16)
 Set the default real kind to an 16 byte wide type.

+-param=delinearize=
+Common Joined UInteger Var(flag_delinearize_aref) Init(1) IntegerRange(0,1) Param Optimization
+Delinearize array references.
+
 fdollar-ok
 Fortran Var(flag_dollar_ok)
 Allow dollar signs in entity names.
diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index 5ceb261b6989..e84b4cb55f05 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -3747,11 +3747,9 @@ add_to_offset (tree *cst_offset, tree *offset, tree t)
 }
 }

-
 static tree
-build_array_ref (tree desc, tree offset, tree decl, tree vptr)
+get_class_array_vptr (tree desc, tree vptr)
 {
-  tree tmp;
   tree type;
   tree cdesc;

@@ -3775,19 +3773,74 @@ build_array_ref (tree desc, tree offset, tree decl, tree vptr)
  && GFC_CLASS_TYPE_P (TYPE_CANONICAL (type)))
vptr = gfc_class_vptr_get (TREE_OPERAND (cdesc, 0));
 }
+  return vptr;
+}

+static tree
+build_array_ref (tree desc, tree offset, tree decl, tree vptr)
+{
+  tree tmp;
+  vptr = get_class_array_vptr (desc, vptr);
   tmp = gfc_conv_array_data (desc);
   tmp = build_fold_indirect_ref_loc (input_location, tmp);
   tmp = gfc_build_array_ref (tmp, offset, decl, vptr);
   return tmp;
 }

+/* Get the declared lower bound for rank 

[PATCH 16/40] graphite: Rename isl_id_for_ssa_name

2021-12-15 Thread Frederik Harwath
The SSA names for which this function gets used are always SCoP
parameters and hence "isl_id_for_parameter" is a better name.  It also
explains the prefix "P_" for those names in the ISL representation.
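
For readers unfamiliar with the naming scheme, the following
standalone sketch shows how a "P_<version>" name fits into a 14-byte
buffer.  The helper format_param_name is hypothetical and only mirrors
the snprintf call in the patch; it assumes a 32-bit unsigned version
number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "P_<version>" into BUF and return
   nonzero if the whole name fit, including the terminating NUL.  */
static int
format_param_name (char *buf, size_t size, unsigned version)
{
  assert (buf != NULL && size > 0);
  int n = snprintf (buf, size, "P_%u", version);
  return n >= 0 && (size_t) n < size;
}
```

With a 32-bit version number the longest name is "P_4294967295", which
needs 13 bytes including the NUL, so a 14-byte buffer is sufficient.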

gcc/ChangeLog:

* graphite-sese-to-poly.c (isl_id_for_ssa_name): Rename to ...
  (isl_id_for_parameter): ... this new function name.
  (build_scop_context): Adjust function use.
---
 gcc/graphite-sese-to-poly.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 99ea0327b1a7..204d382ed4cc 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -100,14 +100,15 @@ extract_affine_mul (scop_p s, tree e, __isl_take isl_space *space)
   return isl_pw_aff_mul (lhs, rhs);
 }

-/* Return an isl identifier from the name of the ssa_name E.  */
+/* Return an isl identifier for the parameter P.  */

 static isl_id *
-isl_id_for_ssa_name (scop_p s, tree e)
+isl_id_for_parameter (scop_p s, tree p)
 {
-  char name1[14];
-  snprintf (name1, sizeof (name1), "P_%d", SSA_NAME_VERSION (e));
-  return isl_id_alloc (s->isl_context, name1, e);
+  gcc_checking_assert (TREE_CODE (p) == SSA_NAME);
+  char name[14];
+  snprintf (name, sizeof (name), "P_%d", SSA_NAME_VERSION (p));
+  return isl_id_alloc (s->isl_context, name, p);
 }

 /* Return an isl identifier for the data reference DR.  Data references and
@@ -898,15 +899,15 @@ build_scop_context (scop_p scop)
   isl_space *space = isl_space_set_alloc (scop->isl_context, nbp, 0);

   unsigned i;
-  tree e;
-  FOR_EACH_VEC_ELT (region->params, i, e)
+  tree p;
+  FOR_EACH_VEC_ELT (region->params, i, p)
 space = isl_space_set_dim_id (space, isl_dim_param, i,
-  isl_id_for_ssa_name (scop, e));
+  isl_id_for_parameter (scop, p));

   scop->param_context = isl_set_universe (space);

-  FOR_EACH_VEC_ELT (region->params, i, e)
-add_param_constraints (scop, i, e);
+  FOR_EACH_VEC_ELT (region->params, i, p)
+add_param_constraints (scop, i, p);
 }

 /* Return true when loop A is nested in loop B.  */
--
2.33.0

-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
München; limited liability company (Gesellschaft mit beschränkter Haftung);
Managing Directors: Thomas Heurung, Frank Thürauf; Registered office:
München; Register court: München, HRB 106955


[PATCH 17/40] graphite: Fix minor mistakes in comments

2021-12-15 Thread Frederik Harwath
gcc/ChangeLog:

* graphite-sese-to-poly.c (build_poly_sr_1): Fix a typo and
a reference to a variable which does not exist.
* graphite-isl-ast-to-gimple.c (gsi_insert_earliest): Fix typo
in comment.
---
 gcc/graphite-isl-ast-to-gimple.c | 2 +-
 gcc/graphite-sese-to-poly.c  | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index 1ad68a1d4735..0712d85b67a6 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -1018,7 +1018,7 @@ gsi_insert_earliest (gimple_seq seq)
   basic_block begin_bb = get_entry_bb (codegen_region);

   /* Inserting the gimple statements in a vector because gimple_seq behave
- in strage ways when inserting the stmts from it into different basic
+ in strange ways when inserting the stmts from it into different basic
  blocks one at a time.  */
   auto_vec stmts;
   for (gimple_stmt_iterator gsi = gsi_start (seq); !gsi_end_p (gsi);
diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 204d382ed4cc..33d6a98327b8 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -649,14 +649,14 @@ build_poly_sr_1 (poly_bb_p pbb, gimple *stmt, tree var, enum poly_dr_type kind,
 isl_map *acc, isl_set *subscript_sizes)
 {
   scop_p scop = PBB_SCOP (pbb);
-  /* Each scalar variables has a unique alias set number starting from
+  /* Each scalar variable has a unique alias set number starting from
  the maximum alias set assigned to a dr.  */
   int alias_set = scop->max_alias_set + SSA_NAME_VERSION (var);
   subscript_sizes = isl_set_fix_si (subscript_sizes, isl_dim_set, 0,
alias_set);

   /* Add a constrain to the ACCESSES polyhedron for the alias set of
- data reference DR.  */
+ the reference */
   isl_constraint *c
 = isl_equality_alloc (isl_local_space_from_space (isl_map_get_space (acc)));
   c = isl_constraint_set_constant_si (c, -alias_set);
--
2.33.0



[PATCH 14/40] openacc: Move pass_oacc_device_lower after pass_graphite

2021-12-15 Thread Frederik Harwath
The OpenACC device lowering pass must run after the Graphite pass to
allow for the use of Graphite for automatic parallelization of kernels
regions in the future. Experimentation has shown that it is best,
performance-wise, to run pass_oacc_device_lower together with the
related passes pass_oacc_loop_designation and pass_oacc_gimple_workers
early after pass_graphite in pass_tree_loop, at least if the other
tree loop passes are not adjusted. In particular, to enable
vectorization which is crucial for GCN offloading, device lowering
should happen before pass_vectorize. To bring the loops contained in
the offloading functions into the shape expected by the loop
vectorizer, we have to make sure that some passes that previously were
executed only once before pass_tree_loop are also executed on the
offloading functions.  To ensure the execution of
pass_oacc_device_lower if pass_tree_loop does not execute (no loops,
no optimizations), we introduce two further copies of the pass to the
pipeline that run if there are no loops or if no optimization is
performed.

gcc/ChangeLog:

* omp-general.c (oacc_get_fn_dim_size): Return 0 on
missing "dims".
* omp-oacc-neuter-broadcast.cc:
Make pass_omp_oacc_neuter_broadcast clonable.
* omp-offload.c (pass_oacc_loop_designation::clone): New
member function.
(pass_oacc_gimple_workers::clone): Likewise.
(pass_oacc_gimple_device_lower::clone): Likewise.
* passes.c (pass_data_no_loop_optimizations): New pass_data.
(class pass_no_loop_optimizations): New pass.
(make_pass_no_loop_optimizations): New function.
* passes.def: Move pass_oacc_{loop_designation,
gimple_workers, device_lower} into tree_loop, and add
copies to pass_tree_no_loop and to new
pass_no_loop_optimizations.  Add copies of passes pass_ccp,
pass_ipa_warn, pass_complete_unrolli, pass_backprop,
pass_phiprop, pass_fix_loops after the OpenACC passes
in pass_tree_loop.
* tree-ssa-loop-ivcanon.c (pass_complete_unroll::clone):
New member function.
(pass_complete_unrolli::clone): Likewise.
* tree-ssa-loop.c (pass_fix_loops::clone): Likewise.
(pass_tree_loop_init::clone): Likewise.
(pass_tree_loop_done::clone): Likewise.
* tree-ssa-phiprop.c (pass_phiprop::clone): Likewise.
* tree-pass.h (make_pass_oacc_only): New declaration.
(make_pass_oacc_functions_only): New declaration.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: Adjust
expected output to pass name changes due to the pass
reordering and cloning.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/goacc/loop-processing-1.c: Adjust expected output
to pass name changes due to the pass reordering and cloning.
* c-c++-common/goacc/classify-kernels-unparallelized.c: Likewise.
* c-c++-common/goacc/classify-kernels.c: Likewise.
* c-c++-common/goacc/classify-parallel.c: Likewise.
* c-c++-common/goacc/classify-routine.c: Likewise.
* c-c++-common/goacc/routine-nohost-1.c: Likewise.
* c-c++-common/unroll-1.c: Likewise.
* c-c++-common/unroll-4.c: Likewise.
* gcc.dg/tree-ssa/backprop-1.c: Likewise.
* gcc.dg/tree-ssa/backprop-2.c: Likewise.
* gcc.dg/tree-ssa/backprop-3.c: Likewise.
* gcc.dg/tree-ssa/backprop-4.c: Likewise.
* gcc.dg/tree-ssa/backprop-5.c: Likewise.
* gcc.dg/tree-ssa/backprop-6.c: Likewise.
* gcc.dg/tree-ssa/cunroll-1.c: Likewise.
* gcc.dg/tree-ssa/cunroll-3.c: Likewise.
* gcc.dg/tree-ssa/cunroll-9.c: Likewise.
* gcc.dg/tree-ssa/ldist-17.c: Likewise.
* gcc.dg/tree-ssa/loop-38.c: Likewise.
* gcc.dg/tree-ssa/pr21463.c: Likewise.
* gcc.dg/tree-ssa/pr45427.c: Likewise.
* gcc.dg/tree-ssa/pr61743-1.c: Likewise.
* gcc.dg/unroll-2.c: Likewise.
* gcc.dg/unroll-3.c: Likewise.
* gcc.dg/unroll-4.c: Likewise.
* gcc.dg/unroll-5.c: Likewise.
* gcc.dg/vect/vect-profile-1.c: Likewise.
* gcc.dg/tree-ssa/loopclosedphi.c: Likewise.
* gcc.dg/tree-ssa/pr59597.c: Likewise.
* gcc.dg/vect/bb-slp-59.c: Likewise.
* c-c++-common/goacc/device-lowering-debug-optimization.c: New test.
* c-c++-common/goacc/device-lowering-no-loops.c: New test.
* c-c

[PATCH 18/40] Move compute_alias_check_pairs to tree-data-ref.c

2021-12-15 Thread Frederik Harwath
Move this function from tree-loop-distribution.c to tree-data-ref.c
and make it non-static to enable its use from other parts of GCC.
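
For orientation, the segment-length arithmetic used by the moved
helpers can be sketched in plain C as follows.  This is a simplified
model with plain size_t values instead of trees; the function names
and the dominates_exit flag are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes spanned between the first and the last access of a reference
   that advances by STEP bytes per iteration over NITERS iterations.  */
static size_t
segment_size (size_t step, size_t niters)
{
  assert (niters > 0);
  return (niters - 1) * step;
}

/* A reference that dominates the exit test runs once more than the
   latch does, so its iteration count is the latch count plus one.  */
static size_t
segment_size_for_ref (size_t step, size_t latch_executions,
		      int dominates_exit)
{
  size_t niters = dominates_exit ? latch_executions + 1 : latch_executions;
  return segment_size (step, niters);
}
```

For example, a 4-byte step with 9 latch executions gives a segment of
36 bytes when the reference dominates the exit test and 32 bytes
otherwise.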

gcc/ChangeLog:
* tree-loop-distribution.c (data_ref_segment_size): Remove function.
(latch_dominated_by_data_ref): Likewise.
(compute_alias_check_pairs): Likewise.

* tree-data-ref.c (data_ref_segment_size): New function,
copied from tree-loop-distribution.c.
(compute_alias_check_pairs): Likewise.
(latch_dominated_by_data_ref): Likewise.

* tree-data-ref.h (compute_alias_check_pairs): New declaration.
---
 gcc/tree-data-ref.c  | 87 
 gcc/tree-data-ref.h  |  3 ++
 gcc/tree-loop-distribution.c | 87 
 3 files changed, 90 insertions(+), 87 deletions(-)

diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index 46f4ffedb483..6a3659dc490c 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -2636,6 +2636,93 @@ create_intersect_range_checks (class loop *loop, tree *cond_expr,
 dump_printf (MSG_NOTE, "using an address-based overlap test\n");
 }

+/* Compute and return an expression whose value is the segment length which
+   will be accessed by DR in NITERS iterations.  */
+
+static tree
+data_ref_segment_size (struct data_reference *dr, tree niters)
+{
+  niters = size_binop (MINUS_EXPR,
+  fold_convert (sizetype, niters),
+  size_one_node);
+  return size_binop (MULT_EXPR,
+fold_convert (sizetype, DR_STEP (dr)),
+fold_convert (sizetype, niters));
+}
+
+/* Return true if LOOP's latch is dominated by statement for data reference
+   DR.  */
+
+static inline bool
+latch_dominated_by_data_ref (class loop *loop, data_reference *dr)
+{
+  return dominated_by_p (CDI_DOMINATORS, single_exit (loop)->src,
+gimple_bb (DR_STMT (dr)));
+}
+
+/* Compute alias check pairs and store them in COMP_ALIAS_PAIRS for LOOP's
+   data dependence relations ALIAS_DDRS.  */
+
+void
+compute_alias_check_pairs (class loop *loop, vec<ddr_p> *alias_ddrs,
+  vec<dr_with_seg_len_pair_t> *comp_alias_pairs)
+{
+  unsigned int i;
+  unsigned HOST_WIDE_INT factor = 1;
+  tree niters_plus_one, niters = number_of_latch_executions (loop);
+
+  gcc_assert (niters != NULL_TREE && niters != chrec_dont_know);
+  niters = fold_convert (sizetype, niters);
+  niters_plus_one = size_binop (PLUS_EXPR, niters, size_one_node);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+fprintf (dump_file, "Creating alias check pairs:\n");
+
+  /* Iterate all data dependence relations and compute alias check pairs.  */
+  for (i = 0; i < alias_ddrs->length (); i++)
+{
+  ddr_p ddr = (*alias_ddrs)[i];
+  struct data_reference *dr_a = DDR_A (ddr);
+  struct data_reference *dr_b = DDR_B (ddr);
+  tree seg_length_a, seg_length_b;
+
+  if (latch_dominated_by_data_ref (loop, dr_a))
+   seg_length_a = data_ref_segment_size (dr_a, niters_plus_one);
+  else
+   seg_length_a = data_ref_segment_size (dr_a, niters);
+
+  if (latch_dominated_by_data_ref (loop, dr_b))
+   seg_length_b = data_ref_segment_size (dr_b, niters_plus_one);
+  else
+   seg_length_b = data_ref_segment_size (dr_b, niters);
+
+  unsigned HOST_WIDE_INT access_size_a
+   = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_a))));
+  unsigned HOST_WIDE_INT access_size_b
+   = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_b))));
+  unsigned int align_a = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_a)));
+  unsigned int align_b = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_b)));
+
+  dr_with_seg_len_pair_t dr_with_seg_len_pair
+   (dr_with_seg_len (dr_a, seg_length_a, access_size_a, align_a),
+dr_with_seg_len (dr_b, seg_length_b, access_size_b, align_b),
+/* ??? Would WELL_ORDERED be safe?  */
+dr_with_seg_len_pair_t::REORDERED);
+
+  comp_alias_pairs->safe_push (dr_with_seg_len_pair);
+}
+
+  if (tree_fits_uhwi_p (niters))
+factor = tree_to_uhwi (niters);
+
+  /* Prune alias check pairs.  */
+  prune_runtime_alias_test_list (comp_alias_pairs, factor);
+  if (dump_file && (dump_flags & TDF_DETAILS))
+fprintf (dump_file,
+"Improved number of alias checks from %d to %d\n",
+alias_ddrs->length (), comp_alias_pairs->length ());
+}
+
 /* Create a conditional expression that represents the run-time checks for
overlapping of address ranges represented by a list of data references
pairs passed in ALIAS_PAIRS.  Data references are in LOOP.  The returned
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index 74f579c9f3f2..4929b059ddea 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -582,6 +582,9 @@ extern opt_result runtime_alias_check_p (ddr_p, class loop *, bool);
 extern int data_ref_compare_tree (tree, tree);
 extern void prune_runtime_alias_test_list (vec 

[PATCH 19/40] graphite: Add runtime alias checking

2021-12-15 Thread Frederik Harwath
Graphite rejects a SCoP if it contains a pair of data references for
which it cannot determine statically if they may alias. This happens
very often, for instance in C code which does not use explicit
"restrict".  This commit adds the possibility to analyze a SCoP
nevertheless and perform an alias check at runtime.  Then, if aliasing
is detected, the execution will fall back to the unoptimized SCoP.
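
The general idea of such loop versioning can be sketched in plain C as
follows.  This is only an illustration under assumed names
(ranges_disjoint, shift_inc); it is not the code Graphite generates,
and the reversed loop merely stands in for a reordering transformation
that is valid only without aliasing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if the byte ranges [a, a + len_a) and [b, b + len_b) do
   not overlap.  */
static int
ranges_disjoint (const void *a, size_t len_a, const void *b, size_t len_b)
{
  uintptr_t pa = (uintptr_t) a, pb = (uintptr_t) b;
  return pa + len_a <= pb || pb + len_b <= pa;
}

/* Compute dst[i] = src[i] + 1.  Returns 1 if the transformed version
   ran and 0 if the original loop was used as a fallback.  */
static int
shift_inc (int *dst, const int *src, size_t n)
{
  assert (dst != NULL && src != NULL);
  if (ranges_disjoint (dst, n * sizeof *dst, src, n * sizeof *src))
    {
      /* Transformed version: iteration order changed, which is only
	 safe because the runtime check ruled out aliasing.  */
      for (size_t i = n; i-- > 0;)
	dst[i] = src[i] + 1;
      return 1;
    }
  /* Fallback: the original loop, correct even if dst overlaps src.  */
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i] + 1;
  return 0;
}
```

Calling shift_inc with two separate arrays takes the transformed path,
while a call such as shift_inc (buf + 1, buf, 3) falls back to the
original loop and preserves its semantics.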

TODO This needs more testing on non-OpenACC code.

gcc/ChangeLog:

* common.opt: Add fgraphite-runtime-alias-checks.
* graphite-isl-ast-to-gimple.c
(generate_alias_cond): New function.
(graphite_regenerate_ast_isl): Use from here.
* graphite-poly.c (new_scop): Create unhandled_alias_ddrs vec ...
(free_scop): and release here.
* graphite-scop-detection.c (dr_defs_outside_region): New function.
(dr_well_analyzed_for_runtime_alias_check_p): New function.
(graphite_runtime_alias_check_p): New function.
(build_alias_set): Record unhandled alias ddrs for later alias check
creation if flag_graphite_runtime_alias_checks is true instead
of failing.
* graphite.h (struct scop): Add field unhandled_alias_ddrs.
* sese.h (has_operands_from_region_p): New function.

gcc/testsuite/ChangeLog:

* gcc.dg/graphite/alias-1.c: New test.
---
 gcc/common.opt  |   4 +
 gcc/graphite-isl-ast-to-gimple.c|  60 ++
 gcc/graphite-poly.c |   2 +
 gcc/graphite-scop-detection.c   | 241 +---
 gcc/graphite.h  |   4 +
 gcc/sese.h  |  18 ++
 gcc/testsuite/gcc.dg/graphite/alias-1.c |  22 +++
 7 files changed, 328 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-1.c

diff --git a/gcc/common.opt b/gcc/common.opt
index 1a5b9bfcca91..b6c46ab63e34 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1673,6 +1673,10 @@ fgraphite-identity
 Common Var(flag_graphite_identity) Optimization
 Enable Graphite Identity transformation.

+fgraphite-runtime-alias-checks
+Common Var(flag_graphite_runtime_alias_checks) Optimization Init(1)
+Allow Graphite to add runtime alias checks to loop-nests if aliasing cannot be resolved statically.
+
 fhoist-adjacent-loads
 Common Var(flag_hoist_adjacent_loads) Optimization
 Enable hoisting adjacent loads to encourage generating conditional move
diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index 0712d85b67a6..073b471775de 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -1456,6 +1456,34 @@ generate_entry_out_of_ssa_copies (edge false_entry,
 }
 }

+/* Create a condition that evaluates to TRUE if all ALIAS_DDRS are free of
+   aliasing. */
+
+static tree
+generate_alias_cond (vec<ddr_p> &alias_ddrs, loop_p context_loop)
+{
+  gcc_checking_assert (flag_graphite_runtime_alias_checks
+   && alias_ddrs.length () > 0);
+  gcc_checking_assert (context_loop);
+
+  auto_vec<dr_with_seg_len_pair_t> check_pairs;
+  compute_alias_check_pairs (context_loop, &alias_ddrs, &check_pairs);
+  gcc_checking_assert (check_pairs.length () > 0);
+
+  tree alias_cond = NULL_TREE;
+  create_runtime_alias_checks (context_loop, &check_pairs, &alias_cond);
+  gcc_checking_assert (alias_cond);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  fprintf (dump_file, "Generated runtime alias check: ");
+  print_generic_expr (dump_file, alias_cond, dump_flags);
+  fprintf (dump_file, "\n");
+}
+
+  return alias_cond;
+}
+
 /* GIMPLE Loop Generator: generates loops in GIMPLE form for the given SCOP.
Return true if code generation succeeded.  */

@@ -1496,12 +1524,44 @@ graphite_regenerate_ast_isl (scop_p scop)
   region->if_region = if_region;

   loop_p context_loop = region->region.entry->src->loop_father;
+  gcc_checking_assert (context_loop);
   edge e = single_succ_edge (if_region->true_region->region.entry->dest);
   basic_block bb = split_edge (e);

   /* Update the true_region exit edge.  */
   region->if_region->true_region->region.exit = single_succ_edge (bb);

+  if (flag_graphite_runtime_alias_checks
+  && scop->unhandled_alias_ddrs.length () > 0)
+{
+  /* SCoP detection has failed to handle the aliasing between some data
+references of the SCoP statically. Generate an alias check that selects
+the newly generated version of the SCoP in the true-branch of the
+conditional if aliasing can be ruled out at runtime and the original
+version of the SCoP, otherwise. */
+
+  loop_p loop
+  = find_common_loop (scop->scop_info->region.entry->dest->loop_father,
+  scop->scop_info->region.exit->src->loop_father);
+  tree cond = generate_alias_cond (scop->unhandled_alias_ddrs, loop);
+  tree non_alias_cond = build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
+  set_ifsese_condition (r

[PATCH 21/40] openacc: Add "can_be_parallel" flag info to "graph" dumps

2021-12-15 Thread Frederik Harwath
gcc/ChangeLog:

* graph.c (oacc_get_fn_attrib): New declaration.
(find_loop_location): New declaration.
(draw_cfg_nodes_for_loop): Print value of the
can_be_parallel flag at the top of loops in OpenACC
functions.
---
 gcc/graph.c | 35 ---
 1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/gcc/graph.c b/gcc/graph.c
index 9acd1d5b95e4..a34356e8a7ec 100644
--- a/gcc/graph.c
+++ b/gcc/graph.c
@@ -192,6 +192,10 @@ draw_cfg_nodes_no_loops (pretty_printer *pp, struct function *fun)
 }
 }

+
+extern tree oacc_get_fn_attrib (tree);
+extern dump_user_location_t find_loop_location (class loop *);
+
 /* Draw all the basic blocks in LOOP.  Print the blocks in breath-first
order to get a good ranking of the nodes.  This function is recursive:
It first prints inner loops, then the body of LOOP itself.  */
@@ -206,17 +210,26 @@ draw_cfg_nodes_for_loop (pretty_printer *pp, int funcdef_no,

   if (loop->header != NULL
   && loop->latch != EXIT_BLOCK_PTR_FOR_FN (cfun))
-pp_printf (pp,
-  "\tsubgraph cluster_%d_%d {\n"
-  "\tstyle=\"filled\";\n"
-  "\tcolor=\"darkgreen\";\n"
-  "\tfillcolor=\"%s\";\n"
-  "\tlabel=\"loop %d\";\n"
-  "\tlabeljust=l;\n"
-  "\tpenwidth=2;\n",
-  funcdef_no, loop->num,
-  fillcolors[(loop_depth (loop) - 1) % 3],
-  loop->num);
+{
+  pp_printf (pp,
+ "\tsubgraph cluster_%d_%d {\n"
+ "\tstyle=\"filled\";\n"
+ "\tcolor=\"darkgreen\";\n"
+ "\tfillcolor=\"%s\";\n"
+ "\tlabel=\"loop %d %s\";\n"
+ "\tlabeljust=l;\n"
+ "\tpenwidth=2;\n",
+ funcdef_no, loop->num,
+ fillcolors[(loop_depth (loop) - 1) % 3], loop->num,
+ /* This is only meaningful for loops that have been processed
+by Graphite.
+
+TODO Use can_be_parallel_valid_p? */
+ !oacc_get_fn_attrib (cfun->decl)
+ ? ""
+ : loop->can_be_parallel ? "(can_be_parallel = true)"
+ : "(can_be_parallel = false)");
+}

   for (class loop *inner = loop->inner; inner; inner = inner->next)
 draw_cfg_nodes_for_loop (pp, funcdef_no, inner);
--
2.33.0



[PATCH 22/40] openacc: Remove unused partitioning in "kernels" regions

2021-12-15 Thread Frederik Harwath
With the old "kernels" handling, unparallelized regions would
get executed with 1x1x1 partitioning even if the user provided
explicit num_gangs, num_workers clauses etc.

This commit restores this behavior by removing unused partitioning
after assigning the parallelism dimensions to loops.
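
To make the idea concrete, here is a minimal standalone sketch of the
removal step.  It is not the code from the patch: DIM_MASK, the enum,
and remove_unused_partitioning are hypothetical stand-ins for
GOMP_DIM_MASK, the GOMP_DIM enumeration, and the new function, and the
sketch returns the removed axis names in a buffer instead of emitting
a warning.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the GOMP_DIM enumeration and GOMP_DIM_MASK.  */
enum { DIM_GANG, DIM_WORKER, DIM_VECTOR, DIM_MAX };
#define DIM_MASK(ix) (1u << (ix))

/* Reset every dimension from LEVEL onwards that was set explicitly
   (dims[ix] >= 0) but is not marked in USED.  The names of the removed
   axes are written to REMOVED, each followed by a space.  Returns the
   number of axes that were reset.  */
int
remove_unused_partitioning (int *dims, int level, unsigned used,
                            char *removed, size_t removed_size)
{
  static const char *const axes[DIM_MAX] = { "gang", "worker", "vector" };
  int count = 0;

  assert (removed_size > 0);
  removed[0] = '\0';
  for (int ix = level >= 0 ? level : 0; ix != DIM_MAX; ix++)
    if (!(used & DIM_MASK (ix)) && dims[ix] >= 0)
      {
        size_t len = strlen (removed);
        snprintf (removed + len, removed_size - len, "%s ", axes[ix]);
        dims[ix] = -1;  /* -1 leaves the axis to later defaulting.  */
        count++;
      }
  return count;
}
```

With dims = { 32, 4, -1 } and only the gang bit set in USED, the
sketch resets the worker dimension and leaves the other two alone.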

gcc/ChangeLog:

* omp-offload.c (oacc_remove_unused_partitioning): New function
for removing partitioning that is not used by any loop.
(oacc_validate_dims): Call oacc_remove_unused_partitioning and
enable warnings about unused partitioning.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Adjust
expectations.
---
 gcc/omp-offload.c | 51 +--
 .../acc_prof-kernels-1.c  | 18 ---
 2 files changed, 58 insertions(+), 11 deletions(-)

diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 2743e90f79a3..392ca56b1f4f 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -1097,6 +1097,39 @@ oacc_parse_default_dims (const char *dims)
   targetm.goacc.validate_dims (NULL_TREE, oacc_min_dims, -2, 0);
 }

+/* Remove parallelism dimensions below LEVEL which are not set in USED
+   from DIMS and emit a warning pointing to the location of FN. */
+
+static void
+oacc_remove_unused_partitioning (tree fn, int *dims, int level, unsigned used)
+{
+
+  bool host_compiler = true;
+#ifdef ACCEL_COMPILER
+  host_compiler = false;
+#endif
+
+  static char const *const axes[] =
+  /* Must be kept in sync with GOMP_DIM enumeration.  */
+  { "gang", "worker", "vector" };
+
+  char removed_partitions[20] = "\0";
+  for (int ix = level >= 0 ? level : 0; ix != GOMP_DIM_MAX; ix++)
+if (!(used & GOMP_DIM_MASK (ix)) && dims[ix] >= 0)
+  {
+if (host_compiler)
+  {
+strcat (removed_partitions, axes[ix]);
+strcat (removed_partitions, " ");
+  }
+dims[ix] = -1;
+  }
+  if (removed_partitions[0] != '\0')
+warning_at (DECL_SOURCE_LOCATION (fn), OPT_Wopenacc_parallelism,
+"removed %spartitioning from %<kernels%> region",
+removed_partitions);
+}
+
 /* Validate and update the dimensions for offloaded FN.  ATTRS is the
raw attribute.  DIMS is an array of dimensions, which is filled in.
LEVEL is the partitioning level of a routine, or -1 for an offload
@@ -1117,6 +1150,7 @@ oacc_validate_dims (tree fn, tree attrs, int *dims, int level, unsigned used)
   for (ix = 0; ix != GOMP_DIM_MAX; ix++)
 {
   purpose[ix] = TREE_PURPOSE (pos);
+
   tree val = TREE_VALUE (pos);
   dims[ix] = val ? TREE_INT_CST_LOW (val) : -1;
   pos = TREE_CHAIN (pos);
@@ -1126,14 +1160,15 @@ oacc_validate_dims (tree fn, tree attrs, int *dims, int level, unsigned used)
 #ifdef ACCEL_COMPILER
   check = false;
 #endif
+
+  static char const *const axes[] =
+  /* Must be kept in sync with GOMP_DIM enumeration.  */
+  { "gang", "worker", "vector" };
+
   if (check
   && warn_openacc_parallelism
-  && !lookup_attribute ("oacc kernels", DECL_ATTRIBUTES (fn))
-  && !lookup_attribute ("oacc parallel_kernels_graphite", DECL_ATTRIBUTES (fn)))
+  && !lookup_attribute ("oacc kernels", DECL_ATTRIBUTES (fn)))
 {
-  static char const *const axes[] =
-  /* Must be kept in sync with GOMP_DIM enumeration.  */
-   { "gang", "worker", "vector" };
   for (ix = level >= 0 ? level : 0; ix != GOMP_DIM_MAX; ix++)
if (dims[ix] < 0)
  ; /* Defaulting axis.  */
@@ -1144,14 +1179,20 @@ oacc_validate_dims (tree fn, tree attrs, int *dims, int level, unsigned used)
  "region contains %s partitioned code but"
  " is not %s partitioned", axes[ix], axes[ix]);
else if (!(used & GOMP_DIM_MASK (ix)) && dims[ix] != 1)
+ {
  /* The dimension is explicitly partitioned to non-unity, but
 no use is made within the region.  */
  warning_at (DECL_SOURCE_LOCATION (fn), OPT_Wopenacc_parallelism,
  "region is %s partitioned but"
  " does not contain %s partitioned code",
  axes[ix], axes[ix]);
+  }
 }

+  if (lookup_attribute ("oacc parallel_kernels_graphite",
+ DECL_ATTRIBUTES (fn)))
+oacc_remove_unused_partitioning  (fn, dims, level, used);
+
   bool changed = targetm.goacc.validate_dims (fn, dims, level, used);

   /* Default anything left to 1 or a partitioned default.  */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
index ad33f72e2fb6..65c83dce01c9 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
@@ -7,6 +7,8 @@

 #include 

+/* { dg-skip-if "'kernels' not analyzed by Graphite at -O0" { *-*-* } { "-O0" } { "" } } */

[PATCH 23/40] Add function for printing a single OMP_CLAUSE

2021-12-15 Thread Frederik Harwath
Commit 89f4f339130c ("For 'OMP_CLAUSE' in 'dump_generic_node', dump
the whole OMP clause chain") changed the dumping behavior for
OMP_CLAUSEs.  The old behavior is required for a follow-up
commit ("openacc: Add data optimization pass") that optimizes single
OMP_CLAUSEs.
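
The difference between the two dumping behaviors can be illustrated
with a plain linked list.  This is only a sketch under assumptions:
struct clause, chain_to_str, and clause_to_str are hypothetical names
and do not use GCC's tree or pretty_printer types.  As with
print_omp_clause_to_str, the caller owns the returned string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical clause node: its printed form plus a link to the next
   clause in the chain.  */
struct clause
{
  const char *text;
  struct clause *next;
};

/* Return a freshly allocated string for the whole chain starting at C,
   with clauses separated by single spaces.  The caller must free it.  */
char *
chain_to_str (const struct clause *c)
{
  size_t len = 1;
  for (const struct clause *p = c; p; p = p->next)
    len += strlen (p->text) + 1;

  char *buf = malloc (len);
  assert (buf);
  buf[0] = '\0';
  for (const struct clause *p = c; p; p = p->next)
    {
      strcat (buf, p->text);
      if (p->next)
        strcat (buf, " ");
    }
  return buf;
}

/* Return a freshly allocated string for the first clause of the chain
   only, ignoring C->next.  The caller must free it.  */
char *
clause_to_str (const struct clause *c)
{
  char *buf = malloc (strlen (c->text) + 1);
  assert (buf);
  strcpy (buf, c->text);
  return buf;
}
```

For a chain of "copyin(x)" followed by "copyout(y)", clause_to_str
yields just the first clause while chain_to_str yields both.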

gcc/ChangeLog:

* tree-pretty-print.c (print_omp_clause_to_str): Add new function.
* tree-pretty-print.h (print_omp_clause_to_str): Add declaration.
---
 gcc/tree-pretty-print.c | 11 +++
 gcc/tree-pretty-print.h |  1 +
 2 files changed, 12 insertions(+)

diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 275dc7d8af73..e85370cfe722 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -1360,6 +1360,17 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
 }
 }

+/* Print the single clause at the top of the clause chain C to a string and
+   return it. Note that print_generic_expr_to_str prints the whole clause chain
+   instead. The caller must free the returned memory. */
+
+char *
+print_omp_clause_to_str (tree c)
+{
+  pretty_printer pp;
+  dump_omp_clause (&pp, c, 0, TDF_VOPS|TDF_MEMSYMS);
+  return xstrdup (pp_formatted_text (&pp));
+}

 /* Dump chain of OMP clauses.

diff --git a/gcc/tree-pretty-print.h b/gcc/tree-pretty-print.h
index dacd256302b2..f9ff0ee1ce0b 100644
--- a/gcc/tree-pretty-print.h
+++ b/gcc/tree-pretty-print.h
@@ -41,6 +41,7 @@ extern void print_generic_expr (FILE *, tree, dump_flags_t = TDF_NONE);
 extern char *print_generic_expr_to_str (tree);
 extern void dump_omp_clauses (pretty_printer *, tree, int, dump_flags_t,
  bool = true);
+extern char *print_omp_clause_to_str (tree);
 extern void dump_omp_atomic_memory_order (pretty_printer *,
  enum omp_memory_order);
 extern void dump_omp_loop_non_rect_expr (pretty_printer *, tree, int,
--
2.33.0



[PATCH 24/40] openacc: Add data optimization pass

2021-12-15 Thread Frederik Harwath
From: Andrew Stubbs 

Address PR90591 "Avoid unnecessary data transfer out of OMP
construct", for simple (but common) cases.

This commit adds a pass that optimizes data mapping clauses.
Currently, it can optimize copy/map(tofrom) clauses involving scalars
to copyin/map(to) and further to "private".  The pass is restricted
to "kernels" regions but could be extended to other types of regions.

gcc/ChangeLog:

* Makefile.in: Add pass.
* doc/gimple.texi: TODO.
* gimple-walk.c (walk_gimple_seq_mod): Adjust for backward walking.
* gimple-walk.h (struct walk_stmt_info): Add field.
* passes.def: Add new pass.
* tree-pass.h (make_pass_omp_data_optimize): New declaration.
* omp-data-optimize.cc: New file.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
Expect optimization messages.
* testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise.

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/uninit-copy-clause.c: Likewise.
* gfortran.dg/goacc/uninit-copy-clause.f95: Likewise.
* c-c++-common/goacc/omp_data_optimize-1.c: New test.
* g++.dg/goacc/omp_data_optimize-1.C: New test.
* gfortran.dg/goacc/omp_data_optimize-1.f90: New test.

Co-Authored-By: Thomas Schwinge 
---
 gcc/Makefile.in   |   1 +
 gcc/doc/gimple.texi   |   2 +
 gcc/gimple-walk.c |  15 +-
 gcc/gimple-walk.h |   6 +
 gcc/omp-data-optimize.cc  | 951 ++
 gcc/passes.def|   1 +
 .../c-c++-common/goacc/omp_data_optimize-1.c  | 677 +
 .../c-c++-common/goacc/uninit-copy-clause.c   |   6 +
 .../g++.dg/goacc/omp_data_optimize-1.C| 169 
 .../gfortran.dg/goacc/omp_data_optimize-1.f90 | 588 +++
 .../gfortran.dg/goacc/uninit-copy-clause.f95  |   2 +
 gcc/tree-pass.h   |   1 +
 .../kernels-decompose-1.c |   2 +
 .../libgomp.oacc-fortran/pr94358-1.f90|   4 +
 14 files changed, 2422 insertions(+), 3 deletions(-)
 create mode 100644 gcc/omp-data-optimize.cc
 create mode 100644 gcc/testsuite/c-c++-common/goacc/omp_data_optimize-1.c
 create mode 100644 gcc/testsuite/g++.dg/goacc/omp_data_optimize-1.C
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/omp_data_optimize-1.f90

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index debd8047cc85..e876e6ec993c 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1515,6 +1515,7 @@ OBJS = \
omp-oacc-kernels-decompose.o \
omp-oacc-neuter-broadcast.o \
omp-simd-clone.o \
+   omp-data-optimize.o \
opt-problem.o \
optabs.o \
optabs-libfuncs.o \
diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
index 5d89dbcc68d5..c8f0b8b2a826 100644
--- a/gcc/doc/gimple.texi
+++ b/gcc/doc/gimple.texi
@@ -2770,4 +2770,6 @@ calling @code{walk_gimple_stmt} on each one.  @code{WI} is as in
 @code{walk_gimple_stmt}.  If @code{walk_gimple_stmt} returns non-@code{NULL}, the walk
 is stopped and the value returned.  Otherwise, all the statements
 are walked and @code{NULL_TREE} returned.
+
+TODO update for forward vs. backward.
 @end deftypefn
diff --git a/gcc/gimple-walk.c b/gcc/gimple-walk.c
index e15fd4697ba1..b6add4394ab2 100644
--- a/gcc/gimple-walk.c
+++ b/gcc/gimple-walk.c
@@ -32,6 +32,8 @@ along with GCC; see the file COPYING3.  If not see
 /* Walk all the statements in the sequence *PSEQ calling walk_gimple_stmt
on each one.  WI is as in walk_gimple_stmt.

+   TODO update for forward vs. backward.
+
If walk_gimple_stmt returns non-NULL, the walk is stopped, and the
value is stored in WI->CALLBACK_RESULT.  Also, the statement that
produced the value is returned if this statement has not been
@@ -44,9 +46,10 @@ gimple *
 walk_gimple_seq_mod (gimple_seq *pseq, walk_stmt_fn callback_stmt,
 walk_tree_fn callback_op, struct walk_stmt_info *wi)
 {
-  gimple_stmt_iterator gsi;
+  bool forward = !(wi && wi->backward);

-  for (gsi = gsi_start (*pseq); !gsi_end_p (gsi); )
+  gimple_stmt_iterator gsi = forward ? gsi_start (*pseq) : gsi_last (*pseq);
+  for (; !gsi_end_p (gsi); )
 {
   tree ret = walk_gimple_stmt (&gsi, callback_stmt, callback_op, wi);
   if (ret)
@@ -60,7 +63,13 @@ walk_gimple_seq_mod (gimple_seq *pseq, walk_stmt_fn callback_stmt,
}

   if (!wi->removed_stmt)
-   gsi_next (&gsi);
+   {
+ if (forward)
+   gsi_next (&gsi);
+ else //TODO Correct?
+   gsi_prev (&gsi);
+ //TODO This could do with some unit testing (see other 'gcc/*-tests.c' files for inspiration), to make sure all the corner cases (removing first/last, for example) work correctly.
+   }
 }

   if (wi)
diff 

[PATCH 26/40] openacc: Warn about "independent" "kernels" loops with data-dependences

2021-12-15 Thread Frederik Harwath
This commit concerns loops in OpenACC "kernels" regions that have been marked
up with an explicit "independent" clause by the user, but for which Graphite
found data dependences.  A discussion on the private internal OpenACC mailing
list suggested that warning the user about the dependences would be a more
acceptable solution than reverting the user's decision. This behavior is
implemented by the present commit.
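
As an illustration of the kind of code this is aimed at, consider the
sketch below.  The function name prefix_sum is made up for this
example, and whether the new warning actually fires for it depends on
Graphite analyzing the loop, which is an assumption here.  Each
iteration reads the element written by the previous one, so the
"independent" clause is wrong.

```c
#include <assert.h>

/* In-place prefix sum.  Iteration i reads a[i - 1], which iteration
   i - 1 has just written, so the loop carries a dependence in spite of
   the "independent" clause.  */
void
prefix_sum (int *a, int n)
{
  assert (n >= 0);
#pragma acc kernels copy(a[0:n])
  {
#pragma acc loop independent
    for (int i = 1; i < n; i++)
      a[i] = a[i] + a[i - 1];
  }
}
```

When compiled without -fopenacc the pragmas are ignored and the loop
runs sequentially, which gives the result the user presumably wanted;
a parallel execution that trusts the clause need not.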

gcc/ChangeLog:

* common.opt: Add flag Wopenacc-false-independent.
* omp-offload.c (oacc_loop_warn_if_false_independent): New function.
(oacc_loop_fixed_partitions): Call from here.
---
 gcc/common.opt|  5 +
 gcc/omp-offload.c | 49 +++
 2 files changed, 54 insertions(+)

diff --git a/gcc/common.opt b/gcc/common.opt
index b6c46ab63e34..ec76a88f14e3 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -850,6 +850,11 @@ Wtsan
 Common Var(warn_tsan) Init(1) Warning
 Warn about unsupported features in ThreadSanitizer.

+Wopenacc-false-independent
+Common Var(warn_openacc_false_independent) Init(1) Warning
+Warn in case a loop in an OpenACC \"kernels\" region has an \"independent\"
+clause but analysis shows that it has loop-carried dependences.
+
 Xassembler
 Driver Separate

diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 3458a1acbceb..36dde11f5955 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -1900,6 +1900,51 @@ oacc_loop_transform_auto_into_independent (oacc_loop *loop)
   return true;
 }

+/* Emit a warning if LOOP has an "independent" clause but Graphite's
+   analysis shows that it has data dependences. Note that we respect
+   the user's explicit decision to parallelize the loop but we
+   nevertheless warn that this decision could be wrong. */
+
+static void
+oacc_loop_warn_if_false_independent (oacc_loop *loop)
+{
+  if (!optimize)
+return;
+
+  if (loop->routine)
+return;
+
+  /* TODO Warn about "auto" & "independent" in "parallel" regions? */
+  if (!oacc_parallel_kernels_graphite_fun_p ())
+return;
+
+  if (!(loop->flags & OLF_INDEPENDENT))
+return;
+
+  bool analyzed = false;
+  bool can_be_parallel = oacc_loop_can_be_parallel_p (loop, analyzed);
+  loop_p cfg_loop = oacc_loop_get_cfg_loop (loop);
+
+  if (cfg_loop && cfg_loop->inner && !analyzed)
+{
+  if (dump_enabled_p ())
+   {
+ const dump_user_location_t loc
+   = dump_user_location_t::from_location_t (loop->loc);
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
+  "'independent' loop in 'kernels' region has not been "
+  "analyzed (cf. 'graphite' "
+  "dumps for more information).\n");
+   }
+  return;
+}
+
+  if (!can_be_parallel)
+warning_at (loop->loc, 0,
+"loop has \"independent\" clause but data dependences were "
+"found.");
+}
+
 /* Walk the OpenACC loop hierarchy checking and assigning the
programmer-specified partitionings.  OUTER_MASK is the partitioning
this loop is contained within.  Return mask of partitioning
@@ -1951,6 +1996,10 @@ oacc_loop_fixed_partitions (oacc_loop *loop, unsigned outer_mask)
}
}

+  /* TODO Is this flag needed? Perhaps use -Wopenacc-parallelism? */
+  if (warn_openacc_false_independent)
+oacc_loop_warn_if_false_independent (loop);
+
   if (maybe_auto && (loop->flags & OLF_INDEPENDENT))
{
  loop->flags |= OLF_AUTO;
--
2.33.0



[PATCH 25/40] openacc: Add runtime alias checking for OpenACC kernels

2021-12-15 Thread Frederik Harwath
From: Andrew Stubbs 

This commit adds the code generation for the runtime alias checks for
OpenACC loops that have been analyzed by Graphite.  The runtime alias
check condition gets generated in Graphite. It is evaluated by the
code generated for the IFN_GOACC_LOOP internal function calls.  If
aliasing is detected at runtime, the execution dimensions get adjusted
to execute the affected loops sequentially.
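
The effect can be pictured with a hand-written equivalent.  This is a
sketch under assumptions, not what the compiler emits: ranges_disjoint
and choose_parallelism are hypothetical helpers, and the real
condition is built by Graphite from the unhandled data-dependence
relations rather than from two explicit pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the byte ranges [p, p + len_p) and [q, q + len_q) do not
   overlap.  */
bool
ranges_disjoint (const void *p, size_t len_p, const void *q, size_t len_q)
{
  uintptr_t a = (uintptr_t) p;
  uintptr_t b = (uintptr_t) q;
  return a + len_p <= b || b + len_q <= a;
}

/* Return the execution width to use for a loop that writes N ints
   through DST and reads N ints through SRC: REQUESTED if the accesses
   cannot alias, otherwise 1, i.e. sequential execution.  */
unsigned
choose_parallelism (const int *dst, const int *src, size_t n,
                    unsigned requested)
{
  assert (requested >= 1);
  bool noalias = ranges_disjoint (dst, n * sizeof *dst,
                                  src, n * sizeof *src);
  return noalias ? requested : 1;
}
```

The point of the design is that the decision is deferred to run time,
so the loop does not have to be serialized unconditionally just
because aliasing could not be ruled out at compile time.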

gcc/ChangeLog:

* graphite-isl-ast-to-gimple.c: Include internal-fn.h.
(graphite_oacc_analyze_scop): Implement runtime alias checks.
* omp-expand.c (expand_oacc_for): Add an additional "noalias" parameter
to GOACC_LOOP internal calls, and initialise it to integer_one_node.
* omp-offload.c (oacc_xform_loop): Integrate the runtime alias check
into the GOACC_LOOP expansion.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c: New test.
---
 gcc/graphite-isl-ast-to-gimple.c  | 122 
 gcc/omp-expand.c  |  37 +--
 gcc/omp-offload.c | 271 ++
 .../runtime-alias-check-1.c   |  79 +
 .../runtime-alias-check-2.c   |  90 ++
 5 files changed, 457 insertions(+), 142 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index e820e2c32202..010adaabb000 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -58,6 +58,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "graphite.h"
 #include "graphite-oacc.h"
 #include "stdlib.h"
+#include "internal-fn.h"

 struct ast_build_info
 {
@@ -1697,6 +1698,127 @@ graphite_oacc_analyze_scop (scop_p scop)
   print_isl_schedule (dump_file, scop->original_schedule);
 }

+  if (flag_graphite_runtime_alias_checks
+  && scop->unhandled_alias_ddrs.length () > 0)
+{
+  sese_info_p region = scop->scop_info;
+
+  /* Usually there will be a chunking loop with the actual work loop
+inside it.  In some corner cases there may only be one loop.  */
+  loop_p top_loop = region->region.entry->dest->loop_father;
+  loop_p active_loop = top_loop->inner ? top_loop->inner : top_loop;
+  tree cond = generate_alias_cond (scop->unhandled_alias_ddrs, active_loop);
+
+  /* Walk back to GOACC_LOOP block.  */
+  basic_block goacc_loop_block = region->region.entry->src;
+
+  /* Find the GOACC_LOOP calls. If there aren't any then this is not an
+OpenACC kernels loop and will need different handling.  */
+  gimple_stmt_iterator gsitop = gsi_start_bb (goacc_loop_block);
+  while (!gsi_end_p (gsitop)
+&& (!is_gimple_call (gsi_stmt (gsitop))
+|| !gimple_call_internal_p (gsi_stmt (gsitop))
+|| (gimple_call_internal_fn (gsi_stmt (gsitop))
+!= IFN_GOACC_LOOP)))
+   gsi_next (&gsitop);
+
+  if (!gsi_end_p (gsitop))
+   {
+ /* Move the GOACC_LOOP CHUNK and STEP calls to after any hoisted
+statements.  There ought not be any problematic dependencies because
+the chunk size and step are only computed for very specific purposes.
+They may not be at the very top of the block, but they should be
+found together (the asserts test this assumption). */
+ gimple_stmt_iterator gsibottom = gsi_last_bb (goacc_loop_block);
+ gsi_move_after (&gsitop, &gsibottom);
+ gimple_stmt_iterator gsiinsert = gsibottom;
+ gcc_checking_assert (is_gimple_call (gsi_stmt (gsitop))
+  && gimple_call_internal_p (gsi_stmt (gsitop))
+  && (gimple_call_internal_fn (gsi_stmt (gsitop))
+  == IFN_GOACC_LOOP));
+ gsi_move_after (&gsitop, &gsibottom);
+
+ /* Insert "noalias_p = COND" before the GOACC_LOOP statements.
+Note that these likely depend on some of the hoisted statements.  */
+ tree cond_val = force_gimple_operand_gsi (&gsiinsert, cond, true, NULL,
+   true, GSI_NEW_STMT);
+
+ /* Insert the cond_val into each GOACC_LOOP call in the region.  */
+ for (int n = -1; n < (int)region->bbs.length (); n++)
+   {
+ /* Cover the region plus goacc_loop_block.  */
+ basic_block bb = n < 0 ? goacc_loop_block : region->bbs[n];
+
+ for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+  !gsi_end_p (gsi);
+  gsi_next (&gsi))
+   {
+ gimple *stmt = gsi_stmt (gsi);
+ if (!i

[PATCH 27/40] openacc: Handle internal function calls in pass_lim

2021-12-15 Thread Frederik Harwath
The loop invariant motion pass correctly refuses to move statements
out of a loop if any other statement in the loop is unanalyzable.  The
pass does not know how to handle the OpenACC internal function calls.
This was not necessary until recently, when the OpenACC device
lowering pass was moved to a later position in the pass pipeline.

This commit changes pass_lim to ignore the OpenACC internal function
calls which do not contain any memory references. The hoisting enabled
by this change can be useful for the data-dependence analysis in
Graphite; for instance, in the outlined functions for OpenACC regions,
all invariant accesses to the ".omp_data_i" struct should be hoisted
out of the OpenACC loop.  This is particularly important for variables
that were scalars in the original loop and which have been turned into
accesses to the struct by the outlining process.  Not hoisting those
can prevent scalar evolution analysis which is crucial for Graphite.
Since any hoisting that introduces intermediate names - and hence,
"fake" dependences - inside the analyzed nest can be harmful to
data-dependence analysis, a flag to restrict the hoisting in OpenACC
functions is added to the pass. The pass instance that executes before
Graphite now runs with this flag set to true and the pass instance
after Graphite runs unrestricted.
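
A source-level picture of the hoisting in question, as a sketch only:
struct omp_data is a hypothetical stand-in for the ".omp_data_i"
struct of an outlined function, and the two functions show the loop
before and after the invariant loads are moved out.  The two are
equivalent only as long as the output array does not overlap the
struct itself.

```c
#include <assert.h>

/* Hypothetical stand-in for the struct through which an outlined
   OpenACC region receives its variables.  */
struct omp_data
{
  int n;
  int scale;
  int *out;
};

/* Before hoisting: the bound, the scalar, and the array pointer are
   reloaded from the struct in every iteration.  */
void
body_unhoisted (struct omp_data *data)
{
  for (int i = 0; i < data->n; i++)
    data->out[i] = i * data->scale;
}

/* After hoisting: the invariant loads happen once, and the loop itself
   only sees plain scalars and a plain pointer.  */
void
body_hoisted (struct omp_data *data)
{
  assert (data != 0);
  const int n = data->n;
  const int scale = data->scale;
  int *const out = data->out;

  for (int i = 0; i < n; i++)
    out[i] = i * scale;
}
```

In the second form the loop bound and the multiplier are ordinary
loop-invariant values, which is the shape that scalar evolution
analysis needs.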

A more precise way of selecting the statements for which hoisting
should be enabled is left for a future improvement.

gcc/ChangeLog:
* passes.def: Set restrict_oacc_hoisting to true for the early
pass_lim instance.
* tree-ssa-loop-im.c (movement_possibility): Add
restrict_oacc_hoisting flag to function; restrict movement if set.
(compute_invariantness): Add restrict_oacc_hoisting flag and pass it on.
(gather_mem_refs_stmt): Skip IFN_GOACC_LOOP and IFN_UNIQUE
calls.
(loop_invariant_motion_in_fun): Add restrict_oacc_hoisting flag and
pass it on.
(pass_lim::execute): Pass on new flags.
* tree-ssa-loop-manip.h (loop_invariant_motion_in_fun): Adjust
declaration.
* gimple-loop-interchange.cc (pass_linterchange::execute): Adjust call
to loop_invariant_motion_in_fun.
---
 gcc/gimple-loop-interchange.cc |  2 +-
 gcc/passes.def |  2 +-
 gcc/tree-ssa-loop-im.c | 57 --
 gcc/tree-ssa-loop-manip.h  |  2 +-
 4 files changed, 51 insertions(+), 12 deletions(-)

diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-interchange.cc
index ccd5083145f8..7c9b7b2345fa 100644
--- a/gcc/gimple-loop-interchange.cc
+++ b/gcc/gimple-loop-interchange.cc
@@ -2107,7 +2107,7 @@ pass_linterchange::execute (function *fun)
   if (changed_p)
 {
   unsigned todo = TODO_update_ssa_only_virtuals;
-  todo |= loop_invariant_motion_in_fun (cfun, false);
+  todo |= loop_invariant_motion_in_fun (cfun, false, false);
   scev_reset ();
   return todo;
 }
diff --git a/gcc/passes.def b/gcc/passes.def
index 681392f8f79f..1da9382bac53 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -250,7 +250,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_cse_sincos);
   NEXT_PASS (pass_optimize_bswap);
   NEXT_PASS (pass_laddress);
-  NEXT_PASS (pass_lim);
+  NEXT_PASS (pass_lim, true /* restrict_oacc_hoisting */);
   NEXT_PASS (pass_walloca, false);
   NEXT_PASS (pass_pre);
   NEXT_PASS (pass_sink_code);
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index 4b187c2cdafe..466dc494fb52 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -47,6 +47,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "tree-dfa.h"
 #include "dbgcnt.h"
+#include "graphite-oacc.h"
+#include "internal-fn.h"

 /* TODO:  Support for predicated code motion.  I.e.

@@ -327,11 +329,23 @@ enum move_pos
Otherwise return MOVE_IMPOSSIBLE.  */

 enum move_pos
-movement_possibility (gimple *stmt)
+movement_possibility (gimple *stmt, bool restrict_oacc_hoisting)
 {
   tree lhs;
   enum move_pos ret = MOVE_POSSIBLE;

+  if (restrict_oacc_hoisting && oacc_get_fn_attrib (cfun->decl)
+  && gimple_code (stmt) == GIMPLE_ASSIGN)
+{
+  tree rhs = gimple_assign_rhs1 (stmt);
+
+  if (TREE_CODE (rhs) == VIEW_CONVERT_EXPR)
+   rhs = TREE_OPERAND (rhs, 0);
+
+  if (TREE_CODE (rhs) == ARRAY_REF)
+ return MOVE_IMPOSSIBLE;
+}
+
   if (flag_unswitch_loops
   && gimple_code (stmt) == GIMPLE_COND)
 {
@@ -981,7 +995,7 @@ rewrite_bittest (gimple_stmt_iterator *bsi)
statements.  */

 static void
-compute_invariantness (basic_block bb)
+compute_invariantness (basic_block bb, bool restrict_oacc_hoisting)
 {
   enum move_pos pos;
   gimple_stmt_iterator bsi;
@@ -1009,7 +1023,7 @@ compute_invariantness (basic_block bb)
   {
stmt = gsi_stmt (bsi);

-   pos = movement_possibility (stmt);
+   pos = movement_possibility (s

[PATCH 28/40] openacc: Disable pass_pre on outlined functions analyzed by Graphite

2021-12-15 Thread Frederik Harwath
The additional dependences introduced by partial redundancy
elimination proper and by the code hoisting step of the pass very
often cause Graphite to fail on OpenACC functions. On the other hand,
the pass can also enable the analysis of OpenACC loops (cf. e.g. the
loop-auto-transfer-4.f90 testcase), for instance, because full
redundancy elimination removes definitions that would otherwise
prevent the creation of runtime alias checks outside of the SCoP.

This commit disables the actual partial redundancy elimination step as
well as the code hoisting step of pass_pre on OpenACC functions that
might be handled by Graphite.

gcc/ChangeLog:

* tree-ssa-pre.c (insert): Skip any insertions in OpenACC
functions that might be processed by Graphite.
---
 gcc/tree-ssa-pre.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index dc55d868cc19..d61210fc2ee9 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-cfgcleanup.h"
 #include "alias.h"
 #include "gimple-range.h"
+#include "graphite-oacc.h"

 /* Even though this file is called tree-ssa-pre.c, we actually
implement a bit more than just PRE here.  All of them piggy-back
@@ -3742,6 +3743,22 @@ do_hoist_insertion (basic_block block)
 static void
 insert (void)
 {
+
+/* The additional dependences introduced by the code insertions
+ can cause Graphite's dependence analysis to fail.  Without
+ special handling of those dependences in Graphite, it seems
+ better to skip this step if OpenACC loops that need to be handled
+ by Graphite are found.  Note that the full redundancy elimination
+ step of this pass is useful for the purpose of dependence
+ analysis, for instance, because it can remove definitions from
+ SCoPs that would otherwise prevent the creation of runtime alias
+ checks since those may only use definitions that are available
+ before the SCoP. */
+
+  if (oacc_function_p (cfun)
+  && ::graphite_analyze_oacc_function_p (cfun))
+return;
+
   basic_block bb;

   FOR_ALL_BB_FN (bb, cfun)
--
2.33.0



[PATCH 29/40] graphite: Tune parameters for OpenACC use

2021-12-15 Thread Frederik Harwath
The default values of some parameters that restrict Graphite's
resource usage are too low for many OpenACC codes.  Furthermore,
exceeding the limits does not always lead to user-visible diagnostic
messages.

This commit increases the parameter values on OpenACC functions.  The
values were chosen to allow for the analysis of all "kernels" regions
in the SPEC ACCEL v1.3 benchmark suite.  Warnings about exceeded
Graphite-related limits are added to the -fopt-info-missed
output. Those warnings are phrased in a uniform way that intentionally
refers to the "data-dependence analysis" of "OpenACC loops" instead of
"a failure in Graphite" to make them easier to understand for users.

gcc/ChangeLog:

* graphite-optimize-isl.c (optimize_isl): Adjust
param_max_isl_operations value for OpenACC functions and add
special warnings if value gets exceeded.

* graphite-scop-detection.c (build_scops): Likewise for
param_graphite_max_arrays_per_scop.

gcc/testsuite/ChangeLog:

* gcc.dg/goacc/graphite-parameter-1.c: New test.
* gcc.dg/goacc/graphite-parameter-2.c: New test.
---
 gcc/graphite-optimize-isl.c   | 35 ---
 gcc/graphite-scop-detection.c | 28 ++-
 .../gcc.dg/goacc/graphite-parameter-1.c   | 21 +++
 .../gcc.dg/goacc/graphite-parameter-2.c   | 23 
 4 files changed, 101 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/goacc/graphite-parameter-1.c
 create mode 100644 gcc/testsuite/gcc.dg/goacc/graphite-parameter-2.c

diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
index 019452700a49..4eecbd20b740 100644
--- a/gcc/graphite-optimize-isl.c
+++ b/gcc/graphite-optimize-isl.c
@@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "dumpfile.h"
 #include "tree-vectorizer.h"
 #include "graphite.h"
+#include "graphite-oacc.h"


 /* get_schedule_for_node_st - Improve schedule for the schedule node.
@@ -115,6 +116,14 @@ optimize_isl (scop_p scop, bool oacc_enabled_graphite)
   int old_err = isl_options_get_on_error (scop->isl_context);
   int old_max_operations = isl_ctx_get_max_operations (scop->isl_context);
   int max_operations = param_max_isl_operations;
+
+  /* The default value for param_max_isl_operations is easily exceeded
+ by "kernels" loops in existing OpenACC codes.  Raise the values
+ significantly since analyzing those loops is crucial. */
+  if (param_max_isl_operations == 35 /* default value */
+  && oacc_function_p (cfun))
+max_operations = 200;
+
   if (max_operations)
 isl_ctx_set_max_operations (scop->isl_context, max_operations);
   isl_options_set_on_error (scop->isl_context, ISL_ON_ERROR_CONTINUE);
@@ -164,11 +173,27 @@ optimize_isl (scop_p scop, bool oacc_enabled_graphite)
  dump_user_location_t loc = find_loop_location
(scop->scop_info->region.entry->dest->loop_father);
  if (isl_ctx_last_error (scop->isl_context) == isl_error_quota)
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
-"loop nest not optimized, optimization timed out "
-"after %d operations [--param max-isl-operations]\n",
-max_operations);
- else
+   {
+  if (oacc_function_p (cfun))
+   {
+ /* Special casing for OpenACC to unify diagnostic messages
+here and in graphite-scop-detection.c. */
+  dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
+   "data-dependence analysis of OpenACC loop "
+   "nest "
+   "failed; try increasing the value of "
+   "--param="
+   "max-isl-operations=%d.\n",
+   max_operations);
+}
+  else
+dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
+ "loop nest not optimized, optimization timed "
+ "out after %d operations [--param "
+ "max-isl-operations]\n",
+ max_operations);
+}
+  else
 dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
  "loop nest not optimized, ISL signalled an error\n");
}
diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index 234dbe0ec729..9a5e43a5bfc6 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -2053,6 +2053,9 @@ determine_openacc_reductions (scop_p scop)
 }
 }

+
+extern dump_user_location_t find_loop_location (class loop *);
+
 /* Find Static Control Parts (SCoP) in the current function and pushes
them to SCOPS.  */

@@ -2106,6 +2109,11 @@ build_scops (vec *scops

[PATCH 30/40] graphite: Adjust scop loop-nest choice

2021-12-15 Thread Frederik Harwath
The find_common_loop function is used in Graphite to obtain a common
super-loop of all loops inside a SCoP.  The function is applied to the
loop of the destination block of the edge that leads into the SESE
region and the loop of the source block of the edge that exits the
region.  The exit block is usually introduced by the canonicalization
of the loop structure that Graphite does to support its code
generation. If it is empty, it may happen that it belongs to the outer
fake loop.  This way, build_alias_set may end up analysing
data-references with respect to this loop although there may exist a
proper super-loop of the SCoP loops.  This does not seem to be correct
in general and it leads to problems with runtime alias check creation
which fails if executed on a loop without niter information.
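
The following standalone sketch shows the intended lookup on a toy
data structure.  struct loop, struct block, common_loop, and
context_loop are simplified, hypothetical models of GCC's loop tree,
basic blocks, find_common_loop, and the new scop_context_loop; the
"empty" flag stands in for sese_trivially_empty_bb_p.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified loop-tree node.  The outermost "fake" loop
   has depth 0 and no outer loop.  */
struct loop
{
  struct loop *outer;
  int depth;
};

/* Hypothetical, simplified basic block.  SINGLE_PRED is NULL unless the
   block has exactly one predecessor.  */
struct block
{
  struct loop *loop_father;
  struct block *single_pred;
  bool empty;
};

/* Return the innermost loop that contains both A and B.  Both must
   belong to the same loop tree.  */
struct loop *
common_loop (struct loop *a, struct loop *b)
{
  while (a->depth > b->depth)
    a = a->outer;
  while (b->depth > a->depth)
    b = b->outer;
  while (a != b)
    {
      a = a->outer;
      b = b->outer;
    }
  return a;
}

/* Skip empty single-predecessor blocks at the region exit before
   looking up the common loop, so that an empty exit block sitting in
   the fake loop does not widen the result.  */
struct loop *
context_loop (struct block *entry_dest, struct block *exit_src)
{
  assert (entry_dest && exit_src);
  while (exit_src->empty && exit_src->single_pred)
    exit_src = exit_src->single_pred;
  return common_loop (entry_dest->loop_father, exit_src->loop_father);
}
```

With two sibling loops inside one outer loop and an empty exit block
that belongs to the fake loop, the plain common-loop lookup returns
the fake loop, while the adjusted lookup returns the outer loop.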

gcc/ChangeLog:

* graphite-scop-detection.c (scop_context_loop): New function.
(build_alias_set): Use scop_context_loop instead of find_common_loop.
* graphite-isl-ast-to-gimple.c (graphite_regenerate_ast_isl): Likewise.
* graphite.h (scop_context_loop): New declaration.
---
 gcc/graphite-isl-ast-to-gimple.c |  4 +---
 gcc/graphite-scop-detection.c| 21 ++---
 gcc/graphite.h   |  1 +
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index 010adaabb000..acadf544fadd 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -1543,9 +1543,7 @@ graphite_regenerate_ast_isl (scop_p scop)
 conditional if aliasing can be ruled out at runtime and the original
 version of the SCoP, otherwise. */

-  loop_p loop
-  = find_common_loop (scop->scop_info->region.entry->dest->loop_father,
-  scop->scop_info->region.exit->src->loop_father);
+  loop_p loop = scop_context_loop (scop);
   tree cond = generate_alias_cond (scop->unhandled_alias_ddrs, loop);
   tree non_alias_cond = build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
   set_ifsese_condition (region->if_region, non_alias_cond);
diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index 9a5e43a5bfc6..f173e6c4f890 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -297,6 +297,23 @@ single_pred_cond_non_loop_exit (basic_block bb)
   return NULL;
 }

+
+/* Return the innermost loop that encloses all loops in SCOP. */
+
+loop_p
+scop_context_loop (scop_p scop)
+{
+  edge scop_entry = scop->scop_info->region.entry;
+  edge scop_exit = scop->scop_info->region.exit;
+  basic_block exit_bb = scop_exit->src;
+
+  while (sese_trivially_empty_bb_p (exit_bb) && single_pred_p (exit_bb))
+exit_bb = single_pred (exit_bb);
+
+  loop_p entry_loop = scop_entry->dest->loop_father;
+  return find_common_loop (entry_loop, exit_bb->loop_father);
+}
+
 namespace
 {

@@ -1774,9 +1791,7 @@ build_alias_set (scop_p scop)
   int i, j;
   int *all_vertices;

-  struct loop *nest
-= find_common_loop (scop->scop_info->region.entry->dest->loop_father,
-   scop->scop_info->region.exit->src->loop_father);
+  struct loop *nest = scop_context_loop (scop);

   gcc_checking_assert (nest);

diff --git a/gcc/graphite.h b/gcc/graphite.h
index 9c508f31109f..dacb27a9073c 100644
--- a/gcc/graphite.h
+++ b/gcc/graphite.h
@@ -480,4 +480,5 @@ extern tree cached_scalar_evolution_in_region (const sese_l &, loop_p, tree);
 extern void dot_all_sese (FILE *, vec &);
 extern void dot_sese (sese_l &);
 extern void dot_cfg ();
+extern loop_p scop_context_loop (scop_p);
 #endif
--
2.33.0

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


[PATCH 31/40] graphite: Accept loops without data references

2021-12-15 Thread Frederik Harwath
It seems that the check that rejects loops without data references is
only included to avoid handling non-profitable loops.  Including those
loops in Graphite's analysis enables more consistent diagnostic
messages in OpenACC "kernels" code and does not introduce any
testsuite regressions.  If executing Graphite on loops without
data references leads to noticeable compile time slow-downs for
non-OpenACC users of Graphite, the check can be re-introduced but
restricted to non-OpenACC functions.

gcc/ChangeLog:

* graphite-scop-detection.c (scop_detection::harmful_loop_in_region):
Remove check for loops without data references.
---
 gcc/graphite-scop-detection.c | 13 -
 1 file changed, 13 deletions(-)

diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index f173e6c4f890..2dcb85508a3d 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -849,19 +849,6 @@ scop_detection::harmful_loop_in_region (sese_l scop) const
  return true;
}

-  /* Check if all loop nests have at least one data reference.
-???  This check is expensive and loops premature at this point.
-If important to retain we can pre-compute this for all innermost
-loops and reject those when we build a SESE region for a loop
-during SESE discovery.  */
-  if (! loop->inner
- && ! loop_nest_has_data_refs (loop))
-   {
- DEBUG_PRINT (dp << "[scop-detection-fail] loop_" << loop->num
-  << " does not have any data reference.\n");
- return true;
-   }
-
   DEBUG_PRINT (dp << "[scop-detection] loop_" << loop->num << " is harmless.\n");
 }

--
2.33.0



[PATCH 32/40] Reference reduction localization

2021-12-15 Thread Frederik Harwath
From: Julian Brown 

gcc/
* gimplify.c (privatize_reduction): New struct.
(localize_reductions_r, localize_reductions): New functions.
(gimplify_omp_for): Call localize_reductions.
(gimplify_omp_workshare): Likewise.
* omp-low.c (lower_oacc_reductions): Handle localized reductions.
Create fewer temp vars.
* tree-core.h (omp_clause_code): Add OMP_CLAUSE_REDUCTION_PRIVATE_DECL
documentation.
* tree.c (omp_clause_num_ops): Bump number of ops for
OMP_CLAUSE_REDUCTION to 6.
(walk_tree_1): Adjust accordingly.
* tree.h (OMP_CLAUSE_REDUCTION_PRIVATE_DECL): Add macro.
---
 gcc/gimplify.c  | 102 +++
 gcc/omp-low.c   |  45 +---
 gcc/tree-core.h |   4 +-
 gcc/tree.c  | 137 +---
 gcc/tree.h  |   2 +
 5 files changed, 250 insertions(+), 40 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index c2ab96e7e182..9a4331c70d6e 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -240,6 +240,11 @@ struct gimplify_omp_ctx
   int defaultmap[5];
 };

+struct privatize_reduction
+{
+  tree ref_var, local_var;
+};
+
 static struct gimplify_ctx *gimplify_ctxp;
 static struct gimplify_omp_ctx *gimplify_omp_ctxp;
 static bool in_omp_construct;
@@ -11900,6 +11905,80 @@ gimplify_omp_taskloop_expr (tree type, tree *tp, gimple_seq *pre_p,
   OMP_FOR_CLAUSES (orig_for_stmt) = c;
 }

+/* Helper function for localize_reductions.  Replace all uses of REF_VAR with
+   LOCAL_VAR.  */
+
+static tree
+localize_reductions_r (tree *tp, int *walk_subtrees, void *data)
+{
+  enum tree_code tc = TREE_CODE (*tp);
+  struct privatize_reduction *pr = (struct privatize_reduction *) data;
+
+  if (TYPE_P (*tp))
+*walk_subtrees = 0;
+
+  switch (tc)
+{
+case INDIRECT_REF:
+case MEM_REF:
+  if (TREE_OPERAND (*tp, 0) == pr->ref_var)
+   *tp = pr->local_var;
+
+  *walk_subtrees = 0;
+  break;
+
+case VAR_DECL:
+case PARM_DECL:
+case RESULT_DECL:
+  if (*tp == pr->ref_var)
+   *tp = pr->local_var;
+
+  *walk_subtrees = 0;
+  break;
+
+default:
+  break;
+}
+
+  return NULL_TREE;
+}
+
+/* OpenACC worker and vector loop state propagation requires reductions
+   to be inside local variables.  This function replaces all reference-type
+   reductions variables associated with the loop with a local copy.  It is
+   also used to create private copies of reduction variables for those
+   which are not associated with acc loops.  */
+
+static void
+localize_reductions (tree clauses, tree body)
+{
+  tree c, var, type, new_var;
+  struct privatize_reduction pr;
+
+  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION)
+  {
+   var = OMP_CLAUSE_DECL (c);
+
+   if (!lang_hooks.decls.omp_privatize_by_reference (var))
+ {
+   OMP_CLAUSE_REDUCTION_PRIVATE_DECL (c) = NULL;
+   continue;
+ }
+
+   type = TREE_TYPE (TREE_TYPE (var));
+   new_var = create_tmp_var (type, IDENTIFIER_POINTER (DECL_NAME (var)));
+
+   pr.ref_var = var;
+   pr.local_var = new_var;
+
+   walk_tree (&body, localize_reductions_r, &pr, NULL);
+
+   OMP_CLAUSE_REDUCTION_PRIVATE_DECL (c) = new_var;
+  }
+}
+
+
 /* Gimplify the gross structure of an OMP_FOR statement.  */

 static enum gimplify_status
@@ -12126,6 +12205,23 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
   gcc_unreachable ();
 }

+  if (ort == ORT_ACC)
+{
+  gimplify_omp_ctx *outer = gimplify_omp_ctxp;
+
+  while (outer
+&& outer->region_type != ORT_ACC_PARALLEL
+&& outer->region_type != ORT_ACC_KERNELS)
+   outer = outer->outer_context;
+
+  /* FIXME: Reductions only work in parallel regions at present.  We avoid
+doing the reduction localization transformation in kernels regions
+here, because the code to remove reductions in kernels regions cannot
+handle that.  */
+  if (outer && outer->region_type == ORT_ACC_PARALLEL)
+   localize_reductions (OMP_FOR_CLAUSES (*expr_p), OMP_FOR_BODY (*expr_p));
+}
+
   /* Set OMP_CLAUSE_LINEAR_NO_COPYIN flag on explicit linear
  clause for the IV.  */
   if (ort == ORT_SIMD && TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt)) == 1)
@@ -13654,6 +13750,12 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p)
   || (ort & ORT_HOST_TEAMS) == ORT_HOST_TEAMS)
 {
   push_gimplify_context ();
+
+  /* FIXME: Reductions are not supported in kernels regions yet.  */
+  if (/*ort == ORT_ACC_KERNELS ||*/ ort == ORT_ACC_PARALLEL)
+localize_reductions (OMP_TARGET_CLAUSES (*expr_p),
+OMP_TARGET_BODY (*expr_p));
+
   gimple *g = gimplify_and_return_first (OMP_BODY (expr), &body);
   if (gimple_code (g) == GIMPLE_BIND)
pop_gimplify_context (g);
diff --git a/gcc/o

[PATCH 33/40] Fix tree check failure with reduction localization

2021-12-15 Thread Frederik Harwath
From: Julian Brown 

gcc/
* gimplify.c (gimplify_omp_workshare): Use OMP_CLAUSES, OMP_BODY
instead of OMP_TARGET_CLAUSES, OMP_TARGET_BODY.
---
 gcc/gimplify.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 9a4331c70d6e..04ffbc256442 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -13753,8 +13753,7 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p)

   /* FIXME: Reductions are not supported in kernels regions yet.  */
   if (/*ort == ORT_ACC_KERNELS ||*/ ort == ORT_ACC_PARALLEL)
-localize_reductions (OMP_TARGET_CLAUSES (*expr_p),
-OMP_TARGET_BODY (*expr_p));
+localize_reductions (OMP_CLAUSES (expr), OMP_BODY (expr));

   gimple *g = gimplify_and_return_first (OMP_BODY (expr), &body);
   if (gimple_code (g) == GIMPLE_BIND)
--
2.33.0



[PATCH 34/40] Use more appropriate var in localize_reductions call

2021-12-15 Thread Frederik Harwath
From: Julian Brown 

gcc/
* gimplify.c (gimplify_omp_for): Use for_stmt in call to
localize_reductions.
---
 gcc/gimplify.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 04ffbc256442..daa69ccf6202 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -12219,7 +12219,8 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
 here, because the code to remove reductions in kernels regions cannot
 handle that.  */
   if (outer && outer->region_type == ORT_ACC_PARALLEL)
-   localize_reductions (OMP_FOR_CLAUSES (*expr_p), OMP_FOR_BODY (*expr_p));
+   localize_reductions (OMP_FOR_CLAUSES (for_stmt),
+OMP_FOR_BODY (for_stmt));
 }

   /* Set OMP_CLAUSE_LINEAR_NO_COPYIN flag on explicit linear
--
2.33.0



[PATCH 35/40] Handle references in OpenACC "private" clauses

2021-12-15 Thread Frederik Harwath
From: Julian Brown 

gcc/
* gimplify.c (localize_reductions): Rewrite references for
OMP_CLAUSE_PRIVATE also.

libgomp/
* testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: New test.
* testsuite/libgomp.oacc-c++/privatized-ref-2.C: New test.
* testsuite/libgomp.oacc-c++/privatized-ref-3.C: New test.
---
 gcc/gimplify.c| 15 
 .../libgomp.oacc-c++/privatized-ref-2.C   | 64 +
 .../libgomp.oacc-c++/privatized-ref-3.C   | 64 +
 .../libgomp.oacc-fortran/privatized-ref-1.f95 | 71 +++
 4 files changed, 214 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c++/privatized-ref-2.C
 create mode 100644 libgomp/testsuite/libgomp.oacc-c++/privatized-ref-3.C
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-1.f95

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index daa69ccf6202..bf37388f947c 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -11976,6 +11976,21 @@ localize_reductions (tree clauses, tree body)

OMP_CLAUSE_REDUCTION_PRIVATE_DECL (c) = new_var;
   }
+else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE)
+  {
+   var = OMP_CLAUSE_DECL (c);
+
+   if (!lang_hooks.decls.omp_privatize_by_reference (var))
+ continue;
+
+   type = TREE_TYPE (TREE_TYPE (var));
+   new_var = create_tmp_var (type, IDENTIFIER_POINTER (DECL_NAME (var)));
+
+   pr.ref_var = var;
+   pr.local_var = new_var;
+
+   walk_tree (&body, localize_reductions_r, &pr, NULL);
+  }
 }


diff --git a/libgomp/testsuite/libgomp.oacc-c++/privatized-ref-2.C b/libgomp/testsuite/libgomp.oacc-c++/privatized-ref-2.C
new file mode 100644
index ..3884f163132c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c++/privatized-ref-2.C
@@ -0,0 +1,64 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+
+void workers (void)
+{
+  double res[65536];
+  int i;
+
+#pragma acc parallel copyout(res) num_gangs(64) num_workers(64)
+  {
+int i, j;
+#pragma acc loop gang
+for (i = 0; i < 256; i++)
+  {
+#pragma acc loop worker
+   for (j = 0; j < 256; j++)
+ {
+   int tmpvar;
+   int &tmpref = tmpvar;
+   tmpref = (i * 256 + j) * 99;
+   res[i * 256 + j] = tmpref;
+ }
+  }
+  }
+
+  for (i = 0; i < 65536; i++)
+if (res[i] != i * 99)
+  abort ();
+}
+
+void vectors (void)
+{
+  double res[65536];
+  int i;
+
+#pragma acc parallel copyout(res) num_gangs(64) num_workers(64)
+  {
+int i, j;
+#pragma acc loop gang worker
+for (i = 0; i < 256; i++)
+  {
+#pragma acc loop vector
+   for (j = 0; j < 256; j++)
+ {
+   int tmpvar;
+   int &tmpref = tmpvar;
+   tmpref = (i * 256 + j) * 101;
+   res[i * 256 + j] = tmpref;
+ }
+  }
+  }
+
+  for (i = 0; i < 65536; i++)
+if (res[i] != i * 101)
+  abort ();
+}
+
+int main (int argc, char *argv[])
+{
+  workers ();
+  vectors ();
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c++/privatized-ref-3.C b/libgomp/testsuite/libgomp.oacc-c++/privatized-ref-3.C
new file mode 100644
index ..c1a10cba31b3
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c++/privatized-ref-3.C
@@ -0,0 +1,64 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+
+void workers (void)
+{
+  double res[65536];
+  int i;
+
+#pragma acc parallel copyout(res) num_gangs(64) num_workers(64)
+  {
+int i, j;
+int tmpvar;
+int &tmpref = tmpvar;
+#pragma acc loop gang
+for (i = 0; i < 256; i++)
+  {
+#pragma acc loop worker private(tmpref)
+   for (j = 0; j < 256; j++)
+ {
+   tmpref = (i * 256 + j) * 99;
+   res[i * 256 + j] = tmpref;
+ }
+  }
+  }
+
+  for (i = 0; i < 65536; i++)
+if (res[i] != i * 99)
+  abort ();
+}
+
+void vectors (void)
+{
+  double res[65536];
+  int i;
+
+#pragma acc parallel copyout(res) num_gangs(64) num_workers(64)
+  {
+int i, j;
+int tmpvar;
+int &tmpref = tmpvar;
+#pragma acc loop gang worker
+for (i = 0; i < 256; i++)
+  {
+#pragma acc loop vector private(tmpref)
+   for (j = 0; j < 256; j++)
+ {
+   tmpref = (i * 256 + j) * 101;
+   res[i * 256 + j] = tmpref;
+ }
+  }
+  }
+
+  for (i = 0; i < 65536; i++)
+if (res[i] != i * 101)
+  abort ();
+}
+
+int main (int argc, char *argv[])
+{
+  workers ();
+  vectors ();
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-1.f95 b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-1.f95
new file mode 100644
index ..fe1520a8078c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-1.f95
@@ -0,0 +1,71 @@
+! { dg-do run }
+
+program main
+  implicit none
+  integer :: myint
+  integer :: i
+  real :: res(65536), tmp
+
+  res(:) = 0.0
+
+  myint = 5
+  call workers(myint, res)
+
+  do i=1,65536
+tmp = i * 99

[PATCH 36/40] openacc: Enable reduction variable localization for "kernels"

2021-12-15 Thread Frederik Harwath
gcc/ChangeLog:

* gimplify.c (gimplify_omp_for): Enable localization on
"kernels" regions.
(gimplify_omp_workshare): Likewise.
---
 gcc/gimplify.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index bf37388f947c..a0137089496b 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -12229,11 +12229,9 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
 && outer->region_type != ORT_ACC_KERNELS)
outer = outer->outer_context;

-  /* FIXME: Reductions only work in parallel regions at present.  We avoid
-doing the reduction localization transformation in kernels regions
-here, because the code to remove reductions in kernels regions cannot
-handle that.  */
-  if (outer && outer->region_type == ORT_ACC_PARALLEL)
+  if (outer && (outer->region_type == ORT_ACC_PARALLEL
+   || (outer->region_type == ORT_ACC_KERNELS
+   && param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE)))
localize_reductions (OMP_FOR_CLAUSES (for_stmt),
 OMP_FOR_BODY (for_stmt));
 }
@@ -13767,8 +13765,9 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p)
 {
   push_gimplify_context ();

-  /* FIXME: Reductions are not supported in kernels regions yet.  */
-  if (/*ort == ORT_ACC_KERNELS ||*/ ort == ORT_ACC_PARALLEL)
+  if (ort == ORT_ACC_PARALLEL
+  || (ort == ORT_ACC_KERNELS
+  && param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE))
 localize_reductions (OMP_CLAUSES (expr), OMP_BODY (expr));

   gimple *g = gimplify_and_return_first (OMP_BODY (expr), &body);
--
2.33.0



[PATCH 37/40] Fix for is_gimple_reg vars to 'data kernels'

2021-12-15 Thread Frederik Harwath
From: Tobias Burnus 

Nearly all variable mapping is moved from 'kernels' to a surrounding
'data kernels' region, and the variables are then mapped 'force_present' for
the 'kernels'. However, as libgomp.oacc-c-c++-common/declare-vla.c shows,
moving 'int i, N' fails because there is a special case for is_gimple_reg in
mapping, and that fails badly outside a target region (e.g. offloading =
false). As those variables are transferred by value and not as a pointer, it
makes more sense to map them only at 'kernels' and ignore them for
'data kernels'.
Additionally, as e.g. libgomp.oacc-c-c++-common/kernels-decompose-1.c shows,
one still has to handle 'kernels'-declared variables, which are now
declared in 'kernels data' and can be handled as is_gimple_reg.
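
The resulting rule can be sketched as a toy decision function.  This is
only an illustration under stated assumptions: the toy_* names, the
is_register_like flag (standing in for is_gimple_reg) and the enum values
are hypothetical and are not the identifiers used in
omp-oacc-kernels-decompose.cc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical map kinds for this sketch only.  */
enum toy_map_kind
{
  TOY_MAP_NONE,           /* No map clause is emitted.  */
  TOY_MAP_COPY_CLAUSE,    /* The user's clause is copied to the data region.  */
  TOY_MAP_FORCE_PRESENT,  /* Data must already be present on the device.  */
  TOY_MAP_TOFROM          /* Scalar is copied in and out by value.  */
};

struct toy_var
{
  const char *name;
  bool is_register_like;  /* Stand-in for is_gimple_reg (var).  */
};

/* Mapping on the surrounding data region: register-like scalars are
   skipped, everything else keeps its clause.  */
enum toy_map_kind
toy_data_region_map (const struct toy_var *v)
{
  assert (v != NULL);
  return v->is_register_like ? TOY_MAP_NONE : TOY_MAP_COPY_CLAUSE;
}

/* Mapping on the kernels region itself: register-like scalars are mapped
   here, everything else is expected to be present already.  */
enum toy_map_kind
toy_kernels_region_map (const struct toy_var *v)
{
  assert (v != NULL);
  return v->is_register_like ? TOY_MAP_TOFROM : TOY_MAP_FORCE_PRESENT;
}
```

For a scalar such as 'N' the sketch yields no clause on the data region and
tofrom on the kernels region; for an array it yields a copied clause on the
data region and force_present on the kernels region.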

gcc/
* omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region):
is_gimple_reg vars are not yet mapped; fall through to map them as
before the transformation.
(omp_oacc_kernels_decompose_1): Don't map is_gimple_reg vars.
(decompose_kernels_region_body): Use tofrom for is_gimple_reg vars.
(omp_oacc_kernels_decompose_1): Handle is_gimple_reg vars as without
data kernels.

gcc/testsuite/
* gfortran.dg/goacc/declare-3.f95: Update scan-tree-dump-times.
---
 gcc/omp-oacc-kernels-decompose.cc | 9 +++--
 gcc/testsuite/gfortran.dg/goacc/declare-3.f95 | 2 +-
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/gcc/omp-oacc-kernels-decompose.cc b/gcc/omp-oacc-kernels-decompose.cc
index c96207d96250..a6be1f1ed238 100644
--- a/gcc/omp-oacc-kernels-decompose.cc
+++ b/gcc/omp-oacc-kernels-decompose.cc
@@ -873,7 +873,7 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
  else
inner_bind_vars = next;
}
-  else
+  else if (!is_gimple_reg (v))
{
  /* Otherwise, build the map clause.  */
  tree new_clause = build_omp_clause (loc, OMP_CLAUSE_MAP);
@@ -1222,7 +1222,9 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
   if (!DECL_ARTIFICIAL (var) && TREE_CODE (var) != CONST_DECL)
{
  tree present_clause = build_omp_clause (loc, OMP_CLAUSE_MAP);
- OMP_CLAUSE_SET_MAP_KIND (present_clause, GOMP_MAP_FORCE_PRESENT);
+ OMP_CLAUSE_SET_MAP_KIND (present_clause,
+  is_gimple_reg (var)
+  ? GOMP_MAP_TOFROM : GOMP_MAP_FORCE_PRESENT);
  OMP_CLAUSE_DECL (present_clause) = var;
  OMP_CLAUSE_SIZE (present_clause) = DECL_SIZE_UNIT (var);
  OMP_CLAUSE_CHAIN (present_clause) = present_clauses;
@@ -1437,6 +1439,9 @@ omp_oacc_kernels_decompose_1 (gimple *kernels_stmt)
   region causes runtime errors.  */
break;

+ if (is_gimple_reg (decl))
+   break;
+
  /* For non-artificial variables, and for non-declaration
 expressions like A[0:n], copy the clause to the data
 region.  */
diff --git a/gcc/testsuite/gfortran.dg/goacc/declare-3.f95 b/gcc/testsuite/gfortran.dg/goacc/declare-3.f95
index 9127cba6600d..2a1fe0a68465 100644
--- a/gcc/testsuite/gfortran.dg/goacc/declare-3.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/declare-3.f95
@@ -39,7 +39,7 @@ program test
   use mod_d
   use mod_e

-  ! { dg-final { scan-tree-dump {(?n)#pragma acc data map\(force_alloc:d\) map\(force_to:b\) map\(force_alloc:a\)$} original } }
+  ! { dg-final { scan-tree-dump {(?n)#pragma acc data map\(force_alloc:d\) map\(to:b\) map\(alloc:a\)$} original } }
 end program test

 ! { dg-final { scan-tree-dump-times {#pragma acc data} 1 original } }
--
2.33.0



[PATCH 38/40] openacc: fix privatization of by-reference arrays

2021-12-15 Thread Frederik Harwath
From: Tobias Burnus 

Replacing a by-reference variable in a private clause with a local variable
makes sense; however, for arrays, the size is not directly known from the type.
This causes an ICE via create_tmp_var, which indirectly invokes
force_constant_size in this case, but the latter only handles Ada.

gcc/ChangeLog:

* gimplify.c (localize_reductions): Do not create local
variable for privatized arrays.
---
 gcc/gimplify.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index a0137089496b..952bc449a7db 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -11982,8 +11982,9 @@ localize_reductions (tree clauses, tree body)

if (!lang_hooks.decls.omp_privatize_by_reference (var))
  continue;
-
type = TREE_TYPE (TREE_TYPE (var));
+   if (TREE_CODE (type) == ARRAY_TYPE)
+ continue;
new_var = create_tmp_var (type, IDENTIFIER_POINTER (DECL_NAME (var)));

pr.ref_var = var;
--
2.33.0



[PATCH 39/40] openacc: Check type for references in reduction lowering

2021-12-15 Thread Frederik Harwath
gcc/ChangeLog:

* omp-low.c (lower_oacc_reductions): Only create a reference
if variable has pointer type.
---
 gcc/omp-low.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index ae5cdfc5e260..2b8b848ec03a 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -7639,9 +7639,10 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,

if (omp_privatize_by_reference (orig))
  {
-   outgoing = build_simple_mem_ref (outgoing);
+if (POINTER_TYPE_P (TREE_TYPE (outgoing)))
+ outgoing = build_simple_mem_ref (outgoing);

-   if (!TREE_CONSTANT (incoming))
+if (POINTER_TYPE_P (TREE_TYPE (incoming)))
  incoming = build_simple_mem_ref (incoming);
  }

--
2.33.0



Re: [PR103302] skip multi-word pre-move clobber during lra

2021-12-15 Thread Jeff Law via Gcc-patches




On 12/15/2021 1:22 AM, Alexandre Oliva wrote:
> On Dec  9, 2021, Jeff Law  wrote:
>
>>> I found a similar pattern of issuing clobbers for multi-word moves, but
>>> not when reload_in_progress, in expr.c:emit_move_complex_parts.  I don't
>>> have a testcase, but I'm tempted to propose '!lra_in_progress &&' for it
>>> as well.  Can you think of any reason not to?
>
>> The only reason I can think of is we're in stage3 :-)  It'd be a lot
>> easier to green light that if we could trigger an issue.
>
> I have not found the cycles to try to construct a testcase to trigger
> the issue, but before moving on, I have regstrapped this on
> x86_64-linux-gnu, so, at least for now, I propose it for the next
> release cycle.  Ok to install then?
>
>
> [PR103302] skip multi-part clobber during lra for complex parts too
>
> From: Alexandre Oliva
>
> As with the earlier patch, avoid emitting clobbers that we used to
> avoid during reload also during LRA, now when moving complex
> multi-part values.  We don't have a testcase for this one.
>
>
> for  gcc/ChangeLog
>
> PR target/103302
> * expr.c (emit_move_complex_parts): Skip clobbers during lra.

OK for the next cycle.
jeff



Re: [PATCH v4 3/6] tree-object-size: Support dynamic sizes in conditions

2021-12-15 Thread Jakub Jelinek via Gcc-patches
On Wed, Dec 01, 2021 at 07:57:54PM +0530, Siddhesh Poyarekar wrote:
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
> @@ -0,0 +1,72 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2" } */
> +
> +typedef __SIZE_TYPE__ size_t;
> +#define abort __builtin_abort
> +
> +size_t
> +__attribute__ ((noinline))
> +test_builtin_malloc_condphi (int cond)
> +{
> +  void *ret;
> + 
> +  if (cond)
> +ret = __builtin_malloc (32);
> +  else
> +ret = __builtin_malloc (64);
> +
> +  return __builtin_dynamic_object_size (ret, 0);
> +}
> +
> +size_t
> +__attribute__ ((noinline))
> +test_builtin_calloc_condphi (size_t cnt, size_t sz, int cond)
> +{
> +  struct
> +{
> +  int a;
> +  char b;
> +} bin[cnt];
> +
> +  char *ch = __builtin_calloc (cnt, sz);
> +
> +  return __builtin_dynamic_object_size (cond ? ch : (void *) &bin, 0);

I think it would be nice if the testcases didn't leak memory.  Can
you replace return ... with size_t ret = ...
and add
  __builtin_free (ch);
  return ret;
in both cases (in the first, perhaps rename ret to ch first)?
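
A minimal sketch of how the two functions might look after that change,
assuming the rest of the testcase stays as posted.  The OBJECT_SIZE macro
is a hypothetical wrapper added here only so the sketch also builds with
compilers that lack __builtin_dynamic_object_size, and the assert on cnt is
likewise an addition of this sketch; neither is part of the proposed
testcase.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper: use the dynamic builtin where the compiler has it,
   otherwise fall back to the static one.  */
#if defined (__has_builtin)
# if __has_builtin (__builtin_dynamic_object_size)
#  define OBJECT_SIZE(ptr, type) __builtin_dynamic_object_size (ptr, type)
# endif
#endif
#ifndef OBJECT_SIZE
# define OBJECT_SIZE(ptr, type) __builtin_object_size (ptr, type)
#endif

size_t
__attribute__ ((noinline))
test_builtin_malloc_condphi (int cond)
{
  void *ch;  /* Renamed from ret, so that ret can hold the size.  */

  if (cond)
    ch = __builtin_malloc (32);
  else
    ch = __builtin_malloc (64);

  /* Query the size first, then release the allocation.  */
  size_t ret = OBJECT_SIZE (ch, 0);
  __builtin_free (ch);
  return ret;
}

size_t
__attribute__ ((noinline))
test_builtin_calloc_condphi (size_t cnt, size_t sz, int cond)
{
  assert (cnt > 0);  /* A zero-length VLA would be undefined.  */

  struct
    {
      int a;
      char b;
    } bin[cnt];

  char *ch = __builtin_calloc (cnt, sz);

  size_t ret = OBJECT_SIZE (cond ? ch : (void *) &bin, 0);
  __builtin_free (ch);
  return ret;
}
```

The value returned depends on the compiler and the optimization level (the
builtins may report an unknown size as (size_t) -1), so the sketch only
shows the leak-free shape and does not assert any particular size.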

> --- a/gcc/testsuite/gcc.dg/builtin-object-size-5.c
> +++ b/gcc/testsuite/gcc.dg/builtin-object-size-5.c
> @@ -1,5 +1,7 @@
>  /* { dg-do compile { target i?86-*-linux* i?86-*-gnu* x86_64-*-linux* } } */
>  /* { dg-options "-O2" } */
> +/* For dynamic object sizes we 'succeed' if the returned size is known for
> +   maximum object size.  */
>  
>  typedef __SIZE_TYPE__ size_t;
>  extern void abort (void);
> @@ -13,7 +15,11 @@ test1 (size_t x)
>  
>for (i = 0; i < x; ++i)
>  p = p + 4;
> +#ifdef __builtin_object_size
> +  if (__builtin_object_size (p, 0) == -1)
> +#else
>if (__builtin_object_size (p, 0) != sizeof (buf) - 8)
> +#endif
>  abort ();
>  }
>  
> @@ -25,10 +31,15 @@ test2 (size_t x)
>  
>for (i = 0; i < x; ++i)
>  p = p + 4;
> +#ifdef __builtin_object_size
> +  if (__builtin_object_size (p, 1) == -1)
> +#else
>if (__builtin_object_size (p, 1) != sizeof (buf) - 8)
> +#endif
>  abort ();
>  }

I'd say for __bdos it would be better to rewrite the testcase
as dg-do run, perhaps with a somewhat smaller buffer (say 16 times smaller),
and use dg-additional-sources for a file that actually defines that buffer
and main.  Perhaps you can have those
#ifdef __builtin_object_size
  if (__builtin_object_size (p, 0) != sizeof (buf) - 8 - 4 * x)
#else
in there, and just in the wrapper that defines __builtin_object_size
make it dg-do run and have dg-additional-sources (and
#ifndef N
#define N 0x4000
#endif
and use that as the size of buf).

> + gcc_checking_assert (is_gimple_variable (ret)

This should be TREE_CODE (ret) == SSA_NAME.
The reason why is_gimple_variable accepts VAR_DECLs/PARM_DECLs/RESULT_DECLs
is high vs. low gimple, but size_type_node sizes are gimple types and
both objsz passes are run when in ssa form, so it should always be either
an SSA_NAME or INTEGER_CST.

> +  || TREE_CODE (ret) == INTEGER_CST);
> +}
> +
> +  return ret;
>  }
>  
>  /* Set size for VARNO corresponding to OSI to VAL.  */
> @@ -176,27 +218,113 @@ object_sizes_initialize (struct object_size_info *osi, unsigned varno,
>object_sizes[object_size_type][varno].wholesize = wholeval;
>  }
>  
> +/* Return a MODIFY_EXPR for cases where SSA and EXPR have the same type.  The
> +   TREE_VEC is returned only in case of PHI nodes.  */
> +
> +static tree
> +bundle_sizes (tree ssa, tree expr)
> +{
> +  gcc_checking_assert (TREE_TYPE (ssa) == sizetype);
> +
> +  if (!TREE_TYPE (expr))
> +{
> +  gcc_checking_assert (TREE_CODE (expr) == TREE_VEC);

I think I'd prefer to do it the other way: condition on
TREE_CODE (expr) == TREE_VEC and, if needed, assert it has NULL TREE_TYPE.

> +  TREE_VEC_ELT (expr, TREE_VEC_LENGTH (expr) - 1) = ssa;
> +  return expr;
> +}
> +
> +  gcc_checking_assert (types_compatible_p (TREE_TYPE (expr), sizetype));
> +  return size_binop (MODIFY_EXPR, ssa, expr);

This looks wrong.  MODIFY_EXPR isn't a binary expression
(tcc_binary/tcc_comparison), size_binop shouldn't be called on it.
I think you even don't want to fold it, so
  return build2 (MODIFY_EXPR, sizetype, ssa, expr);
?
Also, calling a parameter or variable ssa is quite unusual; normally
one calls an SSA_NAME either name or ssa_name, etc.

> +   gcc_checking_assert (size_initval_p (oldval, object_size_type));
> +   gcc_checking_assert (size_initval_p (old_wholeval,
> +object_size_type));
> +   /* For dynamic object sizes, all object sizes that are not gimple
> +  variables will need to be gimplified.  */
> +   if (TREE_CODE (wholeval) != INTEGER_CST
> +   && !is_gimple_variable (wholeval))
> + {
> +   bitmap_set_bit (osi->reexamine, varno);
> +   wholeval = bundle_sizes (make_ssa_name (sizetype), wholeval);
> + }
> +   if (TREE_CODE (val) != INTEGER_CST && !is_gimple_variable (val))

Again twice above.

> +/* Set tempor

Re: [PATCH v4 1/6] tree-object-size: Use trees and support negative offsets

2021-12-15 Thread Siddhesh Poyarekar

On 12/15/21 20:51, Jakub Jelinek wrote:
> Shouldn't this also tree_int_cst_compare (old_wholeval, wholeval) ?

AFAICT, there is no situation where wholeval changes but val doesn't, so
I believe the val check should be sufficient.  Do you think otherwise?


Siddhesh


Re: [Patch]Enable -Wuninitialized + -ftrivial-auto-var-init for address taken variables

2021-12-15 Thread Qing Zhao via Gcc-patches


> On Dec 14, 2021, at 4:06 PM, Martin Sebor  wrote:
> 
> 
 
 Dynamically creating the string seems quite cumbersome here, and
 it leaks the allocated block.  I wonder if it might be better to
 remove the gmsgid argument from the function and assign it to
 one of the literals based on the other arguments.
 
 Since only one of var and var_name is used, I also wonder if
 the %qs form could be used for both to simplify the overall
 logic.  (I.e., get the IDENTIFIER_POINTER string from var and
 use it instead of %qD).
>> It looks like using “%qs” with the IDENTIFIER_POINTER string from var did
>> not work very well for the following test case:
>>   1 /* PR tree-optimization/45083 */
>>   2 /* { dg-do compile } */
>>   3 /* { dg-options "-O2 -Wuninitialized" } */
>>   4
>>   5 struct S { char *a; unsigned b; unsigned c; };
>>   6 extern int foo (const char *);
>>   7 extern void bar (int, int);
>>   8
>>   9 static void
>>  10 baz (void)
>>  11 {
>>  12   struct S cs[1];   /* { dg-message "was declared here" } */
>>  13   switch (cs->b)/* { dg-warning "cs\[^\n\r\]*\\.b\[^\n\r\]*is used uninitialized" } */
>>  14 {
>>  15 case 101:
>>  16   if (foo (cs->a))  /* { dg-warning "cs\[^\n\r\]*\\.a\[^\n\r\]*may be used uninitialized" } */
>>  17 bar (cs->c, cs->b); /* { dg-warning "cs\[^\n\r\]*\\.c\[^\n\r\]*may be used uninitialized" } */
>>  18 }
>>  19 }
>>  20
>>  21 void
>>  22 test (void)
>>  23 {
>>  24   baz ();
>>  25 }
>> For the uninitialized uses at lines 13, 16 and 17, the IDENTIFIER_POINTER
>> strings of var are:
>> cs$0$b, cs$0$a, cs$0$c
>> However, with %qD, they are printed as cs[0].b, cs[0].a, cs[0].c,
>> but with %qs, they are printed as cs$0$b, cs$0$a, cs$0$c.
>> It looks like %qD does not simply print the IDENTIFIER_POINTER string
>> directly; it handles some cases specially.
>> I tried to find out how %qD handles these strings specially, but haven't
>> figured it out so far.
>> Do you know where %qD handles this case specially?
> 
> In the front end's pretty printer where it handles %D (e.g.,
> for C in c_tree_printer in c/c-objc-common.c).  For VARs with
> DECL_HAS_DEBUG_EXPR_P (temp) the code uses DECL_DEBUG_EXPR().
> 
> There's also print_generic_expr_to_str(tree) that formats a decl
> or an expression to a dynamically allocated string (the string
> needs to be freed).

Thanks a lot.
This resolved the issue.

Qing
> 
> Martin
> 
>> Thanks.
>> Qing
>>> Both are good suggestions, I will try to update the code based on this.
>>> 
>>> Thanks again.
>>> 
>>> Qing



Re: [PATCH 1/2] Sync with binutils: GCC: Pass --plugin to AR and RANLIB

2021-12-15 Thread Jeff Law via Gcc-patches




On 11/13/2021 9:33 AM, H.J. Lu via Gcc-patches wrote:

Sync with binutils for building binutils with LTO:

 From 50ad1254d5030d0804cbf89c758359ae202e8d55 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Sat, 9 Jan 2021 06:43:11 -0800
Subject: [PATCH] GCC: Pass --plugin to AR and RANLIB

Detect GCC LTO plugin.  Pass --plugin to AR and RANLIB to support LTO
build.

* Makefile.tpl (AR): Add @AR_PLUGIN_OPTION@
(RANLIB): Add @RANLIB_PLUGIN_OPTION@.
* configure.ac: Include config/gcc-plugin.m4.
AC_SUBST AR_PLUGIN_OPTION and RANLIB_PLUGIN_OPTION.
* libtool.m4 (_LT_CMD_OLD_ARCHIVE): Pass --plugin to AR and
RANLIB if possible.
* Makefile.in: Regenerated.
* configure: Likewise.

config/

* gcc-plugin.m4 (GCC_PLUGIN_OPTION): New.

libiberty/

* Makefile.in (AR): Add @AR_PLUGIN_OPTION@
(RANLIB): Add @RANLIB_PLUGIN_OPTION@.
(configure_deps): Depend on ../config/gcc-plugin.m4.
* configure.ac: AC_SUBST AR_PLUGIN_OPTION and
RANLIB_PLUGIN_OPTION.
* aclocal.m4: Regenerated.
* configure: Likewise.

zlib/

* configure: Regenerated.

OK.  Thanks for your patience.

Jeff



[PATCH] c++: ahead-of-time overload set pruning for non-dep calls

2021-12-15 Thread Patrick Palka via Gcc-patches
This patch makes us remember the function selected by overload
resolution during ahead of time processing of a non-dependent call
expression, so that we avoid repeating most of the work of overload
resolution at instantiation time.  This mirrors what we already do for
non-dependent operator expressions via build_min_non_dep_op_overload.

Some caveats:

 * When processing ahead of time a non-dependent call to a member
   function template inside a class template (as in
   g++.dg/template/deduce4.C), we end up generating an "inverted" partial
   instantiation such as S::foo(), the kinds of which we're
   apparently not prepared to fully instantiate (e.g. tsubst_baselink
   mishandles it).  So this patch disables this optimization for such
   functions and adds a FIXME.

 * When trying to make the instantiation machinery handle these partial
   instantiations, I made a couple of changes in register_specialization
   and tsubst_function_decl that get us closer to handling such partial
   instantiations and that seem like improvements on their own, so this
   patch includes these changes.

  * This change triggered a latent FUNCTION_DECL pretty printing issue
in cpp0x/error2.C -- since we now resolve the call to foo<0> ahead
of time, the error now looks like:

  error: expansion pattern ‘foo()()=0’ contains no parameter pack

where the FUNCTION_DECL foo is clearly misprinted.  But this
pretty-printing issue could be reproduced without this patch if
we replace foo with an ordinary function.  Since this testcase was
added to verify pretty printing of TEMPLATE_ID_EXPR, I work around
this test failure by making the call to foo type-dependent and thus
immune to this ahead of time pruning.

  * We now reject parts of cpp0x/fntmp-equiv1.C because we notice that
the call d(f, b) in

  template  e d();

isn't constexpr because the (resolved) d isn't.  I tried fixing this
by making d constexpr, but then the call to d from main becomes
ambiguous.  So I settled with removing this part of the testcase.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?  Also tested on cmcstl2 and range-v3.

gcc/cp/ChangeLog:

* call.c (build_new_method_call): For a non-dependent call
expression inside a template, returning a templated tree
whose overload set contains just the selected function.
* pt.c (register_specialization): Check only the innermost
template args for dependence in the early exit test.
(tsubst_function_decl): Simplify obtaining the template arguments
for a partial instantiation.
* semantics.c (finish_call_expr): As with build_new_method_call.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/error2.C: Make the call to foo type-dependent in
order to avoid latent pretty-printing issue for FUNCTION_DECL
inside MODOP_EXPR.
* g++.dg/cpp0x/fntmp-equiv1.C: Remove ill-formed parts of
testcase that we now diagnose.
* g++.dg/template/non-dependent16.C: New test.
* g++.dg/template/non-dependent16a.C: New test.
---
 gcc/cp/call.c | 17 +
 gcc/cp/pt.c   | 18 ++---
 gcc/cp/semantics.c| 15 
 gcc/testsuite/g++.dg/cpp0x/error2.C   |  4 +-
 gcc/testsuite/g++.dg/cpp0x/fntmp-equiv1.C |  4 --
 .../g++.dg/template/non-dependent16.C | 37 +++
 .../g++.dg/template/non-dependent16a.C| 36 ++
 7 files changed, 111 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/non-dependent16.C
 create mode 100644 gcc/testsuite/g++.dg/template/non-dependent16a.C

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 53a391cbc6b..92d96c19f5c 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -11165,6 +11165,23 @@ build_new_method_call (tree instance, tree fns, vec **args,
}
   if (INDIRECT_REF_P (call))
call = TREE_OPERAND (call, 0);
+
+  /* Prune all but the selected function from the original overload
+set so that we can avoid some duplicate work at instantiation time.  */
+  if (really_overloaded_fn (fns))
+   {
+ if (DECL_TEMPLATE_INFO (fn)
+ && DECL_MEMBER_TEMPLATE_P (DECL_TI_TEMPLATE (fn))
+ && dependent_type_p (DECL_CONTEXT (fn)))
+   /* FIXME: We're not prepared to fully instantiate "inverted"
+  partial instantiations such as A::f().  */;
+ else
+   {
+ orig_fns = copy_node (orig_fns);
+ BASELINK_FUNCTIONS (orig_fns) = fn;
+   }
+   }
+
   call = (build_min_non_dep_call_vec
  (call,
   build_min (COMPONENT_REF, TREE_TYPE (CALL_EXPR_FN (call)),
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 2340139b238..b114114e617 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -1566,18 +1566,10 @@ register_specializati

Re: [PATCH v4 3/6] tree-object-size: Support dynamic sizes in conditions

2021-12-15 Thread Siddhesh Poyarekar

On 12/15/21 21:54, Jakub Jelinek wrote:

On Wed, Dec 01, 2021 at 07:57:54PM +0530, Siddhesh Poyarekar wrote:

--- /dev/null
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
@@ -0,0 +1,72 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+typedef __SIZE_TYPE__ size_t;
+#define abort __builtin_abort
+
+size_t
+__attribute__ ((noinline))
+test_builtin_malloc_condphi (int cond)
+{
+  void *ret;
+
+  if (cond)
+ret = __builtin_malloc (32);
+  else
+ret = __builtin_malloc (64);
+
+  return __builtin_dynamic_object_size (ret, 0);
+}
+
+size_t
+__attribute__ ((noinline))
+test_builtin_calloc_condphi (size_t cnt, size_t sz, int cond)
+{
+  struct
+{
+  int a;
+  char b;
+} bin[cnt];
+
+  char *ch = __builtin_calloc (cnt, sz);
+
+  return __builtin_dynamic_object_size (cond ? ch : (void *) &bin, 0);


I think it would be nice if the testcases didn't leak memory.  Can
you replace return ... with size_t ret = ...
and add
   __builtin_free (ch);
   return ret;
in both cases (in the first, perhaps rename ret to ch first)?



OK, I'll fix up all patches for this.
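
A minimal sketch of what the leak-free variant of the first test function could look like, following the suggestion above.  The OBJSZ wrapper macro is an assumption added here so the sketch also compiles with compilers that lack __builtin_dynamic_object_size; it is not part of the patch.

```c
#include <stddef.h>

/* Assumption: fall back to the static builtin when the compiler does
   not provide __builtin_dynamic_object_size.  */
#if defined (__has_builtin)
# if __has_builtin (__builtin_dynamic_object_size)
#  define OBJSZ(p, type) __builtin_dynamic_object_size (p, type)
# endif
#endif
#ifndef OBJSZ
# define OBJSZ(p, type) __builtin_object_size (p, type)
#endif

size_t
__attribute__ ((noinline))
test_builtin_malloc_condphi (int cond)
{
  void *ch;

  if (cond)
    ch = __builtin_malloc (32);
  else
    ch = __builtin_malloc (64);

  /* Capture the size before releasing the allocation, then free it so
     the testcase no longer leaks.  */
  size_t ret = OBJSZ (ch, 0);
  __builtin_free (ch);
  return ret;
}
```

Depending on the compiler and optimization level, the returned value may be the exact size, the maximum of the two sizes, or (size_t) -1 for "unknown".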


--- a/gcc/testsuite/gcc.dg/builtin-object-size-5.c
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-5.c
@@ -1,5 +1,7 @@
  /* { dg-do compile { target i?86-*-linux* i?86-*-gnu* x86_64-*-linux* } } */
  /* { dg-options "-O2" } */
+/* For dynamic object sizes we 'succeed' if the returned size is known for
+   maximum object size.  */
  
  typedef __SIZE_TYPE__ size_t;

  extern void abort (void);
@@ -13,7 +15,11 @@ test1 (size_t x)
  
for (i = 0; i < x; ++i)

  p = p + 4;
+#ifdef __builtin_object_size
+  if (__builtin_object_size (p, 0) == -1)
+#else
if (__builtin_object_size (p, 0) != sizeof (buf) - 8)
+#endif
  abort ();
  }
  
@@ -25,10 +31,15 @@ test2 (size_t x)
  
for (i = 0; i < x; ++i)

  p = p + 4;
+#ifdef __builtin_object_size
+  if (__builtin_object_size (p, 1) == -1)
+#else
if (__builtin_object_size (p, 1) != sizeof (buf) - 8)
+#endif
  abort ();
  }


I'd say for __bdos it would be better to rewrite the testcase
as dg-do run, perhaps use a somewhat smaller buffer (say 16 times smaller),
and dg-additional-sources for a file that actually defines that buffer
and main.  Perhaps you can have those
#ifdef __builtin_object_size
   if (__builtin_object_size (p, 0) != sizeof (buf) - 8 - 4 * x)
#else
in there, just in the wrapper that #defines __builtin_object_size
make it dg-do run and have dg-additional-sources (and
#ifndef N
#define N 0x4000
#endif
and use that as size of buf).


Got it, I'll do that.
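
The arithmetic behind the suggested run-time check can be sketched as follows.  This is only an illustration: the function names are hypothetical, and buf is defined here as a stand-in for the buffer that the additional source file would provide.

```c
#include <stddef.h>

#ifndef N
#define N 0x4000
#endif

/* Hypothetical stand-in for the buffer that the additional source file
   would define.  */
char buf[N];

/* Bytes left in buf after starting at offset 8 and stepping 4 bytes
   x times; this is the value the dynamic check compares against.  */
size_t
expected_remaining (size_t x)
{
  return sizeof (buf) - 8 - 4 * x;
}

/* Walk the pointer the same way test1 does and compute the remaining
   bytes by hand rather than through the builtin.  The caller must keep
   8 + 4 * x within the buffer.  */
size_t
walk_and_measure (size_t x)
{
  char *p = &buf[8];
  size_t i;

  for (i = 0; i < x; ++i)
    p = p + 4;
  return (size_t) (buf + sizeof (buf) - p);
}
```

Both functions agree for any x that keeps the pointer inside buf, which is the property a dg-do run version of the test could check at run time.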


+   gcc_checking_assert (is_gimple_variable (ret)


This should be TREE_CODE (ret) == SSA_NAME
The reason why is_gimple_variable accepts VAR_DECLs/PARM_DECLs/RESULT_DECLs
is high vs. low gimple, but size_type_node sizes are gimple types and
both objsz passes are run when in ssa form, so it should always be either
a SSA_NAME or INTEGER_CST.


OK.




+|| TREE_CODE (ret) == INTEGER_CST);
+}
+
+  return ret;
  }
  
  /* Set size for VARNO corresponding to OSI to VAL.  */

@@ -176,27 +218,113 @@ object_sizes_initialize (struct object_size_info *osi, unsigned varno,
object_sizes[object_size_type][varno].wholesize = wholeval;
  }
  
+/* Return a MODIFY_EXPR for cases where SSA and EXPR have the same type.  The

+   TREE_VEC is returned only in case of PHI nodes.  */
+
+static tree
+bundle_sizes (tree ssa, tree expr)
+{
+  gcc_checking_assert (TREE_TYPE (ssa) == sizetype);
+
+  if (!TREE_TYPE (expr))
+{
+  gcc_checking_assert (TREE_CODE (expr) == TREE_VEC);


I think I'd prefer to do it the other way, condition on
TREE_CODE (expr) == TREE_VEC and if needed assert it has NULL TREE_TYPE.


OK.




+  TREE_VEC_ELT (expr, TREE_VEC_LENGTH (expr) - 1) = ssa;
+  return expr;
+}
+
+  gcc_checking_assert (types_compatible_p (TREE_TYPE (expr), sizetype));
+  return size_binop (MODIFY_EXPR, ssa, expr);


This looks wrong.  MODIFY_EXPR isn't a binary expression
(tcc_binary/tcc_comparison), size_binop shouldn't be called on it.
I think you even don't want to fold it, so
   return build2 (MODIFY_EXPR, sizetype, ssa, expr);
?


Got it, I'll fix that.


Also, calling a parameter or var ssa is quite unusual, normally
one calls a SSA_NAME either name, or ssa_name etc.


OK.


+ gcc_checking_assert (size_initval_p (oldval, object_size_type));
+ gcc_checking_assert (size_initval_p (old_wholeval,
+  object_size_type));
+ /* For dynamic object sizes, all object sizes that are not gimple
+variables will need to be gimplified.  */
+ if (TREE_CODE (wholeval) != INTEGER_CST
+ && !is_gimple_variable (wholeval))
+   {
+ bitmap_set_bit (osi->reexamine, varno);
+ wholeval = bundle_sizes (make_ssa_name (sizetype), wholeval);
+   }
+ if (TREE_CODE (val) != INTEGER_CST && !is_gimple_variable (val))


Again twice above.


OK.


+/* 

Re: [PATCH] rs6000: Refactor altivec_build_resolved_builtin

2021-12-15 Thread Segher Boessenkool
Hi!

On Wed, Dec 15, 2021 at 08:21:24AM -0600, Bill Schmidt wrote:
> While replacing the built-in machinery, we agreed to defer some necessary
> refactoring of the overload processing.  This patch cleans it up considerably.
> 
> I've put in one FIXME for an additional level of cleanup that should be done
> independently.  The various helper functions (resolve_VEC_*) can be simplified
> if we move the argument processing in altivec_resolve_overloaded_builtin
> earlier.  But this requires making nontrivial changes to those functions that
> will need careful review.  Let's do that in a later patch.

All these names should be lower case.  If you promise to do that in the
aforementioned cleanup (and it happens before GCC 12), then okay for
trunk.  Thanks!

Very minor stuff below.

>   * config/rs6000/rs6000-c.c (resolution): New enum.
>   (resolve_VEC_MUL): New function.
>   (resolve_VEC_CMPNE): Likewise.
>   (resolve_VEC_ADDE_SUBE): Likewise.
>   (resolve_VEC_ADDEC_SUBEC): Likewise.
>   (resolve_VEC_SPLATS): Likewise.
>   (resolve_VEC_EXTRACT): Likewise.
>   (resolve_VEC_INSERT): Likewise.
>   (resolve_VEC_STEP): Likewise.
>   (find_instance): Likewise.
>   (altivec_resolve_overloaded_builtin): Many cleanups:  Call factored-out

No two spaces after colon, no capital after colon.  You probably want
a full stop though?

>   functions.  Move variable declarations closer to uses.  Add commentary.
>   Remove unnecessary levels of braces.  Avoid use of gotos.  Change
>   misleading variable names.  Use switches over if-else-if chains.

> +   /* Note:  vec_nand also works but opt changes vec_nand's
> +  to vec_nor's anyway.  */

Maybe there should be a vec_not?  There is one at the RTL level (called
one_cmpl2).

> + decl= rs6000_builtin_decls[RS6000_OVLD_VEC_NOR];

Missing space btw.

> -   support Altivec's overloaded builtins.  FIXME: This code needs
> -   to be brutally factored.  */

Yay :-)

>/* Return immediately if this isn't an overload.  */
> +  rs6000_gen_builtins fcode
> += (rs6000_gen_builtins) DECL_MD_FUNCTION_CODE (fndecl);
> +
>if (fcode <= RS6000_OVLD_NONE)
>  return NULL_TREE;

The new code should be before this comment?  I don't see how the comment
makes much sense like this.

> +  /* Some overloads require special handling.  */
> +  /* FIXME: Could we simplify the helper functions if we gathered arguments
> + and types into arrays first?  */

A lot of the argument checking can be handled more generically.  If
there are many exceptions to that it will not be useful, but a bit of
it would make sense.

If you do that (and maybe similar things as well) this array question
will swivel.

> -  if (TREE_CODE (arg0_type) != VECTOR_TYPE)
> - goto bad;
> -  if (!lang_hooks.types_compatible_p (arg0_type, arg1_type))
> - goto bad;

Yay, all gotos should go :-)

Thanks again,


Segher


Re: [PATCH] rs6000: __builtin_darn[_raw] should be in [power9-64] (PR103624)

2021-12-15 Thread Segher Boessenkool
On Wed, Dec 15, 2021 at 08:00:02AM -0600, Bill Schmidt wrote:
> > No, all builtins should work in either mode, and always return long.
> > If the patterns are broken, the *patterns* should be fixed :-)
> 
> OK, thanks!  This is much clearer now.
> 
> I've opened an internal issue about the deficiencies of the darn patterns and
> their associated built-ins.  In response to PR103624, I would like to start
> with the existing patch to ensure the new support mirrors what we had before,
> so we have that as a baseline.  We can then move on to fixing the larger
> set of problems.  Is that a reasonable plan?

It is much more work than doing it correct in the first place.

I'll do the RTL side, if you want?


Segher


Re: [PATCH v4 1/6] tree-object-size: Use trees and support negative offsets

2021-12-15 Thread Jakub Jelinek via Gcc-patches
On Wed, Dec 15, 2021 at 10:42:29PM +0530, Siddhesh Poyarekar wrote:
> On 12/15/21 20:51, Jakub Jelinek wrote:
> > Shouldn't this also tree_int_cst_compare (old_wholeval, wholeval) ?
> > 
> 
> AFAICT, there is no situation where wholeval changes but val doesn't, so I
> believe the val check should be sufficient.  Do you think otherwise?

Dunno, just something that caught my eye.

Jakub



Re: [PATCH v4 3/6] tree-object-size: Support dynamic sizes in conditions

2021-12-15 Thread Jakub Jelinek via Gcc-patches
On Wed, Dec 15, 2021 at 11:26:48PM +0530, Siddhesh Poyarekar wrote:
> > This makes me a little bit worried.  Do you compute the wholesize SSA_NAME
> > at runtime always, or only when it is really needed and known not to always
> > be equal to the size?
> > I mean, e.g. for the cases where there is just const char *p = malloc (size);
> > and the pointer is never increased size == wholesize.  For __bos it will
> > just be 2 different INTEGER_CSTs, but if it would at runtime mean we compute
> > something twice and hope we eventually find out during later passes
> > it is the same, it would be bad.
> 
> I'm emitting both size and wholesize all the time; wholesize only really
> gets used in size_for_offset and otherwise should get DCE'd.  Right now for
> __bos (and constant sizes) wholesize is unused if it is the same as size.
> 
> For GIMPLE_CALL, GIMPLE_NOP, etc. I return the same tree for size and
> wholesize; maybe a trivial pointer comparison (sz != wholesize) ought to get
> rid of most of the uses in size_for_offset.

Perhaps DCE can handle the case where you compute something (wholesize) that
isn't really needed, and VN/CSE the case where size and wholesize are equal.
I think it would be worth looking at a few testcases.
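
As an illustration of the kind of testcase meant here, the following sketch has one function where the pointer is never advanced (so size and wholesize coincide) and one where an offset is applied.  The function names and the OBJSZ fallback macro are assumptions made for this sketch, not part of the patch series.

```c
#include <stddef.h>

/* Assumption: fall back to the static builtin when the compiler does
   not provide __builtin_dynamic_object_size.  */
#if defined (__has_builtin)
# if __has_builtin (__builtin_dynamic_object_size)
#  define OBJSZ(p, type) __builtin_dynamic_object_size (p, type)
# endif
#endif
#ifndef OBJSZ
# define OBJSZ(p, type) __builtin_object_size (p, type)
#endif

size_t
__attribute__ ((noinline))
size_of_fresh_alloc (size_t size)
{
  char *p = __builtin_malloc (size);
  /* p is never advanced, so size and wholesize coincide here.  */
  size_t ret = OBJSZ (p, 0);
  __builtin_free (p);
  return ret;
}

size_t
__attribute__ ((noinline))
size_after_offset (size_t size, size_t off)
{
  char *p = __builtin_malloc (size);
  /* p + off makes the remaining size differ from the whole size.  */
  size_t ret = OBJSZ (p + off, 0);
  __builtin_free (p);
  return ret;
}
```

Compiling such a file with -O2 -fdump-tree-optimized and reading the dump would be one way to see whether a redundant wholesize computation survives.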

> > > + {
> > > +   edge e = gimple_phi_arg_edge (obj_phi, i);
> > > +
> > > +   /* Put the size definition before the last statement of the source
> > > +  block of the PHI edge.  This ensures that any branches at the end
> > > +  of the source block remain the last statement.  We are OK even if
> > > +  the last statement is the definition of the object since it will
> > > +  succeed any definitions that contribute to its size and the size
> > > +  expression will succeed them too.  */
> > > +   gimple_stmt_iterator gsi = gsi_last_bb (e->src);
> > > +   gsi_insert_seq_before (&gsi, seq, GSI_CONTINUE_LINKING);
> > 
> > This looks problematic.  The last stmt in the bb might not exist at all,
> 
> Wouldn't the bb minimally have to contain the definition of the object whose
> size we computed?  e.g. for PHI [a(2), b(3)], wouldn't bb 2 at least have a
> statement with the definition of a?

It can e.g. contain just a PHI.

> Or wait, there could be situations where the definition is in a different
> block, e.g. bb 1, which has a single edge going on to bb 2?

> I suppose __bos-like behaviour could be a good compromise, i.e. insert a
> MAX_EXPR (or MIN_EXPR) if we can't find a suitable location to insert on
> edge.

MAX_EXPR or MIN_EXPR?  I'd have expected the __bos constant in there.
But I must say I'm right now unsure what kind of PHIs one can have on bbs
reachable from both ab/eh edges and normal edges if we can have such bbs at
all.  I guess looking at some sigjmp/longjmp or non-local or computed goto
testcases might show something, perhaps I'll have a look tomorrow.
I'm sure we can have vop PHI.

Jakub



[committed] d: Merge upstream dmd 93108bb9e, druntime 6364e010, phobos 575b67a9b.

2021-12-15 Thread Iain Buclaw via Gcc-patches
Hi,

This patch merges the D front-end implementation with upstream dmd
93108bb9e, and the D run-time libraries with druntime 6364e010 and
phobos 575b67a9b.  The internal version of the language has been bumped
to v2.098.1-beta.1.

D front-end changes:

- Import dmd v2.098.1-beta.1.
- Default extern(C++) compatibility to C++17.

Druntime changes:

- Import druntime v2.098.1-beta.1.
- Fix definition of stat_t on MIPS64 (PR103604)

Phobos changes:

- Import phobos v2.098.1-beta.1.

Bootstrapped and regression tested on x86_64-linux-gnu/-m32/-mx32, and
committed to mainline.

Regards,
Iain.

---
gcc/d/ChangeLog:

* d-lang.cc (d_init_options): Set default -fextern-std= to C++17.
* dmd/MERGE: Merge upstream dmd 93108bb9e.
* gdc.texi (Runtime Options): Document the default for -fextern-std=.

libphobos/ChangeLog:

PR d/103604
* configure: Regenerate.
* configure.ac (libtool_VERSION): Update to 3:0:0.
* libdruntime/MERGE: Merge upstream druntime 6364e010.
* src/MERGE: Merge upstream phobos 575b67a9b.
* testsuite/libphobos.traits/all_satisfy.d: New test.
* testsuite/libphobos.traits/traits.exp: New test.
---
 gcc/d/d-lang.cc   |   4 +-
 gcc/d/dmd/MERGE   |   2 +-
 gcc/d/dmd/VERSION |   2 +-
 gcc/d/dmd/constfold.d |  20 +-
 gcc/d/dmd/cparse.d|  36 +-
 gcc/d/dmd/ctfeexpr.d  |  11 +-
 gcc/d/dmd/dinterpret.d|   2 +-
 gcc/d/dmd/dsymbol.d   |  55 ++-
 gcc/d/dmd/dsymbol.h   |   1 +
 gcc/d/dmd/dsymbolsem.d|   7 +-
 gcc/d/dmd/dtemplate.d |  14 +
 gcc/d/dmd/expression.d|   2 +-
 gcc/d/dmd/expressionsem.d |  79 ++--
 gcc/d/dmd/importc.d   |  93 -
 gcc/d/dmd/initsem.d   |  13 +-
 gcc/d/dmd/lexer.d |  18 +-
 gcc/d/dmd/opover.d|  18 +-
 gcc/d/dmd/optimize.d  |  55 +++
 gcc/d/dmd/parse.d |  21 +-
 gcc/d/dmd/printast.d  |  27 ++
 gcc/d/dmd/semantic3.d |  12 +
 gcc/d/dmd/statementsem.d  | 111 +++---
 gcc/d/dmd/target.d|   2 +-
 gcc/d/dmd/target.h|   1 +
 gcc/d/dmd/tokens.d|  72 +---
 gcc/d/dmd/tokens.h|  26 --
 gcc/d/dmd/typesem.d   |  33 +-
 gcc/d/gdc.texi|  11 +-
 gcc/testsuite/gdc.test/compilable/cppmangle.d | 371 ++
 .../gdc.test/compilable/cppmangle3.d  |   9 +-
 .../gdc.test/compilable/issue21203.d  | 210 ++
 .../gdc.test/compilable/issue21340.d  |  38 ++
 gcc/testsuite/gdc.test/compilable/test10028.d |   7 +
 gcc/testsuite/gdc.test/compilable/test20236.d |  22 ++
 gcc/testsuite/gdc.test/compilable/test20860.d |  16 +
 gcc/testsuite/gdc.test/compilable/test21073.d |  16 +
 gcc/testsuite/gdc.test/compilable/test21414.d |  13 +
 .../gdc.test/fail_compilation/b15875.d|   2 +-
 .../gdc.test/fail_compilation/fail116.d   |   2 +-
 .../gdc.test/fail_compilation/fail20616.d |  26 ++
 .../gdc.test/fail_compilation/fail22529.d |  14 +
 .../gdc.test/fail_compilation/fail22570.d |  21 +
 .../gdc.test/fail_compilation/ice22516.d  |  21 +
 .../gdc.test/fail_compilation/test22574.d |  12 +
 .../fail_compilation/test_switch_error.d  | 101 +
 gcc/testsuite/gdc.test/runnable/interpret.d   |  23 ++
 gcc/testsuite/gdc.test/runnable/test16579.d   |  57 +++
 gcc/testsuite/gdc.test/runnable/test18054.d   |  41 ++
 gcc/testsuite/gdc.test/runnable_cxx/cppa.d|  59 ++-
 .../runnable_cxx/extra-files/cppb.cpp |  33 --
 libphobos/configure   |   2 +-
 libphobos/configure.ac|   2 +-
 libphobos/libdruntime/MERGE   |   2 +-
 libphobos/libdruntime/core/internal/traits.d  |  40 +-
 libphobos/libdruntime/core/lifetime.d | 109 -
 libphobos/libdruntime/core/runtime.d  |   2 +-
 .../libdruntime/core/sys/openbsd/execinfo.d   | 139 +--
 .../libdruntime/core/sys/posix/sys/stat.d |  46 ++-
 libphobos/libdruntime/object.d|   2 +-
 libphobos/libdruntime/rt/monitor_.d   |  36 +-
 libphobos/src/MERGE   |   2 +-
 libphobos/src/std/algorithm/searching.d   |  12 +-
 libphobos/src/std/datetime/timezone.d |   3 +-
 libphobos/src/std/parallelism.d   |   6 +-
 libphobos/src/std/regex/package.d |  16 +-
 libphobos/src/std/traits.d|   5 +
 .../testsuite/libphobos.traits/all_satisfy.d  

Re: [commited] jit: Support for global rvalue initialization and constructors

2021-12-15 Thread Petter Tomner via Gcc-patches
Oh yes, I accidentally dropped that in the merge, thank you.

I believe there is an implicit "global:" at the top of each version scope, so
it shouldn't matter other than looking a bit deviant.

Regards,
Petter

From: Antoni Boucher 
Sent: 15 December 2021 15:19
To: Petter Tomner; David Malcolm; j...@gcc.gnu.org; gcc-patches@gcc.gnu.org
Subject: Re: [commited] jit: Support for global rvalue initialization and constructors
    
Hi Petter.
I believe you have forgotten the line `global:` in the file
`gcc/jit/libgccjit.map`.
I'm not sure what this line does, but it is there for all other ABI.
David: What do you think?
Regards.

On Tuesday, 14 December 2021 at 17:22 +, Petter Tomner via Jit wrote:
> Hi!
> 
> I have pushed the patch for rvalue initialization and ctors for
> libgccjit, for ABI 19.
> 
> Please see attached patch.
> 
> Regards,
> Petter
>   



[PATCH] print-tree: dump DECL_LANG_FLAG_8

2021-12-15 Thread Patrick Palka via Gcc-patches
Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

gcc/ChangeLog:

* print-tree.c (print_node) : Dump
DECL_LANG_FLAG_8.
---
 gcc/print-tree.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/print-tree.c b/gcc/print-tree.c
index b5dc523fcb1..297492ad51c 100644
--- a/gcc/print-tree.c
+++ b/gcc/print-tree.c
@@ -484,6 +484,8 @@ print_node (FILE *file, const char *prefix, tree node, int indent,
fputs (" decl_6", file);
  if (DECL_LANG_FLAG_7 (node))
fputs (" decl_7", file);
+ if (DECL_LANG_FLAG_8 (node))
+   fputs (" decl_8", file);
 
  mode = DECL_MODE (node);
  fprintf (file, " %s", GET_MODE_NAME (mode));
-- 
2.34.1.182.ge773545c7f



Re: [PATCH] rs6000: __builtin_darn[_raw] should be in [power9-64] (PR103624)

2021-12-15 Thread Bill Schmidt via Gcc-patches


On 12/15/21 12:41 PM, Segher Boessenkool wrote:
> On Wed, Dec 15, 2021 at 08:00:02AM -0600, Bill Schmidt wrote:
>>> No, all builtins should work in either mode, and always return long.
>>> If the patterns are broken, the *patterns* should be fixed :-)
>> OK, thanks!  This is much clearer now.
>>
>> I've opened an internal issue about the deficiencies of the darn patterns and
>> their associated built-ins.  In response to PR103624, I would like to start
>> with the existing patch to ensure the new support mirrors what we had before,
>> so we have that as a baseline.  We can then move on to fixing the larger
>> set of problems.  Is that a reasonable plan?
> It is much more work than doing it correct in the first place.
>
> I'll do the RTL side, if you want?

Sure, go ahead.

Bill

>
>
> Segher


Re: [PATCH] c++: Allow constexpr decltype(auto) [PR102229]

2021-12-15 Thread Marek Polacek via Gcc-patches
On Mon, Dec 13, 2021 at 10:02:24AM -0500, Jason Merrill wrote:
> On 12/10/21 17:29, Marek Polacek wrote:
> > My r11-2202 was trying to enforce [dcl.type.auto.deduct]/4, which says
> > "If the placeholder-type-specifier is of the form type-constraint[opt]
> > decltype(auto), T shall be the placeholder alone."  But this made us
> > reject 'constexpr decltype(auto)', which, after clarification from CWG,
> > should be valid.  [dcl.type.auto.deduct]/4 is supposed to be a syntactic
> > constraint, not semantic, so it's OK that the constexpr marks the object
> > as const.
> > 
> > As a consequence, checking TYPE_QUALS in do_auto_deduction is too late,
> > and we have a FIXME there anyway.  So in this patch I'm attempting to
> > detect 'const decltype(auto)' earlier.  If I'm going to use TYPE_QUALS,
> > it needs to happen before we mark the object as const due to constexpr,
> > that is, before grokdeclarator's
> > 
> >/* A `constexpr' specifier used in an object declaration declares
> >   the object as `const'.  */
> >if (constexpr_p && innermost_code != cdk_function)
> >  ...
> > 
> > Constrained decltype(auto) was a little problem, hence the TYPENAME
> > check.  But in a typename context you can't use decltype(auto) anyway,
> > I think.
> 
> I wonder about checking even earlier, like in cp_parser_decl_specifier_seq?

That _almost_ works except it wouldn't detect things like 'decltype(auto)*'
because the '*' isn't parsed in cp_parser_decl_specifier_seq, only in
declarator.  So the

 if (a != type)
   {
 error_at (loc, "%qT as type rather than plain "
   "%", type);

check wouldn't work.  Maybe I could just check if the next token is * or &
and give an error then.

Marek



Re: [PATCH v3 7/7] ifcvt: Run second pass if it is possible to omit a temporary.

2021-12-15 Thread Jeff Law via Gcc-patches




On 12/10/2021 8:06 AM, Robin Dapp wrote:

Hi Jeff,


I'd generally prefer to refactor the bits between the restart label and
the goto restart into a function and call it twice, passing in the
additional bits to allow for better costing.  Can you look into that?
If it's going to be major surgery, then the goto approach will be OK.

I transplanted the loop into a separate function
"noce_convert_multiple_sets_1" (for the lack of a better name right
now).  I guess an argument could be made about also moving

+  rtx cc_cmp = cond_exec_get_condition (jump);
+  rtx rev_cc_cmp = cond_exec_get_condition (jump, /* get_reversed */ true);

into the function and not care about traversing all instructions
twice/four times (will not be more than a few anyway) but I did not do
that for now.

Does this look better? Not fully tested yet everywhere but a test suite
run on s390 looked good.
I think it looks better.  One might argue that a structure rather than
a half-dozen named arguments, or a class, would be even better, but I
think that can wait for a full class-ification of that file.


You probably should move the prototype for noce_convert_multiple_sets_1
into the .c file.  It's static, so no need to expose it in the .h file
AFAICT.  OK with that change.
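
A generic sketch of the two points discussed in this thread: calling a factored-out helper twice instead of jumping back to a restart label, and keeping the prototype of a static helper in the .c file.  All names and the cost model are hypothetical; this is not the ifcvt.c code.

```c
/* Hypothetical names throughout; this is not the ifcvt.c code.  */

/* Prototype kept in the .c file: the helper is static, so nothing
   outside this translation unit needs to see it.  */
static int convert_sets_1 (const int *costs, int n, int use_cheaper_path);

/* Run the helper once, and a second time with extra information when
   the first attempt is over budget, instead of jumping back to a
   restart label.  Returns the total cost of the chosen attempt.  */
int
convert_sets (const int *costs, int n, int budget)
{
  int total = convert_sets_1 (costs, n, 0);

  if (total > budget)
    total = convert_sets_1 (costs, n, 1);
  return total;
}

static int
convert_sets_1 (const int *costs, int n, int use_cheaper_path)
{
  int total = 0;
  int i;

  for (i = 0; i < n; i++)
    {
      /* In the second pass each nonzero step is assumed to save one
	 unit, standing in for "a temporary could be omitted".  */
      if (use_cheaper_path && costs[i] > 0)
	total += costs[i] - 1;
      else
	total += costs[i];
    }
  return total;
}
```

Because the helper has internal linkage, a declaration in a header would only add noise for other translation units that cannot call it anyway.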


jeff



Re: [PATCH 2/2] Sync with binutils: Support the PGO build for binutils+gdb

2021-12-15 Thread Jeff Law via Gcc-patches




On 11/13/2021 9:33 AM, H.J. Lu via Gcc-patches wrote:

Sync with binutils for building binutils with LTO:

 From af019bfde9b13d628202fe58054ec7ff08d92a0f Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Sat, 9 Jan 2021 06:51:15 -0800
Subject: [PATCH] Support the PGO build for binutils+gdb

Add the --enable-pgo-build[=lto] configure option.  When binutils+gdb
is not built together with GCC, --enable-pgo-build enables the PGO build:

1. First build with -fprofile-generate.
2. Use "make maybe-check-*" to generate profiling data and pass -i to make
to ignore errors when generating profiling data.
3. Use "make clean" to remove the previous build.
4. Rebuild with -fprofile-use.

With --enable-pgo-build=lto, -flto=jobserver -ffat-lto-objects are used
together with -fprofile-generate and -fprofile-use.  Add '+' to the command
line for recursive make to support -flto=jobserver -ffat-lto-objects.

NB: --enable-pgo-build=lto enables the PGO build with LTO while
--enable-lto enables LTO support in toolchain.

PR binutils/26766
* Makefile.tpl (BUILD_CFLAGS): New.
(CFLAGS): Append $(BUILD_CFLAGS).
(CXXFLAGS): Likewise.
(PGO_BUILD_GEN_FLAGS_TO_PASS): New.
(PGO_BUILD_TRAINING_CFLAGS): Likewise.
(PGO_BUILD_TRAINING_CXXFLAGS): Likewise.
(PGO_BUILD_TRAINING_FLAGS_TO_PASS): Likewise.
(PGO_BUILD_TRAINING_MFLAGS): Likewise.
(PGO_BUILD_USE_FLAGS_TO_PASS): Likewise.
(PGO-TRAINING-TARGETS): Likewise.
(PGO_BUILD_TRAINING): Likewise.
(all): Add '+' to the command line for recursive make.  Support
the PGO build.
* configure.ac: Add --enable-pgo-build[=lto].
AC_SUBST PGO_BUILD_GEN_CFLAGS, PGO_BUILD_USE_CFLAGS and
PGO_BUILD_LTO_CFLAGS.  Enable the PGO build in Makefile.
* Makefile.in: Regenerated.
* configure: Likewise.

My understanding is this is just syncing us with binutils.  So OK.

jeff



Re: [PATCH v7] rtl: builtins: (not just) rs6000: Add builtins for fegetround, feclearexcept and feraiseexcept [PR94193]

2021-12-15 Thread Jeff Law via Gcc-patches




On 11/24/2021 4:48 PM, Raoni Fassina Firmino via Gcc-patches wrote:

Changes since v6[6] and v5[5]:
   - Based this version on the v5 one.
   - Reworked all builtins back to the way they are in v5 and added the
     following changes:
     + Added a test for the target libc, only expanding with glibc as the
       target libc.
     + Updated the header comments of all three expanders to reflect the
       added behavior (fegetround got a full header as it had none).
     + Added extra documentation for the builtins in doc/extend.texi,
       similar to the v6 version, but only the introductory paragraph,
       without a dedicated entry for each, since now their behavior and
       signatures match the C99 ones.
   - Changed the description for the return operand in the RTL template
     of the fegetround expander.  Using "(set )", the same way as
     rs6000_mffsl expander (this change was taken from v6).
   - Updated the commit message mentioning the target libc restriction
     and updated changelog.

Tested on top of master (9bf69a8558638ce0cdd69e83a68776deb9b8e053)
on the following platforms with no regressions:
   - powerpc64le-linux-gnu (Power 9)
   - powerpc64le-linux-gnu (Power 8)
   - powerpc64-linux-gnu (Power 9, with 32 and 64 bits tests)

Also made a visual test comparing the generated assembly of a test
program built against glibc and musl (with -mmusl and with musl-gcc).

Documentation changes tested on x86_64-redhat-linux.

Well, turns out v6 was kind of a misstep[7].  But turns out the
solution was in my face the whole time and Joseph was kind enough to
spell it out to me.  I should have known, one can check for the target
libc at runtime.  It is a really simple addition to each expander: only
expand for the libcs whose FE_* values the expander knows and can handle.
As Joseph mentioned in his review, with that the expanders don't have
to always expand and everything is fine.
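
For reference, a minimal sketch of the C99 behavior the builtins are meant
to match, written against the standard <cfenv> functions rather than the
builtins themselves.  The helper names below are made up for illustration,
and the code assumes a target that defines FE_DIVBYZERO and FE_TONEAREST:

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical helper: raise FE_DIVBYZERO, observe it, then clear it again.
// Returns true when each step behaves the way C99 describes.
bool fe_flags_round_trip() {
  std::feclearexcept(FE_ALL_EXCEPT);
  const bool clean_start = std::fetestexcept(FE_DIVBYZERO) == 0;

  std::feraiseexcept(FE_DIVBYZERO);
  const bool raised = std::fetestexcept(FE_DIVBYZERO) != 0;

  std::feclearexcept(FE_DIVBYZERO);
  const bool cleared = std::fetestexcept(FE_DIVBYZERO) == 0;

  return clean_start && raised && cleared;
}

// Hypothetical helper: report the current rounding mode as an FE_* value.
int current_rounding_mode() { return std::fegetround(); }

// Small self-check tying the two helpers together.
void fenv_self_check() {
  assert(fe_flags_round_trip());
  assert(current_rounding_mode() == FE_TONEAREST);
}
```

An expander that inlines these calls has to hard-code the FE_* values, which
is why it only makes sense for a libc whose values the compiler knows.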

As I mentioned[8], musl and uclibc both use the same values as glibc,
so I could enable the expanders for them as well, but I'm not sure about it.

I don't know if I should add something to the documentation, more
precisely to section "6.59 Other Built-in Functions Provided by GCC"
in doc/extend.texi.  I did so, as mentioned in v6, but I don't know if
I'm doing it right, especially changing such front-facing documentation,
but here it is.

I'm repeating the "changelog" from past versions here for convenience:

Changes since v5[5]:
   - Reworked all builtins to accept the FE_* macros as parameters and
 so be agnostic to libc implementations.  Largely based on
 fpclassify.  To that end, some additional files changed:
 + Change the argument list for the builtins declarations in
   builtins.def
 + Added new types in builtin-types.def to use in the builtins
   declarations.
 + Added extra documentation for the builtins on doc/extend.texi,
   similar to fpclassify.
   - Updated doc/md.texi documentation with the new optab behaviors.
   - Updated comments to the expanders and expand handlers to try to
 explain what is going on.
   - Changed the description for the return operand in the RTL template
 of the fegetround expander.  Using "(set )", the same way as
 rs6000_mffsl expander.
   - Updated testcases with helper macros with the new argument list.

Changes since v4[4]:
   - Fixed more spelling and code style.
   - Add more clarification on comments for feraiseexcept and
 feclearexcept expands;

Changes since v3[3]:
   - Fixed fegetround bug on powerpc64 (big endian) that Segher
 spotted;

Changes since v2[2]:
   - Added documentation for the new optabs;
   - Remove use of non portable __builtin_clz;
   - Changed feclearexcept and feraiseexcept to accept all 4 valid
 flags at the same time and added more test for that case;
   - Extended feclearexcept and feraiseexcept testcases to match
 accepting multiple flags;
   - Fixed builtin-feclearexcept-feraiseexcept-2.c testcase comparison
 after feclearexcept tests;
   - Updated commit message to reflect change in feclearexcept and
 feraiseexcept from the glibc counterpart;
   - Fixed English spelling and typos;
   - Fixed code-style;
   - Changed subject line tag to make clear it is not just rs6000 code.

Changes since v1[1]:
   - Fixed English spelling;
   - Fixed code-style;
   - Changed match operand predicate in feclearexcept and feraiseexcept;
   - Changed testcase options;
   - Minor changes in test code to be C90 compatible;
   - Other minor changes suggested by Segher;
   - Changed subject line tag (not sure if I tagged correctly or should
 include optabs: also)

[1] https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552024.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553297.html
[3] https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557109.html
[4] https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557349.html
[5] https://gcc.gnu.org/pipermail/gcc-patches/2020-November/557984.html
[6] https://gc

[PATCH] c++: nested lambda capturing a capture proxy, part 2 [PR94376]

2021-12-15 Thread Patrick Palka via Gcc-patches
The r12-5403 fix apparently doesn't handle the case where the inner
lambda explicitly rather than implicitly captures the capture proxy from
the outer lambda, and so we still reject the first example in the
testcase below.

The reason is that compared to an implicit capture, the effective
initializer for an explicit capture is wrapped in a location wrapper
(pointing to the source location of the explicit capture), and this
wrapper foils the is_capture_proxy check.  The simplest fix appears to
be to strip location wrappers accordingly.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/94376

gcc/cp/ChangeLog:

* lambda.c (lambda_capture_field_type): Strip location wrappers
before checking for a capture proxy.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/lambda/lambda-nested9a.C: New test.
---
 gcc/cp/lambda.c   |  1 +
 .../g++.dg/cpp0x/lambda/lambda-nested9a.C | 42 +++
 2 files changed, 43 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/lambda/lambda-nested9a.C

diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c
index c39a2bca416..7f2f927bda2 100644
--- a/gcc/cp/lambda.c
+++ b/gcc/cp/lambda.c
@@ -221,6 +221,7 @@ lambda_capture_field_type (tree expr, bool explicit_init_p,
 }
   else
 {
+  STRIP_ANY_LOCATION_WRAPPER (expr);
   if (!by_reference_p && is_capture_proxy (expr))
{
  /* When capturing by-value another capture proxy from an enclosing
diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-nested9a.C 
b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-nested9a.C
new file mode 100644
index 000..d62f8f0c952
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-nested9a.C
@@ -0,0 +1,42 @@
+// PR c++/94376
+// Like lambda-nested9.C but using explicit captures in the inner lambda.
+// { dg-do compile { target c++11 } }
+
+int main() {
+  // We used to incorrectly reject the first two cases.
+  int i = 0;
+  [=] () {
+[i] () mutable {
+  ++i;
+};
+  };
+
+#if __cpp_init_captures
+  [j=0] () {
+[j] () mutable {
+  ++j;
+};
+  };
+#endif
+
+  [=] () {
+[&i] () mutable {
+  ++i; // { dg-error "read-only" }
+};
+  };
+
+  const int j = 0;
+  [=] () {
+[j] () mutable {
+  ++j; // { dg-error "read-only" }
+};
+  };
+
+#if __cpp_init_captures
+  [j=0] () {
+[&j] () mutable {
+  ++j; // { dg-error "read-only" "" { target c++14 } }
+};
+  };
+#endif
+}
-- 
2.34.1.182.ge773545c7f



Re: [PATCH] c++: two-stage name lookup for overloaded operators [PR51577]

2021-12-15 Thread Jason Merrill via Gcc-patches

On 12/10/21 09:53, Patrick Palka wrote:

In order to properly implement two-stage name lookup for dependent
operator expressions, we need to remember the result of unqualified
lookup of the operator at template definition time, and reuse that
result rather than performing another unqualified lookup at
instantiation time.
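
To illustrate the rule being implemented, here is a small sketch (all
names are hypothetical and not part of the patch).  Unqualified lookup of
operator+ in twice happens at the template definition, where nothing is
found; at instantiation only argument-dependent lookup may add
candidates, which is how the later-declared lib::operator+ is still found:

```cpp
#include <cassert>

namespace lib {
struct Meters { int value; };
}

// operator+ is a dependent name here because T is not known yet.
// Unqualified lookup at this point sees no user-declared operator+.
template <typename T>
T twice(T x) { return x + x; }

namespace lib {
// Declared after the template, but in the namespace associated with
// Meters, so ADL finds it at the point of instantiation.
Meters operator+(Meters a, Meters b) { return Meters{a.value + b.value}; }
}

int twice_meters(int v) { return twice(lib::Meters{v}).value; }

void two_stage_self_check() { assert(twice_meters(2) == 4); }
```

An operator+ declared after the template in a namespace not associated
with the argument type should not be found this way; as I understand it,
that is the situation the PR is about.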

Ideally we could just store the lookup result in the expression directly,
but as pointed out in r9-6405 this isn't really possible since we use
the standard tree codes to represent most dependent operator expressions.

We could perhaps create a new tree code to represent dependent operator
expressions, say a DEPENDENT_OPERATOR_EXPR with enough operands to store
the lookup results along with everything else, but that'd require a lot
of careful work to make sure we handle this new tree code properly
across the frontend.

However, currently type-dependent operator (and call) expressions are
given an empty TREE_TYPE, so this space is effectively unused except to
signal that the expression is type-dependent.  It'd be convenient if we
could use this space to store the lookup results while preserving the
dependent-ness of the expression.

To that end, this patch creates a new kind of type, called
DEPENDENT_OPERATOR_TYPE, which we give to dependent operator expressions
and into which we can store the result of operator lookup at template
definition time (DEPENDENT_OPERATOR_TYPE_SAVED_LOOKUPS).  Since this
type is always dependent, and since the frontend doesn't seem to care
much about the particular type of a type-dependent expression, using
this type in place of a NULL_TREE type seems to just work; only
dependent_type_p and WILDCARD_TYPE_P need to be adjusted to return true
for this new type.

The rest of the patch mostly consists of adding the necessary plumbing
to pass DEPENDENT_OPERATOR_TYPE_SAVED_LOOKUPS to add_operator_candidates,
adjusting all callers of build_x_binary_op & friends appropriately, and
removing the now unnecessary push_operator_bindings mechanism.

In passing, this patch simplifies finish_constraint_binary_op to avoid
using build_x_binary_op for building a binary constraint-expr; we don't
need to consider operator||/&& overloads here.  This patch also makes
FOLD_EXPR_OP yield a tree_code instead of a raw INTEGER_CST.

Finally, this patch adds the XFAILed test operator-8.C which is about
broken two-stage name lookup for rewritten non-dependent operator
expressions, an existing bug that's otherwise only documented in
build_new_op.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/51577
PR c++/83035
PR c++/100465

gcc/cp/ChangeLog:

* call.c (add_operator_candidates): Add lookups parameter.
Use it to avoid performing a second unqualified lookup when
instantiating a dependent operator expression.
(build_new_op): Add lookups parameter and pass it appropriately.
* constraint.cc (finish_constraint_binary_op): Use
build_min_nt_loc instead of build_x_binary_op.
* coroutines.cc (build_co_await): Adjust call to build_new_op.
* cp-objcp-common.c (cp_common_init_ts): Mark
DEPENDENT_OPERATOR_TYPE appropriately.
* cp-tree.def (DEPENDENT_OPERATOR_TYPE): Define.
* cp-tree.h (WILDCARD_TYPE_P): Accept DEPENDENT_OPERATOR_TYPE.
(FOLD_EXPR_OP_RAW): New, renamed from ...
(FOLD_EXPR_OP): ... this.  Change this to return the tree_code directly.
(DEPENDENT_OPERATOR_TYPE_SAVED_LOOKUPS): Define.
(DEPENDENT_OPERATOR_SAVED_LOOKUPS): Define.
(build_new_op): Add lookups parameter.
(build_dependent_operator_type): Declare.
(build_x_indirect_ref): Add lookups parameter.
(build_x_binary_op): Likewise.
(build_x_unary_op): Likewise.
(build_x_compound_expr): Likewise.
(build_x_modify_expr): Likewise.
* cxx-pretty-print.c (get_fold_operator): Adjust after
FOLD_EXPR_OP change.
* decl.c (start_preparsed_function): Don't call
push_operator_bindings.
* decl2.c (grok_array_decl): Adjust calls to build_new_op.
* method.c (do_one_comp): Likewise.
(build_comparison_op): Likewise.
* module.cc (trees_out::type_node): Handle DEPENDENT_OPERATOR_TYPE.
(trees_in::tree_node): Likewise.
* name-lookup.c (lookup_name): Revert r11-2876 change.
(op_unqualified_lookup): Remove.
(maybe_save_operator_binding): Remove.
(discard_operator_bindings): Remove.
(push_operator_bindings): Remove.
* name-lookup.h (maybe_save_operator_binding): Remove.
(push_operator_bindings): Remove.
(discard_operator_bindings): Remove.
* parser.c (cp_parser_unary_expression): Adjust calls to build_x_*.
(cp_parser_binary_expression): Likewise.
(cp_parser_assignment_expression): Likewise.
(cp_parser_expression): Likewise.
(do_range_for_auto_deduction): Li

Re: [PATCH] c++: nested lambda capturing a capture proxy, part 2 [PR94376]

2021-12-15 Thread Jason Merrill via Gcc-patches

On 12/15/21 15:36, Patrick Palka wrote:

The r12-5403 fix apparently doesn't handle the case where the inner
lambda explicitly rather than implicitly captures the capture proxy from
the outer lambda, and so we still reject the first example in the
testcase below.

The reason is that compared to an implicit capture, the effective
initializer for an explicit capture is wrapped in a location wrapper
(pointing to the source location of the explicit capture), and this
wrapper foils the is_capture_proxy check.  The simplest fix appears to
be to strip location wrappers accordingly.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/94376

gcc/cp/ChangeLog:

* lambda.c (lambda_capture_field_type): Strip location wrappers
before checking for a capture proxy.


I think either is_capture_proxy should strip location wrappers or 
gcc_checking_assert that it doesn't see one.



gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/lambda/lambda-nested9a.C: New test.
---
  gcc/cp/lambda.c   |  1 +
  .../g++.dg/cpp0x/lambda/lambda-nested9a.C | 42 +++
  2 files changed, 43 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/lambda/lambda-nested9a.C

diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c
index c39a2bca416..7f2f927bda2 100644
--- a/gcc/cp/lambda.c
+++ b/gcc/cp/lambda.c
@@ -221,6 +221,7 @@ lambda_capture_field_type (tree expr, bool explicit_init_p,
  }
else
  {
+  STRIP_ANY_LOCATION_WRAPPER (expr);
if (!by_reference_p && is_capture_proxy (expr))
{
  /* When capturing by-value another capture proxy from an enclosing
diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-nested9a.C 
b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-nested9a.C
new file mode 100644
index 000..d62f8f0c952
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-nested9a.C
@@ -0,0 +1,42 @@
+// PR c++/94376
+// Like lambda-nested9.C but using explicit captures in the inner lambda.
+// { dg-do compile { target c++11 } }
+
+int main() {
+  // We used to incorrectly reject the first two cases.
+  int i = 0;
+  [=] () {
+[i] () mutable {
+  ++i;
+};
+  };
+
+#if __cpp_init_captures
+  [j=0] () {
+[j] () mutable {
+  ++j;
+};
+  };
+#endif
+
+  [=] () {
+[&i] () mutable {
+  ++i; // { dg-error "read-only" }
+};
+  };
+
+  const int j = 0;
+  [=] () {
+[j] () mutable {
+  ++j; // { dg-error "read-only" }
+};
+  };
+
+#if __cpp_init_captures
+  [j=0] () {
+[&j] () mutable {
+  ++j; // { dg-error "read-only" "" { target c++14 } }
+};
+  };
+#endif
+}




Re: [committed] libstdc++: Specialize std::pointer_traits<__normal_iterator>

2021-12-15 Thread François Dumont via Gcc-patches

Here is what I eventually would like to commit.

I was not able to remove the _Safe_iterator_base branch in ptr_traits.h. 
When adding the _Safe_iterator overload in C++20 and removing the branch 
the 20_util/to_address/debug.cc test started to fail because it was not 
calling my overload. I tried to declare the overload in ptr_traits.h 
directly so it is known at the time it is used in std::to_address but 
then it failed to match it with the implementation in safe_iterator.h. 
The declaration was not easy to do and I guess I had it wrong.


But it does not matter cause I think this version is the simplest one 
(as it does not change a lot of code).


    libstdc++: Overload std::__to_address for __gnu_cxx::__normal_iterator.

    Prefer to overload __to_address to partially specialize
    std::pointer_traits because std::pointer_traits would be mostly
    useless. Moreover partial specialization of
    pointer_traits<__normal_iterator> fails to rebind C, so you get
    incorrect types like __normal_iterator>. In the case of
    __gnu_debug::_Safe_iterator the to_pointer method is impossible to
    implement correctly because we are missing the parent container to
    associate the iterator to.

    libstdc++-v3/ChangeLog:

    * include/bits/stl_iterator.h
    (std::pointer_traits<__gnu_cxx::__normal_iterator<>>): Remove.
    (std::__to_address(const __gnu_cxx::__normal_iterator<>&)):
    New for C++11 to C++17.
    * include/debug/safe_iterator.h
    (std::__to_address(const
    __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<>, _Sequence>&)):
    New for C++11 to C++17.
    * testsuite/24_iterators/normal_iterator/to_address.cc: Add
    check on std::vector::iterator to validate both
    __gnu_cxx::__normal_iterator<> __to_address overload in normal
    mode and __gnu_debug::_Safe_iterator in _GLIBCXX_DEBUG mode.

Tested under Linux x86_64 normal and _GLIBCXX_DEBUG modes for 
C++11/C++14/C++17/C++20.


Ok to commit ?

François


On 14/12/21 2:12 pm, Jonathan Wakely wrote:

On Tue, 14 Dec 2021 at 06:53, François Dumont wrote:

Hi

 Any conclusion regarding this thread ?

François


On 06/10/21 7:25 pm, François Dumont wrote:
> I forgot to ask if with this patch this overload:
>
>   template<typename _Ptr, typename... _None>
>     constexpr auto
>     __to_address(const _Ptr& __ptr, _None...) noexcept
>     {
>       if constexpr (is_base_of_v<__gnu_debug::_Safe_iterator_base, _Ptr>)
>         return std::__to_address(__ptr.base().operator->());
>       else
>         return std::__to_address(__ptr.operator->());
>     }
>
> should be removed ?


No, definitely not.

That is the default overload for types that do not have a 
pointer_traits::to_address specialization. If you remove it, 
__to_address won't work for fancy pointers or any other pointer-like 
types. That would completely break it.
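
As a rough illustration of why that default overload matters, here is a
simplified sketch.  It is not the libstdc++ implementation: the names
to_address_sketch and FancyIntPtr are hypothetical, and it ignores
pointer_traits entirely.  The raw-pointer overload handles real pointers,
and the generic one covers anything pointer-like with an operator->():

```cpp
#include <cassert>

// Raw pointers are already addresses.
template <typename T>
constexpr T* to_address_sketch(T* p) noexcept { return p; }

// Default case for pointer-like types: go through operator->().
// This sketch only unwraps one level; types without operator->() are
// rejected by SFINAE on the return type.
template <typename Ptr>
auto to_address_sketch(const Ptr& p) noexcept
    -> decltype(to_address_sketch(p.operator->())) {
  return to_address_sketch(p.operator->());
}

// Hypothetical fancy pointer used only for this illustration.
struct FancyIntPtr {
  int* raw;
  int* operator->() const noexcept { return raw; }
};

bool fancy_pointer_round_trip() {
  int x = 5;
  FancyIntPtr fp{&x};
  return to_address_sketch(fp) == &x && to_address_sketch(&x) == &x;
}

void to_address_self_check() { assert(fancy_pointer_round_trip()); }
```

Without the generic overload, the call with FancyIntPtr would have no
viable candidate, which is the breakage described above.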


The purpose of C++20's std::to_address is to get a real pointer from a 
pointer-like type. Using it with iterators is not the primary use 
case, but it needs to work with contiguous iterators because those are 
pointer-like. I made it work correctly with __normal_iterator because 
that was necessary to support the uses of std::__to_address in  
and , but I removed those uses in:


https://gcc.gnu.org/g:247bac507e63b32d4dc23ef1c55f300aafea24c6 

https://gcc.gnu.org/g:b83b810ac440f72e7551b6496539e60ac30c0d8a 



So now we don't really need the C++17 version of std::__to_address to 
work with __normal_iterator at all.


I think it's OK to add the overload for __normal_iterator though, but 
only for C++11/14/17, because the default std::__to_address handles 
__normal_iterator correctly in C++20.



> Or perhaps just the _Safe_iterator_base branch in it ?


Yes, you can just remove that branch, because your new overload 
handles it.



>

> On 06/10/21 7:18 pm, François Dumont wrote:
>> Here is another proposal with the __to_address overload.
>>
>> I preferred to let it open to any kind of __normal_iterator
>> instantiation cause afaics std::vector supports fancy pointer types.
>> It is better if __to_address works fine also in this case, no ?


 If we intend to support that, then we should verify it in the 
testsuite, using __gnu_test::CustomPointerAlloc.



>>     libstdc++: Overload std::__to_address for
>> __gnu_cxx::__normal_iterator.
>>
>>     Prefer to overload __to_address to partially specialize
>> std::pointer_traits because
>>     std::pointer_traits would be mostly useless. In the case of
>> __gnu_debug::_Safe_iterator
>>     the to_pointer method is even impossible to implement correctly
>> because we are missing
>>     the parent container to associate the iterator to.


To record additional rationale in the gi

Re: [PATCH] c++: Allow constexpr decltype(auto) [PR102229]

2021-12-15 Thread Jason Merrill via Gcc-patches

On 12/15/21 15:20, Marek Polacek wrote:

On Mon, Dec 13, 2021 at 10:02:24AM -0500, Jason Merrill wrote:

On 12/10/21 17:29, Marek Polacek wrote:

My r11-2202 was trying to enforce [dcl.type.auto.deduct]/4, which says
"If the placeholder-type-specifier is of the form type-constraint[opt]
decltype(auto), T shall be the placeholder alone."  But this made us
reject 'constexpr decltype(auto)', which, after clarification from CWG,
should be valid.  [dcl.type.auto.deduct]/4 is supposed to be a syntactic
constraint, not semantic, so it's OK that the constexpr marks the object
as const.

As a consequence, checking TYPE_QUALS in do_auto_deduction is too late,
and we have a FIXME there anyway.  So in this patch I'm attempting to
detect 'const decltype(auto)' earlier.  If I'm going to use TYPE_QUALS,
it needs to happen before we mark the object as const due to constexpr,
that is, before grokdeclarator's

/* A `constexpr' specifier used in an object declaration declares
   the object as `const'.  */
if (constexpr_p && innermost_code != cdk_function)
  ...

Constrained decltype(auto) was a little problem, hence the TYPENAME
check.  But in a typename context you can't use decltype(auto) anyway,
I think.


I wonder about checking even earlier, like in cp_parser_decl_specifier_seq?


That _almost_ works except it wouldn't detect things like 'decltype(auto)*'
because the '*' isn't parsed in cp_parser_decl_specifier_seq, only in
declarator.  So the


Ah, right.


  if (a != type)
{
  error_at (loc, "%qT as type rather than plain "
"%", type);

check wouldn't work.  Maybe I could just check if the next token is * or &
and give an error then.


No, checking in grokdeclarator makes sense.


Constrained decltype(auto) was a little problem, hence the TYPENAME
check.  But in a typename context you can't use decltype(auto) anyway,
I think.


Maybe check PLACEHOLDER_TYPE_CONSTRAINTS in check_decltype_auto instead?

Jason



Re: [PATCH] print-tree: dump DECL_LANG_FLAG_8

2021-12-15 Thread Jason Merrill via Gcc-patches

On 12/15/21 15:11, Patrick Palka wrote:

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?


OK.


gcc/ChangeLog:

* print-tree.c (print_node) : Dump
DECL_LANG_FLAG_8.
---
  gcc/print-tree.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/gcc/print-tree.c b/gcc/print-tree.c
index b5dc523fcb1..297492ad51c 100644
--- a/gcc/print-tree.c
+++ b/gcc/print-tree.c
@@ -484,6 +484,8 @@ print_node (FILE *file, const char *prefix, tree node, int 
indent,
fputs (" decl_6", file);
  if (DECL_LANG_FLAG_7 (node))
fputs (" decl_7", file);
+ if (DECL_LANG_FLAG_8 (node))
+   fputs (" decl_8", file);
  
  	  mode = DECL_MODE (node);

  fprintf (file, " %s", GET_MODE_NAME (mode));




Re: [committed] libstdc++: Specialize std::pointer_traits<__normal_iterator>

2021-12-15 Thread Jonathan Wakely via Gcc-patches
On Wed, 15 Dec 2021 at 21:16, François Dumont  wrote:

> Here is what I eventually would like to commit.
>
> I was not able to remove the _Safe_iterator_base branch in ptr_traits.h.
> When adding the _Safe_iterator overload in C++20 and removing the branch
> the 20_util/to_address/debug.cc test started to fail because it was not
> calling my overload. I tried to declare the overload in ptr_traits.h
> directly so it is known at the time it is used in std::to_address but then
> it failed to match it with the implementation in safe_iterator.h. The
> declaration was not easy to do and I guess I had it wrong.
>
> But it does not matter cause I think this version is the simplest one (as
> it does not change a lot of code).
>
> libstdc++: Overload std::__to_address for __gnu_cxx::__normal_iterator.
>
> Prefer to overload __to_address to partially specialize
> std::pointer_traits because
> std::pointer_traits would be mostly useless. Moreover partial
> specialization of
> pointer_traits<__normal_iterator> fails to rebind C, so you get
> incorrect types
> like __normal_iterator>. In the case of
> __gnu_debug::_Safe_iterator
> the to_pointer method is impossible to implement correctly because we
> are missing
> the parent container to associate the iterator to.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/stl_iterator.h
> (std::pointer_traits<__gnu_cxx::__normal_iterator<>>): Remove.
> (std::__to_address(const __gnu_cxx::__normal_iterator<>&)):
> New for C++11 to C++17.
> * include/debug/safe_iterator.h
> (std::__to_address(const
> __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<>, _Sequence>&)):
> New for C++11 to C++17.
> * testsuite/24_iterators/normal_iterator/to_address.cc: Add
> check on std::vector::iterator
> to validate both __gnu_cxx::__normal_iterator<> __to_address
> overload in normal mode and
> __gnu_debug::_Safe_iterator in _GLIBCXX_DEBUG mode.
>
> Tested under Linux x86_64 normal and _GLIBCXX_DEBUG modes for
> C++11/C++14/C++17/C++20.
>
> Ok to commit ?
>

OK, thanks!



> François
>
>
> On 14/12/21 2:12 pm, Jonathan Wakely wrote:
>
> On Tue, 14 Dec 2021 at 06:53, François Dumont wrote:
>
>> Hi
>>
>>  Any conclusion regarding this thread ?
>>
>> François
>>
>>
>> On 06/10/21 7:25 pm, François Dumont wrote:
>> > I forgot to ask if with this patch this overload:
>> >
>> >   template
>> > constexpr auto
>> > __to_address(const _Ptr& __ptr, _None...) noexcept
>> > {
>> >   if constexpr (is_base_of_v<__gnu_debug::_Safe_iterator_base,
>> _Ptr>)
>> > return std::__to_address(__ptr.base().operator->());
>> >   else
>> > return std::__to_address(__ptr.operator->());
>> > }
>> >
>> > should be removed ?
>>
>>
> No, definitely not.
>
> That is the default overload for types that do not have a
> pointer_traits::to_address specialization. If you remove it, __to_address
> won't work for fancy pointers or any other pointer-like types. That would
> completely break it.
>
> The purpose of C++20's std::to_address is to get a real pointer from a
> pointer-like type. Using it with iterators is not the primary use case, but
> it needs to work with contiguous iterators because those are pointer-like.
> I made it work correctly with __normal_iterator because that was necessary
> to support the uses of std::__to_address in  and , but I
> removed those uses in:
>
> https://gcc.gnu.org/g:247bac507e63b32d4dc23ef1c55f300aafea24c6
> https://gcc.gnu.org/g:b83b810ac440f72e7551b6496539e60ac30c0d8a
>
> So now we don't really need the C++17 version of std::__to_address to work
> with __normal_iterator at all.
>
> I think it's OK to add the overload for __normal_iterator though, but only
> for C++11/14/17, because the default std::__to_address handles
> __normal_iterator correctly in C++20.
>
>
> > Or perhaps just the _Safe_iterator_base branch in it ?
>>
>
> Yes, you can just remove that branch, because your new overload handles it.
>
>
> >
>
>> > On 06/10/21 7:18 pm, François Dumont wrote:
>> >> Here is another proposal with the __to_address overload.
>> >>
>> >> I preferred to let it open to any kind of __normal_iterator
>> >> instantiation cause afaics std::vector supports fancy pointer types.
>> >> It is better if __to_address works fine also in this case, no ?
>>
>
>  If we intend to support that, then we should verify it in the testsuite,
> using __gnu_test::CustomPointerAlloc.
>
>
> >> libstdc++: Overload std::__to_address for
>> >> __gnu_cxx::__normal_iterator.
>> >>
>> >> Prefer to overload __to_address to partially specialize
>> >> std::pointer_traits because
>> >> std::pointer_traits would be mostly useless. In the case of
>> >> __gnu_debug::_Safe_iterator
>> >> the to_pointer method is even impossible to implement correctly
>> >> because we are missing
>> >> the parent container to associate the iterator to.
>>
>
> To 

Re: [PATCH] c++: Allow constexpr decltype(auto) [PR102229]

2021-12-15 Thread Marek Polacek via Gcc-patches
On Wed, Dec 15, 2021 at 04:17:37PM -0500, Jason Merrill wrote:
> On 12/15/21 15:20, Marek Polacek wrote:
> > On Mon, Dec 13, 2021 at 10:02:24AM -0500, Jason Merrill wrote:
> > > On 12/10/21 17:29, Marek Polacek wrote:
> > > > My r11-2202 was trying to enforce [dcl.type.auto.deduct]/4, which says
> > > > "If the placeholder-type-specifier is of the form type-constraint[opt]
> > > > decltype(auto), T shall be the placeholder alone."  But this made us
> > > > reject 'constexpr decltype(auto)', which, after clarification from CWG,
> > > > should be valid.  [dcl.type.auto.deduct]/4 is supposed to be a syntactic
> > > > constraint, not semantic, so it's OK that the constexpr marks the object
> > > > as const.
> > > > 
> > > > As a consequence, checking TYPE_QUALS in do_auto_deduction is too late,
> > > > and we have a FIXME there anyway.  So in this patch I'm attempting to
> > > > detect 'const decltype(auto)' earlier.  If I'm going to use TYPE_QUALS,
> > > > it needs to happen before we mark the object as const due to constexpr,
> > > > that is, before grokdeclarator's
> > > > 
> > > > /* A `constexpr' specifier used in an object declaration declares
> > > >the object as `const'.  */
> > > > if (constexpr_p && innermost_code != cdk_function)
> > > >   ...
> > > > 
> > > > Constrained decltype(auto) was a little problem, hence the TYPENAME
> > > > check.  But in a typename context you can't use decltype(auto) anyway,
> > > > I think.
> > > 
> > > I wonder about checking even earlier, like in 
> > > cp_parser_decl_specifier_seq?
> > 
> > That _almost_ works except it wouldn't detect things like 'decltype(auto)*'
> > because the '*' isn't parsed in cp_parser_decl_specifier_seq, only in
> > declarator.  So the
> 
> Ah, right.
> 
> >   if (a != type)
> > {
> >   error_at (loc, "%qT as type rather than plain "
> > "%", type);
> > 
> > check wouldn't work.  Maybe I could just check if the next token is * or &
> > and give an error then.
> 
> No, checking in grokdeclarator makes sense.
> 
> > Constrained decltype(auto) was a little problem, hence the TYPENAME
> > check.  But in a typename context you can't use decltype(auto) anyway,
> > I think.
> 
> Maybe check PLACEHOLDER_TYPE_CONSTRAINTS in check_decltype_auto instead?

I've tried that, but that is also true for

const constexpr C decltype(auto) x2 = 0;
const constexpr C decltype(auto) fn4() { return 0; }

where we do want to check if the auto has quals.  Therefore the not very
pretty TYPENAME check :/.

Marek



Re: [PATCH] c++: ahead-of-time overload set pruning for non-dep calls

2021-12-15 Thread Jason Merrill via Gcc-patches

On 12/15/21 12:49, Patrick Palka wrote:

This patch makes us remember the function selected by overload
resolution during ahead of time processing of a non-dependent call
expression, so that we avoid repeating most of the work of overload
resolution at instantiation time.  This mirrors what we already do for
non-dependent operator expressions via build_min_non_dep_op_overload.
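
For context, a minimal sketch of what "non-dependent call" means here
(the function names are hypothetical, not taken from the patch).  Nothing
in the call depends on T, so the callee can be chosen once, when the
template is defined, and a later declaration does not change the result:

```cpp
#include <cassert>

int pick(long) { return 1; }

template <typename T>
int call_nondependent() {
  // No argument depends on T, so overload resolution can be done right
  // here, when the template is defined.  Only pick(long) is visible.
  return pick(0);
}

// Declared after the template definition: not a candidate for the call above.
int pick(int) { return 2; }

void nondependent_self_check() { assert(call_nondependent<void>() == 1); }
```

Since the answer is already fixed at definition time, remembering the
selected function avoids redoing the same resolution for every
instantiation.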

Some caveats:

  * When processing ahead of time a non-dependent call to a member
function template inside a class template (as in
g++.dg/template/deduce4.C), we end up generating an "inverted" partial
instantiation such as S::foo(), the kinds of which we're
apparently not prepared to fully instantiate (e.g. tsubst_baselink
mishandles it).  So this patch disables this optimization for such
functions and adds a FIXME.


I wonder if it would be worthwhile to build a TEMPLATE_ID_EXPR to 
remember the deduced template args, even if we are failing to remember 
the actual function?



  * When trying to make the instantiation machinery handle these partial
instantiations, I made a couple of changes in register_specialization
and tsubst_function_decl that get us closer to handling such partial
instantiations and that seem like improvements on their own, so this
patch includes these changes.


The tsubst_function_decl change makes me nervous; surely there was some 
reason that function wasn't that way in the first place.  Let's hold 
these changes for stage 1 if they aren't actually fixing anything.



   * This change triggered a latent FUNCTION_DECL pretty printing issue
 in cpp0x/error2.C -- since we now resolve the call to foo<0> ahead
 of time, the error now looks like:

   error: expansion pattern ‘foo()()=0’ contains no parameter pack

 where the FUNCTION_DECL foo is clearly misprinted.  But this
 pretty-printing issue could be reproduced without this patch if
 we replace foo with an ordinary function.  Since this testcase was
 added to verify pretty printing of TEMPLATE_ID_EXPR, I work around
 this test failure by making the call to foo type-dependent and thus
 immune to this ahead of time pruning.

   * We now reject parts of cpp0x/fntmp-equiv1.C because we notice that
 the call d(f, b) in

   template  e d();

 isn't constexpr because the (resolved) d isn't.  I tried fixing this
 by making d constexpr, but then the call to d from main becomes
 ambiguous.  So I settled with removing this part of the testcase.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?  Also tested on cmcstl2 and range-v3.

gcc/cp/ChangeLog:

* call.c (build_new_method_call): For a non-dependent call
expression inside a template, returning a templated tree
whose overload set contains just the selected function.
* pt.c (register_specialization): Check only the innermost
template args for dependence in the early exit test.
(tsubst_function_decl): Simplify obtaining the template arguments
for a partial instantiation.
* semantics.c (finish_call_expr): As with build_new_method_call.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/error2.C: Make the call to foo type-dependent in
order to avoid latent pretty-printing issue for FUNCTION_DECL
inside MODOP_EXPR.
* g++.dg/cpp0x/fntmp-equiv1.C: Remove ill-formed parts of
testcase that we now diagnose.
* g++.dg/template/non-dependent16.C: New test.
* g++.dg/template/non-dependent16a.C: New test.
---
  gcc/cp/call.c | 17 +
  gcc/cp/pt.c   | 18 ++---
  gcc/cp/semantics.c| 15 
  gcc/testsuite/g++.dg/cpp0x/error2.C   |  4 +-
  gcc/testsuite/g++.dg/cpp0x/fntmp-equiv1.C |  4 --
  .../g++.dg/template/non-dependent16.C | 37 +++
  .../g++.dg/template/non-dependent16a.C| 36 ++
  7 files changed, 111 insertions(+), 20 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/template/non-dependent16.C
  create mode 100644 gcc/testsuite/g++.dg/template/non-dependent16a.C

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 53a391cbc6b..92d96c19f5c 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -11165,6 +11165,23 @@ build_new_method_call (tree instance, tree fns, vec<tree, va_gc> **args,
}
if (INDIRECT_REF_P (call))
call = TREE_OPERAND (call, 0);
+
+  /* Prune all but the selected function from the original overload
+set so that we can avoid some duplicate work at instantiation time.  */
+  if (really_overloaded_fn (fns))
+   {
+ if (DECL_TEMPLATE_INFO (fn)
+ && DECL_MEMBER_TEMPLATE_P (DECL_TI_TEMPLATE (fn))
+ && dependent_type_p (DECL_CONTEXT (fn)))
+   /* FIXME: We're not prepared to fully instantiate "inverted"
+  partial instantiations such as A:

Re: [PATCH] rs6000: Refactor altivec_build_resolved_builtin

2021-12-15 Thread Bill Schmidt via Gcc-patches
Hi!

On 12/15/21 12:16 PM, Segher Boessenkool wrote:
>> +  /* Note:  vec_nand also works but opt changes vec_nand's
>> + to vec_nor's anyway.  */
> Maybe there should be a vec_not?  There is one at the RTL level (called
> one_cmpl2).

As I recall, we have an issue open for this already... but nobody's grabbed it 
yet.

Thanks for the review!

(I'll change all those VEC_* things to lower-case.)

Bill



Re: [PATCH] c++: Allow constexpr decltype(auto) [PR102229]

2021-12-15 Thread Jason Merrill via Gcc-patches

On 12/15/21 16:28, Marek Polacek wrote:

On Wed, Dec 15, 2021 at 04:17:37PM -0500, Jason Merrill wrote:

On 12/15/21 15:20, Marek Polacek wrote:

On Mon, Dec 13, 2021 at 10:02:24AM -0500, Jason Merrill wrote:

On 12/10/21 17:29, Marek Polacek wrote:

My r11-2202 was trying to enforce [dcl.type.auto.deduct]/4, which says
"If the placeholder-type-specifier is of the form type-constraint[opt]
decltype(auto), T shall be the placeholder alone."  But this made us
reject 'constexpr decltype(auto)', which, after clarification from CWG,
should be valid.  [dcl.type.auto.deduct]/4 is supposed to be a syntactic
constraint, not semantic, so it's OK that the constexpr marks the object
as const.
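
A small sketch of the distinction described above, for illustration only. The variable names are hypothetical, and the preprocessor guard encodes an assumption (that GCC releases before 12 still reject the constexpr form); it is not something stated in this thread.

```cpp
#include <type_traits>

int global = 42;

// decltype(auto) as the placeholder alone: deduction follows decltype rules.
decltype(auto) by_value = global;    // deduces int
decltype(auto) by_ref = (global);    // deduces int&

static_assert(std::is_same_v<decltype(by_value), int>, "plain name gives int");
static_assert(std::is_same_v<decltype(by_ref), int&>, "parenthesized gives int&");

// Ill-formed: here 'const' is written as part of T, so T is not the
// placeholder alone.
// const decltype(auto) bad = 0;

// Assumption: GCC versions before 12 reject the next declaration.
#if defined(__clang__) || !defined(__GNUC__) || __GNUC__ >= 12
// Valid per the CWG clarification: constexpr is not part of T, it only
// makes the declared object const afterwards.
constexpr decltype(auto) answer = 0;
static_assert(std::is_same_v<decltype(answer), const int>, "object is const");
#endif
```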

As a consequence, checking TYPE_QUALS in do_auto_deduction is too late,
and we have a FIXME there anyway.  So in this patch I'm attempting to
detect 'const decltype(auto)' earlier.  If I'm going to use TYPE_QUALS,
it needs to happen before we mark the object as const due to constexpr,
that is, before grokdeclarator's

 /* A `constexpr' specifier used in an object declaration declares
the object as `const'.  */
 if (constexpr_p && innermost_code != cdk_function)
   ...

Constrained decltype(auto) was a little problem, hence the TYPENAME
check.  But in a typename context you can't use decltype(auto) anyway,
I think.


I wonder about checking even earlier, like in cp_parser_decl_specifier_seq?


That _almost_ works except it wouldn't detect things like 'decltype(auto)*'
because the '*' isn't parsed in cp_parser_decl_specifier_seq, only in
declarator.  So the


Ah, right.


   if (a != type)
 {
   error_at (loc, "%qT as type rather than plain "
 "%<decltype(auto)%>", type);

check wouldn't work.  Maybe I could just check if the next token is * or &
and give an error then.


No, checking in grokdeclarator makes sense.


Constrained decltype(auto) was a little problem, hence the TYPENAME
check.  But in a typename context you can't use decltype(auto) anyway,
I think.


Maybe check PLACEHOLDER_TYPE_CONSTRAINTS in check_decltype_auto instead?


I've tried that, but that is also true for

const constexpr C decltype(auto) x2 = 0;
const constexpr C decltype(auto) fn4() { return 0; }

where we do want to check if the auto has quals.  Therefore the not very
pretty TYPENAME check :/.


Aha.  The patch is OK.

Jason



Re: [PATCH] c++: processing_template_decl vs template depth [PR103408]

2021-12-15 Thread Jason Merrill via Gcc-patches

On 12/15/21 09:09, Jakub Jelinek wrote:

On Wed, Dec 15, 2021 at 08:58:45AM -0500, Patrick Palka wrote:

Oops, thanks Jakub, I didn't realize we don't run the testsuite with
-std=c++23 yet.

I guess it'd be too expensive to add another std to the testing matrix
at this point, but I wonder if the test harness should at least run the
testcases inside cpp23/ with -std=c++23?  Something like the following
seems to work.

(And since -std=c++11 also isn't part of the default testing matrix
anymore, perhaps we could give the testcases inside cpp0x/ a similar
treatment too?)


I think up to Jason, but I'd say if we do it, we should do it for all those
language version subdirectories and make sure we only add those extra modes
temporarily (for that subdir files only) and only if they aren't already
present in the list we cycle through (to avoid running it e.g. with
-std=c++23 twice).


Sounds good.

Note that there's also the 'check-c++-all' make target to run the full 
set of standards.



Subject: [PATCH] testsuite: run testcases in g++.dg/cpp23/ with -std=c++23

gcc/testsuite/ChangeLog:

* lib/g++-dg.exp (g++-dg-runtest): Add -std=c++23 to option_list
for testcases in cpp23/.
---
  gcc/testsuite/lib/g++-dg.exp | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/lib/g++-dg.exp b/gcc/testsuite/lib/g++-dg.exp
index fd06d278faa..79fe3db014e 100644
--- a/gcc/testsuite/lib/g++-dg.exp
+++ b/gcc/testsuite/lib/g++-dg.exp
@@ -38,6 +38,8 @@ proc g++-dg-runtest { testcases flags default-extra-flags } {
continue
}
  
+	set nshort [file tail [file dirname $test]]/[file tail $test]
+
# If the testcase specifies a standard, use that one.
# If not, run it under both standards, allowing GNU extensions
# if there's a dg-options line.
@@ -61,12 +63,13 @@ proc g++-dg-runtest { testcases flags default-extra-flags } {
} elseif { $x eq "impcx" } then { set x "23 -fimplicit-constexpr" }
lappend option_list "${std_prefix}$x"
}
+   if [string match "cpp23/*" $nshort] {
+   lappend option_list "${std_prefix}23"
+   }
} else {
set option_list { "" }
}
  
-	set nshort [file tail [file dirname $test]]/[file tail $test]
-
foreach flags_t $option_list {
verbose "Testing $nshort, $flags $flags_t" 1
dg-test $test "$flags $flags_t" ${default-extra-flags}
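
The selection logic discussed above (add the directory's standard only for tests under cpp23/, and only when it is not already in the list being cycled through) can be modelled roughly as follows. This is a C++ sketch of the idea, not a translation of g++-dg.exp: short_name and option_list_for are hypothetical helpers, and the "-std=c++23" spelling assumes a plain "-std=c++" prefix rather than whatever ${std_prefix} holds.

```cpp
#include <algorithm>
#include <filesystem>
#include <string>
#include <vector>

// Hypothetical helper mirroring
//   [file tail [file dirname $test]]/[file tail $test]
// i.e. "<last directory>/<file name>".
std::string short_name(const std::filesystem::path& test) {
    return test.parent_path().filename().string() + "/" +
           test.filename().string();
}

// Return the options to cycle through for one testcase.  'options' is the
// default list; the C++23 option is appended only for tests in cpp23/ and
// only if it is not already present, so no mode is run twice.
std::vector<std::string> option_list_for(const std::filesystem::path& test,
                                         std::vector<std::string> options) {
    const std::string std23 = "-std=c++23";  // assumed option spelling
    const bool in_cpp23 = short_name(test).rfind("cpp23/", 0) == 0;
    const bool present =
        std::find(options.begin(), options.end(), std23) != options.end();
    if (in_cpp23 && !present)
        options.push_back(std23);
    return options;
}
```

Because the list is taken by value, the extra mode applies to that one testcase only and the default list stays unchanged for the next test.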


Jakub




