Re: Fix PR tree-optimization/77808, ICE in duplicate_ssa_name_ptr_info, at tree-ssanames.c:630 starting with r240439
On 2 October 2016 at 23:05, Doug Gilmore wrote: > Hi Christophe, > >> From: Christophe Lyon [christophe.l...@linaro.org] >> Sent: Saturday, October 01, 2016 7:57 AM >> To: Doug Gilmore >> Cc: gcc-patches@gcc.gnu.org >> Subject: Re: Fix PR tree-optimization/77808, ICE in >> duplicate_ssa_name_ptr_info, at tree-ssanames.c:630 starting with r240439 >> >> Hi Doug, >> >> ... >> I can confirm that your patch fixes the ICE I was seeing. >> >> However, the new testcase does not pass on low end >> architectures: >> cc1: warning: -fprefetch-loop-arrays not supported for this target >> (try -march switches) >> >> Can you add a guard? >> >> Thanks, >> >> Christophe > I updated the test to only run on X86, MIPS and AARCH64. Is that OK? > I'm afraid not. The ICE occurred on some arm targets. By "low end" I meant armv5t for example, as opposed to armv7t. Is there a suitable effective target? Thanks, Christophe > Thanks, > > Doug
Re: [Patch 3/11] Implement TARGET_C_EXCESS_PRECISION for s390
On Fri, Sep 30, 2016 at 05:57:45PM +, Joseph Myers wrote: > On Fri, 30 Sep 2016, Jeff Law wrote: > > > On 09/30/2016 11:34 AM, Joseph Myers wrote: > > > On Fri, 30 Sep 2016, James Greenhalgh wrote: > > > > > > > + case EXCESS_PRECISION_TYPE_STANDARD: > > > > + case EXCESS_PRECISION_TYPE_IMPLICIT: > > > > + /* Otherwise, the excess precision we want when we are > > > > + in a standards compliant mode, and the implicit precision we > > > > + provide can be identical. */ > > > > + return FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE; > > > > > > That's wrong for EXCESS_PRECISION_TYPE_IMPLICIT. There is no implicit > > > promotion in the back end (and really there shouldn't be any excess > > > precision here at all, and double_t in glibc should be fixed along with a > > > GCC change to remove this mistake). > > Sorry, change to a NAK. > > > > Joseph, what's the right thing to do here? > > (a) The present patch would keep the existing value of FLT_EVAL_METHOD. > But the existing value is inaccurate for the default compilation mode, > when there is no implicit promotion in the back end, and doing so means > suboptimal code in libgcc and glibc because it does things to handle > excess precision that isn't actually there (and quite possibly in code > elsewhere that looks at FLT_EVAL_METHOD). > > (b) Handling EXCESS_PRECISION_TYPE_IMPLICIT like > EXCESS_PRECISION_TYPE_FAST would accurately describe what the back end > does. It would mean that the default FLT_EVAL_METHOD is 0, which is a > more accurate description of how the compiler actually behaves, and would > avoid the suboptimal code in libgcc and glibc. It would however mean that > unless -fexcess-precision=standard is used, FLT_EVAL_METHOD (accurate) is > out of synx with float_t in math.h (inaccurate). > > (c) Removing all special excess precision for S/390 from GCC, and changing > float_t to float in glibc, is logically correct and produces optimal code. 
> float_t does not appear in the ABI of any glibc function; in principle it > could affect the ABIs of other libraries, but I don't think that's > particularly likely. > > The only argument for (a) is that it's semantics-preserving - it's just > that the preserved semantics are nonsensical and involve an inaccurate > value of FLT_EVAL_METHOD in the default compilation mode. I'm happy progressing whichever of a) or b) would be preferred by the s390 maintainers. But I'd be uncomfortable making the wider changes in c) as I've got no access to an s390 build and test environment in which I have any confidence, nor do I know the s390 port history that led to the 'typedef double float_t' in glibc. Regardless of which approach is chosen, I'll be sure to update the patch with a comment paraphrasing your suggestions above. Thanks, James
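Editorial aside, not part of the patch: the mismatch discussed under option (b), where FLT_EVAL_METHOD and float_t disagree, can be checked from user code. The sketch below is only an illustration; the function name float_t_matches_eval_method is made up here, and comparing sizes is a rough proxy for comparing the actual types.

```cpp
#include <cfloat>   // FLT_EVAL_METHOD
#include <cmath>    // std::float_t

// Hypothetical helper: returns true when the size of float_t agrees with
// what the reported FLT_EVAL_METHOD implies (0: float, 1: double,
// 2: long double). Any other value is indeterminate or
// implementation-defined, so there is nothing to check.
inline bool float_t_matches_eval_method()
{
#if FLT_EVAL_METHOD == 0
  return sizeof(std::float_t) == sizeof(float);
#elif FLT_EVAL_METHOD == 1
  return sizeof(std::float_t) == sizeof(double);
#elif FLT_EVAL_METHOD == 2
  return sizeof(std::float_t) == sizeof(long double);
#else
  return true;
#endif
}
```

Under option (b) as described above, a default-mode build on s390 would presumably report FLT_EVAL_METHOD as 0 while glibc still defines float_t as double, so a check of this kind would return false there.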
Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)
On 08/18/2016 03:36 PM, Jakub Jelinek wrote: > On Thu, May 12, 2016 at 04:12:21PM +0200, Martin Liška wrote: >> --- a/gcc/asan.c >> +++ b/gcc/asan.c >> @@ -243,6 +243,11 @@ static unsigned HOST_WIDE_INT asan_shadow_offset_value; >> static bool asan_shadow_offset_computed; >> static vec sanitized_sections; >> >> +/* Set of variable declarations that are going to be guarded by >> + use-after-scope sanitizer. */ >> + >> +static hash_set asan_handled_variables (13); > > Doesn't this introduce yet another global ctor? If yes (and we > unfortunately have already way too many), I'd strongly prefer to avoid that, > use pointer to hash_set or something similar. Hello It does, I did pointer in second version of that patch. > >> +/* Depending on POISON flag, emit a call to poison (or unpoison) stack >> memory >> + allocated for local variables, localted in OFFSETS. LENGTH is number >> + of OFFSETS, BASE is the register holding the stack base, >> + against which OFFSETS array offsets are relative to. BASE_OFFSET >> represents >> + an offset requested by alignment and similar stuff. */ >> + >> +static void >> +asan_poison_stack_variables (rtx shadow_base, rtx base, >> + HOST_WIDE_INT base_offset, >> + HOST_WIDE_INT *offsets, int length, >> + tree *decls, bool poison) >> +{ >> + if (asan_sanitize_use_after_scope ()) >> +for (int l = length - 2; l > 0; l -= 2) >> + { > > I think this is unfortunate, it leads to: > movl$-235802127, 2147450880(%rax) > movl$-185335552, 2147450884(%rax) > movl$-202116109, 2147450888(%rax) > movb$-8, 2147450884(%rax) > movb$-8, 2147450885(%rax) > (e.g. on use-after-scope-1.c). > The asan_emit_stack_protection function already walks all the > entries in the offsets array in both of the > for (l = length; l; l -= 2) > loops, so please handle the initial poisoning and final unpoisoning there > as well. 
The goal is that for variables that you want poison-after-scope > at the start of the function (btw, I've noticed that current SVN LLVM > doesn't bother with it and thus doesn't track "use before scope" (before the > scope is entered for the first time, maybe we shouldn't either, that would > catch only compiler bugs rather than user code bugs, right?)) have 0xf8 > on all corresponding bytes including the one that would otherwise have 0x01 > through 0x07. When unpoisoning at the end of the function, again you should > combine that with unpoisoning of the red zone and partial zone bytes plus > the last 0x01 through 0x07, etc. I also decided to not to handle "use before scope" issues and thus I do not poison stack variables at the very beginning of a function. As you noticed, the format stack poisoning/unpoisoning code was kind of ugly. Current unpoisoning code (trunk version) basically clears the whole shadow memory for a stack frame except local variables that are not touched by use-after-scope machinery. That eventually leads to a bit easier code, producing the shadow clearing stuff. > > Plus, as I've mentioned before, it would be nice to optimize - for ASAN_MARK > unpoison appearing strictly before (i.e. dominating) the first (non-shadow) > memory read > or write in the function (explicit or possible through function calls etc.) > you really don't need to unpoison (depending on whether we follow LLVM as > mentioned above then it can be removed without anything, or the decl needs > to be somehow marked and tell asan_emit_stack_protection it shouldn't poison > it at the start), and for ASAN_MARK poisoning appearing after the last > load/store in the function (post dominating those, you don't care about > noreturn though) you can combine that (remove the ASAN_MARK) with letting > asan_emit_stack_protection know it doesn't need to unpoison. Fully agree with that approach, however I would be happy to do that as a follow-up as it's not going to so trivial.. 
> >> +char c = poison ? ASAN_STACK_MAGIC_USE_AFTER_SCOPE : 0; >> +for (unsigned i = 0; i < shadow_size; ++i) >> + { >> +emit_move_insn (var_mem, gen_int_mode (c, QImode)); >> +var_mem = adjust_address (var_mem, QImode, 1); > > When you combine it with the loop, you can also use the infrastructure to > handle it 4 bytes at a time. Current implementation can handle up to 4 bytes at a time. I'm wondering we can do even better for targets with 64-bits memory stores? How can one get such info about a target? > > Another thing I've noticed is that the inline expansion of > __asan_unpoison_stack_memory you emit looks buggy. > In use-after-scope-1.c I see: > _9 = (unsigned long) &my_char; > _10 = _9 >> 3; > _11 = _10 + 2147450880; > _12 = (signed char *) _11; > MEM[(short int *)_12] = 0; > > That would be fine only for 16 byte long my_char, but we have instead 9 byte > one. So I believe in that case we need to
[PATCH, 02/N] Introduce tests for -fsanitize-address-use-after-scope
Following patch adjusts expected test dumps and also introduces various new tests. Martin >From 4ddafab1e533a1d3580d2f883955d61fe23aa353 Mon Sep 17 00:00:00 2001 From: marxin Date: Mon, 19 Sep 2016 17:39:29 +0200 Subject: [PATCH 3/3] Introduce tests for -fsanitize-address-use-after-scope gcc/testsuite/ChangeLog: 2016-09-26 Martin Liska * c-c++-common/asan/force-inline-opt0-1.c: Disable -f-sanitize-address-use-after-scope. * c-c++-common/asan/inc.c: Change number of expected ASAN_CHECK internal fn calls. * g++.dg/asan/use-after-scope-1.C: New test. * g++.dg/asan/use-after-scope-2.C: Likewise. * g++.dg/asan/use-after-scope-3.C: Likewise. * g++.dg/asan/use-after-scope-types-1.C: Likewise. * g++.dg/asan/use-after-scope-types-2.C: Likewise. * g++.dg/asan/use-after-scope-types-3.C: Likewise. * g++.dg/asan/use-after-scope-types-4.C: Likewise. * g++.dg/asan/use-after-scope-types-5.C: Likewise. * g++.dg/asan/use-after-scope-types.h: Likewise. * gcc.dg/asan/use-after-scope-1.c: Likewise. * gcc.dg/asan/use-after-scope-2.c: Likewise. * gcc.dg/asan/use-after-scope-3.c: Likewise. * gcc.dg/asan/use-after-scope-4.c: Likewise. * gcc.dg/asan/use-after-scope-5.c: Likewise. * gcc.dg/asan/use-after-scope-6.c: Likewise. * gcc.dg/asan/use-after-scope-7.c: Likewise. * gcc.dg/asan/use-after-scope-8.c: Likewise. * gcc.dg/asan/use-after-scope-goto-1.c: Likewise. * gcc.dg/asan/use-after-scope-goto-2.c: Likewise. 
--- .../c-c++-common/asan/force-inline-opt0-1.c| 1 + gcc/testsuite/c-c++-common/asan/inc.c | 3 +- gcc/testsuite/g++.dg/asan/use-after-scope-1.C | 21 ++ gcc/testsuite/g++.dg/asan/use-after-scope-2.C | 40 ++ gcc/testsuite/g++.dg/asan/use-after-scope-3.C | 22 ++ .../g++.dg/asan/use-after-scope-types-1.C | 17 .../g++.dg/asan/use-after-scope-types-2.C | 17 .../g++.dg/asan/use-after-scope-types-3.C | 17 .../g++.dg/asan/use-after-scope-types-4.C | 17 .../g++.dg/asan/use-after-scope-types-5.C | 17 gcc/testsuite/g++.dg/asan/use-after-scope-types.h | 30 ++ gcc/testsuite/gcc.dg/asan/use-after-scope-1.c | 18 + gcc/testsuite/gcc.dg/asan/use-after-scope-2.c | 47 ++ gcc/testsuite/gcc.dg/asan/use-after-scope-3.c | 20 + gcc/testsuite/gcc.dg/asan/use-after-scope-4.c | 16 gcc/testsuite/gcc.dg/asan/use-after-scope-5.c | 27 + gcc/testsuite/gcc.dg/asan/use-after-scope-6.c | 15 +++ gcc/testsuite/gcc.dg/asan/use-after-scope-7.c | 15 +++ gcc/testsuite/gcc.dg/asan/use-after-scope-8.c | 14 +++ gcc/testsuite/gcc.dg/asan/use-after-scope-goto-1.c | 47 ++ gcc/testsuite/gcc.dg/asan/use-after-scope-goto-2.c | 25 21 files changed, 445 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/asan/use-after-scope-1.C create mode 100644 gcc/testsuite/g++.dg/asan/use-after-scope-2.C create mode 100644 gcc/testsuite/g++.dg/asan/use-after-scope-3.C create mode 100644 gcc/testsuite/g++.dg/asan/use-after-scope-types-1.C create mode 100644 gcc/testsuite/g++.dg/asan/use-after-scope-types-2.C create mode 100644 gcc/testsuite/g++.dg/asan/use-after-scope-types-3.C create mode 100644 gcc/testsuite/g++.dg/asan/use-after-scope-types-4.C create mode 100644 gcc/testsuite/g++.dg/asan/use-after-scope-types-5.C create mode 100644 gcc/testsuite/g++.dg/asan/use-after-scope-types.h create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-1.c create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-2.c create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-3.c create mode 100644 
gcc/testsuite/gcc.dg/asan/use-after-scope-4.c create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-5.c create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-6.c create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-7.c create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-8.c create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-goto-1.c create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-goto-2.c diff --git a/gcc/testsuite/c-c++-common/asan/force-inline-opt0-1.c b/gcc/testsuite/c-c++-common/asan/force-inline-opt0-1.c index 0576155..2e156f7 100644 --- a/gcc/testsuite/c-c++-common/asan/force-inline-opt0-1.c +++ b/gcc/testsuite/c-c++-common/asan/force-inline-opt0-1.c @@ -2,6 +2,7 @@ (before and after inlining) */ /* { dg-do compile } */ +/* { dg-options "-fno-sanitize-address-use-after-scope" } */ /* { dg-final { scan-assembler-not "__asan_report_load" } } */ __attribute__((always_inline)) diff --git a/gcc/testsuite/c-c++-common/asan/inc.c b/gcc/testsuite/c-c++-common/asan/inc.c index 5abf373..98121d2 100644 --- a/gcc/testsuite/c-c++-common/asan/inc.c +++ b/gcc/testsuite/c-c++-common/asan/inc.c @@ -16,5 +16,6 @@ main () return 0; } -/* { dg-fin
Re: [PATCH][RTL ifcvt] Transform (X == CST) ? -CST : Y into (X == CST) ? -X : Y when conditional negation is available
On 02/10/16 20:03, Andrew Pinski wrote: > On Sun, Oct 2, 2016 at 7:50 AM, Jeff Law wrote: >> On 10/02/2016 04:48 AM, Andreas Schwab wrote: >>> This miscompiles the stage2 ada compiler. >> No target identified. > He reported it in a bug report, aarch64-linux-gnu. As I mentioned in PR 77816 I can't reproduce the Fortran failures reported and it will take me a while to set up an Ada bootstrap environment. So I have reverted the patch in the interest of not blocking folks while I try to reproduce/fix. Thanks, Kyrill > Thanks, > Andrew >> jeff
Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)
On Mon, Oct 03, 2016 at 11:27:38AM +0200, Martin Liška wrote: > > Plus, as I've mentioned before, it would be nice to optimize - for ASAN_MARK > > unpoison appearing strictly before (i.e. dominating) the first (non-shadow) > > memory read > > or write in the function (explicit or possible through function calls etc.) > > you really don't need to unpoison (depending on whether we follow LLVM as > > mentioned above then it can be removed without anything, or the decl needs > > to be somehow marked and tell asan_emit_stack_protection it shouldn't poison > > it at the start), and for ASAN_MARK poisoning appearing after the last > > load/store in the function (post dominating those, you don't care about > > noreturn though) you can combine that (remove the ASAN_MARK) with letting > > asan_emit_stack_protection know it doesn't need to unpoison. > > Fully agree with that approach, however I would be happy to do that as a > follow-up as > it's not going to so trivial.. Ok. > >> + char c = poison ? ASAN_STACK_MAGIC_USE_AFTER_SCOPE : 0; > >> + for (unsigned i = 0; i < shadow_size; ++i) > >> +{ > >> + emit_move_insn (var_mem, gen_int_mode (c, QImode)); > >> + var_mem = adjust_address (var_mem, QImode, 1); > > > > When you combine it with the loop, you can also use the infrastructure to > > handle it 4 bytes at a time. > > Current implementation can handle up to 4 bytes at a time. I'm wondering we > can > do even better for targets with 64-bits memory stores? How can one get such > info about a target? It is not just the question of whether the target has fast 64-bit memory stores, but also whether the constants you want to store are reasonably cheap. E.g. on x86_64, movabsq is kind of expensive, so storing 64-bit 0 is cheap, but storing 64-bit 0xfdfdfdfdfdfdfdfdULL might be better done as 2 32-bit stores, perhaps both for speed and size. > > > > Another thing I've noticed is that the inline expansion of > > __asan_unpoison_stack_memory you emit looks buggy. 
> > In use-after-scope-1.c I see: > > _9 = (unsigned long) &my_char; > > _10 = _9 >> 3; > > _11 = _10 + 2147450880; > > _12 = (signed char *) _11; > > MEM[(short int *)_12] = 0; > > > > That would be fine only for 16 byte long my_char, but we have instead 9 byte > > one. So I believe in that case we need to store > > 0x00, 0x01 bytes, for little endian thus 0x0100. You could use for it > > a function similarly to asan_shadow_cst, just build INTEGER_CST rather than > > CONST_INT out of it. In general, poisioning is storing 0xf8 to all affected > > shadow bytes, unpoisioning should restore the state what we would emit > > without use-after-scope sanitization, which is all but the last byte 0, and > > the last byte 0 only if the var size is a multiple of 8, otherwise number > > of valid bytes (1-7). > > Fixed in the newer patch. > > > > > As for the option, it seems clang uses now > > -fsanitize-address-use-after-scope option, while I don't like that much, if > > they have already released some version with that option or if they are > > unwilling to change, I'd go with their option. > > I also do not like the option, but 3.9.0 has already the functionality. Thus, > I'm copying LLVM behavior. > > > > >> + if (flag_stack_reuse != SR_NONE > >> + && flag_openacc > >> + && oacc_declare_returns != NULL) > > > > This actually looks like preexisting OpenACC bug, I doubt the OpenACC > > behavior should depend on -fstack-reuse= setting. > > The generated diff for this hunk is bit misleading, I simplified that > in the second version. > > > > > + bool unpoison_var = asan_poisoned_variables.contains (t); > > + if (asan_sanitize_use_after_scope () > > + && unpoison_var) > > + asan_poisoned_variables.remove (t); > > > > Similarly to asan_handled_variables, I'd prefer it to be a pointer to > > hash_set or something similar, so that it costs as few as possible for the > > general case (no sanitization). 
Similarly, querying the hash_set even for > > no use-after-scope sanitization looks wrong. > > Sure, fixed. > > > > > + if ((asan_sanitize_stack_p () || asan_sanitize_use_after_scope ()) > > > > I would say if asan_sanitize_stack_p () is false, then we should not be > > doing use-after-scope sanitization (error if user requested that > > explicitly). > > Done by adding '&& ASAN_STACK' to asan_sanitize_use_after_scope. > > > > > Don't remember if I've mentioned it earlier, but for vars that are > > TREE_ADDRESSABLE only because of ASAN_MARK calls, we should probably turn > > them non-addressable and remove those ASAN_MARK calls, those shouldn't leak. > > You can have a look at the r237814 change for how similarly > > compare and exchange is special cased for the > > addressables discovery (though, the ASAN_MARK case would be easier, just > > drop it rather than turn it into something different). > > I like the approach to not to handle loca
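Editorial sketch, not taken from either version of the patch: the shadow-byte rule stated earlier in this message (poisoning writes 0xf8 to every affected shadow byte; unpoisoning writes 0 to all but the last byte, and to the last byte either 0 or the number of valid bytes, 1 to 7) can be written out as follows. The function and constant names are hypothetical, and the sketch assumes one shadow byte per 8 bytes of memory and a variable that starts on an 8-byte boundary.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kUseAfterScopeMagic = 0xf8;  // value quoted in the thread
constexpr std::size_t kShadowGranularity = 8;       // assumed: 1 shadow byte per 8 bytes

// Shadow bytes to store for a variable of VAR_SIZE bytes.
std::vector<std::uint8_t> shadow_bytes(std::size_t var_size, bool poison)
{
  const std::size_t count
    = (var_size + kShadowGranularity - 1) / kShadowGranularity;
  std::vector<std::uint8_t> bytes(count, poison ? kUseAfterScopeMagic : 0);

  // When unpoisoning, a partially used last granule records how many of
  // its bytes are valid instead of 0.
  const std::size_t tail = var_size % kShadowGranularity;
  if (!poison && tail != 0)
    bytes.back() = static_cast<std::uint8_t>(tail);
  return bytes;
}

// Pack up to 8 shadow bytes into the integer a little-endian target would
// store with a single wider write.
std::uint64_t pack_little_endian(const std::vector<std::uint8_t> &bytes)
{
  std::uint64_t value = 0;
  for (std::size_t i = 0; i < bytes.size() && i < 8; ++i)
    value |= static_cast<std::uint64_t>(bytes[i]) << (8 * i);
  return value;
}
```

For the 9-byte my_char from the review, shadow_bytes(9, false) gives the bytes 0x00, 0x01, which pack to 0x0100, the value the review says should be stored instead of 0.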
Re: [PATCH v2] add -fprolog-pad=N option to c-family
On Fri, Sep 30, 2016 at 12:01:47PM +0200, Jose E. Marchesi wrote: > > In case anybody missed it, the Linux kernel side to make use > of this has also been finished meanwhile. Of course it cannot > be accepted without compiler support; and this feature patch > is much more versatile than just Linux kernel live patching > on a single architecture. > > How is this supposed to be exploited atomically in RISC arches such as > sparc? In such architectures you usually need to patch several > instructions to load an absolute address into a register. We had some discussions in the context of arm64: https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01093.html But I don't think that we reached a final consensus at that time. Thanks, -Takahiro AKASHI > If a general mechanism is what is intended I would suggest to offer the > possibility of extending the nops _before_ the function entry point, > like in: > > (a) nop ! Load address > nop ! Load address > nop ! Load address > nop ! Load address > nop ! Jump to loaded address. > entry: > (b) nop ! PC-relative jump to (a) > save %sp, bleh, %sp > ... > > So after the live-patcher patches the loading of the destination address > and the jump, it can atomically patch (b) to effectively replace the > implementation of `entry'. > > Wdyt? >
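Editorial sketch of the two-step scheme proposed in the quoted text, modelled on plain data rather than real machine code: the pad (a) is rewritten first, while nothing jumps to it, and the single word at (b) is then replaced with one atomic store. The type and function names are hypothetical, the instruction words passed in are placeholders supplied by the caller, and real live patching would also need instruction-cache maintenance, which is not modelled here.

```cpp
#include <array>
#include <atomic>
#include <cstdint>

constexpr std::uint32_t kNop = 0x01000000u;  // SPARC nop (sethi 0, %g0)

// Hypothetical model of a function compiled with a nop pad in front of it.
struct PatchableFunction
{
  std::array<std::uint32_t, 5> pad;   // (a): five nops before the entry point
  std::atomic<std::uint32_t> entry;   // (b): one nop at the entry point

  PatchableFunction() : entry(kNop) { pad.fill(kNop); }
};

// Step 1: fill the pad; it is unreachable while (b) is still a nop, so this
// does not have to be atomic. Step 2: publish the patch with a single-word
// store to (b).
void live_patch(PatchableFunction &f,
                const std::array<std::uint32_t, 5> &trampoline,
                std::uint32_t jump_to_pad)
{
  f.pad = trampoline;
  f.entry.store(jump_to_pad, std::memory_order_release);
}
```

The point of the layout is that a thread entering the function sees either the old nop at (b) or the complete jump, never a half-written address load.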
[PATCH, configure]: Merge two checks for warning options
Hello! I plan to commit the attached patch later today. 2016-10-03 Uros Bizjak * configure.ac (strict_warn): Merge -Wmissing-format-attribute and -Woverloaded-virtual checks for warning options. * configure: Regenerate. Bootstrapped on x86_64-linux-gnu. Uros. diff --git a/gcc/configure b/gcc/configure index 2503ba9..80fc5c7 100755 --- a/gcc/configure +++ b/gcc/configure @@ -6758,63 +6758,7 @@ ac_compiler_gnu=$ac_cv_cxx_compiler_gnu strict_warn= save_CXXFLAGS="$CXXFLAGS" -for real_option in -Wmissing-format-attribute; do - # Do the check with the no- prefix removed since gcc silently - # accepts any -Wno-* option on purpose - case $real_option in --Wno-*) option=-W`expr x$real_option : 'x-Wno-\(.*\)'` ;; -*) option=$real_option ;; - esac - as_acx_Woption=`$as_echo "acx_cv_prog_cc_warning_$option" | $as_tr_sh` - - { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether $CXX supports $option" >&5 -$as_echo_n "checking whether $CXX supports $option... " >&6; } -if { as_var=$as_acx_Woption; eval "test \"\${$as_var+set}\" = set"; }; then : - $as_echo_n "(cached) " >&6 -else - CXXFLAGS="$option" -cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. 
*/ - -int -main () -{ - - ; - return 0; -} -_ACEOF -if ac_fn_cxx_try_compile "$LINENO"; then : - eval "$as_acx_Woption=yes" -else - eval "$as_acx_Woption=no" -fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext - -fi -eval ac_res=\$$as_acx_Woption - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5 -$as_echo "$ac_res" >&6; } - if test `eval 'as_val=${'$as_acx_Woption'};$as_echo "$as_val"'` = yes; then : - strict_warn="$strict_warn${strict_warn:+ }$real_option" -fi - done -CXXFLAGS="$save_CXXFLAGS" -ac_ext=cpp -ac_cpp='$CXXCPP $CPPFLAGS' -ac_compile='$CXX -c $CXXFLAGS $CPPFLAGS conftest.$ac_ext >&5' -ac_link='$CXX -o conftest$ac_exeext $CXXFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' -ac_compiler_gnu=$ac_cv_cxx_compiler_gnu - - -ac_ext=cpp -ac_cpp='$CXXCPP $CPPFLAGS' -ac_compile='$CXX -c $CXXFLAGS $CPPFLAGS conftest.$ac_ext >&5' -ac_link='$CXX -o conftest$ac_exeext $CXXFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' -ac_compiler_gnu=$ac_cv_cxx_compiler_gnu - -save_CXXFLAGS="$CXXFLAGS" -for real_option in -Woverloaded-virtual; do +for real_option in -Wmissing-format-attribute -Woverloaded-virtual; do # Do the check with the no- prefix removed since gcc silently # accepts any -Wno-* option on purpose case $real_option in @@ -18479,7 +18423,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 18482 "configure" +#line 18426 "configure" #include "confdefs.h" #if HAVE_DLFCN_H @@ -18585,7 +18529,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 18588 "configure" +#line 18532 "configure" #include "confdefs.h" #if HAVE_DLFCN_H diff --git a/gcc/configure.ac b/gcc/configure.ac index fa789d5..338956f 100644 --- a/gcc/configure.ac +++ b/gcc/configure.ac @@ -476,14 +476,14 @@ AC_ARG_ENABLE(build-format-warnings, AS_IF([test $enable_build_format_warnings = no], [wf_opt=-Wno-format],[wf_opt=]) 
ACX_PROG_CXX_WARNING_OPTS( - m4_quote(m4_do([-W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual $wf_opt])), [loose_warn]) + m4_quote(m4_do([-W -Wall -Wno-narrowing -Wwrite-strings ], + [-Wcast-qual $wf_opt])), [loose_warn]) ACX_PROG_CC_WARNING_OPTS( m4_quote(m4_do([-Wstrict-prototypes -Wmissing-prototypes])), [c_loose_warn]) ACX_PROG_CXX_WARNING_OPTS( - m4_quote(m4_do([-Wmissing-format-attribute])), [strict_warn]) -ACX_PROG_CXX_WARNING_OPTS( - m4_quote(m4_do([-Woverloaded-virtual])), [strict_warn]) + m4_quote(m4_do([-Wmissing-format-attribute ], + [-Woverloaded-virtual])), [strict_warn]) ACX_PROG_CC_WARNING_OPTS( m4_quote(m4_do([-Wold-style-definition -Wc++-compat])), [c_strict_warn]) ACX_PROG_CXX_WARNING_ALMOST_PEDANTIC(
Re: [v3 PATCH] PR libstdc++/77802
On 01/10/16 00:12 +0300, Ville Voutilainen wrote: > I do this with a rather heavy heart, but since gcc6 compiles boost 1.62, I'll rather have gcc7 do so as well, and I'll throw the tuple fix for lwg2729 to the wolves not because I want to, but because I have to. > > Tested on Linux-x64. > > 2016-10-01 Ville Voutilainen > > PR libstdc++/77802 > * testsuite/20_util/tuple/77802.cc: New. Could you please add a note to this new test saying it's undefined behaviour to instantiate std::tuple with an incomplete type (but that we try to support it anyway). OK with that change.
Re: Shared mutex pool
On 28/09/16 21:34 +0200, François Dumont wrote: > Hi > > Here is the patch to share a mutex pool between debug mode and shared_ptr implementation. It saves 392 bytes on generated .so and will make sure that fixing false sharing will impact both usages. > > I preferred to leave implementation in shared_ptr.cc to avoid introducing another translation unit. > > * src/c++11/shared_ptr.cc (mask, invalid, get_mutex): Move declaration... > * src/c++11/mutex_pool.h: ... here. New. > * src/c++11/debug.cc: Use latter. > > Tested under Linux x86_64, normal and debug modes. > > OK to commit? OK, thanks.
[PATCH] Ensure "C++" language linkage for std::abs overloads
PR libstdc++/77814 * include/bits/std_abs.h: Use "C++" language linkage. * testsuite/17_intro/headers/c++2011/linkage.cc: Move to the end. Add . I'll commit to trunk when testing finishes. commit 2dc6b0497b7d0ec0cb298f749419d70a43c2ab70 Author: Jonathan Wakely Date: Mon Oct 3 12:26:55 2016 +0100 Ensure "C++" language linkage for std::abs overloads PR libstdc++/77814 * include/bits/std_abs.h: Use "C++" language linkage. * testsuite/17_intro/headers/c++2011/linkage.cc: Move to the end. Add . diff --git a/libstdc++-v3/include/bits/std_abs.h b/libstdc++-v3/include/bits/std_abs.h index ab0f980..732b81a3 100644 --- a/libstdc++-v3/include/bits/std_abs.h +++ b/libstdc++-v3/include/bits/std_abs.h @@ -43,6 +43,8 @@ #undef abs +extern "C++" +{ namespace std _GLIBCXX_VISIBILITY(default) { _GLIBCXX_BEGIN_NAMESPACE_VERSION @@ -103,5 +105,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _GLIBCXX_END_NAMESPACE_VERSION } // namespace +} #endif // _GLIBCXX_BITS_STD_ABS_H diff --git a/libstdc++-v3/testsuite/17_intro/headers/c++2011/linkage.cc b/libstdc++-v3/testsuite/17_intro/headers/c++2011/linkage.cc index 67c384b..bb56dbf 100644 --- a/libstdc++-v3/testsuite/17_intro/headers/c++2011/linkage.cc +++ b/libstdc++-v3/testsuite/17_intro/headers/c++2011/linkage.cc @@ -25,9 +25,7 @@ extern "C" { #include -#ifdef _GLIBCXX_HAVE_COMPLEX_H -#include -#endif +// See below for #include #include #ifdef _GLIBCXX_HAVE_FENV_H @@ -43,6 +41,9 @@ extern "C" #include #include #include +#if _GLIBCXX_HAVE_STDALIGN_H +#include +#endif #include #ifdef _GLIBCXX_HAVE_STDBOOL_H #include @@ -67,4 +68,10 @@ extern "C" #ifdef _GLIBCXX_HAVE_WCTYPE_H #include #endif + +// Include this last, because it adds extern "C++" and so hides problems in +// other headers if included first (e.g. PR libstdc++/77814). +#ifdef _GLIBCXX_HAVE_COMPLEX_H +#include +#endif }
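Editorial sketch of why the extern "C++" wrapper matters (a reduced illustration, not the libstdc++ code): overloaded functions cannot have C language linkage, so a header that declares overloads fails to compile when it is included inside an extern "C" block unless it switches back to C++ linkage itself. The namespace demo and the function abs_demo are made-up names.

```cpp
extern "C" {          // stands in for a user including the header inside extern "C"

extern "C++" {        // what the patch adds: restore C++ language linkage
namespace demo
{
  // Two overloads of one name: ill-formed with C linkage, fine with C++ linkage.
  inline int abs_demo(int x) { return x < 0 ? -x : x; }
  inline double abs_demo(double x) { return x < 0 ? -x : x; }
}  // namespace demo
}  // extern "C++"

}  // extern "C"
```

Removing the inner extern "C++" block should make a conforming compiler reject the second overload as a conflicting declaration of a C function, which is the kind of failure the reordered linkage.cc test is meant to expose in other headers.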
[Patch, testsuite] Add ffat-lto-objects to gcc.target/avr/torture/builtins_error.c
Hi, This patch adds the -ffat-lto-objects option to an avr target testcase. The compiler defaults to thin LTO objects if built with linker plugin support, and the error expected by the testcase appears only at link time, if at all. Forcing fat LTO object file creation generates the error consistently at compile time, as expected. Committed to trunk. Regards Senthil gcc/testsuite/ChangeLog: 2016-10-03 Senthil Kumar Selvaraj * gcc.target/avr/torture/builtins-error.c: Add -ffat-lto-objects option. Index: gcc/testsuite/gcc.target/avr/torture/builtins-error.c === --- gcc/testsuite/gcc.target/avr/torture/builtins-error.c (revision 240709) +++ gcc/testsuite/gcc.target/avr/torture/builtins-error.c (working copy) @@ -1,4 +1,5 @@ /* { dg-do assemble } */ +/* { dg-options "-ffat-lto-objects" } */ char insert (long a) {
Re: [PATCH] Set -fprofile-update=atomic when -pthread is present
On 08/18/2016 05:53 PM, Jeff Law wrote: > On 08/18/2016 09:51 AM, Andi Kleen wrote: >>> I'd prefer to make updates atomic in multi-threaded applications. >>> The best proxy we have for that is -pthread. >>> >>> Is it slower, most definitely, but odds are we're giving folks >>> garbage data otherwise, which in many ways is even worse. >> >> It will likely be catastrophically slower in some cases. >> >> Catastrophically as in too slow to be usable. >> >> An atomic instruction is a lot more expensive than a single increment. Also >> they sometimes are really slow depending on the state of the machine. > And for those cases there's a way to override. > > The default should be set for correctness. > > jeff I would like to somehow resolve the discussion related to default value selection. Is the prevailing consensus that we should set -fprofile-update=atomic when -pthread is set? If so, I'll prepare a patch. I tend to do it this way. Moreover, I also have a patch that provides a warning, which can also be useful even if we change the default behavior: $ ./xgcc -B. /tmp/a.c -fprofile-update=single -pthread -fprofile-generate xgcc: warning: -profile-update=atomic should be used to generate a valid profile for a multithreaded application Ideas? Martin From d5a8097dd07d1a3f4263da7ccad970543d92f3e9 Mon Sep 17 00:00:00 2001 From: marxin Date: Mon, 3 Oct 2016 14:02:14 +0200 Subject: [PATCH] Warn about -fprofile-update=single and -pthread gcc/ChangeLog: 2016-10-03 Martin Liska * common.opt: Mark couple of flags with 'Driver' keyword. * gcc.c (driver_handle_option): Handle these options. (process_command): Generate the warning. --- gcc/common.opt | 8 gcc/gcc.c | 31 +++ 2 files changed, 35 insertions(+), 4 deletions(-) diff --git a/gcc/common.opt b/gcc/common.opt index 0e01577..3af9c64 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -1920,7 +1920,7 @@ Common Report Var(profile_flag) Enable basic program profiling code. 
fprofile-arcs -Common Report Var(profile_arc_flag) +Common Driver Report Var(profile_arc_flag) Insert arc-based program profiling code. fprofile-dir= @@ -1933,7 +1933,7 @@ Common Report Var(flag_profile_correction) Enable correction of flow inconsistent profile data input. fprofile-update= -Common Joined RejectNegative Enum(profile_update) Var(flag_profile_update) Init(PROFILE_UPDATE_SINGLE) +Common Driver Joined RejectNegative Enum(profile_update) Var(flag_profile_update) Init(PROFILE_UPDATE_SINGLE) -fprofile-update=[single|atomic] Set the profile update method. Enum @@ -1946,11 +1946,11 @@ EnumValue Enum(profile_update) String(atomic) Value(PROFILE_UPDATE_ATOMIC) fprofile-generate -Common +Common Driver Enable common options for generating profile info for profile feedback directed optimizations. fprofile-generate= -Common Joined RejectNegative +Common Driver Joined RejectNegative Enable common options for generating profile info for profile feedback directed optimizations, and set -fprofile-dir=. fprofile-use diff --git a/gcc/gcc.c b/gcc/gcc.c index d3e8c88..b023013 100644 --- a/gcc/gcc.c +++ b/gcc/gcc.c @@ -233,6 +233,16 @@ static int print_subprocess_help; /* Linker suffix passed to -fuse-ld=... */ static const char *use_ld; +/* Flag indicating whether pthread is provided as a command line option. */ +static bool pthread_set = false; + +/* Flag indicating whether profiling is enabled by an option */ +static bool profiling_enabled = false; + +/* Flag indicating whether profile-update=atomic is provided as a command + line option. */ +static bool profile_update_atomic = false; + /* Whether we should report subprocess execution times to a file. 
*/ FILE *report_times_to_file = NULL; @@ -4112,6 +4122,22 @@ driver_handle_option (struct gcc_options *opts, handle_foffload_option (arg); break; +case OPT_fprofile_update_: + if ((profile_update)value == PROFILE_UPDATE_ATOMIC) + profile_update_atomic = true; + break; + +case OPT_pthread: + pthread_set = true; + break; + +case OPT_fprofile_generate: +case OPT_fprofile_generate_: +case OPT_fprofile_arcs: +case OPT_coverage: + profiling_enabled = true; + break; + default: /* Various driver options need no special processing at this point, having been handled in a prescan above or being @@ -4580,6 +4606,11 @@ process_command (unsigned int decoded_options_count, add_infile ("help-dummy", "c"); } + /* Warn about multi-threaded program that do not use -profile=atomic. */ + if (profiling_enabled && pthread_set && !profile_update_atomic) +warning (0, "-profile-update=atomic should be used to generate a valid" + " profile for a multithreaded application"); + /* Decide if undefined variable references are allowed in specs. */ /* --version and --help alone or together are safe. Note that -v would -- 2.9.2
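Editorial sketch, not part of the patch: the condition the driver change checks can be restated as a stand-alone function over a list of command-line arguments. The function name should_warn_non_atomic_profile is hypothetical and the string matching is a simplification of the driver's real option decoding. As in the patch, an -fprofile-update=atomic seen anywhere on the command line suppresses the warning, even if a later -fprofile-update=single follows it.

```cpp
#include <string>
#include <vector>

// Hypothetical restatement of the three flags tracked in gcc.c by the patch.
bool should_warn_non_atomic_profile(const std::vector<std::string> &args)
{
  bool pthread_set = false;
  bool profiling_enabled = false;
  bool profile_update_atomic = false;

  for (const std::string &arg : args)
    {
      if (arg == "-pthread")
        pthread_set = true;
      else if (arg == "-fprofile-update=atomic")
        profile_update_atomic = true;
      else if (arg == "-fprofile-generate"
               || arg.rfind("-fprofile-generate=", 0) == 0  // prefix match
               || arg == "-fprofile-arcs"
               || arg == "--coverage")
        profiling_enabled = true;
    }

  // Same test as the one added to process_command.
  return profiling_enabled && pthread_set && !profile_update_atomic;
}
```

With the command line from the message (-fprofile-update=single -pthread -fprofile-generate) this returns true, matching the warning shown there.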
Re: gcc build problem (i386.c) -- missing declaration
On Thu, 29 Sep 2016, Louis Krupp wrote:
> My target was gfortran.
>
> In any case, someone else fixed this problem.

Good.

Note that by target we are referring to the platform (processor plus operating system). You can see this by looking for a line starting with "Target:" in the output of `gcc -v`. On one of my machines this says "Target: x86_64-suse-linux", on another one "Target: i386-unknown-freebsd10.3", for example.

Gerald
Re: [PATCH] Set -fprofile-update=atomic when -pthread is present
On 10/03/16 08:13, Martin Liška wrote:
On 08/18/2016 05:53 PM, Jeff Law wrote:
On 08/18/2016 09:51 AM, Andi Kleen wrote:

I'd prefer to make updates atomic in multi-threaded applications. The best proxy we have for that is -pthread. Is it slower, most definitely, but odds are we're giving folks garbage data otherwise, which in many ways is even worse.

It will likely be catastrophically slower in some cases. Catastrophically as in too slow to be usable. An atomic instruction is a lot more expensive than a single increment. Also they sometimes are really slow depending on the state of the machine.

And for those cases there's a way to override. The default should be set for correctness.

jeff

I would like to somehow resolve the discussion related to default value selection. Is the prevailing consensus that we should set -fprofile-update=atomic when -pthread is set? If so, I'll prepare a patch. I tend to do it this way.

This is my preference.

nathan
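To make the trade-off in this thread concrete, here is a small C11 sketch of the two kinds of counter update being compared. It is only an illustration under stated assumptions: the counter variables and the count_plain, count_atomic, read_atomic and run_hits names are hypothetical, and nothing here claims to be the code that gcov instrumentation emits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* A plain counter: a load, an add and a store.  Two threads that hit it
   at the same time can lose updates, which is the "garbage data" case.  */
static uint64_t plain_counter;

/* An atomic counter: no update is lost, at the price of an atomic
   read-modify-write on every hit.  */
static _Atomic uint64_t atomic_counter;

static void
count_plain (void)
{
  plain_counter++;
}

static void
count_atomic (void)
{
  atomic_fetch_add_explicit (&atomic_counter, 1, memory_order_relaxed);
}

static uint64_t
read_atomic (void)
{
  return atomic_load_explicit (&atomic_counter, memory_order_relaxed);
}

/* Hit both counters N times from the calling thread.  */
static void
run_hits (int n)
{
  assert (n >= 0);
  for (int i = 0; i < n; i++)
    {
      count_plain ();
      count_atomic ();
    }
}
```

In a single thread both counters end up equal; the difference only shows once several threads call the two functions concurrently, where only the atomic counter is guaranteed to match the number of hits.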
Re: [PATCH] Machine-readable RTL dumps: print_rtx_function
On Sun, 2016-10-02 at 07:04 -0500, Segher Boessenkool wrote:
> On Thu, Sep 29, 2016 at 11:36:29AM -0600, Jeff Law wrote:
> > On 09/29/2016 11:25 AM, Bernd Schmidt wrote:
> > > On 09/29/2016 07:47 PM, David Malcolm wrote:
> > > > This patch adds a new function, print_rtx_function, intended for use
> > > > for generating function dumps suitable for parsing by the RTL frontend,
> > > > but also intended to be human-readable, and human-authorable.
> > >
> > > > (note 1 0 4 (nil) NOTE_INSN_DELETED)
> > > > (note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
> > > > (insn 2 4 3 2 (set (mem/c:SI (plus:DI (reg/f:DI 82 virtual-stack-vars)
> > > > (const_int -4 [0xfffffffffffffffc])) [1 i+0 S4 A32])
> > > > (reg:SI 5 di [ i ])) t.c:2 -1
> > > > (nil))
> > >
> > > I think it might be a good idea to get rid of redundant information like
> > > insn numbers for such a dump format. But that can be left for followup
> > > patches.
> > I would make the same suggestion. The insn # and backend pattern name
> > (if any) should be omitted in machine-readable dump format. I'm fine
> > with that as a follow-up as well.
>
> You need the insn id for (at least) code_label.

I think that Bernd is referring to the INSN_CODE, rather than INSN_UID.
Re: [PATCH] Fix bootstrap with --enable-languages=all,go
Andrew Haley writes:

> On 30/09/16 23:16, Rainer Orth wrote:
>> me too, though mostly to have maximum test coverage (primarily on
>> Solaris). As expected, a x86_64-apple-darwin16 bootstrap with
>> --enable-objc-gc just failed for me. I'm testing the following patch
>> (on top of Jakub's).
>>
>> Rainer
>>
>>
>> 2016-10-01  Rainer Orth
>>
>> * configure.ac (target_libraries): Readd target-boehm-gc.
>> Restore --enable-objc-gc handling.
>> * configure: Regenerate.
>
> Thanks everybody. My apologies.

The bootstrap completed successfully now. Ok for mainline?

Rainer

-- 
Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH][v4] GIMPLE store merging pass
Hi Richard,

another question as I'm working through your comments...

On 29/09/16 11:45, Richard Biener wrote:

+  /* The region from the byte array that we're inserting into.  */
+  tree ptr_wide_int
+    = native_interpret_expr (dest_int_type, ptr + first_byte,
+                             total_bytes);
+
+  gcc_assert (ptr_wide_int);
+  wide_int dest_wide_int
+    = wi::to_wide (ptr_wide_int, TYPE_PRECISION (dest_int_type));
+  wide_int expr_wide_int
+    = wi::to_wide (tmp_int, byte_size * BITS_PER_UNIT);
+  if (BYTES_BIG_ENDIAN)
+    {
+      unsigned int insert_pos
+        = byte_size * BITS_PER_UNIT - bitlen - (bitpos % BITS_PER_UNIT);
+      dest_wide_int
+        = wi::insert (dest_wide_int, expr_wide_int, insert_pos, bitlen);
+    }
+  else
+    dest_wide_int = wi::insert (dest_wide_int, expr_wide_int,
+                                bitpos % BITS_PER_UNIT, bitlen);
+
+  tree res = wide_int_to_tree (dest_int_type, dest_wide_int);
+  native_encode_expr (res, ptr + first_byte, total_bytes, 0);
+

OTOH this whole dance looks as complicated and way more expensive than using native_encode_expr into a temporary buffer and then a manually implemented "bit-merging" of it at ptr + first_byte + bitpos. AFAICS that operation is even endianness agnostic.

If the quantity we're inserting at a non-byte boundary is more than a byte wide we still have to shift the value to position properly across the bytes it straddles, so I don't see how we can avoid creating a wide_int here. Consider inserting a 10-bit value at bit position 3 (I hope the mailer doesn't screw up the indentation):

value:      xxxxxxxxxx

before: |--------||--------|
        | byte 1 || byte 2 |

after:  |---xxxxx||xxxxx---|

We'll native_encode_expr the value into a two-byte buffer but then we can't just shift each byte by 3 to insert it into the destination buffer, we need to form the whole 10-bit value and shift it as a whole to not lose any bits. And if a value crosses bytes then we need to care about BYTES_BIG_ENDIAN when writing the bytes back into the buffer, no?

Thanks,
Kyrill
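The 10-bit example in this message can be checked with a few lines of standalone C. This is only a sketch under stated assumptions: it uses little-endian bit numbering (bit 0 is the least significant bit of buf[0]), it requires a buffer of at least three bytes, and the function name insert_bits_le is hypothetical. It is not the store merging code and uses none of GCC's wide_int API; it only shows why the value has to be shifted as a whole rather than byte by byte.

```c
#include <assert.h>
#include <stdint.h>

/* Insert the low BITLEN bits of VALUE into BUF, starting at bit BITPOS of
   buf[0].  Little-endian bit numbering is assumed, and BUF must hold at
   least three bytes because a 16-bit value shifted by up to 7 bits can
   reach into the third byte.  */
static void
insert_bits_le (uint8_t *buf, unsigned bitpos, unsigned bitlen,
                uint32_t value)
{
  assert (buf != 0 && bitpos < 8 && bitlen >= 1 && bitlen <= 16);

  /* Gather the affected bytes into one word so that the shift moves the
     whole value at once and no bits fall off a byte boundary.  */
  uint32_t word = (uint32_t) buf[0]
                  | ((uint32_t) buf[1] << 8)
                  | ((uint32_t) buf[2] << 16);
  uint32_t mask = ((UINT32_C (1) << bitlen) - 1) << bitpos;

  word = (word & ~mask) | ((value << bitpos) & mask);

  /* Write the bytes back in little-endian order.  A big-endian target
     would need the opposite byte order here, which is the question the
     message raises about BYTES_BIG_ENDIAN.  */
  buf[0] = (uint8_t) (word & 0xff);
  buf[1] = (uint8_t) ((word >> 8) & 0xff);
  buf[2] = (uint8_t) ((word >> 16) & 0xff);
}
```

With an all-zero buffer, inserting the 10-bit value 0x3ff at bit position 3 sets the top five bits of the first byte and the low five bits of the second, matching the "after" row of the diagram.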
Re: [PATCH, RFC] gcov: dump in a static dtor instead of in an atexit handler
Hi Martin,

> On 09/30/2016 02:31 PM, Rainer Orth wrote:
>> this would be i386-pc-solaris2.12. I'm not sure if the constructor
>> priority detection works in a cross scenario.
>>
>> I'm attaching the resulting assembly (although for Solaris as, the gas
>> build is still running).
>
> Hi. Sorry, I have a stupid mistake in dtor priority
> (I used 65534 instead of desired 99). Please try to test it on Solaris 12
> with the attached patch. I'll send the patch to ML soon.

unfortunately, the patch makes no difference on Solaris 12. The test even FAILs when using gas/gld, which is a different/independent implementation of constructor priority.

> Can you please test whether it makes any change on a solaris target w/o
> prioritized ctors/dtors?

It doesn't: the test PASSes on Solaris 10 and 11 with and without your patch.

Rainer

-- 
Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCHv2] Cleanup of input.c
On Sun, 2016-10-02 at 13:07 +, Bernd Edlinger wrote:
> Hi Dave,
>
> here is the new version of the input.c patch:
>
> I have updated the comments, and revised the test case as requested.
> I have additionally done a bootstrap with build config=bootstrap-asan.

Thanks. A couple of nits inline below...

> Bootstrap and reg-testing on x86_64-pc-linux-gnu.
> Is it OK for trunk?

> 2016-09-26  Bernd Edlinger
>
> PR preprocessor/77699
> * input.c (maybe_grow): Don't allocate one byte extra headroom.
> (get_next_line): Return false on error.
> (read_next_line): Removed, use get_next_line instead.
> (read_line_num): Don't copy the line.
> (location_get_source_line): Don't use static data.
> (test_reading_source_line): Add more test cases.

FWIW I've been adding selftest:: to those symbols within the namespace in ChangeLog entries, so I would have written this last one as:

  (selftest::test_reading_source_line): Add more test cases.

(mostly out of wanting to emphasize the "real" code vs test code split). That said, I don't think we have any official policy on this.

> Index: gcc/input.c
> ===================================================================
> --- gcc/input.c (revision 240693)
> +++ gcc/input.c (working copy)

[...snip...]

> @@ -643,15 +612,15 @@ goto_next_line (fcache *cache)
>  }
>
>  /* Read an arbitrary line number LINE_NUM from the file cached in C.
> -   The line is copied into *LINE.  *LINE_LEN must have been set to the
> -   length of *LINE.  If *LINE is too small (or NULL) it's extended (or
> -   allocated) and *LINE_LEN is adjusted accordingly.  *LINE ends up
> -   with a terminal zero byte and can contain additional zero bytes.
> +   If the line was read successfully, *LINE points to the beginning
> +   of the line in the file cache and *LINE_LEN is the length of the
> +   line.  *LINE is not nul-terminated, but may contain zero bytes.
> +   *LINE is only valid until the next call of read_line_num.
>     This function returns bool if a line was read.  */
>
>  static bool
>  read_line_num (fcache *c, size_t line_num,
> -               char ** line, ssize_t *line_len)
> +               char **line, ssize_t *line_len)
>  {
>    gcc_assert (line_num > 0);
>
> @@ -705,12 +674,8 @@ read_line_num (fcache *c, size_t line_num,
>        {
>          /* We have the start/end of the line.  Let's just copy
>             it again and we are done.  */

The reference to a "copy" in this comment is now invalid. Maybe the comment should now simply read:

  /* We have the start/end of the line.  */

or somesuch.

> -        ssize_t len = i->end_pos - i->start_pos + 1;
> -        if (*line_len < len)
> -          *line = XRESIZEVEC (char, *line, len);
> -        memmove (*line, c->data + i->start_pos, len);
> -        (*line)[len - 1] = '\0';
> -        *line_len = --len;
> +        *line = c->data + i->start_pos;
> +        *line_len = i->end_pos - i->start_pos;
>          return true;
>        }
>

[...snip...]

OK for trunk with the above comment nit fixed.

Dave
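The interface change reviewed above (hand back a pointer and a length into the cached data instead of copying the line into a NUL-terminated buffer) can be sketched in standalone C as follows. This is an illustration only: struct file_cache and get_line are hypothetical names, the lookup is a simple linear scan, and none of input.c's fcache bookkeeping is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a cached file: raw bytes plus their size.  */
struct file_cache
{
  const char *data;
  size_t size;
};

/* Look up line LINE_NUM (1-based) in C.  On success, *LINE points into
   the cache (no copy is made), *LINE_LEN is its length without the
   newline, and the result is true.  *LINE is not NUL-terminated and is
   only valid while the cache itself is.  */
static bool
get_line (const struct file_cache *c, size_t line_num,
          const char **line, size_t *line_len)
{
  assert (c != NULL && line != NULL && line_len != NULL && line_num > 0);

  const char *p = c->data;
  const char *end = c->data + c->size;

  for (size_t n = 1; p < end; n++)
    {
      const char *nl = memchr (p, '\n', (size_t) (end - p));
      const char *line_end = nl ? nl : end;

      if (n == line_num)
        {
          *line = p;
          *line_len = (size_t) (line_end - p);
          return true;
        }
      if (nl == NULL)
        break;
      p = nl + 1;
    }
  return false;
}
```

Because the result is not NUL-terminated, a caller has to pass the length along explicitly, for example with printf ("%.*s\n", (int) len, line), rather than treating the pointer as a C string.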
[PATCH 5/6] rs6000: Separate shrink-wrapping
This implements the hooks for separate shrink-wrapping for rs6000. It handles GPRs and LR. The GPRs get a component number corresponding to their register number; LR gets component number 0. 2016-06-07 Segher Boessenkool * config/rs6000/rs6000.c (machine_function): Add new fields gpr_is_wrapped_separately and lr_is_wrapped_separately. (TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS, TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB, TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS, TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS, TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS, TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS): Define. (rs6000_get_separate_components): New function. (rs6000_components_for_bb): New function. (rs6000_disqualify_components): New function. (rs6000_emit_prologue_components): New function. (rs6000_emit_epilogue_components): New function. (rs6000_set_handled_components): New function. (rs6000_emit_prologue): Don't emit LR save if lr_is_wrapped_separately. Don't emit GPR saves if gpr_is_wrapped_separately for that register. (restore_saved_lr): Don't restore LR if lr_is_wrapped_separately. (rs6000_emit_epilogue): Don't emit GPR restores if gpr_is_wrapped_separately for that register. Don't make a REG_CFA_RESTORE note for registers we did not restore, either. --- gcc/config/rs6000/rs6000.c | 269 ++--- 1 file changed, 253 insertions(+), 16 deletions(-) diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 6897b5c..ff606c9 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -153,6 +153,10 @@ typedef struct GTY(()) machine_function bool split_stack_argp_used; /* Flag if r2 setup is needed with ELFv2 ABI. */ bool r2_setup_needed; + /* The components already handled by separate shrink-wrapping, which should + not be considered by the prologue and epilogue. */ + bool gpr_is_wrapped_separately[32]; + bool lr_is_wrapped_separately; } machine_function; /* Support targetm.vectorize.builtin_mask_for_load. 
*/ @@ -1514,6 +1518,19 @@ static const struct attribute_spec rs6000_attribute_table[] = #undef TARGET_SET_UP_BY_PROLOGUE #define TARGET_SET_UP_BY_PROLOGUE rs6000_set_up_by_prologue +#undef TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS +#define TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS rs6000_get_separate_components +#undef TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB +#define TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB rs6000_components_for_bb +#undef TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS +#define TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS rs6000_disqualify_components +#undef TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS +#define TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS rs6000_emit_prologue_components +#undef TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS +#define TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS rs6000_emit_epilogue_components +#undef TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS +#define TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS rs6000_set_handled_components + #undef TARGET_EXTRA_LIVE_ON_ENTRY #define TARGET_EXTRA_LIVE_ON_ENTRY rs6000_live_on_entry @@ -27285,6 +27302,212 @@ rs6000_global_entry_point_needed_p (void) return cfun->machine->r2_setup_needed; } +/* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS. */ +static sbitmap +rs6000_get_separate_components (void) +{ + rs6000_stack_t *info = rs6000_stack_info (); + + if (!(info->savres_strategy & SAVE_INLINE_GPRS) + || !(info->savres_strategy & REST_INLINE_GPRS) + || WORLD_SAVE_P (info)) +return NULL; + + sbitmap components = sbitmap_alloc (32); + bitmap_clear (components); + + /* The GPRs we need saved to the frame. */ + int reg_size = TARGET_32BIT ? 4 : 8; + int offset = info->gp_save_offset; + if (info->push_p) +offset += info->total_size; + + for (unsigned regno = info->first_gp_reg_save; regno < 32; regno++) +{ + if (IN_RANGE (offset, -0x8000, 0x7fff) + && rs6000_reg_live_or_pic_offset_p (regno)) + bitmap_set_bit (components, regno); + + offset += reg_size; +} + + /* Don't mess with the hard frame pointer. 
*/ + if (frame_pointer_needed) +bitmap_clear_bit (components, HARD_FRAME_POINTER_REGNUM); + + /* Don't mess with the fixed TOC register. */ + if ((TARGET_TOC && TARGET_MINIMAL_TOC) + || (flag_pic == 1 && DEFAULT_ABI == ABI_V4) + || (flag_pic && DEFAULT_ABI == ABI_DARWIN)) +bitmap_clear_bit (components, RS6000_PIC_OFFSET_TABLE_REGNUM); + + /* Optimize LR save and restore if we can. This is component 0. */ + if (info->lr_save_p + && !(flag_pic && (DEFAULT_ABI == ABI_V4 || DEFAULT_ABI == ABI_DARWIN))) +{ + offset = info->lr_save_offset; + if (info->push_p) + offset += info->total_size; + if (IN_RANGE (offset, -0x8000, 0x7fff)) + bitmap_set_bit (components, 0); +}
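The GPR selection loop in rs6000_get_separate_components above can be tried out in isolation with the following sketch. It makes several simplifying assumptions that are not in the patch: a uint32_t bit mask replaces the sbitmap, a caller-supplied live_mask replaces rs6000_reg_live_or_pic_offset_p, and the function name gpr_components is hypothetical. The offset test is the same signed 16-bit range as the IN_RANGE check in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Return a mask with bit N set when GPR N can be a separately wrapped
   component: it is one of the saved registers, it is live according to
   LIVE_MASK, and its save slot offset fits in a signed 16-bit
   displacement.  OFFSET is the slot offset of FIRST_SAVED_REG; each
   following register sits REG_SIZE bytes higher.  */
static uint32_t
gpr_components (unsigned first_saved_reg, long offset, int reg_size,
                uint32_t live_mask)
{
  assert (reg_size == 4 || reg_size == 8);

  uint32_t components = 0;
  for (unsigned regno = first_saved_reg; regno < 32; regno++)
    {
      if (offset >= -0x8000 && offset <= 0x7fff
          && ((live_mask >> regno) & 1) != 0)
        components |= UINT32_C (1) << regno;

      offset += reg_size;
    }
  return components;
}
```

For example, with r30 and r31 saved, 8-byte slots and r30's slot at offset 0x7ff8, only r30 qualifies, because r31's slot lands at 0x8000, just outside the range.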
[PATCH 4/6] shrink-wrap: Shrink-wrapping for separate components
This is the main substance of this patch series.

Instead of doing all of the prologue and epilogue in one spot, it often is better to do components of it at different places, so that they are executed less frequently. What exactly is a component is completely up to the target; this code treats it all abstractly, and uses hooks for the target to handle the more concrete things. Commonly there is one component for each callee-saved register, for example.

Components can be executed more than once per function execution. This pass makes sure that a component's epilogue is not called more often than the corresponding prologue has been, at any point in time; that the prologue is called more often, wherever the prologue's effect is needed; and that the epilogue is called as often as the prologue has been, when the function exits. It does this by first deciding which blocks need which components active, and then placing prologue and epilogue components to make that exactly true.

Deciding what blocks should run with a certain component active so that the total cost of executing the prologues (and epilogues) is optimal, is not a computationally feasible problem. Instead, for each basic block, we estimate the cost of putting a prologue right before the block, and if that is cheaper than the total cost of putting prologues optimally (according to the estimated cost) in the dominator subtrees strictly dominated by this first block, place it at the first block instead. This simple procedure places the components optimally for any dominator sub tree where the root node's cost does not depend on anything outside its subtree.

The cost is the execution frequency of all edges into the block coming from blocks that do not have this component active. The estimated cost is the execution frequency of the block, minus the execution frequency of any backedges (which by definition are coming from subtrees, so if the "head" block gets a prologue, the source block of any backedge has that component active as well).

Currently, the epilogues are placed as late as possible, given the constraints. This does not matter for execution cost, but we could save a little bit of code size by placing the epilogues in a smarter way. This is a possible future optimisation.

Now all that is left is inserting prologues and epilogues on all edges that jump into resp. out of the "active" set of blocks. Often we need to insert some components' prologues (or epilogues) on all edges into (or out of) a block. In theory cross-jumping can unify all such, but in practice that often fails; besides, that is a lot of work. So in this case we insert the prologue and epilogue components at the "head" or "tail" of a block, instead.

As a final optimisation, if a block needs a prologue and its immediate dominator has the block as a post-dominator, that immediate dominator gets the prologue as well.


2016-06-07  Segher Boessenkool

	* function.c (thread_prologue_and_epilogue_insns): Recompute the
	live info.  Call try_shrink_wrapping_separate.  Compute the
	prologue_seq afterwards, if it has possibly changed.  Compute the
	split_prologue_seq and epilogue_seq later, too.
	* shrink-wrap.c: #include cfgbuild.h.
	(dump_components): New function.
	(struct sw): New struct.
	(SW): New function.
	(init_separate_shrink_wrap): New function.
	(fini_separate_shrink_wrap): New function.
	(place_prologue_for_one_component): New function.
	(spread_components): New function.
	(disqualify_problematic_components): New function.
	(emit_common_heads_for_components): New function.
	(emit_common_tails_for_components): New function.
	(insert_prologue_epilogue_for_components): New function.
	(try_shrink_wrapping_separate): New function.
	* shrink-wrap.h: Declare try_shrink_wrapping_separate.
--- gcc/function.c| 15 +- gcc/shrink-wrap.c | 741 ++ gcc/shrink-wrap.h | 1 + 3 files changed, 754 insertions(+), 3 deletions(-) diff --git a/gcc/function.c b/gcc/function.c index 94ed786..6d2a079 100644 --- a/gcc/function.c +++ b/gcc/function.c @@ -5920,16 +5920,25 @@ thread_prologue_and_epilogue_insns (void) edge entry_edge = single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)); edge orig_entry_edge = entry_edge; - rtx_insn *split_prologue_seq = make_split_prologue_seq (); rtx_insn *prologue_seq = make_prologue_seq (); - rtx_insn *epilogue_seq = make_epilogue_seq (); /* Try to perform a kind of shrink-wrapping, making sure the prologue/epilogue is emitted only around those parts of the function that require it. */ - try_shrink_wrapping (&entry_edge, prologue_seq); + /* If the target can handle splitting the prologue/epilogue into separate + components, try to shrink-wrap these components separately. */ + try_shrink_wrapping_separate (entry_edge->dest); + +
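The placement heuristic described in the cover text of this patch (compare the estimated cost of a prologue at a block with the best total cost found in the subtrees it dominates) can be sketched on a toy dominator tree as below. This is a simplified illustration for a single component, not the code in shrink-wrap.c: struct block, MAX_KIDS, clear_subtree and place_prologue are hypothetical names, costs are plain integers supplied by the caller, and the later spreading and edge insertion steps are not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_KIDS 4

/* One node of a toy dominator tree, for a single component.  */
struct block
{
  long own_cost;        /* estimated cost of a prologue right before it */
  int needs_component;  /* nonzero if the block itself needs it active */
  int prologue_here;    /* result: place the prologue at this block */
  size_t n_kids;        /* number of immediately dominated blocks */
  struct block *kids[MAX_KIDS];
};

/* Drop any placement decided so far in the subtree rooted at B.  */
static void
clear_subtree (struct block *b)
{
  b->prologue_here = 0;
  for (size_t i = 0; i < b->n_kids; i++)
    clear_subtree (b->kids[i]);
}

/* Decide placement in the dominator subtree rooted at B and return its
   estimated total cost.  A block that needs the component must have it
   active; otherwise the prologue goes here only if that is cheaper than
   the best placement found in the dominated subtrees.  */
static long
place_prologue (struct block *b)
{
  assert (b != NULL && b->n_kids <= MAX_KIDS);

  long kids_cost = 0;
  for (size_t i = 0; i < b->n_kids; i++)
    kids_cost += place_prologue (b->kids[i]);

  if (b->needs_component || b->own_cost < kids_cost)
    {
      /* A prologue here covers everything this block dominates.  */
      for (size_t i = 0; i < b->n_kids; i++)
        clear_subtree (b->kids[i]);
      b->prologue_here = 1;
      return b->own_cost;
    }

  b->prologue_here = 0;
  return kids_cost;
}
```

With a root of cost 100 over two needy children of cost 60 and 70, the root gets the prologue (100 is cheaper than 130); if only one child of cost 10 needs the component, the prologue stays at that child.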
[PATCH 3/6] regrename: Don't rename restores
A restore is supposed to restore some certain register. Restoring it into some other register will not work. Don't.


2016-06-07  Segher Boessenkool

	* regrename.c (build_def_use): Invalidate chains that have a
	REG_CFA_RESTORE on some instruction.

---
 gcc/regrename.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/regrename.c b/gcc/regrename.c
index 3509e8b..e0d2dd1 100644
--- a/gcc/regrename.c
+++ b/gcc/regrename.c
@@ -1655,6 +1655,7 @@ build_def_use (basic_block bb)
 	     (6) For any non-earlyclobber write we find in an operand, make
 	         a new chain or mark the hard register as live.
 	     (7) For any REG_UNUSED, close any chains we just opened.
+	     (8) For any REG_CFA_RESTORE, kill any chain containing it.
 
 	     We cannot deal with situations where we track a reg in one mode
 	     and see a reference in another mode; these will cause the chain
@@ -1867,6 +1868,12 @@ build_def_use (basic_block bb)
 		scan_rtx (insn, &XEXP (note, 0), NO_REGS, terminate_dead,
 			  OP_IN);
 	      }
+
+	  /* Step 8: Kill the chains involving register restores.  Those
+	     should restore _that_ register.  */
+	  for (note = REG_NOTES (insn); note; note = XEXP (note, 1))
+	    if (REG_NOTE_KIND (note) == REG_CFA_RESTORE)
+	      scan_rtx (insn, &XEXP (note, 0), NO_REGS, mark_all_read, OP_IN);
 	}
       else if (DEBUG_INSN_P (insn)
 	       && !VAR_LOC_UNKNOWN_P (INSN_VAR_LOCATION_LOC (insn)))
-- 
1.9.3
[PATCH 6/6] shrink-wrap: Testcases for separate shrink-wrapping
A few testcases for separate shrink-wrapping: test whether it works in a trivial case; whether it creates more than one prologue where that is useful; whether it puts prologues inside a loop if that is cheaper.


2016-10-03  Segher Boessenkool

gcc/testsuite/
	* gcc.target/powerpc/shrink-wrap-separate-0.c: New testcase.
	* gcc.target/powerpc/shrink-wrap-separate-1.c: New testcase.
	* gcc.target/powerpc/shrink-wrap-separate-2.c: New testcase.

---
 .../gcc.target/powerpc/shrink-wrap-separate-0.c    | 22 ++
 .../gcc.target/powerpc/shrink-wrap-separate-1.c    | 18 +++
 .../gcc.target/powerpc/shrink-wrap-separate-2.c    | 26 ++
 3 files changed, 66 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/shrink-wrap-separate-0.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/shrink-wrap-separate-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/shrink-wrap-separate-2.c

diff --git a/gcc/testsuite/gcc.target/powerpc/shrink-wrap-separate-0.c b/gcc/testsuite/gcc.target/powerpc/shrink-wrap-separate-0.c
new file mode 100644
index 0000000..dea0611
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/shrink-wrap-separate-0.c
@@ -0,0 +1,22 @@
+/* { dg-do compile { target powerpc*-*-* } } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler {#before\M.*\mmflr\M} } } */
+
+/* This tests if shrink-wrapping for separate components works.
+
+   r20 (a callee-saved register) is forced live at the start, so that we
+   get it saved in a prologue at the start of the function.
+   The link register only needs to be saved if x is non-zero; without
+   separate shrink-wrapping it would however be saved in the one prologue.
+   The test tests if the mflr insn ends up behind the prologue.  */
+
+void g(void);
+
+void f(int x)
+{
+	register int r20 asm("20") = x;
+	asm("#before" : : "r"(r20));
+	if (x)
+		g();
+	asm(""); // no tailcall of g
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/shrink-wrap-separate-1.c b/gcc/testsuite/gcc.target/powerpc/shrink-wrap-separate-1.c
new file mode 100644
index 0000000..735b606
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/shrink-wrap-separate-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile { target powerpc*-*-* } } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler {\mmflr\M.*\mbl\M.*\mmflr\M.*\mbl\M} } } */
+
+/* This tests if shrink-wrapping for separate components creates more
+   than one prologue when that is useful.  In this case, it saves the
+   link register before both the call to g and the call to h.  */
+
+void g(void) __attribute__((noreturn));
+void h(void) __attribute__((noreturn));
+
+void f(int x)
+{
+	if (x == 42)
+		g();
+	if (x == 31)
+		h();
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/shrink-wrap-separate-2.c b/gcc/testsuite/gcc.target/powerpc/shrink-wrap-separate-2.c
new file mode 100644
index 0000000..b22564a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/shrink-wrap-separate-2.c
@@ -0,0 +1,26 @@
+/* { dg-do compile { target powerpc*-*-* } } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler {\mmflr\M.*\mbl\M.*\mmflr\M.*\mbl\M} } } */
+
+/* This tests if shrink-wrapping for separate components puts a prologue
+   inside a loop when that is useful.  In this case, it saves the link
+   register before each call: both calls happen with probability .10,
+   so saving the link register happens with .80 per execution of f on
+   average, which is smaller than 1 which you would get if you saved
+   it outside the loop.  */
+
+int *a;
+void g(void);
+
+void f(int x)
+{
+	int j;
+	for (j = 0; j < 4; j++) {
+		if (__builtin_expect(a[j], 0))
+			g();
+		asm("#" : : : "memory");
+		if (__builtin_expect(a[j], 0))
+			g();
+		a[j]++;
+	}
+}
-- 
1.9.3
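The ".80 per execution" figure in the comment of shrink-wrap-separate-2.c is simple arithmetic: four loop iterations, two calls per iteration, each taken with probability .10. As a small check, here it is spelled out in C; the helper name expected_saves is hypothetical, and the sketch assumes, as the test comment does, one link register save in front of each call that is actually taken.

```c
#include <assert.h>

/* Expected number of link register saves per execution of f when the
   save is placed right before each call: every call site contributes
   its probability of being reached, once per loop iteration.  */
static double
expected_saves (int iterations, int calls_per_iteration,
                double call_probability)
{
  assert (iterations >= 0 && calls_per_iteration >= 0);
  assert (call_probability >= 0.0 && call_probability <= 1.0);
  return (double) (iterations * calls_per_iteration) * call_probability;
}
```

For the testcase, expected_saves (4, 2, 0.10) comes to 0.8 saves on average, against exactly one save per execution if the link register were saved once outside the loop.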
[PATCH v4 0/6] Separate shrink-wrapping
I updated according to Jeff's latest comments (importantly, we cannot move a *logue in front of a move in general), and added some testcases.

Bootstrapping is in progress on today's trunk, powerpc64-linux and powerpc64le-linux. Is this okay to commit now?


Segher


Segher Boessenkool (6):
  separate shrink-wrap: New command-line flag, status flag, hooks, and doc
  dce: Don't dead-code delete separately wrapped restores
  regrename: Don't rename restores
  shrink-wrap: Shrink-wrapping for separate components
  rs6000: Separate shrink-wrapping
  shrink-wrap: Testcases for separate shrink-wrapping

 gcc/common.opt                                     |   4 +
 gcc/config/rs6000/rs6000.c                         | 269 +++-
 gcc/dce.c                                          |   9 +
 gcc/doc/invoke.texi                                |  11 +-
 gcc/doc/tm.texi                                    |  63 ++
 gcc/doc/tm.texi.in                                 |  38 ++
 gcc/emit-rtl.h                                     |   4 +
 gcc/function.c                                     |  15 +-
 gcc/regrename.c                                    |   7 +
 gcc/shrink-wrap.c                                  | 741 +
 gcc/shrink-wrap.h                                  |   1 +
 gcc/target.def                                     |  57 ++
 .../gcc.target/powerpc/shrink-wrap-separate-0.c    |  22 +
 .../gcc.target/powerpc/shrink-wrap-separate-1.c    |  18 +
 .../gcc.target/powerpc/shrink-wrap-separate-2.c    |  26 +
 15 files changed, 1265 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/shrink-wrap-separate-0.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/shrink-wrap-separate-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/shrink-wrap-separate-2.c

-- 
1.9.3
[PATCH 2/6] dce: Don't dead-code delete separately wrapped restores
If there is a separately wrapped register restore on some path that is dead (say, control goes into an endless loop after it), then we cannot delete that restore because that would confuse the DWARF CFI (if there is another path joining). This happens with gcc.dg/torture/pr53168.c, for example.


2016-06-07  Segher Boessenkool

	* dce.c (delete_unmarked_insns): Don't delete instructions with
	a REG_CFA_RESTORE note.

---
 gcc/dce.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/gcc/dce.c b/gcc/dce.c
index ea3fb00..d510287 100644
--- a/gcc/dce.c
+++ b/gcc/dce.c
@@ -587,6 +587,15 @@ delete_unmarked_insns (void)
 	  if (!dbg_cnt (dce))
 	    continue;
 
+	  if (crtl->shrink_wrapped_separate
+	      && find_reg_note (insn, REG_CFA_RESTORE, NULL))
+	    {
+	      if (dump_file)
+		fprintf (dump_file, "DCE: NOT deleting insn %d, it's a "
+			 "callee-save restore\n", INSN_UID (insn));
+	      continue;
+	    }
+
 	  if (dump_file)
 	    fprintf (dump_file, "DCE: Deleting insn %d\n",
 		     INSN_UID (insn));
-- 
1.9.3
[PATCH 1/6] separate shrink-wrap: New command-line flag, status flag, hooks, and doc
This patch adds a new command-line flag "-fshrink-wrap-separate", a status flag "shrink_wrapped_separate", hooks for abstracting the target components, and documentation for all those. 2016-06-07 Segher Boessenkool * common.opt (-fshrink-wrap-separate): New flag. * doc/invoke.texi: Document it. * doc/tm.texi.in (Shrink-wrapping separate components): New subsection. * doc/tm.texi: Regenerate. * emit-rtl.h (struct rtl_data): New field shrink_wrapped_separate. * target.def (shrink_wrap): New hook vector. (get_separate_components, components_for_bb, disqualify_components, emit_prologue_components, emit_epilogue_components, set_handled_components): New hooks. --- gcc/common.opt | 4 gcc/doc/invoke.texi | 11 +- gcc/doc/tm.texi | 63 + gcc/doc/tm.texi.in | 38 gcc/emit-rtl.h | 4 gcc/target.def | 57 6 files changed, 176 insertions(+), 1 deletion(-) diff --git a/gcc/common.opt b/gcc/common.opt index 0e01577..971f296 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2197,6 +2197,10 @@ Common Report Var(flag_shrink_wrap) Optimization Emit function prologues only before parts of the function that need it, rather than at the top of the function. +fshrink-wrap-separate +Common Report Var(flag_shrink_wrap_separate) Init(1) Optimization +Shrink-wrap parts of the prologue and epilogue separately. + fsignaling-nans Common Report Var(flag_signaling_nans) Optimization SetByCombined Disable optimizations observable by IEEE signaling NaNs. diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 6767462..7a167a64 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -399,7 +399,8 @@ Objective-C and Objective-C++ Dialects}. 
-fschedule-insns -fschedule-insns2 -fsection-anchors @gol -fselective-scheduling -fselective-scheduling2 @gol -fsel-sched-pipelining -fsel-sched-pipelining-outer-loops @gol --fsemantic-interposition -fshrink-wrap -fsignaling-nans @gol +-fsemantic-interposition -fshrink-wrap -fshrink-wrap-separate @gol +-fsignaling-nans @gol -fsingle-precision-constant -fsplit-ivs-in-unroller @gol -fsplit-paths @gol -fsplit-wide-types -fssa-backprop -fssa-phiopt @gol @@ -6590,6 +6591,7 @@ compilation time. -fmove-loop-invariants @gol -freorder-blocks @gol -fshrink-wrap @gol +-fshrink-wrap-separate @gol -fsplit-wide-types @gol -fssa-backprop @gol -fssa-phiopt @gol @@ -7500,6 +7502,13 @@ Emit function prologues only before parts of the function that need it, rather than at the top of the function. This flag is enabled by default at @option{-O} and higher. +@item -fshrink-wrap-separate +@opindex fshrink-wrap-separate +Shrink-wrap separate parts of the prologue and epilogue separately, so that +those parts are only executed when needed. +This option is on by default, but has no effect unless @option{-fshrink-wrap} +is also turned on and the target supports this. + @item -fcaller-saves @opindex fcaller-saves Enable allocation of values to registers that are clobbered by diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 8a98ba4..e74ae47 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -2924,6 +2924,7 @@ This describes the stack layout and calling conventions. * Function Entry:: * Profiling:: * Tail Calls:: +* Shrink-wrapping separate components:: * Stack Smashing Protection:: * Miscellaneous Register Hooks:: @end menu @@ -4853,6 +4854,68 @@ This hook should add additional registers that are computed by the prologue to t True if a function's return statements should be checked for matching the function's return type. This includes checking for falling off the end of a non-void function. Return false if no such check should be made. 
@end deftypefn +@node Shrink-wrapping separate components +@subsection Shrink-wrapping separate components +@cindex shrink-wrapping separate components + +The prologue may perform a variety of target dependent tasks such as +saving callee-saved registers, saving the return address, aligning the +stack, creating a stack frame, initializing the PIC register, setting +up the static chain, etc. + +On some targets some of these tasks may be independent of others and +thus may be shrink-wrapped separately. These independent tasks are +referred to as components and are handled generically by the target +independent parts of GCC. + +Using the following hooks those prologue or epilogue components can be +shrink-wrapped separately, so that the initialization (and possibly +teardown) those components do is not done as frequently on execution +paths where this would unnecessary. + +What exactly those components are is up to the target code; the generic +code treats them abstractly, as a bit in an @code{sbitmap}. These +@code{sbitmap}s are allocated by the @code{shrink_wrap.get_separate_components}
[gomp4] update gfortran's tile clause error handling
This patch updates the fortran FE to generate errors, rather than
warnings, for non-positive integer tile clause arguments. I noticed this
problem when I ported over the C/C++ compile-time test cases to fortran.
In addition to the two new test files, a couple of other existing tests
needed to be updated to accommodate this new behavior.

I've applied it to gomp-4_0-branch.

Nathan, I haven't looked too deeply into your tile changes yet. Do you
know if the fortran FE is doing anything wrong? I haven't checked if it's
lowering the tile clause in the proper format yet.

Cesar

2016-10-03  Cesar Philippidis

	gcc/fortran/
	* openmp.c (resolve_oacc_positive_int_expr): Promote the warning
	to an error.

	gcc/testsuite/
	* gfortran.dg/goacc/loop-2.f95: Change expected tile clause
	warnings to errors.
	* gfortran.dg/goacc/loop-5.f95: Likewise.
	* gfortran.dg/goacc/sie.f95: Likewise.
	* gfortran.dg/goacc/tile-1.f90: New test.
	* gfortran.dg/goacc/tile-2.f90: New test.

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 92b9afe..399b5d1 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -3266,8 +3266,8 @@ resolve_oacc_positive_int_expr (gfc_expr *expr, const char *clause)
   resolve_oacc_scalar_int_expr (expr, clause);
   if (expr->expr_type == EXPR_CONSTANT && expr->ts.type == BT_INTEGER
       && mpz_sgn(expr->value.integer) <= 0)
-    gfc_warning (0, "INTEGER expression of %s clause at %L must be positive",
-                 clause, &expr->where);
+    gfc_error ("INTEGER expression of %s clause at %L must be positive",
+               clause, &expr->where);
 }
 
 /* Emits error when symbol is pointer, cray pointer or cray pointee
diff --git a/gcc/testsuite/gfortran.dg/goacc/loop-2.f95 b/gcc/testsuite/gfortran.dg/goacc/loop-2.f95
index 0c902b2..d4c6273 100644
--- a/gcc/testsuite/gfortran.dg/goacc/loop-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/loop-2.f95
@@ -143,7 +143,7 @@ program test
 DO j = 1,10
 ENDDO
 ENDDO
-!$acc loop tile(-1) ! { dg-warning "must be positive" }
+!$acc loop tile(-1) ! { dg-error "must be positive" }
 do i = 1,10
 enddo
 !$acc loop tile(i) ! { dg-error "constant expression" }
@@ -307,7 +307,7 @@ program test
 DO j = 1,10
 ENDDO
 ENDDO
-!$acc loop tile(-1) ! { dg-warning "must be positive" }
+!$acc loop tile(-1) ! { dg-error "must be positive" }
 do i = 1,10
 enddo
 !$acc loop tile(i) ! { dg-error "constant expression" }
@@ -460,7 +460,7 @@ program test
   DO j = 1,10
   ENDDO
   ENDDO
-  !$acc kernels loop tile(-1) ! { dg-warning "must be positive" }
+  !$acc kernels loop tile(-1) ! { dg-error "must be positive" }
   do i = 1,10
   enddo
   !$acc kernels loop tile(i) ! { dg-error "constant expression" }
@@ -612,7 +612,7 @@ program test
   DO j = 1,10
   ENDDO
   ENDDO
-  !$acc parallel loop tile(-1) ! { dg-warning "must be positive" }
+  !$acc parallel loop tile(-1) ! { dg-error "must be positive" }
   do i = 1,10
   enddo
   !$acc parallel loop tile(i) ! { dg-error "constant expression" }
diff --git a/gcc/testsuite/gfortran.dg/goacc/loop-5.f95 b/gcc/testsuite/gfortran.dg/goacc/loop-5.f95
index d059cf7..fe137d5 100644
--- a/gcc/testsuite/gfortran.dg/goacc/loop-5.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/loop-5.f95
@@ -93,9 +93,6 @@ program test
 DO j = 1,10
 ENDDO
 ENDDO
-!$acc loop tile(-1) ! { dg-warning "must be positive" }
-do i = 1,10
-enddo
 !$acc loop vector tile(*)
 DO i = 1,10
 ENDDO
@@ -129,9 +126,6 @@ program test
 DO j = 1,10
 ENDDO
 ENDDO
-!$acc loop tile(-1) ! { dg-warning "must be positive" }
-do i = 1,10
-enddo
 !$acc loop vector tile(*)
 DO i = 1,10
 ENDDO
@@ -242,9 +236,6 @@ program test
   DO j = 1,10
   ENDDO
   ENDDO
-  !$acc kernels loop tile(-1) ! { dg-warning "must be positive" }
-  do i = 1,10
-  enddo
   !$acc kernels loop vector tile(*)
   DO i = 1,10
   ENDDO
@@ -333,9 +324,6 @@ program test
   DO j = 1,10
   ENDDO
   ENDDO
-  !$acc parallel loop tile(-1) ! { dg-warning "must be positive" }
-  do i = 1,10
-  enddo
   !$acc parallel loop vector tile(*)
   DO i = 1,10
   ENDDO
diff --git a/gcc/testsuite/gfortran.dg/goacc/sie.f95 b/gcc/testsuite/gfortran.dg/goacc/sie.f95
index 2d66026..b4dd9ed 100644
--- a/gcc/testsuite/gfortran.dg/goacc/sie.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/sie.f95
@@ -78,10 +78,10 @@ program test
   !$acc parallel num_gangs(i+1)
   !$acc end parallel
 
-  !$acc parallel num_gangs(-1) ! { dg-warning "must be positive" }
+  !$acc parallel num_gangs(-1) ! { dg-error "must be positive" }
   !$acc end parallel
 
-  !$acc parallel num_gangs(0) ! { dg-warning "must be positive" }
+  !$acc parallel num_gangs(0) ! { dg-error "must be positive" }
   !$acc end parallel
 
   !$acc parallel num_gangs() ! { dg-error "Invalid character in name" }
@@ -107,10 +107,10 @@ program test
   !$acc parallel num_workers(i+1)
   !$acc end parallel
 
-  !$acc parallel num_workers(-1
Re: [PATCH] Fix -Wimplicit-fallthrough -C, handle some more comment styles and comments in between FALLTHRU comment and label
> "Eric" == Eric Botcazou writes:

Eric> So, because of its excessive pickiness, the warning ends up making the user
Eric> butcher informative comments.  How is that helpful?

Those comments are not informative.  In most cases I kept the original
text just to forestall complaints.  But really if you read those
comments they are pointless.

That said, it would be better by far to have a mode where only the
attribute is accepted, and where comments aren't parsed.  Then gcc could
also warn when the attribute is used incorrectly.  The reason this is
preferable is that it helps protect against more errors, say those
introduced by merge mistakes.

Tom
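For readers unfamiliar with the attribute form mentioned above, here is a minimal sketch of a switch that marks intentional fall-through with the statement attribute instead of a comment. The FALLTHROUGH macro and the cleanup_steps function are names made up for this illustration; the guard assumes a compiler that provides __has_attribute and falls back to a no-op otherwise.

```c
/* FALLTHROUGH is a hypothetical helper macro for this sketch: it expands
   to the statement attribute when the compiler advertises it, and to a
   harmless no-op expression otherwise.  */
#if defined(__has_attribute)
# if __has_attribute(fallthrough)
#  define FALLTHROUGH __attribute__((fallthrough))
# endif
#endif
#ifndef FALLTHROUGH
# define FALLTHROUGH ((void) 0)
#endif

/* Count how many cleanup steps run when starting at STAGE.  Each stage
   deliberately continues into the next lower one.  */
int cleanup_steps(int stage)
{
  int steps = 0;

  switch (stage)
    {
    case 2:
      steps++;
      FALLTHROUGH;   /* intentional: stage 2 also performs stage 1 */
    case 1:
      steps++;
      FALLTHROUGH;   /* intentional: stage 1 also performs stage 0 */
    case 0:
      steps++;
      break;
    default:
      break;
    }
  return steps;
}
```

Unlike a comment, the attribute is part of the token stream, so it is not affected by options that keep or drop comments during preprocessing.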
Re: [gomp4] update gfortran's tile clause error handling
On 10/03/16 10:07, Cesar Philippidis wrote:

Nathan, I haven't looked too deeply into your tile changes yet. Do you
know if the fortran FE is doing anything wrong? I haven't checked if it's
lowering the tile clause in the proper format yet.

thanks for working on this.  The problems I noticed (& fixed) in the
C/C++ frontends were

1) map '*' onto integer_zero_node -- this makes my changes cleaner.

2) should only accept integer constant expressions (whatever the fortran
equivalent of that is).  While runtime values could be made to work, the
std doesn't require that, and it would perform quite badly due to the
lack of constant folding.

3) failing to parse nested loops correctly.  It only parsed the outermost
loop as a parallel loop.  Tile in many ways looks like collapse.

If those could be addressed that'd be great -- it doesn't need my tile
WIP to do that.

--
Nathan Sidwell
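The argument rules discussed in this thread ('*' maps to zero, only constants are accepted, non-positive constants are errors) can be sketched outside the compiler as a small classifier. This is not front-end code: check_tile_arg and the tile_status codes are hypothetical, and treating "constant" as "decimal integer literal" is a simplification made only for the example.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result codes for this sketch.  */
enum tile_status { TILE_OK, TILE_NOT_CONSTANT, TILE_NOT_POSITIVE };

/* Classify one tile() argument given as text.  On TILE_OK, *SIZE is 0 for
   '*' (mirroring the "map '*' onto zero" convention) or the positive
   tile size.  Anything that is not a decimal literal is rejected as
   non-constant.  */
enum tile_status check_tile_arg(const char *arg, long *size)
{
  char *end;
  long value;

  if (strcmp(arg, "*") == 0)
    {
      *size = 0;
      return TILE_OK;
    }

  errno = 0;
  value = strtol(arg, &end, 10);
  if (end == arg || *end != '\0' || errno == ERANGE)
    return TILE_NOT_CONSTANT;   /* e.g. a variable name such as "i" */
  if (value <= 0)
    return TILE_NOT_POSITIVE;   /* an error rather than a warning */

  *size = value;
  return TILE_OK;
}
```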
Re: [PATCH, RFC] gcov: dump in a static dtor instead of in an atexit handler
On 10/03/2016 03:03 PM, Rainer Orth wrote:
> Hi Martin,
>
>> On 09/30/2016 02:31 PM, Rainer Orth wrote:
>>> this would be i386-pc-solaris2.12.  I'm not sure if the constructor
>>> priority detection works in a cross scenario.
>>>
>>> I'm attaching the resulting assembly (although for Solaris as, the gas
>>> build is still running).
>>
>> Hi. Sorry, I have a stupid mistake in dtor priority
>> (I used 65534 instead of desired 99). Please try to test it on Solaris 12
>> with the attached patch. I'll send the patch to ML soon.
>
> unfortunately, the patch makes no difference on Solaris 12.  The test
> even FAILs when using gas/gld, which is a different/independent
> implementation of constructor priority.

Ok, can you please send me x.S file for Solaris 12?

>
>> Can you please test whether it makes any change on a solaris target w/o
>> prioritized ctors/dtors?
>
> It doesn't: the test PASSes on Solaris 10 and 11 with and without your
> patch.

I see, that would require the former approach using atexit, which would be
chosen depending on whether target supports prioritized dtors or not.

Martin

>
> Rainer
>
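The two strategies mentioned above (a prioritized static destructor where the target supports it, an atexit handler otherwise) can be sketched as follows. This is only an illustration, not the gcov patch: HAVE_PRIORITIZED_DTORS, dump_counters and install_dump_hook are hypothetical names, and the sketch uses priority 101 because priorities 0 to 100 are reserved for the implementation.

```c
#include <stdlib.h>

/* Stand-in for the real work of writing out the counters.  */
static int dump_calls;

static void dump_counters(void)
{
  dump_calls++;
}

#ifdef HAVE_PRIORITIZED_DTORS
/* Target supports prioritized destructors: let the runtime call the dump
   routine from a static destructor.  Nothing to register at run time.  */
__attribute__((destructor(101)))
static void dump_from_dtor(void)
{
  dump_counters();
}

int install_dump_hook(void)
{
  return 0;
}
#else
/* Fallback: register the dump routine as an atexit handler.  Returns 0 on
   success, nonzero if registration failed.  */
int install_dump_hook(void)
{
  return atexit(dump_counters);
}
#endif
```

Which branch is compiled is decided at build time; in a real configuration that macro would come from a configure-time check rather than being set by hand.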
[PATCH] Fix libstdc++ versioned namespace build
The versioned namespace build has been broken on all branches for some
time. It's due to new code that doesn't use the namespace macros in
the right places. This fixes all issues.

Rather than declaring the std::experimental::* namespaces in I've added
a new file that declares them and is only included by LFTS headers. That
allows the new test to pass, which verifies that the std::experimental
namespace doesn't exist when no TS headers are included.

	PR libstdc++/68323
	PR libstdc++/77794
	* config/abi/pre/gnu-versioned-namespace.ver: Add exports for
	__cxa_thread_atexit and __gnu_cxx::__freeres.
	* include/Makefile.am: Add
	* include/Makefile.in: Regenerate.
	* include/bits/basic_string.h: Fix nesting of versioned namespaces.
	* include/bits/c++config: Declare versioned namespaces for literals.
	* include/bits/regex.h (basic_regex, match_results): Add workarounds
	for PR c++/59256.
	* include/bits/uniform_int_dist.h: Fix nesting of versioned namespace.
	* include/std/chrono: Likewise.
	* include/std/complex: Likewise.
	* include/std/string_view: Likewise.
	* include/std/variant: Likewise. Add workaround for PR c++/59256.
	* include/experimental/bits/fs_fwd.h: Declare versioned namespace.
	* include/experimental/bits/lfts_config.h: Declare versioned
	namespaces.
	* include/experimental/algorithm: Include .
	* include/experimental/any: Likewise.
	* include/experimental/bits/erase_if.h: Likewise.
	* include/experimental/chrono: Likewise.
	* include/experimental/functional: Likewise.
	* include/experimental/memory_resource: Likewise.
	* include/experimental/optional: Likewise.
	* include/experimental/propagate_const: Likewise.
	* include/experimental/random: Likewise.
	* include/experimental/ratio: Likewise.
	* include/experimental/system_error: Likewise.
	* include/experimental/tuple: Likewise.
	* include/experimental/type_traits: Likewise.
	* include/experimental/utility: Likewise.
	* include/experimental/string_view: Likewise. Fix nesting of
	versioned namespaces.
	* include/experimental/bits/string_view.tcc: Reopen inline namespace
	for non-inline function definitions.
	* testsuite/17_intro/using_namespace_std_exp_neg.cc: New test.
	* testsuite/20_util/duration/literals/range.cc: Adjust dg-error line.
	* testsuite/experimental/any/misc/any_cast_neg.cc: Likewise.
	* testsuite/experimental/propagate_const/assignment/move_neg.cc:
	Likewise.
	* testsuite/experimental/propagate_const/cons/move_neg.cc: Likewise.
	* testsuite/experimental/propagate_const/requirements2.cc: Likewise.
	* testsuite/experimental/propagate_const/requirements3.cc: Likewise.
	* testsuite/experimental/propagate_const/requirements4.cc: Likewise.
	* testsuite/experimental/propagate_const/requirements5.cc: Likewise.
	* testsuite/ext/profile/mutex_extensions_neg.cc: Likewise.

Tested x86_64-linux, with --enable-symvers=gnu-versioned-namespace and
--enable-symvers=gnu, on trunk and gcc-6 and gcc-5 branches.

The only failures are in synopsis.cc tests which expect to be able to
redeclare names in namespace std (which is ambiguous if they're really
declared in std::__7) or in tests that use scan-assembler or GDB and
the expected strings are different due to the __7 namespace. I will
probably add an effective target for the versioned namespace so we can
disable those tests when they're going to fail.

Committed to trunk and gcc-6 and gcc-5 branches.

commit 7a3e391a33130d8cee8d763978b6fdc7b0ffd8ea
Author: redi
Date:   Mon Oct 3 14:35:28 2016 +

    Fix libstdc++ versioned namespace build

    	PR libstdc++/68323
    	PR libstdc++/77794
    	* config/abi/pre/gnu-versioned-namespace.ver: Add exports for
    	__cxa_thread_atexit and __gnu_cxx::__freeres.
    	* include/Makefile.am: Add
    	* include/Makefile.in: Regenerate.
    	* include/bits/basic_string.h: Fix nesting of versioned namespaces.
    	* include/bits/c++config: Declare versioned namespaces for literals.
    	* include/bits/regex.h (basic_regex, match_results): Add workarounds
    	for PR c++/59256.
    	* include/bits/uniform_int_dist.h: Fix nesting of versioned namespace.
    	* include/std/chrono: Likewise.
    	* include/std/complex: Likewise.
    	* include/std/string_view: Likewise.
    	* include/std/variant: Likewise. Add workaround for PR c++/59256.
    	* include/experimental/bits/fs_fwd.h: Declare versioned namespace.
    	* include/experimental/bits/lfts_config.h: Declare versioned
    	namespaces.
    	* include/experimental/algorithm: Include .
    	* include/experimental/any: Likewise.
    	* include/experimental/bits/erase_if.h: Likewise.
    	* inclu
Re: [PATCH, OpenACC, Fortran] Fix PR77371, ICE on allocatable
On Sun, Oct 02, 2016 at 06:15:18PM +0800, Chung-Lin Tang wrote:
> This patch fixes the two ICEs listed on PR77371.
> One is due to the Fortran omp_privatize_by_reference hook returning true
> for types like 'character(kind=1)[1:XX] *', causing them to be processed
> by the path intended for C++ reference types.

The path isn't something intended for C++ reference types, but for
all the vars where whatever they point to should be privatized rather than
just their value.  Consider

program p
  integer, allocatable :: n
  integer :: m
  allocate (n)
  n = 6
!$acc parallel firstprivate(n) private(m)
  m = n
!$acc end parallel
end

testcase which with -fopenacc ICEs the same way, and then look carefully
what is done on

program p
  integer, allocatable :: n
  integer :: m
  allocate (n)
  n = 6
!$omp parallel firstprivate(n) private(m)
  m = n
!$omp end parallel
end

with -fopenmp.  The var is actually properly allocatable in the latter
case, while it is not with your patch on the first testcase, you just copy
over the host pointer, that is definitely not going to work on non-shared
memory offloading.  There is nothing special about references that use
POINTER_TYPE as opposed to REFERENCE_TYPE.

So, please first get this working with firstprivate on allocatables and
only then start to play with reductions.

> The other one is simply not setting 'remove = true' while error_at() was
> already called.

The gimplify.c change is ok for trunk.

> Tested without regressions, committed on gomp-4_0-branch,
> is this okay for trunk as well?
>
> Thanks,
> Chung-Lin
>
>       PR fortran/77371
>       * omp-low.c (lower_omp_target): Avoid reference-type processing
>       on pointers for firstprivate clause.
>       * gimplify.c (gimplify_adjust_omp_clauses): Add 'remove = true'
>       when emitting error on private/firstprivate reductions.
>
>       testsuite/
>       * gfortran.dg/goacc/pr77371-1.f90: New test.
>       * gfortran.dg/goacc/pr77371-2.f90: New test.

	Jakub
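The distinction drawn above, between privatizing only a pointer value and privatizing what it points to, has a close analogue in plain C. The sketch below is only that analogy and says nothing about what the compiler actually emits; both function names are invented for the illustration.

```c
#include <stdlib.h>

/* Copy only the pointer value: the "private" copy still refers to the
   original storage.  On a device with separate memory such a host
   address would not be usable at all.  */
int *privatize_pointer_only(int *shared)
{
  return shared;
}

/* Privatize the pointee: allocate fresh storage and copy the value into
   it.  Returns NULL if allocation fails; the caller frees the result.  */
int *privatize_pointee(const int *shared)
{
  int *priv = malloc(sizeof *priv);

  if (priv != NULL)
    *priv = *shared;
  return priv;
}
```

Writing through the result of privatize_pointee leaves the original object untouched, which is the behaviour firstprivate is expected to provide for an allocatable.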
Re: [PATCH] Remove .jcr registry from the crtfiles
As usual when removing target macros they should be poisoned in system.h. -- Joseph S. Myers jos...@codesourcery.com
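As a reminder of what poisoning does, here is a minimal sketch using the preprocessor's poison pragma. EXAMPLE_OLD_TARGET_MACRO is a placeholder name for this illustration, not one of the macros removed by the patch under review.

```c
/* Once an identifier is poisoned, any later appearance of it in the
   token stream is a hard error, so stale uses cannot silently compile.
   EXAMPLE_OLD_TARGET_MACRO is a hypothetical name.  */
#pragma GCC poison EXAMPLE_OLD_TARGET_MACRO

/* Code that no longer refers to the removed macro compiles as usual.  */
int uses_new_interface(void)
{
  return 1;
}

/* A stale use such as the following would be rejected at compile time
   if it were moved out of this comment:
     int stale = EXAMPLE_OLD_TARGET_MACRO;  */
```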
[PATCH] Remove x86 pcommit instruction
Hi, this patch removes PCOMMIT instruction since it was deprecated, please visit https://software.intel.com/en-us/blogs/2016/09/12/deprecate-pcommit-instruction for details. Regtested on x86_64. Is it Ok for trunk? 2016-10-03 Andrew Senkevich gcc/ * common/config/i386/i386-common.c (OPTION_MASK_ISA_PCOMMIT_UNSET, OPTION_MASK_ISA_PCOMMIT_SET): Deleted definitions. (ix86_handle_option): Deleted handle of OPT_mpcommit. * config.gcc: Deleted pcommitintrin.h * config/i386/pcommitintrin.h: Deleted file. * config/i386/cpuid.h (bit_PCOMMIT): Deleted. * config/i386/driver-i386.c (host_detect_local_cpu): Deleted pcommit detection. * config/i386/i386-c.c (ix86_target_macros_internal): Deleted define __PCOMMIT__. * config/i386/i386.c (ix86_target_string): Deleted -mpcommit. (PTA_PCOMMIT): Deleted define. (ix86_option_override_internal): Deleted handle of option. (ix86_valid_target_attribute_inner_p): Deleted pcommit. * config/i386/i386-builtin.def (IX86_BUILTIN_PCOMMIT, __builtin_ia32_pcommit): Deleted. * config/i386/i386.h (TARGET_PCOMMIT, TARGET_PCOMMIT_P): Deleted. * config/i386/i386.md (unspecv): Deleted UNSPECV_PCOMMIT. (pcommit): Deleted instruction. * config/i386/i386.opt: Deleted mpcommit. * config/i386/x86intrin.h: Deleted inclusion of pcommitintrin.h. gcc/testsuite/ * gcc.target/i386/pcommit-1.c: Deleted. * gcc.target/i386/sse-12.c: Deleted -mpcommit option. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * g++.dg/other/i386-2.C: Ditto. * g++.dg/other/i386-3.C: Ditto. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 2b771d1..0728a9d 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,35 @@ +2016-10-03 Andrew Senkevich + + * common/config/i386/i386-common.c (OPTION_MASK_ISA_PCOMMIT_UNSET, + OPTION_MASK_ISA_PCOMMIT_SET): Deleted definitions. + (ix86_handle_option): Deleted handle of OPT_mpcommit. 
+ * config.gcc: Deleted pcommitintrin.h + * config/i386/pcommitintrin.h: Deleted. + * config/i386/cpuid.h (bit_PCOMMIT): Deleted. + * config/i386/driver-i386.c (host_detect_local_cpu): Deleted pcommit + detection. + * config/i386/i386-c.c (ix86_target_macros_internal): Deleted define + __PCOMMIT__. + * config/i386/i386.c (ix86_target_string): Deleted -mpcommit. + (PTA_PCOMMIT): Deleted define. + (ix86_option_override_internal): Deleted handle of option. + (ix86_valid_target_attribute_inner_p): Deleted pcommit. + * config/i386/i386-builtin.def (IX86_BUILTIN_PCOMMIT, + __builtin_ia32_pcommit): Deleted. + * config/i386/i386.h (TARGET_PCOMMIT, TARGET_PCOMMIT_P): Deleted. + * config/i386/i386.md (unspecv): Deleted UNSPECV_PCOMMIT. + (pcommit): Deleted instruction. + * config/i386/i386.opt: Add mpcommit. + * config/i386/x86intrin.h: Delete inclusion of pcommitintrin.h. + * testsuite/gcc.target/i386/pcommit-1.c: Deleted. + * gcc/testsuite/gcc.target/i386/sse-12.c: Deleted -pcommit option. + * gcc/testsuite/gcc.target/i386/sse-13.c: Ditto. + * gcc/testsuite/gcc.target/i386/sse-14.c: Ditto. + * gcc/testsuite/gcc.target/i386/sse-22.c: Ditto. + * gcc/testsuite/gcc.target/i386/sse-23.c: Ditto. + * gcc/testsuite/g++.dg/other/i386-2.C: Ditto. + * gcc/testsuite/g++.dg/other/i386-3.C: Ditto. + 2016-10-03 Kyrylo Tkachov Revert diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c index 4f0a55f..ce1b5f7 100644 --- a/gcc/common/config/i386/i386-common.c +++ b/gcc/common/config/i386/i386-common.c @@ -86,7 +86,6 @@ along with GCC; see the file COPYING3. If not see #define OPTION_MASK_ISA_XSAVEC_SET \ (OPTION_MASK_ISA_XSAVEC | OPTION_MASK_ISA_XSAVE) #define OPTION_MASK_ISA_CLWB_SET OPTION_MASK_ISA_CLWB -#define OPTION_MASK_ISA_PCOMMIT_SET OPTION_MASK_ISA_PCOMMIT /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same as -msse4.2. */ @@ -187,7 +186,6 @@ along with GCC; see the file COPYING3. 
If not see #define OPTION_MASK_ISA_CLFLUSHOPT_UNSET OPTION_MASK_ISA_CLFLUSHOPT #define OPTION_MASK_ISA_XSAVEC_UNSET OPTION_MASK_ISA_XSAVEC #define OPTION_MASK_ISA_XSAVES_UNSET OPTION_MASK_ISA_XSAVES -#define OPTION_MASK_ISA_PCOMMIT_UNSET OPTION_MASK_ISA_PCOMMIT #define OPTION_MASK_ISA_CLWB_UNSET OPTION_MASK_ISA_CLWB #define OPTION_MASK_ISA_MWAITX_UNSET OPTION_MASK_ISA_MWAITX #define OPTION_MASK_ISA_CLZERO_UNSET OPTION_MASK_ISA_CLZERO @@ -933,19 +931,6 @@ ix86_handle_option (struct gcc_options *opts, } return true; -case OPT_mpcommit: - if (value) - { - opts->x_ix86_isa_flags |= OPTION_MASK_ISA_PCOMMIT_SET; - o
Re: [patch] Fix ICE on ACATS test for Aarch64 at -O
Ping: https://gcc.gnu.org/ml/gcc-patches/2016-09/msg01781.html

> 2016-09-26  Eric Botcazou
>
>       * expmed.c (expand_shift_1): Add MAY_FAIL parameter and do not assert
>       that the result is non-zero if it is true.
>       (maybe_expand_shift): New wrapper around expand_shift_1.
>       (emit_store_flag): Call maybe_expand_shift in lieu of expand_shift.

--
Eric Botcazou
RE: Fix PR tree-optimization/77808, ICE in duplicate_ssa_name_ptr_info, at tree-ssanames.c:630 starting with r240439
>From: Christophe Lyon [christophe.l...@linaro.org]
>Sent: Monday, October 03, 2016 12:05 AM
>To: Doug Gilmore
>Cc: gcc-patches@gcc.gnu.org
>Subject: Re: Fix PR tree-optimization/77808, ICE in
>duplicate_ssa_name_ptr_info, at tree-ssanames.c:630 starting with r240439
>
>On 2 October 2016 at 23:05, Doug Gilmore wrote:
>> Hi Christophe,
>>
>>> From: Christophe Lyon [christophe.l...@linaro.org]
>>> Sent: Saturday, October 01, 2016 7:57 AM
>>> To: Doug Gilmore
>>> Cc: gcc-patches@gcc.gnu.org
>>> Subject: Re: Fix PR tree-optimization/77808, ICE in
>>> duplicate_ssa_name_ptr_info, at tree-ssanames.c:630 starting with r240439
>>>
>>> Hi Doug,
>>>
>>> ...
>>> I can confirm that your patch fixes the ICE I was seeing.
>>>
>>> However, the new testcase does not pass on low end
>>> architectures:
>>> cc1: warning: -fprefetch-loop-arrays not supported for this target
>>> (try -march switches)
>>>
>>> Can you add a guard?
>>>
>>> Thanks,
>>>
>>> Christophe
>> I updated the test to only run on X86, MIPS and AARCH64.  Is that OK?
>>
>
>I'm afraid not.
>
>The ICE occurred on some arm targets. By "low end" I meant armv5t for
>example, as opposed to armv7t.
>Is there a suitable effective target?

I'll need to investigate that.

BTW, gcc.dg/pr53550.c contains:

/* PR tree-optimization/53550 */
/* { dg-do compile } */
/* { dg-options "-O2 -fprefetch-loop-arrays -w" } */

int *
foo (int *x)
{
  int *a = x + 10, *b = x, *c = a;
  while (b != c)
    *--c = *b++;
  return x;
}

Is it also failing on armv5t?  I suppose it would.

Thanks,

Doug
>
>Thanks,
>
>Christophe
>
>> Thanks,
>>
>> Doug
Re: [PATCH, ARM 2/7, ping] Adapt atomic and exclusive load and store to ARMv8-M Baseline
Ping?

Best regards,

Thomas

On 22/09/16 14:41, Thomas Preudhomme wrote:
Hi,

This patch is part of a patch series to add support for atomic operations on
ARMv8-M Baseline targets in GCC. This specific patch adapts atomic and
exclusive load and store patterns to the constraints of ARMv8-M Baseline.
It consists of two sets of changes:
- adding non-predicated output templates, because ARMv8-M Baseline does not
  have the IT instruction
- using low registers for ldr/str

Together these changes require creating 2 new alternatives for atomic_load
and atomic_store: (i) one for relaxed, consume and release memory model (the
new Pf constraint) where ldr/str are used and thus low registers must be
used, and (ii) another one for the other memory models, where lda/stl are
used. These are separate from the constraint for 32bit targets, whose output
templates expect predication.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2016-07-05  Thomas Preud'homme

	* config/arm/constraints.md (Q constraint): Document its use for
	Thumb-1.
	(Pf constraint): New constraint for relaxed, consume or release
	memory models.
	* config/arm/sync.md (atomic_load): Add new ARMv8-M Baseline only
	alternatives to allow any register when memory model matches Pf and
	thus lda is used, but only low registers otherwise.  Use unpredicated
	output template for Thumb-1 targets.
	(atomic_store): Likewise for stl.
	(arm_load_exclusive): Add new ARMv8-M Baseline only alternative whose
	output template does not have predication.
	(arm_load_acquire_exclusive): Likewise.
	(arm_load_exclusivesi): Likewise.
	(arm_load_acquire_exclusivesi): Likewise.
	(arm_store_release_exclusive): Likewise.
	(arm_store_exclusive): Use unpredicated output template for Thumb-1
	targets.

Testing: No code generation difference for ARMv7-A, ARMv7VE and ARMv8-A on
all atomic and synchronization testcases in the testsuite [2].
Patchset was also bootstrapped with --enable-itm --enable-gomp on ARMv8-A in ARM and Thumb mode at optimization level -O1 and above [1] without any regression in the testsuite and no code generation difference in libitm and libgomp. Code generation for ARMv8-M Baseline has been manually examined and compared against ARMv8-A Thumb-2 for the following configuration without finding any issue: gcc.dg/atomic-op-2.c at -Os gcc.dg/atomic-compare-exchange-2.c at -Os gcc.dg/atomic-compare-exchange-3.c at -O3 Is this ok for trunk? Best regards, Thomas [1] CFLAGS_FOR_TARGET and CXXFLAGS_FOR_TARGET were set to "-O1 -g", "-O3 -g" and undefined ("-O2 -g") [2] The exact list is: gcc/testsuite/gcc.dg/atomic-compare-exchange-1.c gcc/testsuite/gcc.dg/atomic-compare-exchange-2.c gcc/testsuite/gcc.dg/atomic-compare-exchange-3.c gcc/testsuite/gcc.dg/atomic-exchange-1.c gcc/testsuite/gcc.dg/atomic-exchange-2.c gcc/testsuite/gcc.dg/atomic-exchange-3.c gcc/testsuite/gcc.dg/atomic-fence.c gcc/testsuite/gcc.dg/atomic-flag.c gcc/testsuite/gcc.dg/atomic-generic.c gcc/testsuite/gcc.dg/atomic-generic-aux.c gcc/testsuite/gcc.dg/atomic-invalid-2.c gcc/testsuite/gcc.dg/atomic-load-1.c gcc/testsuite/gcc.dg/atomic-load-2.c gcc/testsuite/gcc.dg/atomic-load-3.c gcc/testsuite/gcc.dg/atomic-lockfree.c gcc/testsuite/gcc.dg/atomic-lockfree-aux.c gcc/testsuite/gcc.dg/atomic-noinline.c gcc/testsuite/gcc.dg/atomic-noinline-aux.c gcc/testsuite/gcc.dg/atomic-op-1.c gcc/testsuite/gcc.dg/atomic-op-2.c gcc/testsuite/gcc.dg/atomic-op-3.c gcc/testsuite/gcc.dg/atomic-op-6.c gcc/testsuite/gcc.dg/atomic-store-1.c gcc/testsuite/gcc.dg/atomic-store-2.c gcc/testsuite/gcc.dg/atomic-store-3.c gcc/testsuite/g++.dg/ext/atomic-1.C gcc/testsuite/g++.dg/ext/atomic-2.C gcc/testsuite/gcc.target/arm/atomic-comp-swap-release-acquire.c gcc/testsuite/gcc.target/arm/atomic-op-acq_rel.c gcc/testsuite/gcc.target/arm/atomic-op-acquire.c gcc/testsuite/gcc.target/arm/atomic-op-char.c gcc/testsuite/gcc.target/arm/atomic-op-consume.c 
gcc/testsuite/gcc.target/arm/atomic-op-int.c gcc/testsuite/gcc.target/arm/atomic-op-relaxed.c gcc/testsuite/gcc.target/arm/atomic-op-release.c gcc/testsuite/gcc.target/arm/atomic-op-seq_cst.c gcc/testsuite/gcc.target/arm/atomic-op-short.c gcc/testsuite/gcc.target/arm/atomic_loaddi_1.c gcc/testsuite/gcc.target/arm/atomic_loaddi_2.c gcc/testsuite/gcc.target/arm/atomic_loaddi_3.c gcc/testsuite/gcc.target/arm/atomic_loaddi_4.c gcc/testsuite/gcc.target/arm/atomic_loaddi_5.c gcc/testsuite/gcc.target/arm/atomic_loaddi_6.c gcc/testsuite/gcc.target/arm/atomic_loaddi_7.c gcc/testsuite/gcc.target/arm/atomic_loaddi_8.c gcc/testsuite/gcc.target/arm/atomic_loaddi_9.c gcc/testsuite/gcc.target/arm/sync-1.c gcc/testsuite/gcc.target/arm/synchronize.c gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c libstdc++-v3/testsuite/29_atomics/a
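For context, the source-level operations that end up in the atomic_load and atomic_store patterns look like the sketch below, written with the GCC __atomic builtins. Going by the description in this message, the relaxed forms can be served by plain ldr/str while the acquire load and release store need lda/stl; the instruction actually chosen depends on the target and is not something this sketch can show. The wrapper names are invented for the example.

```c
#include <stdint.h>

/* Relaxed accesses: atomicity only, no ordering constraints.  */
uint32_t load_relaxed(const uint32_t *p)
{
  return __atomic_load_n(p, __ATOMIC_RELAXED);
}

void store_relaxed(uint32_t *p, uint32_t v)
{
  __atomic_store_n(p, v, __ATOMIC_RELAXED);
}

/* Acquire load: later accesses cannot be reordered before it.  */
uint32_t load_acquire(const uint32_t *p)
{
  return __atomic_load_n(p, __ATOMIC_ACQUIRE);
}

/* Release store: earlier accesses cannot be reordered after it.  */
void store_release(uint32_t *p, uint32_t v)
{
  __atomic_store_n(p, v, __ATOMIC_RELEASE);
}
```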
Re: [PATCH][v4] GIMPLE store merging pass
On October 3, 2016 3:02:04 PM GMT+02:00, Kyrill Tkachov wrote:
>Hi Richard,
>another question as I'm working through your comments...
>
>On 29/09/16 11:45, Richard Biener wrote:
>>
>>> +      /* The region from the byte array that we're inserting into.  */
>>> +      tree ptr_wide_int
>>> +        = native_interpret_expr (dest_int_type, ptr + first_byte,
>>> +                                 total_bytes);
>>> +
>>> +      gcc_assert (ptr_wide_int);
>>> +      wide_int dest_wide_int
>>> +        = wi::to_wide (ptr_wide_int, TYPE_PRECISION (dest_int_type));
>>> +      wide_int expr_wide_int
>>> +        = wi::to_wide (tmp_int, byte_size * BITS_PER_UNIT);
>>> +      if (BYTES_BIG_ENDIAN)
>>> +        {
>>> +          unsigned int insert_pos
>>> +            = byte_size * BITS_PER_UNIT - bitlen - (bitpos % BITS_PER_UNIT);
>>> +          dest_wide_int
>>> +            = wi::insert (dest_wide_int, expr_wide_int, insert_pos, bitlen);
>>> +        }
>>> +      else
>>> +        dest_wide_int = wi::insert (dest_wide_int, expr_wide_int,
>>> +                                    bitpos % BITS_PER_UNIT, bitlen);
>>> +
>>> +      tree res = wide_int_to_tree (dest_int_type, dest_wide_int);
>>> +      native_encode_expr (res, ptr + first_byte, total_bytes, 0);
>>> +
>> OTOH this whole dance looks as complicated and way more expensive than
>> using native_encode_expr into a temporary buffer and then a
>> manually implemented "bit-merging" of it at ptr + first_byte + bitpos.
>> AFAICS that operation is even endianness agnostic.
>
>If the quantity we're inserting at a non-byte boundary
>is more than a byte wide we still have to shift the value
>to position properly across the bytes it straddles, so I don't
>see how we can avoid creating a wide_int here.
>Consider inserting a 10-bit value at bitposition 3 (I hope the mailer
>doesn't screw up the indentation):
>value:             xxxxxxxxxx
>before: |--------||--------|
>        | byte 1 || byte 2 |
>after:  |---xxxxx||xxxxx---|
>
>We'll native_encode_expr the value into a two-byte buffer but then we can't
>just shift each byte by 3 to insert it into the destination buffer, we need
>to form the whole 10-bit value and shift it as a whole to not lose any
>bits.

Native encode will encode into a byte array in target representation /
endianness.  I think you can work byte-wise by properly merging 'lost' bits
from adjacent bytes.  And you at most need 2 of them per 'target byte'.

>And if a value crosses bytes then we need to care about
>BYTES_BIG_ENDIAN when
>writing the bytes back into the buffer, no?

If you shift a larger-than-byte-size quantity on the host (wide-ints are in
host endianness) then you indeed need to watch out for endianness.  But as
we deal with target memory representation plus bit offsets into memory I
think it's natural to work with bytes.

Richard.

>Thanks,
>Kyrill
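The byte-wise merging suggested above can be sketched as a small standalone routine. This is not the store-merging pass: merge_bits_le and its parameters are invented for the illustration, and it assumes a little-endian layout in which bit 0 is the least significant bit of byte 0, so the big-endian case discussed in the thread is left out. Each destination byte is built from at most two source bytes, and a mask keeps the destination bits that lie outside the field.

```c
#include <stddef.h>
#include <stdint.h>

/* Insert the low BITLEN bits of the little-endian byte array SRC into
   DST, starting at bit offset BITPOS.  Bits of DST outside the range
   [BITPOS, BITPOS + BITLEN) are preserved.  Hypothetical helper.  */
void merge_bits_le(uint8_t *dst, const uint8_t *src,
                   size_t bitpos, size_t bitlen)
{
  size_t first = bitpos / 8;             /* first destination byte */
  size_t shift = bitpos % 8;             /* bit offset inside that byte */
  size_t end = shift + bitlen;           /* one past the last field bit */
  size_t src_bytes = (bitlen + 7) / 8;   /* bytes holding the value */
  size_t nbytes = (end + 7) / 8;         /* destination bytes touched */

  for (size_t i = 0; i < nbytes; i++)
    {
      /* Byte I of the shifted value takes its low bits from the top of
         source byte I - 1 and its high bits from source byte I.  */
      unsigned cur = i < src_bytes ? src[i] : 0u;
      unsigned prev = i > 0 ? src[i - 1] : 0u;
      unsigned merged = ((cur << shift) | (prev >> (8 - shift))) & 0xffu;

      /* Mask of the bits in this destination byte that belong to the
         field: [lo, hi) within the byte.  */
      size_t lo = i == 0 ? shift : 0;
      size_t hi = end - 8 * i < 8 ? end - 8 * i : 8;
      unsigned mask = ((1u << (hi - lo)) - 1u) << lo;

      dst[first + i] = (uint8_t) ((dst[first + i] & ~mask) | (merged & mask));
    }
}
```

For the 10-bit value at bit position 3 from the example above, an all-ones value leaves five set bits at the top of the first byte and five at the bottom of the second, and the three untouched bits on either side keep whatever they held before.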
Re: [PATCH, ARM 3/7, ping] Refactor atomic compare_and_swap to make it fit for ARMv8-M Baseline
Ping?

Best regards,

Thomas

On 22/09/16 14:44, Thomas Preudhomme wrote:
Hi,

This patch is part of a patch series to add support for atomic operations on
ARMv8-M Baseline targets in GCC. This specific patch refactors the expander
and splitter for atomics to make the logic work with ARMv8-M Baseline, which
has the limitations of Thumb-1 in terms of CC flag setting and different
conditional compare insn patterns.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2016-09-02  Thomas Preud'homme

	* config/arm/arm.c (arm_expand_compare_and_swap): Add new bdst local
	variable.  Add the new parameter to the insn generator.  Set that
	parameter to be CC flag for 32-bit targets, bval otherwise.  Set the
	return value from the negation of that parameter for Thumb-1, keeping
	the logic unchanged otherwise except for using bdst as the destination
	register of the compare_and_swap insn.
	(arm_split_compare_and_swap): Add explanation about how is the value
	returned to the function comment.  Rename scratch variable to
	neg_bval.  Adapt initialization of variables holding operands to the
	new operand numbers.  Use return register to hold result of store
	exclusive for Thumb-1, scratch register otherwise.  Construct the
	appropriate cbranch for Thumb-1 targets, keeping the logic unchanged
	for 32-bit targets.  Guard Z flag setting to restrict to 32bit targets.
	Use gen_cbranchsi4 rather than hand-written conditional branch to loop
	for strongly ordered compare_and_swap.
	* config/arm/predicates.md (cc_register_operand): New predicate.
	* config/arm/sync.md (atomic_compare_and_swap_1): Use a
	match_operand with the new predicate to accept either the CC flag or a
	destination register for the boolean return value, restricting it to
	CC flag only via constraint.  Adapt operand numbers accordingly.

Testing: No code generation difference for ARMv7-A, ARMv7VE and ARMv8-A on
all atomic and synchronization testcases in the testsuite [2].
Patchset was also bootstrapped with --enable-itm --enable-gomp on ARMv8-A in ARM and Thumb mode at optimization level -O1 and above [1] without any regression in the testsuite and no code generation difference in libitm and libgomp. Code generation for ARMv8-M Baseline has been manually examined and compared against ARMv8-A Thumb-2 for the following configuration without finding any issue: gcc.dg/atomic-op-2.c at -Os gcc.dg/atomic-compare-exchange-2.c at -Os gcc.dg/atomic-compare-exchange-3.c at -O3 Is this ok for trunk? Best regards, Thomas [1] CFLAGS_FOR_TARGET and CXXFLAGS_FOR_TARGET were set to "-O1 -g", "-O3 -g" and undefined ("-O2 -g") [2] The exact list is: gcc/testsuite/gcc.dg/atomic-compare-exchange-1.c gcc/testsuite/gcc.dg/atomic-compare-exchange-2.c gcc/testsuite/gcc.dg/atomic-compare-exchange-3.c gcc/testsuite/gcc.dg/atomic-exchange-1.c gcc/testsuite/gcc.dg/atomic-exchange-2.c gcc/testsuite/gcc.dg/atomic-exchange-3.c gcc/testsuite/gcc.dg/atomic-fence.c gcc/testsuite/gcc.dg/atomic-flag.c gcc/testsuite/gcc.dg/atomic-generic.c gcc/testsuite/gcc.dg/atomic-generic-aux.c gcc/testsuite/gcc.dg/atomic-invalid-2.c gcc/testsuite/gcc.dg/atomic-load-1.c gcc/testsuite/gcc.dg/atomic-load-2.c gcc/testsuite/gcc.dg/atomic-load-3.c gcc/testsuite/gcc.dg/atomic-lockfree.c gcc/testsuite/gcc.dg/atomic-lockfree-aux.c gcc/testsuite/gcc.dg/atomic-noinline.c gcc/testsuite/gcc.dg/atomic-noinline-aux.c gcc/testsuite/gcc.dg/atomic-op-1.c gcc/testsuite/gcc.dg/atomic-op-2.c gcc/testsuite/gcc.dg/atomic-op-3.c gcc/testsuite/gcc.dg/atomic-op-6.c gcc/testsuite/gcc.dg/atomic-store-1.c gcc/testsuite/gcc.dg/atomic-store-2.c gcc/testsuite/gcc.dg/atomic-store-3.c gcc/testsuite/g++.dg/ext/atomic-1.C gcc/testsuite/g++.dg/ext/atomic-2.C gcc/testsuite/gcc.target/arm/atomic-comp-swap-release-acquire.c gcc/testsuite/gcc.target/arm/atomic-op-acq_rel.c gcc/testsuite/gcc.target/arm/atomic-op-acquire.c gcc/testsuite/gcc.target/arm/atomic-op-char.c gcc/testsuite/gcc.target/arm/atomic-op-consume.c 
gcc/testsuite/gcc.target/arm/atomic-op-int.c gcc/testsuite/gcc.target/arm/atomic-op-relaxed.c gcc/testsuite/gcc.target/arm/atomic-op-release.c gcc/testsuite/gcc.target/arm/atomic-op-seq_cst.c gcc/testsuite/gcc.target/arm/atomic-op-short.c gcc/testsuite/gcc.target/arm/atomic_loaddi_1.c gcc/testsuite/gcc.target/arm/atomic_loaddi_2.c gcc/testsuite/gcc.target/arm/atomic_loaddi_3.c gcc/testsuite/gcc.target/arm/atomic_loaddi_4.c gcc/testsuite/gcc.target/arm/atomic_loaddi_5.c gcc/testsuite/gcc.target/arm/atomic_loaddi_6.c gcc/testsuite/gcc.target/arm/atomic_loaddi_7.c gcc/testsuite/gcc.target/arm/atomic_loaddi_8.c gcc/testsuite/gcc.target/arm/atomic_loaddi_9.c gcc/testsuite/gcc.target/arm/sync-1.c gcc/testsuite/gcc.target/arm/synchronize.c gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c gcc/
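At the source level, the operation whose expander and splitter are being refactored corresponds to a compare-and-swap such as the one sketched below with the GCC __atomic builtins. The boolean result is the "boolean return value" the ChangeLog talks about; how it is materialised (CC flag or register) is a target matter not visible here. cas_u32 and fetch_increment are names made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Strong compare-and-swap: if *PTR equals *EXPECTED, store DESIRED and
   return true; otherwise copy the current value into *EXPECTED and
   return false.  */
bool cas_u32(uint32_t *ptr, uint32_t *expected, uint32_t desired)
{
  return __atomic_compare_exchange_n(ptr, expected, desired,
                                     false /* strong, not weak */,
                                     __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

/* Typical use: an atomic increment built from a retry loop.  Returns the
   value observed before the increment.  */
uint32_t fetch_increment(uint32_t *ptr)
{
  uint32_t old = __atomic_load_n(ptr, __ATOMIC_RELAXED);

  while (!cas_u32(ptr, &old, old + 1))
    ;   /* on failure OLD was refreshed with the current value; retry */
  return old;
}
```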
Re: [PATCH, ARM 4/7, ping] Adapt atomic compare and swap to ARMv8-M Baseline
Ping?

Best regards,

Thomas

On 22/09/16 14:46, Thomas Preudhomme wrote:
Hi,

This patch is part of a patch series to add support for atomic operations on
ARMv8-M Baseline targets in GCC. This specific patch makes the necessary
change for compare and swap to work for ARMv8-M Baseline, doubleword integers
excepted. Namely, it adds Thumb-1 specific constraints to compare_and_swap.
The constraints are chosen so that once the pattern is split, the individual
instructions have their constraints respected. In particular, the constraints
for the cbranchsi4_* pattern must be duplicated here, which explains the use
of several alternatives.

Note: changes to enable other atomic operations are in the next patch of the
series.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2016-07-05  Thomas Preud'homme

	* config/arm/sync.md (atomic_compare_and_swap_1): Add new ARMv8-M
	Baseline only alternatives to (i) hold store atomic success value in a
	return register rather than a scratch register, (ii) use a low
	register for it and to (iii) ensure the cbranchsi insn generated by
	the split respect the constraints of Thumb-1 cbranchsi4_insn and
	cbranchsi4_scratch.
	* config/arm/thumb1.md (cbranchsi4_insn): Add comment to indicate
	constraints must match those in atomic_compare_and_swap.
	(cbranchsi4_scratch): Likewise.

Testing: No code generation difference for ARMv7-A, ARMv7VE and ARMv8-A on
all atomic and synchronization testcases in the testsuite [2]. Patchset was
also bootstrapped with --enable-itm --enable-gomp on ARMv8-A in ARM and
Thumb mode at optimization level -O1 and above [1] without any regression in
the testsuite and no code generation difference in libitm and libgomp.

Code generation for ARMv8-M Baseline has been manually examined and compared
against ARMv8-A Thumb-2 for the following configuration without finding any
issue:

gcc.dg/atomic-op-2.c at -Os
gcc.dg/atomic-compare-exchange-2.c at -Os
gcc.dg/atomic-compare-exchange-3.c at -O3

Is this ok for trunk?
Best regards, Thomas [1] CFLAGS_FOR_TARGET and CXXFLAGS_FOR_TARGET were set to "-O1 -g", "-O3 -g" and undefined ("-O2 -g") [2] The exact list is: gcc/testsuite/gcc.dg/atomic-compare-exchange-1.c gcc/testsuite/gcc.dg/atomic-compare-exchange-2.c gcc/testsuite/gcc.dg/atomic-compare-exchange-3.c gcc/testsuite/gcc.dg/atomic-exchange-1.c gcc/testsuite/gcc.dg/atomic-exchange-2.c gcc/testsuite/gcc.dg/atomic-exchange-3.c gcc/testsuite/gcc.dg/atomic-fence.c gcc/testsuite/gcc.dg/atomic-flag.c gcc/testsuite/gcc.dg/atomic-generic.c gcc/testsuite/gcc.dg/atomic-generic-aux.c gcc/testsuite/gcc.dg/atomic-invalid-2.c gcc/testsuite/gcc.dg/atomic-load-1.c gcc/testsuite/gcc.dg/atomic-load-2.c gcc/testsuite/gcc.dg/atomic-load-3.c gcc/testsuite/gcc.dg/atomic-lockfree.c gcc/testsuite/gcc.dg/atomic-lockfree-aux.c gcc/testsuite/gcc.dg/atomic-noinline.c gcc/testsuite/gcc.dg/atomic-noinline-aux.c gcc/testsuite/gcc.dg/atomic-op-1.c gcc/testsuite/gcc.dg/atomic-op-2.c gcc/testsuite/gcc.dg/atomic-op-3.c gcc/testsuite/gcc.dg/atomic-op-6.c gcc/testsuite/gcc.dg/atomic-store-1.c gcc/testsuite/gcc.dg/atomic-store-2.c gcc/testsuite/gcc.dg/atomic-store-3.c gcc/testsuite/g++.dg/ext/atomic-1.C gcc/testsuite/g++.dg/ext/atomic-2.C gcc/testsuite/gcc.target/arm/atomic-comp-swap-release-acquire.c gcc/testsuite/gcc.target/arm/atomic-op-acq_rel.c gcc/testsuite/gcc.target/arm/atomic-op-acquire.c gcc/testsuite/gcc.target/arm/atomic-op-char.c gcc/testsuite/gcc.target/arm/atomic-op-consume.c gcc/testsuite/gcc.target/arm/atomic-op-int.c gcc/testsuite/gcc.target/arm/atomic-op-relaxed.c gcc/testsuite/gcc.target/arm/atomic-op-release.c gcc/testsuite/gcc.target/arm/atomic-op-seq_cst.c gcc/testsuite/gcc.target/arm/atomic-op-short.c gcc/testsuite/gcc.target/arm/atomic_loaddi_1.c gcc/testsuite/gcc.target/arm/atomic_loaddi_2.c gcc/testsuite/gcc.target/arm/atomic_loaddi_3.c gcc/testsuite/gcc.target/arm/atomic_loaddi_4.c gcc/testsuite/gcc.target/arm/atomic_loaddi_5.c gcc/testsuite/gcc.target/arm/atomic_loaddi_6.c 
gcc/testsuite/gcc.target/arm/atomic_loaddi_7.c gcc/testsuite/gcc.target/arm/atomic_loaddi_8.c gcc/testsuite/gcc.target/arm/atomic_loaddi_9.c gcc/testsuite/gcc.target/arm/sync-1.c gcc/testsuite/gcc.target/arm/synchronize.c gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c libstdc++-v3/testsuite/29_atomics/atomic/60658.cc libstdc++-v3/testsuite/29_atomics/atomic/62259.cc libstdc++-v3/testsuite/29_atomics/atomic/64658.cc libstdc++-v3/testsuite/29_atomics/atomic/65147.cc libstdc++-v3/testsuite/29_atomics/atomic/65913.cc libstdc++-v3/testsuite/29_atomics/atomic/70766.cc libstdc++-v3/testsuite/29_atomics/atomic/cons/49445.cc libstdc++-v3/testsuite/29_atomics/atomic/cons/constexpr.cc libstdc++-v3/testsuite/29_atomics/atomic/cons/copy_list.cc libstdc++-v3/testsuite/29_atomics
Re: [PATCH, ARM 5/7, ping] Adapt other atomic operations to ARMv8-M Baseline
Ping?

Best regards,

Thomas

On 22/09/16 14:47, Thomas Preudhomme wrote:

Hi,

This patch is part of a patch series to add support for atomic operations on ARMv8-M Baseline targets in GCC. This specific patch adds support for the remaining atomic operations (exchange, addition, subtraction, bitwise AND, OR, XOR and NAND) to ARMv8-M Baseline, doubleword integers excepted. As with the previous patch in the series, this mostly consists of adding Thumb-1 specific constraints to atomic_* patterns to match those in thumb1.md for the non-atomic operation.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2016-09-02 Thomas Preud'homme

    * config/arm/arm.c (arm_split_atomic_op): Add function comment. Add logic to decide whether to copy over old value to register for new value.
    * config/arm/sync.md: Add comments explaining why mode and code attribute are not defined in iterators.md
    (thumb1_atomic_op_str): New code attribute.
    (thumb1_atomic_newop_str): Likewise.
    (thumb1_atomic_fetch_op_str): Likewise.
    (thumb1_atomic_fetch_newop_str): Likewise.
    (thumb1_atomic_fetch_oldop_str): Likewise.
    (atomic_exchange): Add new ARMv8-M Baseline only alternatives to mirror the more restrictive constraints of the Thumb-1 insns after split compared to Thumb-2 counterpart insns.
    (atomic_): Likewise. Add comment to keep constraints in sync with non atomic version.
    (atomic_nand): Likewise.
    (atomic_fetch_): Likewise.
    (atomic_fetch_nand): Likewise.
    (atomic__fetch): Likewise.
    (atomic_nand_fetch): Likewise.
    * config/arm/thumb1.md (thumb1_addsi3): Add comment to keep constraint in sync with atomic version.
    (thumb1_subsi3_insn): Likewise.
    (thumb1_andsi3_insn): Likewise.
    (thumb1_iorsi3_insn): Likewise.
    (thumb1_xorsi3_insn): Likewise.

Testing: No code generation difference for ARMv7-A, ARMv7VE and ARMv8-A on all atomic and synchronization testcases in the testsuite [2].

Patchset was also bootstrapped with --enable-itm --enable-gomp on ARMv8-A in ARM and Thumb mode at optimization level -O1 and above [1] without any regression in the testsuite and no code generation difference in libitm and libgomp. Code generation for ARMv8-M Baseline has been manually examined and compared against ARMv8-A Thumb-2 for the following configuration without finding any issue: gcc.dg/atomic-op-2.c at -Os gcc.dg/atomic-compare-exchange-2.c at -Os gcc.dg/atomic-compare-exchange-3.c at -O3 Is this ok for trunk? Best regards, Thomas [1] CFLAGS_FOR_TARGET and CXXFLAGS_FOR_TARGET were set to "-O1 -g", "-O3 -g" and undefined ("-O2 -g") [2] The exact list is: gcc/testsuite/gcc.dg/atomic-compare-exchange-1.c gcc/testsuite/gcc.dg/atomic-compare-exchange-2.c gcc/testsuite/gcc.dg/atomic-compare-exchange-3.c gcc/testsuite/gcc.dg/atomic-exchange-1.c gcc/testsuite/gcc.dg/atomic-exchange-2.c gcc/testsuite/gcc.dg/atomic-exchange-3.c gcc/testsuite/gcc.dg/atomic-fence.c gcc/testsuite/gcc.dg/atomic-flag.c gcc/testsuite/gcc.dg/atomic-generic.c gcc/testsuite/gcc.dg/atomic-generic-aux.c gcc/testsuite/gcc.dg/atomic-invalid-2.c gcc/testsuite/gcc.dg/atomic-load-1.c gcc/testsuite/gcc.dg/atomic-load-2.c gcc/testsuite/gcc.dg/atomic-load-3.c gcc/testsuite/gcc.dg/atomic-lockfree.c gcc/testsuite/gcc.dg/atomic-lockfree-aux.c gcc/testsuite/gcc.dg/atomic-noinline.c gcc/testsuite/gcc.dg/atomic-noinline-aux.c gcc/testsuite/gcc.dg/atomic-op-1.c gcc/testsuite/gcc.dg/atomic-op-2.c gcc/testsuite/gcc.dg/atomic-op-3.c gcc/testsuite/gcc.dg/atomic-op-6.c gcc/testsuite/gcc.dg/atomic-store-1.c gcc/testsuite/gcc.dg/atomic-store-2.c gcc/testsuite/gcc.dg/atomic-store-3.c gcc/testsuite/g++.dg/ext/atomic-1.C gcc/testsuite/g++.dg/ext/atomic-2.C gcc/testsuite/gcc.target/arm/atomic-comp-swap-release-acquire.c gcc/testsuite/gcc.target/arm/atomic-op-acq_rel.c gcc/testsuite/gcc.target/arm/atomic-op-acquire.c gcc/testsuite/gcc.target/arm/atomic-op-char.c gcc/testsuite/gcc.target/arm/atomic-op-consume.c 
gcc/testsuite/gcc.target/arm/atomic-op-int.c gcc/testsuite/gcc.target/arm/atomic-op-relaxed.c gcc/testsuite/gcc.target/arm/atomic-op-release.c gcc/testsuite/gcc.target/arm/atomic-op-seq_cst.c gcc/testsuite/gcc.target/arm/atomic-op-short.c gcc/testsuite/gcc.target/arm/atomic_loaddi_1.c gcc/testsuite/gcc.target/arm/atomic_loaddi_2.c gcc/testsuite/gcc.target/arm/atomic_loaddi_3.c gcc/testsuite/gcc.target/arm/atomic_loaddi_4.c gcc/testsuite/gcc.target/arm/atomic_loaddi_5.c gcc/testsuite/gcc.target/arm/atomic_loaddi_6.c gcc/testsuite/gcc.target/arm/atomic_loaddi_7.c gcc/testsuite/gcc.target/arm/atomic_loaddi_8.c gcc/testsuite/gcc.target/arm/atomic_loaddi_9.c gcc/testsuite/gcc.target/arm/sync-1.c gcc/testsuite/gcc.target/arm/synchronize.c gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c gcc/testsuite/gc
Re: [PATCH, ARM/testsuite 6/7, ping] Force soft float in ARMv6-M and ARMv8-M Baseline options
On 22/09/16 17:15, Thomas Preudhomme wrote:
On 22/09/16 16:47, Richard Earnshaw (lists) wrote:
On 22/09/16 15:51, Thomas Preudhomme wrote:

Sorry, noticed an error in the patch. It was not caught during testing because GCC was built with --with-mode=thumb. Correct patch attached.

Best regards,

Thomas

On 22/09/16 14:49, Thomas Preudhomme wrote:

Hi,

ARMv6-M and ARMv8-M Baseline only support soft float ABI. Therefore, the arm_arch_v8m_base add option should pass -mfloat-abi=soft, much like -mthumb is passed for architectures that only support Thumb instruction set. This patch adds -mfloat-abi=soft to both arm_arch_v6m and arm_arch_v8m_base add options. Patch is in attachment.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2016-07-15 Thomas Preud'homme

    * lib/target-supports.exp (add_options_for_arm_arch_v6m): Add -mfloat-abi=soft option.
    (add_options_for_arm_arch_v8m_base): Likewise.

Is this ok for trunk?

Best regards,

Thomas

6_softfloat_testing_v6m_v8m_baseline.patch

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 0dabea0850124947a7fe333e0b94c4077434f278..b5d72f1283be6a6e4736a1d20936e169c1384398 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3540,24 +3540,25 @@ proc check_effective_target_arm_fp16_hw { } {
 # Usage: /* { dg-require-effective-target arm_arch_v5_ok } */
 #/* { dg-add-options arm_arch_v5 } */
 # /* { dg-require-effective-target arm_arch_v5_multilib } */
-foreach { armfunc armflag armdef } { v4 "-march=armv4 -marm" __ARM_ARCH_4__
-    v4t "-march=armv4t" __ARM_ARCH_4T__
-    v5 "-march=armv5 -marm" __ARM_ARCH_5__
-    v5t "-march=armv5t" __ARM_ARCH_5T__
-    v5te "-march=armv5te" __ARM_ARCH_5TE__
-    v6 "-march=armv6" __ARM_ARCH_6__
-    v6k "-march=armv6k" __ARM_ARCH_6K__
-    v6t2 "-march=armv6t2" __ARM_ARCH_6T2__
-    v6z "-march=armv6z" __ARM_ARCH_6Z__
-    v6m "-march=armv6-m -mthumb" __ARM_ARCH_6M__
-    v7a "-march=armv7-a" __ARM_ARCH_7A__
-    v7r "-march=armv7-r" __ARM_ARCH_7R__
-    v7m "-march=armv7-m -mthumb" __ARM_ARCH_7M__
-    v7em "-march=armv7e-m -mthumb" __ARM_ARCH_7EM__
-    v8a "-march=armv8-a" __ARM_ARCH_8A__
-    v8_1a "-march=armv8.1a" __ARM_ARCH_8A__
-    v8m_base "-march=armv8-m.base -mthumb" __ARM_ARCH_8M_BASE__
-    v8m_main "-march=armv8-m.main -mthumb" __ARM_ARCH_8M_MAIN__ } {
+foreach { armfunc armflag armdef } {
+    v4 "-march=armv4 -marm" __ARM_ARCH_4__
+    v4t "-march=armv4t" __ARM_ARCH_4T__
+    v5 "-march=armv5 -marm" __ARM_ARCH_5__
+    v5t "-march=armv5t" __ARM_ARCH_5T__
+    v5te "-march=armv5te" __ARM_ARCH_5TE__
+    v6 "-march=armv6" __ARM_ARCH_6__
+    v6k "-march=armv6k" __ARM_ARCH_6K__
+    v6t2 "-march=armv6t2" __ARM_ARCH_6T2__
+    v6z "-march=armv6z" __ARM_ARCH_6Z__
+    v6m "-march=armv6-m -mthumb -mfloat-abi=soft" __ARM_ARCH_6M__
+    v7a "-march=armv7-a" __ARM_ARCH_7A__
+    v7r "-march=armv7-r" __ARM_ARCH_7R__
+    v7m "-march=armv7-m -mthumb" __ARM_ARCH_7M__
+    v7em "-march=armv7e-m -mthumb" __ARM_ARCH_7EM__
+    v8a "-march=armv8-a" __ARM_ARCH_8A__
+    v8_1a "-march=armv8.1a" __ARM_ARCH_8A__
+    v8m_base "-march=armv8-m.base -mthumb -mfloat-abi=soft" __ARM_ARCH_8M_BASE__
+    v8m_main "-march=armv8-m.main -mthumb" __ARM_ARCH_8M_MAIN__ } {
     eval [string map [list FUNC $armfunc FLAG $armflag DEF $armdef ] {
 	proc check_effective_target_arm_arch_FUNC_ok { } {
 	    if { [ string match "*-marm*" "FLAG" ] &&

I think if you're going to do this you need to also check that changing the ABI in this way isn't incompatible with other aspects of how the user has invoked dejagnu.

So should this check also be done for all the targets for which -mthumb is passed or is there a difference between the two situations?

Ping?

Best regards,

Thomas
Re: [PATCH] Set -fprofile-update=atomic when -pthread is present
On 10/03/2016 06:26 AM, Nathan Sidwell wrote:
On 10/03/16 08:13, Martin Liška wrote:
On 08/18/2016 05:53 PM, Jeff Law wrote:
On 08/18/2016 09:51 AM, Andi Kleen wrote:

I'd prefer to make updates atomic in multi-threaded applications. The best proxy we have for that is -pthread. Is it slower? Most definitely, but odds are we're giving folks garbage data otherwise, which in many ways is even worse.

It will likely be catastrophically slower in some cases. Catastrophically as in too slow to be usable. An atomic instruction is a lot more expensive than a single increment. Also they sometimes are really slow depending on the state of the machine.

And for those cases there's a way to override. The default should be set for correctness.
jeff

I would like to somehow resolve the discussion related to default value selection. Is the prevailing consensus that we should set -fprofile-update=atomic when -pthread is set? If so, I'll prepare a patch. I tend to do it this way.

This is my preference.

Likewise.
jeff
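To make the trade-off in this thread concrete, here is a minimal sketch of what an atomically updated counter looks like from user code. It is only an illustration: count_atomically is a hypothetical helper, the counter is a stand-in for a profile counter rather than GCC's actual gcov instrumentation, and the relaxed memory order is an assumption of the sketch.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for one profile counter shared by several threads.
// This is not GCC's gcov code, only an illustration of the update style.
inline long count_atomically(int threads, int iterations)
{
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t)
        workers.emplace_back([&counter, iterations] {
            for (int i = 0; i < iterations; ++i)
                // One atomic read-modify-write per event: no update is
                // lost, but each one costs more than a plain increment.
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    for (std::thread &w : workers)
        w.join();
    return counter.load();
}
```

With a plain long and ++counter in place of fetch_add, concurrent increments would race and the total could come out short, which is the "garbage data" concern raised above; the atomic version always reports threads * iterations, at the per-increment cost Andi is worried about.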
Re: [PATCH, ARM 7/7, ping] Enable ARMv8-M atomic and synchronization support for ARMv8-M Baseline
Ping?

Best regards,

Thomas

On 22/09/16 14:50, Thomas Preudhomme wrote:

Hi,

This patch is part of a patch series to add support for atomic operations on ARMv8-M Baseline targets in GCC. This specific patch enables atomic and synchronization support added in previous patches of the series and adds tests. Enabling is done at the end of the patch series to ensure that no ICE is seen when in the middle of the patch series (e.g. while doing a bisect). Enabling is done by enabling the exclusive and atomic loads and stores needed to implement all synchronization and atomic operations.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2016-07-05 Thomas Preud'homme

    * config/arm/arm.h (TARGET_HAVE_LDREX): Define for ARMv8-M Baseline.
    (TARGET_HAVE_LDREXBH): Likewise.
    (TARGET_HAVE_LDACQ): Likewise.

*** gcc/testsuite/ChangeLog ***

2016-07-05 Thomas Preud'homme

    * gcc.target/arm/atomic-comp-swap-release-acquire-3.c: New test.
    * gcc.target/arm/atomic-op-acq_rel-3.c: Likewise.
    * gcc.target/arm/atomic-op-acquire-3.c: Likewise.
    * gcc.target/arm/atomic-op-char-3.c: Likewise.
    * gcc.target/arm/atomic-op-consume-3.c: Likewise.
    * gcc.target/arm/atomic-op-int-3.c: Likewise.
    * gcc.target/arm/atomic-op-relaxed-3.c: Likewise.
    * gcc.target/arm/atomic-op-release-3.c: Likewise.
    * gcc.target/arm/atomic-op-seq_cst-3.c: Likewise.
    * gcc.target/arm/atomic-op-short-3.c: Likewise.

Testing: No code generation difference for ARMv7-A, ARMv7VE and ARMv8-A on all atomic and synchronization testcases in the testsuite [2]. Patchset was also bootstrapped with --enable-itm --enable-gomp on ARMv8-A in ARM and Thumb mode at optimization level -O1 and above [1] without any regression in the testsuite and no code generation difference in libitm and libgomp.

Code generation for ARMv8-M Baseline has been manually examined and compared against ARMv8-A Thumb-2 for the following configuration without finding any issue: gcc.dg/atomic-op-2.c at -Os gcc.dg/atomic-compare-exchange-2.c at -Os gcc.dg/atomic-compare-exchange-3.c at -O3 Is this ok for trunk? Best regards, Thomas [1] CFLAGS_FOR_TARGET and CXXFLAGS_FOR_TARGET were set to "-O1 -g", "-O3 -g" and undefined ("-O2 -g") [2] The exact list is: gcc/testsuite/gcc.dg/atomic-compare-exchange-1.c gcc/testsuite/gcc.dg/atomic-compare-exchange-2.c gcc/testsuite/gcc.dg/atomic-compare-exchange-3.c gcc/testsuite/gcc.dg/atomic-exchange-1.c gcc/testsuite/gcc.dg/atomic-exchange-2.c gcc/testsuite/gcc.dg/atomic-exchange-3.c gcc/testsuite/gcc.dg/atomic-fence.c gcc/testsuite/gcc.dg/atomic-flag.c gcc/testsuite/gcc.dg/atomic-generic.c gcc/testsuite/gcc.dg/atomic-generic-aux.c gcc/testsuite/gcc.dg/atomic-invalid-2.c gcc/testsuite/gcc.dg/atomic-load-1.c gcc/testsuite/gcc.dg/atomic-load-2.c gcc/testsuite/gcc.dg/atomic-load-3.c gcc/testsuite/gcc.dg/atomic-lockfree.c gcc/testsuite/gcc.dg/atomic-lockfree-aux.c gcc/testsuite/gcc.dg/atomic-noinline.c gcc/testsuite/gcc.dg/atomic-noinline-aux.c gcc/testsuite/gcc.dg/atomic-op-1.c gcc/testsuite/gcc.dg/atomic-op-2.c gcc/testsuite/gcc.dg/atomic-op-3.c gcc/testsuite/gcc.dg/atomic-op-6.c gcc/testsuite/gcc.dg/atomic-store-1.c gcc/testsuite/gcc.dg/atomic-store-2.c gcc/testsuite/gcc.dg/atomic-store-3.c gcc/testsuite/g++.dg/ext/atomic-1.C gcc/testsuite/g++.dg/ext/atomic-2.C gcc/testsuite/gcc.target/arm/atomic-comp-swap-release-acquire.c gcc/testsuite/gcc.target/arm/atomic-op-acq_rel.c gcc/testsuite/gcc.target/arm/atomic-op-acquire.c gcc/testsuite/gcc.target/arm/atomic-op-char.c gcc/testsuite/gcc.target/arm/atomic-op-consume.c gcc/testsuite/gcc.target/arm/atomic-op-int.c gcc/testsuite/gcc.target/arm/atomic-op-relaxed.c gcc/testsuite/gcc.target/arm/atomic-op-release.c gcc/testsuite/gcc.target/arm/atomic-op-seq_cst.c gcc/testsuite/gcc.target/arm/atomic-op-short.c 
gcc/testsuite/gcc.target/arm/atomic_loaddi_1.c gcc/testsuite/gcc.target/arm/atomic_loaddi_2.c gcc/testsuite/gcc.target/arm/atomic_loaddi_3.c gcc/testsuite/gcc.target/arm/atomic_loaddi_4.c gcc/testsuite/gcc.target/arm/atomic_loaddi_5.c gcc/testsuite/gcc.target/arm/atomic_loaddi_6.c gcc/testsuite/gcc.target/arm/atomic_loaddi_7.c gcc/testsuite/gcc.target/arm/atomic_loaddi_8.c gcc/testsuite/gcc.target/arm/atomic_loaddi_9.c gcc/testsuite/gcc.target/arm/sync-1.c gcc/testsuite/gcc.target/arm/synchronize.c gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c libstdc++-v3/testsuite/29_atomics/atomic/60658.cc libstdc++-v3/testsuite/29_atomics/atomic/62259.cc libstdc++-v3/testsuite/29_atomics/atomic/64658.cc libstdc++-v3/testsuite/29_atomics/atomic/65147.cc libstdc++-v3/testsuite/29_atomics/atomic/65913.cc libstdc++-v3/testsuite/29_atomics/atomic/70766.cc libstdc++-v3/testsuite/29_atomics/atomic/cons/49445.cc libstdc++-v3/testsuite/29_atomics/atomic/cons/constexpr.cc
[PATCH] Define std::gcd and std::lcm for C++17
This shares the code for std::gcd and std::experimental::gcd, and similarly for lcm. I realised I'd mixed up the gcd.cc and lcm.cc tests so this patch also swaps them around. * doc/xml/manual/status_cxx2017.xml: Update gcd/lcm status. * doc/html/*: Regenerate. * include/experimental/numeric (__abs): Move to . (gcd, lcm): Use __detail::gcd and __detail::lcm. * include/std/numeric (__detail::__abs_integral) (__detail::__gcd, __detail::__lcm): Define. (gcd, lcm): Define for C++17. * testsuite/26_numerics/gcd/1.cc: New test. * testsuite/26_numerics/lcm/1.cc: New test. * testsuite/experimental/numeric/gcd.cc: Swap contents with ... * testsuite/experimental/numeric/lcd.cc: ... this. Tested powerpc64le-linux, committed to trunk. commit 56efd86de7a7bbcfe43fe1c20979e06eb5e49802 Author: Jonathan Wakely Date: Mon Oct 3 17:09:10 2016 +0100 Define std::gcd and std::lcm for C++17 * doc/xml/manual/status_cxx2017.xml: Update gcd/lcm status. * doc/html/*: Regenerate. * include/experimental/numeric (__abs): Move to . (gcd, lcm): Use __detail::gcd and __detail::lcm. * include/std/numeric (__detail::__abs_integral) (__detail::__gcd, __detail::__lcm): Define. (gcd, lcm): Define for C++17. * testsuite/26_numerics/gcd/1.cc: New test. * testsuite/26_numerics/lcm/1.cc: New test. * testsuite/experimental/numeric/gcd.cc: Swap contents with ... * testsuite/experimental/numeric/lcd.cc: ... this. diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml index feed085..9f47b349 100644 --- a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml +++ b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml @@ -615,14 +615,13 @@ Feature-testing recommendations for C++. 
- Adopt Selected Library Fundamentals V2 Components for C++17 http://www.w3.org/1999/xlink"; xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0295r0.pdf";> P0295R0 - No + 7 __cpp_lib_gcd >= 201606 , __cpp_lib_lcm >= 201606 diff --git a/libstdc++-v3/include/experimental/numeric b/libstdc++-v3/include/experimental/numeric index 5089772..6d1dc21 100644 --- a/libstdc++-v3/include/experimental/numeric +++ b/libstdc++-v3/include/experimental/numeric @@ -52,44 +52,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #define __cpp_lib_experimental_gcd_lcm 201411 - // std::abs is not constexpr and doesn't support unsigned integers. - template -constexpr -enable_if_t<__and_, is_signed<_Tp>>::value, _Tp> -__abs(_Tp __val) -{ return __val < 0 ? -__val : __val; } - - template -constexpr -enable_if_t<__and_, is_unsigned<_Tp>>::value, _Tp> -__abs(_Tp __val) -{ return __val; } - - // Greatest common divisor + /// Greatest common divisor template constexpr common_type_t<_Mn, _Nn> gcd(_Mn __m, _Nn __n) { static_assert(is_integral<_Mn>::value, "arguments to gcd are integers"); static_assert(is_integral<_Nn>::value, "arguments to gcd are integers"); - - return __m == 0 ? fundamentals_v2::__abs(__n) - : __n == 0 ? fundamentals_v2::__abs(__m) - : fundamentals_v2::gcd(__n, __m % __n); + return std::__detail::__gcd(__m, __n); } - // Least common multiple + /// Least common multiple template constexpr common_type_t<_Mn, _Nn> lcm(_Mn __m, _Nn __n) { static_assert(is_integral<_Mn>::value, "arguments to lcm are integers"); static_assert(is_integral<_Nn>::value, "arguments to lcm are integers"); - - return (__m != 0 && __n != 0) - ? 
(fundamentals_v2::__abs(__m) / fundamentals_v2::gcd(__m, __n)) - * fundamentals_v2::__abs(__n) - : 0; + return std::__detail::__lcm(__m, __n); } _GLIBCXX_END_NAMESPACE_VERSION diff --git a/libstdc++-v3/include/std/numeric b/libstdc++-v3/include/std/numeric index 47a7cb8..7b1ab98 100644 --- a/libstdc++-v3/include/std/numeric +++ b/libstdc++-v3/include/std/numeric @@ -74,4 +74,83 @@ * math functions. */ +#if __cplusplus >= 201402L +#include + +namespace std _GLIBCXX_VISIBILITY(default) +{ +namespace __detail +{ +_GLIBCXX_BEGIN_NAMESPACE_VERSION + + // std::abs is not constexpr and doesn't support unsigned integers. + template +constexpr +enable_if_t<__and_, is_signed<_Tp>>::value, _Tp> +__abs_integral(_Tp __val) +{ return __val < 0 ? -__val : __val; } + + template +constexpr +enable_if_t<__and_, is_unsigned<_Tp>>::value, _Tp> +__abs_integral(_Tp __val) +{ return __val; } + + template +constexpr common_type_t<_Mn, _Nn> +__gcd(_Mn __m, _Nn __n) +{ + return __m == 0 ? __detail::__abs_integral(__n) + : __n == 0 ? __detail::__abs_integral(__m) + : __detail::__gcd(__n, __m % __n); +} + +
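The recursive scheme used by the patch can be tried outside libstdc++ with a small stand-alone sketch. The namespace sketch and the names abs_integral, gcd and lcm below are hypothetical stand-ins for the library's internal helpers, and the lcm body follows the expression removed from the experimental header above; this is an illustration under those assumptions, not the committed code.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {

// Absolute value that is usable in constant expressions and also
// accepts unsigned types (for which it is the identity).
template<typename T>
constexpr std::enable_if_t<std::is_signed<T>::value, T>
abs_integral(T v) { return v < 0 ? -v : v; }

template<typename T>
constexpr std::enable_if_t<std::is_unsigned<T>::value, T>
abs_integral(T v) { return v; }

// Euclid's algorithm written as a single return statement, so it is
// a valid constexpr function.
template<typename M, typename N>
constexpr std::common_type_t<M, N> gcd(M m, N n)
{
    return m == 0 ? abs_integral(n)
         : n == 0 ? abs_integral(m)
         : sketch::gcd(n, m % n);
}

// Divide before multiplying to keep the intermediate value small.
template<typename M, typename N>
constexpr std::common_type_t<M, N> lcm(M m, N n)
{
    return (m != 0 && n != 0)
         ? (abs_integral(m) / sketch::gcd(m, n)) * abs_integral(n)
         : 0;
}

} // namespace sketch

static_assert(sketch::gcd(12, 18) == 6, "gcd is usable at compile time");
static_assert(sketch::lcm(4, 6) == 12, "lcm is usable at compile time");
```

Sharing one helper between the std and experimental entry points, as the patch does, means the two public functions only need to carry the static_assert checks on their argument types.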
[PATCH] Extend -Wint-in-bool-context to more conditional expressions
Hi!

This is the next step in extending the -Wint-in-bool-context warning to cover the case when a conditional expression has only one arm which evaluates to a non-boolean integer value.

With a previous version of this warning, we found PR 77574, along with several more or less false positives, but meanwhile, mostly due to excluding conditional expressions that originate from macro expansion, there are no false positives any more, so I think this is fine now with -Wall.

Bootstrapped and reg-tested on x86_64-pc-linux-gnu. Is it OK for trunk?

Thanks
Bernd.

c-family:
2016-10-03 Bernd Edlinger

    * c-common.c (c_common_truthvalue_conversion): Warn also for suspicious conditional expression in boolean context when only one arm is non-boolean.

testsuite:
2016-10-03 Bernd Edlinger

    * c-c++-common/Wint-in-bool-context.c: Update test.

Index: gcc/c-family/c-common.c
===================================================================
--- gcc/c-family/c-common.c (revision 240713)
+++ gcc/c-family/c-common.c (working copy)
@@ -4675,6 +4675,14 @@ c_common_truthvalue_conversion (location_t locatio
 	    warning_at (EXPR_LOCATION (expr), OPT_Wint_in_bool_context,
 			"?: using integer constants in boolean context, "
 			"the expression will always evaluate to %");
+	  else if ((TREE_CODE (val1) == INTEGER_CST
+		    && !integer_zerop (val1)
+		    && !integer_onep (val1))
+		   || (TREE_CODE (val2) == INTEGER_CST
+		       && !integer_zerop (val2)
+		       && !integer_onep (val2)))
+	    warning_at (EXPR_LOCATION (expr), OPT_Wint_in_bool_context,
+			"?: using integer constants in boolean context");
 	}
       /* Distribute the conversion into the arms of a COND_EXPR.  */
       if (c_dialect_cxx ())
Index: gcc/testsuite/c-c++-common/Wint-in-bool-context.c
===================================================================
--- gcc/testsuite/c-c++-common/Wint-in-bool-context.c (revision 240713)
+++ gcc/testsuite/c-c++-common/Wint-in-bool-context.c (working copy)
@@ -10,7 +10,7 @@ int foo (int a, int b)
   if (a > 0 && a <= (b == 2) ? 1 : 1) /* { dg-bogus "boolean context" } */
     return 2;
 
-  if (a > 0 && a <= (b == 3) ? 0 : 2) /* { dg-bogus "boolean context" } */
+  if (a > 0 && a <= (b == 3) ? 0 : 2) /* { dg-warning "boolean context" } */
     return 3;
 
   if (a == b ? 0 : 0) /* { dg-bogus "boolean context" } */
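Why an expression like the one in the updated test is suspicious can be shown with a small sketch. The function names as_written, as_parsed and maybe_intended are hypothetical, and the "intended" reading is only a guess at what an author of such code might have meant.

```cpp
#include <cassert>

// The expression from the test, evaluated as an int instead of in an
// if condition so the result can be inspected.
constexpr int as_written(int a, int b)
{
    return a > 0 && a <= (b == 3) ? 0 : 2;
}

// How it parses: ?: binds more loosely than && and <=, so the whole
// condition selects between the constants 0 and 2.
constexpr int as_parsed(int a, int b)
{
    return (a > 0 && a <= (b == 3)) ? 0 : 2;
}

// One plausible intended meaning (an assumption): the conditional only
// chooses the upper bound for a.
constexpr int maybe_intended(int a, int b)
{
    return a > 0 && a <= ((b == 3) ? 0 : 2);
}

static_assert(as_written(0, 0) == as_parsed(0, 0), "same parse");
static_assert(as_written(0, 0) == 2, "truthy even though a > 0 is false");
static_assert(maybe_intended(0, 0) == 0, "the bounded reading is false here");
```

Used as an if condition, the written form is true whenever the comparison fails, because the non-boolean arm 2 converts to true; that is the one-arm case the extended warning is meant to flag.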
Re: [PATCH] Remove .jcr registry from the crtfiles
On Mon, Oct 03, 2016 at 03:26:10PM +, Joseph Myers wrote: > As usual when removing target macros they should be poisoned in system.h. Here is the patch with that poisoning. Bootstrapped/regtested on x86_64-linux and i686-linux again, ok for trunk? 2016-10-03 Jakub Jelinek gcc/ * defaults.h (JCR_SECTION_NAME, TARGET_USE_JCR_SECTION): Remove. * system.h (JCR_SECTION_NAME, TARGET_USE_JCR_SECTION): Poison. * doc/tm.texi.in (TARGET_USE_JCR_SECTION): Remove. * doc/tm.texi: Regenerated. * config/i386/mingw32.h (TARGET_USE_JCR_SECTION): Remove. * config/i386/cygming.h (TARGET_USE_JCR_SECTION): Remove. * config/darwin.h (JCR_SECTION_NAME): Remove. * config/pa/pa64-hpux.h (JCR_SECTION_NAME): Remove. * config/rs6000/aix71.h (TARGET_USE_JCR_SECTION): Remove. * config/rs6000/aix51.h (TARGET_USE_JCR_SECTION): Remove. * config/rs6000/aix52.h (TARGET_USE_JCR_SECTION): Remove. * config/rs6000/aix53.h (TARGET_USE_JCR_SECTION): Remove. * config/rs6000/aix61.h (TARGET_USE_JCR_SECTION): Remove. gcc/c-family/ * c-cppbuiltin.c (c_cpp_builtins): Don't define __LIBGCC_JCR_SECTION_NAME__. libgcc/ * config/i386/cygming-crtbegin.c (_Jv_RegisterClasses): Remove. (__JCR_LIST__): Remove. (__gcc_register_frame): Don't attempt to _Jv_RegisterClasses. * config/i386/cygming-crtend.c (__JCR_END__): Remove. * config/ia64/crtbegin.S (__JCR_LIST__): Remove. * config/ia64/crtend.S (__JCR_END__): Remove. * crtstuff.c: Remove __LIBGCC_JCR_SECTION_NAME__ from preprocessor conditionals. (__JCR_LIST__, __JCR_END__): Remove. (frame_dummy): Don't attempt to _Jv_RegisterClasses. (__do_global_ctors_1): Likewise. --- gcc/config/i386/mingw32.h.jj2016-05-20 09:05:08.836063467 +0200 +++ gcc/config/i386/mingw32.h 2016-10-01 18:55:14.646199686 +0200 @@ -239,9 +239,6 @@ do { \ #undef TARGET_N_FORMAT_TYPES #define TARGET_N_FORMAT_TYPES 3 -/* Let defaults.h definition of TARGET_USE_JCR_SECTION apply. 
*/ -#undef TARGET_USE_JCR_SECTION - #define HAVE_ENABLE_EXECUTE_STACK #undef CHECK_EXECUTE_STACK_ENABLED #define CHECK_EXECUTE_STACK_ENABLED flag_setstackexecutable --- gcc/config/i386/cygming.h.jj2016-09-27 09:46:13.0 +0200 +++ gcc/config/i386/cygming.h 2016-10-01 18:56:16.133441952 +0200 @@ -443,11 +443,6 @@ do { \ #endif /* HAVE_GAS_WEAK */ -/* FIXME: SUPPORTS_WEAK && TARGET_HAVE_NAMED_SECTIONS is true, - but for .jcr section to work we also need crtbegin and crtend - objects. */ -#define TARGET_USE_JCR_SECTION 1 - /* Decide whether it is safe to use a local alias for a virtual function when constructing thunks. */ #undef TARGET_USE_LOCAL_THUNK_ALIAS_P --- gcc/config/darwin.h.jj 2016-09-15 13:39:14.518013115 +0200 +++ gcc/config/darwin.h 2016-10-01 18:55:40.056886539 +0200 @@ -825,9 +825,6 @@ enum machopic_addr_class { #define EH_FRAME_SECTION_NAME "__TEXT" #define EH_FRAME_SECTION_ATTR ",coalesced,no_toc+strip_static_syms+live_support" -/* Java runtime class list. */ -#define JCR_SECTION_NAME "__DATA,jcr,regular,no_dead_strip" - #undef ASM_PREFERRED_EH_DATA_FORMAT #define ASM_PREFERRED_EH_DATA_FORMAT(CODE,GLOBAL) \ (((CODE) == 2 && (GLOBAL) == 1) \ --- gcc/config/pa/pa64-hpux.h.jj2016-04-08 19:19:23.894042211 +0200 +++ gcc/config/pa/pa64-hpux.h 2016-10-01 18:55:35.171946738 +0200 @@ -170,8 +170,6 @@ along with GCC; see the file COPYING3. #define DATA_SECTION_ASM_OP"\t.data" #define BSS_SECTION_ASM_OP "\t.section\t.bss" -#define JCR_SECTION_NAME ".jcr" - #define HP_INIT_ARRAY_SECTION_ASM_OP "\t.section\t.init" #define GNU_INIT_ARRAY_SECTION_ASM_OP "\t.section\t.init_array" #define HP_FINI_ARRAY_SECTION_ASM_OP "\t.section\t.fini" @@ -382,8 +380,8 @@ do { \ initializers specified here. */ /* We need to add frame_dummy to the initializer list if EH_FRAME_SECTION_NAME - or JCR_SECTION_NAME is defined. */ -#if defined(EH_FRAME_SECTION_NAME) || defined(JCR_SECTION_NAME) + is defined. 
*/ +#if defined(EH_FRAME_SECTION_NAME) #define PA_INIT_FRAME_DUMMY_ASM_OP ".dword P%frame_dummy" #else #define PA_INIT_FRAME_DUMMY_ASM_OP "" --- gcc/config/rs6000/aix71.h.jj2016-01-21 21:28:01.218834652 +0100 +++ gcc/config/rs6000/aix71.h 2016-10-01 18:55:49.667768100 +0200 @@ -210,8 +210,6 @@ extern long long intatoll(const char /* This target defines SUPPORTS_WEAK and TARGET_ASM_NAMED_SECTION, but does not have crtbegin/end. */ -#define TARGET_USE_JCR_SECTION 0 - #define TARGET_AIX_VERSION 71 /* AIX 7.1 supports DWARF3 debugging, but XCOFF remains the default. */ --- gcc/config/rs6000/aix51.h.jj2016-06-20 10:30:35.629607920 +0200 +++ gcc/config/rs6000/
[PATCH] Fix ubsan ICE on vector shift (PR sanitizer/77823)
Hi!

libsanitizer isn't right now prepared to handle vector types, and we don't instrument vector additions/multiplications etc. for overflow etc. either, so this patch just disables instrumentation for the single case that slipped through. As I wrote in the PR, in the future we should probably change libubsan to handle them and start instrumenting those.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-10-03 Jakub Jelinek

    PR sanitizer/77823
    * c-ubsan.c (ubsan_instrument_shift): Return NULL_TREE if type0 is not integral.

    * c-c++-common/ubsan/shift-9.c: New test.

--- gcc/c-family/c-ubsan.c.jj 2016-01-04 14:55:58.0 +0100
+++ gcc/c-family/c-ubsan.c 2016-10-03 13:49:49.423318587 +0200
@@ -114,6 +114,9 @@ ubsan_instrument_shift (location_t loc,
   tree t, tt = NULL_TREE;
   tree type0 = TREE_TYPE (op0);
   tree type1 = TREE_TYPE (op1);
+  if (!INTEGRAL_TYPE_P (type0))
+    return NULL_TREE;
+
   tree op1_utype = unsigned_type_for (type1);
   HOST_WIDE_INT op0_prec = TYPE_PRECISION (type0);
   tree uprecm1 = build_int_cst (op1_utype, op0_prec - 1);
@@ -126,8 +129,7 @@ ubsan_instrument_shift (location_t loc,
 
   /* If this is not a signed operation, don't perform overflow checks.
      Also punt on bit-fields.  */
-  if (!INTEGRAL_TYPE_P (type0)
-      || TYPE_OVERFLOW_WRAPS (type0)
+  if (TYPE_OVERFLOW_WRAPS (type0)
       || GET_MODE_BITSIZE (TYPE_MODE (type0)) != TYPE_PRECISION (type0))
     ;
 
--- gcc/testsuite/c-c++-common/ubsan/shift-9.c.jj 2016-10-03 14:23:54.301711636 +0200
+++ gcc/testsuite/c-c++-common/ubsan/shift-9.c 2016-10-03 13:54:50.0 +0200
@@ -0,0 +1,30 @@
+/* PR sanitizer/77823 */
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-fsanitize=undefined -Wno-psabi -w" } */
+
+typedef unsigned V __attribute__((vector_size(32)));
+typedef unsigned __int128 W __attribute__((vector_size(32)));
+
+V
+foo (V v)
+{
+  return v << 30;
+}
+
+V
+bar (V v, V w)
+{
+  return v << w;
+}
+
+W
+baz (W v)
+{
+  return v << 30;
+}
+
+W
+boo (W v, W w)
+{
+  return v << w;
+}

	Jakub
[C++ PATCH] Fix ICE during C++11 lambda error recovery (PR c++/77791)
Hi!

In param_list some entries could be error_mark_node; we should just ignore those. Also, this patch optimizes by testing cxx_dialect < cxx14 just once.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-10-03 Jakub Jelinek

    PR c++/77791
    * parser.c (cp_parser_lambda_declarator_opt): Only pedwarn for C++11 on decls in the param_list. Test cxx_dialect < cxx14 before the loop just once.

    * g++.dg/cpp0x/lambda/lambda-77791.C: New test.

--- gcc/cp/parser.c.jj 2016-09-27 21:09:59.0 +0200
+++ gcc/cp/parser.c 2016-10-03 15:00:31.759317804 +0200
@@ -10114,10 +10114,11 @@ cp_parser_lambda_declarator_opt (cp_pars
 
       /* Default arguments shall not be specified in the
          parameter-declaration-clause of a lambda-declarator.  */
-      for (tree t = param_list; t; t = TREE_CHAIN (t))
-        if (TREE_PURPOSE (t) && cxx_dialect < cxx14)
-          pedwarn (DECL_SOURCE_LOCATION (TREE_VALUE (t)), OPT_Wpedantic,
-                   "default argument specified for lambda parameter");
+      if (cxx_dialect < cxx14)
+        for (tree t = param_list; t; t = TREE_CHAIN (t))
+          if (TREE_PURPOSE (t) && DECL_P (TREE_VALUE (t)))
+            pedwarn (DECL_SOURCE_LOCATION (TREE_VALUE (t)), OPT_Wpedantic,
+                     "default argument specified for lambda parameter");
 
       cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN);
 
--- gcc/testsuite/g++.dg/cpp0x/lambda/lambda-77791.C.jj 2016-10-03 15:01:21.831694292 +0200
+++ gcc/testsuite/g++.dg/cpp0x/lambda/lambda-77791.C 2016-10-03 14:58:22.0 +0200
@@ -0,0 +1,4 @@
+// PR c++/77791
+// { dg-do compile { target c++11 } }
+
+auto a = [] (int i, int i = 0) {}; // { dg-error "redefinition of" }

	Jakub
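For reference, this is the well-formed kind of code the pedwarn in the hunk above is about, as opposed to the erroneous test case. The name add_with_default is hypothetical and the snippet is only a sketch of the diagnosed construct.

```cpp
#include <cassert>

// A default argument on a lambda parameter. This is valid from C++14
// on; per the code in the patch, GCC only pedwarns about it under
// -Wpedantic when compiling for a dialect older than C++14.
auto add_with_default = [](int i, int j = 2) { return i + j; };
```

Calling add_with_default(1) uses the default and yields 3, while add_with_default(1, 1) yields 2. In the PR's test case the second parameter redeclares i, so its entry in param_list is not a usable decl, which is why the patched loop checks DECL_P before asking for a source location.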
Re: [PATCH] Set -fprofile-update=atomic when -pthread is present
> >>I would like to somehow resolve the discussion related to default value
> >>selection.
> >>Is the prevailing consensus that we should set -fprofile-update=atomic
> >>when -pthread is set? If so, I'll prepare a patch. I tend to do it this way.
> >
> >This is my preference.
> Likewise.

I still think it shouldn't be the default even with -pthread, because it could dramatically degrade performance in these cases. People likely have -pthread in their Makefiles without realizing it. Such changes should be explicit opt-in. Severe performance decreases often lead to incorrectness in practice ("is now too slow to finish training workload in rebuild cycle").

-Andi
Re: [PATCH] - improve sprintf buffer overflow detection (middle-end/49905)
+FAIL: gcc.dg/tree-ssa/builtin-sprintf.c execution test FAIL: test_a_double:364: "%a" expected result for "0x0.0p+0" doesn't match function call return value: 20 != 6 FAIL: test_a_double:365: "%a" expected result for "0x1.0p+0" doesn't match function call return value: 20 != 6 FAIL: test_a_double:366: "%a" expected result for "0x1.0p+1" doesn't match function call return value: 20 != 6 FAIL: test_a_long_double:375: "%La" expected result for "0x0.p+0" doesn't match function call return value: 35 != 6 FAIL: test_a_long_double:376: "%La" expected result for "0x1.p+0" doesn't match function call return value: 35 != 6 FAIL: test_a_long_double:377: "%La" expected result for "0x1.p+1" doesn't match function call return value: 35 != 6 I don't know about these. It looks like the Solaris printf doesn't handle the %a directive correctly and the tests (and the related checks/optimization) might need to be disabled, which in turn might involve extending the existing printf hook or adding a new one. I've found the following in Solaris 10 (and up) printf(3C): a, AA double argument representing a floating-point number is converted in the style "[-]0xh.p+d", where the single hexadecimal digit preceding the radix point is 0 if the value converted is zero and 1 otherwise and the number of hexadecimal digits after it is equal to the precision; if the precision is missing, the number of digits printed after the radix point is 13 for the conversion of a double value, 16 for the conversion of a long double value on x86, and 28 for the conversion of a long double value on SPARC; if the precision is zero and the '#' flag is not specified, no decimal-point character will appear. The letters "abcdef" are used for a conversion and the letters "ABCDEF" for A conver- sion. The A conversion specifier produces a number with 'X' and 'P' instead of 'x' and 'p'. The exponent will always contain at least one digit, and only as many more digits as necessary to represent the decimal exponent of 2. 
If the value is zero, the exponent is zero. The converted value is rounded to fit the specified output format according to the prevailing floating point rounding direction mode. If the conversion is not exact, an inexact exception is raised. A double argument representing an infinity or NaN is converted in the SUSv3 style of an e or E conversion specifier.

I tried to check the relevant sections of the latest C99 and C11 drafts to check if this handling of missing precision is allowed by the standard, but I couldn't even fully parse the language there.

I don't have access to Solaris to fully debug and test this there. Would you mind helping with it?

Not at all: if it turns out that Solaris has bugs in this area, I can easily file them, too.

I think it's actually a defect in the C standard. It doesn't specify how many decimal digits an implementation must produce on output for a plain %a directive (i.e., when precision isn't explicitly specified). With Glibc, for instance, printf("%a", 1.0) prints 0x8p-3 while on Solaris it prints 0x8.00p-3. Both seem reasonable but neither is actually specified. In theory, an implementation is allowed to print any number of zeros after the decimal point, which the standard should (IMO) not permit. There should be a cap (e.g., of at most 6 decimal digits when precision is not specified with %a, just like there is with %e). I'll propose to change the standard and forward it to the C committee. Until then, I've worked around it in the patch for pr77735 (under review). If you have a moment and could try it out on Solaris and let me know how it goes I'd be grateful.

Thanks
Martin
Re: Fix PR tree-optimization/77808, ICE in duplicate_ssa_name_ptr_info, at tree-ssanames.c:630 starting with r240439
On 3 October 2016 at 18:07, Doug Gilmore wrote: >>From: Christophe Lyon [christophe.l...@linaro.org] >>Sent: Monday, October 03, 2016 12:05 AM >>To: Doug Gilmore >>Cc: gcc-patches@gcc.gnu.org >>Subject: Re: Fix PR tree-optimization/77808, ICE in >>duplicate_ssa_name_ptr_info, at tree-ssanames.c:630 starting with r240439 >> >>On 2 October 2016 at 23:05, Doug Gilmore wrote: >>> Hi Christophe, >>> From: Christophe Lyon [christophe.l...@linaro.org] Sent: Saturday, October 01, 2016 7:57 AM To: Doug Gilmore Cc: gcc-patches@gcc.gnu.org Subject: Re: Fix PR tree-optimization/77808, ICE in duplicate_ssa_name_ptr_info, at tree-ssanames.c:630 starting with r240439 Hi Doug, ... I can confirm that your patch fixes the ICE I was seeing. However, the new testcase does not pass on low end architectures: cc1: warning: -fprefetch-loop-arrays not supported for this target (try -march switches) Can you add a guard? Thanks, Christophe >>> I updated the test to only run on X86, MIPS and AARCH64. Is that OK? >>> >> >>I'm afraid not. >> >>The ICE occurred on some arm targets. By "low end" I meant armv5t for >>example, as opposed to armv7t. >>Is there a suitable effective target? > I'll need to investigate that. BTW, gcc.dg/pr53550.c contains: > /* PR tree-optimization/53550 */ > /* { dg-do compile } */ > /* { dg-options "-O2 -fprefetch-loop-arrays -w" } */ > > int * > foo (int *x) > { > int *a = x + 10, *b = x, *c = a; > while (b != c) > *--c = *b++; > return x; > } > > Is it also failing on armv5t? I suppose it would. > It doesn't, but that's probably thanks to -w Christophe > Thanks, > > Doug >> >>Thanks, >> >>Christophe >> >>> Thanks, >>> >>> Doug
RE: Fix PR tree-optimization/77808, ICE in duplicate_ssa_name_ptr_info, at tree-ssanames.c:630 starting with r240439
>From: Christophe Lyon [christophe.l...@linaro.org] >Sent: Monday, October 03, 2016 11:23 AM >To: Doug Gilmore >Cc: gcc-patches@gcc.gnu.org >Subject: Re: Fix PR tree-optimization/77808, ICE in >duplicate_ssa_name_ptr_info, at tree-ssanames.c:630 starting with r240439 > >On 3 October 2016 at 18:07, Doug Gilmore wrote: >>>From: Christophe Lyon [christophe.l...@linaro.org] >>>Sent: Monday, October 03, 2016 12:05 AM >>>To: Doug Gilmore >>>Cc: gcc-patches@gcc.gnu.org >>>Subject: Re: Fix PR tree-optimization/77808, ICE in >>>duplicate_ssa_name_ptr_info, at tree-ssanames.c:630 starting with r240439 >>> >>>On 2 October 2016 at 23:05, Doug Gilmore wrote: Hi Christophe, > From: Christophe Lyon [christophe.l...@linaro.org] > Sent: Saturday, October 01, 2016 7:57 AM > To: Doug Gilmore > Cc: gcc-patches@gcc.gnu.org > Subject: Re: Fix PR tree-optimization/77808, ICE in > duplicate_ssa_name_ptr_info, at tree-ssanames.c:630 starting with r240439 > > Hi Doug, > > ... > I can confirm that your patch fixes the ICE I was seeing. > > However, the new testcase does not pass on low end > architectures: > cc1: warning: -fprefetch-loop-arrays not supported for this target > (try -march switches) > > Can you add a guard? > > Thanks, > > Christophe I updated the test to only run on X86, MIPS and AARCH64. Is that OK? >>> >>>I'm afraid not. >>> >>>The ICE occurred on some arm targets. By "low end" I meant armv5t for >>>example, as opposed to armv7t. >>>Is there a suitable effective target? >> I'll need to investigate that. BTW, gcc.dg/pr53550.c contains: >> /* PR tree-optimization/53550 */ >> /* { dg-do compile } */ >> /* { dg-options "-O2 -fprefetch-loop-arrays -w" } */ >> >> int * >> foo (int *x) >> { >> int *a = x + 10, *b = x, *c = a; >> while (b != c) >> *--c = *b++; >> return x; >> } >> >> Is it also failing on armv5t? I suppose it would. >> >It doesn't, but that's probably thanks to -w Sounds like we don't need add guards then, it is just a matter of adding -w to the command line. 
Does that work for you? Thanks, Doug > >Christophe > >> Thanks, >> >> Doug >>> >>>Thanks, >>> >>>Christophe >>> Thanks, Doug
libgo patch committed: strip most C macros from runtime.inc
The Go runtime package in libgo is picking up C macros from runtime_sysinfo.go and then re-exporting them to runtime.inc. This can cause name conflicts. Change the Makefile so that we only put the macros we need into runtime.inc. These are the constants that are actually defined by Go code, not runtime_sysinfo.go. There are only a few, so we can pattern match. This is an additional hack on runtime.inc. The long term goal is to convert the runtime package to Go and eliminate runtime.inc entirely, so a few hacks seem acceptable. This fixes GCC PR 77809. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 240667) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -f3fb9bf2d5a009a707962a416fcd1a8435756218 +325f8074c5224ae537f8e00aede5c780b70f914c The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: libgo/Makefile.am === --- libgo/Makefile.am (revision 240667) +++ libgo/Makefile.am (working copy) @@ -1284,8 +1284,13 @@ runtime_go_lo_GOCFLAGS = -fgo-c-header=r runtime-go.lo: $(BUILDPACKAGE) runtime.inc: s-runtime-inc; @true -s-runtime-inc: runtime-go.lo - $(SHELL) $(srcdir)/mvifdiff.sh runtime.inc.tmp runtime.inc +s-runtime-inc: runtime-go.lo Makefile + rm -f runtime.inc.tmp2 + grep -v "#define _" runtime.inc.tmp > runtime.inc.tmp2 + for pattern in '_G[a-z]' '_P[a-z]' _Max _Lock _Sig _Trace _MHeap _Num; do \ + grep "#define $$pattern" runtime.inc.tmp >> runtime.inc.tmp2; \ + done + $(SHELL) $(srcdir)/mvifdiff.sh runtime.inc.tmp2 runtime.inc $(STAMP) $@ runtime_check_GOCFLAGS = -fgo-compiling-runtime runtime/check: $(CHECK_DEPS)
[gomp4] gimple prettyprint
Committed this port from trunk to gomp4 nathan Index: ChangeLog.gomp === --- ChangeLog.gomp (revision 240724) +++ ChangeLog.gomp (working copy) @@ -1,3 +1,8 @@ +2016-10-03 Nathan Sidwell + + * gimple-pretty-print.c (dump_gimple_call_args): Simplify "' " + printing. + 2016-10-02 Chung-Lin Tang PR fortran/77371 Index: gimple-pretty-print.c === --- gimple-pretty-print.c (revision 240724) +++ gimple-pretty-print.c (working copy) @@ -629,26 +629,21 @@ dump_gimple_call_args (pretty_printer *b { i++; pp_string (buffer, enums[v]); - if (i < gimple_call_num_args (gs)) - pp_string (buffer, ", "); } } } for (; i < gimple_call_num_args (gs); i++) { - dump_generic_node (buffer, gimple_call_arg (gs, i), 0, flags, false); - if (i < gimple_call_num_args (gs) - 1) + if (i) pp_string (buffer, ", "); + dump_generic_node (buffer, gimple_call_arg (gs, i), 0, flags, false); } if (gimple_call_va_arg_pack_p (gs)) { - if (gimple_call_num_args (gs) > 0) -{ - pp_comma (buffer); - pp_space (buffer); -} + if (i) + pp_string (buffer, ", "); pp_string (buffer, "__builtin_va_arg_pack ()"); }
Re: [PATCH, Fortran] Fix ICE due to comparison between UNION components
On Sun, Oct 2, 2016 at 6:27 PM, Fritz Reese wrote: > All, > > The attached fixes an[other] ICE in the comparison between UNIONs. > This time the ICE is due to a BT_UNION component comparing itself to a > BT_DERIVED component, thus considering their FL_STRUCT and FL_UNION > typenodes to be equal. This is very similar to PR fortran/77782, > except it is an error in the comparison of _components_ from > gfc_compare_types, instead of an error comparing the _type symbols_ > from gfc_compare_derived. The patch makes sure that BT_UNION compared > to anything other than BT_UNION is _not_ equal, while still comparing > two union components with gfc_compare_union_types. > > Will commit soon with no complaints. Maybe this patch will finally get > type comparison right for unions. > Meant to include a null-guard as part of the patch, see below. Fritz Reese 2016-10-03 Fritz Reese Fix ICE due to comparison between UNION components. gcc/fortran/ * interface.c (gfc_compare_types): Don't compare BT_UNION components until we know they're both UNIONs. * interface.c (gfc_compare_union_types): Guard against empty components. gcc/testsuite/gfortran.dg/ * dec_union_9.f90, dec_union_10.f90: New testcases. union_ice.patch2 Description: Binary data
Re: [RFC] Extend ipa-bitwise-cp with pointer alignment propagation
On 22 September 2016 at 17:26, Jan Hubicka wrote: >> Hi, >> The attached patch tries to extend ipa bits propagation to handle >> pointer alignment propagation. >> The patch just disables ipa-cp-alignment pass, I suppose we want to >> eventually remove it ? > > Yes, can you please verify that alignments it computes are monotonously > worse than those your new code computes and include the removal in the > next iteration of the patch? >> >> Bootstrap+tested on x86_64-unknown-linux-gnu. >> Cross-tested on arm*-*-*, aarch64*-*-*. >> Does the patch look OK ? >> >> Thanks, >> Prathamesh >> @@ -2258,8 +2271,8 @@ propagate_constants_accross_call (struct cgraph_edge >> *cs) >>&dest_plats->itself); >> ret |= propagate_context_accross_jump_function (cs, jump_func, i, >> &dest_plats->ctxlat); >> - ret |= propagate_alignment_accross_jump_function (cs, jump_func, >> - >> &dest_plats->alignment); >> +// ret |= propagate_alignment_accross_jump_function (cs, jump_func, >> +// >> &dest_plats->alignment); > > obviously we do not want commented out ocde.. > >> ret |= propagate_bits_accross_jump_function (cs, i, jump_func, >> >> &dest_plats->bits_lattice); >> ret |= propagate_aggs_accross_jump_function (cs, jump_func, >> diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c >> index 1629781..5cee27b 100644 >> --- a/gcc/ipa-prop.c >> +++ b/gcc/ipa-prop.c >> @@ -1701,6 +1701,16 @@ ipa_compute_jump_functions_for_edge (struct >> ipa_func_body_info *fbi, >> jfunc->bits.mask = 0; >> } >> } >> + else if (POINTER_TYPE_P (TREE_TYPE (arg))) >> + { >> + unsigned HOST_WIDE_INT bitpos; >> + unsigned align; >> + >> + jfunc->bits.known = true; >> + get_pointer_alignment_1 (arg, &align, &bitpos); >> + jfunc->bits.mask = wi::mask(TYPE_PRECISION (TREE_TYPE >> (arg)), false).and_not (align / BITS_PER_UNIT - 1); > > ... 
and long lines :) > >> + jfunc->bits.value = bitpos / BITS_PER_UNIT; >> + } >>else >> gcc_assert (!jfunc->bits.known); >> >> @@ -5534,7 +5544,7 @@ ipcp_update_bits (struct cgraph_node *node) >>next_parm = DECL_CHAIN (parm); >> >>if (!bits[i].known >> - || !INTEGRAL_TYPE_P (TREE_TYPE (parm)) >> + || !(INTEGRAL_TYPE_P (TREE_TYPE (parm)) || POINTER_TYPE_P (TREE_TYPE >> (parm))) > > I suppose eventually we may want to enable other types, too. > It does even make sense to propagate this on aggregates, but definitly on > vectors and complex numbers. > > Otherwise the patch seems fine to me (modulo Richard's comments) Hi, Sorry for late response, I was travelling. I tried to verify the alignments are monotonously worse with the attached patch (verify.diff), which asserts that alignment lattice is not better than bits lattice during each propagation step in propagate_constants_accross_call(). Does that look OK ? ipa-cp-alignment has better alignments than ipa-bit-cp in following cases: a) ipa_get_type() returns NULL: ipa-bits-cp sets lattice to bottom if ipa_get_type (param) returns NULL, for instance in case of K&R function, while ipa-cp-alignment doesn't look at param types, and can propagate alignments. The following assert: if (bits_lattice.bottom_p ()) gcc_assert (align_lattice.bottom_p()) triggered for 400.perlbench, 403.gcc, 456.hmmer and 481.wrf due to ipa_get_type() returning NULL. I am not really sure how to handle this case, since we need to know parameter's type during bits propagation for obtaining precision. b) This happens for attached test-case (test.i), which is a reduced (and slightly modified) test-case from 458.sjeng. Bits propagation sets lattice to bottom, while alignment propagation propagates . In bits_lattice::meet_with_1 m_mask = other_mask = 0x0fff0 m_value = 0x7 other_value = 0x8 In this case meet operation sets m_mask to: (m_mask | mask) | (m_value ^ other_value) = 0x0fff0 | (0xf) == 0x0 and hence the bits lattice is set to bottom. 
I suppose it doesn't matter for this case if bits propagation sets lattice to bottom, since propagating isn't really useful ? The attached patch (alignprop-4.diff) removes ipa-cp-alignment, and checks for misalign against old_misalgin and prints message in the dump file if they mismatch. Testing in progress. Thanks, Prathamesh > Honza diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c index 77da489..fee530e 100644 --- a/gcc/ipa-cp.c +++ b/gcc/ipa-cp.c @@ -1910,6 +1910,28 @@ propagate_context_accross_jump_function (cgraph_edge *cs, return ret; } +static void +verify_align_worse_p (ipcp_param_lattices *dest_plats) +{ + ipcp_alignment_lattice align_lattice = dest_plats->alignment; + ipcp_bits_lattice bits_lattice = dest_plat
[gomp4] tile pre patch
I've committed this to gomp4. It gets a few tile-related things out of the way.

1) We were asserting we never saw tile clauses in a few places. That'll change soon, and the processing required of them is nothing, so just accept them. We don't need to gimplify the operands, as they have to be INTEGER_CSTs.

2) Broke out OACC_DIM_{SIZE,POS} internal function generation to a worker function, as I need that soon.

3) Remove a stale comment and note OACC_DIM_POS's non-constness might be overly conservative.

nathan

2016-10-03  Nathan Sidwell

    * gimplify.c (gimplify_scan_omp_clauses): No special handling for OMP_CLAUSE_TILE.
    * omp-low.c (scan_sharing_clauses): Allow OMP_CLAUSE_TILE.
    (expand_oacc_for): Remove out of date note. Fix whitespace.
    (oacc_dim_call): New.
    (oacc_thread_numbers): Use it.
    (oacc_loop_fixed_partitions): Dump partitioning.
    * tree-nested.c (convert_nonlocal_omp_clauses): Allow OMP_CLAUSE_TILE.
    * internal-fn.def (GOACC_DIM_POS): Comment may be overly conservative.

Index: gimplify.c
===
--- gimplify.c (revision 240724)
+++ gimplify.c (working copy)
@@ -7555,16 +7555,6 @@ gimplify_scan_omp_clauses (tree *list_p,
           remove = true;
           break;

-        case OMP_CLAUSE_TILE:
-          for (tree list = OMP_CLAUSE_TILE_LIST (c); !remove && list;
-               list = TREE_CHAIN (list))
-            {
-              if (gimplify_expr (&TREE_VALUE (list), pre_p, NULL,
-                                 is_gimple_val, fb_rvalue) == GS_ERROR)
-                remove = true;
-            }
-          break;
-
         case OMP_CLAUSE_DEVICE_RESIDENT:
           remove = true;
           break;
@@ -7573,6 +7563,7 @@ gimplify_scan_omp_clauses (tree *list_p,
         case OMP_CLAUSE_ORDERED:
         case OMP_CLAUSE_UNTIED:
         case OMP_CLAUSE_COLLAPSE:
+        case OMP_CLAUSE_TILE:
         case OMP_CLAUSE_AUTO:
         case OMP_CLAUSE_SEQ:
         case OMP_CLAUSE_INDEPENDENT:
Index: internal-fn.def
===
--- internal-fn.def (revision 240724)
+++ internal-fn.def (working copy)
@@ -175,7 +175,7 @@ DEF_INTERNAL_FN (UNIQUE, ECF_NOTHROW, NU
    dimension.
DIM_POS is pure (and not const) so that it isn't thought to clobber memory and can be gcse'd within a single parallel region, but not across FORK/JOIN boundaries. They take a - single INTEGER_CST argument. */ + single INTEGER_CST argument. This might be overly conservative. */ DEF_INTERNAL_FN (GOACC_DIM_SIZE, ECF_CONST | ECF_NOTHROW | ECF_LEAF, ".") DEF_INTERNAL_FN (GOACC_DIM_POS, ECF_PURE | ECF_NOTHROW | ECF_LEAF, ".") Index: omp-low.c === --- omp-low.c (revision 240724) +++ omp-low.c (working copy) @@ -2221,6 +2221,7 @@ scan_sharing_clauses (tree clauses, omp_ case OMP_CLAUSE_INDEPENDENT: case OMP_CLAUSE_AUTO: case OMP_CLAUSE_SEQ: + case OMP_CLAUSE_TILE: case OMP_CLAUSE_DEVICE_TYPE: break; @@ -2234,7 +2235,6 @@ scan_sharing_clauses (tree clauses, omp_ case OMP_CLAUSE_BIND: case OMP_CLAUSE_DEVICE_RESIDENT: case OMP_CLAUSE_NOHOST: - case OMP_CLAUSE_TILE: case OMP_CLAUSE__CACHE_: default: gcc_unreachable (); @@ -2395,6 +2395,7 @@ scan_sharing_clauses (tree clauses, omp_ case OMP_CLAUSE_INDEPENDENT: case OMP_CLAUSE_AUTO: case OMP_CLAUSE_SEQ: + case OMP_CLAUSE_TILE: case OMP_CLAUSE__GRIDDIM_: case OMP_CLAUSE_DEVICE_TYPE: break; @@ -2402,7 +2403,6 @@ scan_sharing_clauses (tree clauses, omp_ case OMP_CLAUSE_BIND: case OMP_CLAUSE_DEVICE_RESIDENT: case OMP_CLAUSE_NOHOST: - case OMP_CLAUSE_TILE: case OMP_CLAUSE__CACHE_: default: gcc_unreachable (); @@ -11244,11 +11244,7 @@ expand_omp_taskloop_for_inner (struct om [incoming] V = B + ((range -/+ 1) / S +/- 1) * S [*] - [*] Needed if V live at end of loop - - Note: CHUNKING & GWV mask are specified explicitly here. This is a - transition, and will be specified by a more general mechanism shortly. - */ + [*] Needed if V live at end of loop. 
*/ static void expand_oacc_for (struct omp_region *region, struct omp_for_data *fd) @@ -11357,7 +11353,6 @@ expand_oacc_for (struct omp_region *regi ass = gimple_build_assign (fd->loop.n2, total); gsi_insert_before (&gsi, ass, GSI_SAME_STMT); } - } tree b = fd->loop.n1; @@ -18906,6 +18901,23 @@ omp_finish_file (void) } } +/* Call dim_pos (POS == true) or dim_size (POS == false) builtins for + axis DIM. Return a tmp var holding the result. */ + +static tree +oacc_dim_call (bool pos, int dim, gimple_seq *seq) +{ + tree arg = build_int_cst (unsigned_type_node, dim); + tree size = create_tmp_var (integer_type_node); + enum internal_fn fn = pos ? IFN_GOACC_DIM_POS : IFN_GOACC_DIM_SIZE; + gimple *call = gimple_build_call_internal (fn, 1, arg); + + gimple_call_set_lhs (call, size); + gimple_seq_add_stmt (seq, call); + + return size; +} + /* Find the number of threads (POS = false), or thread number (POS = true) for an OpenACC region partitioned as MASK. Setup code required
[EVRP] Fold stmts with vrp_fold_stmt
Hi,

This patch improves Early VRP by folding stmts using vrp_fold_stmt, as is done in ssa_propagate for VRP. I have also changed EVRP to handle POINTER_TYPE_P. I will send follow-up patches to use this in IPA-VRP.

Bootstrapped and regression tested on x86_64-linux-gnu with no new regressions. Is this OK for trunk?

Thanks,
Kugan

gcc/testsuite/ChangeLog:

2016-10-03  Kugan Vivekanandarajah

    * gcc.dg/pr68217.c: Adjust testcase as more cases are now handled in evrp.
    * gcc.dg/predict-1.c: Likewise.
    * gcc.dg/predict-9.c: Likewise.
    * gcc.dg/tree-ssa/pr20318.c: Likewise.
    * gcc.dg/tree-ssa/pr21001.c: Likewise.
    * gcc.dg/tree-ssa/pr21090.c: Likewise.
    * gcc.dg/tree-ssa/pr21294.c: Likewise.
    * gcc.dg/tree-ssa/pr21559.c: Likewise.
    * gcc.dg/tree-ssa/pr21563.c: Likewise.
    * gcc.dg/tree-ssa/pr23744.c: Likewise.
    * gcc.dg/tree-ssa/pr25382.c: Likewise.
    * gcc.dg/tree-ssa/pr61839_1.c: Likewise.
    * gcc.dg/tree-ssa/pr68431.c: Likewise.
    * gcc.dg/tree-ssa/vrp03.c: Likewise.
    * gcc.dg/tree-ssa/vrp07.c: Likewise.
    * gcc.dg/tree-ssa/vrp09.c: Likewise.
    * gcc.dg/tree-ssa/vrp17.c: Likewise.
    * gcc.dg/tree-ssa/vrp18.c: Likewise.
    * gcc.dg/tree-ssa/vrp19.c: Likewise.
    * gcc.dg/tree-ssa/vrp20.c: Likewise.
    * gcc.dg/tree-ssa/vrp23.c: Likewise.
    * gcc.dg/tree-ssa/vrp24.c: Likewise.
    * gcc.dg/tree-ssa/vrp58.c: Likewise.
    * gcc.dg/tree-ssa/vrp92.c: Likewise.
    * gcc.dg/tree-ssa/vrp98.c: Likewise.
    * gcc.dg/vrp-min-max-1.c: Likewise.

gcc/ChangeLog:

2016-10-03  Kugan Vivekanandarajah

    * tree-vrp.c (evrp_dom_walker::before_dom_children): Handle POINTER_TYPE_P. Also fold stmts with vrp_fold_stmt.
>From 4bb16e7d01674461a47e6b6488b04fb1907234ea Mon Sep 17 00:00:00 2001 From: Kugan Vivekanandarajah Date: Mon, 3 Oct 2016 06:12:05 +1100 Subject: [PATCH 1/5] Fold stmts using vrp_fold in evrp --- gcc/testsuite/gcc.dg/pr68217.c| 4 ++-- gcc/testsuite/gcc.dg/predict-1.c | 2 +- gcc/testsuite/gcc.dg/predict-9.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/pr20318.c | 4 ++-- gcc/testsuite/gcc.dg/tree-ssa/pr21001.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/pr21090.c | 4 ++-- gcc/testsuite/gcc.dg/tree-ssa/pr21294.c | 4 ++-- gcc/testsuite/gcc.dg/tree-ssa/pr21559.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/pr21563.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/pr23744.c | 4 ++-- gcc/testsuite/gcc.dg/tree-ssa/pr25382.c | 4 ++-- gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/pr68431.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/vrp03.c | 4 ++-- gcc/testsuite/gcc.dg/tree-ssa/vrp07.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/vrp09.c | 4 ++-- gcc/testsuite/gcc.dg/tree-ssa/vrp17.c | 4 ++-- gcc/testsuite/gcc.dg/tree-ssa/vrp18.c | 4 ++-- gcc/testsuite/gcc.dg/tree-ssa/vrp19.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/vrp20.c | 6 +++--- gcc/testsuite/gcc.dg/tree-ssa/vrp23.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/vrp24.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/vrp58.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/vrp92.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/vrp98.c | 2 +- gcc/testsuite/gcc.dg/vrp-min-max-1.c | 2 +- gcc/tree-vrp.c| 27 --- 27 files changed, 62 insertions(+), 49 deletions(-) diff --git a/gcc/testsuite/gcc.dg/pr68217.c b/gcc/testsuite/gcc.dg/pr68217.c index 426a99a..fbe4627 100644 --- a/gcc/testsuite/gcc.dg/pr68217.c +++ b/gcc/testsuite/gcc.dg/pr68217.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-vrp1" } */ +/* { dg-options "-O2 -fdump-tree-evrp" } */ int foo (void) { @@ -11,4 +11,4 @@ int foo (void) return 0; } -/* { dg-final { scan-tree-dump "\\\[-INF, 0\\\]" "vrp1" } } */ +/* { dg-final { scan-tree-dump "\\\[-INF, 0\\\]" "evrp" } } */ diff --git 
a/gcc/testsuite/gcc.dg/predict-1.c b/gcc/testsuite/gcc.dg/predict-1.c index 10d62ba..0d14802 100644 --- a/gcc/testsuite/gcc.dg/predict-1.c +++ b/gcc/testsuite/gcc.dg/predict-1.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-profile_estimate" } */ +/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-profile_estimate" } */ extern int global; diff --git a/gcc/testsuite/gcc.dg/predict-9.c b/gcc/testsuite/gcc.dg/predict-9.c index 196e31c..8833cb3 100644 --- a/gcc/testsuite/gcc.dg/predict-9.c +++ b/gcc/testsuite/gcc.dg/predict-9.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-profile_estimate" } */ +/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-profile_estimate" } */ extern int global; extern int global2; diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr20318.c b/gcc/testsuite/gcc.dg/tree-ssa/pr20318.c index 41f569e..11d4f0d 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/pr20318.c +++ b/gcc/testsuite/g
[PR tree-optimization/71550 ] Drop cached loop iteration information as needed due to threading
As noted in BZs 71550 and 71403 (and possibly others, I'm going to have to do some searching). Jump threading can sometimes fuse two loops, in the process creating an irreducible loop and invalidating the cached iteration information. The no longer valid cached iteration information can result in unrolling doing some unpleasant transformations and generating incorrect code. I believe it was Jan that pointed me at vect_free_loop_info_assumption. It's somewhat poorly named, but does exactly what we need. I look at renaming it and putting it elsewhere, but it seems to straddle scev, vectorization, the generic loop infrastructure, etc. So I just left it as-is. Bootstrapped and regression tested on x86_64-linux-gnu. Also tested by reverting the changes on the trunk which make 71550 latent, adding this patch and verifying 71550 works correctly. Installed on the trunk. Jeff commit 88cd085912752f971aa2e6aee2ed2d05dd2c5ca7 Author: law Date: Mon Oct 3 19:28:24 2016 + PR tree-optimization/71550 PR tree-optimization/71403 * tree-ssa-threadbackward.c: Include tree-vectorizer.h (profitable_jump_thread_path): Also return boolean indicating if the realized path will create an irreducible loop. Remove loop depth tests from 71403. (fsm_find_control_statement_thread_paths): Remove loop depth tests from 71403. If threading will create an irreducible loop, then throw away loop iteration and related information. PR tree-optimization/71550 PR tree-optimization/71403 * gcc.c-torture/execute/pr71550.c: New test. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@240727 138bc75d-0d04-0410-961f-82ee72b054a4 diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 7f4e311..ba56e63 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,15 @@ +2016-10-03 Jeff Law + + PR tree-optimization/71550 + PR tree-optimization/71403 + * tree-ssa-threadbackward.c: Include tree-vectorizer.h + (profitable_jump_thread_path): Also return boolean indicating if + the realized path will create an irreducible loop. 
+ Remove loop depth tests from 71403. + (fsm_find_control_statement_thread_paths): Remove loop depth tests + from 71403. If threading will create an irreducible loop, then + throw away loop iteration and related information. + 2016-10-03 Uros Bizjak * configure.ac (strict_warn): Merge -Wmissing-format-attribute and diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index d0ed6a6..9e62464 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,9 @@ +2016-09-26 Jeff Law + + PR tree-optimization/71550 + PR tree-optimization/71403 + * gcc.c-torture/execute/pr71550.c: New test. + 2016-10-03 Senthil Kumar Selvaraj * gcc.target/avr/torture/builtins-error.c: Add -ffat-lto-objects diff --git a/gcc/testsuite/gcc.c-torture/execute/pr71550.c b/gcc/testsuite/gcc.c-torture/execute/pr71550.c new file mode 100644 index 000..8d1ecda --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr71550.c @@ -0,0 +1,26 @@ + +extern void exit (int); + +int a = 3, b, c, f, g, h; +unsigned d; +char *e; + +int +main () +{ + for (; a; a--) +{ + int i; + if (h && i) + __builtin_printf ("%d%d", c, f); + i = 0; + for (; i < 2; i++) + if (g) + for (; d < 10; d++) + b = *e; + i = 0; + for (; i < 1; i++) + ; +} + exit (0); +} diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c index 6b522ad..fd7d855 100644 --- a/gcc/tree-ssa-threadbackward.c +++ b/gcc/tree-ssa-threadbackward.c @@ -36,6 +36,7 @@ along with GCC; see the file COPYING3. 
If not see #include "gimple-ssa.h" #include "tree-phinodes.h" #include "tree-inline.h" +#include "tree-vectorizer.h" static int max_threaded_paths; @@ -110,7 +111,8 @@ fsm_find_thread_path (basic_block start_bb, basic_block end_bb, static edge profitable_jump_thread_path (vec *&path, -basic_block bbi, tree name, tree arg, bool speed_p) +basic_block bbi, tree name, tree arg, bool speed_p, +bool *creates_irreducible_loop) { /* Note BBI is not in the path yet, hence the +1 in the test below to make sure BBI is accounted for in the path length test. */ @@ -296,12 +298,12 @@ profitable_jump_thread_path (vec *&path, return NULL; } - bool creates_irreducible_loop = false; + *creates_irreducible_loop = false; if (threaded_through_latch && loop == taken_edge->dest->loop_father && (determine_bb_domination_status (loop, taken_edge->dest) == DOMST_NONDOMINATING)) -creates_irreducible_loop = true; +*creates_irreducible_loop = true; if (path_crosses_loops) { @@ -343,7 +345,7 @@ profitable_jump_thread_path (vec *&path,
[ipa-cp] add space in dump message
Committed as obvious (r240730). Thanks, Prathamesh 2016-10-03 Prathamesh Kulkarni * ipa-cp.c (propagate_bits_accross_jump_function): Introduce space between callee name and param in dump message in call to fprintf. diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c index 88baf69..1dc5cb6 100644 --- a/gcc/ipa-cp.c +++ b/gcc/ipa-cp.c @@ -1775,7 +1775,7 @@ propagate_bits_accross_jump_function (cgraph_edge *cs, int idx, ipa_jump_func *j { if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, "Setting dest_lattice to bottom, because" - "param %i type is NULL for %s\n", idx, + " param %i type is NULL for %s\n", idx, cs->callee->name ()); return dest_lattice->set_to_bottom ();
[ipa-prop] set m_vr and bits to NULL in ipcp_transform_function
Bootstrap+test in progress on x86_64-unknown-linux-gnu. OK to commit if passes ? Thanks, Prathamesh diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c index 5ed9bbf..d71ffcf 100644 --- a/gcc/ipa-prop.c +++ b/gcc/ipa-prop.c @@ -5667,6 +5667,9 @@ ipcp_transform_function (struct cgraph_node *node) fbi.bb_infos.release (); free_dominance_info (CDI_DOMINATORS); (*ipcp_transformations)[node->uid].agg_values = NULL; + (*ipcp_transformations)[node->uid].bits = NULL; + (*ipcp_transformations)[node->uid].m_vr = NULL; + descriptors.release (); if (!something_changed)
[PATCH] fix PR c++/77804 - ICE on placement VLA new
The attached patch removes an assumption from the implementation of the -Wplacement-new warning that the size of the array type enclosed in parentheses and accepted by G++ as an extension is constant. The assumption causes an ICE in 6.2.0 and 7.0.

Is the patch good to commit to both 7.0 and the 6 branch?

Thanks
Martin

PR c++/77804 - Internal compiler error on incorrect initialization of new-d array

gcc/cp/ChangeLog:
2016-10-03  Martin Sebor

	PR c++/77804
	* init.c (warn_placement_new_too_small): Avoid assuming an array type
	has a constant size.

gcc/testsuite/ChangeLog:
2016-10-03  Martin Sebor

	PR c++/77804
	* g++.dg/warn/Wplacement-new-size-4.C: New test.

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index 798de08..30957f1 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -2504,7 +2504,7 @@ warn_placement_new_too_small (tree type, tree nelts, tree size, tree oper)
 	  && warn_placement_new < 2)
 	return;
     }
-  
+
   /* The size of the buffer can only be adjusted down but not up.  */
   gcc_checking_assert (0 <= adjust);
 
@@ -2526,8 +2526,13 @@ warn_placement_new_too_small (tree type, tree nelts, tree size, tree oper)
   else if (nelts && CONSTANT_CLASS_P (nelts))
       bytes_need = tree_to_uhwi (nelts)
 	* tree_to_uhwi (TYPE_SIZE_UNIT (type));
-  else
+  else if (tree_fits_uhwi_p (TYPE_SIZE_UNIT (type)))
     bytes_need = tree_to_uhwi (TYPE_SIZE_UNIT (type));
+  else
+    {
+      /* The type is a VLA.  */
+      return;
+    }
 
   if (bytes_avail < bytes_need)
     {
diff --git a/gcc/testsuite/g++.dg/warn/Wplacement-new-size-4.C b/gcc/testsuite/g++.dg/warn/Wplacement-new-size-4.C
new file mode 100644
index 000..da9b1ab
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wplacement-new-size-4.C
@@ -0,0 +1,14 @@
+// PR c++/77804 - Internal compiler error on incorrect initialization of
+// new-d array
+// { dg-do compile }
+// { dg-additional-options "-Wplacement-new -Wvla -Wno-error=vla" }
+
+void* operator new[] (__SIZE_TYPE__ n, void *p) { return p; }
+
+int main()
+{
+    char buf[256];
+    unsigned n = 10;
+    int* p = new (buf) (int[n]); // { dg-warning "non-constant array new length must be specified without parentheses around the type-id" }
+    // { dg-warning "ISO C\\+\\+ forbids variable length array" "vla warning" { target *-*-* } .-1 }
+}
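
The shape of the fix is a guard: only compare sizes when the required size is a compile-time constant, and otherwise give up quietly instead of asserting. A minimal sketch of that control flow follows, assuming a hypothetical type_size record in place of GCC's trees. The names type_size, constant_size, variable_size, placement_new_too_small and self_check are all illustrative and do not appear in init.c.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a type's size: is_constant is false when the
// size is only known at run time, as for a VLA.
struct type_size
{
  bool is_constant;
  std::size_t bytes;   // meaningful only when is_constant is true
};

type_size constant_size (std::size_t n) { return { true, n }; }
type_size variable_size () { return { false, 0 }; }

// Returns true when a "buffer too small" diagnostic would be justified.
// A non-constant size leaves nothing to compare against, so the function
// declines to diagnose rather than reading a value that is not there.
bool placement_new_too_small (std::size_t bytes_avail, type_size need)
{
  if (!need.is_constant)
    return false;
  return bytes_avail < need.bytes;
}

// Small self-check covering the constant and non-constant cases.
void self_check ()
{
  assert (placement_new_too_small (4, constant_size (8)));
  assert (!placement_new_too_small (256, constant_size (40)));
  assert (!placement_new_too_small (4, variable_size ()));
}
```

The third check corresponds to the new test case: with a run-time bound there is no size to compare, so no placement-new size warning is attempted.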
Re: [PATCH] Fix libstdc++ versioned namespace build
On 03/10/16 15:41 +0100, Jonathan Wakely wrote:

The versioned namespace build has been broken on all branches for some time. It's due to new code that doesn't use the namespace macros in the right places. This fixes all issues.

Rather than declaring the std::experimental::* namespaces in I've added a new file that declares them and is only included by LFTS headers. That allows the new test to pass, which verifies that the std::experimental namespace doesn't exist when no TS headers are included.

	PR libstdc++/68323
	PR libstdc++/77794
	* config/abi/pre/gnu-versioned-namespace.ver: Add exports for
	__cxa_thread_atexit and __gnu_cxx::__freeres.
	* include/Makefile.am: Add
	* include/Makefile.in: Regenerate.
	* include/bits/basic_string.h: Fix nesting of versioned namespaces.
	* include/bits/c++config: Declare versioned namespaces for literals.
	* include/bits/regex.h (basic_regex, match_results): Add workarounds
	for PR c++/59256.
	* include/bits/uniform_int_dist.h: Fix nesting of versioned namespace.
	* include/std/chrono: Likewise.
	* include/std/complex: Likewise.
	* include/std/string_view: Likewise.
	* include/std/variant: Likewise. Add workaround for PR c++/59256.
	* include/experimental/bits/fs_fwd.h: Declare versioned namespace.
	* include/experimental/bits/lfts_config.h: Declare versioned
	namespaces.
	* include/experimental/algorithm: Include .
	* include/experimental/any: Likewise.
	* include/experimental/bits/erase_if.h: Likewise.
	* include/experimental/chrono: Likewise.
	* include/experimental/functional: Likewise.
	* include/experimental/memory_resource: Likewise.
	* include/experimental/optional: Likewise.
	* include/experimental/propagate_const: Likewise.
	* include/experimental/random: Likewise.
	* include/experimental/ratio: Likewise.
	* include/experimental/system_error: Likewise.
	* include/experimental/tuple: Likewise.
	* include/experimental/type_traits: Likewise.
	* include/experimental/utility: Likewise.
	* include/experimental/string_view: Likewise. Fix nesting of
	versioned namespaces.
	* include/experimental/bits/string_view.tcc: Reopen inline namespace
	for non-inline function definitions.
	* testsuite/17_intro/using_namespace_std_exp_neg.cc: New test.
	* testsuite/20_util/duration/literals/range.cc: Adjust dg-error line.
	* testsuite/experimental/any/misc/any_cast_neg.cc: Likewise.
	* testsuite/experimental/propagate_const/assignment/move_neg.cc:
	Likewise.
	* testsuite/experimental/propagate_const/cons/move_neg.cc: Likewise.
	* testsuite/experimental/propagate_const/requirements2.cc: Likewise.
	* testsuite/experimental/propagate_const/requirements3.cc: Likewise.
	* testsuite/experimental/propagate_const/requirements4.cc: Likewise.
	* testsuite/experimental/propagate_const/requirements5.cc: Likewise.
	* testsuite/ext/profile/mutex_extensions_neg.cc: Likewise.

Tested x86_64-linux, with --enable-symvers=gnu-versioned-namespace and --enable-symvers=gnu, on trunk and gcc-6 and gcc-5 branches.

The only failures are in synopsis.cc tests which expect to be able to redeclare names in namespace std (which is ambiguous if they're really declared in std::__7) or in tests that use scan-assembler or GDB and the expected strings are different due to the __7 namespace. I will probably add an effective target for the versioned namespace so we can disable those tests when they're going to fail.

Committed to trunk and gcc-6 and gcc-5 branches.

It appears that I failed to squash my work-in-progress commits on the gcc-6-branch, sorry. The following commits (r240715-r240718) should have been the same commit as r240719, and won't build for the default config (they work for --enable-symvers=gnu-versioned-namespace, but commit r240719 has a fix to make it also build for --enable-symvers=gnu), which is why it all needed to be squashed into a single commit.

Apologies for messing up the branch.

commit b78f70a1eeac5da59f977fa58c332490e05c14b5
Author: redi
Date: Mon Oct 3 14:36:18 2016 +

    add exports

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-6-branch@240718 138bc75d-0d04-0410-961f-82ee72b054a4

commit e5e83353c99c7cabf8d7cdcdf7ee750e5ec7f129
Author: redi
Date: Mon Oct 3 14:36:13 2016 +

    Fix misuse of versioned namespace for LFTS

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-6-branch@240717 138bc75d-0d04-0410-961f-82ee72b054a4

commit a2121229a7e8405c7722a8b872d6ad8981c4b2d4
Author: redi
Date: Mon Oct 3 14:36:06 2016 +

    Declare inline namespaces for filesystem

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-6-branch@240716 138bc75d-0d04-0410-961f-82ee72b054a4

commit 0215eefdfd5d872
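
For readers unfamiliar with the mechanism behind the synopsis.cc failures mentioned above, here is a small sketch of how an inline (versioned) namespace behaves. It assumes a made-up namespace lib with an inline namespace v7 standing in for std and std::__7. None of these names come from libstdc++, and the sketch does not reproduce the library's namespace macros.

```cpp
#include <cassert>
#include <type_traits>

namespace lib
{
  // Hypothetical stand-in for a versioned namespace such as std::__7.
  inline namespace v7
  {
    struct widget { int id; };
    int twice (int x) { return 2 * x; }
  }
}

// Members of the inline namespace are found through the enclosing one,
// and both spellings name the same entities.
static_assert (std::is_same<lib::widget, lib::v7::widget>::value,
               "same type under either name");

// By contrast, writing `namespace lib { struct widget; }` afterwards would
// declare a second, distinct type directly in lib, and a later use of
// lib::widget would then be ambiguous between the two.

// Small self-check: the function is reachable under both names.
void self_check ()
{
  assert (lib::twice (21) == 42);
  assert (&lib::twice == &lib::v7::twice);
}
```

This is why a test that redeclares library names directly in the outer namespace can compile in the default configuration yet fail once the names really live in an inline versioned namespace.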
[gomp4] map the '*' tile argument onto integer_zero_node in fortran
As the subject states, this patch maps the '*' tile clause arguments onto integer_zero_node. Previously, the Fortran FE was mapping them onto -1. This patch should bring the clause parsing in Fortran on par with the C and C++ FEs.

This patch has been applied to gomp-4_0-branch.

Cesar

2016-10-03  Cesar Philippidis

	gcc/fortran/
	* openmp.c (resolve_oacc_loop_blocks): Use integer zero to
	represent the '*' tile argument.

	gcc/testsuite/
	* gfortran.dg/goacc/tile-lowering.f95: New test.

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 399b5d1..df489ba 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -5183,11 +5183,11 @@ resolve_oacc_loop_blocks (gfc_code *code)
 	  if (el->expr == NULL)
 	    {
 	      /* NULL expressions are used to represent '*' arguments.
-		 Convert those to a -1 expressions.  */
+		 Convert those to a 0 expressions.  */
 	      el->expr = gfc_get_constant_expr (BT_INTEGER,
 						gfc_default_integer_kind,
 						&code->loc);
-	      mpz_set_si (el->expr->value.integer, -1);
+	      mpz_set_si (el->expr->value.integer, 0);
 	    }
 	  else
 	    {
diff --git a/gcc/testsuite/gfortran.dg/goacc/tile-lowering.f95 b/gcc/testsuite/gfortran.dg/goacc/tile-lowering.f95
new file mode 100644
index 000..b36cdc7
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/tile-lowering.f95
@@ -0,0 +1,85 @@
+! { dg-do compile }
+! { dg-additional-options "-fdump-tree-original" }
+
+subroutine test
+  integer i, j, k
+
+  !$acc parallel
+  !$acc loop tile (1)
+  do i = 1, 10
+  end do
+
+  !$acc loop tile (*)
+  do i = 1, 10
+  end do
+
+  !$acc loop tile (1,2)
+  do i = 1, 10
+     do j = 1, 10
+     end do
+  end do
+
+  !$acc loop tile (*,2)
+  do i = 1, 10
+     do j = 1, 10
+     end do
+  end do
+
+  !$acc loop tile (1,*)
+  do i = 1, 10
+     do j = 1, 10
+     end do
+  end do
+
+  !$acc loop tile (*,*)
+  do i = 1, 10
+     do j = 1, 10
+     end do
+  end do
+
+
+  !$acc loop tile (1,2,3)
+  do i = 1, 10
+     do j = 1, 10
+        do k = 1, 10
+        end do
+     end do
+  end do
+
+  !$acc loop tile (*,2,3)
+  do i = 1, 10
+     do j = 1, 10
+        do k = 1, 10
+        end do
+     end do
+  end do
+
+  !$acc loop tile (1,*,3)
+  do i = 1, 10
+     do j = 1, 10
+        do k = 1, 10
+        end do
+     end do
+  end do
+
+  !$acc loop tile (1,2,*)
+  do i = 1, 10
+     do j = 1, 10
+        do k = 1, 10
+        end do
+     end do
+  end do
+  !$acc end parallel
+end subroutine test
+
+! { dg-final { scan-tree-dump-times "tile\\(1\\)" 1 "original" } }
+! { dg-final { scan-tree-dump-times "tile\\(0\\)" 1 "original" } }
+! { dg-final { scan-tree-dump-times "tile\\(1, 2\\)" 1 "original" } }
+! { dg-final { scan-tree-dump-times "tile\\(0, 2\\)" 1 "original" } }
+! { dg-final { scan-tree-dump-times "tile\\(1, 0\\)" 1 "original" } }
+! { dg-final { scan-tree-dump-times "tile\\(0, 0\\)" 1 "original" } }
+! { dg-final { scan-tree-dump-times "tile\\(1, 2, 3\\)" 1 "original" } }
+! { dg-final { scan-tree-dump-times "tile\\(0, 2, 3\\)" 1 "original" } }
+! { dg-final { scan-tree-dump-times "tile\\(1, 0, 3\\)" 1 "original" } }
+! { dg-final { scan-tree-dump-times "tile\\(1, 2, 0\\)" 1 "original" } }
+
Re: Re: [PATCH, OpenACC, Fortran] Fix PR77371, ICE on allocatable
On 10/03/2016 07:59 AM, Jakub Jelinek wrote:

> with -fopenmp. The var is actually properly allocatable in the latter case,
> while it is not with your patch on the first testcase, you just copy over the
> host pointer, that
> is definitely not going to work on non-shared memory offloading.

I think that's the expected behavior in OpenACC. Basically, unless pointers have explicit data clauses with subarray arguments, the compiler is supposed to treat those pointers as "scalars" and not remap the contents of the pointer.

Chung-Lin, maybe this issue with allocatable data, along with a different voice, will persuade the OpenACC technical committee to update the implicit data mapping behavior of pointers. Can you raise this issue with the OpenACC technical committee?

> There is nothing special about references that use POINTER_TYPE as opposed
> to REFERENCE_TYPE. So, please first get this working with firstprivate on
> allocatables and only then start to play with reductions.

I agree something like that would be better. Is OpenMP supposed to implicitly map the allocated data on the accelerator too?

Cesar
Re: [PATCH] Delete GCJ
On 05.09.2016 17:13, Andrew Haley wrote:
> As discussed. I think I should ask a Global reviewer to approve this
> one. For obvious reasons I haven't included the diffs to the deleted
> gcc/java and libjava directories. The whole tree, post GCJ-deletion,
> is at svn+ssh://gcc.gnu.org/svn/gcc/branches/gcj/gcj-deletion-branch
> if anyone would like to try it.
>
> Andrew.
>

This still breaks bootstrap when configured with --enable-objc-gc. The immediate step should be to fix the bootstrap failure; an additional step would be to remove boehm-gc from the GCC sources and make it possible to use an external boehm-gc.

Thanks,
Matthias
[PATCH] define TARGET_PRINTF_POINTER_FORMAT for powerpc-linux (77837)
The attached patch adds definitions of TARGET_PRINTF_POINTER_FORMAT to the rs6000 pair of linux.h and linux64.h headers, analogous to the config/linux.h header. This appears to be necessary since unlike most other back ends, the rs6000 back end doesn't include the latter linux.h.

The patch fixes bug 77837 - missing -Wformat-length warning for %p with null argument on powerpc64.

Thanks
Martin

PR target/77837 - missing -Wformat-length warning for %p with null argument on powerpc64

gcc/ChangeLog:
2016-10-03  Martin Sebor

	PR target/77837
	* config/rs6000/linux.h (TARGET_PRINTF_POINTER_FORMAT): Define.
	* config/rs6000/linux64.h (TARGET_PRINTF_POINTER_FORMAT): Likewise.

diff --git a/gcc/config/rs6000/linux.h b/gcc/config/rs6000/linux.h
index ac9296d..e70fa02 100644
--- a/gcc/config/rs6000/linux.h
+++ b/gcc/config/rs6000/linux.h
@@ -138,3 +138,7 @@
   || (TARGET_GLIBC_MAJOR == 2 && TARGET_GLIBC_MINOR >= 19)
 #define RS6000_GLIBC_ATOMIC_FENV 1
 #endif
+
+/* The format string to which "%p" corresponds.  */
+#undef TARGET_PRINTF_POINTER_FORMAT
+#define TARGET_PRINTF_POINTER_FORMAT gnu_libc_printf_pointer_format
diff --git a/gcc/config/rs6000/linux64.h b/gcc/config/rs6000/linux64.h
index e86b5d5..fa75bce 100644
--- a/gcc/config/rs6000/linux64.h
+++ b/gcc/config/rs6000/linux64.h
@@ -634,3 +634,7 @@ extern int dot_symbols;
   || (TARGET_GLIBC_MAJOR == 2 && TARGET_GLIBC_MINOR >= 19)
 #define RS6000_GLIBC_ATOMIC_FENV 1
 #endif
+
+/* The format string to which "%p" corresponds.  */
+#undef TARGET_PRINTF_POINTER_FORMAT
+#define TARGET_PRINTF_POINTER_FORMAT gnu_libc_printf_pointer_format
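
The #undef / #define pair in the patch is the usual idiom for overriding a macro that selects a hook. Below is a sketch of that general idiom only, not of GCC's actual target-hook machinery. DEMO_POINTER_FORMAT, default_format, glibc_format, pointer_format_name and self_check are hypothetical names, and the returned strings are just labels, not real format strings.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical hook implementations; the returned strings are labels used
// only to show which one was selected.
const char *default_format () { return "default"; }
const char *glibc_format () { return "glibc"; }

// Generic fallback, used when no target-specific header overrides it.
#ifndef DEMO_POINTER_FORMAT
#define DEMO_POINTER_FORMAT default_format
#endif

// What a target-specific header does when it wants a different hook:
// drop any earlier definition, then install its own.
#undef DEMO_POINTER_FORMAT
#define DEMO_POINTER_FORMAT glibc_format

// Code that consumes the hook only ever sees the final definition.
const char *pointer_format_name () { return DEMO_POINTER_FORMAT (); }

// Small self-check: the override, not the fallback, is in effect.
void self_check ()
{
  assert (std::strcmp (pointer_format_name (), "glibc") == 0);
}
```

If the override is never reached, as the message above describes for a back end that does not include the header carrying it, the consumer silently keeps the fallback. That is the kind of gap the patch closes for rs6000.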
Re: [PATCH] define TARGET_PRINTF_POINTER_FORMAT for powerpc-linux (77837)
On Mon, Oct 03, 2016 at 05:30:35PM -0600, Martin Sebor wrote:
> The attached patch adds definitions of TARGET_PRINTF_POINTER_FORMAT
> to the rs6000 pair of linux.h and linux64.h headers, analogous to
> the config/linux.h header. This appears to be necessary since
> unlike most other back ends, the rs6000 back end doesn't include
> the latter linux.h.
>
> The patch fixes bug 77837 - missing -Wformat-length warning for %p
> with null argument on powerpc64.
>
> Thanks
> Martin

> PR target/77837 - missing -Wformat-length warning for %p with null argument
> on powerpc64
>
> gcc/ChangeLog:
> 2016-10-03 Martin Sebor
>
> PR target/77837
> * config/rs6000/linux.h (TARGET_PRINTF_POINTER_FORMAT): Define.
> * config/rs6000/linux64.h (TARGET_PRINTF_POINTER_FORMAT): Likewise.

Okay for trunk, thanks!

Segher

p.s. You forgot to cc: the maintainers, and the email subject doesn't start with "rs6000" or similar either, I found this mail by accident...
Re: [PATCH] define TARGET_PRINTF_POINTER_FORMAT for powerpc-linux (77837)
On 10/03/2016 07:10 PM, Segher Boessenkool wrote:
> On Mon, Oct 03, 2016 at 05:30:35PM -0600, Martin Sebor wrote:
>> The attached patch adds definitions of TARGET_PRINTF_POINTER_FORMAT
>> to the rs6000 pair of linux.h and linux64.h headers, analogous to
>> the config/linux.h header. This appears to be necessary since
>> unlike most other back ends, the rs6000 back end doesn't include
>> the latter linux.h.
>>
>> The patch fixes bug 77837 - missing -Wformat-length warning for %p
>> with null argument on powerpc64.
>>
>> Thanks
>> Martin
>>
>> PR target/77837 - missing -Wformat-length warning for %p with null argument
>> on powerpc64
>>
>> gcc/ChangeLog:
>> 2016-10-03 Martin Sebor
>>
>> PR target/77837
>> * config/rs6000/linux.h (TARGET_PRINTF_POINTER_FORMAT): Define.
>> * config/rs6000/linux64.h (TARGET_PRINTF_POINTER_FORMAT): Likewise.
>
> Okay for trunk, thanks!
>
> Segher
>
> p.s. You forgot to cc: the maintainers, and the email subject doesn't
> start with "rs6000" or similar either, I found this mail by accident...

Thanks and my bad. I thought Bill was one of them/you. I'll remember to CC you and David in the future. Better yet, with my increasingly volatile memory, I might write a script to do it for me.

Martin
Re: Fix PR tree-optimization/77808, ICE in duplicate_ssa_name_ptr_info, at tree-ssanames.c:630 starting with r240439
On 3 October 2016 at 20:36, Doug Gilmore wrote:
>>From: Christophe Lyon [christophe.l...@linaro.org]
>>Sent: Monday, October 03, 2016 11:23 AM
>>To: Doug Gilmore
>>Cc: gcc-patches@gcc.gnu.org
>>Subject: Re: Fix PR tree-optimization/77808, ICE in
>>duplicate_ssa_name_ptr_info, at tree-ssanames.c:630 starting with r240439
>>
>>On 3 October 2016 at 18:07, Doug Gilmore wrote:
>>>> From: Christophe Lyon [christophe.l...@linaro.org]
>>>> Sent: Monday, October 03, 2016 12:05 AM
>>>> To: Doug Gilmore
>>>> Cc: gcc-patches@gcc.gnu.org
>>>> Subject: Re: Fix PR tree-optimization/77808, ICE in
>>>> duplicate_ssa_name_ptr_info, at tree-ssanames.c:630 starting with r240439
>>>>
>>>> On 2 October 2016 at 23:05, Doug Gilmore wrote:
>>>> > Hi Christophe,
>>>> >
>>>> >> From: Christophe Lyon [christophe.l...@linaro.org]
>>>> >> Sent: Saturday, October 01, 2016 7:57 AM
>>>> >> To: Doug Gilmore
>>>> >> Cc: gcc-patches@gcc.gnu.org
>>>> >> Subject: Re: Fix PR tree-optimization/77808, ICE in
>>>> >> duplicate_ssa_name_ptr_info, at tree-ssanames.c:630 starting with r240439
>>>> >>
>>>> >> Hi Doug,
>>>> >>
>>>> >> ...
>>>> >> I can confirm that your patch fixes the ICE I was seeing.
>>>> >>
>>>> >> However, the new testcase does not pass on low end
>>>> >> architectures:
>>>> >> cc1: warning: -fprefetch-loop-arrays not supported for this target
>>>> >> (try -march switches)
>>>> >>
>>>> >> Can you add a guard?
>>>> >>
>>>> >> Thanks,
>>>> >>
>>>> >> Christophe
>>>> > I updated the test to only run on X86, MIPS and AARCH64. Is that OK?
>>>> >
>>>> I'm afraid not. The ICE occurred on some arm targets. By "low end" I
>>>> meant armv5t for example, as opposed to armv7t.
>>>>
>>>> Is there a suitable effective target?
>>> I'll need to investigate that. BTW, gcc.dg/pr53550.c contains:
>>> /* PR tree-optimization/53550 */
>>> /* { dg-do compile } */
>>> /* { dg-options "-O2 -fprefetch-loop-arrays -w" } */
>>>
>>> int *
>>> foo (int *x)
>>> {
>>>   int *a = x + 10, *b = x, *c = a;
>>>   while (b != c)
>>>     *--c = *b++;
>>>   return x;
>>> }
>>>
>>> Is it also failing on armv5t? I suppose it would.
>>>
>>It doesn't, but that's probably thanks to -w
> Sounds like we don't need add guards then, it is just a matter
> of adding -w to the command line.
>
> Does that work for you?
>

Yes, it does, I verified all the configurations I normally validate.
Adding "-w" to the testcase does the trick.

Thanks,

Christophe

> Thanks,
>
> Doug
>>
>>Christophe
>>
>>> Thanks,
>>>
>>> Doug
>>>> Thanks,
>>>>
>>>> Christophe
>>>> > Thanks,
>>>> >
>>>> > Doug