[PATCH] Fix PR78189
The following fixes an oversight when computing alignment in the vectorizer.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2016-11-07  Richard Biener

	PR tree-optimization/78189
	* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Fix
	alignment computation.

	* g++.dg/torture/pr78189.C: New testcase.

Index: gcc/testsuite/g++.dg/torture/pr78189.C
===================================================================
--- gcc/testsuite/g++.dg/torture/pr78189.C	(revision 0)
+++ gcc/testsuite/g++.dg/torture/pr78189.C	(working copy)
@@ -0,0 +1,41 @@
+/* { dg-do run } */
+/* { dg-additional-options "-ftree-slp-vectorize -fno-vect-cost-model" } */
+
+#include <cstddef>
+
+struct A
+{
+  void * a;
+  void * b;
+};
+
+struct alignas(16) B
+{
+  void * pad;
+  void * misaligned;
+  void * pad2;
+
+  A a;
+
+  void Null();
+};
+
+void B::Null()
+{
+  a.a = nullptr;
+  a.b = nullptr;
+}
+
+void __attribute__((noinline,noclone))
+NullB(void * misalignedPtr)
+{
+  B* b = reinterpret_cast<B*>(reinterpret_cast<char *>(misalignedPtr) - offsetof(B, misaligned));
+  b->Null();
+}
+
+int main()
+{
+  B b;
+  NullB(&b.misaligned);
+  return 0;
+}
diff --git gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 9346cfe..b03cb1e 100644
--- gcc/tree-vect-data-refs.c
+++ gcc/tree-vect-data-refs.c
@@ -773,10 +773,25 @@ vect_compute_data_ref_alignment (struct data_reference *dr)
   base = ref;
   while (handled_component_p (base))
     base = TREE_OPERAND (base, 0);
+  unsigned int base_alignment;
+  unsigned HOST_WIDE_INT base_bitpos;
+  get_object_alignment_1 (base, &base_alignment, &base_bitpos);
+  /* As data-ref analysis strips the MEM_REF down to its base operand
+     to form DR_BASE_ADDRESS and adds the offset to DR_INIT we have to
+     adjust things to make base_alignment valid as the alignment of
+     DR_BASE_ADDRESS.  */
   if (TREE_CODE (base) == MEM_REF)
-    base = build2 (MEM_REF, TREE_TYPE (base), base_addr,
-                   build_int_cst (TREE_TYPE (TREE_OPERAND (base, 1)), 0));
-  unsigned int base_alignment = get_object_alignment (base);
+    {
+      base_bitpos -= mem_ref_offset (base).to_short_addr () * BITS_PER_UNIT;
+      base_bitpos &= (base_alignment - 1);
+    }
+  if (base_bitpos != 0)
+    base_alignment = base_bitpos & -base_bitpos;
+  /* Also look at the alignment of the base address DR analysis
+     computed.  */
+  unsigned int base_addr_alignment = get_pointer_alignment (base_addr);
+  if (base_addr_alignment > base_alignment)
+    base_alignment = base_addr_alignment;

   if (base_alignment >= TYPE_ALIGN (TREE_TYPE (vectype)))
     DR_VECT_AUX (dr)->base_element_aligned = true;
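The core arithmetic of the fix can be tried in isolation. The following is a minimal sketch, not GCC code: `known_alignment_bits` is a hypothetical helper that mirrors only the `base_bitpos &= (base_alignment - 1)` and `base_bitpos & -base_bitpos` steps of the patch. It assumes the alignment is a power of two, and the offsets in the usage note assume 64-bit pointers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of GCC): an object sits `bitpos` bits past an
// address that is aligned to `align_bits` (a power of two).  Return the
// alignment, in bits, that can still be guaranteed for the object itself.
inline std::uint64_t known_alignment_bits(std::uint64_t align_bits,
                                          std::uint64_t bitpos)
{
  // Only the misalignment modulo the base alignment matters.
  bitpos &= align_bits - 1;
  if (bitpos == 0)
    return align_bits;
  // x & -x isolates the lowest set bit of x, i.e. the largest power of two
  // that divides the misalignment.  Written as (0 - x) for unsigned types.
  return bitpos & (0 - bitpos);
}
```

For the testcase above, `B` is aligned to 128 bits and member `a` starts 24 bytes (192 bits) in, so only 64-bit alignment can be assumed for `a`: `known_alignment_bits(128, 192)` yields 64.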
[PATCH 0/7] Libsanitizer merge from upstream r285547.
Hi,

this patch set merges libsanitizer from upstream.

Patch 1 is the library merge itself.

Patch 2 reapplies the SPARC change by David S. Miller.

Patch 3 changes the heuristic for extracting the last PC from a stack frame on ARM in the fast unwind routine. More details can be found in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61771.

Patch 4 replaces Jakub's fix for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63888 and removes the CheckODRViolationViaPoisoning call from RegisterGlobal to avoid false positive ODR violation reports.

Patch 5 combines the necessary compiler changes.

Patch 6 adds several new tests, backported from upstream.

Patch 7 adds support for ASan ODR indicators on the compiler side.

The whole patch set was regtested, bootstrapped and ASan-bootstrapped on x86_64-unknown-linux-gnu and i386-unknown-linux-gnu. It also passed regression tests on arm-linux-gnueabi and aarch64-linux under QEMU.

-Maxim
[PATCH 2/7] Libsanitizer merge from upstream r285547.
This is just reapplied patch for SPARC by David S. Miller. From 0ff8d1c408b076970c323361922c35033aaae245 Mon Sep 17 00:00:00 2001 From: Maxim Ostapenko Date: Tue, 25 Oct 2016 20:00:43 +0300 Subject: [PATCH 2/7] libsanitizer/ PR sanitizer/63958 Reapply: 2014-10-14 David S. Miller * sanitizer_common/sanitizer_platform_limits_linux.cc (time_t): Define at __kernel_time_t, as needed for sparc. (struct __old_kernel_stat): Don't check if __sparc__ is defined. * libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h (__sanitizer): Define struct___old_kernel_stat_sz, struct_kernel_stat_sz, and struct_kernel_stat64_sz for sparc. (__sanitizer_ipc_perm): Adjust for sparc targets. (__sanitizer_shmid_ds): Likewsie. (__sanitizer_sigaction): Likewise. (IOC_SIZE): Likewsie. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@229113 138bc75d-0d04-0410-961f-82ee72b054a4 --- libsanitizer/ChangeLog | 17 +++ .../sanitizer_platform_limits_linux.cc | 4 +- .../sanitizer_platform_limits_posix.h | 59 +- 3 files changed, 77 insertions(+), 3 deletions(-) diff --git a/libsanitizer/ChangeLog b/libsanitizer/ChangeLog index eaf907c..10b1207 100644 --- a/libsanitizer/ChangeLog +++ b/libsanitizer/ChangeLog @@ -1,5 +1,22 @@ 2016-11-07 Maxim Ostapenko + PR sanitizer/63958 + Reapply: + 2014-10-14 David S. Miller + + * sanitizer_common/sanitizer_platform_limits_linux.cc (time_t): + Define at __kernel_time_t, as needed for sparc. + (struct __old_kernel_stat): Don't check if __sparc__ is defined. + * libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h + (__sanitizer): Define struct___old_kernel_stat_sz, + struct_kernel_stat_sz, and struct_kernel_stat64_sz for sparc. + (__sanitizer_ipc_perm): Adjust for sparc targets. + (__sanitizer_shmid_ds): Likewsie. + (__sanitizer_sigaction): Likewise. + (IOC_SIZE): Likewsie. + +2016-11-07 Maxim Ostapenko + * All source files: Merge from upstream 285547. * configure.tgt (SANITIZER_COMMON_TARGET_DEPENDENT_OBJECTS): New variable. 
diff --git a/libsanitizer/sanitizer_common/sanitizer_platform_limits_linux.cc b/libsanitizer/sanitizer_common/sanitizer_platform_limits_linux.cc index edc6730..23a0148 100644 --- a/libsanitizer/sanitizer_common/sanitizer_platform_limits_linux.cc +++ b/libsanitizer/sanitizer_common/sanitizer_platform_limits_linux.cc @@ -36,6 +36,7 @@ #define uid_t __kernel_uid_t #define gid_t __kernel_gid_t #define off_t __kernel_off_t +#define time_t __kernel_time_t // This header seems to contain the definitions of _kernel_ stat* structs. #include #undef ino_t @@ -62,7 +63,8 @@ namespace __sanitizer { } // namespace __sanitizer #if !defined(__powerpc64__) && !defined(__x86_64__) && !defined(__aarch64__)\ -&& !defined(__mips__) && !defined(__s390__) +&& !defined(__mips__) && !defined(__s390__)\ +&& !defined(__sparc__) COMPILER_CHECK(struct___old_kernel_stat_sz == sizeof(struct __old_kernel_stat)); #endif diff --git a/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h b/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h index 17906d3..d1a3051 100644 --- a/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h +++ b/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h @@ -85,6 +85,14 @@ namespace __sanitizer { #elif defined(__s390x__) const unsigned struct_kernel_stat_sz = 144; const unsigned struct_kernel_stat64_sz = 0; +#elif defined(__sparc__) && defined(__arch64__) + const unsigned struct___old_kernel_stat_sz = 0; + const unsigned struct_kernel_stat_sz = 104; + const unsigned struct_kernel_stat64_sz = 144; +#elif defined(__sparc__) && !defined(__arch64__) + const unsigned struct___old_kernel_stat_sz = 0; + const unsigned struct_kernel_stat_sz = 64; + const unsigned struct_kernel_stat64_sz = 104; #endif struct __sanitizer_perf_event_attr { unsigned type; @@ -107,7 +115,7 @@ namespace __sanitizer { #if defined(__powerpc64__) || defined(__s390__) const unsigned struct___old_kernel_stat_sz = 0; -#else +#elif !defined(__sparc__) const 
unsigned struct___old_kernel_stat_sz = 32; #endif @@ -198,6 +206,18 @@ namespace __sanitizer { unsigned short __pad1; unsigned long __unused1; unsigned long __unused2; +#elif defined(__sparc__) +# if defined(__arch64__) +unsigned mode; +unsigned short __pad1; +# else +unsigned short __pad1; +unsigned short mode; +unsigned short __pad2; +# endif +unsigned short __seq; +unsigned long long __unused1; +unsigned long long __unused2; #else unsigned short mode; unsigned short __pad1; @@ -215,6 +235,26 @@ namespace __sanitizer { struct __sanitizer_shmid_ds { __sanitizer_ipc_perm shm_perm; + #if defined(__sparc__) + # if !defined(__arch64__) +u32 __pad1; + # endif +long shm_atime; + # if !defi
[PATCH 3/7] Libsanitizer merge from upstream r285547.
This patch adjusts the fix for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61771 to extract the last PC from the stack frame if no valid FP is available for ARM.

From 6dc6e4f761080cf19a161fb0e27c1fd584688f40 Mon Sep 17 00:00:00 2001
From: Maxim Ostapenko
Date: Tue, 25 Oct 2016 20:27:37 +0300
Subject: [PATCH 3/7] libsanitizer/

	* sanitizer_common/sanitizer_stacktrace.cc (GetCanonicFrame): Assume we
	compiled code with GCC when extracting the caller PC for ARM if no
	valid frame pointer is available.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@229115 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libsanitizer/ChangeLog                                | 6 ++
 libsanitizer/sanitizer_common/sanitizer_stacktrace.cc | 4 ++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/libsanitizer/ChangeLog b/libsanitizer/ChangeLog
index 10b1207..7e4f89f 100644
--- a/libsanitizer/ChangeLog
+++ b/libsanitizer/ChangeLog
@@ -1,5 +1,11 @@
 2016-11-07  Maxim Ostapenko

+	* sanitizer_common/sanitizer_stacktrace.cc (GetCanonicFrame): Assume we
+	compiled code with GCC when extracting the caller PC for ARM if no
+	valid frame pointer is available.
+
+2016-11-07  Maxim Ostapenko
+
 	PR sanitizer/63958
 	Reapply:
 	2014-10-14  David S. Miller
diff --git a/libsanitizer/sanitizer_common/sanitizer_stacktrace.cc b/libsanitizer/sanitizer_common/sanitizer_stacktrace.cc
index 531f256..cbb3af2 100644
--- a/libsanitizer/sanitizer_common/sanitizer_stacktrace.cc
+++ b/libsanitizer/sanitizer_common/sanitizer_stacktrace.cc
@@ -55,8 +55,8 @@ static inline uhwptr *GetCanonicFrame(uptr bp,
   // Nope, this does not look right either. This means the frame after next does
   // not have a valid frame pointer, but we can still extract the caller PC.
   // Unfortunately, there is no way to decide between GCC and LLVM frame
-  // layouts. Assume LLVM.
-  return bp_prev;
+  // layouts. Assume GCC.
+  return bp_prev - 1;
 #else
   return (uhwptr*)bp;
 #endif
-- 
1.9.1
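To make the heuristic concrete, here is a toy model of the decision, not the libsanitizer implementation. It assumes the usual ARM layouts: with LLVM the frame pointer points at the saved frame pointer, while with GCC it points at the saved return address, with the saved frame pointer one word below. The stack is modelled as a vector of words with indices standing in for addresses; `is_valid_frame` and `canonic_frame` are hypothetical names, and the two probing steps before the fallback are reconstructed from the comment in the hunk, so treat them as an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: "addresses" are indices into the stack vector.
using Word = std::size_t;

// Hypothetical stand-in for the runtime's bounds check on a frame pointer.
inline bool is_valid_frame(Word fp, const std::vector<Word> &stack)
{
  return fp > 0 && fp + 2 < stack.size();
}

// Return the index of the slot treated as holding the caller's frame pointer,
// or -1 if bp itself is not a plausible frame pointer.
inline long canonic_frame(Word bp, const std::vector<Word> &stack)
{
  if (!is_valid_frame(bp, stack))
    return -1;
  // LLVM-style frame: the word at bp is the previous frame pointer.
  if (is_valid_frame(stack[bp], stack))
    return static_cast<long>(bp);
  // GCC-style frame: the previous frame pointer sits one word below bp.
  if (is_valid_frame(stack[bp - 1], stack))
    return static_cast<long>(bp) - 1;
  // Neither slot holds a valid frame pointer.  After this patch the GCC
  // layout is assumed (bp - 1); before it, the LLVM layout (bp) was.
  return static_cast<long>(bp) - 1;
}
```

Only the last branch models what the patch changes: when neither slot looks like a frame pointer, the caller PC is now read as if the frame had been laid out by GCC.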
[PATCH 4/7] Libsanitizer merge from upstream r285547.
This is a rewrite of Jakub's fix for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63888. Upstream now supports a new approach to ODR violation detection: the compiler emits a new __odr_asan_XXX symbol for each instrumented global that indicates whether this global was already registered, and the library checks this indicator symbol at runtime. However, to preserve compatibility, the library can still fall back to the old approach to ODR violation detection, which is incompatible with GCC (for example, when the ODR indicator symbol wasn't emitted, as for a static variable, libasan tries the old method). To avoid this, this patch removes the CheckODRViolationViaPoisoning call and leaves only CheckODRViolationViaIndicator.

From 5cd9a7cb1c2dd668e533bee1bc15e367d367d84f Mon Sep 17 00:00:00 2001
From: Maxim Ostapenko
Date: Fri, 28 Oct 2016 10:22:35 +0300
Subject: [PATCH 4/7] libsanitizer/

	* asan/asan_globals.cc (RegisterGlobal): Do not call
	CheckODRViolationViaPoisoning.
	(CheckODRViolationViaPoisoning): Remove.
---
 libsanitizer/ChangeLog            | 6 ++
 libsanitizer/asan/asan_globals.cc | 19 ---
 2 files changed, 6 insertions(+), 19 deletions(-)

diff --git a/libsanitizer/ChangeLog b/libsanitizer/ChangeLog
index 7e4f89f..d439f45 100644
--- a/libsanitizer/ChangeLog
+++ b/libsanitizer/ChangeLog
@@ -1,5 +1,11 @@
 2016-11-07  Maxim Ostapenko

+	* asan/asan_globals.cc (RegisterGlobal): Do not call
+	CheckODRViolationViaPoisoning.
+	(CheckODRViolationViaPoisoning): Remove.
+
+2016-11-07  Maxim Ostapenko
+
 	* sanitizer_common/sanitizer_stacktrace.cc (GetCanonicFrame): Assume we
 	compiled code with GCC when extracting the caller PC for ARM if no
 	valid frame pointer is available.
diff --git a/libsanitizer/asan/asan_globals.cc b/libsanitizer/asan/asan_globals.cc
index 007fce72..f229292 100644
--- a/libsanitizer/asan/asan_globals.cc
+++ b/libsanitizer/asan/asan_globals.cc
@@ -147,23 +147,6 @@ static void CheckODRViolationViaIndicator(const Global *g) {
   }
 }

-// Check ODR violation for given global G by checking if it's already poisoned.
-// We use this method in case compiler doesn't use private aliases for global
-// variables.
-static void CheckODRViolationViaPoisoning(const Global *g) {
-  if (__asan_region_is_poisoned(g->beg, g->size_with_redzone)) {
-    // This check may not be enough: if the first global is much larger
-    // the entire redzone of the second global may be within the first global.
-    for (ListOfGlobals *l = list_of_all_globals; l; l = l->next) {
-      if (g->beg == l->g->beg &&
-          (flags()->detect_odr_violation >= 2 || g->size != l->g->size) &&
-          !IsODRViolationSuppressed(g->name))
-        ReportODRViolation(g, FindRegistrationSite(g),
-                           l->g, FindRegistrationSite(l->g));
-    }
-  }
-}
-
 // Clang provides two different ways for global variables protection:
 // it can poison the global itself or its private alias. In former
 // case we may poison same symbol multiple times, that can help us to
@@ -211,8 +194,6 @@ static void RegisterGlobal(const Global *g) {
     // where two globals with the same name are defined in different modules.
     if (UseODRIndicator(g))
       CheckODRViolationViaIndicator(g);
-    else
-      CheckODRViolationViaPoisoning(g);
   }
   if (CanPoisonMemory())
     PoisonRedZones(*g);
-- 
1.9.1
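The indicator scheme described in this message can be sketched as follows. This is a simplified model, not the libasan code: `Global`, `register_global` and the two indicator values are hypothetical, and a null indicator pointer stands in for "no indicator was emitted" (for example for a static variable). With patch 4 applied, such a global is simply not checked, since there is no longer a fallback to the poisoning-based check.

```cpp
#include <cassert>

// Hypothetical indicator states; the real runtime's values may differ.
enum : unsigned char { kUnregistered = 0, kRegistered = 1 };

// Simplified descriptor of an instrumented global.
struct Global {
  const char *name;
  unsigned char *odr_indicator;  // nullptr models "no indicator emitted"
};

// Register a global; return true if an ODR violation would be reported.
inline bool register_global(const Global &g)
{
  if (g.odr_indicator == nullptr)
    return false;                 // nothing to check, no poisoning fallback
  if (*g.odr_indicator != kUnregistered)
    return true;                  // another module registered this symbol
  *g.odr_indicator = kRegistered; // first registration wins
  return false;
}
```

Two modules defining the same public symbol share one indicator after linking, so the second registration trips the check; globals without an indicator never do.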
[PATCH 5/7] Libsanitizer merge from upstream r285547.
This patch combines the minimal changes necessary to support the new libasan ABI. It doesn't try to implement ODR indicators in the compiler; it simply passes a zero stub to the runtime. The actual implementation of ODR indicators comes in patch 7.

From 33f6f98faa86c61b9895db0d71e0e88a9ae4fa59 Mon Sep 17 00:00:00 2001
From: Maxim Ostapenko
Date: Tue, 25 Oct 2016 20:34:23 +0300
Subject: [PATCH 5/7] libsanitizer merge from upstream r285547, compiler part.

gcc/

	* asan.h (ASAN_STACK_MAGIC_PARTIAL): Remove.
	* asan.c (ASAN_STACK_MAGIC_PARTIAL): Replace with
	ASAN_STACK_MAGIC_MIDDLE.
	(asan_global_struct): Increase the size of fields.
	(asan_add_global): Add new field constructor.
	* sanitizer.def (__asan_version_mismatch_check_v6): Replace with
	__asan_version_mismatch_check_v8.

gcc/testsuite/

	* c-c++-common/asan/null-deref-1.c: Adjust testcase.
---
 gcc/ChangeLog                                  | 10 ++
 gcc/asan.c                                     | 13 -
 gcc/asan.h                                     | 1 -
 gcc/sanitizer.def                              | 2 +-
 gcc/testsuite/ChangeLog                        | 4
 gcc/testsuite/c-c++-common/asan/null-deref-1.c | 4 ++--
 6 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index f29b9b5..943e21c 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,13 @@
+2016-11-07  Maxim Ostapenko
+
+	* asan.h (ASAN_STACK_MAGIC_PARTIAL): Remove.
+	* asan.c (ASAN_STACK_MAGIC_PARTIAL): Replace with
+	ASAN_STACK_MAGIC_MIDDLE.
+	(asan_global_struct): Increase the size of fields.
+	(asan_add_global): Add new field constructor.
+	* sanitizer.def (__asan_version_mismatch_check_v6): Replace with
+	__asan_version_mismatch_check_v8.
+
 2016-10-30  Bill Schmidt

 	PR tree-optimization/71915
diff --git a/gcc/asan.c b/gcc/asan.c
index c6d9240..fdc84bd 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -1214,7 +1214,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
               shadow_bytes[i] = offset - aoff;
             }
           else
-            shadow_bytes[i] = ASAN_STACK_MAGIC_PARTIAL;
+            shadow_bytes[i] = ASAN_STACK_MAGIC_MIDDLE;
           emit_move_insn (shadow_mem, asan_shadow_cst (shadow_bytes));
           offset = aoff;
         }
@@ -2191,19 +2191,20 @@ asan_dynamic_init_call (bool after_p)
      const void *__module_name;
      uptr __has_dynamic_init;
      __asan_global_source_location *__location;
+     char *__odr_indicator;
    } type.  */

 static tree
 asan_global_struct (void)
 {
-  static const char *field_names[7]
+  static const char *field_names[8]
     = { "__beg", "__size", "__size_with_redzone",
-        "__name", "__module_name", "__has_dynamic_init", "__location"};
-  tree fields[7], ret;
+        "__name", "__module_name", "__has_dynamic_init", "__location", "__odr_indicator"};
+  tree fields[8], ret;
   int i;

   ret = make_node (RECORD_TYPE);
-  for (i = 0; i < 7; i++)
+  for (i = 0; i < 8; i++)
     {
       fields[i]
         = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
@@ -2312,6 +2313,8 @@ asan_add_global (tree decl, tree type, vec *v)
   else
     locptr = build_int_cst (uptr, 0);
   CONSTRUCTOR_APPEND_ELT (vinner, NULL_TREE, locptr);
+  /* TODO: support ODR indicators.  */
+  CONSTRUCTOR_APPEND_ELT(vinner, NULL_TREE, build_int_cst (uptr, 0));
   init = build_constructor (type, vinner);
   CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, init);
 }
diff --git a/gcc/asan.h b/gcc/asan.h
index 7ec693f..a259b1a 100644
--- a/gcc/asan.h
+++ b/gcc/asan.h
@@ -53,7 +53,6 @@ extern alias_set_type asan_shadow_set;
 #define ASAN_STACK_MAGIC_LEFT 0xf1
 #define ASAN_STACK_MAGIC_MIDDLE 0xf2
 #define ASAN_STACK_MAGIC_RIGHT 0xf3
-#define ASAN_STACK_MAGIC_PARTIAL 0xf4
 #define ASAN_STACK_MAGIC_USE_AFTER_RET 0xf5

 #define ASAN_STACK_FRAME_MAGIC 0x41b58ab3
diff --git a/gcc/sanitizer.def b/gcc/sanitizer.def
index 303c1e4..ac85096 100644
--- a/gcc/sanitizer.def
+++ b/gcc/sanitizer.def
@@ -34,7 +34,7 @@ DEF_BUILTIN_STUB(BEGIN_SANITIZER_BUILTINS, (const char *)0)
 DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_INIT, "__asan_init",
                       BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST)
 DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_VERSION_MISMATCH_CHECK,
-                      "__asan_version_mismatch_check_v6",
+                      "__asan_version_mismatch_check_v8",
                       BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST)
 /* Do not reorder the BUILT_IN_ASAN_{REPORT,CHECK}* builtins, e.g. cfgcleanup.c
    relies on this order.  */
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 051ae83..49fab6e 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2016-11-07  Maxim Ostapenko
+
+	* c-c++-common/asan/null-deref-1.c: Adjust testcase.
+
 2016-10-30  Bill Schmidt

 	PR tree-optimization/71915
diff --git a/gcc/testsuite/c-c++-common/asan/null-deref-1.c b/gcc/testsuite/c-c++-common/asan/null-deref-1.c
index 45d35ac..f4f8f37 100644
--- a/gcc/testsuite/c-c++-common/asan/null-deref-1.c
+++ b/gcc/testsuite/c-c++-common/asan/null-deref-1.c
@@ -17,6
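The layout consequence of the new field can be checked with a small stand-alone sketch. `AsanGlobalSketch` and `make_descriptor` are hypothetical names, and every field is modelled as a pointer-sized integer, which is an assumption based on the ChangeLog entry rather than a copy of the runtime's declaration. The point is only that appending an eighth word changes the descriptor size, which is why the version check symbol moves from v6 to v8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using uptr = std::uintptr_t;

// Hypothetical mirror of the per-global descriptor after patch 5.
struct AsanGlobalSketch {
  uptr beg;
  uptr size;
  uptr size_with_redzone;
  uptr name;
  uptr module_name;
  uptr has_dynamic_init;
  uptr location;
  uptr odr_indicator;  // new trailing field
};

static_assert(sizeof(AsanGlobalSketch) == 8 * sizeof(uptr),
              "eight pointer-sized fields, no padding");
static_assert(offsetof(AsanGlobalSketch, odr_indicator) == 7 * sizeof(uptr),
              "indicator is appended after the seven existing fields");

// Build a descriptor the way patch 5 does: the indicator slot is a zero stub.
inline AsanGlobalSketch make_descriptor(uptr beg, uptr size, uptr redzone)
{
  return AsanGlobalSketch{beg, size, size + redzone, 0, 0, 0, 0, 0};
}
```

A runtime that expects seven words per descriptor would misread an array of these, so objects built against the old layout cannot be mixed with the new library.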
[PATCH 6/7] Libsanitizer merge from upstream r285547.
This patch just adds several tests backported from upstream. From b4677ed64e7aee1af7772750e0b18ed8271f4757 Mon Sep 17 00:00:00 2001 From: Maxim Ostapenko Date: Tue, 1 Nov 2016 16:52:13 +0300 Subject: [PATCH 6/7] Backport several testcases for ASan from upstream. gcc/ * asan.h (asan_intercepted_p): Handle BUILT_IN_STRCSPN, BUILT_IN_STRPBRK, BUILT_IN_STRSPN and BUILT_IN_STRSTR. gcc/testsuite/ * c-c++-common/asan/default_options.h: New file. * c-c++-common/asan/strcasestr-1.c: New test. * c-c++-common/asan/strcasestr-2.c: Likewise. * c-c++-common/asan/strcspn-1.c: Likewise. * c-c++-common/asan/strcspn-2.c: Likewise. * c-c++-common/asan/strpbrk-1.c: Likewise. * c-c++-common/asan/strpbrk-2.c: Likewise. * c-c++-common/asan/strspn-1.c: Likewise. * c-c++-common/asan/strspn-2.c: Likewise. * c-c++-common/asan/strstr-1.c: Likewise. * c-c++-common/asan/strstr-2.c: Likewise. * c-c++-common/asan/halt_on_error_suppress_equal_pcs-1.c: Likewise. --- gcc/ChangeLog | 5 +++ gcc/asan.h | 4 +++ gcc/testsuite/ChangeLog| 15 + gcc/testsuite/c-c++-common/asan/default_options.h | 9 + .../asan/halt_on_error_suppress_equal_pcs-1.c | 38 ++ gcc/testsuite/c-c++-common/asan/strcasestr-1.c | 32 ++ gcc/testsuite/c-c++-common/asan/strcasestr-2.c | 32 ++ gcc/testsuite/c-c++-common/asan/strcspn-1.c| 31 ++ gcc/testsuite/c-c++-common/asan/strcspn-2.c| 31 ++ gcc/testsuite/c-c++-common/asan/strpbrk-1.c| 31 ++ gcc/testsuite/c-c++-common/asan/strpbrk-2.c| 31 ++ gcc/testsuite/c-c++-common/asan/strspn-1.c | 31 ++ gcc/testsuite/c-c++-common/asan/strspn-2.c | 31 ++ gcc/testsuite/c-c++-common/asan/strstr-1.c | 31 ++ gcc/testsuite/c-c++-common/asan/strstr-2.c | 31 ++ 15 files changed, 383 insertions(+) create mode 100644 gcc/testsuite/c-c++-common/asan/default_options.h create mode 100644 gcc/testsuite/c-c++-common/asan/halt_on_error_suppress_equal_pcs-1.c create mode 100644 gcc/testsuite/c-c++-common/asan/strcasestr-1.c create mode 100644 gcc/testsuite/c-c++-common/asan/strcasestr-2.c create mode 100644 
gcc/testsuite/c-c++-common/asan/strcspn-1.c create mode 100644 gcc/testsuite/c-c++-common/asan/strcspn-2.c create mode 100644 gcc/testsuite/c-c++-common/asan/strpbrk-1.c create mode 100644 gcc/testsuite/c-c++-common/asan/strpbrk-2.c create mode 100644 gcc/testsuite/c-c++-common/asan/strspn-1.c create mode 100644 gcc/testsuite/c-c++-common/asan/strspn-2.c create mode 100644 gcc/testsuite/c-c++-common/asan/strstr-1.c create mode 100644 gcc/testsuite/c-c++-common/asan/strstr-2.c diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 943e21c..1da0ef9 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,5 +1,10 @@ 2016-11-07 Maxim Ostapenko + * asan.h (asan_intercepted_p): Handle BUILT_IN_STRCSPN, + BUILT_IN_STRPBRK, BUILT_IN_STRSPN and BUILT_IN_STRSTR. + +2016-11-07 Maxim Ostapenko + * asan.h (ASAN_STACK_MAGIC_PARTIAL): Remove. * asan.c (ASAN_STACK_MAGIC_PARTIAL): Replace with ASAN_STACK_MAGIC_MIDDLE. diff --git a/gcc/asan.h b/gcc/asan.h index a259b1a..b96395b 100644 --- a/gcc/asan.h +++ b/gcc/asan.h @@ -102,6 +102,10 @@ asan_intercepted_p (enum built_in_function fcode) || fcode == BUILT_IN_STRNCASECMP || fcode == BUILT_IN_STRNCAT || fcode == BUILT_IN_STRNCMP + || fcode == BUILT_IN_STRCSPN + || fcode == BUILT_IN_STRPBRK + || fcode == BUILT_IN_STRSPN + || fcode == BUILT_IN_STRSTR || fcode == BUILT_IN_STRNCPY; } #endif /* TREE_ASAN */ diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 49fab6e..afa77a8 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,5 +1,20 @@ 2016-11-07 Maxim Ostapenko + * c-c++-common/asan/default_options.h: New file. + * c-c++-common/asan/strcasestr-1.c: New test. + * c-c++-common/asan/strcasestr-2.c: Likewise. + * c-c++-common/asan/strcspn-1.c: Likewise. + * c-c++-common/asan/strcspn-2.c: Likewise. + * c-c++-common/asan/strpbrk-1.c: Likewise. + * c-c++-common/asan/strpbrk-2.c: Likewise. + * c-c++-common/asan/strspn-1.c: Likewise. + * c-c++-common/asan/strspn-2.c: Likewise. 
+ * c-c++-common/asan/strstr-1.c: Likewise. + * c-c++-common/asan/strstr-2.c: Likewise. + * c-c++-common/asan/halt_on_error_suppress_equal_pcs-1.c: Likewise. + +2016-11-07 Maxim Ostapenko + * c-c++-common/asan/null-deref-1.c: Adjust testcase. 2016-10-30 Bill Schmidt diff --git a/gcc/testsuite/c-c++-common/asan/default_options.h b/gcc/testsuite/c-c++-common/asan/default_options.h new file mode 100644 index 000..1e5c486 --- /dev/null +++ b/gcc/testsuite/c-c++-common/asan/default_options.h @@ -0,0 +1,9 @@ +#ifdef __cplusplus +extern "C" +#endif +const char * +__asan_d
[PATCH 7/7] Libsanitizer merge from upstream r285547.
This patch tries to implement ODR indicator functionality on the compiler side. We emit a new __odr_asan_XXX symbol for each instrumented global that indicates whether this global was already registered, and the library checks this indicator symbol at runtime. For some globals (e.g. static or hidden ones) the ODR indicator is not needed, so we can skip the indicator for them and pass zero to the runtime. If this patch is undesirable at this stage, we can probably postpone it until GCC 8.

From 137f139972a89259b9d8521e13ecb76fd2cef433 Mon Sep 17 00:00:00 2001
From: Maxim Ostapenko
Date: Fri, 28 Oct 2016 10:22:03 +0300
Subject: [PATCH 7/7] Add support for ASan odr_indicator.

config/

	* bootstrap-asan.mk: Replace LSAN_OPTIONS=detect_leaks=0 with
	ASAN_OPTIONS=detect_leaks=0:use_odr_indicator=1.

gcc/

	* asan.c (asan_global_struct): Refactor.
	(create_odr_indicator): New function.
	(asan_needs_odr_indicator_p): Likewise.
	(is_odr_indicator): Likewise.
	(asan_add_global): Introduce odr_indicator_ptr.  Pass it into global's
	constructor.
	(asan_protect_global): Do not protect odr indicators.

gcc/testsuite/

	* c-c++-common/asan/no-redundant-odr-indicators-1.c: New test.
---
 config/ChangeLog                                   | 5 ++
 config/bootstrap-asan.mk                           | 2 +-
 gcc/ChangeLog                                      | 10 +++
 gcc/asan.c                                         | 76 +++---
 gcc/testsuite/ChangeLog                            | 4 ++
 .../asan/no-redundant-odr-indicators-1.c           | 17 +
 6 files changed, 105 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/asan/no-redundant-odr-indicators-1.c

diff --git a/config/ChangeLog b/config/ChangeLog
index 3b0092b..0c75185 100644
--- a/config/ChangeLog
+++ b/config/ChangeLog
@@ -1,3 +1,8 @@
+2016-11-07  Maxim Ostapenko
+
+	* bootstrap-asan.mk: Replace LSAN_OPTIONS=detect_leaks=0 with
+	ASAN_OPTIONS=detect_leaks=0:use_odr_indicator=1.
+
 2016-06-21  Trevor Saunders

 	* elf.m4: Remove interix support.
diff --git a/config/bootstrap-asan.mk b/config/bootstrap-asan.mk
index 70baaf9..e73d4c2 100644
--- a/config/bootstrap-asan.mk
+++ b/config/bootstrap-asan.mk
@@ -1,7 +1,7 @@
 # This option enables -fsanitize=address for stage2 and stage3.

 # Suppress LeakSanitizer in bootstrap.
-export LSAN_OPTIONS="detect_leaks=0"
+export ASAN_OPTIONS=detect_leaks=0:use_odr_indicator=1

 STAGE2_CFLAGS += -fsanitize=address
 STAGE3_CFLAGS += -fsanitize=address
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 1da0ef9..527cafa 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,15 @@
 2016-11-07  Maxim Ostapenko

+	* asan.c (asan_global_struct): Refactor.
+	(create_odr_indicator): New function.
+	(asan_needs_odr_indicator_p): Likewise.
+	(is_odr_indicator): Likewise.
+	(asan_add_global): Introduce odr_indicator_ptr.  Pass it into global's
+	constructor.
+	(asan_protect_global): Do not protect odr indicators.
+
+2016-11-07  Maxim Ostapenko
+
 	* asan.h (asan_intercepted_p): Handle BUILT_IN_STRCSPN,
 	BUILT_IN_STRPBRK, BUILT_IN_STRSPN and BUILT_IN_STRSTR.

diff --git a/gcc/asan.c b/gcc/asan.c
index fdc84bd..b54110a 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -1329,6 +1329,16 @@ asan_needs_local_alias (tree decl)
   return DECL_WEAK (decl) || !targetm.binds_local_p (decl);
 }

+/* Return true if DECL, a global var, is an artificial ODR indicator symbol
+   therefore doesn't need protection.  */
+
+static bool
+is_odr_indicator (tree decl)
+{
+  const char *sym_name = IDENTIFIER_POINTER (DECL_NAME (decl));
+  return strstr(sym_name, "_.__odr_asan_") == sym_name;
+}
+
 /* Return true if DECL is a VAR_DECL that should be protected
    by Address Sanitizer, by appending a red zone with protected
    shadow memory after it and aligning it to at least
@@ -1377,7 +1387,8 @@ asan_protect_global (tree decl)
       || ASAN_RED_ZONE_SIZE * BITS_PER_UNIT > MAX_OFILE_ALIGNMENT
       || !valid_constant_size_p (DECL_SIZE_UNIT (decl))
       || DECL_ALIGN_UNIT (decl) > 2 * ASAN_RED_ZONE_SIZE
-      || TREE_TYPE (decl) == ubsan_get_source_location_type ())
+      || TREE_TYPE (decl) == ubsan_get_source_location_type ()
+      || is_odr_indicator (decl))
     return false;

   rtl = DECL_RTL (decl);
@@ -2197,14 +2208,15 @@ asan_dynamic_init_call (bool after_p)
 static tree
 asan_global_struct (void)
 {
-  static const char *field_names[8]
+  static const char *field_names[]
     = { "__beg", "__size", "__size_with_redzone",
-        "__name", "__module_name", "__has_dynamic_init", "__location", "__odr_indicator"};
-  tree fields[8], ret;
-  int i;
+        "__name", "__module_name", "__has_dynamic_init", "__location",
+        "__odr_indicator"};
+  tree fields[ARRAY_SIZE (field_names)], ret;
+  unsigned i;

   ret = make_node (RECORD_TYPE);
-  for (i = 0; i < 8; i++)
+  for (i = 0; i < ARRAY_SIZE (field_names); i++)
     {
       fields[i]
         = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
@@ -2226,6 +2238,52 @@ asan_glo
Re: [PATCH 0/7] Libsanitizer merge from upstream r285547.
On Mon, Nov 07, 2016 at 11:22:28AM +0300, Maxim Ostapenko wrote:
> this patch set performs libsanitizer merge from upstream.
>
> Patch 1 is the library merge itself.
>
> Patch 2 is the reapplied change for SPARC by David S. Miller.
>
> Patch 3 changes heuristic for extracting last PC from stack frame for ARM in
> fast unwind routine. More details can be found here
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61771).
>
> Patch 4 replaces Jakub's fix for
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63888 and removes
> CheckODRViolationViaPoisoning call from RegisterGlobal to avoid false
> positive odr violation reports.
>
> Patch 5 combines necessary compiler changes.
>
> Patch 6 adds several new tests, backported from upstream.
>
> Patch 7 adds support for ASan odr indicators at compiler side.
>
> The whole patch set was regtested/bootstrapped/ASan bootstrapped on
> x86_64-unknown-linux-gnu and i386-unknown-linux-gnu.
> Also, passed regression tests on arm-linux-gnueabi and aarch64-linux under
> QEMU.

So, libasan.so.* is again ABI incompatible, but libtsan and libubsan stay (hopefully) backwards ABI compatible?

	Jakub
Re: [PATCH 7/7] Libsanitizer merge from upstream r285547.
On Mon, Nov 07, 2016 at 11:31:18AM +0300, Maxim Ostapenko wrote:
> --- a/gcc/asan.c
> +++ b/gcc/asan.c
> @@ -1329,6 +1329,16 @@ asan_needs_local_alias (tree decl)
>    return DECL_WEAK (decl) || !targetm.binds_local_p (decl);
>  }
>
> +/* Return true if DECL, a global var, is an artificial ODR indicator symbol
> +   therefore doesn't need protection.  */
> +
> +static bool
> +is_odr_indicator (tree decl)
> +{
> +  const char *sym_name = IDENTIFIER_POINTER (DECL_NAME (decl));
> +  return strstr(sym_name, "_.__odr_asan_") == sym_name;

Formatting, missing space before (. Plus strstr (x, y) == x is very inefficient, strncmp would be cheaper. But more importantly, you are relying on what exactly ASM_GENERATE_INTERNAL_LABEL does, and that differs between targets; not all of them allow e.g. . in symbol names, other targets use $, others can only use _, etc. I think you'd better just add an "asan odr indicator" attribute (including the spaces, so it isn't something users can add to their variables) to the artificial vars and use lookup_attribute in the is_odr_indicator predicate (after testing some cheap flags like DECL_ARTIFICIAL).

> +  tree var = build_decl (UNKNOWN_LOCATION, VAR_DECL, get_identifier
> (sym_name),
> +                         char_type_node);
> +  TREE_ADDRESSABLE (var) = TREE_ADDRESSABLE (decl);

How is addressability of the original decl related to addressability of the indicator? If you take the address of the indicator (it is stored in the structure), it should be just 1.

> +  TREE_READONLY (var) = 0;
> +  TREE_THIS_VOLATILE (var) = 1;
> +  DECL_GIMPLE_REG_P (var) = DECL_GIMPLE_REG_P (decl);

Again, how is this related? Just store 0.

> +  DECL_ARTIFICIAL (var) = 1;
> +  DECL_IGNORED_P (var) = DECL_IGNORED_P (decl);

The indicators should surely not be recorded in debug info, so DECL_IGNORED_P should be 1.

> +  TREE_STATIC (var) = 1;
> +  TREE_PUBLIC (var) = 1;
> +  DECL_VISIBILITY (var) = DECL_VISIBILITY (decl);

Are they meant to have the same visibility and be exported from DSOs if the original var is?

> @@ -2313,8 +2374,7 @@ asan_add_global (tree decl, tree type,
> vec *v)
>    else
>      locptr = build_int_cst (uptr, 0);
>    CONSTRUCTOR_APPEND_ELT (vinner, NULL_TREE, locptr);
> -  /* TODO: support ODR indicators.  */
> -  CONSTRUCTOR_APPEND_ELT(vinner, NULL_TREE, build_int_cst (uptr, 0));
> +  CONSTRUCTOR_APPEND_ELT(vinner, NULL_TREE, odr_indicator_ptr);

Formatting, missing space before (, both in this patch and in the previous one.

	Jakub
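The efficiency remark about strstr versus strncmp can be illustrated outside GCC. The sketch below uses a hypothetical `has_prefix` helper; it addresses only the cost of the comparison, not the target-dependence of the symbol name, for which the review suggests an attribute-based check instead.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: true if `s` starts with `prefix`.
// strncmp stops at the first mismatch or after strlen (prefix) characters,
// whereas strstr (s, prefix) == s may scan all of `s` looking for a match
// at a later position before the comparison rejects it.
inline bool has_prefix(const char *s, const char *prefix)
{
  return std::strncmp(s, prefix, std::strlen(prefix)) == 0;
}
```

Both forms give the same answer for a prefix test; the difference is that a non-matching name is rejected after at most `strlen (prefix)` comparisons.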
Re: [PATCH] Make direct emission of time profiler counter
On 11/05/2016 09:38 AM, Jan Hubicka wrote:
> Looks OK if it passes.
>
> Honza

Thanks, fixed on trunk as r241894.

Martin
Re: [PATCH] combine lhs zero_extract fix (PR78186)
Hi Christophe,

On Fri, Nov 04, 2016 at 02:31:28PM +0100, Christophe Lyon wrote:
> Since this commit I have noticed execution failures on "old" arm targets:
>
>   gcc.dg/torture/pr48124-4.c   -O1  execution test
>   gcc.dg/torture/pr48124-4.c   -O2  execution test
>   gcc.dg/torture/pr48124-4.c   -O2 -flto -fno-use-linker-plugin
> -flto-partition=none  execution test
>   gcc.dg/torture/pr48124-4.c   -O2 -flto -fuse-linker-plugin
> -fno-fat-lto-objects  execution test
>   gcc.dg/torture/pr48124-4.c   -O3 -g  execution test
>   gcc.dg/torture/pr48124-4.c   -Os  execution test
>
> For instance on target arm-none-linux-gnueabi --with-cpu=cortex-a9
> --with-mode=arm
> and running the tests with -march=armv5t

Confirmed. What a nasty, nasty bug, and it has been here for decades it seems.

Could you please open a PR?

Segher
Re: [PATCH 0/7] Libsanitizer merge from upstream r285547.
On 07/11/16 11:39, Jakub Jelinek wrote: On Mon, Nov 07, 2016 at 11:22:28AM +0300, Maxim Ostapenko wrote: this patch set performs libsanitizer merge from upstream. Patch 1 is the library merge itself. Patch 2 is the reapplied change for SPARC by David S. Miller. Patch 3 changes heuristic for extracting last PC from stack frame for ARM in fast unwind routine. More details can be found here (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61771). Patch 4 replaces Jakub's fix for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63888 and removes CheckODRViolationViaPoisoning call from RegisterGlobal to avoid false positive odr violation reports. Patch 5 combines necessary compiler changes. Patch 6 adds several new tests, backported from upstream. Patch 7 adds support for ASan odr indicators at compiler side. The whole patch set was regtested/bootstrapped/ASan bootstrapped on x86_64-unknown-linux-gnu and i386-unknown-linux-gnu. Also, passed regression tests on arm-linux-gnueabi and aarch64-linux under QEMU. So, libasan.so.* is again ABI incompatible, but libtsan and libubsan stay (hopefully) backwards ABI compatible? libubsan is definitely compatible. For libtsan we have several changes: 1) Several interceptors (34 of them) were added and __interceptor_lstat{64} were removed. 2) __interceptor_strchr has change in its parameters type: __interceptor_strchr(char*, int) -> __interceptor_strchr(const char*, int) 3) tsan's internal type __tsan::ReportDesc has several changes, but it seems that this doesn't introduce ABI incompatibility with compiler side. Full abidiff listing is attached. So, I suppose libtsan is also compatible. 
-Maxim Jakub Functions changes summary: 4 Removed, 3 Changed (70 filtered out), 34 Added functions Variables changes summary: 0 Removed, 0 Changed, 0 Added variable Function symbols changes summary: 0 Removed, 10 Added function symbols not referenced by debug info Variable symbols changes summary: 0 Removed, 0 Added variable symbol not referenced by debug info 4 Removed functions: 'function int __interceptor_lstat(const char*, void*)'{lstat, aliases __interceptor_lstat} 'function int __interceptor_lstat64(const char*, void*)'{lstat64, aliases __interceptor_lstat64} 'function int __interceptor_stat(const char*, void*)'{__interceptor_stat, aliases stat} 'function int __interceptor_stat64(const char*, void*)'{stat64, aliases __interceptor_stat64} 34 Added functions: 'function char* __interceptor_ctermid(char*)'{__interceptor_ctermid, aliases ctermid} 'function int __interceptor_epoll_pwait(int, void*, int, int, void*)'{epoll_pwait, aliases __interceptor_epoll_pwait} 'function int __interceptor_eventfd_read(int, __sanitizer::u64*)'{eventfd_read, aliases __interceptor_eventfd_read} 'function int __interceptor_eventfd_write(int, __sanitizer::u64)'{__interceptor_eventfd_write, aliases eventfd_write} 'function void* __interceptor_memmem(SIZE_T, SIZE_T)'{__interceptor_memmem, aliases memmem} 'function int __interceptor_pthread_sigmask(int, const __sanitizer::__sanitizer_sigset_t*, __sanitizer::__sanitizer_sigset_t*)'{__interceptor_pthread_sigmask, aliases pthread_sigmask} 'function SSIZE_T __interceptor_recvfrom(int, void*, SIZE_T, int, void*, int*)'{__interceptor_recvfrom, aliases recvfrom} 'function SSIZE_T __interceptor_sendto(int, void*, SIZE_T, int, void*, int)'{__interceptor_sendto, aliases sendto} 'function int __interceptor_sigblock(int)'{sigblock, aliases __interceptor_sigblock} 'function int __interceptor_sigsetmask(int)'{sigsetmask, aliases __interceptor_sigsetmask} 'function SIZE_T __interceptor_strnlen(const char*, SIZE_T)'{__interceptor_strnlen, aliases 
strnlen} 'function int __interceptor_ttyname_r(int, char*, SIZE_T)'{__interceptor_ttyname_r, aliases ttyname_r} 'function void __sanitizer_cov_trace_pc_guard_init()'{__sanitizer_cov_trace_pc_guard_init} 'function int __sanitizer_install_malloc_and_free_hooks(void (typedef __sanitizer::uptr)*, void ()*)'{__sanitizer_install_malloc_and_free_hooks} 'function void __sanitizer_set_report_fd(void*)'{__sanitizer_set_report_fd} 'function void __sanitizer_symbolize_global(__sanitizer::uptr, const char*, char*, __sanitizer::uptr)'{__sanitizer_symbolize_global} 'function void __sanitizer_symbolize_pc(__sanitizer::uptr, const char*, char*, __sanitizer::uptr)'{__sanitizer_symbolize_pc} 'function void __sanitizer_syscall_post_impl_rt_sigaction(long int, long int, const __sanitizer::__sanitizer_kernel_sigaction_t*, __sanitizer::__sanitizer_kernel_sigaction_t*, SIZE_T)'{__sanitizer_syscall_post_impl_rt_sigaction} 'function void __sanitizer_syscall_post_impl_sigaction(long int, long int, const __sanitizer::__sanitizer_kernel_sigaction_t*, __sanitizer::__sanitizer_kernel_sigaction_t*)'{__sanitizer_syscall_post_impl_sigaction} 'function void __sanitizer_syscall_pre_impl_rt_sigactio
Re: [PATCH 0/7] Libsanitizer merge from upstream r285547.
On Mon, Nov 07, 2016 at 12:14:39PM +0300, Maxim Ostapenko wrote: > libubsan is definitely compatible. Nice. > For libtsan we have several changes: > > 1) Several interceptors (34 of them) were added and __interceptor_lstat{64} > were removed. That is bad, I think we need to readd those and perhaps just do what lstat*/stat* do. Weren't we solving the same thing a year ago on some other symbol? > 2) __interceptor_strchr has change in its parameters type: > __interceptor_strchr(char*, int) -> __interceptor_strchr(const char*, int) That is not a big deal, the function is extern "C". > 3) tsan's internal type __tsan::ReportDesc has several changes, but it seems > that this doesn't introduce ABI incompatibility with compiler side. If __tsan::ReportDesc is not defined in publicly installed headers, I think we are fine. Jakub
Re: [PATCH 0/7] Libsanitizer merge from upstream r285547.
On Mon, Nov 7, 2016 at 9:20 AM, Jakub Jelinek wrote: > On Mon, Nov 07, 2016 at 12:14:39PM +0300, Maxim Ostapenko wrote: >> libubsan is definitely compatible. > > Nice. > >> For libtsan we have several changes: >> >> 1) Several interceptors (34 of them) were added and __interceptor_lstat{64} >> were removed. > > That is bad, I think we need to readd those and perhaps just do what > lstat*/stat* do. Weren't we solving the same thing a year ago on some other > symbol? > >> 2) __interceptor_strchr has change in its parameters type: >> __interceptor_strchr(char*, int) -> __interceptor_strchr(const char*, int) > > That is not a big deal, the function is extern "C". > >> 3) tsan's internal type __tsan::ReportDesc has several changes, but it seems >> that this doesn't introduce ABI incompatibility with compiler side. > > If __tsan::ReportDesc is not defined in publicly installed headers, I think > we are fine. As a side note, why is it in the list of exported symbols? -I
[PATCH] rs6000: Do swdiv at expand time
We transform floating point divide instructions to a faster series of simple instructions, "swdiv". Currently we do not do that until the first splitter pass, which is much too late for most optimisations that can happen on those new instructions, e.g. the constant loads are not CSEd inside an unrolled loop. This patch changes things so those divide instructions are expanded during expand already. Bootstrapped and tested on powerpc64-linux; Bill has run SPEC on it, and if anything it shows a slight improvement. Is this okay for trunk? Segher --- gcc/config/rs6000/rs6000.md | 10 +- gcc/config/rs6000/vector.md | 10 +- 2 files changed, 18 insertions(+), 2 deletions(-) diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index e432a5a..e08f120 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -4457,7 +4457,15 @@ (define_expand "div3" (div:SFDF (match_operand:SFDF 1 "gpc_reg_operand" "") (match_operand:SFDF 2 "gpc_reg_operand" "")))] "TARGET__INSN && !TARGET_SIMPLE_FPU" - "") +{ + if (RS6000_RECIP_AUTO_RE_P (mode) + && can_create_pseudo_p () && flag_finite_math_only + && !flag_trapping_math && flag_reciprocal_math) +{ + rs6000_emit_swdiv (operands[0], operands[1], operands[2], true); + DONE; +} +}) (define_insn "*div3_fpr" [(set (match_operand:SFDF 0 "gpc_reg_operand" "=,") diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md index 7240345..05f3bdb 100644 --- a/gcc/config/rs6000/vector.md +++ b/gcc/config/rs6000/vector.md @@ -248,7 +248,15 @@ (define_expand "div3" (div:VEC_F (match_operand:VEC_F 1 "vfloat_operand" "") (match_operand:VEC_F 2 "vfloat_operand" "")))] "VECTOR_UNIT_VSX_P (mode)" - "") +{ + if (RS6000_RECIP_AUTO_RE_P (mode) + && can_create_pseudo_p () && flag_finite_math_only + && !flag_trapping_math && flag_reciprocal_math) +{ + rs6000_emit_swdiv (operands[0], operands[1], operands[2], true); + DONE; +} +}) (define_expand "neg2" [(set (match_operand:VEC_F 0 "vfloat_operand" "") -- 1.9.3
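For readers unfamiliar with software division, the following is a rough sketch of the general idea of replacing a divide with multiplies and adds: estimate the reciprocal, refine it with Newton-Raphson steps, then multiply. It is an assumption on my part that this matches swdiv in outline; it is not the code rs6000_emit_swdiv emits. The names recip_newton and swdiv_sketch are hypothetical, the initial estimate is computed in portable C instead of with a hardware estimate instruction, and the divisor must be finite, positive and in the normal range (loosely mirroring the finite-math-only condition in the patch).

```c
/* Approximate 1/d without a divide.  Precondition (assumed by this
   sketch): d is finite, positive and not subnormal.  */
double
recip_newton (double d)
{
  double m = d, s = 1.0;
  int i;

  /* Scale d into [0.5, 1) so that m == d * s.  Multiplying by powers of
     two is exact.  */
  while (m >= 1.0) { m *= 0.5; s *= 0.5; }
  while (m < 0.5)  { m *= 2.0; s *= 2.0; }

  /* Linear initial estimate of 1/m (48/17 - 32/17 * m); its relative
     error on [0.5, 1) is at most 1/17.  */
  double x = 2.8235294117647058 - 1.8823529411764706 * m;

  /* Each Newton-Raphson step x = x * (2 - m * x) squares the relative
     error, so four steps reach double precision.  */
  for (i = 0; i < 4; i++)
    x = x * (2.0 - m * x);

  /* 1/d == s / m.  */
  return x * s;
}

/* Divide N by D using only multiplies, adds and the reciprocal above.  */
double
swdiv_sketch (double n, double d)
{
  return n * recip_newton (d);
}
```

Because the refinement steps are ordinary multiplies and adds with constant operands, exposing them early gives passes such as CSE something to work on, which is the motivation stated in the patch description.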
Re: [PATCH 0/7] Libsanitizer merge from upstream r285547.
On 07/11/16 12:28, Yuri Gribov wrote: On Mon, Nov 7, 2016 at 9:20 AM, Jakub Jelinek wrote: On Mon, Nov 07, 2016 at 12:14:39PM +0300, Maxim Ostapenko wrote: libubsan is definitely compatible. Nice. For libtsan we have several changes: 1) Several interceptors (34 of them) were added and __interceptor_lstat{64} were removed. That is bad, I think we need to readd those and perhaps just do what lstat*/stat* do. Weren't we solving the same thing a year ago on some other symbol? 2) __interceptor_strchr has change in its parameters type: __interceptor_strchr(char*, int) -> __interceptor_strchr(const char*, int) That is not a big deal, the function is extern "C". 3) tsan's internal type __tsan::ReportDesc has several changes, but it seems that this doesn't introduce ABI incompatibility with compiler side. If __tsan::ReportDesc is not defined in publicly installed headers, I think we are fine. As a side note, why is it in the list of exported symbols? Because it appears as a type of parameter of exported __tsan::OnReport function: // Can be overriden by an application/test to intercept reports. #ifdef TSAN_EXTERNAL_HOOKS bool OnReport(const ReportDesc *rep, bool suppressed); #else SANITIZER_WEAK_CXX_DEFAULT_IMPL bool OnReport(const ReportDesc *rep, bool suppressed) { (void)rep; return suppressed; } #endif This function can be overridden by application for debugging purpose though. -I
Re: [rs6000] Fix reload failures in 64-bit mode with no special constant pool
> Now you don't need to have a special pool to call create_TOC_reference, you > can call it for regular TOC references as well, as done a few lines above: > > /* If this is a SYMBOL_REF that refers to a constant pool entry, >and we have put it in the TOC, we just need to make a TOC-relative >reference to it. */ > if (TARGET_TOC > && GET_CODE (operands[1]) == SYMBOL_REF > && use_toc_relative_ref (operands[1], mode)) > operands[1] = create_TOC_reference (operands[1], operands[0]); > > So the attached patch does it there too. > > Tested on PowerPC64/Linux (LRA) and VxWorks (reload), OK for the mainline? Revised version attached, with Pmode formally changed to mode (but mode == Pmode here so no functional change whatsoever). Tested on PowerPC64/Linux, OK for the mainline? * config/rs6000/rs6000.c (rs6000_emit_move): Also use a TOC reference after forcing to constant memory when the code model is medium. -- Eric BotcazouIndex: config/rs6000/rs6000.c === --- config/rs6000/rs6000.c (revision 241856) +++ config/rs6000/rs6000.c (working copy) @@ -10673,10 +10673,7 @@ rs6000_emit_move (rtx dest, rtx source, if (TARGET_TOC && GET_CODE (XEXP (operands[1], 0)) == SYMBOL_REF - && constant_pool_expr_p (XEXP (operands[1], 0)) - && ASM_OUTPUT_SPECIAL_POOL_ENTRY_P ( - get_pool_constant (XEXP (operands[1], 0)), - get_pool_mode (XEXP (operands[1], 0 + && use_toc_relative_ref (XEXP (operands[1], 0), mode)) { rtx tocref = create_TOC_reference (XEXP (operands[1], 0), operands[0]);
Re: [PATCH] fix a few minor nits in -Walloca documentation
On Sat, Nov 5, 2016 at 3:25 AM, Jeff Law wrote:
> On 11/04/2016 06:26 PM, Martin Sebor wrote:
>>
>> While experimenting with -Walloca and cross-referencing the manual
>> I noticed a few minor nits that I thought could stand to be corrected
>> and/or clarified.  Attached is a patch.
>>
>> In the update I mentioned that the alloca argument must have integer
>> type for the bounds checking to be recognized to make it clear that
>> for example floating point arguments are not considered to be bounded
>> even if they are constrained.  (Apparently VRP doesn't handle those.)
>
> Right.  VRP doesn't handle floating point.  There's been some talk of
> starting to track a few key values so we can say things like "this is not a
> NaN".

Yup.  Basically add something along the lines of SSA_NAME_RANGE_INFO for
floats and track answers to isnan, isnormal, etc.; that is, record
fpclassify () for each SSA name.

I'd do this conveniently in tree-ssa-forwprop.c, which iterates in RPO
order, folding all stmts.  The actual worker would be a

  int gimple_fpclassify (gimple *stmt)

function classifying the result of stmt (using that SSA info on arguments).
Or if you want it really fancy, do it decomposed:

  int op_fpclassify (enum tree_code code, tree arg1 [, tree arg2 [, tree arg3]])
  int op_fpclassify (enum built_in_function, tree arg1 [, tree arg2 [, tree arg3]])

Wherever we test stuff like HONOR_NANS we can replace it with something
operand-specific that also evaluates the SSA info.

It shouldn't be much work to start something along this line.

Richard.

>
> The patch is OK for the trunk.
>
> Thanks,
> Jeff
Re: Simplify X / X, 0 / X and X % X
On Fri, Nov 4, 2016 at 9:07 PM, Marc Glisse wrote: > Hello, > > since we were discussing this recently... > > The condition is copied from the existing 0 % X case, visible in the context > of the diff. > > As far as I understand, the main case where we do not want to optimize is > during constexpr evaluation in the C++ front-end (it wants to detect the > undefined behavior), and with late folding I think this means we only need > to care about an explicit 0/0, not about X/X where X would become 0 after > the simplification. > > And later, if we do have something like X/0, we could handle it the same way > as we currently handle *(char*)0, insert a trap after that instruction and > clear the following code, which likely gives better code than replacing 0/0 > with 1. > > Bootstrap+regtest on powerpc64le-unknown-linux-gnu. Ok. Thanks, Richard. > 2016-11-07 Marc Glisse > > gcc/ > * match.pd (0 / X, X / X, X % X): New simplifications. > > gcc/testsuite/ > * gcc.dg/tree-ssa/divide-5.c: New file. > > -- > Marc Glisse
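As a small illustration of the three identities named in the subject, here is a sketch in plain C. The helper names are hypothetical. Each helper must be called with a nonzero argument, because division by zero is undefined behaviour, which is exactly the case the thread wants to leave alone for an explicit 0/0.

```c
/* Each helper returns nonzero when the identity holds for X.
   Precondition: x != 0 (x / 0 and x % 0 are undefined behaviour).  */

/* X / X == 1  */
int x_div_x_is_one (int x) { return x / x == 1; }

/* 0 / X == 0  */
int zero_div_x_is_zero (int x) { return 0 / x == 0; }

/* X % X == 0  */
int x_mod_x_is_zero (int x) { return x % x == 0; }
```

A compiler that applies the simplifications can reduce each body to a constant, since the result does not depend on x once x is known to be a valid divisor.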
Re: Simplify X / X, 0 / X and X % X
On Sat, Nov 5, 2016 at 3:30 AM, Jeff Law wrote: > On 11/04/2016 02:07 PM, Marc Glisse wrote: >> >> Hello, >> >> since we were discussing this recently... >> >> The condition is copied from the existing 0 % X case, visible in the >> context of the diff. >> >> As far as I understand, the main case where we do not want to optimize >> is during constexpr evaluation in the C++ front-end (it wants to detect >> the undefined behavior), and with late folding I think this means we >> only need to care about an explicit 0/0, not about X/X where X would >> become 0 after the simplification. >> >> And later, if we do have something like X/0, we could handle it the same >> way as we currently handle *(char*)0, insert a trap after that >> instruction and clear the following code, which likely gives better code >> than replacing 0/0 with 1. > > Yup. I'd prefer to insert a trap if we ultimately expose a division by zero > -- including cases where that division occurs as a result of a PHI arg being > zero and the PHI result being used as a denominator in a division > expression. > > It ought to be extremely easy to detect & transform that case (and probably > warn for it too). We have gimple-ssa-isolate-paths.c for that, right? Richard. > > > I'm leaving the actual review to Richi. > jeff >
Re: [PATCH 0/7] Libsanitizer merge from upstream r285547.
On 07/11/16 12:20, Jakub Jelinek wrote:
> On Mon, Nov 07, 2016 at 12:14:39PM +0300, Maxim Ostapenko wrote:
>> libubsan is definitely compatible.
>
> Nice.
>
>> For libtsan we have several changes:
>>
>> 1) Several interceptors (34 of them) were added and __interceptor_lstat{64}
>> were removed.
>
> That is bad, I think we need to readd those and perhaps just do what
> lstat*/stat* do.  Weren't we solving the same thing a year ago on some other
> symbol?

Yeah, that was __tls_get_addr.  Actually, the *stat interceptors were moved
from tsan to common, but it seems that lstat/lstat64 were missed.  This
should be fixed upstream, I suppose.

>> 2) __interceptor_strchr has change in its parameters type:
>> __interceptor_strchr(char*, int) -> __interceptor_strchr(const char*, int)
>
> That is not a big deal, the function is extern "C".
>
>> 3) tsan's internal type __tsan::ReportDesc has several changes, but it seems
>> that this doesn't introduce ABI incompatibility with compiler side.
>
> If __tsan::ReportDesc is not defined in publicly installed headers, I think
> we are fine.

I don't see __tsan::ReportDesc in any tsan interface header:

$ grep -nr ReportDesc libsanitizer/tsan/tsan_interface*
$

But since tsan has the weak

  SANITIZER_WEAK_CXX_DEFAULT_IMPL
  bool OnReport(const ReportDesc *rep, bool suppressed) { ... }

that can be overridden by a C++ application (for debugging purposes,
though), is it OK to not change the libtsan version?

> Jakub
Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v3)
Hello.

After discussion with Jakub, I'm resending a new version of the patch, where I changed the following:

1) gimplify_ctxp->live_switch_vars is used to track variables introduced in switch_expr.  Every time a case_label_expr is seen, these are unpoisoned.  It's quite conservative, however it covers all corner cases one can come up with.  Compared to clang, we are much more precise in switch statements where a variable's liveness crosses a label boundary.
2) I found a bug where ASAN_CHECK was optimized out due to a missing check of the IFN_ASAN_MARK internal fn.  A test was added for that.
3) Multiple switch tests have been added, which are going to be sent in an upcoming email.

The patch can bootstrap on ppc64le-redhat-linux and survives regression tests (+ asan bootstrap finishes successfully).

Martin

>From 2b37a59dd639ad740fdbd49d57b9f1975fc35046 Mon Sep 17 00:00:00 2001
From: marxin
Date: Tue, 3 May 2016 15:35:22 +0200
Subject: [PATCH 1/2] Introduce -fsanitize-address-use-after-scope

gcc/c-family/ChangeLog:

2016-10-27  Martin Liska

	* c-warn.c (warn_for_unused_label): Save all labels used
	in goto or in &label.

gcc/ChangeLog:

2016-10-27  Martin Liska

	* asan.c (enum asan_check_flags): Move the enum to header file.
	(asan_init_shadow_ptr_types): Make type creation more generic.
	(shadow_mem_size): New function.
	(asan_emit_stack_protection): Use newly added ASAN_SHADOW_GRANULARITY.
	Rewritten stack unpoisoning code.
	(build_shadow_mem_access): Add new argument return_address.
	(instrument_derefs): Instrument local variables if use after scope
	sanitization is enabled.
	(asan_store_shadow_bytes): New function.
	(asan_expand_mark_ifn): Likewise.
	(asan_sanitize_stack_p): Moved from asan_sanitize_stack_p.
	* asan.h (enum asan_mark_flags): Moved here from asan.c
	(asan_protect_stack_decl): Protect all declaration that need
	to live in memory.
	(asan_sanitize_use_after_scope): New function.
	(asan_no_sanitize_address_p): Likewise.
* cfgexpand.c (partition_stack_vars): Consider asan_sanitize_use_after_scope in condition. (expand_stack_vars): Likewise. * common.opt (-fsanitize-address-use-after-scope): New option. * doc/invoke.texi (use-after-scope-direct-emission-threshold): Explain the parameter. * flag-types.h (enum sanitize_code): Define SANITIZE_USE_AFTER_SCOPE. * gimplify.c (build_asan_poison_call_expr): New function. (asan_poison_variable): Likewise. (gimplify_bind_expr): Generate poisoning/unpoisoning for local variables that have address taken. (gimplify_decl_expr): Likewise. (gimplify_target_expr): Likewise for C++ temporaries. (sort_by_decl_uid): New function. (gimplify_expr): Unpoison all variables for a label we can jump from outside of a scope. (gimplify_switch_expr): Unpoison variables defined in the switch context. (gimplify_function_tree): Clear asan_poisoned_variables. (asan_poison_variables): New function. (warn_switch_unreachable_r): Handle IFN_ASAN_MARK. * internal-fn.c (expand_ASAN_MARK): New function. * internal-fn.def (ASAN_MARK): Declare. * opts.c (finish_options): Handle -fstack-reuse if -fsanitize-address-use-after-scope is enabled. (common_handle_option): Enable address sanitization if -fsanitize-address-use-after-scope is enabled. * params.def (PARAM_USE_AFTER_SCOPE_DIRECT_EMISSION_THRESHOLD): New parameter. * params.h: Likewise. * sancov.c (pass_sanopt::execute): Handle IFN_ASAN_MARK. * sanitizer.def: Define __asan_poison_stack_memory and __asan_unpoison_stack_memory functions. * asan.c (asan_mark_poison_p): New function. (transform_statements): Handle asan_mark_poison_p calls. * gimple.c (nonfreeing_call_p): Handle IFN_ASAN_MARK. 
--- gcc/asan.c| 302 +- gcc/asan.h| 66 +-- gcc/c-family/c-warn.c | 9 +- gcc/cfgexpand.c | 18 +-- gcc/common.opt| 3 + gcc/doc/invoke.texi | 15 ++- gcc/gimple.c | 3 + gcc/gimplify.c| 234 +++--- gcc/internal-fn.c | 9 ++ gcc/internal-fn.def | 1 + gcc/opts.c| 27 - gcc/params.def| 6 + gcc/params.h | 2 + gcc/sanitizer.def | 4 + gcc/sanopt.c | 3 + 15 files changed, 603 insertions(+), 99 deletions(-) diff --git a/gcc/asan.c b/gcc/asan.c index c6d9240..1e0ce8d 100644 --- a/gcc/asan.c +++ b/gcc/asan.c @@ -245,6 +245,22 @@ static unsigned HOST_WIDE_INT asan_shadow_offset_value; static bool asan_shadow_offset_computed; static vec sanitized_sections; +/* Return true if STMT is ASAN_MARK poisoning internal function call. */ +static inline bool +asan_mark_poison_p (gimple *stmt) +{ + return (gimple_call_internal_p (stmt, IFN_ASAN_MARK) + && tree_to_uhwi (gimple_call_arg (stmt, 0)) == ASAN_MARK_CLOBBER); + +} + +/* Set of variable declarations that are going to be guarded by + use-after-scope sanitizer. */ + +static hash_set *asan_handled_variables = NULL; + +hash_set *asan_used_labels = N
Re: [PATCH 0/7] Libsanitizer merge from upstream r285547.
On Mon, Nov 07, 2016 at 11:22:28AM +0300, Maxim Ostapenko wrote: > Hi, > > this patch set performs libsanitizer merge from upstream. > > Patch 1 is the library merge itself. > > Patch 2 is the reapplied change for SPARC by David S. Miller. > > Patch 3 changes heuristic for extracting last PC from stack frame for ARM in > fast unwind routine. More details can be found here > (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61771). > > Patch 4 replaces Jakub's fix for > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63888 and removes > CheckODRViolationViaPoisoning call from RegisterGlobal to avoid false > positive odr violation reports. > > Patch 5 combines necessary compiler changes. > > Patch 6 adds several new tests, backported from upstream. The patches 1-6 are ok for trunk now, if you fix the missing space before ( in patch 5. > Patch 7 adds support for ASan odr indicators at compiler side. This one can be applied incrementally once the issues reported in there are resolved. And the libtsan ABI stuff (__intercept*stat*) can be resolved incrementally too. Thanks. Jakub
Re: [PATCH, 02/N] Introduce tests for -fsanitize-address-use-after-scope (v3)
Third version of the patch. Martin >From e790d926afd3d2d6ad41d14d1e91698bf651b41a Mon Sep 17 00:00:00 2001 From: marxin Date: Mon, 19 Sep 2016 17:39:29 +0200 Subject: [PATCH 2/2] Introduce tests for -fsanitize-address-use-after-scope gcc/testsuite/ChangeLog: 2016-09-26 Martin Liska * c-c++-common/asan/force-inline-opt0-1.c: Disable -f-sanitize-address-use-after-scope. * c-c++-common/asan/inc.c: Change number of expected ASAN_CHECK internal fn calls. * g++.dg/asan/use-after-scope-1.C: New test. * g++.dg/asan/use-after-scope-2.C: Likewise. * g++.dg/asan/use-after-scope-3.C: Likewise. * g++.dg/asan/use-after-scope-types-1.C: Likewise. * g++.dg/asan/use-after-scope-types-2.C: Likewise. * g++.dg/asan/use-after-scope-types-3.C: Likewise. * g++.dg/asan/use-after-scope-types-4.C: Likewise. * g++.dg/asan/use-after-scope-types-5.C: Likewise. * g++.dg/asan/use-after-scope-types.h: Likewise. * gcc.dg/asan/use-after-scope-1.c: Likewise. * gcc.dg/asan/use-after-scope-2.c: Likewise. * gcc.dg/asan/use-after-scope-3.c: Likewise. * gcc.dg/asan/use-after-scope-4.c: Likewise. * gcc.dg/asan/use-after-scope-5.c: Likewise. * gcc.dg/asan/use-after-scope-6.c: Likewise. * gcc.dg/asan/use-after-scope-7.c: Likewise. * gcc.dg/asan/use-after-scope-8.c: Likewise. * gcc.dg/asan/use-after-scope-9.c: Likewise. * gcc.dg/asan/use-after-scope-switch-1.c: Likewise. * gcc.dg/asan/use-after-scope-switch-2.c: Likewise. * gcc.dg/asan/use-after-scope-switch-3.c: Likewise. * gcc.dg/asan/use-after-scope-goto-1.c: Likewise. * gcc.dg/asan/use-after-scope-goto-2.c: Likewise. 
--- .../c-c++-common/asan/force-inline-opt0-1.c| 1 + gcc/testsuite/c-c++-common/asan/inc.c | 3 +- gcc/testsuite/g++.dg/asan/use-after-scope-1.C | 21 ++ gcc/testsuite/g++.dg/asan/use-after-scope-2.C | 40 ++ gcc/testsuite/g++.dg/asan/use-after-scope-3.C | 22 ++ .../g++.dg/asan/use-after-scope-types-1.C | 17 .../g++.dg/asan/use-after-scope-types-2.C | 17 .../g++.dg/asan/use-after-scope-types-3.C | 17 .../g++.dg/asan/use-after-scope-types-4.C | 17 .../g++.dg/asan/use-after-scope-types-5.C | 17 gcc/testsuite/g++.dg/asan/use-after-scope-types.h | 30 ++ gcc/testsuite/gcc.dg/asan/use-after-scope-1.c | 18 + gcc/testsuite/gcc.dg/asan/use-after-scope-2.c | 47 ++ gcc/testsuite/gcc.dg/asan/use-after-scope-3.c | 20 + gcc/testsuite/gcc.dg/asan/use-after-scope-4.c | 16 gcc/testsuite/gcc.dg/asan/use-after-scope-5.c | 27 + gcc/testsuite/gcc.dg/asan/use-after-scope-6.c | 15 +++ gcc/testsuite/gcc.dg/asan/use-after-scope-7.c | 15 +++ gcc/testsuite/gcc.dg/asan/use-after-scope-8.c | 14 +++ gcc/testsuite/gcc.dg/asan/use-after-scope-9.c | 20 + gcc/testsuite/gcc.dg/asan/use-after-scope-goto-1.c | 47 ++ gcc/testsuite/gcc.dg/asan/use-after-scope-goto-2.c | 25 .../gcc.dg/asan/use-after-scope-switch-1.c | 25 .../gcc.dg/asan/use-after-scope-switch-2.c | 33 +++ .../gcc.dg/asan/use-after-scope-switch-3.c | 36 + 25 files changed, 559 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/asan/use-after-scope-1.C create mode 100644 gcc/testsuite/g++.dg/asan/use-after-scope-2.C create mode 100644 gcc/testsuite/g++.dg/asan/use-after-scope-3.C create mode 100644 gcc/testsuite/g++.dg/asan/use-after-scope-types-1.C create mode 100644 gcc/testsuite/g++.dg/asan/use-after-scope-types-2.C create mode 100644 gcc/testsuite/g++.dg/asan/use-after-scope-types-3.C create mode 100644 gcc/testsuite/g++.dg/asan/use-after-scope-types-4.C create mode 100644 gcc/testsuite/g++.dg/asan/use-after-scope-types-5.C create mode 100644 gcc/testsuite/g++.dg/asan/use-after-scope-types.h create mode 100644 
gcc/testsuite/gcc.dg/asan/use-after-scope-1.c create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-2.c create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-3.c create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-4.c create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-5.c create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-6.c create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-7.c create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-8.c create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-9.c create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-goto-1.c create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-goto-2.c create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-switch-1.c create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-switch-2.c create mode 100644 gcc/testsuite/gcc.dg/asan/use-after-scope-switch-3.c diff --git a/gcc/testsuite/c-c++-common/asan/force-inlin
Re: [PATCH] Fix PR driver/78206 by silently ignoring EPERM as well as ENOENT
On Sun, Nov 6, 2016 at 2:36 PM, Jack Howarth wrote:
> The use of an Apple sandbox with denied file access permissions into
> /usr/local exposed that cc1 fails on errors of...
>
> cc1: error: /usr/local/include: Operation not permitted
>
> The commonly suggested solution of using --with-local-prefix= set to something
> other than /usr/local is undesirable on darwin because that creates a compiler
> which retains library searches in /usr/local/lib despite no longer searching
> for headers in /usr/local/include (which makes it susceptible to header/library
> mismatches during builds).
>
> The following trivial fix solves the issue by silently ignoring errors from
> denied permissions as well as non-existent dirs from the stat (cur->name, &st)
> call in remove_dup() of gcc/incpath.c.  Okay for gcc trunk and backports to
> gcc-5-branch and gcc-6-branch?

I think the patch is reasonable, thus it is ok (also for backporting).

Thanks,
Richard.

>   Jack Howarth
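The behaviour described in the patch summary can be sketched roughly as follows. This is not the code from gcc/incpath.c: the function classify_include_dir and the enum dir_status are hypothetical names for this example, and the real function does more (duplicate detection and diagnostics). The sketch only shows the errno test the subject line refers to.

```c
#include <errno.h>
#include <sys/stat.h>

/* Hypothetical result codes for this sketch.  */
enum dir_status { DIR_OK, DIR_SKIP_SILENTLY, DIR_REPORT };

/* Decide what to do with include directory NAME.  */
enum dir_status
classify_include_dir (const char *name)
{
  struct stat st;

  if (stat (name, &st) != 0)
    {
      /* A missing directory (ENOENT) or one a sandbox refuses to let us
	 look at (EPERM) is dropped without a diagnostic.  */
      if (errno == ENOENT || errno == EPERM)
	return DIR_SKIP_SILENTLY;
      /* Any other stat failure is still worth reporting.  */
      return DIR_REPORT;
    }

  /* stat succeeded: keep real directories, report anything else.  */
  return S_ISDIR (st.st_mode) ? DIR_OK : DIR_REPORT;
}
```

Treating EPERM like ENOENT keeps the default search path unchanged, so the header and library search directories stay in step.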
Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v3)
On Mon, Nov 07, 2016 at 11:03:11AM +0100, Martin Liška wrote:
> Hello.
>
> After discussion with Jakub, I'm resending new version of the patch, where I
> changed following:
> 1) gimplify_ctxp->live_switch_vars is used to track variables introduced in
>    switch_expr.  Every time a case_label_expr is seen, these are unpoisoned.
>    It's quite conservative, however it covers all corner cases one can come
>    up with.  Compared to clang, we are much more precise in switch statements
>    where a variable liveness crosses label boundary.
> 2) I found a bug where ASAN_CHECK was optimized out due to missing check of
>    IFN_ASAN_MARK internal fn.  Test was added for that.
> 3) Multiple switch tests have been added, which is going to be sent in
>    upcoming email.
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests (+
> asan bootstrap finishes successfully).

Ok for trunk.  Hopefully we can resolve the most common cases for switch
incrementally, either still during stage1 or early in stage3.

	Jakub
Re: [match.pd] Fix for PR35691
On Fri, 4 Nov 2016, Prathamesh Kulkarni wrote: > On 4 November 2016 at 13:41, Richard Biener wrote: > > On Thu, 3 Nov 2016, Marc Glisse wrote: > > > >> On Thu, 3 Nov 2016, Richard Biener wrote: > >> > >> > > > > The transform would also work for vectors (element_precision for > >> > > > > the test but also a value-matching zero which should ensure the > >> > > > > same number of elements). > >> > > > Um sorry, I didn't get how to check vectors to be of equal length by > >> > > > a > >> > > > matching zero. > >> > > > Could you please elaborate on that ? > >> > > > >> > > He may have meant something like: > >> > > > >> > > (op (cmp @0 integer_zerop@2) (cmp @1 @2)) > >> > > >> > I meant with one being @@2 to allow signed vs. Unsigned @0/@1 which was > >> > the > >> > point of the pattern. > >> > >> Oups, that's what I had written first, and then I somehow managed to > >> confuse > >> myself enough to remove it so as to remove the call to types_match :-( > >> > >> > > So the last operand is checked with operand_equal_p instead of > >> > > integer_zerop. But the fact that we could compute bit_ior on the > >> > > comparison results should already imply that the number of elements is > >> > > the > >> > > same. > >> > > >> > Though for equality compares we also allow scalar results IIRC. > >> > >> Oh, right, I keep forgetting that :-( And I have no idea how to generate > >> one > >> for a testcase, at least until the GIMPLE FE lands... > >> > >> > > On platforms that have IOR on floats (at least x86 with SSE, maybe some > >> > > vector mode on s390?), it would be cool to do the same for floats (most > >> > > likely at the RTL level). > >> > > >> > On GIMPLE view-converts could come to the rescue here as well. Or we cab > >> > just allow bit-and/or on floats as much as we allow them on pointers. > >> > >> Would that generate sensible code on targets that do not have logic insns > >> for > >> floats? 
Actually, even on x86_64 that generates inefficient code, so there
> >> would be some work (for instance grep finds no gen_iordf3, only
> >> gen_iorv2df3).
> >>
> >> I am also a bit wary of doing those obfuscating optimizations too early...
> >> a==0 is something that other optimizations might use. long
> >> c=(long&)a|(long&)b; (double&)c==0; less so...
> >>
> >> (and I am assuming that signaling NaNs don't make the whole transformation
> >> impossible, which might be wrong)
> >
> > Yeah.  I also think it's not so much important - I just wanted to mention
> > vectors...
> >
> > Btw, I still think we need a more sensible infrastructure for passes
> > to gather, analyze and modify complex conditions.  (I'm always pointing
> > to tree-affine.c as an, albeit not very good, example for handling
> > a similar problem)
> Thanks for mentioning the value-matching capture @@, I wasn't aware of
> this match.pd feature.
> The current patch keeps it restricted to only bitwise operators on integers.
> Bootstrap+test running on x86_64-unknown-linux-gnu.
> OK to commit if passes ?

+/* PR35691: Transform
+   (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0.
+   (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.  */
+

Please omit the vertical space.

+(for bitop (bit_and bit_ior)
+     cmp (eq ne)
+ (simplify
+  (bitop (cmp @0 integer_zerop) (cmp @1 integer_zerop))

If you capture the first integer_zerop as @2 then you can re-use it...

+   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+	&& INTEGRAL_TYPE_P (TREE_TYPE (@1))
+	&& TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1)))
+    (cmp (bit_ior @0 (convert @1)) { build_zero_cst (TREE_TYPE (@0));

... here in place of the { build_zero_cst ... }.

OK with those changes.

Richard.
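To make the PR35691 transformation concrete, here is a small C sketch of the source-level forms before and after folding. The function names are hypothetical. The operands deliberately have the same precision but different signedness, which is the case the pattern's TYPE_PRECISION check allows; the cast stands in for the (convert @1) in the pattern.

```c
/* Before: two comparisons against zero combined with bitwise AND.  */
int both_zero (int x, unsigned y) { return (x == 0) & (y == 0); }

/* After: one OR and a single comparison.  (int) y plays the role of
   typeof(x)(y) in the pattern's comment.  */
int both_zero_folded (int x, unsigned y) { return (x | (int) y) == 0; }

/* Before: two "not zero" tests combined with bitwise OR.  */
int any_nonzero (int x, unsigned y) { return (x != 0) | (y != 0); }

/* After: one OR and a single comparison.  */
int any_nonzero_folded (int x, unsigned y) { return (x | (int) y) != 0; }
```

The equivalence holds because x | y has no bits set exactly when neither operand has any bit set.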
Re: [PATCH, 02/N] Introduce tests for -fsanitize-address-use-after-scope (v3)
On Mon, Nov 07, 2016 at 11:04:23AM +0100, Martin Liška wrote:
> Third version of the patch.
>
> Martin

> From e790d926afd3d2d6ad41d14d1e91698bf651b41a Mon Sep 17 00:00:00 2001
> From: marxin
> Date: Mon, 19 Sep 2016 17:39:29 +0200
> Subject: [PATCH 2/2] Introduce tests for -fsanitize-address-use-after-scope
>
> gcc/testsuite/ChangeLog:
>
> 2016-09-26  Martin Liska
>
>       * c-c++-common/asan/force-inline-opt0-1.c: Disable
>       -fsanitize-address-use-after-scope.
>       * c-c++-common/asan/inc.c: Change number of expected ASAN_CHECK
>       internal fn calls.
>       * g++.dg/asan/use-after-scope-1.C: New test.
>       * g++.dg/asan/use-after-scope-2.C: Likewise.
>       * g++.dg/asan/use-after-scope-3.C: Likewise.
>       * g++.dg/asan/use-after-scope-types-1.C: Likewise.
>       * g++.dg/asan/use-after-scope-types-2.C: Likewise.
>       * g++.dg/asan/use-after-scope-types-3.C: Likewise.
>       * g++.dg/asan/use-after-scope-types-4.C: Likewise.
>       * g++.dg/asan/use-after-scope-types-5.C: Likewise.
>       * g++.dg/asan/use-after-scope-types.h: Likewise.
>       * gcc.dg/asan/use-after-scope-1.c: Likewise.
>       * gcc.dg/asan/use-after-scope-2.c: Likewise.
>       * gcc.dg/asan/use-after-scope-3.c: Likewise.
>       * gcc.dg/asan/use-after-scope-4.c: Likewise.
>       * gcc.dg/asan/use-after-scope-5.c: Likewise.
>       * gcc.dg/asan/use-after-scope-6.c: Likewise.
>       * gcc.dg/asan/use-after-scope-7.c: Likewise.
>       * gcc.dg/asan/use-after-scope-8.c: Likewise.
>       * gcc.dg/asan/use-after-scope-9.c: Likewise.
>       * gcc.dg/asan/use-after-scope-switch-1.c: Likewise.
>       * gcc.dg/asan/use-after-scope-switch-2.c: Likewise.
>       * gcc.dg/asan/use-after-scope-switch-3.c: Likewise.
>       * gcc.dg/asan/use-after-scope-goto-1.c: Likewise.
>       * gcc.dg/asan/use-after-scope-goto-2.c: Likewise.

Ok, thanks.

        Jakub
Re: [PATCH][1/2] GIMPLE Frontend, C FE parts (and GIMPLE parser)
On Fri, 4 Nov 2016, Jakub Jelinek wrote: > Hi! > > Just 2 nits: > > On Fri, Oct 28, 2016 at 01:46:57PM +0200, Richard Biener wrote: > > +/* Return a pointer to the Nth token in PARERs tokens_buf. */ > > PARSERs ? Fixed. > > @@ -454,7 +423,7 @@ c_lex_one_token (c_parser *parser, c_token *token) > > /* Return a pointer to the next token from PARSER, reading it in if > > necessary. */ > > > > -static inline c_token * > > +c_token * > > c_parser_peek_token (c_parser *parser) > > { > >if (parser->tokens_avail == 0) > > I wonder if turning all of these into non-inlines is a good idea. > Can't you move them to the common header instead? The issue with moving is that I failed to export the definition of c_parser in c-parser.h due to gengtype putting vec handlers into gtype-c.h but not gtype-objc.h and thus objc bootstrap fails :/ I believe (well, I hope) that code generation for the C parser should be mostly unaffected (inlining is still done as determined useful) and the performance of the GIMPLE parser shouldn't be too important. If anybody feels like digging into the gengtype issue, I gave up after trying for half a day to trick it to do what I want (like for example also putting it in gtype-objc.h). > The rest I defer to Joseph or Marek. Thanks, Richard.
Re: [PATCH][1/2] GIMPLE Frontend, C FE parts (and GIMPLE parser)
On Mon, 7 Nov 2016, Richard Biener wrote: > On Fri, 4 Nov 2016, Jakub Jelinek wrote: > > > Hi! > > > > Just 2 nits: > > > > On Fri, Oct 28, 2016 at 01:46:57PM +0200, Richard Biener wrote: > > > +/* Return a pointer to the Nth token in PARERs tokens_buf. */ > > > > PARSERs ? > > Fixed. > > > > @@ -454,7 +423,7 @@ c_lex_one_token (c_parser *parser, c_token *token) > > > /* Return a pointer to the next token from PARSER, reading it in if > > > necessary. */ > > > > > > -static inline c_token * > > > +c_token * > > > c_parser_peek_token (c_parser *parser) > > > { > > >if (parser->tokens_avail == 0) > > > > I wonder if turning all of these into non-inlines is a good idea. > > Can't you move them to the common header instead? > > The issue with moving is that I failed to export the definition of > c_parser in c-parser.h due to gengtype putting vec > handlers into gtype-c.h but not gtype-objc.h and thus objc bootstrap > fails :/ If anybody wants to try, f82dc04b921a52a9a5c90d957a824e1c2d04 has it (objc build) still broken on the gimplefe git branch. > I believe (well, I hope) that code generation for the C parser > should be mostly unaffected (inlining is still done as determined > useful) and the performance of the GIMPLE parser shouldn't be > too important. > > If anybody feels like digging into the gengtype issue, I gave up > after trying for half a day to trick it to do what I want > (like for example also putting it in gtype-objc.h). > > > The rest I defer to Joseph or Marek. > > Thanks, > Richard. > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [PATCH 0/7] Libsanitizer merge from upstream r285547.
On 07/11/16 13:04, Jakub Jelinek wrote:
> On Mon, Nov 07, 2016 at 11:22:28AM +0300, Maxim Ostapenko wrote:
>> Hi,
>>
>> this patch set performs libsanitizer merge from upstream.
>>
>> Patch 1 is the library merge itself.
>>
>> Patch 2 is the reapplied change for SPARC by David S. Miller.
>>
>> Patch 3 changes heuristic for extracting last PC from stack frame for ARM
>> in fast unwind routine. More details can be found here
>> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61771).
>>
>> Patch 4 replaces Jakub's fix for
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63888 and removes
>> CheckODRViolationViaPoisoning call from RegisterGlobal to avoid false
>> positive odr violation reports.
>>
>> Patch 5 combines necessary compiler changes.
>>
>> Patch 6 adds several new tests, backported from upstream.
>
> The patches 1-6 are ok for trunk now, if you fix the missing space before
> ( in patch 5.

Ok, I'm going to land these shortly, thank you for review.

>> Patch 7 adds support for ASan odr indicators at compiler side.
>
> This one can be applied incrementally once the issues reported in there
> are resolved.

Yes, I'll fix the patch.

> And the libtsan ABI stuff (__intercept*stat*) can be resolved incrementally
> too.

Thanks.

> Jakub
[RFC] Fix PR rtl-optimization/59461
It's a missed optimization of a redundant zero-extension on the SPARC, which
originally comes from PR rtl-optimization/58295 for ARM.  The extension is
eliminated on the ARM because the load is explicitly zero-extended in RTL; on
the SPARC the load is implicitly zero-extended by means of LOAD_EXTEND_OP and
the combiner is blocked by limitations of the nonzero_bits machinery.

The approach is two-pronged:

 1. it lifts a limitation in reg_nonzero_bits_for_combine that was recently
    added (https://gcc.gnu.org/ml/gcc-patches/2013-11/msg03782.html) and
    prevents the combiner from reasoning on larger modes under certain
    circumstances.

 2. it makes nonzero_bits1 propagate results from inner REGs to paradoxical
    SUBREGs if both WORD_REGISTER_OPERATIONS and LOAD_EXTEND_OP are set.

This also eliminates quite a few zero-extensions in the compile.exp testsuite
at -O2 on the SPARC.

Tested on x86-64/Linux and SPARC/Solaris.


2016-11-07  Eric Botcazou

        PR rtl-optimization/59461
        * doc/rtl.texi (paradoxical subregs): Add missing word.
        * combine.c (reg_nonzero_bits_for_combine): Do not discard results
        in modes with precision larger than that of last_set_mode.
        * rtlanal.c (nonzero_bits1) : If WORD_REGISTER_OPERATIONS is
        set and LOAD_EXTEND_OP is appropriate, propagate results from inner
        REGs to paradoxical SUBREGs.
        (num_sign_bit_copies1) : Likewise.  Check that the mode is not
        larger than a word before invoking LOAD_EXTEND_OP on it.


2016-11-07  Eric Botcazou

        * gcc.target/sparc/pr59461.c: New test.

-- 
Eric Botcazou

/* PR rtl-optimization/59461 */
/* { dg-do compile } */
/* { dg-options "-O2" } */

extern char zeb_test_array[10];

unsigned char ee_isdigit2(unsigned int i)
{
  unsigned char c = zeb_test_array[i];
  unsigned char retval;

  retval = ((c>='0') & (c<='9')) ?
1 : 0; return retval; } /* { dg-final { scan-assembler-not "and\t%" } } */ Index: doc/rtl.texi === --- doc/rtl.texi (revision 241856) +++ doc/rtl.texi (working copy) @@ -1882,7 +1882,7 @@ When used as an rvalue, the low-order bi taken from @var{reg} while the high-order bits may or may not be defined. -The high-order bits of rvalues are in the following circumstances: +The high-order bits of rvalues are defined in the following circumstances: @itemize @item @code{subreg}s of @code{mem} Index: combine.c === --- combine.c (revision 241856) +++ combine.c (working copy) @@ -9878,18 +9878,17 @@ reg_nonzero_bits_for_combine (const_rtx (DF_LR_IN (ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb), REGNO (x) { - unsigned HOST_WIDE_INT mask = rsp->last_set_nonzero_bits; - - if (GET_MODE_PRECISION (rsp->last_set_mode) < GET_MODE_PRECISION (mode)) - /* We don't know anything about the upper bits. */ - mask |= GET_MODE_MASK (mode) ^ GET_MODE_MASK (rsp->last_set_mode); - - *nonzero &= mask; + /* Note that, even if the precision of last_set_mode is lower than that + of mode, record_value_for_reg invoked nonzero_bits on the register + with nonzero_bits_mode (because last_set_mode is necessarily integral + and HWI_COMPUTABLE_MODE_P in this case) so bits in nonzero_bits_mode + are all valid, hence in mode too since nonzero_bits_mode is defined + to the largest HWI_COMPUTABLE_MODE_P mode. */ + *nonzero &= rsp->last_set_nonzero_bits; return NULL; } tem = get_last_value (x); - if (tem) { if (SHORT_IMMEDIATES_SIGN_EXTEND) @@ -9898,7 +9897,8 @@ reg_nonzero_bits_for_combine (const_rtx return tem; } - else if (nonzero_sign_valid && rsp->nonzero_bits) + + if (nonzero_sign_valid && rsp->nonzero_bits) { unsigned HOST_WIDE_INT mask = rsp->nonzero_bits; Index: rtlanal.c === --- rtlanal.c (revision 241856) +++ rtlanal.c (working copy) @@ -4242,7 +4242,7 @@ cached_nonzero_bits (const_rtx x, machin /* Given an expression, X, compute which bits in X can be nonzero. 
We don't care about bits outside of those defined in MODE. - For most X this is simply GET_MODE_MASK (GET_MODE (MODE)), but if X is + For most X this is simply GET_MODE_MASK (GET_MODE (X)), but if X is an arithmetic operation, we can do better. */ static unsigned HOST_WIDE_INT @@ -4549,18 +4549,17 @@ nonzero_bits1 (const_rtx x, machine_mode /* If this is a SUBREG formed for a promoted variable that has been zero-extended, we know that at least the high-order bits are zero, though others might be too. */ - if (SUBREG_PROMOTED_VAR_P (x) && SUBREG_PROMOTED_UNSIGNED_P (x)) nonzero = GET_MODE_MASK (GET_MODE (x)) & cached_nonzero_bits (SUBREG_REG (x), GET_MODE (x), known_x, known_mode, known_ret); - inner_mode = GET_MODE (SUBREG_REG (x)); /* If the inner mode is a single word f
Re: Ping^6 Re: [Patch AArch64] Add floatdihf2 and floatunsdihf2 patterns
On Fri, Oct 21, 2016 at 05:31:14PM +0100, James Greenhalgh wrote: > On Wed, Oct 12, 2016 at 04:56:52PM +0100, James Greenhalgh wrote: > > On Wed, Sep 28, 2016 at 05:17:14PM +0100, James Greenhalgh wrote: > > > On Wed, Sep 21, 2016 at 10:42:03AM +0100, James Greenhalgh wrote: > > > > On Tue, Sep 13, 2016 at 10:31:28AM +0100, James Greenhalgh wrote: > > > > > On Tue, Sep 06, 2016 at 10:19:50AM +0100, James Greenhalgh wrote: > > > > > > This patch adds patterns for conversion from 64-bit integer to > > > > > > 16-bit > > > > > > floating-point values under AArch64 targets which don't have > > > > > > support for > > > > > > the ARMv8.2-A 16-bit floating point extensions. > > > > > > > > > > > > We implement these by first saturating to a SImode (we know that any > > > > > > values >= 65504 will round to infinity after conversion to HFmode), > > > > > > then > > > > > > converting to a DFmode (unsigned conversions could go to SFmode, > > > > > > but there > > > > > > is no performance benefit to this). Then converting to HFmode. > > > > > > > > > > > > Having added these patterns, the expansion path in "expand_float" > > > > > > will > > > > > > now try to use them for conversions from SImode to HFmode as there > > > > > > is no > > > > > > floatsihf2 pattern. expand_float first tries widening the integer > > > > > > size and > > > > > > looking for a match, so it will try SImode -> DImode. But our DI > > > > > > mode > > > > > > pattern is going to then saturate us back to SImode which is > > > > > > wasteful. > > > > > > > > > > > > Better, would be for us to provide float(uns)sihf2 patterns > > > > > > directly. > > > > > > So that's what this patch does. > > > > > > > > > > > > The testcase add in this patch would fail on trunk for AArch64. > > > > > > There is > > > > > > no libgcc routine to make the conversion, and we don't provide > > > > > > appropriate > > > > > > patterns in the backend, so we get a link-time error. 
> > > > > > > > > > > > Bootstrapped and tested on aarch64-none-linux-gnu > > > > > > > > > > > > OK for trunk? > > > > > > > > > > Ping. > > > > > > > > Ping^2 > > > > > > Ping^3 > > > > Ping^4 > > Ping^5 Ping^6 Thanks, James > > > > > > 2016-09-06 James Greenhalgh > > > > > > > > > > > > * config/aarch64/aarch64.md (sihf2): Convert to expand. > > > > > > (dihf2): Likewise. > > > > > > (aarch64_fp16_hf2): New. > > > > > > > > > > > > 2016-09-06 James Greenhalgh > > > > > > > > > > > > * gcc.target/aarch64/floatdihf2_1.c: New. > > > > > > > > > > > > > > > > > diff --git a/gcc/config/aarch64/aarch64.md > > > > > > b/gcc/config/aarch64/aarch64.md > > > > > > index 6afaf90..1882a72 100644 > > > > > > --- a/gcc/config/aarch64/aarch64.md > > > > > > +++ b/gcc/config/aarch64/aarch64.md > > > > > > @@ -4630,7 +4630,14 @@ > > > > > >[(set_attr "type" "f_cvti2f")] > > > > > > ) > > > > > > > > > > > > -(define_insn "hf2" > > > > > > +;; If we do not have ARMv8.2-A 16-bit floating point extensions, > > > > > > the > > > > > > +;; midend will arrange for an SImode conversion to HFmode to first > > > > > > go > > > > > > +;; through DFmode, then to HFmode. But first it will try > > > > > > converting > > > > > > +;; to DImode then down, which would match our DImode pattern below > > > > > > and > > > > > > +;; give very poor code-generation. So, we must provide our own > > > > > > emulation > > > > > > +;; of the mid-end logic. 
> > > > > > + > > > > > > +(define_insn "aarch64_fp16_hf2" > > > > > >[(set (match_operand:HF 0 "register_operand" "=w") > > > > > > (FLOATUORS:HF (match_operand:GPI 1 "register_operand" "r")))] > > > > > >"TARGET_FP_F16INST" > > > > > > @@ -4638,6 +4645,53 @@ > > > > > >[(set_attr "type" "f_cvti2f")] > > > > > > ) > > > > > > > > > > > > +(define_expand "sihf2" > > > > > > + [(set (match_operand:HF 0 "register_operand") > > > > > > + (FLOATUORS:HF (match_operand:SI 1 "register_operand")))] > > > > > > + "TARGET_FLOAT" > > > > > > +{ > > > > > > + if (TARGET_FP_F16INST) > > > > > > +emit_insn (gen_aarch64_fp16_sihf2 (operands[0], > > > > > > operands[1])); > > > > > > + else > > > > > > +{ > > > > > > + rtx convert_target = gen_reg_rtx (DFmode); > > > > > > + emit_insn (gen_sidf2 (convert_target, operands[1])); > > > > > > + emit_insn (gen_truncdfhf2 (operands[0], convert_target)); > > > > > > +} > > > > > > + DONE; > > > > > > +} > > > > > > +) > > > > > > + > > > > > > +;; For DImode there is no wide enough floating-point mode that we > > > > > > +;; can convert through natively (TFmode would work, but requires a > > > > > > library > > > > > > +;; call). However, we know that any value >= 65504 will be rounded > > > > > > +;; to infinity on conversion. This is well within the range of > > > > > > SImode, so > > > > > > +;; we can: > > > > > > +;; Saturate to SImode. > > > > > > +;; Convert from that to DFmode > > > > > > +;; Convert from that
Re: [PATCH][AArch64] Fix PR target/77822: Use tighter predicates for zero_extract patterns
Ping. Thanks, Kyrill On 31/10/16 12:10, Kyrill Tkachov wrote: Ping. Thanks, Kyrill On 24/10/16 14:12, Kyrill Tkachov wrote: On 24/10/16 12:29, Kyrill Tkachov wrote: Ping. https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01321.html I just noticed my original ChangeLog entry was truncated. It is 2016-10-04 Kyrylo Tkachov PR target/77822 * config/aarch64/aarch64.md (*tb1): Use aarch64_simd_shift_imm_ predicate for operand 1. (, ANY_EXTRACT): Use tighter predicates on operands 2 and 3 to restrict them to an appropriate range and add FAIL check if the region they specify is out of range. Delete useless constraint strings. (*, ANY_EXTRACT): Add appropriate predicates on operands 2 and 3 to restrict their range and add pattern predicate. 2016-10-04 Kyrylo Tkachov PR target/77822 * g++.dg/torture/pr77822.C: New test. Kyrill On 17/10/16 17:15, Kyrill Tkachov wrote: Hi all, For the attached testcase the code ends up trying to extract bits outside the range of the normal register widths. The aarch64 patterns for ubfz and tbnz end up accepting such operands and emitting invalid assembly such as 'ubfx x18,x2,192,32' The solution is to add proper predicates and guards to the operands of the zero_extract operations that are going on. I had a look at all the other patterns in aarch64 that generate/use zero_extract and they all have guards on their operands in one form or another to avoid them accessing an area that is out of range. With this patch the testcase compiles and assembles fine. Bootstrapped and tested on aarch64-none-linux-gnu. Ok for trunk? Thanks, Kyrill 2016-10-17 Kyrylo Tkachov PR target/77822 * config/aarch64/aarch64.md (*tb1): Use aarch64_simd_shift_imm_ predicate for operand 1. (, ANY_EXTRACT): Use tighter predicates on operands 2 and 3 to restrict them to an appropriate range and add FAIL check if the region they specify is out of range. Delete useless constraint strings. 
(*, ANY_EXTRACT): Add appropriate predicates on operands 2 and 3 to restrict their range and add pattern predicate. 2016-10-17 Kyrylo Tkachov PR target/77822
Re: [PATCH][AArch64] Fix PR target/77822: Use tighter predicates for zero_extract patterns
On Mon, Oct 17, 2016 at 05:15:21PM +0100, Kyrill Tkachov wrote: > Hi all, > > For the attached testcase the code ends up trying to extract bits outside the > range of the normal register widths. The aarch64 patterns for ubfz and tbnz > end up accepting such operands and emitting invalid assembly > such as 'ubfx x18,x2,192,32' > > The solution is to add proper predicates and guards to the operands of the > zero_extract operations that are going on. I had a look at all the other > patterns in aarch64 that generate/use zero_extract and they all have guards > on their > operands in one form or another to avoid them accessing an area that is out > of range. > > With this patch the testcase compiles and assembles fine. > > Bootstrapped and tested on aarch64-none-linux-gnu. > > Ok for trunk? Ok, sorry for the delay on review. Thanks, James > 2016-10-17 Kyrylo Tkachov > > PR target/77822 > * config/aarch64/aarch64.md (*tb1): Use > aarch64_simd_shift_imm_ predicate for operand 1. > (, ANY_EXTRACT): Use tighter predicates on operands 2 and 3 > to restrict them to an appropriate range and add FAIL check if the > region they specify is out of range. Delete useless constraint > strings. > (*, ANY_EXTRACT): Add appropriate predicates on operands > 2 and 3 to restrict their range and add pattern predicate. >
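As a rough illustration of the kind of guard the patch description talks about, the range condition on a zero_extract can be written as ordinary C. This is a sketch only: extract_in_range_p is a hypothetical helper, not one of the aarch64.md predicates, and the operand order (lsb, then width) simply follows the 'ubfx x18,x2,192,32' example quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: a bitfield of WIDTH bits starting at bit POS can
   be extracted from a REGSIZE-bit register only if the whole region
   lies inside the register.  Written to avoid overflow in POS + WIDTH.  */
static bool
extract_in_range_p (unsigned int pos, unsigned int width,
                    unsigned int regsize)
{
  return width >= 1 && pos < regsize && width <= regsize - pos;
}

/* The operands from the invalid assembly in the report, plus two
   in-range extractions for comparison.  */
static void
check_ubfx_examples (void)
{
  assert (!extract_in_range_p (192, 32, 64)); /* ubfx x18,x2,192,32 */
  assert (extract_in_range_p (32, 32, 64));   /* upper half of an X reg */
  assert (extract_in_range_p (0, 64, 64));    /* the whole register */
}
```

The subtraction form (width <= regsize - pos) is used rather than pos + width <= regsize so that very large operands cannot wrap around and appear valid.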
[PATCH] Fix PR78228
The following fixes phiopt to not introduce undefined behavior
in its abs replacement code in case we negate only positive values
in the original code.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2016-11-07  Richard Biener

        PR tree-optimization/78228
        * tree-ssa-phiopt.c (abs_replacement): Avoid introducing
        undefined behavior.

        * gcc.dg/tree-ssa/phi-opt-15.c: New testcase.

Index: gcc/tree-ssa-phiopt.c
===================================================================
--- gcc/tree-ssa-phiopt.c (revision 241891)
+++ gcc/tree-ssa-phiopt.c (working copy)
@@ -1453,6 +1453,14 @@ abs_replacement (basic_block cond_bb, ba
   else
     negate = false;
 
+  /* If the code negates only iff positive then make sure to not
+     introduce undefined behavior when negating or computing the absolute.
+     ???  We could use range info if present to check for arg1 == INT_MIN.  */
+  if (negate
+      && (ANY_INTEGRAL_TYPE_P (TREE_TYPE (arg1))
+          && ! TYPE_OVERFLOW_WRAPS (TREE_TYPE (arg1))))
+    return false;
+
   result = duplicate_ssa_name (result, NULL);
 
   if (negate)
Index: gcc/testsuite/gcc.dg/tree-ssa/phi-opt-15.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/phi-opt-15.c (revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/phi-opt-15.c (working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int
+foo (int i)
+{
+  if (i > 0)
+    i = -i;
+  return i;
+}
+
+/* { dg-final { scan-tree-dump-not "ABS" "optimized" } } */
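To spell out why the rewrite is unsafe, here is a small C sketch built around the function from the testcase. The names neg_if_positive and abs_would_overflow are made up for this note; the second one only marks the input for which a -ABS (i) replacement would have to evaluate an overflowing absolute value, and is not something the patch adds.

```c
#include <assert.h>
#include <limits.h>

/* Same shape as foo in phi-opt-15.c: negate only when positive.  No
   input overflows here, because INT_MIN is never negated.  */
static int
neg_if_positive (int i)
{
  if (i > 0)
    i = -i;
  return i;
}

/* Hypothetical predicate: a replacement of the form -ABS (i) would need
   ABS (INT_MIN), which is not representable in int when overflow does
   not wrap.  */
static int
abs_would_overflow (int i)
{
  return i == INT_MIN;
}

/* The original code is well defined for every int, including the one
   value the replacement cannot handle.  */
static void
check_neg_if_positive (void)
{
  assert (neg_if_positive (5) == -5);
  assert (neg_if_positive (-3) == -3);
  assert (neg_if_positive (0) == 0);
  assert (neg_if_positive (INT_MIN) == INT_MIN);
  assert (abs_would_overflow (INT_MIN));
  assert (!abs_would_overflow (INT_MAX));
}
```

The opposite direction, negating only negative values, does not have this asymmetry: there the original code already overflows for INT_MIN, so replacing it with ABS introduces nothing new.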
[PATCH] Fix PR78218
Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2016-11-07 Richard Biener PR tree-optimization/78218 * gimple-ssa-store-merging.c (pass_store_merging::terminate_all_aliasing_chains): Drop unused argument, fix alias check to also consider uses. (pass_store_merging::execute): Adjust. * gcc.dg/torture/pr78218.c: New testcase. Index: gcc/gimple-ssa-store-merging.c === --- gcc/gimple-ssa-store-merging.c (revision 241893) +++ gcc/gimple-ssa-store-merging.c (working copy) @@ -726,7 +726,7 @@ private: hash_map m_stores; bool terminate_and_process_all_chains (); - bool terminate_all_aliasing_chains (tree, imm_store_chain_info **, + bool terminate_all_aliasing_chains (imm_store_chain_info **, bool, gimple *); bool terminate_and_release_chain (imm_store_chain_info *); }; // class pass_store_merging @@ -755,8 +755,7 @@ pass_store_merging::terminate_and_proces If that is the case we have to terminate any chain anchored at BASE. */ bool -pass_store_merging::terminate_all_aliasing_chains (tree dest, - imm_store_chain_info +pass_store_merging::terminate_all_aliasing_chains (imm_store_chain_info **chain_info, bool var_offset_p, gimple *stmt) @@ -788,7 +787,10 @@ pass_store_merging::terminate_all_aliasi unsigned int i; FOR_EACH_VEC_ELT ((*chain_info)->m_store_info, i, info) { - if (stmt_may_clobber_ref_p (info->stmt, dest)) + if (ref_maybe_used_by_stmt_p (stmt, + gimple_assign_lhs (info->stmt)) + || stmt_may_clobber_ref_p (stmt, +gimple_assign_lhs (info->stmt))) { if (dump_file && (dump_flags & TDF_DETAILS)) { @@ -1458,7 +1460,7 @@ pass_store_merging::execute (function *f } /* Store aliases any existing chain? */ - terminate_all_aliasing_chains (lhs, chain_info, false, stmt); + terminate_all_aliasing_chains (chain_info, false, stmt); /* Start a new chain. 
*/ struct imm_store_chain_info *new_chain = new imm_store_chain_info (base_addr); @@ -1477,13 +1479,13 @@ pass_store_merging::execute (function *f } } else - terminate_all_aliasing_chains (lhs, chain_info, + terminate_all_aliasing_chains (chain_info, offset != NULL_TREE, stmt); continue; } - terminate_all_aliasing_chains (NULL_TREE, NULL, false, stmt); + terminate_all_aliasing_chains (NULL, false, stmt); } terminate_and_process_all_chains (); } Index: gcc/testsuite/gcc.dg/torture/pr78218.c === --- gcc/testsuite/gcc.dg/torture/pr78218.c (revision 0) +++ gcc/testsuite/gcc.dg/torture/pr78218.c (working copy) @@ -0,0 +1,24 @@ +/* { dg-do run } */ + +struct +{ + int v; +} a[2]; + +int b; + +void __attribute__((noinline,noclone)) +check () +{ + if (a[0].v != 1) +__builtin_abort (); +} + +int main () +{ + a[1].v = 1; + a[0] = a[1]; + a[1].v = 0; + check (a); + return 0; +}
[PATCH] Fix PR78229
Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk
and branch.

Richard.

2016-11-07  Richard Biener

        PR target/78229
        * config/i386/i386.c (ix86_gimple_fold_builtin): Do not adjust
        EH info.

        * g++.dg/pr78229.C: New testcase.

Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c (revision 241891)
+++ gcc/config/i386/i386.c (working copy)
@@ -37664,7 +37664,7 @@ ix86_gimple_fold_builtin (gimple_stmt_it
           gsi_insert_before (gsi, g, GSI_SAME_STMT);
           g = gimple_build_assign (gimple_call_lhs (stmt), NOP_EXPR, lhs);
           gimple_set_location (g, loc);
-          gsi_replace (gsi, g, true);
+          gsi_replace (gsi, g, false);
           return true;
         }
       break;
Index: gcc/testsuite/g++.dg/pr78229.C
===================================================================
--- gcc/testsuite/g++.dg/pr78229.C (revision 0)
+++ gcc/testsuite/g++.dg/pr78229.C (working copy)
@@ -0,0 +1,24 @@
+/* { dg-do compile { target x86_64-*-* i?86-*-* } } */
+/* { dg-options "-O2 -mbmi -w" } */
+
+void a();
+inline int b(int c) {
+    int d = c;
+    return __builtin_ia32_tzcnt_u32(d);
+}
+struct e {};
+int f, g, h;
+void fn3() {
+    float j;
+    &j;
+    {
+        e k;
+        while (h) {
+            if (g == 0)
+                continue;
+            int i = b(g);
+            f = i;
+        }
+        a();
+    }
+}
Re: [PATCH] Fix PR78228
On Mon, Nov 07, 2016 at 01:17:25PM +0100, Richard Biener wrote:
>
> The following fixes phiopt to not introduce undefined behavior
> in its abs replacement code in case we negate only positive values
> in the original code.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
>
> Richard.
>
> 2016-11-07  Richard Biener
>
>       PR tree-optimization/78228
>       * tree-ssa-phiopt.c (abs_replacement): Avoid introducing
>       undefined behavior.
>
>       * gcc.dg/tree-ssa/phi-opt-15.c: New testcase.
>
> Index: gcc/tree-ssa-phiopt.c
> ===================================================================
> --- gcc/tree-ssa-phiopt.c (revision 241891)
> +++ gcc/tree-ssa-phiopt.c (working copy)
> @@ -1453,6 +1453,14 @@ abs_replacement (basic_block cond_bb, ba
>    else
>      negate = false;
>
> +  /* If the code negates only iff positive then make sure to not
> +     introduce undefined behavior when negating or computing the absolute.
> +     ???  We could use range info if present to check for arg1 == INT_MIN.  */

Perhaps just

> +  if (negate
> +      && (ANY_INTEGRAL_TYPE_P (TREE_TYPE (arg1))
> +          && ! TYPE_OVERFLOW_WRAPS (TREE_TYPE (arg1))))
    {
      wide_int minv = TYPE_MIN_VALUE (TYPE_DOMAIN (TREE_TYPE (arg1)));
      if (!expr_not_equal_to (arg1, minv))
        return false;
    }
?

        Jakub
Re: [PATCH] Fix PR78228
On Mon, 7 Nov 2016, Jakub Jelinek wrote:

> On Mon, Nov 07, 2016 at 01:17:25PM +0100, Richard Biener wrote:
> >
> > The following fixes phiopt to not introduce undefined behavior
> > in its abs replacement code in case we negate only positive values
> > in the original code.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> >
> > Richard.
> >
> > 2016-11-07  Richard Biener
> >
> >     PR tree-optimization/78228
> >     * tree-ssa-phiopt.c (abs_replacement): Avoid introducing
> >     undefined behavior.
> >
> >     * gcc.dg/tree-ssa/phi-opt-15.c: New testcase.
> >
> > Index: gcc/tree-ssa-phiopt.c
> > ===================================================================
> > --- gcc/tree-ssa-phiopt.c (revision 241891)
> > +++ gcc/tree-ssa-phiopt.c (working copy)
> > @@ -1453,6 +1453,14 @@ abs_replacement (basic_block cond_bb, ba
> >    else
> >      negate = false;
> >
> > +  /* If the code negates only iff positive then make sure to not
> > +     introduce undefined behavior when negating or computing the absolute.
> > +     ???  We could use range info if present to check for arg1 == INT_MIN.  */
>
> Perhaps just
>
> > +  if (negate
> > +      && (ANY_INTEGRAL_TYPE_P (TREE_TYPE (arg1))
> > +          && ! TYPE_OVERFLOW_WRAPS (TREE_TYPE (arg1))))
>     {
>       wide_int minv = TYPE_MIN_VALUE (TYPE_DOMAIN (TREE_TYPE (arg1)));
>       if (!expr_not_equal_to (arg1, minv))
>         return false;
>     }
> ?

rather wi::min_value (TREE_TYPE (arg1), SIGNED) I guess.  Didn't know
of expr_not_equal_to, seems to be only used from i386.c at the moment.

We can improve things on trunk but I'd prefer to be safe on the
branch(es).

Richard.
[PATCH] Fix PR78205 -- fix BB SLP "gap" handling
The following moves an overly conservative check that we do not access
excess elements when vectorizing a BB to a place where we can do
a better job with respect to the elements we actually use.  This means
that for the included testcase we are not confused by the read from
c[4] but just do not vectorize the stores to x[0] and x[1].

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2016-11-07  Richard Biener

        PR tree-optimization/78205
        * tree-vect-stmts.c (vectorizable_load): Move check whether
        we may run into gaps when BB vectorizing SLP permutations ...
        * tree-vect-slp.c (vect_supported_load_permutation_p): ...
        here where we can do a more precise check.

        * gcc.dg/vect/bb-slp-pr78205.c: New testcase.

Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c (revision 241893)
+++ gcc/tree-vect-stmts.c (working copy)
@@ -6548,18 +6611,6 @@ vectorizable_load (gimple *stmt, gimple_
       if (slp && SLP_TREE_LOAD_PERMUTATION (slp_node).exists ())
         slp_perm = true;
 
-      /* ???  The following is overly pessimistic (as well as the loop
-         case above) in the case we can statically determine the excess
-         elements loaded are within the bounds of a decl that is accessed.
-         Likewise for BB vectorizations using masked loads is a possibility.  */
-      if (bb_vinfo && slp_perm && group_size % nunits != 0)
-        {
-          dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                           "BB vectorization with gaps at the end of a load "
-                           "is not supported\n");
-          return false;
-        }
-
       /* Invalidate assumptions made by dependence analysis when vectorization
          on the unrolled body effectively re-orders stmts.  */
       if (!PURE_SLP_STMT (stmt_info)
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c (revision 241893)
+++ gcc/tree-vect-slp.c (working copy)
@@ -1459,6 +1459,25 @@ vect_supported_load_permutation_p (slp_i
             SLP_TREE_LOAD_PERMUTATION (node).release ();
           else
             {
+              stmt_vec_info group_info
+                = vinfo_for_stmt (SLP_TREE_SCALAR_STMTS (node)[0]);
+              group_info = vinfo_for_stmt (GROUP_FIRST_ELEMENT (group_info));
+              unsigned nunits
+                = TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (group_info));
+              unsigned k, maxk = 0;
+              FOR_EACH_VEC_ELT (SLP_TREE_LOAD_PERMUTATION (node), j, k)
+                if (k > maxk)
+                  maxk = k;
+              /* In BB vectorization we may not actually use a loaded vector
+                 accessing elements in excess of GROUP_SIZE.  */
+              if (maxk >= (GROUP_SIZE (group_info) & ~(nunits - 1)))
+                {
+                  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                                   "BB vectorization with gaps at the end of "
+                                   "a load is not supported\n");
+                  return false;
+                }
+
               /* Verify the permutation can be generated.  */
               vec tem;
               unsigned n_perms;
Index: gcc/testsuite/gcc.dg/vect/bb-slp-pr78205.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/bb-slp-pr78205.c (revision 0)
+++ gcc/testsuite/gcc.dg/vect/bb-slp-pr78205.c (working copy)
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_double } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+
+double x[2], a[4], b[4], c[5];
+
+void foo ()
+{
+  a[0] = c[0];
+  a[1] = c[1];
+  a[2] = c[0];
+  a[3] = c[1];
+  b[0] = c[2];
+  b[1] = c[3];
+  b[2] = c[2];
+  b[3] = c[3];
+  x[0] = c[4];
+  x[1] = c[4];
+}
+
+/* We may not vectorize the store to x[] as it accesses c out-of bounds
+   but we do want to vectorize the other two store groups.  */
+
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp2" } } */
+/* { dg-final { scan-tree-dump-times "x\\\[\[0-1\]\\\] = " 2 "optimized" } } */
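The arithmetic of the relocated check can be tried out on plain integers. The sketch below is not the vectorizer code: load_perm_within_group_p is a hypothetical stand-in for the maxk test in the patch, and the numbers assume that the loads from c[] form one group of size 5 and that a vector holds two doubles, which is how the testcase reads but is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the new test: find the largest index used by the load
   permutation and require it to fall inside the part of the group that
   is covered by whole vectors.  NUNITS is assumed to be a power of two,
   so GROUP_SIZE & ~(NUNITS - 1) rounds the group size down to a
   multiple of NUNITS.  */
static bool
load_perm_within_group_p (const unsigned *perm, size_t n,
                          unsigned group_size, unsigned nunits)
{
  unsigned maxk = 0;
  for (size_t i = 0; i < n; i++)
    if (perm[i] > maxk)
      maxk = perm[i];
  return maxk < (group_size & ~(nunits - 1));
}

/* The three store groups of the testcase, under the assumptions above:
   5 & ~1 == 4, so indices 0..3 are fine and index 4 is rejected.  */
static void
check_pr78205_groups (void)
{
  static const unsigned perm_a[] = { 0, 1, 0, 1 }; /* a[] = c[0], c[1], ... */
  static const unsigned perm_b[] = { 2, 3, 2, 3 }; /* b[] = c[2], c[3], ... */
  static const unsigned perm_x[] = { 4, 4 };       /* x[] = c[4], c[4] */

  assert (load_perm_within_group_p (perm_a, 4, 5, 2));
  assert (load_perm_within_group_p (perm_b, 4, 5, 2));
  assert (!load_perm_within_group_p (perm_x, 2, 5, 2));
}
```

The old check (group_size % nunits != 0) would have rejected all three groups, since 5 % 2 is nonzero; looking at the indices actually used is what lets the a[] and b[] stores through.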
[patch,avr] Add new option -mabsdata.
This patch adds a new command line option -mabsdata which can be used to set
attribute absdata for all data in static storage so it can be accessed by LDS
and STS instructions.

This is only useful for some reduced Tiny devices like ATtiny40.  For other
reduced Tiny where all of SRAM fits LDS / STS, the new option is automatically
set by the device specs file.

For ordinary devices the option is accepted but has no effect.

Ok for trunk?

Johann

gcc/
        PR target/78093
        * doc/invoke.texi (AVR Options) [-mabsdata]: Document new option.
        * config/avr/avr.opt (-mabsdata): New option.
        * config/avr/avr.c (avr_encode_section_info) [AVR_TINY]: If
        -mabsdata & symbol is not progmem, tag as AVR_SYMBOL_FLAG_TINY_ABSDATA.
        * config/avr/avr-mcus.def (attiny4/5/9/10/20): Use AVR_ISA_LDS.
        * config/avr/gen-avr-mmcu-specs.c (print_mcu): Print cc1_absdata
        spec depending on AVR_ISA_LDS.
        * config/avr/specs.h (CC1_SPEC): Enhanced by cc1_absdata spec.

gcc/testsuite/
        PR target/78093
        * gcc.target/avr/torture/tiny-absdata-2.c: New test.

Index: config/avr/avr-arch.h
===================================================================
--- config/avr/avr-arch.h (revision 241841)
+++ config/avr/avr-arch.h (working copy)
@@ -157,7 +157,9 @@ enum avr_device_specific_features
   AVR_ISA_NONE,
   AVR_ISA_RMW     = 0x1, /* device has RMW instructions. */
   AVR_SHORT_SP    = 0x2, /* Stack Pointer has 8 bits width. */
-  AVR_ERRATA_SKIP = 0x4  /* device has a core erratum. */
+  AVR_ERRATA_SKIP = 0x4, /* device has a core erratum. */
+  AVR_ISA_LDS     = 0x8  /* whether LDS / STS is valid for all data in static
+                            storage.  Only useful for reduced Tiny.  */
 };
 
 /* Map architecture to its texinfo string.
*/ Index: config/avr/avr-mcus.def === --- config/avr/avr-mcus.def (revision 241841) +++ config/avr/avr-mcus.def (working copy) @@ -341,11 +341,11 @@ AVR_MCU ("atxmega128a1u",ARCH_AVRXME AVR_MCU ("atxmega128a4u",ARCH_AVRXMEGA7, AVR_ISA_RMW, "__AVR_ATxmega128A4U__",0x2000, 0x0, 3) /* Tiny family */ AVR_MCU ("avrtiny", ARCH_AVRTINY, AVR_ISA_NONE, NULL, 0x0040, 0x0, 1) -AVR_MCU ("attiny4", ARCH_AVRTINY, AVR_ISA_NONE, "__AVR_ATtiny4__",0x0040, 0x0, 1) -AVR_MCU ("attiny5", ARCH_AVRTINY, AVR_ISA_NONE, "__AVR_ATtiny5__",0x0040, 0x0, 1) -AVR_MCU ("attiny9", ARCH_AVRTINY, AVR_ISA_NONE, "__AVR_ATtiny9__",0x0040, 0x0, 1) -AVR_MCU ("attiny10", ARCH_AVRTINY, AVR_ISA_NONE, "__AVR_ATtiny10__", 0x0040, 0x0, 1) -AVR_MCU ("attiny20", ARCH_AVRTINY, AVR_ISA_NONE, "__AVR_ATtiny20__", 0x0040, 0x0, 1) +AVR_MCU ("attiny4", ARCH_AVRTINY, AVR_ISA_LDS, "__AVR_ATtiny4__",0x0040, 0x0, 1) +AVR_MCU ("attiny5", ARCH_AVRTINY, AVR_ISA_LDS, "__AVR_ATtiny5__",0x0040, 0x0, 1) +AVR_MCU ("attiny9", ARCH_AVRTINY, AVR_ISA_LDS, "__AVR_ATtiny9__",0x0040, 0x0, 1) +AVR_MCU ("attiny10", ARCH_AVRTINY, AVR_ISA_LDS, "__AVR_ATtiny10__", 0x0040, 0x0, 1) +AVR_MCU ("attiny20", ARCH_AVRTINY, AVR_ISA_LDS, "__AVR_ATtiny20__", 0x0040, 0x0, 1) AVR_MCU ("attiny40", ARCH_AVRTINY, AVR_ISA_NONE, "__AVR_ATtiny40__", 0x0040, 0x0, 1) /* Assembler only. */ AVR_MCU ("avr1", ARCH_AVR1, AVR_ISA_NONE, NULL,0x0060, 0x0, 1) Index: config/avr/avr.c === --- config/avr/avr.c (revision 241841) +++ config/avr/avr.c (working copy) @@ -10182,14 +10182,18 @@ avr_encode_section_info (tree decl, rtx && SYMBOL_REF_P (XEXP (rtl, 0))) { rtx sym = XEXP (rtl, 0); + bool progmem_p = -1 == avr_progmem_p (decl, DECL_ATTRIBUTES (decl)); - if (-1 == avr_progmem_p (decl, DECL_ATTRIBUTES (decl))) + if (progmem_p) { // Tag symbols for later addition of 0x4000 (AVR_TINY_PM_OFFSET). 
SYMBOL_REF_FLAGS (sym) |= AVR_SYMBOL_FLAG_TINY_PM; } if (avr_decl_absdata_p (decl, DECL_ATTRIBUTES (decl)) + || (TARGET_ABSDATA + && !progmem_p + && !addr_attr) || (addr_attr // If addr_attr is non-null, it has an argument. Peek into it. && TREE_INT_CST_LOW (TREE_VALUE (TREE_VALUE (addr_attr))) < 0xc0)) @@ -10198,7 +10202,7 @@ avr_encode_section_info (tree decl, rtx SYMBOL_REF_FLAGS (sym) |= AVR_SYMBOL_FLAG_TINY_ABSDATA; } - if (-1 == avr_progmem_p (decl, DECL_ATTRIBUTES (decl)) + if (progmem_p && avr_decl_absdata_p (decl, DECL_ATTRIBUTES (decl))) { error ("%q+D has incompatible attributes %qs and %qs", Index: config/avr/avr.opt === --- config/avr/avr.opt (revision 241841) +
Re: [PATCH] combine lhs zero_extract fix (PR78186)
On 7 November 2016 at 10:14, Segher Boessenkool wrote: > Hi Christophe, > > On Fri, Nov 04, 2016 at 02:31:28PM +0100, Christophe Lyon wrote: >> Since this commit I have noticed execution failures on "old" arm targets: >> >> gcc.dg/torture/pr48124-4.c -O1 execution test >> gcc.dg/torture/pr48124-4.c -O2 execution test >> gcc.dg/torture/pr48124-4.c -O2 -flto -fno-use-linker-plugin >> -flto-partition=none execution test >> gcc.dg/torture/pr48124-4.c -O2 -flto -fuse-linker-plugin >> -fno-fat-lto-objects execution test >> gcc.dg/torture/pr48124-4.c -O3 -g execution test >> gcc.dg/torture/pr48124-4.c -Os execution test >> >> For instance on target arm-none-linux-gnueabi --with-cpu=cortex-a9 >> --with-mode=arm >> and running the tests with -march=armv5t > > Confirmed. What a nasty, nasty bug, and it has been here for decades > it seems. Could you please open a PR? > > Sure, I've created PR78232 for this. Thanks. Christophe > Segher
[PATCH] Fix -O0 AVX512 comparison ICE (PR target/78227)
Hi! The following testcases ICE at -O0, because ix86_expand_sse_cmp avoids using the passed-in dest only when optimizing or when dest overlaps one of the value operands, but we actually need to do that also if we have a maskcmp where we want to use a different mode than dest has. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2016-11-07 Jakub Jelinek PR target/78227 * config/i386/i386.c (ix86_expand_sse_cmp): Force dest into cmp_mode argument even for -O0 if cmp_mode != mode and maskcmp. * gcc.target/i386/pr78227-1.c: New test. * gcc.target/i386/pr78227-2.c: New test. --- gcc/config/i386/i386.c.jj 2016-11-04 20:09:48.0 +0100 +++ gcc/config/i386/i386.c 2016-11-07 10:14:15.625018144 +0100 @@ -23561,6 +23561,7 @@ ix86_expand_sse_cmp (rtx dest, enum rtx_ cmp_op1 = force_reg (cmp_ops_mode, cmp_op1); if (optimize + || (cmp_mode != mode && maskcmp) || (op_true && reg_overlap_mentioned_p (dest, op_true)) || (op_false && reg_overlap_mentioned_p (dest, op_false))) dest = gen_reg_rtx (maskcmp ? 
cmp_mode : mode); --- gcc/testsuite/gcc.target/i386/pr78227-1.c.jj2016-11-07 10:15:52.606762613 +0100 +++ gcc/testsuite/gcc.target/i386/pr78227-1.c 2016-11-07 10:24:58.821480125 +0100 @@ -0,0 +1,30 @@ +/* PR target/78227 */ +/* { dg-do compile } */ +/* { dg-options "-mavx512f -O0 -Wno-psabi" } */ + +typedef int V __attribute__((vector_size (64))); +typedef long long int W __attribute__((vector_size (64))); + +V +foo1 (V v) +{ + return v > 0; +} + +V +bar1 (V v) +{ + return v != 0; +} + +W +foo2 (W w) +{ + return w > 0; +} + +W +bar2 (W w) +{ + return w != 0; +} --- gcc/testsuite/gcc.target/i386/pr78227-2.c.jj2016-11-07 10:22:17.055670476 +0100 +++ gcc/testsuite/gcc.target/i386/pr78227-2.c 2016-11-07 10:25:03.722413765 +0100 @@ -0,0 +1,30 @@ +/* PR target/78227 */ +/* { dg-do compile } */ +/* { dg-options "-mavx512bw -O0 -Wno-psabi" } */ + +typedef signed char V __attribute__((vector_size (64))); +typedef short int W __attribute__((vector_size (64))); + +V +foo1 (V v) +{ + return v > 0; +} + +V +bar1 (V v) +{ + return v != 0; +} + +W +foo2 (W w) +{ + return w > 0; +} + +W +bar2 (W w) +{ + return w != 0; +} Jakub
Re: [patch,avr] Add new option -mabsdata.
On 07.11.2016 13:54, Georg-Johann Lay wrote: This patch adds a new command line option -mabsdata which can be used to set attribute absdata for all data in static storage so it can be accessed by LDS and STS instructions. This is only useful for some reduced Tiny devices like ATtiny40. For other reduced Tiny where all of SRAM fits LDS / STS, the new option is automatically set by the device specs file. For ordinary devices the option is accepted but has no effect. Ok for trunk? Johann gcc/ PR target/78093 * doc/invoke.texi (AVR Options) [-mabsdata]: Document new option. * config/avr/avr.opt (-mabsdata): New option. * config/avr/avr.c (avr_encode_section_info) [AVR_TINY]: If -mabsdata & symbol is not progmem, tag as AVR_SYMBOL_FLAG_TINY_ABSDATA. * config/avr/avr-mcus.def (attiny4/5/9/10/20): Use AVR_ISA_LDS. * config/avr/gen-avr-mmcu-specs.c (print_mcu): Print cc1_absdata spec depending on AVR_ISA_LDS. * config/avr/specs.h (CC1_SPEC): Enhanced by cc1_absdata spec. gcc/testsuite/ PR target/78093 * gcc.target/avr/torture/tiny-absdata-2.c: New test. Here is the complete log entry (avr-arch.h was missing): gcc/ PR target/78093 * doc/invoke.texi (AVR Options) [-mabsdata]: Document new option. * config/avr/avr.opt (-mabsdata): New option. * config/avr/avr-arch.h (avr_device_specific_features): Add AVR_ISA_LDS. * config/avr/avr.c (avr_encode_section_info) [AVR_TINY]: If -mabsdata & symbol is not progmem, tag as AVR_SYMBOL_FLAG_TINY_ABSDATA. * config/avr/avr-mcus.def (attiny4/5/9/10/20): Use AVR_ISA_LDS. * config/avr/gen-avr-mmcu-specs.c (print_mcu): Print cc1_absdata spec depending on AVR_ISA_LDS. * config/avr/specs.h (CC1_SPEC): Enhanced by cc1_absdata spec. gcc/testsuite/ PR target/78093 * gcc.target/avr/torture/tiny-absdata-2.c: New test.
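For reference, the description above says -mabsdata applies attribute absdata to all data in static storage. A minimal sketch of the per-variable form follows, assuming a reduced Tiny target. The ABSDATA macro, the variable and the function are made up for illustration, and the attribute is hidden behind the macro so the snippet still compiles on other targets.

```c
/* ABSDATA is a hypothetical convenience macro, not something GCC defines.
   On reduced Tiny cores it expands to the absdata variable attribute;
   everywhere else it expands to nothing.  */
#if defined (__AVR_TINY__)
#define ABSDATA __attribute__((absdata))
#else
#define ABSDATA
#endif

/* Data in static storage, tagged so that it may be accessed with
   LDS / STS on a reduced Tiny device.  */
ABSDATA volatile unsigned char tick_count;

void
tick (void)
{
  tick_count = (unsigned char) (tick_count + 1);
}
```

Going by the description above, building with -mabsdata should make the explicit attribute unnecessary for such a variable, as long as it is not in progmem.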
[committed] Move 3 gcc.target/i386/*.C tests
Hi! Richard noticed 3 misplaced tests - C++ tests don't belong into gcc.target/ which tests just C. I've bootstrapped/regtested this on x86_64-linux and i686-linux and committed to trunk as obvious. 2016-11-07 Jakub Jelinek PR middle-end/71529 * gcc.target/i386/pr71529.C: Moved to ... * g++.dg/opt/pr71529.C: ... here. New test. Guard for i?86/x86_64. PR target/64411 * gcc.target/i386/pr64411.C: Moved to ... * g++.dg/opt/pr64411.C: ... here. New test. Guard for i?86/x86_64 lp64. PR target/65105 * gcc.target/i386/pr65105-4.C: Moved to ... * g++.dg/opt/pr65105-4.C: ... here. New test. Guard for i?86/x86_64. Run into compile test rather than execute test. --- gcc/testsuite/gcc.target/i386/pr71529.C.jj 2016-06-15 19:09:09.0 +0200 +++ gcc/testsuite/gcc.target/i386/pr71529.C 2016-11-07 10:56:21.835713206 +0100 @@ -1,22 +0,0 @@ -/* PR71529 */ -/* { dg-do compile { target { ! x32 } } } */ -/* { dg-options "-fcheck-pointer-bounds -mmpx -O2" } */ - -class c1 -{ - public: - virtual ~c1 (); -}; - -class c2 -{ - public: - virtual ~c2 (); -}; - -class c3 : c1, c2 { }; - -int main (int, char **) -{ - c3 obj; -} --- gcc/testsuite/gcc.target/i386/pr64411.C.jj 2016-03-15 17:10:18.0 +0100 +++ gcc/testsuite/gcc.target/i386/pr64411.C 2016-11-07 10:54:34.485101960 +0100 @@ -1,27 +0,0 @@ -/* { dg-do compile } */ -/* { dg-options "-Os -mcmodel=medium -fPIC -fschedule-insns -fselective-scheduling" } */ - -typedef __SIZE_TYPE__ size_t; - -extern "C" long strtol () - { return 0; } - -static struct { - void *sp[2]; -} info; - -union S813 -{ - void * c[5]; -} -s813; - -S813 a813[5]; -S813 check813 (S813, S813 *, S813); - -void checkx813 () -{ - __builtin_memset (&s813, '\0', sizeof (s813)); - __builtin_memset (&info, '\0', sizeof (info)); - check813 (s813, &a813[1], a813[2]); -} --- gcc/testsuite/gcc.target/i386/pr65105-4.C.jj2015-10-11 19:11:14.214767354 +0200 +++ gcc/testsuite/gcc.target/i386/pr65105-4.C 2016-11-07 10:51:05.333808029 +0100 @@ -1,19 +0,0 @@ -/* PR target/pr65105 */ -/* { dg-do 
run { target { ia32 } } } */ -/* { dg-options "-O2 -march=slm" } */ - -struct s { - long long l1, l2, l3, l4, l5; -} *a; -long long b; -long long fn1() -{ - try -{ - b = (a->l1 | a->l2 | a->l3 | a->l4 | a->l5); - return a->l1; -} - catch (int) -{ -} -} --- gcc/testsuite/g++.dg/opt/pr71529.C.jj 2016-11-07 10:55:34.151330081 +0100 +++ gcc/testsuite/g++.dg/opt/pr71529.C 2016-11-07 10:56:13.319823373 +0100 @@ -0,0 +1,22 @@ +// PR middle-end/71529 +// { dg-do compile { target { { i?86-*-* x86_64-*-* } && { ! x32 } } } } +// { dg-options "-fcheck-pointer-bounds -mmpx -O2" } + +class c1 +{ + public: + virtual ~c1 (); +}; + +class c2 +{ + public: + virtual ~c2 (); +}; + +class c3 : c1, c2 { }; + +int main (int, char **) +{ + c3 obj; +} --- gcc/testsuite/g++.dg/opt/pr64411.C.jj 2016-11-07 10:51:38.557378145 +0100 +++ gcc/testsuite/g++.dg/opt/pr64411.C 2016-11-07 10:54:13.115378412 +0100 @@ -0,0 +1,28 @@ +// PR target/64411 +// { dg-do compile { target { { i?86-*-* x86_64-*-* } && lp64 } } } +// { dg-options "-Os -mcmodel=medium -fPIC -fschedule-insns -fselective-scheduling" } + +typedef __SIZE_TYPE__ size_t; + +extern "C" long strtol () + { return 0; } + +static struct { + void *sp[2]; +} info; + +union S813 +{ + void * c[5]; +} +s813; + +S813 a813[5]; +S813 check813 (S813, S813 *, S813); + +void checkx813 () +{ + __builtin_memset (&s813, '\0', sizeof (s813)); + __builtin_memset (&info, '\0', sizeof (info)); + check813 (s813, &a813[1], a813[2]); +} --- gcc/testsuite/g++.dg/opt/pr65105-4.C.jj 2016-11-07 10:48:58.587448018 +0100 +++ gcc/testsuite/g++.dg/opt/pr65105-4.C2016-11-07 10:50:52.066979690 +0100 @@ -0,0 +1,19 @@ +// PR target/65105 +// { dg-do compile { target { { i?86-*-* x86_64-*-* } && ia32 } } } +// { dg-options "-O2 -march=slm" } + +struct s { + long long l1, l2, l3, l4, l5; +} *a; +long long b; +long long fn1() +{ + try +{ + b = (a->l1 | a->l2 | a->l3 | a->l4 | a->l5); + return a->l1; +} + catch (int) +{ +} +} Jakub
Re: [PATCH] combine lhs zero_extract fix (PR78186)
On Mon, Nov 07, 2016 at 02:00:46PM +0100, Christophe Lyon wrote: > > Confirmed. What a nasty, nasty bug, and it has been here for decades > > it seems. Could you please open a PR? > > > Sure, I've created PR78232 for this. Thanks! I have a patch btw, it's regstrapping. Not sure it is fully correct (whether it handles all possible cases), but hey. Segher
[PATCH] Fix nonoverlapping_memrefs_p ICE (PR target/77834, take 3)
On Fri, Nov 04, 2016 at 08:07:37PM +0100, Richard Biener wrote: > >If/once this is in, I'm planning to test/submit a patch adding > > /* If one decl is known to be a function or label in a function and > > the other is some kind of data, they can't overlap. */ > > if ((TREE_CODE (exprx) == FUNCTION_DECL > > || TREE_CODE (exprx) == LABEL_DECL) > > != (TREE_CODE (expry) == FUNCTION_DECL > > || TREE_CODE (expry) == LABEL_DECL)) > >return 1; > >before that. > > > >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > OK for trunk and branches (if appropriate) And here is the incremental patch to disambiguate between code section objects and variables. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk (only)? 2016-11-07 Jakub Jelinek PR target/77834 * alias.c (nonoverlapping_memrefs_p): If one decl is FUNCTION_DECL or LABEL_DECL and the other is not, return 1. --- gcc/alias.c.jj 2016-11-04 20:13:32.0 +0100 +++ gcc/alias.c 2016-11-07 11:18:57.982160034 +0100 @@ -2755,6 +2755,14 @@ nonoverlapping_memrefs_p (const_rtx x, c || TREE_CODE (expry) == CONST_DECL) return 1; + /* If one decl is known to be a function or label in a function and + the other is some kind of data, they can't overlap. */ + if ((TREE_CODE (exprx) == FUNCTION_DECL + || TREE_CODE (exprx) == LABEL_DECL) + != (TREE_CODE (expry) == FUNCTION_DECL + || TREE_CODE (expry) == LABEL_DECL)) +return 1; + /* If either of the decls doesn't have DECL_RTL set (e.g. marked as living in multiple places), we can't tell anything. Exception are FUNCTION_DECLs for which we can create DECL_RTL on demand. */ @@ -2804,7 +2812,7 @@ nonoverlapping_memrefs_p (const_rtx x, c /* Offset based disambiguation not appropriate for loop invariant */ if (loop_invariant) -return 0; +return 0; /* Offset based disambiguation is OK even if we do not know that the declarations are necessarily different Jakub
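The added condition compares two boolean results with !=, which acts as a logical exclusive-or: it holds when exactly one of the two decls is code. A small stand-alone sketch of that idiom, with a made-up enum standing in for the tree codes (none of these names exist in GCC):

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the tree codes named in the patch.  */
enum decl_kind { KIND_FUNCTION, KIND_LABEL, KIND_VAR, KIND_CONST };

/* True for the kinds that live in a code section.  */
static bool
is_code_decl (enum decl_kind k)
{
  return k == KIND_FUNCTION || k == KIND_LABEL;
}

/* True when exactly one of the two decls is code, in which case the
   two references cannot overlap.  Both operands of != are bool, so
   the comparison is an exclusive-or.  */
static bool
code_vs_data_p (enum decl_kind x, enum decl_kind y)
{
  return is_code_decl (x) != is_code_decl (y);
}
```

Two functions, or a function and a label, yield false here, so those pairs still fall through to the later checks.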
Re: [PATCH] rs6000: Do swdiv at expand time
On Mon, Nov 7, 2016 at 4:32 AM, Segher Boessenkool wrote: > We transform floating point divide instructions to a faster series of > simple instructions, "swdiv". Currently we do not do that until the > first splitter pass, which is much too late for most optimisations > that can happen on those new instructions, e.g. the constant loads > are not CSEd inside an unrolled loop. This patch changes things so > those divide instructions are expanded during expand already. > > Bootstrapped and tested on powerpc64-linux; Bill has run SPEC on it, > and if anything it shows a slight improvement. > > Is this okay for trunk? Okay. But commenting on the ChangeLog entry is half the fun! - David
Re: [PATCH] Fix nonoverlapping_memrefs_p ICE (PR target/77834, take 3)
On Mon, 7 Nov 2016, Jakub Jelinek wrote: > On Fri, Nov 04, 2016 at 08:07:37PM +0100, Richard Biener wrote: > > >If/once this is in, I'm planning to test/submit a patch adding > > > /* If one decl is known to be a function or label in a function and > > > the other is some kind of data, they can't overlap. */ > > > if ((TREE_CODE (exprx) == FUNCTION_DECL > > > || TREE_CODE (exprx) == LABEL_DECL) > > > != (TREE_CODE (expry) == FUNCTION_DECL > > > || TREE_CODE (expry) == LABEL_DECL)) > > >return 1; > > >before that. > > > > > >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > > > OK for trunk and branches (if appropriate) > > And here is the incremental patch to disambiguate between code section > objects and variables. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk (only)? Ok. Richard. > 2016-11-07 Jakub Jelinek > > PR target/77834 > * alias.c (nonoverlapping_memrefs_p): If one decl is > FUNCTION_DECL or LABEL_DECL and the other is not, return 1. > > --- gcc/alias.c.jj2016-11-04 20:13:32.0 +0100 > +++ gcc/alias.c 2016-11-07 11:18:57.982160034 +0100 > @@ -2755,6 +2755,14 @@ nonoverlapping_memrefs_p (const_rtx x, c >|| TREE_CODE (expry) == CONST_DECL) > return 1; > > + /* If one decl is known to be a function or label in a function and > + the other is some kind of data, they can't overlap. */ > + if ((TREE_CODE (exprx) == FUNCTION_DECL > + || TREE_CODE (exprx) == LABEL_DECL) > + != (TREE_CODE (expry) == FUNCTION_DECL > + || TREE_CODE (expry) == LABEL_DECL)) > +return 1; > + >/* If either of the decls doesn't have DECL_RTL set (e.g. marked as > living in multiple places), we can't tell anything. Exception > are FUNCTION_DECLs for which we can create DECL_RTL on demand. 
*/ > @@ -2804,7 +2812,7 @@ nonoverlapping_memrefs_p (const_rtx x, c > >/* Offset based disambiguation not appropriate for loop invariant */ >if (loop_invariant) > -return 0; > +return 0; > >/* Offset based disambiguation is OK even if we do not know that the > declarations are necessarily different > > > Jakub > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
[PATCH][AArch64] Optimized implementation of search_line_fast for the CPP lexer
This patch contains an implementation of search_line_fast for the CPP lexer. It's based in part on the AArch32 (ARM) code but incorporates new instructions available in AArch64 (reduction add operations) plus some tricks for reducing the realignment overheads. We assume a page size of 4k, but that's a safe assumption -- AArch64 systems can never have a smaller page size than that: on systems with larger pages we will go through the realignment code more often than strictly necessary, but it's still likely to be in the noise (less than 0.5% of the time). Bootstrapped on aarch64-none-linux-gnu. Although this is AArch64 specific and therefore I don't think it requires approval from anyone else, I'll wait 24 hours for comments. * lex.c (search_line_fast): New implementation for AArch64. R. diff --git a/libcpp/lex.c b/libcpp/lex.c index 6f65fa1..cea8848 100644 --- a/libcpp/lex.c +++ b/libcpp/lex.c @@ -752,6 +752,101 @@ search_line_fast (const uchar *s, const uchar *end ATTRIBUTE_UNUSED) } } +#elif defined (__ARM_NEON) && defined (__ARM_64BIT_STATE) +#include "arm_neon.h" + +/* This doesn't have to be the exact page size, but no system may use + a size smaller than this. ARMv8 requires a minimum page size of + 4k. The impact of being conservative here is a small number of + cases will take the slightly slower entry path into the main + loop. 
*/ + +#define AARCH64_MIN_PAGE_SIZE 4096 + +static const uchar * +search_line_fast (const uchar *s, const uchar *end ATTRIBUTE_UNUSED) +{ + const uint8x16_t repl_nl = vdupq_n_u8 ('\n'); + const uint8x16_t repl_cr = vdupq_n_u8 ('\r'); + const uint8x16_t repl_bs = vdupq_n_u8 ('\\'); + const uint8x16_t repl_qm = vdupq_n_u8 ('?'); + const uint8x16_t xmask = (uint8x16_t) vdupq_n_u64 (0x8040201008040201ULL); + +#ifdef __AARCH64EB + const int16x8_t shift = {8, 8, 8, 8, 0, 0, 0, 0}; +#else + const int16x8_t shift = {0, 0, 0, 0, 8, 8, 8, 8}; +#endif + + unsigned int found; + const uint8_t *p; + uint8x16_t data; + uint8x16_t t; + uint16x8_t m; + uint8x16_t u, v, w; + + /* Align the source pointer. */ + p = (const uint8_t *)((uintptr_t)s & -16); + + /* Assuming random string start positions, with a 4k page size we'll take + the slow path about 0.37% of the time. */ + if (__builtin_expect ((AARCH64_MIN_PAGE_SIZE +- (((uintptr_t) s) & (AARCH64_MIN_PAGE_SIZE - 1))) + < 16, 0)) +{ + /* Slow path: the string starts near a possible page boundary. 
*/ + uint32_t misalign, mask; + + misalign = (uintptr_t)s & 15; + mask = (-1u << misalign) & 0xffff; + data = vld1q_u8 (p); + t = vceqq_u8 (data, repl_nl); + u = vceqq_u8 (data, repl_cr); + v = vorrq_u8 (t, vceqq_u8 (data, repl_bs)); + w = vorrq_u8 (u, vceqq_u8 (data, repl_qm)); + t = vorrq_u8 (v, w); + t = vandq_u8 (t, xmask); + m = vpaddlq_u8 (t); + m = vshlq_u16 (m, shift); + found = vaddvq_u16 (m); + found &= mask; + if (found) + return (const uchar*)p + __builtin_ctz (found); +} + else +{ + data = vld1q_u8 ((const uint8_t *) s); + t = vceqq_u8 (data, repl_nl); + u = vceqq_u8 (data, repl_cr); + v = vorrq_u8 (t, vceqq_u8 (data, repl_bs)); + w = vorrq_u8 (u, vceqq_u8 (data, repl_qm)); + t = vorrq_u8 (v, w); + if (__builtin_expect (vpaddd_u64 ((uint64x2_t)t), 0)) + goto done; +} + + do +{ + p += 16; + data = vld1q_u8 (p); + t = vceqq_u8 (data, repl_nl); + u = vceqq_u8 (data, repl_cr); + v = vorrq_u8 (t, vceqq_u8 (data, repl_bs)); + w = vorrq_u8 (u, vceqq_u8 (data, repl_qm)); + t = vorrq_u8 (v, w); +} while (!vpaddd_u64 ((uint64x2_t)t)); + +done: + /* Now that we've found the terminating substring, work out precisely where + we need to stop. */ + t = vandq_u8 (t, xmask); + m = vpaddlq_u8 (t); + m = vshlq_u16 (m, shift); + found = vaddvq_u16 (m); + return (((((uintptr_t) p) < (uintptr_t) s) ? s : (const uchar *)p) + + __builtin_ctz (found)); +} + #elif defined (__ARM_NEON) #include "arm_neon.h"
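To make the bit-mask step easier to follow, here is a scalar model in portable C of what the vandq_u8 / vpaddlq_u8 / vshlq_u16 / vaddvq_u16 sequence computes. It is a sketch only, assuming little-endian lane order (the non-__AARCH64EB shift vector); the function names are invented for illustration and this is not the code in the patch. The second function simply counts the start offsets that satisfy the slow-path condition, as a check on the 0.37% figure in the comment.

```c
#include <stdint.h>

/* Scalar model of the mask reduction.  MATCH holds 0xff for every byte
   that compared equal and 0 otherwise, as vceqq_u8 produces.  The result
   has bit B set exactly when MATCH[B] is set.  */
static unsigned
match_bitmask (const uint8_t match[16])
{
  uint16_t lanes[8];
  unsigned found = 0;

  for (int i = 0; i < 8; i++)
    {
      /* xmask gives byte B the weight 1 << (B % 8); the pairwise
         widening add then sums each pair of neighbouring bytes.  */
      unsigned lo = match[2 * i] & (1u << ((2 * i) % 8));
      unsigned hi = match[2 * i + 1] & (1u << ((2 * i + 1) % 8));
      lanes[i] = (uint16_t) (lo + hi);
    }

  /* The upper four lanes are shifted left by 8 and all lanes are summed.
     No two lanes share a set bit, so the sum behaves like a bitwise OR.  */
  for (int i = 0; i < 8; i++)
    found += (i < 4) ? lanes[i] : (unsigned) lanes[i] << 8;

  return found;
}

/* Number of start offsets inside a 4096-byte page for which the
   slow-path condition (page_size - offset) < 16 holds.  */
static unsigned
slow_path_offsets (void)
{
  unsigned count = 0;

  for (unsigned off = 0; off < 4096; off++)
    if (4096 - off < 16)
      count++;

  return count;
}
```

Counting trailing zeros of the returned mask gives the index of the first matching byte, which is how the patch uses __builtin_ctz. The slow-path condition holds for offsets 4081 through 4095, that is 15 of 4096 positions or roughly 0.37%, in line with the comment in the patch.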
Re: Add missing symbols for versioned namespace
On 03/11/16 21:54 +0100, François Dumont wrote: Hi I might not be the right one to propose this patch as I am not sure that I fully understand gnu-versioned-namespace.ver organization. But with it following test failures when using versioned namespace vanish: FAIL: 20_util/allocator/overaligned.cc (test for excess errors) FAIL: ext/bitmap_allocator/overaligned.cc (test for excess errors) FAIL: ext/mt_allocator/overaligned.cc (test for excess errors) FAIL: ext/new_allocator/overaligned.cc (test for excess errors) FAIL: ext/pool_allocator/overaligned.cc (test for excess errors) Ok to commit ? This looks correct. OK for trunk, thanks.
[AArch64][GCC][PATCHv2 1/3] Add missing Poly64_t intrinsics to GCC
Hi all, This patch (1 of 3) adds the following NEON intrinsics to the AArch64 back-end of GCC: * vsli_n_p64 * vsliq_n_p64 * vld1_p64 * vld1q_p64 * vld1_dup_p64 * vld1q_dup_p64 * vst1_p64 * vst1q_p64 * vld2_p64 * vld3_p64 * vld4_p64 * vld2q_p64 * vld3q_p64 * vld4q_p64 * vld2_dup_p64 * vld3_dup_p64 * vld4_dup_p64 * __aarch64_vdup_lane_p64 * __aarch64_vdup_laneq_p64 * __aarch64_vdupq_lane_p64 * __aarch64_vdupq_laneq_p64 * vget_lane_p64 * vgetq_lane_p64 * vreinterpret_p8_p64 * vreinterpretq_p8_p64 * vreinterpret_p16_p64 * vreinterpretq_p16_p64 * vreinterpret_p64_f16 * vreinterpret_p64_f64 * vreinterpret_p64_s8 * vreinterpret_p64_s16 * vreinterpret_p64_s32 * vreinterpret_p64_s64 * vreinterpret_p64_f32 * vreinterpret_p64_u8 * vreinterpret_p64_u16 * vreinterpret_p64_u32 * vreinterpret_p64_u64 * vreinterpret_p64_p8 * vreinterpretq_p64_f64 * vreinterpretq_p64_s8 * vreinterpretq_p64_s16 * vreinterpretq_p64_s32 * vreinterpretq_p64_s64 * vreinterpretq_p64_f16 * vreinterpretq_p64_f32 * vreinterpretq_p64_u8 * vreinterpretq_p64_u16 * vreinterpretq_p64_u32 * vreinterpretq_p64_u64 * vreinterpretq_p64_p8 * vreinterpret_f16_p64 * vreinterpretq_f16_p64 * vreinterpret_f32_p64 * vreinterpretq_f32_p64 * vreinterpret_f64_p64 * vreinterpretq_f64_p64 * vreinterpret_s64_p64 * vreinterpretq_s64_p64 * vreinterpret_u64_p64 * vreinterpretq_u64_p64 * vreinterpret_s8_p64 * vreinterpretq_s8_p64 * vreinterpret_s16_p64 * vreinterpret_s32_p64 * vreinterpretq_s32_p64 * vreinterpret_u8_p64 * vreinterpret_u16_p64 * vreinterpretq_u16_p64 * vreinterpret_u32_p64 * vreinterpretq_u32_p64 * vset_lane_p64 * vsetq_lane_p64 * vget_low_p64 * vget_high_p64 * vcombine_p64 * vcreate_p64 * vst2_lane_p64 * vst3_lane_p64 * vst4_lane_p64 * vst2q_lane_p64 * vst3q_lane_p64 * vst4q_lane_p64 * vget_lane_p64 * vget_laneq_p64 * vset_lane_p64 * vset_laneq_p64 * vcopy_lane_p64 * vcopy_laneq_p64 * vdup_n_p64 * vdupq_n_p64 * vdup_lane_p64 * vdup_laneq_p64 * vld1_p64 * vld1q_p64 * vld1_dup_p64 * 
vld1q_dup_p64 * vld1q_dup_p64 * vmov_n_p64 * vmovq_n_p64 * vst3q_p64 * vst4q_p64 * vld1_lane_p64 * vld1q_lane_p64 * vst1_lane_p64 * vst1q_lane_p64 * vcopy_laneq_p64 * vcopyq_laneq_p64 * vdupq_laneq_p64 Added new tests for these and ran regression tests on aarch64-none-linux-gnu and on arm-none-linux-gnueabihf. Ok for trunk? Thanks, Tamar gcc/ 2016-11-04 Tamar Christina * config/aarch64/aarch64-builtins.c (TYPES_SETREGP): Added poly type. (TYPES_GETREGP): Likewise. (TYPES_SHIFTINSERTP): Likewise. (TYPES_COMBINEP): Likewise. (TYPES_STORE1P): Likewise. * config/aarch64/aarch64-simd-builtins.def (combine): Added poly generator. (get_dregoi): Likewise. (get_dregci): Likewise. (get_dregxi): Likewise. (ssli_n): Likewise. (ld1): Likewise. (st1): Likewise. * config/aarch64/arm_neon.h (poly64x1x2_t, poly64x1x3_t): New. (poly64x1x4_t, poly64x2x2_t): Likewise. (poly64x2x3_t, poly64x2x4_t): Likewise. (poly64x1_t): Likewise. (vcreate_p64, vcombine_p64): Likewise. (vdup_n_p64, vdupq_n_p64): Likewise. (vld2_p64, vld2q_p64): Likewise. (vld3_p64, vld3q_p64): Likewise. (vld4_p64, vld4q_p64): Likewise. (vld2_dup_p64, vld3_dup_p64): Likewise. (vld4_dup_p64, vsli_n_p64): Likewise. (vsliq_n_p64, vst1_p64): Likewise. (vst1q_p64, vst2_p64): Likewise. (vst3_p64, vst4_p64): Likewise. (__aarch64_vdup_lane_p64, __aarch64_vdup_laneq_p64): Likewise. (__aarch64_vdupq_lane_p64, __aarch64_vdupq_laneq_p64): Likewise. (vget_lane_p64, vgetq_lane_p64): Likewise. (vreinterpret_p8_p64, vreinterpretq_p8_p64): Likewise. (vreinterpret_p16_p64, vreinterpretq_p16_p64): Likewise. (vreinterpret_p64_f16, vreinterpret_p64_f64): Likewise. (vreinterpret_p64_s8, vreinterpret_p64_s16): Likewise. (vreinterpret_p64_s32, vreinterpret_p64_s64): Likewise. (vreinterpret_p64_f32, vreinterpret_p64_u8): Likewise. (vreinterpret_p64_u16, vreinterpret_p64_u32): Likewise. (vreinterpret_p64_u64, vreinterpret_p64_p8): Likewise. (vreinterpretq_p64_f64, vreinterpretq_p64_s8): Likewise. 
(vreinterpretq_p64_s16, vreinterpretq_p64_s32): Likewise. (vreinterpretq_p64_s64, vreinterpretq_p64_f16): Likewise. (vreinterpretq_p64_f32, vreinterpretq_p64_u8): Likewise. (vreinterpretq_p64_u16, vreinterpretq_p64_u32): Likewise. (vreinterpretq_p64_u64, vreinterpretq_p64_p8): Likewise. (vreinterpret_f16_p64, vreinterpretq_f16_p64): Likewise. (vreinterpret_f32_p64, vreinterpretq_f32_p64): Likewise. (vreinterpret_f64_p64, vreinterpretq_f64_p64): Likewise. (vreinterpret_s64_p64, vreinterpretq_s64_p64): Likewise. (vreinterpret_u64_p64, vreinterpretq_u64_p64): Likewise. (vreinterpret_s8_p64, vreinterpretq_s8_p64): Li
[AArch64][ARM][GCC][PATCHv2 3/3] Add tests for missing Poly64_t intrinsics to GCC
Hi all, This patch (3 of 3) adds and updates tests for the NEON intrinsics added by the previous patches. Ran regression tests on aarch64-none-linux-gnu and on arm-none-linux-gnueabihf. Ok for trunk? Thanks, Tamar gcc/testsuite/ 2016-11-04 Tamar Christina * gcc.target/aarch64/advsimd-intrinsics/p64.c: New. * gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h (Poly64x1_t, Poly64x2_t): Added type. (AARCH64_ONLY): Added macro. * gcc.target/aarch64/advsimd-intrinsics/vcombine.c: Added test for Poly64. * gcc.target/aarch64/advsimd-intrinsics/vcreate.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vget_high.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vget_lane.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vget_low.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vldX.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vldX_dup.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vldX_lane.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vstX_lane.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vst1_lane.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vld1.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p128.c: Added AArch64 flags. 
* gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p64.c: Added Aarch64 flags.diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h index 462141586b3db7c5256c74b08fa0449210634226..174c1948221025b860aaac503354b406fa804007 100644 --- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h @@ -32,6 +32,13 @@ extern size_t strlen(const char *); VECT_VAR(expected, int, 16, 4) -> expected_int16x4 VECT_VAR_DECL(expected, int, 16, 4) -> int16x4_t expected_int16x4 */ +/* Some instructions don't exist on ARM. + Use this macro to guard against them. */ +#ifdef __aarch64__ +#define AARCH64_ONLY(X) X +#else +#define AARCH64_ONLY(X) +#endif #define xSTR(X) #X #define STR(X) xSTR(X) @@ -92,6 +99,13 @@ extern size_t strlen(const char *); fprintf(stderr, "CHECKED %s %s\n", STR(VECT_TYPE(T, W, N)), MSG); \ } +#if defined (__ARM_FEATURE_CRYPTO) +#define CHECK_CRYPTO(MSG,T,W,N,FMT,EXPECTED,COMMENT) \ + CHECK(MSG,T,W,N,FMT,EXPECTED,COMMENT) +#else +#define CHECK_CRYPTO(MSG,T,W,N,FMT,EXPECTED,COMMENT) +#endif + /* Floating-point variant. 
*/ #define CHECK_FP(MSG,T,W,N,FMT,EXPECTED,COMMENT) \ { \ @@ -184,6 +198,9 @@ extern ARRAY(expected, uint, 32, 2); extern ARRAY(expected, uint, 64, 1); extern ARRAY(expected, poly, 8, 8); extern ARRAY(expected, poly, 16, 4); +#if defined (__ARM_FEATURE_CRYPTO) +extern ARRAY(expected, poly, 64, 1); +#endif extern ARRAY(expected, hfloat, 16, 4); extern ARRAY(expected, hfloat, 32, 2); extern ARRAY(expected, hfloat, 64, 1); @@ -197,11 +214,14 @@ extern ARRAY(expected, uint, 32, 4); extern ARRAY(expected, uint, 64, 2); extern ARRAY(expected, poly, 8, 16); extern ARRAY(expected, poly, 16, 8); +#if defined (__ARM_FEATURE_CRYPTO) +extern ARRAY(expected, poly, 64, 2); +#endif extern ARRAY(expected, hfloat, 16, 8); extern ARRAY(expected, hfloat, 32, 4); extern ARRAY(expected, hfloat, 64, 2); -#define CHECK_RESULTS_NAMED_NO_FP16(test_name,EXPECTED,comment) \ +#define CHECK_RESULTS_NAMED_NO_FP16_NO_POLY64(test_name,EXPECTED,comment) \ { \ CHECK(test_name, int, 8, 8, PRIx8, EXPECTED, comment); \ CHECK(test_name, int, 16, 4, PRIx16, EXPECTED, comment); \ @@ -228,6 +248,13 @@ extern ARRAY(expected, hfloat, 64, 2); CHECK_FP(test_name, float, 32, 4, PRIx32, EXPECTED, comment); \ } \ +#define CHECK_RESULTS_NAMED_NO_FP16(test_name,EXPECTED,comment) \ + { \ +CHECK_RESULTS_NAMED_NO_FP16_NO_POLY64(test_name, EXPECTED, comment); \ +CHECK_CRYPTO(test_name, poly, 64, 1, PRIx64, EXPECTED, comment); \ +CHECK_CRYPTO(test_name, poly, 64, 2, PRIx64, EXPECTED, comment); \ + } \ + /* Check results against EXPECTED. Operates on all possible vector types. 
*/ #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE) #define CHECK_RESULTS_NAMED(test_name,EXPECTED,comment) \ @@ -398,6 +425,9 @@ static void clean_results (void) CLEAN(result, uint, 64, 1); CLEAN(result, poly, 8, 8); CLEAN(result, poly, 16, 4); +#if defined (__ARM_FEATURE_CRYPTO) + CLEAN(result, poly, 64, 1); +#endif #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE) CLEAN(result, float, 16, 4); #endif @@ -413,6 +443,9 @@ static void clean_results (void) CLEAN(result, uint, 64, 2); CLEAN(resul
[ARM][GCC][PATCHv2 2/3] Add missing Poly64_t intrinsics to GCC
Hi all, This patch (2 of 3) adds the following NEON intrinsics to the ARM back-end of GCC: * vget_lane_p64 Added new tests for these and ran regression tests on aarch64-none-linux-gnu and on arm-none-linux-gnueabihf. Ok for trunk? Thanks, Tamar gcc/ 2016-11-04 Tamar Christina * config/arm/arm_neon.h (vget_lane_p64): New. diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index 3898ff7302dc3f21e6b50a8a7b835033c1ae2021..ab29da74e0971cc09ee63b561ecc79e9762e3fb4 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -5411,6 +5411,15 @@ vget_lane_s64 (int64x1_t __a, const int __b) return (int64_t)__builtin_neon_vget_lanedi (__a, __b); } +#pragma GCC push_options +#pragma GCC target ("fpu=crypto-neon-fp-armv8") +__extension__ static __inline poly64_t __attribute__ ((__always_inline__)) +vget_lane_p64 (poly64x1_t __a, const int __b) +{ + return (poly64_t)__builtin_neon_vget_lanedi ((int64x1_t) __a, __b); +} + +#pragma GCC pop_options __extension__ static __inline uint64_t __attribute__ ((__always_inline__)) vget_lane_u64 (uint64x1_t __a, const int __b) {
[PATCH] Fix PR78224
The following fixes an ICE with call cdce where it fails to handle PHIs in the fallthru destination of a call with EH. My simple fix is to simply split the fallthru edge if the dest may contain PHI nodes. This may also remove the need to free dominance info (hope there's a testcase for that -- I'll leave the branches alone). Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2016-11-07 Richard Biener PR tree-optimization/78224 * tree-call-cdce.c (shrink_wrap_one_built_in_call_with_conds): Split the fallthru edge in case its successor may have PHIs. Do not free dominance info. * g++.dg/torture/pr78224.C: New testcase. Index: gcc/tree-call-cdce.c === --- gcc/tree-call-cdce.c(revision 241893) +++ gcc/tree-call-cdce.c(working copy) @@ -807,15 +807,20 @@ shrink_wrap_one_built_in_call_with_conds can_guard_call_p. */ join_tgt_in_edge_from_call = find_fallthru_edge (bi_call_bb->succs); gcc_assert (join_tgt_in_edge_from_call); - free_dominance_info (CDI_DOMINATORS); + /* We don't want to handle PHIs. */ + if (EDGE_COUNT (join_tgt_in_edge_from_call->dest->preds) > 1) + join_tgt_bb = split_edge (join_tgt_in_edge_from_call); + else + join_tgt_bb = join_tgt_in_edge_from_call->dest; } else -join_tgt_in_edge_from_call = split_block (bi_call_bb, bi_call); +{ + join_tgt_in_edge_from_call = split_block (bi_call_bb, bi_call); + join_tgt_bb = join_tgt_in_edge_from_call->dest; +} bi_call_bsi = gsi_for_stmt (bi_call); - join_tgt_bb = join_tgt_in_edge_from_call->dest; - /* Now it is time to insert the first conditional expression into bi_call_bb and split this bb so that bi_call is shrink-wrapped. 
*/ Index: gcc/testsuite/g++.dg/torture/pr78224.C === --- gcc/testsuite/g++.dg/torture/pr78224.C (revision 0) +++ gcc/testsuite/g++.dg/torture/pr78224.C (working copy) @@ -0,0 +1,51 @@ +// { dg-do compile } + +extern "C"{ + float sqrtf(float); +} + +inline float squareroot(const float f) +{ + return sqrtf(f); +} + +inline int squareroot(const int f) +{ + return static_cast(sqrtf(static_cast(f))); +} + +template +class vector2d +{ +public: + vector2d(T nx, T ny) : X(nx), Y(ny) {} + T getLength() const { return squareroot( X*X + Y*Y ); } + T X; + T Y; +}; + +vector2d getMousePos(); + +class Client +{ +public: + Client(); + ~Client(); +}; + +void the_game(float turn_amount) +{ + Client client; + bool first = true; + + while (1) { + if (first) { +first = false; + } else { +int dx = getMousePos().X; +int dy = getMousePos().Y; + +turn_amount = vector2d(dx, dy).getLength(); + } + } +}
Re: [PATCH, GCC, wwwdocs] Document new Cortex-M23 and Cortex-M33 processors support in ARM backend
What about ARM maintainers? Best regards, Thomas On 04/11/16 22:16, Gerald Pfeifer wrote: On Fri, 4 Nov 2016, Thomas Preudhomme wrote: This patch documents the newly added support in GCC 7 for Cortex-M23 and Cortex-M33 processors [1][2]. : Is this ok for ? Surely so for me. Gerald
Re: [PATCH, GCC, wwwdocs] Document new Cortex-M23 and Cortex-M33 processors support in ARM backend
On 07/11/16 14:00, Thomas Preudhomme wrote: What about ARM maintainers? Fine with me too. Thanks, Kyrill Best regards, Thomas On 04/11/16 22:16, Gerald Pfeifer wrote: On Fri, 4 Nov 2016, Thomas Preudhomme wrote: This patch documents the newly added support in GCC 7 for Cortex-M23 and Cortex-M33 processors [1][2]. : Is this ok for ? Surely so for me. Gerald
[GCC][AArch64][PATCH][Testsuite] Fix failing test vector_initialization_nostack.c
Hi all, This fixes PR78142 by turning off scheduling for the test. r241590 causes more registers to be used, so the SP register happens to be picked and used. I believe this test was explicitly checking that the SP is not used if not needed. Ran regression tests on aarch64-none-linux-gnu. Ok for trunk? Thanks, Tamar gcc/testsuite/ 2016-11-07 Tamar Christina PR middle-end/78142 * gcc.target/aarch64/vector_initialization_nostack.c (dg-options): Disabled scheduling. diff --git a/gcc/testsuite/gcc.target/aarch64/vector_initialization_nostack.c b/gcc/testsuite/gcc.target/aarch64/vector_initialization_nostack.c index bbad04d00263b6a91b826b4911af92bdd226c821..71699281c5ce79fb5cf37e47b8ba078721c19f3a 100644 --- a/gcc/testsuite/gcc.target/aarch64/vector_initialization_nostack.c +++ b/gcc/testsuite/gcc.target/aarch64/vector_initialization_nostack.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -ftree-vectorize -fno-vect-cost-model" } */ +/* { dg-options "-O3 -ftree-vectorize -fno-vect-cost-model -fno-schedule-insns" } */ float arr_f[100][100]; float f9 (void)
Re: [PATCH][GCC/TESTSUITE] Make test for traditional-cpp depend on
Ping. From: gcc-patches-ow...@gcc.gnu.org on behalf of Tamar Christina Sent: Tuesday, November 1, 2016 3:46:07 PM To: GCC Patches; r...@cebitec.uni-bielefeld.de; mikest...@comcast.net Cc: nd Subject: [PATCH][GCC/TESTSUITE] Make test for traditional-cpp depend on Hi all, A glibc update recently broke this test by adding a CPP macro that uses the ## token-pasting operator, which traditional-cpp does not support. The change in glibc that made the test fail is from 6962682ffe5e5f0373047a0b894fee7a774be254. This fixes PR78136 by changing the test to use a local include file instead of one from glibc. The intention of the test is to check that traditional-cpp does not expand values inside <> blocks of #includes. As such, the header has to be included via the <> syntax. To do this, the .exp has been modified to add the test directory to the include search path. Ran regression tests on aarch64-none-linux-gnu. Ok for trunk? Thanks, Tamar gcc/testsuite/ 2016-10-31 Tamar Christina PR testsuite/78136 * gcc.dg/cpp/trad/trad.exp (dg-runtest): Added $srcdir/$subdir/ to Include dirs. * gcc.dg/cpp/trad/include.c: Use local header file.
Re: New option -flimit-function-alignment
On 10/14/2016 08:28 PM, Bernd Schmidt wrote: On 10/12/2016 09:27 PM, Denys Vlasenko wrote: Yes, something like "if max_skip >= func_size, temporarily lower max_skip to func_size-1" (because otherwise we can create padding bigger-or-equal to the entire function in size, which is stupid - it's better to just put the function in that space). This would be nice. That would be this patch. Bootstrapped and tested on x86_64-linux, ok? Ping. https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01187.html Bernd
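The clamping rule quoted in this thread can be sketched roughly as follows. This is only an illustration under assumptions: limit_max_skip and its parameters are hypothetical names chosen here, not identifiers from the actual patch, and the handling of a zero-sized function is a guess.

```c
#include <stddef.h>

/* Hypothetical helper, not GCC code: clamp the maximum number of
   padding bytes allowed before a function so that the padding is
   always smaller than the function itself.  */
size_t
limit_max_skip (size_t max_skip, size_t func_size)
{
  /* Assumption: a zero-sized function gets no padding at all.  */
  if (func_size == 0)
    return 0;
  /* "if max_skip >= func_size, lower max_skip to func_size-1".  */
  if (max_skip >= func_size)
    return func_size - 1;
  return max_skip;
}
```

With a 16-byte alignment request (max_skip of 15), a 4-byte function would be allowed at most 3 bytes of padding, while a 100-byte function keeps the full 15.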
[GCC][PATCH] Fix ada compile error on Windows x86_64 (committed as r241907 under the obvious rule)
Hi all, The changes in r240999 re-arranged includes and left out signal.h for Windows x86 builds. This breaks the build and prevents GCC builds from completing with messages such as: adaint.c:3317:19: error: 'SIGINT' undeclared (first use in this function); did you mean 'SAIT'? else if (sig == SIGINT) ^~ Bootstrapped successfully on x86_64-w64-mingw32. Committed as r241907. Thanks, Tamar gcc/ 2016-11-07 Tamar Christina * gcc/ada/adaint.c: Added signal.h for Windows. diff --git a/gcc/ada/adaint.c b/gcc/ada/adaint.c index 353914708adbdf301f9d59aaa55debfed469f901..819ea47e449725b08c1a531b340ddc6a74b0e5db 100644 --- a/gcc/ada/adaint.c +++ b/gcc/ada/adaint.c @@ -190,6 +190,7 @@ UINT CurrentCCSEncoding; #include #include #include +#include <signal.h> #undef DIR_SEPARATOR #define DIR_SEPARATOR '\\'
Re: [PATCH 0/2] strncmp builtin expansion improvement
On Sun, Nov 6, 2016 at 5:32 AM, Aaron Sawdey wrote: > On Fri, 2016-11-04 at 20:43 -0600, Jeff Law wrote: >> So what's the motivation here? When we don't have any constants >> then >> I'd think we'd be better off punting into the library. > > When none of the args to strncmp are constant, I'd be inclined to > agree. However the current state of affairs is that strncmp is not > expanded in the case where the length is a constant but the strings are > not. This patch allows the expansion to be attempted. > > The target's cmpstrnsi pattern can then make the decision of which > cases to expand and which cases to punt to the library. For instance RX > might always want to expand this for all cases as that target has an > instruction that is intended to map to strncmp. > > My particular motivation is that I'm working on a cmpstrnsi pattern for > powerpc64 and I want to have access to the case where the strings are > not constant but the length is. Your patchset doesn't contain a testcase so I really wonder which case we know the string length but it is not constant. Yes, there's COND_EXPR handling in c_strlen but that should be mostly dead code -- the real code should be using get_maxval_strlen or get_range_strlen but c_strlen does not use those. Ideally the str optabs would get profile data and alignment similar to the mem ones. Care to share a testcase? Thanks, Richard. > Thanks, >Aaron > > -- > Aaron Sawdey, Ph.D. acsaw...@linux.vnet.ibm.com > 050-2/C113 (507) 253-7520 home: 507/263-0782 > IBM Linux Technology Center - PPC Toolchain >
[PATCH] Avoid peeling for gaps if accesses are aligned
Currently we force peeling for gaps whenever element overrun can occur but for aligned accesses we know that the loads won't trap and thus we can avoid this. Bootstrap and regtest running on x86_64-unknown-linux-gnu (I expect some testsuite fallout here so didn't bother to invent a new testcase). Just in case somebody thinks the overrun is a bad idea in general (even when not trapping). Like for ASAN or valgrind. Richard. 2016-11-07 Richard Biener * tree-vect-stmts.c (get_group_load_store_type): If the access is aligned do not trigger peeling for gaps. Index: gcc/tree-vect-stmts.c === --- gcc/tree-vect-stmts.c (revision 241893) +++ gcc/tree-vect-stmts.c (working copy) @@ -1770,6 +1771,11 @@ get_group_load_store_type (gimple *stmt, " non-consecutive accesses\n"); return false; } + /* If the access is aligned an overrun is fine. */ + if (overrun_p + && aligned_access_p + (STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt)))) + overrun_p = false; if (overrun_p && !can_overrun_p) { if (dump_enabled_p ()) @@ -1789,6 +1795,10 @@ get_group_load_store_type (gimple *stmt, /* If there is a gap at the end of the group then these optimizations would access excess elements in the last iteration. */ bool would_overrun_p = (gap != 0); + /* If the access is aligned an overrun is fine. */ + if (would_overrun_p + && aligned_access_p (STMT_VINFO_DATA_REF (stmt_info))) + would_overrun_p = false; if (!STMT_VINFO_STRIDED_P (stmt_info) && (can_overrun_p || !would_overrun_p) && compare_step_with_zero (stmt) > 0)
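The claim that aligned loads won't trap on overrun can be illustrated with a small sketch. It assumes memory protection works at page granularity and that the vector size is a power of two no larger than the page size; access_crosses_page and its parameters are made-up names for illustration, not anything from the vectorizer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustration only: return true if an access of SIZE bytes starting
   at ADDR touches more than one page of PAGE_SIZE bytes.  SIZE must
   be nonzero.  */
bool
access_crosses_page (uintptr_t addr, uintptr_t size, uintptr_t page_size)
{
  /* Compare the page of the first byte with the page of the last.  */
  return addr / page_size != (addr + size - 1) / page_size;
}
```

Under those assumptions, a 16-byte load at an address that is a multiple of 16 stays within a single 4096-byte page, so if its first element is accessible then reading the gap elements in the same vector cannot fault. An unaligned 16-byte load at offset 4088 would spill into the next page, which is the case where peeling for gaps is still needed.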
Re: [PATCH] Fix -O0 AVX512 comparison ICE (PR target/78227)
On Mon, Nov 7, 2016 at 2:02 PM, Jakub Jelinek wrote: > Hi! > > The following testcases ICE at -O0, because ix86_expand_sse_cmp avoids using > the passed in dest only if optimize or if there is some value overlap, but > we actually need to do that also if we have a maskcmp where we want to use > a different mode than dest has. > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2016-11-07 Jakub Jelinek > > PR target/78227 > * config/i386/i386.c (ix86_expand_sse_cmp): Force dest into > cmp_mode argument even for -O0 if cmp_mode != mode and maskcmp. > > * gcc.target/i386/pr78227-1.c: New test. > * gcc.target/i386/pr78227-2.c: New test. OK with a small nit, please see inline ... Thanks, Uros. > --- gcc/config/i386/i386.c.jj 2016-11-04 20:09:48.0 +0100 > +++ gcc/config/i386/i386.c 2016-11-07 10:14:15.625018144 +0100 > @@ -23561,6 +23561,7 @@ ix86_expand_sse_cmp (rtx dest, enum rtx_ > cmp_op1 = force_reg (cmp_ops_mode, cmp_op1); > >if (optimize > + || (cmp_mode != mode && maskcmp) Maybe better to switch condition around, so: "(maskcmp && cmp_mode != mode)" >|| (op_true && reg_overlap_mentioned_p (dest, op_true)) >|| (op_false && reg_overlap_mentioned_p (dest, op_false))) > dest = gen_reg_rtx (maskcmp ?
cmp_mode : mode); > --- gcc/testsuite/gcc.target/i386/pr78227-1.c.jj2016-11-07 > 10:15:52.606762613 +0100 > +++ gcc/testsuite/gcc.target/i386/pr78227-1.c 2016-11-07 10:24:58.821480125 > +0100 > @@ -0,0 +1,30 @@ > +/* PR target/78227 */ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512f -O0 -Wno-psabi" } */ > + > +typedef int V __attribute__((vector_size (64))); > +typedef long long int W __attribute__((vector_size (64))); > + > +V > +foo1 (V v) > +{ > + return v > 0; > +} > + > +V > +bar1 (V v) > +{ > + return v != 0; > +} > + > +W > +foo2 (W w) > +{ > + return w > 0; > +} > + > +W > +bar2 (W w) > +{ > + return w != 0; > +} > --- gcc/testsuite/gcc.target/i386/pr78227-2.c.jj2016-11-07 > 10:22:17.055670476 +0100 > +++ gcc/testsuite/gcc.target/i386/pr78227-2.c 2016-11-07 10:25:03.722413765 > +0100 > @@ -0,0 +1,30 @@ > +/* PR target/78227 */ > +/* { dg-do compile } */ > +/* { dg-options "-mavx512bw -O0 -Wno-psabi" } */ > + > +typedef signed char V __attribute__((vector_size (64))); > +typedef short int W __attribute__((vector_size (64))); > + > +V > +foo1 (V v) > +{ > + return v > 0; > +} > + > +V > +bar1 (V v) > +{ > + return v != 0; > +} > + > +W > +foo2 (W w) > +{ > + return w > 0; > +} > + > +W > +bar2 (W w) > +{ > + return w != 0; > +} > > Jakub
Re: [RFA] Fix various PPC build failures due to int-in-boolean-context code
On Fri, Oct 28, 2016 at 09:12:29AM -0600, Jeff Law wrote: > > The PPC port is stumbling over the new integer in boolean context warnings. > > In particular this code from rs6000_option_override_internal is > problematical: > > HOST_WIDE_INT flags = ((TARGET_DEFAULT) ? TARGET_DEFAULT > : > processor_target_table[cpu_index].target_enable); > > The compiler is flagging the (TARGET_DEFAULT) condition. That's > supposed to be a boolean. > > After all the macro expansions are done it ultimately looks something > like this: > > long flags = (((1L << 7)) ? (1L << 7) > : processor_target_table[cpu_index].target_enable); > > Note the (1L << 7) used as the condition for the ternary. That's what > has the int-in-boolean-context warning tripping. It's a false positive > IMHO. Hmm... From the warning's perspective it would look far less suspicious if we made this an unsigned shift op. I looked at options.h and I think we could also use one more bit if the shift was unsigned. Furthermore there are macros TARGET_..._P which do not put brackets around the macro parameter. So how about this? Cross-compiler for powerpc-eabi builds without warning. Bootstrapped and reg-tested on x86_64-pc-linux-gnu. Is it OK for trunk? Bernd. 2016-11-07 Bernd Edlinger * opth-gen.awk: Use unsigned shifts for bit masks. Allow all bits to be used. Add brackets around macro argument. 
Index: gcc/opth-gen.awk === --- gcc/opth-gen.awk (revision 241884) +++ gcc/opth-gen.awk (working copy) @@ -350,11 +350,11 @@ for (i = 0; i < n_opts; i++) { mask_bits[name] = 1 vname = var_name(flags[i]) mask = "MASK_" - mask_1 = "1" + mask_1 = "1U" if (vname != "") { mask = "OPTION_MASK_" if (host_wide_int[vname] == "yes") -mask_1 = "HOST_WIDE_INT_1" +mask_1 = "HOST_WIDE_INT_1U" } else extra_mask_bits[name] = 1 print "#define " mask name " (" mask_1 " << " masknum[vname]++ ")" @@ -362,16 +362,16 @@ for (i = 0; i < n_opts; i++) { } for (i = 0; i < n_extra_masks; i++) { if (extra_mask_bits[extra_masks[i]] == 0) - print "#define MASK_" extra_masks[i] " (1 << " masknum[""]++ ")" + print "#define MASK_" extra_masks[i] " (1U << " masknum[""]++ ")" } for (var in masknum) { if (var != "" && host_wide_int[var] == "yes") { - print" #if defined(HOST_BITS_PER_WIDE_INT) && " masknum[var] " >= HOST_BITS_PER_WIDE_INT" + print "#if defined(HOST_BITS_PER_WIDE_INT) && " masknum[var] " > HOST_BITS_PER_WIDE_INT" print "#error too many masks for " var print "#endif" } - else if (masknum[var] > 31) { + else if (masknum[var] > 32) { if (var == "") print "#error too many target masks" else @@ -401,7 +401,7 @@ for (i = 0; i < n_opts; i++) { print "#define TARGET_" name \ " ((" vname " & " mask name ") != 0)" print "#define TARGET_" name "_P(" vname ")" \ - " ((" vname " & " mask name ") != 0)" + " (((" vname ") & " mask name ") != 0)" } } for (i = 0; i < n_extra_masks; i++) {
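As a rough illustration of why the brackets around the macro argument matter, here is a minimal sketch. MASK_FOO, the two TARGET_FOO_P variants and the helper functions are hypothetical stand-ins for what opth-gen.awk generates, and the top_mask helper assumes a 32-bit unsigned int.

```c
/* Hypothetical mask, written with an unsigned shift as in the patch.  */
#define MASK_FOO (1U << 7)

/* Old form: the macro argument is not parenthesized.  */
#define TARGET_FOO_P_OLD(flags) ((flags & MASK_FOO) != 0)
/* New form: the argument is wrapped in brackets.  */
#define TARGET_FOO_P_NEW(flags) (((flags) & MASK_FOO) != 0)

/* With the argument "a | b" the old form expands to
   ((a | b & MASK_FOO) != 0), and & binds tighter than |.  */
int old_form (unsigned a, unsigned b) { return TARGET_FOO_P_OLD (a | b); }
int new_form (unsigned a, unsigned b) { return TARGET_FOO_P_NEW (a | b); }

/* With an unsigned shift bit 31 is usable; 1 << 31 would overflow a
   32-bit signed int.  */
unsigned top_mask (void) { return 1U << 31; }
```

For a = 1 and b = 0 the old form reports the option as set even though bit 7 is clear in both values, while the new form tests the combined value as intended.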
Re: Simplify X / X, 0 / X and X % X
On 11/07/2016 03:02 AM, Richard Biener wrote: On Sat, Nov 5, 2016 at 3:30 AM, Jeff Law wrote: On 11/04/2016 02:07 PM, Marc Glisse wrote: Hello, since we were discussing this recently... The condition is copied from the existing 0 % X case, visible in the context of the diff. As far as I understand, the main case where we do not want to optimize is during constexpr evaluation in the C++ front-end (it wants to detect the undefined behavior), and with late folding I think this means we only need to care about an explicit 0/0, not about X/X where X would become 0 after the simplification. And later, if we do have something like X/0, we could handle it the same way as we currently handle *(char*)0, insert a trap after that instruction and clear the following code, which likely gives better code than replacing 0/0 with 1. Yup. I'd prefer to insert a trap if we ultimately expose a division by zero -- including cases where that division occurs as a result of a PHI arg being zero and the PHI result being used as a denominator in a division expression. It ought to be extremely easy to detect & transform that case (and probably warn for it too). We have gimple-ssa-isolate-paths.c for that, right? Right. I was thinking about instrumenting for it today to see if it's worth any effort. It shouldn't take more than a few minutes once I refamiliarize myself with isolate-paths. jeff
Re: [PATCH] rtx_writer: avoid printing trailing default values
On Fri, 2016-11-04 at 20:40 +0100, Bernd Schmidt wrote: > On 11/04/2016 08:25 PM, David Malcolm wrote: > > > return m_compact; > > Ok with this one plus a comment. > Thanks. Using m_compact required turning the static function into a (private) member function. For reference, here's what I committed (r241908), having verified bootstrap&regtest. Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 241907) +++ gcc/ChangeLog (revision 241908) @@ -1,3 +1,16 @@ +2016-11-07 David Malcolm + + * print-rtl.c (rtx_writer::operand_has_default_value_p): New + method. + (rtx_writer::print_rtx): In compact mode, omit trailing operands + that have the default values. + * print-rtl.h (rtx_writer::operand_has_default_value_p): New + method. + * rtl-tests.c (selftest::test_dumping_insns): Remove empty + label string from expected dump. + (selftest::test_uncond_jump): Remove trailing "(nil)" for REG_NOTES + from expected dump. + 2016-11-07 Jakub Jelinek PR target/77834 Index: gcc/print-rtl.c === --- gcc/print-rtl.c (revision 241907) +++ gcc/print-rtl.c (revision 241908) @@ -564,6 +564,43 @@ } } +/* Subroutine of rtx_writer::print_rtx. + In compact mode, determine if operand IDX of IN_RTX is interesting + to dump, or (if in a trailing position) it can be omitted. */ + +bool +rtx_writer::operand_has_default_value_p (const_rtx in_rtx, int idx) +{ + const char *format_ptr = GET_RTX_FORMAT (GET_CODE (in_rtx)); + + switch (format_ptr[idx]) +{ +case 'e': +case 'u': + return XEXP (in_rtx, idx) == NULL_RTX; + +case 's': + return XSTR (in_rtx, idx) == NULL; + +case '0': + switch (GET_CODE (in_rtx)) + { + case JUMP_INSN: + /* JUMP_LABELs are always omitted in compact mode, so treat + any value here as omittable, so that earlier operands can + potentially be omitted also. */ + return m_compact; + + default: + return false; + + } + +default: + return false; +} +} + /* Print IN_RTX onto m_outfile. This is the recursive part of printing. 
*/ void @@ -681,9 +718,18 @@ fprintf (m_outfile, " %d", INSN_UID (in_rtx)); } + /* Determine which is the final operand to print. + In compact mode, skip trailing operands that have the default values + e.g. trailing "(nil)" values. */ + int limit = GET_RTX_LENGTH (GET_CODE (in_rtx)); + if (m_compact) +while (limit > idx && operand_has_default_value_p (in_rtx, limit - 1)) + limit--; + /* Get the format string and skip the first elements if we have handled them already. */ - for (; idx < GET_RTX_LENGTH (GET_CODE (in_rtx)); idx++) + + for (; idx < limit; idx++) print_rtx_operand (in_rtx, idx); switch (GET_CODE (in_rtx)) Index: gcc/print-rtl.h === --- gcc/print-rtl.h (revision 241907) +++ gcc/print-rtl.h (revision 241908) @@ -39,6 +39,7 @@ void print_rtx_operand_code_r (const_rtx in_rtx); void print_rtx_operand_code_u (const_rtx in_rtx, int idx); void print_rtx_operand (const_rtx in_rtx, int idx); + bool operand_has_default_value_p (const_rtx in_rtx, int idx); private: FILE *m_outfile; Index: gcc/rtl-tests.c === --- gcc/rtl-tests.c (revision 241907) +++ gcc/rtl-tests.c (revision 241908) @@ -122,7 +122,7 @@ /* Labels. */ rtx_insn *label = gen_label_rtx (); CODE_LABEL_NUMBER (label) = 42; - ASSERT_RTL_DUMP_EQ ("(clabel 0 42 \"\")\n", label); + ASSERT_RTL_DUMP_EQ ("(clabel 0 42)\n", label); LABEL_NAME (label)= "some_label"; ASSERT_RTL_DUMP_EQ ("(clabel 0 42 (\"some_label\"))\n", label); @@ -176,8 +176,7 @@ ASSERT_TRUE (control_flow_insn_p (jump_insn)); ASSERT_RTL_DUMP_EQ ("(cjump_insn 1 (set (pc)\n" - "(label_ref 0))\n" - " (nil))\n", + "(label_ref 0)))\n", jump_insn); }
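The trimming loop added to rtx_writer::print_rtx can be sketched in isolation as follows. This is a simplified stand-in rather than the committed code: operands are modelled as an array of strings where a null pointer plays the role of a default value, and print_limit is a name invented here.

```c
#include <stddef.h>

/* Illustration only: OPS holds N operands, where a null pointer stands
   for a default value.  Return the number of operands worth printing
   when trailing defaults are dropped.  Operands before START are never
   dropped because they have already been handled.  */
int
print_limit (const char *const *ops, int n, int start)
{
  int limit = n;
  while (limit > start && ops[limit - 1] == NULL)
    limit--;
  return limit;
}
```

Only trailing defaults are skipped; a default value followed by a non-default operand still has to be printed so that operand positions stay unambiguous.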
[patch, fortran, committed] Fill in some more locations
Hello world, I have committed the little patchlet below as obvious, after regression-testing. Regards Thomas 2016-11-07 Thomas Koenig PR fortran/78826 * match.c (gfc_match_select_type): Add where for expr1. * resolve.c (resolve_select_type): Add where for expr1 of new statement. Index: match.c === --- match.c (Revision 241887) +++ match.c (Arbeitskopie) @@ -5898,6 +5898,7 @@ gfc_match_select_type (void) { expr1 = gfc_get_expr (); expr1->expr_type = EXPR_VARIABLE; + expr1->where = expr2->where; if (gfc_get_sym_tree (name, NULL, &expr1->symtree, false)) { m = MATCH_ERROR; Index: resolve.c === --- resolve.c (Revision 241887) +++ resolve.c (Arbeitskopie) @@ -8857,6 +8857,7 @@ resolve_select_type (gfc_code *code, gfc_namespace new_st->expr1->value.function.actual = gfc_get_actual_arglist (); new_st->expr1->value.function.actual->expr = gfc_get_variable_expr (selector_expr->symtree); new_st->expr1->value.function.actual->expr->where = code->loc; + new_st->expr1->where = code->loc; gfc_add_vptr_component (new_st->expr1->value.function.actual->expr); vtab = gfc_find_derived_vtab (body->ext.block.case_list->ts.u.derived); st = gfc_find_symtree (vtab->ns->sym_root, vtab->name);
Re: [patch, fortran, committed] Fill in some more locations
Am 07.11.2016 um 16:25 schrieb Thomas Koenig: PR fortran/78826 ... should have been PR 78226.
[PING, PATCH] Do not simplify "(and (reg) (const bit))" to if_then_else.
Ping. https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02525.html On Mon, Oct 31, 2016 at 08:56:10PM +0100, Dominik Vogt wrote: > The attached patch does a little change in > combine.c:combine_simplify_rtx() to prevent a "simplification" > where the rtl code gets more complex in reality. The complete > description of the change can be found in the commit comment in > the attached patch. > > The patch reduces the number of patterns in the s390 backend and > slightly reduces the size of the compiled SPEC2006 code. (Code > size or runtime only tested on s390x with -m64.) It is > theoretically possible that this patch leads to somewhat worse > code on some target if that only has a pattern for the formerly replaced > rtl expression but not for the original one. > > The patch has passed the testsuite on s390, s390x biarch, x86_64 > and Power biarch. > > -- > > (I'm not sure whether the const_int expression can appear in both > operands or only as the second. If the latter is the case, the > conditions can be simplified a bit.) > > What do you think about this patch? Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany
Fix build of jit (was Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v3))
On Mon, 2016-11-07 at 11:03 +0100, Martin Liška wrote: > Hello. > > After discussion with Jakub, I'm resending new version of the patch, > where I changed the following: > 1) gimplify_ctxp->live_switch_vars is used to track variables > introduced in switch_expr. Every time >a case_label_expr is seen, these are unpoisoned. It's quite > conservative, however it covers all >corner cases one can come up with. Compared to clang, we are much > more precise in switch statements >where a variable liveness crosses label boundary. > 2) I found a bug where ASAN_CHECK was optimized out due to missing > check of IFN_ASAN_MARK internal fn. >Test was added for that. > 3) Multiple switch tests have been added, which is going to be sent > in upcoming email. > > Patch can bootstrap on ppc64le-redhat-linux and survives regression > tests (+ asan bootstrap finishes > successfully). The patch (r241896) introduced an error in the build of the jit: ../../src/gcc/jit/jit-builtins.c:62:1: error: invalid conversion from ‘int’ to ‘gcc::jit::built_in_attribute’ [-fpermissive] }; ^ which seems to be due to the "0" for ATTRS in: --- a/gcc/sanitizer.def +++ b/gcc/sanitizer.def @@ -165,6 +165,10 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_BEFORE_DYNAMIC_INIT, DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_AFTER_DYNAMIC_INIT, "__asan_after_dynamic_init", BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST) +DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_CLOBBER_N, "__asan_poison_stack_memory", + BT_FN_VOID_PTR_PTRMODE, 0) +DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_UNCLOBBER_N, "__asan_unpoison_stack_memory", + BT_FN_VOID_PTR_PTRMODE, 0) Is the attached patch OK as a fix? (assuming testing passes) Or should these builtins have other attrs? (sorry, am not very familiar with the sanitizer code). Dave From 6db5f9e50dc95f504d33970ee553172bbf400ae7 Mon Sep 17 00:00:00 2001 From: David Malcolm Date: Mon, 7 Nov 2016 11:21:20 -0500 Subject: [PATCH] Fix build of jit gcc/ChangeLog: * asan.c (ATTR_NULL): Define. 
* sanitizer.def (BUILT_IN_ASAN_CLOBBER_N): Use ATTR_NULL rather than 0. (BUILT_IN_ASAN_UNCLOBBER_N): Likewise. --- gcc/asan.c| 2 ++ gcc/sanitizer.def | 4 ++-- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/gcc/asan.c b/gcc/asan.c index 1e0ce8d..4a124cb 100644 --- a/gcc/asan.c +++ b/gcc/asan.c @@ -2463,6 +2463,8 @@ initialize_sanitizer_builtins (void) #define BT_FN_I16_CONST_VPTR_INT BT_FN_IX_CONST_VPTR_INT[4] #define BT_FN_I16_VPTR_I16_INT BT_FN_IX_VPTR_IX_INT[4] #define BT_FN_VOID_VPTR_I16_INT BT_FN_VOID_VPTR_IX_INT[4] +#undef ATTR_NULL +#define ATTR_NULL 0 #undef ATTR_NOTHROW_LEAF_LIST #define ATTR_NOTHROW_LEAF_LIST ECF_NOTHROW | ECF_LEAF #undef ATTR_TMPURE_NOTHROW_LEAF_LIST diff --git a/gcc/sanitizer.def b/gcc/sanitizer.def index 1c142e9..596b8b0 100644 --- a/gcc/sanitizer.def +++ b/gcc/sanitizer.def @@ -166,9 +166,9 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_AFTER_DYNAMIC_INIT, "__asan_after_dynamic_init", BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST) DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_CLOBBER_N, "__asan_poison_stack_memory", - BT_FN_VOID_PTR_PTRMODE, 0) + BT_FN_VOID_PTR_PTRMODE, ATTR_NULL) DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_UNCLOBBER_N, "__asan_unpoison_stack_memory", - BT_FN_VOID_PTR_PTRMODE, 0) + BT_FN_VOID_PTR_PTRMODE, ATTR_NULL) /* Thread Sanitizer */ DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_INIT, "__tsan_init", -- 1.8.5.3
Re: [PATCH] Fix DSE not to consider calls as reads from function's body (PR target/77834)
On 11/04/2016 05:35 PM, Jakub Jelinek wrote: 2016-11-04 Jakub Jelinek PR target/77834 * dse.c (dse_step5): Call scan_reads even if just insn_info->frame_read. Improve and fix dump file messages. Sounds reasonable, and I checked and it seems not to change code generation for any .i files from my collection. So, OK. Bernd
Re: Fix build of jit (was Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v3))
On Mon, Nov 07, 2016 at 11:07:13AM -0500, David Malcolm wrote: > The patch (r241896) introduced an error in the build of the jit: > > ../../src/gcc/jit/jit-builtins.c:62:1: error: invalid conversion from > ‘int’ to ‘gcc::jit::built_in_attribute’ [-fpermissive] > }; > ^ > > which seems to be due to the "0" for ATTRS in: > > --- a/gcc/sanitizer.def > +++ b/gcc/sanitizer.def > @@ -165,6 +165,10 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_BEFORE_DYNAMIC_INIT, > DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_AFTER_DYNAMIC_INIT, > "__asan_after_dynamic_init", > BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST) > +DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_CLOBBER_N, "__asan_poison_stack_memory", > + BT_FN_VOID_PTR_PTRMODE, 0) > +DEF_SANITIZER_BUILTIN(BUILT_IN_ASAN_UNCLOBBER_N, > "__asan_unpoison_stack_memory", > + BT_FN_VOID_PTR_PTRMODE, 0) I believe the 0 here is a bug, I'd think we should be using something like the ATTR_TMPURE_NOTHROW_LEAF_LIST that we are using for __asan_load* - the functions aren't going to throw, nor call anything in the current TU. Not 100% sure about the TMPURE, after all they do write/read memory (the shadow one). So maybe ATTR_NOTHROW_LEAF_LIST instead for now? Martin? > Is the attached patch OK as a fix? (assuming testing passes) Or should > these builtins have other attrs? (sorry, am not very familiar with the > sanitizer code). Jakub
[PATCH,testsuite] MIPS: Upgrade to MIPS IV if using (HAS_MOVN) with MIPS III.
Hi, The (HAS_MOVN) option should cause an upgrade to MIPS IV if the target is pre-MIPS IV. However, the upgrade condition checks for "$isa < 3", which means that we won't upgrade if we're targeting MIPS III. This results in failures for the movcc-{1,2,3}.c and branch-cost-2.c tests when the target is MIPS III. This patch fixes the condition to include MIPS III. Tested with mips-mti-elf. Regards, Toma Tabacu gcc/testsuite/ChangeLog: 2016-11-07 Toma Tabacu * gcc.target/mips/mips.exp (mips-dg-options): Upgrade to MIPS IV if using (HAS_MOVN) with MIPS III. diff --git a/gcc/testsuite/gcc.target/mips/mips.exp b/gcc/testsuite/gcc.target/mips/mips.exp index 39f44ff..e22d782 100644 --- a/gcc/testsuite/gcc.target/mips/mips.exp +++ b/gcc/testsuite/gcc.target/mips/mips.exp @@ -1129,7 +1129,7 @@ proc mips-dg-options { args } { # We need MIPS IV or higher for: # # - } elseif { $isa < 3 + } elseif { $isa < 4 && [mips_have_test_option_p options "HAS_MOVN"] } { mips_make_test_option options "-mips4" # We need MIPS III or higher for:
Re: [PATCH] Make direct emission of time profiler counter
On 7 November 2016 at 09:58, Martin Liška wrote: > On 11/05/2016 09:38 AM, Jan Hubicka wrote: >> Looks OK if it passes. >> >> Honza > > Thanks, fixed on trunk as r241894. > Martin Thanks, this fixed the problems I reported. Christophe
Re: [PATCH 0/2] strncmp builtin expansion improvement
On Mon, 2016-11-07 at 15:26 +0100, Richard Biener wrote: > Your patchset doesn't contain a testcase so I really wonder which > case > we know the string length but it is not constant. > > Yes, there's COND_EXPR handling in c_strlen but that should be mostly > dead code -- the real code should be using get_maxval_strlen or > get_range_strlen but c_strlen does not use those. > > Ideally the str optabs would get profile data and alignment similar > to > the mem ones. > > Care to share a testcase? I think I haven't explained this well. The case I am interested in is where the string arguments are indeed of unknown length, but the length argument to strncmp is a constant. This is the case that I'm attempting to address with this patch series. This is from the strncmp-1.c test case, but modified for a constant length argument to strncmp. #include #include #include void test (const unsigned char *s1, const unsigned char *s2, int expected) { register int value = strncmp ((char *) s1, (char *) s2, 5); if (expected < 0 && value >= 0) abort (); else if (expected == 0 && value != 0) abort (); else if (expected > 0 && value <= 0) abort (); } I added this small bit to builtins.c so we can see what happens: Index: gcc/builtins.c === --- gcc/builtins.c (revision 241911) +++ gcc/builtins.c (working copy) @@ -67,6 +67,7 @@ #include "internal-fn.h" #include "case-cfn-macros.h" #include "gimple-fold.h" +#include "print-tree.h" struct target_builtins default_target_builtins; @@ -3932,6 +3933,9 @@ len1 = c_strlen (arg1, 1); len2 = c_strlen (arg2, 1); +printf("len1 = %p len2 = %p\n",(void*)len1,(void*)len2); +debug_tree(arg3); + if (len1) len1 = size_binop_loc (loc, PLUS_EXPR, ssize_int (1), len1); if (len2) The output then is as follows: build/gcc/xgcc -B build/gcc -S -O1 strncmp-test.c len1 = (nil) len2 = (nil) constant 5> Looking in the .s file you can see that strncmp was not expanded. 
However the current code in i386.md for cmpstrnsi does not handle the case where the 0 byte in both strings may occur before the length given to strncmp. test: .LFB22: .cfi_startproc pushq %rbx .cfi_def_cfa_offset 16 .cfi_offset 3, -16 movl %edx, %ebx movl $5, %edx call strncmp movl %ebx, %edx I think it's pretty clear from the code in expand_builtin_strncmp that if len1 and len2 are both NULL, you end up with len=len2 and then it returns NULL_RTX. Thanks, Aaron -- Aaron Sawdey, Ph.D. acsaw...@linux.vnet.ibm.com 050-2/C113 (507) 253-7520 home: 507/263-0782 IBM Linux Technology Center - PPC Toolchain
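To make the case under discussion concrete, here is a minimal sketch of what an inline expansion of strncmp with a constant length of 5 has to compute when both strings are of unknown length. It is plain C written for illustration, not the output of any cmpstrnsi pattern, and strncmp_len5 is a made-up name.

```c
/* Illustration only: the semantics an expansion of
   strncmp (s1, s2, 5) must preserve.  The loop bound is a
   compile-time constant, but the comparison still has to stop at the
   first NUL byte, which may come before the fifth character.  */
int
strncmp_len5 (const char *s1, const char *s2)
{
  for (int i = 0; i < 5; i++)
    {
      unsigned char c1 = (unsigned char) s1[i];
      unsigned char c2 = (unsigned char) s2[i];
      if (c1 != c2)
        return c1 < c2 ? -1 : 1;
      /* Both bytes are equal; if they are NUL the strings match and
         nothing past this point may be read.  */
      if (c1 == '\0')
        return 0;
    }
  return 0;
}
```

The early exit on NUL is the part a target pattern has to get right: bytes after a shared terminator must neither be compared nor influence the result.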
Re: [Patch, rtl] PR middle-end/78016, keep REG_NOTE order during insn copy
On 11/03/2016 03:00 PM, Eric Botcazou wrote: FWIW here's a more complete version of my patch which I'm currently testing. Let me know if you think it's at least a good enough intermediate step to be installed. It is, thanks. Testing showed the same issue as Jiong found, so I've committed it with that extra tweak. Bernd
Re: [Patch, rtl] PR middle-end/78016, keep REG_NOTE order during insn copy
On 07/11/16 17:04, Bernd Schmidt wrote: On 11/03/2016 03:00 PM, Eric Botcazou wrote: FWIW here's a more complete version of my patch which I'm currently testing. Let me know if you think it's at least a good enough intermediate step to be installed. It is, thanks. Testing showed the same issue as Jiong found, so I've committed it with that extra tweak. Thanks very much! I have closed PR middle-end/78016 Regards, Jiong
Re: [match.pd] Fix for PR35691
On 7 November 2016 at 15:43, Richard Biener wrote: > On Fri, 4 Nov 2016, Prathamesh Kulkarni wrote: > >> On 4 November 2016 at 13:41, Richard Biener wrote: >> > On Thu, 3 Nov 2016, Marc Glisse wrote: >> > >> >> On Thu, 3 Nov 2016, Richard Biener wrote: >> >> >> >> > > > > The transform would also work for vectors (element_precision for >> >> > > > > the test but also a value-matching zero which should ensure the >> >> > > > > same number of elements). >> >> > > > Um sorry, I didn't get how to check vectors to be of equal length >> >> > > > by a >> >> > > > matching zero. >> >> > > > Could you please elaborate on that ? >> >> > > >> >> > > He may have meant something like: >> >> > > >> >> > > (op (cmp @0 integer_zerop@2) (cmp @1 @2)) >> >> > >> >> > I meant with one being @@2 to allow signed vs. Unsigned @0/@1 which was >> >> > the >> >> > point of the pattern. >> >> >> >> Oups, that's what I had written first, and then I somehow managed to >> >> confuse >> >> myself enough to remove it so as to remove the call to types_match :-( >> >> >> >> > > So the last operand is checked with operand_equal_p instead of >> >> > > integer_zerop. But the fact that we could compute bit_ior on the >> >> > > comparison results should already imply that the number of elements >> >> > > is the >> >> > > same. >> >> > >> >> > Though for equality compares we also allow scalar results IIRC. >> >> >> >> Oh, right, I keep forgetting that :-( And I have no idea how to generate >> >> one >> >> for a testcase, at least until the GIMPLE FE lands... >> >> >> >> > > On platforms that have IOR on floats (at least x86 with SSE, maybe >> >> > > some >> >> > > vector mode on s390?), it would be cool to do the same for floats >> >> > > (most >> >> > > likely at the RTL level). >> >> > >> >> > On GIMPLE view-converts could come to the rescue here as well. Or we >> >> > cab >> >> > just allow bit-and/or on floats as much as we allow them on pointers. 
>> >> >> >> Would that generate sensible code on targets that do not have logic insns >> >> for >> >> floats? Actually, even on x86_64 that generates inefficient code, so there >> >> would be some work (for instance grep finds no gen_iordf3, only >> >> gen_iorv2df3). >> >> >> >> I am also a bit wary of doing those obfuscating optimizations too early... >> >> a==0 is something that other optimizations might use. long >> >> c=(long&)a|(long&)b; (double&)c==0; less so... >> >> >> >> (and I am assuming that signaling NaNs don't make the whole transformation >> >> impossible, which might be wrong) >> > >> > Yeah. I also think it's not so much important - I just wanted to mention >> > vectors... >> > >> > Btw, I still think we need a more sensible infrastructure for passes >> > to gather, analyze and modify complex conditions. (I'm always pointing >> > to tree-affine.c as an, albeit not very good, example for handling >> > a similar problem) >> Thanks for mentioning the value-matching capture @@, I wasn't aware of >> this match.pd feature. >> The current patch keeps it restricted to only bitwise operators on integers. >> Bootstrap+test running on x86_64-unknown-linux-gnu. >> OK to commit if passes ? > > +/* PR35691: Transform > + (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0. > + (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0. */ > + > > Please omit the vertical space > > +(for bitop (bit_and bit_ior) > + cmp (eq ne) > + (simplify > + (bitop (cmp @0 integer_zerop) (cmp @1 integer_zerop)) > > if you capture the first integer_zerop as @2 then you can re-use it... > > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) > + && INTEGRAL_TYPE_P (TREE_TYPE (@1)) > + && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE > (@1))) > +(cmp (bit_ior @0 (convert @1)) { build_zero_cst (TREE_TYPE (@0)); > > ... here inplace of the { build_zero_cst ... }. > > Ok with that changes. Thanks, committed the attached version as r241915. > > Richard. 
2016-11-07 Prathamesh Kulkarni PR middle-end/35691 * match.pd: Add following two patterns: (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0. (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0. testsuite/ * gcc.dg/pr35691-1.c: New test-case. * gcc.dg/pr35691-4.c: Likewise. diff --git a/gcc/match.pd b/gcc/match.pd index 48f7351..29ddcd8 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -519,6 +519,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (TYPE_UNSIGNED (type)) (bit_and @0 (bit_not (lshift { build_all_ones_cst (type); } @1) +/* PR35691: Transform + (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0. + (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0. */ +(for bitop (bit_and bit_ior) + cmp (eq ne) + (simplify + (bitop (cmp @0 integer_zerop@2) (cmp @1 integer_zerop)) + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) + && INTEGRAL_TYPE_P (TREE_TYPE (@1)) + && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))) +(cmp (bit_ior @0 (convert @1)) @2 + /* Fold (A
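For readers who want to see the PR35691 identity outside of match.pd syntax, here is a minimal C sketch. It assumes two operands of equal precision but different signedness; the function names are made up for illustration and are not part of the patch. It shows that OR-ing the operands and comparing once against zero gives the same answer as the two separate comparisons.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative only: these helpers are hypothetical, not GCC code.  */

/* Original form: x == 0 & y == 0.  */
bool both_zero_naive (uint32_t x, int32_t y)
{
  return (x == 0) & (y == 0);
}

/* Folded form: (x | typeof(x)(y)) == 0.  */
bool both_zero_folded (uint32_t x, int32_t y)
{
  return (x | (uint32_t) y) == 0;
}

/* Original form: x != 0 | y != 0.  */
bool any_nonzero_naive (uint32_t x, int32_t y)
{
  return (x != 0) | (y != 0);
}

/* Folded form: (x | typeof(x)(y)) != 0.  */
bool any_nonzero_folded (uint32_t x, int32_t y)
{
  return (x | (uint32_t) y) != 0;
}
```

The cast mirrors the (convert @1) in the pattern. It preserves the zero test only because both types have the same precision, which is what the TYPE_PRECISION check in the pattern guards.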
Re: [match.pd] Fix for PR35691
On 7 November 2016 at 23:06, Prathamesh Kulkarni wrote: > On 7 November 2016 at 15:43, Richard Biener wrote: >> On Fri, 4 Nov 2016, Prathamesh Kulkarni wrote: >> >>> On 4 November 2016 at 13:41, Richard Biener wrote: >>> > On Thu, 3 Nov 2016, Marc Glisse wrote: >>> > >>> >> On Thu, 3 Nov 2016, Richard Biener wrote: >>> >> >>> >> > > > > The transform would also work for vectors (element_precision for >>> >> > > > > the test but also a value-matching zero which should ensure the >>> >> > > > > same number of elements). >>> >> > > > Um sorry, I didn't get how to check vectors to be of equal length >>> >> > > > by a >>> >> > > > matching zero. >>> >> > > > Could you please elaborate on that ? >>> >> > > >>> >> > > He may have meant something like: >>> >> > > >>> >> > > (op (cmp @0 integer_zerop@2) (cmp @1 @2)) >>> >> > >>> >> > I meant with one being @@2 to allow signed vs. Unsigned @0/@1 which >>> >> > was the >>> >> > point of the pattern. >>> >> >>> >> Oups, that's what I had written first, and then I somehow managed to >>> >> confuse >>> >> myself enough to remove it so as to remove the call to types_match :-( >>> >> >>> >> > > So the last operand is checked with operand_equal_p instead of >>> >> > > integer_zerop. But the fact that we could compute bit_ior on the >>> >> > > comparison results should already imply that the number of elements >>> >> > > is the >>> >> > > same. >>> >> > >>> >> > Though for equality compares we also allow scalar results IIRC. >>> >> >>> >> Oh, right, I keep forgetting that :-( And I have no idea how to generate >>> >> one >>> >> for a testcase, at least until the GIMPLE FE lands... >>> >> >>> >> > > On platforms that have IOR on floats (at least x86 with SSE, maybe >>> >> > > some >>> >> > > vector mode on s390?), it would be cool to do the same for floats >>> >> > > (most >>> >> > > likely at the RTL level). >>> >> > >>> >> > On GIMPLE view-converts could come to the rescue here as well. 
Or we >>> >> > cab >>> >> > just allow bit-and/or on floats as much as we allow them on pointers. >>> >> >>> >> Would that generate sensible code on targets that do not have logic >>> >> insns for >>> >> floats? Actually, even on x86_64 that generates inefficient code, so >>> >> there >>> >> would be some work (for instance grep finds no gen_iordf3, only >>> >> gen_iorv2df3). >>> >> >>> >> I am also a bit wary of doing those obfuscating optimizations too >>> >> early... >>> >> a==0 is something that other optimizations might use. long >>> >> c=(long&)a|(long&)b; (double&)c==0; less so... >>> >> >>> >> (and I am assuming that signaling NaNs don't make the whole >>> >> transformation >>> >> impossible, which might be wrong) >>> > >>> > Yeah. I also think it's not so much important - I just wanted to mention >>> > vectors... >>> > >>> > Btw, I still think we need a more sensible infrastructure for passes >>> > to gather, analyze and modify complex conditions. (I'm always pointing >>> > to tree-affine.c as an, albeit not very good, example for handling >>> > a similar problem) >>> Thanks for mentioning the value-matching capture @@, I wasn't aware of >>> this match.pd feature. >>> The current patch keeps it restricted to only bitwise operators on integers. >>> Bootstrap+test running on x86_64-unknown-linux-gnu. >>> OK to commit if passes ? >> >> +/* PR35691: Transform >> + (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0. >> + (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0. */ >> + >> >> Please omit the vertical space >> >> +(for bitop (bit_and bit_ior) >> + cmp (eq ne) >> + (simplify >> + (bitop (cmp @0 integer_zerop) (cmp @1 integer_zerop)) >> >> if you capture the first integer_zerop as @2 then you can re-use it... >> >> + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) >> + && INTEGRAL_TYPE_P (TREE_TYPE (@1)) >> + && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE >> (@1))) >> +(cmp (bit_ior @0 (convert @1)) { build_zero_cst (TREE_TYPE (@0)); >> >> ... 
here inplace of the { build_zero_cst ... }. >> >> Ok with that changes. > Thanks, committed the attached version as r241915. Ugh, the svn commit message has: testsuite/ * gcc.dg/pr35691-1.c: New test-case. * gcc.dg/pr35691-4.c: Likewise. pr35691-4.c was a typo; it should be pr35691-2.c :/ However, testsuite/ChangeLog correctly has the entry for pr35691-2.c. Is it possible to edit the commit message for r241915? Sorry about this. Regards, Prathamesh > >> >> Richard.
Re: [PATCH fix PR71767 2/4 : Darwin configury] Arrange for ld64 to be detected as Darwin's linker
On Nov 6, 2016, at 11:39 AM, Iain Sandoe wrote: > This is an initial patch in a series that converts Darwin's configury to > detect ld64 features, rather than the current process of hard-coding them on > target system version. So, I really do hate to ask, but does this have to be a config option? Normally, we'd just have configure examine things by itself. For Canadian crosses, there should be enough state present to key off of directly, especially if they are wired up to work. I'd rather have the thing that doesn't just work without that config flag, just work. I'd like to think I can figure out how to make it just work, if given an idea of what doesn't actually work. Essentially, you do the operation that doesn't work, detect that it failed to work, and then you know it didn't work.
Re: [PATCH][GCC/TESTSUITE] Make test for traditional-cpp depend on
On Nov 1, 2016, at 8:46 AM, Tamar Christina wrote: > > A glibc update recently broke this test by adding a CPP > macro that uses the ## string function which traditional-cpp > does not support. > The change in glibc that made the test fail is from > 6962682ffe5e5f0373047a0b894fee7a774be254. > > This fixes (PR78136) by changing the test to use a local > include file instead of one from glibc. > The intention of the test is to test that traditional-cpp does > not expand values inside <> blocks of #includes. > As such the include has to be included via <> syntax. To do this > the .exp has been modified to add the test directory to the > Include search path. > > Ran regression tests on aarch64-none-linux-gnu. > > Ok for trunk? Ok. Can you remove the comment: Newlib uses ## when including stdlib.h as of 2007-09-07. while you are at it? I think it doesn't make any sense post the change unless one reads history. > 2016-10-31 Tamar Christina > > PR testsuite/78136 > * gcc.dg/cpp/trad/trad.exp > (dg-runtest): Added $srcdir/$subdir/ to Include dirs. > * gcc.dg/cpp/trad/include.c: Use local header > file.
Re: [PATCH fix PR71767 2/4 : Darwin configury] Arrange for ld64 to be detected as Darwin's linker
> On 7 Nov 2016, at 09:51, Mike Stump wrote: > > [ possible dup ] > >> Begin forwarded message: >> >> From: Mike Stump >> Subject: Re: [PATCH fix PR71767 2/4 : Darwin configury] Arrange for ld64 to >> be detected as Darwin's linker >> Date: November 7, 2016 at 9:48:53 AM PST >> To: Iain Sandoe >> Cc: GCC Patches , Jeff Law >> >> On Nov 6, 2016, at 11:39 AM, Iain Sandoe wrote: >>> This is an initial patch in a series that converts Darwin's configury to >>> detect ld64 features, rather than the current process of hard-coding them >>> on target system version. >> >> So, I really do hate to ask, but does this have to be a config option? >> Normally, we'd just have configure examine things by itself. For canadian >> crosses, there should be enough state present to key off of directly, >> specially if they are wired up to work. >> >> I've rather have the thing that doesn't just work without that config flag, >> just work. I'd like to think I can figure how how to make it just work, if >> given an idea of what doesn't actually work. >> >> Essentially, you do the operation that doesn't work, detect it failed to >> work, then the you know it didn't work. Well, if you can run the tool, that’s fine - I wanted to cover the case where we have a native or Canadian cross that’s using a newer ld64 than is installed by the ‘last available xcode’ on a given platform - which is the common case (since the older versions of ld64 in particular don’t really support the features we want; they definitely won’t support building LLVM, for example). I am *really really* trying to get away from the assumption that darwinNN implies some ld64 capability - because that’s just wrong, really - it makes way too many assumptions. I also want to get to the “end game” where we just configure *-*-darwin and use the cross-capability of the toolchain (we’re a ways away from that upstream, but my local patch set achieves it at least for 5.4 and 6.2).
It’s true that adding configure options is not the #1 choice in life - but I think darwin is getting to the stage where there are too many choices to cover without them. Open to alternate suggestions, of course. Iain
Re: [PATCH fix PR71767 1/4 : ld64 atoms] Make PIC indirections and constant labels linker-visible.
On Nov 6, 2016, at 11:37 AM, Iain Sandoe wrote: > OK for trunk? > OK for open branches? Ok. > 2016-11-06 Iain Sandoe > > PR target/71767 > * config/darwin.c (imachopic_indirection_name): Make data section > indirections > linker-visible. > * config/darwin.h (ASM_GENERATE_INTERNAL_LABEL): Make local constant > labels linker-visible.
Re: [PATCH fix PR71767 4/4 : testsuite] Fix testsuite fallout from section and linker sym visibility changes.
On Nov 6, 2016, at 11:41 AM, Iain Sandoe wrote: > OK for trunk (after the relevant patches are applied)? > OK for open branches (likewise)? Ok. > PR target/71767 > > * g++.dg/abi/key2.C: Adjust for changed Darwin sections and > linker-visible symbols. > * g++.dg/torture/darwin-cfstring-3.C: Likewise. > * gcc.dg/const-uniq-1.c: Likewise. > * gcc.dg/torture/darwin-cfstring-3.c: Likewise. > * gcc.target/i386/pr70799-1.c: Likewise.
Re: [PATCH fix PR71767 3/4 : Darwin sections] Fix PR71767 - adjust the sections used in response to ld64 version.
On Nov 6, 2016, at 11:40 AM, Iain Sandoe wrote: > > OK for trunk? > OK for open branches? Ok. > 2016-11-06 Iain Sandoe > > PR target/71767 > * config/darwin-sections.def (picbase_thunk_section): New. > * config/darwin.c (darwin_init_sections): Set up picbase thunk section. > (darwin_rodata_section, darwin_objc2_section, machopic_select_section, > darwin_asm_declare_constant_name, darwin_emit_weak_or_comdat, > darwin_function_section): Don’t use coalesced with newer linkers. > (darwin_override_options): Decide on usage of coalesed sections on the > basis of the target linker version. > * config/darwin.h (MIN_LD64_NO_COAL_SECTS): New. > * config/darwin.opt (mtarget-linker): New. > * config/i386/i386.c (ix86_code_end): Do not force the thunks into a > coalesced > section, instead use a thunks section.
Re: [PATCH, Darwin] fix for PR67710 : Update 'as' specs and inputs to handle newer assembler versions.
On Nov 6, 2016, at 12:53 PM, Iain Sandoe wrote: > OK for trunk? > OK for open branches? Ok. > 2016-11-06 Iain Sandoe > Rainer Orth > > target/PR67710 > * config.in: Regenerate > * config/darwin-driver.c (darwin_driver_init): Emit a version string > for the assembler. > * config/darwin.h(ASM_MMACOSX_VERSION_MIN_SPEC): New, new tests. > * config/darwin.opt(asm_macosx_version_min): New. > * config/i386/darwin.h: Handle ASM_MMACOSX_VERSION_MIN_SPEC. > * configure: Regenerate > * configure.ac: Check for mmacosx-version-min handling. > > gcc/testsuite/ > > 2016-11-06 Iain Sandoe > Rainer Orth > > target/PR67710 > * gcc.dg/darwin-minversion-1.c: Update min version check. > * gcc.dg/darwin-minversion-2.c: Likewise. > * gcc.dg/darwin-minversion-3.c: Likewise. > > libgcc/ > > 2016-11-06 Iain Sandoe > Rainer Orth > > target/PR67710 > * libgcc/config/t-darwin: Default builds to 10.5 codegen.
Re: [PATCH, Darwin] Fix PR57438 by avoiding empty function bodies and trailing labels.
On Nov 6, 2016, at 12:13 PM, Iain Sandoe wrote: > > OK for trunk? > OK for open branches? For the darwin parts, Ok. > 2016-11-06 Iain Sandoe > > PR target/57438 > * config/i386/i386.c (ix86_code_end): Note that we emitted code where > the > function might otherwise appear empty for picbase thunks. > (ix86_output_function_epilogue): If we find a zero-sized function > assume that > reaching it is UB and trap. If we find a trailing label append a nop. > * config/rs6000/rs6000.c (rs6000_output_function_epilogue): If we find > a zero-sized function assume that reaching it is UB and trap. If we > find a > trailing label, append a nop. > > gcc/testsuite/ > > 2016-11-06 Iain Sandoe > > PR target/57438 > * gcc.dg/pr57438-1.c: New. > * gcc.dg/pr57438-2.c: New.
Re: [PATCH fix PR71767 2/4 : Darwin configury] Arrange for ld64 to be detected as Darwin's linker
On Sun, 6 Nov 2016, Iain Sandoe wrote: > This adds an option --with-ld64[=version] that allows the configurer to New configure options should be documented in install.texi. -- Joseph S. Myers jos...@codesourcery.com
[hsa-branch] Append UID to local variable names
Hi, when looking at stuff to merge to trunk, I found out that this patch has slipped through the cracks. It adds the UID to names of private symbols so that variables with the same name but different scope, particularly OpenMP re-mapped ones, do not clash. Committed to the hsa branch, will include it in the merge to trunk too. Thanks, Martin 2016-11-07 Martin Jambor * hsa-gen.c (hsa_get_declaration_name): Append UID to local variable names. --- gcc/hsa-gen.c | 19 --- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c index b6e8345..f138434 100644 --- a/gcc/hsa-gen.c +++ b/gcc/hsa-gen.c @@ -781,7 +781,8 @@ hsa_needs_cvt (BrigType16_t dtype, BrigType16_t stype) return false; } -/* Return declaration name if exists. */ +/* Return declaration name if it exists or create one from UID if it does not. + If DECL is a local variable, make UID part of its name. */ const char * hsa_get_declaration_name (tree decl) @@ -789,7 +790,7 @@ hsa_get_declaration_name (tree decl) if (!DECL_NAME (decl)) { char buf[64]; - snprintf (buf, 64, "__hsa_anon_%i", DECL_UID (decl)); + snprintf (buf, 64, "__hsa_anon_%u", DECL_UID (decl)); size_t len = strlen (buf); char *copy = (char *) obstack_alloc (&hsa_obstack, len + 1); memcpy (copy, buf, len + 1); @@ -808,7 +809,19 @@ hsa_get_declaration_name (tree decl) if (name[0] == '*') name++; - return name; + if ((TREE_CODE (decl) == VAR_DECL) + && decl_function_context (decl)) +{ + size_t len = strlen (name); + char *buf = (char *) alloca (len + 32); + snprintf (buf, len + 32, "%s_%u", name, DECL_UID (decl)); + len = strlen (buf); + char *copy = (char *) obstack_alloc (&hsa_obstack, len + 1); + memcpy (copy, buf, len + 1); + return copy; +} + else +return name; } /* Lookup or create the associated hsa_symbol structure with a given VAR_DECL -- 2.10.1
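The naming scheme in this patch can be illustrated with a small standalone C sketch. It assumes plain malloc in place of GCC's obstack allocation, and make_local_name is a hypothetical helper, not a function in hsa-gen.c. It only mirrors the "%s_%u" formatting and the skipping of a leading '*' seen in the diff.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the local-variable path of
   hsa_get_declaration_name: append the UID to NAME so that two
   locals with the same source name do not clash.  The caller owns
   the returned buffer (the real code uses an obstack instead).  */
char *make_local_name (const char *name, unsigned int uid)
{
  /* Skip the leading '*' marker, as the original function does.  */
  if (name[0] == '*')
    name++;

  /* 32 extra bytes comfortably cover '_', a 32-bit UID and the NUL.  */
  size_t size = strlen (name) + 32;
  char *buf = malloc (size);
  if (buf == NULL)
    return NULL;

  snprintf (buf, size, "%s_%u", name, uid);
  return buf;
}
```

Two variables both called i but with different UIDs then map to distinct symbol names, which is the clash the patch is meant to avoid.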
[hsa-branch] Remove superfluous lastprivate check
Hi, this is another simple cleanup that I forgot to commit, which just removes a lastprivate check (which hsa now can handle) at a place where it cannot ever be anyway. Committed to the hsa branch, will include it in the pile of OpenMP stuff to request to merge to trunk later this week. Thanks, Martin 2016-11-07 Martin Jambor * omp-low.c (grid_target_follows_gridifiable_pattern): Do not check for lastprivate clause on teams construct. --- gcc/omp-low.c | 7 --- 1 file changed, 7 deletions(-) diff --git a/gcc/omp-low.c b/gcc/omp-low.c index ac87a91..65b0ddc 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -17972,13 +17972,6 @@ grid_target_follows_gridifiable_pattern (gomp_target *target, grid_prop *grid) "clause is present\n "); return false; - case OMP_CLAUSE_LASTPRIVATE: - if (dump_enabled_p ()) - dump_printf_loc (MSG_MISSED_OPTIMIZATION, tloc, -GRID_MISSED_MSG_PREFIX "a lastprivate " -"clause is present\n "); - return false; - case OMP_CLAUSE_THREAD_LIMIT: if (!integer_zerop (OMP_CLAUSE_OPERAND (clauses, 0))) group_size = OMP_CLAUSE_OPERAND (clauses, 0); -- 2.10.1
Re: [PATCH fix PR71767 2/4 : Darwin configury] Arrange for ld64 to be detected as Darwin's linker
On 11/07/2016 10:48 AM, Mike Stump wrote: On Nov 6, 2016, at 11:39 AM, Iain Sandoe wrote: This is an initial patch in a series that converts Darwin's configury to detect ld64 features, rather than the current process of hard-coding them on target system version. So, I really do hate to ask, but does this have to be a config option? Normally, we'd just have configure examine things by itself. For canadian crosses, there should be enough state present to key off of directly, specially if they are wired up to work. I've rather have the thing that doesn't just work without that config flag, just work. I'd like to think I can figure how how to make it just work, if given an idea of what doesn't actually work. Essentially, you do the operation that doesn't work, detect it failed to work, then the you know it didn't work. But how is that supposed to work in a cross environment when he can't directly query the linker's behavior? In an ideal world we could trivially query the linker's behavior prior to invocation. But we don't have that kind of infrastructure in place. ISTM the way to go is to have a configure test to try and DTRT automatically for native builds and a flag to set for crosses (or potentially override the configure test). Jeff
[PATCH] A special predicate for type size equality
Hi, this has been on my TODO list for at least two years, probably longer, although I no longer remember why I added it there. The idea is to introduce a special wrapper around operand_equal_p for TYPE_SIZE comparisons, which would try simple pointer equality before calling the more complex operand_equal_p (TYPE_SIZE (t1), TYPE_SIZE (t2), 0), because when equal, the sizes are most likely going to be the same tree anyway. All users also test whether the TYPE_SIZEs are NULL, most of them to test for known size equality, but unfortunately there is one (ODR warning) that tests for known inequality. Nevertheless, the former use case seems so much more natural that I have outlined it into the new predicate as well. I am no longer sure whether it is a scenario that happens often enough to justify a wrapper, but I'd like to propose it anyway, at least to remove it from the TODO list as a not-so-good-idea-after-all :-) Bootstrapped and tested on x86_64-linux. Is it a good idea? OK for trunk? Thanks, Martin 2016-11-03 Martin Jambor * fold-const.c (type_sizes_equal_p): New function. * fold-const.h (type_sizes_equal_p): Declare. * ipa-devirt.c (odr_types_equivalent_p): Use it. * ipa-polymorphic-call.c (meet_with): Likewise. * tree-ssa-alias.c (stmt_kills_ref_p): Likewise. --- gcc/fold-const.c | 19 +++ gcc/fold-const.h | 1 + gcc/ipa-devirt.c | 2 +- gcc/ipa-polymorphic-call.c | 10 ++ gcc/tree-ssa-alias.c | 7 +-- 5 files changed, 24 insertions(+), 15 deletions(-) diff --git a/gcc/fold-const.c b/gcc/fold-const.c index 603aff0..ab77b8d 100644 --- a/gcc/fold-const.c +++ b/gcc/fold-const.c @@ -3342,6 +3342,25 @@ operand_equal_for_comparison_p (tree arg0, tree arg1, tree other) return 0; } + +/* Given two types, return true if both have a non-NULL TYPE_SIZE and these + sizes have the same value. 
*/ + +bool +type_sizes_equal_p (const_tree t1, const_tree t2) +{ + gcc_checking_assert (TYPE_P (t1)); + gcc_checking_assert (TYPE_P (t2)); + t1 = TYPE_SIZE (t1); + t2 = TYPE_SIZE (t2); + + if (!t1 || !t2) +return false; + else if (t1 == t2) +return true; + else +return operand_equal_p (t1, t2, 0); +} /* See if ARG is an expression that is either a comparison or is performing arithmetic on comparisons. The comparisons must only be comparing diff --git a/gcc/fold-const.h b/gcc/fold-const.h index ae37142..014ca34 100644 --- a/gcc/fold-const.h +++ b/gcc/fold-const.h @@ -89,6 +89,7 @@ extern void fold_undefer_and_ignore_overflow_warnings (void); extern bool fold_deferring_overflow_warnings_p (void); extern void fold_overflow_warning (const char*, enum warn_strict_overflow_code); extern int operand_equal_p (const_tree, const_tree, unsigned int); +extern bool type_sizes_equal_p (const_tree, const_tree); extern int multiple_of_p (tree, const_tree, const_tree); #define omit_one_operand(T1,T2,T3)\ omit_one_operand_loc (UNKNOWN_LOCATION, T1, T2, T3) diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c index 49e2195..d2db6f2 100644 --- a/gcc/ipa-devirt.c +++ b/gcc/ipa-devirt.c @@ -1671,7 +1671,7 @@ odr_types_equivalent_p (tree t1, tree t2, bool warn, bool *warned, /* Those are better to come last as they are utterly uninformative. 
*/ if (TYPE_SIZE (t1) && TYPE_SIZE (t2) - && !operand_equal_p (TYPE_SIZE (t1), TYPE_SIZE (t2), 0)) + && !type_sizes_equal_p (t1, t2)) { warn_odr (t1, t2, NULL, NULL, warn, warned, G_("a type with different size " diff --git a/gcc/ipa-polymorphic-call.c b/gcc/ipa-polymorphic-call.c index 8d9f22a..b66fd76 100644 --- a/gcc/ipa-polymorphic-call.c +++ b/gcc/ipa-polymorphic-call.c @@ -2454,10 +2454,7 @@ ipa_polymorphic_call_context::meet_with (ipa_polymorphic_call_context ctx, if (!dynamic && (ctx.dynamic || (!otr_type - && (!TYPE_SIZE (ctx.outer_type) - || !TYPE_SIZE (outer_type) - || !operand_equal_p (TYPE_SIZE (ctx.outer_type), - TYPE_SIZE (outer_type), 0) + && (!type_sizes_equal_p (ctx.outer_type, outer_type) { dynamic = true; updated = true; @@ -2472,10 +2469,7 @@ ipa_polymorphic_call_context::meet_with (ipa_polymorphic_call_context ctx, if (!dynamic && (ctx.dynamic || (!otr_type - && (!TYPE_SIZE (ctx.outer_type) - || !TYPE_SIZE (outer_type) - || !operand_equal_p (TYPE_SIZE (ctx.outer_type), - TYPE_SIZE (outer_type), 0) + && (!type_sizes_equal_p (ctx.outer_type, outer_type) dynamic = true; outer_type = ctx.outer_type; offset = ctx.offset; diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c index ebae6cf..98cd1d7 100644 --- a/gcc/tree-ssa-alias.c +++ b/gcc/tr
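The shape of the proposed predicate (NULL check, then pointer identity, then the expensive structural comparison) can be sketched in standalone C. Everything here is an assumption for illustration: struct size_expr stands in for a TYPE_SIZE tree, and deep_equal stands in for operand_equal_p; neither is GCC code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a TYPE_SIZE expression tree.  */
struct size_expr
{
  long value;
};

/* Hypothetical stand-in for operand_equal_p: the slow structural
   comparison that we would like to avoid when possible.  */
bool deep_equal (const struct size_expr *a, const struct size_expr *b)
{
  return a->value == b->value;
}

/* Return true only if both sizes are known (non-NULL) and equal.  */
bool sizes_equal_p (const struct size_expr *s1, const struct size_expr *s2)
{
  if (s1 == NULL || s2 == NULL)
    return false;               /* Unknown size: not known to be equal.  */
  if (s1 == s2)
    return true;                /* Fast path: same node.  */
  return deep_equal (s1, s2);   /* Slow path.  */
}
```

Note that a false result covers both "known to differ" and "size unknown", which is why the ODR caller in the diff keeps its explicit TYPE_SIZE checks: it wants known inequality, not merely the absence of known equality.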
Re: [PATCH v2] aarch64: Add split-stack initial support
On 14/10/2016 15:59, Wilco Dijkstra wrote: > Hi, > Thanks for the thoughtful review and sorry for the late response. >> Split-stack prologue on function entry is as follow (this goes before the >> usual function prologue): > >> mrs x9, tpidr_el0 >> mov x10, - > > As Jiong already remarked, the nop won't work. Do we know the maximum > adjustment > that the linker is allowed to make? If so, and we can limit the adjustment to > 16MB in > most cases, emitting 2 subtracts is best. Larger offset need mov/movk/sub but > that > should be extremely rare. There is no limit, afaik, in gold's split-stack allocation handling, and I think one could be added for each backend (in the method override required to implement it). In fact, it is not really required to tie the nop generation to the instruction generated by 'aarch64_internal_mov_immediate'; it is just a matter of simplifying the linker code. And although adjustments above 16MB should be rare, the nilptr2.go test allocates 134217824, so this test fails with this low stack limit. I am not sure how heavy stack usage is in 'go', but I think we should at least support the current testcase scenario. So for the current iteration I kept my current approach, but I am open to suggestions. > >> nop/movk > >> add x10, sp, x10 >> ldr x9, [x9, 16] > > Is there any need to detect underflow of x10 or is there a guarantee that > stacks are > never allocated in the low 2GB (given the maximum adjustment is 2GB)? It's > safe > to do a signed comparison. I do not think so; at least none of the current backends that implement split stack do so. > >> cmp x10, x9 >> b.cs enough > > Why save/restore x30 and the call x30+8 trick when we could pass the > continuation address and use a tailcall? That also avoids emitting extra > unwind info. > >> stp x30, [sp, -16] >> bl __morestack >> ldp x30, [sp], 16 >> ret > > This part doesn't make any sense - both x28 and carry flag as an input, and > spread > across the prolog - why??? 
> >> enough: >> mov x10, sp > [prolog] >> b.cs continue >> mov x10, x28 > continue: > [rest of function] > > Why not do this? > > function: > mrs x9, tpidr_el0 > sub x10, sp, N & 0xfff000 > sub x10, x10, N & 0xfff > ldr x9, [x9, 16] > adr x12, main_fn_entry > mov x11, sp [if function has stacked arguments] > cmp x10, x9 > b.ge main_fn_entry > b __morestack > main_fn_entry: [x11 is argument pointer] > [prolog] > [rest of function] > > In __morestack you need to save x8 as well (another argument register!) and > x12 (the > continuation address). After returning from the call x8 doesn't need to be > preserved. Indeed, this strategy is way better, and I adjusted the code to follow it. The only change is that I am using: [...] cmp x9, x10 b.lt main_fn_entry b __morestack. [...] So I can issue a 'cmp , 0' in __morestack to indicate the function was called. > > There are several issues with unwinding in __morestack. x28 is not described > as a callee-save > so will be corrupted if unwinding across a __morestack call. This won't > unwind correctly after > the ldp as the unwinder will use the restored frame pointer to try to restore > x29/x30: > > + ldp x29, x30, [x28, STACKFRAME_BASE] > + ldr x28, [x28, STACKFRAME_BASE + 80] > + > + .cfi_remember_state > + .cfi_restore 30 > + .cfi_restore 29 > + .cfi_def_cfa 31, 0 Indeed, it misses the x28 save/restore. I think I have added the missing bits, but I must confess that I am not well versed in CFI directives. I would appreciate it if you could help me with this new version. > > This stores a random x30 value on the stack, what is the purpose of this? > Nothing can unwind > to here: > > + # Start using new stack > + stp x29, x30, [x0, -16]! > + mov sp, x0 > > Also we no longer need split_stack_arg_pointer_used_p () or any code that > uses it (functions > that don't have any arguments passed on the stack could omit the mov x11, sp). 
Right, with the new strategy you proposed of doing a branch, this is indeed not really required. I removed it in this new patch. 
> > Wilco > From dd2927aa5deb8d609c748014f3b566962fb852c5 Mon Sep 17 00:00:00 2001 From: Adhemerval Zanella Date: Wed, 4 May 2016 21:13:39 + Subject: [PATCH 2/2] aarch64: Add split-stack initial support This patch adds split-stack support on aarch64 (PR #67877). As for other ports, this patch should be used along with glibc and gold support. The support is done similarly to other architectures: a __private_ss field is added to the TCB in glibc, a target-specific __morestack implementation and helper functions are added in libgcc, and compiler support is adjusted (split-stack prologue, va_start for argument handling). I also plan to send the gold support to adjus
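Wilco's two-subtract suggestion relies on an AArch64 add/sub immediate being a 12-bit value, optionally shifted left by 12 bits. A rough C sketch of that split follows; split_frame_size is a hypothetical helper name (nothing like it exists in the patch). It shows why frame sizes below 16MB fit in two subs, while the 134217824-byte allocation mentioned for nilptr2.go does not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: split frame size N into the two immediates
   used by
       sub x10, sp,  N & 0xfff000
       sub x10, x10, N & 0xfff
   Returns false when N needs more than 24 bits, in which case a
   mov/movk/sub sequence would be required instead.  */
bool split_frame_size (uint64_t n, uint32_t *hi, uint32_t *lo)
{
  if (n > 0xffffff)
    return false;

  *hi = (uint32_t) (n & 0xfff000);  /* 12-bit immediate, LSL #12.  */
  *lo = (uint32_t) (n & 0xfff);     /* Plain 12-bit immediate.  */
  return true;
}
```

The two parts always add back up to the original size, so the pair of subtractions computes sp - N exactly for any N that passes the range check.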
Re: [PATCH fix PR71767 2/4 : Darwin configury] Arrange for ld64 to be detected as Darwin's linker
Iain, It certainly looks like you dropped a file here. The proposed ChangeLog shows... * config.in: Likewise. but the previously proposed hunk from... diff --git a/gcc/config.in b/gcc/config.in index a736de3..a7ff3ee 100644 --- a/gcc/config.in +++ b/gcc/config.in @@ -1934,6 +1934,18 @@ #endif +/* Define to 1 if ld64 supports '-export_dynamic'. */ +#ifndef USED_FOR_TARGET +#undef LD64_HAS_EXPORT_DYNAMIC +#endif + + +/* Define to ld64 version. */ +#ifndef USED_FOR_TARGET +#undef LD64_VERSION +#endif + + /* Define to the linker option to ignore unused dependencies. */ #ifndef USED_FOR_TARGET #undef LD_AS_NEEDED_OPTION from PR71767-vs-240230 has gone missing. The current patch still produces a compiler which triggers warnings of... warning: section "__textcoal_nt" is deprecated during the bootstrap until that hunk of the original patch is restored. Jack On Sun, Nov 6, 2016 at 2:39 PM, Iain Sandoe wrote: > Hi Folks, > > This is an initial patch in a series that converts Darwin's configury to > detect ld64 features, rather than the current process of hard-coding them on > target system version. > > This adds an option --with-ld64[=version] that allows the configurer to > specify that the Darwin ld64 linker is in use. If the version is given then > that will be used to determine the capabilities of the linker in native and > canadian crosses. For Darwin targets this flag will default to "on", since > such targets require an ld64-compatible linker. > > If a DEFAULT_LINKER is set via --with-ld= then this will also be tested to > see if it is ld64. > > The ld64 version is determined (unless overridden by --with-ld64=version) and > this is exported for use in setting a default value for -mtarget-linker > (needed for run-time code-gen changes to section choices). > > In this initial patch, support for -rdynamic is converted to be detected at > config time, or by the ld64 version if that is explicitly given (as an > example of usage). > > OK for trunk? > OK for open branches? 
> Iain > > gcc/ > > 2016-11-06 Iain Sandoe > >PR target/71767 > * configure.ac (with-ld64): New arg-with. gcc_ld64_version: New, > new test. gcc_cv_ld64_export_dynamic: New, New test. > * configure: Regenerate. > * config.in: Likewise. > * darwin.h: Use LD64_HAS_DYNAMIC export. DEF_LD64: New, define. > * darwin10.h(DEF_LD64): Update for this target version. > * darwin12.h(LINK_GCC_C_SEQUENCE_SPEC): Remove rdynamic test. > (DEF_LD64): Update for this target version. > --- > gcc/config/darwin.h | 16 ++- > gcc/config/darwin10.h | 5 > gcc/config/darwin12.h | 7 - > gcc/configure.ac | 74 > +++ > 4 files changed, 100 insertions(+), 2 deletions(-) > > diff --git a/gcc/config/darwin.h b/gcc/config/darwin.h > index 045f70b..541bcb3 100644 > --- a/gcc/config/darwin.h > +++ b/gcc/config/darwin.h > @@ -165,6 +165,12 @@ extern GTY(()) int darwin_ms_struct; > specifying the handling of options understood by generic Unix > linkers, and for positional arguments like libraries. */ > > +#if LD64_HAS_EXPORT_DYNAMIC > +#define DARWIN_EXPORT_DYNAMIC " %{rdynamic:-export_dynamic}" > +#else > +#define DARWIN_EXPORT_DYNAMIC " %{rdynamic: %nrdynamic is not supported}" > +#endif > + > #define LINK_COMMAND_SPEC_A \ > "%{!fdump=*:%{!fsyntax-only:%{!c:%{!M:%{!MM:%{!E:%{!S:\ > %(linker)" \ > @@ -185,7 +191,9 @@ extern GTY(()) int darwin_ms_struct; > %{!nostdlib:%{!nodefaultlibs:\ >%{%:sanitize(address): -lasan } \ >%{%:sanitize(undefined): -lubsan } \ > - %(link_ssp) %(link_gcc_c_sequence)\ > + %(link_ssp) \ > + " DARWIN_EXPORT_DYNAMIC " % + %(link_gcc_c_sequence) \ > }}\ > %{!nostdlib:%{!nostartfiles:%E}} %{T*} %{F*} }}}" > > @@ -932,4 +940,10 @@ extern void darwin_driver_init (unsigned int *,struct > cl_decoded_option **); > fall-back default. 
*/ > #define DEF_MIN_OSX_VERSION "10.5" > > +#ifndef LD64_VERSION > +#define LD64_VERSION "85.2" > +#else > +#define DEF_LD64 LD64_VERSION > +#endif > + > #endif /* CONFIG_DARWIN_H */ > diff --git a/gcc/config/darwin10.h b/gcc/config/darwin10.h > index 5829d78..a81fbdc 100644 > --- a/gcc/config/darwin10.h > +++ b/gcc/config/darwin10.h > @@ -32,3 +32,8 @@ along with GCC; see the file COPYING3. If not see > > #undef DEF_MIN_OSX_VERSION > #define DEF_MIN_OSX_VERSION "10.6" > + > +#ifndef LD64_VERSION > +#undef DEF_LD64 > +#define DEF_LD64 "97.7" > +#endif > diff --git a/gcc/config/darwin12.h b/gcc/config/darwin12.h > index e366982..f88e2a4 100644 > --- a/gcc/config/darwin12.h > +++ b/gcc/config/darwin12.h > @@ -21,10 +21,15 @@ along with GCC; see the file COPYING3. If not see > #undef LINK_GCC_C_SEQUENCE_SPEC > #define LINK_GCC_C_SEQUENCE_SPEC \ > "%:version-compare(>= 10.6 mmacosx-version-min= -no_compact_unwind) \ > - %{rdynamic:-export_dynamic} %{!stati
[PATCH, rs6000] Modify include paths in config.gcc for Advance Toolchain builds
Gabriel and I have been tracking down an include path issue for GCC 6 Advance Toolchain builds (i.e., --with-advance-toolchain=...). The solution that fixes the problem for us is to configure with --with-local-prefix=... and to remove the following hunk from config.gcc. Gabriel has confirmed this fixes his AT builds (native and cross) and I've verified that this patch bootstraps with no regressions. Is this ok for trunk and the GCC 6 branch? Peter * config.gcc (powerpc*-*-*, rs6000*-*-*): Remove setting of INCLUDE_EXTRA_SPEC for Advance Toolchain builds. Index: gcc/config.gcc === --- gcc/config.gcc (revision 241917) +++ gcc/config.gcc (working copy) @@ -4137,16 +4137,6 @@ case "${target}" in (at="/opt/$with_advance_toolchain" echo "/* Use Advance Toolchain $at */" echo -echo "#ifndef USE_AT_INCLUDE_FILES" -echo "#define USE_AT_INCLUDE_FILES 1" -echo "#endif" -echo -echo "#if USE_AT_INCLUDE_FILES" -echo "#undef INCLUDE_EXTRA_SPEC" -echo "#define INCLUDE_EXTRA_SPEC" \ - "\"-isystem $at/include\"" -echo "#endif" -echo echo "#undef LINK_OS_EXTRA_SPEC32" echo "#define LINK_OS_EXTRA_SPEC32" \ "\"%(link_os_new_dtags)" \
C++ PATCH to announce template instantiations if not -quiet
It occurred to me that a simple trace of template instantiations would fit simply into the stream of function declarations that announce_function prints when -quiet is not specified to the compiler. Tested x86_64-pc-linux-gnu, applying to trunk. commit ae7b4a929fbd05de433451a1d92794d962366646 Author: Jason Merrill Date: Fri Nov 4 09:22:32 2016 -0400 Add template instantiations to the announce_function stream. * pt.c (push_tinst_level_loc): Add template instantiations to the announce_function stream. diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index c8d4a06..f910d40 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -9170,6 +9170,13 @@ push_tinst_level_loc (tree d, location_t loc) if (limit_bad_template_recursion (d)) return false; + /* When not -quiet, dump template instantiations other than functions, since + announce_function will take care of those. */ + if (!quiet_flag + && TREE_CODE (d) != TREE_LIST + && TREE_CODE (d) != FUNCTION_DECL) +fprintf (stderr, " %s", decl_as_string (d, TFF_DECL_SPECIFIERS)); + new_level = ggc_alloc (); new_level->decl = d; new_level->locus = loc;
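As a rough illustration (not part of the patch), a translation unit like the one below mixes both kinds of instantiation. Assuming the compiler proper runs without -quiet, the class instantiation Box<int> would be expected to be reported by the new fprintf in push_tinst_level_loc, while the function instantiations are left to announce_function. The names Box, twice and run_example are made up for this sketch.

```cpp
// Class template: instantiating Box<int> is not a FUNCTION_DECL, so it is
// the kind of instantiation the new code in push_tinst_level_loc reports.
template <typename T>
struct Box
{
  T value;
  T get () const { return value; }
};

// Function template: twice<int> is a function, so the new code skips it
// and leaves the reporting to announce_function.
template <typename T>
T twice (T x)
{
  return x + x;
}

// Hypothetical driver that forces both instantiations.
int run_example ()
{
  Box<int> b = { 21 };
  return twice (b.get ());
}
```

The exact text printed depends on decl_as_string, so no expected output is shown here.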
[PATCH] print_rtx: implement support for reuse IDs (v2)
On Tue, 2016-10-25 at 14:47 +0200, Bernd Schmidt wrote: > On 10/21/2016 10:27 PM, David Malcolm wrote: > > Thanks. I attemped to use those fields of recog_data, but it > > doesn't > > seem to be exactly what's needed here. > > Yeah, I may have been confused. I'm not sure that just looking at > SCRATCHes is the right thing either, but I think you're on the right > track, and we can use something like your patch for now and extend it > later if necessary. > > > + public: > > + rtx_reuse_manager (); > > + ~rtx_reuse_manager (); > > + static rtx_reuse_manager *get () { return singleton; } > > OTOH, this setup looks a bit odd to me. Are you trying to avoid > converting the print_rtx stuff to its own class, or avoid passing the > reuse manager as an argument to a lot of functions? > > Some of this setup might not even be necessary. We have a "used" flag > on > rtx objects which is used to unshare RTL, and I think could also be > used > for a similar purpose when dumping. So, before printing, call > reset_insn_used_flags on everything, then have another pass to set > bits > on everything that could conceivably be shared, and when you find > something that already has the bit set, enter it into a table. > Finally, > print everything out, using the table. I think this would be somewhat > simpler than adding another header file and class definition. Now that we have a class rtx_writer, it's much clearer to drop the singleton. In this version I've eliminated the rtx_reuse_manager singleton, instead allowing callers to pass an rtx_reuse_manager * to rtx_writer's ctor. This can be NULL, allowing most dumps to opt out of the reuse-tracking, minimizing the risk of changing an existing testcase; only print_rtx_function makes use of it (and the selftests). I eliminated print-rtl-reuse.h, moving class rtx_reuse_manager into print-rtl.h and print-rtl.c. I kept the class rtx_reuse_manager, as it seems appropriate to put responsibility for this aspect of dumping into its own class.
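One way to picture the reuse-tracking that rtx_reuse_manager is responsible for: a first pass counts how often each node occurs, and only nodes seen more than once are given a reuse ID. The following is a minimal standalone sketch of that idea, not the code from the patch. The class reuse_tracker, its method names, the use of strings in place of rtx nodes, and the starting ID of 0 are all assumptions made for illustration.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for rtx_reuse_manager. Strings stand in for rtx
// nodes; the real class keys on node identity, not on a printed value.
class reuse_tracker
{
public:
  // Record one occurrence of NODE. A reuse ID is handed out the second
  // time a node is seen, so nodes seen only once never get an ID.
  void note_occurrence (const std::string &node)
  {
    std::map<std::string, int>::iterator it = m_occurrence_count.find (node);
    if (it != m_occurrence_count.end ())
      {
        if (it->second++ == 1)
          m_reuse_ids[node] = m_next_id++;
      }
    else
      m_occurrence_count[node] = 1;
  }

  // Return true and store the ID in *OUT if NODE was seen more than once.
  bool has_reuse_id (const std::string &node, int *out) const
  {
    std::map<std::string, int>::const_iterator it = m_reuse_ids.find (node);
    if (it == m_reuse_ids.end ())
      return false;
    *out = it->second;
    return true;
  }

private:
  std::map<std::string, int> m_occurrence_count;
  std::map<std::string, int> m_reuse_ids;
  int m_next_id = 0;  // Assumption: the starting ID is arbitrary here.
};
```

Because an ID is assigned only on the second sighting, a third or later occurrence of the same node leaves its ID unchanged.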
I attempted to move that responsibility into rtx_writer itself, but doing so made the code less clear. > > +void > > +rtx_reuse_manager::preprocess (const_rtx x) > > +{ > > + subrtx_iterator::array_type array; > > + FOR_EACH_SUBRTX (iter, array, x, NONCONST) > > +if (uses_rtx_reuse_p (*iter)) > > + { > > + if (int *count = m_rtx_occurrence_count.get (*iter)) > > + { > > + if (*count == 1) > > + { > > + m_rtx_reuse_ids.put (*iter, m_next_id++); > > + } > > + (*count)++; > > + } > > + else > > + m_rtx_occurrence_count.put (*iter, 1); > > + } > > Formatting rules suggest no braces around single statements, I think > a > more readable version of this would be: > >if (uses_rtx_reuse_p (*iter)) > { >int *count = m_rtx_occurrence_count.get (*iter) >if (count) > { >if ((*count)++ == 1) > m_rtx_reuse_ids.put (*iter, m_next_id++); > } >else > m_rtx_occurrence_count.put (*iter, 1); > } > > > Bernd Fixed in the way you noted. Successfully bootstrapped&regrtested on x86_64-pc-linux-gnu. OK for trunk? gcc/ChangeLog: * config/i386/i386.c: Include print-rtl.h. (selftest::ix86_test_dumping_memory_blockage): New function. (selftest::ix86_run_selftests): Call it. * print-rtl-function.c (print_rtx_function): Create an rtx_reuse_manager and use it. * print-rtl.c: Include "rtl-iter.h". (rtx_writer::rtx_writer): Add reuse_manager param. (rtx_reuse_manager::rtx_reuse_manager): New ctor. (uses_rtx_reuse_p): New function. (rtx_reuse_manager::preprocess): New function. (rtx_reuse_manager::has_reuse_id): New function. (rtx_reuse_manager::seen_def_p): New function. (rtx_reuse_manager::set_seen_def): New function. (rtx_writer::print_rtx): If "in_rtx" has a reuse ID, print it as a prefix the first time in_rtx is seen, and print reuse_rtx subsequently. (print_inline_rtx): Supply NULL for new reuse_manager param. (debug_rtx): Likewise. (print_rtl): Likewise. (print_rtl_single): Likewise. (rtx_writer::print_rtl_single_with_indent): Likewise. * print-rtl.h: Include bitmap.h when building for host.
(rtx_writer::rtx_writer): Add reuse_manager param. (rtx_writer::m_rtx_reuse_manager): New field. (class rtx_reuse_manager): New class. * rtl-tests.c (selftest::assert_rtl_dump_eq): Add reuse_manager param and use it when constructing rtx_writer. (selftest::test_dumping_rtx_reuse): New function. (selftest::rtl_tests_c_tests): Call it. * selftest-rtl.h (class rtx_reuse_manager): New forward decl. (selftest::assert_rtl_dump_eq): Add reuse_manager param. (ASSERT_RTL_DUMP_EQ): Supply NULL for reuse_manager param. (ASSERT_RTL_DUMP_EQ_WITH_REUSE): New macro. --- gcc/config/i386/i