Re: [PATCH][GCC][ARM] Fix can_change_mode_class for big-endian
Sending to list Thu 03/19/2018 08:48, Tamar Christina wrote: > Hi Christophe, > > The 03/16/2018 13:50, Christophe Lyon wrote: > > On 15 March 2018 at 11:19, Kyrill Tkachov > > wrote: > > > Hi Tamar, > > > > > > > > >> Regtested on armeb-none-eabi and no regressions. > > >> Bootstrapped on arm-none-linux-gnueabihf and no issues. > > >> > > >> Ok for trunk? and for backport to GCC 7? > > >> > > > > > > Ok for trunk. > > > Please wait for a few days before backporting. > > > > > > > Hi Tamar, > > > > Strangely I have noticed regressions on armeb, I have updated bugzilla > > accordingly. > > Thanks, seems testsuite didn't catch it with our default options. > > This seems to be a combine issue as it's removing subregs it thinks is a > no-op, causing the vec_select not to be done. It was working before because > we were saying any subreg is invalid in arm-be, so if the code didn't get > stuck in an endless loop, it would skip the no-op check. > > I don't see anyway to fix this in the back-end alone as combine gives no way > to tell it that the instruction is not a no-op. Even if I split the > instruction > early on to explicitly use a new register, combine will combine the mov and > vec_select > again and come to the same conclusion. = > > So I will revert this for GCC 8 and fix it for GCC 9. Better the compiler ICE > than > generate bad code. > > Thanks, > Tamar > > > > > Thanks, > > > > Christophe > > > > > Thanks, > > > Kyrill > > > > > > > > >> Thanks, > > >> Tamar > > >> > > >> gcc/ > > >> 2018-03-05 Tamar Christina > > >> > > >> PR target/84711 > > >> * config/arm/arm.c (arm_can_change_mode_class): Use > > >> GET_MODE_UNIT_SIZE > > >> instead of GET_MODE_SIZE when comparing Units. > > >> > > >> gcc/testsuite/ > > >> 2018-03-05 Tamar Christina > > >> > > >> PR target/84711 > > >> * gcc.target/arm/big-endian-subreg.c: New. > > >> > > >> -- > > > > > > > > -- --
Re: [PATCH] Fix PR84512
On Fri, 16 Mar 2018, Tom de Vries wrote: > On 03/16/2018 12:55 PM, Richard Biener wrote: > > On Fri, 16 Mar 2018, Tom de Vries wrote: > > > > > On 02/27/2018 01:42 PM, Richard Biener wrote: > > > > Index: gcc/testsuite/gcc.dg/tree-ssa/pr84512.c > > > > === > > > > --- gcc/testsuite/gcc.dg/tree-ssa/pr84512.c (nonexistent) > > > > +++ gcc/testsuite/gcc.dg/tree-ssa/pr84512.c (working copy) > > > > @@ -0,0 +1,15 @@ > > > > +/* { dg-do compile } */ > > > > +/* { dg-options "-O3 -fdump-tree-optimized" } */ > > > > + > > > > +int foo() > > > > +{ > > > > + int a[10]; > > > > + for(int i = 0; i < 10; ++i) > > > > +a[i] = i*i; > > > > + int res = 0; > > > > + for(int i = 0; i < 10; ++i) > > > > +res += a[i]; > > > > + return res; > > > > +} > > > > + > > > > +/* { dg-final { scan-tree-dump "return 285;" "optimized" } } */ > > > > > > This fails for nvptx, because it doesn't have the required vector > > > operations. > > > To fix the fail, I've added requiring effective target vect_int_mult. > > > > On targets that do not vectorize you should see the scalar loops unrolled > > instead. Or do you have only one loop vectorized? > > Sort of. Loop vectorization has no effect, and the scalar loops are completely > unrolled. But then slp vectorization vectorizes the stores. > > So at optimized we have: > ... > MEM[(int *)&a] = { 0, 1 }; > MEM[(int *)&a + 8B] = { 4, 9 }; > MEM[(int *)&a + 16B] = { 16, 25 }; > MEM[(int *)&a + 24B] = { 36, 49 }; > MEM[(int *)&a + 32B] = { 64, 81 }; > _6 = a[0]; > _28 = a[1]; > res_29 = _6 + _28; > _35 = a[2]; > res_36 = res_29 + _35; > _42 = a[3]; > res_43 = res_36 + _42; > _49 = a[4]; > res_50 = res_43 + _49; > _56 = a[5]; > res_57 = res_50 + _56; > _63 = a[6]; > res_64 = res_57 + _63; > _70 = a[7]; > res_71 = res_64 + _70; > _77 = a[8]; > res_78 = res_71 + _77; > _2 = a[9]; > res_11 = _2 + res_78; > a ={v} {CLOBBER}; > return res_11; > ... > > The stores and loads are eliminated by dse1 in the rtl phase, and in the end > we have: > ... > .visible .func (.param.u32 %value_out) foo > { > .reg.u32 %value; > .local .align 16 .b8 %frame_ar[48]; > .reg.u64 %frame; > cvta.local.u64 %frame, %frame_ar; > mov.u32 %value, 285; > st.param.u32[%value_out], %value; > ret; > } > ... > > > That's precisely > > what the PR was about... which means it isn't fixed for nvptx :/ > > Indeed the assembly is not optimal, and would be optimal if we'd have optimal > code at optimized. > > FWIW, using this patch we generate optimal code at optimized: > ... > diff --git a/gcc/passes.def b/gcc/passes.def > index 3ebcfc30349..6b64f600c4a 100644 > --- a/gcc/passes.def > +++ b/gcc/passes.def > @@ -325,6 +325,7 @@ along with GCC; see the file COPYING3. If not see >NEXT_PASS (pass_tracer); >NEXT_PASS (pass_thread_jumps); >NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */); > + NEXT_PASS (pass_fre); >NEXT_PASS (pass_strlen); >NEXT_PASS (pass_thread_jumps); >NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */); > ... > > and we get: > ... > .visible .func (.param.u32 %value_out) foo > { > .reg.u32 %value; > mov.u32 %value, 285; > st.param.u32[%value_out], %value; > ret; > } > ... > > I could file a missing optimization PR for nvptx, but I'm not sure where this > should be fixed. Ah, yeah... the usual issue then. Can you please XFAIL the test on nvptx instead of requiring vect_int_mult? Thanks, Richard.
C++ PATCH for c++/84925, ICE in enclosing_instantiation_of
Seems like with this testcase we end up in a scenario where, when counting the lambda count in enclosing_instantiation_of, we arrive at decl_function_context being null, so checking DECL_TEMPLATE_INFO then crashes. It appears that just the following does the right thing, but I'm not too sure about this function. Bootstrapped/regtested on x86_64-linux, ok for trunk? 2018-03-19 Marek Polacek PR c++/84925 * pt.c (enclosing_instantiation_of): Check if fn is null. * g++.dg/cpp1z/lambda-__func__.C: New test. diff --git gcc/cp/pt.c gcc/cp/pt.c index 745c9acd6ee..066cb627a70 100644 --- gcc/cp/pt.c +++ gcc/cp/pt.c @@ -12898,7 +12898,7 @@ enclosing_instantiation_of (tree otctx) for (; flambda_count < lambda_count && fn && LAMBDA_FUNCTION_P (fn); fn = decl_function_context (fn)) ++flambda_count; - if (DECL_TEMPLATE_INFO (fn) + if ((fn && DECL_TEMPLATE_INFO (fn)) ? most_general_template (fn) != most_general_template (tctx) : fn != tctx) continue; diff --git gcc/testsuite/g++.dg/cpp1z/lambda-__func__.C gcc/testsuite/g++.dg/cpp1z/lambda-__func__.C index e69de29bb2d..d5d1c1cb7b6 100644 --- gcc/testsuite/g++.dg/cpp1z/lambda-__func__.C +++ gcc/testsuite/g++.dg/cpp1z/lambda-__func__.C @@ -0,0 +1,13 @@ +// PR c++/84925 +// { dg-options "-std=c++17" } + +template +struct A { + static const int value = 0; + static auto constexpr fn = [] { return __func__; }; +}; + +template +int x = A::value; + +auto s = x; Marek
Re: [PATCH] Fix PR84859
On Fri, Mar 16, 2018 at 03:08:17PM +0100, Richard Biener wrote: > Bootstrapped and tested on x86_64-unknown-linux-gnu, ok? Ok, thanks. Jakub
Re: [Patch AArch64] Turn on -fasynchronous-unwind-tables and -funwind-tables by default.
On Tue, Mar 13, 2018 at 01:35:56PM +, Ramana Radhakrishnan wrote: > Jakub commented here that > https://gcc.gnu.org/ml/gcc-patches/2018-02/msg01325.html we don't turn > on fasynchronous-unwind-tables for AArch64. I note that x86_64 turns on > funwind-tables as well. Thus this patch turns on both. > > Note that this doesn't increase code size but will likely increase > binary size because we now have a lot more eh_frame / debug_frame > information being spat out. We probably need to do get the clang /llvm > guys to do the same but I'll prod them separately. > > Bootstrapped and regression tested on aarch64-none-linux-gnu. > > Ok ? OK. Thanks, James
Re: [PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp
On Wed, Mar 14, 2018 at 05:40:49PM +, Sudi Das wrote: > Hi > > This patch is another partial fix for PR 84521. This is adding a > definition to one of the target hooks used in the SJLJ implemetation so > that AArch64 defines the hard_frame_pointer_rtx as the > TARGET_BUILTIN_SETJMP_FRAME_VALUE. As pointed out by Wilco there is > still a lot more work to be done for these builtins in the future. > > Testing: Bootstrapped and regtested on aarch64-none-linux-gnu and added > new test. > > Is this ok for trunk? OK. Thanks, James > *** gcc/ChangeLog *** > > 2018-03-14 Sudakshina Das > > * builtins.c (expand_builtin_setjmp_receiver): Update condition > to restore frame pointer. > * config/aarch64/aarch64.h (DONT_USE_BUILTIN_SETJMP): Update > comment. > * config/aarch64/aarch64.c (aarch64_builtin_setjmp_frame_value): > New. > (TARGET_BUILTIN_SETJMP_FRAME_VALUE): Define. > > *** gcc/testsuite/ChangeLog *** > > 2018-03-14 Sudakshina Das > > * gcc.c-torture/execute/pr84521.c: New test. > diff --git a/gcc/builtins.c b/gcc/builtins.c > index 85affa7..640f1a9 100644 > --- a/gcc/builtins.c > +++ b/gcc/builtins.c > @@ -898,7 +898,8 @@ expand_builtin_setjmp_receiver (rtx receiver_label) > >/* Now put in the code to restore the frame pointer, and argument > pointer, if needed. */ > - if (! targetm.have_nonlocal_goto ()) > + if (! targetm.have_nonlocal_goto () > + && targetm.builtin_setjmp_frame_value () != hard_frame_pointer_rtx) > { >/* First adjust our frame pointer to its actual value. It was >previously set to the start of the virtual area corresponding to > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h > index e3c52f6..7a21c14 100644 > --- a/gcc/config/aarch64/aarch64.h > +++ b/gcc/config/aarch64/aarch64.h > @@ -474,7 +474,9 @@ extern unsigned aarch64_architecture_version; > #define EH_RETURN_STACKADJ_RTX gen_rtx_REG (Pmode, R4_REGNUM) > #define EH_RETURN_HANDLER_RTX aarch64_eh_return_handler_rtx () > > -/* Don't use __builtin_setjmp until we've defined it. */ > +/* Don't use __builtin_setjmp until we've defined it. > + CAUTION: This macro is only used during exception unwinding. > + Don't fall for its name. */ > #undef DONT_USE_BUILTIN_SETJMP > #define DONT_USE_BUILTIN_SETJMP 1 > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index e1fb87f..e7ac0fe 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -12128,6 +12128,13 @@ aarch64_expand_builtin_va_start (tree valist, rtx > nextarg ATTRIBUTE_UNUSED) >expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL); > } > > +/* Implement TARGET_BUILTIN_SETJMP_FRAME_VALUE. */ > +static rtx > +aarch64_builtin_setjmp_frame_value (void) > +{ > + return hard_frame_pointer_rtx; > +} > + > /* Implement TARGET_GIMPLIFY_VA_ARG_EXPR. */ > > static tree > @@ -17505,6 +17512,9 @@ aarch64_run_selftests (void) > #undef TARGET_FOLD_BUILTIN > #define TARGET_FOLD_BUILTIN aarch64_fold_builtin > > +#undef TARGET_BUILTIN_SETJMP_FRAME_VALUE > +#define TARGET_BUILTIN_SETJMP_FRAME_VALUE aarch64_builtin_setjmp_frame_value > + > #undef TARGET_FUNCTION_ARG > #define TARGET_FUNCTION_ARG aarch64_function_arg > > diff --git a/gcc/testsuite/gcc.c-torture/execute/pr84521.c > b/gcc/testsuite/gcc.c-torture/execute/pr84521.c > new file mode 100644 > index 000..76b10d2 > --- /dev/null > +++ b/gcc/testsuite/gcc.c-torture/execute/pr84521.c > @@ -0,0 +1,49 @@ > +/* { dg-require-effective-target indirect_jumps } */ > + > +#include > + > +jmp_buf buf; > + > +int uses_longjmp (void) > +{ > + __builtin_longjmp (buf, 1); > +} > + > +int gl; > +void after_longjmp (void) > +{ > + gl = 5; > +} > + > +int > +test_1 (int n) > +{ > + volatile int *p = alloca (n); > + if (__builtin_setjmp (buf)) > +{ > + after_longjmp (); > +} > + else > +{ > + uses_longjmp (); > +} > + > + return 0; > +} > + > +int __attribute__ ((optimize ("no-omit-frame-pointer"))) > +test_2 (int n) > +{ > + int i; > + int *ptr = (int *)__builtin_alloca (sizeof (int) * n); > + for (i = 0; i < n; i++) > +ptr[i] = i; > + test_1 (n); > + return 0; > +} > + > +int main (int argc, const char **argv) > +{ > + __builtin_memset (&buf, 0xaf, sizeof (buf)); > + test_2 (100); > +}
[PATCH] Fix PR84929
This fixes PR84929. analyze_siv_subscript_cst_affine chrec_convert()s chrec_b but doesn't handle the case of that simply wrapping a NOP around the CHREC. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2018-03-19 Richard Biener PR tree-optimization/84929 * tree-data-ref.c (analyze_siv_subscript_cst_affine): Guard chrec_is_positive against non-chrec arg. * gcc.dg/torture/pr84929.c: New testcase. Index: gcc/tree-data-ref.c === --- gcc/tree-data-ref.c (revision 258641) +++ gcc/tree-data-ref.c (working copy) @@ -3015,7 +3017,8 @@ analyze_siv_subscript_cst_affine (tree c { if (value0 == false) { - if (!chrec_is_positive (CHREC_RIGHT (chrec_b), &value1)) + if (TREE_CODE (chrec_b) != POLYNOMIAL_CHREC + || !chrec_is_positive (CHREC_RIGHT (chrec_b), &value1)) { if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, "siv test failed: chrec not positive.\n"); @@ -3096,7 +3099,8 @@ analyze_siv_subscript_cst_affine (tree c } else { - if (!chrec_is_positive (CHREC_RIGHT (chrec_b), &value2)) + if (TREE_CODE (chrec_b) != POLYNOMIAL_CHREC + || !chrec_is_positive (CHREC_RIGHT (chrec_b), &value2)) { if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, "siv test failed: chrec not positive.\n"); Index: gcc/testsuite/gcc.dg/torture/pr84929.c === --- gcc/testsuite/gcc.dg/torture/pr84929.c (nonexistent) +++ gcc/testsuite/gcc.dg/torture/pr84929.c (working copy) @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target int32plus } */ + +int a[4]; +void fn1() +{ + __UINT64_TYPE__ b = 7818038963515661296; + for (;; b++) +a[0xA699ECD2C348A3A0] = a[b]; +}
Re: [PATCH ] PR 844422 Fix FCTID, FCTIW with -mcpu=power7
On Mar 16, 2018, at 5:51 PM, Segher Boessenkool wrote: > > Hi Carl, > > On Wed, Mar 14, 2018 at 08:27:08AM -0700, Carl Love wrote: >> The following patch fixes an ICE when compiling the test case >> >> gcc -mcpu=power7 builtin-fctid-fctiw-runnable.c >> >> The GCC compiler now gives a message >> >>"error: builtin function ‘__builtin_fctiw’ requires the ‘-mpower8-vector’ >> option" >> >> and exits without generating an internal error. > > This is an improvement over the ICE, for sure. > > But fctiw is an ISA 1.xx instruction already (and fctid is as well, > but explicitly undefined for 32-bit implementations until some 2.xx, > I forgot which, 2.02 I think?) > > Requiring power8 for it is weird and surprising. power8-vector doubly so. > > It does not seem to be a good idea to only enable the builtin for cases > where the current implementation is not broken. > > (The issue is that pre-power8 we do not allow SImode in FPR registers. > Which makes the current fctiw implementation fail). > > I think we have two good options: > > 1) Remove these builtins; > or > 2) Make them work. > I don't think we can remove them, as for a while they were the only way to access these instructions. We now have some vec_* intrinsics that are the preferred interfaces, but for backward compatibility I think we have to keep the old ones. Thanks, Bill > > Segher >
Re: [PATCH ] PR 844422 Fix FCTID, FCTIW with -mcpu=power7
> On Mar 19, 2018, at 8:19 AM, Bill Schmidt wrote: > > On Mar 16, 2018, at 5:51 PM, Segher Boessenkool > wrote: >> >> Hi Carl, >> >> On Wed, Mar 14, 2018 at 08:27:08AM -0700, Carl Love wrote: >>> The following patch fixes an ICE when compiling the test case >>> >>> gcc -mcpu=power7 builtin-fctid-fctiw-runnable.c >>> >>> The GCC compiler now gives a message >>> >>> "error: builtin function ‘__builtin_fctiw’ requires the ‘-mpower8-vector’ >>> option" >>> >>> and exits without generating an internal error. >> >> This is an improvement over the ICE, for sure. >> >> But fctiw is an ISA 1.xx instruction already (and fctid is as well, >> but explicitly undefined for 32-bit implementations until some 2.xx, >> I forgot which, 2.02 I think?) >> >> Requiring power8 for it is weird and surprising. power8-vector doubly so. >> >> It does not seem to be a good idea to only enable the builtin for cases >> where the current implementation is not broken. >> >> (The issue is that pre-power8 we do not allow SImode in FPR registers. >> Which makes the current fctiw implementation fail). >> >> I think we have two good options: >> >> 1) Remove these builtins; >> or >> 2) Make them work. >> > I don't think we can remove them, as for a while they were the only way to > access these instructions. We now have some vec_* intrinsics that are > the preferred interfaces, but for backward compatibility I think we have to > keep the old ones. Sorry, never mind, I was thinking of vector forms. Please ignore me. Bill > > Thanks, > Bill > >> >> Segher
Re: [PATCH ] PR 844422 Fix FCTID, FCTIW with -mcpu=power7
On Mon, Mar 19, 2018 at 08:19:18AM -0500, Bill Schmidt wrote: > > Requiring power8 for it is weird and surprising. power8-vector doubly so. > > > > It does not seem to be a good idea to only enable the builtin for cases > > where the current implementation is not broken. > > > > (The issue is that pre-power8 we do not allow SImode in FPR registers. > > Which makes the current fctiw implementation fail). > > > > I think we have two good options: > > > > 1) Remove these builtins; > > or > > 2) Make them work. > > > I don't think we can remove them, as for a while they were the only way to > access these instructions. We now have some vec_* intrinsics that are > the preferred interfaces, but for backward compatibility I think we have to > keep the old ones. It is new for GCC 8, added in r253238. We can still remove it for GCC 8 if it needs more work; or we can fix it. I'd rather not ship a broken feature (that no one yet uses). Segher
[PATCH] Fix PR84933
The following fixes PR84933 without really tackling the underlying issue of having out-of-bound ranges in the lattice (out-of-bound with respect to TYPE_MIN/MAX_VALUE) which is not sth I'd like to resolve at this point. It simply avoids the bogus transform in set_and_canonicalize_value_range that results in this case (~[0, 5] -> [4, 1] with TYPE_MAX_VALUE == 1). I realize there are similar conditions throughout the code so a more conservative "fix" would be to not use TYPE_MIN/MAX_VALUE at all but IIRC that will regress some optimization cases - still that's sth to consider for GCC9. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2018-03-19 Richard Biener PR tree-optimization/84933 * tree-vrp.c (set_and_canonicalize_value_range): Treat out-of-bound values as -INF/INF when canonicalizing an ANTI_RANGE to a RANGE. * g++.dg/pr84933.C: New testcase. Index: gcc/tree-vrp.c === --- gcc/tree-vrp.c (revision 258641) +++ gcc/tree-vrp.c (working copy) @@ -386,8 +386,13 @@ set_and_canonicalize_value_range (value_ /* Anti-ranges that can be represented as ranges should be so. */ if (t == VR_ANTI_RANGE) { - bool is_min = vrp_val_is_min (min); - bool is_max = vrp_val_is_max (max); + /* For -fstrict-enums we may receive out-of-range ranges so consider + values < -INF and values > INF as -INF/INF as well. */ + tree type = TREE_TYPE (min); + bool is_min = (INTEGRAL_TYPE_P (type) +&& tree_int_cst_compare (min, TYPE_MIN_VALUE (type)) <= 0); + bool is_max = (INTEGRAL_TYPE_P (type) +&& tree_int_cst_compare (max, TYPE_MAX_VALUE (type)) >= 0); if (is_min && is_max) { Index: gcc/testsuite/g++.dg/pr84933.C === --- gcc/testsuite/g++.dg/pr84933.C (nonexistent) +++ gcc/testsuite/g++.dg/pr84933.C (working copy) @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fstrict-enums -fno-inline" } */ + +enum a {}; +int *d; +int b, e, f; +a c, g; +class h { +virtual unsigned i(); +}; +class j : h { +unsigned i() { + for (;;) { + b = c <= 0; + if (b) + e = *d; + b = g && c; + if (b) + f = *d; + } +} +}; +void k() { new j; }
[C++/84812] ICE with local fn decl
This bug happens when the machinery to check for an implicit extern "C" linkage on a local declaration, meets an ambiguous local lookup. I was misled by a comment that implied TREE_LIST -> overload. That was true way back, but now means ambiguous lookup. I declined the opportunity to micro-optimize the test to 'TREE_CODE (x) == ERROR_MARK || TTEE_CODE (X) == TREE_LIST'. nathan -- Nathan Sidwell 2018-03-19 Nathan Sidwell PR c++/84812 * name-lookup.c (set_local_extern_decl_linkage): Defend against ambiguous lookups. PR c++/84812 * g++.dg/lookup/pr84812.C: New. Index: cp/name-lookup.c === --- cp/name-lookup.c (revision 258642) +++ cp/name-lookup.c (working copy) @@ -2878,7 +2878,9 @@ set_local_extern_decl_linkage (tree decl = find_namespace_value (current_namespace, DECL_NAME (decl)); loc_value = ns_value; } - if (loc_value == error_mark_node) + if (loc_value == error_mark_node + /* An ambiguous lookup. */ + || (loc_value && TREE_CODE (loc_value) == TREE_LIST)) loc_value = NULL_TREE; for (ovl_iterator iter (loc_value); iter; ++iter) @@ -2926,7 +2928,8 @@ set_local_extern_decl_linkage (tree decl if (ns_value == decl) ns_value = find_namespace_value (current_namespace, DECL_NAME (decl)); - if (ns_value == error_mark_node) + if (ns_value == error_mark_node + || (ns_value && TREE_CODE (ns_value) == TREE_LIST)) ns_value = NULL_TREE; for (ovl_iterator iter (ns_value); iter; ++iter) Index: testsuite/g++.dg/lookup/pr84812.C === --- testsuite/g++.dg/lookup/pr84812.C (revision 0) +++ testsuite/g++.dg/lookup/pr84812.C (working copy) @@ -0,0 +1,18 @@ +// PR 84812. ICE determining implicit "C" linkage + +struct A { void foo(); }; +struct B { void foo(); }; + +struct C : A, B +{ + void X (); +}; + +void C::X () +{ + void foo (); // local decl of ::foo + + foo (); +} + +// { dg-final { scan-assembler "_Z3foov" } }
Re: [og7] Update nvptx_fork/join barrier placement
On 03/09/2018 05:55 PM, Cesar Philippidis wrote: On 03/09/2018 08:21 AM, Tom de Vries wrote: On 03/09/2018 12:31 AM, Cesar Philippidis wrote: Nvidia Volta GPUs now support warp-level synchronization. Well, let's try to make that statement a bit more precise. All Nvidia architectures have supported synchronization of threads in a warp on a very basic level: by means of convergence (and unfortunately, we've seen that this is very error-prone). What is new in ptx 6.0 combined with sm_70 is the ability to sync divergent threads without having to converge, f.i. by using new instructions bar.warp.sync and barrier.sync. Yes. The major difference sm_70 GPU architectures and earlier GPUs is that sm_70 allows the user to explicitly synchronize divergent warps. At least on Maxwell and Pascal, the PTX SASS compiler uses two instructions to branch, SYNC and BRA. I think, SYNC guarantees that a warp is convergent at the SYNC point, whereas BRA makes no such guarantees. If you want to understand the interplay of sync (or .s suffix), branch and ssy, please read https://people.engr.ncsu.edu/hzhou/ispass_15-poster.pdf . What's worse, once a warp has become divergent on sm_60 and earlier GPUs, there's no way to reliably reconverge them. So, to avoid that problem, it critical that the PTX SASS compiler use SYNC instructions when possible. Fortunately, bar.warp.sync resolves the divergent warp problem on sm_70+. As such, the semantics of legacy bar.sync instructions have slightly changed on newer GPUs. Before in ptx 3.1, we have for bar.sync: ... Barriers are executed on a per-warp basis as if all the threads in a warp are active. Thus, if any thread in a warp executes a bar instruction, it is as if all the threads in the warp have executed the bar instruction. All threads in the warp are stalled until the barrier completes, and the arrival count for the barrier is incremented by the warp size (not the number of active threads in the warp). In conditionally executed code, a bar instruction should only be used if it is known that all threads evaluate the condition identically (the warp does not diverge). ... But in ptx 6.0, we have: ... bar.sync is equivalent to barrier.sync.aligned ... and: ... Instruction barrier has optional .aligned modifier. When specified, it indicates that all threads in CTA will execute the same barrier instruction. In conditionally executed code, an aligned barrier instruction should only be used if it is known that all threads in CTA evaluate the condition identically, otherwise behavior is undefined. ... So, in ptx 3.1 bar.sync should be executed in convergent mode (all the threads in each warp executing the same). But in ptx 6.0, bar.sync should be executed in the mode that the whole CTA is executing the same code. So going from the description of ptx, it seems indeed that the semantics of bar.sync has changed. That is however surprising, since it would break the forward compatibility that AFAIU is the idea behind ptx. So for now my hope is that this is a documentation error. I spent a lot of time debugging deadlocks with the vector length changes and I have see no changes in the SASS code generated in the newer Nvidia drivers when compared to the older ones, at lease with respect to the barrier instructions. This isn't the first time I've seen inconsistencies with thread synchronization in Nvidia's documentation. For the longest time, the "CUDA Programming Guide" provided slightly conflicting semantics for the __syncthreads() function, which ultimately gets implemented as bar.sync in PTX. The PTX JIT will now, occasionally, emit a warpsync instruction immediately before a bar.sync for Volta GPUs. That implies that warps must be convergent on entry to those threads barriers. That warps must be convergent on entry to bar.sync is already required by ptx 3.1. [ And bar.warp.sync does not force convergence, so if the warpsync instruction you mention is equivalent to bar.warp.sync then your reasoning is incorrect. ] I'm under the impression that bar.warp.sync converges all of the non-exited threads in a warp. I have not played around with the instruction yet, so I'm not sure, but what I read from the docs is that bar.warp.sync converges all of the non-exited threads in a warp only and only if it's positioned at a point post-dominating a divergent branch. Consider this case: ... if (tid.x == 0) { A; bar.warp.sync 32; B; } else { C; bar.warp.sync 32; D; } ... AFAIU, this allows bar.warp.sync to synchronize the threads in the warp, _without_ converging. You'd still need to use bar.sync or some variant of the new barrier instruction to converge the entire CTA. But at the moment, we're still generating code that's backwards compatible with sm_30. The problem in og7, and trunk, is that GCC emits barrier instructions at the wrong spots. E.g., consider the following OpenACC parallel region: #pragma acc parallel loop worker
Re: [PATCH PR81647][AARCH64] Fix handling of Unordered Comparisons in aarch64-simd.md
On Fri, Dec 15, 2017 at 11:57:46AM +, Sudi Das wrote: > Hi > > This patch fixes the inconsistent behavior observed at -O3 for the > unordered comparisons. According to the online docs > (https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gccint/Unary-and-Binary-Expressions.html), > > all of the following should not raise an FP exception: > - UNGE_EXPR > - UNGT_EXPR > - UNLE_EXPR > - UNLT_EXPR > - UNEQ_EXPR > Also ORDERED_EXPR and UNORDERED_EXPR should only return zero or one. > > The aarch64-simd.md handling of these were generating exception raising > instructions such as fcmgt. This patch changes the instructions that are > emitted to in order to not give out the exceptions. We first check each > operand for NaNs and force any elements containing NaN to zero before > using them in the compare. > > Example: UN (a, b) -> UNORDERED (a, b) | (cm (isnan (a) ? 0.0 : > a, isnan (b) ? 0.0 : b)) > > > The ORDERED_EXPR is now handled as (cmeq (a, a) & cmeq (b, b)) and > UNORDERED_EXPR as ~ORDERED_EXPR and UNEQ as (~ORDERED_EXPR | cmeq (a,b)). > > Testing done: Checked for regressions on bootstrapped > aarch64-none-linux-gnu and added a new test case. > > Is this ok for trunk? This will probably need a back-port to > gcc-7-branch as well. OK. Let it soak on trunk for a while before the backport. Thanks, James > ChangeLog Entries: > > *** gcc/ChangeLog *** > > 2017-12-15 Sudakshina Das > > PR target/81647 > * config/aarch64/aarch64-simd.md (vec_cmp): Modify > instructions for > UNLT, UNLE, UNGT, UNGE, UNEQ, UNORDERED and ORDERED. > > *** gcc/testsuite/ChangeLog *** > > 2017-12-15 Sudakshina Das > > PR target/81647 > * gcc.target/aarch64/pr81647.c: New. > diff --git a/gcc/config/aarch64/aarch64-simd.md > b/gcc/config/aarch64/aarch64-simd.md > index > f90f74fe7fd5990a97b9f4eb68f5735b7d4fb9aa..acff06c753b3e3aaa5775632929909afa4d3294b > 100644 > --- a/gcc/config/aarch64/aarch64-simd.md > +++ b/gcc/config/aarch64/aarch64-simd.md > @@ -2731,10 +2731,10 @@ > break; > } >/* Fall through. */ > -case UNGE: > +case UNLT: >std::swap (operands[2], operands[3]); >/* Fall through. */ > -case UNLE: > +case UNGT: > case GT: >comparison = gen_aarch64_cmgt; >break; > @@ -2745,10 +2745,10 @@ > break; > } >/* Fall through. */ > -case UNGT: > +case UNLE: >std::swap (operands[2], operands[3]); >/* Fall through. */ > -case UNLT: > +case UNGE: > case GE: >comparison = gen_aarch64_cmge; >break; > @@ -2771,21 +2771,35 @@ > case UNGT: > case UNLE: > case UNLT: > -case NE: > - /* FCM returns false for lanes which are unordered, so if we use > - the inverse of the comparison we actually want to emit, then > - invert the result, we will end up with the correct result. > - Note that a NE NaN and NaN NE b are true for all a, b. > - > - Our transformations are: > - a UNGE b -> !(b GT a) > - a UNGT b -> !(b GE a) > - a UNLE b -> !(a GT b) > - a UNLT b -> !(a GE b) > - a NE b -> !(a EQ b) */ > - gcc_assert (comparison != NULL); > - emit_insn (comparison (operands[0], operands[2], operands[3])); > - emit_insn (gen_one_cmpl2 (operands[0], operands[0])); > + { > + /* All of the above must not raise any FP exceptions. Thus we first > +check each operand for NaNs and force any elements containing NaN to > +zero before using them in the compare. > +Example: UN (a, b) -> UNORDERED (a, b) | > + (cm (isnan (a) ? 0.0 : a, > + isnan (b) ? 0.0 : b)) > +We use the following transformations for doing the comparisions: > +a UNGE b -> a GE b > +a UNGT b -> a GT b > +a UNLE b -> b GE a > +a UNLT b -> b GT a. */ > + > + rtx tmp0 = gen_reg_rtx (mode); > + rtx tmp1 = gen_reg_rtx (mode); > + rtx tmp2 = gen_reg_rtx (mode); > + emit_insn (gen_aarch64_cmeq (tmp0, operands[2], operands[2])); > + emit_insn (gen_aarch64_cmeq (tmp1, operands[3], operands[3])); > + emit_insn (gen_and3 (tmp2, tmp0, tmp1)); > + emit_insn (gen_and3 (tmp0, tmp0, > + lowpart_subreg (mode, operands[2], > mode))); > + emit_insn (gen_and3 (tmp1, tmp1, > + lowpart_subreg (mode, operands[3], > mode))); > + gcc_assert (comparison != NULL); > + emit_insn (comparison (operands[0], > +lowpart_subreg (mode, tmp0, > mode), > +lowpart_subreg (mode, tmp1, > mode))); > + emit_insn (gen_orn3 (operands[0], tmp2, operands[0])); > + } >break; > > case LT: > @@ -2793,25 +2807,19 @@ > case GT: > case GE: > case EQ: > +case NE: >/* The easy case. Here we emit one of FCMGE, FCMGT or FCMEQ
New Swedish PO file for 'gcc' (version 8.1-b20180128)
Hello, gentle maintainer. This is a message from the Translation Project robot. A revised PO file for textual domain 'gcc' has been submitted by the Swedish team of translators. The file is available at: http://translationproject.org/latest/gcc/sv.po (This file, 'gcc-8.1-b20180128.sv.po', has just now been sent to you in a separate email.) All other PO files for your package are available in: http://translationproject.org/latest/gcc/ Please consider including all of these in your next release, whether official or a pretest. Whenever you have a new distribution with a new version number ready, containing a newer POT file, please send the URL of that distribution tarball to the address below. The tarball may be just a pretest or a snapshot, it does not even have to compile. It is just used by the translators when they need some extra translation context. The following HTML page has been updated: http://translationproject.org/domain/gcc.html If any question arises, please contact the translation coordinator. Thank you for all your work, The Translation Project robot, in the name of your translation coordinator.
Re: Use SCEV information when aligning for vectorisation (PR 84005)
On Sat, Mar 17, 2018 at 11:45 AM, Richard Sandiford wrote: > This PR is another regression caused by the removal of the simple_iv > check in dr_analyze_innermost for BB analysis. Without splitting out > the step, we weren't able to find an underlying object whose alignment > could be increased. > > As with PR81635, I think the simple_iv was only handling one special > case of something that ought to be more general. The more general > thing here is that if the address can be analysed as a scalar > evolution, and if all updates preserve alignment N, it's possible > to align the address to N by increasing the alignment of the base > object to N. That applies also to outer loops, and to both loop > and BB analysis. > > I wasn't sure where the new functions ought to live, but tree-data-ref.c > seemed OK since (a) that already does scev analysis on addresses and > (b) you'd want to use dr_analyze_innermost first if you were analysing > a reference. > > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? > > Richard > > > 2018-03-17 Richard Sandiford > > gcc/ > PR tree-optimization/84005 > * tree-data-ref.h (get_base_for_alignment): Declare. > * tree-data-ref.c (get_base_for_alignment_1): New function. > (get_base_for_alignment): Likewise. > * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Use > get_base_for_alignment to find a suitable base object, instead > of always using drb->base_address. > > gcc/testsuite/ > PR tree-optimization/84005 > * gcc.dg/vect/bb-slp-1.c: Make sure there is no message about > failing to force the alignment. > > Index: gcc/tree-data-ref.h > === > --- gcc/tree-data-ref.h 2018-01-13 18:02:00.948360274 + > +++ gcc/tree-data-ref.h 2018-03-17 10:43:42.544669549 + > @@ -463,6 +463,7 @@ extern bool compute_all_dependences (vec > extern tree find_data_references_in_bb (struct loop *, basic_block, > vec *); > extern unsigned int dr_alignment (innermost_loop_behavior *); > +extern tree get_base_for_alignment (tree, unsigned int *); > > /* Return the alignment in bytes that DR is guaranteed to have at all > times. */ > Index: gcc/tree-data-ref.c > === > --- gcc/tree-data-ref.c 2018-02-14 13:14:36.268006810 + > +++ gcc/tree-data-ref.c 2018-03-17 10:43:42.544669549 + > @@ -5200,6 +5200,75 @@ dr_alignment (innermost_loop_behavior *d >return alignment; > } > > +/* If BASE is a pointer-typed SSA name, try to find the object that it > + is based on. Return this object X on success and store the alignment > + in bytes of BASE - &X in *ALIGNMENT_OUT. */ > + > +static tree > +get_base_for_alignment_1 (tree base, unsigned int *alignment_out) > +{ > + if (TREE_CODE (base) != SSA_NAME || !POINTER_TYPE_P (TREE_TYPE (base))) > +return NULL_TREE; > + > + gimple *def = SSA_NAME_DEF_STMT (base); > + base = analyze_scalar_evolution (loop_containing_stmt (def), base); I think it is an error to use the def stmt of base here. Instead you need to pass down the point/loop of the use. For the same reason you also want to instantiate parameters after analyzing the evolution. In the end, if the BB to be vectorized is contained in a loop nest we want to instantiate up to the point where (eventually) a DECL we can re-align appears, but using the containing loop for now looks OK. > + /* Peel chrecs and record the minimum alignment preserved by > + all steps. */ > + unsigned int alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT; > + while (TREE_CODE (base) == POLYNOMIAL_CHREC) > +{ > + unsigned int step_alignment = highest_pow2_factor (CHREC_RIGHT (base)); > + alignment = MIN (alignment, step_alignment); > + base = CHREC_LEFT (base); > +} > + > + /* Punt if the expression is too complicated to handle. */ > + if (tree_contains_chrecs (base, NULL) || !POINTER_TYPE_P (TREE_TYPE > (base))) > +return NULL_TREE; > + > + /* Analyze the base to which the steps we peeled were applied. */ > + poly_int64 bitsize, bitpos, bytepos; > + machine_mode mode; > + int unsignedp, reversep, volatilep; > + tree offset; > + base = get_inner_reference (build_fold_indirect_ref (base), ick :/ what happens if you simply use get_pointer_alignment here and strip down POINTER_PLUS_EXPRs to the ultimate LHS? (basically what get_pointer_alignment_1 does) After all get_base_for_alignment_1 itself only deals with plain SSA_NAMEs, not, say, &a_1->b.c or &MEM[a_1 + 4]. Thanks, Richard. > + &bitsize, &bitpos, &offset, &mode, > + &unsignedp, &reversep, &volatilep); > + if (!base || !multiple_p (bitpos, BITS_PER_UNIT, &bytepos)) > +return NULL_TREE; > + > + /* Restrict the alignment to that guaranteed by the offsets. */ > + unsig
Re: [og7] Update nvptx_fork/join barrier placement
On 03/19/2018 07:04 AM, Tom de Vries wrote: > On 03/09/2018 05:55 PM, Cesar Philippidis wrote: >> On 03/09/2018 08:21 AM, Tom de Vries wrote: >>> On 03/09/2018 12:31 AM, Cesar Philippidis wrote: Nvidia Volta GPUs now support warp-level synchronization. >>> >>> Well, let's try to make that statement a bit more precise. >>> >>> All Nvidia architectures have supported synchronization of threads in a >>> warp on a very basic level: by means of convergence (and unfortunately, >>> we've seen that this is very error-prone). >>> >>> What is new in ptx 6.0 combined with sm_70 is the ability to sync >>> divergent threads without having to converge, f.i. by using new >>> instructions bar.warp.sync and barrier.sync. >> >> Yes. The major difference sm_70 GPU architectures and earlier GPUs is >> that sm_70 allows the user to explicitly synchronize divergent warps. At >> least on Maxwell and Pascal, the PTX SASS compiler uses two instructions >> to branch, SYNC and BRA. I think, SYNC guarantees that a warp is >> convergent at the SYNC point, whereas BRA makes no such guarantees. >> > > If you want to understand the interplay of sync (or .s suffix), branch > and ssy, please read > https://people.engr.ncsu.edu/hzhou/ispass_15-poster.pdf . Interesting, thanks! >> What's worse, once a warp has become divergent on sm_60 and earlier >> GPUs, there's no way to reliably reconverge them. So, to avoid that >> problem, it critical that the PTX SASS compiler use SYNC instructions >> when possible. Fortunately, bar.warp.sync resolves the divergent warp >> problem on sm_70+. >> As such, the semantics of legacy bar.sync instructions have slightly changed on newer GPUs. >>> >>> Before in ptx 3.1, we have for bar.sync: >>> ... >>> Barriers are executed on a per-warp basis as if all the threads in a >>> warp are active. Thus, if any thread in a warp executes a bar >>> instruction, it is as if all the threads in the warp have executed >>> the bar instruction. All threads in the warp are stalled until the >>> barrier completes, and the arrival count for the barrier is incremented >>> by the warp size (not the number of active threads in the warp). In >>> conditionally executed code, a bar instruction should only be used if it >>> is known that all threads evaluate the condition identically (the warp >>> does not diverge). >>> ... >>> >>> But in ptx 6.0, we have: >>> ... >>> bar.sync is equivalent to barrier.sync.aligned >>> ... >>> and: >>> ... >>> Instruction barrier has optional .aligned modifier. When specified, it >>> indicates that all threads in CTA will execute the same barrier >>> instruction. In conditionally executed code, an aligned barrier >>> instruction should only be used if it is known that all threads in >>> CTA evaluate the condition identically, otherwise behavior is undefined. >>> ... >>> >>> So, in ptx 3.1 bar.sync should be executed in convergent mode (all the >>> threads in each warp executing the same). But in ptx 6.0, bar.sync >>> should be executed in the mode that the whole CTA is executing the same >>> code. >>> >>> So going from the description of ptx, it seems indeed that the semantics >>> of bar.sync has changed. That is however surprising, since it would >>> break the forward compatibility that AFAIU is the idea behind ptx. >>> >>> So for now my hope is that this is a documentation error. >> >> I spent a lot of time debugging deadlocks with the vector length changes >> and I have see no changes in the SASS code generated in the newer Nvidia >> drivers when compared to the older ones, at lease with respect to the >> barrier instructions. This isn't the first time I've seen >> inconsistencies with thread synchronization in Nvidia's documentation. >> For the longest time, the "CUDA Programming Guide" provided slightly >> conflicting semantics for the __syncthreads() function, which ultimately >> gets implemented as bar.sync in PTX. >> The PTX JIT will now, occasionally, emit a warpsync instruction immediately before a bar.sync for Volta GPUs. That implies that warps must be convergent on entry to those threads barriers. >>> >>> That warps must be convergent on entry to bar.sync is already required >>> by ptx 3.1. >>> >>> [ And bar.warp.sync does not force convergence, so if the warpsync >>> instruction you mention is equivalent to bar.warp.sync then your >>> reasoning is incorrect. ] >> >> I'm under the impression that bar.warp.sync converges all of the >> non-exited threads in a warp. > > I have not played around with the instruction yet, so I'm not sure, but > what I read from the docs is that bar.warp.sync converges all of the > non-exited threads in a warp only and only if it's positioned at a point > post-dominating a divergent branch. > > Consider this case: > ... > if (tid.x == 0) > { > A; > bar.warp.sync 32; > B; > } > else > { > C; > bar.warp.sync 32; > D; > } > ... > AFAIU, this allows bar.warp.sync to synchronize the thr
Re: [og7] Update nvptx_fork/join barrier placement
On 03/19/2018 03:55 PM, Cesar Philippidis wrote: Is your patch purely for debugging, or are you planning on committing it to og7 and trunk? I plan to commit it. We have no test-cases testing the neutering code order explicitly. So this check is the only thing that allows us to detect regressions, other than execution failures on newer archs. Thanks, - Tom
[PATCH][PR sanitizer/78651] Incorrect exception handling when catch clause uses local class and PIC and sanitizer are active
Hi, as noted in bugzilla, ASan inserts redzones for `.LDFCM*' variables and breaks internal ABI between GCC and libstdc++ because libstdc++ tries to obtain a pointer to `typeinfo for (anonymous namespace)::SomeRandomType' from a constant offset from `.LDFCM*' labels and hits these redzones. This can be trivially fixed by not sanitizing `.LDFCM*' variables (and other debug variables) at all. Attached patch is tested/bootstrapped on x86_64-unknown-linux-gnu and ppc64le-redhat-linux targets. Is it OK for trunk now, or should I wait for stage 1? -Maxim gcc/ PR sanitizer/78651 * dwarf2asm.c (dw2_output_indirect_constant_1): Call assemble_variable with new parameter. * output.h (assemble_variable): Add new parameter. * varasm.c (assemble_variable): Add new parameter. Do not protect variable with ASan if is_debug_var is true. gcc/testsuite/ChangeLog: PR sanitizer/78651 * g++.dg/asan/pr78651.C: New test. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index b5b5559..a26f05e 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,12 @@ +2018-03-19 Maxim Ostapenko + + PR sanitizer/78651 + * dwarf2asm.c (dw2_output_indirect_constant_1): Call assemble_variable + with new parameter. + * output.h (assemble_variable): Add new parameter. + * varasm.c (assemble_variable): Add new parameter. Do not protect + variable with ASan if is_debug_var is true. + 2018-03-19 Richard Biener PR tree-optimization/84929 diff --git a/gcc/dwarf2asm.c b/gcc/dwarf2asm.c index e9b18b8..5c90994 100644 --- a/gcc/dwarf2asm.c +++ b/gcc/dwarf2asm.c @@ -967,7 +967,7 @@ dw2_output_indirect_constant_1 (const char *sym, tree id) } sym_ref = gen_rtx_SYMBOL_REF (Pmode, sym); - assemble_variable (decl, 1, 1, 1); + assemble_variable (decl, 1, 1, 1, 1); assemble_integer (sym_ref, POINTER_SIZE_UNITS, POINTER_SIZE, 1); return 0; diff --git a/gcc/output.h b/gcc/output.h index f708cc7..678c370 100644 --- a/gcc/output.h +++ b/gcc/output.h @@ -200,7 +200,7 @@ extern void assemble_end_function (tree, const char *); to define things that have had only tentative definitions. DONT_OUTPUT_DATA if nonzero means don't actually output the initial value (that will be done by the caller). */ -extern void assemble_variable (tree, int, int, int); +extern void assemble_variable (tree, int, int, int, int is_debug_var = 0); /* Put the vtable verification constructor initialization function into the preinit array. */ diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 868d8e8..6ff9217 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,8 @@ +2018-03-19 Maxim Ostapenko + + PR sanitizer/78651 + * g++.dg/asan/pr78651.C: New test. + 2018-03-19 Richard Biener PR tree-optimization/84929 diff --git a/gcc/testsuite/g++.dg/asan/pr78651.C b/gcc/testsuite/g++.dg/asan/pr78651.C new file mode 100644 index 000..3f14be7 --- /dev/null +++ b/gcc/testsuite/g++.dg/asan/pr78651.C @@ -0,0 +1,24 @@ +// { dg-do run { target fpic } } + +struct A { }; + +namespace { + +void thisThrows () { + throw A(); +} + +struct SomeRandomType {}; +} + +int main() { + try { +thisThrows(); + } + catch (SomeRandomType) { +throw; + } + catch (A) { + } + return 0; +} diff --git a/gcc/varasm.c b/gcc/varasm.c index 2b5c70c..b18d011 100644 --- a/gcc/varasm.c +++ b/gcc/varasm.c @@ -2158,11 +2158,14 @@ assemble_undefined_decl (tree decl) AT_END is nonzero if this is the special handling, at end of compilation, to define things that have had only tentative definitions. DONT_OUTPUT_DATA if nonzero means don't actually output the - initial value (that will be done by the caller). */ + initial value (that will be done by the caller). + IS_DEBUG_VAR if nonzero that we are assembling debug variable. + This information is useful for ASan, see pr78651. */ void assemble_variable (tree decl, int top_level ATTRIBUTE_UNUSED, - int at_end ATTRIBUTE_UNUSED, int dont_output_data) + int at_end ATTRIBUTE_UNUSED, int dont_output_data, + int is_debug_var) { const char *name; rtx decl_rtl, symbol; @@ -2258,6 +2261,7 @@ assemble_variable (tree decl, int top_level ATTRIBUTE_UNUSED, align_variable (decl, dont_output_data); if ((flag_sanitize & SANITIZE_ADDRESS) + && !is_debug_var && asan_protect_global (decl)) { asan_protected = true;
Re: [PATCH][PR sanitizer/78651] Incorrect exception handling when catch clause uses local class and PIC and sanitizer are active
On 03/19/2018 06:55 PM, Jakub Jelinek wrote: On Mon, Mar 19, 2018 at 06:44:39PM +0300, Maxim Ostapenko wrote: as noted in bugzilla, ASan inserts redzones for `.LDFCM*' variables and breaks internal ABI between GCC and libstdc++ because libstdc++ tries to obtain a pointer to `typeinfo for (anonymous namespace)::SomeRandomType' from a constant offset from `.LDFCM*' labels and hits these redzones. This can be trivially fixed by not sanitizing `.LDFCM*' variables (and other debug variables) at all. I don't like very much adding an extra argument for such so frequently used function to handle a corner case. Wouldn't just: /* Don't instrument this decl with -fsanitize=*address. */ unsigned int save_flag_sanitize = flag_sanitize; flag_sanitize &= ~(SANITIZE_ADDRESS | SANITIZE_USER_ADDRESS | SANITIZE_KERNEL_ADDRESS); assemble_variable (decl, 1, 1, 1); flag_sanitize = save_flag_sanitize; DTRT? Yes, it works, attaching the patch. -Maxim diff --git a/gcc/ChangeLog b/gcc/ChangeLog index b5b5559..356e68c 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,9 @@ +2018-03-19 Maxim Ostapenko + + PR sanitizer/78651 + * dwarf2asm.c (dw2_output_indirect_constant_1): Disable ASan before + calling assemble_variable. + 2018-03-19 Richard Biener PR tree-optimization/84929 diff --git a/gcc/dwarf2asm.c b/gcc/dwarf2asm.c index e9b18b8..065406b 100644 --- a/gcc/dwarf2asm.c +++ b/gcc/dwarf2asm.c @@ -967,7 +967,11 @@ dw2_output_indirect_constant_1 (const char *sym, tree id) } sym_ref = gen_rtx_SYMBOL_REF (Pmode, sym); + unsigned int save_flag_sanitize = flag_sanitize; + flag_sanitize &= ~(SANITIZE_ADDRESS | SANITIZE_USER_ADDRESS + | SANITIZE_KERNEL_ADDRESS); assemble_variable (decl, 1, 1, 1); + flag_sanitize = save_flag_sanitize; assemble_integer (sym_ref, POINTER_SIZE_UNITS, POINTER_SIZE, 1); return 0; diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 868d8e8..6ff9217 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,8 @@ +2018-03-19 Maxim Ostapenko + + PR sanitizer/78651 + * g++.dg/asan/pr78651.C: New test. + 2018-03-19 Richard Biener PR tree-optimization/84929 diff --git a/gcc/testsuite/g++.dg/asan/pr78651.C b/gcc/testsuite/g++.dg/asan/pr78651.C new file mode 100644 index 000..3f14be7 --- /dev/null +++ b/gcc/testsuite/g++.dg/asan/pr78651.C @@ -0,0 +1,24 @@ +// { dg-do run { target fpic } } + +struct A { }; + +namespace { + +void thisThrows () { + throw A(); +} + +struct SomeRandomType {}; +} + +int main() { + try { +thisThrows(); + } + catch (SomeRandomType) { +throw; + } + catch (A) { + } + return 0; +}
Re: [PATCH][PR sanitizer/78651] Incorrect exception handling when catch clause uses local class and PIC and sanitizer are active
On Mon, Mar 19, 2018 at 06:44:39PM +0300, Maxim Ostapenko wrote: > as noted in bugzilla, ASan inserts redzones for `.LDFCM*' variables and > breaks internal ABI between GCC and libstdc++ because libstdc++ tries to > obtain a pointer to `typeinfo for (anonymous namespace)::SomeRandomType' > from a constant offset from `.LDFCM*' labels and hits these redzones. This > can be trivially fixed by not sanitizing `.LDFCM*' variables (and other > debug variables) at all. I don't like very much adding an extra argument for such so frequently used function to handle a corner case. Wouldn't just: /* Don't instrument this decl with -fsanitize=*address. */ unsigned int save_flag_sanitize = flag_sanitize; flag_sanitize &= ~(SANITIZE_ADDRESS | SANITIZE_USER_ADDRESS | SANITIZE_KERNEL_ADDRESS); assemble_variable (decl, 1, 1, 1); flag_sanitize = save_flag_sanitize; DTRT? Jakub
Re: [PATCH][PR sanitizer/78651] Incorrect exception handling when catch clause uses local class and PIC and sanitizer are active
On 03/19/2018 07:35 PM, Jakub Jelinek wrote: On Mon, Mar 19, 2018 at 07:28:11PM +0300, Maxim Ostapenko wrote: Yes, it works, attaching the patch. Can you please add a short comment why we do this? Also, the testcase needs work, see below. Ok for trunk with those changes and after a while for affected release branches as well. Ok, thanks, attaching the patch I'm going to apply. -Maxim diff --git a/gcc/ChangeLog b/gcc/ChangeLog index b5b5559..356e68c 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,9 @@ +2018-03-19 Maxim Ostapenko + + PR sanitizer/78651 + * dwarf2asm.c (dw2_output_indirect_constant_1): Disable ASan before + calling assemble_variable. + 2018-03-19 Richard Biener PR tree-optimization/84929 diff --git a/gcc/dwarf2asm.c b/gcc/dwarf2asm.c index e9b18b8..2e108ac 100644 --- a/gcc/dwarf2asm.c +++ b/gcc/dwarf2asm.c @@ -967,7 +967,13 @@ dw2_output_indirect_constant_1 (const char *sym, tree id) } sym_ref = gen_rtx_SYMBOL_REF (Pmode, sym); + /* Disable ASan for decl because redzones cause ABI breakage between GCC and + libstdc++ for `.LDFCM*' variables. See PR 78651 for details. */ + unsigned int save_flag_sanitize = flag_sanitize; + flag_sanitize &= ~(SANITIZE_ADDRESS | SANITIZE_USER_ADDRESS + | SANITIZE_KERNEL_ADDRESS); assemble_variable (decl, 1, 1, 1); + flag_sanitize = save_flag_sanitize; assemble_integer (sym_ref, POINTER_SIZE_UNITS, POINTER_SIZE, 1); return 0; diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 868d8e8..6ff9217 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,8 @@ +2018-03-19 Maxim Ostapenko + + PR sanitizer/78651 + * g++.dg/asan/pr78651.C: New test. + 2018-03-19 Richard Biener PR tree-optimization/84929 diff --git a/gcc/testsuite/g++.dg/asan/pr78651.C b/gcc/testsuite/g++.dg/asan/pr78651.C new file mode 100644 index 000..09f1be5 --- /dev/null +++ b/gcc/testsuite/g++.dg/asan/pr78651.C @@ -0,0 +1,26 @@ +// PR sanitizer/78651 +// { dg-do run } +// { dg-additional-options "-fpic" { target fpic } } + +struct A { }; + +namespace { + +void thisThrows () { + throw A(); +} + +struct SomeRandomType {}; +} + +int main() { + try { +thisThrows(); + } + catch (SomeRandomType) { +throw; + } + catch (A) { + } + return 0; +}
Re: [og7] Update nvptx_fork/join barrier placement
On 03/19/2018 03:55 PM, Cesar Philippidis wrote: Note that this changes ordering of the vector-neutering jump and worker-neutering jump at the end. In principle, this should not be harmful, but it violates the invariant that vector-neutering branch-around code should be as short-lived as possible. So, this needs to be fixed. I've found this issue by adding verification of the neutering, as attached below. ACK, thanks. I'll take a closer look at this. I've got a tentative patch at https://gcc.gnu.org/bugzilla/attachment.cgi?id=43707 ( PR84952 - "[nvptx] bar.sync generated in divergent code" ). Thanks, - Tom
Re: [GCC 6] PATCH: Backport -mindirect-branch= patches
On Mon, Mar 19, 2018 at 10:00 AM, H.J. Lu wrote: > On Thu, Mar 15, 2018 at 1:00 PM, H.J. Lu wrote: >> On Thu, Mar 15, 2018 at 12:54 PM, Jan Hubicka wrote: On Thu, Mar 15, 2018 at 10:47 AM, Jan Hubicka wrote: >> On Thu, Mar 15, 2018 at 9:03 AM, Jan Hubicka wrote: >> >> > What is the reason for using different names for return and >> >> > indirect thunks at first place? >> >> > >> >> >> >> These 2 thunks are identical. But one may want to provide an >> >> alternate thunk only for >> >> indirect branch and leaves return thunk alone. You can't do that if >> >> both have the same >> >> name. >> > >> > Hmm, OK, what is the benefit to have two different thunks? It is just >> > safety precaution so we could adjust one without adjusting the other >> > in >> > future? >> > >> >> That is correct. > > Hmm, I guess the patch is OK. Things are slightly more flexible this way > and > duplicating thunk is not terribly expensive. One can always link with > non-comdat+ alias. > That is true. OK to backport to GCC 7 after a few days? >>> OK. I suppose you are testing return thunks on some real environment, like >>> GCC bootstrap :) >> >> Yes. >> > > I checked my backport into GCC 7 branch. > > Here are GCC 6 patches to backport all -mindirect-branch= patches. > OK for GCC 6. > Resend the patches compressed with xz. Sorry if you get the patches twice. -- H.J. 0013-i386-Don-t-generate-alias-for-function-return-thunk.patch.xz Description: application/xz 0001-i386-Move-struct-ix86_frame-to-machine_function.patch.xz Description: application/xz 0012-i386-Add-TARGET_INDIRECT_BRANCH_REGISTER.patch.xz Description: application/xz 0011-i386-Update-mfunction-return-for-return-with-pop.patch.xz Description: application/xz 0010-i386-Pass-INVALID_REGNUM-as-invalid-register-number.patch.xz Description: application/xz 0009-Use-INVALID_REGNUM-in-indirect-thunk-processing.patch.xz Description: application/xz 0008-x86-Disallow-mindirect-branch-mfunction-return-with-.patch.xz Description: application/xz 0007-x86-Add-V-register-operand-modifier.patch.xz Description: application/xz 0006-x86-Add-mindirect-branch-register.patch.xz Description: application/xz 0005-x86-Add-mfunction-return.patch.xz Description: application/xz 0004-x86-Add-mindirect-branch.patch.xz Description: application/xz 0003-i386-Use-const-reference-of-struct-ix86_frame-to-avo.patch.xz Description: application/xz 0002-i386-Use-reference-of-struct-ix86_frame-to-avoid-cop.patch.xz Description: application/xz
[PR middle-end/70359] uncoalesce IVs outside of loops
Hi Richard. As discussed in the PR, the problem here is that we have two different iterations of an IV live outside of a loop. This inhibits us from using autoinc/dec addressing on ARM, and causes extra lea's on x86. An abbreviated example is this: loop: # p_9 = PHI p_20 = p_9 + 18446744073709551615; goto loop p_24 = p_9 + 18446744073709551614; MEM[(char *)p_20 + -1B] = 45; Here we have both the previous IV (p_9) and the current IV (p_20) used outside of the loop. On Arm this keeps us from using auto-dec addressing, because one use is -2 and the other one is -1. With the attached patch we attempt to rewrite out-of-loop uses of the IV in terms of the current/last IV (p_20 in the case above). With it, we end up with: p_24 = p_20 + 18446744073709551615; *p_24 = 45; ...which helps both x86 and Arm. As you have suggested in comment 38 on the PR, I handle specially out-of-loop IV uses of the form IV+CST and propagate those accordingly (along with the MEM_REF above). Otherwise, in less specific cases, we un-do the IV increment, and use that value in all out-of-loop uses. For instance, in the attached testcase, we rewrite: george (p_9); into _26 = p_20 + 1; ... george (_26); The attached testcase tests the IV+CST specific case, as well as the more generic case with george(). Although the original PR was for ARM, this behavior can be noticed on x86, so I tested on x86 with a full bootstrap + tests. I also ran the specific test on an x86 cross ARM build and made sure we had 2 auto-dec with the test. For the original test (slightly different than the testcase in this patch), with this patch we are at 104 bytes versus 116 without it. There is still the issue of a division optimization which would further reduce the code size. I will discuss this separately as it is independent from this patch. Oh yeah, we could make this more generic, and maybe handle any multiple of the constant, or perhaps *= and /=. Perhaps something for next stage1... OK for trunk? Aldy gcc/ PR middle-end/70359 * tree-outof-ssa.c (uncoalesce_ivs_outside_loop): New. (is_iv_plus_constant): New. (maybe_optimize_iv_plus_constant): New. (undo_iv_increment): New. diff --git a/gcc/testsuite/gcc.dg/pr70359.c b/gcc/testsuite/gcc.dg/pr70359.c new file mode 100644 index 000..b903548e45d --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr70359.c @@ -0,0 +1,44 @@ +/* { dg-do compile } */ +/* { dg-options "-Os -fdump-rtl-expand-details" } */ + +/* Test uncoalesce_ivs_outside_loop(). */ + +void george (const char *); + +char* inttostr(int i, char* buf, int len) +{ + unsigned int ui = (i > 0) ? i : -i; + char *p = buf + len - 1; + *p = '\0'; + do { +*--p = '0' + (ui % 10); + } while ((ui /= 10) != 0); + if (i < 0) { +/* p+1 is the IV in the loop. This line should trigger un-doing + the increment and cause this: */ +/* { dg-final { scan-rtl-dump-times "Adjusted one out of loop IV" 1 "expand" } } */ +george (p+1); + +*--p = '-'; + } + return p; +} + +/* At the last gimple dump we have: + + p_22 = p_8 + 4294967294; + MEM[(char *)p_19 + 4294967295B] = 45; + + The above should be converted by RTL expansion time into: + + p_22 = p_19 + 4294967295; + *p_22 = 45; +*/ + +/* { dg-final { scan-rtl-dump "p_\[0-9\]+ = p_\[0-9\]+ \\+ \[0-9\]+;\[\n\r \]+\\*p_\[0-9\]+ = 45;" "expand" } } */ + +/* On ARM we should get two post-increment stores. */ +/* { dg-final { scan-assembler-times "str\[^\n\r]+!" 2 { target arm-*-* } } } */ + +/* On x86 we should not get more than two lea's. */ +/* { dg-final { scan-assembler-times "lea.\t" 2 { target i?86-*-* x86_64-*-* } } } */ diff --git a/gcc/tree-outof-ssa.c b/gcc/tree-outof-ssa.c index 59bdcd6fadd..42cc53d7052 100644 --- a/gcc/tree-outof-ssa.c +++ b/gcc/tree-outof-ssa.c @@ -43,6 +43,9 @@ along with GCC; see the file COPYING3. If not see #include "tree-ssa-coalesce.h" #include "tree-outof-ssa.h" #include "dojump.h" +#include "fold-const.h" +#include "tree-ssa-loop-niter.h" +#include "cfgloop.h" /* FIXME: A lot of code here deals with expanding to RTL. All that code should be in cfgexpand.c. */ @@ -1035,6 +1038,221 @@ trivially_conflicts_p (basic_block bb, tree result, tree arg) return false; } +/* Return TRUE if STMT is in the form: x_99 = SSA +p INT. */ +static bool +is_iv_plus_constant (gimple *stmt, tree ssa) +{ + return is_gimple_assign (stmt) +&& gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR +&& gimple_assign_rhs1 (stmt) == ssa +&& TREE_CODE (gimple_assign_rhs2 (stmt)) == INTEGER_CST; +} + +/* Helper function for uncoalesce_ivs_outside_loop. Optimize out of + loop uses of the IV when they are of the form IV+CST. + + Return TRUE if we were able to rewrite: + iv_incr = iv + CST + goto LOOP; + iv_use = iv + 2*CST +into: + iv_use = iv_incr + CST + + IV is the induction variable for the current iteration. + IV_INCR is the IV value fo
Re: [og7] Update nvptx_fork/join barrier placement
On 03/19/2018 10:02 AM, Tom de Vries wrote: > On 03/19/2018 03:55 PM, Cesar Philippidis wrote: >>> Note that this changes ordering of the vector-neutering jump and >>> worker-neutering jump at the end. In principle, this should not be >>> harmful, but it violates the invariant that vector-neutering >>> branch-around code should be as short-lived as possible. So, this needs >>> to be fixed. >>> >>> I've found this issue by adding verification of the neutering, as >>> attached below. >> ACK, thanks. I'll take a closer look at this. > > I've got a tentative patch at > https://gcc.gnu.org/bugzilla/attachment.cgi?id=43707 ( PR84952 - > "[nvptx] bar.sync generated in divergent code" ). I attached my WIP patch. But, given that you've spent a lot of time on this, I'll let you continue working on it. Just remember to backport any fix to og7. Thanks, Cesar diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c index a6f444340fd..0d288cb81ba 100644 --- a/gcc/config/nvptx/nvptx.c +++ b/gcc/config/nvptx/nvptx.c @@ -4037,6 +4037,22 @@ nvptx_single (unsigned mask, basic_block from, basic_block to) return; } + /* NVPTX_BARSYNC barriers are placed immediately before NVPTX_JOIN + in order to ensure that all of the threads in a CTA reach the + barrier. Don't nueter BLOCK if head is NVPTX_BARSYNC and tail is + NVPTX_JOIN. */ + if (from == to + && recog_memoized (head) == CODE_FOR_nvptx_barsync + && recog_memoized (tail) == CODE_FOR_nvptx_join) +return; + + /* Adjust HEAD to point to the NVPTX_JOIN instruction after a + NVPTX_BARSYNC, so that any successive state neutering code does + not get placed before the dummy JOIN comment. */ + if (recog_memoized (head) == CODE_FOR_nvptx_barsync + && recog_memoized (NEXT_INSN (head)) == CODE_FOR_nvptx_join) +head = NEXT_INSN (head); + /* Insert the vector test inside the worker test. */ unsigned mode; rtx_insn *before = tail; @@ -4057,7 +4073,23 @@ nvptx_single (unsigned mask, basic_block from, basic_block to) br = gen_br_true (pred, label); else br = gen_br_true_uni (pred, label); - emit_insn_before (br, head); + + if (recog_memoized (head) == CODE_FOR_nvptx_forked + && recog_memoized (NEXT_INSN (head)) == CODE_FOR_nvptx_barsync) + { + head = NEXT_INSN (head); + emit_insn_after (br, head); + } + else if (recog_memoized (head) == CODE_FOR_nvptx_join) + { + if (recog_memoized (NEXT_INSN (head)) == CODE_FOR_br_true_uni + && mode == GOMP_DIM_VECTOR) + emit_insn_after (br, NEXT_INSN (head)); + else + emit_insn_after (br, head); + } + else + emit_insn_before (br, head); LABEL_NUSES (label)++; if (tail_branch) @@ -4276,7 +4308,7 @@ nvptx_process_pars (parallel *par) nvptx_wpropagate (true, par->forked_block, par->fork_insn); /* Insert begin and end synchronizations. */ emit_insn_after (nvptx_wsync (false), par->forked_insn); - emit_insn_before (nvptx_wsync (true), par->joining_insn); + emit_insn_before (nvptx_wsync (true), par->join_insn); } else if (par->mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR)) nvptx_vpropagate (par->forked_block, par->forked_insn);
[PATCH] RISC-V: Fix bootstrap failure.
This fixes the RISC-V bootstrap failure, based on a suggestion/question from Serge Belyshev asking why the patch wasn't using PREFERRED_STACK_BOUNDARY, which turns out to be a good solution. So the patch renames STACK_BOUNDARY to PREFERRED_STACK_BOUNDARY, and renames MIN_STACK_BOUNDARY to STACK_BOUNDARY. This was tested with a cross compiler build with extra warning option to force the failure, and a cross make check. Tested with a native build. Tested by dissembling code with and without the patch to verify that I didn't accidentally change the ABI. And hand testing to verify that the option -mpreferred-stack-boundary was still working. Committed. Jim gcc/ PR bootstrap/84856 * config/riscv/riscv.c (riscv_function_arg_boundary): Use PREFERRED_STACK_BOUNDARY instead of STACK_BOUNDARY. (riscv_first_stack_step): Likewise. (riscv_option_override): Use STACK_BOUNDARY instead of MIN_STACK_BOUNDARY. * config/riscv/riscv.h (STACK_BOUNDARY): Renamed from MIN_STACK_BOUNDARY. (BIGGEST_ALIGNMENT): Set to 128. (PREFERRED_STACK_BOUNDARY): Renamed from STACK_BOUNDARY. (RISCV_STACK_ALIGN): Use PREFERRED_STACK_BOUNDARY instead of STACK_BOUNDARY. --- gcc/config/riscv/riscv.c | 8 gcc/config/riscv/riscv.h | 8 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c index c38f6c394d5..0a75c8ae17f 100644 --- a/gcc/config/riscv/riscv.c +++ b/gcc/config/riscv/riscv.c @@ -2202,7 +2202,7 @@ riscv_expand_conditional_branch (rtx label, rtx_code code, rtx op0, rtx op1) /* Implement TARGET_FUNCTION_ARG_BOUNDARY. Every parameter gets at least PARM_BOUNDARY bits of alignment, but will be given anything up - to STACK_BOUNDARY bits if the type requires it. */ + to PREFERRED_STACK_BOUNDARY bits if the type requires it. */ static unsigned int riscv_function_arg_boundary (machine_mode mode, const_tree type) @@ -2215,7 +2215,7 @@ riscv_function_arg_boundary (machine_mode mode, const_tree type) else alignment = type ? TYPE_ALIGN (type) : GET_MODE_ALIGNMENT (mode); - return MIN (STACK_BOUNDARY, MAX (PARM_BOUNDARY, alignment)); + return MIN (PREFERRED_STACK_BOUNDARY, MAX (PARM_BOUNDARY, alignment)); } /* If MODE represents an argument that can be passed or returned in @@ -3506,7 +3506,7 @@ riscv_first_stack_step (struct riscv_frame_info *frame) return frame->total_size; HOST_WIDE_INT min_first_step = frame->total_size - frame->fp_sp_offset; - HOST_WIDE_INT max_first_step = IMM_REACH / 2 - STACK_BOUNDARY / 8; + HOST_WIDE_INT max_first_step = IMM_REACH / 2 - PREFERRED_STACK_BOUNDARY / 8; HOST_WIDE_INT min_second_step = frame->total_size - max_first_step; gcc_assert (min_first_step <= max_first_step); @@ -4137,7 +4137,7 @@ riscv_option_override (void) riscv_stack_boundary = ABI_STACK_BOUNDARY; if (riscv_preferred_stack_boundary_arg) { - int min = ctz_hwi (MIN_STACK_BOUNDARY / 8); + int min = ctz_hwi (STACK_BOUNDARY / 8); int max = 8; if (!IN_RANGE (riscv_preferred_stack_boundary_arg, min, max)) diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h index 6144e267727..ebd80c0a5f2 100644 --- a/gcc/config/riscv/riscv.h +++ b/gcc/config/riscv/riscv.h @@ -124,13 +124,13 @@ along with GCC; see the file COPYING3. If not see #define FUNCTION_BOUNDARY (TARGET_RVC ? 16 : 32) /* The smallest supported stack boundary the calling convention supports. */ -#define MIN_STACK_BOUNDARY (2 * BITS_PER_WORD) +#define STACK_BOUNDARY (2 * BITS_PER_WORD) /* The ABI stack alignment. */ #define ABI_STACK_BOUNDARY 128 /* There is no point aligning anything to a rounder boundary than this. */ -#define BIGGEST_ALIGNMENT STACK_BOUNDARY +#define BIGGEST_ALIGNMENT 128 /* The user-level ISA permits unaligned accesses, but they are not required of the privileged architecture. */ @@ -482,7 +482,7 @@ enum reg_class `crtl->outgoing_args_size'. */ #define OUTGOING_REG_PARM_STACK_SPACE(FNTYPE) 1 -#define STACK_BOUNDARY riscv_stack_boundary +#define PREFERRED_STACK_BOUNDARY riscv_stack_boundary /* Symbolic macros for the registers used to return integer and floating point values. */ @@ -540,7 +540,7 @@ typedef struct { /* Align based on stack boundary, which might have been set by the user. */ #define RISCV_STACK_ALIGN(LOC) \ - (((LOC) + ((STACK_BOUNDARY/8)-1)) & -(STACK_BOUNDARY/8)) + (((LOC) + ((PREFERRED_STACK_BOUNDARY/8)-1)) & -(PREFERRED_STACK_BOUNDARY/8)) /* EXIT_IGNORE_STACK should be nonzero if, when returning from a function, the stack pointer does not matter. The value is tested only in -- 2.14.1
C++ PATCH for c++/84927, ICE with NSDMI and reference
This nice test crashes in verify_constructor_flags because while the constructor as a whole is TREE_CONSTANT, one of its elements is not. We wound up in this situation because when we first get to cxx_eval_bare_aggregate, ctx->ctor is "{}", which is TREE_CONSTANT, while T is "{.r=(int &) &j, .i=*(&)->r}". We evaluate the first element of T and then append it to the empty ctx->ctor, so now ctx->ctor is "{.r=(int &) &j}", a constant ctor with a non-constant element. Then we get onto evaluating the second element of T. The second elt has a PLACEHOLDER_EXPR, so we get to 4695 case PLACEHOLDER_EXPR: 4696 /* Use of the value or address of the current object. */ 4697 if (tree ctor = lookup_placeholder (ctx, lval, TREE_TYPE (t))) 4698 return cxx_eval_constant_expression (ctx, ctor, lval, 4699 non_constant_p, overflow_p); where lookup_placeholder returns "{.r=(int &) &j}", which we try to evaluate, and crash in verify_constructor_flags in the CONSTRUCTOR case. I think let's update the flags of the constructor as we evaluate elements in cxx_eval_bare_aggregate; this should be cheaper than calling recompute_constructor_flags someplace else. I took the liberty to remove a stale FIXME; testing didn't reveal any new issues. Bootstrapped/regtested on x86_64-linux, ok for trunk? 2018-03-19 Marek Polacek PR c++/84927 * constexpr.c (cxx_eval_bare_aggregate): Update constructor's flags as we evaluate the elements. (cxx_eval_constant_expression): Verify constructor's flags unconditionally. * g++.dg/cpp1y/nsdmi-aggr9.C: New test. diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c index 05a1cb64d61..894bcd0bb3e 100644 --- gcc/cp/constexpr.c +++ gcc/cp/constexpr.c @@ -2880,7 +2880,12 @@ cxx_eval_bare_aggregate (const constexpr_ctx *ctx, tree t, (*p)->last().value = elt; } else - CONSTRUCTOR_APPEND_ELT (*p, index, elt); + { + CONSTRUCTOR_APPEND_ELT (*p, index, elt); + /* Adding an element might change the ctor's flags. */ + TREE_CONSTANT (ctx->ctor) = constant_p; + TREE_SIDE_EFFECTS (ctx->ctor) = side_effects_p; + } } if (*non_constant_p || !changed) return t; @@ -4530,11 +4535,7 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t, { /* Don't re-process a constant CONSTRUCTOR, but do fold it to VECTOR_CST if applicable. */ - /* FIXME after GCC 6 branches, make the verify unconditional. */ - if (CHECKING_P) - verify_constructor_flags (t); - else - recompute_constructor_flags (t); + verify_constructor_flags (t); if (TREE_CONSTANT (t)) return fold (t); } diff --git gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr9.C gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr9.C index e69de29bb2d..4e13fc5c9d8 100644 --- gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr9.C +++ gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr9.C @@ -0,0 +1,14 @@ +// PR c++/84927 - ICE with NSDMI and reference +// { dg-do compile { target c++14 } } + +struct A +{ + int& r; + int i = r; +}; + +void foo() +{ + int j; + A a = A{j}; +} Marek
Re: C++ PATCH for c++/84927, ICE with NSDMI and reference
OK. On Mon, Mar 19, 2018 at 2:22 PM, Marek Polacek wrote: > This nice test crashes in verify_constructor_flags because while the > constructor as a whole is TREE_CONSTANT, one of its elements is not. > > We wound up in this situation because when we first get to > cxx_eval_bare_aggregate, ctx->ctor is "{}", which is TREE_CONSTANT, while T is > "{.r=(int &) &j, .i=*(&)->r}". We evaluate the > first element of T and then append it to the empty ctx->ctor, so now ctx->ctor > is "{.r=(int &) &j}", a constant ctor with a non-constant element. Then we > get > onto evaluating the second element of T. The second elt has a > PLACEHOLDER_EXPR, so we get to > 4695 case PLACEHOLDER_EXPR: > 4696 /* Use of the value or address of the current object. */ > 4697 if (tree ctor = lookup_placeholder (ctx, lval, TREE_TYPE (t))) > 4698 return cxx_eval_constant_expression (ctx, ctor, lval, > 4699 non_constant_p, overflow_p); > where lookup_placeholder returns "{.r=(int &) &j}", which we try to > evaluate, and crash in verify_constructor_flags in the CONSTRUCTOR case. > > I think let's update the flags of the constructor as we evaluate elements in > cxx_eval_bare_aggregate; this should be cheaper than calling > recompute_constructor_flags someplace else. > > I took the liberty to remove a stale FIXME; testing didn't reveal any new > issues. > > Bootstrapped/regtested on x86_64-linux, ok for trunk? > > 2018-03-19 Marek Polacek > > PR c++/84927 > * constexpr.c (cxx_eval_bare_aggregate): Update constructor's flags > as we evaluate the elements. > (cxx_eval_constant_expression): Verify constructor's flags > unconditionally. > > * g++.dg/cpp1y/nsdmi-aggr9.C: New test. > > diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c > index 05a1cb64d61..894bcd0bb3e 100644 > --- gcc/cp/constexpr.c > +++ gcc/cp/constexpr.c > @@ -2880,7 +2880,12 @@ cxx_eval_bare_aggregate (const constexpr_ctx *ctx, > tree t, > (*p)->last().value = elt; > } >else > - CONSTRUCTOR_APPEND_ELT (*p, index, elt); > + { > + CONSTRUCTOR_APPEND_ELT (*p, index, elt); > + /* Adding an element might change the ctor's flags. */ > + TREE_CONSTANT (ctx->ctor) = constant_p; > + TREE_SIDE_EFFECTS (ctx->ctor) = side_effects_p; > + } > } >if (*non_constant_p || !changed) > return t; > @@ -4530,11 +4535,7 @@ cxx_eval_constant_expression (const constexpr_ctx > *ctx, tree t, > { > /* Don't re-process a constant CONSTRUCTOR, but do fold it to > VECTOR_CST if applicable. */ > - /* FIXME after GCC 6 branches, make the verify unconditional. */ > - if (CHECKING_P) > - verify_constructor_flags (t); > - else > - recompute_constructor_flags (t); > + verify_constructor_flags (t); > if (TREE_CONSTANT (t)) > return fold (t); > } > diff --git gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr9.C > gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr9.C > index e69de29bb2d..4e13fc5c9d8 100644 > --- gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr9.C > +++ gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr9.C > @@ -0,0 +1,14 @@ > +// PR c++/84927 - ICE with NSDMI and reference > +// { dg-do compile { target c++14 } } > + > +struct A > +{ > + int& r; > + int i = r; > +}; > + > +void foo() > +{ > + int j; > + A a = A{j}; > +} > > Marek
[PR c++/84835] ICE with generic lambda in extern "C"
In an extern "C" region the lambda member fns were getting C linkage. and triggering an assert I added. I'm not sure why just the generic case was being triggered, but the fix is to just force the language to be C++ -- this is what we do in finish_member_declaration. We were also not copying the linkage into the template header. Mostly that was ok, because we then ignored it. But sometimes, like this, things went wrong. applying to trunk. nathan -- Nathan Sidwell 2018-03-19 Nathan Sidwell PR c++/84835 * lambda.c (maybe_add_lambda_conv_op): Force C++ linkage. * pt.c (build_template_decl): Propagate language linkage. PR c++/84835 * g++.dg/cpp1y/pr84835.C: New. Index: cp/lambda.c === --- cp/lambda.c (revision 258642) +++ cp/lambda.c (working copy) @@ -1176,6 +1176,7 @@ maybe_add_lambda_conv_op (tree type) tree thistype = cp_build_qualified_type (type, TYPE_QUAL_CONST); tree fntype = build_method_type_directly (thistype, rettype, void_list_node); tree convfn = build_lang_decl (FUNCTION_DECL, name, fntype); + SET_DECL_LANGUAGE (convfn, lang_cplusplus); tree fn = convfn; DECL_SOURCE_LOCATION (fn) = DECL_SOURCE_LOCATION (callop); SET_DECL_ALIGN (fn, MINIMUM_METHOD_BOUNDARY); @@ -1208,6 +1209,7 @@ maybe_add_lambda_conv_op (tree type) name = get_identifier ("_FUN"); tree statfn = build_lang_decl (FUNCTION_DECL, name, stattype); + SET_DECL_LANGUAGE (statfn, lang_cplusplus); fn = statfn; DECL_SOURCE_LOCATION (fn) = DECL_SOURCE_LOCATION (callop); grokclassfn (type, fn, NO_SPECIAL); Index: cp/pt.c === --- cp/pt.c (revision 258642) +++ cp/pt.c (working copy) @@ -4677,6 +4677,7 @@ tree build_template_decl (tree decl, tree parms, bool member_template_p) { tree tmpl = build_lang_decl (TEMPLATE_DECL, DECL_NAME (decl), NULL_TREE); + SET_DECL_LANGUAGE (tmpl, DECL_LANGUAGE (decl)); DECL_TEMPLATE_PARMS (tmpl) = parms; DECL_CONTEXT (tmpl) = DECL_CONTEXT (decl); DECL_SOURCE_LOCATION (tmpl) = DECL_SOURCE_LOCATION (decl); Index: testsuite/g++.dg/cpp1y/pr84835.C === --- testsuite/g++.dg/cpp1y/pr84835.C (revision 0) +++ testsuite/g++.dg/cpp1y/pr84835.C (working copy) @@ -0,0 +1,20 @@ +// { dg-do compile { target c++14 } } +// PR c++/84835 +// ICE with generic lambda inside extern "C" + +extern "C" +{ + auto r = [] (auto x) + { +void baz (); // extern "C" +baz (); + }; + +} + +void g () +{ + r (0); +} + +// { dg-final { scan-assembler "\[^0-9\]baz" } }
Re: [PATCH PR81647][AARCH64] Fix handling of Unordered Comparisons in aarch64-simd.md
Hi On 19/03/18 14:29, James Greenhalgh wrote: On Fri, Dec 15, 2017 at 11:57:46AM +, Sudi Das wrote: Hi This patch fixes the inconsistent behavior observed at -O3 for the unordered comparisons. According to the online docs (https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gccint/Unary-and-Binary-Expressions.html), all of the following should not raise an FP exception: - UNGE_EXPR - UNGT_EXPR - UNLE_EXPR - UNLT_EXPR - UNEQ_EXPR Also ORDERED_EXPR and UNORDERED_EXPR should only return zero or one. The aarch64-simd.md handling of these were generating exception raising instructions such as fcmgt. This patch changes the instructions that are emitted to in order to not give out the exceptions. We first check each operand for NaNs and force any elements containing NaN to zero before using them in the compare. Example: UN (a, b) -> UNORDERED (a, b) | (cm (isnan (a) ? 0.0 : a, isnan (b) ? 0.0 : b)) The ORDERED_EXPR is now handled as (cmeq (a, a) & cmeq (b, b)) and UNORDERED_EXPR as ~ORDERED_EXPR and UNEQ as (~ORDERED_EXPR | cmeq (a,b)). Testing done: Checked for regressions on bootstrapped aarch64-none-linux-gnu and added a new test case. Is this ok for trunk? This will probably need a back-port to gcc-7-branch as well. OK. Let it soak on trunk for a while before the backport. Thanks. Committed to trunk as r258653. Will wait a week before backport. Sudi Thanks, James ChangeLog Entries: *** gcc/ChangeLog *** 2017-12-15 Sudakshina Das PR target/81647 * config/aarch64/aarch64-simd.md (vec_cmp): Modify instructions for UNLT, UNLE, UNGT, UNGE, UNEQ, UNORDERED and ORDERED. *** gcc/testsuite/ChangeLog *** 2017-12-15 Sudakshina Das PR target/81647 * gcc.target/aarch64/pr81647.c: New. diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index f90f74fe7fd5990a97b9f4eb68f5735b7d4fb9aa..acff06c753b3e3aaa5775632929909afa4d3294b 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -2731,10 +2731,10 @@ break; } /* Fall through. */ -case UNGE: +case UNLT: std::swap (operands[2], operands[3]); /* Fall through. */ -case UNLE: +case UNGT: case GT: comparison = gen_aarch64_cmgt; break; @@ -2745,10 +2745,10 @@ break; } /* Fall through. */ -case UNGT: +case UNLE: std::swap (operands[2], operands[3]); /* Fall through. */ -case UNLT: +case UNGE: case GE: comparison = gen_aarch64_cmge; break; @@ -2771,21 +2771,35 @@ case UNGT: case UNLE: case UNLT: -case NE: - /* FCM returns false for lanes which are unordered, so if we use -the inverse of the comparison we actually want to emit, then -invert the result, we will end up with the correct result. -Note that a NE NaN and NaN NE b are true for all a, b. - -Our transformations are: -a UNGE b -> !(b GT a) -a UNGT b -> !(b GE a) -a UNLE b -> !(a GT b) -a UNLT b -> !(a GE b) -a NE b -> !(a EQ b) */ - gcc_assert (comparison != NULL); - emit_insn (comparison (operands[0], operands[2], operands[3])); - emit_insn (gen_one_cmpl2 (operands[0], operands[0])); + { + /* All of the above must not raise any FP exceptions. Thus we first + check each operand for NaNs and force any elements containing NaN to + zero before using them in the compare. + Example: UN (a, b) -> UNORDERED (a, b) | +(cm (isnan (a) ? 0.0 : a, + isnan (b) ? 0.0 : b)) + We use the following transformations for doing the comparisions: + a UNGE b -> a GE b + a UNGT b -> a GT b + a UNLE b -> b GE a + a UNLT b -> b GT a. */ + + rtx tmp0 = gen_reg_rtx (mode); + rtx tmp1 = gen_reg_rtx (mode); + rtx tmp2 = gen_reg_rtx (mode); + emit_insn (gen_aarch64_cmeq (tmp0, operands[2], operands[2])); + emit_insn (gen_aarch64_cmeq (tmp1, operands[3], operands[3])); + emit_insn (gen_and3 (tmp2, tmp0, tmp1)); + emit_insn (gen_and3 (tmp0, tmp0, + lowpart_subreg (mode, operands[2], mode))); + emit_insn (gen_and3 (tmp1, tmp1, + lowpart_subreg (mode, operands[3], mode))); + gcc_assert (comparison != NULL); + emit_insn (comparison (operands[0], + lowpart_subreg (mode, tmp0, mode), + lowpart_subreg (mode, tmp1, mode))); + emit_insn (gen_orn3 (operands[0], tmp2, operands[0])); + } break; case LT: @@ -2793,25 +2807,19 @@ case GT: case GE: case EQ: +case NE: /* The easy case. Here we emit one of FCMGE, FCMGT or FCMEQ. As a LT b <=> b GE a && a LE b <=> b GT
Re: C++ PATCH for c++/84925, ICE in enclosing_instantiation_of
OK. On Mon, Mar 19, 2018 at 7:55 AM, Marek Polacek wrote: > Seems like with this testcase we end up in a scenario where, when counting the > lambda count in enclosing_instantiation_of, we arrive at decl_function_context > being null, so checking DECL_TEMPLATE_INFO then crashes. It appears that just > the following does the right thing, but I'm not too sure about this function. > > Bootstrapped/regtested on x86_64-linux, ok for trunk? > > 2018-03-19 Marek Polacek > > PR c++/84925 > * pt.c (enclosing_instantiation_of): Check if fn is null. > > * g++.dg/cpp1z/lambda-__func__.C: New test. > > diff --git gcc/cp/pt.c gcc/cp/pt.c > index 745c9acd6ee..066cb627a70 100644 > --- gcc/cp/pt.c > +++ gcc/cp/pt.c > @@ -12898,7 +12898,7 @@ enclosing_instantiation_of (tree otctx) >for (; flambda_count < lambda_count && fn && LAMBDA_FUNCTION_P (fn); >fn = decl_function_context (fn)) > ++flambda_count; > - if (DECL_TEMPLATE_INFO (fn) > + if ((fn && DECL_TEMPLATE_INFO (fn)) > ? most_general_template (fn) != most_general_template (tctx) > : fn != tctx) > continue; > diff --git gcc/testsuite/g++.dg/cpp1z/lambda-__func__.C > gcc/testsuite/g++.dg/cpp1z/lambda-__func__.C > index e69de29bb2d..d5d1c1cb7b6 100644 > --- gcc/testsuite/g++.dg/cpp1z/lambda-__func__.C > +++ gcc/testsuite/g++.dg/cpp1z/lambda-__func__.C > @@ -0,0 +1,13 @@ > +// PR c++/84925 > +// { dg-options "-std=c++17" } > + > +template > +struct A { > + static const int value = 0; > + static auto constexpr fn = [] { return __func__; }; > +}; > + > +template > +int x = A::value; > + > +auto s = x; > > Marek
[PATCH] Fix __builtin_cpu_supports (PR target/84945)
Hi! As mentioned in the PR, this is another case where we've run into too many separate feature bits issues on x86. unfortunately in this case it affects ABI. __cpu_model variable contains just 32 bits for the features, and we now need 4 extra features that won't fit in there. The current code just invokes in the compiler as well as runtime initialization. While on {x86_64,i?86}-linux __cpu_model is exported from libgcc_s.so.1 only for backwards compatibility and new code is always using a hidden copy in each DSO or binary from libgcc.a, that is not the case on other x86 targets. For a Linux only solution, we could just grow up __cpu_model except for the one in libgcc_s.so.1 (i.e. #ifdef SHARED) and as long as GCC 8+ libgcc.a would be used for linking of __builtin_cpu_supports code from both old and new gcc things would just work. For other targets, as __cpu_model is exported data object, binaries could have copy relocations against it, so we really can't change the size. As the preferred way is to put it into libgcc.a only, this patch just puts the overflow features into a new separate variable which is not put into libgcc_s.so.1 at all, which will force use of the libgcc.a copy for at least __builtin_cpu_supports calls that ask for the new features. Bootstrapped/regtested on x86_64-linux and i686-linux (on Haswell-E, don't have access to any of the CPUs with the new features), ok for trunk? 2018-03-19 Jakub Jelinek PR target/84945 * config/i386/i386.c (fold_builtin_cpu): For features above 31 use __cpu_features2 variable instead of __cpu_model.__cpu_features[0]. Use 1U instead of 1. Formatting fixes. * gcc.target/i386/pr84945.c: New test. * config/i386/cpuinfo.h (__cpu_features2): Declare. * config/i386/cpuinfo.c (__cpu_features2): New variable for ifndef SHARED only. (set_feature): Define. (get_available_features): Use set_feature macro. Set __cpu_features2 to the second word of features ifndef SHARED. --- gcc/config/i386/i386.c.jj 2018-03-17 12:11:39.954436461 +0100 +++ gcc/config/i386/i386.c 2018-03-19 16:36:55.565231809 +0100 @@ -33265,8 +33264,8 @@ fold_builtin_cpu (tree fndecl, tree *arg } /* Get the appropriate field in __cpu_model. */ - ref = build3 (COMPONENT_REF, TREE_TYPE (field), __cpu_model_var, -field, NULL_TREE); + ref = build3 (COMPONENT_REF, TREE_TYPE (field), __cpu_model_var, + field, NULL_TREE); /* Check the value. */ final = build2 (EQ_EXPR, unsigned_type_node, ref, @@ -33296,20 +33295,34 @@ fold_builtin_cpu (tree fndecl, tree *arg return integer_zero_node; } + if (isa_names_table[i].feature >= 32) + { + tree __cpu_features2_var = make_var_decl (unsigned_type_node, + "__cpu_features2"); + + varpool_node::add (__cpu_features2_var); + field_val = (1U << (isa_names_table[i].feature - 32)); + /* Return __cpu_features2 & field_val */ + final = build2 (BIT_AND_EXPR, unsigned_type_node, + __cpu_features2_var, + build_int_cstu (unsigned_type_node, field_val)); + return build1 (CONVERT_EXPR, integer_type_node, final); + } + field = TYPE_FIELDS (__processor_model_type); /* Get the last field, which is __cpu_features. */ while (DECL_CHAIN (field)) field = DECL_CHAIN (field); /* Get the appropriate field: __cpu_model.__cpu_features */ - ref = build3 (COMPONENT_REF, TREE_TYPE (field), __cpu_model_var, -field, NULL_TREE); + ref = build3 (COMPONENT_REF, TREE_TYPE (field), __cpu_model_var, + field, NULL_TREE); /* Access the 0th element of __cpu_features array. */ array_elt = build4 (ARRAY_REF, unsigned_type_node, ref, integer_zero_node, NULL_TREE, NULL_TREE); - field_val = (1 << isa_names_table[i].feature); + field_val = (1U << (isa_names_table[i].feature)); /* Return __cpu_model.__cpu_features[0] & field_val */ final = build2 (BIT_AND_EXPR, unsigned_type_node, array_elt, build_int_cstu (unsigned_type_node, field_val)); --- gcc/testsuite/gcc.target/i386/pr84945.c.jj 2018-03-19 16:37:14.667228396 +0100 +++ gcc/testsuite/gcc.target/i386/pr84945.c 2018-03-19 16:36:42.057234231 +0100 @@ -0,0 +1,16 @@ +/* PR target/84945 */ +/* { dg-do run } */ +/* { dg-options "-O2" } */ + +int +main () +{ + /* AVX512_VNNI instructions are all EVEX encoded, so if + __builtin_cpu_supports says avx512vnni is available and avx512f is not, + this is a GCC bug. Ditto for AVX512_BITALG */ + if (!__builtin_cpu_supports ("avx512f") + && (__builtin_cpu_supports ("avx512vnni") + || __builtin_cpu_supports ("avx512bitalg"))) +__builtin_abort ();
[PATCH] Avoid UBSAN -fsanitize=enum errors on GCC memmodel enum (PR rtl-optimization/84643)
Hi! The enum memmodel doesn't declare the target specific enumerators, which are just defines in i386.h, but as those are above any enumerators in the generic enum, it is strictly speaking UB when values with that enum memmodel type have the higher target dependent bits set. This patch just makes sure that the enumerator is full signed 32-bit. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2018-03-19 Jakub Jelinek PR rtl-optimization/84643 * memmodel.h (enum memmodel): Add MEMMODEL_MAX enumerator. --- gcc/memmodel.h.jj 2018-01-03 10:19:55.495534012 +0100 +++ gcc/memmodel.h 2018-03-19 10:31:42.980812157 +0100 @@ -45,7 +45,9 @@ enum memmodel MEMMODEL_LAST = 6, MEMMODEL_SYNC_ACQUIRE = MEMMODEL_ACQUIRE | MEMMODEL_SYNC, MEMMODEL_SYNC_RELEASE = MEMMODEL_RELEASE | MEMMODEL_SYNC, - MEMMODEL_SYNC_SEQ_CST = MEMMODEL_SEQ_CST | MEMMODEL_SYNC + MEMMODEL_SYNC_SEQ_CST = MEMMODEL_SEQ_CST | MEMMODEL_SYNC, + /* Say that all the higher bits are valid target extensions. */ + MEMMODEL_MAX = INTTYPE_MAXIMUM (int) }; /* Return the memory model from a host integer. */ Jakub
[C++ PATCH] Fix FIX_TRUNC_EXPR instantiation (PR c++/84942)
Hi! We instantiate FIX_TRUNC_EXPR using cp_build_unary_op, but that function doesn't look like a good match for this tree, what it really does for it is: 1) handle error_operand_p 2) on ia64 only diagnose __fpreg uses (I believe a cast of __fpreg to int is fine, so we shouldn't warn) 3) and then it does: argtype = TREE_TYPE (arg); return build1 (code, argtype, arg); which creates FIX_TRUNC_EXPR with the type of the argument rather than the integral type it should have. The following patch thus bypasses that function and just builds the FIX_TRUNC_EXPR with the right type. I think we don't need to tsubst the type, because if it is type dependent, we wouldn't be creating FIX_TRUNC_EXPR, but record it just as a some kind of conversion. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2018-03-19 Jakub Jelinek PR c++/84942 * pt.c (tsubst_copy_and_build) : Don't call cp_build_unary_op, just use build1. * g++.dg/cpp1y/pr84942.C: New test. --- gcc/cp/pt.c.jj 2018-03-16 21:11:04.440773108 +0100 +++ gcc/cp/pt.c 2018-03-19 10:58:39.803657613 +0100 @@ -17495,8 +17495,10 @@ tsubst_copy_and_build (tree t, complain|decltype_flag)); case FIX_TRUNC_EXPR: - RETURN (cp_build_unary_op (FIX_TRUNC_EXPR, RECUR (TREE_OPERAND (t, 0)), -false, complain)); + op1 = RECUR (TREE_OPERAND (t, 0)); + if (error_operand_p (op1)) +RETURN (error_mark_node); + RETURN (build1 (FIX_TRUNC_EXPR, TREE_TYPE (t), op1)); case ADDR_EXPR: op1 = TREE_OPERAND (t, 0); --- gcc/testsuite/g++.dg/cpp1y/pr84942.C.jj 2018-03-19 11:04:16.201588211 +0100 +++ gcc/testsuite/g++.dg/cpp1y/pr84942.C2018-03-19 11:02:37.950609950 +0100 @@ -0,0 +1,7 @@ +// PR c++/84942 +// { dg-do compile { target c++14 } } +// { dg-options "-w" } + +int a(__attribute__((b((int)__builtin_inf() * 1ULL / auto; +// { dg-error "expected primary-expression before" "" { target *-*-* } .-1 } +// { dg-error "declared as implicit template" "" { target *-*-* } .-2 } Jakub
C++ PATCH for c++/71834, template-id with too few arguments accepted
Here we set "lost" and then returned error_mark_node without ever actually giving an error, because the comment that !arg only occurred with a previous error was wrong (and lacked an assert (seen_error())). We already had code for dealing with the case of fixed packs and too many arguments, but not for too few. With this patch we now calculate the actual number of arguments expected as we go along and adjust them; if we end up with the wrong number, we can give an error at the end. Tested x86_64-pc-linux-gnu, applying to trunk. commit f7f58d8bee895207d10a776a72f8c33da3e648cb Author: Jason Merrill Date: Fri Mar 16 16:15:45 2018 -0400 PR c++/71834 - template-id with too few arguments. * pt.c (coerce_template_parms): Check fixed_parameter_pack_p. diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 745c9acd6ee..abdb77f826a 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -8234,6 +8234,11 @@ coerce_template_parms (tree parms, int variadic_args_p = 0; int post_variadic_parms = 0; + /* Adjustment to nparms for fixed parameter packs. */ + int fixed_pack_adjust = 0; + int fixed_packs = 0; + int missing = 0; + /* Likewise for parameters with default arguments. */ int default_p = 0; @@ -8278,6 +8283,7 @@ coerce_template_parms (tree parms, || (TREE_VEC_ELT (parms, nargs) != error_mark_node && !TREE_PURPOSE (TREE_VEC_ELT (parms, nargs)) { +bad_nargs: if (complain & tf_error) { if (variadic_p || default_p) @@ -8389,11 +8395,17 @@ coerce_template_parms (tree parms, lost++; /* We are done with all of the arguments. */ arg_idx = nargs; + break; } else { pack_adjust = TREE_VEC_LENGTH (ARGUMENT_PACK_ARGS (arg)) - 1; arg_idx += pack_adjust; + if (fixed_parameter_pack_p (TREE_VALUE (parm))) + { + ++fixed_packs; + fixed_pack_adjust += pack_adjust; + } } continue; @@ -8451,12 +8463,18 @@ coerce_template_parms (tree parms, error ("template argument %d is invalid", arg_idx + 1); } else if (!arg) -/* This only occurs if there was an error in the template - parameter list itself (which we would already have - reported) that we are trying to recover from, e.g., a class - template with a parameter list such as - template. */ - ++lost; + { + /* This can occur if there was an error in the template + parameter list itself (which we would already have + reported) that we are trying to recover from, e.g., a class + template with a parameter list such as + template (cpp0x/variadic150.C). */ + ++lost; + + /* This can also happen with a fixed parameter pack (71834). */ + if (arg_idx >= nargs) + ++missing; + } else arg = convert_template_argument (TREE_VALUE (parm), arg, new_args, complain, @@ -8469,20 +8487,20 @@ coerce_template_parms (tree parms, cp_unevaluated_operand = saved_unevaluated_operand; c_inhibit_evaluation_warnings = saved_inhibit_evaluation_warnings; - if (variadic_p && arg_idx < nargs) + if (missing || arg_idx < nargs - variadic_args_p) { - if (complain & tf_error) - { - error ("wrong number of template arguments " - "(%d, should be %d)", nargs, arg_idx); - if (in_decl) - error ("provided for %q+D", in_decl); - } - return error_mark_node; + /* If we had fixed parameter packs, we didn't know how many arguments we + actually needed earlier; now we do. */ + nparms += fixed_pack_adjust; + variadic_p -= fixed_packs; + goto bad_nargs; } if (lost) -return error_mark_node; +{ + gcc_assert (!(complain & tf_error) || seen_error ()); + return error_mark_node; +} if (CHECKING_P && !NON_DEFAULT_TEMPLATE_ARGS_COUNT (new_inner_args)) SET_NON_DEFAULT_TEMPLATE_ARGS_COUNT (new_inner_args, diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic-nested3.C b/gcc/testsuite/g++.dg/cpp0x/variadic-nested3.C new file mode 100644 index 000..381ff731c09 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/variadic-nested3.C @@ -0,0 +1,10 @@ +// PR c++/71834 +// { dg-do compile { target c++11 } } + +template < typename ... Ts > struct A +{ + template < Ts ..., typename U > struct B {}; +}; + +// should be, e.g.: A < int >::B < 0, int > e; +A < int >::B < 0 > e; // { dg-error "wrong number of template arguments" }
[PATCH] Security improvement for dwarf indirect constants (PR sanitizer/78651)
Hi! When Maxim posted his patch today, I had a look and found that we emit these indirect constants into .data sections, even when they always contain just some pointer that needs relocation, but shouldn't be changed afterwards; after all, the indirect constants have TREE_READONLY set. The problem is that we use DECL_INITIAL (decl) = decl; and that isn't something the section selection code recognizes as valid or reasonable initializer. This patch uses ADDR_EXPR of the decl instead, i.e. something that needs a relocation if -fpic/-fpie, but can be relro, and in position dependent binaries can be .rodata from the beginning. This will make it harder from somebody trying to change these. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2018-03-19 Jakub Jelinek PR sanitizer/78651 * dwarf2asm.c: Include fold-const.c. (dw2_output_indirect_constant_1): Set DECL_INITIAL (decl) to ADDR_EXPR of decl rather than decl itself. --- gcc/dwarf2asm.c.jj 2018-02-09 06:44:29.952809199 +0100 +++ gcc/dwarf2asm.c 2018-03-19 17:18:09.82448 +0100 @@ -33,6 +33,7 @@ along with GCC; see the file COPYING3. #include "dwarf2.h" #include "function.h" #include "emit-rtl.h" +#include "fold-const.h" #ifndef XCOFF_DEBUGGING_INFO #define XCOFF_DEBUGGING_INFO 0 @@ -954,7 +955,7 @@ dw2_output_indirect_constant_1 (const ch SET_DECL_ASSEMBLER_NAME (decl, id); DECL_ARTIFICIAL (decl) = 1; DECL_IGNORED_P (decl) = 1; - DECL_INITIAL (decl) = decl; + DECL_INITIAL (decl) = build_fold_addr_expr (decl); TREE_READONLY (decl) = 1; TREE_STATIC (decl) = 1; Jakub
Re: [PATCH] [Microblaze]: PIC Data Text Relative
Hi Andrew -- Good work. Please submit your updated patch. Check that you follow GNU coding standards. Also make sure that the new options are documented in gcc/doc/invoke.texi. On 03/18/18 03:27, Andrew Sadek wrote: Hello Michael, I have run the test using the new PIC options. Actually, I have discovered 2 unhandled cases in 'microblaze_expand_move' + missing conditions in linker relax leading some test cases execution to fail. After fixing them, I made a re-run for the whole regression, and the results analogy below: Original, without my patch: === gcc Summary === # of expected passes 90776 # of unexpected failures 1317 # of unexpected successes 3 # of expected failures 207 # of unresolved testcases 115 # of unsupported tests 2828 With my patch, calling '-fPIE - mpic-data-text-rel' === gcc Summary === # of expected passes 90843 # of unexpected failures 1256 # of unexpected successes 3 # of expected failures 207 # of unresolved testcases 115 # of unsupported tests 2853 After running the 'dg-cmp-results.sh' in contrib folder, the PASS->FAIL are below: PASS->FAIL: gcc.dg/uninit-19.c (test for excess errors) PASS->FAIL: gcc.target/microblaze/others/data_var1.c -O0 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var1.c -O1 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var1.c -O2 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var1.c -O3 -g scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var1.c -Os scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var2.c -O0 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var2.c -O1 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var2.c -O2 -flto -fno-use-linker-plugin -flto-partition=none scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var2.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var2.c -O2 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var2.c -O3 -g scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var2.c -Os scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c -O0 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13 PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c -O1 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13 PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13 PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13 PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c -O2 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13 PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c -O3 -g scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13 PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c -Os scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13 PASS->FAIL: gcc.target/microblaze/others/sdata_var2.c -O0 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13 PASS->FAIL: gcc.target/microblaze/others/sdata_var2.c -O1 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13 PASS->FAIL: gcc.target/microblaze/others/sdata_var2.c -O2 -flto -fno-use-linker-plugin -flto-partition=none scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13 PASS->FAIL: gcc.target/microblaze/others/sdata_var2.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13 PASS->FAIL: gcc.target/microblaze/others/sdata_var2.c -O2 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13 PASS->FAIL: gcc.target/microblaze/others/sdata_var2.c -O3 -g scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13 PASS->FAIL: gcc.target/microblaze/others/sdata_var2.c -Os scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13 PASS->FAIL: gcc.target/microblaze/others/sdata_var4.c -O0 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13 PASS->FAIL: gcc.target/microblaze/others/sdata_var4.c -O1 scan-assembler lwi\tr([0-9]|[
Re: [PATCH] Fix ICE in match_asm_constraints_1 (PR inline-asm/84941)
Hi, On Mon, 19 Mar 2018, Jakub Jelinek wrote: > The inline-asm here has "1p" constraint and we end up with > (plus (frame) (const_int ...)) Blaeh. Note that "1p" is actually invalid: -- `0', `1', `2', ... `9' An operand that matches the specified operand number is allowed. If a digit is used together with letters within the same alternative, the digit should come last. -- Changing the order to "p1" would disable the transformation as well, because match_asm_constraints_1() uses strtoul on the constraint start. But let's say we don't want to go there and reject this form for good (though I think we should), so ... > as input; even when the matching output is a REG, I'm not really > sure emit_move_insn can handle arbitrary expressions (consider e.g. > "1X" constraint), and calling reg_overlap_mentioned_p on something other > than REG/SUBREG/MEM/constant (and a couple of selected other cases) > will ICE too. My understanding is that the match_asm_constraints mini-pass > is an optimization, so we can try to handle cases which make sense, but if > there is something too difficult to handle we can just punt and let reload > do its job or error_for_asm if it can't reload it. Especially when I > believe such input expressions can be there only when the numeric constraint > is mixed with some other one, otherwise it should have been a REG or MEM > or something similar. ... this makes sense. But I think you're too generous in supporting strange inputs: >if (! REG_P (output) > || rtx_equal_p (output, input) > || (GET_MODE (input) != VOIDmode > - && GET_MODE (input) != GET_MODE (output))) > + && GET_MODE (input) != GET_MODE (output)) > + || !(REG_P (input) || SUBREG_P (input) > +|| MEM_P (input) || CONSTANT_P (input))) I'd only allow REG_P (input) as well, not any of the other forms. Ciao, Michael.
Re: [PATCH] Security improvement for dwarf indirect constants (PR sanitizer/78651)
On March 19, 2018 9:01:53 PM GMT+01:00, Jakub Jelinek wrote: >Hi! > >When Maxim posted his patch today, I had a look and found that we emit >these indirect constants into .data sections, even when they always >contain >just some pointer that needs relocation, but shouldn't be changed >afterwards; after all, the indirect constants have TREE_READONLY set. > >The problem is that we use DECL_INITIAL (decl) = decl; and that isn't >something the section selection code recognizes as valid or reasonable >initializer. This patch uses ADDR_EXPR of the decl instead, i.e. >something >that needs a relocation if -fpic/-fpie, but can be relro, and in >position >dependent binaries can be .rodata from the beginning. This will make >it >harder from somebody trying to change these. > >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. Richard. >2018-03-19 Jakub Jelinek > > PR sanitizer/78651 > * dwarf2asm.c: Include fold-const.c. > (dw2_output_indirect_constant_1): Set DECL_INITIAL (decl) to ADDR_EXPR > of decl rather than decl itself. > >--- gcc/dwarf2asm.c.jj 2018-02-09 06:44:29.952809199 +0100 >+++ gcc/dwarf2asm.c2018-03-19 17:18:09.82448 +0100 >@@ -33,6 +33,7 @@ along with GCC; see the file COPYING3. > #include "dwarf2.h" > #include "function.h" > #include "emit-rtl.h" >+#include "fold-const.h" > > #ifndef XCOFF_DEBUGGING_INFO > #define XCOFF_DEBUGGING_INFO 0 >@@ -954,7 +955,7 @@ dw2_output_indirect_constant_1 (const ch > SET_DECL_ASSEMBLER_NAME (decl, id); > DECL_ARTIFICIAL (decl) = 1; > DECL_IGNORED_P (decl) = 1; >- DECL_INITIAL (decl) = decl; >+ DECL_INITIAL (decl) = build_fold_addr_expr (decl); > TREE_READONLY (decl) = 1; > TREE_STATIC (decl) = 1; > > > Jakub
Re: [PATCH] Fix libsanitizer on i?86-linux with glibc 2.27+ (PR sanitizer/84761)
On March 19, 2018 8:57:22 PM GMT+01:00, Jakub Jelinek wrote: >Hi! > >As mentioned in the PR, libsanitizer cheats and attempts to call >@GLIBC_PRIVATE functions which are reserved to glibc and can change >anytime. They have changed on i?86 in particular in glibc 2.27, where >internal_function attribute has been dropped, so the functions for some >reason sadly aren't regparm(3) anymore. > >This patch changes libsanitizer, so that when built against glibc 2.26 >and >earlier, it checks at runtime if glibc 2.27+ is used using a symbol >that has >been added ~ a month after the internal_function change, and when built >against glibc 2.27+, it just assumes it will never be regparm(3). > >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. Richard. >I've submitted it to upstream in https://reviews.llvm.org/D44623 too, >but >it might take some time to get it through there. > >2018-03-19 Jakub Jelinek > > PR sanitizer/84761 > * sanitizer_common/sanitizer_linux_libcdep.cc (__GLIBC_PREREQ): > Define if not defined. > (DL_INTERNAL_FUNCTION): Don't define. > (InitTlsSize): For __i386__ if not compiled against glibc 2.27+ > determine at runtime whether to use regparm(3), stdcall calling > convention for older glibcs or normal calling convention for > newer glibcs for call to _dl_get_tls_static_info. > >--- >libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cc.jj2017-10-19 >13:20:58.972958379 +0200 >+++ libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cc 2018-03-19 >14:05:56.178952563 +0100 >@@ -147,29 +147,44 @@ bool SanitizerGetThreadName(char *name, > #endif > } > >+#ifndef __GLIBC_PREREQ >+#define __GLIBC_PREREQ(x, y) 0 >+#endif >+ > #if !SANITIZER_FREEBSD && !SANITIZER_ANDROID && !SANITIZER_GO && \ > !SANITIZER_NETBSD > static uptr g_tls_size; > >-#ifdef __i386__ >-# define DL_INTERNAL_FUNCTION __attribute__((regparm(3), stdcall)) >-#else >-# define DL_INTERNAL_FUNCTION >-#endif >- > void InitTlsSize() { > // all current supported platforms have 16 bytes stack alignment > const size_t kStackAlign = 16; >- typedef void (*get_tls_func)(size_t*, size_t*) DL_INTERNAL_FUNCTION; >- get_tls_func get_tls; >- void *get_tls_static_info_ptr = dlsym(RTLD_NEXT, >"_dl_get_tls_static_info"); >- CHECK_EQ(sizeof(get_tls), sizeof(get_tls_static_info_ptr)); >- internal_memcpy(&get_tls, &get_tls_static_info_ptr, >- sizeof(get_tls_static_info_ptr)); >- CHECK_NE(get_tls, 0); > size_t tls_size = 0; > size_t tls_align = 0; >- get_tls(&tls_size, &tls_align); >+ void *get_tls_static_info_ptr = dlsym(RTLD_NEXT, >"_dl_get_tls_static_info"); >+#if defined(__i386__) && !__GLIBC_PREREQ(2, 27) >+ /* On i?86, _dl_get_tls_static_info used to be internal_function, >i.e. >+ __attribute__((regparm(3), stdcall)) before glibc 2.27 and is >normal >+ function in 2.27 and later. */ >+ if (!dlvsym(RTLD_NEXT, "glob", "GLIBC_2.27")) { >+typedef void (*get_tls_func)(size_t*, size_t*) >+ __attribute__((regparm(3), stdcall)); >+get_tls_func get_tls; >+CHECK_EQ(sizeof(get_tls), sizeof(get_tls_static_info_ptr)); >+internal_memcpy(&get_tls, &get_tls_static_info_ptr, >+sizeof(get_tls_static_info_ptr)); >+CHECK_NE(get_tls, 0); >+get_tls(&tls_size, &tls_align); >+ } else >+#endif >+ { >+typedef void (*get_tls_func)(size_t*, size_t*); >+get_tls_func get_tls; >+CHECK_EQ(sizeof(get_tls), sizeof(get_tls_static_info_ptr)); >+internal_memcpy(&get_tls, &get_tls_static_info_ptr, >+sizeof(get_tls_static_info_ptr)); >+CHECK_NE(get_tls, 0); >+get_tls(&tls_size, &tls_align); >+ } > if (tls_align < kStackAlign) > tls_align = kStackAlign; > g_tls_size = RoundUpTo(tls_size, tls_align); > > Jakub
Re: [PATCH][PR sanitizer/78651] Incorrect exception handling when catch clause uses local class and PIC and sanitizer are active
On Mon, Mar 19, 2018 at 07:28:11PM +0300, Maxim Ostapenko wrote: > Yes, it works, attaching the patch. Can you please add a short comment why we do this? Also, the testcase needs work, see below. Ok for trunk with those changes and after a while for affected release branches as well. Thanks. > --- /dev/null > +++ b/gcc/testsuite/g++.dg/asan/pr78651.C > @@ -0,0 +1,24 @@ > +// { dg-do run { target fpic } } Effective target fpic just means that -fpic or -fPIC is supported, nothing else. So, you want instead: // PR sanitizer/78651 // { dg-do run } // { dg-additional-options "-fpic" { target fpic } } and verify make check-c++-all RUNTESTFLAGS=asan.exp=pr78651.C fails without the dwarf2asm.c patch and succeeds with it. Jakub
Re: [PATCH] Fix ICE in match_asm_constraints_1 (PR inline-asm/84941)
On Mon, Mar 19, 2018 at 08:09:00PM +, Michael Matz wrote: > Hi, > > On Mon, 19 Mar 2018, Jakub Jelinek wrote: > > > The inline-asm here has "1p" constraint and we end up with > > (plus (frame) (const_int ...)) > > Blaeh. Note that "1p" is actually invalid: > > -- > `0', `1', `2', ... `9' > An operand that matches the specified operand number is allowed. > If a digit is used together with letters within the same > alternative, the digit should come last. > -- > > Changing the order to "p1" would disable the transformation as well, > because match_asm_constraints_1() uses strtoul on the constraint start. Ah, ok, but asm volatile ("" : "=m,m" (b), "=r,r" (b) : "1,p" (b)); ICEs the same way, and that should be valid even according to the above description. > ... this makes sense. But I think you're too generous in supporting > strange inputs: > > >if (! REG_P (output) > > || rtx_equal_p (output, input) > > || (GET_MODE (input) != VOIDmode > > - && GET_MODE (input) != GET_MODE (output))) > > + && GET_MODE (input) != GET_MODE (output)) > > + || !(REG_P (input) || SUBREG_P (input) > > + || MEM_P (input) || CONSTANT_P (input))) > > I'd only allow REG_P (input) as well, not any of the other forms. I'll try to gather some statistics on what kind of inputs appear there during bootstrap/regtest and will try to write a few testcases to see if just || ! REG_P (output) is sufficient or not. Jakub
[PATCH] Fix ICE in match_asm_constraints_1 (PR inline-asm/84941)
Hi! The inline-asm here has "1p" constraint and we end up with (plus (frame) (const_int ...)) as input; even when the matching output is a REG, I'm not really sure emit_move_insn can handle arbitrary expressions (consider e.g. "1X" constraint), and calling reg_overlap_mentioned_p on something other than REG/SUBREG/MEM/constant (and a couple of selected other cases) will ICE too. My understanding is that the match_asm_constraints mini-pass is an optimization, so we can try to handle cases which make sense, but if there is something too difficult to handle we can just punt and let reload do its job or error_for_asm if it can't reload it. Especially when I believe such input expressions can be there only when the numeric constraint is mixed with some other one, otherwise it should have been a REG or MEM or something similar. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2018-03-19 Jakub Jelinek PR inline-asm/84941 * function.c (match_asm_constraints_1): Don't do the optimization if input isn't a REG, SUBREG, MEM or constant. * gcc.dg/pr84941.c: New test. --- gcc/function.c.jj 2018-02-22 12:37:02.621387697 +0100 +++ gcc/function.c 2018-03-19 11:51:51.429738406 +0100 @@ -6662,7 +6662,9 @@ match_asm_constraints_1 (rtx_insn *insn, if (! REG_P (output) || rtx_equal_p (output, input) || (GET_MODE (input) != VOIDmode - && GET_MODE (input) != GET_MODE (output))) + && GET_MODE (input) != GET_MODE (output)) + || !(REG_P (input) || SUBREG_P (input) + || MEM_P (input) || CONSTANT_P (input))) continue; /* We can't do anything if the output is also used as input, --- gcc/testsuite/gcc.dg/pr84941.c.jj 2018-03-19 11:56:34.086713406 +0100 +++ gcc/testsuite/gcc.dg/pr84941.c 2018-03-19 11:55:47.301717544 +0100 @@ -0,0 +1,10 @@ +/* PR inline-asm/84941 */ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +void +foo (void) +{ + short *b[1] = { 0 }; + asm volatile ("" : "=m" (b), "=r" (b) : "1p" (b)); +} Jakub
Re: [PATCH] Avoid UBSAN -fsanitize=enum errors on GCC memmodel enum (PR rtl-optimization/84643)
On March 19, 2018 8:39:20 PM GMT+01:00, Jakub Jelinek wrote: >Hi! > >The enum memmodel doesn't declare the target specific enumerators, >which are just defines in i386.h, but as those are above any >enumerators in the generic enum, it is strictly speaking UB when >values with that enum memmodel type have the higher target dependent >bits >set. This patch just makes sure that the enumerator is full signed >32-bit. > >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. Richard. >2018-03-19 Jakub Jelinek > > PR rtl-optimization/84643 > * memmodel.h (enum memmodel): Add MEMMODEL_MAX enumerator. > >--- gcc/memmodel.h.jj 2018-01-03 10:19:55.495534012 +0100 >+++ gcc/memmodel.h 2018-03-19 10:31:42.980812157 +0100 >@@ -45,7 +45,9 @@ enum memmodel > MEMMODEL_LAST = 6, > MEMMODEL_SYNC_ACQUIRE = MEMMODEL_ACQUIRE | MEMMODEL_SYNC, > MEMMODEL_SYNC_RELEASE = MEMMODEL_RELEASE | MEMMODEL_SYNC, >- MEMMODEL_SYNC_SEQ_CST = MEMMODEL_SEQ_CST | MEMMODEL_SYNC >+ MEMMODEL_SYNC_SEQ_CST = MEMMODEL_SEQ_CST | MEMMODEL_SYNC, >+ /* Say that all the higher bits are valid target extensions. */ >+ MEMMODEL_MAX = INTTYPE_MAXIMUM (int) > }; > > /* Return the memory model from a host integer. */ > > Jakub
[PATCH] Fix libsanitizer on i?86-linux with glibc 2.27+ (PR sanitizer/84761)
Hi! As mentioned in the PR, libsanitizer cheats and attempts to call @GLIBC_PRIVATE functions which are reserved to glibc and can change anytime. They have changed on i?86 in particular in glibc 2.27, where internal_function attribute has been dropped, so the functions for some reason sadly aren't regparm(3) anymore. This patch changes libsanitizer, so that when built against glibc 2.26 and earlier, it checks at runtime if glibc 2.27+ is used using a symbol that has been added ~ a month after the internal_function change, and when built against glibc 2.27+, it just assumes it will never be regparm(3). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? I've submitted it to upstream in https://reviews.llvm.org/D44623 too, but it might take some time to get it through there. 2018-03-19 Jakub Jelinek PR sanitizer/84761 * sanitizer_common/sanitizer_linux_libcdep.cc (__GLIBC_PREREQ): Define if not defined. (DL_INTERNAL_FUNCTION): Don't define. (InitTlsSize): For __i386__ if not compiled against glibc 2.27+ determine at runtime whether to use regparm(3), stdcall calling convention for older glibcs or normal calling convention for newer glibcs for call to _dl_get_tls_static_info. --- libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cc.jj 2017-10-19 13:20:58.972958379 +0200 +++ libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cc2018-03-19 14:05:56.178952563 +0100 @@ -147,29 +147,44 @@ bool SanitizerGetThreadName(char *name, #endif } +#ifndef __GLIBC_PREREQ +#define __GLIBC_PREREQ(x, y) 0 +#endif + #if !SANITIZER_FREEBSD && !SANITIZER_ANDROID && !SANITIZER_GO && \ !SANITIZER_NETBSD static uptr g_tls_size; -#ifdef __i386__ -# define DL_INTERNAL_FUNCTION __attribute__((regparm(3), stdcall)) -#else -# define DL_INTERNAL_FUNCTION -#endif - void InitTlsSize() { // all current supported platforms have 16 bytes stack alignment const size_t kStackAlign = 16; - typedef void (*get_tls_func)(size_t*, size_t*) DL_INTERNAL_FUNCTION; - get_tls_func get_tls; - void *get_tls_static_info_ptr = dlsym(RTLD_NEXT, "_dl_get_tls_static_info"); - CHECK_EQ(sizeof(get_tls), sizeof(get_tls_static_info_ptr)); - internal_memcpy(&get_tls, &get_tls_static_info_ptr, - sizeof(get_tls_static_info_ptr)); - CHECK_NE(get_tls, 0); size_t tls_size = 0; size_t tls_align = 0; - get_tls(&tls_size, &tls_align); + void *get_tls_static_info_ptr = dlsym(RTLD_NEXT, "_dl_get_tls_static_info"); +#if defined(__i386__) && !__GLIBC_PREREQ(2, 27) + /* On i?86, _dl_get_tls_static_info used to be internal_function, i.e. + __attribute__((regparm(3), stdcall)) before glibc 2.27 and is normal + function in 2.27 and later. */ + if (!dlvsym(RTLD_NEXT, "glob", "GLIBC_2.27")) { +typedef void (*get_tls_func)(size_t*, size_t*) + __attribute__((regparm(3), stdcall)); +get_tls_func get_tls; +CHECK_EQ(sizeof(get_tls), sizeof(get_tls_static_info_ptr)); +internal_memcpy(&get_tls, &get_tls_static_info_ptr, +sizeof(get_tls_static_info_ptr)); +CHECK_NE(get_tls, 0); +get_tls(&tls_size, &tls_align); + } else +#endif + { +typedef void (*get_tls_func)(size_t*, size_t*); +get_tls_func get_tls; +CHECK_EQ(sizeof(get_tls), sizeof(get_tls_static_info_ptr)); +internal_memcpy(&get_tls, &get_tls_static_info_ptr, +sizeof(get_tls_static_info_ptr)); +CHECK_NE(get_tls, 0); +get_tls(&tls_size, &tls_align); + } if (tls_align < kStackAlign) tls_align = kStackAlign; g_tls_size = RoundUpTo(tls_size, tls_align); Jakub
Re: [PATCH] Fix ICE in match_asm_constraints_1 (PR inline-asm/84941)
Hi, On Mon, 19 Mar 2018, Jakub Jelinek wrote: > > Blaeh. Note that "1p" is actually invalid: > > > > -- > > `0', `1', `2', ... `9' > > An operand that matches the specified operand number is allowed. > > If a digit is used together with letters within the same > > alternative, the digit should come last. > > -- > > > > Changing the order to "p1" would disable the transformation as well, > > because match_asm_constraints_1() uses strtoul on the constraint start. > > Ah, ok, but > asm volatile ("" : "=m,m" (b), "=r,r" (b) : "1,p" (b)); > ICEs the same way, and that should be valid even according to the above > description. Yes that's valid and shouldn't ICE. > > ... this makes sense. But I think you're too generous in supporting > > strange inputs: > > > > >if (! REG_P (output) > > > || rtx_equal_p (output, input) > > > || (GET_MODE (input) != VOIDmode > > > - && GET_MODE (input) != GET_MODE (output))) > > > + && GET_MODE (input) != GET_MODE (output)) > > > + || !(REG_P (input) || SUBREG_P (input) > > > +|| MEM_P (input) || CONSTANT_P (input))) > > > > I'd only allow REG_P (input) as well, not any of the other forms. > > I'll try to gather some statistics on what kind of inputs appear there > during bootstrap/regtest and will try to write a few testcases to see > if just || ! REG_P (output) is sufficient or not. I think you don't need to try too hard. The function is designed to handle the situation where the matching constraint is a result from an input-output constraint, not for improving arbitrary matching constraints. As such the expected situation is that input and output are lvalues, and hence (when their type is anything sensible) gimple registers, and therefore pseudos at RTL time. You could basically reject any constraint that isn't just a single integer (i.e. anything with not only digits after the optional '%') and still handle the sitatuations for which this function was invented. I.e. like this: Index: function.c === --- function.c (revision 254008) +++ function.c (working copy) @@ -6605,7 +6605,7 @@ match_asm_constraints_1 (rtx_insn *insn, constraint++; match = strtoul (constraint, &end, 10); - if (end == constraint) + if (end == constraint || *end) continue; gcc_assert (match < noutputs); Ciao, Michael.
Re: [PATCH] Avoid compiler UB in store-merging (PR tree-optimization/84946)
On March 19, 2018 9:05:23 PM GMT+01:00, Jakub Jelinek wrote: >Hi! > >In this code, both bitpos and bitsize are poly_int64, but we know that >bitpos is >= 0 and bitsize. On the testcase mentioned in the PR (but >even >without -mavx512f, so we already invoke it during testing) we would >overflow >the computation, 0x7ffsomething + 0x20 (and then cast >it >to poly_uint64). This patch fixes it by casting to poly_uint64 >earlier. > >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. Richard. >2018-03-19 Jakub Jelinek > > PR tree-optimization/84946 > * gimple-ssa-store-merging.c (mem_valid_for_store_merging): Compute > bitsize + bitsize in poly_uint64 rather than poly_int64. > >--- gcc/gimple-ssa-store-merging.c.jj 2018-02-22 09:28:07.0 >+0100 >+++ gcc/gimple-ssa-store-merging.c 2018-03-19 17:55:22.486472527 +0100 >@@ -3948,7 +3948,8 @@ mem_valid_for_store_merging (tree mem, p > if (known_eq (bitregion_end, 0U)) > { > bitregion_start = round_down_to_byte_boundary (bitpos); >- bitregion_end = round_up_to_byte_boundary (bitpos + bitsize); >+ bitregion_end = bitpos; >+ bitregion_end = round_up_to_byte_boundary (bitregion_end + >bitsize); > } > > if (offset != NULL_TREE) > > Jakub
[wwwdocs] Update gcc-8/changes.html for some IPA and x86 canges
Hi, this patch adds some documentation for what is new in IPA and x86. For lto we should mention early-debug. Richard, perhaps you can suggest wording? Honza Index: changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-8/changes.html,v retrieving revision 1.41 diff -u -r1.41 changes.html --- changes.html12 Mar 2018 17:26:20 - 1.41 +++ changes.html19 Mar 2018 21:07:07 - @@ -42,9 +42,32 @@ General Improvements - The ipa-pure-const pass is extended to propagate malloc attribute, and the - corresponding warning option Wsuggest-attribute=malloc emits a - diagnostic for a function, which can be annotated with malloc attribute. + Inter-procedural optimization improvements: + +Reworked runtime estimation metrics leading to more realistic guesses + driving inliner and clonning heuristics. +The ipa-pure-const pass is extended to propagate malloc attribute, and the + corresponding warning option Wsuggest-attribute=malloc emits a + diagnostic for a function, which can be annotated with malloc attribute. + + Profile driven optimization improvements: + +New infrastructure for representing profile (both statically guessed + and profile feedback) which allows propagation of furhter information + about reliablility of the profile. +Number of improvements in profile updating code solving problems + found by new verification code. +Static detection of code which is not executed in valid run of the + program. This includes paths which triggers undefined behaviour + as well as call to functions declared with cold attribute. + Newly noreturn attribute does not imply all effects of + cold to make difference between exit (which + is noreturn/code> and abort (which is in addition + not executed in valid runs). +-freorder-blocks-and-partition, a pass splitting function + bodies into hot and cold regions, is now enabled by default at -O2 + and higher for i386. + A new option -fcf-protection=[full|branch|return|none] is introduced to perform code instrumentation to increase program security by @@ -435,6 +458,11 @@ The x86 port now supports the naked function attribute. + +Better tuning for znver1 and Intel Core based CPUs. + +Vectorization costs metrics has been reworked leading to significant improvments +on some benchmarks GCC now supports the Intel CPU named Cannonlake through -march=cannonlake. The switch enables the AVX512VBMI, AVX512IFMA and SHA ISA extensions.
Re: Use SCEV information when aligning for vectorisation (PR 84005)
Richard Biener writes: > On Sat, Mar 17, 2018 at 11:45 AM, Richard Sandiford > wrote: >> Index: gcc/tree-data-ref.c >> === >> --- gcc/tree-data-ref.c 2018-02-14 13:14:36.268006810 + >> +++ gcc/tree-data-ref.c 2018-03-17 10:43:42.544669549 + >> @@ -5200,6 +5200,75 @@ dr_alignment (innermost_loop_behavior *d >>return alignment; >> } >> >> +/* If BASE is a pointer-typed SSA name, try to find the object that it >> + is based on. Return this object X on success and store the alignment >> + in bytes of BASE - &X in *ALIGNMENT_OUT. */ >> + >> +static tree >> +get_base_for_alignment_1 (tree base, unsigned int *alignment_out) >> +{ >> + if (TREE_CODE (base) != SSA_NAME || !POINTER_TYPE_P (TREE_TYPE (base))) >> +return NULL_TREE; >> + >> + gimple *def = SSA_NAME_DEF_STMT (base); >> + base = analyze_scalar_evolution (loop_containing_stmt (def), base); > > I think it is an error to use the def stmt of base here. Instead you > need to pass down the point/loop of the use. For the same reason you > also want to instantiate parameters after analyzing the evolution. > > In the end, if the BB to be vectorized is contained in a loop nest we want to > instantiate up to the point where (eventually) a DECL we can re-align appears, > but using the containing loop for now looks OK. Why's that necessary/better though? We're not interested in the evolution of the value at the point it*s used by the potential vector code, but in how we get from the ultimate base (if there is one) to the definition of the SSA name. I don't think instantiating the SCEV would give any new information, but it could lose some. E.g. if we have: x = &foo; do x += 8; while (*y++ < 10) ...potential vector use of *x... we wouldn't be able to express the address of x after the do-while loop, because there's nothing that counts the number of iterations. But the uninstantiated SCEV could tell us that x = &foo + N * 8 for some (unknown) N. (OK, so that doesn't actually work unless we take the effort to look through loop-closed SSA form, but still...) >> + /* Peel chrecs and record the minimum alignment preserved by >> + all steps. */ >> + unsigned int alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT; >> + while (TREE_CODE (base) == POLYNOMIAL_CHREC) >> +{ >> + unsigned int step_alignment = highest_pow2_factor (CHREC_RIGHT >> (base)); >> + alignment = MIN (alignment, step_alignment); >> + base = CHREC_LEFT (base); >> +} >> + >> + /* Punt if the expression is too complicated to handle. */ >> + if (tree_contains_chrecs (base, NULL) || !POINTER_TYPE_P (TREE_TYPE >> (base))) >> +return NULL_TREE; >> + >> + /* Analyze the base to which the steps we peeled were applied. */ >> + poly_int64 bitsize, bitpos, bytepos; >> + machine_mode mode; >> + int unsignedp, reversep, volatilep; >> + tree offset; >> + base = get_inner_reference (build_fold_indirect_ref (base), > > ick :/ > > what happens if you simply use get_pointer_alignment here and > strip down POINTER_PLUS_EXPRs to the ultimate LHS? (basically > what get_pointer_alignment_1 does) After all get_base_for_alignment_1 > itself only deals with plain SSA_NAMEs, not, say, &a_1->b.c or &MEM[a_1 + 4]. Yeah, but those have (hopefully) been handled by dr_analyze_innermost already, which should have stripped off both the constant and variable parts of the offset. So I think SSA names are the only interesting input. The problem is that once we follow the definitions of an SSA name through CHREC_LEFTs, we get a general address again. get_pointer_alignment and get_pointer_alignment_1 don't do what we want because they take the alignment of the existing object into account, whereas here we explicitly want to ignore that and only calculate the alignment of the offset. I guess we could pass a flag down to ignore the current alignment though? Thanks, Richard
Re: [wwwdocs] Update gcc-8/changes.html for some IPA and x86 canges
Hi Honza, thanks for putting this together. Below you'll find a number of simple suggestions. This patch is fine if you make those changes; no need to review again, though posting the final patch would be good. On Mon, 19 Mar 2018, Jan Hubicka wrote: > +Reworked runtime estimation metrics leading to more realistic guesses > + driving inliner and clonning heuristics. "run-time" (since it's an adjective here) "cloning" > +The ipa-pure-const pass is extended to propagate malloc attribute, > and the "the malloc attribute", i.e., add "the" and markup. > + corresponding warning option Wsuggest-attribute=malloc > emits a > + diagnostic for a function, which can be annotated with malloc > attribute. "with the malloc" > +New infrastructure for representing profile (both statically guessed > + and profile feedback) which allows propagation of furhter information > + about reliablility of the profile. "representing profiles"? "further" (typo) "about the reliability" > +Number of improvements in profile updating code solving problems > + found by new verification code. "A number of..." "in the profile updating code" "> +Static detection of code which is not executed in valid run of the "in a valid run" > + program. This includes paths which triggers undefined behaviour "trigger" (plural) And "behavior" (American English) > + as well as call to functions declared with cold attribute. "calls" (plural) "declared with the" (add article) > + Newly noreturn attribute does not imply all effects of > + cold to make difference between exit (which > + is noreturn/code> and abort (which is in addition > + not executed in valid runs). "The new...attribute", or did you mean "Newly the ...attribute"? "to make a difference" or better "differentiate"? And closing ")" in the description of exit. > + > +Vectorization costs metrics has been reworked leading to significant > improvments > +on some benchmarks "cost metrics" And "." at the very end. Cheers, Gerald
[PATCH] PR fortran/42651 -- Fix accepts-invalid
The attached patch fixes an accepts-invalid while enforcing C1560 (R1530) If RESULT appears, the function-name shall not appear in any specification statement in the scoping unit of the function subprogram. Regression tested on x86_64-*-freebsd. OK to commit? 2018-03-19 Steven G. Kargl PR fortran/42651 * decl.c (check_function_name): Improved error message (gfc_match_volatile, gfc_match_asynchronous) Use check_function_name. 2018-03-19 Steven G. Kargl PR fortran/42651 * gfortran.dg/pr42651.f90: New test. * gfortran.df/func_result_7.f90: Update for new error message. -- Steve Index: gcc/fortran/decl.c === --- gcc/fortran/decl.c (revision 258646) +++ gcc/fortran/decl.c (working copy) @@ -2242,7 +2242,9 @@ check_function_name (char *name) && strcmp (block->result->name, "ppr@") != 0 && strcmp (block->name, name) == 0) { - gfc_error ("Function name %qs not allowed at %C", name); + gfc_error ("RESULT variable %qs at %L prohibits FUNCTION name %qs at %C " + "from appearing in a specification statement", + block->result->name, &block->result->declared_at, name); return false; } } @@ -9091,6 +9093,7 @@ match gfc_match_volatile (void) { gfc_symbol *sym; + char *name; match m; if (!gfc_notify_std (GFC_STD_F2003, "VOLATILE statement at %C")) @@ -9112,6 +9115,10 @@ gfc_match_volatile (void) switch (m) { case MATCH_YES: + name = XCNEWVAR (char, strlen (sym->name) + 1); + strcpy (name, sym->name); + if (!check_function_name (name)) + return MATCH_ERROR; /* F2008, C560+C561. VOLATILE for host-/use-associated variable or for variable in a BLOCK which is defined outside of the BLOCK. */ if (sym->ns != gfc_current_ns && sym->attr.codimension) @@ -9150,6 +9157,7 @@ match gfc_match_asynchronous (void) { gfc_symbol *sym; + char *name; match m; if (!gfc_notify_std (GFC_STD_F2003, "ASYNCHRONOUS statement at %C")) @@ -9171,6 +9179,10 @@ gfc_match_asynchronous (void) switch (m) { case MATCH_YES: + name = XCNEWVAR (char, strlen (sym->name) + 1); + strcpy (name, sym->name); + if (!check_function_name (name)) + return MATCH_ERROR; if (!gfc_add_asynchronous (&sym->attr, sym->name, &gfc_current_locus)) return MATCH_ERROR; goto next_item; Index: gcc/testsuite/gfortran.dg/pr42651.f90 === --- gcc/testsuite/gfortran.dg/pr42651.f90 (nonexistent) +++ gcc/testsuite/gfortran.dg/pr42651.f90 (working copy) @@ -0,0 +1,24 @@ +! { dg-do compile } +! PR fortran/42651 +integer function func() + asynchronous :: func + integer, asynchronous:: b + allocatable :: c + volatile :: func + type t +sequence +integer :: i = 5 + end type t +end function func + +function func2() result(res) ! { dg-error " RESULT variable" } + volatile res + asynchronous res + target func2! { dg-error " RESULT variable" } + volatile func2 ! { dg-error " RESULT variable" } + asynchronous func2 ! { dg-error " RESULT variable" } + allocatable func2 ! { dg-error " RESULT variable" } + dimension func2(2) ! { dg-error " RESULT variable" } + codimension func2[*]! { dg-error " RESULT variable" } + contiguous func2! { dg-error " RESULT variable" } +end function func2 Index: gcc/testsuite/gfortran.dg/func_result_7.f90 === --- gcc/testsuite/gfortran.dg/func_result_7.f90 (revision 258646) +++ gcc/testsuite/gfortran.dg/func_result_7.f90 (working copy) @@ -4,8 +4,8 @@ ! ! Contributed by Vittorio Zecca -function fun() result(f) - pointer fun ! { dg-error "not allowed" } - dimension fun(1) ! { dg-error "not allowed" } +function fun() result(f) ! { dg-error "RESULT variable" } + pointer fun ! { dg-error "RESULT variable" } + dimension fun(1)! { dg-error "RESULT variable" } f=0 end
Re: [PATCH] [Microblaze]: PIC Data Text Relative
Hi Andrew -- Please take a look at the test case description: https://gcc.gnu.org/wiki/HowToPrepareATestcase and see if you can do one of the following: - Modify the regex expression in the scan-assembler to accept either format of generated output or - Add { dg-option } directives to turn off your new options if specified. (You should be able to specify -mno-pic) or - Duplicate the test cases and add { dg-option } directives to specify the correct options, with and without your new options. or - Add test cases with a { dg-option } directive with your new options. This is not required -- your patch appears to work OK. Normally, the new PIC Data options would not be used when running the test suite, so the tests would not fail. It's just nice to have the test suite updated when new options are added. On 03/19/2018 01:07 PM, Michael Eager wrote: Hi Andrew -- Good work. Please submit your updated patch. Check that you follow GNU coding standards. Also make sure that the new options are documented in gcc/doc/invoke.texi. On 03/18/18 03:27, Andrew Sadek wrote: Hello Michael, I have run the test using the new PIC options. Actually, I have discovered 2 unhandled cases in 'microblaze_expand_move' + missing conditions in linker relax leading some test cases execution to fail. After fixing them, I made a re-run for the whole regression, and the results analogy below: Original, without my patch: === gcc Summary === # of expected passes 90776 # of unexpected failures 1317 # of unexpected successes 3 # of expected failures 207 # of unresolved testcases 115 # of unsupported tests 2828 With my patch, calling '-fPIE - mpic-data-text-rel' === gcc Summary === # of expected passes 90843 # of unexpected failures 1256 # of unexpected successes 3 # of expected failures 207 # of unresolved testcases 115 # of unsupported tests 2853 After running the 'dg-cmp-results.sh' in contrib folder, the PASS->FAIL are below: PASS->FAIL: gcc.dg/uninit-19.c (test for excess errors) PASS->FAIL: gcc.target/microblaze/others/data_var1.c -O0 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var1.c -O1 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var1.c -O2 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var1.c -O3 -g scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var1.c -Os scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var2.c -O0 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var2.c -O1 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var2.c -O2 -flto -fno-use-linker-plugin -flto-partition=none scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var2.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var2.c -O2 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var2.c -O3 -g scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var2.c -Os scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c -O0 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13 PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c -O1 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13 PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13 PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13 PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c -O2 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13 PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c -O3 -g scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13 PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c -Os scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13 PASS->FAIL: gcc.target/microblaze/others/sdata_var2.c -O0 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13 PASS->FAIL: gcc.target/microblaze/others/sdata_var2.c -O1 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13 PASS->FAIL: gcc.t
Re: [PATCH] [Microblaze]: PIC Data Text Relative
Also check the { dg-skip-if } directive. https://gcc.gnu.org/onlinedocs/gccint/Directives.html On 03/19/2018 06:14 PM, Michael Eager wrote: Hi Andrew -- Please take a look at the test case description: https://gcc.gnu.org/wiki/HowToPrepareATestcase and see if you can do one of the following: - Modify the regex expression in the scan-assembler to accept either format of generated output or - Add { dg-option } directives to turn off your new options if specified. (You should be able to specify -mno-pic) or - Duplicate the test cases and add { dg-option } directives to specify the correct options, with and without your new options. or - Add test cases with a { dg-option } directive with your new options. This is not required -- your patch appears to work OK. Normally, the new PIC Data options would not be used when running the test suite, so the tests would not fail. It's just nice to have the test suite updated when new options are added. On 03/19/2018 01:07 PM, Michael Eager wrote: Hi Andrew -- Good work. Please submit your updated patch. Check that you follow GNU coding standards. Also make sure that the new options are documented in gcc/doc/invoke.texi. On 03/18/18 03:27, Andrew Sadek wrote: Hello Michael, I have run the test using the new PIC options. Actually, I have discovered 2 unhandled cases in 'microblaze_expand_move' + missing conditions in linker relax leading some test cases execution to fail. After fixing them, I made a re-run for the whole regression, and the results analogy below: Original, without my patch: === gcc Summary === # of expected passes 90776 # of unexpected failures 1317 # of unexpected successes 3 # of expected failures 207 # of unresolved testcases 115 # of unsupported tests 2828 With my patch, calling '-fPIE - mpic-data-text-rel' === gcc Summary === # of expected passes 90843 # of unexpected failures 1256 # of unexpected successes 3 # of expected failures 207 # of unresolved testcases 115 # of unsupported tests 2853 After running the 'dg-cmp-results.sh' in contrib folder, the PASS->FAIL are below: PASS->FAIL: gcc.dg/uninit-19.c (test for excess errors) PASS->FAIL: gcc.target/microblaze/others/data_var1.c -O0 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var1.c -O1 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var1.c -O2 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var1.c -O3 -g scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var1.c -Os scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var2.c -O0 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var2.c -O1 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var2.c -O2 -flto -fno-use-linker-plugin -flto-partition=none scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var2.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var2.c -O2 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var2.c -O3 -g scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/data_var2.c -Os scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r0 PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c -O0 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13 PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c -O1 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13 PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13 PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13 PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c -O2 scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13 PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c -O3 -g scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13 PASS->FAIL: gcc.target/microblaze/others/sdata_var1.c -Os scan-assembler lwi\tr([0-9]|[1-2][0-9]|3[0-1]),r13 PASS->FAIL: gcc.target/microblaze/others/sdata_var2.c -O0 scan-assembler lwi\tr([0-9]|[
[PATCH] relax -Wclass-memaccess for class members (PR 84850)
The -Wclass-memaccess warning makes an exception for ctors and dtors of non-trivial classes with no bases to avoid triggering for uses of raw memory functions with this as the destination. All other members as well as non-members trigger the warning. In bug 84850 the reporter shows that some code (squid in this case) calls memset(this, 0. ...) in both the copy assignment operator and the copy ctor as a short-hand to clear all trivial members without having to explicitly enumerate them. A similar idiom has been seen in Firefox (some of the bugs linked from https://bugzilla.mozilla.org/show_bug.cgi?id=1411029). Since calling memset(this, 0, sizeof *this) from any non-static member of a non-trivial class containing no non-trivial members is safe, the attached patch relaxes the warning to allow this idiom. I also noticed that the exemption for ctors and dtors is overly permissive and causes the warning not to trigger for classes with no bases or virtual functions but containing non-trivial members. This is bug 84851. Jakub suggested to fix just 84850 for GCC 8 and defer 84851 to GCC 9, so the patch follows that suggestion. Fixing the latter is a matter of removing the block in warn_for_restrict() with the FIXME comment above it. Martin PR c++/84850 - -Wclass-memaccess on a memcpy in a copy assignment operator with no nontrivial bases or members gcc/cp/ChangeLog: PR c++/84850 * call.c (first_non_public_field): New template and function. (first_non_trivial_field): New function. (maybe_warn_class_memaccess): Call them. gcc/testsuite/ChangeLog: PR c++/84850 * g++.dg/Wclass-memaccess-3.C: New test. * g++.dg/Wclass-memaccess-4.C: New test. Index: gcc/cp/call.c === --- gcc/cp/call.c (revision 258646) +++ gcc/cp/call.c (working copy) @@ -8261,13 +8261,17 @@ build_over_call (struct z_candidate *cand, int fla return call; } -/* Return the DECL of the first non-public data member of class TYPE - or null if none can be found. */ +namespace +{ -static tree -first_non_public_field (tree type) +/* Return the DECL of the first non-static subobject of class TYPE + that satisfies the predicate PRED or null if none can be found. */ + +template +tree +first_non_static_field (tree type, Predicate pred) { - if (!CLASS_TYPE_P (type)) + if (!type || !CLASS_TYPE_P (type)) return NULL_TREE; for (tree field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field)) @@ -8276,7 +8280,7 @@ build_over_call (struct z_candidate *cand, int fla continue; if (TREE_STATIC (field)) continue; - if (TREE_PRIVATE (field) || TREE_PROTECTED (field)) + if (pred (field)) return field; } @@ -8286,8 +8290,9 @@ build_over_call (struct z_candidate *cand, int fla BINFO_BASE_ITERATE (binfo, i, base_binfo); i++) { tree base = TREE_TYPE (base_binfo); - - if (tree field = first_non_public_field (base)) + if (pred (base)) + return base; + if (tree field = first_non_static_field (base, pred)) return field; } @@ -8294,6 +8299,42 @@ build_over_call (struct z_candidate *cand, int fla return NULL_TREE; } +struct NonPublicField +{ + bool operator() (const_tree t) + { +return DECL_P (t) && (TREE_PRIVATE (t) || TREE_PROTECTED (t)); + } +}; + +/* Return the DECL of the first non-public subobject of class TYPE + or null if none can be found. */ + +static inline tree +first_non_public_field (tree type) +{ + return first_non_static_field (type, NonPublicField ()); +} + +struct NonTrivialField +{ + bool operator() (const_tree t) + { +return !trivial_type_p (DECL_P (t) ? TREE_TYPE (t) : t); + } +}; + +/* Return the DECL of the first non-trivial subobject of class TYPE + or null if none can be found. */ + +static inline tree +first_non_trivial_field (tree type) +{ + return first_non_static_field (type, NonTrivialField ()); +} + +} /* unnamed namespace */ + /* Return true if all copy and move assignment operator overloads for class TYPE are trivial and at least one of them is not deleted and, when ACCESS is set, accessible. Return false otherwise. Set @@ -8419,23 +8460,31 @@ maybe_warn_class_memaccess (location_t loc, tree f if (!desttype || !COMPLETE_TYPE_P (desttype) || !CLASS_TYPE_P (desttype)) return; - /* Check to see if the raw memory call is made by a ctor or dtor - with this as the destination argument for the destination type. - If so, be more permissive. */ + /* Check to see if the raw memory call is made by a non-static member + function with THIS as the destination argument for the destination + type. If so, and if the class has no non-trivial bases or members, + be more permissive. */ if (current_function_decl - && (DECL_CONSTRUCTOR_P (current_function_decl) - || DECL_DESTRUCTOR_P (current_function_decl)) + && DECL_NONSTATIC_MEMBER_FUNCTION_P (current_function_decl) && is_this_parameter (tre
Re: [PATCH, rs6000] Fix PR83789: __builtin_altivec_lvx fails for powerpc for altivec-4.c
On 3/12/18 1:55 PM, Segher Boessenkool wrote: > On Sun, Mar 11, 2018 at 10:23:02AM -0500, Peter Bergner wrote: >> +; The following patterns embody what lvx should usually look like. >> +(define_expand "altivec_lvx_" >> + [(set (match_operand:VM2 0 "register_operand" "") >> +(match_operand:VM2 1 "altivec_indexed_or_indirect_operand" ""))] > > No "" please. Expanders do not have constraints. Cut & paste error on my part I believe. Will fix. >> #ifdef HAVE_V8HFmode >> - else if (mode == V8HFmode) >> -stvx = TARGET_64BIT >> - ? gen_altivec_stvx_v8hf_1op (src_exp, memory_address) >> - : gen_altivec_stvx_v8hf_1op_si (src_exp, memory_address); >> + else if (mode == V8HFmode) >> +stvx = gen_altivec_stvx_v8hf (src_exp, dest_exp); >> #endif > > Btw, don't we always have V8HFmode these days? I have no idea, I was just working around code that was already there. Looking at the history, Kelvin seems to have added the tests. Kelvin, what is the above trying to protect? Looking at mu build dirs insn-modes.h, I don't see HAVE_V8HFmode being defined on either my LE or BE builds. What am I missing? Peter
[PATCH] Fix strpbrk (x, "") folding (PR c/84953)
Hi! We use incorrect type for the NULL return value, the builtin is char *strpbrk (const char *, const char *), so the first argument is cast to const char * and we return (const char *) 0, while we really should return (char *) 0. fold_builtin_n then adds: ret = build1 (NOP_EXPR, TREE_TYPE (ret), ret); SET_EXPR_LOCATION (ret, loc); TREE_NO_WARNING (ret) = 1; and thus we in the end have (char *) (_Literal (const char *) 0). This bug then results in a -Wdiscarded-qualifiers warning. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2018-03-20 Jakub Jelinek PR c/84953 * builtins.c (fold_builtin_strpbrk): For strpbrk(x, "") use type instead of TREE_TYPE (s1) for the return value. * gcc.dg/pr84953.c: New test. --- gcc/builtins.c.jj 2018-03-07 22:51:58.871478732 +0100 +++ gcc/builtins.c 2018-03-19 18:49:45.313898848 +0100 @@ -9573,7 +9573,7 @@ fold_builtin_strpbrk (location_t loc, tr if (p2[0] == '\0') /* strpbrk(x, "") == NULL. Evaluate and ignore s1 in case it had side-effects. */ - return omit_one_operand_loc (loc, TREE_TYPE (s1), integer_zero_node, s1); + return omit_one_operand_loc (loc, type, integer_zero_node, s1); if (p2[1] != '\0') return NULL_TREE; /* Really call strpbrk. */ --- gcc/testsuite/gcc.dg/pr84953.c.jj 2018-03-19 18:52:48.295893571 +0100 +++ gcc/testsuite/gcc.dg/pr84953.c 2018-03-19 18:52:31.935894043 +0100 @@ -0,0 +1,11 @@ +/* PR c/84953 */ +/* { dg-do compile } */ + +char *strpbrk (const char *, const char *); + +char * +test (char *p) +{ + p = strpbrk (p, ""); /* { dg-bogus "assignment discards 'const' qualifier from pointer target type" } */ + return p; +} Jakub
[PATCH] Avoid compiler UB in store-merging (PR tree-optimization/84946)
Hi! In this code, both bitpos and bitsize are poly_int64, but we know that bitpos is >= 0 and bitsize. On the testcase mentioned in the PR (but even without -mavx512f, so we already invoke it during testing) we would overflow the computation, 0x7ffsomething + 0x20 (and then cast it to poly_uint64). This patch fixes it by casting to poly_uint64 earlier. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2018-03-19 Jakub Jelinek PR tree-optimization/84946 * gimple-ssa-store-merging.c (mem_valid_for_store_merging): Compute bitsize + bitsize in poly_uint64 rather than poly_int64. --- gcc/gimple-ssa-store-merging.c.jj 2018-02-22 09:28:07.0 +0100 +++ gcc/gimple-ssa-store-merging.c 2018-03-19 17:55:22.486472527 +0100 @@ -3948,7 +3948,8 @@ mem_valid_for_store_merging (tree mem, p if (known_eq (bitregion_end, 0U)) { bitregion_start = round_down_to_byte_boundary (bitpos); - bitregion_end = round_up_to_byte_boundary (bitpos + bitsize); + bitregion_end = bitpos; + bitregion_end = round_up_to_byte_boundary (bitregion_end + bitsize); } if (offset != NULL_TREE) Jakub
Re: [PATCH] Fix strpbrk (x, "") folding (PR c/84953)
On March 20, 2018 7:32:46 AM GMT+01:00, Jakub Jelinek wrote: >Hi! > >We use incorrect type for the NULL return value, the builtin is >char *strpbrk (const char *, const char *), so the first argument >is cast to const char * and we return (const char *) 0, while we >really should return (char *) 0. fold_builtin_n then adds: > ret = build1 (NOP_EXPR, TREE_TYPE (ret), ret); > SET_EXPR_LOCATION (ret, loc); > TREE_NO_WARNING (ret) = 1; >and thus we in the end have (char *) (_Literal (const char *) 0). >This bug then results in a -Wdiscarded-qualifiers warning. > >Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok >for >trunk? OK. Richard. >2018-03-20 Jakub Jelinek > > PR c/84953 > * builtins.c (fold_builtin_strpbrk): For strpbrk(x, "") use type > instead of TREE_TYPE (s1) for the return value. > > * gcc.dg/pr84953.c: New test. > >--- gcc/builtins.c.jj 2018-03-07 22:51:58.871478732 +0100 >+++ gcc/builtins.c 2018-03-19 18:49:45.313898848 +0100 >@@ -9573,7 +9573,7 @@ fold_builtin_strpbrk (location_t loc, tr > if (p2[0] == '\0') > /* strpbrk(x, "") == NULL. > Evaluate and ignore s1 in case it had side-effects. */ >- return omit_one_operand_loc (loc, TREE_TYPE (s1), integer_zero_node, >s1); >+ return omit_one_operand_loc (loc, type, integer_zero_node, s1); > > if (p2[1] != '\0') > return NULL_TREE; /* Really call strpbrk. */ >--- gcc/testsuite/gcc.dg/pr84953.c.jj 2018-03-19 18:52:48.295893571 >+0100 >+++ gcc/testsuite/gcc.dg/pr84953.c 2018-03-19 18:52:31.935894043 +0100 >@@ -0,0 +1,11 @@ >+/* PR c/84953 */ >+/* { dg-do compile } */ >+ >+char *strpbrk (const char *, const char *); >+ >+char * >+test (char *p) >+{ >+ p = strpbrk (p, "");/* { dg-bogus "assignment discards 'const' >qualifier from pointer target type" } */ >+ return p; >+} > > Jakub