Re: [PATCH 3/N] Come up with casm global state.
On Mon, Oct 25, 2021 at 6:32 PM Segher Boessenkool wrote: > > Hi! > > On Mon, Oct 25, 2021 at 03:36:25PM +0200, Martin Liška wrote: > > --- a/gcc/config/rs6000/rs6000-internal.h > > +++ b/gcc/config/rs6000/rs6000-internal.h > > @@ -189,4 +189,13 @@ extern bool rs6000_passes_vector; > > extern bool rs6000_returns_struct; > > extern bool cpu_builtin_p; > > > > +struct rs6000_asm_out_state : public asm_out_state > > +{ > > + /* Initialize ELF sections. */ > > + void init_elf_sections (); > > + > > + /* Initialize XCOFF sections. */ > > + void init_xcoff_sections (); > > +}; > > Our coding convention says to use "class", not "struct" (since this > isn't valid C code at all). > > > - sdata2_section > > + sec.sdata2 > > = get_unnamed_section (SECTION_WRITE, output_section_asm_op, > > SDATA2_SECTION_ASM_OP); > > (broken indentation) > > > +/* Implement TARGET_ASM_INIT_SECTIONS. */ > > That comment is out-of-date. > > > +static asm_out_state * > > +rs6000_elf_asm_init_sections (void) > > +{ > > + rs6000_asm_out_state *target_state > > += new (ggc_alloc ()) rs6000_asm_out_state (); > > Hrm, maybe we can have a macro or function that does this, ggc_new or > something? > > > +/* Implement TARGET_ASM_INIT_SECTIONS. */ > > + > > +static asm_out_state * > > +rs6000_xcoff_asm_init_sections (void) > > Here, too. Both implementations are each one of several functions that > together implement the target macro. > > > +/* The section that holds the DWARF2 frame unwind information, when > > known. > > + The section is set either by the target's init_sections hook or by > > the > > + first call to switch_to_eh_frame_section. */ > > +section *eh_frame; > > + > > +/* RS6000 sections. */ > > Nothing here? Just remove the comment header? > > The idea looks fine to me. Yeah, of course then the target hook does not need to do the allocation and we could simply keep the current init_sections hook but change it to take the asm_out_state to initialize as argument. Note that I'd put +/* RS6000 sections. */ + +/* ELF sections. */ +section *toc; +section *sdata2; + +/* XCOFF sections. */ +section *read_only_data; +section *private_data; +section *tls_data; +section *tls_private_data; +section *read_only_private_data; into a union, thus union { struct /* RS6000 sections */ { /* ELF sections. */ section *toc; ... } rs6000; struct /* darwin sections */ { ... }; not sure whether we need some magic GTY marking here to make it pick up the 'correct' set. Another alternative would be section *target[MAX_TARGET_SECTIONS]; and #defines in the targets mapping the former global variables to indices in that array. All of this isn't "nice C++" of course, but well ... I'm not the one to insist ;) Richard. > > Segher
Re: [PATCH] forwprop: Remove incorrect assertion [PR102897]
On Tue, Oct 26, 2021 at 5:40 AM Kewen.Lin wrote: > > Hi, > > As PR102897 shows, there is one incorrect assertion in function > simplify_permutation, which is based on the wrong assumption that > all cases with op2_type == tgt_type are handled previously, the > proposed fix is to remove this wrong assertion. > > Bootstrapped and regtested on x86_64-redhat-linux, > aarch64-linux-gnu and powerpc64{,le}-linux-gnu. I think you need to enable optimization in the new testcase, gcc.dg/ only runs -O0 by default which wouldn't trigger forwprop? Please verify the testcase ICEs before the fix. Otherwise OK. Thanks, Richard, > BR, > Kewen > - > gcc/ChangeLog: > > PR tree-optimization/102897 > * tree-ssa-forwprop.c (simplify_permutation): Remove a wrong > assertion. > > gcc/testsuite/ChangeLog: > > * gcc.dg/pr102897.c: New test. > --- > gcc/testsuite/gcc.dg/pr102897.c | 16 > gcc/tree-ssa-forwprop.c | 2 -- > 2 files changed, 16 insertions(+), 2 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/pr102897.c > > diff --git a/gcc/testsuite/gcc.dg/pr102897.c b/gcc/testsuite/gcc.dg/pr102897.c > new file mode 100644 > index 000..d96b0e48ccc > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/pr102897.c > @@ -0,0 +1,16 @@ > +/* { dg-do compile } */ > +/* Specify C99 to avoid the warning/error on compound literals. */ > +/* { dg-options "-std=c99" } */ > + > +/* Verify that there is no ICE. */ > + > +typedef __attribute__((vector_size(8))) signed char int8x8_t; > +typedef __attribute__((vector_size(8))) unsigned char uint8x8_t; > + > +int8x8_t fn1 (int8x8_t val20, char tmp) > +{ > + uint8x8_t __trans_tmp_3; > + __trans_tmp_3 = (uint8x8_t){tmp}; > + int8x8_t __a = (int8x8_t) __trans_tmp_3; > + return __builtin_shuffle (__a, val20, (uint8x8_t){0}); > +} > diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c > index 5b30d4c1a76..a830bab78ba 100644 > --- a/gcc/tree-ssa-forwprop.c > +++ b/gcc/tree-ssa-forwprop.c > @@ -2267,8 +2267,6 @@ simplify_permutation (gimple_stmt_iterator *gsi) > if (!VECTOR_TYPE_P (tgt_type)) > return 0; > tree op2_type = TREE_TYPE (op2); > - /* Should have folded this before. */ > - gcc_assert (op2_type != tgt_type); > > /* Figure out the shrunk factor. */ > poly_uint64 tgt_units = TYPE_VECTOR_SUBPARTS (tgt_type); > -- > 2.27.0
Re: [PATCH] Add TSVC tests.
On Tue, Oct 19, 2021 at 8:49 AM Martin Liška wrote: > > On 10/18/21 12:08, Richard Biener wrote: > > Can you please use a subdirectory for the sources, a "toplevel" > > license.txt doesn't make much sense. You can simply amend > > vect.exp to process tsvc/*.c as well as sources so no need for an > > extra .exp file. > > Sure, it's a good idea and I've done that. > > > > > Is the license recognized as > > compatible to the GPL as far as source distribution is concerned? > > Yes: https://www.gnu.org/licenses/license-list.html#NCSA > > > > > Did you test the testcases on any non-x86 target? (power/aarch64/arm) > > Yes, I run the tests also on ppc64le-linux-gnu and aarch64-linux-gnu. > > Thoughts? The overall setup looks fine to me. There are quite some testcases where there are no dg-final directives, some indicate in comments that we do not expect vectorization - for those do we want to add scan-tree-dump-not "loop vectorized" or so to make that clear? For others do we want to add XFAILs so we'll notice when we improve on TSVC? It looks like for example s124 is looking for IVOPTs rather than vectorization? There are testcases exercising float compares (s124 is an example), vectorizing those likely requires a subset of fast-math flags to allow if-conversion and masking, plus masking is not available on all targets. Is the intent to adjust testcase options accordingly? That said, I wonder whether it makes sense to initially only add the parts having dg-final directives (that PASS or XFAIL), just adding testcases for testing compile looks superfluous. All of the testcases are dg-do compile, but vectorizer testcases ideally would come with runtime verification. I assume the original TSVC provides this and as you include tscv.h in all tests I suppose including a runtime harness would be possible, no? Thanks, Richard. > Thanks, > Martin > > > > > Richard.
Re: [PATCH v4] Improve integer bit test on __atomic_fetch_[or|and]_* returns
On Thu, Oct 21, 2021 at 1:09 PM Hongtao Liu wrote: > > i is > > On Wed, Oct 13, 2021 at 8:34 PM Richard Biener via Gcc-patches > wrote: > > > > On Sun, Oct 10, 2021 at 3:49 PM H.J. Lu wrote: > > > > > > Changes in v4: > > > > > > 1. Bypass redundant check when inputs have been transformed to the > > > equivalent canonical form with valid bit operation. > > > > > > Changes in v3: > > > > > > 1. Check invalid bit operation. > > > > > > commit adedd5c173388ae505470df152b9cb3947339566 > > > Author: Jakub Jelinek > > > Date: Tue May 3 13:37:25 2016 +0200 > > > > > > re PR target/49244 (__sync or __atomic builtins will not emit 'lock > > > bts/btr/btc') > > > > > > optimized bit test on __atomic_fetch_or_* and __atomic_fetch_and_* returns > > > with lock bts/btr/btc by turning > > > > > > mask_2 = 1 << cnt_1; > > > _4 = __atomic_fetch_or_* (ptr_6, mask_2, _3); > > > _5 = _4 & mask_2; > > > > > > into > > > > > > _4 = ATOMIC_BIT_TEST_AND_SET (ptr_6, cnt_1, 0, _3); > > > _5 = _4; > > > > > > and > > > > > > mask_6 = 1 << bit_5(D); > > > _1 = ~mask_6; > > > _2 = __atomic_fetch_and_4 (v_8(D), _1, 0); > > > _3 = _2 & mask_6; > > > _4 = _3 != 0; > > > > > > into > > > > > > mask_6 = 1 << bit_5(D); > > > _1 = ~mask_6; > > > _11 = .ATOMIC_BIT_TEST_AND_RESET (v_8(D), bit_5(D), 1, 0); > > > _4 = _11 != 0; > > > > > > But it failed to optimize many equivalent, but slighly different cases: > > > > > > 1. > > > _1 = __atomic_fetch_or_4 (ptr_6, 1, _3); > > > _4 = (_Bool) _1; > > > 2. > > > _1 = __atomic_fetch_and_4 (ptr_6, ~1, _3); > > > _4 = (_Bool) _1; > > > 3. > > > _1 = __atomic_fetch_or_4 (ptr_6, 1, _3); > > > _7 = ~_1; > > > _5 = (_Bool) _7; > > > 4. > > > _1 = __atomic_fetch_and_4 (ptr_6, ~1, _3); > > > _7 = ~_1; > > > _5 = (_Bool) _7; > > > 5. > > > _1 = __atomic_fetch_or_4 (ptr_6, 1, _3); > > > _2 = (int) _1; > > > _7 = ~_2; > > > _5 = (_Bool) _7; > > > 6. > > > _1 = __atomic_fetch_and_4 (ptr_6, ~1, _3); > > > _2 = (int) _1; > > > _7 = ~_2; > > > _5 = (_Bool) _7; > > > 7. > > > _1 = _atomic_fetch_or_4 (ptr_6, mask, _3); > > > _2 = (int) _1; > > > _5 = _2 & mask; > > > 8. > > > _1 = __atomic_fetch_or_4 (ptr_6, 0x8000, _3); > > > _5 = (signed int) _1; > > > _4 = _5 < 0; > > > 9. > > > _1 = __atomic_fetch_and_4 (ptr_6, 0x7fff, _3); > > > _5 = (signed int) _1; > > > _4 = _5 < 0; > > > 10. > > > _1 = 1 << bit_4(D); > > > mask_5 = (unsigned int) _1; > > > _2 = __atomic_fetch_or_4 (v_7(D), mask_5, 0); > > > _3 = _2 & mask_5; > > > 11. > > > mask_7 = 1 << bit_6(D); > > > _1 = ~mask_7; > > > _2 = (unsigned int) _1; > > > _3 = __atomic_fetch_and_4 (v_9(D), _2, 0); > > > _4 = (int) _3; > > > _5 = _4 & mask_7; > > > > > > We make > > > > > > mask_2 = 1 << cnt_1; > > > _4 = __atomic_fetch_or_* (ptr_6, mask_2, _3); > > > _5 = _4 & mask_2; > > > > > > and > > > > > > mask_6 = 1 << bit_5(D); > > > _1 = ~mask_6; > > > _2 = __atomic_fetch_and_4 (v_8(D), _1, 0); > > > _3 = _2 & mask_6; > > > _4 = _3 != 0; > > > > > > the canonical forms for this optimization and transform cases 1-9 to the > > > equivalent canonical form. For cases 10 and 11, we simply remove the cast > > > before __atomic_fetch_or_4/__atomic_fetch_and_4 with > > > > > > _1 = 1 << bit_4(D); > > > _2 = __atomic_fetch_or_4 (v_7(D), _1, 0); > > > _3 = _2 & _1; > > > > > > and > > > > > > mask_7 = 1 << bit_6(D); > > > _1 = ~mask_7; > > > _3 = __atomic_fetch_and_4 (v_9(D), _1, 0); > > > _6 = _3 & mask_7; > > > _5 = (int) _6; > > > > > > gcc/ > > > > > > PR middle-end/102566 > > > * tree-ssa-ccp.c (convert_atomic_bit_not): New function. > > > (optimize_atomic_bit_test_and): Transform equivalent, but slighly > > > different cases to their canonical forms. > > > > > > gcc/testsuite/ > > > > > > PR middle-end/102566 > > > * g++.target/i386/pr102566-1.C: New test. > > > * g++.target/i386/pr102566-2.C: Likewise. > > > * g++.target/i386/pr102566-3.C: Likewise. > > > * g++.target/i386/pr102566-4.C: Likewise. > > > * g++.target/i386/pr102566-5a.C: Likewise. > > > * g++.target/i386/pr102566-5b.C: Likewise. > > > * g++.target/i386/pr102566-6a.C: Likewise. > > > * g++.target/i386/pr102566-6b.C: Likewise. > > > * gcc.target/i386/pr102566-1a.c: Likewise. > > > * gcc.target/i386/pr102566-1b.c: Likewise. > > > * gcc.target/i386/pr102566-2.c: Likewise. > > > * gcc.target/i386/pr102566-3a.c: Likewise. > > > * gcc.target/i386/pr102566-3b.c: Likewise. > > > * gcc.target/i386/pr102566-4.c: Likewise. > > > * gcc.target/i386/pr102566-5.c: Likewise. > > > * gcc.target/i386/pr102566-6.c: Likewise. > > > * gcc.target/i386/pr102566-7.c: Likewise. > > > * gcc.target/i386/pr102566-8a.c: Likewise. > > > * gcc.target/i386/pr102566-8b.c: Likewise. > > > * g
[PATCH v3 0/1] implement TLS register based stack canary for ARM
Bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102352 In the Linux kernel, user processes calling into the kernel are essentially threads running in the same address space, of a program that never terminates. This means that using a global variable for the stack protector canary value is problematic on SMP systems, as we can never change it unless we reboot the system. (Processes that sleep for any reason will do so on a call into the kernel, which means that there will always be live kernel stack frames carrying copies of the canary taken when the function was entered) AArch64 implements -mstack-protector-guard=sysreg for this purpose, as this permits the kernel to use different memory addresses for the stack canary for each CPU, and context switch the chosen system register with the rest of the process, allowing each process to use its own unique value for the stack canary. This patch implements something similar, but for the 32-bit ARM kernel, which will start using the user space TLS register TPIDRURO to index per-process metadata while running in the kernel. This means we can just add an offset to TPIDRURO to obtain the address from which to load the canary value. As for the spilling issues that have been fixed in this code in the past: I suppose a register carrying the TLS register value will never get spilled to begin with? Changes since v2: - fix the template for stack_protect_test_tls so it correctly conveys the fact that it sets the Z flag Comments/suggestions welcome. Cc: Keith Packard Cc: thomas.preudho...@celest.fr Cc: adhemerval.zane...@linaro.org Cc: Qing Zhao Cc: Richard Sandiford Cc: gcc-patches@gcc.gnu.org Ard Biesheuvel (1): [ARM] Add support for TLS register based stack protector canary access gcc/config/arm/arm-opts.h | 6 ++ gcc/config/arm/arm-protos.h | 2 + gcc/config/arm/arm.c| 52 gcc/config/arm/arm.md | 62 +++- gcc/config/arm/arm.opt | 22 +++ gcc/doc/invoke.texi | 9 +++ 6 files changed, 151 insertions(+), 2 deletions(-) -- 2.30.2
[PATCH v3 1/1] [ARM] Add support for TLS register based stack protector canary access
Add support for accessing the stack canary value via the TLS register, so that multiple threads running in the same address space can use distinct canary values. This is intended for the Linux kernel running in SMP mode, where processes entering the kernel are essentially threads running the same program concurrently: using a global variable for the canary in that context is problematic because it can never be rotated, and so the OS is forced to use the same value as long as it remains up. Using the TLS register to index the stack canary helps with this, as it allows each CPU to context switch the TLS register along with the rest of the process, permitting each process to use its own value for the stack canary. 2021-10-21 Ard Biesheuvel * config/arm/arm-opts.h (enum stack_protector_guard): New * config/arm/arm-protos.h (arm_stack_protect_tls_canary_mem): New * config/arm/arm.c (TARGET_STACK_PROTECT_GUARD): Define (arm_option_override_internal): Handle and put in error checks for stack protector guard options. (arm_option_reconfigure_globals): Likewise (arm_stack_protect_tls_canary_mem): New (arm_stack_protect_guard): New * config/arm/arm.md (stack_protect_set): New (stack_protect_set_tls): Likewise (stack_protect_test): Likewise (stack_protect_test_tls): Likewise * config/arm/arm.opt (-mstack-protector-guard): New (-mstack-protector-guard-offset): New. Signed-off-by: Ard Biesheuvel --- gcc/config/arm/arm-opts.h | 6 ++ gcc/config/arm/arm-protos.h | 2 + gcc/config/arm/arm.c| 52 gcc/config/arm/arm.md | 62 +++- gcc/config/arm/arm.opt | 22 +++ gcc/doc/invoke.texi | 9 +++ 6 files changed, 151 insertions(+), 2 deletions(-) diff --git a/gcc/config/arm/arm-opts.h b/gcc/config/arm/arm-opts.h index 5c4b62f404f7..581ba3c4fbbb 100644 --- a/gcc/config/arm/arm-opts.h +++ b/gcc/config/arm/arm-opts.h @@ -69,4 +69,10 @@ enum arm_tls_type { TLS_GNU, TLS_GNU2 }; + +/* Where to get the canary for the stack protector. */ +enum stack_protector_guard { + SSP_TLSREG, /* per-thread canary in TLS register */ + SSP_GLOBAL /* global canary */ +}; #endif diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 9b1f61394ad7..37e80256a78d 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -195,6 +195,8 @@ extern void arm_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx); extern rtx arm_load_tp (rtx); extern bool arm_coproc_builtin_available (enum unspecv); extern bool arm_coproc_ldc_stc_legitimate_address (rtx); +extern rtx arm_stack_protect_tls_canary_mem (void); + #if defined TREE_CODE extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree); diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index c4ff06b087eb..0bf06e764dbb 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -829,6 +829,9 @@ static const struct attribute_spec arm_attribute_table[] = #undef TARGET_MD_ASM_ADJUST #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust + +#undef TARGET_STACK_PROTECT_GUARD +#define TARGET_STACK_PROTECT_GUARD arm_stack_protect_guard /* Obstack for minipool constant handling. */ static struct obstack minipool_obstack; @@ -3155,6 +3158,26 @@ arm_option_override_internal (struct gcc_options *opts, if (TARGET_THUMB2_P (opts->x_target_flags)) opts->x_inline_asm_unified = true; + if (arm_stack_protector_guard == SSP_GLOBAL + && opts->x_arm_stack_protector_guard_offset_str) +{ + error ("incompatible options %'-mstack-protector-guard=global%' and" +"%'-mstack-protector-guard-offset=%qs%'", +arm_stack_protector_guard_offset_str); +} + + if (opts->x_arm_stack_protector_guard_offset_str) +{ + char *end; + const char *str = arm_stack_protector_guard_offset_str; + errno = 0; + long offs = strtol (arm_stack_protector_guard_offset_str, &end, 0); + if (!*str || *end || errno) + error ("%qs is not a valid offset in %qs", str, + "-mstack-protector-guard-offset="); + arm_stack_protector_guard_offset = offs; +} + #ifdef SUBTARGET_OVERRIDE_INTERNAL_OPTIONS SUBTARGET_OVERRIDE_INTERNAL_OPTIONS; #endif @@ -3822,6 +3845,10 @@ arm_option_reconfigure_globals (void) else target_thread_pointer = TP_SOFT; } + + if (arm_stack_protector_guard == SSP_TLSREG + && target_thread_pointer != TP_CP15) +error("%'-mstack-protector-guard=tls%' needs a hardware TLS register"); } /* Perform some validation between the desired architecture and the rest of the @@ -8087,6 +8114,19 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg, rtx pic_reg, } +rtx +arm_stack_protect_tls_canary_mem (void) +{ + rtx tp = gen_reg_rtx (SImode); + emit_insn (gen_load_tp_hard (tp)); + + rtx r
Re: [PATCH] hardened conditionals
On Fri, Oct 22, 2021 at 4:19 AM Alexandre Oliva wrote: > > On Oct 20, 2021, Alexandre Oliva wrote: > > > I suppose it's a latent issue exposed by the patch, > > I was mistaken. Though I even had bisected the -fcompare-debug problem > back to a patch from back in May, that added a new sink_code pass before > store_merging, it was actually a bug in my patch, it was just a little > hard to hit with bootstrap-debug, but it came up with -fcompare-debug in > ways that misled me. > > split_block remains slightly risky to use unless you know you have or > are going to insert nondebug stmts/insns in both blocks. I've often > pondered warning in case split_block completes with only debug > stmts/insns in either block, but IIRC there are multiple passes that > split first and insert code afterwards, which have to be rearranged to > aovid the warning. > > Anyway, here's the fixed patch. Regstrapped on x86_64-linux-gnu, and > bootstrapped with an additional patch that enables both new passes. Ok > to install? OK. Thanks, Richard. > > hardened conditionals > > From: Alexandre Oliva > > This patch introduces optional passes to harden conditionals used in > branches, and in computing boolean expressions, by adding redundant > tests of the reversed conditions, and trapping in case of unexpected > results. Though in abstract machines the redundant tests should never > fail, CPUs may be led to misbehave under certain kinds of attacks, > such as of power deprivation, and these tests reduce the likelihood of > going too far down an unexpected execution path. > > > for gcc/ChangeLog > > * common.opt (fharden-compares): New. > (fharden-conditional-branches): New. > * doc/invoke.texi: Document new options. > * gimple-harden-conditionals.cc: New. > * passes.def: Add new passes. > * tree-pass.h (make_pass_harden_compares): Declare. > (make_pass_harden_conditional_branches): Declare. > > for gcc/ada/ChangeLog > > * doc/gnat_rm/security_hardening_features.rst > (Hardened Conditionals): New. > > for gcc/testsuite/ChangeLog > > * c-c++-common/torture/harden-comp.c: New. > * c-c++-common/torture/harden-cond.c: New. > --- > gcc/Makefile.in|1 > .../doc/gnat_rm/security_hardening_features.rst| 40 ++ > gcc/common.opt |8 > gcc/doc/invoke.texi| 19 + > gcc/gimple-harden-conditionals.cc | 439 > > gcc/passes.def |2 > gcc/testsuite/c-c++-common/torture/harden-comp.c | 14 + > gcc/testsuite/c-c++-common/torture/harden-cond.c | 18 + > gcc/tree-pass.h|3 > 9 files changed, 544 insertions(+) > create mode 100644 gcc/gimple-harden-conditionals.cc > create mode 100644 gcc/testsuite/c-c++-common/torture/harden-comp.c > create mode 100644 gcc/testsuite/c-c++-common/torture/harden-cond.c > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in > index f36ffa4740b78..a79ff93dd5999 100644 > --- a/gcc/Makefile.in > +++ b/gcc/Makefile.in > @@ -1389,6 +1389,7 @@ OBJS = \ > gimple-if-to-switch.o \ > gimple-iterator.o \ > gimple-fold.o \ > + gimple-harden-conditionals.o \ > gimple-laddress.o \ > gimple-loop-interchange.o \ > gimple-loop-jam.o \ > diff --git a/gcc/ada/doc/gnat_rm/security_hardening_features.rst > b/gcc/ada/doc/gnat_rm/security_hardening_features.rst > index 1c46e3a4c7b88..52240d7e3dd54 100644 > --- a/gcc/ada/doc/gnat_rm/security_hardening_features.rst > +++ b/gcc/ada/doc/gnat_rm/security_hardening_features.rst > @@ -87,3 +87,43 @@ types and subtypes, may be silently ignored. > Specifically, it is not > currently recommended to rely on any effects this pragma might be > expected to have when calling subprograms through access-to-subprogram > variables. > + > + > +.. Hardened Conditionals: > + > +Hardened Conditionals > += > + > +GNAT can harden conditionals to protect against control flow attacks. > + > +This is accomplished by two complementary transformations, each > +activated by a separate command-line option. > + > +The option *-fharden-compares* enables hardening of compares that > +compute results stored in variables, adding verification that the > +reversed compare yields the opposite result. > + > +The option *-fharden-conditional-branches* enables hardening of > +compares that guard conditional branches, adding verification of the > +reversed compare to both execution paths. > + > +These transformations are introduced late in the compilation pipeline, > +long after boolean expressions are decomposed into separate compares, > +each one turned into either a conditional branch or a compare whose > +result is stored in a boolean variable or temporary. Compiler > +optimizations, if enabled, may also turn conditiona
Re: [PATCH v3] detect out-of-bounds stores by atomic functions [PR102453]
On Mon, Oct 25, 2021 at 4:21 AM Jeff Law wrote: > > > > On 10/24/2021 5:40 PM, Martin Sebor via Gcc-patches wrote: > > Attached is a revised patch for just the access warning pass > > to diagnose out-of-bounds stores by atomic functions, with > > no attr-fnspec changes. > > > > Is this okay for trunk? > > > > Martin > > > > PS Just to clarify the effect of the original patch in case > > it wasn't: it didn't enable optimizations of atomic built-ins. > > It just made it possible, by first calling the new > > atomic_builtin_fnspec() to get the fnspec, and then by actually > > doing something with it. The original patch did not modify > > builtin_fnspec() to call the new atomic_builtin_fnspec(). But > > since you seem to have reservations about exposing the attribute > > in any form I have withdrawn the original patch and replaced it > > with the more limited one. > > > > > I'm letting Richi take the lead here. But I did notice a tiny nit: > > > > + > > + /* Tyhe size in bytes of the access by the function, and the number > s/Tyhe/The/ The patch looks OK to me. Thanks, Richard. > >
RE: [PATCH] middle-end: fix de-optimizations with bitclear patterns on signed values
On Mon, 25 Oct 2021, Tamar Christina wrote: > > -Original Message- > > From: Richard Biener > > Sent: Friday, October 15, 2021 12:31 PM > > To: Tamar Christina > > Cc: gcc-patches@gcc.gnu.org; Jakub Jelinek ; nd > > > > Subject: Re: [PATCH] middle-end: fix de-optimizations with bitclear patterns > > on signed values > > > > On Fri, 15 Oct 2021, Tamar Christina wrote: > > > > > Hi All, > > > > > > During testing after rebasing to commit I noticed a failing testcase > > > with the bitmask compare patch. > > > > > > Consider the following C++ testcase: > > > > > > #include > > > > > > #define A __attribute__((noipa)) > > > A bool f5 (double i, double j) { auto c = i <=> j; return c >= 0; } > > > > > > This turns into a comparison against chars, on systems where chars are > > > signed the pattern inserts an unsigned convert such that it's able to > > > do the transformation. > > > > > > i.e.: > > > > > > # RANGE [-1, 2] > > > # c$_M_value_22 = PHI <-1(3), 0(2), 2(5), 1(4)> > > > # RANGE ~[3, 254] > > > _11 = (unsigned char) c$_M_value_22; > > > _19 = _11 <= 1; > > > # .MEM_24 = VDEF <.MEM_6(D)> > > > D.10434 ={v} {CLOBBER}; > > > # .MEM_14 = VDEF <.MEM_24> > > > D.10407 ={v} {CLOBBER}; > > > # VUSE <.MEM_14> > > > return _19; > > > > > > instead of: > > > > > > # RANGE [-1, 2] > > > # c$_M_value_5 = PHI <-1(3), 0(2), 2(5), 1(4)> > > > # RANGE [-2, 2] > > > _3 = c$_M_value_5 & -2; > > > _19 = _3 == 0; > > > # .MEM_24 = VDEF <.MEM_6(D)> > > > D.10440 ={v} {CLOBBER}; > > > # .MEM_14 = VDEF <.MEM_24> > > > D.10413 ={v} {CLOBBER}; > > > # VUSE <.MEM_14> > > > return _19; > > > > > > This causes much worse codegen under -ffast-math due to phiops no > > > longer recognizing the pattern. It turns out that phiopts > > > spaceship_replacement is looking for the exact form that was just changed. > > > > > > Trying to get it to recognize the new form is not trivial as the > > > transformation doesn't look to work when the thing it's pointing to is > > > itself > > a phi-node. > > > > What do you mean? Where it handles the BIT_AND it could also handle the > > conversion, no? The later handling would probably more explicitely need to > > distinguish between the BIT_AND and the conversion forms. > > Looks like I misunderstood the code, it was looking at the uses not the defs > of > the value. > > --- inline copy of patch --- > > The comments seems to suggest this code only checks for (res & ~1) == 0 but > the > implementation seems to suggest it's broader. > > As such I added a case to check to see if the value comparison we found is a > type cast. and strips away the type cast and continues. > > In match.pd the typecasts are only added for signed comparisons to == 0 and > != 0 > which are then rewritten into comparisons with 1. > > As such I only check for 1 and LE and GT, which is what match.pd would have > rewritten it to. > > This fixes the regression but this is not code I 100% understand, since I > don't > really know the semantics of the spaceship operator so would appreciate an > extra > look. > > Bootstrapped Regtested on aarch64-none-linux-gnu, > x86_64-pc-linux-gnu and no regressions. > > Ok for master? Please add a testcase. I hope Jakub can review the spaceship_replacement patch since he's the one familiar with the code. Thanks, Richard. > Thanks, > Tamar > > gcc/ChangeLog: > > * tree-ssa-phiopt.c (spaceship_replacement): Handle new canonical > codegen. > > diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c > index > 0e339c46afa29fa97f90d9bc4394370cd9b4b396..65b25be3399b75d5e9cab0f78aa2340418571a33 > 100644 > --- a/gcc/tree-ssa-phiopt.c > +++ b/gcc/tree-ssa-phiopt.c > @@ -2037,6 +2037,7 @@ spaceship_replacement (basic_block cond_bb, basic_block > middle_bb, >tree lhs, rhs; >gimple *orig_use_stmt = use_stmt; >tree orig_use_lhs = NULL_TREE; > + bool is_canon = false; >int prec = TYPE_PRECISION (TREE_TYPE (phires)); >if (is_gimple_assign (use_stmt) >&& gimple_assign_rhs_code (use_stmt) == BIT_AND_EXPR > @@ -2063,6 +2064,26 @@ spaceship_replacement (basic_block cond_bb, > basic_block middle_bb, > } >else if (is_gimple_assign (use_stmt)) > { > + /* Deal with if match.pd has rewritten the (res & ~1) == 0 > + into res <= 1 and has left a type-cast for signed types. */ > + if (gimple_assign_cast_p (use_stmt)) > + { > + orig_use_lhs = gimple_assign_lhs (use_stmt); > + if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (orig_use_lhs)) > + return false; > + if (EDGE_COUNT (phi_bb->preds) != 4) > + return false; > + if (!TYPE_UNSIGNED (TREE_TYPE (orig_use_lhs))) > + return false; > + if (!single_imm_use (orig_use_lhs, &use_p, &use_stmt)) > + return false; > + tree_code cmp; > + if (is_gimple_assign (use_stmt) > + && (cmp = gimple_assign_rhs_code (use_stmt)) > + && (cmp == LE_E
RE: [PATCH] middle-end: fix de-optimizations with bitclear patterns on signed values
> -Original Message- > From: Richard Biener > Sent: Tuesday, October 26, 2021 9:26 AM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; Jakub Jelinek ; nd > > Subject: RE: [PATCH] middle-end: fix de-optimizations with bitclear patterns > on signed values > > On Mon, 25 Oct 2021, Tamar Christina wrote: > > > > -Original Message- > > > From: Richard Biener > > > Sent: Friday, October 15, 2021 12:31 PM > > > To: Tamar Christina > > > Cc: gcc-patches@gcc.gnu.org; Jakub Jelinek ; nd > > > > > > Subject: Re: [PATCH] middle-end: fix de-optimizations with bitclear > > > patterns on signed values > > > > > > On Fri, 15 Oct 2021, Tamar Christina wrote: > > > > > > > Hi All, > > > > > > > > During testing after rebasing to commit I noticed a failing > > > > testcase with the bitmask compare patch. > > > > > > > > Consider the following C++ testcase: > > > > > > > > #include > > > > > > > > #define A __attribute__((noipa)) > > > > A bool f5 (double i, double j) { auto c = i <=> j; return c >= 0; > > > > } > > > > > > > > This turns into a comparison against chars, on systems where chars > > > > are signed the pattern inserts an unsigned convert such that it's > > > > able to do the transformation. > > > > > > > > i.e.: > > > > > > > > # RANGE [-1, 2] > > > > # c$_M_value_22 = PHI <-1(3), 0(2), 2(5), 1(4)> > > > > # RANGE ~[3, 254] > > > > _11 = (unsigned char) c$_M_value_22; > > > > _19 = _11 <= 1; > > > > # .MEM_24 = VDEF <.MEM_6(D)> > > > > D.10434 ={v} {CLOBBER}; > > > > # .MEM_14 = VDEF <.MEM_24> > > > > D.10407 ={v} {CLOBBER}; > > > > # VUSE <.MEM_14> > > > > return _19; > > > > > > > > instead of: > > > > > > > > # RANGE [-1, 2] > > > > # c$_M_value_5 = PHI <-1(3), 0(2), 2(5), 1(4)> > > > > # RANGE [-2, 2] > > > > _3 = c$_M_value_5 & -2; > > > > _19 = _3 == 0; > > > > # .MEM_24 = VDEF <.MEM_6(D)> > > > > D.10440 ={v} {CLOBBER}; > > > > # .MEM_14 = VDEF <.MEM_24> > > > > D.10413 ={v} {CLOBBER}; > > > > # VUSE <.MEM_14> > > > > return _19; > > > > > > > > This causes much worse codegen under -ffast-math due to phiops no > > > > longer recognizing the pattern. It turns out that phiopts > > > > spaceship_replacement is looking for the exact form that was just > changed. > > > > > > > > Trying to get it to recognize the new form is not trivial as the > > > > transformation doesn't look to work when the thing it's pointing > > > > to is itself > > > a phi-node. > > > > > > What do you mean? Where it handles the BIT_AND it could also handle > > > the conversion, no? The later handling would probably more > > > explicitely need to distinguish between the BIT_AND and the conversion > forms. > > > > Looks like I misunderstood the code, it was looking at the uses not > > the defs of the value. > > > > --- inline copy of patch --- > > > > The comments seems to suggest this code only checks for (res & ~1) == > > 0 but the implementation seems to suggest it's broader. > > > > As such I added a case to check to see if the value comparison we > > found is a type cast. and strips away the type cast and continues. > > > > In match.pd the typecasts are only added for signed comparisons to == > > 0 and != 0 which are then rewritten into comparisons with 1. > > > > As such I only check for 1 and LE and GT, which is what match.pd would > > have rewritten it to. > > > > This fixes the regression but this is not code I 100% understand, > > since I don't really know the semantics of the spaceship operator so > > would appreciate an extra look. > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu > > and no regressions. > > > > Ok for master? > > Please add a testcase. I hope Jakub can review the spaceship_replacement > patch since he's the one familiar with the code. There's already a bunch of testcases that test the various variants: gcc/testsuite/g++.dg/opt/pr94589-1.C and gcc/testsuite/g++.dg/opt/pr94589-2.C which is how I noticed the failure. However they only trigger the failure on signed chars. I tried forcing `-fsigned-char` to see if I can make a general testcase but this seems to have not done it. Is there another flag I can use? Regards, Tamar > > Thanks, > Richard. > > > Thanks, > > Tamar > > > > gcc/ChangeLog: > > > > * tree-ssa-phiopt.c (spaceship_replacement): Handle new canonical > > codegen. > > > > diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c index > > > 0e339c46afa29fa97f90d9bc4394370cd9b4b396..65b25be3399b75d5e9cab0f78 > aa2 > > 340418571a33 100644 > > --- a/gcc/tree-ssa-phiopt.c > > +++ b/gcc/tree-ssa-phiopt.c > > @@ -2037,6 +2037,7 @@ spaceship_replacement (basic_block cond_bb, > basic_block middle_bb, > >tree lhs, rhs; > >gimple *orig_use_stmt = use_stmt; > >tree orig_use_lhs = NULL_TREE; > > + bool is_canon = false; > >int prec = TYPE_PRECISION (TREE_TYPE (phires)); > >if (is_gimple_assign (use_stmt) > >&& gimple_assign_rhs_code (use_stmt) == BIT_AND_EX
RE: [PATCH] middle-end: fix de-optimizations with bitclear patterns on signed values
On Tue, 26 Oct 2021, Tamar Christina wrote: > > -Original Message- > > From: Richard Biener > > Sent: Tuesday, October 26, 2021 9:26 AM > > To: Tamar Christina > > Cc: gcc-patches@gcc.gnu.org; Jakub Jelinek ; nd > > > > Subject: RE: [PATCH] middle-end: fix de-optimizations with bitclear patterns > > on signed values > > > > On Mon, 25 Oct 2021, Tamar Christina wrote: > > > > > > -Original Message- > > > > From: Richard Biener > > > > Sent: Friday, October 15, 2021 12:31 PM > > > > To: Tamar Christina > > > > Cc: gcc-patches@gcc.gnu.org; Jakub Jelinek ; nd > > > > > > > > Subject: Re: [PATCH] middle-end: fix de-optimizations with bitclear > > > > patterns on signed values > > > > > > > > On Fri, 15 Oct 2021, Tamar Christina wrote: > > > > > > > > > Hi All, > > > > > > > > > > During testing after rebasing to commit I noticed a failing > > > > > testcase with the bitmask compare patch. > > > > > > > > > > Consider the following C++ testcase: > > > > > > > > > > #include > > > > > > > > > > #define A __attribute__((noipa)) > > > > > A bool f5 (double i, double j) { auto c = i <=> j; return c >= 0; > > > > > } > > > > > > > > > > This turns into a comparison against chars, on systems where chars > > > > > are signed the pattern inserts an unsigned convert such that it's > > > > > able to do the transformation. > > > > > > > > > > i.e.: > > > > > > > > > > # RANGE [-1, 2] > > > > > # c$_M_value_22 = PHI <-1(3), 0(2), 2(5), 1(4)> > > > > > # RANGE ~[3, 254] > > > > > _11 = (unsigned char) c$_M_value_22; > > > > > _19 = _11 <= 1; > > > > > # .MEM_24 = VDEF <.MEM_6(D)> > > > > > D.10434 ={v} {CLOBBER}; > > > > > # .MEM_14 = VDEF <.MEM_24> > > > > > D.10407 ={v} {CLOBBER}; > > > > > # VUSE <.MEM_14> > > > > > return _19; > > > > > > > > > > instead of: > > > > > > > > > > # RANGE [-1, 2] > > > > > # c$_M_value_5 = PHI <-1(3), 0(2), 2(5), 1(4)> > > > > > # RANGE [-2, 2] > > > > > _3 = c$_M_value_5 & -2; > > > > > _19 = _3 == 0; > > > > > # .MEM_24 = VDEF <.MEM_6(D)> > > > > > D.10440 ={v} {CLOBBER}; > > > > > # .MEM_14 = VDEF <.MEM_24> > > > > > D.10413 ={v} {CLOBBER}; > > > > > # VUSE <.MEM_14> > > > > > return _19; > > > > > > > > > > This causes much worse codegen under -ffast-math due to phiops no > > > > > longer recognizing the pattern. It turns out that phiopts > > > > > spaceship_replacement is looking for the exact form that was just > > changed. > > > > > > > > > > Trying to get it to recognize the new form is not trivial as the > > > > > transformation doesn't look to work when the thing it's pointing > > > > > to is itself > > > > a phi-node. > > > > > > > > What do you mean? Where it handles the BIT_AND it could also handle > > > > the conversion, no? The later handling would probably more > > > > explicitely need to distinguish between the BIT_AND and the conversion > > forms. > > > > > > Looks like I misunderstood the code, it was looking at the uses not > > > the defs of the value. > > > > > > --- inline copy of patch --- > > > > > > The comments seems to suggest this code only checks for (res & ~1) == > > > 0 but the implementation seems to suggest it's broader. > > > > > > As such I added a case to check to see if the value comparison we > > > found is a type cast. and strips away the type cast and continues. > > > > > > In match.pd the typecasts are only added for signed comparisons to == > > > 0 and != 0 which are then rewritten into comparisons with 1. > > > > > > As such I only check for 1 and LE and GT, which is what match.pd would > > > have rewritten it to. > > > > > > This fixes the regression but this is not code I 100% understand, > > > since I don't really know the semantics of the spaceship operator so > > > would appreciate an extra look. > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu > > > and no regressions. > > > > > > Ok for master? > > > > Please add a testcase. I hope Jakub can review the spaceship_replacement > > patch since he's the one familiar with the code. > > There's already a bunch of testcases that test the various variants: > gcc/testsuite/g++.dg/opt/pr94589-1.C > and gcc/testsuite/g++.dg/opt/pr94589-2.C which is how I noticed the failure. > However they only trigger > the failure on signed chars. I tried forcing `-fsigned-char` to see if I can > make a general testcase but this > seems to have not done it. > > Is there another flag I can use? I suppose you can copy the testcase(s) and replace 'auto' with 'signed char'? > > Regards, > Tamar > > > > Thanks, > > Richard. > > > > > Thanks, > > > Tamar > > > > > > gcc/ChangeLog: > > > > > > * tree-ssa-phiopt.c (spaceship_replacement): Handle new canonical > > > codegen. > > > > > > diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c index > > > > > 0e339c46afa29fa97f90d9bc4394370cd9b4b396..65b25be3399b75d5e9cab0f78 > > aa2 > > > 340418571a33 100644 > > > --- a/gcc/tree-ssa-phiopt.c > > > +++
[PATCH] Turn vect_create_addr_base_for_vector_ref offset into a byte offset
This changes the offset in elements for vect_create_addr_base_for_vector_ref and vect_create_data_ref_ptr to an offset in bytes, easing a following refactoring. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. 2021-10-26 Richard Biener * tree-vect-data-refs.c (vect_create_addr_base_for_vector_ref): Take offset in bytes. (vect_create_data_ref_ptr): Likewise. * tree-vect-loop-manip.c (get_misalign_in_elems): Multiply offset by element size. (vect_create_cond_for_align_checks): Likewise. * tree-vect-stmts.c (get_negative_load_store_type): Likewise. (vectorizable_load): Remove duplicate leftover from merge conflict. --- gcc/tree-vect-data-refs.c | 15 ++- gcc/tree-vect-loop-manip.c | 8 ++-- gcc/tree-vect-stmts.c | 11 +++ 3 files changed, 19 insertions(+), 15 deletions(-) diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c index cbcd4b80246..46360c50bd4 100644 --- a/gcc/tree-vect-data-refs.c +++ b/gcc/tree-vect-data-refs.c @@ -4753,8 +4753,7 @@ vect_duplicate_ssa_name_ptr_info (tree name, dr_vec_info *dr_info) if LOOP=i_loop: &in (relative to i_loop) if LOOP=j_loop: &in+i*2B(relative to j_loop) BYTE_OFFSET: Optional, defaulted to NULL. If supplied, it is added to the - initial address. Unlike OFFSET, which is number of elements to - be added, BYTE_OFFSET is measured in bytes. + initial address. Both OFFSET and BYTE_OFFSET are measured in bytes. Output: 1. Return an SSA_NAME whose value is the address of the memory location of @@ -4777,7 +4776,6 @@ vect_create_addr_base_for_vector_ref (vec_info *vinfo, stmt_vec_info stmt_info, tree dest; gimple_seq seq = NULL; tree vect_ptr_type; - tree step = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr))); loop_vec_info loop_vinfo = dyn_cast (vinfo); innermost_loop_behavior *drb = vect_dr_behavior (vinfo, dr_info); @@ -4801,8 +4799,7 @@ vect_create_addr_base_for_vector_ref (vec_info *vinfo, stmt_vec_info stmt_info, if (offset) { - offset = fold_build2 (MULT_EXPR, sizetype, - fold_convert (sizetype, offset), step); + offset = fold_convert (sizetype, offset); base_offset = fold_build2 (PLUS_EXPR, sizetype, base_offset, offset); } @@ -4860,8 +4857,8 @@ vect_create_addr_base_for_vector_ref (vec_info *vinfo, stmt_vec_info stmt_info, 2. AGGR_TYPE: the type of the reference, which should be either a vector or an array. 3. AT_LOOP: the loop where the vector memref is to be created. - 4. OFFSET (optional): an offset to be added to the initial address accessed - by the data-ref in STMT_INFO. + 4. OFFSET (optional): a byte offset to be added to the initial address + accessed by the data-ref in STMT_INFO. 5. BSI: location where the new stmts are to be placed if there is no loop 6. ONLY_INIT: indicate if ap is to be updated in the loop, or remain pointing to the initial address. @@ -4885,7 +4882,7 @@ vect_create_addr_base_for_vector_ref (vec_info *vinfo, stmt_vec_info stmt_info, if OFFSET is not supplied: initial_address = &a[init]; if OFFSET is supplied: - initial_address = &a[init + OFFSET]; +initial_address = &a[init] + OFFSET; if BYTE_OFFSET is supplied: initial_address = &a[init] + BYTE_OFFSET; @@ -5031,7 +5028,7 @@ vect_create_data_ref_ptr (vec_info *vinfo, stmt_vec_info stmt_info, /* (2) Calculate the initial address of the aggregate-pointer, and set the aggregate-pointer to point to it before the loop. */ - /* Create: (&(base[init_val+offset]+byte_offset) in the loop preheader. */ + /* Create: (&(base[init_val]+offset+byte_offset) in the loop preheader. */ new_temp = vect_create_addr_base_for_vector_ref (vinfo, stmt_info, &new_stmt_list, diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c index 378b1026baa..72d583189c4 100644 --- a/gcc/tree-vect-loop-manip.c +++ b/gcc/tree-vect-loop-manip.c @@ -1625,7 +1625,9 @@ get_misalign_in_elems (gimple **seq, loop_vec_info loop_vinfo) bool negative = tree_int_cst_compare (DR_STEP (dr_info->dr), size_zero_node) < 0; tree offset = (negative -? size_int (-TYPE_VECTOR_SUBPARTS (vectype) + 1) +? size_int ((-TYPE_VECTOR_SUBPARTS (vectype) + 1) +* TREE_INT_CST_LOW +(TYPE_SIZE_UNIT (TREE_TYPE (vectype : size_zero_node); tree start_addr = vect_create_addr_base_for_vector_ref (loop_vinfo, stmt_info, seq, @@ -3227,7 +3229,9 @@ vect_create_cond_for_align_checks (loop_vec_info loop_vinfo, bool ne
[committed] Fortran: Fix character(len=cst) dummies with bind(C) [PR102885] (erratum: should be: 'len=noncst')
I forgot to handle len= correctly. This patch does the same as for non-BIND(C). In the latter case, the call is: gfc_generate_function_code → gfc_trans_deferred_vars → gfc_trans_deferred_array and that function has if (sym->ts.type == BT_CHARACTER && !INTEGER_CST_P (sym->ts.u.cl->backend_decl)) { gfc_conv_string_length (sym->ts.u.cl, NULL, &init); gfc_trans_vla_type_sizes (sym, &init); } Expect a déjà vu if you read the attached patch ... Thanks to Dominique for testing with -flto and finding this issue and to Richard for helping me to understand what goes wrong. Committed as obvious as r12-4703. Tobias - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 commit a31a3d0421f0cf1f7eefacfec8cbf37e7f91600d Author: Tobias Burnus Date: Tue Oct 26 10:53:53 2021 +0200 Fortran: Fix character(len=cst) dummies with bind(C) [PR102885] PR fortran/102885 gcc/fortran/ChangeLog: * trans-decl.c (gfc_conv_cfi_to_gfc): Properly handle nonconstant character lenghts. gcc/testsuite/ChangeLog: * gfortran.dg/lto/bind-c-char_0.f90: New test. --- gcc/fortran/trans-decl.c| 7 gcc/testsuite/gfortran.dg/lto/bind-c-char_0.f90 | 49 + 2 files changed, 56 insertions(+) diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c index fe5511b5285..49ba9060eae 100644 --- a/gcc/fortran/trans-decl.c +++ b/gcc/fortran/trans-decl.c @@ -6837,6 +6837,13 @@ gfc_conv_cfi_to_gfc (stmtblock_t *init, stmtblock_t *finally, gfc_add_modify (&block, sym->ts.u.cl->backend_decl, tmp); } + if (sym->ts.type == BT_CHARACTER + && !INTEGER_CST_P (sym->ts.u.cl->backend_decl)) +{ + gfc_conv_string_length (sym->ts.u.cl, NULL, init); + gfc_trans_vla_type_sizes (sym, init); +} + /* gfc->data = cfi->base_addr - or for scalars: gfc = cfi->base_addr. assumed-size/explicit-size arrays end up here for character(len=*) only. */ diff --git a/gcc/testsuite/gfortran.dg/lto/bind-c-char_0.f90 b/gcc/testsuite/gfortran.dg/lto/bind-c-char_0.f90 new file mode 100644 index 000..48b495b1d29 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/lto/bind-c-char_0.f90 @@ -0,0 +1,49 @@ +! { dg-lto-do link } +! { dg-lto-options {{ -O0 -flto }} } +! +! PR fortran/102885 + +module m + use iso_c_binding, only: c_char + implicit none (type, external) + +contains + +! Assumed-shape array, nonallocatable/nonpointer + +subroutine ar3 (xn, n) bind(C) + integer :: n + character(len=n) :: xn(..) + if (size(xn) /= 6) stop + if (len(xn) /= 5) stop + select rank(xn) +rank(1) + xn = ['FDGhf', & +'hdrhg', & +'fDgFl', & +'DFHs3', & +'4a54G', & +'hSs6k'] + rank default +stop + end select +end + +end + +program main + use m + implicit none (type, external) + character(kind=c_char, len=5) :: str5a6(6) + + ! assumed rank - with array descriptor + + str5a6 = ['DDGhf', & +'hdrh$', & +'fDGSl', & +'DFHs3', & +'43grG', & +'hFG$k'] + call ar3 (str5a6, 5) + +end
Re: [PATCH] libstdc++: Fix 17_intro/names.cc on Solaris
On Mon, 25 Oct 2021 at 12:55, Rainer Orth wrote: > > 17_intro/names.cc and experimental/names.cc currently FAIL on Solaris > > FAIL: 17_intro/names.cc (test for excess errors) > FAIL: experimental/names.cc (test for excess errors) > > Excess errors: > /usr/include/sys/timespec_util.h:22: error: expected ')' before ';' token > /usr/include/stdlib.h:157: error: expected unqualified-id before '[' token > /usr/include/stdlib.h:157: error: expected ')' before '[' token > > has > > extern int timespeccompare(const struct timespec *l, const struct timespec > *r); > > while has > > typedef struct drand48_data { > unsigned int _initialised; > unsigned short int x[3]; > unsigned short int a[3]; > unsigned int c; > unsigned short lastx[3]; > } drand48_data; > > both of which are broken by defining r resp. x to ( in the testcase. > > Fixed by undoing the defines. Tested on i386-pc-solaris2.11, > sparc-sun-solaris2.11, and x86_64-pc-linux-gnu. > > Ok for master? OK, thanks
Re: [PATCH] libstdc++: Fix 28_regex/basic_regex/84110.cc on Solaris
On Mon, 25 Oct 2021 at 13:09, Rainer Orth wrote: > > 28_regex/basic_regex/84110.cc currently FAILs on Solaris: > > FAIL: 28_regex/basic_regex/84110.cc (test for excess errors) > UNRESOLVED: 28_regex/basic_regex/84110.cc compilation failed to produce > executable > > Excess errors: > /vol/gcc/src/hg/master/local/libstdc++-v3/testsuite/28_regex/basic_regex/84110.cc:14: > error: reference to 'extended' is ambiguous > > The issue is seen in the full output: > > /vol/gcc/src/hg/master/local/libstdc++-v3/testsuite/28_regex/basic_regex/84110.cc: > In function ‘void test01()’: > /vol/gcc/src/hg/master/local/libstdc++-v3/testsuite/28_regex/basic_regex/84110.cc:14: > error: reference to ‘extended’ is ambiguous > In file included from > /var/gcc/regression/master/11.4-gcc-gas/build/gcc/include-fixed/math.h:391, > from > /var/gcc/regression/master/11.4-gcc-gas/build/i386-pc-solaris2.11/libstdc++-v3/include/cmath:45, > from > /vol/gcc/src/hg/master/local/libstdc++-v3/include/precompiled/stdc++.h:41: > /usr/include/floatingpoint.h:73: note: candidates are: ‘typedef unsigned int > extended [3]’ > > Fixed by qualifying extended. Tested on i386-pc-solaris2.11, > sparc-sun-solaris2.11, and x86_64-pc-linux-gnu. > > Ok for master? > > I'm not certain if this is the best fix, though. Yeah, it's a bit odd to have only one of them qualified. I'll probably forget why it's like that and remove it again to make it consistent ;-) Please do: using namespace std::regex_constants; // See https://gcc.gnu.org/pipermail/gcc-patches/2021-October/582486.html using std::regex_constants::extended; (There's no difference between std::regex::extended and std::regex_constants::extended, but we might as well consistently qualify it one way or the other in a given scope). OK for master like that - thanks!
[PATCH] Move negative stride bias out of dr_misalignment
This moves applying of a bias for negative stride accesses out of dr_misalignment in favor of a more general optional offset argument. The negative bias is now computed by get_load_store_type and applied accordingly to determine the alignment support scheme. Likewise the peeling/versioning code is adjusted albeit that still assumes we'll end up with VMAT_CONTIGUOUS_DOWN or VMAT_CONTIGUOUS_REVERSE but at least when not so (VMAT_STRIDED_SLP is one possibility) then get_load_store_type will _not_ falsely report an aligned access but instead an access with known misalignment. This fixes PR96109. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2021-10-25 Richard Biener PR tree-optimization/96109 * tree-vectorizer.h (dr_misalignment): Add optional offset parameter. * tree-vect-data-refs.c (dr_misalignment): Likewise. Remove offset applied for negative stride accesses. (vect_enhance_data_refs_alignment): Compute negative stride access offset and pass it to dr_misalignment. * tree-vect-stmts.c (get_negative_load_store_type): Pass negative offset to dr_misalignment. (get_group_load_store_type): Likewise. (get_load_store_type): Likewise. (vectorizable_store): Remove asserts about alignment. (vectorizable_load): Likewise. --- gcc/tree-vect-data-refs.c | 58 ++- gcc/tree-vect-stmts.c | 19 + gcc/tree-vectorizer.h | 3 +- 3 files changed, 48 insertions(+), 32 deletions(-) diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c index 46360c50bd4..cda89ef9725 100644 --- a/gcc/tree-vect-data-refs.c +++ b/gcc/tree-vect-data-refs.c @@ -887,10 +887,11 @@ vect_slp_analyze_instance_dependence (vec_info *vinfo, slp_instance instance) return res; } -/* Return the misalignment of DR_INFO accessed in VECTYPE. */ +/* Return the misalignment of DR_INFO accessed in VECTYPE with OFFSET + applied. */ int -dr_misalignment (dr_vec_info *dr_info, tree vectype) +dr_misalignment (dr_vec_info *dr_info, tree vectype, poly_int64 offset) { HOST_WIDE_INT diff = 0; /* Alignment is only analyzed for the first element of a DR group, @@ -919,13 +920,9 @@ dr_misalignment (dr_vec_info *dr_info, tree vectype) targetm.vectorize.preferred_vector_alignment (vectype))) return DR_MISALIGNMENT_UNKNOWN; - /* If this is a backward running DR then first access in the larger - vectype actually is N-1 elements before the address in the DR. - Adjust misalign accordingly. */ - poly_int64 misalignment = misalign + diff; - if (tree_int_cst_sgn (DR_STEP (dr_info->dr)) < 0) -misalignment += ((TYPE_VECTOR_SUBPARTS (vectype) - 1) -* -TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype; + /* Apply the offset from the DR group start and the externally supplied + offset which can for example result from a negative stride access. */ + poly_int64 misalignment = misalign + diff + offset; /* vect_compute_data_ref_alignment will have ensured that target_alignment is constant and otherwise set misalign to DR_MISALIGNMENT_UNKNOWN. */ @@ -1549,8 +1546,15 @@ vect_get_peeling_costs_all_drs (loop_vec_info loop_vinfo, int misalignment; unsigned HOST_WIDE_INT alignment; + bool negative = tree_int_cst_compare (DR_STEP (dr_info->dr), + size_zero_node) < 0; + poly_int64 off = 0; + if (negative) + off = ((TYPE_VECTOR_SUBPARTS (vectype) - 1) + * -TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype; + if (npeel == 0) - misalignment = dr_misalignment (dr_info, vectype); + misalignment = dr_misalignment (dr_info, vectype, off); else if (dr_info == dr0_info || vect_dr_aligned_if_peeled_dr_is (dr_info, dr0_info)) misalignment = 0; @@ -1560,7 +1564,7 @@ vect_get_peeling_costs_all_drs (loop_vec_info loop_vinfo, misalignment = DR_MISALIGNMENT_UNKNOWN; else { - misalignment = dr_misalignment (dr_info, vectype); + misalignment = dr_misalignment (dr_info, vectype, off); misalignment += npeel * TREE_INT_CST_LOW (DR_STEP (dr_info->dr)); misalignment &= alignment - 1; } @@ -1960,8 +1964,11 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo) */ unsigned int target_align = DR_TARGET_ALIGNMENT (dr_info).to_constant (); - unsigned int dr_size = vect_get_scalar_dr_size (dr_info); - unsigned int mis = dr_misalignment (dr_info, vectype); + unsigned HOST_WIDE_INT dr_size = vect_get_scalar_dr_size (dr_info); + poly_int64 off = 0; + if (negative) + off = (TYPE_VECTOR_SUBPARTS (vectype) - 1) * -dr_size; + unsigned int mis = dr_misalignment (dr_info, vectype,
Re: [PATCH,Fortran 3/7] Fortran: make some constructor* functions static
On 25.10.21 00:30, Bernhard Reutner-Fischer via Fortran wrote: From: Bernhard Reutner-Fischer gfc_constructor_expr_foreach and gfc_constructor_swap were just stubs. gcc/fortran/ChangeLog: * constructor.c (gfc_constructor_get_base): Make static. (gfc_constructor_expr_foreach, (gfc_constructor_swap): Delete. * constructor.h (gfc_constructor_get_base): Remove declaration. (gfc_constructor_expr_foreach, (gfc_constructor_swap): Delete. OK. Thanks for the cleanup. Tobias --- gcc/fortran/constructor.c | 20 ++-- gcc/fortran/constructor.h | 10 -- 2 files changed, 2 insertions(+), 28 deletions(-) diff --git a/gcc/fortran/constructor.c b/gcc/fortran/constructor.c index 3e4377a5ad3..4b5a748271b 100644 --- a/gcc/fortran/constructor.c +++ b/gcc/fortran/constructor.c @@ -85,7 +85,8 @@ gfc_constructor_get (void) return c; } -gfc_constructor_base gfc_constructor_get_base (void) +static gfc_constructor_base +gfc_constructor_get_base (void) { return splay_tree_new (splay_tree_compare_ints, NULL, node_free); } @@ -209,23 +210,6 @@ gfc_constructor_lookup_expr (gfc_constructor_base base, int offset) } -int -gfc_constructor_expr_foreach (gfc_constructor *ctor ATTRIBUTE_UNUSED, - int(*f)(gfc_expr *) ATTRIBUTE_UNUSED) -{ - gcc_assert (0); - return 0; -} - -void -gfc_constructor_swap (gfc_constructor *ctor ATTRIBUTE_UNUSED, - int n ATTRIBUTE_UNUSED, int m ATTRIBUTE_UNUSED) -{ - gcc_assert (0); -} - - - gfc_constructor * gfc_constructor_first (gfc_constructor_base base) { diff --git a/gcc/fortran/constructor.h b/gcc/fortran/constructor.h index 85a72dcfc0e..25cd6a8f192 100644 --- a/gcc/fortran/constructor.h +++ b/gcc/fortran/constructor.h @@ -23,8 +23,6 @@ along with GCC; see the file COPYING3. If not see /* Get a new constructor structure. */ gfc_constructor *gfc_constructor_get (void); -gfc_constructor_base gfc_constructor_get_base (void); - /* Copy a constructor structure. */ gfc_constructor_base gfc_constructor_copy (gfc_constructor_base base); @@ -64,14 +62,6 @@ gfc_constructor *gfc_constructor_lookup (gfc_constructor_base base, int n); */ gfc_expr *gfc_constructor_lookup_expr (gfc_constructor_base base, int n); - -int gfc_constructor_expr_foreach (gfc_constructor *ctor, int(*)(gfc_expr *)); - - -void gfc_constructor_swap (gfc_constructor *ctor, int n, int m); - - - /* Get the first constructor node in the constructure structure. Returns NULL if there is no such expression. */ gfc_constructor *gfc_constructor_first (gfc_constructor_base base); - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
Re: [PATCH,Fortran 1/7] Fortran: make some trans* functions static
On 25.10.21 00:30, Bernhard Reutner-Fischer via Fortran wrote: From: Bernhard Reutner-Fischer This makes some trans* functions static and deletes declarations of functions that either do not exist anymore like gfc_get_function_decl or that are unused like gfc_check_any_c_kind. gcc/fortran/ChangeLog: * expr.c (is_non_empty_structure_constructor): Make static. * gfortran.h (gfc_check_any_c_kind): Delete. * match.c (gfc_match_label): Make static. * match.h (gfc_match_label): Delete declaration. * scanner.c (file_changes_cur, file_changes_count, file_changes_allocated): Make static. * trans-expr.c (gfc_get_character_len): Make static. (gfc_class_len_or_zero_get): Make static. (VTAB_GET_FIELD_GEN): Undefine. (gfc_get_class_array_ref): Make static. (gfc_finish_interface_mapping): Make static. * trans-types.c (gfc_check_any_c_kind): Delete. (pfunc_type_node, dtype_type_node, gfc_get_ppc_type): Make static. * trans-types.h (gfc_get_ppc_type): Delete declaration. * trans.c (gfc_msg_wrong_return): Delete. * trans.h (gfc_class_len_or_zero_get, gfc_class_vtab_extends_get, gfc_vptr_extends_get, gfc_get_class_array_ref, gfc_get_character_len, gfc_finish_interface_mapping, gfc_msg_wrong_return, gfc_get_function_decl): Delete declaration. OK. Thanks for the cleanup! Tobias --- gcc/fortran/expr.c| 2 +- gcc/fortran/gfortran.h| 1 - gcc/fortran/match.c | 2 +- gcc/fortran/match.h | 1 - gcc/fortran/scanner.c | 4 ++-- gcc/fortran/trans-expr.c | 10 +- gcc/fortran/trans-types.c | 25 +++-- gcc/fortran/trans-types.h | 1 - gcc/fortran/trans.c | 1 - gcc/fortran/trans.h | 11 --- 10 files changed, 12 insertions(+), 46 deletions(-) diff --git a/gcc/fortran/expr.c b/gcc/fortran/expr.c index b19d3a26c60..4dea840e348 100644 --- a/gcc/fortran/expr.c +++ b/gcc/fortran/expr.c @@ -4817,7 +4817,7 @@ gfc_apply_init (gfc_typespec *ts, symbol_attribute *attr, gfc_expr *init) /* Check whether an expression is a structure constructor and whether it has other values than NULL. */ -bool +static bool is_non_empty_structure_constructor (gfc_expr * e) { if (e->expr_type != EXPR_STRUCTURE) diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h index 66192c07d8c..f7662c59a5d 100644 --- a/gcc/fortran/gfortran.h +++ b/gcc/fortran/gfortran.h @@ -3284,7 +3284,6 @@ bool gfc_check_character_range (gfc_char_t, int); extern bool gfc_seen_div0; /* trans-types.c */ -bool gfc_check_any_c_kind (gfc_typespec *); int gfc_validate_kind (bt, int, bool); int gfc_get_int_kind_from_width_isofortranenv (int size); int gfc_get_real_kind_from_width_isofortranenv (int size); diff --git a/gcc/fortran/match.c b/gcc/fortran/match.c index 53a575e616e..91cde55d7a1 100644 --- a/gcc/fortran/match.c +++ b/gcc/fortran/match.c @@ -599,7 +599,7 @@ cleanup: it. We also make sure the symbol does not refer to another (active) block. A matched label is pointed to by gfc_new_block. */ -match +static match gfc_match_label (void) { char name[GFC_MAX_SYMBOL_LEN + 1]; diff --git a/gcc/fortran/match.h b/gcc/fortran/match.h index 21e94f79d95..eb9459ea99c 100644 --- a/gcc/fortran/match.h +++ b/gcc/fortran/match.h @@ -47,7 +47,6 @@ match gfc_match_space (void); match gfc_match_eos (void); match gfc_match_small_literal_int (int *, int *); match gfc_match_st_label (gfc_st_label **); -match gfc_match_label (void); match gfc_match_small_int (int *); match gfc_match_small_int_expr (int *, gfc_expr **); match gfc_match_name (char *); diff --git a/gcc/fortran/scanner.c b/gcc/fortran/scanner.c index 5a450692ba3..69b81ab97f8 100644 --- a/gcc/fortran/scanner.c +++ b/gcc/fortran/scanner.c @@ -78,8 +78,8 @@ static struct gfc_file_change gfc_linebuf *lb; int line; } *file_changes; -size_t file_changes_cur, file_changes_count; -size_t file_changes_allocated; +static size_t file_changes_cur, file_changes_count; +static size_t file_changes_allocated; static gfc_char_t *last_error_char; diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c index 2d7f9e0fb91..e7aec3845d3 100644 --- a/gcc/fortran/trans-expr.c +++ b/gcc/fortran/trans-expr.c @@ -45,7 +45,7 @@ along with GCC; see the file COPYING3. If not see /* Calculate the number of characters in a string. */ -tree +static tree gfc_get_character_len (tree type) { tree len; @@ -278,7 +278,7 @@ gfc_class_len_get (tree decl) /* Try to get the _len component of a class. When the class is not unlimited poly, i.e. no _len field exists, then return a zero node. */ -tree +static tree gfc_class_len_or_zero_get (tree decl) { tree len; @@ -382,7 +382,7 @@ VTAB_GET_FIELD_GEN (def_init, VTABLE_DEF_INIT_FIELD) VTAB_GET_FIELD_GEN (copy, VTABLE_COPY_FIELD) VTAB_GET_FIELD_GEN (final, VTABLE_FINAL_FIELD) VTAB_GET_FIELD_GEN (dealloc
Re: [PATCH,Fortran 2/7] Fortran: make some match* functions static
On 25.10.21 00:30, Bernhard Reutner-Fischer via Gcc-patches wrote: From: Bernhard Reutner-Fischer gfc_match_small_int_expr was unused, delete it. gfc_match_gcc_unroll should use gfc_match_small_literal_int and then (but wasn't in this patch) gfc_match_small_int can be deleted since it will be unused. gcc/fortran/ChangeLog: * decl.c (gfc_match_old_kind_spec, set_com_block_bind_c, set_verify_bind_c_sym, set_verify_bind_c_com_block, get_bind_c_idents, gfc_match_suffix, gfc_get_type_attr_spec, (check_extended_derived_type): Make static. (gfc_match_gcc_unroll): Add comment. * match.c (gfc_match_small_int_expr): Delete definition. * match.h (gfc_match_small_int_expr): Delete declaration. (gfc_match_name_C, gfc_match_old_kind_spec, set_com_block_bind_c, set_verify_bind_c_sym, set_verify_bind_c_com_block, get_bind_c_idents, gfc_match_suffix, gfc_get_type_attr_spec): Delete declaration. OK. Thanks for the cleanup. Tobias --- gcc/fortran/decl.c | 15 --- gcc/fortran/match.c | 26 -- gcc/fortran/match.h | 9 - 3 files changed, 8 insertions(+), 42 deletions(-) diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c index 6043e100fbb..1e034d1b344 100644 --- a/gcc/fortran/decl.c +++ b/gcc/fortran/decl.c @@ -3128,7 +3128,7 @@ cleanup: This assumes that the byte size is equal to the kind number for non-COMPLEX types, and equal to twice the kind number for COMPLEX. */ -match +static match gfc_match_old_kind_spec (gfc_typespec *ts) { match m; @@ -5867,7 +5867,7 @@ set_binding_label (const char **dest_label, const char *sym_name, /* Set the status of the given common block as being BIND(C) or not, depending on the given parameter, is_bind_c. */ -void +static void set_com_block_bind_c (gfc_common_head *com_block, int is_bind_c) { com_block->is_bind_c = is_bind_c; @@ -6055,7 +6055,7 @@ verify_bind_c_sym (gfc_symbol *tmp_sym, gfc_typespec *ts, the type is C interoperable. Errors are reported by the functions used to set/test these fields. */ -bool +static bool set_verify_bind_c_sym (gfc_symbol *tmp_sym, int num_idents) { bool retval = true; @@ -6075,7 +6075,7 @@ set_verify_bind_c_sym (gfc_symbol *tmp_sym, int num_idents) /* Set the fields marking the given common block as BIND(C), including a binding label, and report any errors encountered. */ -bool +static bool set_verify_bind_c_com_block (gfc_common_head *com_block, int num_idents) { bool retval = true; @@ -6095,7 +6095,7 @@ set_verify_bind_c_com_block (gfc_common_head *com_block, int num_idents) /* Retrieve the list of one or more identifiers that the given bind(c) attribute applies to. */ -bool +static bool get_bind_c_idents (void) { char name[GFC_MAX_SYMBOL_LEN + 1]; @@ -6804,7 +6804,7 @@ match_result (gfc_symbol *function, gfc_symbol **result) clause and BIND(C), either one, or neither. The draft does not require them to come in a specific order. */ -match +static match gfc_match_suffix (gfc_symbol *sym, gfc_symbol **result) { match is_bind_c; /* Found bind(c). */ @@ -10116,7 +10116,7 @@ check_extended_derived_type (char *name) not a handled attribute, and MATCH_YES otherwise. TODO: More error checking on attribute conflicts needs to be done. */ -match +static match gfc_get_type_attr_spec (symbol_attribute *attr, char *name) { /* See if the derived type is marked as private. */ @@ -11794,6 +11794,7 @@ gfc_match_gcc_unroll (void) { int value; + /* FIXME: use gfc_match_small_literal_int instead, delete small_int */ if (gfc_match_small_int (&value) == MATCH_YES) { if (value < 0 || value > USHRT_MAX) diff --git a/gcc/fortran/match.c b/gcc/fortran/match.c index 91cde55d7a1..5d07f897e45 100644 --- a/gcc/fortran/match.c +++ b/gcc/fortran/match.c @@ -530,32 +530,6 @@ gfc_match_small_int (int *value) } -/* This function is the same as the gfc_match_small_int, except that - we're keeping the pointer to the expr. This function could just be - removed and the previously mentioned one modified, though all calls - to it would have to be modified then (and there were a number of - them). Return MATCH_ERROR if fail to extract the int; otherwise, - return the result of gfc_match_expr(). The expr (if any) that was - matched is returned in the parameter expr. */ - -match -gfc_match_small_int_expr (int *value, gfc_expr **expr) -{ - match m; - int i; - - m = gfc_match_expr (expr); - if (m != MATCH_YES) -return m; - - if (gfc_extract_int (*expr, &i, 1)) -m = MATCH_ERROR; - - *value = i; - return m; -} - - /* Matches a statement label. Uses gfc_match_small_literal_int() to do most of the work. */ diff --git a/gcc/fortran/match.h b/gcc/fortran/match.h index eb9459ea99c..e9368db281d 100644 --- a/gcc/fortran/match.h +++ b/gcc/fortran/match.h @@ -48,9 +48,7 @@ m
Re: [PATCH,Fortran 4/7] Fortran: make some trans-array functions static
On 25.10.21 00:30, Bernhard Reutner-Fischer via Fortran wrote: From: Bernhard Reutner-Fischer gcc/fortran/ChangeLog: * trans-array.c (gfc_trans_scalarized_loop_end): Make static. * trans-array.h (gfc_trans_scalarized_loop_end, gfc_conv_tmp_ref, gfc_conv_array_transpose): Delete declaration. (the later are declared but no longer exist) OK. Thanks for the cleanup! Tobias --- gcc/fortran/trans-array.c | 2 +- gcc/fortran/trans-array.h | 6 -- 2 files changed, 1 insertion(+), 7 deletions(-) diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c index bceb8b24ba4..5ceb261b698 100644 --- a/gcc/fortran/trans-array.c +++ b/gcc/fortran/trans-array.c @@ -4161,7 +4161,7 @@ gfc_start_scalarized_body (gfc_loopinfo * loop, stmtblock_t * pbody) /* Generates the actual loop code for a scalarization loop. */ -void +static void gfc_trans_scalarized_loop_end (gfc_loopinfo * loop, int n, stmtblock_t * pbody) { diff --git a/gcc/fortran/trans-array.h b/gcc/fortran/trans-array.h index 1d3dc4819eb..12068c742a5 100644 --- a/gcc/fortran/trans-array.h +++ b/gcc/fortran/trans-array.h @@ -118,8 +118,6 @@ void gfc_copy_loopinfo_to_se (gfc_se *, gfc_loopinfo *); /* Marks the start of a scalarized expression, and declares loop variables. */ void gfc_start_scalarized_body (gfc_loopinfo *, stmtblock_t *); -/* Generates one actual loop for a scalarized expression. */ -void gfc_trans_scalarized_loop_end (gfc_loopinfo *, int, stmtblock_t *); /* Generates the actual loops for a scalarized expression. */ void gfc_trans_scalarizing_loops (gfc_loopinfo *, stmtblock_t *); /* Mark the end of the main loop body and the start of the copying loop. */ @@ -137,8 +135,6 @@ tree gfc_build_null_descriptor (tree); void gfc_conv_array_ref (gfc_se *, gfc_array_ref *, gfc_expr *, locus *); /* Translate a reference to a temporary array. */ void gfc_conv_tmp_array_ref (gfc_se * se); -/* Translate a reference to an array temporary. */ -void gfc_conv_tmp_ref (gfc_se *); /* Calculate the overall offset, including subreferences. */ void gfc_get_dataptr_offset (stmtblock_t*, tree, tree, tree, bool, gfc_expr*); @@ -149,8 +145,6 @@ void gfc_conv_expr_descriptor (gfc_se *, gfc_expr *); /* Convert an array for passing as an actual function parameter. */ void gfc_conv_array_parameter (gfc_se *, gfc_expr *, bool, const gfc_symbol *, const char *, tree *); -/* Evaluate and transpose a matrix expression. */ -void gfc_conv_array_transpose (gfc_se *, gfc_expr *); /* These work with both descriptors and descriptorless arrays. */ tree gfc_conv_array_data (tree); - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
Re: [PATCH,Fortran 5/7] Fortran: Delete unused decl in trans-stmt.h
On 25.10.21 00:30, Bernhard Reutner-Fischer via Gcc-patches wrote: From: Bernhard Reutner-Fischer gcc/fortran/ChangeLog: * trans-stmt.h (gfc_trans_deallocate_array): Delete. OK. Thanks! Tobias --- gcc/fortran/trans-stmt.h | 1 - 1 file changed, 1 deletion(-) diff --git a/gcc/fortran/trans-stmt.h b/gcc/fortran/trans-stmt.h index 1a24d9b4cdc..e824caf4d08 100644 --- a/gcc/fortran/trans-stmt.h +++ b/gcc/fortran/trans-stmt.h @@ -66,7 +66,6 @@ tree gfc_trans_sync_team (gfc_code *); tree gfc_trans_where (gfc_code *); tree gfc_trans_allocate (gfc_code *); tree gfc_trans_deallocate (gfc_code *); -tree gfc_trans_deallocate_array (tree); /* trans-openmp.c */ tree gfc_trans_omp_directive (gfc_code *); - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
Re: [PATCH,Fortran 6/7] Fortran: Delete unused decl in trans-types.h
On 25.10.21 00:30, Bernhard Reutner-Fischer via Fortran wrote: From: Bernhard Reutner-Fischer gcc/fortran/ChangeLog: * trans-types.h (gfc_convert_function_code): Delete. OK. Tobias --- gcc/fortran/trans-types.h | 3 --- 1 file changed, 3 deletions(-) diff --git a/gcc/fortran/trans-types.h b/gcc/fortran/trans-types.h index 1b43503092b..3bc236cad0d 100644 --- a/gcc/fortran/trans-types.h +++ b/gcc/fortran/trans-types.h @@ -65,9 +65,6 @@ enum gfc_packed { PACKED_STATIC }; -/* be-function.c */ -void gfc_convert_function_code (gfc_namespace *); - /* trans-types.c */ void gfc_init_kinds (void); void gfc_init_types (void); - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
[PATCH] target: [PR102941] Fix inline-asm flags with non-REG_P output
From: Andrew Pinski So the problem here is that arm_md_asm_adjust would just create a set directly to the output memory which is wrong. It needs to output to a temp register first and then do a move. OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions. I have no way to test on arm even though this touches common code. PR target/102941 gcc/ChangeLog: * config/arm/aarch-common.c (arm_md_asm_adjust): Use a temp if !REG_P. gcc/testsuite/ChangeLog: * gcc.target/aarch64/asm-flag-7.c: New test. * gcc.target/arm/asm-flag-7.c: New test. --- gcc/config/arm/aarch-common.c | 2 +- gcc/testsuite/gcc.target/aarch64/asm-flag-7.c | 22 ++ gcc/testsuite/gcc.target/arm/asm-flag-7.c | 23 +++ 3 files changed, 46 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/asm-flag-7.c create mode 100644 gcc/testsuite/gcc.target/arm/asm-flag-7.c diff --git a/gcc/config/arm/aarch-common.c b/gcc/config/arm/aarch-common.c index 67343fe4025..60b3516c1df 100644 --- a/gcc/config/arm/aarch-common.c +++ b/gcc/config/arm/aarch-common.c @@ -641,7 +641,7 @@ arm_md_asm_adjust (vec &outputs, vec & /*inputs*/, rtx x = gen_rtx_REG (mode, CC_REGNUM); x = gen_rtx_fmt_ee (code, word_mode, x, const0_rtx); - if (dest_mode == word_mode) + if (dest_mode == word_mode && REG_P (dest)) emit_insn (gen_rtx_SET (dest, x)); else { diff --git a/gcc/testsuite/gcc.target/aarch64/asm-flag-7.c b/gcc/testsuite/gcc.target/aarch64/asm-flag-7.c new file mode 100644 index 000..6c31b854b0b --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/asm-flag-7.c @@ -0,0 +1,22 @@ +/* Test that "=@cc*" works with MEM_P RTX */ +/* PR target/102941 */ +/* { dg-do compile } */ +/* { dg-options "-O" } */ + +#ifndef __GCC_ASM_FLAG_OUTPUTS__ +#error "missing preprocessor define" +#endif +int test_cmpu_x; + +void f(long *); +long +test_cmpu_y() { + long le; + f(&le); + __asm__("cmp %" + "[x], %" + "[y]" + : "=@ccls"(le) + : [x] ""(test_cmpu_x), [y] ""(test_cmpu_y)); +return le; +} diff --git a/gcc/testsuite/gcc.target/arm/asm-flag-7.c b/gcc/testsuite/gcc.target/arm/asm-flag-7.c new file mode 100644 index 000..ac11da0a3a8 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/asm-flag-7.c @@ -0,0 +1,23 @@ +/* Test that "=@cc*" works with MEM_P RTX */ +/* PR target/102941 */ +/* { dg-do compile } */ +/* { dg-options "-O" } */ +/* { dg-skip-if "" { arm_thumb1 } } */ + +#ifndef __GCC_ASM_FLAG_OUTPUTS__ +#error "missing preprocessor define" +#endif +int test_cmpu_x; + +void f(long *); +long +test_cmpu_y() { + long le; + f(&le); + __asm__("cmp %" + "[x], %" + "[y]" + : "=@ccls"(le) + : [x] ""(test_cmpu_x), [y] ""(test_cmpu_y)); +return le; +} -- 2.17.1
Re: [PATCH] libcody: add mostlyclean Makefile target
On 10/25/21 18:10, Eric Gallager wrote: On Mon, Oct 25, 2021 at 7:35 AM Martin Liška wrote: Hello. The patch adds missing Makefile mostlyclean. Ready to be installed? Thanks, Martin Generally the way the various "*clean" targets are arranged, in order of cleanliness, from least clean to most clean, is: mostlyclean clean distclean maintainer-clean ...with each target depending on the previous one in the order. So thus, instead of mostlyclean depending on clean, it'd be the other way around, with clean depending on mostlyclean. See how the gcc/ subdirectory does it, for example. See the "Standard Targets for Users" section of the GNU Coding Standards: https://www.gnu.org/prep/standards/html_node/Standard-Targets.html#Standard-Targets Thank you for the explanation. There's updated version of the patch. Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Ready to be installed? Thanks, Martin PR other/102657 libcody/ChangeLog: * Makefile.in: Add mostlyclean Makefile target. --- libcody/Makefile.in | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/libcody/Makefile.in b/libcody/Makefile.in index b8b45a2e310..d8f1e8216d4 100644 --- a/libcody/Makefile.in +++ b/libcody/Makefile.in @@ -111,7 +111,7 @@ maintainer-clean:: distclean clean:: rm -f $(shell find $(srcdir) -name '*~') -.PHONY: all check clean distclean maintainer-clean +.PHONY: all check clean distclean maintainer-clean mostlyclean CXXFLAGS/ := -I$(srcdir) LIBCODY.O := buffer.o client.o fatal.o netclient.o netserver.o \ @@ -127,6 +127,8 @@ clean:: rm -f $(LIBCODY.O) $(LIBCODY.O:.o=.d) rm -f libcody.a +mostlyclean: clean + CXXFLAGS/fatal.cc = -DSRCDIR='"$(srcdir)"' fatal.o: Makefile revision -- 2.33.1 From fcad6039f910b49dfc4022d3b1eeb993025dabca Mon Sep 17 00:00:00 2001 From: Martin Liska Date: Mon, 25 Oct 2021 16:32:55 +0200 Subject: [PATCH] libcody: add mostlyclean Makefile target PR other/102657 libcody/ChangeLog: * Makefile.in: Add mostlyclean Makefile target. --- libcody/Makefile.in | 20 ++-- 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/libcody/Makefile.in b/libcody/Makefile.in index b8b45a2e310..7eaf8ace8ce 100644 --- a/libcody/Makefile.in +++ b/libcody/Makefile.in @@ -60,6 +60,8 @@ vpath %.cc $(srcdir) all:: Makefile +mostlyclean:: + clean:: Makefile # FIXME: Delete @@ -77,7 +79,9 @@ revision.stamp: $(srcdir)/. revision: revision.stamp @cmp -s $< $@ || cp -f $< $@ -clean:: +mostlyclean:: + +clean:: mostlyclean rm -f revision.stamp revision distclean:: clean @@ -102,16 +106,18 @@ endif config.status: $(srcdir)/configure $(srcdir)/config.h.in if test -x $@; then ./$@ -recheck; else $< @configure_args@; fi +mostlyclean:: + +clean:: mostlyclean + rm -f $(shell find $(srcdir) -name '*~') + distclean:: clean rm -f config.h maintainer-clean:: distclean rm -f $(srcdir)/config.h.in -clean:: - rm -f $(shell find $(srcdir) -name '*~') - -.PHONY: all check clean distclean maintainer-clean +.PHONY: all check clean distclean maintainer-clean mostlyclean CXXFLAGS/ := -I$(srcdir) LIBCODY.O := buffer.o client.o fatal.o netclient.o netserver.o \ @@ -123,7 +129,9 @@ libcody.a: $(LIBCODY.O) $(AR) -cr $@ $^ $(RANLIB) $@ -clean:: +mostlyclean:: + +clean:: mostlyclean rm -f $(LIBCODY.O) $(LIBCODY.O:.o=.d) rm -f libcody.a -- 2.33.1
[PATCH] AVX512FP16: Optimize _Float16 reciprocal for div and sqrt
Hi, For _Float16 type, add insn and expanders to optimize x / y to x * rcp (y), and x / sqrt (y) to x * rsqrt (y). As Half float only have minor precision difference between div and mul * rcp, there is no need for Newton-Rhapson approximation. Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde. Ok for master? gcc/ChangeLog: * config/i386/i386.c (use_rsqrt_p): Add mode parameter, enable HFmode rsqrt without TARGET_SSE_MATH. (ix86_optab_supported_p): Refactor rint, adjust floor, ceil, btrunc condition to be restricted by -ftrapping-math, adjust use_rsqrt_p function call. * config/i386/i386.md (rcphf2): New define_insn. (rsqrthf2): Likewise. * config/i386/sse.md (div3): Change VF2H to VF2. (div3): New expander for HF mode. (rsqrt2): Likewise. (*avx512fp16_vmrcpv8hf2): New define_insn for rpad pass. (*avx512fp16_vmrsqrtv8hf2): Likewise. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-recip-1.c: New test. * gcc.target/i386/avx512fp16-recip-2.c: Ditto. * gcc.target/i386/pr102464.c: Add -fno-trapping-math. --- gcc/config/i386/i386.c| 29 +++--- gcc/config/i386/i386.md | 44 - gcc/config/i386/sse.md| 63 +++- .../gcc.target/i386/avx512fp16-recip-1.c | 43 .../gcc.target/i386/avx512fp16-recip-2.c | 97 +++ gcc/testsuite/gcc.target/i386/pr102464.c | 2 +- 6 files changed, 258 insertions(+), 20 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-recip-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-recip-2.c diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 299e1ab2621..c5789365d3b 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -18905,9 +18905,10 @@ ix86_vectorize_builtin_scatter (const_tree vectype, 1.0/sqrt. */ static bool -use_rsqrt_p () +use_rsqrt_p (machine_mode mode) { - return (TARGET_SSE && TARGET_SSE_MATH + return ((mode == HFmode + || (TARGET_SSE && TARGET_SSE_MATH)) && flag_finite_math_only && !flag_trapping_math && flag_unsafe_math_optimizations); @@ -23603,29 +23604,27 @@ ix86_optab_supported_p (int op, machine_mode mode1, machine_mode, return opt_type == OPTIMIZE_FOR_SPEED; case rint_optab: - if (mode1 == HFmode) - return true; - else if (SSE_FLOAT_MODE_P (mode1) - && TARGET_SSE_MATH - && !flag_trapping_math - && !TARGET_SSE4_1) + if (SSE_FLOAT_MODE_P (mode1) + && TARGET_SSE_MATH + && !flag_trapping_math + && !TARGET_SSE4_1 + && mode1 != HFmode) return opt_type == OPTIMIZE_FOR_SPEED; return true; case floor_optab: case ceil_optab: case btrunc_optab: - if (mode1 == HFmode) - return true; - else if (SSE_FLOAT_MODE_P (mode1) - && TARGET_SSE_MATH - && !flag_trapping_math - && TARGET_SSE4_1) + if (((SSE_FLOAT_MODE_P (mode1) + && TARGET_SSE_MATH + && TARGET_SSE4_1) + || mode1 == HFmode) + && !flag_trapping_math) return true; return opt_type == OPTIMIZE_FOR_SPEED; case rsqrt_optab: - return opt_type == OPTIMIZE_FOR_SPEED && use_rsqrt_p (); + return opt_type == OPTIMIZE_FOR_SPEED && use_rsqrt_p (mode1); default: return true; diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index e733a40fc90..11535df5425 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -8417,11 +8417,27 @@ (match_operand:XF 2 "register_operand")))] "TARGET_80387") +/* There is no more precision loss than Newton-Rhapson approximation + when using HFmode rcp/rsqrt, so do the transformation directly under + TARGET_RECIP_DIV and fast-math. */ (define_expand "divhf3" [(set (match_operand:HF 0 "register_operand") (div:HF (match_operand:HF 1 "register_operand") (match_operand:HF 2 "nonimmediate_operand")))] - "TARGET_AVX512FP16") + "TARGET_AVX512FP16" +{ + if (TARGET_RECIP_DIV + && optimize_insn_for_speed_p () + && flag_finite_math_only && !flag_trapping_math + && flag_unsafe_math_optimizations) +{ + rtx op = gen_reg_rtx (HFmode); + operands[2] = force_reg (HFmode, operands[2]); + emit_insn (gen_rcphf2 (op, operands[2])); + emit_insn (gen_mulhf3 (operands[0], operands[1], op)); + DONE; +} +}) (define_expand "div3" [(set (match_operand:MODEF 0 "register_operand") @@ -16973,6 +16989,19 @@ ] (symbol_ref "true")))]) +(define_insn "rcphf2" + [(set (match_operand:HF 0 "register_operand" "=v,v") + (unspec:HF [(match_operand:HF 1 "nonimmediate_operand" "v,m")] + UNSPEC_RCP))] + "TARGET_AVX512FP16"
Re: [PATCH] PR fortran/102917 - PDT type parameters are not restricted to default integer
Dear Harald, dear all, On 24.10.21 21:00, Harald Anlauf via Fortran wrote: I've created PR 102917 for tracking this issue and packaged the attached patch. Regtested on x86_64-pc-linux-gnu. OK mainline? OK. I wonder whether a valid len/kind example should be added which uses such a PDT with non-default-kind integer. Tobias Gesendet: Freitag, 22. Oktober 2021 um 22:25 Uhr Von: "Steve Kargl" An: "Harald Anlauf" Cc: fort...@gcc.gnu.org Betreff: Re: PDT type parameters are not restricted to default integer On Fri, Oct 22, 2021 at 10:16:05PM +0200, Harald Anlauf wrote: Hi Steve, Am 22.10.21 um 21:35 schrieb Steve Kargl via Fortran: Here's an obvious quick fix. Please apply. diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c index 6043e100fbb..e889bb44142 100644 --- a/gcc/fortran/decl.c +++ b/gcc/fortran/decl.c @@ -5619,14 +5619,6 @@ match_attr_spec (void) m = MATCH_ERROR; goto cleanup; } -if (current_ts.kind != gfc_default_integer_kind) - { -gfc_error ("Component with LEN attribute at %C must be " - "default integer kind (%d)", -gfc_default_integer_kind); -m = MATCH_ERROR; -goto cleanup; - } } else { I think you are right. We should always have allowed any integer kind. However, have you checked whether this change introduces regressions? If you don't, somebody else will. Please open a PR, then. It seems that pdt_4.f03 will fail with the above patch because it explicitly tests for this error message. That's the only failure in the testsuite. For the record, F2003, page 48, R435 type-param-def-stmt is INTEGER [ kind-selector ] , ... Each type parameter is itself of type integer. If its kind selector is omitted, the kind type parameter is default integer. Now that I think about and look, there is a nearby similar gcc_error() for KIND. This should be removed too. -- Steve - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
Re: [PATCH] PR fortran/102816 - [12 Regression] ICE in resolve_structure_cons, at fortran/resolve.c:1467
Dear Harald, hi all, On 22.10.21 21:36, Harald Anlauf via Fortran wrote: the recently introduced shape validation for array components in DT constructors did not properly deal with invalid code created by ingenious testers. Obvious solution: replace the gcc_assert by a suitable error message. Regarding the error message: before the shape validation, gfortran would emit the same error message twice referring to the same line, namely the bad declaration of the component. With the attached patch we get one error message for the bad declaration of the component, and one for the structure constructor referring to that DT component. One could easily change that and make the second message refer to the same as the declaration, giving two errors for the same line. Comments / opinions? Does not really matter in this case as long as there is one error for the invalid "integer :: a([2])". In other cases, it requires some careful weighting whether error should have the error location "use m" or where the symbol is used. (Here, it cannot occur as the module won't get generated and an error is already printed at the proper location.) Regtested on x86_64-pc-linux-gnu. OK? OK. Tobias - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
Re: [PATCH] forwprop: Remove incorrect assertion [PR102897]
Hi Richi, on 2021/10/26 下午3:50, Richard Biener wrote: > On Tue, Oct 26, 2021 at 5:40 AM Kewen.Lin wrote: >> >> Hi, >> >> As PR102897 shows, there is one incorrect assertion in function >> simplify_permutation, which is based on the wrong assumption that >> all cases with op2_type == tgt_type are handled previously, the >> proposed fix is to remove this wrong assertion. >> >> Bootstrapped and regtested on x86_64-redhat-linux, >> aarch64-linux-gnu and powerpc64{,le}-linux-gnu. > > I think you need to enable optimization in the new testcase, gcc.dg/ only > runs -O0 by default which wouldn't trigger forwprop? Please verify the > testcase ICEs before the fix. > Thanks for catching! You are right, the optimization option is required, I just verified it and committed with the additional "-O1" as r12-4705. > Otherwise OK. > Thanks! BR, Kewen > Thanks, > Richard, > >> BR, >> Kewen >> - >> gcc/ChangeLog: >> >> PR tree-optimization/102897 >> * tree-ssa-forwprop.c (simplify_permutation): Remove a wrong >> assertion. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.dg/pr102897.c: New test. >> --- >> gcc/testsuite/gcc.dg/pr102897.c | 16 >> gcc/tree-ssa-forwprop.c | 2 -- >> 2 files changed, 16 insertions(+), 2 deletions(-) >> create mode 100644 gcc/testsuite/gcc.dg/pr102897.c >> >> diff --git a/gcc/testsuite/gcc.dg/pr102897.c >> b/gcc/testsuite/gcc.dg/pr102897.c >> new file mode 100644 >> index 000..d96b0e48ccc >> --- /dev/null >> +++ b/gcc/testsuite/gcc.dg/pr102897.c >> @@ -0,0 +1,16 @@ >> +/* { dg-do compile } */ >> +/* Specify C99 to avoid the warning/error on compound literals. */ >> +/* { dg-options "-std=c99" } */ >> + >> +/* Verify that there is no ICE. */ >> + >> +typedef __attribute__((vector_size(8))) signed char int8x8_t; >> +typedef __attribute__((vector_size(8))) unsigned char uint8x8_t; >> + >> +int8x8_t fn1 (int8x8_t val20, char tmp) >> +{ >> + uint8x8_t __trans_tmp_3; >> + __trans_tmp_3 = (uint8x8_t){tmp}; >> + int8x8_t __a = (int8x8_t) __trans_tmp_3; >> + return __builtin_shuffle (__a, val20, (uint8x8_t){0}); >> +} >> diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c >> index 5b30d4c1a76..a830bab78ba 100644 >> --- a/gcc/tree-ssa-forwprop.c >> +++ b/gcc/tree-ssa-forwprop.c >> @@ -2267,8 +2267,6 @@ simplify_permutation (gimple_stmt_iterator *gsi) >> if (!VECTOR_TYPE_P (tgt_type)) >> return 0; >> tree op2_type = TREE_TYPE (op2); >> - /* Should have folded this before. */ >> - gcc_assert (op2_type != tgt_type); >> >> /* Figure out the shrunk factor. */ >> poly_uint64 tgt_units = TYPE_VECTOR_SUBPARTS (tgt_type); >> -- >> 2.27.0
Re: [PATCH] Turn vect_create_addr_base_for_vector_ref offset into a byte offset
On 26 October 2021 10:58:50 CEST, Richard Biener via Gcc-patches wrote: >@@ -2006,6 +2011,7 @@ get_negative_load_store_type (vec_info *vinfo, > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >"negative step but alignment required.\n"); > return VMAT_ELEMENTWISE; >+ *poffset = 0; Would you want to move this to before the return? thanks, > } > > if (vls_type == VLS_STORE_INVARIANT) >@@ -2014,7 +2020,6 @@ get_negative_load_store_type (vec_info *vinfo, > dump_printf_loc (MSG_NOTE, vect_location, >"negative step with invariant source;" >" no permute needed.\n"); >- *poffset = -TYPE_VECTOR_SUBPARTS (vectype) + 1; > return VMAT_CONTIGUOUS_DOWN; > }
Re: [PATCH] Turn vect_create_addr_base_for_vector_ref offset into a byte offset
On Tue, 26 Oct 2021, Bernhard Reutner-Fischer wrote: > On 26 October 2021 10:58:50 CEST, Richard Biener via Gcc-patches > wrote: > > >@@ -2006,6 +2011,7 @@ get_negative_load_store_type (vec_info *vinfo, > > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > > "negative step but alignment required.\n"); > > return VMAT_ELEMENTWISE; > >+ *poffset = 0; > > Would you want to move this to before the return? Doh! Why don't we diagnose this ... :/ Richard.
[PATCH] Unify offset and byte_offset for vect_create_addr_base_for_vector_ref
Now that both are measured in bytes we can unify the two parameters. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. 2021-10-26 Richard Biener * tree-vectorizer.h (vect_create_addr_base_for_vector_ref): Remove byte_offset parameter. (vect_create_data_ref_ptr): Likewise. * tree-vect-data-refs.c (vect_create_addr_base_for_vector_ref): Likewise. (vect_create_data_ref_ptr): Likewise. * tree-vect-stmts.c (vectorizable_store): Adjust. (vectorizable_load): Likewise. --- gcc/tree-vect-data-refs.c | 21 - gcc/tree-vect-stmts.c | 18 +- gcc/tree-vectorizer.h | 4 ++-- 3 files changed, 15 insertions(+), 28 deletions(-) diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c index cda89ef9725..2ea8e983fe6 100644 --- a/gcc/tree-vect-data-refs.c +++ b/gcc/tree-vect-data-refs.c @@ -4772,8 +4772,6 @@ vect_duplicate_ssa_name_ptr_info (tree name, dr_vec_info *dr_info) is as follows: if LOOP=i_loop: &in (relative to i_loop) if LOOP=j_loop: &in+i*2B(relative to j_loop) - BYTE_OFFSET: Optional, defaulted to NULL. If supplied, it is added to the - initial address. Both OFFSET and BYTE_OFFSET are measured in bytes. Output: 1. Return an SSA_NAME whose value is the address of the memory location of @@ -4786,8 +4784,7 @@ vect_duplicate_ssa_name_ptr_info (tree name, dr_vec_info *dr_info) tree vect_create_addr_base_for_vector_ref (vec_info *vinfo, stmt_vec_info stmt_info, gimple_seq *new_stmt_list, - tree offset, - tree byte_offset) + tree offset) { dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_info); struct data_reference *dr = dr_info->dr; @@ -4823,12 +4820,6 @@ vect_create_addr_base_for_vector_ref (vec_info *vinfo, stmt_vec_info stmt_info, base_offset = fold_build2 (PLUS_EXPR, sizetype, base_offset, offset); } - if (byte_offset) -{ - byte_offset = fold_convert (sizetype, byte_offset); - base_offset = fold_build2 (PLUS_EXPR, sizetype, -base_offset, byte_offset); -} /* base + base_offset */ if (loop_vinfo) @@ -4882,10 +4873,6 @@ vect_create_addr_base_for_vector_ref (vec_info *vinfo, stmt_vec_info stmt_info, 5. BSI: location where the new stmts are to be placed if there is no loop 6. ONLY_INIT: indicate if ap is to be updated in the loop, or remain pointing to the initial address. - 7. BYTE_OFFSET (optional, defaults to NULL): a byte offset to be added - to the initial address accessed by the data-ref in STMT_INFO. This is - similar to OFFSET, but OFFSET is counted in elements, while BYTE_OFFSET - in bytes. 8. IV_STEP (optional, defaults to NULL): the amount that should be added to the IV during each iteration of the loop. NULL says to move by one copy of AGGR_TYPE up or down, depending on the step of the @@ -4920,7 +4907,7 @@ vect_create_data_ref_ptr (vec_info *vinfo, stmt_vec_info stmt_info, tree aggr_type, class loop *at_loop, tree offset, tree *initial_address, gimple_stmt_iterator *gsi, gimple **ptr_incr, bool only_init, - tree byte_offset, tree iv_step) + tree iv_step) { const char *base_name; loop_vec_info loop_vinfo = dyn_cast (vinfo); @@ -5048,11 +5035,11 @@ vect_create_data_ref_ptr (vec_info *vinfo, stmt_vec_info stmt_info, /* (2) Calculate the initial address of the aggregate-pointer, and set the aggregate-pointer to point to it before the loop. */ - /* Create: (&(base[init_val]+offset+byte_offset) in the loop preheader. */ + /* Create: (&(base[init_val]+offset) in the loop preheader. */ new_temp = vect_create_addr_base_for_vector_ref (vinfo, stmt_info, &new_stmt_list, - offset, byte_offset); + offset); if (new_stmt_list) { if (pe) diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index 3cd338a73eb..bf07e7a9495 100644 --- a/gcc/tree-vect-stmts.c +++ b/gcc/tree-vect-stmts.c @@ -8110,7 +8110,7 @@ vectorizable_store (vec_info *vinfo, = vect_create_data_ref_ptr (vinfo, first_stmt_info, aggr_type, simd_lane_access_p ? loop : NULL, offset, &dummy, gsi, &ptr_incr, - simd_lane_access_p, NULL_TREE, bump); + simd_lane_access_p, bump); } else { @@ -8518,7 +8518,6 @@ vect
Re: [PATCH] Move negative stride bias out of dr_misalignment
On 26 October 2021 11:19:44 CEST, Richard Biener via Gcc-patches wrote: >@@ -2010,6 +2010,7 @@ get_negative_load_store_type (vec_info *vinfo, > if (dump_enabled_p ()) > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >"negative step but alignment required.\n"); >+ *poffset = 0; > return VMAT_ELEMENTWISE; > *poffset = 0; > } I think you cannot really diagnose these, they would trigger a lot, in each early return, no? Or would we see that there are unreachable, non artificial stmts in the same block after a return? Somebody could experiment with diagnosing each and every DCEd stmt (!artificial, nondebug) but I would expect that hell breaks loose.. cheers,
Re: [PATCH] Fix loop split incorrect count and probability
On Tue, 26 Oct 2021, Xionghu Luo wrote: > > > On 2021/10/21 18:55, Richard Biener wrote: > > On Thu, 21 Oct 2021, Xionghu Luo wrote: > > > >> > >> > >> On 2021/10/15 13:51, Xionghu Luo via Gcc-patches wrote: > >>> > >>> > >>> On 2021/9/23 20:17, Richard Biener wrote: > On Wed, 22 Sep 2021, Xionghu Luo wrote: > > > > > > > On 2021/8/11 17:16, Richard Biener wrote: > >> On Wed, 11 Aug 2021, Xionghu Luo wrote: > >> > >>> > >>> > >>> On 2021/8/10 22:47, Richard Biener wrote: > On Mon, 9 Aug 2021, Xionghu Luo wrote: > > > Thanks, > > > > On 2021/8/6 19:46, Richard Biener wrote: > >> On Tue, 3 Aug 2021, Xionghu Luo wrote: > >> > >>> loop split condition is moved between loop1 and loop2, the split > >>> bb's > >>> count and probability should also be duplicated instead of (100% > >>> vs > >>> INV), > >>> secondly, the original loop1 and loop2 count need be propotional > >>> from > >>> the > >>> original loop. > >>> > >>> > >>> diff base/loop-cond-split-1.c.151t.lsplit > >>> patched/loop-cond-split-1.c.151t.lsplit: > >>> ... > >>> int prephitmp_16; > >>> int prephitmp_25; > >>> > >>>[local count: 118111600]: > >>> if (n_7(D) > 0) > >>> goto ; [89.00%] > >>> else > >>> goto ; [11.00%] > >>> > >>>[local count: 118111600]: > >>> return; > >>> > >>>[local count: 105119324]: > >>> pretmp_3 = ga; > >>> > >>> - [local count: 955630225]: > >>> + [local count: 315357973]: > >>> # i_13 = PHI > >>> # prephitmp_12 = PHI > >>> if (prephitmp_12 != 0) > >>> goto ; [33.00%] > >>> else > >>> goto ; [67.00%] > >>> > >>> - [local count: 315357972]: > >>> + [local count: 104068130]: > >>> _2 = do_something (); > >>> ga = _2; > >>> > >>> - [local count: 955630225]: > >>> + [local count: 315357973]: > >>> # prephitmp_5 = PHI > >>> i_10 = inc (i_13); > >>> if (n_7(D) > i_10) > >>> goto ; [89.00%] > >>> else > >>> goto ; [11.00%] > >>> > >>>[local count: 105119324]: > >>> goto ; [100.00%] > >>> > >>> - [local count: 850510901]: > >>> + [local count: 280668596]: > >>> if (prephitmp_12 != 0) > >>> -goto ; [100.00%] > >>> +goto ; [33.00%] > >>> else > >>> -goto ; [INV] > >>> +goto ; [67.00%] > >>> > >>> - [local count: 850510901]: > >>> + [local count: 280668596]: > >>> goto ; [100.00%] > >>> > >>> - [count: 0]: > >>> + [local count: 70429947]: > >>> # i_23 = PHI > >>> # prephitmp_25 = PHI > >>> > >>> - [local count: 955630225]: > >>> + [local count: 640272252]: > >>> # i_15 = PHI > >>> # prephitmp_16 = PHI > >>> i_22 = inc (i_15); > >>> if (n_7(D) > i_22) > >>> goto ; [89.00%] > >>> else > >>> goto ; [11.00%] > >>> > >>> - [local count: 850510901]: > >>> + [local count: 569842305]: > >>> goto ; [100.00%] > >>> > >>> } > >>> > >>> gcc/ChangeLog: > >>> > >>> * tree-ssa-loop-split.c (split_loop): Fix incorrect probability. > >>> (do_split_loop_on_cond): Likewise. > >>> --- > >>> gcc/tree-ssa-loop-split.c | 16 > >>> 1 file changed, 8 insertions(+), 8 deletions(-) > >>> > >>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c > >>> index 3a09bbc39e5..8e5a7ded0f7 100644 > >>> --- a/gcc/tree-ssa-loop-split.c > >>> +++ b/gcc/tree-ssa-loop-split.c > >>> @@ -583,10 +583,10 @@ split_loop (class loop *loop1) > >>> basic_block cond_bb; > > > > if (!initial_true) > > - cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); > > + cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); > > + > > + edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE > > + ? EDGE_SUCC (bbs[i], 0) > > + : EDGE_SUCC (bbs[i], 1); > > > >>> > >>> class loop *loop2 = loop_version (loop1, cond, &cond_bb, > >>> -
Re: [PATCH] PR fortran/102816 - [12 Regression] ICE in resolve_structure_cons, at fortran/resolve.c:1467
Hi Tobias, > In other cases, it requires some careful weighting whether error should > have the error location "use m" or where the symbol is used. (Here, it > cannot occur as the module won't get generated and an error is already > printed at the proper location.) I had though about this but couldn't come up with a way to create a module file from an invalid type definition. The closest is sth. which imports it, such as in program p type t integer :: a([2]) ! { dg-error "must be scalar" } end type type(t) :: x = t([3, 4]) ! { dg-error "Bad array spec of component" } interface subroutine foo (x) import t type(t) :: x type(t), parameter :: y = t([5, 6]) ! { dg-error "Bad array spec of component" } end subroutine foo end interface end However (= fortunately) this works just fine. > > Regtested on x86_64-pc-linux-gnu. OK? > > OK. Will commit tonight. Thanks for the review! Harald > Tobias > > - > Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 > München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas > Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht > München, HRB 106955 >
Re: [PATCH] PR fortran/102917 - PDT type parameters are not restricted to default integer
Hi Tobias, > OK. I wonder whether a valid len/kind example should be added which uses > such a PDT with non-default-kind integer. the testcase pdt_4.f03 did actually check for the error message that gets removed and had to be adjusted. Removing just the dg-error does that job :-) Thanks, Harald > Tobias > > >> Gesendet: Freitag, 22. Oktober 2021 um 22:25 Uhr > >> Von: "Steve Kargl" > >> An: "Harald Anlauf" > >> Cc: fort...@gcc.gnu.org > >> Betreff: Re: PDT type parameters are not restricted to default integer > >> > >> On Fri, Oct 22, 2021 at 10:16:05PM +0200, Harald Anlauf wrote: > >>> Hi Steve, > >>> > >>> Am 22.10.21 um 21:35 schrieb Steve Kargl via Fortran: > Here's an obvious quick fix. Please apply. > > > diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c > index 6043e100fbb..e889bb44142 100644 > --- a/gcc/fortran/decl.c > +++ b/gcc/fortran/decl.c > @@ -5619,14 +5619,6 @@ match_attr_spec (void) > m = MATCH_ERROR; > goto cleanup; > } > -if (current_ts.kind != gfc_default_integer_kind) > - { > -gfc_error ("Component with LEN attribute at %C must be " > - "default integer kind (%d)", > -gfc_default_integer_kind); > -m = MATCH_ERROR; > -goto cleanup; > - } > } > else > { > >>> I think you are right. We should always have allowed any integer kind. > >>> > >>> However, have you checked whether this change introduces regressions? > >>> If you don't, somebody else will. Please open a PR, then. > >>> > >> It seems that pdt_4.f03 will fail with the above patch because > >> it explicitly tests for this error message. That's the only > >> failure in the testsuite. For the record, F2003, page 48, > >> > >> R435 type-param-def-stmt is INTEGER [ kind-selector ] , ... > >> > >> Each type parameter is itself of type integer. If its kind selector > >> is omitted, the kind type parameter is default integer. > >> > >> Now that I think about and look, there is a nearby similar gcc_error() > >> for KIND. This should be removed too. > >> > >> -- > >> Steve > >> > - > Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 > München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas > Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht > München, HRB 106955 >
RE: [PATCH]AArch64 Lower intrinsics shift to GIMPLE when possible.
Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * config/aarch64/aarch64-builtins.c (aarch64_general_gimple_fold_builtin): Add ashl, sshl, ushl, ashr, ashr_simd, lshr, lshr_simd. * config/aarch64/aarch64-simd-builtins.def (lshr): Use USHIFTIMM. * config/aarch64/arm_neon.h (vshr_n_u8, vshr_n_u16, vshr_n_u32, vshrq_n_u8, vshrq_n_u16, vshrq_n_u32, vshrq_n_u64): Fix type hack. gcc/testsuite/ChangeLog: * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-1.c: New test. * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-2.c: New test. * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-3.c: New test. * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-4.c: New test. * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-5.c: New test. * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-6.c: New test. * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-7.c: New test. * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-8.c: New test. * gcc.target/aarch64/signbit-2.c: New test. --- inline copy of patch --- diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index f6b41d9c200d6300dee65ba60ae94488231a8a38..c362b29186cfc0bf0d39c08c314cfd6a99124cb2 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -2394,6 +2394,54 @@ aarch64_general_gimple_fold_builtin (unsigned int fcode, gcall *stmt) 1, args[0]); gimple_call_set_lhs (new_stmt, gimple_call_lhs (stmt)); break; + BUILTIN_VSDQ_I_DI (BINOP, ashl, 3, NONE) + if (TREE_CODE (args[1]) == INTEGER_CST + && wi::ltu_p (wi::to_wide (args[1]), element_precision (args[0]))) + new_stmt = gimple_build_assign (gimple_call_lhs (stmt), + LSHIFT_EXPR, args[0], args[1]); + break; + BUILTIN_VSDQ_I_DI (BINOP, sshl, 0, NONE) + BUILTIN_VSDQ_I_DI (BINOP_UUS, ushl, 0, NONE) + { + tree cst = args[1]; + tree ctype = TREE_TYPE (cst); + /* Left shifts can be both scalar or vector, e.g. uint64x1_t is +treated as a scalar type not a vector one. */ + if ((cst = uniform_integer_cst_p (cst)) != NULL_TREE) + { + wide_int wcst = wi::to_wide (cst); + tree unit_ty = TREE_TYPE (cst); + + wide_int abs_cst = wi::abs (wcst); + if (wi::geu_p (abs_cst, element_precision (args[0]))) + break; + + if (wi::neg_p (wcst, TYPE_SIGN (ctype))) + { + tree final_cst; + final_cst = wide_int_to_tree (unit_ty, abs_cst); + if (TREE_CODE (cst) != INTEGER_CST) + final_cst = build_uniform_cst (ctype, final_cst); + + new_stmt = gimple_build_assign (gimple_call_lhs (stmt), + RSHIFT_EXPR, args[0], + final_cst); + } + else + new_stmt = gimple_build_assign (gimple_call_lhs (stmt), + LSHIFT_EXPR, args[0], args[1]); + } + } + break; + BUILTIN_VDQ_I (SHIFTIMM, ashr, 3, NONE) + VAR1 (SHIFTIMM, ashr_simd, 0, NONE, di) + BUILTIN_VDQ_I (USHIFTIMM, lshr, 3, NONE) + VAR1 (USHIFTIMM, lshr_simd, 0, NONE, di) + if (TREE_CODE (args[1]) == INTEGER_CST + && wi::ltu_p (wi::to_wide (args[1]), element_precision (args[0]))) + new_stmt = gimple_build_assign (gimple_call_lhs (stmt), + RSHIFT_EXPR, args[0], args[1]); + break; BUILTIN_GPF (BINOP, fmulx, 0, ALL) { gcc_assert (nargs == 2); diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index 402453aa9bba5949da43c984c4603196b1efd092..bbe0a4a3c4aea4187e7b1a9f10ab60e79df7b138 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -409,7 +409,7 @@ BUILTIN_VDQ_I (SHIFTIMM, ashr, 3, NONE) VAR1 (SHIFTIMM, ashr_simd, 0, NONE, di) - BUILTIN_VDQ_I (SHIFTIMM, lshr, 3, NONE) + BUILTIN_VDQ_I (USHIFTIMM, lshr, 3, NONE) VAR1 (USHIFTIMM, lshr_simd, 0, NONE, di) /* Implemented by aarch64_shr_n. */ BUILTIN_VSDQ_I_DI (SHIFTIMM, srshr_n, 0, NONE) diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 635a223b59eb0f64304351939d11b697af81..c4ef5f7f7e3658c830893931ef5a874842410e10 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -27400,21 +27400,21 @@ __extension__ extern __inline uint8x8_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshr_n_u8 (uint8x8_t __a, const int __b) {
[PATCH v3] bpf: Add support to eBPF atomic instructions
Hello people, This patch v3 to add support for atomics instructions in eBPF target, the rtl for eBPF atomics was splitted in a new file: atomic.md, Thanks Andrew for the comments, Kind Regards, Guillermo eBPF add support for atomic instructions, the following gcc built-in functions are implemented for bpf target using both: 32 and 64 bits data types: __atomic_fetch_add __atomic_fetch_sub __atomic_fetch_and __atomic_fetch_xor __atomic_fetch_or __atomic_exchange __atomic_compare_exchange_n Also calls to __atomic__fetch are fully supported. New define instructions were added to bpf.md along with its tests for gcc atomic built-in functions. In order to restrict/enable the use of `add + fetch' and the rest of atomic instructions the -m[no-]atomics was added. Those instructions are fully compliant with: https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html https://www.kernel.org/doc/Documentation/networking/filter.rst This support depends of two previous submissions in CGEN and binutils-gdb projects: https://sourceware.org/pipermail/cgen/2021q3/002774.html https://sourceware.org/pipermail/binutils/2021-August/117798.html gcc/ * config/bpf/bpf.md: Add defines for atomic instructions. * config/bpf/bpf.c: Enable atomics by default in ISA v3. * config/bpf/bpf.h: Pass option to gas to disabled use of atomics (-mno-atomics). * config/bpf/bpf.opt: Add -m[no-]atomics option. * doc/invoke.texi: Add documentation for -m[no-a]tomics. gcc/testsuite/ * gcc.target/bpf/atomic-compare-exchange.c: New test. * gcc.target/bpf/atomic-exchange.c: Likewise. * gcc.target/bpf/atomic-add.c: Likewise. * gcc.target/bpf/atomic-and.c: Likewise. * gcc.target/bpf/atomic-or.c: Likewise. * gcc.target/bpf/atomic-sub.c: Likewise. * gcc.target/bpf/atomic-xor.c: Likewise. * gcc.target/bpf/atomics-disabled.c: Likewise. * gcc.target/bpf/ftest-mcpuv3-atomics.c: Likewise. * gcc.target/bpf/ftest-no-atomics-add.c: Likewise. --- gcc/ChangeLog | 8 + gcc/config/bpf/atomic.md | 160 ++ gcc/config/bpf/bpf.c | 2 + gcc/config/bpf/bpf.h | 12 +- gcc/config/bpf/bpf.md | 24 +-- gcc/config/bpf/bpf.opt| 4 + gcc/config/bpf/constraints.md | 3 + gcc/doc/invoke.texi | 12 +- gcc/testsuite/ChangeLog | 12 ++ .../gcc.target/bpf/atomic-add-fetch.c | 29 gcc/testsuite/gcc.target/bpf/atomic-and.c | 25 +++ .../gcc.target/bpf/atomic-compare-exchange.c | 28 +++ .../gcc.target/bpf/atomic-exchange.c | 19 +++ gcc/testsuite/gcc.target/bpf/atomic-or.c | 25 +++ gcc/testsuite/gcc.target/bpf/atomic-sub.c | 27 +++ gcc/testsuite/gcc.target/bpf/atomic-xor.c | 25 +++ .../gcc.target/bpf/atomics-disabled.c | 28 +++ .../gcc.target/bpf/ftest-mcpuv3-atomics.c | 36 .../gcc.target/bpf/ftest-no-atomics-add.c | 23 +++ 19 files changed, 482 insertions(+), 20 deletions(-) create mode 100644 gcc/config/bpf/atomic.md create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-add-fetch.c create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-and.c create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-compare-exchange.c create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-exchange.c create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-or.c create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-sub.c create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-xor.c create mode 100644 gcc/testsuite/gcc.target/bpf/atomics-disabled.c create mode 100644 gcc/testsuite/gcc.target/bpf/ftest-mcpuv3-atomics.c create mode 100644 gcc/testsuite/gcc.target/bpf/ftest-no-atomics-add.c diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 115f32e5061..782d33908ba 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,11 @@ +2021-10-25 Guillermo E. Martinez + * config/bpf/bpf.md: Add defines for atomic instructions. + * config/bpf/bpf.c: Enable atomics by default in ISA v3. + * config/bpf/bpf.h: Pass option to gas to disable use of + atomics (-mno-atomics). + * config/bpf/bpf.opt: Add -m[no-]atomics option. + * doc/invoke.texi: Add documentation for -m[no-]atomics. + 2021-10-20 Alex Coplan * calls.c (initialize_argument_information): Remove some dead diff --git a/gcc/config/bpf/atomic.md b/gcc/config/bpf/atomic.md new file mode 100644 index 000..9afc38f01ae --- /dev/null +++ b/gcc/config/bpf/atomic.md @@ -0,0 +1,160 @@ +;; Machine description for eBPF atomic Instructions. +;; Copyright (C) 2021 Free Software Foundation, Inc. + +;; This file is part of GCC. + +;; GCC is free software; you can red
Re: [PATCH] Move negative stride bias out of dr_misalignment
On Tue, 26 Oct 2021, Bernhard Reutner-Fischer wrote: > On 26 October 2021 11:19:44 CEST, Richard Biener via Gcc-patches > wrote: > > > > >@@ -2010,6 +2010,7 @@ get_negative_load_store_type (vec_info *vinfo, > > if (dump_enabled_p ()) > > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > > "negative step but alignment required.\n"); > >+ *poffset = 0; > > return VMAT_ELEMENTWISE; > > *poffset = 0; > > } > > I think you cannot really diagnose these, they would trigger a lot, in each > early return, no? > Or would we see that there are unreachable, non artificial stmts in the same > block after a return? > Somebody could experiment with diagnosing each and every DCEd stmt > (!artificial, nondebug) but I would expect that hell breaks loose.. > cheers, [I fixed the issue before pushing this patch btw] I agree in general, where I would diagnose this is when we build the CFG I would diagnose unreachable blocks - the above does not have any path to the second *poffset store. Like with the prototype patch below we warn for int *p; int foo() { return 0; *p = 0; } > ./cc1 -quiet t.c t.c: In function 'foo': t.c:5:3: warning: statement is not reachable 5 | *p = 0; | ^~ diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c index b3a27bcd17c..2de661ba35d 100644 --- a/gcc/tree-cfg.c +++ b/gcc/tree-cfg.c @@ -374,6 +374,22 @@ execute_build_cfg (void) fprintf (dump_file, "Scope blocks:\n"); dump_scope_blocks (dump_file, dump_flags); } + + find_unreachable_blocks (); + basic_block bb; + FOR_EACH_BB_FN (bb, cfun) +if (!(bb->flags & BB_REACHABLE)) + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi); + gsi_next (&gsi)) + { + if (gimple_location (gsi_stmt (gsi)) != UNKNOWN_LOCATION) + warning_at (gimple_location (gsi_stmt (gsi)), 0, + "statement is not reachable"); + /* ??? Mark blocks reachable from here. And even better make +sure to process entries to unreachable regions first. */ + break; + } + cleanup_tree_cfg (); bb_to_omp_idx.release ();
RE: [PATCH] middle-end: fix de-optimizations with bitclear patterns on signed values
> -Original Message- > From: Richard Biener > Sent: Tuesday, October 26, 2021 9:46 AM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; Jakub Jelinek ; nd > > Subject: RE: [PATCH] middle-end: fix de-optimizations with bitclear patterns > on signed values > > On Tue, 26 Oct 2021, Tamar Christina wrote: > > > > -Original Message- > > > From: Richard Biener > > > Sent: Tuesday, October 26, 2021 9:26 AM > > > To: Tamar Christina > > > Cc: gcc-patches@gcc.gnu.org; Jakub Jelinek ; nd > > > > > > Subject: RE: [PATCH] middle-end: fix de-optimizations with bitclear > > > patterns on signed values > > > > > > On Mon, 25 Oct 2021, Tamar Christina wrote: > > > > > > > > -Original Message- > > > > > From: Richard Biener > > > > > Sent: Friday, October 15, 2021 12:31 PM > > > > > To: Tamar Christina > > > > > Cc: gcc-patches@gcc.gnu.org; Jakub Jelinek ; > > > > > nd > > > > > Subject: Re: [PATCH] middle-end: fix de-optimizations with > > > > > bitclear patterns on signed values > > > > > > > > > > On Fri, 15 Oct 2021, Tamar Christina wrote: > > > > > > > > > > > Hi All, > > > > > > > > > > > > During testing after rebasing to commit I noticed a failing > > > > > > testcase with the bitmask compare patch. > > > > > > > > > > > > Consider the following C++ testcase: > > > > > > > > > > > > #include > > > > > > > > > > > > #define A __attribute__((noipa)) A bool f5 (double i, double > > > > > > j) { auto c = i <=> j; return c >= 0; } > > > > > > > > > > > > This turns into a comparison against chars, on systems where > > > > > > chars are signed the pattern inserts an unsigned convert such > > > > > > that it's able to do the transformation. > > > > > > > > > > > > i.e.: > > > > > > > > > > > > # RANGE [-1, 2] > > > > > > # c$_M_value_22 = PHI <-1(3), 0(2), 2(5), 1(4)> > > > > > > # RANGE ~[3, 254] > > > > > > _11 = (unsigned char) c$_M_value_22; > > > > > > _19 = _11 <= 1; > > > > > > # .MEM_24 = VDEF <.MEM_6(D)> > > > > > > D.10434 ={v} {CLOBBER}; > > > > > > # .MEM_14 = VDEF <.MEM_24> > > > > > > D.10407 ={v} {CLOBBER}; > > > > > > # VUSE <.MEM_14> > > > > > > return _19; > > > > > > > > > > > > instead of: > > > > > > > > > > > > # RANGE [-1, 2] > > > > > > # c$_M_value_5 = PHI <-1(3), 0(2), 2(5), 1(4)> > > > > > > # RANGE [-2, 2] > > > > > > _3 = c$_M_value_5 & -2; > > > > > > _19 = _3 == 0; > > > > > > # .MEM_24 = VDEF <.MEM_6(D)> > > > > > > D.10440 ={v} {CLOBBER}; > > > > > > # .MEM_14 = VDEF <.MEM_24> > > > > > > D.10413 ={v} {CLOBBER}; > > > > > > # VUSE <.MEM_14> > > > > > > return _19; > > > > > > > > > > > > This causes much worse codegen under -ffast-math due to phiops > > > > > > no longer recognizing the pattern. It turns out that phiopts > > > > > > spaceship_replacement is looking for the exact form that was > > > > > > just > > > changed. > > > > > > > > > > > > Trying to get it to recognize the new form is not trivial as > > > > > > the transformation doesn't look to work when the thing it's > > > > > > pointing to is itself > > > > > a phi-node. > > > > > > > > > > What do you mean? Where it handles the BIT_AND it could also > > > > > handle the conversion, no? The later handling would probably > > > > > more explicitely need to distinguish between the BIT_AND and the > > > > > conversion > > > forms. > > > > > > > > Looks like I misunderstood the code, it was looking at the uses > > > > not the defs of the value. > > > > > > > > --- inline copy of patch --- > > > > > > > > The comments seems to suggest this code only checks for (res & ~1) > > > > == > > > > 0 but the implementation seems to suggest it's broader. > > > > > > > > As such I added a case to check to see if the value comparison we > > > > found is a type cast. and strips away the type cast and continues. > > > > > > > > In match.pd the typecasts are only added for signed comparisons to > > > > == > > > > 0 and != 0 which are then rewritten into comparisons with 1. > > > > > > > > As such I only check for 1 and LE and GT, which is what match.pd > > > > would have rewritten it to. > > > > > > > > This fixes the regression but this is not code I 100% understand, > > > > since I don't really know the semantics of the spaceship operator > > > > so would appreciate an extra look. > > > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, > > > > x86_64-pc-linux-gnu and no regressions. > > > > > > > > Ok for master? > > > > > > Please add a testcase. I hope Jakub can review the > > > spaceship_replacement patch since he's the one familiar with the code. > > > > There's already a bunch of testcases that test the various variants: > > gcc/testsuite/g++.dg/opt/pr94589-1.C > > and gcc/testsuite/g++.dg/opt/pr94589-2.C which is how I noticed the > > failure. However they only trigger the failure on signed chars. I > > tried forcing `-fsigned-char` to see if I can make a general testcase but > > this > seems to have not done it. > > > > Is there another f
Re: [PATCH] Fix loop split incorrect count and probability
> On Tue, 26 Oct 2021, Xionghu Luo wrote: > > > > > > > On 2021/10/21 18:55, Richard Biener wrote: > > > On Thu, 21 Oct 2021, Xionghu Luo wrote: > > > > > >> > > >> > > >> On 2021/10/15 13:51, Xionghu Luo via Gcc-patches wrote: > > >>> > > >>> > > >>> On 2021/9/23 20:17, Richard Biener wrote: > > On Wed, 22 Sep 2021, Xionghu Luo wrote: > > > > > > > > > > > On 2021/8/11 17:16, Richard Biener wrote: > > >> On Wed, 11 Aug 2021, Xionghu Luo wrote: > > >> > > >>> > > >>> > > >>> On 2021/8/10 22:47, Richard Biener wrote: > > On Mon, 9 Aug 2021, Xionghu Luo wrote: > > > > > Thanks, > > > > > > On 2021/8/6 19:46, Richard Biener wrote: > > >> On Tue, 3 Aug 2021, Xionghu Luo wrote: > > >> > > >>> loop split condition is moved between loop1 and loop2, the > > >>> split bb's > > >>> count and probability should also be duplicated instead of > > >>> (100% vs > > >>> INV), > > >>> secondly, the original loop1 and loop2 count need be > > >>> propotional from > > >>> the > > >>> original loop. > > >>> > > >>> > > >>> diff base/loop-cond-split-1.c.151t.lsplit > > >>> patched/loop-cond-split-1.c.151t.lsplit: > > >>> ... > > >>> int prephitmp_16; > > >>> int prephitmp_25; > > >>> > > >>>[local count: 118111600]: > > >>> if (n_7(D) > 0) > > >>> goto ; [89.00%] > > >>> else > > >>> goto ; [11.00%] > > >>> > > >>>[local count: 118111600]: > > >>> return; > > >>> > > >>>[local count: 105119324]: > > >>> pretmp_3 = ga; > > >>> > > >>> - [local count: 955630225]: > > >>> + [local count: 315357973]: > > >>> # i_13 = PHI > > >>> # prephitmp_12 = PHI > > >>> if (prephitmp_12 != 0) > > >>> goto ; [33.00%] > > >>> else > > >>> goto ; [67.00%] > > >>> > > >>> - [local count: 315357972]: > > >>> + [local count: 104068130]: > > >>> _2 = do_something (); > > >>> ga = _2; > > >>> > > >>> - [local count: 955630225]: > > >>> + [local count: 315357973]: > > >>> # prephitmp_5 = PHI > > >>> i_10 = inc (i_13); > > >>> if (n_7(D) > i_10) > > >>> goto ; [89.00%] > > >>> else > > >>> goto ; [11.00%] > > >>> > > >>>[local count: 105119324]: > > >>> goto ; [100.00%] > > >>> > > >>> - [local count: 850510901]: > > >>> + [local count: 280668596]: > > >>> if (prephitmp_12 != 0) > > >>> -goto ; [100.00%] > > >>> +goto ; [33.00%] > > >>> else > > >>> -goto ; [INV] > > >>> +goto ; [67.00%] > > >>> > > >>> - [local count: 850510901]: > > >>> + [local count: 280668596]: > > >>> goto ; [100.00%] > > >>> > > >>> - [count: 0]: > > >>> + [local count: 70429947]: > > >>> # i_23 = PHI > > >>> # prephitmp_25 = PHI > > >>> > > >>> - [local count: 955630225]: > > >>> + [local count: 640272252]: > > >>> # i_15 = PHI > > >>> # prephitmp_16 = PHI > > >>> i_22 = inc (i_15); > > >>> if (n_7(D) > i_22) > > >>> goto ; [89.00%] > > >>> else > > >>> goto ; [11.00%] > > >>> > > >>> - [local count: 850510901]: > > >>> + [local count: 569842305]: > > >>> goto ; [100.00%] > > >>> > > >>> } > > >>> > > >>> gcc/ChangeLog: > > >>> > > >>> * tree-ssa-loop-split.c (split_loop): Fix incorrect > > >>> probability. > > >>> (do_split_loop_on_cond): Likewise. > > >>> --- > > >>> gcc/tree-ssa-loop-split.c | 16 > > >>> 1 file changed, 8 insertions(+), 8 deletions(-) > > >>> > > >>> diff --git a/gcc/tree-ssa-loop-split.c > > >>> b/gcc/tree-ssa-loop-split.c > > >>> index 3a09bbc39e5..8e5a7ded0f7 100644 > > >>> --- a/gcc/tree-ssa-loop-split.c > > >>> +++ b/gcc/tree-ssa-loop-split.c > > >>> @@ -583,10 +583,10 @@ split_loop (class loop *loop1) > > >>> basic_block cond_bb; > > > > > > if (!initial_true) > > > - cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); > > > + cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); > > > + > > > + e
Re: [PATCH] Move negative stride bias out of dr_misalignment
On Tue, 26 Oct 2021 14:10:05 +0200 (CEST) Richard Biener wrote: > I agree in general, where I would diagnose this is when we build the > CFG I would diagnose unreachable blocks - the above does not have > any path to the second *poffset store. Like with the prototype patch > below we warn for > > int *p; > int foo() > { > return 0; > *p = 0; > } mhm. I thought we have -Wunreachable-code but it seems that's gone thanks,
[PATCH v3] bpf: Add support to eBPF atomic instructions
Apologies, I forgot update the ChangeLog(s) Kind Regards, Guillermo eBPF add support for atomic instructions, the following gcc built-in functions are implemented for bpf target using both: 32 and 64 bits data types: __atomic_fetch_add __atomic_fetch_sub __atomic_fetch_and __atomic_fetch_xor __atomic_fetch_or __atomic_exchange __atomic_compare_exchange_n Also calls to __atomic__fetch are fully supported. New define instructions were added to bpf.md along with its tests for gcc atomic built-in functions. In order to restrict/enable the use of `add + fetch' and the rest of atomic instructions the -m[no-]atomics was added. Those instructions are fully compliant with: https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html https://www.kernel.org/doc/Documentation/networking/filter.rst This support depends of two previous submissions in CGEN and binutils-gdb projects: https://sourceware.org/pipermail/cgen/2021q3/002774.html https://sourceware.org/pipermail/binutils/2021-August/117798.html gcc/ * config/bpf/atomic.md: New file for atomic instructions. * config/bpf/bpf.md: Move rtl atomic add definition to atomic.md * config/bpf/bpf.c: Enable atomics by default in ISA v3. * config/bpf/bpf.h: Pass option to gas to disabled use of atomics (-mno-atomics). * config/bpf/bpf.opt: Add -m[no-]atomics option. * doc/invoke.texi: Add documentation for -m[no-a]tomics. gcc/testsuite/ * gcc.target/bpf/atomic-compare-exchange.c: New test. * gcc.target/bpf/atomic-exchange.c: Likewise. * gcc.target/bpf/atomic-add.c: Likewise. * gcc.target/bpf/atomic-and.c: Likewise. * gcc.target/bpf/atomic-or.c: Likewise. * gcc.target/bpf/atomic-sub.c: Likewise. * gcc.target/bpf/atomic-xor.c: Likewise. * gcc.target/bpf/atomics-disabled.c: Likewise. * gcc.target/bpf/ftest-mcpuv3-atomics.c: Likewise. * gcc.target/bpf/ftest-no-atomics-add.c: Likewise. --- gcc/ChangeLog | 9 + gcc/config/bpf/atomic.md | 169 ++ gcc/config/bpf/bpf.c | 2 + gcc/config/bpf/bpf.h | 12 +- gcc/config/bpf/bpf.md | 20 +-- gcc/config/bpf/bpf.opt| 4 + gcc/config/bpf/constraints.md | 3 + gcc/doc/invoke.texi | 12 +- gcc/testsuite/ChangeLog | 12 ++ .../gcc.target/bpf/atomic-add-fetch.c | 29 +++ gcc/testsuite/gcc.target/bpf/atomic-and.c | 25 +++ .../gcc.target/bpf/atomic-compare-exchange.c | 28 +++ .../gcc.target/bpf/atomic-exchange.c | 19 ++ gcc/testsuite/gcc.target/bpf/atomic-or.c | 25 +++ gcc/testsuite/gcc.target/bpf/atomic-sub.c | 27 +++ gcc/testsuite/gcc.target/bpf/atomic-xor.c | 25 +++ .../gcc.target/bpf/atomics-disabled.c | 28 +++ .../gcc.target/bpf/ftest-mcpuv3-atomics.c | 36 .../gcc.target/bpf/ftest-no-atomics-add.c | 23 +++ 19 files changed, 487 insertions(+), 21 deletions(-) create mode 100644 gcc/config/bpf/atomic.md create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-add-fetch.c create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-and.c create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-compare-exchange.c create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-exchange.c create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-or.c create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-sub.c create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-xor.c create mode 100644 gcc/testsuite/gcc.target/bpf/atomics-disabled.c create mode 100644 gcc/testsuite/gcc.target/bpf/ftest-mcpuv3-atomics.c create mode 100644 gcc/testsuite/gcc.target/bpf/ftest-no-atomics-add.c diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 115f32e5061..af52f9fcc03 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,12 @@ +2021-10-26 Guillermo E. Martinez + * config/bpf/atomic.md: New file for atomic instructions. + * config/bpf/bpf.md: Move rtl atomic add definition to atomic.md + * config/bpf/bpf.c: Enable atomics by default in ISA v3. + * config/bpf/bpf.h: Pass option to gas to disable use of + atomics (-mno-atomics). + * config/bpf/bpf.opt: Add -m[no-]atomics option. + * doc/invoke.texi: Add documentation for -m[no-]atomics. + 2021-10-20 Alex Coplan * calls.c (initialize_argument_information): Remove some dead diff --git a/gcc/config/bpf/atomic.md b/gcc/config/bpf/atomic.md new file mode 100644 index 000..f8f9a6b96c1 --- /dev/null +++ b/gcc/config/bpf/atomic.md @@ -0,0 +1,169 @@ +;; Machine description for eBPF atomic Instructions. +;; Copyright (C) 2021 Free Software Foundation, Inc. + +;; This file is part of GCC. + +;; GCC is free software; you ca
[PATCH 1/4] Makefile.in: Ensure build CPP is used for build targets
During cross compiling, CPP is being set to the target compiler even for build targets. As an example, when building a cross compiler targetting mingw, the config.log for libiberty in build.x86_64-pokysdk-mingw32.i586-poky-linux/build-x86_64-linux/libiberty/config.log shows: configure:3786: checking how to run the C preprocessor configure:3856: result: x86_64-pokysdk-mingw32-gcc -E --sysroot=[sysroot]/x86_64-nativesdk-mingw32-pokysdk-mingw32 configure:3876: x86_64-pokysdk-mingw32-gcc -E --sysroot=[sysroot]/x86_64-nativesdk-mingw32-pokysdk-mingw32 conftest.c configure:3876: $? = 0 This is libiberty being built for the build environment, not the target one (i.e. in build-x86_64-linux). As such it should be using the build environment's gcc and not the target one. In the mingw case the system headers are quite different leading to build failures related to not being able to include a process.h file for pem-unix.c. Fix this by using CC_FOR_BUILD instead of CC. Ultimately a CPP_FOR_BUILD could be defined but CC_FOR_BUILD seems at least more correct and is a simpler fix. 2021-10-26 Richard Purdie Changelog: * Makefile.in: Use CC_FOR_BUILD as CPP for build targets to avoid host/target contamination Signed-off-by: Richard Purdie --- Makefile.in | 1 + 1 file changed, 1 insertion(+) diff --git a/Makefile.in b/Makefile.in index 34b2d89660d..f4815d7e75f 100644 --- a/Makefile.in +++ b/Makefile.in @@ -152,6 +152,7 @@ BUILD_EXPORTS = \ AR="$(AR_FOR_BUILD)"; export AR; \ AS="$(AS_FOR_BUILD)"; export AS; \ CC="$(CC_FOR_BUILD)"; export CC; \ + CPP="$(CC_FOR_BUILD) -E"; export CPP; \ CFLAGS="$(CFLAGS_FOR_BUILD)"; export CFLAGS; \ CONFIG_SHELL="$(SHELL)"; export CONFIG_SHELL; \ CXX="$(CXX_FOR_BUILD)"; export CXX; \ -- 2.32.0
[PATCH 2/4] gcc: Fix "argument list too long" from install-plugins
When building in longer build paths (200+ characters), the "echo $(PLUGIN_HEADERS)" from the install-plugins target would cause an "argument list too long error" on some systems. Avoid this by calling make's sort function on the list which removes duplicates and stops the overflow from reaching the echo command. The original sort is left to handle the the .h and .def files. 2021-10-26 Richard Purdie gcc/ChangeLog: * Makefile.in: Fix "argument list too long" from install-plugins Signed-off-by: Richard Purdie --- gcc/Makefile.in | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 658093c11c0..89482c6dd4e 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -3685,7 +3685,7 @@ install-plugin: installdirs lang.install-plugin s-header-vars install-gengtype # We keep the directory structure for files in config, common/config or # c-family and .def files. All other files are flattened to a single directory. $(mkinstalldirs) $(DESTDIR)$(plugin_includedir) - headers=`echo $(PLUGIN_HEADERS) $$(cd $(srcdir); echo *.h *.def) | tr ' ' '\012' | sort -u`; \ + headers=`echo $(sort $(PLUGIN_HEADERS)) $$(cd $(srcdir); echo *.h *.def) | tr ' ' '\012' | sort -u`; \ srcdirstrip=`echo "$(srcdir)" | sed 's/[].[^$$\\*|]/&/g'`; \ for file in $$headers; do \ if [ -f $$file ] ; then \ -- 2.32.0
[PATCH 4/4] gcc/nios2: Define the musl linker
Add a definition of the musl linker used on the nios2 platform. 2021-10-26 Richard Purdie gcc/ChangeLog: * config/nios2/linux.h (MUSL_DYNAMIC_LINKER): Add musl linker Signed-off-by: Richard Purdie --- gcc/config/nios2/linux.h | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/config/nios2/linux.h b/gcc/config/nios2/linux.h index 08edf1521f6..15696d86241 100644 --- a/gcc/config/nios2/linux.h +++ b/gcc/config/nios2/linux.h @@ -30,6 +30,7 @@ #define CPP_SPEC "%{posix:-D_POSIX_SOURCE} %{pthread:-D_REENTRANT}" #define GLIBC_DYNAMIC_LINKER "/lib/ld-linux-nios2.so.1" +#define MUSL_DYNAMIC_LINKER "/lib/ld-musl-nios2.so.1" #undef LINK_SPEC #define LINK_SPEC LINK_SPEC_ENDIAN \ -- 2.32.0
[PATCH 3/4] gcc: Add --nostdlib++ option
OpenEmbedded/Yocto Project builds libgcc and the other gcc runtime libraries separately from the compiler and slightly differently to the standard gcc build. In general this works well but in trying to build them separately we run into an issue since we're using our gcc, not xgcc and there is no way to tell configure to use libgcc but not look for libstdc++. This adds such an option allowing such configurations to work. 2021-10-26 Richard Purdie gcc/c-family/ChangeLog: * c.opt: Add --nostdlib++ option gcc/cp/ChangeLog: * g++spec.c (lang_specific_driver): Add --nostdlib++ option gcc/ChangeLog: * doc/invoke.texi: Document --nostdlib++ option * gcc.c: Add --nostdlib++ option Signed-off-by: Richard Purdie --- gcc/c-family/c.opt | 4 gcc/cp/g++spec.c| 1 + gcc/doc/invoke.texi | 8 ++-- gcc/gcc.c | 1 + 4 files changed, 12 insertions(+), 2 deletions(-) diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index 06457ac739e..ee742c831fd 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -2190,6 +2190,10 @@ nostdinc++ C++ ObjC++ Do not search standard system include directories for C++. +nostdlib++ +Driver +Do not link standard C++ runtime library + o C ObjC C++ ObjC++ Joined Separate ; Documented in common.opt diff --git a/gcc/cp/g++spec.c b/gcc/cp/g++spec.c index 3c9bd1490b4..818beb61cee 100644 --- a/gcc/cp/g++spec.c +++ b/gcc/cp/g++spec.c @@ -159,6 +159,7 @@ lang_specific_driver (struct cl_decoded_option **in_decoded_options, switch (decoded_options[i].opt_index) { case OPT_nostdlib: + case OPT_nostdlib__: case OPT_nodefaultlibs: library = -1; break; diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 71992b8c597..b30b79ebdc0 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -236,7 +236,7 @@ in the following sections. -ftemplate-backtrace-limit=@var{n} @gol -ftemplate-depth=@var{n} @gol -fno-threadsafe-statics -fuse-cxa-atexit @gol --fno-weak -nostdinc++ @gol +-fno-weak -nostdinc++ -nostdlib++ @gol -fvisibility-inlines-hidden @gol -fvisibility-ms-compat @gol -fext-numeric-literals @gol @@ -633,7 +633,7 @@ Objective-C and Objective-C++ Dialects}. @item Linker Options @xref{Link Options,,Options for Linking}. @gccoptlist{@var{object-file-name} -fuse-ld=@var{linker} -l@var{library} @gol --nostartfiles -nodefaultlibs -nolibc -nostdlib @gol +-nostartfiles -nodefaultlibs -nolibc -nostdlib -nostdlib++ @gol -e @var{entry} --entry=@var{entry} @gol -pie -pthread -r -rdynamic @gol -s -static -static-pie -static-libgcc -static-libstdc++ @gol @@ -16125,6 +16125,10 @@ library subroutines. constructors are called; @pxref{Collect2,,@code{collect2}, gccint, GNU Compiler Collection (GCC) Internals}.) +@item -nostdlib++ +@opindex nostdlib++ +Do not use the standard system C++ runtime libraries when linking. + @item -e @var{entry} @itemx --entry=@var{entry} @opindex e diff --git a/gcc/gcc.c b/gcc/gcc.c index 506c2acc282..abb900a4247 100644 --- a/gcc/gcc.c +++ b/gcc/gcc.c @@ -1167,6 +1167,7 @@ proper position among the other output files. */ %(mflib) " STACK_SPLIT_SPEC "\ %{fprofile-arcs|fprofile-generate*|coverage:-lgcov} " SANITIZER_SPEC " \ %{!nostdlib:%{!r:%{!nodefaultlibs:%(link_ssp) %(link_gcc_c_sequence)}}}\ +%{!nostdlib++:}\ %{!nostdlib:%{!r:%{!nostartfiles:%E}}} %{T*} \n%(post_link) }}" #endif -- 2.32.0
Re: [PATCH 1/4] Makefile.in: Ensure build CPP is used for build targets
On Tue, Oct 26, 2021 at 2:45 PM Richard Purdie via Gcc-patches wrote: > > During cross compiling, CPP is being set to the target compiler even for > build targets. As an example, when building a cross compiler targetting > mingw, the config.log for libiberty in > build.x86_64-pokysdk-mingw32.i586-poky-linux/build-x86_64-linux/libiberty/config.log > shows: > > configure:3786: checking how to run the C preprocessor > configure:3856: result: x86_64-pokysdk-mingw32-gcc -E > --sysroot=[sysroot]/x86_64-nativesdk-mingw32-pokysdk-mingw32 > configure:3876: x86_64-pokysdk-mingw32-gcc -E > --sysroot=[sysroot]/x86_64-nativesdk-mingw32-pokysdk-mingw32 conftest.c > configure:3876: $? = 0 > > This is libiberty being built for the build environment, not the target one > (i.e. in build-x86_64-linux). As such it should be using the build > environment's > gcc and not the target one. In the mingw case the system headers are quite > different leading to build failures related to not being able to include a > process.h file for pem-unix.c. > > Fix this by using CC_FOR_BUILD instead of CC. Ultimately a CPP_FOR_BUILD > could be defined but CC_FOR_BUILD seems at least more correct and is a simpler > fix. Since we're using AC_PROG_CPP to find the preprocessor simply assuming that $(CC_FOR_BUILD) -E works looks dangerous? > 2021-10-26 Richard Purdie > > Changelog: > > * Makefile.in: Use CC_FOR_BUILD as CPP for build targets > to avoid host/target contamination > > Signed-off-by: Richard Purdie > --- > Makefile.in | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/Makefile.in b/Makefile.in > index 34b2d89660d..f4815d7e75f 100644 > --- a/Makefile.in > +++ b/Makefile.in > @@ -152,6 +152,7 @@ BUILD_EXPORTS = \ > AR="$(AR_FOR_BUILD)"; export AR; \ > AS="$(AS_FOR_BUILD)"; export AS; \ > CC="$(CC_FOR_BUILD)"; export CC; \ > + CPP="$(CC_FOR_BUILD) -E"; export CPP; \ > CFLAGS="$(CFLAGS_FOR_BUILD)"; export CFLAGS; \ > CONFIG_SHELL="$(SHELL)"; export CONFIG_SHELL; \ > CXX="$(CXX_FOR_BUILD)"; export CXX; \ > -- > 2.32.0 >
Re: [PATCH 1/4] Makefile.in: Ensure build CPP is used for build targets
On Tue, 2021-10-26 at 14:55 +0200, Richard Biener wrote: > On Tue, Oct 26, 2021 at 2:45 PM Richard Purdie via Gcc-patches > wrote: > > > > During cross compiling, CPP is being set to the target compiler even for > > build targets. As an example, when building a cross compiler targetting > > mingw, the config.log for libiberty in > > build.x86_64-pokysdk-mingw32.i586-poky-linux/build-x86_64-linux/libiberty/config.log > > shows: > > > > configure:3786: checking how to run the C preprocessor > > configure:3856: result: x86_64-pokysdk-mingw32-gcc -E > > --sysroot=[sysroot]/x86_64-nativesdk-mingw32-pokysdk-mingw32 > > configure:3876: x86_64-pokysdk-mingw32-gcc -E > > --sysroot=[sysroot]/x86_64-nativesdk-mingw32-pokysdk-mingw32 conftest.c > > configure:3876: $? = 0 > > > > This is libiberty being built for the build environment, not the target one > > (i.e. in build-x86_64-linux). As such it should be using the build > > environment's > > gcc and not the target one. In the mingw case the system headers are quite > > different leading to build failures related to not being able to include a > > process.h file for pem-unix.c. > > > > Fix this by using CC_FOR_BUILD instead of CC. Ultimately a CPP_FOR_BUILD > > could be defined but CC_FOR_BUILD seems at least more correct and is a > > simpler > > fix. > > Since we're using AC_PROG_CPP to find the preprocessor simply assuming > that $(CC_FOR_BUILD) -E works looks dangerous? Potentially, yes. So the route of adding CPP_FOR_BUILD would be preferred? Cheers, Richard
RE: [PATCH 2/2]AArch64: Add better costing for vector constants and operations
Hi, Following the discussion below here's a revised patch. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * config/arm/aarch-common-protos.h (struct vector_cost_table): Add movi, dup and extract costing fields. * config/aarch64/aarch64-cost-tables.h (qdf24xx_extra_costs, thunderx_extra_costs, thunderx2t99_extra_costs, thunderx3t110_extra_costs, tsv110_extra_costs, a64fx_extra_costs): Use them. * config/arm/aarch-cost-tables.h (generic_extra_costs, cortexa53_extra_costs, cortexa57_extra_costs, cortexa76_extra_costs, exynosm1_extra_costs, xgene1_extra_costs): Likewise * config/aarch64/aarch64-simd.md (aarch64_simd_dup): Add r->w dup. * config/aarch64/aarch64.c (aarch64_simd_make_constant): Expose. (aarch64_rtx_costs): Add extra costs. (aarch64_simd_dup_constant): Support check only mode. gcc/testsuite/ChangeLog: * gcc.target/aarch64/vect-cse-codegen.c: New test. --- inline copy of patch --- diff --git a/gcc/config/aarch64/aarch64-cost-tables.h b/gcc/config/aarch64/aarch64-cost-tables.h index dd2e7e7cbb13d24f0b51092270cd7e2d75fabf29..bb499a1eae62a145f1665d521f57c98b49ac5389 100644 --- a/gcc/config/aarch64/aarch64-cost-tables.h +++ b/gcc/config/aarch64/aarch64-cost-tables.h @@ -124,7 +124,10 @@ const struct cpu_cost_table qdf24xx_extra_costs = /* Vector */ { COSTS_N_INSNS (1), /* alu. */ -COSTS_N_INSNS (4) /* mult. */ +COSTS_N_INSNS (4), /* mult. */ +COSTS_N_INSNS (1), /* movi. */ +COSTS_N_INSNS (2), /* dup. */ +COSTS_N_INSNS (2) /* extract. */ } }; @@ -229,7 +232,10 @@ const struct cpu_cost_table thunderx_extra_costs = /* Vector */ { COSTS_N_INSNS (1), /* Alu. */ -COSTS_N_INSNS (4) /* mult. */ +COSTS_N_INSNS (4), /* mult. */ +COSTS_N_INSNS (1), /* movi. */ +COSTS_N_INSNS (2), /* dup. */ +COSTS_N_INSNS (2) /* extract. */ } }; @@ -333,7 +339,10 @@ const struct cpu_cost_table thunderx2t99_extra_costs = /* Vector */ { COSTS_N_INSNS (1), /* Alu. */ -COSTS_N_INSNS (4) /* Mult. */ +COSTS_N_INSNS (4), /* Mult. */ +COSTS_N_INSNS (1), /* movi. */ +COSTS_N_INSNS (2), /* dup. */ +COSTS_N_INSNS (2) /* extract. */ } }; @@ -437,7 +446,10 @@ const struct cpu_cost_table thunderx3t110_extra_costs = /* Vector */ { COSTS_N_INSNS (1), /* Alu. */ -COSTS_N_INSNS (4) /* Mult. */ +COSTS_N_INSNS (4), /* Mult. */ +COSTS_N_INSNS (1), /* movi. */ +COSTS_N_INSNS (2), /* dup. */ +COSTS_N_INSNS (2) /* extract. */ } }; @@ -542,7 +554,10 @@ const struct cpu_cost_table tsv110_extra_costs = /* Vector */ { COSTS_N_INSNS (1), /* alu. */ -COSTS_N_INSNS (4) /* mult. */ +COSTS_N_INSNS (4), /* mult. */ +COSTS_N_INSNS (1), /* movi. */ +COSTS_N_INSNS (2), /* dup. */ +COSTS_N_INSNS (2) /* extract. */ } }; @@ -646,7 +661,10 @@ const struct cpu_cost_table a64fx_extra_costs = /* Vector */ { COSTS_N_INSNS (1), /* alu. */ -COSTS_N_INSNS (4) /* mult. */ +COSTS_N_INSNS (4), /* mult. */ +COSTS_N_INSNS (1), /* movi. */ +COSTS_N_INSNS (2), /* dup. */ +COSTS_N_INSNS (2) /* extract. */ } }; diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 29f381728a3b3d28bcd6a1002ba398c8b87713d2..61c3d7e195c510da88aa513f99af5f76f4d696e7 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -74,12 +74,14 @@ (define_insn "aarch64_simd_dup" ) (define_insn "aarch64_simd_dup" - [(set (match_operand:VDQF_F16 0 "register_operand" "=w") + [(set (match_operand:VDQF_F16 0 "register_operand" "=w,w") (vec_duplicate:VDQF_F16 - (match_operand: 1 "register_operand" "w")))] + (match_operand: 1 "register_operand" "w,r")))] "TARGET_SIMD" - "dup\\t%0., %1.[0]" - [(set_attr "type" "neon_dup")] + "@ + dup\\t%0., %1.[0] + dup\\t%0., %1" + [(set_attr "type" "neon_dup, neon_from_gp")] ) (define_insn "aarch64_dup_lane" diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 699c105a42a613c06c462e2de686795279d85bc9..1fb4350916572c915e5af339102444daf324efc7 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -303,6 +303,7 @@ static machine_mode aarch64_simd_container_mode (scalar_mode, poly_int64); static bool aarch64_print_address_internal (FILE*, machine_mode, rtx, aarch64_addr_query_type); static HOST_WIDE_INT aarch64_clamp_to_uimm12_shift (HOST_WIDE_INT val); +static rtx aarch64_simd_make_constant (rtx, bool); /* Major revision number of the ARM Architecture implemented by the target. */ unsigned aarch64_architecture_version; @@ -12705,7 +12706,7 @@ aarch64_rtx_costs (rtx x, machine_mode mode, int outer ATTRIB
Re: [RFC] Partial vectors for s390
Hi Richard, We already have code to probe the predicates of the underlying define_expands/insns to see whether they support certain constant IFN arguments; see e.g. internal_gather_scatter_fn_supported_p. We could do something similar here: add an extra operand to the optab, and an extra argument to the IFN, that gives a bias amount. The PowerPC version would require 0, the System Z version would require -1. The vectoriser would probe to see which value it should use. Doing it that way ensures that the gimple is still self-describing. It avoids gimple semantics depending on target hooks. As I don't have much previous exposure to the vectoriser code, I cobbled together something pretty ad-hoc (attached). Does this come somehow close to what you have in mind? internal_len_load_supported_p should rather be called internal_len_load_bias_supported_p or so I guess and the part where we exclude multiple loop_lens is still missing. Would we also check for a viable bias there and then either accept multiple lens or not? Regards Robincommit 2320dbfdfe1477b15a2ac59847d2a52e68de49ab Author: Robin Dapp Date: Tue Oct 26 14:36:08 2021 +0200 bias1 diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c index 8312d08aab2..bf97d3e471a 100644 --- a/gcc/internal-fn.c +++ b/gcc/internal-fn.c @@ -2696,9 +2696,9 @@ expand_call_mem_ref (tree type, gcall *stmt, int index) static void expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab) { - class expand_operand ops[3]; - tree type, lhs, rhs, maskt; - rtx mem, target, mask; + class expand_operand ops[4]; + tree type, lhs, rhs, maskt, biast; + rtx mem, target, mask, bias; insn_code icode; maskt = gimple_call_arg (stmt, 2); @@ -2727,7 +2727,18 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab) TYPE_UNSIGNED (TREE_TYPE (maskt))); else create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt))); - expand_insn (icode, 3, ops); + if (optab == len_load_optab) +{ + biast = gimple_call_arg (stmt, 3); + bias = expand_normal (biast); + create_input_operand (&ops[3], bias, SImode); +} + + if (optab != len_load_optab) +expand_insn (icode, 3, ops); + else +expand_insn (icode, 4, ops); + if (!rtx_equal_p (target, ops[0].value)) emit_move_insn (target, ops[0].value); } @@ -2741,9 +2752,9 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab) static void expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab) { - class expand_operand ops[3]; - tree type, lhs, rhs, maskt; - rtx mem, reg, mask; + class expand_operand ops[4]; + tree type, lhs, rhs, maskt, biast; + rtx mem, reg, mask, bias; insn_code icode; maskt = gimple_call_arg (stmt, 2); @@ -2770,7 +2781,17 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab) TYPE_UNSIGNED (TREE_TYPE (maskt))); else create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt))); - expand_insn (icode, 3, ops); + if (optab == len_store_optab) +{ + biast = gimple_call_arg (stmt, 4); + bias = expand_normal (biast); + create_input_operand (&ops[3], bias, SImode); +} + + if (optab != len_store_optab) +expand_insn (icode, 3, ops); + else +expand_insn (icode, 4, ops); } #define expand_mask_store_optab_fn expand_partial_store_optab_fn @@ -4154,6 +4175,25 @@ internal_gather_scatter_fn_supported_p (internal_fn ifn, tree vector_type, && insn_operand_matches (icode, 3 + output_ops, GEN_INT (scale))); } +bool +internal_len_load_supported_p (internal_fn ifn, tree load_type, int bias) +{ + if (bias > 0 || bias < -1) +return false; + + machine_mode mode = TYPE_MODE (load_type); + + optab optab = direct_internal_fn_optab (ifn); + insn_code icode = direct_optab_handler (optab, mode); + int output_ops = internal_load_fn_p (ifn) ? 1 : 0; + + if (icode != CODE_FOR_nothing + && insn_operand_matches (icode, 2 + output_ops, GEN_INT (bias))) +return true; + + return false; +} + /* Return true if the target supports IFN_CHECK_{RAW,WAR}_PTRS function IFN for pointers of type TYPE when the accesses have LENGTH bytes and their common byte alignment is ALIGN. */ diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h index 19d0f849a5a..d0bf9941bcc 100644 --- a/gcc/internal-fn.h +++ b/gcc/internal-fn.h @@ -225,6 +225,7 @@ extern int internal_fn_mask_index (internal_fn); extern int internal_fn_stored_value_index (internal_fn); extern bool internal_gather_scatter_fn_supported_p (internal_fn, tree, tree, tree, int); +extern bool internal_len_load_supported_p (internal_fn ifn, tree, int); extern bool internal_check_ptrs_fn_supported_p (internal_fn, tree, poly_uint64, unsigned int); diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index d7723b1a92a..50537763ace 100644 --- a/gcc/tree-vect-stmts.c +++ b/gcc/tree-vect-stmts.c @@ -8272,12
Re: [PATCH] Fix loop split incorrect count and probability
> > > That said, likely the profile update cannot be done uniformly > for all blocks of a loop? For the loop: for (i = 0; i < n; i = inc (i)) { if (ga) ga = do_something (); } to: for (i = 0; i < x; i = inc (i)) { if (true) ga = do_something (); if (!ga) break; } for (; i < n; i = inc (i)) { if (false) ga = do_something (); } If probability of if (ga) being true is p, then you indeed can scale the first loop by p and second loop by 1-p. Imagine that loop has n iterations and it takes m iterations for ga to become false, then probability of if(ga) is m/n and you get frequencies with m=n*(m/n) for first loop and n-m=n*(1-n/m) for second loop. Because the conditional becomes constant true, one needs to scale up the basic block guarded by the if (true) up by n/m to compensate for the change. With that the udpate should be right. Ideally one can bypass scaling of basic block(s) containing ga = do_something () since the scaling first scales down to m/n and then scale sup to m/n. Which may not combine to noop. Perhaps one wants to have a parameter specifying basic blocks on which the scaling is performed while duplicating for this? Honza > > Richard.
RE: [PATCH] middle-end: fix de-optimizations with bitclear patterns on signed values
On Tue, 26 Oct 2021, Tamar Christina wrote: > > > > -Original Message- > > From: Richard Biener > > Sent: Tuesday, October 26, 2021 9:46 AM > > To: Tamar Christina > > Cc: gcc-patches@gcc.gnu.org; Jakub Jelinek ; nd > > > > Subject: RE: [PATCH] middle-end: fix de-optimizations with bitclear patterns > > on signed values > > > > On Tue, 26 Oct 2021, Tamar Christina wrote: > > > > > > -Original Message- > > > > From: Richard Biener > > > > Sent: Tuesday, October 26, 2021 9:26 AM > > > > To: Tamar Christina > > > > Cc: gcc-patches@gcc.gnu.org; Jakub Jelinek ; nd > > > > > > > > Subject: RE: [PATCH] middle-end: fix de-optimizations with bitclear > > > > patterns on signed values > > > > > > > > On Mon, 25 Oct 2021, Tamar Christina wrote: > > > > > > > > > > -Original Message- > > > > > > From: Richard Biener > > > > > > Sent: Friday, October 15, 2021 12:31 PM > > > > > > To: Tamar Christina > > > > > > Cc: gcc-patches@gcc.gnu.org; Jakub Jelinek ; > > > > > > nd > > > > > > Subject: Re: [PATCH] middle-end: fix de-optimizations with > > > > > > bitclear patterns on signed values > > > > > > > > > > > > On Fri, 15 Oct 2021, Tamar Christina wrote: > > > > > > > > > > > > > Hi All, > > > > > > > > > > > > > > During testing after rebasing to commit I noticed a failing > > > > > > > testcase with the bitmask compare patch. > > > > > > > > > > > > > > Consider the following C++ testcase: > > > > > > > > > > > > > > #include > > > > > > > > > > > > > > #define A __attribute__((noipa)) A bool f5 (double i, double > > > > > > > j) { auto c = i <=> j; return c >= 0; } > > > > > > > > > > > > > > This turns into a comparison against chars, on systems where > > > > > > > chars are signed the pattern inserts an unsigned convert such > > > > > > > that it's able to do the transformation. > > > > > > > > > > > > > > i.e.: > > > > > > > > > > > > > > # RANGE [-1, 2] > > > > > > > # c$_M_value_22 = PHI <-1(3), 0(2), 2(5), 1(4)> > > > > > > > # RANGE ~[3, 254] > > > > > > > _11 = (unsigned char) c$_M_value_22; > > > > > > > _19 = _11 <= 1; > > > > > > > # .MEM_24 = VDEF <.MEM_6(D)> > > > > > > > D.10434 ={v} {CLOBBER}; > > > > > > > # .MEM_14 = VDEF <.MEM_24> > > > > > > > D.10407 ={v} {CLOBBER}; > > > > > > > # VUSE <.MEM_14> > > > > > > > return _19; > > > > > > > > > > > > > > instead of: > > > > > > > > > > > > > > # RANGE [-1, 2] > > > > > > > # c$_M_value_5 = PHI <-1(3), 0(2), 2(5), 1(4)> > > > > > > > # RANGE [-2, 2] > > > > > > > _3 = c$_M_value_5 & -2; > > > > > > > _19 = _3 == 0; > > > > > > > # .MEM_24 = VDEF <.MEM_6(D)> > > > > > > > D.10440 ={v} {CLOBBER}; > > > > > > > # .MEM_14 = VDEF <.MEM_24> > > > > > > > D.10413 ={v} {CLOBBER}; > > > > > > > # VUSE <.MEM_14> > > > > > > > return _19; > > > > > > > > > > > > > > This causes much worse codegen under -ffast-math due to phiops > > > > > > > no longer recognizing the pattern. It turns out that phiopts > > > > > > > spaceship_replacement is looking for the exact form that was > > > > > > > just > > > > changed. > > > > > > > > > > > > > > Trying to get it to recognize the new form is not trivial as > > > > > > > the transformation doesn't look to work when the thing it's > > > > > > > pointing to is itself > > > > > > a phi-node. > > > > > > > > > > > > What do you mean? Where it handles the BIT_AND it could also > > > > > > handle the conversion, no? The later handling would probably > > > > > > more explicitely need to distinguish between the BIT_AND and the > > > > > > conversion > > > > forms. > > > > > > > > > > Looks like I misunderstood the code, it was looking at the uses > > > > > not the defs of the value. > > > > > > > > > > --- inline copy of patch --- > > > > > > > > > > The comments seems to suggest this code only checks for (res & ~1) > > > > > == > > > > > 0 but the implementation seems to suggest it's broader. > > > > > > > > > > As such I added a case to check to see if the value comparison we > > > > > found is a type cast. and strips away the type cast and continues. > > > > > > > > > > In match.pd the typecasts are only added for signed comparisons to > > > > > == > > > > > 0 and != 0 which are then rewritten into comparisons with 1. > > > > > > > > > > As such I only check for 1 and LE and GT, which is what match.pd > > > > > would have rewritten it to. > > > > > > > > > > This fixes the regression but this is not code I 100% understand, > > > > > since I don't really know the semantics of the spaceship operator > > > > > so would appreciate an extra look. > > > > > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, > > > > > x86_64-pc-linux-gnu and no regressions. > > > > > > > > > > Ok for master? > > > > > > > > Please add a testcase. I hope Jakub can review the > > > > spaceship_replacement patch since he's the one familiar with the code. > > > > > > There's already a bunch of testcases that test the various variants: > > > gcc/testsuite/g++.dg/
Re: [PATCH] middle-end: fix de-optimizations with bitclear patterns on signed values
On Tue, Oct 26, 2021 at 03:13:29PM +0200, Richard Biener wrote: > try > auto c = ...; > signed char c2 = c; > return c2 >= ... > then That won't work, at least when using , which is what we with the optimization want to deal with primarily. Because std::partial_ordering etc. aren't implicitly nor explicitly convertible to int or signed char etc. Sure, one could in the testcase define its own std::strong_ordering etc. and define a conversion operator for it... Jakub
Re: [RFC] Don't move cold code out of loop by checking bb count
On Mon, Oct 18, 2021 at 6:29 AM Xionghu Luo wrote: > > > > On 2021/10/15 16:11, Richard Biener wrote: > > On Sat, Oct 9, 2021 at 5:45 AM Xionghu Luo wrote: > >> > >> Hi, > >> > >> On 2021/9/28 20:09, Richard Biener wrote: > >>> On Fri, Sep 24, 2021 at 8:29 AM Xionghu Luo wrote: > > Update the patch to v3, not sure whether you prefer the paste style > and continue to link the previous thread as Segher dislikes this... > > > [PATCH v3] Don't move cold code out of loop by checking bb count > > > Changes: > 1. Handle max_loop in determine_max_movement instead of > outermost_invariant_loop. > 2. Remove unnecessary changes. > 3. Add for_all_locs_in_loop (loop, ref, ref_in_loop_hot_body) in > can_sm_ref_p. > 4. "gsi_next (&bsi);" in move_computations_worker is kept since it caused > infinite loop when implementing v1 and the iteration is missed to be > updated actually. > > v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576488.html > v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579086.html > > There was a patch trying to avoid move cold block out of loop: > > https://gcc.gnu.org/pipermail/gcc/2014-November/215551.html > > Richard suggested to "never hoist anything from a bb with lower execution > frequency to a bb with higher one in LIM invariantness_dom_walker > before_dom_children". > > In gimple LIM analysis, add find_coldest_out_loop to move invariants to > expected target loop, if profile count of the loop bb is colder > than target loop preheader, it won't be hoisted out of loop. > Likely for store motion, if all locations of the REF in loop is cold, > don't do store motion of it. > > SPEC2017 performance evaluation shows 1% performance improvement for > intrate GEOMEAN and no obvious regression for others. Especially, > 500.perlbench_r +7.52% (Perf shows function S_regtry of perlbench is > largely improved.), and 548.exchange2_r+1.98%, 526.blender_r +1.00% > on P8LE. > > gcc/ChangeLog: > > * loop-invariant.c (find_invariants_bb): Check profile count > before motion. > (find_invariants_body): Add argument. > * tree-ssa-loop-im.c (find_coldest_out_loop): New function. > (determine_max_movement): Use find_coldest_out_loop. > (move_computations_worker): Adjust and fix iteration udpate. > (execute_sm_exit): Check pointer validness. > (class ref_in_loop_hot_body): New functor. > (ref_in_loop_hot_body::operator): New. > (can_sm_ref_p): Use for_all_locs_in_loop. > > gcc/testsuite/ChangeLog: > > * gcc.dg/tree-ssa/recip-3.c: Adjust. > * gcc.dg/tree-ssa/ssa-lim-18.c: New test. > * gcc.dg/tree-ssa/ssa-lim-19.c: New test. > * gcc.dg/tree-ssa/ssa-lim-20.c: New test. > --- > gcc/loop-invariant.c | 10 ++-- > gcc/tree-ssa-loop-im.c | 61 -- > gcc/testsuite/gcc.dg/tree-ssa/recip-3.c| 2 +- > gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c | 20 +++ > gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 27 ++ > gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c | 25 + > gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c | 28 ++ > 7 files changed, 165 insertions(+), 8 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c > > diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c > index fca0c2b24be..5c3be7bf0eb 100644 > --- a/gcc/loop-invariant.c > +++ b/gcc/loop-invariant.c > @@ -1183,9 +1183,14 @@ find_invariants_insn (rtx_insn *insn, bool > always_reached, bool always_executed) > call. */ > > static void > -find_invariants_bb (basic_block bb, bool always_reached, bool > always_executed) > +find_invariants_bb (class loop *loop, basic_block bb, bool > always_reached, > + bool always_executed) > { > rtx_insn *insn; > + basic_block preheader = loop_preheader_edge (loop)->src; > + > + if (preheader->count > bb->count) > +return; > > FOR_BB_INSNS (bb, insn) > { > @@ -1214,8 +1219,7 @@ find_invariants_body (class loop *loop, > basic_block *body, > unsigned i; > > for (i = 0; i < loop->num_nodes; i++) > -find_invariants_bb (body[i], > - bitmap_bit_p (always_reached, i), > +find_invariants_bb (loop, body[i],
Re: [PATCH] middle-end: fix de-optimizations with bitclear patterns on signed values
On Tue, 26 Oct 2021, Jakub Jelinek wrote: > On Tue, Oct 26, 2021 at 03:13:29PM +0200, Richard Biener wrote: > > try > > auto c = ...; > > signed char c2 = c; > > return c2 >= ... > > then > > That won't work, at least when using , which is what we with the > optimization want to deal with primarily. > Because std::partial_ordering etc. aren't implicitly nor explicitly > convertible to int or signed char etc. > Sure, one could in the testcase define its own std::strong_ordering etc. > and define a conversion operator for it... So how do we end up with the signed char case in the first place? Is the frontend using a type that's target dependent? Richard.
Re: [PATCH]AArch64 Lower intrinsics shift to GIMPLE when possible.
Tamar Christina writes: > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > Ok for master? > > Thanks, > Tamar > > gcc/ChangeLog: > > * config/aarch64/aarch64-builtins.c > (aarch64_general_gimple_fold_builtin): Add ashl, sshl, ushl, ashr, > ashr_simd, lshr, lshr_simd. > * config/aarch64/aarch64-simd-builtins.def (lshr): Use USHIFTIMM. > * config/aarch64/arm_neon.h (vshr_n_u8, vshr_n_u16, vshr_n_u32, > vshrq_n_u8, vshrq_n_u16, vshrq_n_u32, vshrq_n_u64): Fix type hack. > > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-1.c: New test. > * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-2.c: New test. > * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-3.c: New test. > * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-4.c: New test. > * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-5.c: New test. > * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-6.c: New test. > * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-7.c: New test. > * gcc.target/aarch64/advsimd-intrinsics/vshl-opt-8.c: New test. > * gcc.target/aarch64/signbit-2.c: New test. OK, thanks. For the record, I guess vshl-opt-[1-3].c are now not really testing for optimisations, since the new versions of the tests use the intrinsics directly associated with the output (instead of using vshl, like the original [1-3].c did). I think they're still worthwhile tests though. Who knows what they might find in future. :-) Richard > > --- inline copy of patch --- > > diff --git a/gcc/config/aarch64/aarch64-builtins.c > b/gcc/config/aarch64/aarch64-builtins.c > index > f6b41d9c200d6300dee65ba60ae94488231a8a38..c362b29186cfc0bf0d39c08c314cfd6a99124cb2 > 100644 > --- a/gcc/config/aarch64/aarch64-builtins.c > +++ b/gcc/config/aarch64/aarch64-builtins.c > @@ -2394,6 +2394,54 @@ aarch64_general_gimple_fold_builtin (unsigned int > fcode, gcall *stmt) > 1, args[0]); > gimple_call_set_lhs (new_stmt, gimple_call_lhs (stmt)); > break; > + BUILTIN_VSDQ_I_DI (BINOP, ashl, 3, NONE) > + if (TREE_CODE (args[1]) == INTEGER_CST > + && wi::ltu_p (wi::to_wide (args[1]), element_precision (args[0]))) > + new_stmt = gimple_build_assign (gimple_call_lhs (stmt), > + LSHIFT_EXPR, args[0], args[1]); > + break; > + BUILTIN_VSDQ_I_DI (BINOP, sshl, 0, NONE) > + BUILTIN_VSDQ_I_DI (BINOP_UUS, ushl, 0, NONE) > + { > + tree cst = args[1]; > + tree ctype = TREE_TYPE (cst); > + /* Left shifts can be both scalar or vector, e.g. uint64x1_t is > + treated as a scalar type not a vector one. */ > + if ((cst = uniform_integer_cst_p (cst)) != NULL_TREE) > + { > + wide_int wcst = wi::to_wide (cst); > + tree unit_ty = TREE_TYPE (cst); > + > + wide_int abs_cst = wi::abs (wcst); > + if (wi::geu_p (abs_cst, element_precision (args[0]))) > + break; > + > + if (wi::neg_p (wcst, TYPE_SIGN (ctype))) > + { > + tree final_cst; > + final_cst = wide_int_to_tree (unit_ty, abs_cst); > + if (TREE_CODE (cst) != INTEGER_CST) > + final_cst = build_uniform_cst (ctype, final_cst); > + > + new_stmt = gimple_build_assign (gimple_call_lhs (stmt), > + RSHIFT_EXPR, args[0], > + final_cst); > + } > + else > + new_stmt = gimple_build_assign (gimple_call_lhs (stmt), > + LSHIFT_EXPR, args[0], args[1]); > + } > + } > + break; > + BUILTIN_VDQ_I (SHIFTIMM, ashr, 3, NONE) > + VAR1 (SHIFTIMM, ashr_simd, 0, NONE, di) > + BUILTIN_VDQ_I (USHIFTIMM, lshr, 3, NONE) > + VAR1 (USHIFTIMM, lshr_simd, 0, NONE, di) > + if (TREE_CODE (args[1]) == INTEGER_CST > + && wi::ltu_p (wi::to_wide (args[1]), element_precision (args[0]))) > + new_stmt = gimple_build_assign (gimple_call_lhs (stmt), > + RSHIFT_EXPR, args[0], args[1]); > + break; >BUILTIN_GPF (BINOP, fmulx, 0, ALL) > { > gcc_assert (nargs == 2); > diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def > b/gcc/config/aarch64/aarch64-simd-builtins.def > index > 402453aa9bba5949da43c984c4603196b1efd092..bbe0a4a3c4aea4187e7b1a9f10ab60e79df7b138 > 100644 > --- a/gcc/config/aarch64/aarch64-simd-builtins.def > +++ b/gcc/config/aarch64/aarch64-simd-builtins.def > @@ -409,7 +409,7 @@ > >BUILTIN_VDQ_I (SHIFTIMM, ashr, 3, NONE) >VAR1 (SHIFTIMM, ashr_simd, 0, NONE, di) > - BUILTIN_VDQ_I (SHIFTIMM, lshr, 3, NONE) > + BUILTIN_VDQ_I (USHIFTIMM, lshr, 3, NONE) >VAR1 (USHIFTIMM, lshr_simd, 0, NONE, di) >/* Implemented by aarch64_shr_n. */ >
Re: [PATCH 1/4] Makefile.in: Ensure build CPP is used for build targets
On Tue, Oct 26, 2021 at 2:57 PM Richard Purdie wrote: > > On Tue, 2021-10-26 at 14:55 +0200, Richard Biener wrote: > > On Tue, Oct 26, 2021 at 2:45 PM Richard Purdie via Gcc-patches > > wrote: > > > > > > During cross compiling, CPP is being set to the target compiler even for > > > build targets. As an example, when building a cross compiler targetting > > > mingw, the config.log for libiberty in > > > build.x86_64-pokysdk-mingw32.i586-poky-linux/build-x86_64-linux/libiberty/config.log > > > shows: > > > > > > configure:3786: checking how to run the C preprocessor > > > configure:3856: result: x86_64-pokysdk-mingw32-gcc -E > > > --sysroot=[sysroot]/x86_64-nativesdk-mingw32-pokysdk-mingw32 > > > configure:3876: x86_64-pokysdk-mingw32-gcc -E > > > --sysroot=[sysroot]/x86_64-nativesdk-mingw32-pokysdk-mingw32 conftest.c > > > configure:3876: $? = 0 > > > > > > This is libiberty being built for the build environment, not the target > > > one > > > (i.e. in build-x86_64-linux). As such it should be using the build > > > environment's > > > gcc and not the target one. In the mingw case the system headers are quite > > > different leading to build failures related to not being able to include a > > > process.h file for pem-unix.c. > > > > > > Fix this by using CC_FOR_BUILD instead of CC. Ultimately a CPP_FOR_BUILD > > > could be defined but CC_FOR_BUILD seems at least more correct and is a > > > simpler > > > fix. > > > > Since we're using AC_PROG_CPP to find the preprocessor simply assuming > > that $(CC_FOR_BUILD) -E works looks dangerous? > > Potentially, yes. So the route of adding CPP_FOR_BUILD would be preferred? I would say so, yes. Richard. > > Cheers, > > Richard >
Re: [PATCH] middle-end: fix de-optimizations with bitclear patterns on signed values
On Tue, Oct 26, 2021 at 03:21:55PM +0200, Richard Biener wrote: > On Tue, 26 Oct 2021, Jakub Jelinek wrote: > > > On Tue, Oct 26, 2021 at 03:13:29PM +0200, Richard Biener wrote: > > > try > > > auto c = ...; > > > signed char c2 = c; > > > return c2 >= ... > > > then > > > > That won't work, at least when using , which is what we with the > > optimization want to deal with primarily. > > Because std::partial_ordering etc. aren't implicitly nor explicitly > > convertible to int or signed char etc. > > Sure, one could in the testcase define its own std::strong_ordering etc. > > and define a conversion operator for it... > > So how do we end up with the signed char case in the first place? > Is the frontend using a type that's target dependent? uses explicitly signed char: namespace std { // [cmp.categories], comparison category types namespace __cmp_cat { using type = signed char; enum class _Ord : type { equivalent = 0, less = -1, greater = 1 }; enum class _Ncmp : type { _Unordered = 2 }; ... and __cmp_cat::type is what is used as type of _M_value of std::*_ordering -fsigned-char vs. -funsigned-char make no difference on the testcases on x86, but as mentioned in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94589#c24 some target decisions like load_extend_op uses in fold-const.c can affect it. See https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570714.html Jakub
Re: [PATCH] middle-end: fix de-optimizations with bitclear patterns on signed values
On Tue, 26 Oct 2021, Jakub Jelinek wrote: > On Tue, Oct 26, 2021 at 03:21:55PM +0200, Richard Biener wrote: > > On Tue, 26 Oct 2021, Jakub Jelinek wrote: > > > > > On Tue, Oct 26, 2021 at 03:13:29PM +0200, Richard Biener wrote: > > > > try > > > > auto c = ...; > > > > signed char c2 = c; > > > > return c2 >= ... > > > > then > > > > > > That won't work, at least when using , which is what we with the > > > optimization want to deal with primarily. > > > Because std::partial_ordering etc. aren't implicitly nor explicitly > > > convertible to int or signed char etc. > > > Sure, one could in the testcase define its own std::strong_ordering etc. > > > and define a conversion operator for it... > > > > So how do we end up with the signed char case in the first place? > > Is the frontend using a type that's target dependent? > > uses explicitly signed char: > namespace std > { > // [cmp.categories], comparison category types > namespace __cmp_cat > { > using type = signed char; > enum class _Ord : type { equivalent = 0, less = -1, greater = 1 }; > enum class _Ncmp : type { _Unordered = 2 }; > ... > and __cmp_cat::type is what is used as type of _M_value of std::*_ordering > -fsigned-char vs. -funsigned-char make no difference on the testcases on > x86, but as mentioned in > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94589#c24 > some target decisions like load_extend_op uses in fold-const.c can affect > it. See https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570714.html Eh, I see. So there are alrady testcases covering the issues on the affected targets. So ignore my comment about adding additional testcases. Richard.
Re: [RFC] Partial vectors for s390
Robin Dapp writes: > Hi Richard, > >> We already have code to probe the predicates of the underlying >> define_expands/insns to see whether they support certain constant >> IFN arguments; see e.g. internal_gather_scatter_fn_supported_p. >> We could do something similar here: add an extra operand to the optab, >> and an extra argument to the IFN, that gives a bias amount. >> The PowerPC version would require 0, the System Z version would >> require -1. The vectoriser would probe to see which value >> it should use. >> >> Doing it that way ensures that the gimple is still self-describing. >> It avoids gimple semantics depending on target hooks. > > As I don't have much previous exposure to the vectoriser code, I cobbled > together something pretty ad-hoc (attached). Does this come somehow > close to what you have in mind? Yeah, looks good. > internal_len_load_supported_p should rather be called > internal_len_load_bias_supported_p or so I guess and the part where we > exclude multiple loop_lens is still missing. Since we only support one bias, it might be better to make the internal-fn.c function return the bias as an int (with some marker value for “not supported”), so that the caller doesn't need to probe both values. > Would we also check for a viable bias there and then either accept > multiple lens or not? Yeah, I think so. Thanks, Richard > > Regards > Robin > > commit 2320dbfdfe1477b15a2ac59847d2a52e68de49ab > Author: Robin Dapp > Date: Tue Oct 26 14:36:08 2021 +0200 > > bias1 > > diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c > index 8312d08aab2..bf97d3e471a 100644 > --- a/gcc/internal-fn.c > +++ b/gcc/internal-fn.c > @@ -2696,9 +2696,9 @@ expand_call_mem_ref (tree type, gcall *stmt, int index) > static void > expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab) > { > - class expand_operand ops[3]; > - tree type, lhs, rhs, maskt; > - rtx mem, target, mask; > + class expand_operand ops[4]; > + tree type, lhs, rhs, maskt, biast; > + rtx mem, target, mask, bias; >insn_code icode; > >maskt = gimple_call_arg (stmt, 2); > @@ -2727,7 +2727,18 @@ expand_partial_load_optab_fn (internal_fn, gcall > *stmt, convert_optab optab) >TYPE_UNSIGNED (TREE_TYPE (maskt))); >else > create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt))); > - expand_insn (icode, 3, ops); > + if (optab == len_load_optab) > +{ > + biast = gimple_call_arg (stmt, 3); > + bias = expand_normal (biast); > + create_input_operand (&ops[3], bias, SImode); > +} > + > + if (optab != len_load_optab) > +expand_insn (icode, 3, ops); > + else > +expand_insn (icode, 4, ops); > + >if (!rtx_equal_p (target, ops[0].value)) > emit_move_insn (target, ops[0].value); > } > @@ -2741,9 +2752,9 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, > convert_optab optab) > static void > expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab) > { > - class expand_operand ops[3]; > - tree type, lhs, rhs, maskt; > - rtx mem, reg, mask; > + class expand_operand ops[4]; > + tree type, lhs, rhs, maskt, biast; > + rtx mem, reg, mask, bias; >insn_code icode; > >maskt = gimple_call_arg (stmt, 2); > @@ -2770,7 +2781,17 @@ expand_partial_store_optab_fn (internal_fn, gcall > *stmt, convert_optab optab) >TYPE_UNSIGNED (TREE_TYPE (maskt))); >else > create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt))); > - expand_insn (icode, 3, ops); > + if (optab == len_store_optab) > +{ > + biast = gimple_call_arg (stmt, 4); > + bias = expand_normal (biast); > + create_input_operand (&ops[3], bias, SImode); > +} > + > + if (optab != len_store_optab) > +expand_insn (icode, 3, ops); > + else > +expand_insn (icode, 4, ops); > } > > #define expand_mask_store_optab_fn expand_partial_store_optab_fn > @@ -4154,6 +4175,25 @@ internal_gather_scatter_fn_supported_p (internal_fn > ifn, tree vector_type, > && insn_operand_matches (icode, 3 + output_ops, GEN_INT (scale))); > } > > +bool > +internal_len_load_supported_p (internal_fn ifn, tree load_type, int bias) > +{ > + if (bias > 0 || bias < -1) > +return false; > + > + machine_mode mode = TYPE_MODE (load_type); > + > + optab optab = direct_internal_fn_optab (ifn); > + insn_code icode = direct_optab_handler (optab, mode); > + int output_ops = internal_load_fn_p (ifn) ? 1 : 0; > + > + if (icode != CODE_FOR_nothing > + && insn_operand_matches (icode, 2 + output_ops, GEN_INT (bias))) > +return true; > + > + return false; > +} > + > /* Return true if the target supports IFN_CHECK_{RAW,WAR}_PTRS function IFN > for pointers of type TYPE when the accesses have LENGTH bytes and their > common byte alignment is ALIGN. */ > diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h > index 19d0f849a5a..d0bf9941bcc 100644 > --
Re: [PATCH v2] c++tools: Fix memory leak
On 10/21/21 17:34, Jonathan Wakely wrote: On Thu, 21 Oct 2021 at 20:38, Jason Merrill wrote: On 10/21/21 09:28, Jonathan Wakely wrote: > #else > buffer = xmalloc (stat.st_size); > if (!buffer) > return -errno; > + struct Deleter { void operator()(void* p) const { free(p); } }; > + std::unique_ptr guard; Don't you need to initialize guard from buffer? Oops, yes! Updated patch attached. OK, thanks. Jason
Re: [PATCH v2] c++: P2360R0: Extend init-stmt to allow alias-decl [PR102617]
On 10/21/21 18:27, Marek Polacek wrote: On Thu, Oct 21, 2021 at 04:56:57PM -0400, Jason Merrill wrote: On 10/21/21 16:26, Marek Polacek wrote: The following patch implements C++23 P2360R0. This proposal merely extends init-statement to contain alias-declaration. init-statement is used in if/for/switch. The unsightly duplication of the new code seems to be necessary to handle for ( init-statement condition[opt] ; expression[opt] ) statement as well as for ( init-statement[opt] for-range-declaration : for-range-initializer ) statement It seems like the duplication of the new code is a consequence of the duplication of the old code. I'd think we could remove the duplication by remembering the result of cp_parser_range_based_for_with_init_p and then recursing at the end if it was true. Or check it in cp_parser_for and call cp_parser_init_statement twice. That works well, just had to move the pedwarn too. dg.exp passes, full testing running, OK if it passes? OK. -- >8 -- The following patch implements C++23 P2360R0. This proposal merely extends init-statement to contain alias-declaration. init-statement is used in if/for/switch. It also removes the unsightly duplication of code by calling cp_parser_init_statement twice. PR c++/102617 gcc/cp/ChangeLog: * parser.c (cp_parser_for): Maybe call cp_parser_init_statement twice. Warn about range-based for loops with initializer here. (cp_parser_init_statement): Don't duplicate code. Allow alias-declaration in init-statement. gcc/testsuite/ChangeLog: * g++.dg/cpp23/init-stmt1.C: New test. * g++.dg/cpp23/init-stmt2.C: New test. --- gcc/cp/parser.c | 70 ++--- gcc/testsuite/g++.dg/cpp23/init-stmt1.C | 31 +++ gcc/testsuite/g++.dg/cpp23/init-stmt2.C | 25 + 3 files changed, 95 insertions(+), 31 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp23/init-stmt1.C create mode 100644 gcc/testsuite/g++.dg/cpp23/init-stmt2.C diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index 49d951cfb19..93335c817d7 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -12040,6 +12040,7 @@ cp_parser_handle_directive_omp_attributes (cp_parser *parser, tree *pattrs, init-statement: expression-statement simple-declaration +alias-declaration TM Extension: @@ -13327,6 +13328,23 @@ cp_parser_for (cp_parser *parser, bool ivdep, unsigned short unroll) /* Begin the for-statement. */ scope = begin_for_scope (&init); + /* Maybe parse the optional init-statement in a range-based for loop. */ + if (cp_parser_range_based_for_with_init_p (parser) + /* Checked for diagnostic purposes only. */ + && cp_lexer_next_token_is_not (parser->lexer, CPP_SEMICOLON)) +{ + tree dummy; + cp_parser_init_statement (parser, &dummy); + if (cxx_dialect < cxx20) + { + pedwarn (cp_lexer_peek_token (parser->lexer)->location, + OPT_Wc__20_extensions, + "range-based % loops with initializer only " + "available with %<-std=c++20%> or %<-std=gnu++20%>"); + decl = error_mark_node; + } +} + /* Parse the initialization. */ is_range_for = cp_parser_init_statement (parser, &decl); @@ -13987,12 +14005,13 @@ cp_parser_iteration_statement (cp_parser* parser, bool *if_p, bool ivdep, return statement; } -/* Parse a init-statement or the declarator of a range-based-for. +/* Parse an init-statement or the declarator of a range-based-for. Returns true if a range-based-for declaration is seen. init-statement: expression-statement - simple-declaration */ + simple-declaration + alias-declaration */ static bool cp_parser_init_statement (cp_parser *parser, tree *decl) @@ -14008,40 +14027,29 @@ cp_parser_init_statement (cp_parser *parser, tree *decl) bool is_range_for = false; bool saved_colon_corrects_to_scope_p = parser->colon_corrects_to_scope_p; - /* Try to parse the init-statement. */ - if (cp_parser_range_based_for_with_init_p (parser)) - { - tree dummy; - cp_parser_parse_tentatively (parser); - /* Parse the declaration. */ - cp_parser_simple_declaration (parser, - /*function_definition_allowed_p=*/false, - &dummy); - cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON); - if (!cp_parser_parse_definitely (parser)) - /* That didn't work, try to parse it as an expression-statement. */ - cp_parser_expression_statement (parser, NULL_TREE); - - if (cxx_dialect < cxx20) - { - pedwarn (cp_lexer_peek_token (parser->lexer)->location, - OPT_Wc__20_extensions, - "range-based % loops with initializer only " -
Re: [PATCH 2/2]AArch64: Add better costing for vector constants and operations
Tamar Christina writes: > Hi, > > Following the discussion below here's a revised patch. > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > Ok for master? Looks good functionally, just got some comments about the implementation. > @@ -14006,8 +14007,52 @@ cost_plus: >mode, MULT, 1, speed); >return true; > } > + break; > +case CONST_VECTOR: > + { > + rtx gen_insn = aarch64_simd_make_constant (x, true); > + /* Not a valid const vector. */ > + if (!gen_insn) > + break; > > - /* Fall through. */ > + switch (GET_CODE (gen_insn)) > + { > + case CONST_VECTOR: > + /* Load using MOVI/MVNI. */ > + if (aarch64_simd_valid_immediate (x, NULL)) > + *cost += extra_cost->vect.movi; > + else /* Load using constant pool. */ > + *cost += extra_cost->ldst.load; > + break; > + /* Load using a DUP. */ > + case VEC_DUPLICATE: > + gcc_unreachable (); > + break; > + default: > + *cost += extra_cost->ldst.load; > + break; > + } > + return true; > + } This might be a problem (if it is a problem) with some of the existing cases too, but: is using += rather than = the right behaviour here? It maens that we add our cost on top of whatever the target-independent rtx_costs thought was a good default choice, whereas it looks like these table entries specify the correct full cost. If it's not clear-cut, then I think using = would be better. Also, going back to an earlier part of the thread, I think the “inner” CONST_VECTOR case is now a correct replacement for the “outer” CONST_VECTOR case, meaning we don't need the aarch64_simd_make_constant bits. I.e. I think we can make the top-level case: case CONST_VECTOR: /* Load using MOVI/MVNI. */ if (aarch64_simd_valid_immediate (x, NULL)) *cost = extra_cost->vect.movi; else /* Load using constant pool. */ *cost = extra_cost->ldst.load; break; > +case VEC_CONCAT: > + /* depending on the operation, either DUP or INS. > +For now, keep default costing. */ > + break; > +case VEC_DUPLICATE: > + *cost += extra_cost->vect.dup; > + return true; For this I think we should do: *cost = extra_cost->vect.dup; return false; so that we cost the operand of the vec_duplicate as well. This will have no effect if the operand is a REG, but would affect more complex expressions. > +case VEC_SELECT: > + { Here I think we should recurse on operand 0: rtx op0 = XEXP (x, 0); *cost = rtx_cost (op0, GET_MODE (op0), VEC_SELECT, 0, speed); > + /* cost subreg of 0 as free, otherwise as DUP */ > + rtx op1 = XEXP (x, 1); > + if (vec_series_lowpart_p (mode, GET_MODE (op1), op1)) > + ; > + else if (vec_series_highpart_p (mode, GET_MODE (op1), op1)) > + *cost += extra_cost->vect.dup; > + else > + *cost += extra_cost->vect.extract; > + return true; > + } > default: >break; > } > @@ -20654,9 +20699,12 @@ aarch64_builtin_support_vector_misalignment > (machine_mode mode, > > /* If VALS is a vector constant that can be loaded into a register > using DUP, generate instructions to do so and return an RTX to > - assign to the register. Otherwise return NULL_RTX. */ > + assign to the register. Otherwise return NULL_RTX. > + > + If CHECK then the resulting instruction may not be used in > + codegen but can be used for costing. */ > static rtx > -aarch64_simd_dup_constant (rtx vals) > +aarch64_simd_dup_constant (rtx vals, bool check = false) > { >machine_mode mode = GET_MODE (vals); >machine_mode inner_mode = GET_MODE_INNER (mode); > @@ -20668,7 +20716,8 @@ aarch64_simd_dup_constant (rtx vals) >/* We can load this constant by using DUP and a constant in a > single ARM register. This will be cheaper than a vector > load. */ > - x = copy_to_mode_reg (inner_mode, x); > + if (!check) > +x = copy_to_mode_reg (inner_mode, x); >return gen_vec_duplicate (mode, x); > } > > @@ -20676,9 +20725,12 @@ aarch64_simd_dup_constant (rtx vals) > /* Generate code to load VALS, which is a PARALLEL containing only > constants (for vec_init) or CONST_VECTOR, efficiently into a > register. Returns an RTX to copy into the register, or NULL_RTX > - for a PARALLEL that cannot be converted into a CONST_VECTOR. */ > + for a PARALLEL that cannot be converted into a CONST_VECTOR. > + > + If CHECK then the resulting instruction may not be used in > + codegen but can be used for costing. */ > static rtx > -aarch64_simd_make_constant (rtx vals) > +aarch64_simd_make_constant (rtx vals, bool check = false) > { >machine_mode mode = GET_MODE (vals); >rtx const_dup; > @@ -20710,7 +20762,7 @@ aarch64_simd_make_constant (rtx vals) >
Re: [PATCH] aix: handle 64bit inodes for include directories
Hi everyone, Gentle ping on this patch. Clément From: CHIGOT, CLEMENT Sent: Tuesday, October 12, 2021 10:35 AM To: Jeff Law ; David Malcolm Cc: gcc-patches@gcc.gnu.org ; David Edelsohn Subject: Re: [PATCH] aix: handle 64bit inodes for include directories Hi Jeff, Any update on this patch ? As it's dealing with configure files, I would like to have it merged asap before any conflicts appear. Thanks, Clément
[PATCH] Fix negative integer range for UInteger.
Hello. We should not allow a negative IntegerRange for UInteger options. The negative ranges for the 2 params are default and does not make sense. Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Ready to be installed? Thanks, Martin gcc/ChangeLog: * opt-functions.awk: Add new sanity checking. * optc-gen.awk: Add new argument to integer_range_info. * params.opt: Update 2 params which have negative IntegerRange. --- gcc/opt-functions.awk | 4 +++- gcc/optc-gen.awk | 2 +- gcc/params.opt| 4 ++-- 3 files changed, 6 insertions(+), 4 deletions(-) diff --git a/gcc/opt-functions.awk b/gcc/opt-functions.awk index be9255064d7..9bc85604066 100644 --- a/gcc/opt-functions.awk +++ b/gcc/opt-functions.awk @@ -356,7 +356,7 @@ function search_var_name(name, opt_numbers, opts, flags, n_opts) return "" } -function integer_range_info(range_option, init, option) +function integer_range_info(range_option, init, option, uinteger_used) { if (range_option != "") { ival = init + 0; @@ -364,6 +364,8 @@ function integer_range_info(range_option, init, option) end = nth_arg(1, range_option) + 0; if (init != "" && init != "-1" && (ival < start || ival > end)) print "#error initial value " init " of '" option "' must be in range [" start "," end "]" + if (uinteger_used && start < 0) + print "#error '" option"': negative IntegerRange (" start ", " end ") cannot be combined with UInteger" return start ", " end } else diff --git a/gcc/optc-gen.awk b/gcc/optc-gen.awk index 77e598efd60..ebc1a02fa36 100644 --- a/gcc/optc-gen.awk +++ b/gcc/optc-gen.awk @@ -422,7 +422,7 @@ for (i = 0; i < n_opts; i++) { cl_flags, cl_bit_fields) printf("%s, %s, %s }%s\n", var_ref(opts[i], flags[i]), var_set(flags[i]), integer_range_info(opt_args("IntegerRange", flags[i]), - opt_args("Init", flags[i]), opts[i]), comma) + opt_args("Init", flags[i]), opts[i], flag_set_p("UInteger", flags[i])), comma) # Bump up the informational option index. ++optindex diff --git a/gcc/params.opt b/gcc/params.opt index 393d52bc660..1bad2b51df0 100644 --- a/gcc/params.opt +++ b/gcc/params.opt @@ -398,7 +398,7 @@ Common Joined UInteger Var(param_lim_expensive) Init(20) Param Optimization The minimum cost of an expensive expression in the loop invariant motion. -param=logical-op-non-short-circuit= -Common Joined UInteger Var(param_logical_op_non_short_circuit) Init(-1) IntegerRange(-1, 1) Param +Common Joined UInteger Var(param_logical_op_non_short_circuit) Init(-1) IntegerRange(0, 1) Param True if a non-short-circuit operation is optimal. -param=loop-block-tile-size= @@ -1128,7 +1128,7 @@ Common Joined UInteger Var(param_vect_epilogues_nomask) Init(1) IntegerRange(0, Enable loop epilogue vectorization using smaller vector size. -param=vect-max-peeling-for-alignment= -Common Joined UInteger Var(param_vect_max_peeling_for_alignment) Init(-1) IntegerRange(-1, 64) Param Optimization +Common Joined UInteger Var(param_vect_max_peeling_for_alignment) Init(-1) IntegerRange(0, 64) Param Optimization Maximum number of loop peels to enhance alignment of data references in a loop. -param=vect-max-version-for-alias-checks= -- 2.33.1
[committed] MAINTAINERS: Add myself as a VAX port maintainer
* MAINTAINERS (CPU Port Maintainers): Add myself as a VAX port maintainer. --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) diff --git a/MAINTAINERS b/MAINTAINERS index b22f930583a..901a6305184 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -117,6 +117,7 @@ tilegx port Walter Lee tilepro port Walter Lee v850 port Nick Clifton vax port Matt Thomas +vax port Maciej W. Rozycki visium portEric Botcazou x86-64 portJan Hubicka xstormy16 port Nick Clifton -- 2.11.0
Re: [committed] MAINTAINERS: Add myself as a VAX port maintainer
On 10/26/21 17:26, Maciej W. Rozycki wrote: * MAINTAINERS (CPU Port Maintainers): Add myself as a VAX port maintainer. Congratulations! Please remove your name from Write After Approval: $ make check -k RUNTESTFLAGS="maintainers.exp" ... Running /home/marxin/Programming/gcc/gcc/testsuite/gcc.src/maintainers.exp ... Redundant in write approval: Maciej W. Rozycki FAIL: maintainers-verify.sh Cheers, Martin --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) diff --git a/MAINTAINERS b/MAINTAINERS index b22f930583a..901a6305184 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -117,6 +117,7 @@ tilegx port Walter Lee tilepro port Walter Lee v850 port Nick Clifton vax port Matt Thomas +vax port Maciej W. Rozycki visium port Eric Botcazou x86-64 port Jan Hubicka xstormy16 portNick Clifton
Re: [PATCH] gcc/configure: check for powerpc64le-unknown-freebsd
On Sun, Oct 17, 2021 at 02:28:48AM +0200, Piotr Kubaj wrote: > On 21-10-16 18:57:39, Segher Boessenkool wrote: > > On Sat, Oct 16, 2021 at 04:09:05AM +0200, Piotr Kubaj wrote: > > > Only powerpc64-unknown-freebsd was checked for. > > > > > > Signed-off-by: Piotr Kubaj > > > > Thanks! > > > > I pushed this now, with commit message (including changelog, which is > > required): > > Thank you! > > Since powerpc64le-unknown-freebsd is supported also in 9, 10 and 11, can you > also backport it? Done, for all of those. Cheers, Segher
Re: [PATCH v3 1/1] [ARM] Add support for TLS register based stack protector canary access
On Tue, Oct 26, 2021 at 10:18:36AM +0200, Ard Biesheuvel wrote: > Add support for accessing the stack canary value via the TLS register, > so that multiple threads running in the same address space can use > distinct canary values. This is intended for the Linux kernel running in > SMP mode, where processes entering the kernel are essentially threads > running the same program concurrently: using a global variable for the > canary in that context is problematic because it can never be rotated, > and so the OS is forced to use the same value as long as it remains up. > > Using the TLS register to index the stack canary helps with this, as it > allows each CPU to context switch the TLS register along with the rest > of the process, permitting each process to use its own value for the > stack canary. > > 2021-10-21 Ard Biesheuvel > > * config/arm/arm-opts.h (enum stack_protector_guard): New > * config/arm/arm-protos.h (arm_stack_protect_tls_canary_mem): > New > * config/arm/arm.c (TARGET_STACK_PROTECT_GUARD): Define > (arm_option_override_internal): Handle and put in error checks > for stack protector guard options. > (arm_option_reconfigure_globals): Likewise > (arm_stack_protect_tls_canary_mem): New > (arm_stack_protect_guard): New > * config/arm/arm.md (stack_protect_set): New > (stack_protect_set_tls): Likewise > (stack_protect_test): Likewise > (stack_protect_test_tls): Likewise > * config/arm/arm.opt (-mstack-protector-guard): New > (-mstack-protector-guard-offset): New. > > Signed-off-by: Ard Biesheuvel I can't speak to the specific implementation details here, but this builds for me, and behaves as expected. I get a working kernel[1], and have verified[2] that we have per-task canaries for arm32. :) Yay! Tested-by: Kees Cook Who's best to review and commit this? Qing, is something you're able to review? Thanks! -Kees [1] https://lore.kernel.org/linux-arm-kernel/20211021142516.1843042-1-a...@kernel.org/ [2] https://lore.kernel.org/linux-hardening/2021103826.330653-3-keesc...@chromium.org/ -- Kees Cook
Re: [PATCH v2 COMMITTED] rs6000: Fixes for tests including only
On Mon, Oct 25, 2021 at 05:32:51PM -0500, Segher Boessenkool wrote: > On Mon, Oct 25, 2021 at 03:33:21PM -0500, Paul A. Clarke wrote: > > * config/rs6000/x86intrin.h: Move some included headers to new > > headers; include new immintrin.h instead. > > s/; i/. I/ (And instead of what?) > > > * config/rs6000/immintrin.h: New. > > * config/rs6000/x86gprintrin.h: New. > > (That is a filename worse than our worst mnemonic :-) ) Not my choice. ;-) > > * config/config.gcc (powerpc-*-*): Add new headers to extra_headers. > > powerpc*-*-* > > > --- a/gcc/testsuite/gcc.target/powerpc/pr78102.c > > +++ b/gcc/testsuite/gcc.target/powerpc/pr78102.c > > @@ -1,6 +1,6 @@ > > /* { dg-do compile } */ > > -/* { dg-options "-O2 -mvsx" } */ > > -/* { dg-require-effective-target vsx_hw } */ > > +/* { dg-options "-O2 -mpower8-vector -DNO_WARN_X86_INTRINSICS" } */ > > +/* { dg-require-effective-target p8vector_hw } */ > > Please use -mcpu=power8 instead? (And -mdejagnu-cpu=power8 in > testcases). So, -mdejagnu-cpu=power8 here. > (The changelog should say you added the -D btw). OK > If you run you need *_hw. If you only compile, like here, you want to > use *_ok instead. Yep, my mistake. Fixed. > Okay for trunk with those things tuned up. Thanks! Thanks for the review! This has been committed: -- Tests which only include expect many other include files to be brought in, but not enough are. Try to increase compatibility with x86 headers by: - Create new immintrin.h, including the analogous subset of intrinsics headers available for powerpc. - Create new x86gprintrin.h, serving exclusively as the umbrella for bmiintrin.h and bmi2intrin.h. - Modify x86intrin.h: - Include new immintrin.h. - Remove mmintrin.h, xmmintrin.h, emmintrin.h, now included indirectly from immintrin.h. - Remove bmiintrin.h, bmi2intrin.h, now included indirectly from x86gprintrin.h (which is now included from immintrin.h). Add the new files to gcc/config.gcc. Also, fix up the testcase that provoked PR102719, which requires Power8 vector support. Fixes commit 29fb1e831bf1c25e4574bf2f98a9f534e5c67665. 2021-10-25 Paul A. Clarke gcc PR target/102719 * config/rs6000/x86intrin.h: Move some included headers to new headers. Include new immintrin.h instead of those headers. * config/rs6000/immintrin.h: New. * config/rs6000/x86gprintrin.h: New. * config.gcc (powerpc*-*-*): Add new headers to extra_headers. gcc/testsuite * gcc.target/powerpc/pr78102.c: Fix dg directives to require Power8 vector support. Also, add -DNO_WARN_X86_INTRINSICS. --- gcc/config.gcc | 2 +- gcc/config/rs6000/immintrin.h | 41 ++ gcc/config/rs6000/x86gprintrin.h | 31 gcc/config/rs6000/x86intrin.h | 10 +- gcc/testsuite/gcc.target/powerpc/pr78102.c | 4 +-- 5 files changed, 76 insertions(+), 12 deletions(-) create mode 100644 gcc/config/rs6000/immintrin.h create mode 100644 gcc/config/rs6000/x86gprintrin.h diff --git a/gcc/config.gcc b/gcc/config.gcc index fb1f06f3da89..efd1f42ac234 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -490,7 +490,7 @@ powerpc*-*-*) extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h" extra_headers="${extra_headers} mmintrin.h x86intrin.h" extra_headers="${extra_headers} pmmintrin.h tmmintrin.h smmintrin.h" - extra_headers="${extra_headers} nmmintrin.h" + extra_headers="${extra_headers} nmmintrin.h immintrin.h x86gprintrin.h" extra_headers="${extra_headers} ppu_intrinsics.h spu2vmx.h vec_types.h si2vmx.h" extra_headers="${extra_headers} amo.h" case x$with_cpu in diff --git a/gcc/config/rs6000/immintrin.h b/gcc/config/rs6000/immintrin.h new file mode 100644 index ..647a5ae49b5a --- /dev/null +++ b/gcc/config/rs6000/immintrin.h @@ -0,0 +1,41 @@ +/* Copyright (C) 2021 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +
[PATCH] c++: CTAD within template argument [PR102933]
Here when checking for erroneous occurrences of 'auto' inside a template argument (which is allowed by the concepts TS for class templates), extract_autos_r picks up the CTAD placeholder for X{T{0}} which causes check_auto_in_tmpl_args to reject this valid template argument. This patch fixes this by making extract_autos_r ignore CTAD placeholders. However, it seems we don't need to call check_auto_in_tmpl_args at all outside of the concepts TS since using 'auto' as a type-id is otherwise rejected more generally at parse time. So this patch guards calls to check_auto_in_tmpl_args with flag_concepts_ts instead of flag_concepts. Relatedly, I think the concepts code paths in do_auto_deduction and type_uses_auto are also necessary only for the concepts TS, so this patch also restricts these code paths accordingly. Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk and perhaps 11? PR c++/102933 gcc/cp/ChangeLog: * parser.c (cp_parser_template_id): Call check_auto_in_tmpl_args only for the concepts TS not also for standard concepts. (cp_parser_simple_type_specifier): Adjust diagnostic for using auto in parameter declaration. * pt.c (tsubst_qualified_id): Call check_auto_in_tmpl_args only for the concepts TS not also for standard concepts. (extract_autos_r): Ignore CTAD placeholders. (extract_autos): Use range-based for. (do_auto_deduction): Use extract_autos only for the concepts TS and not also for standard concepts. (type_uses_auto): Likewise with for_each_template_parm. (check_auto_in_tmpl_args): Assert that this function is used only for the concepts tS. Simplify. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/nontype-class50.C: New test. * g++.dg/cpp2a/nontype-class50a.C: New test. --- gcc/cp/parser.c | 4 ++-- gcc/cp/pt.c | 24 +-- gcc/testsuite/g++.dg/cpp2a/nontype-class50.C | 13 ++ gcc/testsuite/g++.dg/cpp2a/nontype-class50a.C | 5 4 files changed, 32 insertions(+), 14 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class50.C create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class50a.C diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index 49d951cfb19..5052f534d40 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -18168,7 +18168,7 @@ cp_parser_template_id (cp_parser *parser, types. We reject them in functions, but if what we have is an identifier, even with none_type we can't conclude it's NOT a type, we have to wait for template substitution. */ - if (flag_concepts && check_auto_in_tmpl_args (templ, arguments)) + if (flag_concepts_ts && check_auto_in_tmpl_args (templ, arguments)) template_id = error_mark_node; /* Build a representation of the specialization. */ else if (identifier_p (templ)) @@ -19505,7 +19505,7 @@ cp_parser_simple_type_specifier (cp_parser* parser, else if (!flag_concepts) pedwarn (token->location, 0, "use of % in parameter declaration " -"only available with %<-fconcepts-ts%>"); +"only available with %<-std=c++20%> or %<-fconcepts%>"); } else type = make_auto (); diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 287cf4ce9d0..3321601e6ff 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -16456,7 +16456,7 @@ tsubst_qualified_id (tree qualified_id, tree args, we want to catch is when we couldn't tell then, and can now, namely when templ prior to substitution was an identifier. */ - if (flag_concepts && check_auto_in_tmpl_args (expr, template_args)) + if (flag_concepts_ts && check_auto_in_tmpl_args (expr, template_args)) return error_mark_node; if (variable_template_p (expr)) @@ -28560,7 +28560,7 @@ static int extract_autos_r (tree t, void *data) { hash_table &hash = *(hash_table*)data; - if (is_auto (t)) + if (is_auto (t) && !template_placeholder_p (t)) { /* All the autos were built with index 0; fix that up now. */ tree *p = hash.find_slot (t, INSERT); @@ -28594,10 +28594,8 @@ extract_autos (tree type) for_each_template_parm (type, extract_autos_r, &hash, &visited, true); tree tree_vec = make_tree_vec (hash.elements()); - for (hash_table::iterator iter = hash.begin(); - iter != hash.end(); ++iter) + for (tree elt : hash) { - tree elt = *iter; unsigned i = TEMPLATE_PARM_IDX (TEMPLATE_TYPE_PARM_INDEX (elt)); TREE_VEC_ELT (tree_vec, i) = build_tree_list (NULL_TREE, TYPE_NAME (elt)); @@ -29837,7 +29835,7 @@ do_auto_deduction (tree type, tree init, tree auto_node, tree parms = build_tree_list (NULL_TREE, type); tree tparms; - if (flag_concepts) + if (flag_concepts_ts) tparms = extract_autos (type); else { @@
[PATCH] c++: quadratic constexpr behavior for left-assoc logical exprs [PR102780]
In the testcase below the two left fold expressions each expand into a logical expression with 1024 terms, for which potential_const_expr_1 takes more than a minute to return true. This is because p_c_e_1 performs trial evaluation of the first operand of a &&/|| in order to determine whether to consider its second operand. And because the expanded expression is left-associated, this trial evaluation causes p_c_e_1 to be quadratic in the number of terms of the expression. This patch fixes this quadratic behavior by making p_c_e_1 recurse only into the first operand of a &&/|| when checking for potentiality. This means we'll consider more non-constant expressions as potentially constant compared to the trial evaluation approach, but that should be harmless as later constexpr evaluation of the expression will deem it non-constant anyway. As a nice bonus, this change also reduces compile time for the libstdc++ testcase 20_util/variant/87619.cc by about 15%, even though our uses right folds instead of left folds. Likewise for the testcase in the PR, for which compile time is reduced by 30%. Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk? PR c++/102780 gcc/cp/ChangeLog: * constexpr.c (potential_constant_expression_1) : Only recurse into the first operand, and ignore the second. gcc/testsuite/ChangeLog: * g++.dg/cpp1z/fold13.C: New test. --- gcc/cp/constexpr.c | 21 +++-- gcc/testsuite/g++.dg/cpp1z/fold13.C | 29 + 2 files changed, 32 insertions(+), 18 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp1z/fold13.C diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c index 6f83d303cdd..9fd2c403afb 100644 --- a/gcc/cp/constexpr.c +++ b/gcc/cp/constexpr.c @@ -8880,28 +8880,13 @@ potential_constant_expression_1 (tree t, bool want_rval, bool strict, bool now, goto binary; } - /* If the first operand is the non-short-circuit constant, look at -the second operand; otherwise we only care about the first one for -potentiality. */ case TRUTH_AND_EXPR: case TRUTH_ANDIF_EXPR: - tmp = boolean_true_node; - goto truth; case TRUTH_OR_EXPR: case TRUTH_ORIF_EXPR: - tmp = boolean_false_node; -truth: - { - tree op = TREE_OPERAND (t, 0); - if (!RECUR (op, rval)) - return false; - if (!processing_template_decl) - op = cxx_eval_outermost_constant_expr (op, true); - if (tree_int_cst_equal (op, tmp)) - return RECUR (TREE_OPERAND (t, 1), rval); - else - return true; - } + /* Only look at the first operand for potentiality since the second +operand may be irrelevant if the first short circuits. */ + return RECUR (TREE_OPERAND (t, 0), rval); case PLUS_EXPR: case MULT_EXPR: diff --git a/gcc/testsuite/g++.dg/cpp1z/fold13.C b/gcc/testsuite/g++.dg/cpp1z/fold13.C new file mode 100644 index 000..b2b8c954be9 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1z/fold13.C @@ -0,0 +1,29 @@ +// { dg-do compile { target c++17 } } +// Verify constexpr evaluation of a large left fold logical expression +// isn't quadratic in the size of the expanded expression. + +template struct S { static constexpr bool value = true; }; + +template struct integer_sequence { }; + +template + using make_integer_sequence +#if __has_builtin(__make_integer_seq) += __make_integer_seq; +#else += integer_sequence; +#endif + +template +constexpr bool f_impl(integer_sequence) { + return (... && S::value); +} + +static_assert(f_impl(make_integer_sequence())); + +template +constexpr bool g_impl(integer_sequence) { + return (... || !S::value); +} + +static_assert(!g_impl(make_integer_sequence())); -- 2.33.1.711.g9d530dc002
Re: [committed] MAINTAINERS: Add myself as a VAX port maintainer
On Tue, 26 Oct 2021, Martin Liška wrote: > Please remove your name from Write After Approval: > > $ make check -k RUNTESTFLAGS="maintainers.exp" > ... > Running /home/marxin/Programming/gcc/gcc/testsuite/gcc.src/maintainers.exp ... > Redundant in write approval: Maciej W. Rozycki > FAIL: maintainers-verify.sh Hmm, that seems like useless policy to me; the "Write After Approval" section used to be an exhaustive, alphabetically sorted list of people with commit rights, and it still is with our sister projects binutils and GDB. By removing entries for people listed elsewhere (which may be across several places anyway) you require one to look for information scattered around the file. And contributors to remember adding themselves back as they step down from maintainer posts. Besides, my e-mail addresses listed are different in the two sections, and that's been deliberate. Also what about people carrying identical full names? I insist on using my middle initial for a reason. It seems like there's been hardly any discussion about this matter around the time this stuff was added with commit bddcac9d1c32 ("[contrib] Add contrib/maintainers-verify.sh"). What was the actual motivation behind that change? Maciej
Re: [PATCH] c++: quadratic constexpr behavior for left-assoc logical exprs [PR102780]
On Tue, Oct 26, 2021 at 01:44:18PM -0400, Patrick Palka via Gcc-patches wrote: > In the testcase below the two left fold expressions each expand into a > logical expression with 1024 terms, for which potential_const_expr_1 > takes more than a minute to return true. This is because p_c_e_1 > performs trial evaluation of the first operand of a &&/|| in order to > determine whether to consider its second operand. And because the > expanded expression is left-associated, this trial evaluation causes > p_c_e_1 to be quadratic in the number of terms of the expression. > > This patch fixes this quadratic behavior by making p_c_e_1 recurse only > into the first operand of a &&/|| when checking for potentiality. This > means we'll consider more non-constant expressions as potentially > constant compared to the trial evaluation approach, but that should be > harmless as later constexpr evaluation of the expression will deem it > non-constant anyway. For very large && or || expressions (but not mixed or with ! etc.) another possibility would be to check if the lhs or rhs have the same code and if so, linearize the trees and recurse only on the leafs (trees with other TREE_CODE) and stop when first expression isn't equal to tmp. For mixed &&/|| or with ! that could still be expensive though. Jakub
[PATCH] powerpc: Remove LINK_OS_EXTRA_SPEC{32, 64} from --with-advance-toolchain
Historically this was added to fill gaps from ld.so.cache on early AT releases. This now are just causing errors and rework. Since AT5.0 the AT's ld.so is using a correctly configured ld.so.cache and sets the DT_INTERP to AT's ld.so. This two factors are sufficient for an AT builded program to get the correct libraries. GCC congured with --with-advance-toolchain has issues building GlibC releases because it adds DT_RUNPATH to ld.so and that's unsupported. 2021-03-11 Lucas A. M. Magalhães gcc/ * config.gcc (powerpc*-*-*): Remove -rpath from --with-advance-toochain --- gcc/config.gcc | 10 -- 1 file changed, 10 deletions(-) diff --git a/gcc/config.gcc b/gcc/config.gcc index fb1f06f3da8..9eba3ece0a9 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -5088,16 +5088,6 @@ case "${target}" in (at="/opt/$with_advance_toolchain" echo "/* Use Advance Toolchain $at */" echo -echo "#undef LINK_OS_EXTRA_SPEC32" -echo "#define LINK_OS_EXTRA_SPEC32" \ - "\"%(link_os_new_dtags)" \ - "-rpath $prefix/lib -rpath $at/lib\"" -echo -echo "#undef LINK_OS_EXTRA_SPEC64" -echo "#define LINK_OS_EXTRA_SPEC64" \ - "\"%(link_os_new_dtags)" \ - "-rpath $prefix/lib64 -rpath $at/lib64\"" -echo echo "#undef LINK_OS_NEW_DTAGS_SPEC" echo "#define LINK_OS_NEW_DTAGS_SPEC" \ "\"--enable-new-dtags\"" -- 2.31.1
Re: [committed] MAINTAINERS: Add myself as a VAX port maintainer
On Tue, Oct 26, 2021 at 06:51:09PM +0100, Maciej W. Rozycki wrote: > On Tue, 26 Oct 2021, Martin Liška wrote: > > > Please remove your name from Write After Approval: > > > > $ make check -k RUNTESTFLAGS="maintainers.exp" > > ... > > Running /home/marxin/Programming/gcc/gcc/testsuite/gcc.src/maintainers.exp > > ... > > Redundant in write approval: Maciej W. Rozycki > > FAIL: maintainers-verify.sh > > Hmm, that seems like useless policy to me; the "Write After Approval" > section used to be an exhaustive, alphabetically sorted list of people > with commit rights, and it still is with our sister projects binutils and > GDB. By removing entries for people listed elsewhere (which may be across > several places anyway) you require one to look for information scattered > around the file. And contributors to remember adding themselves back as > they step down from maintainer posts. > > Besides, my e-mail addresses listed are different in the two sections, > and that's been deliberate. Also what about people carrying identical > full names? I insist on using my middle initial for a reason. > > It seems like there's been hardly any discussion about this matter around > the time this stuff was added with commit bddcac9d1c32 ("[contrib] Add > contrib/maintainers-verify.sh"). What was the actual motivation behind > that change? That was only addition of a script and testcase to verify what has been done in MAINTAINERS since forever. Just look at all the commits to remove redundant entries from Write After Approval, e.g. https://gcc.gnu.org/legacy-ml/gcc-patches/2003-05/msg00366.html All maintainers or reviewers (global or specific) have write after approval rights for areas they don't maintain. Jakub
[committed] [PR102842] LRA: Consider all outputs in generation of matching reloads
The following patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102842 As the patch touches a sensitive LRA code, the patch was bootstrapped tested on x86-64, aarch64, and ppc64. I've committed the patch only in master branch. Later (after some observation), I'll commit it into gcc-10 and gcc-11 branches. commit 8c59f4118357789cfa8df2cf0d3ecb61be7e9041 Author: Vladimir N. Makarov Date: Tue Oct 26 14:03:42 2021 -0400 [PR102842] Consider all outputs in generation of matching reloads Without considering all output insn operands (not only processed before), in rare cases LRA can use the same hard register for different outputs of the insn on different assignment subpasses. The patch fixes the problem. gcc/ChangeLog: PR rtl-optimization/102842 * lra-constraints.c (match_reload): Ignore out in checking values of outs. (curr_insn_transform): Collect outputs before doing reloads of operands. gcc/testsuite/ChangeLog: PR rtl-optimization/102842 * g++.target/arm/pr102842.C: New test. diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c index 8f75125fc2e..0195b4fb9c3 100644 --- a/gcc/lra-constraints.c +++ b/gcc/lra-constraints.c @@ -1102,7 +1102,7 @@ match_reload (signed char out, signed char *ins, signed char *outs, for (i = 0; outs[i] >= 0; i++) { rtx other_out_rtx = *curr_id->operand_loc[outs[i]]; - if (REG_P (other_out_rtx) + if (outs[i] != out && REG_P (other_out_rtx) && (regno_val_use_in (REGNO (in_rtx), other_out_rtx) != NULL_RTX)) { @@ -4382,7 +4382,10 @@ curr_insn_transform (bool check_only_p) } n_outputs = 0; - outputs[0] = -1; + for (i = 0; i < n_operands; i++) +if (curr_static_id->operand[i].type == OP_OUT) + outputs[n_outputs++] = i; + outputs[n_outputs] = -1; for (i = 0; i < n_operands; i++) { int regno; @@ -4457,8 +4460,6 @@ curr_insn_transform (bool check_only_p) lra-lives.c. */ match_reload (i, goal_alt_matched[i], outputs, goal_alt[i], &before, &after, TRUE); - outputs[n_outputs++] = i; - outputs[n_outputs] = -1; } continue; } @@ -4636,14 +4637,6 @@ curr_insn_transform (bool check_only_p) process_alt_operands decides that it is possible. */ gcc_unreachable (); - /* Memorise processed outputs so that output remaining to be processed - can avoid using the same register value (see match_reload). */ - if (curr_static_id->operand[i].type == OP_OUT) - { - outputs[n_outputs++] = i; - outputs[n_outputs] = -1; - } - if (optional_p) { rtx reg = op; diff --git a/gcc/testsuite/g++.target/arm/pr102842.C b/gcc/testsuite/g++.target/arm/pr102842.C new file mode 100644 index 000..a2bac66091a --- /dev/null +++ b/gcc/testsuite/g++.target/arm/pr102842.C @@ -0,0 +1,30 @@ +/* PR rtl-optimization/102842 */ +/* { dg-do compile } */ +/* { dg-options "-fPIC -O2 -fno-omit-frame-pointer -mthumb -march=armv7-a+fp" } */ + +struct Plane { + using T = float; + T *Row(); +}; +using ImageF = Plane; +long long Mirror_x; +struct EnsurePaddingInPlaceRowByRow { + void Process() { +switch (strategy_) { +case kSlow: + float *row = img_.Row(); + long long xsize = x1_; + while (Mirror_x >= xsize) +if (Mirror_x) + Mirror_x = 2 * xsize - 1; + *row = Mirror_x; +} + } + ImageF img_; + unsigned x1_; + enum { kSlow } strategy_; +}; +void FinalizeImageRect() { + EnsurePaddingInPlaceRowByRow ensure_padding; + ensure_padding.Process(); +}
Re: [PATCH] c++: quadratic constexpr behavior for left-assoc logical exprs [PR102780]
On Tue, Oct 26, 2021 at 2:46 PM Jakub Jelinek via Gcc-patches < gcc-patches@gcc.gnu.org> wrote: > On Tue, Oct 26, 2021 at 01:44:18PM -0400, Patrick Palka via Gcc-patches > wrote: > > In the testcase below the two left fold expressions each expand into a > > logical expression with 1024 terms, for which potential_const_expr_1 > > takes more than a minute to return true. This is because p_c_e_1 > > performs trial evaluation of the first operand of a &&/|| in order to > > determine whether to consider its second operand. And because the > > expanded expression is left-associated, this trial evaluation causes > > p_c_e_1 to be quadratic in the number of terms of the expression. > > > > This patch fixes this quadratic behavior by making p_c_e_1 recurse only > > into the first operand of a &&/|| when checking for potentiality. This > > means we'll consider more non-constant expressions as potentially > > constant compared to the trial evaluation approach, but that should be > > harmless as later constexpr evaluation of the expression will deem it > > non-constant anyway. > Fair, especially now that it seems that C++23 is removing the diagnostic for a constexpr function that can't produce a constant expression. And as more functions become constexpr, the trial evaluation gets more expensive to get to the point of failure. I wonder if we want to stop calling cxx_eval_outermost_constant_expression from pot_c_e_1 entirely, only check whether the operand is already constant. For very large && or || expressions (but not mixed or with ! etc.) another > possibility would be to check if the lhs or rhs have the same code and if > so, linearize the trees and recurse only on the leafs (trees with other > TREE_CODE) and stop when first expression isn't equal to tmp. > This would be good if we wanted to keep the evaluation. Jason
Re: [PATCH] middle-end: fix de-optimizations with bitclear patterns on signed values
On Tue, 26 Oct 2021 at 14:36, Jakub Jelinek wrote: > On Tue, Oct 26, 2021 at 03:21:55PM +0200, Richard Biener wrote: > > On Tue, 26 Oct 2021, Jakub Jelinek wrote: > > > > > On Tue, Oct 26, 2021 at 03:13:29PM +0200, Richard Biener wrote: > > > > try > > > > auto c = ...; > > > > signed char c2 = c; > > > > return c2 >= ... > > > > then > > > > > > That won't work, at least when using , which is what we with > the > > > optimization want to deal with primarily. > > > Because std::partial_ordering etc. aren't implicitly nor explicitly > > > convertible to int or signed char etc. > > > Sure, one could in the testcase define its own std::strong_ordering > etc. > > > and define a conversion operator for it... > > > > So how do we end up with the signed char case in the first place? > > Is the frontend using a type that's target dependent? > > uses explicitly signed char: > namespace std > { > // [cmp.categories], comparison category types > namespace __cmp_cat > { > using type = signed char; > enum class _Ord : type { equivalent = 0, less = -1, greater = 1 }; > enum class _Ncmp : type { _Unordered = 2 }; > ... > and __cmp_cat::type is what is used as type of _M_value of std::*_ordering > -fsigned-char vs. -funsigned-char make no difference on the testcases on > x86, but as mentioned in > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94589#c24 > some target decisions like load_extend_op uses in fold-const.c can affect > it. See https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570714.html > > We can change __cmp_cat::type if that would result in better code. I picked signed char because we only need two bits, and preferably have a signed type as it simplifies some things. Would int make more sense? Or int:2 ?
Re: [PATCH] middle-end: fix de-optimizations with bitclear patterns on signed values
On Tue, Oct 26, 2021 at 08:35:40PM +0100, Jonathan Wakely wrote: > We can change __cmp_cat::type if that would result in better code. I picked > signed char because we only need two bits, and preferably have a signed > type as it simplifies some things. Would int make more sense? Or int:2 ? I think signed char is fine (and changing it would be an ABI change), int is unnecessarily large and int:2 would be certainly slower (and harder). Jakub
[PATCH] PR fortran/102956 - PDT KIND and LEN type parameters are mutually exclusive
Dear Fortranners, we were missing a conflict check for PDT KIND and LEN type parameters: F2018: 7.5.3.1 Type parameter definition statement R732 type-param-def-stmt is integer-type-spec, type-param-attr-spec :: type-param-decl-list R734 type-param-attr-spec is KIND or LEN (3) The type-param-attr-spec explicitly specifies whether a type parameter is a kind parameter or a length parameter. Thus the KIND and LEN attributes are mutually exclusive. The attached trivial patch remedies that. Regtested on x86_64-pc-linux-gnu. OK for mainline? Thanks, Harald Fortran: [PDT] KIND and LEN type parameters are mutually exclusive gcc/fortran/ChangeLog: PR fortran/102956 * symbol.c (gfc_check_conflict): Add conflict check for PDT KIND and LEN type parameters. gcc/testsuite/ChangeLog: PR fortran/102956 * gfortran.dg/pdt_32.f03: New test. diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c index 2c4acd5abe1..c77f3f84962 100644 --- a/gcc/fortran/symbol.c +++ b/gcc/fortran/symbol.c @@ -720,6 +720,7 @@ gfc_check_conflict (symbol_attribute *attr, const char *name, locus *where) conf (pdt_len, pointer) conf (pdt_len, dimension) conf (pdt_len, codimension) + conf (pdt_len, pdt_kind) if (attr->access == ACCESS_PRIVATE) { diff --git a/gcc/testsuite/gfortran.dg/pdt_32.f03 b/gcc/testsuite/gfortran.dg/pdt_32.f03 new file mode 100644 index 000..f8d40410828 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/pdt_32.f03 @@ -0,0 +1,17 @@ +! { dg-do compile } +! PR fortran/102956 +! PDT KIND and LEN type parameters are mutually exclusive (F2018:R734) +! +module m + type :: good_pdt (k,l) + integer, kind :: k = 1 + integer, len:: l = 1 + character(kind=k,len=l) :: c + end type good_pdt + + type :: bad_pdt (k,l) ! { dg-error "does not have a component" } + integer, kind, len :: k = 1 ! { dg-error "attribute conflicts with" } + integer, len, kind :: l = 1 ! { dg-error "attribute conflicts with" } + character(kind=k,len=l) :: c ! { dg-error "has not been declared" } + end type bad_pdt +end
Re: [PATCH] middle-end: fix de-optimizations with bitclear patterns on signed values
On Tue, 26 Oct 2021 at 20:40, Jakub Jelinek wrote: > On Tue, Oct 26, 2021 at 08:35:40PM +0100, Jonathan Wakely wrote: > > We can change __cmp_cat::type if that would result in better code. I > picked > > signed char because we only need two bits, and preferably have a signed > > type as it simplifies some things. Would int make more sense? Or int:2 ? > > I think signed char is fine (and changing it would be an ABI change), We haven't committed to a C++20 ABI yet, so changes are possible. If we don't need to change, that's obviously preferable. > int is > unnecessarily large and int:2 would be certainly slower (and harder). > OK, signed char it is then.
Re: [PATCH] PR fortran/102956 - PDT KIND and LEN type parameters are mutually exclusive
On 26.10.21 21:40, Harald Anlauf via Fortran wrote: we were missing a conflict check for PDT KIND and LEN type parameters: F2018: 7.5.3.1 Type parameter definition statement R732 type-param-def-stmt is integer-type-spec, type-param-attr-spec :: type-param-decl-list R734 type-param-attr-spec is KIND or LEN (3) The type-param-attr-spec explicitly specifies whether a type parameter is a kind parameter or a length parameter. Thus the KIND and LEN attributes are mutually exclusive. The attached trivial patch remedies that. Regtested on x86_64-pc-linux-gnu. OK for mainline? LGTM, thanks. Tobias - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
Re: [PING PATCH v4 2/3] rs6000: Support SSE4.1 "round" intrinsics
Patches 1/3 and 3/3 have been committed. This is only a ping for 2/3. On Mon, Oct 18, 2021 at 08:15:11PM -0500, Paul A. Clarke via Gcc-patches wrote: > Suppress exceptions (when specified), by saving, manipulating, and > restoring the FPSCR. Similarly, save, set, and restore the floating-point > rounding mode when required. > > No attempt is made to optimize writing the FPSCR (by checking if the new > value would be the same), other than using lighter weight instructions > when possible. Note that explicit instruction scheduling "barriers" are > added to prevent floating-point computations from being moved before or > after the explicit FPSCR manipulations. (That these are required has > been reported as an issue in GCC: PR102783.) > > The scalar versions naively use the parallel versions to compute the > single scalar result and then construct the remainder of the result. > > Of minor note, the values of _MM_FROUND_TO_NEG_INF and _MM_FROUND_TO_ZERO > are swapped from the corresponding values on x86 so as to match the > corresponding rounding mode values in the Power ISA. > > Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and > convert _mm_ceil* and _mm_floor* into macros. This matches the current > analogous implementations in config/i386/smmintrin.h. > > Function signatures match the analogous functions in config/i386/smmintrin.h. > > Add tests for _mm_round_pd, _mm_round_ps, _mm_round_sd, _mm_round_ss, > modeled after the very similar "floor" and "ceil" tests. > > Include basic tests, plus tests at the boundaries for floating-point > representation, positive and negative, test all of the parameterized > rounding modes as well as the C99 rounding modes and interactions > between the two. > > Exceptions are not explicitly tested. > > 2021-10-18 Paul A. Clarke > > gcc > * config/rs6000/smmintrin.h (_mm_round_pd, _mm_round_ps, > _mm_round_sd, _mm_round_ss, _MM_FROUND_TO_NEAREST_INT, > _MM_FROUND_TO_ZERO, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_NEG_INF, > _MM_FROUND_CUR_DIRECTION, _MM_FROUND_RAISE_EXC, _MM_FROUND_NO_EXC, > _MM_FROUND_NINT, _MM_FROUND_FLOOR, _MM_FROUND_CEIL, _MM_FROUND_TRUNC, > _MM_FROUND_RINT, _MM_FROUND_NEARBYINT): New. > * config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd, > _mm_ceil_ss, _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss): > Convert from function to macro. > > gcc/testsuite > * gcc.target/powerpc/sse4_1-round3.h: New. > * gcc.target/powerpc/sse4_1-roundpd.c: New. > * gcc.target/powerpc/sse4_1-roundps.c: New. > * gcc.target/powerpc/sse4_1-roundsd.c: New. > * gcc.target/powerpc/sse4_1-roundss.c: New. > --- > gcc/config/rs6000/smmintrin.h | 292 ++ > .../gcc.target/powerpc/sse4_1-round3.h| 81 + > .../gcc.target/powerpc/sse4_1-roundpd.c | 143 + > .../gcc.target/powerpc/sse4_1-roundps.c | 98 ++ > .../gcc.target/powerpc/sse4_1-roundsd.c | 256 +++ > .../gcc.target/powerpc/sse4_1-roundss.c | 208 + > 6 files changed, 1014 insertions(+), 64 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h > create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c > > diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h > index 90ce03d22709..6bb03e6e20ac 100644 > --- a/gcc/config/rs6000/smmintrin.h > +++ b/gcc/config/rs6000/smmintrin.h > @@ -42,6 +42,234 @@ > #include > #include > > +/* Rounding mode macros. */ > +#define _MM_FROUND_TO_NEAREST_INT 0x00 > +#define _MM_FROUND_TO_ZERO 0x01 > +#define _MM_FROUND_TO_POS_INF 0x02 > +#define _MM_FROUND_TO_NEG_INF 0x03 > +#define _MM_FROUND_CUR_DIRECTION0x04 > + > +#define _MM_FROUND_NINT \ > + (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC) > +#define _MM_FROUND_FLOOR \ > + (_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC) > +#define _MM_FROUND_CEIL \ > + (_MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC) > +#define _MM_FROUND_TRUNC \ > + (_MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC) > +#define _MM_FROUND_RINT \ > + (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC) > +#define _MM_FROUND_NEARBYINT \ > + (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC) > + > +#define _MM_FROUND_RAISE_EXC0x00 > +#define _MM_FROUND_NO_EXC 0x08 > + > +extern __inline __m128d > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm_round_pd (__m128d __A, int __rounding) > +{ > + __v2df __r; > + union { > +double __fr; > +long long __fpscr; > + } __enables_save, __fpscr_save; > + > + if (__rounding & _
Re: [PATCH] c++: quadratic constexpr behavior for left-assoc logical exprs [PR102780]
On Tue, Oct 26, 2021 at 03:29:02PM -0400, Jason Merrill wrote: > On Tue, Oct 26, 2021 at 2:46 PM Jakub Jelinek via Gcc-patches < > gcc-patches@gcc.gnu.org> wrote: > > > On Tue, Oct 26, 2021 at 01:44:18PM -0400, Patrick Palka via Gcc-patches > > wrote: > > > In the testcase below the two left fold expressions each expand into a > > > logical expression with 1024 terms, for which potential_const_expr_1 > > > takes more than a minute to return true. This is because p_c_e_1 > > > performs trial evaluation of the first operand of a &&/|| in order to > > > determine whether to consider its second operand. And because the > > > expanded expression is left-associated, this trial evaluation causes > > > p_c_e_1 to be quadratic in the number of terms of the expression. > > > > > > This patch fixes this quadratic behavior by making p_c_e_1 recurse only > > > into the first operand of a &&/|| when checking for potentiality. This > > > means we'll consider more non-constant expressions as potentially > > > constant compared to the trial evaluation approach, but that should be > > > harmless as later constexpr evaluation of the expression will deem it > > > non-constant anyway. > > > > Fair, especially now that it seems that C++23 is removing the diagnostic > for a constexpr function that can't produce a constant expression. And as > more functions become constexpr, the trial evaluation gets more expensive > to get to the point of failure. I wonder if we want to stop calling > cxx_eval_outermost_constant_expression from pot_c_e_1 entirely, only check > whether the operand is already constant. That would make p_c_e_1 surely much faster, but on the other side we would stop diagnosing some functions that can't be valid constant expressions no matter with what arguments they are called. Another option would be to do these cxx_eval_outermost_constant_expression calls from p_c_e_1 only in a cheap mode, i.e. with constexpr_ops_limit temporarily lowered to something very small, 100-ish or 1000-ish or so, so it would handle small expressions like before, but would quietly punt on large expressions. Jakub
[PATCH, committed] PR fortran/86551 - ICE on invalid code with select type
Dear Fortranners, as has been validated by others before and checked again, the underlying issue of this PR has been fixed before by an unknown commit. To ensure that it doesn't pop up again, and as suggested in the PR, I've packaged the testcase and committed as obvious. Thanks, Harald commit 0ec53a3df536f83ec72ef25b045768c06c363f86 Author: Harald Anlauf Date: Tue Oct 26 22:22:36 2021 +0200 Fortran: error recovery on invalid code with SELECT TYPE gcc/testsuite/ChangeLog: PR fortran/86551 * gfortran.dg/pr86551.f90: New test to verify that PR86551 remains fixed. diff --git a/gcc/testsuite/gfortran.dg/pr86551.f90 b/gcc/testsuite/gfortran.dg/pr86551.f90 new file mode 100644 index 000..d96e17a1884 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/pr86551.f90 @@ -0,0 +1,12 @@ +! { dg-do compile } +! PR fortran/86551 - ICE on invalid code with select type / end select type + +subroutine b + type :: t1 + end type t1 + class(t1) :: c2 + select type (d => c2) + end select type ! { dg-error "Syntax error" } +end ! { dg-error "END SELECT statement expected" } + +! { dg-prune-output "Unexpected end of file" }
Re: [committed] MAINTAINERS: Add myself as a VAX port maintainer
On 10/26/2021 1:06 PM, Jakub Jelinek via Gcc-patches wrote: On Tue, Oct 26, 2021 at 06:51:09PM +0100, Maciej W. Rozycki wrote: On Tue, 26 Oct 2021, Martin Liška wrote: Please remove your name from Write After Approval: $ make check -k RUNTESTFLAGS="maintainers.exp" ... Running /home/marxin/Programming/gcc/gcc/testsuite/gcc.src/maintainers.exp ... Redundant in write approval: Maciej W. Rozycki FAIL: maintainers-verify.sh Hmm, that seems like useless policy to me; the "Write After Approval" section used to be an exhaustive, alphabetically sorted list of people with commit rights, and it still is with our sister projects binutils and GDB. By removing entries for people listed elsewhere (which may be across several places anyway) you require one to look for information scattered around the file. And contributors to remember adding themselves back as they step down from maintainer posts. Besides, my e-mail addresses listed are different in the two sections, and that's been deliberate. Also what about people carrying identical full names? I insist on using my middle initial for a reason. It seems like there's been hardly any discussion about this matter around the time this stuff was added with commit bddcac9d1c32 ("[contrib] Add contrib/maintainers-verify.sh"). What was the actual motivation behind that change? That was only addition of a script and testcase to verify what has been done in MAINTAINERS since forever. Just look at all the commits to remove redundant entries from Write After Approval, e.g. https://gcc.gnu.org/legacy-ml/gcc-patches/2003-05/msg00366.html All maintainers or reviewers (global or specific) have write after approval rights for areas they don't maintain. I went ahead and fixed Maciej's entries in the obvious way. jeff
Re: [PATCH, v2] c++: Diagnose taking address of an immediate member function [PR102753]
On 10/21/21 07:17, Jakub Jelinek wrote: On Wed, Oct 20, 2021 at 07:16:44PM -0400, Jason Merrill wrote: or an unevaluated operand, or a subexpression of an immediate invocation. Hmm...that suggests that in consteval23.C, bar(foo) should also be OK, Ouch. That opens a big can of worms, see below. because 'foo' is a subexpression of an immediate invocation. We can handle that by... + +bool +in_immediate_context () +{ + return (cp_unevaluated_operand != 0 + || (current_function_decl != NULL_TREE + && DECL_IMMEDIATE_FUNCTION_P (current_function_decl)) + || (current_binding_level->kind == sk_function_parms + && current_binding_level->immediate_fn_ctx_p) + || in_consteval_if_p); +} + /* Return true if a call to FN with number of arguments NARGS is an immediate invocation. */ @@ -9451,6 +9459,12 @@ build_over_call (struct z_candidate *can } /* Default arguments */ + bool save_in_consteval_if_p = in_consteval_if_p; + /* If the call is immediate function invocation, make sure + taking address of immediate functions is allowed in default + arguments. */ + if (immediate_invocation_p (STRIP_TEMPLATE (fn), nargs)) +in_consteval_if_p = true; ...moving this earlier in build_over_call, e.g. shortly after not_really_used: You can also use make_temp_override. make_temp_override can't be used, because in_consteval_if_p is BOOL_BITFIELD under the hood and binding a reference to a bit-field is not valid. In the patch I've used a specialized in_consteval_if_p_temp_override class instead. And it either needs to be restored after all the argument processing, because we call immediate_invocation_p again and because of the in_consteval_if_p override it is then false (the patch calls icip.reset ()), or we'd need to remember the result of the first immediate_invocation_p call in some bool temporary and reuse that later. Updated patch below. This fixes the static_assert (bar (foo) == 42); case newly added to consteval23.C, but unfortunately that isn't enough. Consider incremental patch to the testcase: --- gcc/testsuite/g++.dg/cpp2a/consteval23.C.jj 2021-10-21 11:32:48.570586881 +0200 +++ gcc/testsuite/g++.dg/cpp2a/consteval23.C2021-10-21 12:47:42.281769906 +0200 @@ -2,6 +2,7 @@ // { dg-do compile { target c++20 } } consteval int foo () { return 42; } +constexpr auto baz (int (*fn) ()) { return fn; } consteval int bar (int (*fn) () = foo) @@ -11,3 +12,5 @@ bar (int (*fn) () = foo) static_assert (bar () == 42); static_assert (bar (foo) == 42); +static_assert (bar (&foo) == 42); +static_assert (bar (baz (&foo)) == 42); If even the last two lines are supposed to be valid, then I'm afraid we need to reconsider the errors performed in cp_build_addr_expr_1, because when that is called, we often don't know if we are in a subexpression of immediate invocation or not, whether it is because we are in template, something non-type dependent but other argumeents of the function call are type dependent, or because it is an overloaded function and some overloads are consteval and others are not and we are deep in a middle of some argument parsing and other arguments haven't been parsed at all yet, etc. I'm afraid I don't have a good idea where to move that diagnostic to though, it would need to be done somewhere where we are certain we aren't in a subexpression of immediate invocation. Given statement expressions, even diagnostics after parsing whole statements might not be good enough, e.g. void qux () { static_assert (bar (({ constexpr auto a = 1; foo; })) == 42); } I suppose (a wrapper for) fold_build_cleanup_point_expr would be a possible place to check, since that's called for full-expressions. But if we e.g. diagnose it somewhere after parsing the whole function (and for templates only after instantiating it) plus where we handle non-automatic variable initializers etc., will we handle all spots where we should diagnose it? It better should be done before cp_fold... In whatever spot is right doing something similar to what cp_walk_tree find_immediate_fndecl, does, except probably error out on each case we see (with the right locus) instead of just finding the first match. Note, trying clang++ on godbolt on the consteval23.C testcase with the above ammendments, it diagnoses the = foo default argument, that seems to be a clang bug to me, but if I comment out the static_assert (bar () == 42); test and comment out the default argument bar (int (*fn) () /* = foo */) then it succeeds. It even accepts the above qux with statement expression. This patch is OK. 2021-10-21 Jakub Jelinek PR c++/102753 * cp-tree.h (saved_scope): Document that consteval_if_p member is also set while processing immediate invocation. (in_immediate_context): Declare. * call.c (in_immediate_context): New function. (immediate_invocation_p): Use it. (struct in_cons
Re: [PATCH] c++: quadratic constexpr behavior for left-assoc logical exprs [PR102780]
On Tue, 26 Oct 2021, Jason Merrill wrote: > On Tue, Oct 26, 2021 at 2:46 PM Jakub Jelinek via Gcc-patches > wrote: > On Tue, Oct 26, 2021 at 01:44:18PM -0400, Patrick Palka via Gcc-patches > wrote: > > In the testcase below the two left fold expressions each expand into a > > logical expression with 1024 terms, for which potential_const_expr_1 > > takes more than a minute to return true. This is because p_c_e_1 > > performs trial evaluation of the first operand of a &&/|| in order to > > determine whether to consider its second operand. And because the > > expanded expression is left-associated, this trial evaluation causes > > p_c_e_1 to be quadratic in the number of terms of the expression. > > > > This patch fixes this quadratic behavior by making p_c_e_1 recurse > only > > into the first operand of a &&/|| when checking for potentiality. > This > > means we'll consider more non-constant expressions as potentially > > constant compared to the trial evaluation approach, but that should be > > harmless as later constexpr evaluation of the expression will deem it > > non-constant anyway. > > > Fair, especially now that it seems that C++23 is removing the diagnostic for > a constexpr function that can't produce a constant expression. And as more > functions become constexpr, the trial evaluation gets > more expensive to get to the point of failure. I wonder if we want to stop > calling cxx_eval_outermost_constant_expression from pot_c_e_1 entirely, only > check whether the operand is already constant. The performance impact of the other calls to cxx_eval_outermost_const_expr from p_c_e_1 is probably already mostly mitigated by the constexpr call cache and the fact that we try to evaluate all calls to constexpr functions during cp_fold_function anyway (at least with -O). So trial evaluation of e.g. the condition of a for/while loop that contains an expensive constexpr call shouldn't cause the compiler to perform more work overall IIUC. > > For very large && or || expressions (but not mixed or with ! etc.) > another > possibility would be to check if the lhs or rhs have the same code and > if > so, linearize the trees and recurse only on the leafs (trees with other > TREE_CODE) and stop when first expression isn't equal to tmp. > > > This would be good if we wanted to keep the evaluation. > > Jason > >
Re: [PATCH] c++: quadratic constexpr behavior for left-assoc logical exprs [PR102780]
On Tue, Oct 26, 2021 at 05:07:43PM -0400, Patrick Palka wrote: > The performance impact of the other calls to cxx_eval_outermost_const_expr > from p_c_e_1 is probably already mostly mitigated by the constexpr call > cache and the fact that we try to evaluate all calls to constexpr > functions during cp_fold_function anyway (at least with -O). So trial constexpr function bodies don't go through cp_fold_function (intentionally, so that we don't optimize away UB), the bodies are copied before the trees of the normal copy are folded. Jakub
[PATCH] gcc/Makefile.in: fix bug in gengtype link rule
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. OK for trunk? gcc/ChangeLog: * Makefile.in: Fix syntax for reference to LIBDEPS in gengtype link rule. Signed-off-by: David Malcolm --- gcc/Makefile.in | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 658093c11c0..842e57dff87 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -2964,7 +2964,7 @@ build/gengtype$(build_exeext) : build/gengtype-lex.o build/gengtype-parse.o \ gengtype$(exeext) : gengtype.o gengtype-lex.o gengtype-parse.o \ gengtype-state.o errors.o $(LIBDEPS) +$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ \ - $(filter-out ($LIBDEPS), $^) $(LIBS) + $(filter-out $(LIBDEPS), $^) $(LIBS) # Rule for the generator programs: $(genprog:%=build/gen%$(build_exeext)): build/gen%$(build_exeext): build/gen%.o $(BUILD_LIBDEPS) -- 2.26.3
[PATCH] assert_streq: add newlines to failure message
Adding newlines so that the two strings line up makes string equality failures considerably easier to read. Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. OK for trunk? gcc/ChangeLog: * selftest.c (assert_streq): Add newlines when emitting non-equal non-NULL strings. Signed-off-by: David Malcolm --- gcc/selftest.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/selftest.c b/gcc/selftest.c index 8f1cde0cc19..0db56f3b240 100644 --- a/gcc/selftest.c +++ b/gcc/selftest.c @@ -89,7 +89,7 @@ assert_streq (const location &loc, if (strcmp (val1, val2) == 0) pass (loc, "ASSERT_STREQ"); else - fail_formatted (loc, "ASSERT_STREQ (%s, %s) val1=\"%s\" val2=\"%s\"", + fail_formatted (loc, "ASSERT_STREQ (%s, %s)\n val1=\"%s\"\n val2=\"%s\"\n", desc_val1, desc_val2, val1, val2); } } -- 2.26.3
Go patch committed: Permit compiler directives in parenthesized groups
This patch to the Go frontend permits compiler directives in parenthesized groups. The original compiler directive support was only for //line at the start of a line and for //go: comments before function declarations. When support was added for //go:notinheap for types and //go:embed for variables the code did not adapt to permit spaces before the comment or to permit the comments in var() or type() groups. This change corrects those omissions. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian 4571c4689239eda7567a69b0decea52c5f6ce501 diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE index affba73e4bb..e7ff6705563 100644 --- a/gcc/go/gofrontend/MERGE +++ b/gcc/go/gofrontend/MERGE @@ -1,4 +1,4 @@ -925ace70ac7426c3f8b5c0bfb75aa9601f071de4 +128ea3dce9b8753167f33d0a96bd093a6cbd58b8 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. diff --git a/gcc/go/gofrontend/lex.cc b/gcc/go/gofrontend/lex.cc index dd66c0209a4..d6909b840a8 100644 --- a/gcc/go/gofrontend/lex.cc +++ b/gcc/go/gofrontend/lex.cc @@ -1909,14 +1909,13 @@ Lex::skip_cpp_comment() if (saw_error) return; - // Recognize various magic comments at the start of a line. + // Recognize various magic comments at the start of a line, preceded + // only by spaces or tabs. - if (lineoff != 2) -{ - // Not at the start of the line. (lineoff == 2 because of the - // two characters in "//"). + // "- 2" for the "//" at the start of the comment. + for (const char* psp = this->linebuf_; psp < p - 2; psp++) +if (*psp != ' ' && *psp != '\t') return; -} while (pend > p && (pend[-1] == ' ' || pend[-1] == '\t' diff --git a/gcc/go/gofrontend/lex.h b/gcc/go/gofrontend/lex.h index 75c8429b68f..1c4cc5bd2a1 100644 --- a/gcc/go/gofrontend/lex.h +++ b/gcc/go/gofrontend/lex.h @@ -420,6 +420,12 @@ class Lex std::swap(*embeds, this->embeds_); } + // Clear any go:embed patterns seen so far. This is used for + // erroneous cases. + void + clear_embeds() + { this->embeds_.clear(); } + // Return whether the identifier NAME should be exported. NAME is a // mangled name which includes only ASCII characters. static bool diff --git a/gcc/go/gofrontend/parse.cc b/gcc/go/gofrontend/parse.cc index e43b5f21448..cc197e5eb35 100644 --- a/gcc/go/gofrontend/parse.cc +++ b/gcc/go/gofrontend/parse.cc @@ -1307,46 +1307,14 @@ void Parse::declaration() { const Token* token = this->peek_token(); - - unsigned int pragmas = this->lex_->get_and_clear_pragmas(); - if (pragmas != 0 - && !token->is_keyword(KEYWORD_FUNC) - && !token->is_keyword(KEYWORD_TYPE)) -go_warning_at(token->location(), 0, - "ignoring magic comment before non-function"); - - std::vector* embeds = NULL; - if (this->lex_->has_embeds()) -{ - embeds = new(std::vector); - this->lex_->get_and_clear_embeds(embeds); - - if (!this->gogo_->current_file_imported_embed()) - { - go_error_at(token->location(), - "invalid go:embed: missing import %"); - delete embeds; - embeds = NULL; - } - if (!token->is_keyword(KEYWORD_VAR)) - { - go_error_at(token->location(), "misplaced go:embed directive"); - if (embeds != NULL) - { - delete embeds; - embeds = NULL; - } - } -} - if (token->is_keyword(KEYWORD_CONST)) this->const_decl(); else if (token->is_keyword(KEYWORD_TYPE)) -this->type_decl(pragmas); +this->type_decl(); else if (token->is_keyword(KEYWORD_VAR)) -this->var_decl(embeds); +this->var_decl(); else if (token->is_keyword(KEYWORD_FUNC)) -this->function_decl(pragmas); +this->function_decl(); else { go_error_at(this->location(), "expected declaration"); @@ -1367,8 +1335,7 @@ Parse::declaration_may_start_here() // Decl = P | "(" [ List ] ")" . void -Parse::decl(void (Parse::*pfn)(unsigned int, std::vector*), - unsigned int pragmas, std::vector* embeds) +Parse::decl(void (Parse::*pfn)()) { if (this->peek_token()->is_eof()) { @@ -1378,15 +1345,18 @@ Parse::decl(void (Parse::*pfn)(unsigned int, std::vector*), } if (!this->peek_token()->is_op(OPERATOR_LPAREN)) -(this->*pfn)(pragmas, embeds); +(this->*pfn)(); else { - if (pragmas != 0) - go_warning_at(this->location(), 0, - "ignoring magic % comment before group"); - if (embeds != NULL) + if (this->lex_->get_and_clear_pragmas() != 0) go_error_at(this->location(), - "ignoring % comment before group"); + "ignoring compiler directive before group"); + if (this->lex_->has_embeds()) + { + this->lex_->clear_embeds(); + go_error_at(this->location(), + "ignoring % comment before group"); +
Re: [PATCH] Fix loop split incorrect count and probability
On 2021/10/26 21:05, Jan Hubicka wrote: >>> > >> That said, likely the profile update cannot be done uniformly >> for all blocks of a loop? > > For the loop: > > for (i = 0; i < n; i = inc (i)) > { > if (ga) > ga = do_something (); > } > > to: > > for (i = 0; i < x; i = inc (i)) > { > if (true) > ga = do_something (); > if (!ga) > break; > } > for (; i < n; i = inc (i)) > { > if (false) > ga = do_something (); > } > > If probability of if (ga) being true is p, then you indeed can scale the > first loop by p and second loop by 1-p. > > Imagine that loop has n iterations and it takes m iterations for ga to > become false, then probability of if(ga) is m/n and you get frequencies > with m=n*(m/n) for first loop and n-m=n*(1-n/m) for second loop. > > Because the conditional becomes constant true, one needs to scale up the > basic block guarded by the if (true) up by n/m to compensate for the > change. With that the udpate should be right. > Ideally one can bypass scaling of basic block(s) containing > ga = do_something () since the scaling first scales down to m/n > and then scale sup to m/n. Which may not combine to noop. > Perhaps one wants to have a parameter specifying basic blocks on which > the scaling is performed while duplicating for this? Yes. This is the patch I tried to fix the issue for the case you pasted for loop split. https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576566.html 5< | \ | 6-- 7| | \ | 21 11->3 | | \ | 19 20-- | 12< / | | 3<-11 16-- with the patch, Loop 1's bb {5,6,7,21,20} is scaled to 33% * count, Loop 2's bb {12,16} is scaled to 66% * count, especially, probability of edge 21->19 and 21->20 is fixed to (33%, 67%) instead of (100%, INV). - [local count: 955630225]: + [local count: 315357973]: # i_13 = PHI # prephitmp_12 = PHI if (prephitmp_12 != 0) goto ; [33.00%] else goto ; [67.00%] - [local count: 315357972]: + [local count: 104068130]: _2 = do_something (); ga = _2; - [local count: 955630225]: + [local count: 315357973]: # prephitmp_5 = PHI i_10 = inc (i_13); if (n_7(D) > i_10) goto ; [89.00%] else goto ; [11.00%] [local count: 105119324]: goto ; [100.00%] - [local count: 850510901]: + [local count: 280668596]: if (prephitmp_12 != 0) -goto ; [100.00%] +goto ; [33.00%] else -goto ; [INV] +goto ; [67.00%] - [local count: 850510901]: + [local count: 280668596]: goto ; [100.00%] - [count: 0]: + [local count: 70429947]: # i_23 = PHI # prephitmp_25 = PHI - [local count: 955630225]: + [local count: 640272252]: # i_15 = PHI # prephitmp_16 = PHI i_22 = inc (i_15); if (n_7(D) > i_22) goto ; [89.00%] else goto ; [11.00%] - [local count: 850510901]: + [local count: 569842305]: goto ; [100.00%] > > Honza >> >> Richard. -- Thanks, Xionghu
Re: [RFC] Don't move cold code out of loop by checking bb count
On 2021/10/26 21:20, Richard Biener wrote: > On Mon, Oct 18, 2021 at 6:29 AM Xionghu Luo wrote: >> >> >> >> On 2021/10/15 16:11, Richard Biener wrote: >>> On Sat, Oct 9, 2021 at 5:45 AM Xionghu Luo wrote: Hi, On 2021/9/28 20:09, Richard Biener wrote: > On Fri, Sep 24, 2021 at 8:29 AM Xionghu Luo wrote: >> >> Update the patch to v3, not sure whether you prefer the paste style >> and continue to link the previous thread as Segher dislikes this... >> >> >> [PATCH v3] Don't move cold code out of loop by checking bb count >> >> >> Changes: >> 1. Handle max_loop in determine_max_movement instead of >> outermost_invariant_loop. >> 2. Remove unnecessary changes. >> 3. Add for_all_locs_in_loop (loop, ref, ref_in_loop_hot_body) in >> can_sm_ref_p. >> 4. "gsi_next (&bsi);" in move_computations_worker is kept since it caused >> infinite loop when implementing v1 and the iteration is missed to be >> updated actually. >> >> v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576488.html >> v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579086.html >> >> There was a patch trying to avoid move cold block out of loop: >> >> https://gcc.gnu.org/pipermail/gcc/2014-November/215551.html >> >> Richard suggested to "never hoist anything from a bb with lower execution >> frequency to a bb with higher one in LIM invariantness_dom_walker >> before_dom_children". >> >> In gimple LIM analysis, add find_coldest_out_loop to move invariants to >> expected target loop, if profile count of the loop bb is colder >> than target loop preheader, it won't be hoisted out of loop. >> Likely for store motion, if all locations of the REF in loop is cold, >> don't do store motion of it. >> >> SPEC2017 performance evaluation shows 1% performance improvement for >> intrate GEOMEAN and no obvious regression for others. Especially, >> 500.perlbench_r +7.52% (Perf shows function S_regtry of perlbench is >> largely improved.), and 548.exchange2_r+1.98%, 526.blender_r +1.00% >> on P8LE. >> >> gcc/ChangeLog: >> >> * loop-invariant.c (find_invariants_bb): Check profile count >> before motion. >> (find_invariants_body): Add argument. >> * tree-ssa-loop-im.c (find_coldest_out_loop): New function. >> (determine_max_movement): Use find_coldest_out_loop. >> (move_computations_worker): Adjust and fix iteration udpate. >> (execute_sm_exit): Check pointer validness. >> (class ref_in_loop_hot_body): New functor. >> (ref_in_loop_hot_body::operator): New. >> (can_sm_ref_p): Use for_all_locs_in_loop. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.dg/tree-ssa/recip-3.c: Adjust. >> * gcc.dg/tree-ssa/ssa-lim-18.c: New test. >> * gcc.dg/tree-ssa/ssa-lim-19.c: New test. >> * gcc.dg/tree-ssa/ssa-lim-20.c: New test. >> --- >> gcc/loop-invariant.c | 10 ++-- >> gcc/tree-ssa-loop-im.c | 61 -- >> gcc/testsuite/gcc.dg/tree-ssa/recip-3.c| 2 +- >> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c | 20 +++ >> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 27 ++ >> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c | 25 + >> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c | 28 ++ >> 7 files changed, 165 insertions(+), 8 deletions(-) >> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c >> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c >> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c >> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c >> >> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c >> index fca0c2b24be..5c3be7bf0eb 100644 >> --- a/gcc/loop-invariant.c >> +++ b/gcc/loop-invariant.c >> @@ -1183,9 +1183,14 @@ find_invariants_insn (rtx_insn *insn, bool >> always_reached, bool always_executed) >> call. */ >> >> static void >> -find_invariants_bb (basic_block bb, bool always_reached, bool >> always_executed) >> +find_invariants_bb (class loop *loop, basic_block bb, bool >> always_reached, >> + bool always_executed) >> { >>rtx_insn *insn; >> + basic_block preheader = loop_preheader_edge (loop)->src; >> + >> + if (preheader->count > bb->count) >> +return; >> >>FOR_BB_INSNS (bb, insn) >> { >> @@ -1214,8 +1219,7 @@ find_invariants_body (class loop *loop, >> basic_block *body, >>unsigned i; >> >>for (i = 0; i < loop->num_nodes; i++) >> -find_invariants_bb (body[i], >> - bitmap_bit_p (always_reached,
RISCV: Add zmmul extension
From: Liaoshihua --- gcc/common/config/riscv/riscv-common.c | 3 +++ gcc/config/riscv/riscv-c.c | 2 +- gcc/config/riscv/riscv-opts.h | 3 +++ gcc/config/riscv/riscv.c | 5 - gcc/config/riscv/riscv.md | 30 +- gcc/config/riscv/riscv.opt | 3 +++ 6 files changed, 29 insertions(+), 17 deletions(-) diff --git a/gcc/common/config/riscv/riscv-common.c b/gcc/common/config/riscv/riscv-common.c index 34b74e52a2d..ad3180677be 100644 --- a/gcc/common/config/riscv/riscv-common.c +++ b/gcc/common/config/riscv/riscv-common.c @@ -101,6 +101,7 @@ static const struct riscv_ext_version riscv_ext_version_table[] = {"zifencei", ISA_SPEC_CLASS_20191213, 2, 0}, {"zifencei", ISA_SPEC_CLASS_20190608, 2, 0}, + {"zmmul", ISA_SPEC_CLASS_NONE, 0, 1}, /* Terminate the list. */ {NULL, ISA_SPEC_CLASS_NONE, 0, 0} }; @@ -904,6 +905,8 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] = {"zicsr",&gcc_options::x_riscv_zi_subext, MASK_ZICSR}, {"zifencei", &gcc_options::x_riscv_zi_subext, MASK_ZIFENCEI}, + {"zmmul", &gcc_options::x_riscv_zmmul_subext, MASK_ZMMUL}, + {NULL, NULL, 0} }; diff --git a/gcc/config/riscv/riscv-c.c b/gcc/config/riscv/riscv-c.c index efd4a61ea29..72aa4e389c0 100644 --- a/gcc/config/riscv/riscv-c.c +++ b/gcc/config/riscv/riscv-c.c @@ -47,7 +47,7 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile) if (TARGET_ATOMIC) builtin_define ("__riscv_atomic"); - if (TARGET_MUL) + if (TARGET_MUL || TARGET_ZMMUL) builtin_define ("__riscv_mul"); if (TARGET_DIV) builtin_define ("__riscv_div"); diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h index f4cf6ca4b82..c52b18ebd80 100644 --- a/gcc/config/riscv/riscv-opts.h +++ b/gcc/config/riscv/riscv-opts.h @@ -73,4 +73,7 @@ enum stack_protector_guard { #define TARGET_ZICSR((riscv_zi_subext & MASK_ZICSR) != 0) #define TARGET_ZIFENCEI ((riscv_zi_subext & MASK_ZIFENCEI) != 0) +#define MASK_ZMMUL (1 << 0) +#define TARGET_ZMMUL ((riscv_zmmul_subext & MASK_ZMMUL) != 0) + #endif /* ! GCC_RISCV_OPTS_H */ diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c index 17cdf705c32..4f5cb35e625 100644 --- a/gcc/config/riscv/riscv.c +++ b/gcc/config/riscv/riscv.c @@ -1872,7 +1872,7 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN case MULT: if (float_mode_p) *total = tune_param->fp_mul[mode == DFmode]; - else if (!TARGET_MUL) + else if (!TARGET_MUL && !TARGET_ZMMUL) /* Estimate the cost of a library call. */ *total = COSTS_N_INSNS (speed ? 32 : 6); else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD) @@ -4736,6 +4736,9 @@ riscv_option_override (void) if (flag_pic) g_switch_value = 0; + /* zmmul */ + if (TARGET_ZMMUL && TARGET_MUL) +error ("can not use both the % and the % extension"); /* The presence of the M extension implies that division instructions are present, so include them unless explicitly disabled. */ if (TARGET_MUL && (target_flags_explicit & MASK_DIV) == 0) diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md index c3687d57047..2ee7d801f1a 100644 --- a/gcc/config/riscv/riscv.md +++ b/gcc/config/riscv/riscv.md @@ -600,7 +600,7 @@ [(set (match_operand:SI 0 "register_operand" "=r") (mult:SI (match_operand:SI 1 "register_operand" " r") (match_operand:SI 2 "register_operand" " r")))] - "TARGET_MUL" + "(TARGET_MUL || TARGET_ZMMUL)" { return TARGET_64BIT ? "mulw\t%0,%1,%2" : "mul\t%0,%1,%2"; } [(set_attr "type" "imul") (set_attr "mode" "SI")]) @@ -609,7 +609,7 @@ [(set (match_operand:DI 0 "register_operand" "=r") (mult:DI (match_operand:DI 1 "register_operand" " r") (match_operand:DI 2 "register_operand" " r")))] - "TARGET_MUL && TARGET_64BIT" + "(TARGET_MUL || TARGET_ZMMUL) && TARGET_64BIT" "mul\t%0,%1,%2" [(set_attr "type" "imul") (set_attr "mode" "DI")]) @@ -619,7 +619,7 @@ (sign_extend:DI (mult:SI (match_operand:SI 1 "register_operand" " r") (match_operand:SI 2 "register_operand" " r"] - "TARGET_MUL && TARGET_64BIT" + "(TARGET_MUL || TARGET_ZMMUL) && TARGET_64BIT" "mulw\t%0,%1,%2" [(set_attr "type" "imul") (set_attr "mode" "SI")]) @@ -630,7 +630,7 @@ (match_operator:SI 3 "subreg_lowpart_operator" [(mult:DI (match_operand:DI 1 "register_operand" " r") (match_operand:DI 2 "register_operand" " r"))])))] - "TARGET_MUL && TARGET_64BIT" + "(TARGET_MUL || TARGET_ZMMUL) && TARGET_64BIT" "mulw\t%0,%1,%2" [(set_attr "type" "imul") (set_attr "mode" "SI")]) @@ -648,7 +648,7 @@ [(set (match_operand:TI 0 "register_operand") (mult:TI (any_extend:TI (match_operand:DI 1 "register_operand")) (any_extend:TI (match_operand:
[r12-4726 Regression] FAIL: g++.dg/warn/Wstringop-overflow-4.C -std=gnu++98 (test for excess errors) on Linux/x86_64
On Linux/x86_64, 9a27acc30a34b7854db32eac562306cebac6fa1e is the first bad commit commit 9a27acc30a34b7854db32eac562306cebac6fa1e Author: Martin Sebor Date: Tue Oct 26 14:38:11 2021 -0600 Make full use of context-sensitive ranges in access warnings. caused FAIL: g++.dg/warn/Wstringop-overflow-4.C -std=gnu++98 (test for excess errors) with GCC configured with ../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4726/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap To reproduce: $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/warn/Wstringop-overflow-4.C --target_board='unix{-m32}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/warn/Wstringop-overflow-4.C --target_board='unix{-m32\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/warn/Wstringop-overflow-4.C --target_board='unix{-m64}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/warn/Wstringop-overflow-4.C --target_board='unix{-m64\ -march=cascadelake}'" (Please do not reply to this email, for question about this report, contact me at skpgkp2 at gmail dot com)
[r12-4725 Regression] FAIL: libgomp.c/doacross-1.c (test for excess errors) on Linux/x86_64
On Linux/x86_64, 88b504b7a8c5affb0ffa97990d22af2b199e36ed is the first bad commit commit 88b504b7a8c5affb0ffa97990d22af2b199e36ed Author: Martin Sebor Date: Tue Oct 26 14:34:16 2021 -0600 Detect overflow by atomic functions [PR102453]. caused FAIL: gcc.dg/Warray-bounds-90.c (test for excess errors) FAIL: gcc.dg/Warray-bounds-90.c (test for warnings, line 100) FAIL: gcc.dg/Warray-bounds-90.c (test for warnings, line 103) FAIL: gcc.dg/Warray-bounds-90.c (test for warnings, line 105) FAIL: gcc.dg/Warray-bounds-90.c (test for warnings, line 107) FAIL: gcc.dg/Warray-bounds-90.c (test for warnings, line 110) FAIL: gcc.dg/Warray-bounds-90.c (test for warnings, line 112) FAIL: gcc.dg/Warray-bounds-90.c (test for warnings, line 114) FAIL: gcc.dg/Warray-bounds-90.c (test for warnings, line 121) FAIL: gcc.dg/Warray-bounds-90.c (test for warnings, line 124) FAIL: gcc.dg/Warray-bounds-90.c (test for warnings, line 127) FAIL: gcc.dg/Warray-bounds-90.c (test for warnings, line 129) FAIL: gcc.dg/Warray-bounds-90.c (test for warnings, line 132) FAIL: gcc.dg/Warray-bounds-90.c (test for warnings, line 134) FAIL: gcc.dg/Warray-bounds-90.c (test for warnings, line 136) FAIL: gcc.dg/Warray-bounds-90.c (test for warnings, line 139) FAIL: gcc.dg/Warray-bounds-90.c (test for warnings, line 141) FAIL: gcc.dg/Warray-bounds-90.c (test for warnings, line 144) FAIL: gcc.dg/Warray-bounds-90.c (test for warnings, line 146) FAIL: gcc.dg/Warray-bounds-90.c (test for warnings, line 66) FAIL: gcc.dg/Warray-bounds-90.c (test for warnings, line 69) FAIL: gcc.dg/Warray-bounds-90.c (test for warnings, line 73) FAIL: gcc.dg/Warray-bounds-90.c (test for warnings, line 77) FAIL: gcc.dg/Warray-bounds-90.c (test for warnings, line 85) FAIL: gcc.dg/Warray-bounds-90.c (test for warnings, line 88) FAIL: gcc.dg/Warray-bounds-90.c (test for warnings, line 91) FAIL: gcc.dg/Warray-bounds-90.c (test for warnings, line 93) FAIL: gcc.dg/Warray-bounds-90.c (test for warnings, line 96) FAIL: gcc.dg/Warray-bounds-90.c (test for warnings, line 98) FAIL: libgomp.c/doacross-1.c (test for excess errors) with GCC configured with ../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4725/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap To reproduce: $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/Warray-bounds-90.c --target_board='unix{-m32}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/Warray-bounds-90.c --target_board='unix{-m32\ -march=cascadelake}'" $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c.exp=libgomp.c/doacross-1.c --target_board='unix{-m64}'" $ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check RUNTESTFLAGS="c.exp=libgomp.c/doacross-1.c --target_board='unix{-m64\ -march=cascadelake}'" (Please do not reply to this email, for question about this report, contact me at skpgkp2 at gmail dot com)
Re: [PATCH] Fix negative integer range for UInteger.
On Tue, Oct 26, 2021 at 5:11 PM Martin Liška wrote: > > Hello. > > We should not allow a negative IntegerRange for UInteger options. > The negative ranges for the 2 params are default and does not make sense. > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > > Ready to be installed? OK. (btw, can we use opts_set.param_... now instead of -1?) Thanks, Richard. > Thanks, > Martin > > gcc/ChangeLog: > > * opt-functions.awk: Add new sanity checking. > * optc-gen.awk: Add new argument to integer_range_info. > * params.opt: Update 2 params which have negative IntegerRange. > --- > gcc/opt-functions.awk | 4 +++- > gcc/optc-gen.awk | 2 +- > gcc/params.opt| 4 ++-- > 3 files changed, 6 insertions(+), 4 deletions(-) > > diff --git a/gcc/opt-functions.awk b/gcc/opt-functions.awk > index be9255064d7..9bc85604066 100644 > --- a/gcc/opt-functions.awk > +++ b/gcc/opt-functions.awk > @@ -356,7 +356,7 @@ function search_var_name(name, opt_numbers, opts, flags, > n_opts) > return "" > } > > -function integer_range_info(range_option, init, option) > +function integer_range_info(range_option, init, option, uinteger_used) > { > if (range_option != "") { > ival = init + 0; > @@ -364,6 +364,8 @@ function integer_range_info(range_option, init, option) > end = nth_arg(1, range_option) + 0; > if (init != "" && init != "-1" && (ival < start || ival > end)) > print "#error initial value " init " of '" option "' must be in > range [" start "," end "]" > + if (uinteger_used && start < 0) > + print "#error '" option"': negative IntegerRange (" start ", " end > ") cannot be combined with UInteger" > return start ", " end > } > else > diff --git a/gcc/optc-gen.awk b/gcc/optc-gen.awk > index 77e598efd60..ebc1a02fa36 100644 > --- a/gcc/optc-gen.awk > +++ b/gcc/optc-gen.awk > @@ -422,7 +422,7 @@ for (i = 0; i < n_opts; i++) { >cl_flags, cl_bit_fields) > printf("%s, %s, %s }%s\n", var_ref(opts[i], flags[i]), >var_set(flags[i]), integer_range_info(opt_args("IntegerRange", > flags[i]), > - opt_args("Init", flags[i]), opts[i]), comma) > + opt_args("Init", flags[i]), opts[i], > flag_set_p("UInteger", flags[i])), comma) > > # Bump up the informational option index. > ++optindex > diff --git a/gcc/params.opt b/gcc/params.opt > index 393d52bc660..1bad2b51df0 100644 > --- a/gcc/params.opt > +++ b/gcc/params.opt > @@ -398,7 +398,7 @@ Common Joined UInteger Var(param_lim_expensive) Init(20) > Param Optimization > The minimum cost of an expensive expression in the loop invariant motion. > > -param=logical-op-non-short-circuit= > -Common Joined UInteger Var(param_logical_op_non_short_circuit) Init(-1) > IntegerRange(-1, 1) Param > +Common Joined UInteger Var(param_logical_op_non_short_circuit) Init(-1) > IntegerRange(0, 1) Param > True if a non-short-circuit operation is optimal. > > -param=loop-block-tile-size= > @@ -1128,7 +1128,7 @@ Common Joined UInteger Var(param_vect_epilogues_nomask) > Init(1) IntegerRange(0, > Enable loop epilogue vectorization using smaller vector size. > > -param=vect-max-peeling-for-alignment= > -Common Joined UInteger Var(param_vect_max_peeling_for_alignment) Init(-1) > IntegerRange(-1, 64) Param Optimization > +Common Joined UInteger Var(param_vect_max_peeling_for_alignment) Init(-1) > IntegerRange(0, 64) Param Optimization > Maximum number of loop peels to enhance alignment of data references in a > loop. > > -param=vect-max-version-for-alias-checks= > -- > 2.33.1 >
[PATCH v2 0/4] loop split fix and functions renaming
This patchset is an followed update from [1]. Patch 1 is expecting review comments from Honza[2]; Patch 2 refactors loop_version to remove loopify call and adjust condition generation later than loopify; Patch 3 and Patch 4 are function renamings to help better understanding. [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-October/582600.html [2] https://gcc.gnu.org/pipermail/gcc-patches/2021-October/582607.html Xionghu Luo (4): Fix loop split incorrect count and probability Refactor loop_version Rename loop_version to clone_loop_to_header_edge. Rename duplicate_loop_to_header_edge to duplicate_loop_body_to_header_edge gcc/cfghooks.c| 27 +++ gcc/cfghooks.h| 13 ++- gcc/cfgloopmanip.c| 144 +++--- gcc/cfgloopmanip.h| 17 ++-- gcc/cfgrtl.c | 2 +- gcc/doc/loop.texi | 4 +- gcc/gimple-loop-versioning.cc | 11 +-- gcc/loop-unroll.c | 27 +++ gcc/modulo-sched.c| 6 +- gcc/tree-cfg.c| 2 +- gcc/tree-if-conv.c| 13 +-- gcc/tree-loop-distribution.c | 4 +- gcc/tree-parloops.c | 10 +-- gcc/tree-ssa-loop-ivcanon.c | 4 +- gcc/tree-ssa-loop-manip.c | 31 gcc/tree-ssa-loop-manip.h | 7 +- gcc/tree-ssa-loop-split.c | 39 + gcc/tree-ssa-loop-unswitch.c | 8 +- gcc/tree-vect-loop-manip.c| 5 +- 19 files changed, 159 insertions(+), 215 deletions(-) -- 2.27.0.90.geebb51ba8c