[PATCH] Fix PR82132
Tested on x86_64-unknown-linux-gnu, applied.

Richard.

2018-01-16  Richard Biener

        PR testsuite/82132
        * gcc.dg/vect/vect-tail-nomask-1.c: Copy posix_memalign boiler-plate
        from gcc.dg/torture/pr60092.c.

Index: gcc/testsuite/gcc.dg/vect/vect-tail-nomask-1.c
===
--- gcc/testsuite/gcc.dg/vect/vect-tail-nomask-1.c (revision 256722)
+++ gcc/testsuite/gcc.dg/vect/vect-tail-nomask-1.c (working copy)
@@ -1,5 +1,9 @@
 /* { dg-do run } */
 /* { dg-require-weak "" } */
+/* { dg-skip-if "No undefined weak" { hppa*-*-hpux* && { ! lp64 } } } */
+/* { dg-skip-if "No undefined weak" { nvptx-*-* } } */
+/* { dg-additional-options "-Wl,-undefined,dynamic_lookup" { target *-*-darwin* } } */
+/* { dg-additional-options "-Wl,-flat_namespace" { target *-*-darwin[89]* } } */
 /* { dg-additional-options "--param vect-epilogues-nomask=1 -mavx2" { target avx2_runtime } } */
 
 #define SIZE 1023
Re: [PATCH] Preserve CROSSING_JUMP_P in peephole2 (PR rtl-optimization/83213)
On Mon, 15 Jan 2018, Jakub Jelinek wrote:

> Hi!
>
> On the testcase in the PR (too large and creduce not making sufficient
> progress) we ICE because i386.md:
> ;; Combining simple memory jump instruction
>
> (define_peephole2
>   [(set (match_operand:W 0 "register_operand")
>         (match_operand:W 1 "memory_operand"))
>    (set (pc) (match_dup 0))]
>   "!TARGET_X32
>    && !ix86_indirect_branch_thunk_register
>    && peep2_reg_dead_p (2, operands[0])"
>   [(set (pc) (match_dup 1))])
>
> peephole2 triggers on a CROSSING_JUMP_P jump, but nothing actually
> copies that bit over from the old to the new JUMP_INSN.
>
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?

Ok.

Richard.

> 2018-01-15  Jakub Jelinek
>
>       PR rtl-optimization/83213
>       * recog.c (peep2_attempt): Copy over CROSSING_JUMP_P from peepinsn
>       to last if both are JUMP_INSNs.
>
> --- gcc/recog.c.jj 2018-01-09 08:58:14.594002069 +0100
> +++ gcc/recog.c 2018-01-15 16:37:13.279196178 +0100
> @@ -3446,6 +3446,8 @@ peep2_attempt (basic_block bb, rtx_insn
>    last = emit_insn_after_setloc (attempt,
>                                   peep2_insn_data[i].insn,
>                                   INSN_LOCATION (peepinsn));
> +  if (JUMP_P (peepinsn) && JUMP_P (last))
> +    CROSSING_JUMP_P (last) = CROSSING_JUMP_P (peepinsn);
>    before_try = PREV_INSN (insn);
>    delete_insn_chain (insn, peep2_insn_data[i].insn, false);
>
>
>       Jakub
>
>

--
Richard Biener
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
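For readers who do not have recog.c at hand, the shape of the fix can be shown with a small standalone C sketch. It is only an illustration: struct toy_insn, its fields and toy_copy_crossing_flag are hypothetical stand-ins invented here, not GCC's rtx_insn, JUMP_P or CROSSING_JUMP_P, and the real change is the two lines added to peep2_attempt in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an RTL insn; not GCC's rtx_insn.  */
struct toy_insn
{
  bool is_jump;        /* Models JUMP_P (insn).  */
  bool crossing_jump;  /* Models CROSSING_JUMP_P (insn).  */
};

/* Copy the section-crossing bit from OLD_INSN to NEW_INSN, but only when
   both are jumps.  A replacement that is not a jump has no such bit to
   preserve, so it is left untouched.  */
void
toy_copy_crossing_flag (const struct toy_insn *old_insn,
                        struct toy_insn *new_insn)
{
  assert (old_insn != NULL && new_insn != NULL);
  if (old_insn->is_jump && new_insn->is_jump)
    new_insn->crossing_jump = old_insn->crossing_jump;
}
```

The point of the guard is that a peephole may replace a jump with something that is not a jump (or vice versa), in which case there is nothing sensible to copy.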
Re: [PATCH] Fix PR83435
On Mon, 15 Jan 2018, Szabolcs Nagy wrote:

> On 11/01/18 13:41, Richard Biener wrote:
> > 2018-01-11  Richard Biener
> >
> >     PR tree-optimization/83435
> >     * graphite.c (canonicalize_loop_form): Ignore fake loop exit edges.
> >     * graphite-scop-detection.c (scop_detection::get_sese): Likewise.
> >     * tree-vrp.c (add_assert_info): Drop TREE_OVERFLOW if they appear.
> >
> >     * gcc.dg/graphite/pr83435.c: New testcase.
>
> this test case fails on baremetal targets for me with
>
> xgcc: error: unrecognized command line option '-pthread'

Fixed as follows.

Richard.

2018-01-16  Richard Biener

        * gcc.dg/graphite/pr83435.c: Restrict to target pthread.

Index: gcc/testsuite/gcc.dg/graphite/pr83435.c
===
--- gcc/testsuite/gcc.dg/graphite/pr83435.c (revision 256722)
+++ gcc/testsuite/gcc.dg/graphite/pr83435.c (working copy)
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target pthread } } */
 /* { dg-options "-O -ftree-parallelize-loops=2 -floop-parallelize-all" } */
 
 int yj, ax;

>
> > Index: gcc/testsuite/gcc.dg/graphite/pr83435.c
> > ===
> > --- gcc/testsuite/gcc.dg/graphite/pr83435.c (nonexistent)
> > +++ gcc/testsuite/gcc.dg/graphite/pr83435.c (working copy)
> > @@ -0,0 +1,25 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O -ftree-parallelize-loops=2 -floop-parallelize-all" } */
> > +
> > +int yj, ax;
> > +
> > +void
> > +gf (signed char mp)
> > +{
> > +  int *dh = &yj;
> > +
> > +  for (;;)
> > +    {
> > +      signed char sb;
> > +
> > +      for (sb = 0; sb < 1; sb -= 8)
> > +        {
> > +        }
> > +
> > +      mp &= mp <= sb;
> > +      if (mp == 0)
> > +        dh = &ax;
> > +      mp = 0;
> > +      *dh = 0;
> > +    }
> > +}
> >
>

--
Richard Biener
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: Don't group gather loads (PR83847)
On Mon, Jan 15, 2018 at 4:12 PM, Richard Sandiford wrote: > In the testcase we were trying to group two gather loads, even though > that isn't supported. Fixed by explicitly disallowing grouping of > gathers and scatters. > > This problem didn't show up on SVE because there we convert to > IFN_GATHER_LOAD/IFN_SCATTER_STORE pattern statements, which fail > the can_group_stmts_p check. > > Tested on x86_64-linux-gnu. OK to install? Ok. Richard. > Richard > > > 2018-01-15 Richard Sandiford > > gcc/ > * tree-vect-data-refs.c (vect_analyze_data_ref_accesses): > > gcc/testsuite/ > * gcc.dg/torture/pr83847.c: New test. > > Index: gcc/tree-vect-data-refs.c > === > --- gcc/tree-vect-data-refs.c 2018-01-13 18:02:00.948360274 + > +++ gcc/tree-vect-data-refs.c 2018-01-15 12:22:47.066621712 + > @@ -2923,7 +2923,8 @@ vect_analyze_data_ref_accesses (vec_info >data_reference_p dra = datarefs_copy[i]; >stmt_vec_info stmtinfo_a = vinfo_for_stmt (DR_STMT (dra)); >stmt_vec_info lastinfo = NULL; > - if (! STMT_VINFO_VECTORIZABLE (stmtinfo_a)) > + if (!STMT_VINFO_VECTORIZABLE (stmtinfo_a) > + || STMT_VINFO_GATHER_SCATTER_P (stmtinfo_a)) > { > ++i; > continue; > @@ -2932,7 +2933,8 @@ vect_analyze_data_ref_accesses (vec_info > { > data_reference_p drb = datarefs_copy[i]; > stmt_vec_info stmtinfo_b = vinfo_for_stmt (DR_STMT (drb)); > - if (! STMT_VINFO_VECTORIZABLE (stmtinfo_b)) > + if (!STMT_VINFO_VECTORIZABLE (stmtinfo_b) > + || STMT_VINFO_GATHER_SCATTER_P (stmtinfo_b)) > break; > > /* ??? 
Imperfect sorting (non-compatible types, non-modulo > Index: gcc/testsuite/gcc.dg/torture/pr83847.c > === > --- /dev/null 2018-01-12 06:40:27.684409621 + > +++ gcc/testsuite/gcc.dg/torture/pr83847.c 2018-01-15 12:22:47.064621805 > + > @@ -0,0 +1,32 @@ > +/* { dg-do compile } */ > +/* { dg-additional-options "-march=bdver4" { target i?86-*-* x86_64-*-* } } > */ > + > +typedef struct { > + struct { > +int a; > +int b; > + } c; > +} * d; > +typedef struct { > + unsigned e; > + d f[]; > +} g; > +g h; > +d *k; > +int i(int j) { > + if (j) { > +*k = *h.f; > +return 1; > + } > + return 0; > +} > +int l; > +int m; > +int n; > +d o; > +void p() { > + for (; i(l); l++) { > +n += o->c.a; > +m += o->c.b; > + } > +}
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Mon, Jan 15, 2018 at 5:53 PM, H.J. Lu wrote:
> On Mon, Jan 15, 2018 at 3:38 AM, H.J. Lu wrote:
>> On Mon, Jan 15, 2018 at 12:31 AM, Richard Biener
>> wrote:
>>> On Sun, Jan 14, 2018 at 4:08 PM, H.J. Lu wrote:
>>>> Now my patch set has been checked into trunk. Here is a patch set
>>>> to move struct ix86_frame to machine_function on GCC 7, which is
>>>> needed to backport the patch set to GCC 7:
>>>>
>>>> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01239.html
>>>> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01240.html
>>>> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01241.html
>>>>
>>>> OK for gcc-7-branch?
>>>
>>> Yes, backporting is ok - please watch for possible fallout on trunk and make
>>> sure to adjust the backport accordingly. I plan to do GCC 7.3 RC1 on
>>> Wednesday now with the final release about a week later if no issue shows
>>> up.
>>>
>>
>> Backport is blocked by
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83838
>>
>> There are many test failures due to lack of comdat support in linker on
>> Solaris.
>> I can limit these tests to Linux.
>
> These are testcase issues and shouldn't block backport to GCC 7.

It makes the option using thunks unusable though, right? Can you simply make
them hidden on systems without comdat support? That duplicates them per TU
but at least the feature works. Or those systems should provide the thunks via
libgcc.

I agree we can follow up with a fix for Solaris given lack of a public
testing machine.

Richard.

>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83839
>>
>> Bootstrap failed on Darwin due to lack of ".set" directive in assembler. I
>> uploaded a patch:
>>
>> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43124
>>
>> There is no confirmation on it. Also there may be test failures on Darwin
>> due to difference in assembly output.
>
> I posted a patch for Darwin build:
>
> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01347.html
>
> This needs to be checked into trunk before I can start backport to GCC 7.
>
> --
> H.J.
Re: [PATCH, rs6000] Support for gimple folding of mergeh, mergel intrinsics
On Mon, Jan 15, 2018 at 6:20 PM, Will Schmidt wrote:
> On Mon, 2018-01-15 at 10:24 +, Richard Sandiford wrote:
>> >> +  for (int i = 0; i < midpoint; i++)
>> >> +    {
>> >> +      tree tmp1 = build_int_cst (lhs_type_type, offset + i);
>> >> +      tree tmp2 = build_int_cst (lhs_type_type, offset + n_elts + i);
>> >> +      CONSTRUCTOR_APPEND_ELT (ctor_elts, NULL_TREE, tmp1);
>> >> +      CONSTRUCTOR_APPEND_ELT (ctor_elts, NULL_TREE, tmp2);
>> >> +    }
>> >> +  tree permute = create_tmp_reg_or_ssa_name (lhs_type);
>> >> +  g = gimple_build_assign (permute, build_constructor (lhs_type, ctor_elts));
>> >
>> > I think this is no longer canonical GIMPLE (Richard?)
>>
>> FWIW, although the recent patches added the option of using wider
>> permute vectors if the permute vector is constant, it's still OK to
>> use the original style of permute vectors if that's known to be valid.
>> In this case it is because we know the indices won't wrap, given the
>> size of the input vectors.
>
> Ok.
>
>> > and given it is also a constant you shouldn't emit a CONSTRUCTOR here
>> > but directly construct the appropriate VECTOR_CST. So it looks like
>> > the mergel/h intrinsics interleave the low or high part of two
>> > vectors?
>
> Right, it is an interleaving of the two vectors. The size and contents
> vary depending on the type, and though i briefly considered building up
> a if/else table, this approach was far simpler (and less error prone for
> me) to code up.
> i.e. (int, mergel) (permute) D.2885 = {0, 4, 1, 5};
> (long long, mergel) (permute) D.2876 = {1, 3};

I meant in the loop you could have populated an

  auto_vec elts;
  elts.safe_grow (n_elts * 2);

and use

  permute = build_vector (lhs_type, elts);

to build a VECTOR_CST rather than going through a CONSTRUCTOR and a
separate assignment statement.

Richard.

> Thanks
> -Will
>
>
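As a rough standalone illustration of the index computation discussed above, the following C sketch fills a plain array with the interleaving selector. It is not GCC code: toy_merge_selector and its parameters are hypothetical names, and the plain unsigned array merely stands in for whatever element vector would be handed to the VECTOR_CST builder. The loop body follows the quoted patch (offset + i, then offset + n_elts + i).

```c
#include <assert.h>
#include <stddef.h>

/* Fill OUT with the N_ELTS selector indices that interleave one half of
   two N_ELTS-element vectors.  Indices 0 .. N_ELTS-1 select from the first
   input and N_ELTS .. 2*N_ELTS-1 from the second.  OFFSET picks the half:
   0 for the first half, N_ELTS / 2 for the second.  Which of those
   corresponds to mergeh or mergel depends on the target's element
   numbering and is not modelled here.  */
void
toy_merge_selector (size_t n_elts, size_t offset, unsigned *out)
{
  size_t midpoint = n_elts / 2;
  assert (out != NULL && n_elts % 2 == 0 && offset + midpoint <= n_elts);
  for (size_t i = 0; i < midpoint; i++)
    {
      out[2 * i] = (unsigned) (offset + i);
      out[2 * i + 1] = (unsigned) (offset + n_elts + i);
    }
}
```

With n_elts = 4 and offset = 0 this produces {0, 4, 1, 5}, and with n_elts = 2 and offset = 1 it produces {1, 3}, matching the two permute constants quoted in the mail. Because every index is known up front, nothing in the computation needs a runtime CONSTRUCTOR.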
Re: [PATCH] Bump minimum value for max-sched-ready-insns param to 1 (PR rtl-optimization/86620)
On Mon, Jan 15, 2018 at 11:04 PM, Jakub Jelinek wrote: > Hi! > > This param allows minimum of 0, which doesn't make much sense. > On the i386/pr83620.c test (when used with the =0 value) we ICE > because ix86_adjust_priority which has code to prevent moving of likely > spilled hard regs doesn't have a chance to do anything, since we don't > consider any other insns as ready. > > This patch bumps the minimum to 1, so that there is at least something > considered. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok. Richard. > 2018-01-15 Jakub Jelinek > > PR rtl-optimization/86620 > * params.def (max-sched-ready-insns): Bump minimum value to 1. > > * gcc.dg/pr64935-2.c: Use --param=max-sched-ready-insns=1 > instead of --param=max-sched-ready-insns=0. > * gcc.target/i386/pr83620.c: New test. > * gcc.dg/pr83620.c: New test. > > --- gcc/params.def.jj 2018-01-14 17:16:57.471836055 +0100 > +++ gcc/params.def 2018-01-15 18:53:24.122124325 +0100 > @@ -744,7 +744,7 @@ DEFPARAM (PARAM_MAX_FIELDS_FOR_FIELD_SEN > DEFPARAM(PARAM_MAX_SCHED_READY_INSNS, > "max-sched-ready-insns", > "The maximum number of instructions ready to be issued to be > considered by the scheduler during the first scheduling pass.", > -100, 0, 0) > +100, 1, 0) > > /* This is the maximum number of active local stores RTL DSE will consider. 
> */ > DEFPARAM (PARAM_MAX_DSE_ACTIVE_LOCAL_STORES, > --- gcc/testsuite/gcc.dg/pr64935-2.c.jj 2017-06-19 08:27:46.126467108 +0200 > +++ gcc/testsuite/gcc.dg/pr64935-2.c2018-01-15 18:52:23.987124863 +0100 > @@ -1,6 +1,6 @@ > /* PR rtl-optimization/64935 */ > /* { dg-do compile } */ > -/* { dg-options "-O -fschedule-insns --param=max-sched-ready-insns=0 > -fcompare-debug" } */ > +/* { dg-options "-O -fschedule-insns --param=max-sched-ready-insns=1 > -fcompare-debug" } */ > /* { dg-require-effective-target scheduling } */ > /* { dg-xfail-if "" { powerpc-ibm-aix* } } */ > > --- gcc/testsuite/gcc.target/i386/pr83620.c.jj 2018-01-15 18:53:43.267124153 > +0100 > +++ gcc/testsuite/gcc.target/i386/pr83620.c 2018-01-15 19:17:31.053208498 > +0100 > @@ -0,0 +1,15 @@ > +/* PR rtl-optimization/86620 */ > +/* { dg-do compile { target int128 } } */ > +/* { dg-options "-O2 -flive-range-shrinkage --param=max-sched-ready-insns=1 > -Wno-psabi -mno-avx" } */ > + > +typedef unsigned __int128 V __attribute__ ((vector_size (64))); > + > +V u, v; > + > +V > +foo (char c, short d, int e, long f, __int128 g) > +{ > + f >>= c & 63; > + v = (V){f} == u; > + return e + g + v; > +} > --- gcc/testsuite/gcc.dg/pr83620.c.jj 2018-01-15 19:16:31.953190203 +0100 > +++ gcc/testsuite/gcc.dg/pr83620.c 2018-01-15 19:16:16.499185414 +0100 > @@ -0,0 +1,9 @@ > +/* PR rtl-optimization/86620 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -flive-range-shrinkage --param=max-sched-ready-insns=0" > } */ > +/* { dg-error "minimum value of parameter 'max-sched-ready-insns' is 1" "" { > target *-*-* } 0 } */ > + > +void > +foo (void) > +{ > +} > > Jakub
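The effect of changing the triple from (100, 0, 0) to (100, 1, 0) can be sketched in a few lines of standalone C. This is not the code in params.c: struct toy_param and toy_check_param are hypothetical, and treating a maximum that is not greater than the minimum as "no upper bound" is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one --param, modelled on the
   (default, minimum, maximum) triple at the end of a DEFPARAM entry.  */
struct toy_param
{
  const char *name;
  int default_value;
  int min_value;
  int max_value;  /* Assumption: max_value <= min_value means no upper bound.  */
};

/* Return 0 if VALUE is acceptable for P, -1 if it is below the minimum,
   and 1 if it is above the maximum.  */
int
toy_check_param (const struct toy_param *p, int value)
{
  assert (p != NULL);
  if (value < p->min_value)
    return -1;
  if (p->max_value > p->min_value && value > p->max_value)
    return 1;
  return 0;
}
```

Under this model, --param=max-sched-ready-insns=0 is rejected once the minimum is 1, which is the situation the new gcc.dg/pr83620.c test expects to be diagnosed.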
Re: [PATCH v3, rs6000] Add -mspeculate-indirect-jumps option and implement non-speculating bctr / bctrl
On Tue, Jan 16, 2018 at 12:09 AM, Bill Schmidt wrote: > Hi, > > This patch supercedes v2: > https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01204.html, > and fixes the problems noted in its review. It also adds the test cases from > https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01261.html and adjusts them > according > to the results of the review. > > There is more function to be provided in a future patch: Sibling calls for > all ABIs, > and indirect calls for non-ELFv2 ABIs. I'm getting close on that, but I > think it's > better to keep that separate at this point. > > Bootstrapped and tested on powerpc64-linux-gnu and powerpc64le-linux-gnu with > no > regressions. Is this okay for trunk? Did you consider simply removing the tablejump/casesi support so expansion always expands to a balanced tree? At least if we have any knobs to tune we should probably tweak them away from the indirect jump using variants with -mno-speculate-indirect-jumps, right? Performance optimization, so shouldn't block this patch - I just thought I should probably mention this. Richard. > Thanks, > Bill > > > [gcc] > > 2018-01-15 Bill Schmidt > > * config/rs6000/rs6000.c (rs6000_opt_vars): Add entry for > -mspeculate-indirect-jumps. > * config/rs6000/rs6000.md (*call_indirect_elfv2): Disable > for -mno-speculate-indirect-jumps. > (*call_indirect_elfv2_nospec): New define_insn. > (*call_value_indirect_elfv2): Disable for > -mno-speculate-indirect-jumps. > (*call_value_indirect_elfv2_nospec): New define_insn. > (indirect_jump): Emit different RTL for > -mno-speculate-indirect-jumps. > (*indirect_jump): Disable for > -mno-speculate-indirect-jumps. > (*indirect_jump_nospec): New define_insn. > (tablejump): Emit different RTL for > -mno-speculate-indirect-jumps. > (tablejumpsi): Disable for -mno-speculate-indirect-jumps. > (tablejumpsi_nospec): New define_expand. > (tablejumpdi): Disable for -mno-speculate-indirect-jumps. > (tablejumpdi_nospec): New define_expand. 
> (*tablejump_internal1): Disable for > -mno-speculate-indirect-jumps. > (*tablejump_internal1_nospec): New define_insn. > * config/rs6000/rs6000.opt (mspeculate-indirect-jumps): New > option. > > [gcc/testsuite] > > 2018-01-15 Bill Schmidt > > * gcc.target/powerpc/safe-indirect-jump-1.c: New file. > * gcc.target/powerpc/safe-indirect-jump-2.c: New file. > * gcc.target/powerpc/safe-indirect-jump-3.c: New file. > * gcc.target/powerpc/safe-indirect-jump-4.c: New file. > * gcc.target/powerpc/safe-indirect-jump-5.c: New file. > * gcc.target/powerpc/safe-indirect-jump-6.c: New file. > > > Index: gcc/config/rs6000/rs6000.c > === > --- gcc/config/rs6000/rs6000.c (revision 256364) > +++ gcc/config/rs6000/rs6000.c (working copy) > @@ -36726,6 +36726,9 @@ static struct rs6000_opt_var const rs6000_opt_vars >{ "sched-epilog", > offsetof (struct gcc_options, x_TARGET_SCHED_PROLOG), > offsetof (struct cl_target_option, x_TARGET_SCHED_PROLOG), }, > + { "speculate-indirect-jumps", > +offsetof (struct gcc_options, x_rs6000_speculate_indirect_jumps), > +offsetof (struct cl_target_option, x_rs6000_speculate_indirect_jumps), }, > }; > > /* Inner function to handle attribute((target("..."))) and #pragma GCC target > Index: gcc/config/rs6000/rs6000.md > === > --- gcc/config/rs6000/rs6000.md (revision 256364) > +++ gcc/config/rs6000/rs6000.md (working copy) > @@ -11222,11 +11222,22 @@ > (match_operand 1 "" "g,g")) > (set (reg:P TOC_REGNUM) (unspec:P [(match_operand:P 2 "const_int_operand" > "n,n")] UNSPEC_TOCSLOT)) > (clobber (reg:P LR_REGNO))] > - "DEFAULT_ABI == ABI_ELFv2" > + "DEFAULT_ABI == ABI_ELFv2 && rs6000_speculate_indirect_jumps" >"b%T0l\; 2,%2(1)" >[(set_attr "type" "jmpreg") > (set_attr "length" "8")]) > > +;; Variant with deliberate misprediction. 
> +(define_insn "*call_indirect_elfv2_nospec" > + [(call (mem:SI (match_operand:P 0 "register_operand" "c,*l")) > +(match_operand 1 "" "g,g")) > + (set (reg:P TOC_REGNUM) (unspec:P [(match_operand:P 2 "const_int_operand" > "n,n")] UNSPEC_TOCSLOT)) > + (clobber (reg:P LR_REGNO))] > + "DEFAULT_ABI == ABI_ELFv2 && !rs6000_speculate_indirect_jumps" > + "crset eq\;beq%T0l-\; 2,%2(1)" > + [(set_attr "type" "jmpreg") > + (set_attr "length" "12")]) > + > (define_insn "*call_value_indirect_elfv2" >[(set (match_operand 0 "" "") > (call (mem:SI (match_operand:P 1 "register_operand" "c,*l")) > @@ -11233,11 +11244,22 @@ > (match_operand 2 "" "g,g"))) > (set (reg:P TOC_REGNUM) (unspec:P [(match_operand:P 3 "const_int_operand" > "n,n")] UNSPEC_TOCSLOT)) > (clobber (
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
> On Mon, Jan 15, 2018 at 5:53 PM, H.J. Lu wrote: > > On Mon, Jan 15, 2018 at 3:38 AM, H.J. Lu wrote: > >> On Mon, Jan 15, 2018 at 12:31 AM, Richard Biener > >> wrote: > >>> On Sun, Jan 14, 2018 at 4:08 PM, H.J. Lu wrote: > Now my patch set has been checked into trunk. Here is a patch set > to move struct ix86_frame to machine_function on GCC 7, which is > needed to backport the patch set to GCC 7: > > https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01239.html > https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01240.html > https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01241.html > > OK for gcc-7-branch? > >>> > >>> Yes, backporting is ok - please watch for possible fallout on trunk and > >>> make > >>> sure to adjust the backport accordingly. I plan to do GCC 7.3 RC1 on > >>> Wednesday now with the final release about a week later if no issue shows > >>> up. > >>> > >> > >> Backport is blocked by > >> > >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83838 > >> > >> There are many test failures due to lack of comdat support in linker on > >> Solaris. > >> I can limit these tests to Linux. > > > > These are testcase issues and shouldn't block backport to GCC 7. > > It makes the option using thunks unusable though, right? Can you simply make > them hidden on systems without comdat support? That duplicates them per TU > but at least the feature works. Or those systems should provide the thunks > via > libgcc. > > I agree we can followup with a fix for Solaris given lack of a public > testing machine. My memory is bit dim, but I am convinced I was fixing specific errors for comdats on Solaris, so I think the toolchain supports them in some sort, just is more restrictive/different from GNU implementation. Indeed, i think just producing sorry, unimplemented message is what we should do if we can't support retpoline on given target. Honza > > Richard. 
> > >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83839 > >> > >> Bootstrap failed on Darwin due to lack of ".set" directive in assembler. > >> I > >> uploaded a patch: > >> > >> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43124 > >> > >> There is no confirmation on it. Also there may be test failures on Darwin > >> due to difference in assembly output. > > > > I posted a patch for Darwin build: > > > > https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01347.html > > > > This needs to be checked into trunk before I can start backport to GCC 7. > > > > -- > > H.J.
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Tue, Jan 16, 2018 at 9:34 AM, Jan Hubicka wrote: >> On Mon, Jan 15, 2018 at 5:53 PM, H.J. Lu wrote: >> > On Mon, Jan 15, 2018 at 3:38 AM, H.J. Lu wrote: >> >> On Mon, Jan 15, 2018 at 12:31 AM, Richard Biener >> >> wrote: >> >>> On Sun, Jan 14, 2018 at 4:08 PM, H.J. Lu wrote: >> Now my patch set has been checked into trunk. Here is a patch set >> to move struct ix86_frame to machine_function on GCC 7, which is >> needed to backport the patch set to GCC 7: >> >> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01239.html >> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01240.html >> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01241.html >> >> OK for gcc-7-branch? >> >>> >> >>> Yes, backporting is ok - please watch for possible fallout on trunk and >> >>> make >> >>> sure to adjust the backport accordingly. I plan to do GCC 7.3 RC1 on >> >>> Wednesday now with the final release about a week later if no issue shows >> >>> up. >> >>> >> >> >> >> Backport is blocked by >> >> >> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83838 >> >> >> >> There are many test failures due to lack of comdat support in linker on >> >> Solaris. >> >> I can limit these tests to Linux. >> > >> > These are testcase issues and shouldn't block backport to GCC 7. >> >> It makes the option using thunks unusable though, right? Can you simply make >> them hidden on systems without comdat support? That duplicates them per TU >> but at least the feature works. Or those systems should provide the thunks >> via >> libgcc. >> >> I agree we can followup with a fix for Solaris given lack of a public >> testing machine. > > My memory is bit dim, but I am convinced I was fixing specific errors for > comdats > on Solaris, so I think the toolchain supports them in some sort, just is more > restrictive/different from GNU implementation. > > Indeed, i think just producing sorry, unimplemented message is what we should > do > if we can't support retpoline on given target. 
I'm quite sure Solaris supports comdats, after all it invented ELF ;) I've also seen comdats in debugging early LTO issues. We might run into Solaris as issues though. Richard. > Honza >> >> Richard. >> >> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83839 >> >> >> >> Bootstrap failed on Darwin due to lack of ".set" directive in assembler. >> >> I >> >> uploaded a patch: >> >> >> >> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43124 >> >> >> >> There is no confirmation on it. Also there may be test failures on >> >> Darwin >> >> due to difference in assembly output. >> > >> > I posted a patch for Darwin build: >> > >> > https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01347.html >> > >> > This needs to be checked into trunk before I can start backport to GCC 7. >> > >> > -- >> > H.J.
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
> On Tue, Jan 16, 2018 at 9:34 AM, Jan Hubicka wrote: > >> On Mon, Jan 15, 2018 at 5:53 PM, H.J. Lu wrote: > >> > On Mon, Jan 15, 2018 at 3:38 AM, H.J. Lu wrote: > >> >> On Mon, Jan 15, 2018 at 12:31 AM, Richard Biener > >> >> wrote: > >> >>> On Sun, Jan 14, 2018 at 4:08 PM, H.J. Lu wrote: > >> Now my patch set has been checked into trunk. Here is a patch set > >> to move struct ix86_frame to machine_function on GCC 7, which is > >> needed to backport the patch set to GCC 7: > >> > >> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01239.html > >> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01240.html > >> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01241.html > >> > >> OK for gcc-7-branch? > >> >>> > >> >>> Yes, backporting is ok - please watch for possible fallout on trunk > >> >>> and make > >> >>> sure to adjust the backport accordingly. I plan to do GCC 7.3 RC1 on > >> >>> Wednesday now with the final release about a week later if no issue > >> >>> shows > >> >>> up. > >> >>> > >> >> > >> >> Backport is blocked by > >> >> > >> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83838 > >> >> > >> >> There are many test failures due to lack of comdat support in linker on > >> >> Solaris. > >> >> I can limit these tests to Linux. > >> > > >> > These are testcase issues and shouldn't block backport to GCC 7. > >> > >> It makes the option using thunks unusable though, right? Can you simply > >> make > >> them hidden on systems without comdat support? That duplicates them per TU > >> but at least the feature works. Or those systems should provide the > >> thunks via > >> libgcc. > >> > >> I agree we can followup with a fix for Solaris given lack of a public > >> testing machine. > > > > My memory is bit dim, but I am convinced I was fixing specific errors for > > comdats > > on Solaris, so I think the toolchain supports them in some sort, just is > > more > > restrictive/different from GNU implementation. 
> > > > Indeed, i think just producing sorry, unimplemented message is what we > > should do > > if we can't support retpoline on given target. > > I'm quite sure Solaris supports comdats, after all it invented ELF ;) > I've also seen > comdats in debugging early LTO issues. We might run into Solaris as > issues though. :) My recollection is that the thunks in a comdat group needs to come in specific order after the entry symbol. Probably after - at some point I tried to move the before (for better code layout) and needed to retreat. Honza
[PATCH][WWWDOCS][AArch64][ARM] Update GCC 8 release notes
Hi All, This patch updates the GCC 8 release notes for ARM and AArch64. Ok for cvs? Thanks, Tamar -- Index: htdocs/gcc-8/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-8/changes.html,v retrieving revision 1.26 diff -u -r1.26 changes.html --- htdocs/gcc-8/changes.html 11 Jan 2018 09:31:53 - 1.26 +++ htdocs/gcc-8/changes.html 11 Jan 2018 15:47:15 - @@ -147,7 +147,51 @@ AArch64 - + +The Armv8.4-A architecture is now supported. It can be used by +specifying the -march=armv8.4-a option. + + +The Dot Product instructions are now supported as an optional extension to the +Armv8.2-A architecture and newer and are mandatory on Armv8.4-A. The extension can be used by +specifying the +dotprod architecture extension. E.g. -march=armv8.2-a+dotprod. + + +The Armv8-A +crypto extension has now been split into two extensions for finer grained control: + + +aes which contains the Armv8-A AES crytographic instructions. + +sha2 which contains the Armv8-A SHA2 and SHA1 cryptographic instructions. + +Using +crypto will now enable these two extensions. + + +New Armv8.4-A FP16 Floating Point Multiplication Variant instructions have been added. These instructions are +mandatory in Armv8.4-A but available as an optional extension to Armv8.2-A and Armv8.3-A. The new extension +can be used by specifying the +fp16fml architectural extension on Armv8.2-A and Armv8.3-A. On Armv8.4-A +the instructions can be enabled by specifying +fp16. + + +New cryptographic instructions have been added as optional extensions to Armv8.2-A and newer. These instructions can +be enabled with: + + +sha3 New SHA3 and SHA2 instructions from Armv8.4-A. This implies +sha2. + +sm4 New SM3 and SM4 instructions from Armv8.4-A. + + + + Support has been added for the following processors + (GCC identifiers in parentheses): + + Arm Cortex-A75 (cortex-a75). + Arm Cortex-A55 (cortex-a55). + Arm Cortex-A55/Cortex-A75 DynamIQ big.LITTLE (cortex-a75.cortex-a55). 
+ + The GCC identifiers can be used + as arguments to the -mcpu or -mtune options, + for example: -mcpu=cortex-a75 or + -mtune=thunderx2t99p1 or as arguments to the equivalent target + attributes and pragmas. + ARM @@ -169,14 +213,58 @@ removed in a future release. -The default link behavior for ARMv6 and ARMv7-R targets has been +The default link behavior for Armv6 and Armv7-R targets has been changed to produce BE8 format when generating big-endian images. A new flag -mbe32 can be used to force the linker to produce legacy BE32 format images. There is no change of behavior for -ARMv6-m and other ARMv7 or later targets: these already defaulted +Armv6-M and other Armv7 or later targets: these already defaulted to BE8 format. This change brings GCC into alignment with other compilers for the ARM architecture. + +The Armv8-R architecture is now supported. It can be used by specifying the +-march=armv8-r option. + + +The Armv8.3-A architecture is now supported. It can be used by +specifying the -march=armv8.3-a option. + + +The Armv8.4-A architecture is now supported. It can be used by +specifying the -march=armv8.4-a option. + + + The Dot Product instructions are now supported as an optional extension to the + Armv8.2-A architecture and newer and are mandatory on Armv8.4-A. The extension can be used by + specifying the +dotprod architecture extension. E.g. -march=armv8.2-a+dotprod. + + + +Support for setting extensions and architectures using the GCC target pragma and attribute has been added. +It can be used by specifying #pragma GCC target ("arch=..."), #pragma GCC target ("+extension"), +__attribute__((target("arch=..."))) or __attribute__((target("+extension"))). + + +New Armv8.4-A FP16 Floating Point Multiplication Variant instructions have been added. These instructions are +mandatory in Armv8.4-A but available as an optional extension to Armv8.2-A and Armv8.3-A. 
The new extension +can be used by specifying the +fp16fml architectural extension on Armv8.2-A and Armv8.3-A. On Armv8.4-A +the instructions can be enabled by specifying +fp16. + + + Support has been added for the following processors + (GCC identifiers in parentheses): + + Arm Cortex-A75 (cortex-a75). + Arm Cortex-A55 (cortex-a55). + Arm Cortex-A55/Cortex-A75 DynamIQ big.LITTLE (cortex-a75.cortex-a55). + Arm Cortex-R52 for Armv8-R (cortex-r52). + + The GCC identifiers can be used + as arguments to the -mcpu or -mtune options, + for example: -mcpu=cortex-a75 or + -mtune=xgene1 or as arguments to the equivalent target + attributes and pragmas. + AVR
Re: [PATCH] Fix warn_if_not_align ICE (PR c/83844)
On Tue, Jan 16, 2018 at 08:57:38AM +0100, Richard Biener wrote: > > - unsigned HOST_WIDE_INT off > > -= (tree_to_uhwi (DECL_FIELD_OFFSET (field)) > > - + tree_to_uhwi (DECL_FIELD_BIT_OFFSET (field)) / BITS_PER_UNIT); > > - if ((off % warn_if_not_align) != 0) > > -warning (opt_w, "%q+D offset %wu in %qT isn't aligned to %u", > > + tree off = byte_position (field); > > + if (!multiple_of_p (TREE_TYPE (off), off, size_int (warn_if_not_align))) > > multiple_of_p also returns 0 if it doesn't know (for the non-constant > case obviously), so the warning should say "may be not aligned"? Or > we don't want any false positives which means multiple_of_p should get > a worker factored out that returns a tri-state value? tri-state sounds optimizing for the very uncommon case, I think it must be very rare in practice when we could prove it must be not aligned and especially we'd need to extend it a lot to handle those cases. Here is an updated patch which says may not be aligned if off is non-constant. When extending the testcase, I've noticed we don't handle IMHO quite important case in multiple_of_p, so the patch handles that too. I've tried not to increase asymptotic complexity of multiple_of_p, so except for the cases where both arguments are INTEGER_CSTs it shouldn't call multiple_of_p more times than before. Ok for trunk if this passes bootstrap/regtest? 2018-01-16 Jakub Jelinek PR c/83844 * stor-layout.c (handle_warn_if_not_align): Use byte_position and multiple_of_p instead of unchecked tree_to_uhwi and UHWI check. If off is not INTEGER_CST, issue a may not be aligned warning rather than isn't aligned. Use isn%'t rather than isn't. * fold-const.c (multiple_of_p) : Don't fall through into MULT_EXPR. : Improve the case when bottom and one of the MULT_EXPR operands are INTEGER_CSTs and bottom is multiple of that operand, in that case check if the other operand is multiple of bottom divided by the INTEGER_CST operand. * gcc.dg/pr83844.c: New test. 
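The MULT_EXPR reasoning described in the ChangeLog can be tried out on a toy expression tree. The sketch below is a simplified model, not fold-const.c: struct toy_expr, enum toy_code and toy_multiple_of_p are hypothetical names, it works on plain long values, ignores overflow and types, and only handles positive constants. Like the real predicate it is conservative, so 0 means "not known to be a multiple".

```c
#include <assert.h>
#include <stddef.h>

enum toy_code { TOY_CST, TOY_VAR, TOY_PLUS, TOY_MULT };

/* Hypothetical miniature expression node; not GCC's tree.  */
struct toy_expr
{
  enum toy_code code;
  long value;                   /* Used by TOY_CST only.  */
  const struct toy_expr *op0;   /* Used by TOY_PLUS and TOY_MULT.  */
  const struct toy_expr *op1;
};

/* Return 1 if TOP is provably a multiple of BOTTOM (> 0), 0 if unknown.  */
int
toy_multiple_of_p (const struct toy_expr *top, long bottom)
{
  assert (top != NULL && bottom > 0);
  switch (top->code)
    {
    case TOY_CST:
      return top->value % bottom == 0;
    case TOY_VAR:
      /* Nothing is known about a variable except that 1 divides it.  */
      return bottom == 1;
    case TOY_PLUS:
      return (toy_multiple_of_p (top->op0, bottom)
              && toy_multiple_of_p (top->op1, bottom));
    case TOY_MULT:
      {
        const struct toy_expr *other = top->op0;
        const struct toy_expr *cst = top->op1;
        if (other->code == TOY_CST)
          {
            /* Put the constant operand, if any, into CST.  */
            const struct toy_expr *tmp = other;
            other = cst;
            cst = tmp;
          }
        if (cst->code == TOY_CST && cst->value > 0)
          {
            if (cst->value % bottom == 0)
              return 1;
            /* E.g. (x * 2 + 2) * 4 against 8: 8 is a multiple of 4, so it
               is enough that x * 2 + 2 is a multiple of 8 / 4 == 2.  */
            if (bottom % cst->value == 0)
              return toy_multiple_of_p (other, bottom / cst->value);
            return toy_multiple_of_p (other, bottom);
          }
        return (toy_multiple_of_p (top->op0, bottom)
                || toy_multiple_of_p (top->op1, bottom));
      }
    }
  return 0;
}
```

For (x * 2 + 2) * 4 the sketch answers 1 for a bottom of 8 and 0 for a bottom of 16, since nothing is known about x itself.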
--- gcc/stor-layout.c.jj2018-01-15 22:40:14.009263280 +0100 +++ gcc/stor-layout.c 2018-01-16 10:01:48.135111031 +0100 @@ -1150,12 +1150,16 @@ handle_warn_if_not_align (tree field, un warning (opt_w, "alignment %u of %qT is less than %u", record_align, context, warn_if_not_align); - unsigned HOST_WIDE_INT off -= (tree_to_uhwi (DECL_FIELD_OFFSET (field)) - + tree_to_uhwi (DECL_FIELD_BIT_OFFSET (field)) / BITS_PER_UNIT); - if ((off % warn_if_not_align) != 0) -warning (opt_w, "%q+D offset %wu in %qT isn't aligned to %u", -field, off, context, warn_if_not_align); + tree off = byte_position (field); + if (!multiple_of_p (TREE_TYPE (off), off, size_int (warn_if_not_align))) +{ + if (TREE_CODE (off) == INTEGER_CST) + warning (opt_w, "%q+D offset %E in %qT isn%'t aligned to %u", +field, off, context, warn_if_not_align); + else + warning (opt_w, "%q+D offset %E in %qT may not be aligned to %u", +field, off, context, warn_if_not_align); +} } /* Called from place_field to handle unions. */ --- gcc/fold-const.c.jj 2018-01-15 10:02:04.119181355 +0100 +++ gcc/fold-const.c2018-01-16 10:48:10.444360796 +0100 @@ -12595,9 +12595,34 @@ multiple_of_p (tree type, const_tree top a multiple of BOTTOM then TOP is a multiple of BOTTOM. */ if (!integer_pow2p (bottom)) return 0; - /* FALLTHRU */ + return (multiple_of_p (type, TREE_OPERAND (top, 1), bottom) + || multiple_of_p (type, TREE_OPERAND (top, 0), bottom)); case MULT_EXPR: + if (TREE_CODE (bottom) == INTEGER_CST) + { + op1 = TREE_OPERAND (top, 0); + op2 = TREE_OPERAND (top, 1); + if (TREE_CODE (op1) == INTEGER_CST) + std::swap (op1, op2); + if (TREE_CODE (op2) == INTEGER_CST) + { + if (multiple_of_p (type, op2, bottom)) + return 1; + /* Handle multiple_of_p ((x * 2 + 2) * 4, 8). 
*/ + if (multiple_of_p (type, bottom, op2)) + { + widest_int w = wi::sdiv_trunc (wi::to_widest (bottom), +wi::to_widest (op2)); + if (wi::fits_to_tree_p (w, TREE_TYPE (bottom))) + { + op2 = wide_int_to_tree (TREE_TYPE (bottom), w); + return multiple_of_p (type, op1, op2); + } + } + return multiple_of_p (type, op1, bottom); + } + } return (multiple_of_p (type, TREE_OPERAND (top, 1), bottom) || multiple_of_p (type, TREE_OPERAND (top, 0), bottom)); --- gcc/testsuite/gcc.dg/pr83844.c.jj 2018-01-16 09:56:57.459175232 +0100 +++ gcc/testsuite/gcc.dg/pr83844.c 2018-01-16 10:02:55.494096157 +0100 @@ -0,0 +1,36 @@ +/* PR c
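To make the new MULT_EXPR handling in the fold-const.c hunk easier to follow, here is a minimal sketch on a toy expression type. `Expr`, `Kind`, the `make_*` helpers and `multiple_of` are hypothetical stand-ins invented for illustration; they are not GCC's `tree` or `multiple_of_p`, and the type, overflow and `fits_to_tree_p` checks of the real function are left out. The sketch assumes a positive constant BOTTOM.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Toy expression tree; a hypothetical stand-in for GCC's `tree`.
enum class Kind { Const, Var, Plus, Mult };

struct Expr {
  Kind kind;
  long value;                      // meaningful only for Kind::Const
  std::shared_ptr<Expr> lhs, rhs;  // meaningful only for Plus and Mult
};
using E = std::shared_ptr<Expr>;

static E make_node(Kind k, long v, E l, E r) {
  return std::make_shared<Expr>(Expr{k, v, std::move(l), std::move(r)});
}
E make_cst(long v) { return make_node(Kind::Const, v, nullptr, nullptr); }
E make_var() { return make_node(Kind::Var, 0, nullptr, nullptr); }
E make_add(E a, E b) { return make_node(Kind::Plus, 0, std::move(a), std::move(b)); }
E make_mul(E a, E b) { return make_node(Kind::Mult, 0, std::move(a), std::move(b)); }

// Returns true only when TOP is provably a multiple of BOTTOM.
// false means "not proven", which is why the warning for a
// non-constant offset is worded "may not be aligned".
bool multiple_of(const E &top, long bottom) {
  if (bottom == 0)
    return false;
  switch (top->kind) {
  case Kind::Const:
    return top->value % bottom == 0;
  case Kind::Var:
    return false;  // nothing is known about a bare variable
  case Kind::Plus:
    return multiple_of(top->lhs, bottom) && multiple_of(top->rhs, bottom);
  case Kind::Mult: {
    E op1 = top->lhs, op2 = top->rhs;
    if (op1->kind == Kind::Const)
      std::swap(op1, op2);  // put the constant, if any, in op2
    if (op2->kind == Kind::Const) {
      long c = op2->value;
      if (c % bottom == 0)
        return true;  // the constant factor alone is enough
      // Handle ((x * 2 + 2) * 4, 8): BOTTOM is a multiple of the
      // constant, so op1 only has to be a multiple of BOTTOM / c.
      if (c != 0 && bottom % c == 0)
        return multiple_of(op1, bottom / c);
      return multiple_of(op1, bottom);
    }
    return multiple_of(top->lhs, bottom) || multiple_of(top->rhs, bottom);
  }
  }
  return false;
}
```

With `x = make_var()`, the expression `(x * 2 + 2) * 4` is reported as a multiple of 8 because the remaining factor `x * 2 + 2` is a multiple of `8 / 4`, while the same expression is not proven to be a multiple of 16.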
Re: [PATCH 00/10][ARC] Critical fixes
* Claudiu Zissulescu [2018-01-08 15:18:30 +]: > > [ARC][LRA] Use TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV. > > [ARC] Don't allow the last ZOL insn to be in a delay slot. > > [ARC] Add trap instruction. > > [ARC] Update legitimate constant hook. > > [ARC] Enable unaligned access. > > [ARC] Revamp trampoline implementation. > > [ARC][ZOL] Update uses for hw-loop labels. > > [ARC] Add ARCv2 core3 tune option. > > [ARC][FIX] Consider command line ffixed- option. > > [ARC] Update (u)maddsidi patterns. > > Hi Andrew, > > Thank you for reviewing this batch of fixes. Any chance to check also these > ones, they are hanging there for a long time now: > > https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00078.html > https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00081.html > https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00080.html > https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00079.html > https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00084.html > https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00083.html > https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00082.html Sorry for missing these, they somehow didn't make it onto my todo list. I'll review these over the next couple of days. Thanks, Andrew
Re: [PATCH 1/6] [ARC] Add JLI support.
* Claudiu Zissulescu [2017-11-02 13:30:30 +0100]: > The ARCv2 ISA provides the JLI instruction, which is two-byte instructions > that can be used to reduce code size in an application. To make use of it, > we provide two new function attributes 'jli_always' and 'jli_fixed' which > will force the compiler to call the indicated function using a jli_s > instruction. The compiler also generates the entries in the JLI table for > the case when we use 'jli_always' attribute. In the case of 'jli_fixed' > the compiler assumes a fixed position of the function into JLI > table. Thus, the user needs to provide an assembly file with the JLI table > for the final link. This is usefully when we want to have a table in ROM > and a second table in the RAM memory. > > The jli instruction usage can be also forced without the need to annotate > the source code via '-mjli-always' command. > > gcc/ > 2017-02-10 Claudiu Zissulescu > John Eric Martin > > * config/arc/arc-protos.h: Add arc_is_jli_call_p proto. > * config/arc/arc.c (_arc_jli_section): New struct. > (arc_jli_section): New type. > (rc_jli_sections): New static variable. > (arc_handle_jli_attribute): New function. > (arc_attribute_table): Add jli_always and jli_fixed attribute. > (arc_file_end): New function. > (TARGET_ASM_FILE_END): Define. > (arc_print_operand): Reuse 'S' letter for JLI output instruction. > (arc_add_jli_section): New function. > (jli_call_scan): Likewise. > (arc_reorg): Call jli_call_scan. > (arc_output_addsi): Remove 'S' from printing asm operand. > (arc_is_jli_call_p): New function. > * config/arc/arc.md (movqi_insn): Remove 'S' from printing asm > operand. > (movhi_insn): Likewise. > (movsi_insn): Likewise. > (movsi_set_cc_insn): Likewise. > (loadqi_update): Likewise. > (load_zeroextendqisi_update): Likewise. > (load_signextendqisi_update): Likewise. > (loadhi_update): Likewise. > (load_zeroextendhisi_update): Likewise. > (load_signextendhisi_update): Likewise. > (loadsi_update): Likewise. 
> (loadsf_update): Likewise. > (movsicc_insn): Likewise. > (bset_insn): Likewise. > (bxor_insn): Likewise. > (bclr_insn): Likewise. > (bmsk_insn): Likewise. > (bicsi3_insn): Likewise. > (cmpsi_cc_c_insn): Likewise. > (movsi_ne): Likewise. > (movsi_cond_exec): Likewise. > (clrsbsi2): Likewise. > (norm_f): Likewise. > (normw): Likewise. > (swap): Likewise. > (divaw): Likewise. > (flag): Likewise. > (sr): Likewise. > (kflag): Likewise. > (ffs): Likewise. > (ffs_f): Likewise. > (fls): Likewise. > (call_i): Remove 'S' asm letter, add jli instruction. > (call_value_i): Likewise. > * config/arc/arc.op (mjli-always): New option. > * config/arc/constraints.md (Cji): New constraint. > * config/arc/fpx.md (addsf3_fpx): Remove 'S' from printing asm > operand. > (subsf3_fpx): Likewise. > (mulsf3_fpx): Likewise. > * config/arc/simdext.md (vendrec_insn): Remove 'S' from printing > asm operand. > * doc/extend.texi (ARC): Document 'jli-always' and 'jli-fixed' > function attrbutes. > * doc/invoke.texi (ARC): Document mjli-always option. > > gcc/testsuite > 2017-02-10 Claudiu Zissulescu > > * gcc.target/arc/jli-1.c: New file. > * gcc.target/arc/jli-2.c: Likewise. This looks fine, but I wonder if there should be some documentation that mentions the new .jlitab section added? There's one whitespace issue I also spotted... > @@ -5026,6 +5062,36 @@ static void arc_file_start (void) >fprintf (asm_out_file, "\t.cpu %s\n", arc_cpu_string); > } > > +/* Implement `TARGET_ASM_FILE_END'. */ > +/* Outputs to the stdio stream FILE jli related text. */ > + > +void arc_file_end (void) > +{ > + arc_jli_section *sec = arc_jli_sections; > + > + while (sec != NULL) > + { I think the '{' is not indented correctly. Thanks, Andrew
Re: [PATCH] Fix warn_if_not_align ICE (PR c/83844)
On Tue, 16 Jan 2018, Jakub Jelinek wrote: > On Tue, Jan 16, 2018 at 08:57:38AM +0100, Richard Biener wrote: > > > - unsigned HOST_WIDE_INT off > > > -= (tree_to_uhwi (DECL_FIELD_OFFSET (field)) > > > - + tree_to_uhwi (DECL_FIELD_BIT_OFFSET (field)) / BITS_PER_UNIT); > > > - if ((off % warn_if_not_align) != 0) > > > -warning (opt_w, "%q+D offset %wu in %qT isn't aligned to %u", > > > + tree off = byte_position (field); > > > + if (!multiple_of_p (TREE_TYPE (off), off, size_int > > > (warn_if_not_align))) > > > > multiple_of_p also returns 0 if it doesn't know (for the non-constant > > case obviously), so the warning should say "may be not aligned"? Or > > we don't want any false positives which means multiple_of_p should get > > a worker factored out that returns a tri-state value? > > tri-state sounds optimizing for the very uncommon case, I think it must be > very rare in practice when we could prove it must be not aligned and > especially we'd need to extend it a lot to handle those cases. > > Here is an updated patch which says may not be aligned if off is > non-constant. When extending the testcase, I've noticed we don't handle > IMHO quite important case in multiple_of_p, so the patch handles that too. > I've tried not to increase asymptotic complexity of multiple_of_p, so except > for the cases where both arguments are INTEGER_CSTs it shouldn't call > multiple_of_p more times than before. > > Ok for trunk if this passes bootstrap/regtest? Ok. Thanks, Richard. > 2018-01-16 Jakub Jelinek > > PR c/83844 > * stor-layout.c (handle_warn_if_not_align): Use byte_position and > multiple_of_p instead of unchecked tree_to_uhwi and UHWI check. > If off is not INTEGER_CST, issue a may not be aligned warning > rather than isn't aligned. Use isn%'t rather than isn't. > * fold-const.c (multiple_of_p) : Don't fall through > into MULT_EXPR. 
> : Improve the case when bottom and one of the > MULT_EXPR operands are INTEGER_CSTs and bottom is multiple of that > operand, in that case check if the other operand is multiple of > bottom divided by the INTEGER_CST operand. > > * gcc.dg/pr83844.c: New test. > > --- gcc/stor-layout.c.jj 2018-01-15 22:40:14.009263280 +0100 > +++ gcc/stor-layout.c 2018-01-16 10:01:48.135111031 +0100 > @@ -1150,12 +1150,16 @@ handle_warn_if_not_align (tree field, un > warning (opt_w, "alignment %u of %qT is less than %u", >record_align, context, warn_if_not_align); > > - unsigned HOST_WIDE_INT off > -= (tree_to_uhwi (DECL_FIELD_OFFSET (field)) > - + tree_to_uhwi (DECL_FIELD_BIT_OFFSET (field)) / BITS_PER_UNIT); > - if ((off % warn_if_not_align) != 0) > -warning (opt_w, "%q+D offset %wu in %qT isn't aligned to %u", > - field, off, context, warn_if_not_align); > + tree off = byte_position (field); > + if (!multiple_of_p (TREE_TYPE (off), off, size_int (warn_if_not_align))) > +{ > + if (TREE_CODE (off) == INTEGER_CST) > + warning (opt_w, "%q+D offset %E in %qT isn%'t aligned to %u", > + field, off, context, warn_if_not_align); > + else > + warning (opt_w, "%q+D offset %E in %qT may not be aligned to %u", > + field, off, context, warn_if_not_align); > +} > } > > /* Called from place_field to handle unions. */ > --- gcc/fold-const.c.jj 2018-01-15 10:02:04.119181355 +0100 > +++ gcc/fold-const.c 2018-01-16 10:48:10.444360796 +0100 > @@ -12595,9 +12595,34 @@ multiple_of_p (tree type, const_tree top >a multiple of BOTTOM then TOP is a multiple of BOTTOM. 
*/ >if (!integer_pow2p (bottom)) > return 0; > - /* FALLTHRU */ > + return (multiple_of_p (type, TREE_OPERAND (top, 1), bottom) > + || multiple_of_p (type, TREE_OPERAND (top, 0), bottom)); > > case MULT_EXPR: > + if (TREE_CODE (bottom) == INTEGER_CST) > + { > + op1 = TREE_OPERAND (top, 0); > + op2 = TREE_OPERAND (top, 1); > + if (TREE_CODE (op1) == INTEGER_CST) > + std::swap (op1, op2); > + if (TREE_CODE (op2) == INTEGER_CST) > + { > + if (multiple_of_p (type, op2, bottom)) > + return 1; > + /* Handle multiple_of_p ((x * 2 + 2) * 4, 8). */ > + if (multiple_of_p (type, bottom, op2)) > + { > + widest_int w = wi::sdiv_trunc (wi::to_widest (bottom), > + wi::to_widest (op2)); > + if (wi::fits_to_tree_p (w, TREE_TYPE (bottom))) > + { > + op2 = wide_int_to_tree (TREE_TYPE (bottom), w); > + return multiple_of_p (type, op1, op2); > + } > + } > + return multiple_of_p (type, op1, bottom); > + } > + } >return (multiple_of_p (type, TREE_OPERAND (top, 1), bottom) > || multiple_of_p (type, TREE_OPERAND (top, 0)
Re: [PATCH][ARM] Fix test fail with conflicting -mfloat-abi
Hi Christophe On 12/01/18 18:32, Christophe Lyon wrote: On 12 Jan 2018 15:26, "Sudakshina Das" wrote: Hi This patch fixes my earlier test case that fails for arm-none-eabi when an explicit user option for -mfloat-abi conflicts with the test case options. I have added a guard to skip the test in those cases. @Christophe: Sorry about this. I think this should fix the test case. Can you please confirm if this works for you? Yes it does thanks Thanks for checking that. I have added one more directive for armv5t as well to avoid any conflicts for mcpu options. Sudi Thanks Sudi gcc/testsuite/ChangeLog 2018-01-12 Sudakshina Das * gcc.c-torture/compile/pr82096.c: Add dg-skip-if directive. diff --git a/gcc/testsuite/gcc.c-torture/compile/pr82096.c b/gcc/testsuite/gcc.c-torture/compile/pr82096.c index 9fed28c..35551f5 100644 --- a/gcc/testsuite/gcc.c-torture/compile/pr82096.c +++ b/gcc/testsuite/gcc.c-torture/compile/pr82096.c @@ -1,3 +1,5 @@ +/* { dg-require-effective-target arm_arch_v5t_ok } */ +/* { dg-skip-if "Do not combine float-abi values" { arm*-*-* } { "-mfloat-abi=*" } { "-mfloat-abi=soft" } } */ /* { dg-additional-options "-march=armv5t -mthumb -mfloat-abi=soft" { target arm*-*-* } } */ static long long AL[24];
Re: [PATCH 2/6] [ARC] Add SJLI support.
* Claudiu Zissulescu [2017-11-02 13:30:31 +0100]: > gcc/ > 2017-02-20 Claudiu Zissulescu > > * config/arc/arc-protos.h: Add arc_is_secure_call_p proto. > * config/arc/arc.c (arc_handle_secure_attribute): New function. > (arc_attribute_table): Add 'secure_call' attribute. > (arc_print_operand): Print secure call operand. > (arc_function_ok_for_sibcall): Don't optimize tail calls when > secure. > (arc_is_secure_call_p): New function. > * config/arc/arc.md (call_i): Add support for sjli instruction. > (call_value_i): Likewise. > * config/arc/constraints.md (Csc): New constraint. > --- > gcc/config/arc/arc-protos.h | 1 + > gcc/config/arc/arc.c | 164 > +++--- > gcc/config/arc/arc.md | 32 + > gcc/config/arc/constraints.md | 7 ++ > gcc/doc/extend.texi | 6 ++ > 5 files changed, 155 insertions(+), 55 deletions(-) Looks fine, few comments inline below. Thanks Andrew > > @@ -3939,6 +3985,9 @@ arc_print_operand (FILE *file, rtx x, int code) > : NULL_TREE); > if (lookup_attribute ("jli_fixed", attrs)) > { > + /* No special treatment for jli_fixed functions. */ > + if (code == 'j' ) Extra space before ')'. > + break; > fprintf (file, "%ld\t; @", > TREE_INT_CST_LOW (TREE_VALUE (TREE_VALUE (attrs; > assemble_name (file, XSTR (x, 0)); > @@ -3947,6 +3996,22 @@ arc_print_operand (FILE *file, rtx x, int code) > } > fprintf (file, "@__jli."); > assemble_name (file, XSTR (x, 0)); > + if (code == 'j') > + arc_add_jli_section (x); > + return; > + } > + if (GET_CODE (x) == SYMBOL_REF > + && arc_is_secure_call_p (x)) > + { > + /* No special treatment for secure functions. */ > + if (code == 'j' ) > + break; > + tree attrs = (TREE_TYPE (SYMBOL_REF_DECL (x)) != error_mark_node > + ? 
TYPE_ATTRIBUTES (TREE_TYPE (SYMBOL_REF_DECL (x))) > + : NULL_TREE); > + fprintf (file, "%ld\t; @", > +TREE_INT_CST_LOW (TREE_VALUE (TREE_VALUE (attrs; > + assemble_name (file, XSTR (x, 0)); > return; > } >break; > @@ -6897,6 +6962,8 @@ arc_function_ok_for_sibcall (tree decl, > return false; >if (lookup_attribute ("jli_fixed", attrs)) > return false; > + if (lookup_attribute ("secure_call", attrs)) > + return false; > } > >/* Everything else is ok. */ > @@ -7594,46 +7661,6 @@ arc_reorg_loops (void) >reorg_loops (true, &arc_doloop_hooks); > } > > -/* Add the given function declaration to emit code in JLI section. */ > - > -static void > -arc_add_jli_section (rtx pat) > -{ > - const char *name; > - tree attrs; > - arc_jli_section *sec = arc_jli_sections, *new_section; > - tree decl = SYMBOL_REF_DECL (pat); > - > - if (!pat) > -return; > - > - if (decl) > -{ > - /* For fixed locations do not generate the jli table entry. It > - should be provided by the user as an asm file. */ > - attrs = TYPE_ATTRIBUTES (TREE_TYPE (decl)); > - if (lookup_attribute ("jli_fixed", attrs)) > - return; > -} > - > - name = XSTR (pat, 0); > - > - /* Don't insert the same symbol twice. */ > - while (sec != NULL) > -{ > - if(strcmp (name, sec->name) == 0) > - return; > - sec = sec->next; > -} > - > - /* New name, insert it. */ > - new_section = (arc_jli_section *) xmalloc (sizeof (arc_jli_section)); > - gcc_assert (new_section != NULL); > - new_section->name = name; > - new_section->next = arc_jli_sections; > - arc_jli_sections = new_section; > -} > - > /* Scan all calls and add symbols to be emitted in the jli section if > needed. */ > > @@ -10968,6 +10995,63 @@ arc_handle_jli_attribute (tree *node > ATTRIBUTE_UNUSED, > return NULL_TREE; > } > > +/* Handle and "scure" attribute; arguments as in struct > + attribute_spec.handler. 
*/ > + > +static tree > +arc_handle_secure_attribute (tree *node ATTRIBUTE_UNUSED, > + tree name, tree args, int, > + bool *no_add_attrs) > +{ > + if (!TARGET_EM) > +{ > + warning (OPT_Wattributes, > +"%qE attribute only valid for ARC EM architecture", > +name); > + *no_add_attrs = true; > +} > + > + if (args == NULL_TREE) > +{ > + warning (OPT_Wattributes, > +"argument of %qE attribute is missing", > +name); > + *no_add_attrs = true; > +} > + else > +{ > + if (TREE_CODE (TREE_VALUE (args)) == NON_LVALUE_EXPR) > + TREE_VALUE (args) = TREE_OPERAND (TREE_VALUE (args), 0); > + tree arg = TREE_VALUE (args); > + if (TREE_CODE (arg) != INTEGER_CST) > + { > + warning
[C++ Patch] PR 81054 ("[7/8 Regression] ICE with volatile variable in constexpr function")
Hi, in this error-recovery regression we ICE when we end up in an inconsistent state after a meaningful diagnostic is emitted by ensure_literal_type_for_constexpr_object, followed by a redundant and slightly misleading one emitted by check_static_variable_definition. I think we can just return early from cp_finish_decl, which solves both the primary and the secondary issue. I also checked that clang likewise doesn't emit an error for line #28 of constexpr-diag3.C, after the hard error for co1 itself at line #27. Tested x86_64-linux. Thanks, Paolo. // /cp 2018-01-16 Paolo Carlini PR c++/81054 * decl.c (cp_finish_decl): Early return when the ensure_literal_type_for_constexpr_object fails. /testsuite 2018-01-16 Paolo Carlini PR c++/81054 * g++.dg/cpp0x/constexpr-ice19.C: New. * g++.dg/cpp0x/constexpr-diag3.C: Adjust. Index: cp/decl.c === --- cp/decl.c (revision 256728) +++ cp/decl.c (working copy) @@ -6811,7 +6811,11 @@ cp_finish_decl (tree decl, tree init, bool init_co } if (!ensure_literal_type_for_constexpr_object (decl)) -DECL_DECLARED_CONSTEXPR_P (decl) = 0; +{ + DECL_DECLARED_CONSTEXPR_P (decl) = 0; + TREE_TYPE (decl) = error_mark_node; + return; +} if (VAR_P (decl) && DECL_CLASS_SCOPE_P (decl) Index: testsuite/g++.dg/cpp0x/constexpr-diag3.C === --- testsuite/g++.dg/cpp0x/constexpr-diag3.C (revision 256728) +++ testsuite/g++.dg/cpp0x/constexpr-diag3.C (working copy) @@ -25,7 +25,7 @@ struct complex// { dg-message "no .constexpr. }; constexpr complex co1(0, 1); // { dg-error "not literal" } -constexpr double dd2 = co1.real(); // { dg-error "|in .constexpr. expansion of " } +constexpr double dd2 = co1.real(); // Index: testsuite/g++.dg/cpp0x/constexpr-ice19.C === --- testsuite/g++.dg/cpp0x/constexpr-ice19.C (nonexistent) +++ testsuite/g++.dg/cpp0x/constexpr-ice19.C (working copy) @@ -0,0 +1,13 @@ +// PR c++/81054 +// { dg-do compile { target c++11 } } + +struct A +{ + volatile int i; + constexpr A() : i() {} +}; + +struct B +{ + static constexpr A a {}; // { dg-error "not literal" } +};
Re: [PATCH PR82096] Fix ICE in int_mode_for_mode, at stor-layout.c:403 with arm-linux-gnueabi
Hi Jeff On 12/01/18 23:00, Jeff Law wrote: On 01/12/2018 01:45 AM, Christophe Lyon wrote: Hi, On 11 January 2018 at 11:58, Sudakshina Das wrote: Hi Jeff On 10/01/18 21:08, Jeff Law wrote: On 01/10/2018 09:25 AM, Sudakshina Das wrote: Hi Jeff On 10/01/18 10:44, Sudakshina Das wrote: Hi Jeff On 09/01/18 23:43, Jeff Law wrote: On 01/05/2018 12:25 PM, Sudakshina Das wrote: Hi Jeff On 05/01/18 18:44, Jeff Law wrote: On 01/04/2018 08:35 AM, Sudakshina Das wrote: Hi The bug reported a particular test di-longlong64-sync-1.c failing when run on arm-linux-gnueabi with options -mthumb -march=armv5t -O[g,1,2,3] and -mthumb -march=armv6 -O[g,1,2,3]. According to what I could see, the crash was caused because of the explicit VOIDmode argument that was sent to emit_store_flag_force (). Since the comparing argument was a long long, it was being forced into a VOID type register before the comparison (in prepare_cmp_insn()) is done. As pointed out by Kyrill, there is a comment on emit_store_flag() which says "MODE is the mode to use for OP0 and OP1 should they be CONST_INTs. If it is VOIDmode, they cannot both be CONST_INT". This condition is not true in this case and thus I think it is suitable to change the argument. Testing done: Checked for regressions on bootstrapped arm-none-linux-gnueabi and arm-none-linux-gnueabihf and added new test cases. Sudi ChangeLog entries: *** gcc/ChangeLog *** 2017-01-04 Sudakshina Das PR target/82096 * optabs.c (expand_atomic_compare_and_swap): Change argument to emit_store_flag_force. *** gcc/testsuite/ChangeLog *** 2017-01-04 Sudakshina Das PR target/82096 * gcc.c-torture/compile/pr82096-1.c: New test. * gcc.c-torture/compile/pr82096-2.c: Likwise. In the case where both (op0/op1) to emit_store_flag/emit_store_flag_force are constants, don't we know the result of the comparison and shouldn't we have optimized the store flag to something simpler? I feel like I must be missing something here. emit_store_flag_force () is comparing a register to op0. 
? /* Emit a store-flags instruction for comparison CODE on OP0 and OP1 and storing in TARGET. Normally return TARGET. Return 0 if that cannot be done. MODE is the mode to use for OP0 and OP1 should they be CONST_INTs. If it is VOIDmode, they cannot both be CONST_INT. So we're comparing op0 and op1 AFAICT. One, but not both can be a CONST_INT. If both are a CONST_INT, then you need to address the problem in the caller (by optimizing away the condition). If you've got a REG and a CONST_INT, then the mode should be taken from the REG operand. The 2 constant arguments are to the expand_atomic_compare_and_swap () function. emit_store_flag_force () is used in case when this function is called by the bool variant of the built-in function where the bool return value is computed by comparing the result register with the expected op0. So if only one of the two objects is a CONST_INT, then the mode should come from the other object. I think that's the fundamental problem here and that you're just papering over it by changing the caller. I think my earlier explanation was a bit misleading and I may have rushed into quoting the comment about both operands being const for emit_store_flag_force(). The problem is with the function and I do agree with your suggestion of changing the function to add the code below to be a better approach than the changing the caller. I will change the patch and test it. This is the updated patch according to your suggestions. Testing: Checked for regressions on arm-none-linux-gnueabihf and added new test case. Thanks Sudi ChangeLog entries: *** gcc/ChangeLog *** 2017-01-10 Sudakshina Das PR target/82096 * expmed.c (emit_store_flag_force): Swap if const op0 and change VOIDmode to mode of op0. *** gcc/testsuite/ChangeLog *** 2017-01-10 Sudakshina Das PR target/82096 * gcc.c-torture/compile/pr82096.c: New test. OK. Thanks. Committed as r256526. 
Sudi Could you add a guard like in other tests to skip it if the user added -mfloat-abi=XXX when running the tests? For instance, I have a configuration where I add -mthumb/-march=armv8-a/-mfpu=crypto-neon-fp-armv8/-mfloat-abi=hard and the new test fails because: xgcc: error: -mfloat-abi=soft and -mfloat-abi=hard may not be used together It's starting to feel like the test should move into gcc.target/arm :-) I nearly suggested that already. Consider moving it into gcc.target/arm pre-approved along with adding the -O to the options and whatever is needed to skip the test at the appropriate time. My initial thought was also to put the test in gcc.target/arm. But I wanted to put it in a torture suite as this was failing at different optimization levels. Creating several tests for different optimization levels or a new torture suite just for this test did not look like the better opt
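The approach agreed on earlier in this thread (move a constant operand into second position, then take a VOIDmode comparison mode from the first operand) can be sketched on toy types. Everything below is hypothetical and for illustration only: `Mode`, `Cmp`, `Operand` and `normalize_compare` are invented names, not GCC's `machine_mode`, `rtx_code`, `rtx` or `emit_store_flag_force`, and the sketch assumes at most one operand is a constant.

```cpp
#include <cassert>
#include <utility>

enum class Mode { Void, SI, DI };           // toy machine modes
enum class Cmp { EQ, NE, LT, GT, LE, GE };  // toy comparison codes

struct Operand {
  bool is_const;  // constants carry Mode::Void, like a CONST_INT
  Mode mode;
};

// Comparison code to use once the two operands have been exchanged.
Cmp swap_cmp(Cmp c) {
  switch (c) {
  case Cmp::LT: return Cmp::GT;
  case Cmp::GT: return Cmp::LT;
  case Cmp::LE: return Cmp::GE;
  case Cmp::GE: return Cmp::LE;
  default:      return c;  // EQ and NE are symmetric
  }
}

struct Normalized {
  Operand op0, op1;
  Cmp code;
  Mode mode;
};

// If only op0 is constant, swap the operands (and the comparison) so
// the non-constant one comes first; then, if no mode was supplied,
// take it from op0 instead of leaving it as Void.
Normalized normalize_compare(Operand op0, Operand op1, Cmp code, Mode mode) {
  if (op0.is_const && !op1.is_const) {
    std::swap(op0, op1);
    code = swap_cmp(code);
  }
  if (mode == Mode::Void)
    mode = op0.mode;
  return {op0, op1, code, mode};
}
```

For a constant compared against a DI register with no explicit mode, the result has the register first, the comparison reversed and the mode set to DI, so nothing downstream is asked to work in a void mode.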
Re: [PATCH 3/6] [ARC] Add support for "register file 16" reduced register set
* Claudiu Zissulescu [2017-11-02 13:30:32 +0100]: > gcc/ > 2017-03-20 Claudiu Zissulescu > > * config/arc/arc-arches.def: Option mrf16 valid for all > architectures. > * config/arc/arc-c.def (__ARC_RF16__): New predefined macro. > * config/arc/arc-cpus.def (em_mini): New cpu with rf16 on. > * config/arc/arc-options.def (FL_RF16): Add mrf16 option. > * config/arc/arc-tables.opt: Regenerate. > * config/arc/arc.c (arc_conditional_register_usage): Handle > reduced register file case. > (arc_file_start): Set must have build attributes. > * config/arc/arc.h (MAX_ARC_PARM_REGS): Conditional define using > mrf16 option value. > * config/arc/arc.opt (mrf16): Add new option. > * config/arc/elf.h (ATTRIBUTE_PCS): Define. > * config/arc/genmultilib.awk: Handle new mrf16 option. > * config/arc/linux.h (ATTRIBUTE_PCS): Define. > * config/arc/t-multilib: Regenerate. > * doc/invoke.texi (ARC Options): Document mrf16 option. > > gcc/testsuite/ > 2017-03-20 Claudiu Zissulescu > > * gcc.dg/builtin-apply2.c: Change for the ARC's reduced register > set file case. > > libgcc/ > 2017-09-18 Claudiu Zissulescu > > * config/arc/lib1funcs.S (__udivmodsi4): Use safe version for RF16 > option. > (__divsi3): Use RF16 safe registers. > (__modsi3): Likewise. Looks fine, except I think that the new 'em_mini' cpu needs to be added to the -mcpu= description in doc/invoke.texi. 
Thanks, Andrew > --- > gcc/config/arc/arc-arches.def | 8 > gcc/config/arc/arc-c.def | 1 + > gcc/config/arc/arc-cpus.def | 1 + > gcc/config/arc/arc-options.def| 2 +- > gcc/config/arc/arc-tables.opt | 3 +++ > gcc/config/arc/arc.c | 27 +++ > gcc/config/arc/arc.h | 2 +- > gcc/config/arc/arc.opt| 4 > gcc/config/arc/elf.h | 4 > gcc/config/arc/genmultilib.awk| 2 ++ > gcc/config/arc/linux.h| 9 + > gcc/config/arc/t-multilib | 4 ++-- > gcc/doc/invoke.texi | 8 +++- > gcc/testsuite/gcc.dg/builtin-apply2.c | 8 +++- > libgcc/config/arc/lib1funcs.S | 22 +++--- > 15 files changed, 84 insertions(+), 21 deletions(-) > > diff --git a/gcc/config/arc/arc-arches.def b/gcc/config/arc/arc-arches.def > index 29cb9c4..a0d585b 100644 > --- a/gcc/config/arc/arc-arches.def > +++ b/gcc/config/arc/arc-arches.def > @@ -40,15 +40,15 @@ > > ARC_ARCH ("arcem", em, FL_MPYOPT_1_6 | FL_DIVREM | FL_CD | FL_NORM \ > | FL_BS | FL_SWAP | FL_FPUS | FL_SPFP | FL_DPFP \ > - | FL_SIMD | FL_FPUDA | FL_QUARK, 0) > + | FL_SIMD | FL_FPUDA | FL_QUARK | FL_RF16, 0) > ARC_ARCH ("archs", hs, FL_MPYOPT_7_9 | FL_DIVREM | FL_NORM | FL_CD \ > | FL_ATOMIC | FL_LL64 | FL_BS | FL_SWAP \ > - | FL_FPUS | FL_FPUD, \ > + | FL_FPUS | FL_FPUD | FL_RF16,\ > FL_CD | FL_ATOMIC | FL_BS | FL_NORM | FL_SWAP) > ARC_ARCH ("arc6xx", 6xx, FL_BS | FL_NORM | FL_SWAP | FL_MUL64 | FL_MUL32x16 \ > - | FL_SPFP | FL_ARGONAUT | FL_DPFP, 0) > + | FL_SPFP | FL_ARGONAUT | FL_DPFP | FL_RF16, 0) > ARC_ARCH ("arc700", 700, FL_ATOMIC | FL_BS | FL_NORM | FL_SWAP | FL_EA \ > - | FL_SIMD | FL_SPFP | FL_ARGONAUT | FL_DPFP, \ > + | FL_SIMD | FL_SPFP | FL_ARGONAUT | FL_DPFP | FL_RF16, \ > FL_BS | FL_NORM | FL_SWAP) > > /* Local Variables: */ > diff --git a/gcc/config/arc/arc-c.def b/gcc/config/arc/arc-c.def > index 8c5097e..c9443c9 100644 > --- a/gcc/config/arc/arc-c.def > +++ b/gcc/config/arc/arc-c.def > @@ -28,6 +28,7 @@ ARC_C_DEF ("__ARC_NORM__", TARGET_NORM) > ARC_C_DEF ("__ARC_MUL64__", TARGET_MUL64_SET) > ARC_C_DEF ("__ARC_MUL32BY16__", 
TARGET_MULMAC_32BY16_SET) > ARC_C_DEF ("__ARC_SIMD__", TARGET_SIMD_SET) > +ARC_C_DEF ("__ARC_RF16__", TARGET_RF16) > > ARC_C_DEF ("__ARC_BARREL_SHIFTER__", TARGET_BARREL_SHIFTER) > > diff --git a/gcc/config/arc/arc-cpus.def b/gcc/config/arc/arc-cpus.def > index 60b4045..c2b0062 100644 > --- a/gcc/config/arc/arc-cpus.def > +++ b/gcc/config/arc/arc-cpus.def > @@ -46,6 +46,7 @@ > TUNETune value for the given configuration, otherwise NONE. */ > > ARC_CPU (em, em, 0, NONE) > +ARC_CPU (em_mini, em, FL_RF16, NONE) > ARC_CPU (arcem, em, FL_MPYOPT_2|FL_CD|FL_BS, NONE) > ARC_CPU (em4,em, FL_CD, NONE) > ARC_CPU (em4_dmips, em, FL_MPYOPT_2|FL_CD|FL_DIVREM|FL_NORM|FL_SWAP|FL_BS, > NONE) > diff --git a/gcc/config/arc/arc-options.def b/gcc/config/arc/arc-options.def > index be51614..8fc7b50 100644 > --- a/gcc/config/arc/arc-options.def > +++ b/gcc/config/arc/arc-options.def > @@ -60,7 +60,7 @@ > ARC_OPT (FL_CD,(1UL
Move pa.h FUNCTION_ARG_SIZE to pa.c (PR83858)
The port-local FUNCTION_ARG_SIZE: ((((MODE) != BLKmode \ ? (HOST_WIDE_INT) GET_MODE_SIZE (MODE) \ : int_size_in_bytes (TYPE)) + UNITS_PER_WORD - 1) / UNITS_PER_WORD) is used by code in pa.c and by ASM_DECLARE_FUNCTION_NAME in som.h. Treating GET_MODE_SIZE as a constant is OK for the former but not the latter, which is used in target-independent code. This caused a build failure on hppa2.0w-hp-hpux11.11. Tested with a cross build of hppa2.0w-hp-hpux11.11. OK to install? Richard 2018-01-16 Richard Sandiford gcc/ PR target/83858 * config/pa/pa.h (FUNCTION_ARG_SIZE): Delete. * config/pa/pa-protos.h (pa_function_arg_size): Declare. * config/pa/som.h (ASM_DECLARE_FUNCTION_NAME): Use pa_function_arg_size instead of FUNCTION_ARG_SIZE. * config/pa/pa.c (pa_function_arg_advance): Likewise. (pa_function_arg, pa_arg_partial_bytes): Likewise. (pa_function_arg_size): New function. Index: gcc/config/pa/pa.h === --- gcc/config/pa/pa.h 2018-01-03 11:12:55.202783713 + +++ gcc/config/pa/pa.h 2018-01-16 10:50:31.245063090 + @@ -592,15 +592,6 @@ #define INIT_CUMULATIVE_INCOMING_ARGS(CU (CUM).indirect = 0, \ (CUM).nargs_prototype = 1000 -/* Figure out the size in words of the function argument. The size - returned by this macro should always be greater than zero because - we pass variable and zero sized objects by reference. */ - -#define FUNCTION_ARG_SIZE(MODE, TYPE) \ - ((((MODE) != BLKmode \ - ? (HOST_WIDE_INT) GET_MODE_SIZE (MODE) \ - : int_size_in_bytes (TYPE)) + UNITS_PER_WORD - 1) / UNITS_PER_WORD) - /* Determine where to put an argument to a function. Value is zero to push the argument on the stack, or a hard register in which to store the argument. 
Index: gcc/config/pa/pa-protos.h === --- gcc/config/pa/pa-protos.h 2018-01-03 11:12:55.198783870 + +++ gcc/config/pa/pa-protos.h 2018-01-16 10:50:31.244063125 + @@ -107,5 +107,6 @@ extern void pa_asm_output_aligned_local unsigned int); extern void pa_hpux_asm_output_external (FILE *, tree, const char *); extern HOST_WIDE_INT pa_initial_elimination_offset (int, int); +extern HOST_WIDE_INT pa_function_arg_size (machine_mode, const_tree); extern const int pa_magic_milli[]; Index: gcc/config/pa/som.h === --- gcc/config/pa/som.h 2018-01-03 11:12:55.191784145 + +++ gcc/config/pa/som.h 2018-01-16 10:50:31.246063055 + @@ -136,8 +136,8 @@ #define ASM_DECLARE_FUNCTION_NAME(FILE, else \ {\ int arg_size = \ - FUNCTION_ARG_SIZE (TYPE_MODE (DECL_ARG_TYPE (parm)),\ - DECL_ARG_TYPE (parm));\ + pa_function_arg_size (TYPE_MODE (DECL_ARG_TYPE (parm)),\ +DECL_ARG_TYPE (parm)); \ /* Passing structs by invisible reference uses \ one general register. */ \ if (arg_size > 2 \ Index: gcc/config/pa/pa.c === --- gcc/config/pa/pa.c 2018-01-03 11:12:55.201783752 + +++ gcc/config/pa/pa.c 2018-01-16 10:50:31.245063090 + @@ -9485,7 +9485,7 @@ pa_function_arg_advance (cumulative_args const_tree type, bool named ATTRIBUTE_UNUSED) { CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v); - int arg_size = FUNCTION_ARG_SIZE (mode, type); + int arg_size = pa_function_arg_size (mode, type); cum->nargs_prototype--; cum->words += (arg_size @@ -9517,7 +9517,7 @@ pa_function_arg (cumulative_args_t cum_v if (mode == VOIDmode) return NULL_RTX; - arg_size = FUNCTION_ARG_SIZE (mode, type); + arg_size = pa_function_arg_size (mode, type); /* If this arg would be passed partially or totally on the stack, then this routine should return zero. 
pa_arg_partial_bytes will @@ -9724,10 +9724,10 @@ pa_arg_partial_bytes (cumulative_args_t if (!TARGET_64BIT) return 0; - if (FUNCTION_ARG_SIZE (mode, type) > 1 && (cum->words & 1)) + if (pa_function_arg_size (mode, type) > 1 && (cum->words & 1)) offset = 1; - if (cum->words + offset + FUNCTION_ARG_SIZE (mode, type) <= max_arg_words) + if (cum->words + offset + pa_function_arg_size (mode, type) <= max_arg_words) /* Arg fits fully into registers. */ return 0; else if (cum->words + offset >= max_arg_words) @@ -10835,4 +10835,16 @@ pa_starting_frame_offset (void) r
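The body of the new pa_function_arg_size is cut off in this archive, so it is not reproduced here. As a rough illustration only, the round-up-to-words arithmetic of the deleted macro can be sketched in standalone C++. The names kUnitsPerWord and arg_size_in_words are hypothetical stand-ins, not GCC identifiers, and the word size of 4 bytes is an assumption for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the target's UNITS_PER_WORD (bytes per word).
constexpr std::int64_t kUnitsPerWord = 4;

// Round a size in bytes up to a whole number of words, mirroring the
// (size + UNITS_PER_WORD - 1) / UNITS_PER_WORD expression of the macro.
std::int64_t arg_size_in_words(std::int64_t size_in_bytes) {
  assert(size_in_bytes >= 0);
  return (size_in_bytes + kUnitsPerWord - 1) / kUnitsPerWord;
}
```

With these assumptions a 5-byte argument occupies two words and a 4-byte argument one word. Keeping the computation in a function rather than a macro means callers outside the port only see a HOST_WIDE_INT result.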
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Tue, Jan 16, 2018 at 12:34 AM, Jan Hubicka wrote: >> On Mon, Jan 15, 2018 at 5:53 PM, H.J. Lu wrote: >> > On Mon, Jan 15, 2018 at 3:38 AM, H.J. Lu wrote: >> >> On Mon, Jan 15, 2018 at 12:31 AM, Richard Biener >> >> wrote: >> >>> On Sun, Jan 14, 2018 at 4:08 PM, H.J. Lu wrote: >> Now my patch set has been checked into trunk. Here is a patch set >> to move struct ix86_frame to machine_function on GCC 7, which is >> needed to backport the patch set to GCC 7: >> >> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01239.html >> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01240.html >> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01241.html >> >> OK for gcc-7-branch? >> >>> >> >>> Yes, backporting is ok - please watch for possible fallout on trunk and >> >>> make >> >>> sure to adjust the backport accordingly. I plan to do GCC 7.3 RC1 on >> >>> Wednesday now with the final release about a week later if no issue shows >> >>> up. >> >>> >> >> >> >> Backport is blocked by >> >> >> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83838 >> >> >> >> There are many test failures due to lack of comdat support in linker on >> >> Solaris. >> >> I can limit these tests to Linux. >> > >> > These are testcase issues and shouldn't block backport to GCC 7. >> >> It makes the option using thunks unusable though, right? Can you simply make >> them hidden on systems without comdat support? That duplicates them per TU >> but at least the feature works. Or those systems should provide the thunks >> via >> libgcc. >> >> I agree we can followup with a fix for Solaris given lack of a public >> testing machine. > > My memory is bit dim, but I am convinced I was fixing specific errors for > comdats > on Solaris, so I think the toolchain supports them in some sort, just is more > restrictive/different from GNU implementation. > > Indeed, i think just producing sorry, unimplemented message is what we should > do > if we can't support retpoline on given target. > It still works without comdat. 
GCC just generates a local thunk in each object file. -- H.J.
Re: [PATCH] Fix warn_if_not_align ICE (PR c/83844)
Jakub Jelinek writes: > On Tue, Jan 16, 2018 at 08:57:38AM +0100, Richard Biener wrote: >> > - unsigned HOST_WIDE_INT off >> > -= (tree_to_uhwi (DECL_FIELD_OFFSET (field)) >> > - + tree_to_uhwi (DECL_FIELD_BIT_OFFSET (field)) / BITS_PER_UNIT); >> > - if ((off % warn_if_not_align) != 0) >> > -warning (opt_w, "%q+D offset %wu in %qT isn't aligned to %u", >> > + tree off = byte_position (field); >> > + if (!multiple_of_p (TREE_TYPE (off), off, size_int (warn_if_not_align))) >> >> multiple_of_p also returns 0 if it doesn't know (for the non-constant >> case obviously), so the warning should say "may be not aligned"? Or >> we don't want any false positives which means multiple_of_p should get >> a worker factored out that returns a tri-state value? > > tri-state sounds optimizing for the very uncommon case, I think it must be > very rare in practice when we could prove it must be not aligned and > especially we'd need to extend it a lot to handle those cases. > > Here is an updated patch which says may not be aligned if off is > non-constant. When extending the testcase, I've noticed we don't handle > IMHO quite important case in multiple_of_p, so the patch handles that too. > I've tried not to increase asymptotic complexity of multiple_of_p, so except > for the cases where both arguments are INTEGER_CSTs it shouldn't call > multiple_of_p more times than before. > > Ok for trunk if this passes bootstrap/regtest? > > 2018-01-16 Jakub Jelinek > > PR c/83844 > * stor-layout.c (handle_warn_if_not_align): Use byte_position and > multiple_of_p instead of unchecked tree_to_uhwi and UHWI check. > If off is not INTEGER_CST, issue a may not be aligned warning > rather than isn't aligned. Use isn%'t rather than isn't. > * fold-const.c (multiple_of_p) : Don't fall through > into MULT_EXPR. 
> : Improve the case when bottom and one of the > MULT_EXPR operands are INTEGER_CSTs and bottom is multiple of that > operand, in that case check if the other operand is multiple of > bottom divided by the INTEGER_CST operand. > > * gcc.dg/pr83844.c: New test. > > --- gcc/stor-layout.c.jj 2018-01-15 22:40:14.009263280 +0100 > +++ gcc/stor-layout.c 2018-01-16 10:01:48.135111031 +0100 > @@ -1150,12 +1150,16 @@ handle_warn_if_not_align (tree field, un > warning (opt_w, "alignment %u of %qT is less than %u", >record_align, context, warn_if_not_align); > > - unsigned HOST_WIDE_INT off > -= (tree_to_uhwi (DECL_FIELD_OFFSET (field)) > - + tree_to_uhwi (DECL_FIELD_BIT_OFFSET (field)) / BITS_PER_UNIT); > - if ((off % warn_if_not_align) != 0) > -warning (opt_w, "%q+D offset %wu in %qT isn't aligned to %u", > - field, off, context, warn_if_not_align); > + tree off = byte_position (field); > + if (!multiple_of_p (TREE_TYPE (off), off, size_int (warn_if_not_align))) > +{ > + if (TREE_CODE (off) == INTEGER_CST) > + warning (opt_w, "%q+D offset %E in %qT isn%'t aligned to %u", > + field, off, context, warn_if_not_align); > + else > + warning (opt_w, "%q+D offset %E in %qT may not be aligned to %u", > + field, off, context, warn_if_not_align); > +} > } > > /* Called from place_field to handle unions. */ > --- gcc/fold-const.c.jj 2018-01-15 10:02:04.119181355 +0100 > +++ gcc/fold-const.c 2018-01-16 10:48:10.444360796 +0100 > @@ -12595,9 +12595,34 @@ multiple_of_p (tree type, const_tree top >a multiple of BOTTOM then TOP is a multiple of BOTTOM. 
*/ >if (!integer_pow2p (bottom)) > return 0; > - /* FALLTHRU */ > + return (multiple_of_p (type, TREE_OPERAND (top, 1), bottom) > + || multiple_of_p (type, TREE_OPERAND (top, 0), bottom)); > > case MULT_EXPR: > + if (TREE_CODE (bottom) == INTEGER_CST) > + { > + op1 = TREE_OPERAND (top, 0); > + op2 = TREE_OPERAND (top, 1); > + if (TREE_CODE (op1) == INTEGER_CST) > + std::swap (op1, op2); > + if (TREE_CODE (op2) == INTEGER_CST) > + { > + if (multiple_of_p (type, op2, bottom)) > + return 1; > + /* Handle multiple_of_p ((x * 2 + 2) * 4, 8). */ > + if (multiple_of_p (type, bottom, op2)) > + { > + widest_int w = wi::sdiv_trunc (wi::to_widest (bottom), > + wi::to_widest (op2)); > + if (wi::fits_to_tree_p (w, TREE_TYPE (bottom))) > + { > + op2 = wide_int_to_tree (TREE_TYPE (bottom), w); > + return multiple_of_p (type, op1, op2); > + } > + } It doesn't really matter since this isn't performance-critical code, but FWIW, there's a wi::multiple_of_p that would avoid the recursion and do the sdiv_trunc as a side-effect. Thanks, Richard
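The MULT_EXPR reasoning in the quoted patch can be sketched on plain integers. This is only an illustration under stated assumptions: provably_multiple is a hypothetical helper, not GCC's multiple_of_p, and it works on concrete values where GCC works on trees. Like the original, it answers false when it cannot prove the property, which is why the warning wording for the non-constant case was softened to "may not be aligned".

```cpp
#include <cassert>
#include <cstdint>

// Conservative check that x * c is a multiple of bottom, using only the
// two rules discussed for MULT_EXPR with a constant operand c:
//   1. c itself is a multiple of bottom, or
//   2. bottom is a multiple of c and x is a multiple of bottom / c.
// Returns false when neither rule applies, even if the product happens
// to be a multiple ("don't know" is reported as false).
bool provably_multiple(std::int64_t x, std::int64_t c, std::int64_t bottom) {
  assert(c > 0 && bottom > 0);
  if (c % bottom == 0)
    return true;
  if (bottom % c == 0)
    return x % (bottom / c) == 0;
  return false;
}
```

For the case named in the patch comment, (x * 2 + 2) * 4 against 8, the second rule reduces the question to whether x * 2 + 2 is a multiple of 2. The pair c = 6, bottom = 8 shows the conservative side: with x = 4 the product 24 is a multiple of 8, but neither rule can prove it.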
Re: [PATCH][WWWDOCS][AArch64][ARM] Update GCC 8 release notes
Hi Tamar, On 16/01/18 10:04, Tamar Christina wrote: Hi All, This patch updates the GCC 8 release notes for ARM and AArch64. Ok for cvs? Thanks, Tamar -- + + +New Armv8.4-A FP16 Floating Point Multiplication Variant instructions have been added. These instructions are +mandatory in Armv8.4-A but available as an optional extension to Armv8.2-A and Armv8.3-A. The new extension +can be used by specifying the +fp16fml architectural extension on Armv8.2-A and Armv8.3-A. On Armv8.4-A +the instructions can be enabled by specifying +fp16. + + + Support has been added for the following processors + (GCC identifiers in parentheses): + +Arm Cortex-A75 (cortex-a75). +Arm Cortex-A55 (cortex-a55). +Arm Cortex-A55/Cortex-A75 DynamIQ big.LITTLE (cortex-a75.cortex-a55). +Arm Cortex-R52 for Armv8-R (cortex-r52). + + The GCC identifiers can be used + as arguments to the -mcpu or -mtune options, + for example: -mcpu=cortex-a75 or + -mtune=xgene1 or as arguments to the equivalent target xgene1 was added a few releases ago, better to use one of the new additions from the above list. For example -mtune=cortex-r52. With that nit the arm changes look ok to me. Thanks for compiling this! Kyrill
[PATCH] i386: More use reference of struct ix86_frame to avoid copy
This patch has been used with my Spectre backport for GCC 7 for many weeks and has been checked into GCC 7 branch. Should I revert it on GCC 7 branch or check it into trunk? H.J. --- When there is no need to make a copy of ix86_frame, we can use reference of struct ix86_frame to avoid copy. * config/i386/i386.c (ix86_expand_prologue): Use reference of struct ix86_frame. (ix86_expand_epilogue): Likewise. --- gcc/config/i386/i386.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index bfb31db8752..9eba3ffd5d6 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -13385,7 +13385,6 @@ ix86_expand_prologue (void) { struct machine_function *m = cfun->machine; rtx insn, t; - struct ix86_frame frame; HOST_WIDE_INT allocate; bool int_registers_saved; bool sse_registers_saved; @@ -13413,7 +13412,7 @@ ix86_expand_prologue (void) m->fs.sp_valid = true; m->fs.sp_realigned = false; - frame = m->frame; + struct ix86_frame &frame = cfun->machine->frame; if (!TARGET_64BIT && ix86_function_ms_hook_prologue (current_function_decl)) { @@ -14291,7 +14290,6 @@ ix86_expand_epilogue (int style) { struct machine_function *m = cfun->machine; struct machine_frame_state frame_state_save = m->fs; - struct ix86_frame frame; bool restore_regs_via_mov; bool using_drap; bool restore_stub_is_tail = false; @@ -14304,7 +14302,7 @@ ix86_expand_epilogue (int style) } ix86_finalize_stack_frame_flags (); - frame = m->frame; + struct ix86_frame &frame = cfun->machine->frame; m->fs.sp_realigned = stack_realign_fp; m->fs.sp_valid = stack_realign_fp -- 2.14.3
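Replacing a by-value copy with a reference is only equivalent when the underlying object is not modified while the local name is in use. A minimal sketch of that semantic difference follows; Frame, Machine and both functions are hypothetical and do not reproduce anything in i386.c.

```cpp
#include <cassert>

// Hypothetical miniature of a per-function frame record.
struct Frame {
  long stack_size;
};

struct Machine {
  Frame frame;
};

// Takes a snapshot: later updates to m.frame are not visible through it.
long read_via_copy(Machine& m) {
  Frame frame = m.frame;
  m.frame.stack_size += 16;  // update after the snapshot was taken
  return frame.stack_size;   // still the old value
}

// Aliases the original: later updates are visible through the reference.
long read_via_reference(Machine& m) {
  Frame& frame = m.frame;
  m.frame.stack_size += 16;  // update seen through the alias
  return frame.stack_size;   // the new value
}
```

Whether any code between the old assignment and the later uses of frame modifies the stored frame is exactly what has to be checked before such a change is safe.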
Avoid GCC 4.1 build failure in fold-const.c
We had: tree t = fold_vec_perm (type, arg1, arg2, vec_perm_indices (sel, 2, nelts)); where fold_vec_perm takes a const vec_perm_indices &. GCC 4.1 apparently required a public copy constructor: gcc/vec-perm-indices.h:85: error: 'vec_perm_indices::vec_perm_indices(const vec_perm_indices&)' is private gcc/fold-const.c:11410: error: within this context even though no copy should be made here. This patch tries to work around that by constructing the vec_perm_indices separately. Tested on aarch64-linux-gnu. OK to install? Richard 2018-01-16 Richard Sandiford gcc/ * fold-const.c (fold_ternary_loc): Construct the vec_perm_indices in a separate statement. Index: gcc/fold-const.c === --- gcc/fold-const.c2018-01-15 12:38:28.967896418 + +++ gcc/fold-const.c2018-01-16 12:08:10.08501 + @@ -11406,8 +11406,8 @@ fold_ternary_loc (location_t loc, enum t else /* Currently unreachable. */ return NULL_TREE; } - tree t = fold_vec_perm (type, arg1, arg2, - vec_perm_indices (sel, 2, nelts)); + vec_perm_indices indices (sel, 2, nelts); + tree t = fold_vec_perm (type, arg1, arg2, indices); if (t != NULL_TREE) return t; }
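The workaround can be shown on a small standalone class. This is a sketch with hypothetical names (Indices, sum, sum_of), not the real vec_perm_indices. Older C++ rules required an accessible copy constructor when a const reference was bound to a class temporary, even if no copy was made; binding to a named object never has that requirement.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-copyable class, using the pre-C++11 idiom of a
// private, undefined copy constructor and assignment operator.
class Indices {
 public:
  Indices(const int* data, std::size_t n) : data_(data), n_(n) {}
  std::size_t size() const { return n_; }
  int operator[](std::size_t i) const { return data_[i]; }

 private:
  Indices(const Indices&);             // not defined: copying disallowed
  Indices& operator=(const Indices&);  // not defined
  const int* data_;
  std::size_t n_;
};

// Takes the object by const reference, like fold_vec_perm in the patch.
int sum(const Indices& idx) {
  int total = 0;
  for (std::size_t i = 0; i < idx.size(); ++i)
    total += idx[i];
  return total;
}

int sum_of(const int* data, std::size_t n) {
  // Construct a named object first, then pass it. Binding the reference
  // to an lvalue does not involve the copy constructor at all.
  Indices indices(data, n);
  return sum(indices);
}
```

Writing sum(Indices(data, n)) instead is the form that an old compiler may reject for a class like this.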
Re: Avoid GCC 4.1 build failure in fold-const.c
On Tue, Jan 16, 2018 at 12:11:28PM +, Richard Sandiford wrote: > We had: > > tree t = fold_vec_perm (type, arg1, arg2, > vec_perm_indices (sel, 2, nelts)); > > where fold_vec_perm takes a const vec_perm_indices &. GCC 4.1 apparently > required a public copy constructor: > > gcc/vec-perm-indices.h:85: error: 'vec_perm_indices::vec_perm_indices(const > vec_perm_indices&)' is private > gcc/fold-const.c:11410: error: within this context > > even though no copy should be made here. This patch tries to work > around that by constructing the vec_perm_indices separately. > > Tested on aarch64-linux-gnu. OK to install? > > Richard > > > 2018-01-16 Richard Sandiford > > gcc/ > * fold-const.c (fold_ternary_loc): Construct the vec_perm_indices > in a separate statement. Ok, thanks. > Index: gcc/fold-const.c > === > --- gcc/fold-const.c 2018-01-15 12:38:28.967896418 + > +++ gcc/fold-const.c 2018-01-16 12:08:10.08501 + > @@ -11406,8 +11406,8 @@ fold_ternary_loc (location_t loc, enum t > else /* Currently unreachable. */ > return NULL_TREE; > } > - tree t = fold_vec_perm (type, arg1, arg2, > - vec_perm_indices (sel, 2, nelts)); > + vec_perm_indices indices (sel, 2, nelts); > + tree t = fold_vec_perm (type, arg1, arg2, indices); > if (t != NULL_TREE) > return t; > } Jakub
Re: [PATCH v3, rs6000] Add -mspeculate-indirect-jumps option and implement non-speculating bctr / bctrl
Hi! On Tue, Jan 16, 2018 at 09:29:13AM +0100, Richard Biener wrote: > Did you consider simply removing the tablejump/casesi support so > expansion always > expands to a balanced tree? At least if we have any knobs to tune we > should probably > tweak them away from the indirect jump using variants with > -mno-speculate-indirect-jumps, > right? We can generate indirect jumps for other situations so this patch will still be needed. > Performance optimization, so shouldn't block this patch - I just > thought I should probably > mention this. Yeah let's get this done first :-) Segher
[PATCH, committed] Add myself to MAINTAINERS
Hi, Just added myself to MAINTAINERS (write after approval) Best Regards, Sebastian Index: ChangeLog === --- ChangeLog(revision 256737) +++ ChangeLog(working copy) @@ -1,3 +1,7 @@ +2018-01-16 Sebastian Perta + +* MAINTAINERS (write after approval): Add myself. + 2018-01-03 Jakub Jelinek Update copyright years. Index: MAINTAINERS === --- MAINTAINERS(revision 256737) +++ MAINTAINERS(working copy) @@ -535,6 +535,7 @@ Devang Patel Andris Pavenis Fernando Pereira +Sebastian Perta Sebastian Peryt Kaushik Phatak Nicolas Pitre Renesas Electronics Europe Ltd, Dukes Meadow, Millboard Road, Bourne End, Buckinghamshire, SL8 5FH, UK. Registered in England & Wales under Registered No. 04586709.
Re: [PATCH v3, rs6000] Add -mspeculate-indirect-jumps option and implement non-speculating bctr / bctrl
Hi! On Mon, Jan 15, 2018 at 05:09:06PM -0600, Bill Schmidt wrote: > @@ -12933,9 +12974,27 @@ >"" > { >if (TARGET_32BIT) > -emit_jump_insn (gen_tablejumpsi (operands[0], operands[1])); > +{ > + if (rs6000_speculate_indirect_jumps) > + emit_jump_insn (gen_tablejumpsi (operands[0], operands[1])); > + else > + { > + rtx ccreg = gen_reg_rtx (CCmode); > + rtx jump = gen_tablejumpsi_nospec (operands[0], operands[1], ccreg); > + emit_jump_insn (jump); > + } > +} >else > -emit_jump_insn (gen_tablejumpdi (operands[0], operands[1])); > +{ > + if (rs6000_speculate_indirect_jumps) > + emit_jump_insn (gen_tablejumpdi (operands[0], operands[1])); > + else > + { > + rtx ccreg = gen_reg_rtx (CCmode); > + rtx jump = gen_tablejumpdi_nospec (operands[0], operands[1], ccreg); > + emit_jump_insn (jump); > + } > +} >DONE; > }) This is easier to read if you swap the "if"s (put the rs6000_speculate_indirect_jumps test on the outside). Okay for trunk with or without such a change. Also okay for the branches after some testing (esp. on other ABIs, it is easy to break those together with -mno-speculate-indirect-jumps since no one sane would use that combo on purpose). Thanks! Segher
Re: [PATCH] i386: More use reference of struct ix86_frame to avoid copy
On Tue, Jan 16, 2018 at 3:40 AM, H.J. Lu wrote: > This patch has been used with my Spectre backport for GCC 7 for many > weeks and has been checked into GCC 7 branch. Should I revert it on > GCC 7 branch or check it into trunk? Ada build failed with this on trunk: raised STORAGE_ERROR : stack overflow or erroneous memory access make[5]: *** [/export/gnu/import/git/sources/gcc/gcc/ada/Make-generated.in:45: ada/sinfo.h] Error 1 Let me revert it on gcc-7-branch. H.J. > H.J. > --- > When there is no need to make a copy of ix86_frame, we can use reference > of struct ix86_frame to avoid copy. > > * config/i386/i386.c (ix86_expand_prologue): Use reference of > struct ix86_frame. > (ix86_expand_epilogue): Likewise. > --- > gcc/config/i386/i386.c | 6 ++ > 1 file changed, 2 insertions(+), 4 deletions(-) > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index bfb31db8752..9eba3ffd5d6 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -13385,7 +13385,6 @@ ix86_expand_prologue (void) > { >struct machine_function *m = cfun->machine; >rtx insn, t; > - struct ix86_frame frame; >HOST_WIDE_INT allocate; >bool int_registers_saved; >bool sse_registers_saved; > @@ -13413,7 +13412,7 @@ ix86_expand_prologue (void) >m->fs.sp_valid = true; >m->fs.sp_realigned = false; > > - frame = m->frame; > + struct ix86_frame &frame = cfun->machine->frame; > >if (!TARGET_64BIT && ix86_function_ms_hook_prologue > (current_function_decl)) > { > @@ -14291,7 +14290,6 @@ ix86_expand_epilogue (int style) > { >struct machine_function *m = cfun->machine; >struct machine_frame_state frame_state_save = m->fs; > - struct ix86_frame frame; >bool restore_regs_via_mov; >bool using_drap; >bool restore_stub_is_tail = false; > @@ -14304,7 +14302,7 @@ ix86_expand_epilogue (int style) > } > >ix86_finalize_stack_frame_flags (); > - frame = m->frame; > + struct ix86_frame &frame = cfun->machine->frame; > >m->fs.sp_realigned = stack_realign_fp; >m->fs.sp_valid = stack_realign_fp > -- 
> 2.14.3 > -- H.J.
[PATCH] PR libstdc++/83834 replace wildcard pattern in linker script
The soon-to-be-released binutils 2.30 makes a small change to how lambda functions are demangled, which causes some unwanted symbols to match a wildcard pattern in the GLIBCXX_3.4 version node of our linker script. The only symbol that is supposed to match the pattern is std::cerr so we should just name that explicitly. That prevents other new symbols matching and being added to the old version. See PR 83893 for the general problem, which we should fix later. PR libstdc++/83834 * config/abi/pre/gnu.ver (GLIBCXX_3.4): Replace std::c[a-g]* wildcard pattern with exact match for std::cerr. Tested powerpc64le-linux with binutils 2.25.1-32.base.el7_4.1 and on x86_64-linux with a binutils-2.3.0.0 snapshot from 2018-01-13. Committed to trunk, backports to follow. commit f8896e7451cd61008e0ceb0ac9a770d5cb77d85b Author: Jonathan Wakely Date: Tue Jan 16 12:01:36 2018 + PR libstdc++/83834 replace wildcard pattern in linker script PR libstdc++/83834 * config/abi/pre/gnu.ver (GLIBCXX_3.4): Replace std::c[a-g]* wildcard pattern with exact match for std::cerr. diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver index 774bedec9bc..5e66dc5cc3f 100644 --- a/libstdc++-v3/config/abi/pre/gnu.ver +++ b/libstdc++-v3/config/abi/pre/gnu.ver @@ -60,7 +60,7 @@ GLIBCXX_3.4 { std::basic_[t-z]*; std::ba[t-z]*; std::b[b-z]*; - std::c[a-g]*; + std::cerr; # std::char_traits; # std::c[i-z]*; std::c[i-n]*;
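Why the wildcard is riskier than the exact name can be illustrated with glob matching. The sketch below uses POSIX fnmatch as an approximation; the linker's own matcher may differ in details, and "std::call_once_example" is a made-up name standing in for any newly demangled symbol that happens to begin with std::c followed by a letter in a-g.

```cpp
#include <cassert>
#include <fnmatch.h>

// Returns true when name matches the glob pattern (POSIX fnmatch rules).
bool matches(const char* pattern, const char* name) {
  return fnmatch(pattern, name, 0) == 0;
}
```

Under these assumptions the old pattern std::c[a-g]* accepts both std::cerr and the made-up name, while the exact pattern std::cerr accepts only std::cerr.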
Re: GCC 8.0.0 Status Report (2018-01-15), Trunk in Regression and Documentation fixes only mode
On Mon, Jan 15, 2018 at 09:21:07AM +0100, Richard Biener wrote: > We're still in pretty bad shape regression-wise. Please also take > the opportunity to check the state of your favorite host/target > combination to make sure building and testing works appropriately. I tested building Linux (the kernel) for all supported architectures. Everything builds (with my usual tweaks, link with libgcc etc.); except x86_64 and sh have more problems in the kernel, and mips has an ICE. I'll open a PR for that one. Segher
Re: [PATCH] Fix store-merging for ~ of bswap (PR tree-optimization/83843)
On 15 January 2018 at 22:44, Jakub Jelinek wrote: > Hi! > > When using the bswap pass infrastructure, BIT_NOT_EXPRs aren't allowed in > the middle, but due to the way process_store handles those it can appear > around the value, which is something output_merged_store didn't handle. > > Fixed thusly, where we handle not just the case when the bswap (or nop) > value needs inversion as whole, but also cases where only a few portions of > it need xoring with some mask. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2018-01-15 Jakub Jelinek > > PR tree-optimization/83843 > * gimple-ssa-store-merging.c > (imm_store_chain_info::output_merged_store): Handle bit_not_p on > store_immediate_info for bswap/nop orig_stores. > > * gcc.dg/store_merging_18.c: New test. > Hi Jakub, I've noticed that this new test fails on arm, eg: arm-none-linux-gnueabihf --with-mode arm --with-cpu cortex-a9 --with-fpu neon-fp16 FAIL: gcc.dg/store_merging_18.c scan-tree-dump-times store-merging "Merging successful" 3 (found 0 times) Do you want me to file a PR? 
Christophe > --- gcc/gimple-ssa-store-merging.c.jj 2018-01-04 00:43:17.629703230 +0100 > +++ gcc/gimple-ssa-store-merging.c 2018-01-15 12:29:14.105789381 +0100 > @@ -3619,6 +3619,15 @@ imm_store_chain_info::output_merged_stor > gimple_seq_add_stmt_without_update (&seq, stmt); > src = gimple_assign_lhs (stmt); > } > + inv_op = invert_op (split_store, 2, int_type, xor_mask); > + if (inv_op != NOP_EXPR) > + { > + stmt = gimple_build_assign (make_ssa_name (int_type), > + inv_op, src, xor_mask); > + gimple_set_location (stmt, loc); > + gimple_seq_add_stmt_without_update (&seq, stmt); > + src = gimple_assign_lhs (stmt); > + } > break; > default: > src = ops[0]; > --- gcc/testsuite/gcc.dg/store_merging_18.c.jj 2018-01-15 12:43:49.607227365 > +0100 > +++ gcc/testsuite/gcc.dg/store_merging_18.c 2018-01-15 12:43:24.882245004 > +0100 > @@ -0,0 +1,51 @@ > +/* PR tree-optimization/83843 */ > +/* { dg-do run } */ > +/* { dg-options "-O2 -fdump-tree-store-merging" } */ > +/* { dg-final { scan-tree-dump-times "Merging successful" 3 "store-merging" > { target store_merge } } } */ > + > +__attribute__((noipa)) void > +foo (unsigned char *buf, unsigned char *tab) > +{ > + unsigned v = tab[1] ^ (tab[0] << 8); > + buf[0] = ~(v >> 8); > + buf[1] = ~v; > +} > + > +__attribute__((noipa)) void > +bar (unsigned char *buf, unsigned char *tab) > +{ > + unsigned v = tab[1] ^ (tab[0] << 8); > + buf[0] = (v >> 8); > + buf[1] = ~v; > +} > + > +__attribute__((noipa)) void > +baz (unsigned char *buf, unsigned char *tab) > +{ > + unsigned v = tab[1] ^ (tab[0] << 8); > + buf[0] = ~(v >> 8); > + buf[1] = v; > +} > + > +int > +main () > +{ > + volatile unsigned char l1 = 0; > + volatile unsigned char l2 = 1; > + unsigned char buf[2]; > + unsigned char tab[2] = { l1 + 1, l2 * 2 }; > + foo (buf, tab); > + if (buf[0] != (unsigned char) ~1 || buf[1] != (unsigned char) ~2) > +__builtin_abort (); > + buf[0] = l1 + 7; > + buf[1] = l2 * 8; > + bar (buf, tab); > + if (buf[0] != 1 || buf[1] != (unsigned char) 
~2) > +__builtin_abort (); > + buf[0] = l1 + 9; > + buf[1] = l2 * 10; > + baz (buf, tab); > + if (buf[0] != (unsigned char) ~1 || buf[1] != 2) > +__builtin_abort (); > + return 0; > +} > > Jakub
Re: [C++ Patch] PR 81054 ("[7/8 Regression] ICE with volatile variable in constexpr function")
.. never mind, this requires more work: my simple patchlet would cause a few regressions in the libstdc++-v3 testsuite (the assert at the beginning of finish_expr_stmt triggers). Paolo.
Two fixes for live-out SLP inductions (PR 83857)
vect_analyze_loop_operations was calling vectorizable_live_operation for all live-out phis, which led to a bogus ncopies calculation in the pure SLP case. I think v_a_l_o should only be passing phis that are vectorised using normal loop vectorisation, since vect_slp_analyze_node_operations handles the SLP side (and knows the correct slp_index and slp_node arguments to pass in, via vect_analyze_stmt). With that fixed we hit an older bug that vectorizable_live_operation didn't handle live-out SLP inductions. Fixed by using gimple_phi_result rather than gimple_get_lhs for phis. Tested on aarch64-linux-gnu. OK to install? Richard 2018-01-16 Richard Sandiford gcc/ PR tree-optimization/83857 * tree-vect-loop.c (vect_analyze_loop_operations): Don't call vectorizable_live_operation for pure SLP statements. (vectorizable_live_operation): Handle PHIs. gcc/testsuite/ PR tree-optimization/83857 * gcc.dg/vect/pr83857.c: New test. Index: gcc/tree-vect-loop.c === --- gcc/tree-vect-loop.c 2018-01-13 18:02:00.950360196 + +++ gcc/tree-vect-loop.c 2018-01-16 13:24:33.022528019 + @@ -1851,7 +1851,10 @@ vect_analyze_loop_operations (loop_vec_i ok = vectorizable_reduction (phi, NULL, NULL, NULL, NULL); } - if (ok && STMT_VINFO_LIVE_P (stmt_info)) + /* SLP PHIs are tested by vect_slp_analyze_node_operations. */ + if (ok + && STMT_VINFO_LIVE_P (stmt_info) + && !PURE_SLP_STMT (stmt_info)) ok = vectorizable_live_operation (phi, NULL, NULL, -1, NULL); if (!ok) @@ -8217,7 +8220,11 @@ vectorizable_live_operation (gimple *stm gcc_assert (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)); /* Get the correct slp vectorized stmt. */ - vec_lhs = gimple_get_lhs (SLP_TREE_VEC_STMTS (slp_node)[vec_entry]); + gimple *vec_stmt = SLP_TREE_VEC_STMTS (slp_node)[vec_entry]; + if (gphi *phi = dyn_cast <gphi *> (vec_stmt)) + vec_lhs = gimple_phi_result (phi); + else + vec_lhs = gimple_get_lhs (vec_stmt); /* Get entry to use. 
*/ bitstart = bitsize_int (vec_index); Index: gcc/testsuite/gcc.dg/vect/pr83857.c === --- /dev/null 2018-01-15 18:48:25.844002736 + +++ gcc/testsuite/gcc.dg/vect/pr83857.c 2018-01-16 13:24:33.021528058 + @@ -0,0 +1,30 @@ +/* { dg-do run } */ +/* { dg-additional-options "-ffast-math" } */ + +#define N 100 + +double __attribute__ ((noinline, noclone)) +f (double *x, double y) +{ + double a = 0; + for (int i = 0; i < N; ++i) +{ + a += y; + x[i * 2] += a; + x[i * 2 + 1] += a; +} + return a - y; +} + +double x[N * 2]; + +int +main (void) +{ + if (f (x, 5) != (N - 1) * 5) +__builtin_abort (); + return 0; +} + +/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" { target vect_double } } } */ +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target vect_double } } } */
Re: [PATCH] rtlanal: dead_or_set_regno_p should handle CLOBBER (PR83424)
On Mon, Dec 18, 2017 at 12:16:13PM -0700, Jeff Law wrote: > On 12/16/2017 02:03 PM, Segher Boessenkool wrote: > > In PR83424 combine's move_deaths puts a REG_DEAD note in the wrong place > > because dead_or_set_regno_p does not account for CLOBBER insns. This > > fixes it. > > > > Bootstrapped and tested on powerpc64-linux {-m32,-m64} and on x86_64-linux. > > Is this okay for trunk? > > > > > > Segher > > > > > > 2017-12-16 Segher Boessenkool > > > > PR rtl-optimization/83424 > > * rtlanal.c (dead_or_set_regno_p): Handle CLOBBER just like SET. > > > > gcc/testsuite/ > > PR rtl-optimization/83424 > > * gcc.dg/pr83424.c: New testsuite. > OK. Is this okay for backports to 7 and 6, too? Segher
Re: [PATCH][WWWDOCS][AArch64][ARM] Update GCC 8 release notes
Hi Kyrill, > > xgene1 was added a few releases ago, better to use one of the new additions > from the above list. > For example -mtune=cortex-r52. Thanks, I have updated the patch. I'll wait for an ok from an AArch64 maintainer and a Docs maintainer. > > With that nit the arm changes look ok to me. > Thanks for compiling this! > Kyrill > Cheers, Tamar -- Index: htdocs/gcc-8/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-8/changes.html,v retrieving revision 1.26 diff -u -r1.26 changes.html --- htdocs/gcc-8/changes.html 11 Jan 2018 09:31:53 - 1.26 +++ htdocs/gcc-8/changes.html 16 Jan 2018 14:12:57 - @@ -147,7 +147,51 @@ AArch64 - + +The Armv8.4-A architecture is now supported. It can be used by +specifying the -march=armv8.4-a option. + + +The Dot Product instructions are now supported as an optional extension to the +Armv8.2-A architecture and newer and are mandatory on Armv8.4-A. The extension can be used by +specifying the +dotprod architecture extension. E.g. -march=armv8.2-a+dotprod. + + +The Armv8-A +crypto extension has now been split into two extensions for finer grained control: + + +aes which contains the Armv8-A AES crytographic instructions. + +sha2 which contains the Armv8-A SHA2 and SHA1 cryptographic instructions. + +Using +crypto will now enable these two extensions. + + +New Armv8.4-A FP16 Floating Point Multiplication Variant instructions have been added. These instructions are +mandatory in Armv8.4-A but available as an optional extension to Armv8.2-A and Armv8.3-A. The new extension +can be used by specifying the +fp16fml architectural extension on Armv8.2-A and Armv8.3-A. On Armv8.4-A +the instructions can be enabled by specifying +fp16. + + +New cryptographic instructions have been added as optional extensions to Armv8.2-A and newer. These instructions can +be enabled with: + + +sha3 New SHA3 and SHA2 instructions from Armv8.4-A. This implies +sha2. + +sm4 New SM3 and SM4 instructions from Armv8.4-A. 
+ + + + Support has been added for the following processors + (GCC identifiers in parentheses): + + Arm Cortex-A75 (cortex-a75). + Arm Cortex-A55 (cortex-a55). + Arm Cortex-A55/Cortex-A75 DynamIQ big.LITTLE (cortex-a75.cortex-a55). + + The GCC identifiers can be used + as arguments to the -mcpu or -mtune options, + for example: -mcpu=cortex-a75 or + -mtune=thunderx2t99p1 or as arguments to the equivalent target + attributes and pragmas. + ARM @@ -169,14 +213,58 @@ removed in a future release. -The default link behavior for ARMv6 and ARMv7-R targets has been +The default link behavior for Armv6 and Armv7-R targets has been changed to produce BE8 format when generating big-endian images. A new flag -mbe32 can be used to force the linker to produce legacy BE32 format images. There is no change of behavior for -ARMv6-m and other ARMv7 or later targets: these already defaulted +Armv6-M and other Armv7 or later targets: these already defaulted to BE8 format. This change brings GCC into alignment with other compilers for the ARM architecture. + +The Armv8-R architecture is now supported. It can be used by specifying the +-march=armv8-r option. + + +The Armv8.3-A architecture is now supported. It can be used by +specifying the -march=armv8.3-a option. + + +The Armv8.4-A architecture is now supported. It can be used by +specifying the -march=armv8.4-a option. + + + The Dot Product instructions are now supported as an optional extension to the + Armv8.2-A architecture and newer and are mandatory on Armv8.4-A. The extension can be used by + specifying the +dotprod architecture extension. E.g. -march=armv8.2-a+dotprod. + + + +Support for setting extensions and architectures using the GCC target pragma and attribute has been added. +It can be used by specifying #pragma GCC target ("arch=..."), #pragma GCC target ("+extension"), +__attribute__((target("arch=..."))) or __attribute__((target("+extension"))). 
+ + +New Armv8.4-A FP16 Floating Point Multiplication Variant instructions have been added. These instructions are +mandatory in Armv8.4-A but available as an optional extension to Armv8.2-A and Armv8.3-A. The new extension +can be used by specifying the +fp16fml architectural extension on Armv8.2-A and Armv8.3-A. On Armv8.4-A +the instructions can be enabled by specifying +fp16. + + + Support has been added for the following processors + (GCC identifiers in parentheses): + + Arm Cortex-A75 (cortex-a75). + Arm Cortex-A55 (cortex-a55). + Arm Cortex-A55/Cortex-A75 DynamIQ big.LITTLE (cortex-a75.cortex-a55). + Arm Cortex-R52 for Armv8-R (cortex
Re: [PATCH v3, rs6000] Add -mspeculate-indirect-jumps option and implement non-speculating bctr / bctrl
On Jan 16, 2018, at 6:13 AM, Segher Boessenkool wrote: > > Hi! > > On Tue, Jan 16, 2018 at 09:29:13AM +0100, Richard Biener wrote: >> Did you consider simply removing the tablejump/casesi support so >> expansion always >> expands to a balanced tree? At least if we have any knobs to tune we >> should probably >> tweak them away from the indirect jump using variants with >> -mno-speculate-indirect-jumps, >> right? > > We can generate indirect jumps for other situations so this patch will > still be needed. Also, I'm not convinced that a balanced tree for a large jump table is a slam dunk better performer than this (adding hundreds of poorly predictable branches that can clog up hardware predictors for, say, an interpreter loop). I'd want to do some performance testing to look for crossover points (as you say, tuning knobs). But for smaller tables this is a good idea. Thanks, Bill > >> Performance optimization, so shouldn't block this patch - I just >> thought I should probably >> mention this. > > Yeah let's get this done first :-) > > > Segher >
Re: Move pa.h FUNCTION_ARG_SIZE to pa.c (PR83858)
On 2018-01-16 5:52 AM, Richard Sandiford wrote: 2018-01-16 Richard Sandiford gcc/ PR target/83858 * config/pa/pa.h (FUNCTION_ARG_SIZE): Delete. * config/pa/pa-protos.h (pa_function_arg_size): Declare. * config/pa/som.h (ASM_DECLARE_FUNCTION_NAME): Use pa_function_arg_size instead of FUNCTION_ARG_SIZE. * config/pa/pa.c (pa_function_arg_advance): Likewise. (pa_function_arg, pa_arg_partial_bytes): Likewise. (pa_function_arg_size): New function. Thanks Richard. I started a build yesterday evening with essentially the same change. Two little nits. I believe a declaration for pa_function_arg_size needs to be added to pa-protos.h. Secondly, the comment for pa_function_arg_size needs to be updated to say "function" instead of "macro". Otherwise, the change is okay. I want to see if ASM_DECLARE_FUNCTION_NAME can be turned into a function in pa.c as well. This would allow pa_function_arg_size to be static. Dave -- John David Anglin dave.ang...@bell.net
Re: Move pa.h FUNCTION_ARG_SIZE to pa.c (PR83858)
John David Anglin writes: > On 2018-01-16 5:52 AM, Richard Sandiford wrote: >> 2018-01-16 Richard Sandiford >> >> gcc/ >> PR target/83858 >> * config/pa/pa.h (FUNCTION_ARG_SIZE): Delete. >> * config/pa/pa-protos.h (pa_function_arg_size): Declare. >> * config/pa/som.h (ASM_DECLARE_FUNCTION_NAME): Use >> pa_function_arg_size instead of FUNCTION_ARG_SIZE. >> * config/pa/pa.c (pa_function_arg_advance): Likewise. >> (pa_function_arg, pa_arg_partial_bytes): Likewise. >> (pa_function_arg_size): New function. > Thanks Richard. I started a build yesterday evening with essentially > the same change. > > Two little nits. I believe a declaration for pa_function_arg_size needs > to be added to pa-protos.h. The patch did have this. > Secondly, the comment for pa_function_arg_size needs to be updated to > say "function" instead of "macro". Otherwise, the change is okay. Oops, yes. Installed with that change, thanks. Richard
[PATCH] Fix gimplify_one_sizepos (PR libgomp/83590, take 4)
Hi! After lengthy IRC discussions, here is an updated patch, which should also fix the problem that variably_modified_type_p on a REAL_TYPE returns true even when it has constant maximum and minimum. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2018-01-16 Jakub Jelinek Richard Biener PR libgomp/83590 * gimplify.c (gimplify_one_sizepos): For is_gimple_constant (expr) return early, inline manually is_gimple_sizepos. Make sure if we call gimplify_expr we don't end up with a gimple constant. * tree.c (variably_modified_type_p): Don't return true for is_gimple_constant (_t). Inline manually is_gimple_sizepos. * gimplify.h (is_gimple_sizepos): Remove. --- gcc/gimplify.c.jj 2018-01-12 16:38:50.705238254 +0100 +++ gcc/gimplify.c 2018-01-16 12:21:15.895859416 +0100 @@ -12562,7 +12562,10 @@ gimplify_one_sizepos (tree *expr_p, gimp a VAR_DECL. If it's a VAR_DECL from another function, the gimplifier will want to replace it with a new variable, but that will cause problems if this type is from outside the function. It's OK to have that here. */ - if (is_gimple_sizepos (expr)) + if (expr == NULL_TREE + || is_gimple_constant (expr) + || TREE_CODE (expr) == VAR_DECL + || CONTAINS_PLACEHOLDER_P (expr)) return; *expr_p = unshare_expr (expr); @@ -12570,6 +12573,12 @@ gimplify_one_sizepos (tree *expr_p, gimp /* SSA names in decl/type fields are a bad idea - they'll get reclaimed if the def vanishes. */ gimplify_expr (expr_p, stmt_p, NULL, is_gimple_val, fb_rvalue, false); + + /* If expr wasn't already is_gimple_sizepos or is_gimple_constant from the + FE, ensure that it is a VAR_DECL, otherwise we might handle some decls + as gimplify_vla_decl even when they would have all sizes INTEGER_CSTs. 
*/ + if (is_gimple_constant (*expr_p)) +*expr_p = get_initialized_tmp_var (*expr_p, stmt_p, NULL, false); } /* Gimplify the body of statements of FNDECL and return a GIMPLE_BIND node --- gcc/tree.c.jj 2018-01-15 10:01:40.830186474 +0100 +++ gcc/tree.c 2018-01-16 12:24:11.254821615 +0100 @@ -8825,11 +8825,12 @@ variably_modified_type_p (tree type, tre do { tree _t = (T); \ if (_t != NULL_TREE \ && _t != error_mark_node\ - && TREE_CODE (_t) != INTEGER_CST\ + && !CONSTANT_CLASS_P (_t) \ && TREE_CODE (_t) != PLACEHOLDER_EXPR \ && (!fn \ || (!TYPE_SIZES_GIMPLIFIED (type) \ - && !is_gimple_sizepos (_t)) \ + && (TREE_CODE (_t) != VAR_DECL \ + && !CONTAINS_PLACEHOLDER_P (_t))) \ || walk_tree (&_t, find_var_from_fn, fn, NULL)))\ return true; } while (0) --- gcc/gimplify.h.jj 2018-01-03 10:19:53.757533721 +0100 +++ gcc/gimplify.h 2018-01-16 12:24:51.995812831 +0100 @@ -85,23 +85,4 @@ extern enum gimplify_status gimplify_va_ gimple_seq *); gimple *gimplify_assign (tree, tree, gimple_seq *); -/* Return true if gimplify_one_sizepos doesn't need to gimplify - expr (when in TYPE_SIZE{,_UNIT} and similar type/decl size/bitsize - fields). */ - -static inline bool -is_gimple_sizepos (tree expr) -{ - /* gimplify_one_sizepos doesn't need to do anything if the value isn't there, - is constant, or contains A PLACEHOLDER_EXPR. We also don't want to do - anything if it's already a VAR_DECL. If it's a VAR_DECL from another - function, the gimplifier will want to replace it with a new variable, - but that will cause problems if this type is from outside the function. - It's OK to have that here. */ - return (expr == NULL_TREE - || TREE_CODE (expr) == INTEGER_CST - || TREE_CODE (expr) == VAR_DECL - || CONTAINS_PLACEHOLDER_P (expr)); -} - #endif /* GCC_GIMPLIFY_H */ Jakub
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
Hi Richard, >>> Backport is blocked by >>> >>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83838 >>> >>> There are many test failures due to lack of comdat support in linker on >>> Solaris. actually this is lack of hidden .gnu.linkonce support right now. Currently that's disabled for all but gld; I'm looking to make that dynamic on newer versions of Solaris 11. >>> I can limit these tests to Linux. >> >> These are testcase issues and shouldn't block backport to GCC 7. > > It makes the option using thunks unusable though, right? Can you simply make > them hidden on systems without comdat support? That duplicates them per TU > but at least the feature works. Or those systems should provide the thunks > via > libgcc. > > I agree we can followup with a fix for Solaris given lack of a public > testing machine. I do have both an x86 and sparc machine running Solaris 11 around to serve as testing machines. Still checking with legal how best to handle external access, either locally or integrated into the compile farm. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
Hi Jan, >> It makes the option using thunks unusable though, right? Can you simply make >> them hidden on systems without comdat support? That duplicates them per TU >> but at least the feature works. Or those systems should provide the >> thunks via >> libgcc. >> >> I agree we can followup with a fix for Solaris given lack of a public >> testing machine. > > My memory is bit dim, but I am convinced I was fixing specific errors for > comdats > on Solaris, so I think the toolchain supports them in some sort, just is more > restrictive/different from GNU implementation. comdat does work just fine in Solaris 11, but the Solaris 10 linker has problems with what gcc generates. > Indeed, i think just producing sorry, unimplemented message is what we should > do > if we can't support retpoline on given target. Certainly, coupled with an appropriate effective-target keyword to limit testcases appropriately. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: Move pa.h FUNCTION_ARG_SIZE to pa.c (PR83858)
On 2018-01-16 9:48 AM, Richard Sandiford wrote: Oops, yes. Installed with that change, thanks. Oops, I just realized the CEIL function needs to be applied to the GET_MODE_SIZE return as well... Dave -- John David Anglin dave.ang...@bell.net
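Dave's remark about CEIL concerns rounding a byte size up to a whole number of words. A minimal sketch of that arithmetic, assuming a ceiling-division macro of the usual form; `WORD_BYTES`, `arg_size_in_words` and `arg_size_self_check` are hypothetical names for illustration and are not the actual pa.c code:

```c
#include <assert.h>

/* Ceiling division of the usual form; assumed here for illustration.  */
#define CEIL(x, y) (((x) + (y) - 1) / (y))

/* Hypothetical word size in bytes; the real value is target-dependent.  */
#define WORD_BYTES 4

/* Return the number of words needed to hold SIZE_IN_BYTES bytes.
   Plain division would round 6 bytes down to 1 word; CEIL gives 2.  */
static long
arg_size_in_words (long size_in_bytes)
{
  return CEIL (size_in_bytes, WORD_BYTES);
}

/* Small self-check that a caller may run from main.  */
static void
arg_size_self_check (void)
{
  assert (arg_size_in_words (4) == 1);
  assert (arg_size_in_words (6) == 2);
}
```

The point of the sketch is only that the rounding has to be applied to whichever size is chosen, whether it comes from the mode or from the type.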
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
Hi Richard, > I'm quite sure Solaris supports comdats, after all it invented ELF ;) true: gcc/configure.ac has # Sun ld has COMDAT group support since Solaris 9, but it doesn't # interoperate with GNU as until Solaris 11 build 130, i.e. ld # version 1.688. # # If using Sun as for COMDAT group as emitted by GCC, one needs at # least ld version 1.2267. > I've also seen > comdats in debugging early LTO issues. We might run into Solaris as > issues though. The Solaris code has been taught to deal with that, so it should hopefully be hidden from the rest of the compiler. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] i386: More use reference of struct ix86_frame to avoid copy
On 01/16/2018 01:35 PM, H.J. Lu wrote: > On Tue, Jan 16, 2018 at 3:40 AM, H.J. Lu wrote: >> This patch has been used with my Spectre backport for GCC 7 for many >> weeks and has been checked into GCC 7 branch. Should I revert it on >> GCC 7 branch or check it into trunk? > > Ada build failed with this on trunk: > > raised STORAGE_ERROR : stack overflow or erroneous memory access > make[5]: *** [/export/gnu/import/git/sources/gcc/gcc/ada/Make-generated.in:45: > ada/sinfo.h] Error 1 Hello. I know that you've already reverted the change, but it's possible to replace struct ix86_frame &frame = cfun->machine->frame; with: struct ix86_frame *frame = &cfun->machine->frame; And replace usages with the pointer access operator (->). That would also avoid copying. One more question. After you switched to references, isn't the behavior of function ix86_expand_epilogue changed, as it also contains a write to the frame struct like: /* Special care must be taken for the normal return case of a function using eh_return: the eax and edx registers are marked as saved, but not restored along this path. Adjust the save location to match. */ if (crtl->calls_eh_return && style != 2) frame.reg_save_offset -= 2 * UNITS_PER_WORD; Thanks for clarification. Martin > > Let me revert it on gcc-7-branch. > > H.J. >> H.J. >> --- >> When there is no need to make a copy of ix86_frame, we can use reference >> of struct ix86_frame to avoid copy. >> >> * config/i386/i386.c (ix86_expand_prologue): Use reference of >> struct ix86_frame. >> (ix86_expand_epilogue): Likewise. 
>> --- >> gcc/config/i386/i386.c | 6 ++ >> 1 file changed, 2 insertions(+), 4 deletions(-) >> >> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c >> index bfb31db8752..9eba3ffd5d6 100644 >> --- a/gcc/config/i386/i386.c >> +++ b/gcc/config/i386/i386.c >> @@ -13385,7 +13385,6 @@ ix86_expand_prologue (void) >> { >>struct machine_function *m = cfun->machine; >>rtx insn, t; >> - struct ix86_frame frame; >>HOST_WIDE_INT allocate; >>bool int_registers_saved; >>bool sse_registers_saved; >> @@ -13413,7 +13412,7 @@ ix86_expand_prologue (void) >>m->fs.sp_valid = true; >>m->fs.sp_realigned = false; >> >> - frame = m->frame; >> + struct ix86_frame &frame = cfun->machine->frame; >> >>if (!TARGET_64BIT && ix86_function_ms_hook_prologue >> (current_function_decl)) >> { >> @@ -14291,7 +14290,6 @@ ix86_expand_epilogue (int style) >> { >>struct machine_function *m = cfun->machine; >>struct machine_frame_state frame_state_save = m->fs; >> - struct ix86_frame frame; >>bool restore_regs_via_mov; >>bool using_drap; >>bool restore_stub_is_tail = false; >> @@ -14304,7 +14302,7 @@ ix86_expand_epilogue (int style) >> } >> >>ix86_finalize_stack_frame_flags (); >> - frame = m->frame; >> + struct ix86_frame &frame = cfun->machine->frame; >> >>m->fs.sp_realigned = stack_realign_fp; >>m->fs.sp_valid = stack_realign_fp >> -- >> 2.14.3 >> > > >
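Martin's question about ix86_expand_epilogue turns on the difference between adjusting a local copy and adjusting the shared object. A minimal sketch in plain C, using a pointer where the patch uses a C++ reference; `struct frame_info` and both helper names are hypothetical and model only the one field mentioned in the message:

```c
#include <assert.h>

/* Hypothetical stand-in for the frame layout; only one field is modelled.  */
struct frame_info
{
  long reg_save_offset;
};

/* Adjust a private copy: the caller's struct is left untouched.  */
static long
adjust_via_copy (const struct frame_info *shared, long delta)
{
  struct frame_info frame = *shared;	/* copy */
  frame.reg_save_offset -= delta;
  return frame.reg_save_offset;
}

/* Adjust through a pointer: the write lands in the caller's struct.  */
static long
adjust_via_pointer (struct frame_info *shared, long delta)
{
  struct frame_info *frame = shared;	/* alias, no copy */
  frame->reg_save_offset -= delta;
  return frame->reg_save_offset;
}
```

Both helpers return the same adjusted value, but only the second one changes the shared struct, so any later reader of that struct sees a different offset. Whether this is what caused the Ada build failure is not established in the thread.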
Re: [PATCH] Fix gimplify_one_sizepos (PR libgomp/83590, take 4)
On Tue, 16 Jan 2018, Jakub Jelinek wrote: > Hi! > > After lengthy IRC discussions, here is an updated patch, which should also > fix the problem that variably_modified_type_p on a REAL_TYPE returns true > even when it has constant maximum and minimum. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok. Richard. > 2018-01-16 Jakub Jelinek > Richard Biener > > PR libgomp/83590 > * gimplify.c (gimplify_one_sizepos): For is_gimple_constant (expr) > return early, inline manually is_gimple_sizepos. Make sure if we > call gimplify_expr we don't end up with a gimple constant. > * tree.c (variably_modified_type_p): Don't return true for > is_gimple_constant (_t). Inline manually is_gimple_sizepos. > * gimplify.h (is_gimple_sizepos): Remove. > > --- gcc/gimplify.c.jj 2018-01-12 16:38:50.705238254 +0100 > +++ gcc/gimplify.c2018-01-16 12:21:15.895859416 +0100 > @@ -12562,7 +12562,10 @@ gimplify_one_sizepos (tree *expr_p, gimp > a VAR_DECL. If it's a VAR_DECL from another function, the gimplifier > will want to replace it with a new variable, but that will cause > problems > if this type is from outside the function. It's OK to have that here. > */ > - if (is_gimple_sizepos (expr)) > + if (expr == NULL_TREE > + || is_gimple_constant (expr) > + || TREE_CODE (expr) == VAR_DECL > + || CONTAINS_PLACEHOLDER_P (expr)) > return; > >*expr_p = unshare_expr (expr); > @@ -12570,6 +12573,12 @@ gimplify_one_sizepos (tree *expr_p, gimp >/* SSA names in decl/type fields are a bad idea - they'll get reclaimed > if the def vanishes. */ >gimplify_expr (expr_p, stmt_p, NULL, is_gimple_val, fb_rvalue, false); > + > + /* If expr wasn't already is_gimple_sizepos or is_gimple_constant from the > + FE, ensure that it is a VAR_DECL, otherwise we might handle some decls > + as gimplify_vla_decl even when they would have all sizes INTEGER_CSTs. 
> */ > + if (is_gimple_constant (*expr_p)) > +*expr_p = get_initialized_tmp_var (*expr_p, stmt_p, NULL, false); > } > > /* Gimplify the body of statements of FNDECL and return a GIMPLE_BIND node > --- gcc/tree.c.jj 2018-01-15 10:01:40.830186474 +0100 > +++ gcc/tree.c2018-01-16 12:24:11.254821615 +0100 > @@ -8825,11 +8825,12 @@ variably_modified_type_p (tree type, tre >do { tree _t = (T); > \ > if (_t != NULL_TREE > \ > && _t != error_mark_node\ > - && TREE_CODE (_t) != INTEGER_CST\ > + && !CONSTANT_CLASS_P (_t) \ > && TREE_CODE (_t) != PLACEHOLDER_EXPR \ > && (!fn \ > || (!TYPE_SIZES_GIMPLIFIED (type) \ > - && !is_gimple_sizepos (_t)) \ > + && (TREE_CODE (_t) != VAR_DECL \ > + && !CONTAINS_PLACEHOLDER_P (_t))) \ > || walk_tree (&_t, find_var_from_fn, fn, NULL)))\ >return true; } while (0) > > --- gcc/gimplify.h.jj 2018-01-03 10:19:53.757533721 +0100 > +++ gcc/gimplify.h2018-01-16 12:24:51.995812831 +0100 > @@ -85,23 +85,4 @@ extern enum gimplify_status gimplify_va_ > gimple_seq *); > gimple *gimplify_assign (tree, tree, gimple_seq *); > > -/* Return true if gimplify_one_sizepos doesn't need to gimplify > - expr (when in TYPE_SIZE{,_UNIT} and similar type/decl size/bitsize > - fields). */ > - > -static inline bool > -is_gimple_sizepos (tree expr) > -{ > - /* gimplify_one_sizepos doesn't need to do anything if the value isn't > there, > - is constant, or contains A PLACEHOLDER_EXPR. We also don't want to do > - anything if it's already a VAR_DECL. If it's a VAR_DECL from another > - function, the gimplifier will want to replace it with a new variable, > - but that will cause problems if this type is from outside the function. > - It's OK to have that here. */ > - return (expr == NULL_TREE > - || TREE_CODE (expr) == INTEGER_CST > - || TREE_CODE (expr) == VAR_DECL > - || CONTAINS_PLACEHOLDER_P (expr)); > -} > - > #endif /* GCC_GIMPLIFY_H */ > > Jakub > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [C++ PATCH] Fix ICE in member_vec_dedup (PR c++/83825)
On 01/15/2018 04:46 PM, Jakub Jelinek wrote: Hi! As the testcase shows, calls to member_vec_dedup and qsort are just guarded by the vector being non-NULL, which doesn't mean it must be non-empty, so we can't do (*member_vec)[0] on it. Fixed by the second hunk, the rest is just a small cleanup to use the vec.h methods. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok I'm a little surprised we get this case, but I think we've both found other strange boundary cases here. thanks. nathan -- Nathan Sidwell
Re: Two fixes for live-out SLP inductions (PR 83857)
On Tue, Jan 16, 2018 at 2:29 PM, Richard Sandiford wrote: > vect_analyze_loop_operations was calling vectorizable_live_operation > for all live-out phis, which led to a bogus ncopies calculation in > the pure SLP case. I think v_a_l_o should only be passing phis > that are vectorised using normal loop vectorisation, since > vect_slp_analyze_node_operations handles the SLP side (and knows > the correct slp_index and slp_node arguments to pass in, via > vect_analyze_stmt). > > With that fixed we hit an older bug that vectorizable_live_operation > didn't handle live-out SLP inductions. Fixed by using gimple_phi_result > rather than gimple_get_lhs for phis. > > Tested on aarch64-linux-gnu. OK to install? Ok. Richard. > Richard > > > 2018-01-16 Richard Sandiford > > gcc/ > PR tree-optimization/83857 > * tree-vect-loop.c (vect_analyze_loop_operations): Don't call > vectorizable_live_operation for pure SLP statements. > (vectorizable_live_operation): Handle PHIs. > > gcc/testsuite/ > PR tree-optimization/83857 > * gcc.dg/vect/pr83857.c: New test. > > Index: gcc/tree-vect-loop.c > === > --- gcc/tree-vect-loop.c2018-01-13 18:02:00.950360196 + > +++ gcc/tree-vect-loop.c2018-01-16 13:24:33.022528019 + > @@ -1851,7 +1851,10 @@ vect_analyze_loop_operations (loop_vec_i > ok = vectorizable_reduction (phi, NULL, NULL, NULL, NULL); > } > > - if (ok && STMT_VINFO_LIVE_P (stmt_info)) > + /* SLP PHIs are tested by vect_slp_analyze_node_operations. */ > + if (ok > + && STMT_VINFO_LIVE_P (stmt_info) > + && !PURE_SLP_STMT (stmt_info)) > ok = vectorizable_live_operation (phi, NULL, NULL, -1, NULL); > >if (!ok) > @@ -8217,7 +8220,11 @@ vectorizable_live_operation (gimple *stm >gcc_assert (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)); > >/* Get the correct slp vectorized stmt. 
*/ > - vec_lhs = gimple_get_lhs (SLP_TREE_VEC_STMTS (slp_node)[vec_entry]); > + gimple *vec_stmt = SLP_TREE_VEC_STMTS (slp_node)[vec_entry]; > + if (gphi *phi = dyn_cast (vec_stmt)) > + vec_lhs = gimple_phi_result (phi); > + else > + vec_lhs = gimple_get_lhs (vec_stmt); > >/* Get entry to use. */ >bitstart = bitsize_int (vec_index); > Index: gcc/testsuite/gcc.dg/vect/pr83857.c > === > --- /dev/null 2018-01-15 18:48:25.844002736 + > +++ gcc/testsuite/gcc.dg/vect/pr83857.c 2018-01-16 13:24:33.021528058 + > @@ -0,0 +1,30 @@ > +/* { dg-do run } */ > +/* { dg-additional-options "-ffast-math" } */ > + > +#define N 100 > + > +double __attribute__ ((noinline, noclone)) > +f (double *x, double y) > +{ > + double a = 0; > + for (int i = 0; i < N; ++i) > +{ > + a += y; > + x[i * 2] += a; > + x[i * 2 + 1] += a; > +} > + return a - y; > +} > + > +double x[N * 2]; > + > +int > +main (void) > +{ > + if (f (x, 5) != (N - 1) * 5) > +__builtin_abort (); > + return 0; > +} > + > +/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" { > target vect_double } } } */ > +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target > vect_double } } } */
[PATCH] Fix PR83867
Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2018-01-16 Richard Biener PR tree-optimization/83867 * tree-vect-stmts.c (vect_transform_stmt): Precompute nested_in_vect_loop_p since the scalar stmt may get invalidated. * gcc.dg/vect/pr83867.c: New testcase. Index: gcc/tree-vect-stmts.c === --- gcc/tree-vect-stmts.c (revision 256722) +++ gcc/tree-vect-stmts.c (working copy) @@ -9426,6 +9426,11 @@ vect_transform_stmt (gimple *stmt, gimpl gcc_assert (slp_node || !PURE_SLP_STMT (stmt_info)); gimple *old_vec_stmt = STMT_VINFO_VEC_STMT (stmt_info); + bool nested_p = (STMT_VINFO_LOOP_VINFO (stmt_info) + && nested_in_vect_loop_p + (LOOP_VINFO_LOOP (STMT_VINFO_LOOP_VINFO (stmt_info)), +stmt)); + switch (STMT_VINFO_TYPE (stmt_info)) { case type_demotion_vec_info_type: @@ -9525,9 +9530,7 @@ vect_transform_stmt (gimple *stmt, gimpl /* Handle inner-loop stmts whose DEF is used in the loop-nest that is being vectorized, but outside the immediately enclosing loop. */ if (vec_stmt - && STMT_VINFO_LOOP_VINFO (stmt_info) - && nested_in_vect_loop_p (LOOP_VINFO_LOOP ( -STMT_VINFO_LOOP_VINFO (stmt_info)), stmt) + && nested_p && STMT_VINFO_TYPE (stmt_info) != reduc_vec_info_type && (STMT_VINFO_RELEVANT (stmt_info) == vect_used_in_outer || STMT_VINFO_RELEVANT (stmt_info) == Index: gcc/testsuite/gcc.dg/vect/pr83867.c === --- gcc/testsuite/gcc.dg/vect/pr83867.c (nonexistent) +++ gcc/testsuite/gcc.dg/vect/pr83867.c (working copy) @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-O -ftrapv" } */ + +int +k5 (int u5, int aw) +{ + int v6; + + while (u5 < 1) +{ + while (v6 < 4) + ++v6; + + v6 = 0; + aw += u5 > 0; + ++u5; +} + + return aw; +}
Re: [PATCH] rtlanal: dead_or_set_regno_p should handle CLOBBER (PR83424)
On 01/16/2018 06:41 AM, Segher Boessenkool wrote: > On Mon, Dec 18, 2017 at 12:16:13PM -0700, Jeff Law wrote: >> On 12/16/2017 02:03 PM, Segher Boessenkool wrote: >>> In PR83424 combine's move_deaths puts a REG_DEAD note in the wrong place >>> because dead_or_set_regno_p does not account for CLOBBER insns. This >>> fixes it. >>> >>> Bootstrapped and tested on powerpc64-linux {-m32,-m64} and on x86_64-linux. >>> Is this okay for trunk? >>> >>> >>> Segher >>> >>> >>> 2017-12-16 Segher Boessenkool >>> >>> PR rtl-optimization/83424 >>> * rtlanal.c (dead_or_set_regno_p): Handle CLOBBER just like SET. >>> >>> gcc/testsuite/ >>> PR rtl-optimization/83424 >>> * gcc.dg/pr83424.c: New testcase. >> OK. > > Is this okay for backports to 7 and 6, too? Yes. jeff
Re: [PATCH v2] Change default to -fno-math-errno
Joseph Myers wrote: > Another question to consider: what about configurations (mostly > soft-float) where floating-point exceptions are not supported? (glibc > wrongly defines math_errhandling to include MATH_ERREXCEPT there, but the > only option actually permitted by C99 in that case would be to define it > to MATH_ERRNO.) > > If we wish to distinguish that case, the > targetm.float_exceptions_rounding_supported_p hook is the one to use (in > the absence of anyone identifying a target that supports exceptions but > not rounding modes) - possibly together with flag_iso. I looked into this and the issue is that calling targetm functions is not possible until the backend is fully initialized (whether the pattern exists or not is not sufficient, the pattern condition must be valid to evaluate as well), and that happens after option parsing. In general soft-float is used on tiny targets which don't use errno at all (as in remove all the code dealing with it, including the errno variable itself!), so I believe it's best to let people explicitly enable -fmath-errno in the rare case when they really want to. >> lroundf in GLIBC doesn't set errno, so all the inefficiency was for nothing: > > (glibc bug 6797.) I see, that explains it! A decade old bug - it shows the popularity of errno... Wilco
Re: [PATCH] i386: More use reference of struct ix86_frame to avoid copy
On Tue, Jan 16, 2018 at 7:03 AM, Martin Liška wrote: > On 01/16/2018 01:35 PM, H.J. Lu wrote: >> On Tue, Jan 16, 2018 at 3:40 AM, H.J. Lu wrote: >>> This patch has been used with my Spectre backport for GCC 7 for many >>> weeks and has been checked into GCC 7 branch. Should I revert it on >>> GCC 7 branch or check it into trunk? >> >> Ada build failed with this on trunk: >> >> raised STORAGE_ERROR : stack overflow or erroneous memory access >> make[5]: *** >> [/export/gnu/import/git/sources/gcc/gcc/ada/Make-generated.in:45: >> ada/sinfo.h] Error 1 > > Hello. > > I know that you've already reverted the change, but it's possible to replace > struct ix86_frame &frame = cfun->machine->frame; > > with: > struct ix86_frame *frame = &cfun->machine->frame; > > And replace usages with point access operator (->). That would also avoid > copying. Won't it be equivalent to reference? > One another question. After you switched to references, isn't the behavior of > function > ix86_expand_epilogue as it also contains write to frame struct like: > > 14799/* Special care must be taken for the normal return case of a > function > 14800 using eh_return: the eax and edx registers are marked as saved, > but > 14801 not restored along this path. Adjust the save location to > match. */ > 14802if (crtl->calls_eh_return && style != 2) > 14803 frame.reg_save_offset -= 2 * UNITS_PER_WORD; That could be the issue. I will double check it. Thanks. H.J. > Thanks for clarification. > Martin > >> >> Let me revert it on gcc-7-branch. >> >> H.J. >>> H.J. >>> --- >>> When there is no need to make a copy of ix86_frame, we can use reference >>> of struct ix86_frame to avoid copy. >>> >>> * config/i386/i386.c (ix86_expand_prologue): Use reference of >>> struct ix86_frame. >>> (ix86_expand_epilogue): Likewise. 
>>> --- >>> gcc/config/i386/i386.c | 6 ++ >>> 1 file changed, 2 insertions(+), 4 deletions(-) >>> >>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c >>> index bfb31db8752..9eba3ffd5d6 100644 >>> --- a/gcc/config/i386/i386.c >>> +++ b/gcc/config/i386/i386.c >>> @@ -13385,7 +13385,6 @@ ix86_expand_prologue (void) >>> { >>>struct machine_function *m = cfun->machine; >>>rtx insn, t; >>> - struct ix86_frame frame; >>>HOST_WIDE_INT allocate; >>>bool int_registers_saved; >>>bool sse_registers_saved; >>> @@ -13413,7 +13412,7 @@ ix86_expand_prologue (void) >>>m->fs.sp_valid = true; >>>m->fs.sp_realigned = false; >>> >>> - frame = m->frame; >>> + struct ix86_frame &frame = cfun->machine->frame; >>> >>>if (!TARGET_64BIT && ix86_function_ms_hook_prologue >>> (current_function_decl)) >>> { >>> @@ -14291,7 +14290,6 @@ ix86_expand_epilogue (int style) >>> { >>>struct machine_function *m = cfun->machine; >>>struct machine_frame_state frame_state_save = m->fs; >>> - struct ix86_frame frame; >>>bool restore_regs_via_mov; >>>bool using_drap; >>>bool restore_stub_is_tail = false; >>> @@ -14304,7 +14302,7 @@ ix86_expand_epilogue (int style) >>> } >>> >>>ix86_finalize_stack_frame_flags (); >>> - frame = m->frame; >>> + struct ix86_frame &frame = cfun->machine->frame; >>> >>>m->fs.sp_realigned = stack_realign_fp; >>>m->fs.sp_valid = stack_realign_fp >>> -- >>> 2.14.3 >>> >> >> >> > -- H.J.
VIEW_CONVERT_EXPR slots for strict-align targets (PR 83884)
This PR is about a case in which we VIEW_CONVERT a variable-sized unaligned record: unit-size align:8 ...> to an aligned 32-bit integer. The strict-alignment handling of this case creates an aligned temporary slot, moves the operand into the slot in the operand's original mode, then accesses the slot in the more-aligned result mode. Previously the size of the temporary slot was calculated using: HOST_WIDE_INT temp_size = MAX (int_size_in_bytes (inner_type), (HOST_WIDE_INT) GET_MODE_SIZE (mode)); int_size_in_bytes would return -1 for the variable-length type, so we'd use the size of the result mode for the slot. r256152 replaced int_size_in_bytes with tree_to_poly_uint64, which triggered an ICE. I'd assumed that variable-length types couldn't occur here, since it seems strange to view-convert a variable-length type to a fixed-length one. It also seemed strange that (with the old code) we'd ignore the size of the operand if it was a variable V but honour it if it was a constant C, even though it's presumably possible for V to equal that C at runtime. If op0 has BLKmode we do a block copy of GET_MODE_SIZE (mode) bytes and then convert the slot to "mode": poly_uint64 mode_size = GET_MODE_SIZE (mode); ... if (GET_MODE (op0) == BLKmode) { rtx size_rtx = gen_int_mode (mode_size, Pmode); emit_block_move (new_with_op0_mode, op0, size_rtx, (modifier == EXPAND_STACK_PARM ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL)); } else ... op0 = new_rtx; } } op0 = adjust_address (op0, mode, 0); so I think in that case just the size of "mode" is enough, even if op0 is a fixed-size type. For non-BLKmode op0 we first move in op0's mode and then convert the slot to "mode": emit_move_insn (new_with_op0_mode, op0); op0 = new_rtx; } } op0 = adjust_address (op0, mode, 0); so I think we want the maximum of the two mode sizes in that case (assuming they can be different sizes). But is this VIEW_CONVERT_EXPR really valid? Maybe this is just papering over a deeper issue. 
There again, the MAX in the old code was presumably there because the sizes can be different... Richard 2018-01-16 Richard Sandiford gcc/ PR middle-end/83884 * expr.c (expand_expr_real_1): Use the size of GET_MODE (op0) rather than the size of inner_type to determine the stack slot size when handling VIEW_CONVERT_EXPRs on strict-alignment targets. Index: gcc/expr.c === --- gcc/expr.c 2018-01-14 08:42:44.497155977 + +++ gcc/expr.c 2018-01-16 16:07:22.737883774 + @@ -11145,11 +11145,11 @@ expand_expr_real_1 (tree exp, rtx target } else if (STRICT_ALIGNMENT) { - tree inner_type = TREE_TYPE (treeop0); poly_uint64 mode_size = GET_MODE_SIZE (mode); - poly_uint64 op0_size - = tree_to_poly_uint64 (TYPE_SIZE_UNIT (inner_type)); - poly_int64 temp_size = upper_bound (op0_size, mode_size); + poly_uint64 temp_size = mode_size; + if (GET_MODE (op0) != BLKmode) + temp_size = upper_bound (temp_size, +GET_MODE_SIZE (GET_MODE (op0))); rtx new_rtx = assign_stack_temp_for_type (mode, temp_size, type); rtx new_with_op0_mode
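The slot-size rule described in this message can be sketched with plain integers. This is an illustration only: `temp_slot_size`, `enum operand_kind` and the byte sizes are hypothetical, and the real code works on poly_uint64 values and machine modes rather than unsigned long:

```c
#include <assert.h>

/* Hypothetical marker for a block (BLKmode-like) operand.  */
enum operand_kind { MODE_BLK, MODE_SCALAR };

/* Size in bytes of the temporary slot: the result mode's size, widened to
   the operand mode's size only when the operand is not a block operand,
   because a block operand is copied using the result mode's size.  */
static unsigned long
temp_slot_size (enum operand_kind kind, unsigned long op0_mode_size,
		unsigned long result_mode_size)
{
  unsigned long size = result_mode_size;
  if (kind != MODE_BLK && op0_mode_size > size)
    size = op0_mode_size;
  return size;
}

/* Small self-check that a caller may run from main.  */
static void
temp_slot_self_check (void)
{
  assert (temp_slot_size (MODE_BLK, 100, 4) == 4);
  assert (temp_slot_size (MODE_SCALAR, 8, 4) == 8);
}
```

For a block operand the operand size is ignored, which matches the observation that only GET_MODE_SIZE (mode) bytes are ever copied into the slot in that case.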
[PATCH v2][AArch64] Remove remaining uses of * in patterns
v2: Rebased after the big SVE commits Remove the remaining uses of '*' from aarch64.md. Using '*' in alternatives is typically incorrect as it tells the register allocator to ignore those alternatives. Also add a missing '?' so we prefer a floating point register for same-size int<->fp conversions. Passes regress & bootstrap, OK for commit? ChangeLog: 2018-01-16 Wilco Dijkstra * config/aarch64/aarch64.md (mov): Remove '*' in alternatives. (movsi_aarch64): Likewise. (load_pairsi): Likewise. (load_pairdi): Likewise. (store_pairsi): Likewise. (store_pairdi): Likewise. (load_pairsf): Likewise. (load_pairdf): Likewise. (store_pairsf): Likewise. (store_pairdf): Likewise. (zero_extend): Likewise. (fcvt_target): Add '?' to prefer w over r. gcc/testsuite/ * gcc.target/aarch64/vfp-1.c: Update test. -- diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index e52e8350a203b288208c1acb12c8b881d5e8039a..088ed8cb0aad0be08a7e19064708ea14499230f2 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -907,8 +907,8 @@ (define_expand "mov" ) (define_insn "*mov_aarch64" - [(set (match_operand:SHORT 0 "nonimmediate_operand" "=r,r, *w,r ,r,*w, m, m, r,*w,*w") - (match_operand:SHORT 1 "aarch64_mov_operand" " r,M,D,Usv,m, m,rZ,*w,*w, r,*w"))] + [(set (match_operand:SHORT 0 "nonimmediate_operand" "=r,r, w,r ,r,w, m,m,r,w,w") + (match_operand:SHORT 1 "aarch64_mov_operand" " r,M,D,Usv,m,m,rZ,w,w,r,w"))] "(register_operand (operands[0], mode) || aarch64_reg_or_zero (operands[1], mode))" { @@ -974,7 +974,7 @@ (define_expand "mov" (define_insn_and_split "*movsi_aarch64" [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r, r,w, m, m, r, r, w,r,w, w") - (match_operand:SI 1 "aarch64_mov_operand" " r,r,k,M,n,Usv,m,m,rZ,*w,Usa,Ush,rZ,w,w,Ds"))] + (match_operand:SI 1 "aarch64_mov_operand" " r,r,k,M,n,Usv,m,m,rZ,w,Usa,Ush,rZ,w,w,Ds"))] "(register_operand (operands[0], SImode) || aarch64_reg_or_zero (operands[1], SImode))" "@ @@ -1281,9 
+1281,9 @@ (define_expand "movmemdi" ;; Operands 1 and 3 are tied together by the final condition; so we allow ;; fairly lax checking on the second memory operation. (define_insn "load_pairsi" - [(set (match_operand:SI 0 "register_operand" "=r,*w") + [(set (match_operand:SI 0 "register_operand" "=r,w") (match_operand:SI 1 "aarch64_mem_pair_operand" "Ump,Ump")) - (set (match_operand:SI 2 "register_operand" "=r,*w") + (set (match_operand:SI 2 "register_operand" "=r,w") (match_operand:SI 3 "memory_operand" "m,m"))] "rtx_equal_p (XEXP (operands[3], 0), plus_constant (Pmode, @@ -1297,9 +1297,9 @@ (define_insn "load_pairsi" ) (define_insn "load_pairdi" - [(set (match_operand:DI 0 "register_operand" "=r,*w") + [(set (match_operand:DI 0 "register_operand" "=r,w") (match_operand:DI 1 "aarch64_mem_pair_operand" "Ump,Ump")) - (set (match_operand:DI 2 "register_operand" "=r,*w") + (set (match_operand:DI 2 "register_operand" "=r,w") (match_operand:DI 3 "memory_operand" "m,m"))] "rtx_equal_p (XEXP (operands[3], 0), plus_constant (Pmode, @@ -1317,9 +1317,9 @@ (define_insn "load_pairdi" ;; fairly lax checking on the second memory operation. 
(define_insn "store_pairsi" [(set (match_operand:SI 0 "aarch64_mem_pair_operand" "=Ump,Ump") - (match_operand:SI 1 "aarch64_reg_or_zero" "rZ,*w")) + (match_operand:SI 1 "aarch64_reg_or_zero" "rZ,w")) (set (match_operand:SI 2 "memory_operand" "=m,m") - (match_operand:SI 3 "aarch64_reg_or_zero" "rZ,*w"))] + (match_operand:SI 3 "aarch64_reg_or_zero" "rZ,w"))] "rtx_equal_p (XEXP (operands[2], 0), plus_constant (Pmode, XEXP (operands[0], 0), @@ -1333,9 +1333,9 @@ (define_insn "store_pairsi" (define_insn "store_pairdi" [(set (match_operand:DI 0 "aarch64_mem_pair_operand" "=Ump,Ump") - (match_operand:DI 1 "aarch64_reg_or_zero" "rZ,*w")) + (match_operand:DI 1 "aarch64_reg_or_zero" "rZ,w")) (set (match_operand:DI 2 "memory_operand" "=m,m") - (match_operand:DI 3 "aarch64_reg_or_zero" "rZ,*w"))] + (match_operand:DI 3 "aarch64_reg_or_zero" "rZ,w"))] "rtx_equal_p (XEXP (operands[2], 0), plus_constant (Pmode, XEXP (operands[0], 0), @@ -1350,9 +1350,9 @@ (define_insn "store_pairdi" ;; Operands 1 and 3 are tied together by the final condition; so we allow ;; fairly lax checking on the second memory operation. (define_insn "load_pairsf" - [(set (match_operand:SF 0 "register_operand" "=w,*r") + [(set (match_operand:SF 0 "register_operand" "=w,r") (match_operand:SF 1 "aarch64_mem_pair_operand" "Ump,Ump")) - (set (match_operand:SF 2 "register_operand" "=w,*r") + (set (match_operand:SF 2 "reg
Re: [PATCH][WWWDOCS][AArch64][ARM] Update GCC 8 release notes
On Tue, Jan 16, 2018 at 02:21:30PM +, Tamar Christina wrote: > Hi Kyrill, > > > > > xgene1 was added a few releases ago, better to use one of the new additions > > from the above list. > > For example -mtune=cortex-r52. > > Thanks, I have updated the patch. I'll wait for an ok from an AArch64 > maintainer and a Docs maintainer. OK. But you have the same issue in the AArch64 part. James > Index: htdocs/gcc-8/changes.html > === > RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-8/changes.html,v > retrieving revision 1.26 > diff -u -r1.26 changes.html > --- htdocs/gcc-8/changes.html 11 Jan 2018 09:31:53 - 1.26 > +++ htdocs/gcc-8/changes.html 16 Jan 2018 14:12:57 - > @@ -147,7 +147,51 @@ > > AArch64 > > - > + > +The Armv8.4-A architecture is now supported. It can be used by > +specifying the -march=armv8.4-a option. > + > + > +The Dot Product instructions are now supported as an optional extension > to the > +Armv8.2-A architecture and newer and are mandatory on Armv8.4-A. The > extension can be used by > +specifying the +dotprod architecture extension. E.g. > -march=armv8.2-a+dotprod. > + > + > +The Armv8-A +crypto extension has now been split into two > extensions for finer grained control: > + > + +aes which contains the Armv8-A AES crytographic > instructions. > + +sha2 which contains the Armv8-A SHA2 and SHA1 > cryptographic instructions. > + > +Using +crypto will now enable these two extensions. > + > + > +New Armv8.4-A FP16 Floating Point Multiplication Variant instructions > have been added. These instructions are > +mandatory in Armv8.4-A but available as an optional extension to > Armv8.2-A and Armv8.3-A. The new extension > +can be used by specifying the +fp16fml architectural > extension on Armv8.2-A and Armv8.3-A. On Armv8.4-A > +the instructions can be enabled by specifying +fp16. > + > + > +New cryptographic instructions have been added as optional extensions to > Armv8.2-A and newer. 
These instructions can > +be enabled with: > + > + +sha3 New SHA3 and SHA2 instructions from Armv8.4-A. > This implies +sha2. > + +sm4 New SM3 and SM4 instructions from Armv8.4-A. > + > + > + > + Support has been added for the following processors > + (GCC identifiers in parentheses): > + > + Arm Cortex-A75 (cortex-a75). > + Arm Cortex-A55 (cortex-a55). > + Arm Cortex-A55/Cortex-A75 DynamIQ big.LITTLE > (cortex-a75.cortex-a55). > + > + The GCC identifiers can be used > + as arguments to the -mcpu or -mtune options, > + for example: -mcpu=cortex-a75 or > + -mtune=thunderx2t99p1 or as arguments to the equivalent > target > + attributes and pragmas. > + > > > ARM > @@ -169,14 +213,58 @@ > removed in a future release. > > > -The default link behavior for ARMv6 and ARMv7-R targets has been > +The default link behavior for Armv6 and Armv7-R targets has been > changed to produce BE8 format when generating big-endian images. A new > flag -mbe32 can be used to force the linker to produce > legacy BE32 format images. There is no change of behavior for > -ARMv6-m and other ARMv7 or later targets: these already defaulted > +Armv6-M and other Armv7 or later targets: these already defaulted > to BE8 format. This change brings GCC into alignment with other > compilers for the ARM architecture. > > + > +The Armv8-R architecture is now supported. It can be used by specifying > the > +-march=armv8-r option. > + > + > +The Armv8.3-A architecture is now supported. It can be used by > +specifying the -march=armv8.3-a option. > + > + > +The Armv8.4-A architecture is now supported. It can be used by > +specifying the -march=armv8.4-a option. > + > + > + The Dot Product instructions are now supported as an optional extension > to the > + Armv8.2-A architecture and newer and are mandatory on Armv8.4-A. The > extension can be used by > + specifying the +dotprod architecture extension. E.g. > -march=armv8.2-a+dotprod. 
> + > + > + > +Support for setting extensions and architectures using the GCC target > pragma and attribute has been added. > +It can be used by specifying #pragma GCC target > ("arch=..."), #pragma GCC target ("+extension"), > +__attribute__((target("arch=..."))) or > __attribute__((target("+extension"))). > + > + > +New Armv8.4-A FP16 Floating Point Multiplication Variant instructions > have been added. These instructions are > +mandatory in Armv8.4-A but available as an optional extension to > Armv8.2-A and Armv8.3-A. The new extension > +can be used by specifying the +fp16fml architectural > extension on Armv8.2-A and Armv8.3-A. On Armv8.4-A > +the instructions can be ena
Re: [PATCH][WWWDOCS][AArch64][ARM] Update GCC 8 release notes
Th 01/16/2018 16:36, James Greenhalgh wrote: > On Tue, Jan 16, 2018 at 02:21:30PM +, Tamar Christina wrote: > > Hi Kyrill, > > > > > > > > xgene1 was added a few releases ago, better to use one of the new > > > additions from the above list. > > > For example -mtune=cortex-r52. > > > > Thanks, I have updated the patch. I'll wait for an ok from an AArch64 > > maintainer and a Docs maintainer. > > OK. But you have the same issue in the AArch64 part. Thanks, I've updated the patch, I'll wait for a bit for a doc reviewer if I don't hear anything I'll assume the patch is OK. Thanks, Tamar > > James > > > Index: htdocs/gcc-8/changes.html > > === > > RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-8/changes.html,v > > retrieving revision 1.26 > > diff -u -r1.26 changes.html > > --- htdocs/gcc-8/changes.html 11 Jan 2018 09:31:53 - 1.26 > > +++ htdocs/gcc-8/changes.html 16 Jan 2018 14:12:57 - > > @@ -147,7 +147,51 @@ > > > > AArch64 > > > > - > > + > > +The Armv8.4-A architecture is now supported. It can be used by > > +specifying the -march=armv8.4-a option. > > + > > + > > +The Dot Product instructions are now supported as an optional > > extension to the > > +Armv8.2-A architecture and newer and are mandatory on Armv8.4-A. The > > extension can be used by > > +specifying the +dotprod architecture extension. E.g. > > -march=armv8.2-a+dotprod. > > + > > + > > +The Armv8-A +crypto extension has now been split into two > > extensions for finer grained control: > > + > > + +aes which contains the Armv8-A AES crytographic > > instructions. > > + +sha2 which contains the Armv8-A SHA2 and SHA1 > > cryptographic instructions. > > + > > +Using +crypto will now enable these two extensions. > > + > > + > > +New Armv8.4-A FP16 Floating Point Multiplication Variant instructions > > have been added. These instructions are > > +mandatory in Armv8.4-A but available as an optional extension to > > Armv8.2-A and Armv8.3-A. 
The new extension > > +can be used by specifying the +fp16fml architectural > > extension on Armv8.2-A and Armv8.3-A. On Armv8.4-A > > +the instructions can be enabled by specifying +fp16. > > + > > + > > +New cryptographic instructions have been added as optional extensions > > to Armv8.2-A and newer. These instructions can > > +be enabled with: > > + > > + +sha3 New SHA3 and SHA2 instructions from > > Armv8.4-A. This implies +sha2. > > + +sm4 New SM3 and SM4 instructions from Armv8.4-A. > > + > > + > > + > > + Support has been added for the following processors > > + (GCC identifiers in parentheses): > > + > > + Arm Cortex-A75 (cortex-a75). > > +Arm Cortex-A55 (cortex-a55). > > +Arm Cortex-A55/Cortex-A75 DynamIQ big.LITTLE > > (cortex-a75.cortex-a55). > > + > > + The GCC identifiers can be used > > + as arguments to the -mcpu or -mtune > > options, > > + for example: -mcpu=cortex-a75 or > > + -mtune=thunderx2t99p1 or as arguments to the > > equivalent target > > + attributes and pragmas. > > + > > > > > > ARM > > @@ -169,14 +213,58 @@ > > removed in a future release. > > > > > > -The default link behavior for ARMv6 and ARMv7-R targets has been > > +The default link behavior for Armv6 and Armv7-R targets has been > > changed to produce BE8 format when generating big-endian images. A new > > flag -mbe32 can be used to force the linker to produce > > legacy BE32 format images. There is no change of behavior for > > -ARMv6-m and other ARMv7 or later targets: these already defaulted > > +Armv6-M and other Armv7 or later targets: these already defaulted > > to BE8 format. This change brings GCC into alignment with other > > compilers for the ARM architecture. > > > > + > > +The Armv8-R architecture is now supported. It can be used by > > specifying the > > +-march=armv8-r option. > > + > > + > > +The Armv8.3-A architecture is now supported. It can be used by > > +specifying the -march=armv8.3-a option. > > + > > + > > +The Armv8.4-A architecture is now supported. 
It can be used by > > +specifying the -march=armv8.4-a option. > > + > > + > > + The Dot Product instructions are now supported as an optional > > extension to the > > + Armv8.2-A architecture and newer and are mandatory on Armv8.4-A. The > > extension can be used by > > + specifying the +dotprod architecture extension. E.g. > > -march=armv8.2-a+dotprod. > > + > > + > > + > > +Support for setting extensions and architectures using the GCC target > > pragma and attribute has been added. > > +It can be used by specifying #pragma GCC target > > ("arch=..."), #pragma GCC target ("+extension"), > > +__attribute__((target
[PATCH 2/2] GCC 6: i386: Use reference of struct ix86_frame to avoid copy
From: hjl When there is no need to make a copy of ix86_frame, we can use reference of struct ix86_frame to avoid copy. Backport from mainline 2017-11-06 H.J. Lu * config/i386/i386.c (ix86_can_use_return_insn_p): Use reference of struct ix86_frame. (ix86_initial_elimination_offset): Likewise. (ix86_expand_split_stack_prologue): Likewise. --- gcc/config/i386/i386.c | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index a1ff32b648b..13ebf107e90 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -10887,7 +10887,6 @@ symbolic_reference_mentioned_p (rtx op) bool ix86_can_use_return_insn_p (void) { - struct ix86_frame frame; if (! reload_completed || frame_pointer_needed) return 0; @@ -10898,7 +10897,7 @@ ix86_can_use_return_insn_p (void) return 0; ix86_compute_frame_layout (); - frame = cfun->machine->frame; + struct ix86_frame &frame = cfun->machine->frame; return (frame.stack_pointer_offset == UNITS_PER_WORD && (frame.nregs + frame.nsseregs) == 0); } @@ -11310,7 +11309,7 @@ HOST_WIDE_INT ix86_initial_elimination_offset (int from, int to) { ix86_compute_frame_layout (); - struct ix86_frame frame = cfun->machine->frame; + struct ix86_frame &frame = cfun->machine->frame; if (from == ARG_POINTER_REGNUM && to == HARD_FRAME_POINTER_REGNUM) return frame.hard_frame_pointer_offset; @@ -13821,7 +13820,6 @@ static GTY(()) rtx split_stack_fn_large; void ix86_expand_split_stack_prologue (void) { - struct ix86_frame frame; HOST_WIDE_INT allocate; unsigned HOST_WIDE_INT args_size; rtx_code_label *label; @@ -13834,7 +13832,7 @@ ix86_expand_split_stack_prologue (void) ix86_finalize_stack_realign_flags (); ix86_compute_frame_layout (); - frame = cfun->machine->frame; + struct ix86_frame &frame = cfun->machine->frame; allocate = frame.stack_pointer_offset - INCOMING_FRAME_SP_OFFSET; /* This is the label we will branch to if we have enough stack -- 2.14.3
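The difference between the copy and the reference in the patch above can be sketched outside GCC. This is only an illustration: frame_layout and machine_state are hypothetical, cut-down stand-ins for ix86_frame and machine_function, not the real definitions.

```cpp
// Hypothetical, cut-down stand-ins for GCC's ix86_frame and
// machine_function; the real structures have many more fields.
struct frame_layout
{
  int nregs;
  int nsseregs;
  long stack_pointer_offset;
};

struct machine_state
{
  frame_layout frame;
};

// Copy: 'f' is a separate object, so it costs a structure copy and
// does not observe later updates to m.frame.
constexpr bool
copy_is_distinct_object ()
{
  machine_state m{};
  frame_layout f = m.frame;
  m.frame.nregs = 3;
  return f.nregs == 0;
}

// Reference: 'f' is another name for m.frame, so nothing is copied
// and later updates to m.frame are visible through 'f'.
constexpr bool
reference_aliases_original ()
{
  machine_state m{};
  frame_layout &f = m.frame;
  m.frame.nregs = 3;
  return f.nregs == 3;
}
```

One consequence worth keeping in mind: a reference is only a safe replacement when the function does not rely on a snapshot of the layout, because any later recomputation of the shared structure is visible through it.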
Re: [PATCH][WWWDOCS][AArch64][ARM] Update GCC 8 release notes
I seem to have forgotten the patch :) The 01/16/2018 16:56, Tamar Christina wrote: > Th 01/16/2018 16:36, James Greenhalgh wrote: > > On Tue, Jan 16, 2018 at 02:21:30PM +, Tamar Christina wrote: > > > Hi Kyrill, > > > > > > > > > > > xgene1 was added a few releases ago, better to use one of the new > > > > additions from the above list. > > > > For example -mtune=cortex-r52. > > > > > > Thanks, I have updated the patch. I'll wait for an ok from an AArch64 > > > maintainer and a Docs maintainer. > > > > OK. But you have the same issue in the AArch64 part. > > Thanks, I've updated the patch, I'll wait for a bit for a doc reviewer if I > don't hear anything I'll assume > the patch is OK. > > Thanks, > Tamar > > > > James > > > > > Index: htdocs/gcc-8/changes.html > > > === > > > RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-8/changes.html,v > > > retrieving revision 1.26 > > > diff -u -r1.26 changes.html > > > --- htdocs/gcc-8/changes.html 11 Jan 2018 09:31:53 - 1.26 > > > +++ htdocs/gcc-8/changes.html 16 Jan 2018 14:12:57 - > > > @@ -147,7 +147,51 @@ > > > > > > AArch64 > > > > > > - > > > + > > > +The Armv8.4-A architecture is now supported. It can be used by > > > +specifying the -march=armv8.4-a option. > > > + > > > + > > > +The Dot Product instructions are now supported as an optional > > > extension to the > > > +Armv8.2-A architecture and newer and are mandatory on Armv8.4-A. > > > The extension can be used by > > > +specifying the +dotprod architecture extension. E.g. > > > -march=armv8.2-a+dotprod. > > > + > > > + > > > +The Armv8-A +crypto extension has now been split into > > > two extensions for finer grained control: > > > + > > > + +aes which contains the Armv8-A AES crytographic > > > instructions. > > > + +sha2 which contains the Armv8-A SHA2 and SHA1 > > > cryptographic instructions. > > > + > > > +Using +crypto will now enable these two extensions. 
> > > + > > > + > > > +New Armv8.4-A FP16 Floating Point Multiplication Variant > > > instructions have been added. These instructions are > > > +mandatory in Armv8.4-A but available as an optional extension to > > > Armv8.2-A and Armv8.3-A. The new extension > > > +can be used by specifying the +fp16fml architectural > > > extension on Armv8.2-A and Armv8.3-A. On Armv8.4-A > > > +the instructions can be enabled by specifying +fp16. > > > + > > > + > > > +New cryptographic instructions have been added as optional > > > extensions to Armv8.2-A and newer. These instructions can > > > +be enabled with: > > > + > > > + +sha3 New SHA3 and SHA2 instructions from > > > Armv8.4-A. This implies +sha2. > > > + +sm4 New SM3 and SM4 instructions from Armv8.4-A. > > > + > > > + > > > + > > > + Support has been added for the following processors > > > + (GCC identifiers in parentheses): > > > + > > > + Arm Cortex-A75 (cortex-a75). > > > + Arm Cortex-A55 (cortex-a55). > > > + Arm Cortex-A55/Cortex-A75 DynamIQ big.LITTLE > > > (cortex-a75.cortex-a55). > > > + > > > + The GCC identifiers can be used > > > + as arguments to the -mcpu or -mtune > > > options, > > > + for example: -mcpu=cortex-a75 or > > > + -mtune=thunderx2t99p1 or as arguments to the > > > equivalent target > > > + attributes and pragmas. > > > + > > > > > > > > > ARM > > > @@ -169,14 +213,58 @@ > > > removed in a future release. > > > > > > > > > -The default link behavior for ARMv6 and ARMv7-R targets has been > > > +The default link behavior for Armv6 and Armv7-R targets has been > > > changed to produce BE8 format when generating big-endian images. A > > > new > > > flag -mbe32 can be used to force the linker to produce > > > legacy BE32 format images. There is no change of behavior for > > > -ARMv6-m and other ARMv7 or later targets: these already defaulted > > > +Armv6-M and other Armv7 or later targets: these already defaulted > > > to BE8 format. 
This change brings GCC into alignment with other > > > compilers for the ARM architecture. > > > > > > + > > > +The Armv8-R architecture is now supported. It can be used by > > > specifying the > > > +-march=armv8-r option. > > > + > > > + > > > +The Armv8.3-A architecture is now supported. It can be used by > > > +specifying the -march=armv8.3-a option. > > > + > > > + > > > +The Armv8.4-A architecture is now supported. It can be used by > > > +specifying the -march=armv8.4-a option. > > > + > > > + > > > + The Dot Product instructions are now supported as an optional > > > extension to the > > > + Armv8.2-A architecture and newer and are mandatory on Armv8.4-A. > > > The extension can be used by > > > + specifying the +dotprod ar
[PATCH 0/2] GCC 6: i386: Move struct ix86_frame to machine_function
This patch set makes ix86_frame available to i386 code generation. It is needed to backport the patch set of -mindirect-branch= to mitigate variant #2 of the speculative execution vulnerabilities on x86 processors identified by CVE-2017-5715, aka Spectre. Tested on Linux/i686 and Linux/x86-64. hjl (2): i386: Move struct ix86_frame to machine_function i386: Use reference of struct ix86_frame to avoid copy gcc/config/i386/i386.c | 70 ++ gcc/config/i386/i386.h | 53 +- 2 files changed, 65 insertions(+), 58 deletions(-) -- 2.14.3
[PATCH 1/2] GCC 6: i386: Move struct ix86_frame to machine_function
From: hjl Make ix86_frame available to i386 code generation. This is needed to backport the patch set of -mindirect-branch= to mitigate variant #2 of the speculative execution vulnerabilities on x86 processors identified by CVE-2017-5715, aka Spectre. Backport from mainline 2017-06-01 Bernd Edlinger * config/i386/i386.c (ix86_frame): Moved to ... * config/i386/i386.h (ix86_frame): Here. (machine_function): Add frame. * config/i386/i386.c (ix86_compute_frame_layout): Repace the frame argument with &cfun->machine->frame. (ix86_can_use_return_insn_p): Don't pass &frame to ix86_compute_frame_layout. Copy frame from cfun->machine->frame. (ix86_can_eliminate): Likewise. (ix86_expand_prologue): Likewise. (ix86_expand_epilogue): Likewise. (ix86_expand_split_stack_prologue): Likewise. --- gcc/config/i386/i386.c | 68 ++ gcc/config/i386/i386.h | 53 ++- 2 files changed, 65 insertions(+), 56 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 8b5faac5129..a1ff32b648b 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2434,53 +2434,6 @@ struct GTY(()) stack_local_entry { struct stack_local_entry *next; }; -/* Structure describing stack frame layout. - Stack grows downward: - - [arguments] - <- ARG_POINTER - saved pc - - saved static chain if ix86_static_chain_on_stack - - saved frame pointer if frame_pointer_needed - <- HARD_FRAME_POINTER - [saved regs] - <- regs_save_offset - [padding0] - - [saved SSE regs] - <- sse_regs_save_offset - [padding1] | - |<- FRAME_POINTER - [va_arg registers] | - | - [frame]| - | - [padding2] | = to_allocate - <- STACK_POINTER - */ -struct ix86_frame -{ - int nsseregs; - int nregs; - int va_arg_size; - int red_zone_size; - int outgoing_arguments_size; - - /* The offsets relative to ARG_POINTER. 
*/ - HOST_WIDE_INT frame_pointer_offset; - HOST_WIDE_INT hard_frame_pointer_offset; - HOST_WIDE_INT stack_pointer_offset; - HOST_WIDE_INT hfp_save_offset; - HOST_WIDE_INT reg_save_offset; - HOST_WIDE_INT sse_reg_save_offset; - - /* When save_regs_using_mov is set, emit prologue using - move instead of push instructions. */ - bool save_regs_using_mov; -}; - /* Which cpu are we scheduling for. */ enum attr_cpu ix86_schedule; @@ -2572,7 +2525,7 @@ static unsigned int ix86_function_arg_boundary (machine_mode, const_tree); static rtx ix86_static_chain (const_tree, bool); static int ix86_function_regparm (const_tree, const_tree); -static void ix86_compute_frame_layout (struct ix86_frame *); +static void ix86_compute_frame_layout (void); static bool ix86_expand_vector_init_one_nonzero (bool, machine_mode, rtx, rtx, int); static void ix86_add_new_builtins (HOST_WIDE_INT); @@ -10944,7 +10897,8 @@ ix86_can_use_return_insn_p (void) if (crtl->args.pops_args && crtl->args.size >= 32768) return 0; - ix86_compute_frame_layout (&frame); + ix86_compute_frame_layout (); + frame = cfun->machine->frame; return (frame.stack_pointer_offset == UNITS_PER_WORD && (frame.nregs + frame.nsseregs) == 0); } @@ -11355,8 +11309,8 @@ ix86_can_eliminate (const int from, const int to) HOST_WIDE_INT ix86_initial_elimination_offset (int from, int to) { - struct ix86_frame frame; - ix86_compute_frame_layout (&frame); + ix86_compute_frame_layout (); + struct ix86_frame frame = cfun->machine->frame; if (from == ARG_POINTER_REGNUM && to == HARD_FRAME_POINTER_REGNUM) return frame.hard_frame_pointer_offset; @@ -11395,8 +11349,9 @@ ix86_builtin_setjmp_frame_value (void) /* Fill structure ix86_frame about frame of currently computed function. 
*/ static void -ix86_compute_frame_layout (struct ix86_frame *frame) +ix86_compute_frame_layout (void) { + struct ix86_frame *frame = &cfun->machine->frame; unsigned HOST_WIDE_INT stack_alignment_needed; HOST_WIDE_INT offset; unsigned HOST_WIDE_INT preferred_alignment; @@ -12702,7 +12657,8 @@ ix86_expand_prologue (void) m->fs.sp_offset = INCOMING_FRAME_SP_OFFSET; m->fs.sp_valid = true; - ix86_compute_frame_layout (&frame); + ix86_compute_frame_layout (); + frame = m->frame; if (!TARGET_64BIT && ix86_function_ms_hook_prologue (current_function_decl)) { @@ -13379,7 +13335,8 @@ ix86_expand_epilogue (int style) bool using_drap; ix86_finalize_stack_realign_flags (); - ix86_compute_frame_layout (&fram
GCC 6: i386: Move struct ix86_frame to machine_function
This is needed for GCC 6 backport of Spectre patches:

https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01465.html
https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01466.html
https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01464.html

-- H.J.
Re: GCC 8.0.0 Status Report (2018-01-15), Trunk in Regression and Documentation fixes only mode
On Tue, 16 Jan 2018, Segher Boessenkool wrote: > On Mon, Jan 15, 2018 at 09:21:07AM +0100, Richard Biener wrote: > > We're still in pretty bad shape regression-wise. Please also take > > the opportunity to check the state of your favorite host/target > > combination to make sure building and testing works appropriately. > > I tested building Linux (the kernel) for all supported architectures. > Everything builds (with my usual tweaks, link with libgcc etc.); > except x86_64 and sh have more problems in the kernel, and mips has > an ICE. I'll open a PR for that one. And all glibc architectures compile (and compile the testsuite) OK except for the sh4eb ICE reported in bug 83760 (and the longstanding coldfire issue, bug 68467). -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] PR82964: Fix 128-bit immediate ICEs
On Mon, Jan 15, 2018 at 11:34:19AM +, Wilco Dijkstra wrote: > This fixes PR82964 which reports ICEs for some CONST_WIDE_INT immediates. > It turns out decimal floating point CONST_DOUBLE get changed into > CONST_WIDE_INT without checking the constraint on the operand, which > results in failures. Avoid this by only allowing SF/DF/TF mode floating > point constants in aarch64_legitimate_constant_p. A similar issue can > occur with 128-bit immediates which may be emitted even when disallowed > in aarch64_legitimate_constant_p, and the constraints in movti_aarch64 > don't match. Fix this with a new constraint and allowing valid immediates > in aarch64_legitimate_constant_p. > > Rather than allowing all 128-bit immediates and expanding in up to 8 > MOV/MOVK instructions, limit them to 4 instructions and use a literal > load for other cases. Improve the pr79041-2.c test to use a literal and > skip it for -fpic. > > This fixes all reported failures. OK for commit? Most of this makes sense, but I don't understand this relaxation in aarch64_legitimate_constant_p: > - /* Do not allow wide int constants - this requires support in movti. */ > + /* Only allow simple 128-bit immediates. */ >if (CONST_WIDE_INT_P (x)) > -return false; > +return aarch64_mov128_immediate (x); I can see why this could be correct, but it is unclear why it is necessary to fix the bug. What goes wrong if we leave this as "return false"? I think the patch looks OK otherwise, but I'd appreciate an answer on that point before you commit. Thanks, James
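For readers following the "limit them to 4 instructions" argument, here is a rough model of the cost being discussed. It is a deliberate simplification and not the logic of aarch64_mov128_immediate: it only counts one MOVZ plus one MOVK per further non-zero 16-bit chunk, and ignores MOVN and bitmask immediates, which the real backend also uses. The function names are made up for this sketch.

```cpp
#include <cstdint>

// Instructions needed to build a 64-bit value when only MOVZ (first
// chunk) and MOVK (each further non-zero 16-bit chunk) are considered.
// Simplified model: MOVN and bitmask immediates are not modelled.
constexpr int
movz_movk_count (std::uint64_t v)
{
  int n = 0;
  for (int shift = 0; shift < 64; shift += 16)
    if ((v >> shift) & 0xffffu)
      ++n;
  return n == 0 ? 1 : n;  // Zero still needs one instruction.
}

// Hypothetical predicate: a 128-bit immediate, given as two 64-bit
// halves, is "simple" if both halves together need at most 4
// instructions under the model above; otherwise a literal load
// would be the alternative.
constexpr bool
is_simple_128bit_immediate (std::uint64_t lo, std::uint64_t hi)
{
  return movz_movk_count (lo) + movz_movk_count (hi) <= 4;
}
```

Under this model an arbitrary 128-bit value can need up to 8 instructions (4 per half), which matches the upper bound quoted in the patch description.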
[PATCH, rs6000] Fix ICE caused by recent patch: Generate lvx and stvx without swaps for aligned vector loads and stores
A patch committed on 2018-01-10 is causing an ICE with existing test program $GCC_SRC/gcc/testsuite/gcc.target/powerpc/pr83399.c, when compiled with the -m32 option. At the time of the commit, it was thought that this was a problem with the recent resolution of PR83399. However, further investigation revealed a problem with the patch that was just committed. The generated code did not distinguish between 32- and 64-bit targets. This patch corrects that problem. This has been bootstrapped and tested without regressions on powerpc64le-unknown-linux (P8) and on powerpc64-unknown-linux (P7) with both -m32 and -m64 target options. Is this ok for trunk? gcc/ChangeLog: 2018-01-16 Kelvin Nilsen * config/rs6000/rs6000-p8swap.c (rs6000_gen_stvx): Generate different rtl trees depending on TARGET_64BIT. (rs6000_gen_lvx): Likewise. Index: gcc/config/rs6000/rs6000-p8swap.c === --- gcc/config/rs6000/rs6000-p8swap.c (revision 256710) +++ gcc/config/rs6000/rs6000-p8swap.c (working copy) @@ -1554,23 +1554,31 @@ rs6000_gen_stvx (enum machine_mode mode, rtx dest_ op1 = XEXP (memory_address, 0); op2 = XEXP (memory_address, 1); if (mode == V16QImode) - stvx = gen_altivec_stvx_v16qi_2op (src_exp, op1, op2); + stvx = TARGET_64BIT ? gen_altivec_stvx_v16qi_2op (src_exp, op1, op2) + : gen_altivec_stvx_v16qi_2op_si (src_exp, op1, op2); else if (mode == V8HImode) - stvx = gen_altivec_stvx_v8hi_2op (src_exp, op1, op2); + stvx = TARGET_64BIT ? gen_altivec_stvx_v8hi_2op (src_exp, op1, op2) + : gen_altivec_stvx_v8hi_2op_si (src_exp, op1, op2); #ifdef HAVE_V8HFmode else if (mode == V8HFmode) - stvx = gen_altivec_stvx_v8hf_2op (src_exp, op1, op2); + stvx = TARGET_64BIT ? gen_altivec_stvx_v8hf_2op (src_exp, op1, op2) + : gen_altivec_stvx_v8hf_2op_si (src_exp, op1, op2); #endif else if (mode == V4SImode) - stvx = gen_altivec_stvx_v4si_2op (src_exp, op1, op2); + stvx = TARGET_64BIT ? 
gen_altivec_stvx_v4si_2op (src_exp, op1, op2) + : gen_altivec_stvx_v4si_2op_si (src_exp, op1, op2); else if (mode == V4SFmode) - stvx = gen_altivec_stvx_v4sf_2op (src_exp, op1, op2); + stvx = TARGET_64BIT ? gen_altivec_stvx_v4sf_2op (src_exp, op1, op2) + : gen_altivec_stvx_v4sf_2op_si (src_exp, op1, op2); else if (mode == V2DImode) - stvx = gen_altivec_stvx_v2di_2op (src_exp, op1, op2); + stvx = TARGET_64BIT ? gen_altivec_stvx_v2di_2op (src_exp, op1, op2) + : gen_altivec_stvx_v2di_2op_si (src_exp, op1, op2); else if (mode == V2DFmode) - stvx = gen_altivec_stvx_v2df_2op (src_exp, op1, op2); + stvx = TARGET_64BIT ? gen_altivec_stvx_v2df_2op (src_exp, op1, op2) + : gen_altivec_stvx_v2df_2op_si (src_exp, op1, op2); else if (mode == V1TImode) - stvx = gen_altivec_stvx_v1ti_2op (src_exp, op1, op2); + stvx = TARGET_64BIT ? gen_altivec_stvx_v1ti_2op (src_exp, op1, op2) + : gen_altivec_stvx_v1ti_2op_si (src_exp, op1, op2); else /* KFmode, TFmode, other modes not expected in this context. */ gcc_unreachable (); @@ -1578,23 +1586,39 @@ rs6000_gen_stvx (enum machine_mode mode, rtx dest_ else /* REG_P (memory_address) */ { if (mode == V16QImode) - stvx = gen_altivec_stvx_v16qi_1op (src_exp, memory_address); + stvx = TARGET_64BIT ? + gen_altivec_stvx_v16qi_1op (src_exp, memory_address) + : gen_altivec_stvx_v16qi_1op_si (src_exp, memory_address); else if (mode == V8HImode) - stvx = gen_altivec_stvx_v8hi_1op (src_exp, memory_address); + stvx = TARGET_64BIT ? + gen_altivec_stvx_v8hi_1op (src_exp, memory_address) + : gen_altivec_stvx_v8hi_1op_si (src_exp, memory_address); #ifdef HAVE_V8HFmode else if (mode == V8HFmode) - stvx = gen_altivec_stvx_v8hf_1op (src_exp, memory_address); + stvx = TARGET_64BIT ? + gen_altivec_stvx_v8hf_1op (src_exp, memory_address) + : gen_altivec_stvx_v8hf_1op_si (src_exp, memory_address); #endif else if (mode == V4SImode) - stvx = gen_altivec_stvx_v4si_1op (src_exp, memory_address); + stvx =TARGET_64BIT ? 
+ gen_altivec_stvx_v4si_1op (src_exp, memory_address) + : gen_altivec_stvx_v4si_1op_si (src_exp, memory_address); else if (mode == V4SFmode) - stvx = gen_altivec_stvx_v4sf_1op (src_exp, memory_address); + stvx = TARGET_64BIT ? + gen_altivec_stvx_v4sf_1op (src_exp, memory_address) + : gen_altivec_stvx_v4sf_1op_si (src_exp, memory_address); else if (mode == V2DImode) - stvx = gen_altivec_stvx_v2di_1op (src_exp, memory_address); + stvx = TARGET_64BIT ? + gen_altivec_stvx_v2di_1op (src_exp, memory_address) + : gen_altivec_stvx_v2di_1op_si (src_exp, memory_address)
Re: [PATCH][WWWDOCS][AArch64][ARM] Update GCC 8 release notes
Hi Tamar, On 16/01/18 16:56, Tamar Christina wrote: Th 01/16/2018 16:36, James Greenhalgh wrote: On Tue, Jan 16, 2018 at 02:21:30PM +, Tamar Christina wrote: Hi Kyrill, xgene1 was added a few releases ago, better to use one of the new additions from the above list. For example -mtune=cortex-r52. Thanks, I have updated the patch. I'll wait for an ok from an AArch64 maintainer and a Docs maintainer. OK. But you have the same issue in the AArch64 part. Thanks, I've updated the patch, I'll wait for a bit for a doc reviewer if I don't hear anything I'll assume the patch is OK. Gerald has confirmed a few times in the past that port maintainers can approve target-specific changes to the web pages, and there are words to that effect at: https://gcc.gnu.org/svnwrite.html . So I'd recommend you commit your patch once you've got approval for aarch64 and arm. Unless there's some specific part of the patch you'd like the docs maintainer to give you feedback on... Thanks again for working on this. Kyrill Thanks, Tamar James Index: htdocs/gcc-8/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-8/changes.html,v retrieving revision 1.26 diff -u -r1.26 changes.html --- htdocs/gcc-8/changes.html 11 Jan 2018 09:31:53 - 1.26 +++ htdocs/gcc-8/changes.html 16 Jan 2018 14:12:57 - @@ -147,7 +147,51 @@ AArch64 - + +The Armv8.4-A architecture is now supported. It can be used by +specifying the -march=armv8.4-a option. + + +The Dot Product instructions are now supported as an optional extension to the +Armv8.2-A architecture and newer and are mandatory on Armv8.4-A. The extension can be used by +specifying the +dotprod architecture extension. E.g. -march=armv8.2-a+dotprod. + + +The Armv8-A +crypto extension has now been split into two extensions for finer grained control: + + +aes which contains the Armv8-A AES crytographic instructions. + +sha2 which contains the Armv8-A SHA2 and SHA1 cryptographic instructions. + +Using +crypto will now enable these two extensions. 
+ + +New Armv8.4-A FP16 Floating Point Multiplication Variant instructions have been added. These instructions are +mandatory in Armv8.4-A but available as an optional extension to Armv8.2-A and Armv8.3-A. The new extension +can be used by specifying the +fp16fml architectural extension on Armv8.2-A and Armv8.3-A. On Armv8.4-A +the instructions can be enabled by specifying +fp16. + + +New cryptographic instructions have been added as optional extensions to Armv8.2-A and newer. These instructions can +be enabled with: + + +sha3 New SHA3 and SHA2 instructions from Armv8.4-A. This implies +sha2. + +sm4 New SM3 and SM4 instructions from Armv8.4-A. + + + + Support has been added for the following processors + (GCC identifiers in parentheses): + + Arm Cortex-A75 (cortex-a75). +Arm Cortex-A55 (cortex-a55). +Arm Cortex-A55/Cortex-A75 DynamIQ big.LITTLE (cortex-a75.cortex-a55). + + The GCC identifiers can be used + as arguments to the -mcpu or -mtune options, + for example: -mcpu=cortex-a75 or + -mtune=thunderx2t99p1 or as arguments to the equivalent target + attributes and pragmas. + ARM @@ -169,14 +213,58 @@ removed in a future release. -The default link behavior for ARMv6 and ARMv7-R targets has been +The default link behavior for Armv6 and Armv7-R targets has been changed to produce BE8 format when generating big-endian images. A new flag -mbe32 can be used to force the linker to produce legacy BE32 format images. There is no change of behavior for -ARMv6-m and other ARMv7 or later targets: these already defaulted +Armv6-M and other Armv7 or later targets: these already defaulted to BE8 format. This change brings GCC into alignment with other compilers for the ARM architecture. + +The Armv8-R architecture is now supported. It can be used by specifying the +-march=armv8-r option. + + +The Armv8.3-A architecture is now supported. It can be used by +specifying the -march=armv8.3-a option. + + +The Armv8.4-A architecture is now supported. 
It can be used by +specifying the -march=armv8.4-a option. + + + The Dot Product instructions are now supported as an optional extension to the + Armv8.2-A architecture and newer and are mandatory on Armv8.4-A. The extension can be used by + specifying the +dotprod architecture extension. E.g. -march=armv8.2-a+dotprod. + + + +Support for setting extensions and architectures using the GCC target pragma and attribute has been added. +It can be used by specifying #pragma GCC target ("arch=..."), #pragma GCC target ("+extension"), +__attribute__((target("arch=...")
Compilation warning in simple-object-xcoff.c
Compiling GDB 8.0.91 with mingw.org's MinGW GCC 6.0.3 produces this warning in libiberty:

  gcc -c -DHAVE_CONFIG_H -O2 -gdwarf-4 -g3 -D__USE_MINGW_ACCESS -I. -I./../include -W -Wall -Wwrite-strings -Wc++-compat -Wstrict-prototypes -pedantic -D_GNU_SOURCE ./simple-object-xcoff.c -o simple-object-xcoff.o
  ./simple-object-xcoff.c: In function 'simple_object_xcoff_find_sections':
  ./simple-object-xcoff.c:605:25: warning: left shift count >= width of type [-Wshift-count-overflow]
     x_scnlen = x_scnlen << 32
                         ^~

And indeed x_scnlen is declared as a 32-bit data type off_t. I'm willing to test patches if needed. Thanks.
Re: [PATCH, rs6000] Fix ICE caused by recent patch: Generate lvx and stvx without swaps for aligned vector loads and stores
Hi Kelvin, On Tue, Jan 16, 2018 at 11:15:12AM -0600, Kelvin Nilsen wrote: > > A patch committed on 2018-01-10 is causing an ICE with existing test > program $GCC_SRC/gcc/testsuite/gcc.target/powerpc/pr83399.c, when > compiled with the -m32 option. At the time of the commit, it was > thought that this was a problem with the recent resolution of PR83399. > However, further investigation revealed a problem with the patch that > was just committed. The generated code did not distinguish between 32- > and 64-bit targets. > > This patch corrects that problem. > > This has been bootstrapped and tested without regressions on > powerpc64le-unknown-linux (P8) and on powerpc64-unknown-linux (P7) with > both -m32 and -m64 target options. Is this ok for trunk? > > > gcc/ChangeLog: > > 2018-01-16 Kelvin Nilsen > PR target/83399 ? Or is there another PR? > * config/rs6000/rs6000-p8swap.c (rs6000_gen_stvx): Generate > different rtl trees depending on TARGET_64BIT. > (rs6000_gen_lvx): Likewise. > > Index: gcc/config/rs6000/rs6000-p8swap.c > === > --- gcc/config/rs6000/rs6000-p8swap.c (revision 256710) > +++ gcc/config/rs6000/rs6000-p8swap.c (working copy) > @@ -1554,23 +1554,31 @@ rs6000_gen_stvx (enum machine_mode mode, rtx dest_ >op1 = XEXP (memory_address, 0); >op2 = XEXP (memory_address, 1); >if (mode == V16QImode) > - stvx = gen_altivec_stvx_v16qi_2op (src_exp, op1, op2); > + stvx = TARGET_64BIT ? gen_altivec_stvx_v16qi_2op (src_exp, op1, op2) > + : gen_altivec_stvx_v16qi_2op_si (src_exp, op1, op2); Please indent this like stvx = TARGET_64BIT ? gen_altivec_stvx_v16qi_2op (src_exp, op1, op2) : gen_altivec_stvx_v16qi_2op_si (src_exp, op1, op2); >if (mode == V16QImode) > - stvx = gen_altivec_stvx_v16qi_1op (src_exp, memory_address); > + stvx = TARGET_64BIT ? > + gen_altivec_stvx_v16qi_1op (src_exp, memory_address) > + : gen_altivec_stvx_v16qi_1op_si (src_exp, memory_address); You should never have ? at the end of line; and ? and : indent with the controlling expression. 
So:

	stvx = TARGET_64BIT
	       ? gen_altivec_stvx_v16qi_1op (src_exp, memory_address)
	       : gen_altivec_stvx_v16qi_1op_si (src_exp, memory_address);

Similar everywhere. Okay with that changed. Thanks!

Segher
Re: VIEW_CONVERT_EXPR slots for strict-align targets (PR 83884)
On January 16, 2018 5:14:50 PM GMT+01:00, Richard Sandiford wrote: >This PR is about a case in which we VIEW_CONVERT a variable-sized >unaligned record: > >sizes-gimplified type_7 BLK >size >unit-size >align:8 ...> > >to an aligned 32-bit integer. The strict-alignment handling of >this case creates an aligned temporary slot, moves the operand >into the slot in the operand's original mode, then accesses the >slot in the more-aligned result mode. > >Previously the size of the temporary slot was calculated using: > > HOST_WIDE_INT temp_size >= MAX (int_size_in_bytes (inner_type), > (HOST_WIDE_INT) GET_MODE_SIZE (mode)); > >int_size_in_bytes would return -1 for the variable-length type, >so we'd use the size of the result mode for the slot. r256152 replaced >int_size_in_bytes with tree_to_poly_uint64, which triggered an ICE. > >I'd assumed that variable-length types couldn't occur here, since it >seems strange to view-convert a variable-length type to a fixed-length >one. It also seemed strange that (with the old code) we'd ignore the >size of the operand if it was a variable V but honour it if it was a >constant C, even though it's presumably possible for V to equal that >C at runtime. > >If op0 has BLKmode we do a block copy of GET_MODE_SIZE (mode) bytes >and then convert the slot to "mode": > > poly_uint64 mode_size = GET_MODE_SIZE (mode); > ... > if (GET_MODE (op0) == BLKmode) > { > rtx size_rtx = gen_int_mode (mode_size, Pmode); > emit_block_move (new_with_op0_mode, op0, size_rtx, > (modifier == EXPAND_STACK_PARM > ? BLOCK_OP_CALL_PARM > : BLOCK_OP_NORMAL)); > } > else > ... > > op0 = new_rtx; > } > } > > op0 = adjust_address (op0, mode, 0); > >so I think in that case just the size of "mode" is enough, even if op0 >is a fixed-size type. 
For non-BLKmode op0 we first move in op0's mode >and then convert the slot to "mode": > > emit_move_insn (new_with_op0_mode, op0); > > op0 = new_rtx; > } > } > > op0 = adjust_address (op0, mode, 0); > >so I think we want the maximum of the two mode sizes in that case >(assuming they can be different sizes). > >But is this VIEW_CONVERT_EXPR really valid? IMHO it is on the border of be being invalid (verify_gimple doesn't diagnose it). Using a BIT_FIELD_REF would be much better here. Richard. Maybe this is just >papering over a deeper issue. There again, the MAX in the old >code was presumably there because the sizes can be different... > >Richard > > >2018-01-16 Richard Sandiford > >gcc/ > PR middle-end/83884 > * expr.c (expand_expr_real_1): Use the size of GET_MODE (op0) > rather than the size of inner_type to determine the stack slot size > when handling VIEW_CONVERT_EXPRs on strict-alignment targets. > >Index: gcc/expr.c >=== >--- gcc/expr.c 2018-01-14 08:42:44.497155977 + >+++ gcc/expr.c 2018-01-16 16:07:22.737883774 + >@@ -11145,11 +11145,11 @@ expand_expr_real_1 (tree exp, rtx target > } > else if (STRICT_ALIGNMENT) > { >-tree inner_type = TREE_TYPE (treeop0); > poly_uint64 mode_size = GET_MODE_SIZE (mode); >-poly_uint64 op0_size >- = tree_to_poly_uint64 (TYPE_SIZE_UNIT (inner_type)); >-poly_int64 temp_size = upper_bound (op0_size, mode_size); >+poly_uint64 temp_size = mode_size; >+if (GET_MODE (op0) != BLKmode) >+ temp_size = upper_bound (temp_size, >+ GET_MODE_SIZE (GET_MODE (op0))); > rtx new_rtx > = assign_stack_temp_for_type (mode, temp_size, type); > rtx new_with_op0_mode
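The sizing rule the patch arrives at can be sketched outside GCC as follows. This is only a simplified model, not the expr.c code: sizes are plain integers rather than poly_uint64, and `Mode`, `mode_size` and `temp_slot_size` are hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for a machine mode; BLK means "no fixed-size mode".
enum class Mode { BLK, QI, HI, SI, DI };

// Size in bytes of a fixed-size mode.  BLK has no meaningful size in this
// model, so it reports 0.
constexpr std::uint64_t mode_size (Mode m)
{
  switch (m)
    {
    case Mode::QI: return 1;
    case Mode::HI: return 2;
    case Mode::SI: return 4;
    case Mode::DI: return 8;
    default: return 0;
    }
}

// Size of the temporary slot used to view-convert an operand held in
// OP0_MODE to RESULT_MODE.  A BLKmode operand is block-copied for exactly
// the result size; otherwise the slot must hold the larger of the two modes.
constexpr std::uint64_t temp_slot_size (Mode op0_mode, Mode result_mode)
{
  std::uint64_t size = mode_size (result_mode);
  if (op0_mode != Mode::BLK)
    size = std::max (size, mode_size (op0_mode));
  return size;
}
```

In this model a BLKmode operand never enlarges the slot, which mirrors the observation above that the block copy moves only GET_MODE_SIZE (mode) bytes.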
Re: Compilation warning in simple-object-xcoff.c
I think that warning is valid - the host has a 32-bit limit to file sizes (off_t) but it's trying to read a 64-bit offset (in that clause). It's warning you that you won't be able to handle files as large as the field implies. Can we hide the warning? Probably. Should we? Debatable, as long as we want 64-bit xcoff support in 32-bit filesystems. Otherwise, we'd need to detect off_t overflow somehow, down the slippery slope of reporting the error to the caller...
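One way the overflow detection mentioned above might look, as a rough sketch only: do the shift in a 64-bit unsigned type, then check whether the result fits in the host's offset type before narrowing. `combine_halves` and `fits_offset` are hypothetical helpers, not functions from libiberty, and the template parameter stands in for the host's off_t.

```cpp
#include <cstdint>
#include <limits>

// Combine the two 32-bit halves of a 64-bit field.  The shift happens in a
// 64-bit unsigned type, so it is well defined even when the host's offset
// type is only 32 bits wide.
constexpr std::uint64_t combine_halves (std::uint32_t hi, std::uint32_t lo)
{
  return (static_cast<std::uint64_t> (hi) << 32) | lo;
}

// Return true if VALUE is representable in the signed offset type OffT
// (a stand-in for the host's off_t).
template <typename OffT>
constexpr bool fits_offset (std::uint64_t value)
{
  return value <= static_cast<std::uint64_t> (std::numeric_limits<OffT>::max ());
}
```

A caller would report an error, rather than silently truncating, when fits_offset returns false for the host's offset type.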
RE: [PATCH][WWWDOCS][AArch64][ARM] Update GCC 8 release notes
Hi Kyrill, > > Hi Tamar, > > On 16/01/18 16:56, Tamar Christina wrote: > > Th 01/16/2018 16:36, James Greenhalgh wrote: > >> On Tue, Jan 16, 2018 at 02:21:30PM +, Tamar Christina wrote: > >>> Hi Kyrill, > >>> > xgene1 was added a few releases ago, better to use one of the new > additions from the above list. > For example -mtune=cortex-r52. > >>> Thanks, I have updated the patch. I'll wait for an ok from an AArch64 > maintainer and a Docs maintainer. > >> OK. But you have the same issue in the AArch64 part. > > Thanks, I've updated the patch, I'll wait for a bit for a doc reviewer > > if I don't hear anything I'll assume the patch is OK. > > Gerald has confirmed a few times in the past that port maintainers can > approve target-specific changes to the web pages, and there are words to > that effect at: > https://gcc.gnu.org/svnwrite.html . > So I'd recommend you commit your patch once you've got approval for > aarch64 and arm. > Unless there's some specific part of the patch you'd like the docs maintainer > to give you feedback on... Ah, thanks! I'll commit the patch then. > Thanks again for working on this. > Kyrill > > > > > Thanks, > > Tamar > >> James > >> > >>> Index: htdocs/gcc-8/changes.html > >>> > == > = > >>> RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-8/changes.html,v > >>> retrieving revision 1.26 > >>> diff -u -r1.26 changes.html > >>> --- htdocs/gcc-8/changes.html 11 Jan 2018 09:31:53 - 1.26 > >>> +++ htdocs/gcc-8/changes.html 16 Jan 2018 14:12:57 - > >>> @@ -147,7 +147,51 @@ > >>> > >>> AArch64 > >>> > >>> - > >>> + > >>> +The Armv8.4-A architecture is now supported. It can be used by > >>> +specifying the -march=armv8.4-a option. > >>> + > >>> + > >>> +The Dot Product instructions are now supported as an optional > extension to the > >>> +Armv8.2-A architecture and newer and are mandatory on Armv8.4-A. > The extension can be used by > >>> +specifying the +dotprod architecture extension. E.g. > -march=armv8.2-a+dotprod. 
> >>> + > >>> + > >>> +The Armv8-A +crypto extension has now been split > into two extensions for finer grained control: > >>> + > >>> + +aes which contains the Armv8-A AES > crytographic instructions. > >>> + +sha2 which contains the Armv8-A SHA2 and > SHA1 cryptographic instructions. > >>> + > >>> +Using +crypto will now enable these two extensions. > >>> + > >>> + > >>> +New Armv8.4-A FP16 Floating Point Multiplication Variant instructions > have been added. These instructions are > >>> +mandatory in Armv8.4-A but available as an optional extension to > Armv8.2-A and Armv8.3-A. The new extension > >>> +can be used by specifying the +fp16fml architectural > extension on Armv8.2-A and Armv8.3-A. On Armv8.4-A > >>> +the instructions can be enabled by specifying +fp16. > >>> + > >>> + > >>> +New cryptographic instructions have been added as optional > extensions to Armv8.2-A and newer. These instructions can > >>> +be enabled with: > >>> + > >>> + +sha3 New SHA3 and SHA2 instructions from > Armv8.4-A. This implies +sha2. > >>> + +sm4 New SM3 and SM4 instructions from > Armv8.4-A. > >>> + > >>> + > >>> + > >>> + Support has been added for the following processors > >>> + (GCC identifiers in parentheses): > >>> + > >>> + Arm Cortex-A75 (cortex-a75). > >>> + Arm Cortex-A55 (cortex-a55). > >>> + Arm Cortex-A55/Cortex-A75 DynamIQ big.LITTLE (cortex- > a75.cortex-a55). > >>> + > >>> + The GCC identifiers can be used > >>> + as arguments to the -mcpu or - > mtune options, > >>> + for example: -mcpu=cortex-a75 or > >>> + -mtune=thunderx2t99p1 or as arguments to the > equivalent target > >>> + attributes and pragmas. > >>> + > >>> > >>> > >>> ARM > >>> @@ -169,14 +213,58 @@ > >>> removed in a future release. > >>> > >>> > >>> -The default link behavior for ARMv6 and ARMv7-R targets has been > >>> +The default link behavior for Armv6 and Armv7-R targets has > >>> + been > >>> changed to produce BE8 format when generating big-endian images. 
> A new > >>> flag -mbe32 can be used to force the linker to > produce > >>> legacy BE32 format images. There is no change of behavior for > >>> -ARMv6-m and other ARMv7 or later targets: these already defaulted > >>> +Armv6-M and other Armv7 or later targets: these already > >>> + defaulted > >>> to BE8 format. This change brings GCC into alignment with other > >>> compilers for the ARM architecture. > >>> > >>> + > >>> +The Armv8-R architecture is now supported. It can be used by > specifying the > >>> +-march=armv8-r option. > >>> + > >>> + > >>> +The Armv8.3-A architecture is now supported. It can be used by > >>> +specifying the -march=armv8.3-a option. > >>> +
Re: Compilation warning in simple-object-xcoff.c
> From: DJ Delorie
> Cc: gcc-patches@gcc.gnu.org, gdb-patc...@sourceware.org
> Date: Tue, 16 Jan 2018 13:00:48 -0500
>
> I think that warning is valid - the host has a 32-bit limit to file
> sizes (off_t) but it's trying to read a 64-bit offset (in that clause).
> It's warning you that you won't be able to handle files as large as the
> field implies.

If 32-bit off_t cannot handle this, then perhaps this file (or that function) should not be compiled for a 32-bit host?
Re: Compilation warning in simple-object-xcoff.c
Well, it should all work fine as long as the xcoff64 file is less than 4 GB. And it's not the host's bit size that counts; there are usually ways to get 64-bit file operations on 32-bit hosts.
[PATCH, rs6000] Implement ABI_AIX indirect call handling for -mno-speculate-indirect-jumps
Hi, This patch fills in a gap from the previous -mno-speculate-indirect-jumps patch. That patch didn't provide support for indirect calls using ABI_AIX as the default ABI. This fills in that missing support and changes the one related powerpc64le-only test case to be compiled for all subtargets. After some analysis, it doesn't appear possible for sibcalls to be generated for ELFv1 or ELFv2 using a bctr, given the need for a local call to avoid the required TOC restore afterwards. I haven't been able to find a way to get a bctr generated even when one could theoretically prove the bctr must go to a local function. This has been bootstrapped and tested on powerpc64le-linux-gnu with no regressions. Testing is still ongoing for powerpc64-linux-gnu. Provided that testing completes with no surprises, is this okay for trunk (and shortly for backport to 7)? Thanks! Bill [gcc] 2018-01-16 Bill Schmidt * config/rs6000/rs6000.md (*call_indirect_aix): Disable for -mno-speculate-indirect-jumps. (*call_indirect_aix_nospec): New define_insn. (*call_value_indirect_aix): Disable for -mno-speculate-indirect-jumps. (*call_value_indirect_aix_nospec): New define_insn. [gcc/testsuite] 2018-01-16 Bill Schmidt * gcc.target/powerpc/safe-indirect-jump-1.c: Remove powerpc64le-only restriction. 
Index: gcc/config/rs6000/rs6000.md === --- gcc/config/rs6000/rs6000.md (revision 256753) +++ gcc/config/rs6000/rs6000.md (working copy) @@ -10669,11 +10669,22 @@ (use (match_operand:P 2 "memory_operand" ",")) (set (reg:P TOC_REGNUM) (unspec:P [(match_operand:P 3 "const_int_operand" "n,n")] UNSPEC_TOCSLOT)) (clobber (reg:P LR_REGNO))] - "DEFAULT_ABI == ABI_AIX" + "DEFAULT_ABI == ABI_AIX && rs6000_speculate_indirect_jumps" " 2,%2\;b%T0l\; 2,%3(1)" [(set_attr "type" "jmpreg") (set_attr "length" "12")]) +(define_insn "*call_indirect_aix_nospec" + [(call (mem:SI (match_operand:P 0 "register_operand" "c,*l")) +(match_operand 1 "" "g,g")) + (use (match_operand:P 2 "memory_operand" ",")) + (set (reg:P TOC_REGNUM) (unspec:P [(match_operand:P 3 "const_int_operand" "n,n")] UNSPEC_TOCSLOT)) + (clobber (reg:P LR_REGNO))] + "DEFAULT_ABI == ABI_AIX && !rs6000_speculate_indirect_jumps" + "crset eq\; 2,%2\;beq%T0l-\; 2,%3(1)" + [(set_attr "type" "jmpreg") + (set_attr "length" "16")]) + (define_insn "*call_value_indirect_aix" [(set (match_operand 0 "" "") (call (mem:SI (match_operand:P 1 "register_operand" "c,*l")) @@ -10681,11 +10692,23 @@ (use (match_operand:P 3 "memory_operand" ",")) (set (reg:P TOC_REGNUM) (unspec:P [(match_operand:P 4 "const_int_operand" "n,n")] UNSPEC_TOCSLOT)) (clobber (reg:P LR_REGNO))] - "DEFAULT_ABI == ABI_AIX" + "DEFAULT_ABI == ABI_AIX && rs6000_speculate_indirect_jumps" " 2,%3\;b%T1l\; 2,%4(1)" [(set_attr "type" "jmpreg") (set_attr "length" "12")]) +(define_insn "*call_value_indirect_aix_nospec" + [(set (match_operand 0 "" "") + (call (mem:SI (match_operand:P 1 "register_operand" "c,*l")) + (match_operand 2 "" "g,g"))) + (use (match_operand:P 3 "memory_operand" ",")) + (set (reg:P TOC_REGNUM) (unspec:P [(match_operand:P 4 "const_int_operand" "n,n")] UNSPEC_TOCSLOT)) + (clobber (reg:P LR_REGNO))] + "DEFAULT_ABI == ABI_AIX && !rs6000_speculate_indirect_jumps" + "crset eq\; 2,%3\;beq%T1l-\; 2,%4(1)" + [(set_attr "type" "jmpreg") + (set_attr "length" "16")]) 
+ ;; Call to indirect functions with the ELFv2 ABI. ;; Operand0 is the addresss of the function to call ;; Operand2 is the offset of the stack location holding the current TOC pointer Index: gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-1.c === --- gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-1.c (revision 256753) +++ gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-1.c (working copy) @@ -1,4 +1,4 @@ -/* { dg-do compile { target { powerpc64le-*-* } } } */ +/* { dg-do compile } */ /* { dg-additional-options "-mno-speculate-indirect-jumps" } */ /* Test for deliberate misprediction of indirect calls for ELFv2. */
[PATCH] avoid assuming known string length is constant (PR 83896)
Recent improvements to the strlen pass introduced the assumption that when the length of a string has been recorded by the pass the length is necessarily constant. Bug 83896 shows that this assumption is not always true, and that GCC fails with an ICE when it doesn't hold. To avoid the ICE the attached patch removes the assumption. x86_64-linux bootstrap successful, regression test in progress. Martin PR tree-optimization/83896 - ice in get_string_len on a call to strlen with non-constant length gcc/ChangeLog: PR tree-optimization/83896 * tree-ssa-strlen.c (get_string_len): Avoid assuming length is constant. gcc/testsuite/ChangeLog: PR tree-optimization/83896 * gcc.dg/strlenopt-43.c: New test. Index: gcc/tree-ssa-strlen.c === --- gcc/tree-ssa-strlen.c (revision 256752) +++ gcc/tree-ssa-strlen.c (working copy) @@ -2772,7 +2772,9 @@ handle_pointer_plus (gimple_stmt_iterator *gsi) } } -/* Check if RHS is string_cst possibly wrapped by mem_ref. */ +/* If RHS, either directly or indirectly, refers to a string of constant + length, return it. Otherwise return a negative value. */ + static int get_string_len (tree rhs) { @@ -2789,7 +2791,8 @@ get_string_len (tree rhs) if (idx > 0) { strinfo *si = get_strinfo (idx); - if (si && si->full_string_p) + if (si && si->full_string_p + && TREE_CODE (si->nonzero_chars) == INTEGER_CST) return tree_to_shwi (si->nonzero_chars); } } Index: gcc/testsuite/gcc.dg/strlenopt-43.c === --- gcc/testsuite/gcc.dg/strlenopt-43.c (nonexistent) +++ gcc/testsuite/gcc.dg/strlenopt-43.c (working copy) @@ -0,0 +1,13 @@ +/* PR tree-optimization/83896 - ice in get_string_len on a call to strlen + with non-constant length + { dg-do compile } + { dg-options "-O2 -Wall" } */ + +extern char a[5]; +extern char b[]; + +void f (void) +{ + if (__builtin_strlen (b) != 4) +__builtin_memcpy (a, b, sizeof a); +}
Re: [PATCH] avoid assuming known string length is constant (PR 83896)
Martin Sebor writes: > Recent improvements to the strlen pass introduced the assumption > that when the length of a string has been recorded by the pass > the length is necessarily constant. Bug 83896 shows that this > assumption is not always true, and that GCC fails with an ICE > when it doesn't hold. To avoid the ICE the attached patch > removes the assumption. > > x86_64-linux bootstrap successful, regression test in progress. > > Martin > > PR tree-optimization/83896 - ice in get_string_len on a call to strlen with > non-constant length > > gcc/ChangeLog: > > PR tree-optimization/83896 > * tree-ssa-strlen.c (get_string_len): Avoid assuming length is constant. > > gcc/testsuite/ChangeLog: > > PR tree-optimization/83896 > * gcc.dg/strlenopt-43.c: New test. > > Index: gcc/tree-ssa-strlen.c > === > --- gcc/tree-ssa-strlen.c (revision 256752) > +++ gcc/tree-ssa-strlen.c (working copy) > @@ -2772,7 +2772,9 @@ handle_pointer_plus (gimple_stmt_iterator *gsi) > } > } > > -/* Check if RHS is string_cst possibly wrapped by mem_ref. */ > +/* If RHS, either directly or indirectly, refers to a string of constant > + length, return it. Otherwise return a negative value. */ > + > static int > get_string_len (tree rhs) > { I think this should be returning HOST_WIDE_INT given the unconstrained tree_to_shwi return. Same type change for rhslen in the caller. (Not my call, but it might be better to have a more specific function name, given that the file already had "get_string_length" before this function was added.) > @@ -2789,7 +2791,8 @@ get_string_len (tree rhs) > if (idx > 0) > { > strinfo *si = get_strinfo (idx); > - if (si && si->full_string_p) > + if (si && si->full_string_p > + && TREE_CODE (si->nonzero_chars) == INTEGER_CST) > return tree_to_shwi (si->nonzero_chars); tree_fits_shwi_p? 
Thanks, Richard > } > } > Index: gcc/testsuite/gcc.dg/strlenopt-43.c > === > --- gcc/testsuite/gcc.dg/strlenopt-43.c (nonexistent) > +++ gcc/testsuite/gcc.dg/strlenopt-43.c (working copy) > @@ -0,0 +1,13 @@ > +/* PR tree-optimization/83896 - ice in get_string_len on a call to strlen > + with non-constant length > + { dg-do compile } > + { dg-options "-O2 -Wall" } */ > + > +extern char a[5]; > +extern char b[]; > + > +void f (void) > +{ > + if (__builtin_strlen (b) != 4) > +__builtin_memcpy (a, b, sizeof a); > +}
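The two review points (a wider return type, and checking that the value fits before converting) can be illustrated with a small model outside GCC. This is only a sketch: `RecordedLength` and `constant_length_or_minus_one` are hypothetical stand-ins, not the tree-ssa-strlen.c data structures, and the check merely imitates what a tree_fits_shwi_p-style guard would do.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a recorded string length: either a known
// constant or a symbolic (run-time) value.
struct RecordedLength
{
  bool is_constant;
  std::uint64_t value;	// meaningful only when is_constant is true
};

// Return the length as a signed 64-bit integer when it is a constant that
// fits in that type, and -1 otherwise.
constexpr std::int64_t constant_length_or_minus_one (RecordedLength len)
{
  return (len.is_constant
	  && len.value <= static_cast<std::uint64_t>
			    (std::numeric_limits<std::int64_t>::max ()))
	 ? static_cast<std::int64_t> (len.value)
	 : -1;
}
```

Returning a 64-bit type avoids truncating a large constant into an int, and the single guard covers both the non-constant case and the out-of-range case.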
[PATCH, rs6000] (v2) Support for gimple folding of mergeh, mergel intrinsics
Hi, Add support for gimple folding of the mergeh, mergel intrinsics. Since the low and high versions are almost identical, a new helper function is added so that code can be shared. The changes introduced here affect the existing target testcases gcc.target/powerpc/builtins-1-be.c and builtins-1-le.c, such that a number of the scan-assembler tests would fail due to instruction counts changing. Since the purpose of that test is to primarily ensure those intrinsics are accepted by the compiler, I have disabled gimple-folding for the existing tests that count instructions, and created new variants of those tests with folding enabled and a higher optimization level, that do not count instructions. V2 updates, * thanks for the feedback & hints in how to make these improvements :-) * Reworked to merge the xxmrg* instructions into the existing define_insn stanzas. * Reworked to use the tree-vector-builder.h helpers, eliminating some constructor and assign statements. * a few more cosmetic touch-ups in nearby define_insns. * update target stanza for builtins-1-be-folded.c test. Sniff-tests of the target tests on a single system look OK. Full regtests are currently running across assorted power systems. OK for trunk, pending successful results? Thanks, -Will [gcc] 2018-01-16 Will Schmidt * config/rs6000/rs6000.c: (rs6000_gimple_builtin) Add gimple folding support for merge[hl]. (fold_mergehl_helper): New helper function. (tree-vector-builder.h): New #include for tree_vector_builder usage. * config/rs6000/altivec.md (altivec_vmrghw_direct): Add xxmrghw insn. (altivec_vmrglw_direct): Add xxmrglw insn. [testsuite] 2018-01-16 Will Schmidt * gcc.target/powerpc/fold-vec-mergehl-char.c: New. * gcc.target/powerpc/fold-vec-mergehl-double.c: New. * gcc.target/powerpc/fold-vec-mergehl-float.c: New. * gcc.target/powerpc/fold-vec-mergehl-int.c: New. * gcc.target/powerpc/fold-vec-mergehl-longlong.c: New. * gcc.target/powerpc/fold-vec-mergehl-pixel.c: New. 
* gcc.target/powerpc/fold-vec-mergehl-short.c: New. * gcc.target/powerpc/builtins-1-be.c: Disable gimple-folding. * gcc.target/powerpc/builtins-1-le.c: Disable gimple-folding. * gcc.target/powerpc/builtins-1-be-folded.c: New. * gcc.target/powerpc/builtins-1-le-folded.c: New. * gcc.target/powerpc/builtins-1.fold.h: New. diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index 733d920..bb00583 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -995,12 +995,12 @@ } [(set_attr "type" "vecperm")]) (define_insn "altivec_vmrghb_direct" [(set (match_operand:V16QI 0 "register_operand" "=v") -(unspec:V16QI [(match_operand:V16QI 1 "register_operand" "v") - (match_operand:V16QI 2 "register_operand" "v")] + (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "v") + (match_operand:V16QI 2 "register_operand" "v")] UNSPEC_VMRGH_DIRECT))] "TARGET_ALTIVEC" "vmrghb %0,%1,%2" [(set_attr "type" "vecperm")]) @@ -1102,16 +1102,18 @@ return "vmrglw %0,%2,%1"; } [(set_attr "type" "vecperm")]) (define_insn "altivec_vmrghw_direct" - [(set (match_operand:V4SI 0 "register_operand" "=v") -(unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v") - (match_operand:V4SI 2 "register_operand" "v")] - UNSPEC_VMRGH_DIRECT))] + [(set (match_operand:V4SI 0 "register_operand" "=v,wa") + (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v,wa") + (match_operand:V4SI 2 "register_operand" "v,wa")] +UNSPEC_VMRGH_DIRECT))] "TARGET_ALTIVEC" - "vmrghw %0,%1,%2" + "@ + vmrghw %0,%1,%2 + xxmrghw %x0,%x1,%x2" [(set_attr "type" "vecperm")]) (define_insn "*altivec_vmrghsf" [(set (match_operand:V4SF 0 "register_operand" "=v") (vec_select:V4SF @@ -1184,13 +1186,13 @@ } [(set_attr "type" "vecperm")]) (define_insn "altivec_vmrglb_direct" [(set (match_operand:V16QI 0 "register_operand" "=v") -(unspec:V16QI [(match_operand:V16QI 1 "register_operand" "v") - (match_operand:V16QI 2 "register_operand" "v")] - UNSPEC_VMRGL_DIRECT))] + (unspec:V16QI 
[(match_operand:V16QI 1 "register_operand" "v") + (match_operand:V16QI 2 "register_operand" "v")] + UNSPEC_VMRGL_DIRECT))] "TARGET_ALTIVEC" "vmrglb %0,%1,%2" [(set_attr "type" "vecperm")]) (define_expand "altivec_vmrglh" @@ -1242,11 +1244,11 @@ (define_insn "altivec_vmrglh_direct" [(set (match_operand:V8HI 0 "register_operand" "=v") (unspec:V8HI [(match_operand:V8HI 1 "register_operand" "v") (match_operand:V8HI 2 "register_operand" "v")] -
Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
On Sun, Jan 14, 2018 at 5:43 PM, Uros Bizjak wrote: > On Sun, Jan 14, 2018 at 5:35 PM, H.J. Lu wrote: >> On Sun, Jan 14, 2018 at 8:19 AM, Uros Bizjak wrote: >>> On Fri, Jan 12, 2018 at 9:01 AM, Uros Bizjak wrote: On Thu, Jan 11, 2018 at 2:28 PM, H.J. Lu wrote: > Hi Uros, > > Can you take a look at my x86 backend changes so that they are ready > to check in once we have consensus. Please finish the talks about the correct approach first. Once the consensus is reached, please post the final version of the patches for review. BTW: I have no detailed insight in these issues, so I'll look mostly at the implementation details, probably early next week. >>> >>> One general remark is on the usage of -1 as an invalid register >> >> This has been rewritten. The checked in patch no longer does that. > > I'm looking directly into current indirect_thunk_name, > output_indirect_thunk and output_indirect_thunk_function functions in > i386.c which have plenty of the mentioned checks. Improved with attached patch. 2018-01-16 Uros Bizjak * config/i386/i386.c (indirect_thunk_name): Declare regno as unsigned int. Compare regno with INVALID_REGNUM. (output_indirect_thunk): Ditto. (output_indirect_thunk_function): Ditto. (ix86_code_end): Declare regno as unsigned int. Use INVALID_REGNUM in the call to output_indirect_thunk_function. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Uros. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index ea9c462..7f233d1 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -10765,16 +10765,16 @@ static int indirect_thunks_bnd_used; /* Fills in the label name that should be used for the indirect thunk. 
*/ static void -indirect_thunk_name (char name[32], int regno, bool need_bnd_p, -bool ret_p) +indirect_thunk_name (char name[32], unsigned int regno, +bool need_bnd_p, bool ret_p) { - if (regno >= 0 && ret_p) + if (regno != INVALID_REGNUM && ret_p) gcc_unreachable (); if (USE_HIDDEN_LINKONCE) { const char *bnd = need_bnd_p ? "_bnd" : ""; - if (regno >= 0) + if (regno != INVALID_REGNUM) { const char *reg_prefix; if (LEGACY_INT_REGNO_P (regno)) @@ -10792,7 +10792,7 @@ indirect_thunk_name (char name[32], int regno, bool need_bnd_p, } else { - if (regno >= 0) + if (regno != INVALID_REGNUM) { if (need_bnd_p) ASM_GENERATE_INTERNAL_LABEL (name, "LITBR", regno); @@ -10844,7 +10844,7 @@ indirect_thunk_name (char name[32], int regno, bool need_bnd_p, */ static void -output_indirect_thunk (bool need_bnd_p, int regno) +output_indirect_thunk (bool need_bnd_p, unsigned int regno) { char indirectlabel1[32]; char indirectlabel2[32]; @@ -10874,7 +10874,7 @@ output_indirect_thunk (bool need_bnd_p, int regno) ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, indirectlabel2); - if (regno >= 0) + if (regno != INVALID_REGNUM) { /* MOV. */ rtx xops[2]; @@ -10898,12 +10898,12 @@ output_indirect_thunk (bool need_bnd_p, int regno) } /* Output a funtion with a call and return thunk for indirect branch. - If BND_P is true, the BND prefix is needed. If REGNO != -1, the - function address is in REGNO. Otherwise, the function address is + If BND_P is true, the BND prefix is needed. If REGNO != UNVALID_REGNUM, + the function address is in REGNO. Otherwise, the function address is on the top of stack. */ static void -output_indirect_thunk_function (bool need_bnd_p, int regno) +output_indirect_thunk_function (bool need_bnd_p, unsigned int regno) { char name[32]; tree decl; @@ -10952,7 +10952,7 @@ output_indirect_thunk_function (bool need_bnd_p, int regno) ASM_OUTPUT_LABEL (asm_out_file, name); } - if (regno < 0) + if (regno == INVALID_REGNUM) { /* Create alias for __x86.return_thunk/__x86.return_thunk_bnd. 
*/ char alias[32]; @@ -11026,16 +11026,16 @@ static void ix86_code_end (void) { rtx xops[2]; - int regno; + unsigned int regno; if (indirect_thunk_needed) -output_indirect_thunk_function (false, -1); +output_indirect_thunk_function (false, INVALID_REGNUM); if (indirect_thunk_bnd_needed) -output_indirect_thunk_function (true, -1); +output_indirect_thunk_function (true, INVALID_REGNUM); for (regno = FIRST_REX_INT_REG; regno <= LAST_REX_INT_REG; regno++) { - int i = regno - FIRST_REX_INT_REG + LAST_INT_REG + 1; + unsigned int i = regno - FIRST_REX_INT_REG + LAST_INT_REG + 1; if ((indirect_thunks_used & (1 << i))) output_indirect_thunk_function (false, regno);
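A small sketch of why the comparisons had to change along with the type. It assumes the sentinel is the all-ones unsigned value; `kInvalidRegnum`, `has_register` and `has_register_signed` are names made up for this example and do not appear in i386.c.

```cpp
// Stand-in for the INVALID_REGNUM sentinel, assumed here to be the
// all-ones unsigned value.
constexpr unsigned int kInvalidRegnum = ~0u;

// New convention: an unsigned register number compared against the
// sentinel.
constexpr bool has_register (unsigned int regno)
{
  return regno != kInvalidRegnum;
}

// Old convention: a signed register number where -1 means "none".  Once
// the parameter becomes unsigned, -1 converts to the all-ones value and a
// "regno >= 0" test would hold for every input, so it has to be replaced.
constexpr bool has_register_signed (int regno)
{
  return regno >= 0;
}
```

Under this assumption a caller that used to pass -1 passes the sentinel instead, and both conventions agree on which calls carry a real register.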
[PATCH, rs6000] Bug fixes for the Power 9 stxvl and lxvl instructions.
GCC maintainers: The following patch contains fixes for the stxvl and lxvl instructions and XL_LEN_R builtin that were found while adding additional Power 9 test cases for the various load and store builtins. The new tests in builtins-5-p9-runnable.c and builtins-6-p9-runnable.c are included that exposed the bugs. The test cases have been run and verified by hand on Power 9 without error. The full regressions on Power 8 LE, Power 8 BE and Power 9 are currently running. Please let me know if the patch is acceptable provided the regression testing completes cleanly. Thanks. Carl Love --- gcc/ChangeLog: 2018-01-16 Carl Love * config/rs6000/vsx.md (define_expand xl_len_r, define_expand stxvl, define_expand *stxvl): Add match_dup argument. gcc/testsuite/ChangeLog: 2018-01-16 Carl Love * gcc.target/powerpc/builtins-6-p9-runnable.c: Add additional tests. Add debug print statements. * gcc.target/powerpc/builtins-5-p9-runnable.c: Add test to do 16 byte vector load followed by a partial vector load. --- gcc/config/rs6000/rs6000-builtin.def |4 +- gcc/config/rs6000/vsx.md | 26 +- .../gcc.target/powerpc/builtins-5-p9-runnable.c| 150 +- .../gcc.target/powerpc/builtins-6-p9-runnable.c| 1759 4 files changed, 1214 insertions(+), 725 deletions(-) diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def index 0f7da6a4a..b17036c5a 100644 --- a/gcc/config/rs6000/rs6000-builtin.def +++ b/gcc/config/rs6000/rs6000-builtin.def @@ -2197,8 +2197,8 @@ BU_P9V_OVERLOAD_2 (VIEDP, "insert_exp_dp") BU_P9V_OVERLOAD_2 (VIESP, "insert_exp_sp") /* 2 argument vector functions added in ISA 3.0 (power9). 
*/ -BU_P9V_64BIT_VSX_2 (LXVL, "lxvl", CONST, lxvl) -BU_P9V_64BIT_VSX_2 (XL_LEN_R, "xl_len_r", CONST, xl_len_r) +BU_P9V_64BIT_VSX_2 (LXVL, "lxvl", PURE, lxvl) +BU_P9V_64BIT_VSX_2 (XL_LEN_R, "xl_len_r", PURE, xl_len_r) BU_P9V_AV_2 (VEXTUBLX, "vextublx", CONST, vextublx) BU_P9V_AV_2 (VEXTUBRX, "vextubrx", CONST, vextubrx) diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index 0323e866f..03f8ec2d6 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -4624,10 +4624,12 @@ (define_expand "first_mismatch_or_eos_index_" ;; Load VSX Vector with Length (define_expand "lxvl" [(set (match_dup 3) -(match_operand:DI 2 "register_operand")) +(ashift:DI (match_operand:DI 2 "register_operand") + (const_int 56))) (set (match_operand:V16QI 0 "vsx_register_operand") (unspec:V16QI [(match_operand:DI 1 "gpc_reg_operand") + (mem:V16QI (match_dup 1)) (match_dup 3)] UNSPEC_LXVL))] "TARGET_P9_VECTOR && TARGET_64BIT" @@ -4639,16 +4641,17 @@ (define_insn "*lxvl" [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa") (unspec:V16QI [(match_operand:DI 1 "gpc_reg_operand" "b") - (match_operand:DI 2 "register_operand" "+r")] + (mem:V16QI (match_dup 1)) + (match_operand:DI 2 "register_operand" "r")] UNSPEC_LXVL))] "TARGET_P9_VECTOR && TARGET_64BIT" - "sldi %2,%2, 56\; lxvl %x0,%1,%2" - [(set_attr "length" "8") - (set_attr "type" "vecload")]) + "lxvl %x0,%1,%2" + [(set_attr "type" "vecload")]) (define_insn "lxvll" [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa") (unspec:V16QI [(match_operand:DI 1 "gpc_reg_operand" "b") + (mem:V16QI (match_dup 1)) (match_operand:DI 2 "register_operand" "r")] UNSPEC_LXVLL))] "TARGET_P9_VECTOR" @@ -4677,6 +4680,7 @@ (define_expand "xl_len_r" (define_insn "stxvll" [(set (mem:V16QI (match_operand:DI 1 "gpc_reg_operand" "b")) (unspec:V16QI [(match_operand:V16QI 0 "vsx_register_operand" "wa") + (mem:V16QI (match_dup 1)) (match_operand:DI 2 "register_operand" "r")] UNSPEC_STXVLL))] "TARGET_P9_VECTOR" @@ -4686,10 +4690,12 @@ 
(define_insn "stxvll" ;; Store VSX Vector with Length (define_expand "stxvl" [(set (match_dup 3) - (match_operand:DI 2 "register_operand")) + (ashift:DI (match_operand:DI 2 "register_operand") + (const_int 56))) (set (mem:V16QI (match_operand:DI 1 "gpc_reg_operand")) (unspec:V16QI [(match_operand:V16QI 0 "vsx_register_operand") + (mem:V16QI (match_dup 1)) (match_dup 3)] UNSPEC_STXVL))] "TARGET_P9_VECTOR && TARGET_64BIT" @@ -4701,12 +4707,12 @@ (define_insn "*stxvl" [(set (mem:V16QI (match_operand:DI 1 "gpc_reg_operand" "b")) (unspec:V16QI [(match_operand:V16QI 0 "vsx_register_operand" "wa") - (match_
Re: [PATCH] i386: More use reference of struct ix86_frame to avoid copy
On Tue, Jan 16, 2018 at 8:09 AM, H.J. Lu wrote:
> On Tue, Jan 16, 2018 at 7:03 AM, Martin Liška wrote:
>> On 01/16/2018 01:35 PM, H.J. Lu wrote:
>>> On Tue, Jan 16, 2018 at 3:40 AM, H.J. Lu wrote:
>>>> This patch has been used with my Spectre backport for GCC 7 for many
>>>> weeks and has been checked into GCC 7 branch. Should I revert it on
>>>> GCC 7 branch or check it into trunk?
>>>
>>> Ada build failed with this on trunk:
>>>
>>> raised STORAGE_ERROR : stack overflow or erroneous memory access
>>> make[5]: ***
>>> [/export/gnu/import/git/sources/gcc/gcc/ada/Make-generated.in:45:
>>> ada/sinfo.h] Error 1
>>
>> Hello.
>>
>> I know that you've already reverted the change, but it's possible to replace
>> struct ix86_frame &frame = cfun->machine->frame;
>>
>> with:
>> struct ix86_frame *frame = &cfun->machine->frame;
>>
>> And replace usages with point access operator (->). That would also avoid
>> copying.
>
> Won't it be equivalent to reference?
>
>> One another question. After you switched to references, isn't the behavior
>> of function
>> ix86_expand_epilogue as it also contains write to frame struct like:
>>
>>   /* Special care must be taken for the normal return case of a function
>>      using eh_return: the eax and edx registers are marked as saved, but
>>      not restored along this path.  Adjust the save location to match.  */
>>   if (crtl->calls_eh_return && style != 2)
>>     frame.reg_save_offset -= 2 * UNITS_PER_WORD;
>
> That could be the issue. I will double check it.
>

Reverting the ix86_expand_epilogue change fixes the Ada build. I opened:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83905

-- 
H.J.
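The difference discussed in this thread can be illustrated outside of GCC. The following is only a minimal sketch in plain C: struct frame_info, WORD_SIZE and both functions are hypothetical stand-ins for struct ix86_frame, UNITS_PER_WORD and the epilogue code, not the real i386 back end. It shows why an adjustment that was harmless on a local copy becomes a write to the shared frame once the copy is replaced by a pointer (or, in C++, a reference).

```c
/* Hypothetical stand-in for struct ix86_frame; only one field is modelled.  */
struct frame_info
{
  long reg_save_offset;
};

/* Assumed word size, standing in for UNITS_PER_WORD.  */
#define WORD_SIZE 8

/* Works on a private copy: the adjustment is visible only locally and
   the shared frame is left untouched.  */
long
adjust_via_copy (const struct frame_info *shared)
{
  struct frame_info frame = *shared;
  frame.reg_save_offset -= 2 * WORD_SIZE;
  return frame.reg_save_offset;
}

/* Works through an alias: the same adjustment now modifies the shared
   frame, so every later user sees the changed offset.  */
long
adjust_via_pointer (struct frame_info *shared)
{
  shared->reg_save_offset -= 2 * WORD_SIZE;
  return shared->reg_save_offset;
}
```

Both functions return the same adjusted value, but only the second one changes the caller's object. Calling it repeatedly keeps shifting the stored offset, which is the kind of side effect the quoted eh_return adjustment would have once it operates on the shared frame.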
Re: VIEW_CONVERT_EXPR slots for strict-align targets (PR 83884)
> I'd assumed that variable-length types couldn't occur here, since it
> seems strange to view-convert a variable-length type to a fixed-length
> one.

This happens all the time in Ada when you convert an unconstrained type into one of its constrained subtypes (but the run-time sizes must match).

> But is this VIEW_CONVERT_EXPR really valid? Maybe this is just
> papering over a deeper issue. There again, the MAX in the old
> code was presumably there because the sizes can be different...

The problem is that Ada exposes VIEW_CONVERT_EXPR to the user and the user can do very weird things with it so you need to be prepared for the worst.

> 2018-01-16 Richard Sandiford
>
> gcc/
> 	PR middle-end/83884
> 	* expr.c (expand_expr_real_1): Use the size of GET_MODE (op0)
> 	rather than the size of inner_type to determine the stack slot size
> 	when handling VIEW_CONVERT_EXPRs on strict-alignment targets.

This looks good to me, thanks for fixing the problem. Unexpectedly enough, I don't see the failures on SPARC (32-bit or 64-bit).

-- 
Eric Botcazou
[C++ Patch] PR 81054 ("[7/8 Regression] ICE with volatile variable in constexpr function") [Take 2]
Hi again,

thus I figured out what was badly wrong in my first try: I misread ensure_literal_type_for_constexpr_object and missed that it can return NULL_TREE without emitting a hard error. Thus my first try even caused miscompilations :( Anyway, when DECL_DECLARED_CONSTEXPR_P is true we are safe and indeed we want to clear it as a matter of error recovery. Then, in this safe case the only change in the patch below is returning early, thus avoiding any internal inconsistencies later and also the redundant / misleading diagnostic which I already mentioned.

Testing on x86_64-linux is still in progress - in libstdc++ - but I separately checked that all the regressions are gone.

Thanks!
Paolo.

/cp
2018-01-16 Paolo Carlini

	PR c++/81054
	* decl.c (cp_finish_decl): Early return when
	ensure_literal_type_for_constexpr_object returns NULL_TREE and
	DECL_DECLARED_CONSTEXPR_P is true.

/testsuite
2018-01-16 Paolo Carlini

	PR c++/81054
	* g++.dg/cpp0x/constexpr-ice19.C: New.

Index: cp/decl.c
===
--- cp/decl.c (revision 256753)
+++ cp/decl.c (working copy)
@@ -6810,8 +6810,12 @@ cp_finish_decl (tree decl, tree init, bool init_co
       cp_apply_type_quals_to_decl (cp_type_quals (type), decl);
     }
 
-  if (!ensure_literal_type_for_constexpr_object (decl))
-    DECL_DECLARED_CONSTEXPR_P (decl) = 0;
+  if (!ensure_literal_type_for_constexpr_object (decl)
+      && DECL_DECLARED_CONSTEXPR_P (decl))
+    {
+      DECL_DECLARED_CONSTEXPR_P (decl) = 0;
+      return;
+    }
 
   if (VAR_P (decl)
       && DECL_CLASS_SCOPE_P (decl)
Index: testsuite/g++.dg/cpp0x/constexpr-ice19.C
===
--- testsuite/g++.dg/cpp0x/constexpr-ice19.C (nonexistent)
+++ testsuite/g++.dg/cpp0x/constexpr-ice19.C (working copy)
@@ -0,0 +1,13 @@
+// PR c++/81054
+// { dg-do compile { target c++11 } }
+
+struct A
+{
+  volatile int i;
+  constexpr A() : i() {}
+};
+
+struct B
+{
+  static constexpr A a {}; // { dg-error "not literal" }
+};
[PATCH] make -Wrestrict for strcat more meaningful (PR 83698)
PR 83698 - bogus offset in -Wrestrict messages for strcat of unknown strings, points out that the offsets printed by -Wrestrict for possibly overlapping strcat calls with unknown strings don't look meaningful in some cases. The root cause of the bogus values is wrapping during the conversion from offset_int, in which the pass tracks numerical values, to HOST_WIDE_INT for printing. (The problem will go away once GCC's pretty-printer supports wide int formatting.)

For instance, the following:

  extern char *d;
  strcat (d + 3, d + 5);

results in

  warning: ‘strcat’ accessing 0 or more bytes at offsets 3 and 5 may overlap 1 byte at offset [3, -9223372036854775806]

which, besides printing the bogus negative offset on LP64 targets, isn't correct because strcat always accesses at least one byte (the nul) and there can be no overlap at offset 3. To be more accurate, the warning should say something like:

  warning: ‘strcat’ accessing 3 or more bytes at offsets 3 and 5 may overlap 1 byte at offset 5 [-Wrestrict]

because the function must access at least 3 bytes in order to cause an overlap, and when it does, the overlap starts at the higher of the two offsets, i.e., 5. (Though it's virtually impossible to have a single sentence and a single set of numbers cover all the cases with perfect accuracy.)

The attached patch fixes these issues to make the printed values make more sense. (It doesn't affect when diagnostics are printed.)

Although this isn't strictly a regression, it has an impact on the readability of the warnings. If left unchanged, the original messages are likely to confuse users and lead to bug reports.

Martin

PR tree-optimization/83698 - bogus offset in -Wrestrict messages for strcat of unknown strings

gcc/ChangeLog:

	PR tree-optimization/83698
	* gimple-ssa-warn-restrict.c (builtin_memref::builtin_memref): For
	arrays constrain the offset range to their bounds.
	(builtin_access::strcat_overlap): Adjust the bounds of overlap offset.
(builtin_access::overlap): Avoid setting the size of overlap if it's already been set. (maybe_diag_overlap): Also consider arrays when deciding what values of offsets to include in diagnostics. gcc/testsuite/ChangeLog: PR tree-optimization/83698 * gcc.dg/Wrestrict-7.c: New test. * c-c++-common/Wrestrict.c: Adjust expected values for strcat. * gcc.target/i386/chkp-stropt-17.c: Same. Index: gcc/gimple-ssa-warn-restrict.c === --- gcc/gimple-ssa-warn-restrict.c (revision 256752) +++ gcc/gimple-ssa-warn-restrict.c (working copy) @@ -384,6 +384,12 @@ builtin_memref::builtin_memref (tree expr, tree si base = SSA_NAME_VAR (base); } + if (DECL_P (base) && TREE_CODE (TREE_TYPE (base)) == ARRAY_TYPE) +{ + if (offrange[0] < 0 && offrange[1] > 0) + offrange[0] = 0; +} + if (size) { tree range[2]; @@ -1079,14 +1085,35 @@ builtin_access::strcat_overlap () return false; /* When strcat overlap is certain it is always a single byte: - the terminatinn NUL, regardless of offsets and sizes. When + the terminating NUL, regardless of offsets and sizes. When overlap is only possible its range is [0, 1]. */ acs.ovlsiz[0] = dstref->sizrange[0] == dstref->sizrange[1] ? 
1 : 0; acs.ovlsiz[1] = 1; - acs.ovloff[0] = (dstref->sizrange[0] + dstref->offrange[0]).to_shwi (); - acs.ovloff[1] = (dstref->sizrange[1] + dstref->offrange[1]).to_shwi (); - acs.sizrange[0] = wi::smax (acs.dstsiz[0], srcref->sizrange[0]).to_shwi (); + offset_int endoff = dstref->offrange[0] + dstref->sizrange[0]; + if (endoff <= srcref->offrange[0]) +acs.ovloff[0] = wi::smin (maxobjsize, srcref->offrange[0]).to_shwi (); + else +acs.ovloff[0] = wi::smin (maxobjsize, endoff).to_shwi (); + + acs.sizrange[0] = wi::smax (wi::abs (endoff - srcref->offrange[0]) + 1, + srcref->sizrange[0]).to_shwi (); + if (dstref->offrange[0] == dstref->offrange[1]) +{ + if (srcref->offrange[0] == srcref->offrange[1]) + acs.ovloff[1] = acs.ovloff[0]; + else + acs.ovloff[1] + = wi::smin (maxobjsize, + srcref->offrange[1] + srcref->sizrange[1]).to_shwi (); +} + else +acs.ovloff[1] + = wi::smin (maxobjsize, + dstref->offrange[1] + dstref->sizrange[1]).to_shwi (); + + if (acs.sizrange[0] == 0) +acs.sizrange[0] = 1; acs.sizrange[1] = wi::smax (acs.dstsiz[1], srcref->sizrange[1]).to_shwi (); return true; } @@ -1224,8 +1251,12 @@ builtin_access::overlap () /* Call the appropriate function to determine the overlap. */ if ((this->*detect_overlap) ()) { - sizrange[0] = wi::smax (acs.dstsiz[0], srcref->sizrange[0]).to_shwi (); - sizrange[1] = wi::smax (acs.dstsiz[1], srcref->sizrange[1]).to_shwi (); + if (!sizrange[1]) + { + /* Unless the access size range has already been set, do so here. */ + sizrange[0] = wi::smax (acs.dstsiz[0], srcref->sizrange[0]).to_shwi (); + sizrange[1] = wi::smax
[libstdc++] Fix 17_intro/names.cc on SPARC/Linux
The SPARC-V8 architecture contains a Y register so defines a structure with a 'y' field on Linux.

Tested on SPARC64/Linux, applied on the mainline and 7 branch as obvious.

2018-01-16 Eric Botcazou

	* testsuite/17_intro/names.cc: Undefine 'y' on SPARC/Linux.

-- 
Eric Botcazou

Index: testsuite/17_intro/names.cc
===
--- testsuite/17_intro/names.cc (revision 256562)
+++ testsuite/17_intro/names.cc (working copy)
@@ -112,4 +112,8 @@
 #undef r
 #endif
 
+#if defined (__linux__) && defined (__sparc__)
+#undef y
+#endif
+
 #include
Fix PR testsuite/77734 on SPARC
We need to enable delayed-branch scheduling to have sibling calls on SPARC.

Tested on SPARC64/Linux, applied on the mainline and 7 branch.

2018-01-16 Eric Botcazou

	PR testsuite/77734
	* gcc.dg/plugin/must-tail-call-1.c: Pass -fdelayed-branch on SPARC.

-- 
Eric Botcazou

Index: gcc.dg/plugin/must-tail-call-1.c
===
--- gcc.dg/plugin/must-tail-call-1.c (revision 256562)
+++ gcc.dg/plugin/must-tail-call-1.c (working copy)
@@ -1,3 +1,5 @@
+/* { dg-options "-fdelayed-branch" { target sparc*-*-* } } */
+
 extern void abort (void);
 
 int __attribute__((noinline,noclone))
[visium] Very minor tweak
Tested on visium-elf, applied on the mainline.

2018-01-16 Eric Botcazou

	* config/visium/visium.md (nop): Tweak comment.
	(hazard_nop): Likewise.

-- 
Eric Botcazou

Index: config/visium/visium.md
===
--- config/visium/visium.md (revision 256562)
+++ config/visium/visium.md (working copy)
@@ -2962,13 +2962,13 @@ (define_insn "dsi"
 (define_insn "nop"
   [(const_int 0)]
   ""
-  "nop ;generated nop"
+  "nop ;generated"
   [(set_attr "type" "nop")])
 
 (define_insn "hazard_nop"
   [(unspec_volatile [(const_int 0)] UNSPEC_NOP)]
   ""
-  "nop ;hazard avoidance nop"
+  "nop ;hazard avoidance"
   [(set_attr "type" "nop")])
 
 (define_insn "blockage"
[testsuite] Skip loop tests on Visium
They either use too much space in the data segment or on the stack. Tested on visium-elf, applied on the mainline. 2018-01-16 Eric Botcazou * gcc.dg/tree-ssa/ldist-27.c: Skip on Visium. * gcc.dg/tree-ssa/loop-interchange-1.c: Likewise. * gcc.dg/tree-ssa/loop-interchange-1b.c: Likewise. * gcc.dg/tree-ssa/loop-interchange-2.c: Likewise. * gcc.dg/tree-ssa/loop-interchange-3.c: Likewise. * gcc.dg/tree-ssa/loop-interchange-4.c: Likewise. * gcc.dg/tree-ssa/loop-interchange-5.c: Likewise. * gcc.dg/tree-ssa/loop-interchange-6.c: Likewise. * gcc.dg/tree-ssa/loop-interchange-7.c: Likewise. * gcc.dg/tree-ssa/loop-interchange-8.c: Likewise. * gcc.dg/tree-ssa/loop-interchange-9.c: Likewise. * gcc.dg/tree-ssa/loop-interchange-10.c: Likewise. * gcc.dg/tree-ssa/loop-interchange-11.c: Likewise. * gcc.dg/tree-ssa/loop-interchange-14.c: Likewise. * gcc.dg/tree-ssa/loop-interchange-15.c: Likewise. -- Eric BotcazouIndex: gcc.dg/tree-ssa/ldist-27.c === --- gcc.dg/tree-ssa/ldist-27.c (revision 256562) +++ gcc.dg/tree-ssa/ldist-27.c (working copy) @@ -1,5 +1,6 @@ /* { dg-do run } */ /* { dg-options "-O3 -ftree-loop-distribute-patterns -fdump-tree-ldist-details" } */ +/* { dg-skip-if "too big data segment" { visium-*-* } } */ #define M (300) #define N (200) Index: gcc.dg/tree-ssa/loop-interchange-1.c === --- gcc.dg/tree-ssa/loop-interchange-1.c (revision 256562) +++ gcc.dg/tree-ssa/loop-interchange-1.c (working copy) @@ -1,5 +1,6 @@ /* { dg-do run } */ /* { dg-options "-O2 -floop-interchange -fassociative-math -fno-signed-zeros -fno-trapping-math -fdump-tree-linterchange-details" } */ +/* { dg-skip-if "too big data segment" { visium-*-* } } */ /* Copied from graphite/interchange-4.c */ Index: gcc.dg/tree-ssa/loop-interchange-10.c === --- gcc.dg/tree-ssa/loop-interchange-10.c (revision 256562) +++ gcc.dg/tree-ssa/loop-interchange-10.c (working copy) @@ -1,5 +1,6 @@ /* { dg-do run } */ /* { dg-options "-O2 -floop-interchange -fdump-tree-linterchange-details" } */ +/* { dg-skip-if "too big 
data segment" { visium-*-* } } */ #define M 256 int a[M][M], b[M][M]; Index: gcc.dg/tree-ssa/loop-interchange-11.c === --- gcc.dg/tree-ssa/loop-interchange-11.c (revision 256562) +++ gcc.dg/tree-ssa/loop-interchange-11.c (working copy) @@ -1,5 +1,6 @@ /* { dg-do compile } */ /* { dg-options "-O2 -floop-interchange -fdump-tree-linterchange-details" } */ +/* { dg-skip-if "too big data segment" { visium-*-* } } */ #define M 256 int a[M][M], b[M][M]; Index: gcc.dg/tree-ssa/loop-interchange-14.c === --- gcc.dg/tree-ssa/loop-interchange-14.c (revision 256562) +++ gcc.dg/tree-ssa/loop-interchange-14.c (working copy) @@ -1,6 +1,7 @@ /* PR tree-optimization/83337 */ /* { dg-do run { target int32plus } } */ /* { dg-options "-O2 -floop-interchange -fdump-tree-linterchange-details" } */ +/* { dg-skip-if "too big data segment" { visium-*-* } } */ /* Copied from graphite/interchange-5.c */ Index: gcc.dg/tree-ssa/loop-interchange-15.c === --- gcc.dg/tree-ssa/loop-interchange-15.c (revision 256562) +++ gcc.dg/tree-ssa/loop-interchange-15.c (working copy) @@ -2,6 +2,7 @@ /* { dg-do run { target int32plus } } */ /* { dg-options "-O2 -floop-interchange" } */ /* { dg-require-effective-target alloca } */ +/* { dg-skip-if "too big stack" { visium-*-* } } */ /* Copied from graphite/interchange-5.c */ Index: gcc.dg/tree-ssa/loop-interchange-1b.c === --- gcc.dg/tree-ssa/loop-interchange-1b.c (revision 256562) +++ gcc.dg/tree-ssa/loop-interchange-1b.c (working copy) @@ -1,5 +1,6 @@ /* { dg-do run } */ /* { dg-options "-O2 -floop-interchange -fdump-tree-linterchange-details" } */ +/* { dg-skip-if "too big data segment" { visium-*-* } } */ /* Copied from graphite/interchange-4.c */ Index: gcc.dg/tree-ssa/loop-interchange-2.c === --- gcc.dg/tree-ssa/loop-interchange-2.c (revision 256562) +++ gcc.dg/tree-ssa/loop-interchange-2.c (working copy) @@ -1,5 +1,6 @@ /* { dg-do run } */ /* { dg-options "-O2 -floop-interchange -fdump-tree-linterchange-details" } */ +/* { dg-skip-if "too big data 
segment" { visium-*-* } } */ /* Copied from graphite/interchange-5.c */ Index: gcc.dg/tree-ssa/loop-interchange-3.c === --- gcc.dg/tree-ssa/loop-interchange-3.c (revision 256562) +++ gcc.dg/tree-ssa/loop-interchange-3.c (working copy) @@ -1,5 +1,6 @
[testsuite] Tweak patchable function tests
On Visium, the compiler sometimes emits a NOP to avoid a pipeline hazard. Tested on visium-elf and x86_64-suse-linux, applied on the mainline. 2018-01-16 Eric Botcazou * c-c++-common/patchable_function_entry-decl.c: Use 3 NOPs on Visium. * c-c++-common/patchable_function_entry-default.c: Use 4 NOPs on Visium. * c-c++-common/patchable_function_entry-definition.c: Use 2 NOPs on Visium. -- Eric Botcazou Index: c-c++-common/patchable_function_entry-decl.c === --- c-c++-common/patchable_function_entry-decl.c (revision 256562) +++ c-c++-common/patchable_function_entry-decl.c (working copy) @@ -1,7 +1,8 @@ /* { dg-do compile { target { ! nvptx*-*-* } } } */ /* { dg-options "-O2 -fpatchable-function-entry=3,1" } */ -/* { dg-final { scan-assembler-times "nop" 2 { target { ! alpha*-*-* } } } } */ +/* { dg-final { scan-assembler-times "nop" 2 { target { ! { alpha*-*-* visium-*-* } } } } } */ /* { dg-final { scan-assembler-times "bis" 2 { target alpha*-*-* } } } */ +/* { dg-final { scan-assembler-times "nop" 3 { target visium-*-* } } } */ extern int a; Index: c-c++-common/patchable_function_entry-default.c === --- c-c++-common/patchable_function_entry-default.c (revision 256562) +++ c-c++-common/patchable_function_entry-default.c (working copy) @@ -1,7 +1,8 @@ /* { dg-do compile { target { ! nvptx*-*-* } } } */ /* { dg-options "-O2 -fpatchable-function-entry=3,1" } */ -/* { dg-final { scan-assembler-times "nop" 3 { target { ! alpha*-*-* } } } } */ +/* { dg-final { scan-assembler-times "nop" 3 { target { ! { alpha*-*-* visium-*-* } } } } } */ /* { dg-final { scan-assembler-times "bis" 3 { target alpha*-*-* } } } */ +/* { dg-final { scan-assembler-times "nop" 4 { target visium-*-* } } } */ extern int a; Index: c-c++-common/patchable_function_entry-definition.c === --- c-c++-common/patchable_function_entry-definition.c (revision 256562) +++ c-c++-common/patchable_function_entry-definition.c (working copy) @@ -1,7 +1,8 @@ /* { dg-do compile { target { ! 
nvptx*-*-* } } } */ /* { dg-options "-O2 -fpatchable-function-entry=3,1" } */ -/* { dg-final { scan-assembler-times "nop" 1 { target { ! alpha*-*-* } } } } */ +/* { dg-final { scan-assembler-times "nop" 1 { target { ! { alpha*-*-* visium-*-* } } } } } */ /* { dg-final { scan-assembler-times "bis" 1 { target alpha*-*-* } } } */ +/* { dg-final { scan-assembler-times "nop" 2 { target visium-*-* } } } */ extern int a;
Re: [PATCH] avoid assuming known string length is constant (PR 83896)
On Tue, Jan 16, 2018 at 07:37:30PM +, Richard Sandiford wrote:
> > -/* Check if RHS is string_cst possibly wrapped by mem_ref. */
> > +/* If RHS, either directly or indirectly, refers to a string of constant
> > +   length, return it. Otherwise return a negative value. */
> > +
> >  static int
> >  get_string_len (tree rhs)
> >  {
>
> I think this should be returning HOST_WIDE_INT given the unconstrained
> tree_to_shwi return. Same type change for rhslen in the caller.
>
> (Not my call, but it might be better to have a more specific function name,
> given that the file already had "get_string_length" before this function
> was added.)

Yeah, certainly for both.

> > @@ -2789,7 +2791,8 @@ get_string_len (tree rhs)
> >        if (idx > 0)
> >          {
> >            strinfo *si = get_strinfo (idx);
> > -          if (si && si->full_string_p)
> > +          if (si && si->full_string_p
> > +              && TREE_CODE (si->nonzero_chars) == INTEGER_CST)
> >              return tree_to_shwi (si->nonzero_chars);
>
> tree_fits_shwi_p?

Surely that instead of TREE_CODE check, but even that will not make sure it fits into host int, so yes, it should be HOST_WIDE_INT and the code should make sure it is also >= 0.

	Jakub
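The checks asked for in this review (the value is a known constant, it fits the host type, and it is not negative) can be sketched in isolation. The code below is only an illustration in plain C and does not use GCC's tree API: struct tracked_len and constant_length are hypothetical names, and the 64-bit unsigned field merely models a length that might be unknown or too large for a signed host integer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a tracked string length: it may be unknown
   (not a compile-time constant) or an unsigned 64-bit constant.  */
struct tracked_len
{
  bool is_constant;
  uint64_t value;
};

/* Store the length in *RESULT and return true only if LEN is a known
   constant that fits in a non-negative int64_t.  Otherwise return
   false and leave *RESULT untouched.  */
bool
constant_length (const struct tracked_len *len, int64_t *result)
{
  if (len == NULL || !len->is_constant)
    return false;

  /* Values above INT64_MAX would turn negative when narrowed.  */
  if (len->value > (uint64_t) INT64_MAX)
    return false;

  *result = (int64_t) len->value;
  return true;
}
```

Returning the status and passing the result through an out-parameter is one possible design; it makes it hard for a caller to use the value without having looked at the outcome of the checks.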
Re: [PATCH] make -Wrestrict for strcat more meaningful (PR 83698)
On Tue, Jan 16, 2018 at 01:36:26PM -0700, Martin Sebor wrote:
> --- gcc/gimple-ssa-warn-restrict.c (revision 256752)
> +++ gcc/gimple-ssa-warn-restrict.c (working copy)
> @@ -384,6 +384,12 @@ builtin_memref::builtin_memref (tree expr, tree si
>        base = SSA_NAME_VAR (base);
>      }
>
> +  if (DECL_P (base) && TREE_CODE (TREE_TYPE (base)) == ARRAY_TYPE)
> +    {
> +      if (offrange[0] < 0 && offrange[1] > 0)
> +        offrange[0] = 0;
> +    }

Why the 2 nested ifs?

> @@ -1079,14 +1085,35 @@ builtin_access::strcat_overlap ()
>      return false;
>
>    /* When strcat overlap is certain it is always a single byte:
> -     the terminatinn NUL, regardless of offsets and sizes. When
> +     the terminating NUL, regardless of offsets and sizes. When
>       overlap is only possible its range is [0, 1]. */
>    acs.ovlsiz[0] = dstref->sizrange[0] == dstref->sizrange[1] ? 1 : 0;
>    acs.ovlsiz[1] = 1;
> -  acs.ovloff[0] = (dstref->sizrange[0] + dstref->offrange[0]).to_shwi ();
> -  acs.ovloff[1] = (dstref->sizrange[1] + dstref->offrange[1]).to_shwi ();

You use to_shwi many times in the patch; do the callers or something earlier in this function guarantee that you aren't throwing away any bits? (Unlike tree_to_shwi, the to_shwi method doesn't ICE, it just throws away upper bits.) Especially when you perform additions like here, even if both wide_ints fit into a shwi, the result might not.

	Jakub
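The concern about dropped upper bits can be reproduced with ordinary 64-bit arithmetic. The sketch below is plain C, not GCC's wide-int API; narrow_truncating and end_offset_clamped are hypothetical helpers. It assumes a two's-complement target where converting an out-of-range unsigned value to int64_t wraps (which GCC documents for its own targets), and it assumes non-negative operands. The bogus upper bound printed in the warning from the original submission is consistent with adding offset 3 to a maximum size of PTRDIFF_MAX and keeping only the low 64 bits; that the pass arrives at exactly these operands is an assumption here.

```c
#include <stdint.h>

/* Keep only the low 64 bits of WIDE_BITS and reinterpret them as a
   signed value, the way a truncating narrowing would.  The result of
   this conversion is implementation-defined before C23; it wraps on
   the usual two's-complement implementations.  */
int64_t
narrow_truncating (uint64_t wide_bits)
{
  return (int64_t) wide_bits;
}

/* Add OFFSET and SIZE (both assumed non-negative) without signed
   overflow and cap the sum at MAX_OBJ_SIZE, similar in spirit to the
   wi::smin (maxobjsize, ...) calls in the patch.  */
int64_t
end_offset_clamped (int64_t offset, int64_t size, int64_t max_obj_size)
{
  uint64_t sum = (uint64_t) offset + (uint64_t) size;
  return sum > (uint64_t) max_obj_size ? max_obj_size : (int64_t) sum;
}
```

With these helpers, narrow_truncating ((uint64_t) INT64_MAX + 3) yields -9223372036854775806 on such targets, whereas end_offset_clamped (3, INT64_MAX, INT64_MAX) stays at INT64_MAX.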
Re: [C++ Patch] PR 81054 ("[7/8 Regression] ICE with volatile variable in constexpr function") [Take 2]
On Tue, Jan 16, 2018 at 3:32 PM, Paolo Carlini wrote:
> thus I figured out what was badly wrong in my first try: I misread
> ensure_literal_type_for_constexpr_object and missed that it can return
> NULL_TREE without emitting an hard error. Thus my first try even caused
> miscompilations :( Anyway, when DECL_DECLARED_CONSTEXPR_P is true we are
> safe and indeed we want to clear it as matter of error recovery. Then, in
> this safe case the only change in the below is returning early, thus
> avoiding any internal inconsistencies later and also the redundant /
> misleading diagnostic which I already mentioned.

I can't see how this could be right. In the cases where we don't give an error (e.g. because we're dealing with an instantiation of a variable template) there is no error, so we need to proceed with the rest of cp_finish_decl as normal.

Jason
Re: [C++ Patch] PR 81054 ("[7/8 Regression] ICE with volatile variable in constexpr function") [Take 2]
Hi Jason

On 16/01/2018 22:35, Jason Merrill wrote:
> On Tue, Jan 16, 2018 at 3:32 PM, Paolo Carlini wrote:
>> thus I figured out what was badly wrong in my first try: I misread
>> ensure_literal_type_for_constexpr_object and missed that it can return
>> NULL_TREE without emitting an hard error. Thus my first try even caused
>> miscompilations :( Anyway, when DECL_DECLARED_CONSTEXPR_P is true we are
>> safe and indeed we want to clear it as matter of error recovery. Then, in
>> this safe case the only change in the below is returning early, thus
>> avoiding any internal inconsistencies later and also the redundant /
>> misleading diagnostic which I already mentioned.
>
> I can't see how this could be right. In the cases where we don't give
> an error (e.g. because we're dealing with an instantiation of a
> variable template) there is no error, so we need to proceed with the
> rest of cp_finish_decl as normal.

The cases where we don't give an error all fall under DECL_DECLARED_CONSTEXPR_P == false, thus aren't affected at all. Unless I'm again misreading ensure_literal_type_for_constexpr_object, I hope not.

Paolo.
One more patch for PR80481
The patch changes the test to exclude solaris for

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80481

The first patch solved the problem for solaris too but solaris gcc still generates vmovaps in some different part of the code (unrelated to the problem) where linux gcc does not.

Committed as rev. 256761.

Index: testsuite/ChangeLog
===
--- testsuite/ChangeLog (revision 256760)
+++ testsuite/ChangeLog (working copy)
@@ -1,3 +1,8 @@
+2018-01-16 Vladimir Makarov
+
+	PR rtl-optimization/80481
+	* g++.dg/pr80481.C: Exclude solaris.
+
 2018-01-16 Eric Botcazou
 
 	* c-c++-common/patchable_function_entry-decl.c: Use 3 NOPs on Visium.
Index: testsuite/g++.dg/pr80481.C
===
--- testsuite/g++.dg/pr80481.C (revision 256760)
+++ testsuite/g++.dg/pr80481.C (working copy)
@@ -1,4 +1,4 @@
-// { dg-do compile { target i?86-*-* x86_64-*-* } }
+// { dg-do compile { target { i?86-*-* x86_64-*-* } && { ! *-*-solaris* } } }
 // { dg-options "-Ofast -funroll-loops -fopenmp -march=knl" }
 // { dg-final { scan-assembler-not "vmovaps" } }
Re: Compilation warning in simple-object-xcoff.c
On Jan 16 2018, DJ Delorie wrote:

> And it's not the host's bit size that counts; there are usually ways to
> get 64-bit file operations on 32-bit hosts.

If ACX_LARGEFILE doesn't succeed in enabling those 64-bit file operations (thus making off_t a 64-bit type) then you are out of luck (or AC_SYS_LARGEFILE doesn't support your host yet).

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Re: [PATCH] avoid assuming known string length is constant (PR 83896)
On 01/16/2018 12:37 PM, Richard Sandiford wrote:
> Martin Sebor writes:
>> Recent improvements to the strlen pass introduced the assumption
>> that when the length of a string has been recorded by the pass
>> the length is necessarily constant. Bug 83896 shows that this
>> assumption is not always true, and that GCC fails with an ICE
>> when it doesn't hold. To avoid the ICE the attached patch
>> removes the assumption.
>>
>> x86_64-linux bootstrap successful, regression test in progress.
>>
>> Martin
>>
>> PR tree-optimization/83896 - ice in get_string_len on a call to strlen with non-constant length
>>
>> gcc/ChangeLog:
>>
>> 	PR tree-optimization/83896
>> 	* tree-ssa-strlen.c (get_string_len): Avoid assuming length is constant.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	PR tree-optimization/83896
>> 	* gcc.dg/strlenopt-43.c: New test.
>>
>> Index: gcc/tree-ssa-strlen.c
>> ===
>> --- gcc/tree-ssa-strlen.c (revision 256752)
>> +++ gcc/tree-ssa-strlen.c (working copy)
>> @@ -2772,7 +2772,9 @@ handle_pointer_plus (gimple_stmt_iterator *gsi)
>>      }
>>  }
>>
>> -/* Check if RHS is string_cst possibly wrapped by mem_ref. */
>> +/* If RHS, either directly or indirectly, refers to a string of constant
>> +   length, return it. Otherwise return a negative value. */
>> +
>>  static int
>>  get_string_len (tree rhs)
>>  {
>
> I think this should be returning HOST_WIDE_INT given the unconstrained
> tree_to_shwi return. Same type change for rhslen in the caller.

Thanks for looking at it! I confess it's not completely clear to me in what type the pass tracks string lengths. For string constants, get_stridx() returns an int with their length bit-flipped. I tried to maintain that invariant in the change I introduced in the block toward the end of the function (in a different patch). But then in other places the pass works with HOST_WIDE_INT, so it looks like it would be appropriate to use here as well.

I tried to come up with a test case that would exercise this conversion but couldn't. If you (or someone else) have an idea for one I'd be more than happy to add it to the test suite.
(Not my call, but it might be better to have a more specific function name, given that the file already had "get_string_length" before this function was added.) I renamed it (again), this time to get_string_cst_length(). Nothing better came to mind. @@ -2789,7 +2791,8 @@ get_string_len (tree rhs) if (idx > 0) { strinfo *si = get_strinfo (idx); - if (si && si->full_string_p) + if (si && si->full_string_p + && TREE_CODE (si->nonzero_chars) == INTEGER_CST) return tree_to_shwi (si->nonzero_chars); tree_fits_shwi_p? Sigh. Yes. I still keep forgetting about all these gotchas. Dealing with integers is so painfully error-prone in GCC (as evident from all the bug reports with ICEs for these things). It would be much simpler and safer if tree_to_shwi() returned true on success and false for error (e.g., null, non-const, or overflow) and took an extra argument for the result. Then the code would become: HOST_WIDE_INT result; if (si && tree_to_shwi (&result, si->nonzero_chars)) return result; and it would be nearly impossible to forget to check for bad input. Anyway, attached is an updated patch. Martin Thanks, Richard } } Index: gcc/testsuite/gcc.dg/strlenopt-43.c === --- gcc/testsuite/gcc.dg/strlenopt-43.c (nonexistent) +++ gcc/testsuite/gcc.dg/strlenopt-43.c (working copy) @@ -0,0 +1,13 @@ +/* PR tree-optimization/83896 - ice in get_string_len on a call to strlen + with non-constant length + { dg-do compile } + { dg-options "-O2 -Wall" } */ + +extern char a[5]; +extern char b[]; + +void f (void) +{ + if (__builtin_strlen (b) != 4) +__builtin_memcpy (a, b, sizeof a); +} PR tree-optimization/83896 - ice in get_string_len on a call to strlen with non-constant length gcc/ChangeLog: PR tree-optimization/83896 * tree-ssa-strlen.c (get_string_len): Rename... (get_string_cst_length): ...to this. Return HOST_WIDE_INT. Avoid assuming length is constant. (handle_char_store): Use HOST_WIDE_INT for string length. 
gcc/testsuite/ChangeLog: PR tree-optimization/83896 * gcc.dg/strlenopt-43.c: New test. Index: gcc/tree-ssa-strlen.c === --- gcc/tree-ssa-strlen.c (revision 256752) +++ gcc/tree-ssa-strlen.c (working copy) @@ -2772,16 +2772,20 @@ handle_pointer_plus (gimple_stmt_iterator *gsi) } } -/* Check if RHS is string_cst possibly wrapped by mem_ref. */ -static int -get_string_len (tree rhs) +/* If RHS, either directly or indirectly, refers to a string of constant + length, return it. Otherwise return a negative value. */ + +static HOST_WIDE_INT +get_string_cst_length (tree rhs) { if (TREE_CODE (rhs) == MEM
Re: [PATCH] avoid assuming known string length is constant (PR 83896)
On 01/16/2018 02:26 PM, Jakub Jelinek wrote:
> On Tue, Jan 16, 2018 at 07:37:30PM +, Richard Sandiford wrote:
>>> -/* Check if RHS is string_cst possibly wrapped by mem_ref. */
>>> +/* If RHS, either directly or indirectly, refers to a string of constant
>>> +   length, return it. Otherwise return a negative value. */
>>> +
>>>  static int
>>>  get_string_len (tree rhs)
>>>  {
>>
>> I think this should be returning HOST_WIDE_INT given the unconstrained
>> tree_to_shwi return. Same type change for rhslen in the caller.
>>
>> (Not my call, but it might be better to have a more specific function name,
>> given that the file already had "get_string_length" before this function
>> was added.)
>
> Yeah, certainly for both.
>
>>> @@ -2789,7 +2791,8 @@ get_string_len (tree rhs)
>>>        if (idx > 0)
>>>          {
>>>            strinfo *si = get_strinfo (idx);
>>> -          if (si && si->full_string_p)
>>> +          if (si && si->full_string_p
>>> +              && TREE_CODE (si->nonzero_chars) == INTEGER_CST)
>>>              return tree_to_shwi (si->nonzero_chars);
>>
>> tree_fits_shwi_p?
>
> Surely that instead of TREE_CODE check, but even that will not make sure
> it fits into host int, so yes, it should be HOST_WIDE_INT and the code
> should make sure it is also >= 0.

I made these changes except for the last part: How/when can the length be negative?

Martin
Re: [PATCH, rs6000] (v2) Support for gimple folding of mergeh, mergel intrinsics
Hi! On Tue, Jan 16, 2018 at 01:39:28PM -0600, Will Schmidt wrote: > Sniff-tests of the target tests on a single system look OK. Full regtests are > currently running across assorted power systems. > OK for trunk, pending successful results? Just a few little things: > 2018-01-16 Will Schmidt > > * config/rs6000/rs6000.c: (rs6000_gimple_builtin) Add gimple folding > support for merge[hl]. The : goes after the ). > (define_insn "altivec_vmrghw_direct" > - [(set (match_operand:V4SI 0 "register_operand" "=v") > -(unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v") > - (match_operand:V4SI 2 "register_operand" "v")] > - UNSPEC_VMRGH_DIRECT))] > + [(set (match_operand:V4SI 0 "register_operand" "=v,wa") > + (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v,wa") > + (match_operand:V4SI 2 "register_operand" "v,wa")] > + UNSPEC_VMRGH_DIRECT))] >"TARGET_ALTIVEC" > - "vmrghw %0,%1,%2" > + "@ > + vmrghw %0,%1,%2 > + xxmrghw %x0,%x1,%x2" Those last two lines should be indented one more space, so that everything aligns (with the @). > + "@ > + vmrglw %0,%1,%2 > + xxmrglw %x0,%x1,%x2" Same here of course. > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-be-folded.c > @@ -0,0 +1,11 @@ > +/* { dg-do compile { target { powerpc-*-* } } } */ Do you want powerpc*-*-*? That is default in gcc.target/powerpc; dg-do compile is default, too, so you can either say /* { dg-do compile } */ or nothing at all, to taste. But it looks like you want to restrict to BE? We still don't have a dejagnu thingy for that; you could put some #ifdef around it all (there are some examples in other testcases). Not ideal, but works. > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-mergehl-float.c > @@ -0,0 +1,26 @@ > +/* Verify that overloaded built-ins for vec_splat with float > + inputs produce the right code. 
*/ > + > +/* { dg-do compile } */ > +/* { dg-require-effective-target powerpc_vsx_ok } */ > +/* { dg-options "-maltivec -O2" } */ Either powerpc_altivec_ok or -mvsx? > new file mode 100644 > index 000..ab5f54e > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-mergehl-longlong.c > @@ -0,0 +1,48 @@ > +/* Verify that overloaded built-ins for vec_merge* with long long > + inputs produce the right code. */ > + > +/* { dg-do compile } */ > +/* { dg-require-effective-target powerpc_p8vector_ok } */ > +/* { dg-options "-mvsx -O2" } */ Either powerpc_vsx_ok or -mpower8-vector? > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-mergehl-pixel.c > @@ -0,0 +1,24 @@ > +/* Verify that overloaded built-ins for vec_splat with pixel > + inputs produce the right code. */ > + > +/* { dg-do compile } */ > +/* { dg-require-effective-target powerpc_vsx_ok } */ > +/* { dg-options "-maltivec -mvsx -O2" } */ -mvsx implies -maltivec (not wrong of course, just a bit weird). Okay for trunk with those nits fixed. Thanks! Segher
Re: [PATCH] avoid assuming known string length is constant (PR 83896)
On Tue, Jan 16, 2018 at 03:20:24PM -0700, Martin Sebor wrote:
> Thanks for looking at it! I confess it's not completely clear
> to me in what type the pass tracks string lengths. For string
> constants, get_stridx() returns an int with the their length
> bit-flipped. I tried to maintain that invariant in the change

That is because TREE_STRING_LENGTH is an int, so gcc doesn't allow string literals longer than 2GB. All other lengths are tracked as trees.

	Jakub
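The "bit-flipped length" encoding mentioned in the quoted text can be sketched as follows. This is only an illustration in plain C based on the description in this thread; the function names are hypothetical and this is not the implementation of get_stridx. The idea is that a non-negative int length, once all its bits are inverted, is always negative, so it can share an int return value with ordinary positive indices.

```c
#include <stdbool.h>

/* Encode a non-negative constant string length as a negative index by
   inverting all bits: 0 becomes -1, 4 becomes -5, and so on.  */
int
encode_constant_length (int length)
{
  return ~length;
}

/* A negative index carries an encoded constant length.  */
bool
index_is_constant_length (int idx)
{
  return idx < 0;
}

/* Recover the length from an encoded index by inverting the bits again.  */
int
decode_constant_length (int idx)
{
  return ~idx;
}
```

Inverting the bits rather than negating the value keeps a length of 0 distinct from index 0, since it maps to -1. Because the length itself has to fit in an int, the encoding also fits the stated limit on string literal size.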