Re: [ipa-vrp] ice in set_value_range
Hi Andrew,

On 09/11/16 17:02, Andrew Pinski wrote:
> Either this patch or the patch for "Handle unary pass-through jump
> functions for ipa-vrp" caused a bootstrap failure on aarch64-linux-gnu.
>
> Bootstrap comparison failure!
> gcc/go/types.o differs
> gcc/fortran/class.o differs
> gcc/tree-ssa-live.o differs
> gcc/data-streamer-out.o differs
> gcc/ira-build.o differs
> gcc/hsa-gen.o differs
> gcc/hsa-brig.o differs
> gcc/omp-low.o differs
> gcc/lto-streamer-in.o differs
> gcc/real.o differs
> gcc/final.o differs
> gcc/df-core.o differs
>
> I bootstrap with the following options:
> --with-cpu=thunderx+lse --enable-languages=c,c++,fortran,go
> --disable-werror --with-sysroot=/ --enable-plugins
> --enable-gnu-indirect-function
>
> I have not tried removing the +lse part though

Sorry about the breakage.  I will try to reproduce this.

Thanks,
Kugan
Re: [PATCH] Fix PR78189
On Tue, 8 Nov 2016, Richard Biener wrote: > On Tue, 8 Nov 2016, Richard Biener wrote: > > > On Mon, 7 Nov 2016, Christophe Lyon wrote: > > > > > Hi Richard, > > > > > > > > > On 7 November 2016 at 09:01, Richard Biener wrote: > > > > > > > > The following fixes an oversight when computing alignment in the > > > > vectorizer. > > > > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. > > > > > > > > Richard. > > > > > > > > 2016-11-07 Richard Biener > > > > > > > > PR tree-optimization/78189 > > > > * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Fix > > > > alignment computation. > > > > > > > > * g++.dg/torture/pr78189.C: New testcase. > > > > > > > > Index: gcc/testsuite/g++.dg/torture/pr78189.C > > > > === > > > > --- gcc/testsuite/g++.dg/torture/pr78189.C (revision 0) > > > > +++ gcc/testsuite/g++.dg/torture/pr78189.C (working copy) > > > > @@ -0,0 +1,41 @@ > > > > +/* { dg-do run } */ > > > > +/* { dg-additional-options "-ftree-slp-vectorize -fno-vect-cost-model" > > > > } */ > > > > + > > > > +#include > > > > + > > > > +struct A > > > > +{ > > > > + void * a; > > > > + void * b; > > > > +}; > > > > + > > > > +struct alignas(16) B > > > > +{ > > > > + void * pad; > > > > + void * misaligned; > > > > + void * pad2; > > > > + > > > > + A a; > > > > + > > > > + void Null(); > > > > +}; > > > > + > > > > +void B::Null() > > > > +{ > > > > + a.a = nullptr; > > > > + a.b = nullptr; > > > > +} > > > > + > > > > +void __attribute__((noinline,noclone)) > > > > +NullB(void * misalignedPtr) > > > > +{ > > > > + B* b = reinterpret_cast(reinterpret_cast(misalignedPtr) > > > > - offsetof(B, misaligned)); > > > > + b->Null(); > > > > +} > > > > + > > > > +int main() > > > > +{ > > > > + B b; > > > > + NullB(&b.misaligned); > > > > + return 0; > > > > +} > > > > diff --git gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c > > > > index 9346cfe..b03cb1e 100644 > > > > --- gcc/tree-vect-data-refs.c > > > > +++ 
gcc/tree-vect-data-refs.c > > > > @@ -773,10 +773,25 @@ vect_compute_data_ref_alignment (struct > > > > data_reference *dr) > > > >base = ref; > > > >while (handled_component_p (base)) > > > > base = TREE_OPERAND (base, 0); > > > > + unsigned int base_alignment; > > > > + unsigned HOST_WIDE_INT base_bitpos; > > > > + get_object_alignment_1 (base, &base_alignment, &base_bitpos); > > > > + /* As data-ref analysis strips the MEM_REF down to its base operand > > > > + to form DR_BASE_ADDRESS and adds the offset to DR_INIT we have to > > > > + adjust things to make base_alignment valid as the alignment of > > > > + DR_BASE_ADDRESS. */ > > > >if (TREE_CODE (base) == MEM_REF) > > > > -base = build2 (MEM_REF, TREE_TYPE (base), base_addr, > > > > - build_int_cst (TREE_TYPE (TREE_OPERAND (base, 1)), > > > > 0)); > > > > - unsigned int base_alignment = get_object_alignment (base); > > > > +{ > > > > + base_bitpos -= mem_ref_offset (base).to_short_addr () * > > > > BITS_PER_UNIT; > > > > + base_bitpos &= (base_alignment - 1); > > > > +} > > > > + if (base_bitpos != 0) > > > > +base_alignment = base_bitpos & -base_bitpos; > > > > + /* Also look at the alignment of the base address DR analysis > > > > + computed. 
*/ > > > > + unsigned int base_addr_alignment = get_pointer_alignment (base_addr); > > > > + if (base_addr_alignment > base_alignment) > > > > +base_alignment = base_addr_alignment; > > > > > > > >if (base_alignment >= TYPE_ALIGN (TREE_TYPE (vectype))) > > > > DR_VECT_AUX (dr)->base_element_aligned = true; > > > > > > Since you committed this patch (r241892), I'm seeing execution failures: > > > gcc.dg/vect/pr40074.c -flto -ffat-lto-objects execution test > > > gcc.dg/vect/pr40074.c execution test > > > on armeb-none-linux-gnueabihf --with-mode=arm --with-cpu=cortex-a9 > > > --with-fpu=neon-fp16 > > > (using qemu as simulator) > > > > The difference is that we now vectorize the testcase with versioning > > for alignment (but it should never execute the vectorized variant). > > I need arm peoples help to understand what is wrong. > > > > At least the testcase shows there is (kind-of) a missed optimization > > that we no longer figure out versioning for alignment is useless. > > I'll look into that. > > I'm testing the following which restores previous behavior. You > might still want to figure out what went wrong. This is what I have applied. Bootstrapped / regtested on x86_64-unknown-linux-gnu. Richard. 2016-11-09 Richard Biener * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Look at the DR_BASE_ADDRESS object for forcing alignment. Index: gcc/tree-vect-data-refs.c === --- gcc/tree-vect-data-refs.c (revision 241990) +++ gcc/tree-vect-d
[PATCH] Fix PR78007
The following implements vectorization of bswap via VEC_PERM_EXPR on the
corresponding QImode vector.  ARM already has backend handling via the
builtin_vectorized_call hook and thus there were already testcases
available.  It doesn't end up working for vect-bswap16.c because we have
a promoted argument to __builtin_bswap16 which confuses vectorization.
Eventually the testcase should also succeed on vect_perm_byte targets
but I have no way to verify that.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-11-09  Richard Biener

	PR tree-optimization/78007
	* tree-vect-stmts.c (vectorizable_bswap): New function.
	(vectorizable_call): Call vectorizable_bswap for
	BUILT_IN_BSWAP{16,32,64} if arguments are not promoted.

	* gcc.dg/vect/vect-bswap32.c: Adjust.
	* gcc.dg/vect/vect-bswap64.c: Likewise.

Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	(revision 241959)
+++ gcc/tree-vect-stmts.c	(working copy)
@@ -2432,6 +2432,116 @@ vectorizable_mask_load_store (gimple *st
   return true;
 }
 
+/* Check and perform vectorization of BUILT_IN_BSWAP{16,32,64}.  */
+
+static bool
+vectorizable_bswap (gimple *stmt, gimple_stmt_iterator *gsi,
+		    gimple **vec_stmt, slp_tree slp_node,
+		    tree vectype_in, enum vect_def_type *dt)
+{
+  tree op, vectype;
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+  unsigned ncopies, nunits;
+
+  op = gimple_call_arg (stmt, 0);
+  vectype = STMT_VINFO_VECTYPE (stmt_info);
+  nunits = TYPE_VECTOR_SUBPARTS (vectype);
+
+  /* Multiple types in SLP are handled by creating the appropriate number of
+     vectorized stmts for each SLP node.  Hence, NCOPIES is always 1 in
+     case of SLP.  */
+  if (slp_node)
+    ncopies = 1;
+  else
+    ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
+
+  gcc_assert (ncopies >= 1);
+
+  tree char_vectype = get_same_sized_vectype (char_type_node, vectype_in);
+  if (! char_vectype)
+    return false;
+
+  unsigned char *elts
+    = XALLOCAVEC (unsigned char, TYPE_VECTOR_SUBPARTS (char_vectype));
+  unsigned char *elt = elts;
+  unsigned word_bytes = TYPE_VECTOR_SUBPARTS (char_vectype) / nunits;
+  for (unsigned i = 0; i < nunits; ++i)
+    for (unsigned j = 0; j < word_bytes; ++j)
+      *elt++ = (i + 1) * word_bytes - j - 1;
+
+  if (! can_vec_perm_p (TYPE_MODE (char_vectype), false, elts))
+    return false;
+
+  if (! vec_stmt)
+    {
+      STMT_VINFO_TYPE (stmt_info) = call_vec_info_type;
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location, "=== vectorizable_bswap ==="
+			 "\n");
+      if (! PURE_SLP_STMT (stmt_info))
+	{
+	  add_stmt_cost (stmt_info->vinfo->target_cost_data,
+			 1, vector_stmt, stmt_info, 0, vect_prologue);
+	  add_stmt_cost (stmt_info->vinfo->target_cost_data,
+			 ncopies, vec_perm, stmt_info, 0, vect_body);
+	}
+      return true;
+    }
+
+  tree *telts = XALLOCAVEC (tree, TYPE_VECTOR_SUBPARTS (char_vectype));
+  for (unsigned i = 0; i < TYPE_VECTOR_SUBPARTS (char_vectype); ++i)
+    telts[i] = build_int_cst (char_type_node, elts[i]);
+  tree bswap_vconst = build_vector (char_vectype, telts);
+
+  /* Transform.  */
+  vec<tree> vec_oprnds = vNULL;
+  gimple *new_stmt = NULL;
+  stmt_vec_info prev_stmt_info = NULL;
+  for (unsigned j = 0; j < ncopies; j++)
+    {
+      /* Handle uses.  */
+      if (j == 0)
+	vect_get_vec_defs (op, NULL, stmt, &vec_oprnds, NULL, slp_node, -1);
+      else
+	vect_get_vec_defs_for_stmt_copy (dt, &vec_oprnds, NULL);
+
+      /* Arguments are ready.  create the new vector stmt.  */
+      unsigned i;
+      tree vop;
+      FOR_EACH_VEC_ELT (vec_oprnds, i, vop)
+	{
+	  tree tem = make_ssa_name (char_vectype);
+	  new_stmt = gimple_build_assign (tem, build1 (VIEW_CONVERT_EXPR,
+						       char_vectype, vop));
+	  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+	  tree tem2 = make_ssa_name (char_vectype);
+	  new_stmt = gimple_build_assign (tem2, VEC_PERM_EXPR,
+					  tem, tem, bswap_vconst);
+	  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+	  tem = make_ssa_name (vectype);
+	  new_stmt = gimple_build_assign (tem, build1 (VIEW_CONVERT_EXPR,
+						       vectype, tem2));
+	  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+	  if (slp_node)
+	    SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt);
+	}
+
+      if (slp_node)
+	continue;
+
+      if (j == 0)
+	STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+      else
+	STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+
+      prev_stmt_in
Re: [PATCH] combine: Do not call simplify from inside change_zero_ext (PR78232)
> This can also happen if a splitter in the MD uses nonzero_bits (for
> example).  I tried to make the splitting code in combine save and
> restore the i2dest reg_stat info, but that causes one of the acats
> tests to fail.

Could you elaborate?  Did you try to temporarily invalidate it, e.g. by
means of last_set_label, or something more involved?

-- 
Eric Botcazou
Re: [match.pd] Fix for PR35691
On 8 November 2016 at 16:46, Richard Biener wrote: > On Tue, 8 Nov 2016, Prathamesh Kulkarni wrote: > >> On 8 November 2016 at 13:23, Richard Biener wrote: >> > On Mon, 7 Nov 2016, Prathamesh Kulkarni wrote: >> > >> >> On 7 November 2016 at 23:06, Prathamesh Kulkarni >> >> wrote: >> >> > On 7 November 2016 at 15:43, Richard Biener wrote: >> >> >> On Fri, 4 Nov 2016, Prathamesh Kulkarni wrote: >> >> >> >> >> >>> On 4 November 2016 at 13:41, Richard Biener wrote: >> >> >>> > On Thu, 3 Nov 2016, Marc Glisse wrote: >> >> >>> > >> >> >>> >> On Thu, 3 Nov 2016, Richard Biener wrote: >> >> >>> >> >> >> >>> >> > > > > The transform would also work for vectors >> >> >>> >> > > > > (element_precision for >> >> >>> >> > > > > the test but also a value-matching zero which should >> >> >>> >> > > > > ensure the >> >> >>> >> > > > > same number of elements). >> >> >>> >> > > > Um sorry, I didn't get how to check vectors to be of equal >> >> >>> >> > > > length by a >> >> >>> >> > > > matching zero. >> >> >>> >> > > > Could you please elaborate on that ? >> >> >>> >> > > >> >> >>> >> > > He may have meant something like: >> >> >>> >> > > >> >> >>> >> > > (op (cmp @0 integer_zerop@2) (cmp @1 @2)) >> >> >>> >> > >> >> >>> >> > I meant with one being @@2 to allow signed vs. Unsigned @0/@1 >> >> >>> >> > which was the >> >> >>> >> > point of the pattern. >> >> >>> >> >> >> >>> >> Oups, that's what I had written first, and then I somehow managed >> >> >>> >> to confuse >> >> >>> >> myself enough to remove it so as to remove the call to types_match >> >> >>> >> :-( >> >> >>> >> >> >> >>> >> > > So the last operand is checked with operand_equal_p instead of >> >> >>> >> > > integer_zerop. But the fact that we could compute bit_ior on >> >> >>> >> > > the >> >> >>> >> > > comparison results should already imply that the number of >> >> >>> >> > > elements is the >> >> >>> >> > > same. >> >> >>> >> > >> >> >>> >> > Though for equality compares we also allow scalar results IIRC. 
>> >> >>> >> >> >> >>> >> Oh, right, I keep forgetting that :-( And I have no idea how to >> >> >>> >> generate one >> >> >>> >> for a testcase, at least until the GIMPLE FE lands... >> >> >>> >> >> >> >>> >> > > On platforms that have IOR on floats (at least x86 with SSE, >> >> >>> >> > > maybe some >> >> >>> >> > > vector mode on s390?), it would be cool to do the same for >> >> >>> >> > > floats (most >> >> >>> >> > > likely at the RTL level). >> >> >>> >> > >> >> >>> >> > On GIMPLE view-converts could come to the rescue here as well. >> >> >>> >> > Or we cab >> >> >>> >> > just allow bit-and/or on floats as much as we allow them on >> >> >>> >> > pointers. >> >> >>> >> >> >> >>> >> Would that generate sensible code on targets that do not have >> >> >>> >> logic insns for >> >> >>> >> floats? Actually, even on x86_64 that generates inefficient code, >> >> >>> >> so there >> >> >>> >> would be some work (for instance grep finds no gen_iordf3, only >> >> >>> >> gen_iorv2df3). >> >> >>> >> >> >> >>> >> I am also a bit wary of doing those obfuscating optimizations too >> >> >>> >> early... >> >> >>> >> a==0 is something that other optimizations might use. long >> >> >>> >> c=(long&)a|(long&)b; (double&)c==0; less so... >> >> >>> >> >> >> >>> >> (and I am assuming that signaling NaNs don't make the whole >> >> >>> >> transformation >> >> >>> >> impossible, which might be wrong) >> >> >>> > >> >> >>> > Yeah. I also think it's not so much important - I just wanted to >> >> >>> > mention >> >> >>> > vectors... >> >> >>> > >> >> >>> > Btw, I still think we need a more sensible infrastructure for passes >> >> >>> > to gather, analyze and modify complex conditions. (I'm always >> >> >>> > pointing >> >> >>> > to tree-affine.c as an, albeit not very good, example for handling >> >> >>> > a similar problem) >> >> >>> Thanks for mentioning the value-matching capture @@, I wasn't aware of >> >> >>> this match.pd feature. 
>> >> >>> The current patch keeps it restricted to only bitwise operators on >> >> >>> integers. >> >> >>> Bootstrap+test running on x86_64-unknown-linux-gnu. >> >> >>> OK to commit if passes ? >> >> >> >> >> >> +/* PR35691: Transform >> >> >> + (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0. >> >> >> + (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0. */ >> >> >> + >> >> >> >> >> >> Please omit the vertical space >> >> >> >> >> >> +(for bitop (bit_and bit_ior) >> >> >> + cmp (eq ne) >> >> >> + (simplify >> >> >> + (bitop (cmp @0 integer_zerop) (cmp @1 integer_zerop)) >> >> >> >> >> >> if you capture the first integer_zerop as @2 then you can re-use it... >> >> >> >> >> >> + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) >> >> >> + && INTEGRAL_TYPE_P (TREE_TYPE (@1)) >> >> >> + && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE >> >> >> (@1))) >> >> >> +(cmp (bit_ior @0 (convert @1)) { build_zero_cst (TREE_TYPE (@0)); >> >> >> >> >> >> ... here inplace of the { build_zero_cst ... }. >> >> >> >>
Re: [PATCH] Fix PR78189
On Tue, Nov 8, 2016 at 9:11 AM, Richard Biener wrote: > On Mon, 7 Nov 2016, Christophe Lyon wrote: > >> Hi Richard, >> >> >> On 7 November 2016 at 09:01, Richard Biener wrote: >> > >> > The following fixes an oversight when computing alignment in the >> > vectorizer. >> > >> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. >> > >> > Richard. >> > >> > 2016-11-07 Richard Biener >> > >> > PR tree-optimization/78189 >> > * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Fix >> > alignment computation. >> > >> > * g++.dg/torture/pr78189.C: New testcase. >> > >> > Index: gcc/testsuite/g++.dg/torture/pr78189.C >> > === >> > --- gcc/testsuite/g++.dg/torture/pr78189.C (revision 0) >> > +++ gcc/testsuite/g++.dg/torture/pr78189.C (working copy) >> > @@ -0,0 +1,41 @@ >> > +/* { dg-do run } */ >> > +/* { dg-additional-options "-ftree-slp-vectorize -fno-vect-cost-model" } >> > */ >> > + >> > +#include >> > + >> > +struct A >> > +{ >> > + void * a; >> > + void * b; >> > +}; >> > + >> > +struct alignas(16) B >> > +{ >> > + void * pad; >> > + void * misaligned; >> > + void * pad2; >> > + >> > + A a; >> > + >> > + void Null(); >> > +}; >> > + >> > +void B::Null() >> > +{ >> > + a.a = nullptr; >> > + a.b = nullptr; >> > +} >> > + >> > +void __attribute__((noinline,noclone)) >> > +NullB(void * misalignedPtr) >> > +{ >> > + B* b = reinterpret_cast(reinterpret_cast(misalignedPtr) - >> > offsetof(B, misaligned)); >> > + b->Null(); >> > +} >> > + >> > +int main() >> > +{ >> > + B b; >> > + NullB(&b.misaligned); >> > + return 0; >> > +} >> > diff --git gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c >> > index 9346cfe..b03cb1e 100644 >> > --- gcc/tree-vect-data-refs.c >> > +++ gcc/tree-vect-data-refs.c >> > @@ -773,10 +773,25 @@ vect_compute_data_ref_alignment (struct >> > data_reference *dr) >> >base = ref; >> >while (handled_component_p (base)) >> > base = TREE_OPERAND (base, 0); >> > + unsigned int base_alignment; >> > + unsigned HOST_WIDE_INT 
base_bitpos; >> > + get_object_alignment_1 (base, &base_alignment, &base_bitpos); >> > + /* As data-ref analysis strips the MEM_REF down to its base operand >> > + to form DR_BASE_ADDRESS and adds the offset to DR_INIT we have to >> > + adjust things to make base_alignment valid as the alignment of >> > + DR_BASE_ADDRESS. */ >> >if (TREE_CODE (base) == MEM_REF) >> > -base = build2 (MEM_REF, TREE_TYPE (base), base_addr, >> > - build_int_cst (TREE_TYPE (TREE_OPERAND (base, 1)), 0)); >> > - unsigned int base_alignment = get_object_alignment (base); >> > +{ >> > + base_bitpos -= mem_ref_offset (base).to_short_addr () * >> > BITS_PER_UNIT; >> > + base_bitpos &= (base_alignment - 1); >> > +} >> > + if (base_bitpos != 0) >> > +base_alignment = base_bitpos & -base_bitpos; >> > + /* Also look at the alignment of the base address DR analysis >> > + computed. */ >> > + unsigned int base_addr_alignment = get_pointer_alignment (base_addr); >> > + if (base_addr_alignment > base_alignment) >> > +base_alignment = base_addr_alignment; >> > >> >if (base_alignment >= TYPE_ALIGN (TREE_TYPE (vectype))) >> > DR_VECT_AUX (dr)->base_element_aligned = true; >> >> Since you committed this patch (r241892), I'm seeing execution failures: >> gcc.dg/vect/pr40074.c -flto -ffat-lto-objects execution test >> gcc.dg/vect/pr40074.c execution test >> on armeb-none-linux-gnueabihf --with-mode=arm --with-cpu=cortex-a9 >> --with-fpu=neon-fp16 >> (using qemu as simulator) > > The difference is that we now vectorize the testcase with versioning > for alignment (but it should never execute the vectorized variant). > I need arm peoples help to understand what is wrong. Hi All, I will look at it. Thanks, bin > > At least the testcase shows there is (kind-of) a missed optimization > that we no longer figure out versioning for alignment is useless. > I'll look into that. > > Richard.
Re: [PATCH] combine: Do not call simplify from inside change_zero_ext (PR78232)
On Wed, Nov 09, 2016 at 09:30:59AM +0100, Eric Botcazou wrote:
> > This can also happen if a splitter in the MD uses nonzero_bits (for
> > example).  I tried to make the splitting code in combine save and
> > restore the i2dest reg_stat info, but that causes one of the acats
> > tests to fail.
>
> Could you elaborate?  Did you try to temporarily invalidate it, e.g. by
> means of last_set_label, or something more involved?

You need more than that, e.g. reg_nonzero_bits_for_combine looks at the
nonzero_bits field no matter what else.  So I set all fields to "safe"
values and restored them later (if no combination is done).  But this
fails in acats c66002c (all other tests for all languages work fine).

I didn't have the heart to look into this testcase breakage; the
save/restore thing would make this already way over-complex code a lot
more complex still, so I made a safer change instead.  Ideally reg_stat
will just go away, but that will not happen soon.


Segher
Re: [PATCH] debug/PR78112: remove recent duplicates for DW_TAG_subprogram attributes
On Tue, Nov 8, 2016 at 3:24 PM, Pierre-Marie de Rodat wrote:
> Hello,
>
> This patch comes from the discussion for PR debug/78112
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78112).  It seems to fix
> the regression Rainer detected on x86_64-apple-darwin, although I still
> wonder why remaining duplicate attributes (which were already present
> before my original patch) are not a problem on this platform.
>
> Anyway, bootstrapped and regtested on x86_64-linux.  As I said on the
> PR, I managed to check with a Python script that there were no
> duplicates anymore but I don't know how to turn this into a DejaGNU
> testcase.

Using scan-assembler-times on the dwarf?

> Ok to commit?

Ok.

Richard.

> gcc/
>
>         PR debug/78112
>         * dwarf2out.c (dwarf2out_early_global_decl): Call dwarf2out_decl
>         on the context only when it has no DIE yet.
> ---
>  gcc/dwarf2out.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
> index 1dfff38..f9ec090 100644
> --- a/gcc/dwarf2out.c
> +++ b/gcc/dwarf2out.c
> @@ -25256,11 +25256,11 @@ dwarf2out_early_global_decl (tree decl)
>        if (!DECL_STRUCT_FUNCTION (decl))
>          goto early_decl_exit;
>
> -      /* For nested functions, emit DIEs for the parents first so that all
> -         nested DIEs are generated at the proper scope in the first
> -         shot.  */
> +      /* For nested functions, make sure we have DIEs for the parents first
> +         so that all nested DIEs are generated at the proper scope in the
> +         first shot.  */
>        tree context = decl_function_context (decl);
> -      if (context != NULL)
> +      if (context != NULL && lookup_decl_die (context) == NULL)
>          {
>            current_function_decl = context;
>            dwarf2out_decl (context);
> --
> 2.10.2
Re: RFA (libstdc++): C++ PATCH to implement C++17 noexcept in type system
On Tue, Nov 8, 2016 at 9:11 AM, Christophe Lyon wrote:
> Hi,
>
> On 7 November 2016 at 23:56, Jonathan Wakely wrote:
>> On 7 November 2016 at 22:49, Jason Merrill wrote:
>>> Tested x86_64-pc-linux-gnu.  Are the libstdc++ changes OK for trunk?
>>
>> Yes, I like the approach, thanks.
>
> The new test "g++.dg/cpp1z/noexcept-type11.C" fails on arm:
> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/g++.dg/cpp1z/noexcept-type11.C:3:6:
> warning: mangled name for 'void f(int (*)() noexcept)' will change in
> C++17 because the exception specification is part of a function type
> [-Wc++1z-compat]
> In function 'void f(int (*)() noexcept)':
> cc1plus: warning: mangled name for '' will change in C++17
> because the exception specification is part of a function type
> [-Wc++1z-compat]
> cc1plus: warning: mangled name for '' will change in C++17
> because the exception specification is part of a function type
> [-Wc++1z-compat]
>
> FAIL: g++.dg/cpp1z/noexcept-type11.C (test for excess errors)

I can confirm this and created PR78269 for tracking.  I also noticed
noexcept-type9.C fails on both arm/aarch64 targets.

Thanks,
bin

> Christophe
Re: [patch, avr] Add flash size to device info and make wrap around default
On Tuesday 08 November 2016 02:57 PM, Georg-Johann Lay wrote: On 08.11.2016 08:08, Pitchumani Sivanupandi wrote: I have updated patch to include the flash size as well. Took that info from device headers (it was fed into crt's device information note section also). The new option would render -mn-flash superfluous, but we should keep it for backward compatibility. Ok. Shouldn't link_pmem_wrap then be removed from link_relax, i.e. from LINK_RELAX_SPEC? And what happens if relaxation is off? Yes. Removed link_pmem_wrap from link_relax. Disabling relaxation doesn't change -mpmem-wrap-around behavior. Now, wrap around behavior is changed as follows: For 8K flash devices: Device specs adds --pmem-wrap-around=8k linker option if -mno-pmem-wrap-around is NOT enabled. It makes the --pmem-wrap-around=8k linker option default for 8k flash devices. For 16/32/64K flash devices: Spec string 'link_pmem_wrap' added to all 16/32/64k flash devices specs. Other wise no changes i.e. It adds --pmem-wrap-around=16/32/64k option if -mpmem-wrap-around option is enabled. For other devices, no changes in device specs. Reg tested with default and -mrelax options enabled. No issues. Regards, Pitchumani gcc/ChangeLog 2016-11-08 Pitchumani Sivanupandi * config/avr/avr-arch.h (avr_mcu_t): Add flash_size member. * config/avr/avr-devices.c(avr_mcu_types): Add flash size info. * config/avr/avr-mcu.def: Likewise. * config/avr/gen-avr-mmcu-specs.c (print_mcu): Remove hard-coded prefix check to find wrap-around value, instead use MCU flash size. For 8k flash devices, update link_pmem_wrap spec string to add --pmem-wrap-around=8k. * config/avr/specs.h: Remove link_pmem_wrap from LINK_RELAX_SPEC and add to linker specs (LINK_SPEC) directly. flashsize-and-wrap-around.patch diff --git a/gcc/config/avr/avr-mcus.def b/gcc/config/avr/avr-mcus.def index 6bcc6ff..9d4aa1a 100644 /* Classic, == 128K. 
*/ -AVR_MCU ("avr31",ARCH_AVR31, AVR_ERRATA_SKIP, NULL,0x0060, 0x0, 2) -AVR_MCU ("atmega103",ARCH_AVR31, AVR_ERRATA_SKIP, "__AVR_ATmega103__", 0x0060, 0x0, 2) -AVR_MCU ("at43usb320", ARCH_AVR31, AVR_ISA_NONE, "__AVR_AT43USB320__", 0x0060, 0x0, 2) +AVR_MCU ("avr31",ARCH_AVR31, AVR_ERRATA_SKIP, NULL,0x0060, 0x0, 2, 0x2) +AVR_MCU ("atmega103",ARCH_AVR31, AVR_ERRATA_SKIP, "__AVR_ATmega103__", 0x0060, 0x0, 2, 0x2) +AVR_MCU ("at43usb320", ARCH_AVR31, AVR_ISA_NONE, "__AVR_AT43USB320__", 0x0060, 0x0, 2, 0x1) This looks incorrect: either .flash_size should be 0x2 or n_flash be 1. As you draw the information from internal hardware descriptions, I'd guess that the new information is more reliable? ...as this also determines whether AT43USB320supports ELPM this also means that the device is in the wrong multilib set? I couldn't find the data sheet at atmel.com, and the one I found on the net only mentions 64KiB program memory. It mentions LPM on pp. 9 but has no reference to the supported instruction set. From the manual I would conclude that this device should be avr3, not avr31? Yes. I couldn't find any other source other than the datasheet in net. This device do not have internal program memory. Flash size is set to 64K as the device could address only 64K. No RAMPZ register, so no ELPM. Moved device to AVR3 architecture. +AVR_MCU ("atxmega384c3", ARCH_AVRXMEGA6, AVR_ISA_RMW, "__AVR_ATxmega384C3__", 0x2000, 0x0, 6, 0x62000) +AVR_MCU ("atxmega384d3", ARCH_AVRXMEGA6, AVR_ISA_NONE, "__AVR_ATxmega384D3__", 0x2000, 0x0, 6, 0x62000) /* Xmega, 128K < Flash, RAM > 64K RAM. */ Two more glitches; presumably, .n_flash should be 7 here? Or even better, we can drop .n_flash field in the future and use the more reliable, finer grained information from .flash_size instead? Updated. Yes, in future we could use only flash_size. Regards, Pitchumani gcc/ChangeLog 2016-11-09 Pitchumani Sivanupandi * config/avr/avr-arch.h (avr_mcu_t): Add flash_size member. 
* config/avr/avr-devices.c(avr_mcu_types): Add flash size info. * config/avr/avr-mcu.def: Likewise. Moved at43usb32x to AVR3 architecture as it do not support ELPM (no RAMPZ register). * config/avr/gen-avr-mmcu-specs.c (print_mcu): Remove hard-coded prefix check to find wrap-around value, instead use MCU flash size. For 8k flash devices, update link_pmem_wrap spec string to add --pmem-wrap-around=8k. * config/avr/specs.h: Remove link_pmem_wrap from LINK_RELAX_SPEC and add to linker specs (LINK_SPEC) directly. diff --git a/gcc/config/avr/avr-arch.h b/gcc/config/avr/avr-arch.h index 42eaee5..e0961d4 100644 --- a/gcc/config/avr/avr-arch.h +++ b/gcc/config/avr/avr-arch.h @@ -122,6 +122,9 @@ typedef struct /* Number of 64k segments in the flash. */ int n_flash; + + /* Flash size in bytes. */ + int flash_size; } avr_mcu_t; /* AVR dev
Re: [match.pd] Fix for PR35691
On Wed, 9 Nov 2016, Prathamesh Kulkarni wrote: > On 8 November 2016 at 16:46, Richard Biener wrote: > > On Tue, 8 Nov 2016, Prathamesh Kulkarni wrote: > > > >> On 8 November 2016 at 13:23, Richard Biener wrote: > >> > On Mon, 7 Nov 2016, Prathamesh Kulkarni wrote: > >> > > >> >> On 7 November 2016 at 23:06, Prathamesh Kulkarni > >> >> wrote: > >> >> > On 7 November 2016 at 15:43, Richard Biener wrote: > >> >> >> On Fri, 4 Nov 2016, Prathamesh Kulkarni wrote: > >> >> >> > >> >> >>> On 4 November 2016 at 13:41, Richard Biener > >> >> >>> wrote: > >> >> >>> > On Thu, 3 Nov 2016, Marc Glisse wrote: > >> >> >>> > > >> >> >>> >> On Thu, 3 Nov 2016, Richard Biener wrote: > >> >> >>> >> > >> >> >>> >> > > > > The transform would also work for vectors > >> >> >>> >> > > > > (element_precision for > >> >> >>> >> > > > > the test but also a value-matching zero which should > >> >> >>> >> > > > > ensure the > >> >> >>> >> > > > > same number of elements). > >> >> >>> >> > > > Um sorry, I didn't get how to check vectors to be of equal > >> >> >>> >> > > > length by a > >> >> >>> >> > > > matching zero. > >> >> >>> >> > > > Could you please elaborate on that ? > >> >> >>> >> > > > >> >> >>> >> > > He may have meant something like: > >> >> >>> >> > > > >> >> >>> >> > > (op (cmp @0 integer_zerop@2) (cmp @1 @2)) > >> >> >>> >> > > >> >> >>> >> > I meant with one being @@2 to allow signed vs. Unsigned @0/@1 > >> >> >>> >> > which was the > >> >> >>> >> > point of the pattern. > >> >> >>> >> > >> >> >>> >> Oups, that's what I had written first, and then I somehow > >> >> >>> >> managed to confuse > >> >> >>> >> myself enough to remove it so as to remove the call to > >> >> >>> >> types_match :-( > >> >> >>> >> > >> >> >>> >> > > So the last operand is checked with operand_equal_p instead > >> >> >>> >> > > of > >> >> >>> >> > > integer_zerop. 
But the fact that we could compute bit_ior on > >> >> >>> >> > > the > >> >> >>> >> > > comparison results should already imply that the number of > >> >> >>> >> > > elements is the > >> >> >>> >> > > same. > >> >> >>> >> > > >> >> >>> >> > Though for equality compares we also allow scalar results IIRC. > >> >> >>> >> > >> >> >>> >> Oh, right, I keep forgetting that :-( And I have no idea how to > >> >> >>> >> generate one > >> >> >>> >> for a testcase, at least until the GIMPLE FE lands... > >> >> >>> >> > >> >> >>> >> > > On platforms that have IOR on floats (at least x86 with SSE, > >> >> >>> >> > > maybe some > >> >> >>> >> > > vector mode on s390?), it would be cool to do the same for > >> >> >>> >> > > floats (most > >> >> >>> >> > > likely at the RTL level). > >> >> >>> >> > > >> >> >>> >> > On GIMPLE view-converts could come to the rescue here as well. > >> >> >>> >> > Or we cab > >> >> >>> >> > just allow bit-and/or on floats as much as we allow them on > >> >> >>> >> > pointers. > >> >> >>> >> > >> >> >>> >> Would that generate sensible code on targets that do not have > >> >> >>> >> logic insns for > >> >> >>> >> floats? Actually, even on x86_64 that generates inefficient > >> >> >>> >> code, so there > >> >> >>> >> would be some work (for instance grep finds no gen_iordf3, only > >> >> >>> >> gen_iorv2df3). > >> >> >>> >> > >> >> >>> >> I am also a bit wary of doing those obfuscating optimizations > >> >> >>> >> too early... > >> >> >>> >> a==0 is something that other optimizations might use. long > >> >> >>> >> c=(long&)a|(long&)b; (double&)c==0; less so... > >> >> >>> >> > >> >> >>> >> (and I am assuming that signaling NaNs don't make the whole > >> >> >>> >> transformation > >> >> >>> >> impossible, which might be wrong) > >> >> >>> > > >> >> >>> > Yeah. I also think it's not so much important - I just wanted to > >> >> >>> > mention > >> >> >>> > vectors... 
> >> >> >>> > > >> >> >>> > Btw, I still think we need a more sensible infrastructure for > >> >> >>> > passes > >> >> >>> > to gather, analyze and modify complex conditions. (I'm always > >> >> >>> > pointing > >> >> >>> > to tree-affine.c as an, albeit not very good, example for handling > >> >> >>> > a similar problem) > >> >> >>> Thanks for mentioning the value-matching capture @@, I wasn't aware > >> >> >>> of > >> >> >>> this match.pd feature. > >> >> >>> The current patch keeps it restricted to only bitwise operators on > >> >> >>> integers. > >> >> >>> Bootstrap+test running on x86_64-unknown-linux-gnu. > >> >> >>> OK to commit if passes ? > >> >> >> > >> >> >> +/* PR35691: Transform > >> >> >> + (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0. > >> >> >> + (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0. */ > >> >> >> + > >> >> >> > >> >> >> Please omit the vertical space > >> >> >> > >> >> >> +(for bitop (bit_and bit_ior) > >> >> >> + cmp (eq ne) > >> >> >> + (simplify > >> >> >> + (bitop (cmp @0 integer_zerop) (cmp @1 integer_zerop)) > >> >> >> > >> >> >> if you capture the first integer_zerop as @2 then you can re-use > >> >> >> it... > >> >> >> > >> >> >> + (if (
Re: [PATCH] combine: Do not call simplify from inside change_zero_ext (PR78232)
> You need more than that, e.g. reg_nonzero_bits_for_combine looks at the > nonzero_bits field no matter what else. Right, this one is sticky, although you can invalidate it globally by means of nonzero_sign_valid AFAICS. > I didn't have the heart to look into this testcase breakage; the > save/restore thing would make this already way over-complex code a lot > more complex still, so I made a safer change instead. I agree that the save/restore approach is probably overkill but, on the other hand, your change only papers over the issue so a big ??? comment is in order if it goes in. I'd still think that we should try to invalidate if possible. -- Eric Botcazou
[Patch, Fortran, committed] PR 71894: [OOP] ICE in gfc_add_component_ref, at fortran/class.c:227
Hi all, I have committed to trunk another obvious patch to fix an ICE on invalid code: https://gcc.gnu.org/viewcvs?rev=241993&root=gcc&view=rev Cheers, Janus
Re: [PATCH] S390: Fix PR/77822.
On Tue, Nov 08, 2016 at 05:46:04PM +0100, Matthias Klose wrote: > On 08.11.2016 15:38, Dominik Vogt wrote: > > The attached patch fixes PR/77822 on s390/s390x dor gcc-6 *only*. > > See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77822 > > > > Bootstrapped and regression tested on s390 and s390x biarch on a > > zEC12. > > missing the testcase mentioned in the ChangeLog. Sorry, patch with the missing file attached. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany gcc/ChangeLog * config/s390/s390.md ("extzv", "*extzv_zEC12") ("*extzv_z10"): Check validity of zero_extend arguments. gcc/testsuite/ChangeLog * gcc.target/s390/pr77822.c: New test for PR/77822. >From ce5616dca250c4735105346111e5ea7934b49b55 Mon Sep 17 00:00:00 2001 From: Dominik Vogt Date: Tue, 8 Nov 2016 09:54:03 +0100 Subject: [PATCH] S390: Fix PR/77822. Check the range of the arguments in extzv patterns. This avoids generating risbg instructions with an out of range shift count. --- gcc/config/s390/s390.md | 12 +- gcc/testsuite/gcc.target/s390/pr77822.c | 307 2 files changed, 317 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/pr77822.c diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index b3d4370..899ed62 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -3708,6 +3708,10 @@ (clobber (reg:CC CC_REGNUM))])] "TARGET_Z10" { + if (!IN_RANGE (INTVAL (operands[3]), 0, GET_MODE_BITSIZE (DImode) - 1) + || !IN_RANGE (INTVAL (operands[3]) + INTVAL (operands[2]), 1, + GET_MODE_BITSIZE (DImode))) +FAIL; /* Starting with zEC12 there is risbgn not clobbering CC. 
*/ if (TARGET_ZEC12) { @@ -3726,7 +3730,9 @@ (match_operand:GPR 1 "register_operand" "d") (match_operand 2 "const_int_operand" "") ; size (match_operand 3 "const_int_operand" "")))] ; start] - "TARGET_ZEC12" + "TARGET_ZEC12 + && IN_RANGE (INTVAL (operands[3]), 0, GET_MODE_BITSIZE (DImode) - 1) + && IN_RANGE (INTVAL (operands[3]) + INTVAL (operands[2]), 1, GET_MODE_BITSIZE (DImode))" "risbgn\t%0,%1,64-%2,128+63,+%3+%2" ; dst, src, start, end, shift [(set_attr "op_type" "RIE")]) @@ -3737,7 +3743,9 @@ (match_operand 2 "const_int_operand" "") ; size (match_operand 3 "const_int_operand" ""))) ; start (clobber (reg:CC CC_REGNUM))] - "TARGET_Z10" + "TARGET_Z10 + && IN_RANGE (INTVAL (operands[3]), 0, GET_MODE_BITSIZE (DImode) - 1) + && IN_RANGE (INTVAL (operands[3]) + INTVAL (operands[2]), 1, GET_MODE_BITSIZE (DImode))" "risbg\t%0,%1,64-%2,128+63,+%3+%2" ; dst, src, start, end, shift [(set_attr "op_type" "RIE") (set_attr "z10prop" "z10_super_E1")]) diff --git a/gcc/testsuite/gcc.target/s390/pr77822.c b/gcc/testsuite/gcc.target/s390/pr77822.c new file mode 100644 index 000..6789152 --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/pr77822.c @@ -0,0 +1,307 @@ +/* This testcase checks that the shift operand of r*sbg instructions is in + range. 
*/ + +/* { dg-do compile } */ +/* { dg-options "-O3 -march=zEC12 -Wno-shift-count-overflow" } */ + +int g; + +void pos_ll_129 (long long b) +{ + if (b >> 129 & 1) +g = b; +} + +void sizepos_ll_134 (long long b) +{ + if (b >> 134 & 1) +g = b; +} + +void pos_ll_65 (long long b) +{ + if (b >> 65 & 1) +g = b; +} + +void sizepos_ll_70 (long long b) +{ + if (b >> 70 & 1) +g = b; +} + +void pos_ll_33 (long long b) +{ + if (b >> 33 & 1) +g = b; +} + +void sizepos_ll_38 (long long b) +{ + if (b >> 38 & 1) +g = b; +} + +void pos_ll_17 (long long b) +{ + if (b >> 17 & 1) +g = b; +} + +void sizepos_ll_22 (long long b) +{ + if (b >> 22 & 1) +g = b; +} + +void pos_ll_8 (long long b) +{ + if (b >> 8 & 1) +g = b; +} + +void sizepos_ll_13 (long long b) +{ + if (b >> 13 & 1) +g = b; +} + +void pos_l_129 (long b) +{ + if (b >> 129 & 1) +g = b; +} + +void sizepos_l_134 (long b) +{ + if (b >> 134 & 1) +g = b; +} + +void pos_l_65 (long b) +{ + if (b >> 65 & 1) +g = b; +} + +void sizepos_l_70 (long b) +{ + if (b >> 70 & 1) +g = b; +} + +void pos_l_33 (long b) +{ + if (b >> 33 & 1) +g = b; +} + +void sizepos_l_38 (long b) +{ + if (b >> 38 & 1) +g = b; +} + +void pos_l_17 (long b) +{ + if (b >> 17 & 1) +g = b; +} + +void sizepos_l_22 (long b) +{ + if (b >> 22 & 1) +g = b; +} + +void pos_l_8 (long b) +{ + if (b >> 8 & 1) +g = b; +} + +void sizepos_l_13 (long b) +{ + if (b >> 13 & 1) +g = b; +} + +void pos_i_129 (int b) +{ + if (b >> 129 & 1) +g = b; +} + +void sizepos_i_134 (int b) +{ + if (b >> 134 & 1) +g = b; +} + +void pos_i_65 (int b) +{ + if (b >> 65 & 1) +g = b; +} + +void sizepos_i_70 (int b) +{ + if (b >> 70 & 1) +g = b; +} + +void pos_i_33 (int b) +{ + if (b >> 33 & 1) +g = b; +} + +void sizepos_i_38 (int b) +{ + if (b >> 38 & 1) +g = b; +} + +void pos_i_17 (int b)
Re: [PATCH][1/2] GIMPLE Frontend, C FE parts (and GIMPLE parser)
On Mon, 7 Nov 2016, Richard Biener wrote: > On Mon, 7 Nov 2016, Richard Biener wrote: > > > On Fri, 4 Nov 2016, Jakub Jelinek wrote: > > > > > Hi! > > > > > > Just 2 nits: > > > > > > On Fri, Oct 28, 2016 at 01:46:57PM +0200, Richard Biener wrote: > > > > +/* Return a pointer to the Nth token in PARERs tokens_buf. */ > > > > > > PARSERs ? > > > > Fixed. > > > > > > @@ -454,7 +423,7 @@ c_lex_one_token (c_parser *parser, c_token *token) > > > > /* Return a pointer to the next token from PARSER, reading it in if > > > > necessary. */ > > > > > > > > -static inline c_token * > > > > +c_token * > > > > c_parser_peek_token (c_parser *parser) > > > > { > > > >if (parser->tokens_avail == 0) > > > > > > I wonder if turning all of these into non-inlines is a good idea. > > > Can't you move them to the common header instead? > > > > The issue with moving is that I failed to export the definition of > > c_parser in c-parser.h due to gengtype putting vec > > handlers into gtype-c.h but not gtype-objc.h and thus objc bootstrap > > fails :/ > > If anybody wants to try, f82dc04b921a52a9a5c90d957a824e1c2d04 > has it (objc build) still broken on the gimplefe git branch. Ok, it seems I found a solution: diff --git a/gcc/c/c-parser.h b/gcc/c/c-parser.h index d178254..765acbb 100644 --- a/gcc/c/c-parser.h +++ b/gcc/c/c-parser.h @@ -129,7 +129,7 @@ struct GTY(()) c_parser { /* Buffer to hold all the tokens from parsing the vector attribute for the SIMD-enabled functions (formerly known as elemental functions). */ - vec *cilk_simd_fn_tokens; + vec * GTY((skip)) cilk_simd_fn_tokens; }; /* Possibly kinds of declarator to parse. */ I believe this is safe as cilk_simd_fn_tokens is state that is local to parsing a single omp/cilk construct (not sure why that state was added to c_parser in the first place - I guess it was convenient). I'll push back c_parser to the header and put inlines I need to export there as well. Joseph, is this (with regard to the inlines) your preference as well? 
I'm meanwhile testing the following on trunk, ok if it passes there? Thanks, Richard. 2016-11-09 Richard Biener c/ * c-parser.c (c_parser): Mark cilk_simd_fn_tokens GTY((skip)). Index: gcc/c/c-parser.c === --- gcc/c/c-parser.c(revision 241991) +++ gcc/c/c-parser.c(working copy) @@ -256,7 +256,7 @@ struct GTY(()) c_parser { /* Buffer to hold all the tokens from parsing the vector attribute for the SIMD-enabled functions (formerly known as elemental functions). */ - vec *cilk_simd_fn_tokens; + vec * GTY((skip)) cilk_simd_fn_tokens; };
Re: [Patch, Fortran, committed] PR 71894: [OOP] ICE in gfc_add_component_ref, at fortran/class.c:227
Hi Janus,

may I ask you to also attach the "obvious" patches to the mails you send to the list? It is far more comfortable to look at the patch in the mail than to open the webpage, and it is considered polite to attach patches even when they are obvious. This is not meant to treat you badly, but to ensure a certain quality of gfortran. When a diff is attached I look at it, but I will not open the link, especially not when I am on a mobile.

Thank you in advance,
Andre

On Wed, 9 Nov 2016 10:35:10 +0100 Janus Weil wrote:
> Hi all,
>
> I have committed to trunk another obvious patch to fix an ICE on invalid code:
>
> https://gcc.gnu.org/viewcvs?rev=241993&root=gcc&view=rev
>
> Cheers,
> Janus

--
Andre Vehreschild * Email: vehre ad gmx dot de

[PATCH 0/6][ARM] Implement support for ACLE Coprocessor Intrinsics
Hello,

This patch series aims to implement support for the ACLE coprocessor intrinsics. To do so, I first refactor the NEON builtin framework so that it can be used to implement other builtins. I then port the existing CRC builtins to this framework, to reduce code duplication and as an exercise in porting builtins. The last four patches implement the new ACLE coprocessor intrinsics. This work also facilitates the implementation of further ACLE builtins in the future.

Tested the series by bootstrapping arm-none-linux-gnueabihf and found no regressions; also did a normal build for arm-none-eabi and ran the acle.exp tests for a Cortex-M3.

Andre Vieira (6):
  Refactor NEON builtin framework to work for other builtins
  Move CRC builtins to refactored framework
  Implement support for ACLE Coprocessor CDP intrinsics
  Implement support for ACLE Coprocessor LDC and STC intrinsics
  Implement support for ACLE Coprocessor MCR and MRC intrinsics
  Implement support for ACLE Coprocessor MCRR and MRRC intrinsics
Re: [PATCH] combine: Do not call simplify from inside change_zero_ext (PR78232)
On Wed, Nov 09, 2016 at 10:28:37AM +0100, Eric Botcazou wrote: > > You need more than that, e.g. reg_nonzero_bits_for_combine looks at the > > nonzero_bits field no matter what else. > > Right, this one is sticky, although you can invalidate it globally by means > of > nonzero_sign_valid AFAICS. > > > I didn't have the heart to look into this testcase breakage; the > > save/restore thing would make this already way over-complex code a lot > > more complex still, so I made a safer change instead. > > I agree that the save/restore approach is probably overkill but, on the other > hand, your change only papers over the issue so a big ??? comment is in order > if it goes in. I'd still think that we should try to invalidate if possible. I think the better solution would be not to reuse i2dest at all, actually create a new register instead (we do have machinery for handling that). I'll put the ??? somewhere there. Thanks for looking, Segher
[PATCH 1/6][ARM] Refactor NEON builtin framework to work for other builtins
Hi,

Refactor the NEON builtin framework such that it can be used to implement other builtins.

Is this OK for trunk?

Regards,
Andre

gcc/ChangeLog
2016-11-09  Andre Vieira

	* config/arm/arm-builtins.c (neon_builtin_datum): Rename to ...
	(arm_builtin_datum): ... this.
	(arm_init_neon_builtin): Rename to ...
	(arm_init_builtin): ... this. Add new parameters PREFIX
	and USE_SIG_IN_NAME.
	(arm_init_neon_builtins): Replace 'arm_init_neon_builtin' with
	'arm_init_builtin'. Replace type 'neon_builtin_datum' with
	'arm_builtin_datum'.
	(arm_init_vfp_builtins): Likewise.
	(builtin_arg): Rename enum values, replacing 'NEON_ARG' with
	'ARG_BUILTIN', and add 'ARG_BUILTIN_NEON_MEMORY'.
	(arm_expand_neon_args): Rename to ...
	(arm_expand_builtin_args): ... this. Rename builtin_arg enum
	values and differentiate between ARG_BUILTIN_MEMORY and
	ARG_BUILTIN_NEON_MEMORY.
	(arm_expand_neon_builtin_1): Rename to ...
	(arm_expand_builtin_1): ... this. Rename builtin_arg enum values,
	arm_expand_builtin_args and add bool parameter NEON.
	(arm_expand_neon_builtin): Use arm_expand_builtin_1.
	(arm_expand_vfp_builtin): Likewise.
	(NEON_MAX_BUILTIN_ARGS): Remove, it was unused.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index e73043db6db69fa64bb1e72cf71a36d7169062db..154dafc6bf2525165406ae07f7551a665b2bdee8 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -202,7 +202,7 @@ typedef struct { const enum insn_code code; unsigned int fcode; enum arm_type_qualifiers *qualifiers; -} neon_builtin_datum; +} arm_builtin_datum; #define CF(N,X) CODE_FOR_neon_##N##X @@ -242,7 +242,7 @@ typedef struct { VAR11 (T, N, A, B, C, D, E, F, G, H, I, J, K) \ VAR1 (T, N, L) -/* The NEON builtin data can be found in arm_neon_builtins.def and +/* The builtin data can be found in arm_neon_builtins.def, arm_vfp_builtins.def. The entries in arm_neon_builtins.def require TARGET_NEON to be true. The feature tests are checked when the builtins are expanded.
@@ -252,14 +252,14 @@ typedef struct { would be specified after the assembler mnemonic, which usually refers to the last vector operand. The modes listed per instruction should be the same as those defined for that - instruction's pattern in neon.md. */ + instruction's pattern, for instance in neon.md. */ -static neon_builtin_datum vfp_builtin_data[] = +static arm_builtin_datum vfp_builtin_data[] = { #include "arm_vfp_builtins.def" }; -static neon_builtin_datum neon_builtin_data[] = +static arm_builtin_datum neon_builtin_data[] = { #include "arm_neon_builtins.def" }; @@ -915,11 +915,17 @@ arm_init_simd_builtin_scalar_types (void) "__builtin_neon_uti"); } -/* Set up a NEON builtin. */ +/* Set up a builtin. It will use information stored in the argument struct D to + derive the builtin's type signature and name. It will append the name in D + to the PREFIX passed and use these to create a builtin declaration that is + then stored in 'arm_builtin_decls' under index FCODE. This FCODE is also + written back to D for future use. If USE_SIG_IN_NAME is true the builtin's + name is appended with type signature information to distinguish between + signedness and poly. 
*/ static void -arm_init_neon_builtin (unsigned int fcode, - neon_builtin_datum *d) +arm_init_builtin (unsigned int fcode, arm_builtin_datum *d, + const char * prefix, bool use_sig_in_name) { bool print_type_signature_p = false; char type_signature[SIMD_MAX_BUILTIN_ARGS] = { 0 }; @@ -1007,12 +1013,12 @@ arm_init_neon_builtin (unsigned int fcode, gcc_assert (ftype != NULL); - if (print_type_signature_p) -snprintf (namebuf, sizeof (namebuf), "__builtin_neon_%s_%s", - d->name, type_signature); + if (print_type_signature_p && use_sig_in_name) +snprintf (namebuf, sizeof (namebuf), "%s_%s_%s", + prefix, d->name, type_signature); else -snprintf (namebuf, sizeof (namebuf), "__builtin_neon_%s", - d->name); +snprintf (namebuf, sizeof (namebuf), "%s_%s", + prefix, d->name); fndecl = add_builtin_function (namebuf, ftype, fcode, BUILT_IN_MD, NULL, NULL_TREE); @@ -1048,8 +1054,8 @@ arm_init_neon_builtins (void) for (i = 0; i < ARRAY_SIZE (neon_builtin_data); i++, fcode++) { - neon_builtin_datum *d = &neon_builtin_data[i]; - arm_init_neon_builtin (fcode, d); + arm_builtin_datum *d = &neon_builtin_data[i]; + arm_init_builtin (fcode, d, "__builtin_neon", true); } } @@ -1062,8 +1068,8 @@ arm_init_vfp_builtins (void) for (i = 0; i < ARRAY_SIZE (vfp_builtin_data); i++, fcode++) { - neon_builtin_datum *d = &vfp_builtin_data[i]; - arm_init_neon_builtin (fcode, d); + arm_builtin_datum *d = &vfp_builtin_da
[PATCH 2/6][ARM] Move CRC builtins to refactored framework
Hi, This patch refactors the implementation of the ARM ACLE CRC builtins to use the builtin framework. Is this OK for trunk? Regards, Andre gcc/ChangeLog 2016-11-09 Andre Vieira * config/arm/arm-builtins.c (arm_unsigned_binop_qualifiers): New. (UBINOP_QUALIFIERS): New. (si_UP): Define. (acle_builtin_data): New. Change comment. (arm_builtins): Remove ARM_BUILTIN_CRC32B, ARM_BUILTIN_CRC32H, ARM_BUILTIN_CRC32W, ARM_BUILTIN_CRC32CB, ARM_BUILTIN_CRC32CH, ARM_BUILTIN_CRC32CW. Add ARM_BUILTIN_ACLE_BASE and include arm_acle_builtins.def. (ARM_BUILTIN_ACLE_PATTERN_START): Define. (arm_init_acle_builtins): New. (CRC32_BUILTIN): Remove. (bdesc_2arg): Remove entries for crc32b, crc32h, crc32w, crc32cb, crc32ch and crc32cw. (arm_init_crc32_builtins): Remove. (arm_init_builtins): Use arm_init_acle_builtins rather than arm_init_crc32_builtins. (arm_expand_acle_builtin): New. (arm_expand_builtin): Use 'arm_expand_acle_builtin'. * config/arm/arm_acle_builtins.def: New. diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 154dafc6bf2525165406ae07f7551a665b2bdee8..2130a3004f17c47be6e42412c1ea30f3cff20573 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -157,6 +157,13 @@ arm_load1_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] qualifier_none, qualifier_struct_load_store_lane_index }; #define LOAD1LANE_QUALIFIERS (arm_load1_lane_qualifiers) +/* unsigned T (unsigned T, unsigned T, unsigned T). */ +static enum arm_type_qualifiers +arm_unsigned_binop_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned, + qualifier_unsigned }; +#define UBINOP_QUALIFIERS (arm_unsigned_binop_qualifiers) + /* The first argument (return type) of a store should be void type, which we represent with qualifier_void. 
Their first operand will be a DImode pointer to the location to store to, so we must use @@ -242,17 +249,16 @@ typedef struct { VAR11 (T, N, A, B, C, D, E, F, G, H, I, J, K) \ VAR1 (T, N, L) -/* The builtin data can be found in arm_neon_builtins.def, - arm_vfp_builtins.def. The entries in arm_neon_builtins.def require - TARGET_NEON to be true. The feature tests are checked when the - builtins are expanded. +/* The builtin data can be found in arm_neon_builtins.def, arm_vfp_builtins.def + and arm_acle_builtins.def. The entries in arm_neon_builtins.def require + TARGET_NEON to be true. The feature tests are checked when the builtins are + expanded. - The mode entries in the following table correspond to the "key" - type of the instruction variant, i.e. equivalent to that which - would be specified after the assembler mnemonic, which usually - refers to the last vector operand. The modes listed per - instruction should be the same as those defined for that - instruction's pattern, for instance in neon.md. */ + The mode entries in the following table correspond to the "key" type of the + instruction variant, i.e. equivalent to that which would be specified after + the assembler mnemonic for neon instructions, which usually refers to the + last vector operand. The modes listed per instruction should be the same as + those defined for that instruction's pattern, for instance in neon.md. 
*/ static arm_builtin_datum vfp_builtin_data[] = { @@ -266,6 +272,15 @@ static arm_builtin_datum neon_builtin_data[] = #undef CF #undef VAR1 +#define VAR1(T, N, A) \ + {#N, UP (A), CODE_FOR_##N, 0, T##_QUALIFIERS}, + +static arm_builtin_datum acle_builtin_data[] = +{ +#include "arm_acle_builtins.def" +}; + +#undef VAR1 #define VAR1(T, N, X) \ ARM_BUILTIN_NEON_##N##X, @@ -518,13 +533,6 @@ enum arm_builtins ARM_BUILTIN_WMERGE, - ARM_BUILTIN_CRC32B, - ARM_BUILTIN_CRC32H, - ARM_BUILTIN_CRC32W, - ARM_BUILTIN_CRC32CB, - ARM_BUILTIN_CRC32CH, - ARM_BUILTIN_CRC32CW, - ARM_BUILTIN_GET_FPSCR, ARM_BUILTIN_SET_FPSCR, @@ -556,6 +564,14 @@ enum arm_builtins #include "arm_neon_builtins.def" +#undef VAR1 +#define VAR1(T, N, X) \ + ARM_BUILTIN_##N, + + ARM_BUILTIN_ACLE_BASE, + +#include "arm_acle_builtins.def" + ARM_BUILTIN_MAX }; @@ -565,6 +581,9 @@ enum arm_builtins #define ARM_BUILTIN_NEON_PATTERN_START \ (ARM_BUILTIN_NEON_BASE + 1) +#define ARM_BUILTIN_ACLE_PATTERN_START \ + (ARM_BUILTIN_ACLE_BASE + 1) + #undef CF #undef VAR1 #undef VAR2 @@ -1025,6 +1044,23 @@ arm_init_builtin (unsigned int fcode, arm_builtin_datum *d, arm_builtin_decls[fcode] = fndecl; } +/* Set up ACLE builtins, even builtins for instructions that are not + in the current target ISA to allow the user to compile particular modules + with different target specific options that differ from the command line + options. Such builtins will be rejected in arm_expand_builtin. */ + +static void +arm_init_acle_builtins (void) +{ + unsigned int i, fcode = ARM_BUILTIN_ACLE_PATTERN_START; + + for (i = 0; i < ARRAY_SIZE (acle_bu
[PATCH 3/6][ARM] Implement support for ACLE Coprocessor CDP intrinsics
Hi,

This patch implements support for the ARM ACLE Coprocessor CDP intrinsics. See below a table mapping the intrinsics to their respective instructions:

+----------------------------------------------------+----------------------------------------+
| Intrinsic signature                                | Instruction pattern                    |
+----------------------------------------------------+----------------------------------------+
| void __arm_cdp(coproc, opc1, CRd, CRn, CRm, opc2)  | CDP coproc, opc1, CRd, CRn, CRm, opc2  |
+----------------------------------------------------+----------------------------------------+
| void __arm_cdp2(coproc, opc1, CRd, CRn, CRm, opc2) | CDP2 coproc, opc1, CRd, CRn, CRm, opc2 |
+----------------------------------------------------+----------------------------------------+

Note that any untyped variable in the intrinsic signature is required to be a compile-time constant and has the type 'unsigned int'. We do some boundary checks for coproc:[0-15], opc1:[0-15], CR*:[0-31], opc2:[0-7]. If any of these requirements is not met, a diagnostic is issued.

I renamed neon_const_bounds to arm_const_bounds in this patch, simply because it is also used in the coprocessor intrinsics. The patch also requires expanding the builtin framework such that it accepts 'void' modes and intrinsics with 6 arguments.

I also changed acle.exp to run tests for multiple options, where all lto option sets are appended with -ffat-lto-objects to allow for assembly scans.

Is this OK for trunk?

Regards,
Andre

gcc/ChangeLog:

2016-11-09  Andre Vieira

	* config/arm/arm.md (): New.
	* config/arm/arm.c (neon_const_bounds): Rename this ...
	(arm_const_bounds): ... this.
	(arm_coproc_builtin_available): New.
	* config/arm/arm-builtins.c (SIMD_MAX_BUILTIN_ARGS): Increase.
	(arm_type_qualifiers): Add 'qualifier_unsigned_immediate'.
	(CDP_QUALIFIERS): Define to...
	(arm_cdp_qualifiers): ... this. New.
	(void_UP): Define.
	(arm_expand_builtin_args): Add case for 6 arguments.
	* config/arm/arm-protos.h (neon_const_bounds): Rename this ...
	(arm_const_bounds): ... this.
	(arm_coproc_builtin_available): New.
	* config/arm/arm_acle.h (__arm_cdp): New.
	(__arm_cdp2): New.
	* config/arm/arm_acle_builtins.def (cdp): New.
	(cdp2): New.
	* config/arm/iterators.md (CDPI, CDP, cdp): New.
	* config/arm/neon.md: Rename all 'neon_const_bounds' to
	'arm_const_bounds'.
	* config/arm/types.md (coproc): New.
* config/arm/unspecs.md (VUNSPEC_CDP, VUNSPEC_CDP2): New. * gcc/doc/extend.texi (ACLE): Add a mention of Coprocessor intrinsics. gcc/testsuite/ChangeLog: 2016-11-09 Andre Vieira * gcc.target/arm/acle/acle.exp: Run tests for different options and make sure fat-lto-objects is used such that we can still do assemble scans. * gcc.target/arm/acle/cdp.c: New. * gcc.target/arm/acle/cdp2.c: New. * lib/target-supports.exp (check_effective_target_arm_coproc1_ok): New. (check_effective_target_arm_coproc1_ok_nocache): New. (check_effective_target_arm_coproc2_ok): New. (check_effective_target_arm_coproc2_ok_nocache): New. (check_effective_target_arm_coproc3_ok): New. (check_effective_target_arm_coproc3_ok_nocache): New. diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 2130a3004f17c47be6e42412c1ea30f3cff20573..bdb8aad8658af089b4977373654bb2d2c0b5c653 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -38,7 +38,7 @@ #include "langhooks.h" #include "case-cfn-macros.h" -#define SIMD_MAX_BUILTIN_ARGS 5 +#define SIMD_MAX_BUILTIN_ARGS 7 enum arm_type_qualifiers { @@ -53,6 +53,7 @@ enum arm_type_qualifiers /* Used when expanding arguments if an operand could be an immediate. */ qualifier_immediate = 0x8, /* 1 << 3 */ + qualifier_unsigned_immediate = 0x9, qualifier_maybe_immediate = 0x10, /* 1 << 4 */ /* void foo (...). */ qualifier_void = 0x20, /* 1 << 5 */ @@ -164,6 +165,18 @@ arm_unsigned_binop_qualifiers[SIMD_MAX_BUILTIN_ARGS] qualifier_unsigned }; #define UBINOP_QUALIFIERS (arm_unsigned_binop_qualifiers) +/* void (unsigned immediate, unsigned immediate, unsigned immediate, +unsigned immediate, unsigned immediate, unsigned immediate). 
*/ +static enum arm_type_qualifiers +arm_cdp_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_void, qualifier_unsigned_immediate, + qualifier_unsigned_immediate, + qualifier_unsigned_immediate, + qualifier_unsigned_immediate, + qualifier_unsigned_immediate, + qualifier_unsigned_immediate }; +#define CDP_QUALIFIERS \ + (arm_cdp_qualifiers) /* The first argument (return type) of a store should be void type, which we represent with qualifier_void. Their first operand will be a DImode pointer to the location to store to, so we must use @@ -200,6 +213,7 @@ arm_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] #define oi_UP OImode #define hf_UP HFmode #define si_UP SImode +#define void_UP VOIDmo
[PATCH 4/6][ARM] Implement support for ACLE Coprocessor LDC and STC intrinsics
Hi,

This patch implements support for the ARM ACLE Coprocessor LDC and STC intrinsics. See below a table mapping the intrinsics to their respective instructions:

+----------------------------------------------+---------------------------+
| Intrinsic signature                          | Instruction pattern       |
+----------------------------------------------+---------------------------+
| void __arm_ldc(coproc, CRd, const void* p)   | LDC coproc, CRd, [...]    |
+----------------------------------------------+---------------------------+
| void __arm_ldcl(coproc, CRd, const void* p)  | LDCL coproc, CRd, [...]   |
+----------------------------------------------+---------------------------+
| void __arm_ldc2(coproc, CRd, const void* p)  | LDC2 coproc, CRd, [...]   |
+----------------------------------------------+---------------------------+
| void __arm_ldc2l(coproc, CRd, const void* p) | LDC2L coproc, CRd, [...]  |
+----------------------------------------------+---------------------------+
| void __arm_stc(coproc, CRd, void* p)         | STC coproc, CRd, [...]    |
+----------------------------------------------+---------------------------+
| void __arm_stcl(coproc, CRd, void* p)        | STCL coproc, CRd, [...]   |
+----------------------------------------------+---------------------------+
| void __arm_stc2(coproc, CRd, void* p)        | STC2 coproc, CRd, [...]   |
+----------------------------------------------+---------------------------+
| void __arm_stc2l(coproc, CRd, void* p)       | STC2L coproc, CRd, [...]  |
+----------------------------------------------+---------------------------+

Note that any untyped variable in the intrinsic signature is required to be a compile-time constant and has the type 'unsigned int'. We do some boundary checks for coproc:[0-15], CR*:[0-31]. If any of these requirements is not met, a diagnostic is issued.

Is this OK for trunk?

Regards,
Andre

gcc/ChangeLog:

2016-11-09  Andre Vieira

	* config/arm/arm.md (*ldcstc): New.
	(): New.
	* config/arm/arm.c (arm_coproc_builtin_available): Add support
	for ldc, ldcl, stc, stcl, ldc2, ldc2l, stc2 and stc2l.
	(arm_coproc_ldc_stc_legitimate_address): New.
	* config/arm/arm-builtins.c (arm_type_qualifiers): Add
	'qualifier_const_pointer'.
	(LDC_QUALIFIERS): Define to...
	(arm_ldc_qualifiers): ... this. New.
	(STC_QUALIFIERS): Define to...
	(arm_stc_qualifiers): ... this. New.
	* config/arm/arm-protos.h (arm_coproc_ldc_stc_legitimate_address):
	New.
	* config/arm/arm_acle.h (__arm_ldc, __arm_ldcl, __arm_stc,
	__arm_stcl, __arm_ldc2, __arm_ldc2l, __arm_stc2, __arm_stc2l): New.
	* config/arm/arm_acle_builtins.def (ldc, ldc2, ldcl, ldc2l, stc,
	stc2, stcl, stc2l): New.
	* config/arm/constraints.md (Uz): New.
	* config/arm/iterators.md (LDCSTCI, ldcstc, LDCSTC): New.
* config/arm/unspecs.md (VUNSPEC_LDC, VUNSPEC_LDC2, VUNSPEC_LDCL, VUNSPEC_LDC2L, VUNSPEC_STC, VUNSPEC_STC2, VUNSPEC_STCL, VUNSPEC_STC2L): New. gcc/testsuite/ChangeLog: 2016-11-09 Andre Vieira * gcc.target/arm/acle/ldc: New. * gcc.target/arm/acle/ldc2: New. * gcc.target/arm/acle/ldcl: New. * gcc.target/arm/acle/ldc2l: New. * gcc.target/arm/acle/stc: New. * gcc.target/arm/acle/stc2: New. * gcc.target/arm/acle/stcl: New. * gcc.target/arm/acle/stc2l: New. diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index bdb8aad8658af089b4977373654bb2d2c0b5c653..e6e58cd6a656732e37c6bca23ad980eea522a710 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -50,6 +50,8 @@ enum arm_type_qualifiers qualifier_const = 0x2, /* 1 << 1 */ /* T *foo. */ qualifier_pointer = 0x4, /* 1 << 2 */ + /* const T * foo */ + qualifier_const_pointer = 0x6, /* Used when expanding arguments if an operand could be an immediate. */ qualifier_immediate = 0x8, /* 1 << 3 */ @@ -177,6 +179,23 @@ arm_cdp_qualifiers[SIMD_MAX_BUILTIN_ARGS] qualifier_unsigned_immediate }; #define CDP_QUALIFIERS \ (arm_cdp_qualifiers) + +/* void (unsigned immediate, unsigned immediate, const void *). */ +static enum arm_type_qualifiers +arm_ldc_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_void, qualifier_unsigned_immediate, + qualifier_unsigned_immediate, qualifier_const_pointer }; +#define LDC_QUALIFIERS \ + (arm_ldc_qualifiers) + +/* void (unsigned immediate, unsigned immediate, void *). */ +static enum arm_type_qualifiers +arm_stc_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_void, qualifier_unsigned_immediate, + qualifier_unsigned_immediate, qualifier_pointer }; +#define STC_QUALIFIERS \ + (arm_stc_qualifiers) + /* The first argument (retur
[PATCH 5/6][ARM] Implement support for ACLE Coprocessor MCR and MRC intrinsics
Hi,

This patch implements support for the ARM ACLE Coprocessor MCR and MRC intrinsics. See below a table mapping the intrinsics to their respective instructions:

+---------------------------------------------------------------+----------------------------------------+
| Intrinsic signature                                           | Instruction pattern                    |
+---------------------------------------------------------------+----------------------------------------+
| void __arm_mcr(coproc, opc1, uint32_t value, CRn, CRm, opc2)  | MCR coproc, opc1, Rt, CRn, CRm, opc2   |
+---------------------------------------------------------------+----------------------------------------+
| void __arm_mcr2(coproc, opc1, uint32_t value, CRn, CRm, opc2) | MCR2 coproc, opc1, Rt, CRn, CRm, opc2  |
+---------------------------------------------------------------+----------------------------------------+
| uint32_t __arm_mrc(coproc, opc1, CRn, CRm, opc2)              | MRC coproc, opc1, Rt, CRn, CRm, opc2   |
+---------------------------------------------------------------+----------------------------------------+
| uint32_t __arm_mrc2(coproc, opc1, CRn, CRm, opc2)             | MRC2 coproc, opc1, Rt, CRn, CRm, opc2  |
+---------------------------------------------------------------+----------------------------------------+

Note that any untyped variable in the intrinsic signature is required to be a compile-time constant and has the type 'unsigned int'. We do some boundary checks for coproc:[0-15], opc1:[0-7], CR*:[0-31], opc2:[0-7]. If any of these requirements is not met, a diagnostic is issued.

Is this OK for trunk?

Regards,
Andre

gcc/ChangeLog:

2016-11-09  Andre Vieira

	* config/arm/arm.md (): New.
	(): New.
	* config/arm/arm.c (arm_coproc_builtin_available): Add support
	for mcr, mrc, mcr2 and mrc2.
	* config/arm/arm-builtins.c (MCR_QUALIFIERS): Define to...
	(arm_mcr_qualifiers): ... this. New.
	(MRC_QUALIFIERS): Define to ...
	(arm_mrc_qualifiers): ... this. New.
	* config/arm/arm_acle.h (__arm_mcr, __arm_mrc, __arm_mcr2,
	__arm_mrc2): New.
	* config/arm/arm_acle_builtins.def (mcr, mcr2, mrc, mrc2): New.
	* config/arm/iterators.md (MCRI, mcr, MCR, MRCI, mrc, MRC): New.
	* config/arm/unspecs.md (VUNSPEC_MCR, VUNSPEC_MCR2, VUNSPEC_MRC,
	VUNSPEC_MRC2): New.

gcc/testsuite/ChangeLog:

2016-11-09  Andre Vieira

	* gcc.target/arm/acle/mcr.c: New.
	* gcc.target/arm/acle/mrc.c: New.
	* gcc.target/arm/acle/mcr2.c: New.
	* gcc.target/arm/acle/mrc2.c: New.
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index e6e58cd6a656732e37c6bca23ad980eea522a710..44f255356dcc3ea6b8f554ba96f99fd7856bf6a1 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -196,6 +196,26 @@ arm_stc_qualifiers[SIMD_MAX_BUILTIN_ARGS] #define STC_QUALIFIERS \ (arm_stc_qualifiers) +/* void (unsigned immediate, unsigned immediate, T, unsigned immediate, +unsigned immediate, unsigned immediate). */ +static enum arm_type_qualifiers +arm_mcr_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_void, qualifier_unsigned_immediate, + qualifier_unsigned_immediate, qualifier_none, + qualifier_unsigned_immediate, qualifier_unsigned_immediate, + qualifier_unsigned_immediate }; +#define MCR_QUALIFIERS \ + (arm_mcr_qualifiers) + +/* T (unsigned immediate, unsigned immediate, unsigned immediate, + unsigned immediate, unsigned immediate). */ +static enum arm_type_qualifiers +arm_mrc_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_unsigned_immediate, + qualifier_unsigned_immediate, qualifier_unsigned_immediate, + qualifier_unsigned_immediate, qualifier_unsigned_immediate }; +#define MRC_QUALIFIERS \ + (arm_mrc_qualifiers) /* The first argument (return type) of a store should be void type, which we represent with qualifier_void. 
Their first operand will be a DImode pointer to the location to store to, so we must use diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index d3191abee163d5fbf753fb5072be50fdb2b4c785..25439a343e8540c5fca5cbe19e8b76e2fdb97a73 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -30803,6 +30803,8 @@ bool arm_coproc_builtin_available (enum unspecv builtin) case VUNSPEC_LDCL: case VUNSPEC_STC: case VUNSPEC_STCL: + case VUNSPEC_MCR: + case VUNSPEC_MRC: if (arm_arch4) return true; break; @@ -30811,6 +30813,8 @@ bool arm_coproc_builtin_available (enum unspecv builtin) case VUNSPEC_LDC2L: case VUNSPEC_STC2: case VUNSPEC_STC2L: + case VUNSPEC_MCR2: + case VUNSPEC_MRC2: /* Only present in ARMv5*, ARMv6 (but not ARMv6-M), ARMv7* and ARMv8-{A,M}. */ if (arm_arch5) diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 45cdd576fa5b7f26e2d1cfa2c86a8c876
[PATCH 6/6][ARM] Implement support for ACLE Coprocessor MCRR and MRRC intrinsics
Hi,

This patch implements support for the ARM ACLE Coprocessor MCRR and MRRC intrinsics. See below a table mapping the intrinsics to their respective instructions:

+------------------------------------------------------+------------------------------------------+
| Intrinsic signature                                  | Instruction pattern                      |
+------------------------------------------------------+------------------------------------------+
| void __arm_mcrr(coproc, opc1, uint64_t value, CRm)   | MCRR coproc, opc1, Rt, Rt2, CRm          |
+------------------------------------------------------+------------------------------------------+
| void __arm_mcrr2(coproc, opc1, uint64_t value, CRm)  | MCRR2 coproc, opc1, Rt, Rt2, CRm         |
+------------------------------------------------------+------------------------------------------+
| uint64_t __arm_mrrc(coproc, opc1, CRm)               | MRRC coproc, opc1, Rt, Rt2, CRm          |
+------------------------------------------------------+------------------------------------------+
| uint64_t __arm_mrrc2(coproc, opc1, CRm)              | MRRC2 coproc, opc1, Rt, Rt2, CRm         |
+------------------------------------------------------+------------------------------------------+

Note that any untyped variable in the intrinsic signatures is required to be a compile-time constant and has the type 'unsigned int'. We do some boundary checks on these constants: coproc [0-15], opc1 [0-7], CR* [0-31]. If any of these requirements is not met, a diagnostic is issued.

I added a new arm_arch variable for ARMv5TE to use when deciding whether or not the MCRR and MRRC intrinsics are available.

Is this OK for trunk?

Regards,
Andre

gcc/ChangeLog:
2016-11-09  Andre Vieira

	* config/arm/arm.md (): New.
	(): New.
	* config/arm/arm.c (arm_arch5te): New.
	(arm_option_override): Set arm_arch5te.
	(arm_coproc_builtin_available): Add support for mcrr, mcrr2,
	mrrc and mrrc2.
	* config/arm/arm-builtins.c (MCRR_QUALIFIERS): Define to...
	(arm_mcrr_qualifiers): ... this. New.
	(MRRC_QUALIFIERS): Define to...
	(arm_mrrc_qualifiers): ... this. New.
	* config/arm/arm_acle.h (__arm_mcrr, __arm_mcrr2, __arm_mrrc,
	__arm_mrrc2): New.
	* config/arm/arm_acle_builtins.def (mcrr, mcrr2, mrrc, mrrc2): New.
	* config/arm/iterators.md (MCRRI, mcrr, MCRR): New.
	(MRRCI, mrrc, MRRC): New.
	* config/arm/unspecs.md (VUNSPEC_MCRR, VUNSPEC_MCRR2,
	VUNSPEC_MRRC, VUNSPEC_MRRC2): New.

gcc/testsuite/ChangeLog:
2016-11-09  Andre Vieira

	* gcc.target/arm/acle/mcrr: New.
	* gcc.target/arm/acle/mcrr2: New.
	* gcc.target/arm/acle/mrrc: New.
	* gcc.target/arm/acle/mrrc2: New.
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 44f255356dcc3ea6b8f554ba96f99fd7856bf6a1..ab641d6d5fb5f64b5f3317f461e13e5222150237 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -216,6 +216,24 @@ arm_mrc_qualifiers[SIMD_MAX_BUILTIN_ARGS] qualifier_unsigned_immediate, qualifier_unsigned_immediate }; #define MRC_QUALIFIERS \ (arm_mrc_qualifiers) + +/* void (unsigned immediate, unsigned immediate, T, unsigned immediate). */ +static enum arm_type_qualifiers +arm_mcrr_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_void, qualifier_unsigned_immediate, + qualifier_unsigned_immediate, qualifier_none, + qualifier_unsigned_immediate }; +#define MCRR_QUALIFIERS \ + (arm_mcrr_qualifiers) + +/* T (unsigned immediate, unsigned immediate, unsigned immediate). */ +static enum arm_type_qualifiers +arm_mrrc_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_unsigned_immediate, + qualifier_unsigned_immediate, qualifier_unsigned_immediate }; +#define MRRC_QUALIFIERS \ + (arm_mrrc_qualifiers) + /* The first argument (return type) of a store should be void type, which we represent with qualifier_void. Their first operand will be a DImode pointer to the location to store to, so we must use diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 25439a343e8540c5fca5cbe19e8b76e2fdb97a73..3f4e2a39580217b6564f047998681e5b8419e741 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -797,6 +797,9 @@ int arm_arch5 = 0; /* Nonzero if this chip supports the ARM Architecture 5E extensions. */ int arm_arch5e = 0; +/* Nonzero if this chip supports the ARM Architecture 5TE extensions. */ +int arm_arch5te = 0; + /* Nonzero if this chip supports the ARM Architecture 6 extensions. 
*/ int arm_arch6 = 0; @@ -3231,6 +3234,7 @@ arm_option_override (void) arm_arch4t = arm_arch4 && (ARM_FSET_HAS_CPU1 (insn_flags, FL_THUMB)); arm_arch5 = ARM_FSET_HAS_CPU1 (insn_flags, FL_ARCH5); arm_arch5e = ARM_FSET_HAS_CPU1 (insn_flags, FL_ARCH5E); + arm_arch5te = arm_arch5e && ARM_FSET_HAS_CPU1 (insn_flags, FL_THUMB); arm_arch6 = ARM_FSET_HAS_CPU1 (insn_flags, FL_ARCH6); arm_arch6k
Re: [PATCH] Convert character arrays to string csts
On Fri, Nov 4, 2016 at 2:33 PM, Martin Liška wrote: > On 11/03/2016 02:00 PM, Jan Hubicka wrote: >>> On 11/03/2016 01:12 PM, Martin Liška wrote: + tree init = DECL_INITIAL (decl); + if (init + && TREE_READONLY (decl) + && can_convert_ctor_to_string_cst (init)) +DECL_INITIAL (decl) = build_string_cst_from_ctor (init); >>> >>> I'd merge these two new functions since they're only ever called >>> together. We'd then have something like >>> >>> if (init && TREE_READONLY (decl)) >>> init = convert_ctor_to_string_cst (init); >>> if (init) >>> DECL_INITIAL (decl) = init; > > Done. > >>> >>> I'll defer to Jan on whether finalize_decl seems like a good place >>> to do this. >> >> I think finalize_decl may be bit too early because frontends may expects the >> ctors to be in a way they produced them. We only want to convert those >> arrays >> seen by middle-end so I would move the logic to varpool_node::analyze >> >> Otherwise the patch seems fine to me. >> >> Honza >>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c index 283bd1c..b2d1fd5 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c @@ -4,12 +4,15 @@ char *buffer1; char *buffer2; +const char global[] = {'a', 'b', 'c', 'd', '\0'}; + #define SIZE 1000 int main (void) { const char* const foo1 = "hello world"; + const char local[] = "abcd"; buffer1 = __builtin_malloc (SIZE); __builtin_strcpy (buffer1, foo1); @@ -45,6 +48,10 @@ main (void) __builtin_abort (); if (__builtin_memchr (foo1, null, 12) != foo1 + 11) __builtin_abort (); + if (__builtin_memchr (global, null, 5) == 0) +__builtin_abort (); + if (__builtin_memchr (local, null, 5) == 0) +__builtin_abort (); >>> >>> How is that a meaningful test? This seems to work even with an >>> unpatched gcc. 
I'd have expected something that shows a benefit for >>> doing this conversion, and maybe also a test that shows it isn't >>> done in cases where it's not allowed. > > It's meaningful as it scans that there's no __builtin_memchr in optimized > dump. > I'm adding new tests that does the opposite test. > >>> tree -build_string_literal (int len, const char *str) +build_string_literal (int len, const char *str, bool build_addr_expr) >>> >>> New arguments should be documented in the function comment. > > Yep, improved. > >>> +/* Return TRUE when CTOR can be converted to a string constant. */ >>> >>> "if", not "when". > > Done. > >>> + unsigned HOST_WIDE_INT elements = CONSTRUCTOR_NELTS (ctor); + FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (ctor), idx, key, value) +{ + if (key == NULL_TREE + || TREE_CODE (key) != INTEGER_CST + || !tree_fits_uhwi_p (value) + || !useless_type_conversion_p (TREE_TYPE (value), char_type_node)) + return false; >>> >>> Shouldn't all elements have the same type, or do you really have to >>> call useless_type_conversion in a loop? >>> + /* Allow zero character just at the end of a string. */ + if (integer_zerop (value)) + return idx == elements - 1; >>> >>> Don't you also have to explicitly check it's there? >>> >>> >>> Bernd > > > Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. I'm curious about the @@ -1495,7 +1495,8 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p) { if (!TREE_STATIC (decl)) { - DECL_INITIAL (decl) = NULL_TREE; + if (!TREE_READONLY (decl) || TREE_CODE (init) != STRING_CST) + DECL_INITIAL (decl) = NULL_TREE; init = build2 (INIT_EXPR, void_type_node, decl, init); gimplify_and_add (init, seq_p); ggc_free (init); change. Why keep DECL_INITIAL if you build a INIT_EXPR anyway? @@ -4438,6 +4439,19 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, break; } + /* Replace a ctor with a string constant with possible. 
*/ + if (TREE_READONLY (object) + && VAR_P (object)) + { + tree string_ctor = convert_ctor_to_string_cst (ctor); + if (string_ctor) + { + TREE_OPERAND (*expr_p, 1) = string_ctor; + DECL_INITIAL (object) = string_ctor; + break; + } + } + /* Fetch information about the constructor to direct later processing. We might want to make static versions of it in various cases, and can only do so if it known to be a valid constant initializer. */ hmm, so both these hunks will end up keeping a DECL_INITIAL for non-static local consts? Usually we end up
Re: [Patch, Fortran, committed] PR 71894: [OOP] ICE in gfc_add_component_ref, at fortran/class.c:227
Hi Andre, sorry, didn't want to be impolite. I was just not aware that it makes a big difference whether I attach the patch or post the link. Patch below. Cheers, Janus Index: gcc/fortran/class.c === --- gcc/fortran/class.c(Revision 241992) +++ gcc/fortran/class.c(Arbeitskopie) @@ -224,7 +224,8 @@ gfc_add_component_ref (gfc_expr *e, const char *na break; tail = &((*tail)->next); } - if (derived->components->next->ts.type == BT_DERIVED && + if (derived->components && derived->components->next && + derived->components->next->ts.type == BT_DERIVED && derived->components->next->ts.u.derived == NULL) { /* Fix up missing vtype. */ 2016-11-09 10:54 GMT+01:00 Andre Vehreschild : > Hi Janus, > > may I ask you to attach also the "obvious" patches to the mail you send to the > list? It is far more comfortable to take a look at the patch in the mail than > open the webpage. Furthermore is it considered polite to attach the patches > even when they are obvious. This is not to treat you badly, but to ensure a > certain quality of gfortran. When a diff is attached I look at it, but I will > not open the link, not when I am only on a mobile. > > Thank you in advance, > Andre > > On Wed, 9 Nov 2016 10:35:10 +0100 > Janus Weil wrote: > >> Hi all, >> >> I have committed to trunk another obvious patch to fix an ICE on invalid >> code: >> >> https://gcc.gnu.org/viewcvs?rev=241993&root=gcc&view=rev >> >> Cheers, >> Janus > > > -- > Andre Vehreschild * Email: vehre ad gmx dot de
Re: [PATCH 7/7, GCC, ARM, V8M] Added support for ARMV8-M Security Extension cmse_nonsecure_caller intrinsic
Hi Andre, On 25/10/16 17:30, Andre Vieira (lists) wrote: On 24/08/16 12:01, Andre Vieira (lists) wrote: On 25/07/16 14:28, Andre Vieira (lists) wrote: This patch adds support ARMv8-M's Security Extension's cmse_nonsecure_caller intrinsic. This intrinsic is used to check whether an entry function was called from a non-secure state. See Section 5.4.3 of ARM®v8-M Security Extensions: Requirements on Development Tools (http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html) for further details. The FIXME in config/arm/arm_cmse.h is for a diagnostic message that is suggested in the ARMv8-M Security Extensions document mentioned above, to diagnose the use of the cmse_nonsecure_caller intrinsic outside of functions with the 'cmse_nonsecure_entry' attribute. Checking whether the intrinsic is called from within such functions can easily be done inside 'arm_expand_builtin'. However, making the warning point to the right location is more complicated. The ARMv8-M Security Extensions specification does mention that such a diagnostic might become mandatory, so I might have to pick this up later, otherwise it is left as a potential extra feature. *** gcc/ChangeLog *** 2016-07-25 Andre Vieira Thomas Preud'homme * config/arm/arm-builtins.c (arm_builtins): Define ARM_BUILTIN_CMSE_NONSECURE_CALLER. (bdesc_2arg): Add line for cmse_nonsecure_caller. (arm_expand_builtin): Handle cmse_nonsecure_caller. * config/arm/arm_cmse.h (cmse_nonsecure_caller): New. *** gcc/testsuite/ChangeLog *** 2016-07-25 Andre Vieira Thomas Preud'homme * gcc.target/arm/cmse/cmse-1.c: Add test for cmse_nonsecure_caller. Added more documentation as requested. --- This patch adds support ARMv8-M's Security Extension's cmse_nonsecure_caller intrinsic. This intrinsic is used to check whether an entry function was called from a non-secure state. 
See Section 5.4.3 of ARM®v8-M Security Extensions: Requirements on Development Tools (http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html) for further details. The FIXME in config/arm/arm_cmse.h is for a diagnostic message that is suggested in the ARMv8-M Security Extensions document mentioned above, to diagnose the use of the cmse_nonsecure_caller intrinsic outside of functions with the 'cmse_nonsecure_entry' attribute. Checking whether the intrinsic is called from within such functions can easily be done inside 'arm_expand_builtin'. However, making the warning point to the right location is more complicated. The ARMv8-M Security Extensions specification does mention that such a diagnostic might become mandatory, so I might have to pick this up later, otherwise it is left as a potential extra feature. *** gcc/ChangeLog *** 2016-07-xx Andre Vieira Thomas Preud'homme * config/arm/arm-builtins.c (arm_builtins): Define ARM_BUILTIN_CMSE_NONSECURE_CALLER. (bdesc_2arg): Add line for cmse_nonsecure_caller. (arm_expand_builtin): Handle cmse_nonsecure_caller. * config/arm/arm_cmse.h (cmse_nonsecure_caller): New. * doc/extend.texi (ARM ARMv8-M Security Extensions): New intrinsic. *** gcc/testsuite/ChangeLog *** 2016-07-xx Andre Vieira Thomas Preud'homme * gcc.target/arm/cmse/cmse-1.c: Add test for cmse_nonsecure_caller. Hi, Rebased previous patch on top of trunk as requested. No changes to ChangeLog. Cheers, Andre @@ -1832,6 +1834,17 @@ arm_init_builtins (void) = add_builtin_function ("__builtin_arm_stfscr", ftype_set_fpscr, ARM_BUILTIN_SET_FPSCR, BUILT_IN_MD, NULL, NULL_TREE); } + + if (arm_arch_cmse) +{ + tree ftype_cmse_nonsecure_caller + = build_function_ty Should this be use_cmse ? This looks ok to me otherwise. I believe patch [6/7] is the only one needing approval after this... Kyrill
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev wrote: > Hi All, > > I re-send all patches sent by Ilya earlier for review which support > vectorization of loop epilogues and loops with low trip count. We > assume that the only patch - vec-tails-07-combine-tail.patch - was not > approved by Jeff. > > I did re-base of all patches and performed bootstrapping and > regression testing that did not show any new failures. Also all > changes related to new vect_do_peeling algorithm have been changed > accordingly. > > Is it OK for trunk? Hi, I can't approve patches, but had some comments after going through the implementation. One confusing part is cost model change, as well as the way it's used to decide how epilogue loop should be vectorized. Given vect-tail is disabled at the moment and the cost change needs further tuning, is it reasonable to split this part out and get vectorization part reviewed/submitted independently? For example, let user specified parameters make the decision for now. Cost and target dependent changes should go in at last, this could make the patch easier to read. The implementation computes/shares quite amount information from main loop to epilogue loop vectorization. Furthermore, variables/fields for such information are somehow named in a misleading way. For example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the flag controlling whether epilogue loop should be vectorized with masking. However, it's actually controlled by exactly the same flag as whether epilogue loop should be combined into the main loop with masking: @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo) slpeel_make_loop_iterate_ntimes (loop, niters_vector); + if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo)) +vect_combine_loop_epilogue (loop_vinfo); + /* Reduce loop iterations by the vectorization factor. 
*/ scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf), expected_iterations / vf); IMHO, we should decouple main loop vectorization and epilogue vectorization as much as possible by sharing as few information as we can. The general idea is to handle epilogue loop just like normal short-trip loop. For example, we can rename LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something else), and we don't differentiate its meaning between main and epilogue(short-trip) loop. It only indicates the current loop should be vectorized with masking no matter it's a main loop or epilogue loop, and it works just like the current implementation. After this change, we can refine vectorization and make it more general for normal loop and epilogue(short trip) loop. For example, this implementation sets LOOP_VINFO_PEELING_FOR_NITER for epilogue loop and use it to control how it should be vectorized: + if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)) +{ + LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false; + LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false; +} + else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) + && min_profitable_combine_iters >= 0) +{ This works, but not that good for understanding or maintaining. Thanks, bin
Reject out-of-range bit pos in bit-field insns operating on a register.
As seen by the testcase in PR77822, combine can generate out-of-range bit pos in a bit-field insn, unless the pattern explicitly rejects it. This only makes a difference for expressions that are undefined at runtime. Without that we would either generate bad assembler or ICE in output_btst. PR target/78254 * config/m68k/m68k.md: Reject out-of-range bit pos in bit-fields insns operating on a register. Index: gcc/config/m68k/m68k.md === --- gcc/config/m68k/m68k.md (revision 241995) +++ gcc/config/m68k/m68k.md (working copy) @@ -706,7 +706,7 @@ (minus:SI (const_int 31) (match_operand:SI 1 "general_operand" "di"))) (const_int 0)))] - "" + "!(CONST_INT_P (operands[1]) && !IN_RANGE (INTVAL (operands[1]), 0, 31))" { return output_btst (operands, operands[1], operands[0], insn, 31); }) @@ -765,9 +765,10 @@ (cc0) (compare (zero_extract:SI (match_operand:SI 0 "register_operand" "do") (const_int 1) - (match_operand:SI 1 "const_int_operand" "n")) + (match_operand:SI 1 "const_int_operand" "n")) (const_int 0)))] - "!TARGET_COLDFIRE" + "!TARGET_COLDFIRE + && !(REG_P (operands[0]) && !IN_RANGE (INTVAL (operands[1]), 0, 31))" { if (GET_CODE (operands[0]) == MEM) { @@ -790,7 +791,8 @@ (const_int 1) (match_operand:SI 1 "const_int_operand" "n")) (const_int 0)))] - "TARGET_COLDFIRE" + "TARGET_COLDFIRE + && !(REG_P (operands[0]) && !IN_RANGE (INTVAL (operands[1]), 0, 31))" { if (GET_CODE (operands[0]) == MEM) { @@ -5397,6 +5399,7 @@ (match_operand:SI 2 "const_int_operand" "n")) (match_operand:SI 3 "register_operand" "d"))] "TARGET_68020 && TARGET_BITFIELD + && IN_RANGE (INTVAL (operands[2]), 0, 31) && (INTVAL (operands[1]) == 8 || INTVAL (operands[1]) == 16) && INTVAL (operands[2]) % INTVAL (operands[1]) == 0" { @@ -5438,6 +5441,7 @@ (match_operand:SI 2 "const_int_operand" "n") (match_operand:SI 3 "const_int_operand" "n")))] "TARGET_68020 && TARGET_BITFIELD + && IN_RANGE (INTVAL (operands[3]), 0, 31) && (INTVAL (operands[2]) == 8 || INTVAL (operands[2]) == 16) && INTVAL (operands[3]) % 
INTVAL (operands[2]) == 0" { @@ -5480,6 +5484,7 @@ (match_operand:SI 2 "const_int_operand" "n") (match_operand:SI 3 "const_int_operand" "n")))] "TARGET_68020 && TARGET_BITFIELD + && IN_RANGE (INTVAL (operands[3]), 0, 31) && (INTVAL (operands[2]) == 8 || INTVAL (operands[2]) == 16) && INTVAL (operands[3]) % INTVAL (operands[2]) == 0" { @@ -5610,7 +5615,7 @@ (sign_extract:SI (match_operand:SI 1 "register_operand" "d") (match_operand:SI 2 "const_int_operand" "n") (match_operand:SI 3 "const_int_operand" "n")))] - "TARGET_68020 && TARGET_BITFIELD" + "TARGET_68020 && TARGET_BITFIELD && IN_RANGE (INTVAL (operands[3]), 0, 31)" "bfexts %1{%b3:%b2},%0") (define_insn "*extv_bfextu_reg" @@ -5618,7 +5623,7 @@ (zero_extract:SI (match_operand:SI 1 "register_operand" "d") (match_operand:SI 2 "const_int_operand" "n") (match_operand:SI 3 "const_int_operand" "n")))] - "TARGET_68020 && TARGET_BITFIELD" + "TARGET_68020 && TARGET_BITFIELD && IN_RANGE (INTVAL (operands[3]), 0, 31)" { if (GET_CODE (operands[2]) == CONST_INT) { @@ -5637,7 +5642,7 @@ (match_operand:SI 1 "const_int_operand" "n") (match_operand:SI 2 "const_int_operand" "n")) (const_int 0))] - "TARGET_68020 && TARGET_BITFIELD" + "TARGET_68020 && TARGET_BITFIELD && IN_RANGE (INTVAL (operands[2]), 0, 31)" { CC_STATUS_INIT; return "bfclr %0{%b2:%b1}"; @@ -5648,7 +5653,7 @@ (match_operand:SI 1 "const_int_operand" "n") (match_operand:SI 2 "const_int_operand" "n")) (const_int -1))] - "TARGET_68020 && TARGET_BITFIELD" + "TARGET_68020 && TARGET_BITFIELD && IN_RANGE (INTVAL (operands[2]), 0, 31)" { CC_STATUS_INIT; return "bfset %0{%b2:%b1}"; @@ -5659,7 +5664,7 @@ (match_operand:SI 1 "const_int_operand" "n") (match_operand:SI 2 "const_int_operand" "n")) (match_operand:SI 3 "register_operand" "d"))] - "TARGET_68020 && TARGET_BITFIELD" + "TARGET_68020 && TARGET_BITFIELD && IN_RANGE (INTVAL (operands[2]), 0, 31)" { #if 0 /* These special cases are now recognized by a specific pattern. 
*/ @@ -5707,7 +5712,8 @@ (match_operand:SI 1 "const_int_operand" "n") (match_ope
Re: Prevent aliasing between arguments in calls to move_alloc
Hi Steve, Committed as r241995. Thanks. Paul On 8 November 2016 at 20:43, Steve Kargl wrote: > Yes. I saw Ian's analysis in c.l.f. It seems we both got > caught out on this one. The patch looks fine. > > -- > steve > > On Tue, Nov 08, 2016 at 08:26:37PM +0100, Paul Richard Thomas wrote: >> Hi Steve, >> >> I moved too quickly and caused a regression. See the link in the >> testcase. The attached fixes the problem and bootstraps/regtests. >> >> OK for trunk? >> >> Paul >> >> >> On 5 November 2016 at 16:17, Steve Kargl >> wrote: >> > On Sat, Nov 05, 2016 at 10:05:30AM +0100, Paul Richard Thomas wrote: >> >> >> >> Bootstraps and regtests on FC21/x86_64 - OK for trunk? >> > >> > OK with minor nit (see below). >> > >> >> >> >> + /* F2003 12.4.1.7 */ >> >> + if (to->expr_type == EXPR_VARIABLE && from->expr_type ==EXPR_VARIABLE >> > >> > Need a space after ==. >> > >> > -- >> > Steve >> >> >> >> -- >> The difference between genius and stupidity is; genius has its limits. >> >> Albert Einstein > >> Index: gcc/fortran/check.c >> === >> *** gcc/fortran/check.c (revision 241872) >> --- gcc/fortran/check.c (working copy) >> *** gfc_check_move_alloc (gfc_expr *from, gf >> *** 3343,3355 >> } >> >> /* F2003 12.4.1.7 */ >> ! if (to->expr_type == EXPR_VARIABLE && from->expr_type ==EXPR_VARIABLE >> && !strcmp (to->symtree->n.sym->name, from->symtree->n.sym->name)) >> { >> ! gfc_error ("The FROM and TO arguments at %L are either the same >> object " >> ! "or subobjects thereof and so violate aliasing restrictions " >> ! "(F2003 12.4.1.7)", &to->where); >> ! return false; >> } >> >> /* CLASS arguments: Make sure the vtab of from is present. */ >> --- 3343,3380 >> } >> >> /* F2003 12.4.1.7 */ >> ! if (to->expr_type == EXPR_VARIABLE && from->expr_type == EXPR_VARIABLE >> && !strcmp (to->symtree->n.sym->name, from->symtree->n.sym->name)) >> { >> ! gfc_ref *to_ref, *from_ref; >> ! to_ref = to->ref; >> ! from_ref = from->ref; >> ! bool aliasing = true; >> ! >> ! 
for (; from_ref && to_ref; >> !from_ref = from_ref->next, to_ref = to_ref->next) >> ! { >> ! if (to_ref->type != from->ref->type) >> ! aliasing = false; >> ! else if (to_ref->type == REF_ARRAY >> !&& to_ref->u.ar.type != AR_FULL >> !&& from_ref->u.ar.type != AR_FULL) >> ! /* Play safe; assume sections and elements are different. */ >> ! aliasing = false; >> ! else if (to_ref->type == REF_COMPONENT >> !&& to_ref->u.c.component != from_ref->u.c.component) >> ! aliasing = false; >> ! >> ! if (!aliasing) >> ! break; >> ! } >> ! >> ! if (aliasing) >> ! { >> ! gfc_error ("The FROM and TO arguments at %L violate aliasing " >> ! "restrictions (F2003 12.4.1.7)", &to->where); >> ! return false; >> ! } >> } >> >> /* CLASS arguments: Make sure the vtab of from is present. */ >> Index: gcc/testsuite/gfortran.dg/move_alloc_18.f90 >> === >> *** gcc/testsuite/gfortran.dg/move_alloc_18.f90 (revision 0) >> --- gcc/testsuite/gfortran.dg/move_alloc_18.f90 (working copy) >> *** >> *** 0 >> --- 1,21 >> + ! { dg-do compile } >> + ! >> + ! Test that the anti-aliasing restriction does not knock out valid code. >> + ! >> + ! Contributed by Andrew Balwin on >> + ! https://groups.google.com/forum/#!topic/comp.lang.fortran/oiXdl1LPb_s >> + ! >> + PROGRAM TEST >> + IMPLICIT NONE >> + >> + TYPE FOOBAR >> + INTEGER, ALLOCATABLE :: COMP(:) >> + END TYPE >> + >> + TYPE (FOOBAR) :: MY_ARRAY(6) >> + >> + ALLOCATE (MY_ARRAY(1)%COMP(10)) >> + >> + CALL MOVE_ALLOC (MY_ARRAY(1)%COMP, MY_ARRAY(2)%COMP) >> + >> + END PROGRAM TEST > > > -- > Steve -- The difference between genius and stupidity is; genius has its limits. Albert Einstein
[PATCH] Remove flag_evaluation_order (Java remains)
The Java FE was the only thing setting flag_evaluation_order to nonzero thus the following removes that flag and adjusts code checking it (a followup will remove the now unused arg from tree_swap_operands_p). Bootstrap / regtest in progress on x86_64-unknown-linux-gnu. Richard. 2016-11-09 Richard Biener * common.opt (flag_evaluation_order): Remove. * expr.c (expand_operands): Remove code guarded by flag_evaluation_order. * fold-const.c (reorder_operands_p): Remove, it always returns true. (negate_expr_p): Remove calls to reorder_operands_p. (fold_negate_expr): Likewise. (tree_swap_operands_p): Likewise. (fold_binary_loc): Likewise. Index: gcc/common.opt === --- gcc/common.opt (revision 241992) +++ gcc/common.opt (working copy) @@ -58,10 +58,6 @@ int flag_incremental_link = 0 Variable int flag_complex_method = 1 -; Nonzero if subexpressions must be evaluated from left-to-right. -Variable -int flag_evaluation_order = 0 - ; Language specific warning pass for unused results. Variable bool flag_warn_unused_result = false Index: gcc/expr.c === --- gcc/expr.c (revision 241992) +++ gcc/expr.c (working copy) @@ -7681,10 +7682,6 @@ expand_operands (tree exp0, tree exp1, r } else { - /* If we need to preserve evaluation order, copy exp0 into its own -temporary variable so that it can't be clobbered by exp1. 
*/ - if (flag_evaluation_order && TREE_SIDE_EFFECTS (exp1)) - exp0 = save_expr (exp0); *op0 = expand_expr (exp0, target, VOIDmode, modifier); *op1 = expand_expr (exp1, NULL_RTX, VOIDmode, modifier); } Index: gcc/fold-const.c === --- gcc/fold-const.c(revision 241992) +++ gcc/fold-const.c(working copy) @@ -133,7 +133,6 @@ static tree fold_binary_op_with_conditio tree, tree, tree, tree, int); static tree fold_div_compare (location_t, enum tree_code, tree, tree, tree); -static bool reorder_operands_p (const_tree, const_tree); static tree fold_negate_const (tree, tree); static tree fold_not_const (const_tree, tree); static tree fold_relational_const (enum tree_code, tree, tree, tree); @@ -435,9 +434,7 @@ negate_expr_p (tree t) && ! TYPE_OVERFLOW_WRAPS (type))) return false; /* -(A + B) -> (-B) - A. */ - if (negate_expr_p (TREE_OPERAND (t, 1)) - && reorder_operands_p (TREE_OPERAND (t, 0), -TREE_OPERAND (t, 1))) + if (negate_expr_p (TREE_OPERAND (t, 1))) return true; /* -(A + B) -> (-A) - B. */ return negate_expr_p (TREE_OPERAND (t, 0)); @@ -447,9 +444,7 @@ negate_expr_p (tree t) return !HONOR_SIGN_DEPENDENT_ROUNDING (element_mode (type)) && !HONOR_SIGNED_ZEROS (element_mode (type)) && (! INTEGRAL_TYPE_P (type) -|| TYPE_OVERFLOW_WRAPS (type)) -&& reorder_operands_p (TREE_OPERAND (t, 0), - TREE_OPERAND (t, 1)); +|| TYPE_OVERFLOW_WRAPS (type)); case MULT_EXPR: if (TYPE_UNSIGNED (type)) @@ -606,9 +601,7 @@ fold_negate_expr (location_t loc, tree t && !HONOR_SIGNED_ZEROS (element_mode (type))) { /* -(A + B) -> (-B) - A. 
*/ - if (negate_expr_p (TREE_OPERAND (t, 1)) - && reorder_operands_p (TREE_OPERAND (t, 0), -TREE_OPERAND (t, 1))) + if (negate_expr_p (TREE_OPERAND (t, 1))) { tem = negate_expr (TREE_OPERAND (t, 1)); return fold_build2_loc (loc, MINUS_EXPR, type, @@ -628,8 +621,7 @@ fold_negate_expr (location_t loc, tree t case MINUS_EXPR: /* - (A - B) -> B - A */ if (!HONOR_SIGN_DEPENDENT_ROUNDING (element_mode (type)) - && !HONOR_SIGNED_ZEROS (element_mode (type)) - && reorder_operands_p (TREE_OPERAND (t, 0), TREE_OPERAND (t, 1))) + && !HONOR_SIGNED_ZEROS (element_mode (type))) return fold_build2_loc (loc, MINUS_EXPR, type, TREE_OPERAND (t, 1), TREE_OPERAND (t, 0)); break; @@ -6761,27 +6753,12 @@ fold_single_bit_test (location_t loc, en return NULL_TREE; } -/* Check whether we are allowed to reorder operands arg0 and arg1, - such that the evaluation of arg1 occurs before arg0. */ - -static bool -reorder_operands_p (const_tree arg0, const_tree arg1) -{ - if (! flag_evaluation_order) - return true; - if (TREE_CONSTANT (arg0) || TREE_CONSTANT (arg1)) -return true; - return ! TREE_SIDE_EFFECTS (arg0) -&& ! TREE_SIDE_EFFECTS (arg1); -} - /* Test whether it is preferable two swap two operands, ARG0 and ARG1,
[PATCH, GCC/testsuite] Fix PR78269: noexcept-type9.C should be a compile test
g++.dg/cpp1z/noexcept-type9.C contains a dg-error but is currently a runtime test. The intent was clearly to make it a compile test to test that error, especially since there is no main. This patch changes it to a compile test. ChangeLog entry is as follows: *** gcc/testsuite/ChangeLog *** 2016-11-09 Thomas Preud'homme PR testsuite/78269 * g++.dg/cpp1z/noexcept-type9.C: Make it a compile test. Applied as obvious. Best regards, Thomas diff --git a/gcc/testsuite/g++.dg/cpp1z/noexcept-type9.C b/gcc/testsuite/g++.dg/cpp1z/noexcept-type9.C index 4547c4cc4d3c2bf3f7e049b4e6c1179b85982a62..a29618a3001266b003ab966daa80ca5e8d5020da 100644 --- a/gcc/testsuite/g++.dg/cpp1z/noexcept-type9.C +++ b/gcc/testsuite/g++.dg/cpp1z/noexcept-type9.C @@ -1,6 +1,6 @@ // Test for PMF template args. // { dg-options -std=c++1z } -// { dg-do run } +// { dg-do compile } struct A {
[PATCH] Remove redundant canonicalize_comparison from DOM
DOM currently canonicalizes GIMPLE_COND operand order. But it does so before constant/copy propagation into that stmt (so quite pointless) and furthermore we fold the stmt if we constant/copy propagated into it and fold already does the canonicalization. Thus, removed. Bootstrap & regtest pending on x86_64-unknown-linux-gnu. Richard. 2016-11-09 Richard Biener * tree-ssa-dom.c (canonicalize_comparison): Remove. (optimize_stmt): Remove redundant pre-propagation canonicalization of comparison operand order. Index: gcc/tree-ssa-dom.c === --- gcc/tree-ssa-dom.c (revision 241992) +++ gcc/tree-ssa-dom.c (working copy) @@ -753,49 +753,6 @@ make_pass_dominator (gcc::context *ctxt) } -/* Given a conditional statement CONDSTMT, convert the - condition to a canonical form. */ - -static void -canonicalize_comparison (gcond *condstmt) -{ - tree op0; - tree op1; - enum tree_code code; - - gcc_assert (gimple_code (condstmt) == GIMPLE_COND); - - op0 = gimple_cond_lhs (condstmt); - op1 = gimple_cond_rhs (condstmt); - - code = gimple_cond_code (condstmt); - - /* If it would be profitable to swap the operands, then do so to - canonicalize the statement, enabling better optimization. - - By placing canonicalization of such expressions here we - transparently keep statements in canonical form, even - when the statement is modified. */ - if (tree_swap_operands_p (op0, op1, false)) -{ - /* For relationals we need to swap the operands -and change the code. */ - if (code == LT_EXPR - || code == GT_EXPR - || code == LE_EXPR - || code == GE_EXPR) - { - code = swap_tree_comparison (code); - - gimple_cond_set_code (condstmt, code); - gimple_cond_set_lhs (condstmt, op1); - gimple_cond_set_rhs (condstmt, op0); - - update_stmt (condstmt); - } -} -} - /* A trivial wrapper so that we can present the generic jump threading code with a simple API for simplifying statements. 
*/ static tree @@ -1789,9 +1746,6 @@ optimize_stmt (basic_block bb, gimple_st print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM); } - if (gimple_code (stmt) == GIMPLE_COND) -canonicalize_comparison (as_a (stmt)); - update_stmt_if_modified (stmt); opt_stats.num_stmts++;
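For readers outside the pass, the removed canonicalization hinged on one invariant: when tree_swap_operands_p prefers the operands the other way around, a relational code must be mirrored so the condition keeps its meaning. A standalone sketch of that mirroring (the enum names mimic GCC's tree codes, but this is illustrative code, not GCC internals):

```cpp
#include <cassert>

// The four relational codes that canonicalize_comparison handled.
enum cmp_code { LT_EXPR, GT_EXPR, LE_EXPR, GE_EXPR };

// Mirror a relational code when its operands are swapped, so that
// (a code b) and (b mirrored a) test the same condition; this models
// what swap_tree_comparison does for these codes.
cmp_code mirror_comparison(cmp_code code) {
  switch (code) {
    case LT_EXPR: return GT_EXPR;
    case GT_EXPR: return LT_EXPR;
    case LE_EXPR: return GE_EXPR;
    case GE_EXPR: return LE_EXPR;
  }
  return code;
}

// Evaluate (a code b) so the mirroring can be checked directly.
bool eval(cmp_code code, int a, int b) {
  switch (code) {
    case LT_EXPR: return a < b;
    case GT_EXPR: return a > b;
    case LE_EXPR: return a <= b;
    case GE_EXPR: return a >= b;
  }
  return false;
}
```

The point of the patch is that fold already applies exactly this swap when the statement is re-folded after propagation, so doing it beforehand in DOM was redundant.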
Re: [PATCH, vec-tails] Support loop epilogue vectorization
Thanks, Richard, for your comments. You proposed to handle the epilogue loop just like a normal short-trip loop, but this is not exactly true since e.g. the epilogue must not be peeled for alignment. It is not clear to me what my next steps should be: should I re-design the patch completely, or simply decompose the whole patch into separate parts? Either way it means that we must restart the review process from the beginning, but the release is close to its end. Note also that I work for Intel only until the end of the year and have no idea who will continue working on this project. Any help will be appreciated. Thanks. Yuri. 2016-11-09 13:37 GMT+03:00 Bin.Cheng : > On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev wrote: >> Hi All, >> >> I re-send all patches sent by Ilya earlier for review which support >> vectorization of loop epilogues and loops with low trip count. We >> assume that the only patch - vec-tails-07-combine-tail.patch - was not >> approved by Jeff. >> >> I did re-base of all patches and performed bootstrapping and >> regression testing that did not show any new failures. Also all >> changes related to new vect_do_peeling algorithm have been changed >> accordingly. >> >> Is it OK for trunk? > > Hi, > I can't approve patches, but had some comments after going through the > implementation. > > One confusing part is cost model change, as well as the way it's used > to decide how epilogue loop should be vectorized. Given vect-tail is > disabled at the moment and the cost change needs further tuning, is it > reasonable to split this part out and get vectorization part > reviewed/submitted independently? For example, let user specified > parameters make the decision for now. Cost and target dependent > changes should go in at last, this could make the patch easier to > read. > > The implementation computes/shares quite amount information from main > loop to epilogue loop vectorization. Furthermore, variables/fields > for such information are somehow named in a misleading way. For > example.
LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the > flag controlling whether epilogue loop should be vectorized with > masking. However, it's actually controlled by exactly the same flag > as whether epilogue loop should be combined into the main loop with > masking: > @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo) > >slpeel_make_loop_iterate_ntimes (loop, niters_vector); > > + if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo)) > +vect_combine_loop_epilogue (loop_vinfo); > + >/* Reduce loop iterations by the vectorization factor. */ >scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf), >expected_iterations / vf); > > IMHO, we should decouple main loop vectorization and epilogue > vectorization as much as possible by sharing as few information as we > can. The general idea is to handle epilogue loop just like normal > short-trip loop. For example, we can rename > LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something > else), and we don't differentiate its meaning between main and > epilogue(short-trip) loop. It only indicates the current loop should > be vectorized with masking no matter it's a main loop or epilogue > loop, and it works just like the current implementation. > > After this change, we can refine vectorization and make it more > general for normal loop and epilogue(short trip) loop. For example, > this implementation sets LOOP_VINFO_PEELING_FOR_NITER for epilogue > loop and use it to control how it should be vectorized: > + if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)) > +{ > + LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false; > + LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false; > +} > + else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) > + && min_profitable_combine_iters >= 0) > +{ > > This works, but not that good for understanding or maintaining. > > Thanks, > bin
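For readers following the thread, here is a scalar model of the structure being discussed: a main loop vectorized by some factor plus an epilogue (tail) loop for the leftover iterations, which masking would instead cover in one predicated vector iteration. The names and the factor are illustrative assumptions, not the patch's API:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t VF = 4;  // assumed vectorization factor

// The main loop consumes VF elements per iteration; the epilogue
// handles the remaining n % VF elements.  With masking, the epilogue
// would be a single vector iteration whose inactive lanes are
// disabled; here it is modelled as a scalar tail loop.
void add_one(std::vector<int> &a) {
  std::size_t i = 0;
  for (; i + VF <= a.size(); i += VF)        // "vectorized" main loop
    for (std::size_t lane = 0; lane < VF; ++lane)
      a[i + lane] += 1;
  for (; i < a.size(); ++i)                  // epilogue (tail) loop
    a[i] += 1;
}
```

The cost-model question in the thread is which form of the tail is cheaper on a given target: this separate loop, a masked vector epilogue, or folding the tail into a fully masked main loop.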
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On Wed, Nov 9, 2016 at 11:28 AM, Yuri Rumyantsev wrote: > Thanks Richard for your comments. > Your proposed to handle epilogue loop just like normal short-trip loop > but this is not exactly truth since e.g. epilogue must not be peeled > for alignment. Yes there must be some differences, my motivation is to minimize that so we don't need to specially check normal/epilogue loops at too many places. Of course it's just my feeling when going through the patch set, and could be wrong. Thanks, bin > > It is not clear for me what are my next steps? Should I re-design the > patch completely or simply decompose the whole patch to different > parts? But it means that we must start review process from beginning > but release is closed to its end. > Note also that i work for Intel till the end of year and have not idea > who will continue working on this project. > > Any help will be appreciated. > > Thanks. > Yuri. > > 2016-11-09 13:37 GMT+03:00 Bin.Cheng : >> On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev wrote: >>> Hi All, >>> >>> I re-send all patches sent by Ilya earlier for review which support >>> vectorization of loop epilogues and loops with low trip count. We >>> assume that the only patch - vec-tails-07-combine-tail.patch - was not >>> approved by Jeff. >>> >>> I did re-base of all patches and performed bootstrapping and >>> regression testing that did not show any new failures. Also all >>> changes related to new vect_do_peeling algorithm have been changed >>> accordingly. >>> >>> Is it OK for trunk? >> >> Hi, >> I can't approve patches, but had some comments after going through the >> implementation. >> >> One confusing part is cost model change, as well as the way it's used >> to decide how epilogue loop should be vectorized. Given vect-tail is >> disabled at the moment and the cost change needs further tuning, is it >> reasonable to split this part out and get vectorization part >> reviewed/submitted independently? 
For example, let user specified >> parameters make the decision for now. Cost and target dependent >> changes should go in at last, this could make the patch easier to >> read. >> >> The implementation computes/shares quite amount information from main >> loop to epilogue loop vectorization. Furthermore, variables/fields >> for such information are somehow named in a misleading way. For >> example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the >> flag controlling whether epilogue loop should be vectorized with >> masking. However, it's actually controlled by exactly the same flag >> as whether epilogue loop should be combined into the main loop with >> masking: >> @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo) >> >>slpeel_make_loop_iterate_ntimes (loop, niters_vector); >> >> + if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo)) >> +vect_combine_loop_epilogue (loop_vinfo); >> + >>/* Reduce loop iterations by the vectorization factor. */ >>scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf), >>expected_iterations / vf); >> >> IMHO, we should decouple main loop vectorization and epilogue >> vectorization as much as possible by sharing as few information as we >> can. The general idea is to handle epilogue loop just like normal >> short-trip loop. For example, we can rename >> LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something >> else), and we don't differentiate its meaning between main and >> epilogue(short-trip) loop. It only indicates the current loop should >> be vectorized with masking no matter it's a main loop or epilogue >> loop, and it works just like the current implementation. >> >> After this change, we can refine vectorization and make it more >> general for normal loop and epilogue(short trip) loop. 
For example, >> this implementation sets LOOP_VINFO_PEELING_FOR_NITER for epilogue >> loop and use it to control how it should be vectorized: >> + if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)) >> +{ >> + LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false; >> + LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false; >> +} >> + else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) >> + && min_profitable_combine_iters >= 0) >> +{ >> >> This works, but not that good for understanding or maintaining. >> >> Thanks, >> bin
Re: [PATCH] DECL_RTL and DECL_RTL_INCOMING in RTL dumps
On 11/08/2016 07:03 PM, David Malcolm wrote: int __RTL("rtl-combine") f1 (int n) { (function "f1" (param "n" (DECL_RTL (reg/v:SI %1 [ n ]) ) ;; DECL_RTL The ;; DECL_RTL etc. comments seem somewhat redundant and add clutter. Please remove those. Also, why is the closing paren on its own line? That doesn't seem right. Later (not for this patch) I'd really like to see some logic not to add linebreaks before simple expressions, so that we'd have (DECL_RTL (reg:SI xyz)) on a single line. Bernd
Re: [PATCH, vec-tails] Support loop epilogue vectorization
I am familiar with the SVE extension and understand that the implemented approach might not be suitable for ARM. The main point is that only load/store instructions are masked, while all other calculations are not (we did a special conversion for reduction statements to implement merging-predication semantics). For SVE, peeling for niters is not required, but that is not true for x86: we must determine which vectorization scheme is more profitable, loop combining (the only one essential for SVE) or separate epilogue vectorization using masking or a smaller vectorization factor. So I'd like to have the full list of required changes to our implementation so that I can try to address them. Thanks. Yuri. 2016-11-09 14:46 GMT+03:00 Bin.Cheng : > On Wed, Nov 9, 2016 at 11:28 AM, Yuri Rumyantsev wrote: >> Thanks Richard for your comments. >> Your proposed to handle epilogue loop just like normal short-trip loop >> but this is not exactly truth since e.g. epilogue must not be peeled >> for alignment. > Yes there must be some differences, my motivation is to minimize that > so we don't need to specially check normal/epilogue loops at too many > places. > Of course it's just my feeling when going through the patch set, and > could be wrong. > > Thanks, > bin >> >> It is not clear for me what are my next steps? Should I re-design the >> patch completely or simply decompose the whole patch to different >> parts? But it means that we must start review process from beginning >> but release is closed to its end. >> Note also that i work for Intel till the end of year and have not idea >> who will continue working on this project. >> >> Any help will be appreciated. >> >> Thanks. >> Yuri. >> >> 2016-11-09 13:37 GMT+03:00 Bin.Cheng : >>> On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev wrote: Hi All, I re-send all patches sent by Ilya earlier for review which support vectorization of loop epilogues and loops with low trip count. We assume that the only patch - vec-tails-07-combine-tail.patch - was not approved by Jeff.
I did re-base of all patches and performed bootstrapping and regression testing that did not show any new failures. Also all changes related to new vect_do_peeling algorithm have been changed accordingly. Is it OK for trunk? >>> >>> Hi, >>> I can't approve patches, but had some comments after going through the >>> implementation. >>> >>> One confusing part is cost model change, as well as the way it's used >>> to decide how epilogue loop should be vectorized. Given vect-tail is >>> disabled at the moment and the cost change needs further tuning, is it >>> reasonable to split this part out and get vectorization part >>> reviewed/submitted independently? For example, let user specified >>> parameters make the decision for now. Cost and target dependent >>> changes should go in at last, this could make the patch easier to >>> read. >>> >>> The implementation computes/shares quite amount information from main >>> loop to epilogue loop vectorization. Furthermore, variables/fields >>> for such information are somehow named in a misleading way. For >>> example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the >>> flag controlling whether epilogue loop should be vectorized with >>> masking. However, it's actually controlled by exactly the same flag >>> as whether epilogue loop should be combined into the main loop with >>> masking: >>> @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo) >>> >>>slpeel_make_loop_iterate_ntimes (loop, niters_vector); >>> >>> + if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo)) >>> +vect_combine_loop_epilogue (loop_vinfo); >>> + >>>/* Reduce loop iterations by the vectorization factor. */ >>>scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf), >>>expected_iterations / vf); >>> >>> IMHO, we should decouple main loop vectorization and epilogue >>> vectorization as much as possible by sharing as few information as we >>> can. The general idea is to handle epilogue loop just like normal >>> short-trip loop. 
For example, we can rename >>> LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something >>> else), and we don't differentiate its meaning between main and >>> epilogue(short-trip) loop. It only indicates the current loop should >>> be vectorized with masking no matter it's a main loop or epilogue >>> loop, and it works just like the current implementation. >>> >>> After this change, we can refine vectorization and make it more >>> general for normal loop and epilogue(short trip) loop. For example, >>> this implementation sets LOOP_VINFO_PEELING_FOR_NITER for epilogue >>> loop and use it to control how it should be vectorized: >>> + if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)) >>> +{ >>> + LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false; >>> + LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false; >>> +} >>> + else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) >>> +
[WIP C++ PATCH] P0217R3 - C++17 structured bindings
Hi! The following patch is a WIP on P0217R3 - decomposition declarations. It contains various FIXMEs; Jason, do you think you could finish it up? The most important unfinished part in the patch is that cp_finish_decomp for classes doesn't try to check if std::tuple_size::value is a well-formed integral constant expression and use decl.get() or get(decl) as initializers and std::tuple_element::type as the type of the individual vars (which then need cp_finish_decl and dropping of DECL_HAS_VALUE_EXPR_P/DECL_VALUE_EXPR if they have any) - template-ids, lookups and instantiations aren't something I'm comfortable enough with to write that. Another thing is with the non-reference decompositions - I think int arr[2]; ... auto [ x, y ] = arr; works properly as copy-initialization, avoids explicit copy constructors, as I've tried to test in decomp6.C, though the templates used during the VEC_INIT_EXPR gimplification aren't actually instantiated (see the fixme routine in the test that really shouldn't be necessary). But auto [ x, y ] { arr }; doesn't seem to work correctly, see the commented out part of decomp6.C; the cp-gimplify.c hunk is kind of hackish, but it seems to try to use the A::A conversion ctor while I believe it should use the A::A(const A &) copy ctor. Another thing is in find_decomp_class_base - I think the current code will just reject if the same base class with any non-static data members appears as a virtual base in more than one place in the bases tree; not really sure what should be done, how to check if the paths to that base are accessible etc. Another thing is in the find_decomp_class_base caller: while that function can return some base class if the only non-static data members are in such a base and not in the derived class(es), the caller isn't prepared to expand that; not sure if we need build_base_path or what to actually generate the COMPONENT_REFs for the artificial FIELD_DECLs or what.
Not sure about how the qualifiers from class fields and from the cv-qualifiers on the decomposition declaration should be treated; at the moment I'm ORing them in. The match.pd hunk is needed, otherwise the generic folding happily folds int arr[2]; ... auto [ x, y ] = arr; &x == &arr[0] into 0, because it thinks x and arr are distinct VAR_DECLs. Though, if such comparisons are required to be folded in constexpr contexts under certain conditions, we'd need to handle the DECL_VALUE_EXPRs in constexpr.c somehow. Mangling of the decomposition declaration base decls at namespace scope isn't implemented; I think the DECL_ASSEMBLER_NAME for them (they have NULL DECL_NAME) could be done in cp_finish_decomp, where we have all the corresponding identifiers available, but it would probably need to be done through direct calls to the demangler, as the base decl isn't all that is needed for that. Per IRC discussions, the associated VAR_DECLs are chained through DECL_CHAIN. It isn't that easy though, because decls are pushed into the bindings in reverse order and afterwards nreversed. Plus, at least for the (not yet implemented) std::tuple_size::value stuff where the VAR_DECLs shouldn't have DECL_VALUE_EXPR, the desired final order is that the nameless artificial base decl comes first and then the other decls in the order they appear. But that means that at cp_finish_decl time of the base decl they are actually in the order y, x, D.1234 for the above auto [ x, y ] = arr; - so the finalizing of those is done by a separate cp_finish_decomp that needs to know the start of the chain (the last named variable of the decomposition) plus the base artificial decl and count. For range for it uses the DECL_VALUE_EXPRs to find those; maybe it would be better to pass around during parsing not just the address of a tree (the range decl), but the address of a triplet (range decl, first decomp decl in the chain (i.e. last one) and count), so that range for parsing doesn't have to rediscover those.
The base decl has the DECL_DECOMPOSITION_P flag on it set and NULL_TREE DECL_NAME, the instantiation code in the patch doesn't parse DECL_VALUE_EXPRs though, because if the decomp initializer is not type dependent, those DECL_VALUE_EXPRs aren't actually in a standardized form that could be easily parseable, so it just assumes those named decls have also DECL_DECOMPOSITION_P flag on and non-NULL DECL_NAME and follow the base decl in DECL_CHAIN (the new ones created with tsubst_expr in reverse order again). If nothing attempts to fold stuff in templates, perhaps we could avoid setting DECL_VALUE_EXPRs at all when processing_template_decl? --- gcc/match.pd.jj 2016-11-07 18:32:56.0 +0100 +++ gcc/match.pd2016-11-08 14:00:05.391773322 +0100 @@ -2547,8 +2547,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (with { int equal = 2; - if (decl_in_symtab_p (base0) - && decl_in_symtab_p (base1)) + /* Punt in GENERIC on variables with value expressions; + the value expressions might point to fields/eleme
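For context, here is the surface behaviour the patch implements, in the two cases the mail discusses: array decomposition (the hidden copy-initialized base object with DECL_VALUE_EXPR accessors) and the tuple-like protocol described above as unfinished. A minimal C++17 sketch of the language feature itself, not of the front-end internals:

```cpp
#include <cassert>
#include <tuple>

// Array case: auto [x, y] = arr copy-initializes a hidden array
// object, and x/y name its elements.
int sum_array() {
  int arr[2] = {1, 2};
  auto [x, y] = arr;
  return x + y;
}

// Tuple-like case: std::tuple_size, std::tuple_element and get<>
// drive the decomposition; this is the protocol the mail describes
// as not yet wired up in cp_finish_decomp.
int sum_tuple() {
  std::tuple<int, int> t{3, 4};
  auto [a, b] = t;
  return a + b;
}
```

Building this sketch requires a compiler in C++17 mode (e.g. -std=c++1z with a GCC of that era).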
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On Wed, Nov 9, 2016 at 12:12 PM, Yuri Rumyantsev wrote: > I am familiar with SVE extension and understand that implemented > approach might be not suitable for ARM. The main point is that only > load/store instructions are masked but all other calculations are not > (we did special conversion for reduction statements to implement > merging predication semantic). For SVE peeling for niters is not > required but it is not true for x86 - we must determine what > vectorization scheme is more profitable: loop combining (the only > essential for SVE) or separate epilogue vectorization using masking or > less vectorization factor. So I'd like to have the full list of > required changes to our implementation to try to remove them. Hmm, sorry that my comment gave the impression that I was trying to hold back the patch; that's not what I meant by any means. Also, it's not related to SVE; as a matter of fact, I haven't read any document about SVE yet. Sorry again for the false impression conveyed by my previous messages. Thanks, bin > > Thanks. > Yuri. > > 2016-11-09 14:46 GMT+03:00 Bin.Cheng : >> On Wed, Nov 9, 2016 at 11:28 AM, Yuri Rumyantsev wrote: >>> Thanks Richard for your comments. >>> Your proposed to handle epilogue loop just like normal short-trip loop >>> but this is not exactly truth since e.g. epilogue must not be peeled >>> for alignment. >> Yes there must be some differences, my motivation is to minimize that >> so we don't need to specially check normal/epilogue loops at too many >> places. >> Of course it's just my feeling when going through the patch set, and >> could be wrong. >> >> Thanks, >> bin >>> >>> It is not clear for me what are my next steps? Should I re-design the >>> patch completely or simply decompose the whole patch to different >>> parts? But it means that we must start review process from beginning >>> but release is closed to its end.
>>> Note also that i work for Intel till the end of year and have not idea >>> who will continue working on this project. >>> >>> Any help will be appreciated. >>> >>> Thanks. >>> Yuri. >>> >>> 2016-11-09 13:37 GMT+03:00 Bin.Cheng : On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev wrote: > Hi All, > > I re-send all patches sent by Ilya earlier for review which support > vectorization of loop epilogues and loops with low trip count. We > assume that the only patch - vec-tails-07-combine-tail.patch - was not > approved by Jeff. > > I did re-base of all patches and performed bootstrapping and > regression testing that did not show any new failures. Also all > changes related to new vect_do_peeling algorithm have been changed > accordingly. > > Is it OK for trunk? Hi, I can't approve patches, but had some comments after going through the implementation. One confusing part is cost model change, as well as the way it's used to decide how epilogue loop should be vectorized. Given vect-tail is disabled at the moment and the cost change needs further tuning, is it reasonable to split this part out and get vectorization part reviewed/submitted independently? For example, let user specified parameters make the decision for now. Cost and target dependent changes should go in at last, this could make the patch easier to read. The implementation computes/shares quite amount information from main loop to epilogue loop vectorization. Furthermore, variables/fields for such information are somehow named in a misleading way. For example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the flag controlling whether epilogue loop should be vectorized with masking. 
However, it's actually controlled by exactly the same flag as whether epilogue loop should be combined into the main loop with masking: @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo) slpeel_make_loop_iterate_ntimes (loop, niters_vector); + if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo)) +vect_combine_loop_epilogue (loop_vinfo); + /* Reduce loop iterations by the vectorization factor. */ scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf), expected_iterations / vf); IMHO, we should decouple main loop vectorization and epilogue vectorization as much as possible by sharing as few information as we can. The general idea is to handle epilogue loop just like normal short-trip loop. For example, we can rename LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something else), and we don't differentiate its meaning between main and epilogue(short-trip) loop. It only indicates the current loop should be vectorized with masking no matter it's a main loop or epilogue loop, and it works just like the current implementation. After this change, we can refine vectorization and
Re: [RFC] Check number of uses in simplify_cond_using_ranges().
On Tue, Nov 8, 2016 at 5:18 PM, Marc Glisse wrote: > On Tue, 8 Nov 2016, Dominik Vogt wrote: > >> On Fri, Nov 04, 2016 at 01:54:20PM +0100, Richard Biener wrote: >>> >>> On Fri, Nov 4, 2016 at 12:08 PM, Dominik Vogt >>> wrote: On Fri, Nov 04, 2016 at 09:47:26AM +0100, Richard Biener wrote: > > On Thu, Nov 3, 2016 at 4:03 PM, Dominik Vogt > wrote: >> >> Is VRP the right pass to do this optimisation or should a later >> pass rather attempt to eliminate the new use of b_5 instead? Uli >> has brought up the idea a mini "sign extend elimination" pass that >> checks if the result of a sign extend could be replaced by the >> original quantity in all places, and if so, eliminate the ssa >> name. (I guess that won't help with the above code because l is >> used also as a function argument.) How could a sensible approach >> to deal with the situation look like? > > > We run into this kind of situation regularly and for general foldings > in match.pd we settled with single_use () even though it is not > perfect. > Note the usual complaint is not extra extension instructions but > the increase of register pressure. > > This is because it is hard to do better when you are doing local > optimization. > > As for the question on whether VRP is the right pass to do this the > answer is two-fold -- VRP has the most precise range information. > But the folding itself should be moved to generic code and use > get_range_info (). All right, is this a sensible approach then? >>> >>> >>> Yes. >>> 1. Using has_single_use as in the experimental patch is good enough (provided further testing does not show serious regressions). >>> >>> >>> I'd approve that, yes. >>> 2. Rip the whole top level if-block from simplify_cond_using_ranges(). 3. Translate that code to match.pd syntax. >>> >>> >>> Might be some work but yes, that's also desired (you'd lose the ability >>> to emit the warnings though). >> >> >> Could you give me a match-pd-hint please? 
We now have something >> like this: >> >> (simplify >> (cond (gt SSA_NAME@0 INTEGER_CST@1) @2 @3) >> (if (... many conditions ...) >> (cond (gt ... ...) @2 @3)) >> >> The generated code ends up in gimple_simplify_COND_EXPR, but when >> gimple_simplify is actually called, it goes through the >> GIMPLE_COND case and calls gimple_resimplify2(..., GT, ...) and >> there it tries gimple_simplify_GT_EXPR(), peeling of the (cond >> ...), i.e. it never tries the generated code. > > > Not sure what you mean here. > >> There is another pattern in match.pd that uses a (cond ...) as the >> first operand, and I do not understand how this works. Should we >> just use "(gt SSA_NAME@0 INTEGER_CST@1)" as the first operand >> instead, and wouldn't this pattern be too general that way? > > > IIUC, you are trying to move the second half of simplify_cond_using_ranges > to match.pd. I don't see any reason to restrict it to the case where the > comparison result is used directly in a COND_EXPR, so that would look like: > > (for cmp (...) > (simplify > (cmp (convert SSA_NAME@0) INTEGER_CST@1) > (if (...) >(cmp @0 (convert @1) > > maybe? I think I may have missed your point. Yeah, if you'd use (cond (gt ... then it only matches in assignments with COND_EXPRs on the RHS, _not_ in GIMPLE_CONDs. So you ran into the (cond vs. GIMPLE_COND "mismatch". You'd essentially implement sth similar to shorten_compare in match.pd. Btw, moving to match.pd shouldn't be a blocker for adding proper single_use tests just in case you get lost ... Richard. > (and yes, the first half would give a very general (simplify (cmp SSA_NAME@0 > INTEGER_CST@1) ...), that doesn't seem so bad) > > -- > Marc Glisse
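Concretely, the folding under discussion drops a widening conversion feeding a comparison when the constant fits in the narrower type. A scalar illustration of the before/after forms (hypothetical helper names, not the actual match.pd pattern, which would also need the range and single_use checks discussed above):

```cpp
#include <cassert>

// Before the fold: the int operand is sign-extended to long just to
// perform the comparison.
bool cmp_widened(int x, long c) { return static_cast<long>(x) > c; }

// After the fold: when c is known to fit in int, the comparison can
// be done in the narrower type and the extension of x disappears.
bool cmp_narrow(int x, long c) { return x > static_cast<int>(c); }
```

The equivalence only holds when the constant is representable in the narrower type (and, per the thread, the fold is only profitable when the extended value has no other uses).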
[Patch, fortran] PR44265 - Link error with reference to parameter array in specification expression
Dear All, The title of this PR says what this is all about, except that it is uniquely applicable to character function result string lengths. Ian Harvey wrote the first patch for this PR, for which many thanks. However, two issues came up that took a little while to understand: (i) In comment #1, it was found that calls from sibling procedures could generate streams of undefined references at the link stage; and (ii) An entity of the same name as the contained procedure entity in module scope caused similar problems. The relationship with Ian's patch is still obvious. The fundamental difference is that the parameter arrays are automatically promoted to module scope using a unique symtree. This fixes both the issues above. Dominique, could you please check that the -m32 issue has gone away? Bootstrapped and regtested on FC21/x86_64 - OK for trunk? Paul 2016-11-09 Paul Thomas PR fortran/44265 * gfortran.h : Add fn_result_spec bitfield to gfc_symbol. * resolve.c (flag_fn_result_spec): New function. (resolve_fntype): Call it for character result lengths. * symbol.c (gfc_new_symbol): Set fn_result_spec to zero. * trans-decl.c (gfc_sym_mangled_identifier): Include the procedure name in the mangled name for symbols with the fn_result_spec bit set. (gfc_get_symbol_decl): Mangle the name of these symbols. (gfc_create_module_variable): Allow them through the assert. (gfc_generate_function_code): Remove the assert before the initialization of sym->tlink because the frontend no longer uses this field. * trans-expr.c (gfc_map_intrinsic_function): Add a case to treat the LEN_TRIM intrinsic. 2016-11-09 Paul Thomas PR fortran/44265 * gfortran.dg/char_result_14.f90: New test. * gfortran.dg/char_result_15.f90: New test. -- The difference between genius and stupidity is; genius has its limits.
Albert Einstein Index: gcc/fortran/gfortran.h === *** gcc/fortran/gfortran.h (revision 241994) --- gcc/fortran/gfortran.h (working copy) *** typedef struct gfc_symbol *** 1498,1503 --- 1498,1505 unsigned equiv_built:1; /* Set if this variable is used as an index name in a FORALL. */ unsigned forall_index:1; + /* Set if the symbol is used in a function result specification . */ + unsigned fn_result_spec:1; /* Used to avoid multiple resolutions of a single symbol. */ unsigned resolved:1; /* Set if this is a module function or subroutine with the Index: gcc/fortran/resolve.c === *** gcc/fortran/resolve.c (revision 241994) --- gcc/fortran/resolve.c (working copy) *** resolve_equivalence (gfc_equiv *eq) *** 15732,15737 --- 15732,15776 } + /* Function called by resolve_fntype to flag other symbol used in the +length type parameter specification of function resuls. */ + + static bool + flag_fn_result_spec (gfc_expr *expr, + gfc_symbol *sym ATTRIBUTE_UNUSED, + int *f ATTRIBUTE_UNUSED) + { + gfc_namespace *ns; + gfc_symbol *s; + + if (expr->expr_type == EXPR_VARIABLE) + { + s = expr->symtree->n.sym; + for (ns = s->ns; ns; ns = ns->parent) + if (!ns->parent) + break; + + if (!s->fn_result_spec && s->ns->parent != NULL + && s->attr.flavor == FL_PARAMETER) + { + if (ns->proc_name && ns->proc_name->attr.flavor == FL_MODULE) + { + gfc_symtree *st; + s->fn_result_spec = 1; + /* Make sure that this symbol is translated as a module +variable. */ + st = gfc_get_unique_symtree (ns); + st->n.sym = s; + s->refs++; + } + else if (s->attr.use_assoc || s->attr.used_in_submodule) + s->fn_result_spec = 1; + } + } + return false; + } + + /* Resolve function and ENTRY types, issue diagnostics if needed. 
*/ static void *** resolve_fntype (gfc_namespace *ns) *** 15782,15787 --- 15821,15829 el->sym->attr.untyped = 1; } } + + if (sym->ts.type == BT_CHARACTER) + gfc_traverse_expr (sym->ts.u.cl->length, NULL, flag_fn_result_spec, 0); } Index: gcc/fortran/symbol.c === *** gcc/fortran/symbol.c(revision 241994) --- gcc/fortran/symbol.c(working copy) *** gfc_new_symbol (const char *name, gfc_na *** 2933,2938 --- 2933,2939 p->common_block = NULL; p->f2k_derived = NULL; p->assoc = NULL; + p->fn_result_spec = 0; return p; } Index: gcc/fortran/trans-decl.c === *** gcc/fortran/trans-decl.c(revision 241994) --- gcc/fortran/trans-decl.c
[PING][PATCH v2][aarch64] Add mcpu flag for Qualcomm falkor core
Ping! Siddhesh On 4 November 2016 at 21:17, Siddhesh Poyarekar wrote: > This adds an mcpu option for the upcoming Qualcomm Falkor core. This > is identical to the qdf24xx part that was added earlier and hence > retains the same tuning structure and continues to have the a57 > pipeline for now. The part number has also been changed and this > patch fixes this for both qdf24xx and falkor options. > > Tested with aarch64 and armhf. > > Siddhesh > > * gcc/config/aarch64/aarch64-cores.def (qdf24xx): Update part > number. > (falkor): New core. > * gcc/config/aarch64/aarch64-tune.md: Regenerated. > * gcc/config/arm/arm-cores.def (falkor): New core. > * gcc/config/arm/arm-tables.opt: Regenerated. > * gcc/config/arm/arm-tune.md: Regenerated. > * gcc/config/arm/bpabi.h (BE8_LINK_SPEC): Add falkor support. > * gcc/config/arm/t-aprofile (MULTILIB_MATCHES): Add falkor > support. > * gcc/doc/invoke.texi (AArch64 Options/-mtune): Add falkor. > (ARM Options/-mtune): Add falkor. > > --- > gcc/config/aarch64/aarch64-cores.def | 3 ++- > gcc/config/aarch64/aarch64-tune.md | 2 +- > gcc/config/arm/arm-cores.def | 1 + > gcc/config/arm/arm-tables.opt| 3 +++ > gcc/config/arm/arm-tune.md | 2 +- > gcc/config/arm/bpabi.h | 2 ++ > gcc/config/arm/t-aprofile| 1 + > gcc/doc/invoke.texi | 9 + > 8 files changed, 16 insertions(+), 7 deletions(-) > > diff --git a/gcc/config/aarch64/aarch64-cores.def > b/gcc/config/aarch64/aarch64-cores.def > index f9b7552..4b00f3f 100644 > --- a/gcc/config/aarch64/aarch64-cores.def > +++ b/gcc/config/aarch64/aarch64-cores.def > @@ -54,7 +54,8 @@ AARCH64_CORE("cortex-a73", cortexa73, cortexa57, 8A, > AARCH64_FL_FOR_ARCH8 | AA > AARCH64_CORE("exynos-m1", exynosm1, exynosm1, 8A, AARCH64_FL_FOR_ARCH8 > | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1, 0x53, 0x001) > > /* Qualcomm ('Q') cores. 
*/ > -AARCH64_CORE("qdf24xx", qdf24xx, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 > | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx, 0x51, 0x800) > +AARCH64_CORE("falkor", falkor,cortexa57, 8A, AARCH64_FL_FOR_ARCH8 > | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx, 0x51, 0xC00) > +AARCH64_CORE("qdf24xx", qdf24xx, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 > | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx, 0x51, 0xC00) > > /* Cavium ('C') cores. */ > AARCH64_CORE("thunderx",thunderx, thunderx, 8A, AARCH64_FL_FOR_ARCH8 > | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx, 0x43, 0x0a1) > diff --git a/gcc/config/aarch64/aarch64-tune.md > b/gcc/config/aarch64/aarch64-tune.md > index 022b131..29afcdf 100644 > --- a/gcc/config/aarch64/aarch64-tune.md > +++ b/gcc/config/aarch64/aarch64-tune.md > @@ -1,5 +1,5 @@ > ;; -*- buffer-read-only: t -*- > ;; Generated automatically by gentune.sh from aarch64-cores.def > (define_attr "tune" > - > "cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,exynosm1,qdf24xx,thunderx,xgene1,vulcan,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53" > + > "cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,exynosm1,falkor,qdf24xx,thunderx,xgene1,vulcan,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53" > (const (symbol_ref "((enum attr_tune) aarch64_tune)"))) > diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def > index 2072e1e..1bfec9c 100644 > --- a/gcc/config/arm/arm-cores.def > +++ b/gcc/config/arm/arm-cores.def > @@ -173,6 +173,7 @@ ARM_CORE("cortex-a57", cortexa57, cortexa57, 8A, > ARM_FSET_MAKE_CPU1 (FL_LDSCHED > ARM_CORE("cortex-a72", cortexa72, cortexa57, 8A, ARM_FSET_MAKE_CPU1 > (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a57) > ARM_CORE("cortex-a73", cortexa73, cortexa57, 8A, ARM_FSET_MAKE_CPU1 > (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a73) > ARM_CORE("exynos-m1", exynosm1, exynosm1,8A, ARM_FSET_MAKE_CPU1 > (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), exynosm1) > 
+ARM_CORE("falkor", falkor,cortexa57, 8A, ARM_FSET_MAKE_CPU1 > (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), qdf24xx) > ARM_CORE("qdf24xx",qdf24xx, cortexa57, 8A, ARM_FSET_MAKE_CPU1 > (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), qdf24xx) > ARM_CORE("xgene1", xgene1,xgene1, 8A,ARM_FSET_MAKE_CPU1 > (FL_LDSCHED | FL_FOR_ARCH8A),xgene1) > > diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt > index ee9e3bb..7b15c8c 100644 > --- a/gcc/config/arm/arm-tables.opt > +++ b/gcc/config/arm/arm-tables.opt > @@ -328,6 +328,9 @@ EnumValue > Enum(processor_type) String(exynos-m1) Value(exynosm1) > > EnumValue > +Enum(processor_type) String(falkor) Value(falkor) > + > +EnumValue > Enum(processor_type) String(qdf24xx) Value(qdf24xx) > > EnumValue > diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On Wed, 9 Nov 2016, Yuri Rumyantsev wrote: > Thanks Richard for your comments. > You proposed to handle the epilogue loop just like a normal short-trip loop, > but this is not exactly true since e.g. the epilogue must not be peeled > for alignment. But if we know the epilogue data-refs are aligned we should have reflected that in the code so the vectorizer wouldn't even try to peel for alignment. OTOH peeling for alignment already knows that peeling a short-trip loop is not likely profitable (maybe the condition needs to be hardened to also work for VF/2). > It is not clear to me what my next steps are. Should I re-design the > patch completely or simply decompose the whole patch into different > parts? But that means we must start the review process from the beginning, > and the release is close to its end. What I disliked about the series from the beginning is that it does everything at once rather than first introducing vectorizing of epilogues as an independent patch. Lumping it all together makes it hard to decipher the conditions under which each feature is enabled. I'm mostly concerned about the predication part, and thus if we can get the other parts separated and committed that would be a much smaller patch to look at and experiment with. Note that only stage1 is at its end; we usually still accept patches that were posted before the end of stage1 during stage3. > Note also that I work for Intel till the end of the year and have no idea > who will continue working on this project. Noted. Richard. > Any help will be appreciated. > > Thanks. > Yuri. > > 2016-11-09 13:37 GMT+03:00 Bin.Cheng : > > On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev wrote: > >> Hi All, > >> > >> I re-send all the patches sent by Ilya earlier for review, which support > >> vectorization of loop epilogues and loops with a low trip count. We > >> assume that the only patch - vec-tails-07-combine-tail.patch - was not > >> approved by Jeff.
> >> I did a re-base of all patches and performed bootstrapping and > >> regression testing that did not show any new failures. Also all > >> changes related to the new vect_do_peeling algorithm have been changed > >> accordingly. > >> > >> Is it OK for trunk? > > > > Hi, > > I can't approve patches, but had some comments after going through the > > implementation. > > > > One confusing part is the cost model change, as well as the way it's used > > to decide how the epilogue loop should be vectorized. Given vect-tail is > > disabled at the moment and the cost change needs further tuning, is it > > reasonable to split this part out and get the vectorization part > > reviewed/submitted independently? For example, let user-specified > > parameters make the decision for now. Cost and target-dependent > > changes should go in last; this could make the patch easier to > > read. > > > > The implementation computes/shares quite an amount of information from main > > loop to epilogue loop vectorization. Furthermore, the variables/fields > > for such information are named in a somewhat misleading way. For > > example, LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the > > flag controlling whether the epilogue loop should be vectorized with > > masking. However, it's actually controlled by exactly the same flag > > as whether the epilogue loop should be combined into the main loop with > > masking: > > @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo) > > > >slpeel_make_loop_iterate_ntimes (loop, niters_vector); > > > > + if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo)) > > +vect_combine_loop_epilogue (loop_vinfo); > > + > >/* Reduce loop iterations by the vectorization factor. */ > >scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf), > >expected_iterations / vf); > > > > IMHO, we should decouple main loop vectorization and epilogue > > vectorization as much as possible by sharing as little information as we > > can.
The general idea is to handle epilogue loop just like normal > > short-trip loop. For example, we can rename > > LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something > > else), and we don't differentiate its meaning between main and > > epilogue(short-trip) loop. It only indicates the current loop should > > be vectorized with masking no matter it's a main loop or epilogue > > loop, and it works just like the current implementation. > > > > After this change, we can refine vectorization and make it more > > general for normal loop and epilogue(short trip) loop. For example, > > this implementation sets LOOP_VINFO_PEELING_FOR_NITER for epilogue > > loop and use it to control how it should be vectorized: > > + if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)) > > +{ > > + LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false; > > + LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false; > > +} > > + else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) > > + && min_profitable_combine_iters >= 0) > > +{ > > > > This works, but not that good for understanding or mainta
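To make the terminology in this thread concrete, here is a minimal, hedged sketch (plain scalar C++, not actual vectorizer output; `VF = 4` and the function name are assumptions for illustration) of the main-loop/epilogue split under discussion: the main loop runs in steps of the vectorization factor, and the leftover iterations, at most VF - 1 of them, form the epilogue (tail) that the series tries to vectorize or mask rather than leave scalar.

```cpp
#include <cassert>

// Assumed vectorization factor; the inner j-loop stands in for one
// vector operation.
constexpr int VF = 4;

void add_arrays (int *a, const int *b, int n)
{
  int i = 0;
  for (; i + VF <= n; i += VF)      // "main" vectorized loop
    for (int j = 0; j < VF; ++j)    // conceptually a single vector add
      a[i + j] += b[i + j];
  for (; i < n; ++i)                // epilogue: at most VF - 1 iterations
    a[i] += b[i];
}
```

Combining the epilogue into the main loop with masking then corresponds to executing the last, partial vector iteration under a predicate instead of running the scalar tail at all.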
[PATCH, ARM] Enable ldrd/strd peephole rules unconditionally
Hi! This patch enables the ldrd/strd peephole rules unconditionally. It is meant to fix cases where the patch to reduce the sha512 stack usage splits ldrd/strd instructions into separate ldr/str insns, but it is technically independent of the other patch: see https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00523.html It was necessary to change check_effective_target_arm_prefer_ldrd_strd to retain the true prefer_ldrd_strd tuning flag. Bootstrapped and reg-tested on arm-linux-gnueabihf. Is it OK for trunk? Thanks, Bernd. 2016-11-02 Bernd Edlinger PR target/77308 * config/arm/arm.md (*thumb2_ldrd, *thumb2_ldrd_base, *thumb2_ldrd_base_neg, *thumb2_strd, *thumb2_strd_base, *thumb2_strd_base_neg): Recognize insn regardless of current_tune->prefer_ldrd_strd. * config/arm/ldrdstrd.md: Enable all ldrd/strd peephole rules whenever possible. testsuite: 2016-11-02 Bernd Edlinger PR target/77308 * gcc.target/arm/pr53447-5.c: New test. * lib/target-supports.exp (check_effective_target_arm_prefer_ldrd_strd): Adjust.
Index: gcc/config/arm/arm.md === --- gcc/config/arm/arm.md (revision 241816) +++ gcc/config/arm/arm.md (working copy) @@ -11636,7 +11636,6 @@ (mem:SI (plus:SI (match_dup 1) (match_operand:SI 4 "const_int_operand" ""] "TARGET_LDRD && TARGET_THUMB2 && reload_completed - && current_tune->prefer_ldrd_strd && ((INTVAL (operands[2]) + 4) == INTVAL (operands[4])) && (operands_ok_ldrd_strd (operands[0], operands[3], operands[1], INTVAL (operands[2]), @@ -11653,7 +11652,6 @@ (mem:SI (plus:SI (match_dup 1) (const_int 4] "TARGET_LDRD && TARGET_THUMB2 && reload_completed - && current_tune->prefer_ldrd_strd && (operands_ok_ldrd_strd (operands[0], operands[2], operands[1], 0, false, true))" "ldrd%?\t%0, %2, [%1]" @@ -11668,7 +11666,6 @@ (set (match_operand:SI 2 "s_register_operand" "=r") (mem:SI (match_dup 1)))] "TARGET_LDRD && TARGET_THUMB2 && reload_completed - && current_tune->prefer_ldrd_strd && (operands_ok_ldrd_strd (operands[0], operands[2], operands[1], -4, false, true))" "ldrd%?\t%0, %2, [%1, #-4]" @@ -11684,7 +11681,6 @@ (match_operand:SI 3 "const_int_operand" ""))) (match_operand:SI 4 "s_register_operand" "r"))] "TARGET_LDRD && TARGET_THUMB2 && reload_completed - && current_tune->prefer_ldrd_strd && ((INTVAL (operands[1]) + 4) == INTVAL (operands[3])) && (operands_ok_ldrd_strd (operands[2], operands[4], operands[0], INTVAL (operands[1]), @@ -11701,7 +11697,6 @@ (const_int 4))) (match_operand:SI 2 "s_register_operand" "r"))] "TARGET_LDRD && TARGET_THUMB2 && reload_completed - && current_tune->prefer_ldrd_strd && (operands_ok_ldrd_strd (operands[1], operands[2], operands[0], 0, false, false))" "strd%?\t%1, %2, [%0]" @@ -11716,7 +11711,6 @@ (set (mem:SI (match_dup 0)) (match_operand:SI 2 "s_register_operand" "r"))] "TARGET_LDRD && TARGET_THUMB2 && reload_completed - && current_tune->prefer_ldrd_strd && (operands_ok_ldrd_strd (operands[1], operands[2], operands[0], -4, false, false))" "strd%?\t%1, %2, [%0, #-4]" Index: gcc/config/arm/ldrdstrd.md === --- 
gcc/config/arm/ldrdstrd.md (revision 241816) +++ gcc/config/arm/ldrdstrd.md (working copy) @@ -29,9 +29,7 @@ (match_operand:SI 2 "memory_operand" "")) (set (match_operand:SI 1 "arm_general_register_operand" "") (match_operand:SI 3 "memory_operand" ""))] - "TARGET_LDRD - && current_tune->prefer_ldrd_strd - && !optimize_function_for_size_p (cfun)" + "TARGET_LDRD" [(const_int 0)] { if (!gen_operands_ldrd_strd (operands, true, false, false)) @@ -63,9 +61,7 @@ (match_operand:SI 0 "arm_general_register_operand" "")) (set (match_operand:SI 3 "memory_operand" "") (match_operand:SI 1 "arm_general_register_operand" ""))] - "TARGET_LDRD - && current_tune->prefer_ldrd_strd - && !optimize_function_for_size_p (cfun)" + "TARGET_LDRD" [(const_int 0)] { if (!gen_operands_ldrd_strd (operands, false, false, false)) @@ -102,9 +98,7 @@ (match_operand:SI 5 "const_int_operand" "")) (set (match_operand:SI 3 "memory_operand" "") (match_dup 1))] - "TARGET_LDRD - && current_tune->prefer_ldrd_strd - && !optimize_function_for_size_p (cfun)" + "TARGET_LDRD" [(const_int 0)] { if (!gen_operands_ldrd_strd (operands, false, true, false)) @@ -147,10 +141,8 @@ (match_dup 0)) (set (match_operand:SI 3 "memory_operand" "") (match_dup 1))] - "TARGET_LDRD - && c
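For readers unfamiliar with the peepholes being enabled above: they merge two loads (or stores) of adjacent 32-bit words into a single ldrd/strd double-word instruction. A hedged C++ illustration of the source-level access pattern they target (the function name is invented for this sketch; whether the peephole actually fires also depends on register allocation):

```cpp
#include <cassert>

// Loads of the two consecutive words src[0]/src[1] are candidates for a
// single ldrd, and the two consecutive stores for a single strd, once
// the peepholes in ldrdstrd.md run and the operand constraints are met.
void copy_pair (int *dst, const int *src)
{
  int lo = src[0];
  int hi = src[1];   // adjacent word: ldrd candidate with the load above
  dst[0] = lo;
  dst[1] = hi;       // adjacent word: strd candidate with the store above
}
```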
Re: [PATCH, RFC] Improve ivopts group costs
On Thu, Nov 3, 2016 at 4:00 PM, Bin.Cheng wrote: > On Thu, Nov 3, 2016 at 1:35 PM, Evgeny Kudryashov > wrote: >> Hello, >> >> I'm facing the following problem related to ivopts. The problem is that GCC >> generates a lot of induction variables and as a result there is an >> unnecessary increase of stack usage and register pressure. >> >> For instance, for the attached testcase (tc_ivopts.c) GCC generates 26 >> induction variables (25 for each of lhsX[{0-4}][{0-4}][k] and one for >> rhs[k][j][{0-2}]). You should be able to reproduce this issue on arm target >> using GCC with "-O2 -mcpu=cortex-a9 -mtune=cortex-a9". For reference, I'm >> attaching assembly I get on current trunk. >> >> The reason might be in use groups costs, in particular, in the way of >> estimation. Currently, the cost of a tuple (group, candidate) is a sum of >> per-use costs in the group. So, the cost of a group grows proportional to >> the number of uses in the group. This approach has a negative effect on the >> algorithm for finding the best set of induction variables: the part of a >> total cost related to adding a new candidate is almost washed out by the >> cost of the group. In addition, when there is a lot of groups with many uses >> inside and a target is out of free registers, the cost of spill is washing >> out too. As a result, GCC prefers to use native candidates (candidate >> created for a particular group) for each group of uses instead of >> considering the real cost of introducing a new variable into a set. >> >> The summing approach was added as a part of this patch >> (https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00641.html) and the >> motivation for taking the sum does not seem to be >> specifically discussed. >> >> I propose the following patch that changes a group cost from cost summing to >> selecting the largest one inside a group. 
For the given test case I have: >> the necessary stack size is decreased by almost 3 times and the number of ldr/str >> instructions is decreased by just under 2 times. Also I'm attaching the assembly after >> applying the patch. >> >> The essential change in the patch is just: >> >> diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c >> index f9211ad..a149418 100644 >> --- a/gcc/tree-ssa-loop-ivopts.c >> +++ b/gcc/tree-ssa-loop-ivopts.c >> @@ -5151,7 +5151,8 @@ determine_group_iv_cost_address (struct ivopts_data >> *data, >> offset and where the iv_use is. */ >> cost = get_computation_cost (data, next, cand, true, >> NULL, &can_autoinc, NULL); >> - sum_cost += cost; >> + if (sum_cost < cost) >> +sum_cost = cost; >> } >>set_group_iv_cost (data, group, cand, sum_cost, depends_on, >> NULL_TREE, ERROR_MARK, inv_expr); >> >> Any suggestions? > Hi Evgeny, > > I tend to be practical here. Given that cost is not modeled that well in > IVOPT, any heuristic tuning of costs is quite likely to result in > improvements in some cases and regressions in others. At least we > need to run a good number of benchmarks for such changes. Specific to > this one, the summing of costs in a group is a natural result of the > original code, in which the uses' costs were added up before candidate > selection. Take your code as an example: choosing the loop's original > candidate for a group results in lots of memory accesses with the [base + > index << scale] addressing mode, which could have a higher cost than the > [base + offset] mode wrt the u-arch; IMHO, it makes sense to add up this > cost. OTOH, I totally agree that IVOPT tends to choose too many > candidates at the moment, especially for large loops. Is it possible > to achieve the same effect by penalizing the register pressure cost? > Meanwhile, I can collect benchmark data for your patch on AArch64 and > get back to you later. I collected spec2k6 data on one AArch64 processor; it doesn't give a meaningful improvement or regression.
Looking at the test, it boils down to how we choose an induction variable for loops with memory references like the below: for (biv) MEM[base + biv << scale + off1]; MEM[base + biv << scale + off2]; // ... MEM[base + biv << scale + offn]; On targets that support the [base + index << scale + offset] addressing mode, the biv should be preferred (if the cost of the addressing mode is not blocking), thus register pressure is 2. On targets that only support [base + index << scale], it is more complicated. Choosing the biv either increases the register pressure by 1 compared to choosing {base_1 + biv << scale, step} as the candidate, or an additional computation outside of the address expression is required for each memory reference. Well, there is one exception when "base" is actually anticipated on the exit edge of this loop. So this is really a target-dependent cost model issue; IMHO, decreasing the group cost in a target-independent way is not that good, what do you think? I will look deeper into why choosing the biv can get rid of spilling. BTW, the case can be further extended: for (biv) MEM[base_1
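The access pattern discussed in this thread can be written out as a small, self-contained C++ example (the function name and constants are invented for illustration): several references in one group share a single induction variable and differ only in a constant offset, which is exactly the shape where summing per-use costs versus taking the maximum changes which candidate wins.

```cpp
#include <cassert>

// One candidate iv (k) can serve all three references if the target has a
// [base + index << scale + offset] addressing mode; on targets without an
// offset field, each constant offset may tempt IVOPTS into a separate
// induction variable, raising register pressure.
void scale_rows (int *base, int n)
{
  for (int k = 0; k < n; ++k)
    {
      base[k * 4 + 0] *= 2;   // MEM[base + k << scale + off0]
      base[k * 4 + 1] *= 2;   // MEM[base + k << scale + off1]
      base[k * 4 + 2] *= 2;   // MEM[base + k << scale + off2]
    }
}
```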
Re: [WIP C++ PATCH] P0217R3 - C++17 structured bindings
Hi! On Wed, Nov 09, 2016 at 01:24:22PM +0100, Jakub Jelinek wrote: > The following patch is a WIP on P0217R3 - decomposition declarations. > It contains various FIXMEs, Jason, do you think you could finish it up? Some more comments: Invalid? code like int arr[2]; extern int x, y; auto [ x, y ] = arr; depends on PR78217 fix, so I haven't added testsuite coverage for that yet. Nor for decomp at namespace scope. There is no coverage for bitfields either. And the testsuite coverage surely needs to have some verification of the exact types and cv quals of the individual decls, the tests only cover their addresses. As implemented in the patch, those decls with DECL_VALUE_EXPRs have non-reference type always, not sure if it is ok or not. Jakub
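For reference, the P0217R3 feature being implemented lets each identifier in the bracketed list bind to one element of the initializer. A minimal C++17 usage sketch of the array case mentioned above (the function name is invented for illustration):

```cpp
#include <cassert>

// Structured bindings to an array.  With plain 'auto' the array is
// copied first and x/y name the elements of the hidden copy; with
// 'auto &' the names refer directly to arr's elements.
int sum_pair ()
{
  int arr[2] = { 1, 2 };
  auto [x, y] = arr;     // binds to elements of a hidden copy of arr
  auto &[rx, ry] = arr;  // binds to arr[0] and arr[1] themselves
  (void) ry;
  rx = 10;               // modifies arr[0]; x still holds the old value 1
  return x + y + arr[0];
}
```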
[PATCH] Remove unneeded gcc_assert in gimplifier (PR sanitizer/78270)
As shown in the attached test-case, the assert cannot always be true. Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. Ready to be installed? Martin >From b55459461f3f7396a094be6801082715ddb4b30d Mon Sep 17 00:00:00 2001 From: marxin Date: Wed, 9 Nov 2016 11:52:00 +0100 Subject: [PATCH] Remove unneeded gcc_assert in gimplifier (PR sanitizer/78270) gcc/ChangeLog: 2016-11-09 Martin Liska PR sanitizer/78270 * gimplify.c (gimplify_switch_expr): gcc/testsuite/ChangeLog: 2016-11-09 Martin Liska PR sanitizer/78270 * gcc.dg/asan/pr78269.c: New test. --- gcc/gimplify.c | 1 - gcc/testsuite/gcc.dg/asan/pr78269.c | 13 + 2 files changed, 13 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/asan/pr78269.c diff --git a/gcc/gimplify.c b/gcc/gimplify.c index e5930e6..15976e2 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -2266,7 +2266,6 @@ gimplify_switch_expr (tree *expr_p, gimple_seq *pre_p) labels = gimplify_ctxp->case_labels; gimplify_ctxp->case_labels = saved_labels; - gcc_assert (gimplify_ctxp->live_switch_vars->elements () == 0); delete gimplify_ctxp->live_switch_vars; gimplify_ctxp->live_switch_vars = saved_live_switch_vars; diff --git a/gcc/testsuite/gcc.dg/asan/pr78269.c b/gcc/testsuite/gcc.dg/asan/pr78269.c new file mode 100644 index 000..55840b0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/asan/pr78269.c @@ -0,0 +1,13 @@ +// { dg-do compile } +// { dg-additional-options "-Wno-switch-unreachable" } + +typedef struct +{ +} bdaddr_t; + +int a; +void fn1 () +{ + switch (a) +&(bdaddr_t){}; +} -- 2.10.1
Re: [PATCH] Remove unneeded gcc_assert in gimplifier (PR sanitizer/78270)
On Wed, Nov 09, 2016 at 02:16:45PM +0100, Martin Liška wrote: > As shown in the attached test-case, the assert cannot always be true. > Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. > > Ready to be installed? > Martin > >From b55459461f3f7396a094be6801082715ddb4b30d Mon Sep 17 00:00:00 2001 > From: marxin > Date: Wed, 9 Nov 2016 11:52:00 +0100 > Subject: [PATCH] Remove unneeded gcc_assert in gimplifier (PR sanitizer/78270) > > gcc/ChangeLog: > > 2016-11-09 Martin Liska > > PR sanitizer/78270 > * gimplify.c (gimplify_switch_expr): No description on what you've changed. That said, I'm not 100% sure it is the right fix. As the testcase shows, for switch without GIMPLE_BIND wrapping the body we can have variables that are in scope from the switch onwards. I bet we could also have variables that go out of scope, say if in the compound literal's initializer there is ({ ... }) that declares variables. I doubt you can have a valid case/default label in those though, so perhaps it would be simpler not to create live_switch_vars at all if SWITCH_BODY is not a BIND_EXPR? Jakub
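To illustrate the case being discussed, a SWITCH_EXPR whose body is a bare statement rather than a compound statement, here is a hedged C++ rendering of that shape (names invented; whether the front end wraps such a body in a BIND_EXPR is the point at issue, so this only shows the source-level pattern):

```cpp
#include <cassert>

// The switch body below is a single expression statement, not a
// compound statement, so there is plausibly no BIND_EXPR around it.
// Without a braced scope there can be no valid case/default label
// inside, which is why tracking live_switch_vars buys nothing here.
int a;

void f ()
{
  switch (a)
    ++a;   // no case labels, so this body never executes
}
```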
Re: [PATCH] Remove unneeded gcc_assert in gimplifier (PR sanitizer/78270)
On 11/09/2016 02:29 PM, Jakub Jelinek wrote: > On Wed, Nov 09, 2016 at 02:16:45PM +0100, Martin Liška wrote: >> As shown in the attached test-case, the assert cannot always be true. >> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. >> >> Ready to be installed? >> Martin > >> >From b55459461f3f7396a094be6801082715ddb4b30d Mon Sep 17 00:00:00 2001 >> From: marxin >> Date: Wed, 9 Nov 2016 11:52:00 +0100 >> Subject: [PATCH] Remove unneeded gcc_assert in gimplifier (PR >> sanitizer/78270) >> >> gcc/ChangeLog: >> >> 2016-11-09 Martin Liska >> >> PR sanitizer/78270 >> * gimplify.c (gimplify_switch_expr): > > No description on what you've changed. > > That said, I'm not 100% sure it is the right fix. > As the testcase shows, for switch without GIMPLE_BIND wrapping the body > we can have variables that are in scope from the switch onwards. > I bet we could also have variables that go out of scope, say if in the > compound literal's initializer there is ({ ... }) that declares variables. > I doubt you can have a valid case/default label in those though, so > perhaps it would be simpler not to create live_switch_vars at all > if SWITCH_BODY is not a BIND_EXPR? I like the approach you introduced! I'll re-trigger regression tests and send a newer version of patch. Martin > > Jakub >
[PATCH] Fix folding of memcmp("a", "a", 2) (PR, tree-optimization/78257)
Hello. Following patch fixes [1] (f0 function), where we have off-by-one error. Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. Ready to be installed? Martin [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78257#c0 >From d7c46e5eb4d295d7653eae188cae04a8f7f0719f Mon Sep 17 00:00:00 2001 From: marxin Date: Wed, 9 Nov 2016 10:33:43 +0100 Subject: [PATCH] Fix folding of memcmp("a", "a", 2) (PR tree-optimization/78257) gcc/ChangeLog: 2016-11-09 Martin Liska * fold-const-call.c (fold_const_call): Fix the folding. gcc/testsuite/ChangeLog: 2016-11-09 Martin Liska * gcc.dg/tree-ssa/builtins-folding-generic.c (main): Add new test-case for memcmp. * gcc.dg/tree-ssa/builtins-folding-gimple.c: Likewise. --- gcc/fold-const-call.c| 10 +- gcc/gimplify.c | 1 - gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-generic.c | 5 + gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c | 6 ++ 4 files changed, 16 insertions(+), 6 deletions(-) diff --git a/gcc/fold-const-call.c b/gcc/fold-const-call.c index 1b3a755..38f9717 100644 --- a/gcc/fold-const-call.c +++ b/gcc/fold-const-call.c @@ -1506,7 +1506,7 @@ tree fold_const_call (combined_fn fn, tree type, tree arg0, tree arg1, tree arg2) { const char *p0, *p1; - size_t s2 = 0; + size_t s0, s1, s2 = 0; switch (fn) { case CFN_BUILT_IN_STRNCMP: @@ -1538,11 +1538,11 @@ fold_const_call (combined_fn fn, tree type, tree arg0, tree arg1, tree arg2) } case CFN_BUILT_IN_BCMP: case CFN_BUILT_IN_MEMCMP: - if ((p0 = c_getstr (arg0)) - && (p1 = c_getstr (arg1)) + if ((p0 = c_getstr (arg0, &s0)) + && (p1 = c_getstr (arg1, &s1)) && host_size_t_cst_p (arg2, &s2) - && s2 <= strlen (p0) - && s2 <= strlen (p1)) + && s2 <= s0 + && s2 <= s1) return build_cmp_result (type, memcmp (p0, p1, s2)); return NULL_TREE; diff --git a/gcc/gimplify.c b/gcc/gimplify.c index e5930e6..15976e2 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -2266,7 +2266,6 @@ gimplify_switch_expr (tree *expr_p, gimple_seq *pre_p) labels = 
gimplify_ctxp->case_labels; gimplify_ctxp->case_labels = saved_labels; - gcc_assert (gimplify_ctxp->live_switch_vars->elements () == 0); delete gimplify_ctxp->live_switch_vars; gimplify_ctxp->live_switch_vars = saved_live_switch_vars; diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-generic.c b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-generic.c index 175feff..3c7e001 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-generic.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-generic.c @@ -66,6 +66,10 @@ main (void) if (__builtin_strncasecmp ("aab", "aac", 2) != 0) __builtin_abort (); + /* MEMCMP. */ + if (__builtin_memcmp ("a", "a", 6) != 0) +__builtin_abort (); + return 0; } @@ -74,3 +78,4 @@ main (void) /* { dg-final { scan-tree-dump-not "__builtin_strncmp" "original" } } */ /* { dg-final { scan-tree-dump-not "__builtin_strncasecmp" "original" } } */ /* { dg-final { scan-tree-dump-not "__builtin_memchr" "original" } } */ +/* { dg-final { scan-tree-dump-not "__builtin_memcmp" "original" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c index 283bd1c..d08ced5 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c @@ -63,6 +63,7 @@ main (void) const char *a = "a"; const char *hello = "hello"; + const char *hello2 = "hello"; const char *empty = ""; const char *ab = "ab"; const char *ba = "ba"; @@ -151,6 +152,10 @@ main (void) if (__builtin_strncasecmp (++s2, ++s3+2, 0) != 0 || s2 != s1+1 || s3 != s1+5) __builtin_abort(); + /* MEMCMP. 
*/ + if (__builtin_memcmp (hello, hello2, 6) != 0) +__builtin_abort (); + return 0; } @@ -158,4 +163,5 @@ main (void) /* { dg-final { scan-tree-dump-not "__builtin_strcasecmp" "optimized" } } */ /* { dg-final { scan-tree-dump-not "__builtin_strncmp" "optimized" } } */ /* { dg-final { scan-tree-dump-not "__builtin_memchr" "optimized" } } */ +/* { dg-final { scan-tree-dump-not "__builtin_memcmp" "optimized" } } */ /* { dg-final { scan-tree-dump-times "__builtin_strncasecmp" 3 "optimized" } } */ -- 2.10.1
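The off-by-one the patch above fixes: the old code bounded the foldable length by strlen(), which excludes the terminating NUL, so a memcmp whose length covers the NUL byte of a string literal was rejected even though the result is fully known at compile time. A hedged, stand-alone illustration (plain C++, independent of the GCC internals; the function name is invented):

```cpp
#include <cassert>
#include <cstring>

// "a" occupies 2 bytes ('a' and '\0'), but strlen("a") == 1.  Bounding a
// memcmp fold by strlen therefore rejects this perfectly valid 2-byte
// compare; bounding by the literal's object size (c_getstr with a size
// out-parameter) accepts it.
int compare_with_nul ()
{
  return std::memcmp ("a", "a", 2);   // compares 'a' and the trailing '\0'
}
```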
Re: [PATCH] Fix folding of memcmp("a", "a", 2) (PR, tree-optimization/78257)
On Wed, Nov 9, 2016 at 2:56 PM, Martin Liška wrote: > Hello. > > Following patch fixes [1] (f0 function), where we have off-by-one error. > Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. > > Ready to be installed? Ok (w/o the unrelated gimplify.c hunk). Richard. > Martin > > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78257#c0
[PATCH] Remove unused parameter from tree_swap_operands_p
Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2016-11-09 Richard Biener * fold-const.c (tree_swap_operands_p): Remove unused arg. * fold-const.c (tree_swap_operands_p): Likewise. (fold_binary_loc): Adjust. (fold_ternary_loc): Likewise. * genmatch.c (dt_operand::gen_gimple_exp): Likewise. * gimple-fold.c (fold_stmt_1): Likewise. * gimple-match-head.c (gimple_resimplify2): Likewise. (gimple_resimplify3): Likewise. (gimple_simplify): Likewise. * tree-ssa-dom.c (record_equality): Likewise. * tree-ssa-reassoc.c (optimize_range_tests_var_bound): Likewise. * tree-ssa-sccvn.c (vn_nary_op_compute_hash): Likewise. * tree-ssa-threadedge.c (simplify_control_stmt_condition_1): Likewise. diff --git a/gcc/fold-const.c b/gcc/fold-const.c index 603aff0..a774848 100644 --- a/gcc/fold-const.c +++ b/gcc/fold-const.c @@ -9159,13 +9159,13 @@ fold_binary_loc (location_t loc, /* If this is a commutative operation, and ARG0 is a constant, move it to ARG1 to reduce the number of tests below. */ if (commutative_tree_code (code) - && tree_swap_operands_p (arg0, arg1, true)) + && tree_swap_operands_p (arg0, arg1)) return fold_build2_loc (loc, code, type, op1, op0); /* Likewise if this is a comparison, and ARG0 is a constant, move it to ARG1 to reduce the number of tests below. */ if (kind == tcc_comparison - && tree_swap_operands_p (arg0, arg1, true)) + && tree_swap_operands_p (arg0, arg1)) return fold_build2_loc (loc, swap_tree_comparison (code), type, op1, op0); tem = generic_simplify (loc, code, type, op0, op1); @@ -11271,7 +11271,7 @@ fold_ternary_loc (location_t loc, enum tree_code code, tree type, /* If this is a commutative operation, and OP0 is a constant, move it to OP1 to reduce the number of tests below. 
*/ if (commutative_ternary_tree_code (code) - && tree_swap_operands_p (op0, op1, true)) + && tree_swap_operands_p (op0, op1)) return fold_build3_loc (loc, code, type, op1, op0, op2); tem = generic_simplify (loc, code, type, op0, op1, op2); @@ -11400,7 +11400,7 @@ fold_ternary_loc (location_t loc, enum tree_code code, tree type, /* If the second operand is simpler than the third, swap them since that produces better jump optimization results. */ if (truth_value_p (TREE_CODE (arg0)) - && tree_swap_operands_p (op1, op2, false)) + && tree_swap_operands_p (op1, op2)) { location_t loc0 = expr_location_or (arg0, loc); /* See if this can be inverted. If it can't, possibly because diff --git a/gcc/fold-const.h b/gcc/fold-const.h index ae37142..46dcd28 100644 --- a/gcc/fold-const.h +++ b/gcc/fold-const.h @@ -124,7 +124,7 @@ extern tree build_invariant_address (tree, tree, HOST_WIDE_INT); extern tree constant_boolean_node (bool, tree); extern tree div_if_zero_remainder (const_tree, const_tree); -extern bool tree_swap_operands_p (const_tree, const_tree, bool); +extern bool tree_swap_operands_p (const_tree, const_tree); extern enum tree_code swap_tree_comparison (enum tree_code); extern bool ptr_difference_const (tree, tree, HOST_WIDE_INT *); diff --git a/gcc/genmatch.c b/gcc/genmatch.c index b14034d..41951c5 100644 --- a/gcc/genmatch.c +++ b/gcc/genmatch.c @@ -2701,7 +2701,7 @@ dt_operand::gen_gimple_expr (FILE *f, int indent) gen_opname (child_opname0, 0); gen_opname (child_opname1, 1); fprintf_indent (f, indent, - "if (tree_swap_operands_p (%s, %s, false))\n", + "if (tree_swap_operands_p (%s, %s))\n", child_opname0, child_opname1); fprintf_indent (f, indent, " std::swap (%s, %s);\n", diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c index 5d46405..aabc8ff 100644 --- a/gcc/gimple-fold.c +++ b/gcc/gimple-fold.c @@ -4166,7 +4166,7 @@ fold_stmt_1 (gimple_stmt_iterator *gsi, bool inplace, tree (*valueize) (tree)) { tree rhs1 = gimple_assign_rhs1 (stmt); tree rhs2 = 
gimple_assign_rhs2 (stmt); - if (tree_swap_operands_p (rhs1, rhs2, false)) + if (tree_swap_operands_p (rhs1, rhs2)) { gimple_assign_set_rhs1 (stmt, rhs2); gimple_assign_set_rhs2 (stmt, rhs1); @@ -4232,7 +4232,7 @@ fold_stmt_1 (gimple_stmt_iterator *gsi, bool inplace, tree (*valueize) (tree)) /* Canonicalize operand order. */ tree lhs = gimple_cond_lhs (stmt); tree rhs = gimple_cond_rhs (stmt); - if (tree_swap_operands_p (lhs, rhs, false)) + if (tree_swap_operands_p (lhs, rhs)) { gcond *gc = as_a (stmt); gimple_cond_set_lhs (gc, rhs); diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c index 4eab90d..09f729f 100644 --- a/gcc/gimple-ma
Re: [PATCH] Fix folding of memcmp("a", "a", 2) (PR, tree-optimization/78257)
On 11/09/2016 02:58 PM, Richard Biener wrote: > On Wed, Nov 9, 2016 at 2:56 PM, Martin Liška wrote: >> Hello. >> >> Following patch fixes [1] (f0 function), where we have off-by-one error. >> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. >> >> Ready to be installed? > > Ok (w/o the unrelated gimplify.c hunk). Thanks, installed as r242000, and sorry for the wrong hunk. As the patch fixed just half of the issues in the PR, should I close it or keep it open for tracking (thus maybe changing it to an enhancement)? Thanks, Martin > > Richard. > >> Martin >> >> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78257#c0
[arm-embedded][PATCH, gcc/ARM, ping] Add support for Cortex-M23
Hi, We have decided to backport this patch to add support for ARM Cortex-M23 to our embedded-6-branch. 2016-11-09 Thomas Preud'homme Backport from mainline 2016-11-04 Thomas Preud'homme gcc/ * config/arm/arm-arches.def (armv8-m.base): Set Cortex-M23 as representative core for this architecture. * config/arm/arm-cores.def (cortex-m23): Define new processor. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm-tune.md: Likewise. * config/arm/arm.c (arm_v6m_tune): Add Cortex-M23 to the list of cores this tuning parameters apply to in the comment. * config/arm/bpabi.h (BE8_LINK_SPEC): Add Cortex-M23 to the list of valid -mcpu options. * doc/invoke.texi (ARM Options): Document new Cortex-M23 processor. Best regards, Thomas --- Begin Message --- On 02/11/16 10:13, Kyrill Tkachov wrote: On 02/11/16 10:07, Thomas Preudhomme wrote: Ping? Best regards, Thomas On 26/10/16 17:42, Thomas Preudhomme wrote: Hi, This patch adds support for the Cortex-M23 processor launched by ARM [1]. The patch adds support for the name and wires it up to the ARMv8-M Baseline architecture and arm_v6m_tune tuning parameters for the time being. It also updates documentation to mention this new processor. [1] http://www.arm.com/products/processors/cortex-m/cortex-m23-processor.php ChangeLog entry is as follows: *** gcc/Changelog *** 2016-10-26 Thomas Preud'homme * config/arm/arm-arches.def (armv8-m.base): Set Cortex-M23 as representative core for this architecture. * config/arm/arm-cores.def (cortex-m23): Define new processor. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm-tune.md: Likewise. * config/arm/arm.c (arm_v6m_tune): Add Cortex-M23 to the list of cores this tuning parameters apply to in the comment. * config/arm/bpabi.h (BE8_LINK_SPEC): Add Cortex-M23 to the list of valid -mcpu options. * doc/invoke.texi (ARM Options): Document new Cortex-M23 processor. Tested by building libgcc and libstdc++ for Cortex-M23 and running a hello world compiled for it. Is this ok for master? Ok. 
Thanks, Kyrill Committed, thanks. Best regards, Thomas --- End Message ---
[arm-embedded][PATCH, gcc/ARM, ping] Add support for Cortex-M33
Hi, We have decided to backport this patch to add support for ARM Cortex-M33 to our embedded-6-branch. 2016-11-09 Thomas Preud'homme Backport from mainline 2016-11-04 Thomas Preud'homme gcc/ * config/arm/arm-arches.def (armv8-m.main+dsp): Set Cortex-M33 as representative core for this architecture. * config/arm/arm-cores.def (cortex-m33): Define new processor. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm-tune.md: Likewise. * config/arm/bpabi.h (BE8_LINK_SPEC): Add Cortex-M33 to the list of valid -mcpu options. * doc/invoke.texi (ARM Options): Document new Cortex-M33 processor. Best regards, Thomas --- Begin Message --- On 02/11/16 10:14, Kyrill Tkachov wrote: On 02/11/16 10:07, Thomas Preudhomme wrote: Ping? Best regards, Thomas On 26/10/16 17:42, Thomas Preudhomme wrote: Hi, This patch adds support for the Cortex-M33 processor launched by ARM [1]. The patch adds support for the name and wires it up to the ARMv8-M Mainline with DSP extensions architecture and arm_v7m_tune tuning parameters for the time being. It also updates documentation to mention this new processor. [1] http://www.arm.com/products/processors/cortex-m/cortex-m33-processor.php ChangeLog entry is as follows: *** gcc/Changelog *** 2016-10-26 Thomas Preud'homme * config/arm/arm-arches.def (armv8-m.main+dsp): Set Cortex-M33 as representative core for this architecture. * config/arm/arm-cores.def (cortex-m33): Define new processor. * config/arm/arm-tables.opt: Regenerate. * config/arm/arm-tune.md: Likewise. * config/arm/bpabi.h (BE8_LINK_SPEC): Add Cortex-M33 to the list of valid -mcpu options. * doc/invoke.texi (ARM Options): Document new Cortex-M33 processor. Tested by building libgcc and libstdc++ for Cortex-M33 and running a hello world compiled for it. Is this ok for master? Ok. Thanks, Kyrill Committed, thanks. Best regards, Thomas --- End Message ---
Re: [RFC] Check number of uses in simplify_cond_using_ranges().
On Wed, Nov 09, 2016 at 01:43:58PM +0100, Richard Biener wrote: > On Tue, Nov 8, 2016 at 5:18 PM, Marc Glisse wrote: > > On Tue, 8 Nov 2016, Dominik Vogt wrote: > > > >> On Fri, Nov 04, 2016 at 01:54:20PM +0100, Richard Biener wrote: > >>> > >>> On Fri, Nov 4, 2016 at 12:08 PM, Dominik Vogt > >>> wrote: > > On Fri, Nov 04, 2016 at 09:47:26AM +0100, Richard Biener wrote: > > > > On Thu, Nov 3, 2016 at 4:03 PM, Dominik Vogt > > wrote: > >> > >> Is VRP the right pass to do this optimisation or should a later > >> pass rather attempt to eliminate the new use of b_5 instead? Uli > >> has brought up the idea a mini "sign extend elimination" pass that > >> checks if the result of a sign extend could be replaced by the > >> original quantity in all places, and if so, eliminate the ssa > >> name. (I guess that won't help with the above code because l is > >> used also as a function argument.) How could a sensible approach > >> to deal with the situation look like? > > > > > > We run into this kind of situation regularly and for general foldings > > in match.pd we settled with single_use () even though it is not > > perfect. > > Note the usual complaint is not extra extension instructions but > > the increase of register pressure. > > > > This is because it is hard to do better when you are doing local > > optimization. > > > > As for the question on whether VRP is the right pass to do this the > > answer is two-fold -- VRP has the most precise range information. > > But the folding itself should be moved to generic code and use > > get_range_info (). > > > All right, is this a sensible approach then? > >>> > >>> > >>> Yes. > >>> > 1. Using has_single_use as in the experimental patch is good > enough (provided further testing does not show serious > regressions). > >>> > >>> > >>> I'd approve that, yes. > >>> > 2. Rip the whole top level if-block from simplify_cond_using_ranges(). > 3. Translate that code to match.pd syntax. 
> >>> > >>> > >>> Might be some work but yes, that's also desired (you'd lose the ability > >>> to emit the warnings though). > >> > >> > >> Could you give me a match-pd-hint please? We now have something > >> like this: > >> > >> (simplify > >> (cond (gt SSA_NAME@0 INTEGER_CST@1) @2 @3) > >> (if (... many conditions ...) > >> (cond (gt ... ...) @2 @3)) > >> > >> The generated code ends up in gimple_simplify_COND_EXPR, but when > >> gimple_simplify is actually called, it goes through the > >> GIMPLE_COND case and calls gimple_resimplify2(..., GT, ...) and > >> there it tries gimple_simplify_GT_EXPR(), peeling of the (cond > >> ...), i.e. it never tries the generated code. > > > > > > Not sure what you mean here. > > > >> There is another pattern in match.pd that uses a (cond ...) as the > >> first operand, and I do not understand how this works. Should we > >> just use "(gt SSA_NAME@0 INTEGER_CST@1)" as the first operand > >> instead, and wouldn't this pattern be too general that way? > > > > > > IIUC, you are trying to move the second half of simplify_cond_using_ranges > > to match.pd. I don't see any reason to restrict it to the case where the > > comparison result is used directly in a COND_EXPR, so that would look like: > > > > (for cmp (...) > > (simplify > > (cmp (convert SSA_NAME@0) INTEGER_CST@1) > > (if (...) > >(cmp @0 (convert @1) > > > > maybe? I think I may have missed your point. > > Yeah, if you'd use (cond (gt ... then it only matches in assignments > with COND_EXPRs on the RHS, _not_ in GIMPLE_CONDs. > > So you ran into the (cond vs. GIMPLE_COND "mismatch". > > You'd essentially implement sth similar to shorten_compare in match.pd. Something like the attached patch? Robin and me have spent quite some time to figure out the new pattern. Two questions: 1) In the match expression you cannot just use SSA_NAME@0 because then the "case SSA_NAME:" is added to a switch for another pattern that already has that label. 
Thus we made that "proxy" predicate "ssa_name_p" that forces the code for the new pattern out of the old switch and into a separate code block. We couldn't figure out whether this joining of case labels is a feature in the matching language. So, is this the right way to deal with the conflicting labels? 2) There's an awful lot of if-clauses in the replacement expression. Is it the right place to do all this checking? > Btw, moving to match.pd shouldn't be a blocker for adding proper > single_use tests > just in case you get lost ... Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany >From 26f75d777254c7e29d721c6813eca33539ac6574 Mon Sep 17 00:00:00 2001 From: Dominik Vogt Date: Wed, 2 Nov 2016 14:01:46 +0100 Subject: [PATCH] Convert folding in simplify_cond_using_ranges() to match.pd. In cases like this ac_4 = *p_3(D); bc_5 = ac_4
Re: [PATCH] Fix folding of memcmp("a", "a", 2) (PR tree-optimization/78257)
On Wed, Nov 9, 2016 at 3:08 PM, Martin Liška wrote: > On 11/09/2016 02:58 PM, Richard Biener wrote: >> On Wed, Nov 9, 2016 at 2:56 PM, Martin Liška wrote: >>> Hello. >>> >>> The following patch fixes [1] (f0 function), where we have an off-by-one error. >>> The patch can bootstrap on ppc64le-redhat-linux and survives regression tests. >>> >>> Ready to be installed? >> >> Ok (w/o the unrelated gimplify.c hunk). > > Thanks, installed as r242000 and sorry for the wrong hunk. > As the patch fixed just half of the issues in the PR, should I close > it or should I keep it for tracking (thus maybe changing to enhancement?)? Keep it open for the other issues - this one was minor and easy to fix. Richard. > Thanks, > Martin > > >> >> Richard. >> >>> Martin >>> >>> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78257#c0 >
Re: [patch, avr] Add flash size to device info and make wrap around default
On 09.11.2016 10:14, Pitchumani Sivanupandi wrote: On Tuesday 08 November 2016 02:57 PM, Georg-Johann Lay wrote: On 08.11.2016 08:08, Pitchumani Sivanupandi wrote: I have updated the patch to include the flash size as well. Took that info from device headers (it was fed into crt's device information note section also). The new option would render -mn-flash superfluous, but we should keep it for backward compatibility. Ok. Shouldn't link_pmem_wrap then be removed from link_relax, i.e. from LINK_RELAX_SPEC? And what happens if relaxation is off? Yes. Removed link_pmem_wrap from link_relax. Disabling relaxation doesn't change -mpmem-wrap-around behavior. Now, wrap-around behavior is changed as follows: For 8K flash devices: Device specs adds --pmem-wrap-around=8k linker option if -mno-pmem-wrap-around is NOT enabled. It makes the --pmem-wrap-around=8k linker option default for 8k flash devices. For 16/32/64K flash devices: Spec string 'link_pmem_wrap' added to all 16/32/64k flash devices specs. Otherwise no changes, i.e. it adds --pmem-wrap-around=16/32/64k option if -mpmem-wrap-around option is enabled. For other devices, no changes in device specs. Reg tested with default and -mrelax options enabled. No issues. Regards, Pitchumani gcc/ChangeLog 2016-11-08 Pitchumani Sivanupandi * config/avr/avr-arch.h (avr_mcu_t): Add flash_size member. * config/avr/avr-devices.c (avr_mcu_types): Add flash size info. * config/avr/avr-mcu.def: Likewise. * config/avr/gen-avr-mmcu-specs.c (print_mcu): Remove hard-coded prefix check to find wrap-around value, instead use MCU flash size. For 8k flash devices, update link_pmem_wrap spec string to add --pmem-wrap-around=8k. * config/avr/specs.h: Remove link_pmem_wrap from LINK_RELAX_SPEC and add to linker specs (LINK_SPEC) directly. flashsize-and-wrap-around.patch diff --git a/gcc/config/avr/avr-mcus.def b/gcc/config/avr/avr-mcus.def index 6bcc6ff..9d4aa1a 100644 /* Classic, == 128K. 
*/ -AVR_MCU ("avr31",ARCH_AVR31, AVR_ERRATA_SKIP, NULL,0x0060, 0x0, 2) -AVR_MCU ("atmega103",ARCH_AVR31, AVR_ERRATA_SKIP, "__AVR_ATmega103__", 0x0060, 0x0, 2) -AVR_MCU ("at43usb320", ARCH_AVR31, AVR_ISA_NONE, "__AVR_AT43USB320__", 0x0060, 0x0, 2) +AVR_MCU ("avr31",ARCH_AVR31, AVR_ERRATA_SKIP, NULL,0x0060, 0x0, 2, 0x2) +AVR_MCU ("atmega103",ARCH_AVR31, AVR_ERRATA_SKIP, "__AVR_ATmega103__", 0x0060, 0x0, 2, 0x2) +AVR_MCU ("at43usb320", ARCH_AVR31, AVR_ISA_NONE, "__AVR_AT43USB320__", 0x0060, 0x0, 2, 0x1) This looks incorrect: either .flash_size should be 0x2 or n_flash be 1. As you draw the information from internal hardware descriptions, I'd guess that the new information is more reliable? ...as this also determines whether AT43USB320 supports ELPM, this also means that the device is in the wrong multilib set? I couldn't find the data sheet at atmel.com, and the one I found on the net only mentions 64KiB program memory. It mentions LPM on pp. 9 but has no reference to the supported instruction set. From the manual I would conclude that this device should be avr3, not avr31? Yes. I couldn't find any source other than the datasheet on the net. This device does not have internal program memory. Flash size is set to 64K as the device could address only 64K. No RAMPZ register, so no ELPM. Moved device to AVR3 architecture. /* Classic, > 8K, <= 64K. */ -AVR_MCU ("avr3", ARCH_AVR3, AVR_ISA_NONE, NULL, 0x0060, 0x0, 1) -AVR_MCU ("at43usb355", ARCH_AVR3, AVR_ISA_NONE, "__AVR_AT43USB355__", 0x0060, 0x0, 1) -AVR_MCU ("at76c711", ARCH_AVR3, AVR_ISA_NONE, "__AVR_AT76C711__", 0x0060, 0x0, 1) +AVR_MCU ("avr3", ARCH_AVR3, AVR_ISA_NONE, NULL, 0x0060, 0x0, 1, 0x6000) +AVR_MCU ("at43usb355", ARCH_AVR3, AVR_ISA_NONE, "__AVR_AT43USB355__", 0x0060, 0x0, 1, 0x6000) +AVR_MCU ("at76c711", ARCH_AVR3, AVR_ISA_NONE, "__AVR_AT76C711__", 0x0060, 0x0, 1, 0x4000) +AVR_MCU ("at43usb320", ARCH_AVR3, AVR_ISA_NONE, "__AVR_AT43USB320__", 0x0060, 0x0, 1, 0x1) /* Classic, == 128K. 
*/ -AVR_MCU ("avr31",ARCH_AVR31, AVR_ERRATA_SKIP, NULL, 0x0060, 0x0, 2) -AVR_MCU ("atmega103",ARCH_AVR31, AVR_ERRATA_SKIP, "__AVR_ATmega103__", 0x0060, 0x0, 2) -AVR_MCU ("at43usb320", ARCH_AVR31, AVR_ISA_NONE, "__AVR_AT43USB320__", 0x0060, 0x0, 2) +AVR_MCU ("avr31",ARCH_AVR31, AVR_ERRATA_SKIP, NULL, 0x0060, 0x0, 2, 0x2) +AVR_MCU ("atmega103",ARCH_AVR31, AVR_ERRATA_SKIP, "__AVR_ATmega103__", 0x0060, 0x0, 2, 0x2) /* Classic + MOVW + JMP/CALL. */ If at43usb320 is in the wrong multilib, then this should be handled as separate issue / patch together with its own PR. Sorry for the confusion. I just noticed that so
Re: [RFC] Check number of uses in simplify_cond_using_ranges().
On Wed, Nov 9, 2016 at 3:30 PM, Dominik Vogt wrote: > On Wed, Nov 09, 2016 at 01:43:58PM +0100, Richard Biener wrote: >> On Tue, Nov 8, 2016 at 5:18 PM, Marc Glisse wrote: >> > On Tue, 8 Nov 2016, Dominik Vogt wrote: >> > >> >> On Fri, Nov 04, 2016 at 01:54:20PM +0100, Richard Biener wrote: >> >>> >> >>> On Fri, Nov 4, 2016 at 12:08 PM, Dominik Vogt >> >>> wrote: >> >> On Fri, Nov 04, 2016 at 09:47:26AM +0100, Richard Biener wrote: >> > >> > On Thu, Nov 3, 2016 at 4:03 PM, Dominik Vogt >> > wrote: >> >> >> >> Is VRP the right pass to do this optimisation or should a later >> >> pass rather attempt to eliminate the new use of b_5 instead? Uli >> >> has brought up the idea a mini "sign extend elimination" pass that >> >> checks if the result of a sign extend could be replaced by the >> >> original quantity in all places, and if so, eliminate the ssa >> >> name. (I guess that won't help with the above code because l is >> >> used also as a function argument.) How could a sensible approach >> >> to deal with the situation look like? >> > >> > >> > We run into this kind of situation regularly and for general foldings >> > in match.pd we settled with single_use () even though it is not >> > perfect. >> > Note the usual complaint is not extra extension instructions but >> > the increase of register pressure. >> > >> > This is because it is hard to do better when you are doing local >> > optimization. >> > >> > As for the question on whether VRP is the right pass to do this the >> > answer is two-fold -- VRP has the most precise range information. >> > But the folding itself should be moved to generic code and use >> > get_range_info (). >> >> >> All right, is this a sensible approach then? >> >>> >> >>> >> >>> Yes. >> >>> >> 1. Using has_single_use as in the experimental patch is good >> enough (provided further testing does not show serious >> regressions). >> >>> >> >>> >> >>> I'd approve that, yes. >> >>> >> 2. 
Rip the whole top level if-block from simplify_cond_using_ranges(). >> 3. Translate that code to match.pd syntax. >> >>> >> >>> >> >>> Might be some work but yes, that's also desired (you'd lose the ability >> >>> to emit the warnings though). >> >> >> >> >> >> Could you give me a match-pd-hint please? We now have something >> >> like this: >> >> >> >> (simplify >> >> (cond (gt SSA_NAME@0 INTEGER_CST@1) @2 @3) >> >> (if (... many conditions ...) >> >> (cond (gt ... ...) @2 @3)) >> >> >> >> The generated code ends up in gimple_simplify_COND_EXPR, but when >> >> gimple_simplify is actually called, it goes through the >> >> GIMPLE_COND case and calls gimple_resimplify2(..., GT, ...) and >> >> there it tries gimple_simplify_GT_EXPR(), peeling of the (cond >> >> ...), i.e. it never tries the generated code. >> > >> > >> > Not sure what you mean here. >> > >> >> There is another pattern in match.pd that uses a (cond ...) as the >> >> first operand, and I do not understand how this works. Should we >> >> just use "(gt SSA_NAME@0 INTEGER_CST@1)" as the first operand >> >> instead, and wouldn't this pattern be too general that way? >> > >> > >> > IIUC, you are trying to move the second half of simplify_cond_using_ranges >> > to match.pd. I don't see any reason to restrict it to the case where the >> > comparison result is used directly in a COND_EXPR, so that would look like: >> > >> > (for cmp (...) >> > (simplify >> > (cmp (convert SSA_NAME@0) INTEGER_CST@1) >> > (if (...) >> >(cmp @0 (convert @1) >> > >> > maybe? I think I may have missed your point. >> >> Yeah, if you'd use (cond (gt ... then it only matches in assignments >> with COND_EXPRs on the RHS, _not_ in GIMPLE_CONDs. >> >> So you ran into the (cond vs. GIMPLE_COND "mismatch". >> >> You'd essentially implement sth similar to shorten_compare in match.pd. > > Something like the attached patch? Robin and me have spent quite > some time to figure out the new pattern. 
Two questions: > > 1) In the match expression you cannot just use SSA_NAME@0 because >then the "case SSA_NAME:" is added to a switch for another >pattern that already has that label. Thus we made that "proxy" >predicate "ssa_name_p" that forces the code for the new pattern >out of the old switch and into a separate code block. We >couldn't figure out whether this joining of case labels is a >feature in the matching language. So, is this the right way to >deal with the conflicting labels? No, just do not match SSA_NAME. And instead of + (with { gimple *def_stmt = SSA_NAME_DEF_STMT (@0); } + (if (is_gimple_assign (def_stmt) + && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def_stmt))) you instead want to change the pattern to (simplify (cmp (convert @0) INTEGER_CST@1) @0 will then be your innerop note that you can't use get_value_range but you have
Re: [PATCH 7/7, GCC, ARM, V8M] Added support for ARMV8-M Security Extension cmse_nonsecure_caller intrinsic
On 09/11/16 10:26, Kyrill Tkachov wrote: > > @@ -1832,6 +1834,17 @@ arm_init_builtins (void) > = add_builtin_function ("__builtin_arm_stfscr", ftype_set_fpscr, > ARM_BUILTIN_SET_FPSCR, BUILT_IN_MD, NULL, NULL_TREE); > } > + > + if (arm_arch_cmse) > +{ > + tree ftype_cmse_nonsecure_caller > += build_function_ty > > Should this be use_cmse ? > This looks ok to me otherwise. > I believe patch [6/7] is the only one needing approval after this... > > Kyrill Hi, Yeah it should indeed be 'use_cmse'. Here is the reworked version, no changes to ChangeLog. Cheers, Andre diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index e73043db6db69fa64bb1e72cf71a36d7169062db..232d2de52106324e9f22b535a1b02cb95a2294b7 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -528,6 +528,8 @@ enum arm_builtins ARM_BUILTIN_GET_FPSCR, ARM_BUILTIN_SET_FPSCR, + ARM_BUILTIN_CMSE_NONSECURE_CALLER, + #undef CRYPTO1 #undef CRYPTO2 #undef CRYPTO3 @@ -1832,6 +1834,17 @@ arm_init_builtins (void) = add_builtin_function ("__builtin_arm_stfscr", ftype_set_fpscr, ARM_BUILTIN_SET_FPSCR, BUILT_IN_MD, NULL, NULL_TREE); } + + if (use_cmse) +{ + tree ftype_cmse_nonsecure_caller + = build_function_type_list (unsigned_type_node, NULL); + arm_builtin_decls[ARM_BUILTIN_CMSE_NONSECURE_CALLER] + = add_builtin_function ("__builtin_arm_cmse_nonsecure_caller", + ftype_cmse_nonsecure_caller, + ARM_BUILTIN_CMSE_NONSECURE_CALLER, BUILT_IN_MD, + NULL, NULL_TREE); +} } /* Return the ARM builtin for CODE. 
*/ @@ -2452,6 +2465,12 @@ arm_expand_builtin (tree exp, emit_insn (pat); return target; +case ARM_BUILTIN_CMSE_NONSECURE_CALLER: + target = gen_reg_rtx (SImode); + op0 = arm_return_addr (0, NULL_RTX); + emit_insn (gen_addsi3 (target, op0, const1_rtx)); + return target; + case ARM_BUILTIN_TEXTRMSB: case ARM_BUILTIN_TEXTRMUB: case ARM_BUILTIN_TEXTRMSH: diff --git a/gcc/config/arm/arm_cmse.h b/gcc/config/arm/arm_cmse.h index 894343bb835b61e09c14668d45aa43a8693fd011..82b58b1c4f4a12ba6062e2cc2632653788d0eeb7 100644 --- a/gcc/config/arm/arm_cmse.h +++ b/gcc/config/arm/arm_cmse.h @@ -163,6 +163,13 @@ __attribute__ ((__always_inline__)) cmse_TTAT (void *__p) __CMSE_TT_ASM (at) +/* FIXME: diagnose use outside cmse_nonsecure_entry functions. */ +__extension__ static __inline int __attribute__ ((__always_inline__)) +cmse_nonsecure_caller (void) +{ + return __builtin_arm_cmse_nonsecure_caller (); +} + #define CMSE_AU_NONSECURE 2 #define CMSE_MPU_NONSECURE 16 #define CMSE_NONSECURE 18 diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 462e6c71e20791b35f02adabfc97b9b013fd296a..88f6e014c3def8e6a2d2452df5d4937a4f0dd1ef 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -12625,6 +12625,7 @@ cmse_address_info_t cmse_TTAT_fptr (FPTR) void * cmse_check_address_range (void *, size_t, int) typeof(p) cmse_nsfptr_create (FPTR p) intptr_t cmse_is_nsfptr (FPTR) +int cmse_nonsecure_caller (void) @end smallexample @node AVR Built-in Functions diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-1.c b/gcc/testsuite/gcc.target/arm/cmse/cmse-1.c index d5b9a2d9d59569de170da814ae660e9fb2b943e7..ddcf12a30a6c1806969d239c448da81ccf49532e 100644 --- a/gcc/testsuite/gcc.target/arm/cmse/cmse-1.c +++ b/gcc/testsuite/gcc.target/arm/cmse/cmse-1.c @@ -65,3 +65,32 @@ int foo (char * p) /* { dg-final { scan-assembler-times "ttat " 2 } } */ /* { dg-final { scan-assembler-times "bl.cmse_check_address_range" 7 } } */ /* { dg-final { scan-assembler-not "cmse_check_pointed_object" } } */ + 
+typedef int (*int_ret_funcptr_t) (void); +typedef int __attribute__ ((cmse_nonsecure_call)) (*int_ret_nsfuncptr_t) (void); + +int __attribute__ ((cmse_nonsecure_entry)) +baz (void) +{ + return cmse_nonsecure_caller (); +} + +int __attribute__ ((cmse_nonsecure_entry)) +qux (int_ret_funcptr_t int_ret_funcptr) +{ + int_ret_nsfuncptr_t int_ret_nsfunc_ptr; + + if (cmse_is_nsfptr (int_ret_funcptr)) +{ + int_ret_nsfunc_ptr = cmse_nsfptr_create (int_ret_funcptr); + return int_ret_nsfunc_ptr (); +} + return 0; +} +/* { dg-final { scan-assembler "baz:" } } */ +/* { dg-final { scan-assembler "__acle_se_baz:" } } */ +/* { dg-final { scan-assembler-not "\tcmse_nonsecure_caller" } } */ +/* { dg-final { scan-rtl-dump "and.*reg.*const_int 1" expand } } */ +/* { dg-final { scan-assembler "bic" } } */ +/* { dg-final { scan-assembler "push\t\{r4, r5, r6" } } */ +/* { dg-final { scan-assembler "msr\tAPSR_nzcvq" } } */
Re: [PATCHv2 6/7, GCC, ARM, V8M] ARMv8-M Security Extension's cmse_nonsecure_call: use __gnu_cmse_nonsecure_call
On 27/10/16 11:01, Andre Vieira (lists) wrote: > On 25/10/16 17:30, Andre Vieira (lists) wrote: >> On 24/08/16 12:01, Andre Vieira (lists) wrote: >>> On 25/07/16 14:26, Andre Vieira (lists) wrote: This patch extends support for the ARMv8-M Security Extensions 'cmse_nonsecure_call' to use a new library function '__gnu_cmse_nonsecure_call'. This library function is responsible for (without using r0-r3 or d0-d7): 1) saving and clearing all callee-saved registers using the secure stack 2) clearing the LSB of the address passed in r4 and using blxns to 'jump' to it 3) clearing ASPR, including the 'ge bits' if DSP is enabled 4) clearing FPSCR if using non-soft float-abi 5) restoring callee-saved registers. The decisions whether to include DSP 'ge bits' clearing and floating point registers (single/double precision) all depends on the multilib used. See Section 5.5 of ARM®v8-M Security Extensions (http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/index.html). *** gcc/ChangeLog *** 2016-07-25 Andre Vieira Thomas Preud'homme * config/arm/arm.c (detect_cmse_nonsecure_call): New. (cmse_nonsecure_call_clear_caller_saved): New. (arm_reorg): Use cmse_nonsecure_call_clear_caller_saved. * config/arm/arm-protos.h (detect_cmse_nonsecure_call): New. * config/arm/arm.md (call): Handle cmse_nonsecure_entry. (call_value): Likewise. (nonsecure_call_internal): New. (nonsecure_call_value_internal): New. * config/arm/thumb1.md (*nonsecure_call_reg_thumb1_v5): New. (*nonsecure_call_value_reg_thumb1_v5): New. * config/arm/thumb2.md (*nonsecure_call_reg_thumb2): New. (*nonsecure_call_value_reg_thumb2): New. * config/arm/unspecs.md (UNSPEC_NONSECURE_MEM): New. *** libgcc/ChangeLog *** 2016-07-25 Andre Vieira Thomas Preud'homme * config/arm/cmse_nonsecure_call.S: New. * config/arm/t-arm: Compile cmse_nonsecure_call.S *** gcc/testsuite/ChangeLog *** 2016-07-25 Andre Vieira Thomas Preud'homme * gcc.target/arm/cmse/cmse.exp: Run tests in mainline dir. 
* gcc.target/arm/cmse/cmse-9.c: Added some extra tests. * gcc.target/arm/cmse/baseline/bitfield-4.c: New. * gcc.target/arm/cmse/baseline/bitfield-5.c: New. * gcc.target/arm/cmse/baseline/bitfield-6.c: New. * gcc.target/arm/cmse/baseline/bitfield-7.c: New. * gcc.target/arm/cmse/baseline/bitfield-8.c: New. * gcc.target/arm/cmse/baseline/bitfield-9.c: New. * gcc.target/arm/cmse/baseline/bitfield-and-union-1.c: New. * gcc.target/arm/cmse/baseline/cmse-11.c: New. * gcc.target/arm/cmse/baseline/cmse-13.c: New. * gcc.target/arm/cmse/baseline/cmse-6.c: New. * gcc/testsuite/gcc.target/arm/cmse/baseline/union-1.c: New. * gcc/testsuite/gcc.target/arm/cmse/baseline/union-2.c: New. * gcc.target/arm/cmse/mainline/hard-sp/cmse-13.c: New. * gcc.target/arm/cmse/mainline/hard-sp/cmse-7.c: New. * gcc.target/arm/cmse/mainline/hard-sp/cmse-8.c: New. * gcc.target/arm/cmse/mainline/hard/cmse-13.c: New. * gcc.target/arm/cmse/mainline/hard/cmse-7.c: New. * gcc.target/arm/cmse/mainline/hard/cmse-8.c: New. * gcc.target/arm/cmse/mainline/soft/cmse-13.c: New. * gcc.target/arm/cmse/mainline/soft/cmse-7.c: New. * gcc.target/arm/cmse/mainline/soft/cmse-8.c: New. * gcc.target/arm/cmse/mainline/softfp-sp/cmse-7.c: New. * gcc.target/arm/cmse/mainline/softfp-sp/cmse-8.c: New. * gcc.target/arm/cmse/mainline/softfp/cmse-13.c: New. * gcc.target/arm/cmse/mainline/softfp/cmse-7.c: New. * gcc.target/arm/cmse/mainline/softfp/cmse-8.c: New. >>> >>> Updated this patch to correctly clear only the cumulative >>> exception-status (0-4,7) and the condition code bits (28-31) of the FPSCR. >>> >>> >>> >>> This patch extends support for the ARMv8-M Security Extensions >>> 'cmse_nonsecure_call' to use a new library function >>> '__gnu_cmse_nonsecure_call'. 
This library function is responsible for >>> (without using r0-r3 or d0-d7): >>> 1) saving and clearing all callee-saved registers using the secure stack >>> 2) clearing the LSB of the address passed in r4 and using blxns to >>> 'jump' to it >>> 3) clearing ASPR, including the 'ge bits' if DSP is enabled >>> 4) clearing the cumulative exception-status (0-4, 7) and the condition >>> bits (28-31) of the FPSCR if using non-soft float-abi >>> 5) restoring callee-saved registers. >>> >>> The decisions whether to include DSP 'ge bits' clearing and floating >>> point registers (single/doubl
Re: [Patch, fortran] PR44265 - Link error with reference to parameter array in specification expression
Sorry for the bad news, but while gfortran regtests without regression with the patch I still get a link error with the original test: % gfc pr44265.f90 Undefined symbols for architecture x86_64: "___fruits_MOD_names", referenced from: _MAIN__ in ccyeNqa1.o ld: symbol(s) not found for architecture x86_64 collect2: error: ld returned 1 exit status and its variants. Thanks for working on this issue, Dominique > On 9 Nov 2016 at 13:47, Paul Richard Thomas > wrote: > > Dear All, > > The title of this PR says what this is all about, except that it > applies uniquely to character function result string > lengths. > > Ian Harvey wrote the first patch for this PR for which many thanks. > However, two issues came up that took some little while to understand; > (i) In comment #1, it was found that calls from sibling procedures > could generate streams of undefined references at the link stage; and > (ii) An entity of the same name as the contained procedure entity in > module scope caused similar problems. > > The relationship with Ian's patch is still obvious. The fundamental > difference is that the parameter arrays are automatically promoted to > module scope using a unique symtree. This fixes both the issues above. > > Dominique, could you please check that the -m32 issue has gone away? > > Bootstrapped and regtested on FC21/x86_64 - OK for trunk? > > Paul > > 2016-11-09 Paul Thomas > >PR fortran/44265 >* gfortran.h : Add fn_result_spec bitfield to gfc_symbol. >* resolve.c (flag_fn_result_spec): New function. >(resolve_fntype): Call it for character result lengths. >* symbol.c (gfc_new_symbol): Set fn_result_spec to zero. >* trans-decl.c (gfc_sym_mangled_identifier): Include the >procedure name in the mangled name for symbols with the >fn_result_spec bit set. >(gfc_get_symbol_decl): Mangle the name of these symbols. >(gfc_create_module_variable): Allow them through the assert. 
>(gfc_generate_function_code): Remove the assert before the >initialization of sym->tlink because the frontend no longer >uses this field. >* trans-expr.c (gfc_map_intrinsic_function): Add a case to >treat the LEN_TRIM intrinsic. > > 2016-11-09 Paul Thomas > >PR fortran/44265 >* gfortran.dg/char_result_14.f90: New test. >* gfortran.dg/char_result_15.f90: New test. > > > -- > The difference between genius and stupidity is; genius has its limits. > > Albert Einstein >
Re: [PATCH] fix PR68468
On 11/05/2016 06:14 PM, Waldemar Brodkorb wrote: Hi, the following patch fixes PR68468. The patch has been used for a while in Buildroot without issues. 2016-11-05 Waldemar Brodkorb PR gcc/68468 * libgcc/unwind-dw2-fde-dip.c: Fix build on FDPIC targets. This is ok. Bernd
[PATCH] Add testcase for PR middle-end/77718
Hi! I've noticed this PR is still open, but has been fixed, though apparently Bernd's patch from the PR has been applied as is with the needed tab eaten (instead 7 spaces) and no testcase has been added. Tested with cross-compiler and the r240625 change reverted and as is and running the resulting assembler on ppc64le. Ok for trunk? 2016-11-09 Jakub Jelinek PR target/77718 * builtins.c (expand_builtin_memcmp): Formatting fix. * gcc.c-torture/execute/pr77718.c: New test. --- gcc/ChangeLog.jj2016-11-09 15:22:28.0 +0100 +++ gcc/ChangeLog 2016-11-09 16:29:35.152056326 +0100 @@ -5027,7 +5027,8 @@ 2016-09-29 Bernd Schmidt - * builtins.c (expand_builtin_memcmp): don't swap args unless + PR target/77718 + * builtins.c (expand_builtin_memcmp): Don't swap args unless result is only being compared with zero. 2016-09-29 Marek Polacek --- gcc/builtins.c.jj 2016-10-31 13:28:06.0 +0100 +++ gcc/builtins.c 2016-11-09 16:19:08.886909150 +0100 @@ -3754,7 +3754,7 @@ expand_builtin_memcmp (tree exp, rtx tar { src_str = c_getstr (arg1); if (src_str != NULL) - std::swap (arg1_rtx, arg2_rtx); + std::swap (arg1_rtx, arg2_rtx); } /* If SRC is a string constant and block move would be done --- gcc/testsuite/gcc.c-torture/execute/pr77718.c.jj2016-11-09 16:28:30.414868074 +0100 +++ gcc/testsuite/gcc.c-torture/execute/pr77718.c 2016-11-09 16:22:06.0 +0100 @@ -0,0 +1,25 @@ +/* PR middle-end/77718 */ + +char a[64] __attribute__((aligned (8))); + +__attribute__((noinline, noclone)) int +foo (void) +{ + return __builtin_memcmp ("bb", a, 6); +} + +__attribute__((noinline, noclone)) int +bar (void) +{ + return __builtin_memcmp (a, "bb", 6); +} + +int +main () +{ + __builtin_memset (a, 'a', sizeof (a)); + if (((foo () < 0) ^ ('a' > 'b')) + || ((bar () < 0) ^ ('a' < 'b'))) +__builtin_abort (); + return 0; +} Jakub
[PING][PATCH][aarch64] Improve Logical And Immediate Expressions
Ping. Link to original post: https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02305.html
Re: [PATCH] Fix folding of memcmp("a", "a", 2) (PR tree-optimization/78257)
This patch breaks bootstrap on AIX and probably on any 32 bit host. Thanks, David /nasfarm/edelsohn/src/src/gcc/fold-const-call.c: In function 'tree_node* fold_const_call(combined_fn, tree, tree, tree, tree)': /nasfarm/edelsohn/src/src/gcc/fold-const-call.c:1541:36: error: cannot convert 'size_t* {aka long unsigned int*}' to 'long long unsigned int*' for argument '2' to 'const char* c_getstr(tree, long long unsigned int*)' if ((p0 = c_getstr (arg0, &s0)) ^ /nasfarm/edelsohn/src/src/gcc/fold-const-call.c:1542:32: error: cannot convert 'size_t* {aka long unsigned int*}' to 'long long unsigned int*' for argument '2' to 'const char* c_getstr(tree, long long unsigned int*)' && (p1 = c_getstr (arg1, &s1)) ^ make[3]: *** [fold-const-call.o] Error 1
[PATCH, Fortran] PR78259: ICE in gfc_trans_subcomponent_assign, at fortran/trans-expr.c:7330
All, Will commit the below to trunk as an obvious fix for PR78259. (The regression was introduced in r241626 from https://gcc.gnu.org/ml/fortran/2016-10/msg00206.html). The patch clears regtests. --- Fritz Reese From: Fritz O. Reese Date: Wed, 9 Nov 2016 10:59:17 -0500 Subject: [PATCH] Fix ICE in gfc_trans_subcomponent_assign due to NULL components. PR fortran/78259 * gcc/fortran/trans-expr.c (gfc_trans_subcomponent_assign): Guard against NULL values. PR fortran/78259 * gcc/testsuite/gfortran.dg/pr78259.f90: New test. --- diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c index 7159b17..0352916 100644 --- a/gcc/fortran/trans-expr.c +++ b/gcc/fortran/trans-expr.c @@ -7321,7 +7321,8 @@ gfc_trans_subcomponent_assign (tree dest, gfc_component * gfc_constructor *c = gfc_constructor_first (expr->value.constructor); /* We mark that the entire union should be initialized with a contrived EXPR_NULL expression at the beginning. */ - if (c->n.component == NULL && c->expr->expr_type == EXPR_NULL) + if (c != NULL && c->n.component == NULL + && c->expr != NULL && c->expr->expr_type == EXPR_NULL) { tmp = build2_loc (input_location, MODIFY_EXPR, void_type_node, dest, build_constructor (TREE_TYPE (dest), NULL)); diff --git a/gcc/testsuite/gfortran.dg/pr78259.f90 b/gcc/testsuite/gfortran.dg/p new file mode 100644 index 000..82f48ea --- /dev/null +++ b/gcc/testsuite/gfortran.dg/pr78259.f90 @@ -0,0 +1,22 @@ +! { dg-do "compile" } +! { dg-options "-fdec-structure" } +! +! PR fortran/78259 +! +! ICE in gfc_trans_subcomponent_assign +! + +subroutine sub + structure /s/ +union + map +integer n(2) + end map + map +integer(8) m /2/ + end map +end union + end structure + record /s/ r + r.n(1) = 1 +end
Re: [PATCH] Add testcase for PR middle-end/77718
On November 9, 2016 4:34:27 PM GMT+01:00, Jakub Jelinek wrote: >Hi! > >I've noticed this PR is still open, but has been fixed, though >apparently Bernd's patch from the PR has been applied as is with >the needed tab eaten (instead 7 spaces) and no testcase has been added. > >Tested with cross-compiler and the r240625 change reverted and as is >and running the resulting assembler on ppc64le. > >Ok for trunk? OK. Richard. >2016-11-09 Jakub Jelinek > > PR target/77718 > * builtins.c (expand_builtin_memcmp): Formatting fix. > > * gcc.c-torture/execute/pr77718.c: New test. > >--- gcc/ChangeLog.jj 2016-11-09 15:22:28.0 +0100 >+++ gcc/ChangeLog 2016-11-09 16:29:35.152056326 +0100 >@@ -5027,7 +5027,8 @@ > > 2016-09-29 Bernd Schmidt > >- * builtins.c (expand_builtin_memcmp): don't swap args unless >+ PR target/77718 >+ * builtins.c (expand_builtin_memcmp): Don't swap args unless > result is only being compared with zero. > > 2016-09-29 Marek Polacek >--- gcc/builtins.c.jj 2016-10-31 13:28:06.0 +0100 >+++ gcc/builtins.c 2016-11-09 16:19:08.886909150 +0100 >@@ -3754,7 +3754,7 @@ expand_builtin_memcmp (tree exp, rtx tar > { > src_str = c_getstr (arg1); > if (src_str != NULL) >- std::swap (arg1_rtx, arg2_rtx); >+ std::swap (arg1_rtx, arg2_rtx); > } > > /* If SRC is a string constant and block move would be done >--- gcc/testsuite/gcc.c-torture/execute/pr77718.c.jj 2016-11-09 >16:28:30.414868074 +0100 >+++ gcc/testsuite/gcc.c-torture/execute/pr77718.c 2016-11-09 >16:22:06.0 +0100 >@@ -0,0 +1,25 @@ >+/* PR middle-end/77718 */ >+ >+char a[64] __attribute__((aligned (8))); >+ >+__attribute__((noinline, noclone)) int >+foo (void) >+{ >+ return __builtin_memcmp ("bb", a, 6); >+} >+ >+__attribute__((noinline, noclone)) int >+bar (void) >+{ >+ return __builtin_memcmp (a, "bb", 6); >+} >+ >+int >+main () >+{ >+ __builtin_memset (a, 'a', sizeof (a)); >+ if (((foo () < 0) ^ ('a' > 'b')) >+ || ((bar () < 0) ^ ('a' < 'b'))) >+__builtin_abort (); >+ return 0; >+} > > Jakub
[PATCH] (v2) print-rtl-function.c: add (param) directive to dump
On Wed, 2016-11-09 at 12:59 +0100, Bernd Schmidt wrote: > On 11/08/2016 07:03 PM, David Malcolm wrote: > > int __RTL("rtl-combine") f1 (int n) > > { > > (function "f1" > > (param "n" > > (DECL_RTL > > (reg/v:SI %1 [ n ]) > > ) ;; DECL_RTL > > The ;; DECL_RTL etc. comments seem somewhat redundant and add > clutter. > Please remove those. Done. > Also, why is the closing paren on its own line? That doesn't seem > right. It's because of the m_sawclose when calling print_rtx, which leads to a leading newline. I fixed this by introducing a rtx_writer::finish_directive method to unset m_sawclose. (That field seems misnamed; perhaps it should be renamed to m_force_newline or somesuch? That feels like a separate patch though). > Later (not for this patch) I'd really like to see some logic not to > add > linebreaks before simple expressions, so that we'd have (DECL_RTL > (reg:SI xyz)) on a single line. I've been thinking about a patch that would track indentation, which would only insert newlines when close to some column limit, and would line up the open parentheses for siblings. That way you'd get things like: (DECL_RTL (mem/c:SI (plus:DI (reg/f:DI virtual-stack-vars) (const_int -4)) [1 i+0 S4 A32])) where the newline is auto-inserted to fit things within e.g. an 80 char column, and where "const_int" gets indented to be underneath the "reg", since they're both operands of the same "plus". Anyway, the following updated version of the patch eliminates the newlines and comments. 
Updated examples: aarch64 function taking one argument: (function "times_two" (param "i" (DECL_RTL (mem/c:SI (plus:DI (reg/f:DI virtual-stack-vars) (const_int -4)) [1 i+0 S4 A32])) (DECL_RTL_INCOMING (reg:SI x0 [ i ]))) (insn-chain ;; etc x86_64 function taking three ints: (function "test_1" (param "i" (DECL_RTL (mem/c:SI (plus:DI (reg/f:DI virtual-stack-vars) (const_int -4)) [1 i+0 S4 A32])) (DECL_RTL_INCOMING (reg:SI di [ i ]))) (param "j" (DECL_RTL (mem/c:SI (plus:DI (reg/f:DI virtual-stack-vars) (const_int -8)) [1 j+0 S4 A32])) (DECL_RTL_INCOMING (reg:SI si [ j ]))) (param "k" (DECL_RTL (mem/c:SI (plus:DI (reg/f:DI virtual-stack-vars) (const_int -12)) [1 k+0 S4 A32])) (DECL_RTL_INCOMING (reg:SI dx [ k ]))) (insn-chain ;; etc As before, only lightly tested so far. OK for trunk if it passes bootstrap and regrtest? gcc/ChangeLog: * print-rtl-function.c (print_any_param_name): New function. (print_param): New function. (print_rtx_function): Call print_param for each argument. * print-rtl.c (rtx_writer::finish_directive): New function. * print-rtl.h (rtx_writer::finish_directive): New decl. --- gcc/print-rtl-function.c | 36 gcc/print-rtl.c | 9 + gcc/print-rtl.h | 2 ++ 3 files changed, 47 insertions(+) diff --git a/gcc/print-rtl-function.c b/gcc/print-rtl-function.c index b62f1b3..f18491a 100644 --- a/gcc/print-rtl-function.c +++ b/gcc/print-rtl-function.c @@ -127,6 +127,38 @@ can_have_basic_block_p (const rtx_insn *insn) return true; } +/* Subroutine of print_param. Write the name of ARG, if any, to OUTFILE. */ + +static void +print_any_param_name (FILE *outfile, tree arg) +{ + if (DECL_NAME (arg)) +fprintf (outfile, " \"%s\"", IDENTIFIER_POINTER (DECL_NAME (arg))); +} + +/* Print a "(param)" directive for ARG to OUTFILE. */ + +static void +print_param (FILE *outfile, rtx_writer &w, tree arg) +{ + fprintf (outfile, " (param"); + print_any_param_name (outfile, arg); + fprintf (outfile, "\n"); + + /* Print the value of DECL_RTL (without lazy-evaluation). 
*/ + fprintf (outfile, "(DECL_RTL "); + rtx decl_rtl = DECL_WRTL_CHECK (arg)->decl_with_rtl.rtl; + w.print_rtx (decl_rtl); + w.finish_directive (); + + /* Print DECL_INCOMING_RTL. */ + fprintf (outfile, "(DECL_RTL_INCOMING "); + w.print_rtx (DECL_INCOMING_RTL (arg)); + fprintf (outfile, ")"); + + w.finish_directive (); +} + /* Write FN to OUTFILE in a form suitable for parsing, with indentation and comments to make the structure easy for a human to grok. Track the basic blocks of insns in the chain, wrapping those that are within @@ -202,6 +234,10 @@ print_rtx_function (FILE *outfile, function *fn, bool compact) fprintf (outfile, "(function \"%s\"\n", dname); + /* Params. */ + for (tree arg = DECL_ARGUMENTS (fdecl); arg; arg = DECL_CHAIN (arg)) +print_param (outfile, w, arg); + /* The instruction chain. */ fprintf (outfile, " (insn-chain\n"); basic_block curr_bb = NULL; diff --git a/gcc/print-rtl.c b/gcc/print-rtl.c index 30ff8fa..3bbd395 100644 --- a/gcc/print-rtl.c +++ b/gcc/print-rtl.c @@ -921,6 +921,15 @@ rtx_writer::print_rtx (const_rtx in_rtx) m_sawclose = 1; } +/* Emit a closing parenthesis and newline. */ + +void +rtx_writer::finish_directive ()
[PATCH, GCC/ARM] Fix ICE when compiling empty FIQ interrupt handler in ARM mode
Hi, This patch fixes the following ICE when compiling an empty FIQ interrupt handler in ARM mode: empty_fiq_handler.c:5:1: error: insn does not satisfy its constraints: } ^ (insn/f 13 12 14 (set (reg/f:SI 13 sp) (plus:SI (reg/f:SI 11 fp) (const_int 4 [0x4]))) irq.c:5 4 {*arm_addsi3} (expr_list:REG_CFA_ADJUST_CFA (set (reg/f:SI 13 sp) (plus:SI (reg/f:SI 11 fp) (const_int 4 [0x4]))) (nil))) The ICE was provoked by a missing alternative to reflect that ARM mode can do an add of a general register into sp, which is unpredictable in Thumb mode add immediate. ChangeLog entries are as follows: *** gcc/ChangeLog *** 2016-11-04 Thomas Preud'homme * config/arm/arm.md (arm_addsi3): Add alternative for addition of general register with general register or ARM constant into SP register. *** gcc/testsuite/ChangeLog *** 2016-11-04 Thomas Preud'homme * gcc.target/arm/empty_fiq_handler.c: New test. Testing: bootstrapped on ARMv7-A ARM mode & testsuite shows no regression. Is this ok for trunk? Best regards, Thomas diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 8393f65bcf4c9c3e61b91e5adcd5f59ff7c6ec3f..70cd31f6cb176fe29efc1fbbf692bfc270ef5a1b 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -609,9 +609,9 @@ ;; (plus (reg rN) (reg sp)) into (reg rN). In this case reload will ;; put the duplicated register first, and not try the commutative version. 
(define_insn_and_split "*arm_addsi3" - [(set (match_operand:SI 0 "s_register_operand" "=rk,l,l ,l ,r ,k ,r,r ,k ,r ,k,k,r ,k ,r") -(plus:SI (match_operand:SI 1 "s_register_operand" "%0 ,l,0 ,l ,rk,k ,r,rk,k ,rk,k,r,rk,k ,rk") - (match_operand:SI 2 "reg_or_int_operand" "rk ,l,Py,Pd,rI,rI,k,Pj,Pj,L ,L,L,PJ,PJ,?n")))] + [(set (match_operand:SI 0 "s_register_operand" "=rk,l,l ,l ,r ,k ,r,k ,r ,k ,r ,k,k,r ,k ,r") + (plus:SI (match_operand:SI 1 "s_register_operand" "%0 ,l,0 ,l ,rk,k ,r,r ,rk,k ,rk,k,r,rk,k ,rk") + (match_operand:SI 2 "reg_or_int_operand" "rk ,l,Py,Pd,rI,rI,k,rI,Pj,Pj,L ,L,L,PJ,PJ,?n")))] "TARGET_32BIT" "@ add%?\\t%0, %0, %2 @@ -621,6 +621,7 @@ add%?\\t%0, %1, %2 add%?\\t%0, %1, %2 add%?\\t%0, %2, %1 + add%?\\t%0, %1, %2 addw%?\\t%0, %1, %2 addw%?\\t%0, %1, %2 sub%?\\t%0, %1, #%n2 @@ -640,10 +641,10 @@ operands[1], 0); DONE; " - [(set_attr "length" "2,4,4,4,4,4,4,4,4,4,4,4,4,4,16") + [(set_attr "length" "2,4,4,4,4,4,4,4,4,4,4,4,4,4,4,16") (set_attr "predicable" "yes") - (set_attr "predicable_short_it" "yes,yes,yes,yes,no,no,no,no,no,no,no,no,no,no,no") - (set_attr "arch" "t2,t2,t2,t2,*,*,*,t2,t2,*,*,a,t2,t2,*") + (set_attr "predicable_short_it" "yes,yes,yes,yes,no,no,no,no,no,no,no,no,no,no,no,no") + (set_attr "arch" "t2,t2,t2,t2,*,*,*,a,t2,t2,*,*,a,t2,t2,*") (set (attr "type") (if_then_else (match_operand 2 "const_int_operand" "") (const_string "alu_imm") (const_string "alu_sreg"))) diff --git a/gcc/testsuite/gcc.target/arm/empty_fiq_handler.c b/gcc/testsuite/gcc.target/arm/empty_fiq_handler.c new file mode 100644 index ..bbcfd0e32f9d0cc60c8a013fd1bb584b21aaad16 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/empty_fiq_handler.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ + +/* Below code used to trigger an ICE due to missing constraints for + sp = fp + cst pattern. */ + +void fiq_handler (void) __attribute__((interrupt ("FIQ"))); + +void +fiq_handler (void) +{ +}
Re: [PATCH] fix PR68468
On Wed, Nov 09, 2016 at 04:08:39PM +0100, Bernd Schmidt wrote: > On 11/05/2016 06:14 PM, Waldemar Brodkorb wrote: > >Hi, > > > >the following patch fixes PR68468. > >Patch is used for a while in Buildroot without issues. > > > >2016-11-05 Waldemar Brodkorb Two spaces before < instead of just one. > > > > PR gcc/68468 PR libgcc/68468 instead. > > * libgcc/unwind-dw2-fde-dip.c: fix build on FDPIC targets. Capital F in Fix. No libgcc/ prefix for files in libgcc/ChangeLog. > This is ok. I think Waldemar does not have SVN write access, are you going to check it in or who will do that? Jakub
Re: [PATCH, LIBGCC] Avoid count_leading_zeros with undefined result (PR 78067)
On Sat, Oct 29, 2016 at 05:47:54AM +, Bernd Edlinger wrote: > On 10/28/16 16:05, Bernd Edlinger wrote: > > On 10/27/16 22:23, Joseph Myers wrote: > >> On Thu, 27 Oct 2016, Bernd Edlinger wrote: > >> > >>> Hi, > >>> > >>> by code reading I became aware that libgcc can call count_leading_zeros > >>> in certain cases which can give undefined results. This happens on > >>> signed int128 -> float or double conversions, when the int128 is in > >>> the range > >>> INT64_MAX+1 to UINT64_MAX. > >> > >> I'd expect testcases added to the testsuite that exercise this case at > >> runtime, if not already present. > >> > > > > Yes, thanks. I somehow expected there were already test cases, > > somewhere, but now when you ask that, I begin to doubt as well... > > > > I will try to add an asm("int 3") and see if that gets hit at all. > > > > The breakpoint got hit only once, in the libgo testsuite: runtime/pprof. > > I see there are some int to float conversion tests at > gcc.dg/torture/fp-int-convert*.c, where it is easy to add a > test case that hits the breakpoint too. > > However the test case does not fail before the patch, > it is just slightly undefined behavior, that is not > causing problems (at least for x86_64). > > Find attached a new patch with test case. These new test cases look like they are going to be out of exponent range for _Float16 - so the testcases will fail for a target which tests either of: gcc.dg/torture/fp-int-convert-float16.c gcc.dg/torture/fp-int-convert-float16-timode.c Should they have an M_OK1 check? Thanks, James > > > Boot-strapped on x86_64-pc-linux-gnu. > Is it OK for trunk? > > > Thanks > Bernd. > 2016-10-27 Bernd Edlinger > > PR libgcc/78067 > * libgcc2.c (__floatdisf, __floatdidf): Avoid undefined results from > count_leading_zeros. > > testsuite: > 2016-10-27 Bernd Edlinger > > PR libgcc/78067 > * gcc.dg/torture/fp-int-convert.h: Add more conversion tests. 
> > > Index: libgcc2.c > === > --- libgcc2.c (revision 241400) > +++ libgcc2.c (working copy) > @@ -1643,6 +1643,11 @@ > hi = -(UWtype) hi; > >UWtype count, shift; > +#if !defined (COUNT_LEADING_ZEROS_0) || COUNT_LEADING_ZEROS_0 != W_TYPE_SIZE > + if (hi == 0) > +count = W_TYPE_SIZE; > + else > +#endif >count_leading_zeros (count, hi); > >/* No leading bits means u == minimum. */ > Index: gcc/testsuite/gcc.dg/torture/fp-int-convert.h > === > --- gcc/testsuite/gcc.dg/torture/fp-int-convert.h (revision 241647) > +++ gcc/testsuite/gcc.dg/torture/fp-int-convert.h (working copy) > @@ -53,6 +53,8 @@ do { > \ >TEST_I_F_VAL (U, F, HVAL1U (P, U), P_OK (P, U)); \ >TEST_I_F_VAL (U, F, HVAL1U (P, U) + 1, P_OK (P, U)); \ >TEST_I_F_VAL (U, F, HVAL1U (P, U) - 1, P_OK (P, U)); \ > + TEST_I_F_VAL (I, F, WVAL0S (I), 1);\ > + TEST_I_F_VAL (I, F, -WVAL0S (I), 1); \ > } while (0) > > #define P_OK(P, T) ((P) >= sizeof(T) * CHAR_BIT) > @@ -74,6 +76,7 @@ do { > \ >? (S)1 \ >: (((S)1 << (sizeof(S) * CHAR_BIT - 2)) \ > + ((S)3 << (sizeof(S) * CHAR_BIT - 2 - P > +#define WVAL0S(S) (S)((S)1 << (sizeof(S) * CHAR_BIT / 2 - 1)) > > #define TEST_I_F_VAL(IT, FT, VAL, PREC_OK) \ > do { \
Re: [PATCH, GCC/ARM] Fix PR77933: stack corruption on ARM when using high registers and lr
I've reworked the patch following comments from Wilco [1] (sorry, could not find it in my MUA for some reason). [1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00317.html == Context == When saving registers, function thumb1_expand_prologue () aims at minimizing the number of push instructions. One of the optimizations it does is to push LR alongside high register(s) (after having moved them to low register(s)) when there is no low register to save. The way this is implemented is to add LR to the pushable_regs mask if it is live just before pushing the registers in that mask. The mask of live pushable registers which is used to detect whether LR needs to be saved is then cleared to ensure LR is only saved once. == Problem == However, beyond deciding what registers to push, pushable_regs is used to track what pushable register can be used to move a high register before being pushed, hence the name. That mask is cleared when all high registers have been assigned a low register, but the clearing assumes the high registers were assigned to the registers with the biggest number in that mask. This is not the case because LR is not considered when looking for a register in that mask. Furthermore, LR might have been saved in the TARGET_BACKTRACE path above, yet the mask of live pushable registers is not cleared in that case. == Solution == This patch changes the loop to iterate over registers LR to r0 so as to both fix the stack corruption reported in PR77933 and reuse lr to push some high register when possible. This patch also introduces a new variable lr_needs_saving to record whether LR (still) needs to be saved at a given point in the code and sets the variable accordingly throughout the code, thus fixing the second issue. Finally, this patch creates a new push_mask variable to distinguish between the mask of registers to push and the mask of live pushable registers. 
== Note == Other bits could have been improved but have been left out to allow the patch to be backported to a stable branch: (1) using argument registers that are not holding an argument (2) using push_mask consistently instead of l_mask (in TARGET_BACKTRACE), mask (low register push) and push_mask (3) the !l_mask case improved in TARGET_BACKTRACE since offset == 0 (4) rename l_mask to a more appropriate name (live_pushable_regs_mask?) ChangeLog entries are as follows: *** gcc/ChangeLog *** 2016-11-08 Thomas Preud'homme PR target/77933 * config/arm/arm.c (thumb1_expand_prologue): Distinguish between lr being live in the function and lr needing to be saved. Distinguish between already saved pushable registers and registers to push. Check for LR being an available pushable register. *** gcc/testsuite/ChangeLog *** 2016-11-08 Thomas Preud'homme PR target/77933 * gcc.target/arm/pr77933-1.c: New test. * gcc.target/arm/pr77933-2.c: Likewise. Testing: no regression on arm-none-eabi GCC cross-compiler targeting Cortex-M0 Is this ok for trunk? Best regards, Thomas On 02/11/16 17:08, Thomas Preudhomme wrote: Hi, When saving registers, function thumb1_expand_prologue () aims at minimizing the number of push instructions. One of the optimizations it does is to push lr alongside high register(s) (after having moved them to low register(s)) when there is no low register to save. The way this is implemented is to add lr to the list of registers that can be pushed just before the push happens. This would then push lr and allow it to be used for further pushes if there were not enough registers to push all the high registers to be pushed. However, the logic that decides what register to move high registers to before being pushed only looks at low registers (see for loop initialization). This means not only that lr is not used for pushing high registers but also that lr is not removed from the list of registers to be pushed when it's not used. 
This extra lr push is not popped in the epilogue, leading to stack corruption. This patch changes the loop to iterate over registers r0 to lr so as to both fix the stack corruption and reuse lr to push some high register when possible. ChangeLog entries are as follows: *** gcc/ChangeLog *** 2016-11-01 Thomas Preud'homme PR target/77933 * config/arm/arm.c (thumb1_expand_prologue): Also check for lr being a pushable register. *** gcc/testsuite/ChangeLog *** 2016-11-01 Thomas Preud'homme PR target/77933 * gcc.target/arm/pr77933.c: New test. Testing: no regression on arm-none-eabi GCC cross-compiler targeting Cortex-M0 Is this ok for trunk? Best regards, Thomas diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index dd8d5e5db8ca50daab648e58df290969aa794862..ddbda3e46dbcabb6c5775f847bc338c37705e122 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -24806,6 +24806,7 @@ thumb1_expand_prologue (void) unsigned long live_regs_mask; unsigned long l_mask;
Re: [PATCH] (v2) print-rtl-function.c: add (param) directive to dump
On 11/09/2016 05:46 PM, David Malcolm wrote: OK for trunk if it passes bootstrap and regrtest? gcc/ChangeLog: * print-rtl-function.c (print_any_param_name): New function. (print_param): New function. (print_rtx_function): Call print_param for each argument. * print-rtl.c (rtx_writer::finish_directive): New function. * print-rtl.h (rtx_writer::finish_directive): New decl. + + rtx decl_rtl = DECL_WRTL_CHECK (arg)->decl_with_rtl.rtl; Isn't this DECL_RTL_IF_SET? If so, please use that macro. Otherwise ok. Bernd
Re: [PATCH][1/2] GIMPLE Frontend, C FE parts (and GIMPLE parser)
On Wed, 9 Nov 2016, Richard Biener wrote: > I'll push back c_parser to the header and put inlines I need to export > there as well. > > Joseph, is this (with regard to the inlines) your preference as well? I'm not clear what the proposal is. If some patch is now proposed different from what was previously posted to gcc-patches, please post it. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH, LIBGCC] Avoid count_leading_zeros with undefined result (PR 78067)
On Wed, 9 Nov 2016, James Greenhalgh wrote: > These new test cases look like they are going to be out of exponent range > for _Float16 - so the testcases will fail for a target which tests either > of: > > gcc.dg/torture/fp-int-convert-float16.c > gcc.dg/torture/fp-int-convert-float16-timode.c > > Should they have an M_OK1 check? Yes. -- Joseph S. Myers jos...@codesourcery.com
Re: [patch,avr] Add new option -mabsdata.
On 11/07/2016 05:54 AM, Georg-Johann Lay wrote: @@ -15261,6 +15262,13 @@ GCC supports the following AVR devices a @include avr-mmcu.texi +@item -mabsdata +@opindex mabsdata + +Assume that all data in static stocage can be accessed by LDS / STS s/stocage/storage/ +inctructions. This option has only an effect on reduced Tiny devices like s/inctructions/instructions/ +ATtiny40. + @item -maccumulate-args @opindex maccumulate-args The documentation changes are OK with those typos fixed. -Sandra
Re: [PATCH] enable -fprintf-return-value by default
On 11/08/2016 08:13 PM, Martin Sebor wrote: The -fprintf-return-value optimization has been disabled since the last time it caused a bootstrap failure on powerpc64le. With the underlying problems fixed GCC has bootstrapped fine on all of powerpc64, powerpc64le and x86_64 and tested with no regressions. I'd like to re-enable the option. The attached patch does that. diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 17c5c22..adebeff 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -8301,7 +8301,7 @@ if (snprintf (buf, "%08x", i) >= sizeof buf) The @option{-fprintf-return-value} option relies on other optimizations and yields best results with @option{-O2}. It works in tandem with the @option{-Wformat-length} option. The @option{-fprintf-return-value} -option is disabled by default. +option is enabled by default. @item -fno-peephole @itemx -fno-peephole2 Near the beginning of this chapter, in @node Invoking GCC, it says: Many options have long names starting with @samp{-f} or with @samp{-W}---for example, @option{-fmove-loop-invariants}, @option{-Wformat} and so on. Most of these have both positive and negative forms; the negative form of @option{-ffoo} is @option{-fno-foo}. This manual documents only one of these two forms, whichever one is not the default. So you should be documenting the non-default negative form -fno-printf-return-value instead of the default positive form. The corresponding entry in the list in @node Option Summary needs to be adjusted, too. -Sandra
RE: [PATCH,testsuite] MIPS: Upgrade to MIPS IV if using (HAS_MOVN) with MIPS III.
> From: Moore, Catherine [mailto:catherine_mo...@mentor.com] > Sent: 08 November 2016 20:47 > To: Toma Tabacu; gcc-patches@gcc.gnu.org > Cc: Matthew Fortune > Subject: RE: [PATCH,testsuite] MIPS: Upgrade to MIPS IV if using (HAS_MOVN) > with MIPS III. > > > > > -Original Message- > > From: Toma Tabacu [mailto:toma.tab...@imgtec.com] > > Sent: Monday, November 7, 2016 11:21 AM > > gcc/testsuite/ChangeLog: > > > > 2016-11-07 Toma Tabacu > > > > * gcc.target/mips/mips.exp (mips-dg-options): Upgrade to MIPS IV if > using > > (HAS_MOVN) with MIPS III. > > > > diff --git a/gcc/testsuite/gcc.target/mips/mips.exp > > b/gcc/testsuite/gcc.target/mips/mips.exp > > index 39f44ff..e22d782 100644 > > --- a/gcc/testsuite/gcc.target/mips/mips.exp > > +++ b/gcc/testsuite/gcc.target/mips/mips.exp > > @@ -1129,7 +1129,7 @@ proc mips-dg-options { args } { > > # We need MIPS IV or higher for: > > # > > # > > - } elseif { $isa < 3 > > + } elseif { $isa < 4 > >&& [mips_have_test_option_p options "HAS_MOVN"] } > > { > > mips_make_test_option options "-mips4" > > # We need MIPS III or higher for: > > Hi Toma, > > The patch itself is OK, but the ChangeLog entry line length is greater than > 80. > Yes, sorry. The version below fixes this. > Do you have write access to the repository? Please let me know if you would > like me to commit this for you? > > Thanks, > Catherine No, I don't have write access. I would be grateful if you could commit the patch for me, if you think it would be OK with Matthew. To be clear, I am not in a rush to get this committed. Thanks, Toma Tabacu gcc/testsuite/ChangeLog: 2016-11-09 Toma Tabacu * gcc.target/mips/mips.exp (mips-dg-options): Upgrade to MIPS IV if using (HAS_MOVN) with MIPS III. 
diff --git a/gcc/testsuite/gcc.target/mips/mips.exp b/gcc/testsuite/gcc.target/mips/mips.exp index 39f44ff..e22d782 100644 --- a/gcc/testsuite/gcc.target/mips/mips.exp +++ b/gcc/testsuite/gcc.target/mips/mips.exp @@ -1129,7 +1129,7 @@ proc mips-dg-options { args } { # We need MIPS IV or higher for: # # - } elseif { $isa < 3 + } elseif { $isa < 4 && [mips_have_test_option_p options "HAS_MOVN"] } { mips_make_test_option options "-mips4" # We need MIPS III or higher for:
Re: [PATCH][1/2] GIMPLE Frontend, C FE parts (and GIMPLE parser)
On November 9, 2016 5:45:32 PM GMT+01:00, Joseph Myers wrote: >On Wed, 9 Nov 2016, Richard Biener wrote: > >> I'll push back c_parser to the header and put inlines I need to >export >> there as well. >> >> Joseph, is this (with regard to the inlines) your preference as well? > >I'm not clear what the proposal is. If some patch is now proposed >different from what was previously posted to gcc-patches, please post >it. I prefer the posted patch with c_parser not exported. But Jakub raised concerns about the inlines, thus I dug again and found a possible solution to the gengtype issue. Thus my question whether you prefer the inlines to stay as inlines and thus move to the header, or if you do not request this change (that will save me work). Thanks, Richard.
Re: [PATCH, LIBGCC] Avoid count_leading_zeros with undefined result (PR 78067)
On 11/09/16 17:52, Joseph Myers wrote: > On Wed, 9 Nov 2016, James Greenhalgh wrote: > >> These new test cases look like they are going to be out of exponent range >> for _Float16 - so the testcases will fail for a target which tests either >> of: >> >> gcc.dg/torture/fp-int-convert-float16.c >> gcc.dg/torture/fp-int-convert-float16-timode.c >> >> Should they have an M_OK1 check? > > Yes. > Yes, but maybe introduce a test if the half-wide value fits? like: #define M_OK2(M, T) ((M) > sizeof(T) * CHAR_BIT / 2 - 1) Bernd.
[Patch, Fortran, committed] PR 60777: [F03] RECURSIVE function rejected in specification expression
Hi all, I just committed a close-to-obvious fix for PR60777 (ok'd by Steve in the PR), see attachment: https://gcc.gnu.org/viewcvs?rev=242009&root=gcc&view=rev Cheers, Janus Index: gcc/fortran/expr.c === --- gcc/fortran/expr.c (Revision 241993) +++ gcc/fortran/expr.c (Arbeitskopie) @@ -2794,12 +2794,12 @@ external_spec_function (gfc_expr *e) return false; } - if (f->attr.recursive) -{ - gfc_error ("Specification function %qs at %L cannot be RECURSIVE", -f->name, &e->where); + /* F08:7.1.11.6. */ + if (f->attr.recursive + && !gfc_notify_std (GFC_STD_F2003, + "Specification function '%s' " + "at %L cannot be RECURSIVE", f->name, &e->where)) return false; -} function_allowed: return restricted_args (e->value.function.actual); ! { dg-do run } ! ! PR 60777: [F03] RECURSIVE function rejected in specification expression ! ! Contributed by Vladimir Fuka module recur implicit none contains pure recursive function f(n) result(answer) integer, intent(in) :: n integer :: answer if (n<2) then answer = 1 else answer = f(n-1)*n end if end function pure function usef(n) integer,intent(in) :: n character(f(n)):: usef usef = repeat('*',f(n)) end function end module program testspecexpr use recur implicit none if (usef(1) /= '*') call abort() if (usef(2) /= '**') call abort() if (usef(3) /= '**') call abort() end
Re: [PATCH] Fix PR78189
On 9 November 2016 at 09:36, Bin.Cheng wrote: > On Tue, Nov 8, 2016 at 9:11 AM, Richard Biener wrote: >> On Mon, 7 Nov 2016, Christophe Lyon wrote: >> >>> Hi Richard, >>> >>> >>> On 7 November 2016 at 09:01, Richard Biener wrote: >>> > >>> > The following fixes an oversight when computing alignment in the >>> > vectorizer. >>> > >>> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. >>> > >>> > Richard. >>> > >>> > 2016-11-07 Richard Biener >>> > >>> > PR tree-optimization/78189 >>> > * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Fix >>> > alignment computation. >>> > >>> > * g++.dg/torture/pr78189.C: New testcase. >>> > >>> > Index: gcc/testsuite/g++.dg/torture/pr78189.C >>> > === >>> > --- gcc/testsuite/g++.dg/torture/pr78189.C (revision 0) >>> > +++ gcc/testsuite/g++.dg/torture/pr78189.C (working copy) >>> > @@ -0,0 +1,41 @@ >>> > +/* { dg-do run } */ >>> > +/* { dg-additional-options "-ftree-slp-vectorize -fno-vect-cost-model" } >>> > */ >>> > + >>> > +#include >>> > + >>> > +struct A >>> > +{ >>> > + void * a; >>> > + void * b; >>> > +}; >>> > + >>> > +struct alignas(16) B >>> > +{ >>> > + void * pad; >>> > + void * misaligned; >>> > + void * pad2; >>> > + >>> > + A a; >>> > + >>> > + void Null(); >>> > +}; >>> > + >>> > +void B::Null() >>> > +{ >>> > + a.a = nullptr; >>> > + a.b = nullptr; >>> > +} >>> > + >>> > +void __attribute__((noinline,noclone)) >>> > +NullB(void * misalignedPtr) >>> > +{ >>> > + B* b = reinterpret_cast(reinterpret_cast(misalignedPtr) - >>> > offsetof(B, misaligned)); >>> > + b->Null(); >>> > +} >>> > + >>> > +int main() >>> > +{ >>> > + B b; >>> > + NullB(&b.misaligned); >>> > + return 0; >>> > +} >>> > diff --git gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c >>> > index 9346cfe..b03cb1e 100644 >>> > --- gcc/tree-vect-data-refs.c >>> > +++ gcc/tree-vect-data-refs.c >>> > @@ -773,10 +773,25 @@ vect_compute_data_ref_alignment (struct >>> > data_reference *dr) >>> >base = ref; >>> >while 
(handled_component_p (base)) >>> > base = TREE_OPERAND (base, 0); >>> > + unsigned int base_alignment; >>> > + unsigned HOST_WIDE_INT base_bitpos; >>> > + get_object_alignment_1 (base, &base_alignment, &base_bitpos); >>> > + /* As data-ref analysis strips the MEM_REF down to its base operand >>> > + to form DR_BASE_ADDRESS and adds the offset to DR_INIT we have to >>> > + adjust things to make base_alignment valid as the alignment of >>> > + DR_BASE_ADDRESS. */ >>> >if (TREE_CODE (base) == MEM_REF) >>> > -base = build2 (MEM_REF, TREE_TYPE (base), base_addr, >>> > - build_int_cst (TREE_TYPE (TREE_OPERAND (base, 1)), 0)); >>> > - unsigned int base_alignment = get_object_alignment (base); >>> > +{ >>> > + base_bitpos -= mem_ref_offset (base).to_short_addr () * >>> > BITS_PER_UNIT; >>> > + base_bitpos &= (base_alignment - 1); >>> > +} >>> > + if (base_bitpos != 0) >>> > +base_alignment = base_bitpos & -base_bitpos; >>> > + /* Also look at the alignment of the base address DR analysis >>> > + computed. */ >>> > + unsigned int base_addr_alignment = get_pointer_alignment (base_addr); >>> > + if (base_addr_alignment > base_alignment) >>> > +base_alignment = base_addr_alignment; >>> > >>> >if (base_alignment >= TYPE_ALIGN (TREE_TYPE (vectype))) >>> > DR_VECT_AUX (dr)->base_element_aligned = true; >>> >>> Since you committed this patch (r241892), I'm seeing execution failures: >>> gcc.dg/vect/pr40074.c -flto -ffat-lto-objects execution test >>> gcc.dg/vect/pr40074.c execution test >>> on armeb-none-linux-gnueabihf --with-mode=arm --with-cpu=cortex-a9 >>> --with-fpu=neon-fp16 >>> (using qemu as simulator) >> >> The difference is that we now vectorize the testcase with versioning >> for alignment (but it should never execute the vectorized variant). >> I need arm peoples help to understand what is wrong. > Hi All, > I will look at it. 
> Hi, This is causing new regressions on armeb: gcc.dg/vect/vect-strided-a-u8-i2-gap.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 2 loops" 1 gcc.dg/vect/vect-strided-a-u8-i2-gap.c scan-tree-dump-times vect "vectorized 2 loops" 1 pr40074 passes now, Thanks, Christophe > Thanks, > bin >> >> At least the testcase shows there is (kind-of) a missed optimization >> that we no longer figure out versioning for alignment is useless. >> I'll look into that. >> >> Richard.
Re: [PATCH 2/4] BRIG (HSAIL) frontend: The FE itself.
Hi Martin, On Fri, Nov 4, 2016 at 4:58 PM, Martin Jambor wrote: > I personally primarily want this for debugging purposes, and we should > try to eventually report errors in a more comprehensible way than > HSAILasm. But more generally, and more importantly, if the input, > whether human readable or not, is incorrect, the compiler should not > produce an "internal error." If that happens, users will file bugs > against gcc when in fact it is the generating tool that is broken. > Once you maintain the FE, you personally will not want this :-) Yes, I will add better error messages as soon as issues are reported. > I was wondering why in case of VECTOR_TYPE, instead of > return build1 (VIEW_CONVERT_EXPR, type, expr); > you do not do > return fold (convert_to_vector (type, expr)); > > it seemed to me natural given how you handle the other cases and that > the function internally also builds a V_C_E, after two checks. So I > thought the checks (matching overall size and only converting from an > integer or another vector-type) were the problem and was wondering > which one. Especially the size one is something I believe you should > never violate too. I see. I cannot recall the origins of this, but my guess is that the code is copied from an older Go frontend version. I tested with your suggestion and it seems to work fine. > copy_move_inst_handler operator() starts by type-casting whatever it > gets to BrigInstSourceType* and immediately dereferencing it, loading > field sourceType, even though, in case of an LDA, what it is actually > looking at is not a BrigInstSourceType but a BrigInstAddr, which is an > entirely different structure that does not have that field. So you > read the 8-bit segment and the first reserved field and happily pass > that to gccbrig_tree_type_for_hsa_type, hoping it does not explode on > random data. That is wrong. > > Later on in the operator() you actually figure out that you are > looking at BrigInstSourceType but that is too late. 
Ah, I see the issue now. It worked because with LDA the input type was
forced and handled explicitly. I have now made the code more explicit
about this.

> Indeed it does, thanks. However, I think that at the point when you
> do:
>
>   if (offs > 0)
>     const_offset = build_int_cst (size_type_node, offs);
>
> const_offset can already be non-NULL and you may lose whatever value
> it had before. You might want to keep the constant offset as an
> integer until the very moment when you may want to use it.

Great catch! There indeed was a major memory corruption with group and
private variables which was not caught by the PRM conformance suite I
use for testing. It should now be fixed.

> (Also, single statements should not have braces around them ;-)

...also fixed these, once again ;)

> I see. If you think that MULT_HIGHPART_EXPR is broken, it may be
> worth filing a bug report (probably during next stage1).

It can be considered broken, or rather just not fully implemented for
all targets and various vector types (especially those not directly
supported by the target). I'd rather look closer and report this after
the patch has been merged, to be able to provide BRIG test cases that
work with the upstream code base.

> This week I have checked out the updated tree from github and looked
> at only a few changed functions there. Hopefully the steering
> committee will decide soon and we'll get the attention of a global
> reviewer during the next few weeks too.
>
> Thanks for addressing so many of my comments,

Thanks again; an updated FE patch is attached.

BR, Pekka

002-brig-fe-gcc.patch.gz
Description: GNU Zip compressed data
[PATCH, i386]: Fix PR78262, wrong code with -fschedule-insns
Hello!

We need an earlyclobber on the output operand of the doubleword shift
insns, since we have to prevent the (partial) output from matching
%ecx, the count argument.

2016-11-09  Uros Bizjak

PR target/78262
* config/i386/i386.md (*3_doubleword): Mark operand 0 as
earlyclobber.

testsuite/ChangeLog:

2016-11-09  Uros Bizjak

PR target/78262
* gcc.target/i386/pr78262.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline, will be backported to release branches.

Uros.

Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 242004)
+++ config/i386/i386.md (working copy)
@@ -10339,7 +10339,7 @@
   "operands[2] = gen_lowpart (QImode, operands[2]);")

 (define_insn_and_split "*3_doubleword"
-  [(set (match_operand:DWI 0 "register_operand" "=r")
+  [(set (match_operand:DWI 0 "register_operand" "=&r")
	(any_shiftrt:DWI (match_operand:DWI 1 "register_operand" "0")
			 (match_operand:QI 2 "nonmemory_operand" "c")))
   (clobber (reg:CC FLAGS_REG))]

Index: testsuite/gcc.target/i386/pr78262.c
===
--- testsuite/gcc.target/i386/pr78262.c (nonexistent)
+++ testsuite/gcc.target/i386/pr78262.c (working copy)
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+/* { dg-require-effective-target int128 } */
+/* { dg-options "-O -fschedule-insns" } */
+
+typedef unsigned char u8;
+typedef unsigned __int128 u128;
+
+static u128 u128_0;
+static u128 *p128;
+
+u128 __attribute__ ((noinline, noclone))
+foo (u8 u8_0)
+{
+  p128 = &u128_0;
+  u128_0 = u8_0;
+  u128_0 = u128_0 << 127 | u128_0 >> 1;
+  u128_0 >>= (u8) u128_0;
+  return 2 + u128_0;
+}
+
+int
+main ()
+{
+  u128 x = foo (5);
+  if (p128 != &u128_0)
+    __builtin_abort ();
+  if (u128_0 != ((u128) 2 << 124))
+    __builtin_abort ();
+  if (x != ((u128) 2 << 124) + 2)
+    __builtin_abort ();
+  return 0;
+}
C++ PATCH for C++17 auto non-type template parameters
This was pretty straightforward; auto template parameters look more or
less the same as any other parameter with a dependent type, so
substitution works without any changes; we just need to do auto
deduction in a couple of places. The most involved bit was handling
deduction of a type parameter from the type of an array bound, which
only happens if deduction otherwise doesn't find a binding. At first I
tried to work this into unify et al, but eventually decided to do it in
a (much simpler) separate function.

Tested x86_64-pc-linux-gnu, applying to trunk.

commit 04ade0b5013698fd1e458ef1425f7afd023feaf0
Author: Jason Merrill
Date: Mon Nov 7 16:45:06 2016 -0800

    Implement P0127R2, Declaring non-type parameters with auto.

gcc/cp/
* cp-tree.h (enum auto_deduction_context): Add adc_unify.
* decl.c (grokdeclarator): Allow 'auto' in C++17 template non-type
parameter types.
* pt.c (do_auto_deduction): Add outer_targs parameter.
(convert_template_argument): Call do_auto_deduction. If adc_unify,
don't give up on dependent init.
(unify): Likewise. In C++17, walk into the type of a
TEMPLATE_PARM_INDEX.
(for_each_template_parm): Add any_fn parameter.
(struct pair_fn_data): Likewise.
(for_each_template_parm_r): Call it for any tree. In C++17, walk
into the type of a TEMPLATE_PARM_INDEX.
(zero_r, array_deduction_r, try_array_deduction): New.
(type_unification_real): Call try_array_deduction.
(get_partial_spec_bindings): Likewise.

gcc/c-family/
* c-cppbuiltin.c (c_cpp_builtins): Define __cpp_template_auto.
diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c index 55dbf44..70eade1 100644 --- a/gcc/c-family/c-cppbuiltin.c +++ b/gcc/c-family/c-cppbuiltin.c @@ -942,6 +942,7 @@ c_cpp_builtins (cpp_reader *pfile) cpp_define (pfile, "__cpp_aggregate_bases=201603"); cpp_define (pfile, "__cpp_deduction_guides=201606"); cpp_define (pfile, "__cpp_noexcept_function_type=201510"); + cpp_define (pfile, "__cpp_template_auto=201606"); } if (flag_concepts) cpp_define (pfile, "__cpp_concepts=201507"); diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index 20b52ad..9b5b5bc 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -5163,6 +5163,7 @@ enum auto_deduction_context adc_unspecified, /* Not given */ adc_variable_type, /* Variable initializer deduction */ adc_return_type, /* Return type deduction */ + adc_unify, /* Template argument deduction */ adc_requirement/* Argument dedution constraint */ }; @@ -6088,7 +6089,8 @@ extern tree make_template_placeholder (tree); extern tree do_auto_deduction (tree, tree, tree); extern tree do_auto_deduction (tree, tree, tree, tsubst_flags_t, - auto_deduction_context); + auto_deduction_context, +tree = NULL_TREE); extern tree type_uses_auto (tree); extern tree type_uses_auto_or_concept (tree); extern void append_type_to_template_for_access_check (tree, tree, tree, diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index c0321f9..bd37faa 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -11135,7 +11135,8 @@ grokdeclarator (const cp_declarator *declarator, if (ctype || in_namespace) error ("cannot use %<::%> in parameter declaration"); - if (type_uses_auto (type)) + if (type_uses_auto (type) + && !(cxx_dialect >= cxx1z && template_parm_flag)) { if (cxx_dialect >= cxx14) error ("% parameter not permitted in this context"); diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 3df71dd..64e566e 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -161,7 +161,7 @@ static tree convert_nontype_argument (tree, tree, tsubst_flags_t); static tree 
convert_template_argument (tree, tree, tree, tsubst_flags_t, int, tree); static tree for_each_template_parm (tree, tree_fn_t, void*, - hash_set *, bool); + hash_set *, bool, tree_fn_t = NULL); static tree expand_template_argument_pack (tree); static tree build_template_parm_index (int, int, int, tree, tree); static bool inline_needs_template_parms (tree, bool); @@ -7299,6 +7299,13 @@ convert_template_argument (tree parm, { tree t = tsubst (TREE_TYPE (parm), args, complain, in_decl); + if (tree a = type_uses_auto (t)) + { + t = do_auto_deduction (t, arg, a, complain, adc_unspecified); + if (t == error_mark_node) + return error_mark_node; + } + if (invalid_
C++ PATCH for some C++17 class deduction issues
Discussion at the meeting led me to notice that class template deduction wasn't working with template template parameters. This patch also improves a few diagnostic issues. Tested x86_64-pc-linux-gnu, applying to trunk. commit 66d2621abd8ed6bb47d0fda747a572e00aece25e Author: Jason Merrill Date: Tue Nov 8 10:14:24 2016 -0800 Fix C++17 template placeholder for template template parm. * parser.c (cp_parser_simple_type_specifier): Allow placeholder for template template parameter. (cp_parser_type_id_1): Improve diagnostic. * decl.c (grokdeclarator): Handle class deduction diagnostics here. * pt.c (splice_late_return_type): Not here. (tsubst) [TEMPLATE_TYPE_PARM]: Substitute into placeholder template. (do_class_deduction): Handle non-class templates. diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index bd37faa..4b18d4e 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -9490,6 +9490,11 @@ grokdeclarator (const cp_declarator *declarator, if (initialized > 1) funcdef_flag = true; + location_t typespec_loc = smallest_type_quals_location (type_quals, + declspecs->locations); + if (typespec_loc == UNKNOWN_LOCATION) +typespec_loc = declspecs->locations[ds_type_spec]; + /* Look inside a declarator for the name being declared and get it as a string, for an error message. */ for (id_declarator = declarator; @@ -10011,6 +10016,16 @@ grokdeclarator (const cp_declarator *declarator, /* We might have ignored or rejected some of the qualifiers. 
*/ type_quals = cp_type_quals (type); + if (cxx_dialect >= cxx1z && type && is_auto (type) + && innermost_code != cdk_function + && id_declarator && declarator != id_declarator) +if (tree tmpl = CLASS_PLACEHOLDER_TEMPLATE (type)) +{ + error_at (typespec_loc, "template placeholder type %qT must be followed " + "by a simple declarator-id", type); + inform (DECL_SOURCE_LOCATION (tmpl), "%qD declared here", tmpl); +} + staticp = 0; inlinep = decl_spec_seq_has_spec_p (declspecs, ds_inline); virtualp = decl_spec_seq_has_spec_p (declspecs, ds_virtual); @@ -10247,12 +10262,7 @@ grokdeclarator (const cp_declarator *declarator, { if (SCALAR_TYPE_P (type) || VOID_TYPE_P (type)) { - location_t loc; - loc = smallest_type_quals_location (type_quals, - declspecs->locations); - if (loc == UNKNOWN_LOCATION) - loc = declspecs->locations[ds_type_spec]; - warning_at (loc, OPT_Wignored_qualifiers, "type " + warning_at (typespec_loc, OPT_Wignored_qualifiers, "type " "qualifiers ignored on function return type"); } /* We now know that the TYPE_QUALS don't apply to the @@ -10301,11 +10311,12 @@ grokdeclarator (const cp_declarator *declarator, funcdecl_p = inner_declarator && inner_declarator->kind == cdk_id; /* Handle a late-specified return type. 
*/ + tree late_return_type = declarator->u.function.late_return_type; if (funcdecl_p) { - if (type_uses_auto (type)) + if (tree auto_node = type_uses_auto (type)) { - if (!declarator->u.function.late_return_type) + if (!late_return_type) { if (current_class_type && LAMBDA_TYPE_P (current_class_type)) @@ -10333,8 +10344,32 @@ grokdeclarator (const cp_declarator *declarator, name, type); return error_mark_node; } + if (tree tmpl = CLASS_PLACEHOLDER_TEMPLATE (auto_node)) + { + if (!late_return_type) + { + if (dguide_name_p (unqualified_id)) + error_at (typespec_loc, "deduction guide for " + "%qT must have trailing return type", + TREE_TYPE (tmpl)); + else + error_at (typespec_loc, "deduced class type %qT " + "in function return type", type); + inform (DECL_SOURCE_LOCATION (tmpl), + "%qD declared here", tmpl); + } + else if (CLASS_TYPE_P (late_return_type) +&& CLASSTYPE_TEMPLATE_INFO (late_return_type) +&& (CLASSTYPE_TI_TEMPLATE (late_return_type) +== tmpl))
RE: [PATCH,testsuite] MIPS: Upgrade to MIPS IV if using (HAS_MOVN) with MIPS III.
> -----Original Message-----
> From: Toma Tabacu [mailto:toma.tab...@imgtec.com]
> Sent: Wednesday, November 9, 2016 12:19 PM
> Subject: RE: [PATCH,testsuite] MIPS: Upgrade to MIPS IV if using
> (HAS_MOVN) with MIPS III.
>
> No, I don't have write access.
> I would be grateful if you could commit the patch for me, if you think
> it would be OK with Matthew.
> To be clear, I am not in a rush to get this committed.

You're all set now. I committed this as revision 242021.

Catherine

> gcc/testsuite/ChangeLog:
>
> 2016-11-09  Toma Tabacu
>
> * gcc.target/mips/mips.exp (mips-dg-options): Upgrade to MIPS IV if
> using (HAS_MOVN) with MIPS III.
Re: [PATCH][1/2] GIMPLE Frontend, C FE parts (and GIMPLE parser)
On Mon, Nov 7, 2016 at 2:24 AM, Richard Biener wrote:
> The issue with moving is that I failed to export the definition of
> c_parser in c-parser.h due to gengtype putting vec
> handlers into gtype-c.h but not gtype-objc.h and thus objc bootstrap
> fails :/
>
> I believe (well, I hope) that code generation for the C parser
> should be mostly unaffected (inlining is still done as determined
> useful) and the performance of the GIMPLE parser shouldn't be
> too important.
>
> If anybody feels like digging into the gengtype issue, I gave up
> after trying for half a day to trick it to do what I want
> (like for example also putting it in gtype-objc.h).

I have a fix for that issue waiting for review:
https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02265.html

Jason
[Patch, Fortran, committed] PR 46459: ICE (segfault): Invalid read in compare_actual_formal [error recovery]
Hi all,

I have committed yet another obvious ice-on-invalid fix:
https://gcc.gnu.org/viewcvs?rev=242020&root=gcc&view=rev

Cheers,
Janus

Index: gcc/fortran/interface.c
===
--- gcc/fortran/interface.c (Revision 241993)
+++ gcc/fortran/interface.c (Arbeitskopie)
@@ -3190,6 +3190,7 @@ compare_actual_formal (gfc_actual_arglist **ap, gf
	 shape array, if the dummy argument has the VOLATILE attribute.  */

       if (f->sym->attr.volatile_
+	  && a->expr->expr_type == EXPR_VARIABLE
	  && a->expr->symtree->n.sym->as
	  && a->expr->symtree->n.sym->as->type == AS_ASSUMED_SHAPE
	  && !(f->sym->as && f->sym->as->type == AS_ASSUMED_SHAPE))
@@ -3219,6 +3220,7 @@ compare_actual_formal (gfc_actual_arglist **ap, gf
	 dummy argument has the VOLATILE attribute.  */

       if (f->sym->attr.volatile_
+	  && a->expr->expr_type == EXPR_VARIABLE
	  && a->expr->symtree->n.sym->attr.pointer
	  && a->expr->symtree->n.sym->as
	  && !(f->sym->as
[PATCH], PowerPC ISA 3.0, allow QImode/HImode to go into vector registers
The PowerPC ISA 3.0 (power9) has new instructions that make it feasible
to allow QImode and HImode to be allocated to vector registers. The new
instructions are:

* load byte with zero extend
* load half word with zero extend
* store byte
* store half word
* extract byte from vector and zero extend
* extract half word from vector and zero extend
* sign extend byte to word/double word
* sign extend half word to word/double word

I have bootstrapped a previous version of the changes on a little
endian Power8 system, and I'm now repeating the bootstrap on both a big
endian Power8 and a little endian Power8. Assuming there are no
regressions with the patches, can I check these patches into the trunk?

I have built the spec 2006 CPU benchmark suite with these changes, and
the power8 (ISA 2.07) code generation does not change. I have also
built the spec 2006 CPU benchmark suite for power9. The following 15
(out of 30) benchmarks had code changes:

* perlbench (char <-> floating point)
* gcc (one extra ld/std)
* gamess (int <-> floating point)
* gromacs (one fmr instead of li/mtvsrd)
* cactusADM (char/int <-> floating point)
* namd (floating point -> int)
* gobmk (floating point -> int)
* dealII (int/long in vector regs. vs. gprs)
* povray (char/int <-> floating point)
* calculix (int -> zero extend to long)
* hmmer (floating point -> int)
* h264ref (zero extend short)
* tonto (floating point -> int)
* omnetpp (floating point -> int)
* wrf (floating point -> unsigned/int)

[gcc]
2016-11-09  Michael Meissner

* config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok): If ISA 3.0,
enable HImode and QImode to go in vector registers by default if the
-mvsx-small-integer option is enabled.
(rs6000_secondary_reload_simple_move): Likewise.
(rs6000_preferred_reload_class): Don't force integer constants to
be loaded into vector registers that we can easily make into memory
(or being created in the GPRs and moved over with direct move).
* config/rs6000/vsx.md (UNSPEC_P9_MEMORY): Delete, no longer used.
(vsx_extract_): Rework V4SImode, V8HImode, and V16QImode vector extraction on ISA 3.0 when the scalar integer can be allocated in vector registers. Generate the VEC_SELECT directy, and don't use UNSPEC's to avoid having the scalar type in a vector register. Make the expander target registers, and let the combiner fold in results storing to memory, if the machine supports stores. (vsx_extract__di): Likewise. (vsx_extract__p9): Likewise. (vsx_extract__di_p9): Likewise. (vsx_extract__store_p9): Likewise. (vsx_extract_si): Likewise. (vsx_extract__p8): Likewise. (p9_lxsizx): Delete, no longer used. (p9_stxsix): Likewise. * config/rs6000/rs6000.md (INT_ISA3): New mode iterator for integers in vector registers for ISA 3.0. (QHI): Update comment. (zero_extendqi2): Add support for ISA 3.0 scalar load or vector extract instructions in sign/zero extend. (zero_extendhi): Likewise. (extendqi): Likewise. (extendhi2): Likewise. (HImode splitter for load/sign extend in vector register): Likewise. (float2): Eliminate old method of optimizing floating point conversions to/from small data types and rewrite it to support QImode/HImode being allowed in vector registers on ISA 3.0. (float2_internal): Likewise. (floatuns2): Likewise. (floatuns2_internal): Likewise. (fix_trunc2): Likewise. (fix_trunc2_internal): Likewise. (fixuns_trunc2): Likewise. (fixuns_trunc2_internal): Likewise. VSPLITISW on ISA 2.07. (movhi_internal): Combine movhi_internal and movqi_internal into one mov_internal with an iterator. Add support for QImode and HImode being allowed in vector registers. Make large number of attributes and constraints easier to read. (movqi_internal): Likewise. (mov_internal): Likewise. (movdi_internal64): Fix constraint to allow loading -16..15 with VSPLITISW on ISA 2.07. (integer XXSPLTIB splitter): Add support for QI, HI, and SImode as well as DImode. 
[gcc/testsuite]
2016-11-09  Michael Meissner

* gcc.target/powerpc/vsx-qimode.c: New test for QImode, HImode being
allowed in vector registers.
* gcc.target/powerpc/vsx-qimode2.c: Likewise.
* gcc.target/powerpc/vsx-qimode3.c: Likewise.
* gcc.target/powerpc/vsx-himode.c: Likewise.
* gcc.target/powerpc/vsx-himode2.c: Likewise.
* gcc.target/powe
Re: [Patch, fortran] PR44265 - Link error with reference to parameter array in specification expression
> On 9 Nov 2016 at 20:09, Paul Richard Thomas wrote:
>
> Dear Dominique,
>
> I am deeply embarrassed. This is the consequence of an additional
> condition added at the last minute.

No reason to be embarrassed ;-)

> The attached removes it and makes sure that the original bug is tested
> in char_result_14.f90. The ChangeLogs are the same.
>
> OK for trunk?

IMO yes.

> Paul

I have a last glitch (which can be deferred if needed):

FUNCTION Get(i) RESULT(s)
  CHARACTER(*), PARAMETER :: names(3) = [ &
    'Apple ', &
    'Orange ', &
    'Mango ' ];
  INTEGER, INTENT(IN) :: i
  CHARACTER(LEN_TRIM(names(i))) :: s
  ! s = names(i)
  print *, len(s)
END FUNCTION Get

PROGRAM WheresThatbLinkingConstantGone
  IMPLICIT NONE
  interface
    FUNCTION Get(i) RESULT(s)
      CHARACTER(*), PARAMETER :: names(3) = [ &
        'Apple ', &
        'Orange ', &
        'Mango ' ];
      INTEGER, INTENT(IN) :: i
      CHARACTER(LEN_TRIM(names(i))) :: s
    END FUNCTION Get
  end interface
  integer :: i
  i = len(Get(1))
  print *, i
END PROGRAM WheresThatbLinkingConstantGone

does not link.

Dominique
Re: [PATCH], PowerPC ISA 3.0, allow QImode/HImode to go into vector registers
Of course it would help, if I actually attached the patches: -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797 Index: gcc/config/rs6000/rs6000.md === --- gcc/config/rs6000/rs6000.md (.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 241924) +++ gcc/config/rs6000/rs6000.md (.../gcc/config/rs6000) (working copy) @@ -325,6 +325,9 @@ (define_mode_iterator INT [QI HI SI DI T ; Any supported integer mode that fits in one register. (define_mode_iterator INT1 [QI HI SI (DI "TARGET_POWERPC64")]) +; Integer modes supported in VSX registers with ISA 3.0 instructions +(define_mode_iterator INT_ISA3 [QI HI SI DI]) + ; Everything we can extend QImode to. (define_mode_iterator EXTQI [SI (DI "TARGET_POWERPC64")]) @@ -334,7 +337,7 @@ (define_mode_iterator EXTHI [SI (DI "TAR ; Everything we can extend SImode to. (define_mode_iterator EXTSI [(DI "TARGET_POWERPC64")]) -; QImode or HImode for small atomic ops +; QImode or HImode for small integer moves and small atomic ops (define_mode_iterator QHI [QI HI]) ; QImode, HImode, SImode for fused ops only for GPR loads @@ -735,13 +738,15 @@ (define_code_attr SMINMAX [(smin "SM ;; complex forms. Basic data transfer is done later. 
(define_insn "zero_extendqi2" - [(set (match_operand:EXTQI 0 "gpc_reg_operand" "=r,r") - (zero_extend:EXTQI (match_operand:QI 1 "reg_or_mem_operand" "m,r")))] + [(set (match_operand:EXTQI 0 "gpc_reg_operand" "=r,r,?*wJwK,?*wK") + (zero_extend:EXTQI (match_operand:QI 1 "reg_or_mem_operand" "m,r,Z,*wK")))] "" "@ lbz%U1%X1 %0,%1 - rlwinm %0,%1,0,0xff" - [(set_attr "type" "load,shift")]) + rlwinm %0,%1,0,0xff + lxsibzx %x0,%y1 + vextractub %0,%1,7" + [(set_attr "type" "load,shift,fpload,vecperm")]) (define_insn_and_split "*zero_extendqi2_dot" [(set (match_operand:CC 2 "cc_reg_operand" "=x,?y") @@ -786,13 +791,15 @@ (define_insn_and_split "*zero_extendqi2" - [(set (match_operand:EXTHI 0 "gpc_reg_operand" "=r,r") - (zero_extend:EXTHI (match_operand:HI 1 "reg_or_mem_operand" "m,r")))] + [(set (match_operand:EXTHI 0 "gpc_reg_operand" "=r,r,?*wJwK,?*wK") + (zero_extend:EXTHI (match_operand:HI 1 "reg_or_mem_operand" "m,r,Z,wK")))] "" "@ lhz%U1%X1 %0,%1 - rlwinm %0,%1,0,0x" - [(set_attr "type" "load,shift")]) + rlwinm %0,%1,0,0x + lxsihzx %x0,%y1 + vextractuh %0,%1,6" + [(set_attr "type" "load,shift,fpload,vecperm")]) (define_insn_and_split "*zero_extendhi2_dot" [(set (match_operand:CC 2 "cc_reg_operand" "=x,?y") @@ -893,11 +900,13 @@ (define_insn_and_split "*zero_extendsi2" - [(set (match_operand:EXTQI 0 "gpc_reg_operand" "=r") - (sign_extend:EXTQI (match_operand:QI 1 "gpc_reg_operand" "r")))] + [(set (match_operand:EXTQI 0 "gpc_reg_operand" "=r,?*wK") + (sign_extend:EXTQI (match_operand:QI 1 "gpc_reg_operand" "r,?*wK")))] "" - "extsb %0,%1" - [(set_attr "type" "exts")]) + "@ + extsb %0,%1 + vextsb2d %0,%1" + [(set_attr "type" "exts,vecperm")]) (define_insn_and_split "*extendqi2_dot" [(set (match_operand:CC 2 "cc_reg_operand" "=x,?y") @@ -948,14 +957,30 @@ (define_expand "extendhi2" "") (define_insn "*extendhi2" - [(set (match_operand:EXTHI 0 "gpc_reg_operand" "=r,r") - (sign_extend:EXTHI (match_operand:HI 1 "reg_or_mem_operand" "m,r")))] + [(set (match_operand:EXTHI 0 
"gpc_reg_operand" "=r,r,?*wK,?*wK") + (sign_extend:EXTHI (match_operand:HI 1 "reg_or_mem_operand" "m,r,Z,wK")))] "rs6000_gen_cell_microcode" "@ lha%U1%X1 %0,%1 - extsh %0,%1" - [(set_attr "type" "load,exts") - (set_attr "sign_extend" "yes")]) + extsh %0,%1 + # + vextsh2d %0,%1" + [(set_attr "type" "load,exts,fpload,vecperm") + (set_attr "sign_extend" "yes") + (set_attr "length" "4,4,8,4")]) + +(define_split + [(set (match_operand:EXTHI 0 "altivec_register_operand" "") + (sign_extend:EXTHI +(match_operand:HI 1 "indexed_or_indirect_operand" "")))] + "TARGET_P9_VECTOR && reload_completed" + [(set (match_dup 2) + (match_dup 1)) + (set (match_dup 0) + (sign_extend:EXTHI (match_dup 2)))] +{ + operands[2] = gen_rtx_REG (HImode, REGNO (operands[1])); +}) (define_insn "*extendhi2_noload" [(set (match_operand:EXTHI 0 "gpc_reg_operand" "=r") @@ -5299,30 +5324,33 @@ (define_insn_and_split "*floatunssidf2_i (set_attr "type" "fp")]) ;; ISA 3.0 adds instructions lxsi[bh]zx to directly load QImode and HImode to -;; vector registers. At the moment, QI/HImode are not allowed in floating -;; point or vector registers, so we use UNSPEC's to use the load byte and -;; half-word instructions. +;; vector registers. These insns favor doing the sign/zero extension in +;; the vector registers, rather then loading up a GPR, doing a sign/zero +;; extension and then a direct move. (define_expand
Re: [PATCH] (v2) print-rtl-function.c: add (param) directive to dump
On Wed, 2016-11-09 at 17:46 +0100, Bernd Schmidt wrote: > On 11/09/2016 05:46 PM, David Malcolm wrote: > > > OK for trunk if it passes bootstrap and regrtest? > > > > gcc/ChangeLog: > > * print-rtl-function.c (print_any_param_name): New function. > > (print_param): New function. > > (print_rtx_function): Call print_param for each argument. > > * print-rtl.c (rtx_writer::finish_directive): New function. > > * print-rtl.h (rtx_writer::finish_directive): New decl. > > + > > + rtx decl_rtl = DECL_WRTL_CHECK (arg)->decl_with_rtl.rtl; > > Isn't this DECL_RTL_IF_SET? If so, please use that macro. Otherwise > ok. Yes, it is; thanks. Using it required including varasm.h. For reference, here's what I committed, as r242023 (having verified bootstrap and regrtest).Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 242022) +++ gcc/ChangeLog (revision 242023) @@ -1,3 +1,12 @@ +2016-11-09 David Malcolm + + * print-rtl-function.c: Include varasm.h. + (print_any_param_name): New function. + (print_param): New function. + (print_rtx_function): Call print_param for each argument. + * print-rtl.c (rtx_writer::finish_directive): New function. + * print-rtl.h (rtx_writer::finish_directive): New decl. + 2016-11-09 Uros Bizjak PR target/78262 Index: gcc/print-rtl.c === --- gcc/print-rtl.c (revision 242022) +++ gcc/print-rtl.c (revision 242023) @@ -804,6 +804,15 @@ m_sawclose = 1; } +/* Emit a closing parenthesis and newline. */ + +void +rtx_writer::finish_directive () +{ + fprintf (m_outfile, ")\n"); + m_sawclose = 0; +} + /* Print an rtx on the current line of FILE. Initially indent IND characters. 
*/ Index: gcc/print-rtl.h === --- gcc/print-rtl.h (revision 242022) +++ gcc/print-rtl.h (revision 242023) @@ -31,6 +31,8 @@ void print_rtl (const_rtx rtx_first); int print_rtl_single_with_indent (const_rtx x, int ind); + void finish_directive (); + private: void print_rtx_operand_code_0 (const_rtx in_rtx, int idx); void print_rtx_operand_code_e (const_rtx in_rtx, int idx); Index: gcc/print-rtl-function.c === --- gcc/print-rtl-function.c (revision 242022) +++ gcc/print-rtl-function.c (revision 242023) @@ -33,6 +33,7 @@ #include "langhooks.h" #include "memmodel.h" #include "emit-rtl.h" +#include "varasm.h" /* Print an "(edge-from)" or "(edge-to)" directive describing E to OUTFILE. */ @@ -127,6 +128,37 @@ return true; } +/* Subroutine of print_param. Write the name of ARG, if any, to OUTFILE. */ + +static void +print_any_param_name (FILE *outfile, tree arg) +{ + if (DECL_NAME (arg)) +fprintf (outfile, " \"%s\"", IDENTIFIER_POINTER (DECL_NAME (arg))); +} + +/* Print a "(param)" directive for ARG to OUTFILE. */ + +static void +print_param (FILE *outfile, rtx_writer &w, tree arg) +{ + fprintf (outfile, " (param"); + print_any_param_name (outfile, arg); + fprintf (outfile, "\n"); + + /* Print the value of DECL_RTL (without lazy-evaluation). */ + fprintf (outfile, "(DECL_RTL "); + w.print_rtx (DECL_RTL_IF_SET (arg)); + w.finish_directive (); + + /* Print DECL_INCOMING_RTL. */ + fprintf (outfile, "(DECL_RTL_INCOMING "); + w.print_rtx (DECL_INCOMING_RTL (arg)); + fprintf (outfile, ")"); + + w.finish_directive (); +} + /* Write FN to OUTFILE in a form suitable for parsing, with indentation and comments to make the structure easy for a human to grok. Track the basic blocks of insns in the chain, wrapping those that are within @@ -197,6 +229,10 @@ fprintf (outfile, "(function \"%s\"\n", dname); + /* Params. */ + for (tree arg = DECL_ARGUMENTS (fdecl); arg; arg = DECL_CHAIN (arg)) +print_param (outfile, w, arg); + /* The instruction chain. 
*/ fprintf (outfile, " (insn-chain\n"); basic_block curr_bb = NULL;
[PATCH] simplify-rtx: Transform (xor (and (xor A B) C) B) with C const
match.pd transforms (A&C)|(B&~C) to ((A^B)&C)^B, which is fewer
operations if C is not constant (and it is not, on simple tests at
least; this transform is done very early already). Various processors
have "insert" instructions that can do this, but combine cannot build
those from the xor-and-xor, and it has no chance at all to do so if A
or B are themselves computed by multiple instructions (on PowerPC, the
rl[ws]imi instructions can do this with a rotate, or a simple shift
with appropriate C; other ISAs have similar insns).

This patch makes RTL simplification transform (xor (and (xor A B) C) B)
back to (ior (and A C) (and B ~C)) for constant C (and similarly with A
instead of B for that last term).

Tested on powerpc64-linux {-m32,-m64}; no regressions. Is this okay for
trunk?

Segher

2016-11-09  Segher Boessenkool

* simplify-rtx.c (simplify_binary_operation_1): Simplify
(xor (and (xor A B) C) A) to (ior (and A ~C) (and B C)) and
(xor (and (xor A B) C) B) to (ior (and A C) (and B ~C)) if C
is a const_int.

---
 gcc/simplify-rtx.c | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 5c3dea1..62d60f3 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -2886,6 +2886,37 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
 	}
 	}

+      /* If we have (xor (and (xor A B) C) A) with C a constant we can
+	 instead do (ior (and A ~C) (and B C)) which is a machine
+	 instruction on some machines, and also has shorter instruction
+	 path length.  */
+      if (GET_CODE (op0) == AND
+	  && GET_CODE (XEXP (op0, 0)) == XOR
+	  && CONST_INT_P (XEXP (op0, 1))
+	  && rtx_equal_p (XEXP (XEXP (op0, 0), 0), trueop1))
+	{
+	  rtx a = trueop1;
+	  rtx b = XEXP (XEXP (op0, 0), 1);
+	  rtx c = XEXP (op0, 1);
+	  rtx nc = simplify_gen_unary (NOT, mode, c, mode);
+	  rtx a_nc = simplify_gen_binary (AND, mode, a, nc);
+	  rtx bc = simplify_gen_binary (AND, mode, b, c);
+	  return simplify_gen_binary (IOR, mode, a_nc, bc);
+	}
+      /* Similarly, (xor (and (xor A B) C) B) as
+	 (ior (and A C) (and B ~C)).  */
+      else if (GET_CODE (op0) == AND
+	  && GET_CODE (XEXP (op0, 0)) == XOR
+	  && CONST_INT_P (XEXP (op0, 1))
+	  && rtx_equal_p (XEXP (XEXP (op0, 0), 1), trueop1))
+	{
+	  rtx a = XEXP (XEXP (op0, 0), 0);
+	  rtx b = trueop1;
+	  rtx c = XEXP (op0, 1);
+	  rtx nc = simplify_gen_unary (NOT, mode, c, mode);
+	  rtx b_nc = simplify_gen_binary (AND, mode, b, nc);
+	  rtx ac = simplify_gen_binary (AND, mode, a, c);
+	  return simplify_gen_binary (IOR, mode, ac, b_nc);
+	}
+
     /* (xor (comparison foo bar) (const_int 1)) can become the reversed
	comparison if STORE_FLAG_VALUE is 1.  */
     if (STORE_FLAG_VALUE == 1
--
1.9.3
[PATCH] libstdc++: Allow using without lock free atomic int
Compiling programs using std::future for old ARM processors fails. The
problem is caused by a preprocessor check for a lock-free atomic int.
std::future can be changed to work correctly without lock-free atomics
with minor changes to the exception_ptr implementation.

Without lock-free atomics there is the question of whether a deadlock
can happen. But atomic operations can't call outside code, preventing
any ABBA or recursive-mutex-acquisition deadlocks. A deadlock could
happen only when throwing an exception, or when accessing an
is_lock_free() == false atomic, from an asynchronous signal handler.
Throwing from a signal handler is undefined behavior. I don't know
about accessing atomics from an asynchronous signal handler, but that
feels like undefined behavior too if is_lock_free() returns false.

Bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64735

Differences to the current code if atomic builtins are available:
* Race detector annotations that are empty by default
* Check for __gthread_active_p
* Generated x86 code uses xadd instead of xsub

This makes the code a bit worse, but the problem is shared by any other
user of __exchange_and_add. The internal API implementation should be
fixed to generate better code for all cases; that is a follow-up patch.

** libstdc++ test results

An x86_64 build from trunk@241870 gave exactly the same results with
and without the patch. That is, two locale test cases failing:

FAIL: 22_locale/numpunct/members/char/3.cc execution test
FAIL: 22_locale/time_get/get_date/wchar_t/4.cc execution test

# of expected passes		11941
# of unexpected failures	2
# of expected failures		65
# of unsupported tests		244

armv5tel from gcc-6-branch@241866 in qemu-arm-static -cpu arm926
resulted in the following differences.

Before patch:

WARNING: program timed out.
FAIL: 22_locale/locale/cons/12658_thread-2.cc execution test
FAIL: 22_locale/numpunct/members/char/3.cc execution test
FAIL: 22_locale/time_get/get_date/wchar_t/4.cc execution test
FAIL: 27_io/fpos/14775.cc execution test

# of expected passes		10628
# of unexpected failures	4
# of expected failures		66
# of unsupported tests		390

After patch:

WARNING: program timed out.
FAIL: 22_locale/locale/cons/12658_thread-1.cc execution test
WARNING: program timed out.
FAIL: 22_locale/locale/cons/12658_thread-2.cc execution test
FAIL: 22_locale/numpunct/members/char/3.cc execution test
FAIL: 22_locale/time_get/get_date/wchar_t/4.cc execution test
FAIL: 27_io/fpos/14775.cc execution test

# of expected passes		10831
# of unexpected failures	5
# of expected failures		66
# of unsupported tests		281

My CPU isn't fast enough to run the 12658 thread tests in qemu.

---

2016-11-09  Pauli Nieminen

	* include/std/future: Remove check for ATOMIC_INT_LOCK_FREE.
	* libsupc++/eh_ptr.cc (exception_ptr::_M_addref)
	(exception_ptr::_M_release) (__gxx_dependent_exception_cleanup)
	(rethrow_exception): Use atomicity.h reference counting helpers.
	* libsupc++/eh_throw.cc (__gxx_exception_cleanup): Likewise.
	* libsupc++/eh_tm.cc (free_any_cxa_exception): Likewise.
	* libsupc++/exception: Remove check for ATOMIC_INT_LOCK_FREE.
	* libsupc++/exception_ptr.h: Likewise.
	* libsupc++/nested_exception.cc: Likewise.
	* libsupc++/nested_exception.h: Likewise.
	* src/c++11/future.cc: Likewise.
	* testsuite: Remove atomic builtins check from exception_ptr,
	nested_exception, future, promise, packaged_task, async and
	shared_future tests.
---
 libstdc++-v3/include/std/future                    |  4 +---
 libstdc++-v3/libsupc++/eh_ptr.cc                   | 26 +-
 libstdc++-v3/libsupc++/eh_throw.cc                 |  9 
 libstdc++-v3/libsupc++/eh_tm.cc                    | 11 +
 libstdc++-v3/libsupc++/exception                   |  3 +--
 libstdc++-v3/libsupc++/exception_ptr.h             |  4 
 libstdc++-v3/libsupc++/nested_exception.cc         |  2 --
 libstdc++-v3/libsupc++/nested_exception.h          |  4 
 libstdc++-v3/src/c++11/future.cc                   |  4 ++--
 .../testsuite/18_support/exception_ptr/40296.cc    |  1 -
 .../18_support/exception_ptr/60612-terminate.cc    |  1 -
 .../18_support/exception_ptr/60612-unexpected.cc   |  1 -
 .../testsuite/18_support/exception_ptr/62258.cc    |  1 -
 .../testsuite/18_support/exception_ptr/64241.cc    |  1 -
 .../18_support/exception_ptr/current_exception.cc  |  1 -
 .../testsuite/18_support/exception_ptr/lifespan.cc |  1 -
 .../18_support/exception_ptr/make_exception_ptr.cc |  1 -
 .../exception_ptr/make_exception_ptr_2.cc          |  1 -
 .../testsuite/18_support/exception_ptr/move.cc     |  1 -
 .../18_support/exception_ptr/requirements.cc       |  1 -
 .../18_support/exception_ptr/requirements_neg.cc   |  1 -
 .../18_support/exception_ptr/rethrow_exception.cc  |  1 -
 .../testsuite/18_support/nested_exception/51438.cc |  1 -
 .../testsuite/18_support/nested_exception/62154
[PATCH] libstdc++: Improve code generation for atomic reference counting
Atomic reference counting generates pessimistic code on platforms where builtin atomics could optimize the following branch with a subtract instruction. To allow better code generation, a compile-time-constant addend can be checked for a negative value. Those cases can then be better optimized with the __atomic_fetch_sub builtin.

The extra branch checking for the presence of threads in the application generates a lot of extra instructions, while builtin atomics often generate very few instructions.

Can I assume that __builtin_constant_p is available in all compilers where _GLIBCXX_ATOMIC_BUILTINS is available? bits/stl_list.h assumes __builtin_constant_p is available inside _GLIBCXX_USE_CXX11_ABI.

x86_64 test results:

FAIL: 22_locale/numpunct/members/char/3.cc execution test
FAIL: 22_locale/time_get/get_date/wchar_t/4.cc execution test

# of expected passes		11941
# of unexpected failures	2
# of expected failures		65
# of unsupported tests		244

---

2016-11-09  Pauli Nieminen

	* include/ext/atomicity.h (__exchange_and_add): Optimize negative
	add with builtin __atomic_fetch_sub.
	(__atomic_add): Call __exchange_and_add to reuse the same code.
	(__exchange_and_add_dispatch)
	(__atomic_add_dispatch): Optimize threading check if builtins
	generate inline instructions.
---
 libstdc++-v3/include/ext/atomicity.h | 24 
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/include/ext/atomicity.h b/libstdc++-v3/include/ext/atomicity.h
index 0fcb390..ecf5a9d 100644
--- a/libstdc++-v3/include/ext/atomicity.h
+++ b/libstdc++-v3/include/ext/atomicity.h
@@ -35,6 +35,14 @@
 #include
 #include
 
+#if defined(__GCC_ATOMIC_INT_LOCK_FREE)
+#define _GLIBCXX_INT_LOCK_FREE __GCC_ATOMIC_INT_LOCK_FREE
+#else
+#define _GLIBCXX_INT_LOCK_FREE 2
+#endif
+#define _GLIBCXX_ATOMIC_INLINE \
+  (_GLIBCXX_ATOMIC_BUILTINS && _GLIBCXX_INT_LOCK_FREE == 2)
+
 namespace __gnu_cxx _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -46,11 +54,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #ifdef _GLIBCXX_ATOMIC_BUILTINS
   static inline _Atomic_word
   __exchange_and_add(volatile _Atomic_word* __mem, int __val)
-  { return __atomic_fetch_add(__mem, __val, __ATOMIC_ACQ_REL); }
+  {
+    if (__builtin_constant_p(__val) && __val < 0)
+      return __atomic_fetch_sub(__mem, 0 - __val, __ATOMIC_ACQ_REL);
+    else
+      return __atomic_fetch_add(__mem, __val, __ATOMIC_ACQ_REL);
+  }
 
   static inline void
   __atomic_add(volatile _Atomic_word* __mem, int __val)
-  { __atomic_fetch_add(__mem, __val, __ATOMIC_ACQ_REL); }
+  { __exchange_and_add(__mem, __val); }
 #else
   _Atomic_word
   __attribute__ ((__unused__))
@@ -78,7 +91,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __exchange_and_add_dispatch(_Atomic_word* __mem, int __val)
   {
 #ifdef __GTHREADS
-    if (__gthread_active_p())
+    if (_GLIBCXX_ATOMIC_INLINE || __gthread_active_p())
       return __exchange_and_add(__mem, __val);
     else
       return __exchange_and_add_single(__mem, __val);
@@ -92,7 +105,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __atomic_add_dispatch(_Atomic_word* __mem, int __val)
   {
 #ifdef __GTHREADS
-    if (__gthread_active_p())
+    if (_GLIBCXX_ATOMIC_INLINE || __gthread_active_p())
       __atomic_add(__mem, __val);
     else
       __atomic_add_single(__mem, __val);
@@ -104,6 +117,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
 
+#undef _GLIBCXX_INT_LOCK_FREE
+#undef _GLIBCXX_ATOMIC_INLINE
+
 // Even if the CPU doesn't need a memory barrier, we need to ensure
 // that the compiler doesn't reorder memory accesses across the
 // barriers.
-- 
2.9.3
Re: [PATCH] simplify-rtx: Transform (xor (and (xor A B) C) B) with C const
On 11/09/2016 10:13 PM, Segher Boessenkool wrote:
> 	* simplify-rtx.c (simplify_binary_operation_1): Simplify
> 	(xor (and (xor A B) C) B) to (ior (and A C) (and B ~C)) and
> 	(xor (and (xor A B) C) B) to (ior (and A ~C) (and B C)) if C
> 	is a const_int.

I think one of the xors should have A as the second operand.

> +/* If we have (xor (and (xor A B) C) A) with C a constant we can instead
> +   do (ior (and A ~C) (and B C)) which is a machine instruction on some
> +   machines, and also has shorter instruction path length.  */
> +  if (GET_CODE (op0) == AND

The comment doesn't line up with the if/else on my monitor; could be email damage but please check.

Other than that, I think it does qualify as a simplification (or at least an improvement), so OK. Would be nice to check for it with a testcase.

Bernd
[ARM] PR 78253 do not resolve weak ref locally
Hi,

PR 78253 shows that the handling of weak references has changed for ARM with gcc-5.

When r220674 was committed, default_binds_local_p_2 gained a new parameter (weak_dominate), which, when true, implies that a reference to a weak symbol defined locally will be resolved locally, even though it could be overridden by a strong definition in another object file. With r220674, default_binds_local_p forces weak_dominate=true, effectively changing the previous behavior.

The attached patch introduces default_binds_local_p_4, which is a copy of default_binds_local_p_2 but using weak_dominate=false, and updates the ARM target to call default_binds_local_p_4 instead of default_binds_local_p_2.

I ran cross-tests on various arm* configurations with no regression, and checked that the test attached to the original bugzilla now works as expected.

I am not sure why weak_dominate defaults to true, and I couldn't really understand why by reading the threads related to r220674 and the following updates to default_binds_local_p_*, which all deal with other corner cases and do not discuss the weak_dominate parameter. Or should this patch be made more generic?

Thanks,

Christophe

2016-11-09  Christophe Lyon

	PR target/78253
	* output.h (default_binds_local_p_4): New.
	* varasm.c (default_binds_local_p_4): New, use weak_dominate=false.
	* config/arm/linux-elf.h (TARGET_BINDS_LOCAL_P): Define to
	default_binds_local_p_4.

diff --git a/gcc/config/arm/linux-elf.h b/gcc/config/arm/linux-elf.h
index cc17b51..4f32ce8 100644
--- a/gcc/config/arm/linux-elf.h
+++ b/gcc/config/arm/linux-elf.h
@@ -110,7 +110,7 @@
    strong definitions in dependent shared libraries, will resolve
    to COPY relocated symbol in the executable.  See PR65780.  */
 #undef TARGET_BINDS_LOCAL_P
-#define TARGET_BINDS_LOCAL_P default_binds_local_p_2
+#define TARGET_BINDS_LOCAL_P default_binds_local_p_4

/* Define this to be nonzero if static stack checking is supported.
 */
#define STACK_CHECK_STATIC_BUILTIN 1

diff --git a/gcc/output.h b/gcc/output.h
index 0924499..11b5ce5 100644
--- a/gcc/output.h
+++ b/gcc/output.h
@@ -585,6 +585,7 @@
 extern bool default_binds_local_p (const_tree);
 extern bool default_binds_local_p_1 (const_tree, int);
 extern bool default_binds_local_p_2 (const_tree);
 extern bool default_binds_local_p_3 (const_tree, bool, bool, bool, bool);
+extern bool default_binds_local_p_4 (const_tree);
 extern void default_globalize_label (FILE *, const char *);
 extern void default_globalize_decl_name (FILE *, tree);
 extern void default_emit_unwind_label (FILE *, tree, int, int);
diff --git a/gcc/varasm.c b/gcc/varasm.c
index 6a7ffc2..7a3cf99 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -6952,6 +6952,16 @@
 default_binds_local_p_2 (const_tree exp)
 {
   ...
			  !flag_pic);
 }
 
+/* Similar to default_binds_local_p_2, but local weak definition does
+   not imply local resolution (weak_dominate is false).  */
+
+bool
+default_binds_local_p_4 (const_tree exp)
+{
+  return default_binds_local_p_3 (exp, flag_shlib != 0, false, true,
+				  !flag_pic);
+}
+
 bool
 default_binds_local_p_1 (const_tree exp, int shlib)
 {
Review debug message generation
Hi,

Here is a proposal to review how we generate the debug output in case of assertion failure. It removes usage of format_word, which, as a side effect, will fix PR 77459. Should I reference this PR in the ChangeLog?

I introduced a print_literal function to avoid using strlen on literals. I know that gcc optimizes calls to strlen on literals, but in our case we were not directly calling it on the literals.

Tested under Linux x86_64, ok to commit?

	* src/c++11/debug.cc (format_word): Delete.
	(print_literal): New. Replace call to print_word for literals.

François

On 08/11/2016 22:35, fdumont at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77459
>
> --- Comment #8 from François Dumont ---
> Ok, at least it confirms what I thought about builtins. So the problem is
> rather a buggy target. Even if so I'll try to find an alternative approach
> to avoid snprintf usage.

diff --git a/libstdc++-v3/src/c++11/debug.cc b/libstdc++-v3/src/c++11/debug.cc
index d79e43b..2b9c00b 100644
--- a/libstdc++-v3/src/c++11/debug.cc
+++ b/libstdc++-v3/src/c++11/debug.cc
@@ -546,11 +546,6 @@
   using _Error_formatter = __gnu_debug::_Error_formatter;
   using _Parameter = __gnu_debug::_Error_formatter::_Parameter;
 
-  template<typename _Tp>
-    int
-    format_word(char* buf, int n, const char* fmt, _Tp s)
-    { return std::min(__builtin_snprintf(buf, n, fmt, s), n - 1); }
-
   void
   get_max_length(std::size_t& max_length)
   {
@@ -577,6 +572,11 @@
     bool _M_wordwrap;
   };
 
+  template<size_t Length>
+    void
+    print_literal(PrintContext& ctx, const char(&word)[Length])
+    { print_word(ctx, word, Length - 1); }
+
   void
   print_word(PrintContext& ctx, const char* word,
	     std::ptrdiff_t count = -1)
@@ -627,18 +627,19 @@
       }
     else
       {
-	print_word(ctx, "\n", 1);
+	print_literal(ctx, "\n");
	print_word(ctx, word, count);
       }
   }
 
+  template<size_t Length>
   void
   print_type(PrintContext& ctx, const type_info* info,
-	     const char* unknown_name)
+	     const char(&unknown_name)[Length])
   {
     if (!info)
-      print_word(ctx, unknown_name);
+      print_literal(ctx, unknown_name);
     else
       {
	int status;
@@ -784,20 +785,18 @@
   {
     if (type._M_name)
       {
-	const int bufsize = 64;
-	char buf[bufsize];
-	int written
-	  = format_word(buf, bufsize, "\"%s\"", type._M_name);
-	print_word(ctx, buf, written);
+	print_literal(ctx, "\"");
+	print_word(ctx, type._M_name);
+	print_literal(ctx, "\"");
       }
 
-    print_word(ctx, " {\n");
+    print_literal(ctx, " {\n");
 
     if (type._M_type)
       {
-	print_word(ctx, " type = ");
+	print_literal(ctx, " type = ");
	print_type(ctx, type._M_type, "");
-	print_word(ctx, ";\n");
+	print_literal(ctx, ";\n");
       }
   }
 
@@ -809,9 +808,9 @@
 
     if (inst._M_name)
       {
-	int written
-	  = format_word(buf, bufsize, "\"%s\" ", inst._M_name);
-	print_word(ctx, buf, written);
+	print_literal(ctx, "\"");
+	print_word(ctx, inst._M_name);
+	print_literal(ctx, "\" ");
       }
 
     int written
@@ -820,7 +819,7 @@
 
     if (inst._M_type)
       {
-	print_word(ctx, " type = ");
+	print_literal(ctx, " type = ");
	print_type(ctx, inst._M_type, "");
       }
   }
@@ -838,36 +837,36 @@
	{
	  const auto& ite = variant._M_iterator;
 
-	  print_word(ctx, "iterator ");
+	  print_literal(ctx, "iterator ");
	  print_description(ctx, ite);
 
	  if (ite._M_type)
	    {
	      if (ite._M_constness != _Error_formatter::__unknown_constness)
		{
-		  print_word(ctx, " (");
+		  print_literal(ctx, " (");
		  print_field(ctx, param, "constness");
-		  print_word(ctx, " iterator)");
+		  print_literal(ctx, " iterator)");
		}
 
-	      print_word(ctx, ";\n");
+	      print_literal(ctx, ";\n");
	    }
 
	  if (ite._M_state != _Error_formatter::__unknown_state)
	    {
-	      print_word(ctx, " state = ");
+	      print_literal(ctx, " state = ");
	      print_field(ctx, param, "state");
-	      print_word(ctx, ";\n");
+	      print_literal(ctx, ";\n");
	    }
 
	  if (ite._M_sequence)
	    {
-	      print_word(ctx, " references sequence ");
+	      print_literal(ctx, " references sequence ");
	      if (ite._M_seq_type)
		{
-		  print_word(ctx, "with type '");
+		  print_literal(ctx, "with type '");
		  print_field(ctx, param, "seq_type");
-		  print_word(ctx, "' ");
+		  print_literal(ctx, "' ");
		}
 
	      int written
@@ -875,34 +874,34 @@
	      print_word(ctx, buf, written);
	    }
 
-	  print_word(ctx, "}\n", 2);
+	  print_literal(ctx, "}\n");
	}
      break;
 
    case _Parameter::__sequence:
-      print_word(ctx, "sequence ");
+      print_literal(ctx, "sequence ");
      print_description(ctx, variant._M_sequence);
 
      if (variant._M_sequence._M_type)
-	print_word(ctx, ";\n", 2);
+	print_literal(ctx, ";\n");
 
-      print_word(ctx, "}\n", 2);
+      print_literal(ctx, "}\n");
      break;
 
    case _Parameter::__instance:
-      prin