AW: Fix for type confusion bug on microblaze
Yes, you are correct. Tested it and it works as intended. Thanks, Jonas --- a/gcc/config/microblaze/microblaze.h +++ b/gcc/config/microblaze/microblaze.h @@ -695,7 +695,7 @@ do { \ fprintf (STREAM, "\t.align\t%d\n", (LOG)) #define ASM_OUTPUT_SKIP(STREAM,SIZE) \ - fprintf (STREAM, "\t.space\t%lu\n", (SIZE)) + fprintf (STREAM, "\t.space\t" HOST_WIDE_INT_PRINT_UNSIGNED "\n", (SIZE)) #define ASCII_DATA_ASM_OP "\t.ascii\t" #define STRING_ASM_OP "\t.asciz\t" -Ursprüngliche Nachricht- Von: Jeff Law Gesendet: Mittwoch, 11. September 2019 19:33 An: Jonas Pfeil ; gcc-patches@gcc.gnu.org Betreff: Re: Fix for type confusion bug on microblaze On 9/11/19 5:28 AM, Jonas Pfeil wrote: > The Microblaze target confuses unsigned long with HOST_WIDE_INT. > > This works fine for many architectures, but fails on ARM > (HOST_WIDE_INT is 8 bytes, unsigned long is 4 bytes). Leading to print > a uninitialized register instead of the desired number, fix by using the > correct printf-specifier. > > Tested to fix the issue and work with an ARM->MB cross compiler. > > --- a/gcc/config/microblaze/microblaze.h > +++ b/gcc/config/microblaze/microblaze.h > @@ -695,7 +695,7 @@ do { > \ >fprintf (STREAM, "\t.align\t%d\n", (LOG)) > > #define ASM_OUTPUT_SKIP(STREAM,SIZE) \ > - fprintf (STREAM, "\t.space\t%lu\n", (SIZE)) > + fprintf (STREAM, "\t.space\t" HOST_WIDE_INT_PRINT_DEC "\n", (SIZE)) I believe that we should be using HOST_WIDE_INT_PRINT_UNSIGNED, not HOST_WIDE_INT_PRINT_DEC. Can you please verify that works for your cross builds? Thanks, jeff
RE: [PATCH][AArch64] Vectorize MULH(R)S patterns with SVE2 instructions
Hi Richard, Thanks for your comments and advice; I have applied the relevant changes. Regards, Yuliang UPDATE: Added new tests. Built and regression tested on aarch64-none-elf and aarch64-linux-gnu. gcc/ChangeLog: 2019-09-1 Yuliang Wang PR tree-optimization/89386 * config/aarch64/aarch64-sve2.md (mull) (shrnb, shrnt): New SVE2 patterns. (mulhs3): New pattern for MULHRS. * config/aarch64/iterators.md (UNSPEC_SMULLB, UNSPEC_SMULLT) (UNSPEC_UMULLB, UNSPEC_UMULLT, UNSPEC_SHRNB, UNSPEC_SHRNT) (UNSPEC_RSHRNB, UNSPEC_RSHRNT, UNSPEC_SMULHS, UNSPEC_SMULHRS) UNSPEC_UMULHS, UNSPEC_UMULHRS): New unspecs. (MULLBT, SHRNB, SHRNT, MULHRS): New int iterators. (su, r): Handle the unspecs above. (bt): New int attribute. * internal-fn.def (IFN_MULHS, IFN_MULHRS): New internal functions. * internal-fn.c (first_commutative_argument): Commutativity info for above. * optabs.def (smulhs_optab, smulhrs_optab, umulhs_optab, umulhrs_optab): New optabs. * doc/md.texi (smulhs$var{m3}, umulhs$var{m3}) (smulhrs$var{m3}, umulhrs$var{m3}): Documentation for the above. * tree-vect-patterns.c (vect_recog_mulhs_pattern): New pattern function. (vect_vect_recog_func_ptrs): Add it. * testsuite/gcc.target/aarch64/sve2/mulhrs_1.c: New test. * testsuite/gcc.dg/vect/vect-mulhrs-1.c: As above. * testsuite/gcc.dg/vect/vect-mulhrs-2.c: As above. * testsuite/gcc.dg/vect/vect-mulhrs-3.c: As above. * testsuite/gcc.dg/vect/vect-mulhrs-4.c: As above. * doc/sourcebuild.texi (vect_mulhrs_hi): Document new target selector. * testsuite/lib/target-supports.exp (check_effective_target_vect_mulhrs_hi): Return true for AArch64 without SVE2. -Original Message- From: Richard Sandiford Sent: 30 August 2019 12:49 To: Yuliang Wang Cc: gcc-patches@gcc.gnu.org; nd Subject: Re: [PATCH][AArch64] Vectorize MULH(R)S patterns with SVE2 instructions Thanks for doing this. The patch looks good, so this review is mostly a list of very minor formatting comments, sorry. Yuliang Wang writes: > 2019-08-22 Yuliang Wang > Please add a line here pointing at the PR: PR tree-optimization/89386 The commit hooks pick this up automatically and link the commit to the bugzilla ticket. (The PR was filed for SSSE3, but the target-independent bits are still needed there.) > * config/aarch64/aarch64-sve2.md: support for SVE2 > instructions [S/U]MULL[T/B] + [R]SHRN[T/B] and MULHRS pattern variants Unfortunately the changelog format is pretty strict here. Lines have to be 80 chars or shorter, indented by tabs, and each pattern, function, variable or type needs to be listed individually regardless of how useful that seems. So I think this should be something like: * config/aarch64/aarch64-sve2.md (mull) (shrnb, shrnt, mulhs3): New patterns. (See below for why the "*" patterns aren't listed.) > * config/aarch64/iterators.md: iterators and attributes for > above Here too the iterators need to be listed: * config/aarch64/iterators.md (UNSPEC_SMULLB, UNSPEC_SMULLT) (UNSPEC_UMULLB, UNSPEC_UMULLT, UNSPEC_SHRNB, UNSPEC_SHRNT) (UNSPEC_RSHRNB, UNSPEC_RSHRNT, UNSPEC_SMULHS, UNSPEC_SMULHRS) UNSPEC_UMULHS, UNSPEC_UMULHRS): New unspecs. (MULLBT, SHRNB, SHRNT, MULHRS): New int iterators. (su, r): Handle the unspecs above. (bt): New int attribute. > * internal-fn.def: internal functions for MULH[R]S patterns * internal-fn.def (IFN_MULHS, IFN_MULHRS): New internal functions. > * optabs.def: optabs definitions for above and sign variants * optabs.def (smulhs_optab, smulhrs_optab, umulhs_optab) (umulhrs_optab): New optabs. > * tree-vect-patterns.c (vect_recog_multhi_pattern): pattern > recognition function for MULHRS * tree-vect-patterns.c (vect_recog_multhi_pattern): New function. (vect_vect_recog_func_ptrs): Add it. > * gcc.target/aarch64/sve2/mulhrs_1.c: new test for all > variants Just: * gcc.target/aarch64/sve2/mulhrs_1.c: New test. (Sorry that this is so finicky. I'm just the messenger. :-)) > diff --git a/gcc/config/aarch64/aarch64-sve2.md > b/gcc/config/aarch64/aarch64-sve2.md > index > 2334e5a7b7dc524bbd1f4d0a48ba5cd991970118..51783604ad8f83eb1d070c133009 > ed41a2a0252d 100644 > --- a/gcc/config/aarch64/aarch64-sve2.md > +++ b/gcc/config/aarch64/aarch64-sve2.md > @@ -63,3 +63,89 @@ > movprfx\t%0, %2\;h\t%0., %1/m, %0., > %3." >[(set_attr "movprfx" "*,yes")] > ) > + > +;; Multiply long top / bottom Very minor, but: GCC comments traditionally end with "." even if they're not full sentences. > +(define_insn "*mull" > + [(set (match_operand: 0 "register_operand" "=w") > +(unspec: [(match_operand:SVE_BHSI 1 "register_operand" "w") > +
Re: [10/32] Remove global call sets: combine.c
Segher Boessenkool writes: > On Wed, Sep 11, 2019 at 08:08:38PM +0100, Richard Sandiford wrote: >>hard_reg_set_iterator hrsi; >> - EXECUTE_IF_SET_IN_HARD_REG_SET (regs_invalidated_by_call, 0, i, hrsi) >> + EXECUTE_IF_SET_IN_HARD_REG_SET (abi.full_and_partial_reg_clobbers (), >> + 0, i, hrsi) > > So "abi" in that means calls? "abi" is the interface of the callee function, taking things like function attributes and -fipa-ra into account. The register sets are describing what the callee does rather than what calls to it do. E.g. on targets that allow linker stubs to be inserted between calls, the scratch registers reserved for linker stubs are still call-clobbered, even if the target of the call doesn't use them. (Those call clobbers are represented separately, at least when TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS is true. When it's false we don't use -fipa-ra information at all.) > It is not such a great name like that. Since its children are > very_long_names, it doesn't need to be only three chars itself, > either? OK, what name would you prefer? Richard
Re: [PATCH] Fix PR 91708
On Wed, 11 Sep 2019, Bernd Edlinger wrote: > On 9/11/19 8:30 PM, Richard Biener wrote: > > On September 11, 2019 7:41:22 PM GMT+02:00, Bernd Edlinger > > wrote: > >> On 9/11/19 6:08 PM, Jeff Law wrote: > >>> On 9/11/19 7:49 AM, Bernd Edlinger wrote: > On 9/11/19 9:23 AM, Richard Biener wrote: > > On Tue, 10 Sep 2019, Bernd Edlinger wrote: > > > >> Hi! > >> > >> This ICE happens when compiling real_nextafter in real.c. > >> CSE sees this: > >> > >> (insn 179 178 180 11 (set (reg:SI 319) > >> (reg/v/f:SI 273 [ rD.73757 ])) > >> "../../gcc-trunk-1/gcc/real.c":120:10 643 {*thumb2_movsi_vfp} > >> (nil)) > >> [...] > >> (insn 181 180 182 11 (set (mem:SI (reg:SI 319) [0 MEM > >> [(voidD.73 *)r_77(D)]+0 S4 A8]) > >> (unspec:SI [ > >> (reg:SI 320) > >> ] UNSPEC_UNALIGNED_STORE)) > >> "../../gcc-trunk-1/gcc/real.c":120:10 129 {unaligned_storesi} > >> (nil)) > >> [...] > >> (insn 186 185 187 11 (set (mem:SI (plus:SI (reg/v/f:SI 273 [ > >> rD.73757 ]) > >> (const_int 20 [0x14])) [0 MEM > >> [(voidD.73 *)r_77(D)]+20 S4 A8]) > >> (unspec:SI [ > >> (reg:SI 320) > >> ] UNSPEC_UNALIGNED_STORE)) > >> "../../gcc-trunk-1/gcc/real.c":120:10 129 {unaligned_storesi} > >> (expr_list:REG_DEAD (reg:SI 320) > >> (expr_list:REG_DEAD (reg/f:SI 319 [ rD.73757 ]) > >> (nil > >> [...] > >> (insn 234 233 235 11 (set (reg:SI 340) > >> (mem:SI (reg/v/f:SI 273 [ rD.73757 ]) [52 MEM >> int> [(struct real_valueD.28367 *)r_77(D)]+0 S4 A32])) > >> "../../gcc-trunk-1/gcc/real.c":5185:9 643 {*thumb2_movsi_vfp} > >> (nil)) > >> > >> > >> ... and transforms insn 234 in an invalid insn: > >> > >> > >> (insn 234 233 235 11 (set (reg:SI 340 [ MEM > >> [(struct real_valueD.28367 *)r_77(D)] ]) > >> (mem:SI (plus:SI (reg/v/f:SI 273 [ rD.73757 ]) > >> (const_int 20 [0x14])) [0 MEM > >> [(voidD.73 *)r_77(D)]+20 S4 A8])) "../../gcc-trunk-1/gcc/real.c":5185:9 > >> 643 {*thumb2_movsi_vfp} > >> (nil)) > >> > >> which triggers the assertion in the arm back-end, because the MEM > >> is not aligned. > >> > >> To fix that I changed exp_equiv_p to consider MEMs with different > >> MEM_ALIGN or > >> ALIAS_SET as different. > > > > I think the appropriate fix is to retain the mem_attrs of the > >> original > > MEM since obviously the replacement does not change those? It's a > >> mere > > change of the MEMs address but implementation-wise CSE replaces the > >> whole > > MEM? > > > > For CSE exp_equiv_p simply compares the addresses, only if for_gcse > > we do further checks like mem_attrs_eq_p. IMHO that is correct. > > > > I'm not sure what CSE does above, that is, what does it think it > >> does? > > It doesn't replace the loaded value, it seems to do sth else which > > is odd for CSE (replaces a REG with a PLUS - but why?) > > > > What I think CSE is thinking there is this: > > insn 181 stores the value of (reg:SI 320) at (mem:SI (reg:SI 319) [0 > >> MEM [(voidD.73 *)r_77(D)]+0 S4 A8] > that is the same address as where insn 234 reads the value back but > >> there we know it is aligned. > > insn 186 stores the same value at (mem:SI (plus:SI (reg/v/f:SI 273 [ > >> rD.73757 ]) (const_int 20 [0x14])) [0 MEM [(voidD.73 > >> *)r_77(D)]+20 S4 A8] > > But since both reg:SI 320 is dead and reg:SI 319 is dead as well, > >> when > insn 234 processed, it replaces one mem with another mem that has > >> the same > value. > >>> Just to make sure I understand. Are you saying the addresses for the > >>> MEMs are equal or the contents of the memory location are equal. > >>> > >> > >> The MEMs all have different addresses, but the same value, they are > >>from a > >> memset or something: > >> > >> (gdb) call dump_class(elt) > >> Equivalence chain for (mem:SI (reg/v/f:SI 273 [ r ]) [0 MEM > >> [(void *)r_77(D)]+0 S4 A8]): > >> (unspec:SI [ > >>(reg:SI 320) > >>] UNSPEC_UNALIGNED_STORE) > >> (mem:SI (plus:SI (reg/v/f:SI 273 [ r ]) > >> (const_int 20 [0x14])) [0 MEM [(void *)r_77(D)]+20 S4 A8]) > >> (mem:SI (plus:SI (reg/v/f:SI 273 [ r ]) > >> (const_int 16 [0x10])) [0 MEM [(void *)r_77(D)]+16 S4 A8]) > >> (mem:SI (plus:SI (reg/v/f:SI 273 [ r ]) > >> (const_int 12 [0xc])) [0 MEM [(void *)r_77(D)]+12 S4 A8]) > >> (mem:SI (plus:SI (reg/v/f:SI 273 [ r ]) > >> (const_int 8 [0x8])) [0 MEM [(void *)r_77(D)]+8 S4 A8]) > >> (mem:SI (plus:SI (reg/v/f:SI 273 [ r ]) > >> (const_int 4 [0x4])) [0 MEM [(void *)r_77(D)]+4 S4 A8]) > >> (mem:SI (reg/v/f:SI 273 [ r ]) [0 MEM [(void *)r_77(D)]+0 > >> S4 A8]) > >> > >> > >> The insn 234, uses the same address as the last in the list > >> (mem:SI (reg/v/f:SI 273 [ r
Re: [PATCH v6 3/3] PR80791 Consider doloop cmp use in ivopts
On Wed, 11 Sep 2019, Kewen.Lin wrote: > Hi, > > Sorry for the late update. I've updated the words of target hooks part. > > Could someone help to review it? Thanks in advance! > > By the way, as previous emails in this thread, Bin has approved the IVOPTs > part, while Segher has approved the rs6000 part. The target hooks part is OK. I guess we'll have to extend it eventually in case other targets want to make use of it. Thanks, Richard. > > Thanks, > Kewen > > - > > gcc/ChangeLog > > 2019-09-11 Kewen Lin > > PR middle-end/80791 > * config/rs6000/rs6000.c (TARGET_HAVE_COUNT_REG_DECR_P): New macro. > (TARGET_DOLOOP_COST_FOR_GENERIC): Likewise. > (TARGET_DOLOOP_COST_FOR_ADDRESS): Likewise. > * target.def (have_count_reg_decr_p): New hook. > (doloop_cost_for_generic): Likewise. > (doloop_cost_for_address): Likewise. > * doc/tm.texi.in (TARGET_HAVE_COUNT_REG_DECR_P): Likewise. > (TARGET_DOLOOP_COST_FOR_GENERIC): Likewise. > (TARGET_DOLOOP_COST_FOR_ADDRESS): Likewise. > * doc/tm.texi: Regenerate. > * tree-ssa-loop-ivopts.c (comp_cost::operator+=): Consider infinite cost > addend. > (record_group): Init doloop_p. > (add_candidate_1): Add optional argument doloop, change the handlings > accordingly. > (add_candidate): Likewise. > (generic_predict_doloop_p): Update attribute. > (force_expr_to_var_cost): Add costing for expressions COND_EXPR/LT_EXPR/ > LE_EXPR/GT_EXPR/GE_EXPR/EQ_EXPR/NE_EXPR/UNORDERED_EXPR/ORDERED_EXPR/ > UNLT_EXPR/UNLE_EXPR/UNGT_EXPR/UNGE_EXPR/UNEQ_EXPR/LTGT_EXPR/MAX_EXPR/ > MIN_EXPR. > (get_computation_cost): Update for doloop IV cand extra cost. > (determine_group_iv_cost_cond): Update for doloop IV cand. > (determine_iv_cost): Likewise. > (ivopts_estimate_reg_pressure): Likewise. > (may_eliminate_iv): Update handlings for doloop IV cand. > (add_iv_candidate_for_doloop): New function. > (find_iv_candidates): Call function add_iv_candidate_for_doloop. > (iv_ca_set_no_cp): Update for doloop IV cand. > (iv_ca_set_cp): Likewise. > (iv_ca_dump): Dump register cost. > (find_doloop_use): New function. > (analyze_and_mark_doloop_use): Likewise. > (tree_ssa_iv_optimize_loop): Call function analyze_and_mark_doloop_use. > > gcc/testsuite/ChangeLog > > 2019-09-11 Kewen Lin > > PR middle-end/80791 > * gcc.dg/tree-ssa/ivopts-3.c: Adjust for doloop change. > * gcc.dg/tree-ssa/ivopts-lt.c: Likewise. > * gcc.dg/tree-ssa/pr32044.c: Likewise. > > > on 2019/8/23 下午6:18, Segher Boessenkool wrote: > > Hi! > > > > On Fri, Aug 23, 2019 at 05:43:32PM +0800, Bin.Cheng wrote: > >> On Fri, Aug 23, 2019 at 4:27 PM Kewen.Lin wrote: > >> Not sure if non-ivopts parts are already approved? If so, the patch > >> is okay with above issues addressed. > > > > The rs6000 part is fine. The target.def entries need some spell check > > and copy-editing, but are obvious and trivial otherwise, and/or you can > > approve it as ivopts maintainer. > > > >> Thanks very much for your time! > > > > And thank you as well Bin :-) > > > > > > Segher > > > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 247165 (AG München)
Re: [Patch, fortran] PR91717 - ICE on concatenating deferred-length character and character literal
Hi Steve, I have inserted a my_core%msg = '"" so that it is initialised. The reallocation on assignment explicitly deals with an unallocated entity or one of its allocatable components by allocation, rather than reallocation. I realise that my explanation of the patch might be a bit sparse. The ultimate caller is frontend_passes.c(realloc_string_callback), which looks for overlap even for scalar cases. The ICE came about because there are no array references in the expressions being compared. Flagging that there are identical component chains in this case and returning 1 from gfc_dep_resolver covers this case. OK to commit? Cheers Paul realloc_string_callback) On Thu, 12 Sep 2019 at 00:05, Steve Kargl wrote: > > On Wed, Sep 11, 2019 at 11:27:56PM +0100, Paul Richard Thomas wrote: > > Hi Steve, > > > > Being an allocatable component, this code appears on entry into scope: > > > > my_core.msg = 0B; > > my_core._msg_length = 0; > > { > > > > Thus, according to the standard, my_core%msg is defined. > > > > I'll politely defer to the Fortran standard. > > 5.4.10 Allocatable variables > > The allocation status of an allocatable variable is either allocated > or unallocated. An allocatable variable becomes allocated as described > in 9.7.1.3. It becomes unallocated as described in 9.7.3.2. > > An unallocated allocatable variable shall not be referenced or defined. > > 7.5.4.6 Default initialization for components > > Allocatable components are always initialized to unallocated. > > In your testcase, you have an unallocatable allocatable entity > referenced on the RHS in the first line that involves my_core%msg. > From the Fortran standard's point of view, I believe the code is > nonconforming. > > type core > character (len=:), allocatable :: msg > end type > > type(core) :: my_core > > print *, allocated(my_core%msg) > > ! Comment out to avoid ICE. > ! my_core%msg on RHS is unallocated in next line. > ! my_core%msg = my_core%msg//"my message is: > ! my_core%msg = my_core%msg//"Hello!" > ! > ! if (my_core%msg .ne. "my message is: Hello!") stop 1 > end > > % gfcx -o z a.f90 && ./z > F > > > detected for this assignment even though no array references are > > involved. There is certainly the need for a temporary, which the patch > > generates. > > > > Your patch may fix the ICE, but if the code compiles and > execute. You are getting the "right" answer fortuitiouly. > > Of course, I could be wrong. > > -- > Steve -- "If you can't explain it simply, you don't understand it well enough" - Albert Einstein
Re: C++ PATCH for c++/91740 - ICE with constexpr call and ?: in ARRAY_REF
Hi, On 11/09/19 23:15, Marek Polacek wrote: --- gcc/cp/pt.c +++ gcc/cp/pt.c @@ -26709,7 +26709,7 @@ build_non_dependent_expr (tree expr) if (TREE_CODE (expr) == COND_EXPR) return build3 (COND_EXPR, TREE_TYPE (expr), - TREE_OPERAND (expr, 0), + build_non_dependent_expr (TREE_OPERAND (expr, 0)), (TREE_OPERAND (expr, 1) ? build_non_dependent_expr (TREE_OPERAND (expr, 1)) : build_non_dependent_expr (TREE_OPERAND (expr, 0))), Looks like we would end up unnecessarily calling build_non_dependent_expr three times instead of two: probably is very cheap, probably the code is cleaner this way but I'm a little annoyed at this anyway, for the record ;) Cheers, Paolo.
Re: C++ PATCH for c++/91740 - ICE with constexpr call and ?: in ARRAY_REF
Hi again, On 12/09/19 11:03, Paolo Carlini wrote: Hi, On 11/09/19 23:15, Marek Polacek wrote: --- gcc/cp/pt.c +++ gcc/cp/pt.c @@ -26709,7 +26709,7 @@ build_non_dependent_expr (tree expr) if (TREE_CODE (expr) == COND_EXPR) return build3 (COND_EXPR, TREE_TYPE (expr), - TREE_OPERAND (expr, 0), + build_non_dependent_expr (TREE_OPERAND (expr, 0)), (TREE_OPERAND (expr, 1) ? build_non_dependent_expr (TREE_OPERAND (expr, 1)) : build_non_dependent_expr (TREE_OPERAND (expr, 0))), Looks like we would end up unnecessarily calling build_non_dependent_expr three times instead of two: probably is very cheap, probably the code is cleaner this way but I'm a little annoyed at this anyway, for the record ;) Sorry, I misread the code: normally TREE_OPERAND (expr, 1) isn't NULL_TREE thus we are fine. Paolo.
[Patch, fortran] PR91686 - ICE in gimplify:2554
Committed as 'obvious' in revision 275681: 2019-09-12 Paul Thomas PR fortran/91686 Backport from mainline * trans-expr.c (gfc_trans_assignment_1): Copy and paste section handling the rse.pre block from mainline. 2019-09-12 Paul Thomas PR fortran/91686 * gfortran.dg/pr91686.f90 : New test. The rse.pre block was being added outside the loop so that a temporary index was being used before it was declared and the loop index was not defined. Paul
[PATCH] DWARF array bounds missing from C++ array definitions
A variable redeclaration or definition that provides additional type information for it, e.g. outermost array bounds, is not reflected in the debug information for the variable. With this patch, the debug info of the variable specialization gets a type attribute with the adjusted type. This patch affects mostly only array bounds. However, when the symbolic type used in a declaration and in a definition are different, although they refer to the same type, debug information will end up (correctly?) naming different symbolic types in the specification and the definition. Also, when a readonly declaration of an array loses the readonly flag at the definition because of the initializer, the definition may end up referencing a type while the specification refers to a const-qualified version of that type. If the type of the variable is already const-qualified, e.g. an array of a const type, the difference is meaningless. Regstrapped on x86_64-linux-gnu. Ok to install? for gcc/ChangeLog * dwarf2out.c (completing_type_p): New. (gen_variable_die): Use it. for gcc/testsuite/ChangeLog * gcc.dg/debug/dwarf2/array-0.c: New. * gcc.dg/debug/dwarf2/array-1.c: New. * gcc.dg/debug/dwarf2/array-2.c: New. * gcc.dg/debug/dwarf2/array-3.c: New. * g++.dg/debug/dwarf2/array-0.C: New. * g++.dg/debug/dwarf2/array-1.C: New. * g++.dg/debug/dwarf2/array-2.C: New. Based on libstdc++-v3's src/c++98/pool_allocator.cc:__pool_alloc_base::_S_heap_size. * g++.dg/debug/dwarf2/array-3.C: New. Based on gcc's config/i386/i386-features.c:xlogue_layout::s_instances. * g++.dg/debug/dwarf2/array-4.C: New. --- gcc/dwarf2out.c | 31 ++- gcc/testsuite/g++.dg/debug/dwarf2/array-0.C | 13 +++ gcc/testsuite/g++.dg/debug/dwarf2/array-1.C | 13 +++ gcc/testsuite/g++.dg/debug/dwarf2/array-2.C | 15 + gcc/testsuite/g++.dg/debug/dwarf2/array-3.C | 20 + gcc/testsuite/g++.dg/debug/dwarf2/array-4.C | 16 ++ gcc/testsuite/gcc.dg/debug/dwarf2/array-0.c | 10 + gcc/testsuite/gcc.dg/debug/dwarf2/array-1.c | 10 + gcc/testsuite/gcc.dg/debug/dwarf2/array-2.c |8 +++ gcc/testsuite/gcc.dg/debug/dwarf2/array-3.c |8 +++ 10 files changed, 143 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/debug/dwarf2/array-0.C create mode 100644 gcc/testsuite/g++.dg/debug/dwarf2/array-1.C create mode 100644 gcc/testsuite/g++.dg/debug/dwarf2/array-2.C create mode 100644 gcc/testsuite/g++.dg/debug/dwarf2/array-3.C create mode 100644 gcc/testsuite/g++.dg/debug/dwarf2/array-4.C create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/array-0.c create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/array-1.c create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/array-2.c create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/array-3.c diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index c359c2d4af981..ad533c14d2480 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -23687,6 +23687,33 @@ local_function_static (tree decl) && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL; } +/* Return true iff DECL completes (overrides) the type of OLD_DIE + within CONTEXT_DIE. */ + +static bool +completing_type_p (tree decl, dw_die_ref old_die, dw_die_ref context_die) +{ + tree type = TREE_TYPE (decl); + int cv_quals; + + if (decl_by_reference_p (decl)) +{ + type = TREE_TYPE (type); + cv_quals = TYPE_UNQUALIFIED; +} + else +cv_quals = decl_quals (decl); + + dw_die_ref type_die = modified_type_die (type, + cv_quals | TYPE_QUALS (type), + false, + context_die); + + dw_die_ref old_type_die = get_AT_ref (old_die, DW_AT_type); + + return type_die != old_type_die; +} + /* Generate a DIE to represent a declared data object. Either DECL or ORIGIN must be non-null. */ @@ -23939,7 +23966,9 @@ gen_variable_die (tree decl, tree origin, dw_die_ref context_die) && !DECL_ABSTRACT_P (decl_or_origin) && variably_modified_type_p (TREE_TYPE (decl_or_origin), decl_function_context - (decl_or_origin + (decl_or_origin))) + || (old_die && specialization_p + && completing_type_p (decl_or_origin, old_die, context_die))) { tree type = TREE_TYPE (decl_or_origin); diff --git a/gcc/testsuite/g++.dg/debug/dwarf2/array-0.C b/gcc/testsuite/g++.dg/debug/dwarf2/array-0.C new file mode 100644 index 0..a3458bd0d32a4 --- /dev/null +++ b/gcc/testsuite/g++.dg/debug/dwarf2/array-0.C @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-gdwarf-2 -dA" } */ +struct S +{ + static int arra
Re: [PATCH][ARM] Enable code hoisting with -Os (PR80155)
On Wed, Sep 11, 2019 at 5:50 PM Wilco Dijkstra wrote: > > While code hoisting generally improves codesize, it can affect performance > negatively. Benchmarking shows it doesn't help SPEC and negatively affects > embedded benchmarks, so only enable code hoisting with -Os on Arm. > > Bootstrap OK, OK for commit? Do we document target specific deviations from "default" behavior somewhere? invoke.texi has @item -fcode-hoisting @opindex fcode-hoisting Perform code hoisting. Code hoisting tries to move the evaluation of expressions executed on all paths to the function exit as early as possible. This is especially useful as a code size optimization, but it often helps for code speed as well. This flag is enabled by default at @option{-O2} and higher. > ChangeLog: > 2019-09-11 Wilco Dijkstra > > > PR tree-optimization/80155 > * common/config/arm/arm-common.c (arm_option_optimization_table): > Enable -fcode-hoisting with -Os. > > -- > diff --git a/gcc/common/config/arm/arm-common.c > b/gcc/common/config/arm/arm-common.c > index > 41a920f6dc96833e778faa8dbcc19beac483734c..b0d5fb300bf01acc1fb6f4631635f8a1bfe6441c > 100644 > --- a/gcc/common/config/arm/arm-common.c > +++ b/gcc/common/config/arm/arm-common.c > @@ -39,6 +39,8 @@ static const struct default_options > arm_option_optimization_table[] = > /* Enable section anchors by default at -O1 or higher. */ > { OPT_LEVELS_1_PLUS, OPT_fsection_anchors, NULL, 1 }, > { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 }, > +/* Enable code hoisting only with -Os. */ > +{ OPT_LEVELS_SIZE, OPT_fcode_hoisting, NULL, 1 }, > { OPT_LEVELS_NONE, 0, NULL, 0 } >}; > >
Re: [gofrontend-dev] Re: libgo: Update to Go 1.13beta1 release
On Sep 11 2019, Ian Lance Taylor wrote: > On Tue, Sep 10, 2019 at 11:54 PM Andreas Schwab wrote: >> >> On Sep 10 2019, Ian Lance Taylor wrote: >> >> > On Mon, Sep 9, 2019 at 2:00 PM Andreas Schwab >> > wrote: >> >> >> >> ../../../libgo/go/golang.org/x/sys/cpu/cpu.go:17:30: error: reference to >> >> undefined name ‘cacheLineSize’ >> >>17 | type CacheLinePad struct{ _ [cacheLineSize]byte } >> >> | ^ >> >> ../../../libgo/go/golang.org/x/sys/cpu/cpu_linux.go:56:2: error: >> >> reference to undefined name ‘doinit’ >> >>56 | doinit() >> >> | ^ >> > >> > Thanks, should be fixed by SVN revision 275611, just committed. >> >> Nope. >> >> ../../../libgo/go/golang.org/x/sys/cpu/cpu_linux.go:56:2: error: reference >> to undefined name 'doinit' >>56 | doinit() >> | ^ >> make[4]: *** [golang.org/x/sys/cpu.lo] Error 1 > > Bother, sorry. I just committed this patch. Perhaps this will fix it. It does. Thanks, Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different."
Re: [PATCH][AArch64] Vectorize MULH(R)S patterns with SVE2 instructions
Yuliang Wang writes: > Hi Richard, > > Thanks for your comments and advice; I have applied the relevant changes. > > Regards, > Yuliang > > > UPDATE: > > Added new tests. Built and regression tested on aarch64-none-elf and > aarch64-linux-gnu. > > gcc/ChangeLog: > > 2019-09-1 Yuliang Wang > > PR tree-optimization/89386 > > * config/aarch64/aarch64-sve2.md (mull) > (shrnb, shrnt): New SVE2 patterns. > (mulhs3): New pattern for MULHRS. > * config/aarch64/iterators.md (UNSPEC_SMULLB, UNSPEC_SMULLT) > (UNSPEC_UMULLB, UNSPEC_UMULLT, UNSPEC_SHRNB, UNSPEC_SHRNT) > (UNSPEC_RSHRNB, UNSPEC_RSHRNT, UNSPEC_SMULHS, UNSPEC_SMULHRS) > UNSPEC_UMULHS, UNSPEC_UMULHRS): New unspecs. > (MULLBT, SHRNB, SHRNT, MULHRS): New int iterators. > (su, r): Handle the unspecs above. > (bt): New int attribute. > * internal-fn.def (IFN_MULHS, IFN_MULHRS): New internal functions. > * internal-fn.c (first_commutative_argument): Commutativity info for > above. > * optabs.def (smulhs_optab, smulhrs_optab, umulhs_optab, umulhrs_optab): > New optabs. > * doc/md.texi (smulhs$var{m3}, umulhs$var{m3}) > (smulhrs$var{m3}, umulhrs$var{m3}): Documentation for the above. > * tree-vect-patterns.c (vect_recog_mulhs_pattern): New pattern function. > (vect_vect_recog_func_ptrs): Add it. > * testsuite/gcc.target/aarch64/sve2/mulhrs_1.c: New test. > * testsuite/gcc.dg/vect/vect-mulhrs-1.c: As above. > * testsuite/gcc.dg/vect/vect-mulhrs-2.c: As above. > * testsuite/gcc.dg/vect/vect-mulhrs-3.c: As above. > * testsuite/gcc.dg/vect/vect-mulhrs-4.c: As above. > * doc/sourcebuild.texi (vect_mulhrs_hi): Document new target selector. > * testsuite/lib/target-supports.exp > (check_effective_target_vect_mulhrs_hi): > Return true for AArch64 without SVE2. (with SVE2) Thanks for doing this. Applied with some very minor whitespace tweaks as r275682. Richard
Re: [RFC PATCH] Add inlining growth bias flag
On Fri, Sep 6, 2019 at 12:59 PM Graham Markall wrote: > > This patch is an RFC to invite comments as to whether the approach > to solving the problem is a suitable one for upstream GCC, or whether > there are alternative approaches that would be recommended. > > Motivation > -- > > We have observed that in some cases the estimation of callee growth for > inlining particular functions can be tuned for better overall code size > with particular programs on particular targets. Although modification of > the heuristics to make a general improvement is a difficult problem to > tackle, simply biasing the growth by a fixed amount can lead to > improvements in code size within the context of a particular program. > > This has first been tested on a proprietary program, where setting the > growth bias to -2 resulted in a saving of 1.35% in the text section size > (62396 bytes as opposed to 63252 bytes). Using the Embench suite ( > https://www.embench.org/ ) for a riscv32 bare-metal target with -Os also > shows that adjusting the inline growth estimate carefully can also > reduce code size. Results presented here are percentages relative to the > Embench reference platform (which is the standard way of presenting > Embench results) for values of the bias from -2 to 2: > > Benchmark -2 -1 0 1 2 > aha-mont64 99.05 99.05 99.05 99.05 99.05 > crc32 100.00 100.00 100.00 100.00 100.00 > cubic94.66 94.66 94.66 94.66 94.66 > edn 100.00 100.00 100.00 100.00 100.00 > huffbench99.88 99.88 99.88 99.88 99.88 > matmult-int 100.00 100.00 100.00 102.86 102.86 > minver 97.96 97.96 97.96 106.88 106.88 > nbody98.87 98.87 98.87 98.87 98.87 > nettle-aes 99.31 99.10 99.10 99.10 99.10 > nettle-sha25699.89 99.89 99.89 99.89 99.89 > nsichneu100.00 100.00 100.00 100.00 100.00 > picojpeg102.56 102.54 99.73 99.38 99.28 > qrduino 99.70 99.70 99.90 99.90 99.90 > sglib-combined 95.52 100.00 100.00 100.43 101.12 > slre100.66 100.66 99.09 98.85 98.85 > st 96.36 96.36 96.36 96.82 96.82 > statemate 100.00 100.00 100.00 100.00 100.00 > ud 100.00 100.00 100.00 100.00 100.00 > wikisort 99.48 99.48 99.48 99.48 99.48 > Mean 99.14 99.36 99.15 99.77 99.80 > > In most cases, leaving the bias at 0 (unmodified) produces the smallest > code. However, there are some cases where an alternative value prevails, > The "Best Diff" column shows the reduction in size compared to leaving > the bias at 0, for cases where changing it yielded an improvement: > > Benchmark BestWorst Best diff > aha-mont64 0 0 > crc32 0 0 > cubic 0 0 > edn 0 0 > huffbench 0 0 > matmult-int 0 1 / 2 > minver 0 1 / 2 > nbody 0 0 > nettle-aes 0 -2 > nettle-sha256 0 0 > nsichneu0 0 > picojpeg2 -2 -0.45% > qrduino -1 /-2 0 -0.20% > sglib-combined -2 1 -4.48% > slre1 / 2 -1 / -2 -0.24% > st 0 1 / 2 > statemate 0 0 > ud 0 0 > wikisort0 0 > > > In summary, for this test setup: > > - In most cases, leaving the growth bias at 0 is optimal. > - Biasing the growth estimate positively may either increase or decrease > code size. > - Biasing the estimate negatively may also either increase or decrease > code size. > - In a small number of cases, biasing the growth estimate improves code > size (up to 4.48% smaller for sglib-combined). > > > Patch > - > > The growth bias can be set with a flag: > > -finline-growth-bias= > > which controls the bias (an increase or decrease) of the inline growth > in ipa-inline.c. In cases where the bias would result in a negative > function size, we clip the growth estimate so that adding the growth to > the size of the function results in a size of 0, by setting the growth > to the negative of the function size. > > There is not a great deal of validation of the argument - if a > non-integer is passed as the parameter (e.g. -finline-growth-bias=xyz), > it will be as if the parameter were 0. More validation could be added if > necessary. It seemed to me that GCC's infrastructure doesn't seem to > anticipate option values / parameters that could contain negative > values. For parameters, -1 seemed to represent an error and could result > in an ICE (-2 etc. pass through OK though). For options, I looked at the > UInteger and Host_Wide_Int numeric types, but they both expect a > positive integer. I did consider extending this with an Integer type > that could accept positive and negative integers, (e.g. starting with > augmenting switch_bit_fields in opt-
Re: [PATCH] DWARF array bounds missing from C++ array definitions
On Thu, Sep 12, 2019 at 11:24 AM Alexandre Oliva wrote: > > A variable redeclaration or definition that provides additional type > information for it, e.g. outermost array bounds, is not reflected in > the debug information for the variable. With this patch, the debug > info of the variable specialization gets a type attribute with the > adjusted type. > > This patch affects mostly only array bounds. However, when the > symbolic type used in a declaration and in a definition are different, > although they refer to the same type, debug information will end up > (correctly?) naming different symbolic types in the specification and > the definition. Also, when a readonly declaration of an array loses > the readonly flag at the definition because of the initializer, the > definition may end up referencing a type while the specification > refers to a const-qualified version of that type. If the type of the > variable is already const-qualified, e.g. an array of a const type, > the difference is meaningless. > > Regstrapped on x86_64-linux-gnu. Ok to install? Is this PR91507? How do you get around the gdb issue? Thanks, Richard. > > for gcc/ChangeLog > > * dwarf2out.c (completing_type_p): New. > (gen_variable_die): Use it. > > for gcc/testsuite/ChangeLog > > * gcc.dg/debug/dwarf2/array-0.c: New. > * gcc.dg/debug/dwarf2/array-1.c: New. > * gcc.dg/debug/dwarf2/array-2.c: New. > * gcc.dg/debug/dwarf2/array-3.c: New. > * g++.dg/debug/dwarf2/array-0.C: New. > * g++.dg/debug/dwarf2/array-1.C: New. > * g++.dg/debug/dwarf2/array-2.C: New. Based on libstdc++-v3's > src/c++98/pool_allocator.cc:__pool_alloc_base::_S_heap_size. > * g++.dg/debug/dwarf2/array-3.C: New. Based on > gcc's config/i386/i386-features.c:xlogue_layout::s_instances. > * g++.dg/debug/dwarf2/array-4.C: New. > --- > gcc/dwarf2out.c | 31 > ++- > gcc/testsuite/g++.dg/debug/dwarf2/array-0.C | 13 +++ > gcc/testsuite/g++.dg/debug/dwarf2/array-1.C | 13 +++ > gcc/testsuite/g++.dg/debug/dwarf2/array-2.C | 15 + > gcc/testsuite/g++.dg/debug/dwarf2/array-3.C | 20 + > gcc/testsuite/g++.dg/debug/dwarf2/array-4.C | 16 ++ > gcc/testsuite/gcc.dg/debug/dwarf2/array-0.c | 10 + > gcc/testsuite/gcc.dg/debug/dwarf2/array-1.c | 10 + > gcc/testsuite/gcc.dg/debug/dwarf2/array-2.c |8 +++ > gcc/testsuite/gcc.dg/debug/dwarf2/array-3.c |8 +++ > 10 files changed, 143 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/g++.dg/debug/dwarf2/array-0.C > create mode 100644 gcc/testsuite/g++.dg/debug/dwarf2/array-1.C > create mode 100644 gcc/testsuite/g++.dg/debug/dwarf2/array-2.C > create mode 100644 gcc/testsuite/g++.dg/debug/dwarf2/array-3.C > create mode 100644 gcc/testsuite/g++.dg/debug/dwarf2/array-4.C > create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/array-0.c > create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/array-1.c > create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/array-2.c > create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/array-3.c > > diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c > index c359c2d4af981..ad533c14d2480 100644 > --- a/gcc/dwarf2out.c > +++ b/gcc/dwarf2out.c > @@ -23687,6 +23687,33 @@ local_function_static (tree decl) > && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL; > } > > +/* Return true iff DECL completes (overrides) the type of OLD_DIE > + within CONTEXT_DIE. */ > + > +static bool > +completing_type_p (tree decl, dw_die_ref old_die, dw_die_ref context_die) > +{ > + tree type = TREE_TYPE (decl); > + int cv_quals; > + > + if (decl_by_reference_p (decl)) > +{ > + type = TREE_TYPE (type); > + cv_quals = TYPE_UNQUALIFIED; > +} > + else > +cv_quals = decl_quals (decl); > + > + dw_die_ref type_die = modified_type_die (type, > + cv_quals | TYPE_QUALS (type), > + false, > + context_die); > + > + dw_die_ref old_type_die = get_AT_ref (old_die, DW_AT_type); > + > + return type_die != old_type_die; > +} > + > /* Generate a DIE to represent a declared data object. > Either DECL or ORIGIN must be non-null. */ > > @@ -23939,7 +23966,9 @@ gen_variable_die (tree decl, tree origin, dw_die_ref > context_die) > && !DECL_ABSTRACT_P (decl_or_origin) > && variably_modified_type_p (TREE_TYPE (decl_or_origin), >decl_function_context > - (decl_or_origin > + (decl_or_origin))) > + || (old_die && specialization_p > + && completing_type_p (decl_or_origin, old_die, context_die))) > { >tree type = TREE_TYPE (decl_or_origi
Re: Ping agian: [PATCH V2] Loop split upon semi-invariant condition (PR tree-optimization/89134)
Hi, Michael, Since I was involved in other tasks, it is a little bit late to reply you. Sorry for that. I composed a new one with your suggestions. Please review that when you are in convenience. > Generally I do like the idea of the transformation, and the basic building > blocks seem to be sound. But I dislike it being a separate pass, so > please integrate the code you have written into the existing loop split > pass. Most building blocks can be used as is, except the main driver. This new transformation was integrated into the pass of original loop split. >> +@item max-cond-loop-split-insns >> +The maximum number of insns to be increased due to loop split on >> +semi-invariant condition statement. > "to be increased" --> "to be generated" (or "added") Done. >> +@item min-cond-loop-split-prob >> +The minimum threshold for probability of semi-invaraint condition >> +statement to trigger loop split. > typo, semi-invariant Done. > I think somewhere in the docs your definition of semi-invariant needs > to be stated in some form (can be short, doesn't need to reproduce the > diagram or such), so don't just replicate the short info from the > params.def file. Done. >> +DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB, >> + "min-cond-loop-split-prob", >> + "The minimum threshold for probability of semi-invaraint condition " >> + "statement to trigger loop split.", > Same typo: "semi-invariant". Done. >> -/* This file implements loop splitting, i.e. transformation of loops like >> +/* This file implements two kind of loop splitting. > kind_s_, plural Done. >> +/* Another transformation of loops like: >> + >> + for (i = INIT (); CHECK (i); i = NEXT ()) >> + { >> + if (expr (a_1, a_2, ..., a_n)) >> + a_j = ...; // change at least one a_j >> + else >> + S; // not change any a_j >> + } > You should mention that 'expr' needs to be pure, i.e. once it > becomes false and the inputs don't change, that it remains false. Done. >> +static bool >> +branch_removable_p (basic_block branch_bb) >> +{ >> + if (single_pred_p (branch_bb)) >> +return true; >> + >> + edge e; >> + edge_iterator ei; >> + >> + FOR_EACH_EDGE (e, ei, branch_bb->preds) >> +{ >> + if (dominated_by_p (CDI_DOMINATORS, e->src, branch_bb)) >> + continue; >> + >> + if (dominated_by_p (CDI_DOMINATORS, branch_bb, e->src)) >> + continue; > My gut feeling is surprised by this. So one of the predecessors of > branch_bb dominates it. Why should that indicate that branch_bb > can be safely removed? > > Think about something like this: > > esrc --> cond_bb --> branch_bb > '---^ If all predecessors of branch_bb dominate it, these predecessors should also be in dominating relationship among them, and the conditional statement must be branch_bb's immediate dominator, and branch_bb is removable. In your example. For "esrc", loop is continued, nothing is impacted. But in the next iteration, we encounter "cond_bb", it does not dominate "branch_bb", so the function return false in the following return statement. > (cond_bb is the callers bb of the cond statement in question). Now esrc > dominates branch_bb but still you can't simply remove it, even if > the cond_bb->branch_bb edge becomes unexecutable. >> +static int >> +get_cond_invariant_branch (struct loop *loop, gcond *cond) > Please return an edge here, not an edge index (which removes the using of > -1). I think the name (and comment) should consistently talk about > semi-invariant, not invariant. For when you really need an edge index > later, you could use "EDGE_SUCC(bb, 0) != edge". But you probably don't > really need it, e.g. instead of using the gimple pass-local-flag on a > statement you can just as well also store the edge in your info structure. Done. >> +static bool >> +is_cond_in_hidden_loop (const struct loop *loop, basic_block cond_bb, >> + int branch) > With above change in get_cond_invariant_branch, this also should > take an edge, not a bb+edge-index. Done. >> +static int >> +compute_added_num_insns (struct loop *loop, const_basic_block cond_bb, >> +int branch) > This should take an edge as well. Done. >> + for (unsigned i = 0; i < loop->num_nodes; i++) >> +{ >> + /* Do no count basic blocks only in opposite branch. */ >> + if (dominated_by_p (CDI_DOMINATORS, bbs[i], targ_bb_var)) >> + continue; >> + >> + for (gimple_stmt_iterator gsi = gsi_start_bb (bbs[i]); !gsi_end_p >> (gsi); >> + gsi_next (&gsi)) >> + num += estimate_num_insns (gsi_stmt (gsi), &eni_size_weights); > Replace the loop by > estimate_num_insn_seq (bb_seq (bbs[i]), &eni_size_weights); Done. >> + /* Add a threshold for increased code size to disable loop split. */ >> + if (compute_added_num_insns (loop, cond_bb, branch) > >> + PARAM_VALUE (PARAM_MAX_COND_LOOP_SPLIT_INSNS)) > Operator should
[PATCH V3] Loop split upon semi-invariant condition (PR tree-optimization/89134)
--- diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 1391a562c35..28981fa1048 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -11418,6 +11418,19 @@ The maximum number of branches unswitched in a single loop. @item lim-expensive The minimum cost of an expensive expression in the loop invariant motion. +@item max-cond-loop-split-insns +In a loop, if a branch of a conditional statement is selected since certain +loop iteration, any operand that contributes to computation of the conditional +expression remains unchanged in all following iterations, the statement is +semi-invariant, upon which we can do a kind of loop split transformation. +@option{max-cond-loop-split-insns} controls maximum number of insns to be +added due to loop split on semi-invariant conditional statement. + +@item min-cond-loop-split-prob +When FDO profile information is available, @option{min-cond-loop-split-prob} +specifies minimum threshold for probability of semi-invariant condition +statement to trigger loop split. + @item iv-consider-all-candidates-bound Bound on number of candidates for induction variables, below which all candidates are considered for each use in induction variable diff --git a/gcc/params.def b/gcc/params.def index 13001a7bb2d..12bc8c26c9e 100644 --- a/gcc/params.def +++ b/gcc/params.def @@ -386,6 +386,20 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL, "The maximum number of unswitchings in a single loop.", 3, 0, 0) +/* The maximum number of increased insns due to loop split on semi-invariant + condition statement. */ +DEFPARAM(PARAM_MAX_COND_LOOP_SPLIT_INSNS, + "max-cond-loop-split-insns", + "The maximum number of insns to be added due to loop split on " + "semi-invariant condition statement.", + 100, 0, 0) + +DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB, + "min-cond-loop-split-prob", + "The minimum threshold for probability of semi-invariant condition " + "statement to trigger loop split.", + 30, 0, 100) + /* The maximum number of insns in loop header duplicated by the copy loop headers pass. */ DEFPARAM(PARAM_MAX_LOOP_HEADER_INSNS, diff --git a/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C new file mode 100644 index 000..51f9da22fc7 --- /dev/null +++ b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C @@ -0,0 +1,33 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-lsplit-details" } */ + +#include +#include + +using namespace std; + +class A +{ +public: + bool empty; + void set (string s); +}; + +class B +{ + map m; + void f (); +}; + +extern A *ga; + +void B::f () +{ + for (map::iterator iter = m.begin (); iter != m.end (); ++iter) +{ + if (ga->empty) +ga->set (iter->second); +} +} + +/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "lsplit" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c new file mode 100644 index 000..bbd522d6bcd --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-lsplit-details" } */ + +__attribute__((pure)) __attribute__((noinline)) int inc (int i) +{ + return i + 1; +} + +extern int do_something (void); +extern int b; + +void test(int n) +{ + int i; + + for (i = 0; i < n; i = inc (i)) +{ + if (b) +b = do_something(); +} +} + +/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "lsplit" } } */ diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c index f5f083384bc..e4a1b6d2019 100644 --- a/gcc/tree-ssa-loop-split.c +++ b/gcc/tree-ssa-loop-split.c @@ -32,7 +32,10 @@ along with GCC; see the file COPYING3. If not see #include "tree-ssa-loop.h" #include "tree-ssa-loop-manip.h" #include "tree-into-ssa.h" +#include "tree-inline.h" +#include "tree-cfgcleanup.h" #include "cfgloop.h" +#include "params.h" #include "tree-scalar-evolution.h" #include "gimple-iterator.h" #include "gimple-pretty-print.h" @@ -40,7 +43,9 @@ along with GCC; see the file COPYING3. If not see #include "gimple-fold.h" #include "gimplify-me.h" -/* This file implements loop splitting, i.e. transformation of loops like +/* This file implements two kinds of loop splitting. + + One transformation of loops like: for (i = 0; i < 100; i++) { @@ -612,6 +617,722 @@ split_loop (class loop *loop1, class tree_niter_desc *niter) return changed; } +/* Another transformation of loops like: + + for (i = INIT (); CHECK (i); i = NEXT ()) + { + if (expr (a_1, a_2, ..., a_n)) // expr is pure + a_j = ...; // change at least one a_j + else + S; // not change any a_j + } + + into: + + for (i = INIT (); CHECK (i); i = NEXT ()) + { + if (expr (a_1, a_2, ..., a_n)) + a_j = ...; + else + { +
Re: C++ PATCH for c++/91740 - ICE with constexpr call and ?: in ARRAY_REF
On Thu, Sep 12, 2019 at 11:08:43AM +0200, Paolo Carlini wrote: > Hi again, > > On 12/09/19 11:03, Paolo Carlini wrote: > > Hi, > > > > On 11/09/19 23:15, Marek Polacek wrote: > > > --- gcc/cp/pt.c > > > +++ gcc/cp/pt.c > > > @@ -26709,7 +26709,7 @@ build_non_dependent_expr (tree expr) > > > if (TREE_CODE (expr) == COND_EXPR) > > > return build3 (COND_EXPR, > > > TREE_TYPE (expr), > > > - TREE_OPERAND (expr, 0), > > > + build_non_dependent_expr (TREE_OPERAND (expr, 0)), > > > (TREE_OPERAND (expr, 1) > > > ? build_non_dependent_expr (TREE_OPERAND (expr, 1)) > > > : build_non_dependent_expr (TREE_OPERAND (expr, 0))), > > > > Looks like we would end up unnecessarily calling > > build_non_dependent_expr three times instead of two: probably is very > > cheap, probably the code is cleaner this way but I'm a little annoyed at > > this anyway, for the record ;) > > Sorry, I misread the code: normally TREE_OPERAND (expr, 1) isn't NULL_TREE > thus we are fine. And I forgot to mention that build_x_conditional_expr has 6743 ifexp = build_non_dependent_expr (ifexp); 6744 if (op1) 6745 op1 = build_non_dependent_expr (op1); 6746 op2 = build_non_dependent_expr (op2); which means my fix should make more sense. -- Marek Polacek • Red Hat, Inc. • 300 A St, Boston, MA
Re: Ping agian: [PATCH V2] Loop split upon semi-invariant condition (PR tree-optimization/89134)
On Mon, Jul 15, 2019 at 4:20 AM Feng Xue OS wrote: > > Some time passed, so ping again. I made this patch, because it can reward us > with 7% > > performance benefit in some real application. For convenience, the > optimization to be > > implemented was listed in the following again. And hope your comments on the > patch, or > > design suggestions. Thanks! Replying again to the very first post since it contains the figure below. > Suppose a loop as: > > void f (std::map m) > { > for (auto it = m.begin (); it != m.end (); ++it) { > /* if (b) is semi-invariant. */ > if (b) { > b = do_something();/* Has effect on b */ > } else { > /* No effect on b */ > } > statements; /* Also no effect on b */ > } > } > > A transformation, kind of loop split, could be: > > void f (std::map m) > { > for (auto it = m.begin (); it != m.end (); ++it) { > if (b) { > b = do_something(); > } else { > ++it; > statements; > break; > } > statements; > } > > for (; it != m.end (); ++it) { > statements; > } > } So if you consider approaching this from unswitching instead we'd unswitch it on if (b) but treat the condition as constant only in the 'false' path, thus the transformed code would look like the following. I believe implementing this in the existing unswitching pass involves a lot less code than putting it into the splitting pass but it would catch exactly the same cases? if (b) { for (auto it = m.begin (); it != m.end (); ++it) { /* if (b) is non-invariant. */ if (b) { b = do_something();/* Has effect on b */ } else { /* No effect on b */ } statements; /* Also no effect on b */ } } else { for (auto it = m.begin (); it != m.end (); ++it) { /* if (b) is invariant. */ if (false) { b = do_something();/* Has effect on b */ } else { /* No effect on b */ } statements; /* Also no effect on b */ } } > If "statements" contains nothing, the second loop becomes an empty one, which > can be removed. > And if "statements" are straight line instructions, we get an opportunity to > vectorize the second loop. > > Feng > > > > From: Feng Xue OS > Sent: Tuesday, June 18, 2019 3:00 PM > To: Richard Biener; Michael Matz > Cc: gcc-patches@gcc.gnu.org > Subject: Ping: [PATCH V2] Loop split upon semi-invariant condition (PR > tree-optimization/89134) > > Richard & Michael, > >I made some adjustments on coding style and added test cases for this > version. > >Would you please take a look at the patch? It is long a little bit and > might steal some >of your time. > > Thanks a lot. > > > diff --git a/gcc/ChangeLog b/gcc/ChangeLog > index 9a46f93d89d..2334b184945 100644 > --- a/gcc/ChangeLog > +++ b/gcc/ChangeLog > @@ -1,3 +1,23 @@ > +2019-06-18 Feng Xue > + > + PR tree-optimization/89134 > + * doc/invoke.texi (max-cond-loop-split-insns): Document new --params. > + (min-cond-loop-split-prob): Likewise. > + * params.def: Add max-cond-loop-split-insns, min-cond-loop-split-prob. > + * passes.def (pass_cond_loop_split) : New pass. > + * timevar.def (TV_COND_LOOP_SPLIT): New time variable. > + * tree-pass.h (make_pass_cond_loop_split): New declaration. > + * tree-ssa-loop-split.c (split_info): New class. > + (find_vdef_in_loop, vuse_semi_invariant_p): New functions. > + (ssa_semi_invariant_p, stmt_semi_invariant_p): Likewise. > + (branch_removable_p, get_cond_invariant_branch): Likewise. > + (is_cond_in_hidden_loop, compute_added_num_insns): Likewise. > + (can_split_loop_on_cond, mark_cond_to_split_loop): Likewise. > + (split_loop_for_cond, tree_ssa_split_loops_for_cond): Likewise. > + (pass_data_cond_loop_split): New variable. > + (pass_cond_loop_split): New class. > + (make_pass_cond_loop_split): New function. > + > 2019-06-18 Kewen Lin > > PR middle-end/80791 > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index eaef4cd63d2..0427fede3d6 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -11352,6 +11352,14 @@ The maximum number of branches unswitched in a > single loop. > @item lim-expensive > The minimum cost of an expensive expression in the loop invariant motion. > > +@item max-cond-loop-split-insns > +The maximum number of insns
[PATCH] PR libstdc++/91748 fix std::for_each_n for random access iterators
This fixes two silly mistakes I made, which weren't caught by the existing tests. PR libstdc++/91748 * include/bits/stl_algo.h (for_each_n): Fix random access iterator case. * testsuite/25_algorithms/for_each/for_each_n.cc: Test with random access iterators. Tested powerpc64le-linux, committed to trunk and gcc-9-branch. commit c3ae6f9b28e496d03e5c1a8eafd2a1169ef6ab65 Author: redi Date: Thu Sep 12 10:51:39 2019 + PR libstdc++/91748 fix std::for_each_n for random access iterators PR libstdc++/91748 * include/bits/stl_algo.h (for_each_n): Fix random access iterator case. * testsuite/25_algorithms/for_each/for_each_n.cc: Test with random access iterators. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@275683 138bc75d-0d04-0410-961f-82ee72b054a4 diff --git a/libstdc++-v3/include/bits/stl_algo.h b/libstdc++-v3/include/bits/stl_algo.h index bece93379de..7fe1d8a5734 100644 --- a/libstdc++-v3/include/bits/stl_algo.h +++ b/libstdc++-v3/include/bits/stl_algo.h @@ -3993,7 +3993,11 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO auto __n2 = std::__size_to_integer(__n); using _Cat = typename iterator_traits<_InputIterator>::iterator_category; if constexpr (is_base_of_v) - return std::for_each(__first, __first + __n2, __f); + { + auto __last = __first + __n2; + std::for_each(__first, __last, std::move(__f)); + return __last; + } else { while (__n2-->0) diff --git a/libstdc++-v3/testsuite/25_algorithms/for_each/for_each_n.cc b/libstdc++-v3/testsuite/25_algorithms/for_each/for_each_n.cc index 57c2bbe6d36..016ff57cb28 100644 --- a/libstdc++-v3/testsuite/25_algorithms/for_each/for_each_n.cc +++ b/libstdc++-v3/testsuite/25_algorithms/for_each/for_each_n.cc @@ -47,11 +47,42 @@ void test01() }; auto res = std::for_each_n(con.begin(), Size(con.size()), Func(sum)); - VERIFY( res.ptr == con.end().ptr ); + VERIFY( res == con.end() ); VERIFY( sum == 15 ); } +void +test02() +{ + using __gnu_test::test_container; + using __gnu_test::random_access_iterator_wrapper; + int array[5] = { 2, 4, 6, 8, 10 }; + test_container con(array); + + int prod = 1; + struct Func + { +Func(int& i) : i(i) { } +Func(Func&&) = default; +Func& operator=(Func&&) = delete; +void operator()(int n) const { i *= n; } +int& i; + }; + + struct Size + { +Size(short v) : val(v) { } +operator short() const { return val; } +short val; + }; + auto res = std::for_each_n(con.begin(), Size(con.size()), Func(prod)); + + VERIFY( res == con.end() ); + VERIFY( prod == 3840 ); +} + int main() { test01(); + test02(); }
Re: [PATCH][ARM] Enable code hoisting with -Os (PR80155)
Hi Richard, > Do we document target specific deviations from "default" behavior somewhere? Not as far as I know. The other option changes in arm-common.c are not mentioned anywhere, neither is any of arm_option_override_internal. If we want to keep documentation useful, we shouldn't clutter the generic options with target-specific deviations from the default settings. What might be feasible is adding a section for each target which describes changes from the default settings. Wilco
Re: [PATCH] DWARF array bounds missing from C++ array definitions
On Sep 12, 2019, Richard Biener wrote: > Is this PR91507? Looks like it. Interesting, I've had this patch sitting in my tree since early June, waiting for the additional verification I completed last night. That was long before the PR was filed. > How do you get around the gdb issue? I was not even aware of one. I focused on preserving at what I regarded as the right place the information we currently dropped on the floor, figuring if any consumers wouldn't take the type information from the definition as overriding/completing that of the specification, they'd eventually be adjusted to do so. The approach I chose was to add the completion type to the definition, not to the specification. I figured leaving the specification alone, reflecting the information available at its source location, and providing the complete type information at the source location that supplies it, was the right thing to do, regardless of whatever debug information consumers were able to do with that information. There's room for dispute as to the correctness of this approach, however. Someone might argue that we should have a separate (IMHO excessive) completion non-defining declaration, pointing back to the initial incomplete declaration with a DW_AT_specification, and perhaps to omit the incomplete type from the incomplete specification, though that doesn't seem to be in line with the common practice of overriding declaration coordinates in the definition. -- Alexandre Oliva, freedom fighter he/him https://FSFLA.org/blogs/lxo Be the change, be Free! FSF.org & FSF Latin America board member GNU Toolchain EngineerFree Software Evangelist Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara
Re: [PATCH][ARM] Enable code hoisting with -Os (PR80155)
On Thu, Sep 12, 2019 at 1:29 PM Wilco Dijkstra wrote: > > Hi Richard, > > > Do we document target specific deviations from "default" behavior somewhere? > > Not as far as I know. The other option changes in arm-common.c are not > mentioned > anywhere, neither is any of arm_option_override_internal. > > If we want to keep documentation useful, we shouldn't clutter the generic > options > with target-specific deviations from the default settings. What might be > feasible is > adding a section for each target which describes changes from the default > settings. Agreed. Richard. > Wilco
Re: [PATCH] DWARF array bounds missing from C++ array definitions
On Thu, Sep 12, 2019 at 1:36 PM Alexandre Oliva wrote: > > On Sep 12, 2019, Richard Biener wrote: > > > Is this PR91507? > > Looks like it. Interesting, I've had this patch sitting in my tree > since early June, waiting for the additional verification I completed > last night. That was long before the PR was filed. > > > How do you get around the gdb issue? > > I was not even aware of one. I focused on preserving at what I regarded > as the right place the information we currently dropped on the floor, > figuring if any consumers wouldn't take the type information from the > definition as overriding/completing that of the specification, they'd > eventually be adjusted to do so. > > > The approach I chose was to add the completion type to the definition, > not to the specification. I figured leaving the specification alone, > reflecting the information available at its source location, and > providing the complete type information at the source location that > supplies it, was the right thing to do, regardless of whatever debug > information consumers were able to do with that information. Yes, I agree, this is what my patch in the PR does as well, albeit just in the place where we add the link to the specification DIE and the unconditionally if there's a disagreeing type DIE (we force a type DIE earlier in the caller). Your new predicate looks a bit excessive here given it builds the type again? But that's implementation detail I guess. So I'm obviously fine with your patch and I guess if we independently arrive at this solution that answers my question what "correct DWARF" is by a majority decision ;) So - maybe we can have the patch a bit cleaner by adding a flag to add_type_attribute saying we only want it if it's different from that already present (on the specification DIE)? Thanks, Richard. > There's room for dispute as to the correctness of this approach, > however. Someone might argue that we should have a separate (IMHO > excessive) completion non-defining declaration, pointing back to the > initial incomplete declaration with a DW_AT_specification, and perhaps > to omit the incomplete type from the incomplete specification, though > that doesn't seem to be in line with the common practice of overriding > declaration coordinates in the definition. > > -- > Alexandre Oliva, freedom fighter he/him https://FSFLA.org/blogs/lxo > Be the change, be Free! FSF.org & FSF Latin America board member > GNU Toolchain EngineerFree Software Evangelist > Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara
[PATCH] Fix PR91750
The following fixes induction vectorization to use the appropriate type for the IV init and update computations avoiding to create undefined behavior from inappropriately using a signed integer type. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2019-09-12 Richard Biener PR tree-optimization/91750 * tree-vect-loop.c (vectorizable_induction): Compute IV increments in the type of the evolution. * gcc.dg/vect/pr91750.c: New testcase. Index: gcc/tree-vect-loop.c === --- gcc/tree-vect-loop.c(revision 275681) +++ gcc/tree-vect-loop.c(working copy) @@ -7605,6 +7605,7 @@ vectorizable_induction (stmt_vec_info st step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (stmt_info); gcc_assert (step_expr != NULL_TREE); + tree step_vectype = get_same_sized_vectype (TREE_TYPE (step_expr), vectype); pe = loop_preheader_edge (iv_loop); init_expr = PHI_ARG_DEF_FROM_EDGE (phi, @@ -7613,8 +7614,8 @@ vectorizable_induction (stmt_vec_info st stmts = NULL; if (!nested_in_vect_loop) { - /* Convert the initial value to the desired type. */ - tree new_type = TREE_TYPE (vectype); + /* Convert the initial value to the IV update type. */ + tree new_type = TREE_TYPE (step_expr); init_expr = gimple_convert (&stmts, new_type, init_expr); /* If we are using the loop mask to "peel" for alignment then we need @@ -7634,9 +7635,6 @@ vectorizable_induction (stmt_vec_info st } } - /* Convert the step to the desired type. */ - step_expr = gimple_convert (&stmts, TREE_TYPE (vectype), step_expr); - if (stmts) { new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts); @@ -7669,8 +7667,8 @@ vectorizable_induction (stmt_vec_info st if (! CONSTANT_CLASS_P (new_name)) new_name = vect_init_vector (stmt_info, new_name, TREE_TYPE (step_expr), NULL); - new_vec = build_vector_from_val (vectype, new_name); - vec_step = vect_init_vector (stmt_info, new_vec, vectype, NULL); + new_vec = build_vector_from_val (step_vectype, new_name); + vec_step = vect_init_vector (stmt_info, new_vec, step_vectype, NULL); /* Now generate the IVs. */ unsigned group_size = SLP_TREE_SCALAR_STMTS (slp_node).length (); @@ -7683,7 +7681,7 @@ vectorizable_induction (stmt_vec_info st unsigned ivn; for (ivn = 0; ivn < nivs; ++ivn) { - tree_vector_builder elts (vectype, const_nunits, 1); + tree_vector_builder elts (step_vectype, const_nunits, 1); stmts = NULL; for (unsigned eltn = 0; eltn < const_nunits; ++eltn) { @@ -7694,6 +7692,7 @@ vectorizable_induction (stmt_vec_info st elts.quick_push (elt); } vec_init = gimple_build_vector (&stmts, &elts); + vec_init = gimple_convert (&stmts, vectype, vec_init); if (stmts) { new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts); @@ -7708,10 +7707,13 @@ vectorizable_induction (stmt_vec_info st induc_def = PHI_RESULT (induction_phi); /* Create the iv update inside the loop */ - vec_def = make_ssa_name (vec_dest); - new_stmt = gimple_build_assign (vec_def, PLUS_EXPR, induc_def, vec_step); - gsi_insert_before (&si, new_stmt, GSI_SAME_STMT); - loop_vinfo->add_stmt (new_stmt); + gimple_seq stmts = NULL; + vec_def = gimple_convert (&stmts, step_vectype, induc_def); + vec_def = gimple_build (&stmts, + PLUS_EXPR, step_vectype, vec_def, vec_step); + vec_def = gimple_convert (&stmts, vectype, vec_def); + loop_vinfo->add_stmt (SSA_NAME_DEF_STMT (vec_def)); + gsi_insert_seq_before (&si, stmts, GSI_SAME_STMT); /* Set the arguments of the phi node: */ add_phi_arg (induction_phi, vec_init, pe, UNKNOWN_LOCATION); @@ -7739,8 +7741,8 @@ vectorizable_induction (stmt_vec_info st if (! CONSTANT_CLASS_P (new_name)) new_name = vect_init_vector (stmt_info, new_name, TREE_TYPE (step_expr), NULL); - new_vec = build_vector_from_val (vectype, new_name); - vec_step = vect_init_vector (stmt_info, new_vec, vectype, NULL); + new_vec = build_vector_from_val (step_vectype, new_name); + vec_step = vect_init_vector (stmt_info, new_vec, step_vectype, NULL); for (; ivn < nvects; ++ivn) { gimple *iv = SLP_TREE_VEC_STMTS (slp_node)[ivn - nivs]->stmt; @@ -7749,18 +7751,20 @@ vectorizable_induction (stmt_vec_info st def = gimple_phi_result (iv); else def = gimple_assign_lhs (iv); - new_stmt = gimple_build_assign (make_ssa_name (vectype), -
[PATCH] Microblaze: Type confusion in fprintf
A type confusion leads to illegal (and nonsensical) assembly on certain host architectures (e.g. ARM), where HOST_WIDE_INT (64 bit) and unsigned long (32 bit) have different alignments. So this has printed an uninitialized register instead of the intended value. This fixes float constant initialization on an ARM->Microblaze cross compiler. --- a/gcc/config/microblaze/microblaze.c +++ b/gcc/config/microblaze/microblaze.c @@ -2468,7 +2468,7 @@ print_operand (FILE * file, rtx op, int letter) unsigned long value_long; REAL_VALUE_TO_TARGET_SINGLE (*CONST_DOUBLE_REAL_VALUE (op), value_long); - fprintf (file, HOST_WIDE_INT_PRINT_HEX, value_long); + fprintf (file, "%#lx", value_long); } else {
Re: Ping agian: [PATCH V2] Loop split upon semi-invariant condition (PR tree-optimization/89134)
>> Suppose a loop as: >> >> void f (std::map m) >> { >> for (auto it = m.begin (); it != m.end (); ++it) { >> /* if (b) is semi-invariant. */ >> if (b) { >> b = do_something();/* Has effect on b */ >> } else { >> /* No effect on b */ >> } >> statements; /* Also no effect on b */ >> } >> } >> >> A transformation, kind of loop split, could be: >> >> void f (std::map m) >> { >> for (auto it = m.begin (); it != m.end (); ++it) { >> if (b) { >> b = do_something(); >> } else { >> ++it; >> statements; >> break; >> } >> statements; >> } >> >> for (; it != m.end (); ++it) { >> statements; >> } >> } > So if you consider approaching this from unswitching instead we'd > unswitch it on if (b) but > treat the condition as constant only in the 'false' path, thus the > transformed code would > look like the following. I believe implementing this in the existing > unswitching pass > involves a lot less code than putting it into the splitting pass but > it would catch > exactly the same cases? May not. Firstly, the following transformation is legal only when "b" is semi-invariant, which means once a branch of "if (b)" is selected since certain iteration, execution will always go to that branch in all following iterations. Most of code in this loop-split patch was composed to check semi-invariantness of a conditional expression, which must also be needed in loop-unswitch solution. Secondly, to duplicate/move an invariant expression out of loop is simple, for all intermediate computations should already be outside, only need to handle its result SSA that is inside the loop. But for semi-invariant, we have to duplicate a tree of statements that contributes to computation of the condition expression, similar to code hoisting, which make us write extra code. And loop-unswitch solution is weaker than loop-split. Suppose initial value of "b" is true, and is changed to false after some iterations, not only unswitch does not help that, but also it introduces extra cost due to enclosing "if (b)". > if (b) >{ > for (auto it = m.begin (); it != m.end (); ++it) { > /* if (b) is non-invariant. */ > if (b) { > b = do_something();/* Has effect on b */ > } else { > /* No effect on b */ > } > statements; /* Also no effect on b */ > } > } > else > { > for (auto it = m.begin (); it != m.end (); ++it) { > /* if (b) is invariant. */ > if (false) { > b = do_something();/* Has effect on b */ > } else { > /* No effect on b */ > } > statements; /* Also no effect on b */ > } > }
Re: [Patch, fortran] PR91717 - ICE on concatenating deferred-length character and character literal
On Thu, Sep 12, 2019 at 09:55:20AM +0100, Paul Richard Thomas wrote: > > OK to commit? > Yes. -- Steve
Re: [PATCH] Fix PR 91708
On 9/12/19 10:08 AM, Richard Biener wrote: > On Wed, 11 Sep 2019, Bernd Edlinger wrote: > >> On 9/11/19 8:30 PM, Richard Biener wrote: > > More like the following? I wonder if we can assert that > MEM_NOTRAP_P () are equal (see all the for_gcse checks in exp_equiv_p). > But as said earlier I wonder in which cases a replacement is > profitable at all - thus, if a > > else if (MEM_P (trial)) > /* Do not replace anything with a MEM. */ > ; > Yes, I like that better, since there is essentially nothing stopping it from replacing a REG with a MEM, except the rtxcost function, maybe. It turned out that the previous version got into an endless loop here, since the loop termination happens when replacing MEM by itself, except when something else terminates the search. So how about this? The only possible MEM->MEM transformations are now: - replacing trial = dest (result mov dest, dest; which is a no-op insn) - replacing trial = src (which is a no-op transformation, and exits the loop) Therefore the overlapping mem move handling no longer necessary. Bootstrap and reg-tested on x86_64-pc-linux-gnu. Is it OK for trunk? Thanks Bernd. 2019-09-12 Bernd Edlinger PR middle-end/91708 * cse.c (cse_insn): Do not replace anything with a MEM. --- gcc/cse.c.orig 2019-07-24 21:21:53.590065924 +0200 +++ gcc/cse.c 2019-09-12 12:50:18.566801155 +0200 @@ -5238,23 +5238,6 @@ cse_insn (rtx_insn *insn) src_elt_cost = MAX_COST; } - /* Avoid creation of overlapping memory moves. */ - if (MEM_P (trial) && MEM_P (dest) && !rtx_equal_p (trial, dest)) - { - rtx src, dest; - - /* BLKmode moves are not handled by cse anyway. */ - if (GET_MODE (trial) == BLKmode) - break; - - src = canon_rtx (trial); - dest = canon_rtx (SET_DEST (sets[i].rtl)); - - if (!MEM_P (src) || !MEM_P (dest) - || !nonoverlapping_memrefs_p (src, dest, false)) - break; - } - /* Try to optimize (set (reg:M N) (const_int A)) (set (reg:M2 O) (const_int B)) @@ -5385,6 +5368,12 @@ cse_insn (rtx_insn *insn) /* Do nothing for this case. */ ; + /* Do not replace anything with a MEM, except the replacement + is a no-op. This allows this loop to terminate. */ + else if (MEM_P (trial) && !rtx_equal_p (trial, SET_SRC(sets[i].rtl))) + /* Do nothing for this case. */ + ; + /* Look for a substitution that makes a valid insn. */ else if (validate_unshare_change (insn, &SET_SRC (sets[i].rtl), trial, 0))
[C++ Patch] Another bunch of location fixes
Hi, nothing special here, various bits I missed so far or in the past meant to test more thoroughly. Thanks, Paolo. // /cp 2019-09-12 Paolo Carlini * decl.c (grokdeclarator): Use declspecs->locations and declarator->id_loc in a few error messages. * pt.c (finish_member_template_decl): Use DECL_SOURCE_LOCATION. (push_template_decl_real): Likewise. /testsuite 2019-09-12 Paolo Carlini * g++.dg/ext/int128-6.C: New. * c-c++-common/pr68107.c: Test location(s). * g++.dg/other/large-size-array.C: Likewise. * g++.dg/template/dtor2.C: Likewise. * g++.dg/template/error9.C: Likewise. * g++.dg/tls/diag-2.C: Likewise. * g++.dg/tls/diag-4.C: Likewise. * g++.dg/tls/diag-5.C: Likewise. * g++.old-deja/g++.pt/memtemp71.C: Likewise. Index: cp/decl.c === --- cp/decl.c (revision 275681) +++ cp/decl.c (working copy) @@ -10948,14 +10948,15 @@ grokdeclarator (const cp_declarator *declarator, { if (! int_n_enabled_p[declspecs->int_n_idx]) { - error ("%<__int%d%> is not supported by this target", -int_n_data[declspecs->int_n_idx].bitsize); + error_at (declspecs->locations[ds_type_spec], + "%<__int%d%> is not supported by this target", + int_n_data[declspecs->int_n_idx].bitsize); explicit_intN = false; } /* Don't pedwarn if the alternate "__intN__" form has been used instead of "__intN". */ else if (!int_n_alt && pedantic && ! in_system_header_at (input_location)) - pedwarn (input_location, OPT_Wpedantic, + pedwarn (declspecs->locations[ds_type_spec], OPT_Wpedantic, "ISO C++ does not support %<__int%d%> for %qs", int_n_data[declspecs->int_n_idx].bitsize, name); } @@ -11330,7 +11331,10 @@ grokdeclarator (const cp_declarator *declarator, && storage_class != sc_static) || typedef_p)) { - error ("multiple storage classes in declaration of %qs", name); + location_t loc + = min_location (declspecs->locations[ds_thread], + declspecs->locations[ds_storage_class]); + error_at (loc, "multiple storage classes in declaration of %qs", name); thread_p = false; } if (decl_context != NORMAL @@ -11489,7 +11493,9 @@ grokdeclarator (const cp_declarator *declarator, type = create_array_type_for_decl (dname, type, declarator->u.array.bounds, declarator->id_loc); - if (!valid_array_size_p (input_location, type, dname)) + if (!valid_array_size_p (dname + ? declarator->id_loc : input_location, + type, dname)) type = error_mark_node; if (declarator->std_attributes) Index: cp/pt.c === --- cp/pt.c (revision 275681) +++ cp/pt.c (working copy) @@ -298,7 +298,8 @@ finish_member_template_decl (tree decl) return NULL_TREE; } else if (TREE_CODE (decl) == FIELD_DECL) -error ("data member %qD cannot be a member template", decl); +error_at (DECL_SOURCE_LOCATION (decl), + "data member %qD cannot be a member template", decl); else if (DECL_TEMPLATE_INFO (decl)) { if (!DECL_TEMPLATE_SPECIALIZATION (decl)) @@ -310,7 +311,8 @@ finish_member_template_decl (tree decl) return decl; } else -error ("invalid member template declaration %qD", decl); +error_at (DECL_SOURCE_LOCATION (decl), + "invalid member template declaration %qD", decl); return error_mark_node; } @@ -5515,7 +5517,8 @@ push_template_decl_real (tree decl, bool is_friend /* [temp.mem] A destructor shall not be a member template. */ - error ("destructor %qD declared as member template", decl); + error_at (DECL_SOURCE_LOCATION (decl), + "destructor %qD declared as member template", decl); return error_mark_node; } if (IDENTIFIER_NEWDEL_OP_P (DECL_NAME (decl)) Index: testsuite/c-c++-common/pr68107.c === --- testsuite/c-c++-common/pr68107.c(revision 275681) +++ testsuite/c-c++-common/pr68107.c(working copy) @@ -3,34 +3,34 @@ #define N ((__SIZE_MAX__ / sizeof (int)) / 2 + 1) -typedef int (*T1)[N]; /* { dg-error "exceeds maximum object size" } */ +typedef int (*T1)[N]; /* { dg-error "15:exceeds maximum object size" } */ typedef int (*T2)[N - 1]; -typedef int (*T3)[N][N]; /* { dg-error "exceeds maximum object size" } */ -typedef int (*T4)[N - 1][N - 1]; /* { dg-error "exceeds maximum object size" }
[PATCH] RISC-V: Allow more load/stores to be compressed
This patch aims to allow more load/store instructions to be compressed by replacing a load/store of 'base register + large offset' with a new load/store of 'new base + small offset'. If the new base gets stored in a compressed register, then the new load/store can be compressed. Since there is an overhead in creating the new base, this change is only attempted when 'base register' is referenced in at least 4 load/stores in a basic block. The optimization is implemented in a new RISC-V specific pass called shorten_memrefs which is enabled for RVC targets. It has been developed for the 32-bit lw/sw instructions but could also be extended to 64-bit ld/sd in future. The patch saves 164 bytes (0.3%) on a proprietary application (59450 bytes compared to 59286 bytes) compiled for rv32imc bare metal with -Os. On the Embench benchmark suite (https://www.embench.org/) we see code size reductions of up to 18 bytes (0.7%) and only two cases where code size is increased slightly, by 2 bytes each: Embench results (.text size in bytes, excluding .rodata) Benchmark Without patch With patch Diff aha-mont64 1052 10520 crc32 232232 0 cubic 2446 24482 edn 1454 1450-4 huffbench 1642 16420 matmult-int 420420 0 minver 1056 10560 nbody 714714 0 nettle-aes 2888 2884-4 nettle-sha256 5566 5564-2 nsichneu15052 15052 0 picojpeg8078 80780 qrduino 6140 61400 sglib-combined 2444 24440 slre2438 2420-18 st 880880 0 statemate 3842 38420 ud 702702 0 wikisort4278 42802 - Total 61324 61300 -24 The patch has been tested on the following bare metal targets using QEMU and there were no regressions: rv32i rv32iac rv32im rv32imac rv32imafc rv64imac rv64imafdc We noticed that sched2 undoes some of the addresses generated by this optimization and consequently increases code size, therefore this patch adds a check in sched-deps.c to avoid changes that are expected to increase code size when not optimizing for speed. Since this change touches target-independent code, the patch has been bootstrapped and tested on x86 with no regressions. gcc/ChangeLog * config/riscv/riscv.c (tree-pass.h): New include. (cfg.h) Likewise. (context.h) Likewise. (riscv_compressed_reg_p): New function. (riscv_compressed_lw_address_p): Likewise. (riscv_legitimize_address): Attempt to convert base + large_offset to compressible new_base + small_offset. (riscv_address_cost): Make anticipated compressed load/stores cheaper for code size than uncompressed load/stores. (class pass_shorten_memrefs): New pass. (pass_shorten_memrefs::execute): Likewise. (make_pass_shorten_memrefs): Likewise. (riscv_option_override): Register shorten_memrefs pass for TARGET_RVC. (riscv_register_priority): Move compressed register check to riscv_compressed_reg_p. * sched-deps.c (attempt_change): When optimizing for code size don't make change if it increases code size. --- gcc/config/riscv/riscv.c | 179 +-- gcc/sched-deps.c | 10 +++ 2 files changed, 183 insertions(+), 6 deletions(-) diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c index 39bf87a..e510314 100644 --- a/gcc/config/riscv/riscv.c +++ b/gcc/config/riscv/riscv.c @@ -55,6 +55,9 @@ along with GCC; see the file COPYING3. If not see #include "diagnostic.h" #include "builtins.h" #include "predict.h" +#include "tree-pass.h" +#include "cfg.h" +#include "context.h" /* True if X is an UNSPEC wrapper around a SYMBOL_REF or LABEL_REF. */ #define UNSPEC_ADDRESS_P(X)\ @@ -848,6 +851,44 @@ riscv_legitimate_address_p (machine_mode mode, rtx x, bool strict_p) return riscv_classify_address (&addr, x, mode, strict_p); } +/* Return true if hard reg REGNO can be used in compressed instructions. */ + +static bool +riscv_compressed_reg_p (int regno) +{ + /* x8-x15/f8-f15 are compressible registers. */ + return (TARGET_RVC && (IN_RANGE (regno, GP_REG_FIRST + 8, GP_REG_FIRST + 15) + || IN_RANGE (regno, FP_REG_FIRST + 8, FP_REG_FIRST + 15))); +} + +/* Return true if load/store from/to address x can be compressed. */ + +static bool +riscv_compressed_lw_address_p (rtx x) +{ + struct riscv_address_info addr; + bool result = riscv_classify_address (&addr, x, GET_MODE (x), +
Re: [PATCH 1/2] Implement p1301 - [[nodiscard("should have a reason")]]
Hi, thanks a lot for the patch, and I'm sorry I didn't notice it sooner; I've adjusted my mail filters to hopefully avoid similar problems. In the future, please CC me on C++ patches and put "C++" and "PATCH" in the subject line. On 7/25/19 7:57 PM, phdoftheho...@gmail.com wrote: From: ThePhD 07-24-2019ThePhD Please use your real name in the git Author field and ChangeLog entries. There are other PhDs around :) In order to get the ChangeLog into the initial comment in git send-patch, I add it to the git commit message. Attaching a patch to your email is also fine for gcc-patches. diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c index 851fd704e5d..f5870d095f2 100644 --- a/gcc/c-family/c-lex.c +++ b/gcc/c-family/c-lex.c @@ -345,24 +345,26 @@ c_common_has_attribute (cpp_reader *pfile) attr_name = NULL_TREE; } } - else - { - /* Some standard attributes need special handling. */ + else + { + /* Some standard attributes need special handling. */ Please avoid changing whitespace on existing lines. In this case you've reindented existing lines to no longer conform to the GNU coding standards, which use 2-space indents; this happens throughout your patch. + else if (is_attribute_p ("nodiscard", attr_name)) + result = 201907; /* placeholder until C++20 Post-Cologne Working Draft. */ The comment can be removed. diff --git a/gcc/cp/cvt.c b/gcc/cp/cvt.c index 23d2aabc483..4a5128c76a1 100644 --- a/gcc/cp/cvt.c +++ b/gcc/cp/cvt.c + if (implicit != ICV_CAST && fn + && lookup_attribute ("nodiscard", DECL_ATTRIBUTES (fn))) + { + tree attr = DECL_ATTRIBUTES (fn); This will break if the function has other attributes. Instead, set attr to the result of lookup_attribute. + bool has_msg = static_cast(msg); The static_cast seems unnecessary; you're already requiring a conversion to bool for the variable initialization. + const char* format = (has_msg ? + "ignoring return value of %qD, " + "declared with attribute %: %<%s%>" : + "ignoring return value of %qD, " + "declared with attribute %%s"); Error messages that aren't passed directly to a function like "error" or "warning" must be escaped with the G_() macro so that they can be translated. + const char* raw_msg = (has_msg ? (const char*)msg : ""); + auto_diagnostic_group d; + if (warning_at (loc, OPT_Wunused_result, + format, fn, raw_msg)) + { + inform (DECL_SOURCE_LOCATION (fn), "declared here"); + } We usually don't wrap a single statement in braces. Please don't add them around existing code unless you're adding more statements. +++ b/gcc/cp/parser.c @@ -26399,13 +26399,26 @@ cp_parser_check_std_attribute (tree attributes, tree attribute) + else if (cxx_dialect >= cxx2a) +{ + if (is_attribute_p ("nodiscard", name) +&& lookup_attribute ("nodiscard", attributes)) +{ + error ("%attribute qE can appear at most once " + "in an attribute-list", name); +} This check shouldn't be limited to C++2a. +++ b/gcc/cp/tree.c +handle_nodiscard_attribute (tree *node, tree name, tree args, int /*flags*/, bool *no_add_attrs) { + if (!args) + *no_add_attrs = true; This makes GCC ignore the nodiscard attribute if there are no arguments; we don't want to set *no_add_attrs here. + else if (cxx_dialect >= cxx2a) ... + else + { + if (!*no_add_attrs) + { + error ("%qE attribute with an argument only available with " + "%<-std=c++2a%> or %<-std=gnu++2a%>", name); + } + } The standard says, "For an attribute-token (including an attribute-scoped-token) not specified in this document, the behavior is implementation-defined. Any attribute-token that is not recognized by the implementation is ignored." So this error is non-conforming. Generally we just support the attribute in all standard modes. I also don't know why you are checking *no_add_attrs here, nothing can set it earlier in the function. +class escaped_string +{ + public: + escaped_string () { m_owned = false; m_str = NULL; }; + ~escaped_string () { if (m_owned) fr
[PATCH RS6000], add xxswapd support
GCC maintainers: The following patch adds define_insn support for xxswapd mnemonic. The xxswapd mnemonic is the more prefered name for the xxpermdi mnemonic. The following patch replaces the define_insn xxpermdi with define_insn xxswapd. The patch has been tested on: powerpc64le-unknown-linux-gnu (Power 8 LE) powerpc64le-unknown-linux-gnu (Power 9 LE) with no regressions. Please let me know if the patch is acceptable for trunk. Carl Love --- RS6000, add xxswapd support gcc/ChangeLog: 2019-09-10 Carl Love * config/rs6000/vsx.md (xxswapd_v4si, xxswapd_v8hi, xxswapd_v16qi): New define_insn. (vsx_xxpermdi4_le_, vsx_xxpermdi8_le_V8HI, vsx_xxpermdi16_le_V16QI): Removed define_insn. --- gcc/config/rs6000/vsx.md | 74 ++-- 1 file changed, 40 insertions(+), 34 deletions(-) diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index 7633171df..cd67131eb 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -2941,43 +2941,49 @@ "xxpermdi %x0,%x1,%x1,2" [(set_attr "type" "vecperm")]) -(define_insn "*vsx_xxpermdi4_le_" - [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa") -(vec_select:VSX_W - (match_operand:VSX_W 1 "vsx_register_operand" "wa") - (parallel [(const_int 2) (const_int 3) - (const_int 0) (const_int 1)])))] - "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (mode)" - "xxpermdi %x0,%x1,%x1,2" +(define_insn "xxswapd_v16qi" + [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa") + (vec_select:V16QI + (match_operand:V16QI 1 "vsx_register_operand" "wa") + (parallel [(const_int 8) (const_int 9) +(const_int 10) (const_int 11) +(const_int 12) (const_int 13) +(const_int 14) (const_int 15) +(const_int 0) (const_int 1) +(const_int 2) (const_int 3) +(const_int 4) (const_int 5) +(const_int 6) (const_int 7)])))] + "TARGET_VSX" +;; AIX does not support the extended mnemonic xxswapd. Use the basic +;; mnemonic xxpermdi instead. + "xxpermdi %x0,%x1,%x1,2" [(set_attr "type" "vecperm")]) -(define_insn "*vsx_xxpermdi8_le_V8HI" +(define_insn "xxswapd_v8hi" [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa") -(vec_select:V8HI - (match_operand:V8HI 1 "vsx_register_operand" "wa") - (parallel [(const_int 4) (const_int 5) - (const_int 6) (const_int 7) - (const_int 0) (const_int 1) - (const_int 2) (const_int 3)])))] - "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V8HImode)" - "xxpermdi %x0,%x1,%x1,2" - [(set_attr "type" "vecperm")]) - -(define_insn "*vsx_xxpermdi16_le_V16QI" - [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa") -(vec_select:V16QI - (match_operand:V16QI 1 "vsx_register_operand" "wa") - (parallel [(const_int 8) (const_int 9) - (const_int 10) (const_int 11) - (const_int 12) (const_int 13) - (const_int 14) (const_int 15) - (const_int 0) (const_int 1) - (const_int 2) (const_int 3) - (const_int 4) (const_int 5) - (const_int 6) (const_int 7)])))] - "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V16QImode)" - "xxpermdi %x0,%x1,%x1,2" - [(set_attr "type" "vecperm")]) + (vec_select:V8HI + (match_operand:V8HI 1 "vsx_register_operand" "wa") + (parallel [(const_int 4) (const_int 5) +(const_int 6) (const_int 7) +(const_int 0) (const_int 1) +(const_int 2) (const_int 3)])))] + "TARGET_VSX" +;; AIX does not support the extended mnemonic xxswapd. Use the basic +;; mnemonic xxpermdi instead. + "xxpermdi %x0,%x1,%x1,2" + [(set_attr "type" "vecperm")]) + +(define_insn "xxswapd_" + [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa") + (vec_select:VSX_W + (match_operand:VSX_W 1 "vsx_register_operand" "wa") + (parallel [(const_int 2) (const_int 3) +(const_int 0) (const_int 1)])))] + "TARGET_VSX" +;; AIX does not support extended mnemonic xxswapd. Use the basic +;; mnemonic xxpermdi instead. + "xxpermdi %x0,%x1,%x1,2" + [(set_attr "type" "vecperm")]) ;; lxvd2x for little endian loads. We need several of ;; these since the form of the PARALLEL differs by mode. -- 2.17.1
[PATCH, i386]: Add smulhrs3 expanders (PR 89386)
2019-09-12 Uroš Bizjak PR tree-optimization/89386 * config/i386/sse.md (smulhrs3): New expander. (smulhrsv4hi3): Ditto. testsuite/ChangeLog: 2019-09-12 Uroš Bizjak PR tree-optimization/89386 * gcc.target/i386/pr89386.c: New test. * gcc.target/i386/pr89386-1.c: Ditto. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Committed to mainline SVN. Uros. Index: gcc/config/i386/sse.md === --- gcc/config/i386/sse.md (revision 275688) +++ gcc/config/i386/sse.md (working copy) @@ -16475,6 +16475,26 @@ ix86_fixup_binary_operands_no_copy (MULT, mode, operands); }) +(define_expand "smulhrs3" + [(set (match_operand:VI2_AVX2 0 "register_operand") + (truncate:VI2_AVX2 + (lshiftrt: + (plus: + (lshiftrt: + (mult: + (sign_extend: + (match_operand:VI2_AVX2 1 "nonimmediate_operand")) + (sign_extend: + (match_operand:VI2_AVX2 2 "nonimmediate_operand"))) + (const_int 14)) + (match_dup 3)) + (const_int 1] + "TARGET_SSSE3" +{ + operands[3] = CONST1_RTX(mode); + ix86_fixup_binary_operands_no_copy (MULT, mode, operands); +}) + (define_insn "*_pmulhrsw3" [(set (match_operand:VI2_AVX2 0 "register_operand" "=x,x,v") (truncate:VI2_AVX2 @@ -16502,6 +16522,26 @@ (set_attr "prefix" "orig,maybe_evex,evex") (set_attr "mode" "")]) +(define_expand "smulhrsv4hi3" + [(set (match_operand:V4HI 0 "register_operand") + (truncate:V4HI + (lshiftrt:V4SI + (plus:V4SI + (lshiftrt:V4SI + (mult:V4SI + (sign_extend:V4SI + (match_operand:V4HI 1 "register_operand")) + (sign_extend:V4SI + (match_operand:V4HI 2 "register_operand"))) + (const_int 14)) + (match_dup 3)) + (const_int 1] + "TARGET_MMX_WITH_SSE && TARGET_SSSE3" +{ + operands[3] = CONST1_RTX(V4HImode); + ix86_fixup_binary_operands_no_copy (MULT, V4HImode, operands); +}) + (define_expand "ssse3_pmulhrswv4hi3" [(set (match_operand:V4HI 0 "register_operand") (truncate:V4HI Index: gcc/testsuite/gcc.target/i386/pr89386-1.c === --- gcc/testsuite/gcc.target/i386/pr89386-1.c (nonexistent) +++ gcc/testsuite/gcc.target/i386/pr89386-1.c (working copy) @@ -0,0 +1,16 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-mssse3 -O2 -ftree-vectorize" } */ + +#define N 4 + +short a[N], b[N], c[N]; + +int foo (void) +{ + int i; + + for (i = 0; i < N; i++) +a[i] = int) b[i] * (int) c[i]) >> 14) + 1) >> 1; +} + +/* { dg-final { scan-assembler "pmulhrsw" } } */ Index: gcc/testsuite/gcc.target/i386/pr89386.c === --- gcc/testsuite/gcc.target/i386/pr89386.c (nonexistent) +++ gcc/testsuite/gcc.target/i386/pr89386.c (working copy) @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-mssse3 -O2 -ftree-vectorize" } */ + +#define N 1024 + +short a[N], b[N], c[N]; + +int foo (void) +{ + int i; + + for (i = 0; i < N; i++) +a[i] = int) b[i] * (int) c[i]) >> 14) + 1) >> 1; +} + +/* { dg-final { scan-assembler "pmulhrsw" } } */
Re: Patch to support extended characters in C/C++ identifiers
On Tue, Sep 10, 2019 at 11:47:22PM +, Joseph Myers wrote: > On Mon, 12 Aug 2019, Lewis Hyatt wrote: > > > Hello- > > > > The attached patch for libcpp adds support for extended characters (e.g. > > UTF-8) > > in identifiers. A preliminary version of the patch was posted on PR c/67224 > > as > > Comment 26 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67224#c26) and > > discussed with Joseph Myers. Here is an updated patch incorporating all > > feedback received so far. I hope it is suitable now; please let me know if I > > can do anything else to make it ready for you to apply. I am happy to work > > on > > it further, whatever is needed. I can't easily test on anything other than > > x86_64-linux though. I did bootstrap all languages and run all tests on that > > platform, everything was good. > > > > The (relatively short) changes to libcpp are included inline here. I > > attached > > the test cases as a gzipped patch to avoid any problems with the encoding > > (the > > test cases contain some invalid UTF-8 and also other encodings such as > > latin-1 > > as part of the testing). > > > > Thanks for taking a look at it! > > Thanks, I think this is OK with a few updates to the documentation. Attached is a single patch relative to current trunk that incorporates all of your feedback. I gzipped it like last time just in case the invalid UTF-8 in the tests presents a problem. The code changes are the same as before other than comments. The documentation is now updated... there were a couple other places that also seemed reasonable to me to update, hope it sounds OK. I also created the PR about UCN conversion (PR 91755) and added a reference in the comments for those tests. Bootstrap was done on Linux x86-64, testing results: before patch: 36 XPASS 72 FAIL 1452 XFAIL 9624 UNSUPPORTED 359529 PASS after patch: 36 XPASS 72 FAIL 1452 XFAIL 9624 UNSUPPORTED 359633 PASS Thank you. -Lewis libcpp/ChangeLog 2019-09-12 Lewis Hyatt PR c/67224 * charset.c (_cpp_valid_utf8): New function to help lex UTF-8 tokens. * internal.h (_cpp_valid_utf8): Declare. * lex.c (forms_identifier_p): Use it to recognize UTF-8 identifiers. (_cpp_lex_direct): Handle UTF-8 in identifiers and CPP_OTHER tokens. Do all work in "default" case to avoid slowing down typical code paths. Also handle $ and UCN in the default case for consistency. gcc/Changelog 2019-09-12 Lewis Hyatt PR c/67224 * doc/cpp.texi: Document support for extended characters in identifiers. * doc/cppopts.texi: Likewise. gcc/testsuite/ChangeLog 2019-09-12 Lewis Hyatt PR c/67224 * c-c++-common/cpp/ucnid-2011-1-utf8.c: New test. * g++.dg/cpp/ucnid-1-utf8.C: New test. * g++.dg/cpp/ucnid-2-utf8.C: New test. * g++.dg/cpp/ucnid-3-utf8.C: New test. * g++.dg/cpp/ucnid-4-utf8.C: New test. * g++.dg/other/ucnid-1-utf8.C: New test. * gcc.dg/cpp/ucnid-1-utf8.c: New test. * gcc.dg/cpp/ucnid-10-utf8.c: New test. * gcc.dg/cpp/ucnid-11-utf8.c: New test. * gcc.dg/cpp/ucnid-12-utf8.c: New test. * gcc.dg/cpp/ucnid-13-utf8.c: New test. * gcc.dg/cpp/ucnid-14-utf8.c: New test. * gcc.dg/cpp/ucnid-15-utf8.c: New test. * gcc.dg/cpp/ucnid-2-utf8.c: New test. * gcc.dg/cpp/ucnid-3-utf8.c: New test. * gcc.dg/cpp/ucnid-4-utf8.c: New test. * gcc.dg/cpp/ucnid-6-utf8.c: New test. * gcc.dg/cpp/ucnid-7-utf8.c: New test. * gcc.dg/cpp/ucnid-9-utf8.c: New test. * gcc.dg/ucnid-1-utf8.c: New test. * gcc.dg/ucnid-10-utf8.c: New test. * gcc.dg/ucnid-11-utf8.c: New test. * gcc.dg/ucnid-12-utf8.c: New test. * gcc.dg/ucnid-13-utf8.c: New test. * gcc.dg/ucnid-14-utf8.c: New test. * gcc.dg/ucnid-15-utf8.c: New test. * gcc.dg/ucnid-16-utf8.c: New test. * gcc.dg/ucnid-2-utf8.c: New test. * gcc.dg/ucnid-3-utf8.c: New test. * gcc.dg/ucnid-4-utf8.c: New test. * gcc.dg/ucnid-5-utf8.c: New test. * gcc.dg/ucnid-6-utf8.c: New test. * gcc.dg/ucnid-7-utf8.c: New test. * gcc.dg/ucnid-8-utf8.c: New test. * gcc.dg/ucnid-9-utf8.c: New test. utf8-identifiers-v2.patch.gz Description: application/gunzip
Re: [00/32] Support multiple ABIs in the same translation unit
On Wednesday, September 11, 2019, Richard Sandiford < richard.sandif...@arm.com> wrote:. > > > Sorry for the long write-up. > > Richard > *thanks* for the long write-up! Ciao! Steven
Re: [PATCH] Fortran - character type names in errors and warning - for review'
On Mon, Sep 09, 2019 at 02:52:20PM +0100, Mark Eggleston wrote: > gcc/fortran > > Mark Eggleston > > * arith.c (gfc_arith_concat): Set length field in typespec. > * expr.c (gfc_get_character_expr): Set length field in typespec. > * gfortran.h: Add length field to gfc_typespec for use to allow the > length to available for character literals. > * interface.c (gfc_check_dummy_characteristics): Use gfc_dummy_typename > instead of gfc_typename when constructing error message to allow for > CHARACTER(*) and CHARACTER(*,4). > (compare_parameter): Use gfc_dummy_typename for formal argument when > constructing error message to allow for CHARACTER(*) and > CHARACTER(*,4). > * intrinsic.c (gfc_actual_arglist): Reword error message so that > CHARACTER(*) or CHARACTER(*,4) can be reported as the target type. Use > gfc_dummy_typename for the formal argument. > * misc.c (gfc_typename): Add new local variable length and initialise > with the value from the length field in typespec passed in. If there > is a character structure use the value from there for length. If the > kind is the default character kind construct the type name using length > otherwise use the length followed by kind separated by a comma. > (gfc_dummy_typename): New routine for use with formal arguments, if the > typespec does not have a character length structure then the length is > assumed and * is used for the length, if kind is not the default > character kind follow * with a comma and then the kind. > > gcc/testsuite > > Mark Eggleston > > * gfortran.dg/bad_operands.f90: New test. > * gfortran.dg/character mismatch.f90: New test. > * gfortran.dg/compare_interfaces.f90: New test. > * gfortran.dg/widechar_intrinsics_1.f90: Checked for specific character > type names instead of "Type of argument". > * gfortran.dg/widechar_intrinsics_2.f90: Checked for specific character > type names instead of "Type of argument". > * gfortran.dg/widechar_intrinsics_3.f90: Checked for specific character > type names instead of "Type of argument". > This looks OK to me. I don't know if anyone else will glance at the patch. You said that the patch has the feeling of a kludge. This is probably a by-product of bolting kind=4 support onto what gfortran inherited long ago. -- steve
Re: [PATCH v2 4/6] compiler-gcc.h: add asm_inline definition
On Sat, Sep 7, 2019 at 6:11 AM Segher Boessenkool wrote: > > On Fri, Sep 06, 2019 at 06:04:54PM -0700, Nick Desaulniers wrote: > > On Fri, Sep 6, 2019 at 5:14 PM Segher Boessenkool > > wrote: > > > On Fri, Sep 06, 2019 at 04:42:58PM -0700, Nick Desaulniers via > > > gcc-patches wrote: > > > > Just to prove my point about version checks being brittle, it looks > > > > like Rasmus' version check isn't even right. GCC supported `asm > > > > inline` back in the 8.3 release, not 9.1 as in this patch: > > > > > > Yes, I backported it so that it is available in 7.5, 8.3, and 9.1, so > > > that more users will have this available sooner. (7.5 has not been > > > released yet, but asm inline has been supported in GCC 7 since Jan 2 > > > this year). > > > > Ah, ok that makes sense. > > > > How would you even write a version check for that? > > I wouldn't. Please stop using that straw man. I'm not saying version > checks are good, or useful for most things. I am saying they are not. Then please help Rasmus with a suggestion on how best to detect and safely make use of the feature you implemented. As is, the patch in question is using version checks. > > Predefined compiler symbols to do version checking (of a feature) is > just a lesser instance of the same problem though. (And it causes its > own more or less obvious problems as well). > > > > > Or was it "broken" until 9.1? Lord knows, as `asm inline` wasn't in > > > > any release notes or bug reports I can find: > > > > > > https://gcc.gnu.org/ml/gcc-patches/2019-02/msg01143.html > > > > > > It never was accepted, and I dropped the ball. > > > > Ah, ok, that's fine, so documentation was at least written. Tracking > > when and where patches land (or don't) is difficult when patch files > > are emailed around. I try to keep track of when and where our kernel > > patches land, but I frequently drop the ball there. > > I keep track of most things just fine... But the release notes are part > of our web content, which is in a separate CVS repository (still nicer > than SVN :-) ), and since I don't use it very often it falls outside of > all my normal procedures. > > > your preference). I'm already subscribed to more mailing lists than I > > have time to read. > > > > > But I'll try to remember, sure. > > > Not that I am involved in all such discussions myself, mind. > > > > But you _did_ implement `asm inline`. ;) > > That started as just > > + /* If this asm is asm inline, count anything as minimum size. */ > + if (gimple_asm_inline_p (as_a (stmt))) > + count = MIN (1, count); > > (in estimate_num_insns) but then things ballooned. Like such things do. So I'm not convinced this GNU C extension solves the problem it's described to be used for. I agree that current implementations in multiple compilers is imprecise, and leads to developer headaches. I think `asm inline` will help in cases where vanilla `asm` overestimates the size of inline assembly, but I also think it will be just as bad as vanilla `asm` in cases where the size is underestimated. I have seen instances where instruction selection fails to select the appropriate way to branch when inline asm size is misjudged, resulting in un-encodeable jumps (as in the branch target is too far to be encoded in the number of bits of a single jump/branch instruction). And the use of .pushsection/.popsection assembler directives and __attribute__((section())) attributes complicates the accounting further, as you can then place code from the inline assembly in a different section than the function itself (so that inline assembly doesn't affect the function's size, and the implications on inlining the function). That would cause vanilla `asm` to overestimate size. (I suspect variable length encoded instruction sets also suffer from misaccounting). Short of invoking the assembler itself, and then matching the byte size of generated code section that matches the function's section, can you accurately describe the size of inline assembly. .macro and .rept assembler directives really complicate estimates and can cause vanilla `asm` to underestimate size. I agree that current implementations in multiple compilers is imprecise, and leads to developer headaches. Rather than give developers the ability to choose between 2 different heuristics that are both imprecise, why not make the existing heuristic (ie. vanilla `asm`) more precise in its measure? -- Thanks, ~Nick Desaulniers
Re: [PATCH v2 4/6] compiler-gcc.h: add asm_inline definition
On 12/09/2019 23.54, Nick Desaulniers wrote: > On Sat, Sep 7, 2019 at 6:11 AM Segher Boessenkool > wrote: >> >> On Fri, Sep 06, 2019 at 06:04:54PM -0700, Nick Desaulniers wrote: >> >>> How would you even write a version check for that? >> >> I wouldn't. Please stop using that straw man. I'm not saying version >> checks are good, or useful for most things. I am saying they are not. > > Then please help Rasmus with a suggestion on how best to detect and > safely make use of the feature you implemented. As is, the patch in > question is using version checks. I was just about to send out an updated version. I'm just going to do the check in Kconfig - I didn't realize how easy it had become to do that kind of thing until Masahiro pointed me at his RFC patch from December. Rasmus
libgo patch committed: Update to 1.13
I've committed a patch to update libgo to the final Go 1.13 release. Bootstrapped and ran Go tests on x86_64-pc-linux-gnu. Ian patch.txt.bz2 Description: application/bzip
Re: [PATCH] DWARF array bounds missing from C++ array definitions
On Sep 12, 2019, Richard Biener wrote: > Your new predicate looks a bit excessive here given it builds the type > again? Err, there seems to be some misunderstanding here. The predicate avoids outputting a type for the definition if it's the same DIE that would go in the specification. Now, when it's a different DIE, sometimes it might still refer to the same type, as in array-2.C, but I think that's not just acceptable, it's desirable, for it reflects the different ways used to denote the same type. Before posting the patch, I added an inform() to the case in which completing_type_p would return true (bringing about a debug info change), and reviewed all of the messages in a bootstrap. The only ones that weren't just adding array element count were those that inspired array-2.C and array-3.C. > So I'm obviously fine with your patch and I guess if we independently > arrive at this solution that answers my question what "correct DWARF" > is by a majority decision ;) Good. Once we clear up the misunderstanding (I'm not sure whether you misunderstood the patch, or I misunderstood your response), I'll be glad to put it in. > So - maybe we can have the patch a bit cleaner by adding > a flag to add_type_attribute saying we only want it if it's > different from that already present (on the specification DIE)? That's exactly what I meant completing_type_p to check. Do you mean it should be stricter, do it more cheaply, or what? -- Alexandre Oliva, freedom fighter he/him https://FSFLA.org/blogs/lxo Be the change, be Free! FSF.org & FSF Latin America board member GNU Toolchain EngineerFree Software Evangelist Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara
Re: C++ PATCH for c++/91678 - wrong error with decltype and location wrapper
Ping. On Thu, Sep 05, 2019 at 10:24:55PM -0400, Marek Polacek wrote: > Compiling this testcase results in a bogus "invalid cast" error; this occurs > since the introduction of location wrappers in finish_id_expression. > > Here we are parsing the decltype expression via cp_parser_decltype_expr which > can lead to calling various fold_* and c-family routines. They use > non_lvalue_loc, but that won't create a NON_LVALUE_EXPR wrapper around a > location > wrapper. > > So before the location wrappers addition cp_parser_decltype_expr would return > NON_LVALUE_EXPR . Now it returns VIEW_CONVERT_EXPR(c), but the > STRIP_ANY_LOCATION_WRAPPER immediately following it strips the location > wrapper, > and suddenly we don't know whether we have an lvalue anymore. And that's sad > because then decltype produces the wrong type, causing nonsense errors. > > Bootstrapped/regtested on x86_64-linux, ok for trunk and 9? > > 2019-09-05 Marek Polacek > > PR c++/91678 - wrong error with decltype and location wrapper. > * parser.c (cp_parser_decltype): Use auto_suppress_location_wrappers > sentinel. Don't strip location wrappers. > > * g++.dg/cpp0x/decltype73.C: New test. > > diff --git gcc/cp/parser.c gcc/cp/parser.c > index baa60b8834e..b3c7bff5988 100644 > --- gcc/cp/parser.c > +++ gcc/cp/parser.c > @@ -14729,8 +14729,13 @@ cp_parser_decltype (cp_parser *parser) >/* Do not warn about problems with the expression. */ >++c_inhibit_evaluation_warnings; > > + /* Don't create wrapper nodes within decltype. non_lvalue_loc won't > + create a NON_LVALUE_EXPR wrapper around a location wrapper, and a > + subsequent STRIP_ANY_LOCATION_WRAPPER would destroy the information > + about lvalueness of the expression. */ > + auto_suppress_location_wrappers sentinel; > + >expr = cp_parser_decltype_expr (parser, > id_expression_or_member_access_p); > - STRIP_ANY_LOCATION_WRAPPER (expr); > >/* Go back to evaluating expressions. */ >--cp_unevaluated_operand; > diff --git gcc/testsuite/g++.dg/cpp0x/decltype73.C > gcc/testsuite/g++.dg/cpp0x/decltype73.C > new file mode 100644 > index 000..cbe94a898e3 > --- /dev/null > +++ gcc/testsuite/g++.dg/cpp0x/decltype73.C > @@ -0,0 +1,4 @@ > +// PR c++/91678 - wrong error with decltype and location wrapper. > +// { dg-do compile { target c++11 } } > + > +float* test(float* c) { return (decltype(c + 0))(float*)c; }
Re: [Patch, fortran] PR91717 - ICE on concatenating deferred-length character and character literal
Thanks - committed as r275696. Paul On Thu, 12 Sep 2019 at 15:09, Steve Kargl wrote: > > On Thu, Sep 12, 2019 at 09:55:20AM +0100, Paul Richard Thomas wrote: > > > > OK to commit? > > > > Yes. > > -- > Steve -- "If you can't explain it simply, you don't understand it well enough" - Albert Einstein
Re: [PATCH] Fortran - character type names in errors and warning - for review
On Mon, Sep 9, 2019 at 4:52 PM Mark Eggleston wrote: > To work around these problems I added a new length field to gfc_typespec > to used to produce the name of a character type if the character length > structure is not present. > The addition of the length field is a bit of kludge any pointers > regarding a better solution will be appreciated. Thanks for the patch, I agree that we should print character type names better. However, I'm not really happy with this approach. Requiring us to keep track of the character length in two places sounds like a recipe for confusing bugs. I don't really have a good solution thought out for this, but I think this should be solved somehow before committing the patch. Secondly, character lengths can be longer than what fits into int. In gfortran.h you'll find typedef HOST_WIDE_INT gfc_charlen_t; and then you should use gfc_mpz_get_hwi() instead of mpz_get_si(). And for the *printf() format string you should use HOST_WIDE_INT_PRINT_DEC. Thanks, -- Janne Blomqvist