[PATCH v6 3/3] PR80791 Consider doloop cmp use in ivopts
Hi!

Compared to the previous versions of the implementation, which were mainly
based on the existing IV cands but zeroed the related group/use cost, this
new one follows Richard and Segher's suggestion and introduces one dedicated
doloop IV cand. Some key points are listed below:

1) New field doloop_p in struct iv_cand to indicate a doloop-dedicated
   IV cand.
2) Special name "doloop" assigned.
3) Doloop IV cand with form (niter+1, +, -1).
4) For the doloop IV cand, there is no extra cost of one as for a BIV;
   assign zero cost for its step.
5) Support may_be_zero (the regressed PR is in this case): the base of the
   doloop IV can be a COND_EXPR, so add handling in cand_value_at and
   may_eliminate_iv.
6) Add more expr support in force_expr_to_var_cost for reasonable cost
   calculation on an IV base with may_be_zero (like COND_EXPR).
7) Set zero cost when using the doloop IV cand for the doloop use.
8) Add three hooks (should we merge _generic and _address?):
   *) have_count_reg_decr_p indicates that the target has a special
      hardware count register; if so, we shouldn't consider the impact of
      the doloop IV when calculating register pressure.
   *) doloop_cost_for_generic is the extra cost when using the doloop IV
      cand for a generic-type IV use.
   *) doloop_cost_for_address is the extra cost when using the doloop IV
      cand for an address-type IV use.

Bootstrapped on powerpc64le-linux-gnu and regression testing passed, except
for one failure on gcc/testsuite/gcc.dg/guality/loop-1.c at -O3, which is
tracked by PR89983.

Any comments and suggestions are highly appreciated. Thanks!

Kewen

-

gcc/ChangeLog

2019-08-14  Kewen Lin

	PR middle-end/80791
	* config/rs6000/rs6000.c (TARGET_HAVE_COUNT_REG_DECR_P): New macro.
	(TARGET_DOLOOP_COST_FOR_GENERIC): Likewise.
	(TARGET_DOLOOP_COST_FOR_ADDRESS): Likewise.
	* target.def (have_count_reg_decr_p): New hook.
	(doloop_cost_for_generic): Likewise.
	(doloop_cost_for_address): Likewise.
	* doc/tm.texi.in (TARGET_HAVE_COUNT_REG_DECR_P): Likewise.
	(TARGET_DOLOOP_COST_FOR_GENERIC): Likewise.
	(TARGET_DOLOOP_COST_FOR_ADDRESS): Likewise.
	* doc/tm.texi: Regenerate.
	* tree-ssa-loop-ivopts.c (comp_cost::operator+=): Consider infinite
	cost addend.
	(record_group): Init doloop_p.
	(add_candidate_1): Add optional argument doloop, change the
	handlings accordingly.
	(add_candidate): Likewise.
	(add_iv_candidate_for_biv): Update the call to add_candidate.
	(generic_predict_doloop_p): Update attribute.
	(force_expr_to_var_cost): Add costing for expressions COND_EXPR/
	LT_EXPR/LE_EXPR/GT_EXPR/GE_EXPR/EQ_EXPR/NE_EXPR/UNORDERED_EXPR/
	ORDERED_EXPR/UNLT_EXPR/UNLE_EXPR/UNGT_EXPR/UNGE_EXPR/UNEQ_EXPR/
	LTGT_EXPR/MAX_EXPR/MIN_EXPR.
	(determine_group_iv_cost_generic): Update for doloop IV cand.
	(determine_group_iv_cost_address): Likewise.
	(determine_group_iv_cost_cond): Likewise.
	(determine_iv_cost): Likewise.
	(ivopts_estimate_reg_pressure): Likewise.
	(cand_value_at): Update argument niter type to struct
	tree_niter_desc*, consider doloop IV cand and may_be_zero.
	(may_eliminate_iv): Update the call to cand_value_at, consider
	doloop IV cand and may_be_zero.
	(add_iv_candidate_for_doloop): New function.
	(find_iv_candidates): Call function add_iv_candidate_for_doloop.
	(determine_set_costs): Update the call to
	ivopts_estimate_reg_pressure.
	(iv_ca_recount_cost): Likewise.
	(iv_ca_new): Init n_doloop_cands.
	(iv_ca_set_no_cp): Update n_doloop_cands.
	(iv_ca_set_cp): Likewise.
	(iv_ca_dump): Dump register cost.
	(find_doloop_use): Likewise.
	(tree_ssa_iv_optimize_loop): Call function generic_predict_doloop_p
	and find_doloop_use.
gcc/testsuite/ChangeLog

2019-08-14  Kewen Lin

	PR middle-end/80791
	* gcc.dg/tree-ssa/ivopts-3.c: Adjust for doloop change.
	* gcc.dg/tree-ssa/ivopts-lt.c: Likewise.
	* gcc.dg/tree-ssa/pr32044.c: Likewise.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 6667cd0..5eccbdc 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1912,6 +1912,16 @@ static const struct attribute_spec rs6000_attribute_table[] =
 #undef TARGET_PREDICT_DOLOOP_P
 #define TARGET_PREDICT_DOLOOP_P rs6000_predict_doloop_p
 
+#undef TARGET_HAVE_COUNT_REG_DECR_P
+#define TARGET_HAVE_COUNT_REG_DECR_P true
+
+/* 10 is infinite cost in IVOPTs.  */
+#undef TARGET_DOLOOP_COST_FOR_GENERIC
+#define TARGET_DOLOOP_COST_FOR_GENERIC 10
+
+#undef TARGET_DOLOOP_COST_FOR_ADDRESS
+#define TARGET_DOLOOP_COST_FOR_ADDRESS 10
+
 #undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
 #define TARGET_ATOMIC_ASSIGN_EXPAND_FENV rs6000_atomic_assign_expand_fenv
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index c2aa4d0..9f3a08a 100644
--- a/g
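
As a concrete illustration of the kind of loop this series targets (a
sketch, not part of the patch): on Power, a counted loop like the one
below is a doloop candidate, and with the dedicated IV cand ivopts can
let the loop counter live in the hardware count register (bdnz) instead
of keeping a separate GPR index IV. The may_be_zero case in point 5)
arises when the trip count can be zero, making the doloop IV base a
COND_EXPR.

    /* Compile with, e.g., gcc -O2 -mcpu=power9 -fdump-tree-ivopts on
       powerpc64le; the ivopts dump should show the "doloop" candidate.  */
    unsigned int
    sum (const unsigned int *a, unsigned int n)
    {
      unsigned int s = 0;
      /* N may be zero here, so the doloop IV base becomes a COND_EXPR
         (the regressed may_be_zero case this version handles).  */
      for (unsigned int i = 0; i < n; i++)
        s += a[i];
      return s;
    }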
RE: Add TIGERLAKE and COOPERLAKE to GCC
Resend this mail since GCC Patches rejected my message, thanks.

-Original Message-

Hi Uros and all:

This patch adds TIGERLAKE and COOPERLAKE to GCC. TIGERLAKE is based on
ICELAKE_CLIENT and adds the new ISAs MOVDIRI/MOVDIR64B/AVX512VP2INTERSECT.
COOPERLAKE is based on CASCADELAKE and adds the new ISA AVX512BF16.

Bootstrap is OK, and there are no regressions for the i386/x86-64
testsuite.

Changelog:

gcc/
	* common/config/i386/i386-common.c (processor_names): Add tigerlake
	and cooperlake.
	(processor_alias_table): Add tigerlake and cooperlake.
	* config.gcc: Add -march=tigerlake and cooperlake.
	* config/i386/driver-i386.c (host_detect_local_cpu): Detect tigerlake
	and cooperlake.
	* config/i386/i386-builtins.c (processor_model): Add
	M_INTEL_COREI7_TIGERLAKE and M_INTEL_COREI7_COOPERLAKE.
	(arch_names_table): Add tigerlake and cooperlake.
	(get_builtin_code_for_version): Handle PROCESSOR_TIGERLAKE and
	PROCESSOR_COOPERLAKE.
	* config/i386/i386-c.c (ix86_target_macros_internal): Handle
	tigerlake and cooperlake.
	(ix86_target_macros_internal): Handle
	OPTION_MASK_ISA_AVX512VP2INTERSECT.
	* config/i386/i386-options.c (m_TIGERLAKE): Define.
	(m_COOPERLAKE): Ditto.
	(m_CORE_AVX512): Ditto.
	(processor_cost_table): Add cascadelake.
	(ix86_target_string): Handle -mavx512vp2intersect.
	(ix86_valid_target_attribute_inner_p): Handle avx512vp2intersect.
	(ix86_option_override_internal): Handle PTA_SHSTK, PTA_MOVDIRI,
	PTA_MOVDIR64B, PTA_AVX512VP2INTERSECT.
	* config/i386/i386.h (ix86_size_cost): Define TARGET_TIGERLAKE
	and TARGET_COOPERLAKE.
	(processor_type): Add PROCESSOR_TIGERLAKE and PROCESSOR_COOPERLAKE.
	(PTA_SHSTK): Define.
	(PTA_MOVDIRI): Ditto.
	(PTA_MOVDIR64B): Ditto.
	(PTA_COOPERLAKE): Ditto.
	(PTA_TIGERLAKE): Ditto.
	(TARGET_AVX512VP2INTERSECT): Ditto.
	(TARGET_AVX512VP2INTERSECT_P(x)): Ditto.
	(processor_type): Add PROCESSOR_TIGERLAKE and PROCESSOR_COOPERLAKE.
	* doc/extend.texi: Add tigerlake and cooperlake.

gcc/testsuite/
	* gcc.target/i386/funcspec-56.inc: Handle new march.
	* g++.target/i386/mv16.C: Handle new march.

libgcc/
	* config/i386/cpuinfo.h: Add INTEL_COREI7_TIGERLAKE and
	INTEL_COREI7_COOPERLAKE.
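
A usage sketch (assuming a compiler with this patch applied): the new CPU
names become valid both on the command line (-march=tigerlake,
-march=cooperlake) and in runtime dispatch, since the patch wires them
into __builtin_cpu_is via arch_names_table and cpuinfo.h. The function
names below are hypothetical, for illustration only.

    __attribute__ ((target ("arch=tigerlake")))
    static int impl_tgl (void) { return 1; }  /* May use the new ISAs.  */

    static int impl_generic (void) { return 0; }

    int
    dispatch (void)
    {
      __builtin_cpu_init ();
      if (__builtin_cpu_is ("tigerlake"))  /* Enabled by the libgcc change.  */
        return impl_tgl ();
      return impl_generic ();
    }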
Re: Canonicalization of compares performed as side-effect operations
[Sorry for the delay, I missed your question...]

> Interesting. Does it work for the general case of a reverse subtract,
> which I need to handle as well?

Not clear, Visium only uses it for SNE and combined NEG/SNE.

-- 
Eric Botcazou
[committed][AArch64] Rework SVE PTEST patterns
This patch reworks the rtl representation of the SVE PTEST operation so that: - the governing predicate is always VNx16BI (and so all bits are defined) - it is still possible to pattern-match the governing predicate in the mode that it had previously - a new hint operand says whether the governing predicate is known to be all true for the element size of interest, rather than this being part of the unspec name. These changes make it easier to handle more flag-setting instructions as part of the ACLE work. See the comment in aarch64-sve.md for more details. Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf. Applied as r274414. Richard 2019-08-14 Richard Sandiford gcc/ * config/aarch64/aarch64-protos.h (aarch64_ptrue_all): Declare. * config/aarch64/aarch64.c (aarch64_ptrue_all): New function. * config/aarch64/aarch64.md (UNSPEC_PTEST_PTRUE): Delete. (UNSPEC_PTEST): New unspec. (SVE_MAYBE_NOT_PTRUE, SVE_KNOWN_PTRUE): New constants. * config/aarch64/iterators.md (data_bytes): New mode attribute. * config/aarch64/predicates.md (aarch64_sve_ptrue_flag): New predicate. * config/aarch64/aarch64-sve.md: Add a new section describing the handling of UNSPEC_PTEST. (pred_3): Rename to... (@aarch64_pred__z): ...this. (ptest_ptrue): Replace with... (aarch64_ptest): ...this new pattern. (cbranch4): Update after above changes. (*3_cc): Use UNSPEC_PTEST instead of UNSPEC_PTEST_PTRUE. (*cmp_cc): Likewise. (*cmp_ptest): Likewise. (*while_ult_cc): Likewise. Index: gcc/config/aarch64/aarch64-protos.h === --- gcc/config/aarch64/aarch64-protos.h 2019-08-13 22:33:36.213955216 +0100 +++ gcc/config/aarch64/aarch64-protos.h 2019-08-14 08:56:12.498608977 +0100 @@ -550,6 +550,7 @@ const char * aarch64_output_probe_stack_ const char * aarch64_output_probe_sve_stack_clash (rtx, rtx, rtx, rtx); void aarch64_err_no_fpadvsimd (machine_mode); void aarch64_expand_epilogue (bool); +rtx aarch64_ptrue_all (unsigned int); void aarch64_expand_mov_immediate (rtx, rtx); rtx aarch64_ptrue_reg (machine_mode); rtx aarch64_pfalse_reg (machine_mode); Index: gcc/config/aarch64/aarch64.c === --- gcc/config/aarch64/aarch64.c2019-08-13 22:35:11.717252343 +0100 +++ gcc/config/aarch64/aarch64.c2019-08-14 08:56:12.502608946 +0100 @@ -2699,6 +2699,22 @@ aarch64_svpattern_for_vl (machine_mode p return AARCH64_NUM_SVPATTERNS; } +/* Return a VNx16BImode constant in which every sequence of ELT_SIZE + bits has the lowest bit set and the upper bits clear. This is the + VNx16BImode equivalent of a PTRUE for controlling elements of + ELT_SIZE bytes. However, because the constant is VNx16BImode, + all bits are significant, even the upper zeros. */ + +rtx +aarch64_ptrue_all (unsigned int elt_size) +{ + rtx_vector_builder builder (VNx16BImode, elt_size, 1); + builder.quick_push (const1_rtx); + for (unsigned int i = 1; i < elt_size; ++i) +builder.quick_push (const0_rtx); + return builder.build (); +} + /* Return an all-true predicate register of mode MODE. */ rtx Index: gcc/config/aarch64/aarch64.md === --- gcc/config/aarch64/aarch64.md 2019-08-13 22:33:30.365998256 +0100 +++ gcc/config/aarch64/aarch64.md 2019-08-14 08:56:12.502608946 +0100 @@ -220,7 +220,7 @@ (define_c_enum "unspec" [ UNSPEC_LD1_GATHER UNSPEC_ST1_SCATTER UNSPEC_MERGE_PTRUE -UNSPEC_PTEST_PTRUE +UNSPEC_PTEST UNSPEC_UNPACKSHI UNSPEC_UNPACKUHI UNSPEC_UNPACKSLO @@ -259,6 +259,15 @@ (define_c_enum "unspecv" [ ] ) +;; These constants are used as a const_int in various SVE unspecs +;; to indicate whether the governing predicate is known to be a PTRUE. 
+(define_constants + [; Indicates that the predicate might not be a PTRUE. + (SVE_MAYBE_NOT_PTRUE 0) + + ; Indicates that the predicate is known to be a PTRUE. + (SVE_KNOWN_PTRUE 1)]) + ;; If further include files are added the defintion of MD_INCLUDES ;; must be updated. Index: gcc/config/aarch64/iterators.md === --- gcc/config/aarch64/iterators.md 2019-08-13 10:38:35.963894971 +0100 +++ gcc/config/aarch64/iterators.md 2019-08-14 08:56:12.502608946 +0100 @@ -1169,6 +1169,10 @@ (define_mode_attr FCMLA_maybe_lane [(V2S (V4HF "[%4]") (V8HF "[%4]") ]) +;; The number of bytes controlled by a predicate +(define_mode_attr data_bytes [(VNx16BI "1") (VNx8BI "2") + (VNx4BI "4") (VNx2BI "8")]) + ;; --- ;; Code Iterators ;;
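
To see where these PTEST patterns fire (a sketch, assuming a toolchain
built from this revision): the loop-control sequence of a fully-masked
SVE loop ends in a flag-setting test of the governing predicate, which is
the *while_ult_cc pattern reworked above.

    /* Compile with gcc -O3 -march=armv8.2-a+sve; the loop latch becomes a
       flag-setting WHILELO feeding a B.ANY/B.NONE branch -- the
       PTEST-style pattern this patch rewrites to UNSPEC_PTEST.  */
    void
    scale (double *x, long n)
    {
      for (long i = 0; i < n; i++)
        x[i] *= 2.0;
    }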
[committed][AArch64] Canonicalise SVE predicate constants
This patch makes sure that we build all SVE predicate constants as VNx16BI before RA, to encourage similar constants to be reused between modes. This is also useful for the ACLE, where the single predicate type svbool_t is always a VNx16BI. Also, and again to encourage reuse, the patch makes us use a .B PTRUE for all ptrue-predicated operations, rather than (for example) using a .S PTRUE for 32-bit operations and a .D PTRUE for 64-bit operations. The only current case in which a .H, .S or .D operation needs to be predicated by a "strict" .H/.S/.D PTRUE is the PTEST in a conditional branch, which an earlier patch fixed to use an appropriate VNx16BI constant. Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf. Applied as r274415. Richard 2019-08-14 Richard Sandiford gcc/ * config/aarch64/aarch64.c (aarch64_target_reg): New function. (aarch64_emit_set_immediate): Likewise. (aarch64_ptrue_reg): Build a VNx16BI constant and then bitcast it. (aarch64_pfalse_reg): Likewise. (aarch64_convert_sve_data_to_pred): New function. (aarch64_sve_move_pred_via_while): Take an optional target register and the required register mode. (aarch64_expand_sve_const_pred_1): New function. (aarch64_expand_sve_const_pred): Likewise. (aarch64_expand_mov_immediate): Build an all-true predicate if the significant bits of the immediate are all true. Use aarch64_expand_sve_const_pred for all compile-time predicate constants. (aarch64_mov_operand_p): Force predicate constants to be VNx16BI before register allocation. * config/aarch64/aarch64-sve.md (*vec_duplicate_reg): Use a VNx16BI PTRUE when splitting the memory alternative. (vec_duplicate): Update accordingly. (*pred_cmp): Rename to... (@aarch64_pred_cmp): ...this. gcc/testsuite/ * gcc.target/aarch64/sve/spill_4.c: Expect all ptrues to be .Bs. * gcc.target/aarch64/sve/single_1.c: Likewise. * gcc.target/aarch64/sve/single_2.c: Likewise. * gcc.target/aarch64/sve/single_3.c: Likewise. * gcc.target/aarch64/sve/single_4.c: Likewise. Index: gcc/config/aarch64/aarch64.c === --- gcc/config/aarch64/aarch64.c2019-08-14 08:58:06.353767448 +0100 +++ gcc/config/aarch64/aarch64.c2019-08-14 09:00:55.960509992 +0100 @@ -2546,6 +2546,36 @@ aarch64_zero_extend_const_eq (machine_mo } +/* Return TARGET if it is nonnull and a register of mode MODE. + Otherwise, return a fresh register of mode MODE if we can, + or TARGET reinterpreted as MODE if we can't. */ + +static rtx +aarch64_target_reg (rtx target, machine_mode mode) +{ + if (target && REG_P (target) && GET_MODE (target) == mode) +return target; + if (!can_create_pseudo_p ()) +{ + gcc_assert (target); + return gen_lowpart (mode, target); +} + return gen_reg_rtx (mode); +} + +/* Return a register that contains the constant in BUILDER, given that + the constant is a legitimate move operand. Use TARGET as the register + if it is nonnull and convenient. */ + +static rtx +aarch64_emit_set_immediate (rtx target, rtx_vector_builder &builder) +{ + rtx src = builder.build (); + target = aarch64_target_reg (target, GET_MODE (src)); + emit_insn (gen_rtx_SET (target, src)); + return target; +} + static rtx aarch64_force_temporary (machine_mode mode, rtx x, rtx value) { @@ -2721,7 +2751,8 @@ aarch64_ptrue_all (unsigned int elt_size aarch64_ptrue_reg (machine_mode mode) { gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL); - return force_reg (mode, CONSTM1_RTX (mode)); + rtx reg = force_reg (VNx16BImode, CONSTM1_RTX (VNx16BImode)); + return gen_lowpart (mode, reg); } /* Return an all-false predicate register of mode MODE. 
*/ @@ -2730,7 +2761,26 @@ aarch64_ptrue_reg (machine_mode mode) aarch64_pfalse_reg (machine_mode mode) { gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL); - return force_reg (mode, CONST0_RTX (mode)); + rtx reg = force_reg (VNx16BImode, CONST0_RTX (VNx16BImode)); + return gen_lowpart (mode, reg); +} + +/* Use a comparison to convert integer vector SRC into MODE, which is + the corresponding SVE predicate mode. Use TARGET for the result + if it's nonnull and convenient. */ + +static rtx +aarch64_convert_sve_data_to_pred (rtx target, machine_mode mode, rtx src) +{ + machine_mode src_mode = GET_MODE (src); + insn_code icode = code_for_aarch64_pred_cmp (NE, src_mode); + expand_operand ops[4]; + create_output_operand (&ops[0], target, mode); + create_input_operand (&ops[1], CONSTM1_RTX (mode), mode); + create_input_operand (&ops[2], src, src_mode); + create_input_operand (&ops[3], CONST0_RTX (src_mode), src_mode); + expand_insn (icode, 4, ops); + return ops[0].value; } /* Return true if we can move VALUE into a register using a s
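
A sketch of the reuse this enables (exact code generation depends on the
vectorizer): both loops below need an all-true governing predicate for
their FP arithmetic; building every predicate constant as VNx16BI and
printing a .B PTRUE lets the two share a single register.

    /* With gcc -O3 -march=armv8.2-a+sve, one "ptrue pN.b, all" can now
       serve both element sizes.  */
    void
    bump (float *a, double *b, int n)
    {
      for (int i = 0; i < n; i++)
        a[i] += 1.0f;   /* .S-sized arithmetic.  */
      for (int i = 0; i < n; i++)
        b[i] += 1.0;    /* .D-sized arithmetic.  */
    }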
[committed][AArch64] Don't rely on REG_EQUAL notes to combine SVE BIC
This patch generalises the SVE BIC pattern so that it doesn't rely on REG_EQUAL notes. The danger with relying on the notes is that an optimisation could for example replace the original (not ...) note with an (unspec ... UNSPEC_MERGE_PTRUE) in which the predicate is a constant. That's a legitimate change and could even be useful in some situations. The patch also makes the operand order match the SVE operand order in both the vector and predicate BIC patterns, which makes things easier for the ACLE. Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf. Applied as r274416. Richard 2019-08-14 Richard Sandiford Kugan Vivekanandarajah gcc/ * config/aarch64/aarch64-sve.md (bic3): Rename to... (*bic3): ...this. Match the form that an SVE inverse actually has, rather than relying on REG_EQUAL notes. Make the insn operand order match the SVE operand order. (*3): Make the insn operand order match the SVE operand order. Index: gcc/config/aarch64/aarch64-sve.md === --- gcc/config/aarch64/aarch64-sve.md 2019-08-14 09:03:20.515438326 +0100 +++ gcc/config/aarch64/aarch64-sve.md 2019-08-14 09:05:41.902390293 +0100 @@ -1779,15 +1779,20 @@ (define_insn "3" ;; - BIC ;; - -;; REG_EQUAL notes on "not3" should ensure that we can generate -;; this pattern even though the NOT instruction itself is predicated. -(define_insn "bic3" +(define_insn_and_rewrite "*bic3" [(set (match_operand:SVE_I 0 "register_operand" "=w") (and:SVE_I - (not:SVE_I (match_operand:SVE_I 1 "register_operand" "w")) - (match_operand:SVE_I 2 "register_operand" "w")))] + (unspec:SVE_I + [(match_operand 3) +(not:SVE_I (match_operand:SVE_I 2 "register_operand" "w"))] + UNSPEC_MERGE_PTRUE) + (match_operand:SVE_I 1 "register_operand" "w")))] "TARGET_SVE" - "bic\t%0.d, %2.d, %1.d" + "bic\t%0.d, %1.d, %2.d" + "&& !CONSTANT_P (operands[3])" + { +operands[3] = CONSTM1_RTX (mode); + } ) ;; - @@ -2451,11 +2456,11 @@ (define_insn "*3" [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa") (and:PRED_ALL (NLOGICAL:PRED_ALL - (not:PRED_ALL (match_operand:PRED_ALL 2 "register_operand" "Upa")) - (match_operand:PRED_ALL 3 "register_operand" "Upa")) + (not:PRED_ALL (match_operand:PRED_ALL 3 "register_operand" "Upa")) + (match_operand:PRED_ALL 2 "register_operand" "Upa")) (match_operand:PRED_ALL 1 "register_operand" "Upa")))] "TARGET_SVE" - "\t%0.b, %1/z, %3.b, %2.b" + "\t%0.b, %1/z, %2.b, %3.b" ) ;; -
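
For reference, a sketch of source code whose vectorized form should now be
matched by the rewritten pattern regardless of whether a REG_EQUAL note
survives:

    /* gcc -O3 -march=armv8.2-a+sve should emit BIC z_dst, z_val, z_mask
       for the AND-with-inverted-operand in this loop.  */
    void
    mask_out (unsigned int *restrict dst, const unsigned int *restrict val,
              const unsigned int *restrict mask, int n)
    {
      for (int i = 0; i < n; i++)
        dst[i] = val[i] & ~mask[i];
    }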
[committed][AArch64] Use unspecs for remaining SVE FP binary ops
Another patch in the series to make the SVE FP patterns use unspecs, so that they can accurately describe cases in which the predicate isn't a PTRUE. Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf. Applied as r274417. Richard 2019-08-14 Richard Sandiford Kugan Vivekanandarajah gcc/ * config/aarch64/aarch64-sve.md (add3, *add3) (sub3, *sub3, *fabd3, mul3, *mul3) (div3, *div3): Use SVE_COND_FP_* unspecs instead of rtx codes. (cond_, *cond__2, *cond__3) (*cond__any): Add the predicate to the SVE_COND_FP_* unspecs. Index: gcc/config/aarch64/aarch64-sve.md === --- gcc/config/aarch64/aarch64-sve.md 2019-08-14 09:08:04.289334990 +0100 +++ gcc/config/aarch64/aarch64-sve.md 2019-08-14 09:10:48.912115057 +0100 @@ -1963,7 +1963,8 @@ (define_expand "cond_" (unspec:SVE_F [(match_operand: 1 "register_operand") (unspec:SVE_F -[(match_operand:SVE_F 2 "register_operand") +[(match_dup 1) + (match_operand:SVE_F 2 "register_operand") (match_operand:SVE_F 3 "register_operand")] SVE_COND_FP_BINARY) (match_operand:SVE_F 4 "aarch64_simd_reg_or_zero")] @@ -1977,7 +1978,8 @@ (define_insn "*cond__2" (unspec:SVE_F [(match_operand: 1 "register_operand" "Upl, Upl") (unspec:SVE_F -[(match_operand:SVE_F 2 "register_operand" "0, w") +[(match_dup 1) + (match_operand:SVE_F 2 "register_operand" "0, w") (match_operand:SVE_F 3 "register_operand" "w, w")] SVE_COND_FP_BINARY) (match_dup 2)] @@ -1995,7 +1997,8 @@ (define_insn "*cond__3" (unspec:SVE_F [(match_operand: 1 "register_operand" "Upl, Upl") (unspec:SVE_F -[(match_operand:SVE_F 2 "register_operand" "w, w") +[(match_dup 1) + (match_operand:SVE_F 2 "register_operand" "w, w") (match_operand:SVE_F 3 "register_operand" "0, w")] SVE_COND_FP_BINARY) (match_dup 3)] @@ -2013,7 +2016,8 @@ (define_insn_and_rewrite "*cond_< (unspec:SVE_F [(match_operand: 1 "register_operand" "Upl, Upl, Upl, Upl, Upl") (unspec:SVE_F -[(match_operand:SVE_F 2 "register_operand" "0, w, w, w, w") +[(match_dup 1) + (match_operand:SVE_F 2 "register_operand" "0, w, w, w, w") (match_operand:SVE_F 3 "register_operand" "w, 0, w, w, w")] SVE_COND_FP_BINARY) (match_operand:SVE_F 4 "aarch64_simd_reg_or_zero" "Dz, Dz, Dz, 0, w")] @@ -2051,10 +2055,9 @@ (define_expand "add3" [(set (match_operand:SVE_F 0 "register_operand") (unspec:SVE_F [(match_dup 3) - (plus:SVE_F -(match_operand:SVE_F 1 "register_operand") -(match_operand:SVE_F 2 "aarch64_sve_float_arith_with_sub_operand"))] - UNSPEC_MERGE_PTRUE))] + (match_operand:SVE_F 1 "register_operand") + (match_operand:SVE_F 2 "aarch64_sve_float_arith_with_sub_operand")] + UNSPEC_COND_FADD))] "TARGET_SVE" { operands[3] = aarch64_ptrue_reg (mode); @@ -2066,10 +2069,9 @@ (define_insn_and_split "*add3" [(set (match_operand:SVE_F 0 "register_operand" "=w, w, w") (unspec:SVE_F [(match_operand: 1 "register_operand" "Upl, Upl, Upl") - (plus:SVE_F - (match_operand:SVE_F 2 "register_operand" "%0, 0, w") - (match_operand:SVE_F 3 "aarch64_sve_float_arith_with_sub_operand" "vsA, vsN, w"))] - UNSPEC_MERGE_PTRUE))] + (match_operand:SVE_F 2 "register_operand" "%0, 0, w") + (match_operand:SVE_F 3 "aarch64_sve_float_arith_with_sub_operand" "vsA, vsN, w")] + UNSPEC_COND_FADD))] "TARGET_SVE" "@ fadd\t%0., %1/m, %0., #%3 @@ -2098,10 +2100,9 @@ (define_expand "sub3" [(set (match_operand:SVE_F 0 "register_operand") (unspec:SVE_F [(match_dup 3) - (minus:SVE_F -(match_operand:SVE_F 1 "aarch64_sve_float_arith_operand") -(match_operand:SVE_F 2 "register_operand"))] - UNSPEC_MERGE_PTRUE))] + (match_operand:SVE_F 1 "aarch64_sve_float_arith_operand") + (match_operand:SVE_F 2 
"register_operand")] + UNSPEC_COND_FSUB))] "TARGET_SVE" { operands[3] = aarch64_ptrue_reg (mode); @@ -2113,10 +2114,9 @@ (define_insn_and_split "*sub3" [(set (match_operand:SVE_F 0 "register_operand" "=w, w, w, w") (unspec:SVE_F [(match_operand: 1 "register_operand" "Upl, Upl, Upl, Upl") - (minus:SVE_F -(match_operand:SVE_F 2 "aarch64_sve_float_arith_operand" "0, 0, vsA, w") -(match_operand:SVE_F 3 "aarch64_sve_float_arith_with_sub_operand" "vsA, vsN, 0, w"))] - UNSPEC_MERGE_PTRUE))] + (match_operand:SVE_F 2 "a
[committed][AArch64] Add a "GP strictness" operand to SVE FP unspecs
This patch makes the SVE unary, binary and ternary FP unspecs take a new "GP strictness" operand that indicates whether the predicate has to be taken literally, or whether it is valid to make extra lanes active (up to and including using a PTRUE). This again is laying the groundwork for the ACLE patterns, in which the value can depend on the FP command-line flags. At the moment it's only needed for addition, subtraction and multiplication, which have unpredicated forms that can only be used when operating on all lanes is safe. But in future it might be useful for optimising predicate usage. The strict mode requires extra alternatives for addition, subtraction and multiplication, but I've left those for the main ACLE patch. Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf. Applied as r274418. Richard 2019-08-14 Richard Sandiford Kugan Vivekanandarajah gcc/ * config/aarch64/aarch64.md (SVE_RELAXED_GP, SVE_STRICT_GP): New constants. * config/aarch64/predicates.md (aarch64_sve_gp_strictness): New predicate. * config/aarch64/aarch64-protos.h (aarch64_sve_pred_dominates_p): Declare. * config/aarch64/aarch64.c (aarch64_sve_pred_dominates_p): New function. * config/aarch64/aarch64-sve.md: Add a block comment about the handling of predicated FP operations. (2, add3) (sub3, mul3, div3) (3) (3) (4): Add an SVE_RELAXED_GP operand. (cond_) (cond_): Add an SVE_STRICT_GP operand. (*2) (*cond__2) (*cond__3) (*cond__any) (*fabd3, *div3) (*3) (*4) (*cond__2) (*cond__4) (*cond__any): Match the strictness operands. Use aarch64_sve_pred_dominates_p to check whether the predicate on the conditional operation is suitable for merging. Split patterns into the canonical equal-predicate form. (*add3, *sub3, *mul3): Likewise. Restrict the unpredicated alternatives to SVE_RELAXED_GP. Index: gcc/config/aarch64/aarch64.md === --- gcc/config/aarch64/aarch64.md 2019-08-14 08:58:06.357767418 +0100 +++ gcc/config/aarch64/aarch64.md 2019-08-14 09:13:55.210734712 +0100 @@ -268,6 +268,18 @@ (define_constants ; Indicates that the predicate is known to be a PTRUE. (SVE_KNOWN_PTRUE 1)]) +;; These constants are used as a const_int in predicated SVE FP arithmetic +;; to indicate whether the operation is allowed to make additional lanes +;; active without worrying about the effect on faulting behavior. +(define_constants + [; Indicates either that all lanes are active or that the instruction may + ; operate on inactive inputs even if doing so could induce a fault. + (SVE_RELAXED_GP 0) + + ; Indicates that some lanes might be inactive and that the instruction + ; must not operate on inactive inputs if doing so could induce a fault. + (SVE_STRICT_GP 1)]) + ;; If further include files are added the defintion of MD_INCLUDES ;; must be updated. 
Index: gcc/config/aarch64/predicates.md === --- gcc/config/aarch64/predicates.md2019-08-14 08:58:06.357767418 +0100 +++ gcc/config/aarch64/predicates.md2019-08-14 09:13:55.210734712 +0100 @@ -689,6 +689,11 @@ (define_predicate "aarch64_sve_ptrue_fla (ior (match_test "INTVAL (op) == SVE_MAYBE_NOT_PTRUE") (match_test "INTVAL (op) == SVE_KNOWN_PTRUE" +(define_predicate "aarch64_sve_gp_strictness" + (and (match_code "const_int") + (ior (match_test "INTVAL (op) == SVE_RELAXED_GP") + (match_test "INTVAL (op) == SVE_STRICT_GP" + (define_predicate "aarch64_gather_scale_operand_w" (and (match_code "const_int") (match_test "INTVAL (op) == 1 || INTVAL (op) == 4"))) Index: gcc/config/aarch64/aarch64-protos.h === --- gcc/config/aarch64/aarch64-protos.h 2019-08-14 08:58:06.349767478 +0100 +++ gcc/config/aarch64/aarch64-protos.h 2019-08-14 09:13:55.206734742 +0100 @@ -554,6 +554,7 @@ rtx aarch64_ptrue_all (unsigned int); void aarch64_expand_mov_immediate (rtx, rtx); rtx aarch64_ptrue_reg (machine_mode); rtx aarch64_pfalse_reg (machine_mode); +bool aarch64_sve_pred_dominates_p (rtx *, rtx); void aarch64_emit_sve_pred_move (rtx, rtx, rtx); void aarch64_expand_sve_mem_move (rtx, rtx, machine_mode); bool aarch64_maybe_expand_sve_subreg_move (rtx, rtx); Index: gcc/config/aarch64/aarch64.c === --- gcc/config/aarch64/aarch64.c2019-08-14 09:03:20.523438266 +0100 +++ gcc/config/aarch64/aarch64.c2019-08-14 09:13:55.210734712 +0100 @@ -2765,6 +2765,24 @@ aarch64_pfalse_reg (machine_mode mode) return gen_lowpart (mode, reg); } +/* Return true i
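
An illustrative sketch of the relaxed/strict distinction (the precise
flag handling lands with the ACLE patches): under trapping math, the
masked addition below must not be evaluated in inactive lanes, since a
lane holding a signalling value could raise a spurious FP exception; that
is the SVE_STRICT_GP case. With -fno-trapping-math it is safe to widen
the predicate, e.g. to an existing PTRUE (SVE_RELAXED_GP).

    /* Whether the compiler may operate on the inactive lanes of this
       masked add depends on the FP flags; the new operand records that.  */
    void
    guarded_add (float *restrict x, const float *restrict y, int n)
    {
      for (int i = 0; i < n; i++)
        if (y[i] != 0.0f)
          x[i] = x[i] + y[i];
    }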
[committed][AArch64] Commonise some SVE FP patterns
This patch uses a single expander for generic FP binary optabs that map to predicated SVE instructions. This makes them consistent with the associated conditional optabs, which already work this way. The patch also generalises the division handling to be one example of a register-only predicated FP operation. The ACLE patches will add FMULX to the same category. Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf. Applied as r274419. Richard 2019-08-14 Richard Sandiford Kugan Vivekanandarajah gcc/ * config/aarch64/iterators.md (SVE_COND_FP_BINARY_REG): New int iterator. (sve_pred_fp_rhs1_operand, sve_pred_fp_rhs1_operand): New int attributes. * config/aarch64/aarch64-sve.md (add3, sub3) (mul3, div3) (3): Merge into... (3): ...this new expander. (*div3): Generalize to... (*3): ...this. Index: gcc/config/aarch64/iterators.md === --- gcc/config/aarch64/iterators.md 2019-08-14 08:58:06.357767418 +0100 +++ gcc/config/aarch64/iterators.md 2019-08-14 09:18:49.360558297 +0100 @@ -1646,6 +1646,8 @@ (define_int_iterator SVE_COND_FP_BINARY UNSPEC_COND_FMUL UNSPEC_COND_FSUB]) +(define_int_iterator SVE_COND_FP_BINARY_REG [UNSPEC_COND_FDIV]) + ;; Floating-point max/min operations that correspond to optabs, ;; as opposed to those that are internal to the port. (define_int_iterator SVE_COND_FP_MAXMIN_PUBLIC [UNSPEC_COND_FMAXNM @@ -2003,3 +2005,23 @@ (define_int_attr sve_fmad_op [(UNSPEC_CO (UNSPEC_COND_FMLS "fmsb") (UNSPEC_COND_FNMLA "fnmad") (UNSPEC_COND_FNMLS "fnmsb")]) + +;; The predicate to use for the first input operand in a floating-point +;; 3 pattern. +(define_int_attr sve_pred_fp_rhs1_operand + [(UNSPEC_COND_FADD "register_operand") + (UNSPEC_COND_FDIV "register_operand") + (UNSPEC_COND_FMAXNM "register_operand") + (UNSPEC_COND_FMINNM "register_operand") + (UNSPEC_COND_FMUL "register_operand") + (UNSPEC_COND_FSUB "aarch64_sve_float_arith_operand")]) + +;; The predicate to use for the second input operand in a floating-point +;; 3 pattern. +(define_int_attr sve_pred_fp_rhs2_operand + [(UNSPEC_COND_FADD "aarch64_sve_float_arith_with_sub_operand") + (UNSPEC_COND_FDIV "register_operand") + (UNSPEC_COND_FMAXNM "register_operand") + (UNSPEC_COND_FMINNM "register_operand") + (UNSPEC_COND_FMUL "aarch64_sve_float_mul_operand") + (UNSPEC_COND_FSUB "register_operand")]) Index: gcc/config/aarch64/aarch64-sve.md === --- gcc/config/aarch64/aarch64-sve.md 2019-08-14 09:15:57.613827991 +0100 +++ gcc/config/aarch64/aarch64-sve.md 2019-08-14 09:18:49.360558297 +0100 @@ -73,7 +73,6 @@ ;; [FP] Subtraction ;; [FP] Absolute difference ;; [FP] Multiplication -;; [FP] Division ;; [FP] Binary logical operations ;; [FP] Sign copying ;; [FP] Maximum and minimum @@ -2037,6 +2036,38 @@ (define_insn "*post_ra_ ;; - FSUBR ;; - +;; Unpredicated floating-point binary operations. +(define_expand "3" + [(set (match_operand:SVE_F 0 "register_operand") + (unspec:SVE_F + [(match_dup 3) + (const_int SVE_RELAXED_GP) + (match_operand:SVE_F 1 "") + (match_operand:SVE_F 2 "")] + SVE_COND_FP_BINARY))] + "TARGET_SVE" + { +operands[3] = aarch64_ptrue_reg (mode); + } +) + +;; Predicated floating-point binary operations that have no immediate forms. 
+(define_insn "*3" + [(set (match_operand:SVE_F 0 "register_operand" "=w, w, ?&w") + (unspec:SVE_F + [(match_operand: 1 "register_operand" "Upl, Upl, Upl") + (match_operand:SI 4 "aarch64_sve_gp_strictness") + (match_operand:SVE_F 2 "register_operand" "0, w, w") + (match_operand:SVE_F 3 "register_operand" "w, 0, w")] + SVE_COND_FP_BINARY_REG))] + "TARGET_SVE" + "@ + \t%0., %1/m, %0., %3. + \t%0., %1/m, %0., %2. + movprfx\t%0, %2\;\t%0., %1/m, %0., %3." + [(set_attr "movprfx" "*,*,yes")] +) + ;; Predicated floating-point operations with merging. (define_expand "cond_" [(set (match_operand:SVE_F 0 "register_operand") @@ -2150,21 +2181,6 @@ (define_insn_and_rewrite "*cond_< ;; - FSUB ;; - -;; Unpredicated floating-point addition. -(define_expand "add3" - [(set (match_operand:SVE_F 0 "register_operand") - (unspec:SVE_F - [(match_dup 3) - (const_int SVE_RELAXED_GP) - (match_operand:SVE_F 1 "register_operand") - (match_operand:SVE_F 2 "aarch64_sve_float_arith_with_sub_operand")] - UNSPEC_COND_FADD))] - "TARGE
Re: [PATCH] Add missing popcount simplifications (PR90693)
On Tue, Aug 13, 2019 at 6:47 PM Andrew Pinski wrote:
>
> On Tue, Aug 13, 2019 at 8:50 AM Wilco Dijkstra wrote:
> >
> > Add simplifications for popcount (x) > 1 to (x & (x-1)) != 0 and
> > popcount (x) == 1 into (x-1) < (x & -x). The patterns only apply to
> > single-use cases and support an optional convert. A microbenchmark
> > shows a speedup of 2-2.5x on both x64 and AArch64.
> >
> > Bootstrap OK, OK for commit?
>
> I think this should be in expand stage where there could be comparison
> of the cost of the RTLs.

I tend to agree here, if only because the "simplified" variants have more
GIMPLE stmts, which means they are not "simpler". In fact I'd argue that
for canonicalization we'd want to have the reverse "simplifications" on
GIMPLE and expansion based on target cost.

Richard.

> The only reason why it is faster for AARCH64 is the requirement of
> moving between the GPRs and the SIMD registers.
>
> Thanks,
> Andrew Pinski
>
> >
> > ChangeLog:
> > 2019-08-13  Wilco Dijkstra
> >
> > gcc/
> > 	PR middle-end/90693
> > 	* match.pd: Add popcount simplifications.
> >
> > testsuite/
> > 	PR middle-end/90693
> > 	* gcc.dg/fold-popcount-5.c: Add new test.
> >
> > ---
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 0317bc704f771f626ab72189b3a54de00087ad5a..bf4351a330f45f3a1424d9792cefc3da6267597d
> > 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -5356,7 +5356,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >    rep (eq eq ne ne)
> >    (simplify
> >     (cmp (popcount @0) integer_zerop)
> > -   (rep @0 { build_zero_cst (TREE_TYPE (@0)); }
> > +   (rep @0 { build_zero_cst (TREE_TYPE (@0)); })))
> > + /* popcount(X) == 1 -> (X-1) < (X & -X). */
> > + (for cmp (eq ne)
> > +  rep (lt ge)
> > +  (simplify
> > +   (cmp (convert? (popcount:s @0)) integer_onep)
> > +   (with {
> > +     tree utype = unsigned_type_for (TREE_TYPE (@0));
> > +     tree a0 = fold_convert (utype, @0); }
> > +    (rep (plus { a0; } { build_minus_one_cst (utype); })
> > +     (bit_and (negate { a0; }) { a0; }
> > + /* popcount(X) > 1 -> (X & (X-1)) != 0. */
> > + (for cmp (gt le)
> > +  rep (ne eq)
> > +  (simplify
> > +   (cmp (convert? (popcount:s @0)) integer_onep)
> > +   (rep (bit_and (plus @0 { build_minus_one_cst (TREE_TYPE (@0)); }) @0)
> > +    { build_zero_cst (TREE_TYPE (@0)); }
> >
> > /* Simplify:
> >
> > diff --git a/gcc/testsuite/gcc.dg/fold-popcount-5.c
> > b/gcc/testsuite/gcc.dg/fold-popcount-5.c
> > new file mode 100644
> > index
> > ..fcf3910587caacb8e39cf437dc3971df892f405a
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/fold-popcount-5.c
> > @@ -0,0 +1,69 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > +
> > +/* Test popcount (x) > 1 -> (x & (x-1)) != 0. */
> > +
> > +int test_1 (long x)
> > +{
> > +  return __builtin_popcountl (x) >= 2;
> > +}
> > +
> > +int test_2 (int x)
> > +{
> > +  return (unsigned) __builtin_popcount (x) <= 1u;
> > +}
> > +
> > +int test_3 (unsigned x)
> > +{
> > +  return __builtin_popcount (x) > 1u;
> > +}
> > +
> > +int test_4 (unsigned long x)
> > +{
> > +  return (unsigned char) __builtin_popcountl (x) > 1;
> > +}
> > +
> > +int test_5 (unsigned long x)
> > +{
> > +  return (signed char) __builtin_popcountl (x) <= (signed char)1;
> > +}
> > +
> > +int test_6 (unsigned long long x)
> > +{
> > +  return 2u <= __builtin_popcountll (x);
> > +}
> > +
> > +/* Test popcount (x) == 1 -> (x-1) < (x & -x). */
> > +
> > +int test_7 (unsigned long x)
> > +{
> > +  return __builtin_popcountl (x) != 1;
> > +}
> > +
> > +int test_8 (long long x)
> > +{
> > +  return (unsigned) __builtin_popcountll (x) == 1u;
> > +}
> > +
> > +int test_9 (int x)
> > +{
> > +  return (unsigned char) __builtin_popcount (x) != 1u;
> > +}
> > +
> > +int test_10 (unsigned x)
> > +{
> > +  return (unsigned char) __builtin_popcount (x) == 1;
> > +}
> > +
> > +int test_11 (long x)
> > +{
> > +  return (signed char) __builtin_popcountl (x) == 1;
> > +}
> > +
> > +int test_12 (long x)
> > +{
> > +  return 1u == __builtin_popcountl (x);
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "popcount" 0 "optimized" } } */
> > +
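
For anyone checking the transforms themselves, a quick brute-force
validation of the two identities under discussion (unsigned arithmetic;
not part of the patch):

    #include <assert.h>

    int
    main (void)
    {
      for (unsigned int x = 0; x < 1000000; x++)
        {
          /* popcount(x) > 1  <->  (x & (x-1)) != 0.  */
          assert ((__builtin_popcount (x) > 1) == ((x & (x - 1)) != 0));
          /* popcount(x) == 1  <->  (x-1) < (x & -x); unsigned wrap-around
             makes x == 0 evaluate to UINT_MAX < 0, i.e. false.  */
          assert ((__builtin_popcount (x) == 1) == ((x - 1) < (x & -x)));
        }
      return 0;
    }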
[committed][AArch64] Add support for SVE HF vconds
We were missing vcond patterns that had HF comparisons and HI or HF data. Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf. Applied as r274420. Richard 2019-08-14 Richard Sandiford gcc/ * config/aarch64/iterators.md (SVE_HSD): New mode iterator. (V_FP_EQUIV, v_fp_equiv): Handle VNx8HI and VNx8HF. * config/aarch64/aarch64-sve.md (vcond): Use SVE_HSD instead of SVE_SD. gcc/testsuite/ * gcc.target/aarch64/sve/vcond_17.c: New test. * gcc.target/aarch64/sve/vcond_17_run.c: Likewise. Index: gcc/config/aarch64/iterators.md === --- gcc/config/aarch64/iterators.md 2019-08-14 09:20:57.547610677 +0100 +++ gcc/config/aarch64/iterators.md 2019-08-14 09:23:37.626427347 +0100 @@ -301,6 +301,9 @@ (define_mode_iterator SVE_HSDI [VNx16QI ;; All SVE floating-point vector modes that have 16-bit or 32-bit elements. (define_mode_iterator SVE_HSF [VNx8HF VNx4SF]) +;; All SVE vector modes that have 16-bit, 32-bit or 64-bit elements. +(define_mode_iterator SVE_HSD [VNx8HI VNx4SI VNx2DI VNx8HF VNx4SF VNx2DF]) + ;; All SVE vector modes that have 32-bit or 64-bit elements. (define_mode_iterator SVE_SD [VNx4SI VNx2DI VNx4SF VNx2DF]) @@ -928,9 +931,11 @@ (define_mode_attr v_int_equiv [(V8QI "v8 ]) ;; Floating-point equivalent of selected modes. -(define_mode_attr V_FP_EQUIV [(VNx4SI "VNx4SF") (VNx4SF "VNx4SF") +(define_mode_attr V_FP_EQUIV [(VNx8HI "VNx8HF") (VNx8HF "VNx8HF") + (VNx4SI "VNx4SF") (VNx4SF "VNx4SF") (VNx2DI "VNx2DF") (VNx2DF "VNx2DF")]) -(define_mode_attr v_fp_equiv [(VNx4SI "vnx4sf") (VNx4SF "vnx4sf") +(define_mode_attr v_fp_equiv [(VNx8HI "vnx8hf") (VNx8HF "vnx8hf") + (VNx4SI "vnx4sf") (VNx4SF "vnx4sf") (VNx2DI "vnx2df") (VNx2DF "vnx2df")]) ;; Mode for vector conditional operations where the comparison has Index: gcc/config/aarch64/aarch64-sve.md === --- gcc/config/aarch64/aarch64-sve.md 2019-08-14 09:20:57.547610677 +0100 +++ gcc/config/aarch64/aarch64-sve.md 2019-08-14 09:23:37.626427347 +0100 @@ -2884,13 +2884,13 @@ (define_expand "vcondu" - [(set (match_operand:SVE_SD 0 "register_operand") - (if_then_else:SVE_SD + [(set (match_operand:SVE_HSD 0 "register_operand") + (if_then_else:SVE_HSD (match_operator 3 "comparison_operator" [(match_operand: 4 "register_operand") (match_operand: 5 "aarch64_simd_reg_or_zero")]) - (match_operand:SVE_SD 1 "register_operand") - (match_operand:SVE_SD 2 "register_operand")))] + (match_operand:SVE_HSD 1 "register_operand") + (match_operand:SVE_HSD 2 "register_operand")))] "TARGET_SVE" { aarch64_expand_sve_vcond (mode, mode, operands); Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_17.c === --- /dev/null 2019-07-30 08:53:31.317691683 +0100 +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_17.c 2019-08-14 09:23:37.626427347 +0100 @@ -0,0 +1,94 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize" } */ + +#include + +#define eq(A, B) ((A) == (B)) +#define ne(A, B) ((A) != (B)) +#define olt(A, B) ((A) < (B)) +#define ole(A, B) ((A) <= (B)) +#define oge(A, B) ((A) >= (B)) +#define ogt(A, B) ((A) > (B)) +#define ordered(A, B) (!__builtin_isunordered (A, B)) +#define unordered(A, B) (__builtin_isunordered (A, B)) +#define ueq(A, B) (!__builtin_islessgreater (A, B)) +#define ult(A, B) (__builtin_isless (A, B)) +#define ule(A, B) (__builtin_islessequal (A, B)) +#define uge(A, B) (__builtin_isgreaterequal (A, B)) +#define ugt(A, B) (__builtin_isgreater (A, B)) +#define nueq(A, B) (__builtin_islessgreater (A, B)) +#define nult(A, B) (!__builtin_isless (A, B)) +#define nule(A, B) (!__builtin_islessequal (A, B)) +#define nuge(A, B) 
(!__builtin_isgreaterequal (A, B)) +#define nugt(A, B) (!__builtin_isgreater (A, B)) + +#define DEF_LOOP(CMP, EXPECT_INVALID) \ + void __attribute__ ((noinline, noclone)) \ + test_##CMP##_var (__fp16 *restrict dest, __fp16 *restrict src, \ + __fp16 fallback, __fp16 *restrict a,\ + __fp16 *restrict b, int count) \ + {\ +for (int i = 0; i < count; ++i)\ + dest[i] = CMP (a[i], b[i]) ? src[i] : fallback; \ + }\ + \ + void __attribute__ ((noinline, noclone)) \ + test_##CMP##_zero (__fp16 *restrict dest, __fp16 *restrict src,
[committed][AArch64] Rework SVE FP comparisons
This patch rewrites the SVE FP comparisons so that they always use unspecs and so that they have an additional operand to indicate whether the predicate is known to be a PTRUE. It's part of a series that rewrites the SVE FP patterns so that they can cope with non-PTRUE predicates. Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf. Applied as r274421. Richard 2019-08-14 Richard Sandiford gcc/ * config/aarch64/iterators.md (UNSPEC_COND_FCMUO): New unspec. (cmp_op): Handle it. (SVE_COND_FP_CMP): Rename to... (SVE_COND_FP_CMP_I0): ...this. (SVE_FP_CMP): Remove. * config/aarch64/aarch64-sve.md (*fcm): Replace with... (*fcm): ...this new pattern, using unspecs to represent the comparison. (*fcmuo): Use UNSPEC_COND_FCMUO. (*fcm_and_combine, *fcmuo_and_combine): Update accordingly. * config/aarch64/aarch64.c (aarch64_emit_sve_ptrue_op): Delete. (aarch64_unspec_cond_code): Move after integer code. Handle UNORDERED. (aarch64_emit_sve_predicated_cond): Replace with... (aarch64_emit_sve_fp_cond): ...this new function. (aarch64_emit_sve_or_conds): Replace with... (aarch64_emit_sve_or_fp_conds): ...this new function. (aarch64_emit_sve_inverted_cond): Replace with... (aarch64_emit_sve_invert_fp_cond): ...this new function. (aarch64_expand_sve_vec_cmp_float): Update accordingly. Index: gcc/config/aarch64/iterators.md === --- gcc/config/aarch64/iterators.md 2019-08-14 09:25:49.689451157 +0100 +++ gcc/config/aarch64/iterators.md 2019-08-14 09:29:14.195939545 +0100 @@ -479,6 +479,7 @@ (define_c_enum "unspec" UNSPEC_COND_FCMLE ; Used in aarch64-sve.md. UNSPEC_COND_FCMLT ; Used in aarch64-sve.md. UNSPEC_COND_FCMNE ; Used in aarch64-sve.md. +UNSPEC_COND_FCMUO ; Used in aarch64-sve.md. UNSPEC_COND_FDIV ; Used in aarch64-sve.md. UNSPEC_COND_FMAXNM ; Used in aarch64-sve.md. UNSPEC_COND_FMINNM ; Used in aarch64-sve.md. @@ -1273,9 +1274,6 @@ (define_code_iterator SVE_UNPRED_FP_BINA ;; SVE integer comparisons. (define_code_iterator SVE_INT_CMP [lt le eq ne ge gt ltu leu geu gtu]) -;; SVE floating-point comparisons. -(define_code_iterator SVE_FP_CMP [lt le eq ne ge gt]) - ;; --- ;; Code Attributes ;; --- @@ -1663,12 +1661,13 @@ (define_int_iterator SVE_COND_FP_TERNARY UNSPEC_COND_FNMLA UNSPEC_COND_FNMLS]) -(define_int_iterator SVE_COND_FP_CMP [UNSPEC_COND_FCMEQ - UNSPEC_COND_FCMGE - UNSPEC_COND_FCMGT - UNSPEC_COND_FCMLE - UNSPEC_COND_FCMLT - UNSPEC_COND_FCMNE]) +;; SVE FP comparisons that accept #0.0. +(define_int_iterator SVE_COND_FP_CMP_I0 [UNSPEC_COND_FCMEQ +UNSPEC_COND_FCMGE +UNSPEC_COND_FCMGT +UNSPEC_COND_FCMLE +UNSPEC_COND_FCMLT +UNSPEC_COND_FCMNE]) (define_int_iterator FCADD [UNSPEC_FCADD90 UNSPEC_FCADD270]) @@ -1955,7 +1954,8 @@ (define_int_attr cmp_op [(UNSPEC_COND_FC (UNSPEC_COND_FCMGT "gt") (UNSPEC_COND_FCMLE "le") (UNSPEC_COND_FCMLT "lt") -(UNSPEC_COND_FCMNE "ne")]) +(UNSPEC_COND_FCMNE "ne") +(UNSPEC_COND_FCMUO "uo")]) (define_int_attr sve_int_op [(UNSPEC_ANDV "andv") (UNSPEC_IORV "orv") Index: gcc/config/aarch64/aarch64-sve.md === --- gcc/config/aarch64/aarch64-sve.md 2019-08-14 09:25:49.685451187 +0100 +++ gcc/config/aarch64/aarch64-sve.md 2019-08-14 09:29:14.191939575 +0100 @@ -3136,15 +3136,15 @@ (define_expand "vec_cmp" } ) -;; Floating-point comparisons predicated with a PTRUE. +;; Predicated floating-point comparisons. 
(define_insn "*fcm" [(set (match_operand: 0 "register_operand" "=Upa, Upa") (unspec: [(match_operand: 1 "register_operand" "Upl, Upl") - (SVE_FP_CMP: -(match_operand:SVE_F 2 "register_operand" "w, w") -(match_operand:SVE_F 3 "aarch64_simd_reg_or_zero" "Dz, w"))] - UNSPEC_MERGE_PTRUE))] + (match_operand:SI 4 "aarch64_sve_ptrue_flag") + (match_operand:SVE_F 2 "register_operand" "w, w") + (match_operand:SVE_F 3 "aarch64_simd_reg_or_zer
[committed][AArch64] Use unspecs for SVE conversions involving floats
This patch changes the SVE FP<->FP and FP<->INT patterns so that they use unspecs rather than rtx codes, continuing the series to make the patterns work with predicates that might not be all-true. Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf. Applied as r274423. Richard 2019-08-14 Richard Sandiford gcc/ * config/aarch64/aarch64.md (UNSPEC_FLOAT_CONVERT): Delete. * config/aarch64/iterators.md (UNSPEC_COND_FCVT, UNSPEC_COND_FCVTZS) (UNSPEC_COND_FCVTZU, UNSPEC_COND_SCVTF, UNSPEC_COND_UCVTF): New unspecs. (optab, su): Handle them. (SVE_COND_FCVTI, SVE_COND_ICVTF): New int iterators. * config/aarch64/aarch64-sve.md (2): Replace with... (2): ...this. (*v16hsf<:SVE_HSDImode>2): Replace with... (*v16hsf2): ...this. (*vnx4sf2): Replace with... (*vnx4sf2): ...this. (*vnx2df2): Replace with... (*vnx2df2): ...this. (vec_pack_fix_trunc_vnx2df): Use SVE_COND_FCVTI instead of FIXUORS. (2): Replace with... (2): ...this. (*vnx8hf2): Replace with... (*vnx8hf2): ...this. (*vnx4sf2): Replace with... (*vnx4sf2): ...this. (aarch64_sve_vnx2df2): Replace with... (aarch64_sve_vnx2df2): ...this. (vec_unpack_float__vnx4si): Pass a GP strictness operand to aarch64_sve_vnx2df2. (vec_pack_trunc_, *trunc2) (aarch64_sve_extend2): Use UNSPEC_COND_FCVT instead of UNSPEC_FLOAT_CONVERT. (vec_unpacks__): Pass a GP strictness operand to aarch64_sve_extend2. Index: gcc/config/aarch64/aarch64.md === --- gcc/config/aarch64/aarch64.md 2019-08-14 09:15:57.617827961 +0100 +++ gcc/config/aarch64/aarch64.md 2019-08-14 09:32:41.846404780 +0100 @@ -226,7 +226,6 @@ (define_c_enum "unspec" [ UNSPEC_UNPACKSLO UNSPEC_UNPACKULO UNSPEC_PACK -UNSPEC_FLOAT_CONVERT UNSPEC_WHILE_LO UNSPEC_LDN UNSPEC_STN Index: gcc/config/aarch64/iterators.md === --- gcc/config/aarch64/iterators.md 2019-08-14 09:29:52.871653684 +0100 +++ gcc/config/aarch64/iterators.md 2019-08-14 09:32:41.846404780 +0100 @@ -480,6 +480,9 @@ (define_c_enum "unspec" UNSPEC_COND_FCMLT ; Used in aarch64-sve.md. UNSPEC_COND_FCMNE ; Used in aarch64-sve.md. UNSPEC_COND_FCMUO ; Used in aarch64-sve.md. +UNSPEC_COND_FCVT ; Used in aarch64-sve.md. +UNSPEC_COND_FCVTZS ; Used in aarch64-sve.md. +UNSPEC_COND_FCVTZU ; Used in aarch64-sve.md. UNSPEC_COND_FDIV ; Used in aarch64-sve.md. UNSPEC_COND_FMAXNM ; Used in aarch64-sve.md. UNSPEC_COND_FMINNM ; Used in aarch64-sve.md. @@ -498,6 +501,8 @@ (define_c_enum "unspec" UNSPEC_COND_FRINTZ ; Used in aarch64-sve.md. UNSPEC_COND_FSQRT ; Used in aarch64-sve.md. UNSPEC_COND_FSUB ; Used in aarch64-sve.md. +UNSPEC_COND_SCVTF ; Used in aarch64-sve.md. +UNSPEC_COND_UCVTF ; Used in aarch64-sve.md. UNSPEC_LASTB ; Used in aarch64-sve.md. UNSPEC_FCADD90 ; Used in aarch64-simd.md. UNSPEC_FCADD270; Used in aarch64-simd.md. 
@@ -1642,6 +1647,9 @@ (define_int_iterator SVE_COND_FP_UNARY [ UNSPEC_COND_FRINTZ UNSPEC_COND_FSQRT]) +(define_int_iterator SVE_COND_FCVTI [UNSPEC_COND_FCVTZS UNSPEC_COND_FCVTZU]) +(define_int_iterator SVE_COND_ICVTF [UNSPEC_COND_SCVTF UNSPEC_COND_UCVTF]) + (define_int_iterator SVE_COND_FP_BINARY [UNSPEC_COND_FADD UNSPEC_COND_FDIV UNSPEC_COND_FMAXNM @@ -1715,6 +1723,9 @@ (define_int_attr optab [(UNSPEC_ANDF "an (UNSPEC_FMINV "smin_nan") (UNSPEC_COND_FABS "abs") (UNSPEC_COND_FADD "add") + (UNSPEC_COND_FCVT "fcvt") + (UNSPEC_COND_FCVTZS "fix_trunc") + (UNSPEC_COND_FCVTZU "fixuns_trunc") (UNSPEC_COND_FDIV "div") (UNSPEC_COND_FMAXNM "smax") (UNSPEC_COND_FMINNM "smin") @@ -1732,7 +1743,9 @@ (define_int_attr optab [(UNSPEC_ANDF "an (UNSPEC_COND_FRINTX "rint") (UNSPEC_COND_FRINTZ "btrunc") (UNSPEC_COND_FSQRT "sqrt") - (UNSPEC_COND_FSUB "sub")]) + (UNSPEC_COND_FSUB "sub") + (UNSPEC_COND_SCVTF "float") + (UNSPEC_COND_UCVTF "floatuns")]) (define_int_attr maxmin_uns [(UNSPEC_UMAXV "umax") (UNSPEC_UMINV "umin") @@ -1773,7 +1786,11 @@ (define_int_attr su [(UNSPEC_UNPACKSHI " (UNSPEC_UNPACKSLO "s") (UNSPEC_UNPA
[committed][AArch64] Rearrange SVE conversion patterns
The SVE int<->float conversion patterns need to handle various combinations of modes, making sure that the predicate mode is based on the widest element size. We did this using separate patterns for conversions involving: - HF (converting to/from [HSD]I, predicated based on the int operand) - SF (converting to/from [SD]I, predicated based on the int operand) - DF (converting to/from [SD]I, predicated based on the float operand) This worked, and meant that there were no redundant patterns. However, the ACLE needs various new predicated patterns too, and having three versions of each one seemed excessive. This patch instead splits the patterns into two groups rather than three. For conversions to integers: - truncating (predicated based on the source type, DF->SI only) - non-truncating (predicated based on the destination type) For conversions from integers: - extending (predicated based on the destination type, SI->DF only) - non-extending (predicated based on the source type) This means that we still don't create pattern names for the invalid combinations DF<->HI and SF<->HI. The downside is that we need to use C conditions to exclude the SI<->DF case from the non-truncating/ non-extending patterns. We therefore have two pattern names for SI<->DF, but genconditions ensures that the invalid one always has the value CODE_FOR_nothing. Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf. Applied as r274424. Richard 2019-08-14 Richard Sandiford gcc/ * config/aarch64/iterators.md (VNx4SI_ONLY, VNx2DF_ONLY): New mode iterators. (SVE_BHSI, SVE_SDI): Tweak comment. (SVE_HSDI): Likewise. Fix definition. (SVE_SDF): New mode iterator. (elem_bits): New mode attribute. (SVE_COND_FCVT): New int iterator. * config/aarch64/aarch64-sve.md (*v16hsf2) (*vnx4sf2) (*vnx2df2): Merge into... (*aarch64_sve__nontrunc) (*aarch64_sve__trunc): ...these new patterns. (*vnx8hf2) (*vnx4sf2) (aarch64_sve_vnx2df2): Merge into... (*aarch64_sve__nonextend) (aarch64_sve__extend): ...these new patterns. (vec_unpack_float__vnx4si): Update accordingly. (*trunc2): Replace with... (*aarch64_sve__trunc): ...this new pattern. (aarch64_sve_extend2): Replace with... (aarch64_sve__nontrunc): ...this new pattern. (vec_unpacks__): Update accordingly. Index: gcc/config/aarch64/iterators.md === --- gcc/config/aarch64/iterators.md 2019-08-14 09:34:05.509786440 +0100 +++ gcc/config/aarch64/iterators.md 2019-08-14 09:38:47.027705882 +0100 @@ -278,6 +278,10 @@ (define_mode_iterator VMUL_CHANGE_NLANES (define_mode_iterator SVE_ALL [VNx16QI VNx8HI VNx4SI VNx2DI VNx8HF VNx4SF VNx2DF]) +;; Iterators for single modes, for "@" patterns. +(define_mode_iterator VNx4SI_ONLY [VNx4SI]) +(define_mode_iterator VNx2DF_ONLY [VNx2DF]) + ;; All SVE vector structure modes. (define_mode_iterator SVE_STRUCT [VNx32QI VNx16HI VNx8SI VNx4DI VNx16HF VNx8SF VNx4DF @@ -292,15 +296,21 @@ (define_mode_iterator SVE_BH [VNx16QI VN ;; All SVE vector modes that have 8-bit, 16-bit or 32-bit elements. (define_mode_iterator SVE_BHS [VNx16QI VNx8HI VNx4SI VNx8HF VNx4SF]) -;; All SVE integer vector modes that have 8-bit, 16-bit or 32-bit elements. +;; SVE integer vector modes that have 8-bit, 16-bit or 32-bit elements. (define_mode_iterator SVE_BHSI [VNx16QI VNx8HI VNx4SI]) -;; All SVE integer vector modes that have 16-bit, 32-bit or 64-bit elements. -(define_mode_iterator SVE_HSDI [VNx16QI VNx8HI VNx4SI]) +;; SVE integer vector modes that have 16-bit, 32-bit or 64-bit elements. 
+(define_mode_iterator SVE_HSDI [VNx8HI VNx4SI VNx2DI]) -;; All SVE floating-point vector modes that have 16-bit or 32-bit elements. +;; SVE floating-point vector modes that have 16-bit or 32-bit elements. (define_mode_iterator SVE_HSF [VNx8HF VNx4SF]) +;; SVE integer vector modes that have 32-bit or 64-bit elements. +(define_mode_iterator SVE_SDI [VNx4SI VNx2DI]) + +;; SVE floating-point vector modes that have 32-bit or 64-bit elements. +(define_mode_iterator SVE_SDF [VNx4SF VNx2DF]) + ;; All SVE vector modes that have 16-bit, 32-bit or 64-bit elements. (define_mode_iterator SVE_HSD [VNx8HI VNx4SI VNx2DI VNx8HF VNx4SF VNx2DF]) @@ -313,9 +323,6 @@ (define_mode_iterator SVE_S [VNx4SI VNx4 ;; All SVE vector modes that have 64-bit elements. (define_mode_iterator SVE_D [VNx2DI VNx2DF]) -;; All SVE integer vector modes that have 32-bit or 64-bit elements. -(define_mode_iterator SVE_SDI [VNx4SI VNx2DI]) - ;; All SVE integer vector modes. (define_mode_iterator SVE_I [VNx16QI VNx8HI VNx4SI VNx2DI]) @@ -629,6 +636,11 @@ (define_mode_attr sizen [(QI "8") (HI "1 (define_mode_attr sizem1 [(QI "#7")
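
A sketch of the two groups in source form (assuming this revision):
converting double to int shrinks the element size, so it is the
"truncating" case predicated on the double side; converting int to float
of the same or smaller size falls in the "non-extending" group predicated
on the integer side.

    /* gcc -O3 -march=armv8.2-a+sve: FCVTZS for the truncating DF->SI
       conversion, SCVTF for the non-extending SI->SF conversion.  */
    void
    convert (int *restrict i, const double *restrict d,
             float *restrict f, const int *restrict j, int n)
    {
      for (int k = 0; k < n; k++)
        {
          i[k] = (int) d[k];
          f[k] = (float) j[k];
        }
    }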
Re: [PATCH] Automatics in equivalence statements
I now have commit access.

gcc/fortran

	Jeff Law
	Mark Eggleston

	* gfortran.h: Add gfc_check_conflict declaration.
	* symbol.c (check_conflict): Rename to gfc_check_conflict and
	remove static.
	* symbol.c (gfc_check_conflict): Remove automatic in equivalence
	conflict check.
	* symbol.c (save_symbol): Add check for in equivalence to stop the
	save attribute being added.
	* trans-common.c (build_equiv_decl): Add is_auto parameter and add
	!is_auto to condition where TREE_STATIC (decl) is set.
	* trans-common.c (build_equiv_decl): Add local variable is_auto,
	set it true if an automatic attribute is encountered in the
	variable list.  Call build_equiv_decl with is_auto as an
	additional parameter.
	* trans-common.c (accumulate_equivalence_attributes): New
	subroutine.
	* trans-common.c (find_equivalence): New local variable
	dummy_symbol, accumulate equivalence attributes from each symbol,
	then check for conflicts.

gcc/testsuite

	Mark Eggleston

	* gfortran.dg/auto_in_equiv_1.f90: New test.
	* gfortran.dg/auto_in_equiv_2.f90: New test.
	* gfortran.dg/auto_in_equiv_3.f90: New test.

OK to commit? How do I know that I have approval to commit?

On 23/07/2019 03:50, Jeff Law wrote:
> On 7/22/19 8:36 PM, Steve Kargl wrote:
>> On Mon, Jul 22, 2019 at 08:07:12PM -0600, Jeff Law wrote:
>>> On 7/22/19 7:38 PM, Steve Kargl wrote:
>>>> Someone needs to get commit access.
>>> I've sent Mark the link for authenticated access. So is he clear to
>>> commit once that's set up?
>> Yes, IMHO. He's sent a number of quality patches, and from what I
>> gathered you've worked with him so he has a good mentor.
> Perfect. Thanks.
>> Unfortunately, gfortran has too few contributors at the moment.
> Y'all aren't alone...
> Jeff

-- 
https://www.codethink.co.uk/privacy.html

From 8487aa2c195261f62489f94c2e2d16d81f945362 Mon Sep 17 00:00:00 2001
From: Mark Eggleston
Date: Tue, 11 Sep 2018 12:50:11 +0100
Subject: [PATCH 2/3] Allow automatics in equivalence

If a variable with an automatic attribute appears in an equivalence
statement, the storage should be allocated on the stack.

Note: most of this patch was provided by Jeff Law.
--- gcc/fortran/gfortran.h| 1 + gcc/fortran/symbol.c | 102 +- gcc/fortran/trans-common.c| 73 -- gcc/testsuite/gfortran.dg/auto_in_equiv_1.f90 | 36 + gcc/testsuite/gfortran.dg/auto_in_equiv_2.f90 | 38 ++ gcc/testsuite/gfortran.dg/auto_in_equiv_3.f90 | 63 6 files changed, 257 insertions(+), 56 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/auto_in_equiv_1.f90 create mode 100644 gcc/testsuite/gfortran.dg/auto_in_equiv_2.f90 create mode 100644 gcc/testsuite/gfortran.dg/auto_in_equiv_3.f90 diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h index 75e5b2f0644..49bcacc9a54 100644 --- a/gcc/fortran/gfortran.h +++ b/gcc/fortran/gfortran.h @@ -3007,6 +3007,7 @@ bool gfc_merge_new_implicit (gfc_typespec *); void gfc_set_implicit_none (bool, bool, locus *); void gfc_check_function_type (gfc_namespace *); bool gfc_is_intrinsic_typename (const char *); +bool gfc_check_conflict (symbol_attribute *, const char *, locus *); gfc_typespec *gfc_get_default_type (const char *, gfc_namespace *); bool gfc_set_default_type (gfc_symbol *, int, gfc_namespace *); diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c index 2b8f86e0881..cc5b5efa3a8 100644 --- a/gcc/fortran/symbol.c +++ b/gcc/fortran/symbol.c @@ -407,8 +407,8 @@ gfc_check_function_type (gfc_namespace *ns) goto conflict_std;\ } -static bool -check_conflict (symbol_attribute *attr, const char *name, locus *where) +bool +gfc_check_conflict (symbol_attribute *attr, const char *name, locus *where) { static const char *dummy = "DUMMY", *save = "SAVE", *pointer = "POINTER", *target = "TARGET", *external = "EXTERNAL", *intent = "INTENT", @@ -544,7 +544,6 @@ check_conflict (symbol_attribute *attr, const char *name, locus *where) conf (allocatable, elemental); conf (in_common, automatic); - conf (in_equivalence, automatic); conf (result, automatic); conf (use_assoc, automatic); conf (dummy, automatic); @@ -1004,7 +1003,7 @@ gfc_add_attribute (symbol_attribute *attr, locus *where) if (check_used (attr, NULL, where)) return false; - return check_conflict (attr, NULL, where); + return gfc_check_conflict (attr, NULL, where); } @@ -1030,7 +1029,7 @@ gfc_add_allocatable (symbol_attribute *attr, locus *where) } attr->allocatable = 1; - return check_conflict (attr, NULL, where); + return gfc_check_conflict (attr, NULL, where); } @@ -1045,7 +1044,7 @@ gfc_add_automatic (symbol_attribute *attr, const char *name, locus *where) return false; attr->automatic = 1; - retur
[committed][AArch64] Use "x" predication for SVE integer arithmetic patterns
The SVE patterns used an UNSPEC_MERGE_PTRUE unspec to attach a predicate to an otherwise unpredicated integer arithmetic operation. As its name suggests, this was designed to be a wrapper used for merging instructions in which the predicate is known to be a PTRUE. This unspec dates from the very early days of the port and nothing has ever taken advantage of the PTRUE guarantee for arithmetic (as opposed to comparisons). This patch replaces it with the less stringent guarantee that: (a) the values of inactive lanes don't matter and (b) it is valid to make extra lanes active if there's a specific benefit Doing this makes the patterns suitable for the ACLE _x functions, which have the above semantics. See the block comment in the patch for more details. Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf. Applied as r274425. Richard 2019-08-14 Richard Sandiford gcc/ * config/aarch64/aarch64.md (UNSPEC_PRED_X): New unspec. * config/aarch64/aarch64-sve.md: Add a section describing it. (@aarch64_pred_mov, @aarch64_pred_mov) (2, *2) (aarch64_abd_3, mul3, *mul3) (mul3_highpart, *mul3_highpart) (3, *3) (*bic3, v3, *v3) (3, *3, *madd) (*msub3, *aarch64_sve_rev64) (*aarch64_sve_rev32, *aarch64_sve_rev16vnx16qi): Use UNSPEC_PRED_X instead of UNSPEC_MERGE_PTRUE. * config/aarch64/aarch64-sve2.md (avg3_floor) (avg3_ceil, *h): Likewise. * config/aarch64/aarch64.c (aarch64_split_sve_subreg_move) (aarch64_evpc_rev_local): Update accordingly. Index: gcc/config/aarch64/aarch64.md === --- gcc/config/aarch64/aarch64.md 2019-08-14 09:34:05.509786440 +0100 +++ gcc/config/aarch64/aarch64.md 2019-08-14 09:43:23.977659217 +0100 @@ -220,6 +220,7 @@ (define_c_enum "unspec" [ UNSPEC_LD1_GATHER UNSPEC_ST1_SCATTER UNSPEC_MERGE_PTRUE +UNSPEC_PRED_X UNSPEC_PTEST UNSPEC_UNPACKSHI UNSPEC_UNPACKUHI Index: gcc/config/aarch64/aarch64-sve.md === --- gcc/config/aarch64/aarch64-sve.md 2019-08-14 09:39:44.323282457 +0100 +++ gcc/config/aarch64/aarch64-sve.md 2019-08-14 09:43:23.973659247 +0100 @@ -24,6 +24,7 @@ ;; == General notes ;; Note on the handling of big-endian SVE ;; Description of UNSPEC_PTEST +;; Note on predicated integer arithemtic and UNSPEC_PRED_X ;; Note on predicated FP arithmetic patterns and GP "strictness" ;; ;; == Moves @@ -230,6 +231,63 @@ ;; - OP is the predicate we want to test, of the same mode as CAST_GP. ;; ;; - +;; Note on predicated integer arithemtic and UNSPEC_PRED_X +;; - +;; +;; Many SVE integer operations are predicated. We can generate them +;; from four sources: +;; +;; (1) Using normal unpredicated optabs. In this case we need to create +;; an all-true predicate register to act as the governing predicate +;; for the SVE instruction. There are no inactive lanes, and thus +;; the values of inactive lanes don't matter. +;; +;; (2) Using _x ACLE functions. In this case the function provides a +;; specific predicate and some lanes might be inactive. However, +;; as for (1), the values of the inactive lanes don't matter. +;; We can make extra lanes active without changing the behavior +;; (although for code-quality reasons we should avoid doing so +;; needlessly). +;; +;; (3) Using cond_* optabs that correspond to IFN_COND_* internal functions. +;; These optabs have a predicate operand that specifies which lanes are +;; active and another operand that provides the values of inactive lanes. +;; +;; (4) Using _m and _z ACLE functions. 
These functions map to the same +;; patterns as (3), with the _z functions setting inactive lanes to zero +;; and the _m functions setting the inactive lanes to one of the function +;; arguments. +;; +;; For (1) and (2) we need a way of attaching the predicate to a normal +;; unpredicated integer operation. We do this using: +;; +;; (unspec:M [pred (code:M (op0 op1 ...))] UNSPEC_PRED_X) +;; +;; where (code:M (op0 op1 ...)) is the normal integer operation and PRED +;; is a predicate of mode . PRED might or might not be a PTRUE; +;; it always is for (1), but might not be for (2). +;; +;; The unspec as a whole has the same value as (code:M ...) when PRED is +;; all-true. It is always semantically valid to replace PRED with a PTRUE, +;; but as noted above, we should only do so if there's a specific benefit. +;; +;; (The "_X" in the unspec is named after the ACLE functions in (2).) +;; +;; For (3) and (4) we can simply use the SVE port's normal representation +;; of a predicate-based select: +;; +;; (unspec:M [pred (code:M (
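For concreteness, the four sources above correspond to ACLE-level code along the following lines. This is a rough sketch, not part of the patch; it assumes the standard arm_sve.h intrinsic names svadd_s32_x/_z/_m and an SVE-enabled compiler:

#include <arm_sve.h>

/* _x: inactive lanes of the result are "don't care", so the compiler
   may widen PG to a PTRUE if that helps (source (2) above; source (1)
   behaves the same way with PG already a PTRUE).  */
svint32_t add_x (svbool_t pg, svint32_t a, svint32_t b)
{
  return svadd_s32_x (pg, a, b);
}

/* _z: inactive lanes of the result must be zero (source (4) above).  */
svint32_t add_z (svbool_t pg, svint32_t a, svint32_t b)
{
  return svadd_s32_z (pg, a, b);
}

/* _m: inactive lanes take their value from the first data argument
   (also source (4) above).  */
svint32_t add_m (svbool_t pg, svint32_t a, svint32_t b)
{
  return svadd_s32_m (pg, a, b);
}

Only the _x form has the freedom that UNSPEC_PRED_X encodes; the _z and _m forms pin down the inactive lanes and therefore map to the UNSPEC_SEL-based representation instead.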
Re: [PATCH 2/3] Add simplify rules for wrapped binary operations.
On Tue, Aug 13, 2019 at 10:36 AM Robin Dapp wrote: > > We would like to simplify code like > (larger_type)(var + const1) + const2 > to > (larger_type)(var + combined_const1_const2) > when we know that no overflow happens. Throwing in my own comments... > --- > gcc/match.pd | 101 +++ > 1 file changed, 101 insertions(+) > > diff --git a/gcc/match.pd b/gcc/match.pd > index 0317bc704f7..94400529ad8 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -2020,6 +2020,107 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > (if (cst && !TREE_OVERFLOW (cst)) > (plus { cst; } @0 > > +/* ((T)(A + CST1)) + CST2 -> (T)(A) + CST */ > +#if GIMPLE > + (simplify > +(plus (convert (plus @0 INTEGER_CST@1)) INTEGER_CST@2) > + (if (INTEGRAL_TYPE_P (type) > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))) > + /* We actually want to perform two simplifications here: > + (1) (T)(A + CST1) + CST2 --> (T)(A) + (T)(CST1) > + If for (A + CST1) we either do not care about overflow (e.g. > + for a signed inner type) or the overflow is ok for an unsigned > + inner type. > + (2) (T)(A) + (T)(CST1) + CST2 --> (T)(A) + (T)(CST1 + CST2) But the original is already (T)(A) + CST1-in-T + CST2-in-T and thus we can always do 2) by means of already existing patterns. So it's only 1) you need to implement?! And 1) is really (T)(A + CST) -> (T)A + CST-in-T, no? > + If (CST1 + CST2) does not overflow and we do care about overflow > + (for a signed outer type) or we do not care about overflow in an > + unsigned outer type. */ > + (with > + { > + tree inner_type = TREE_TYPE (@0); > + wide_int wmin0, wmax0; > + wide_int cst1 = wi::to_wide (@1); > + > +wi::overflow_type min_ovf = wi::OVF_OVERFLOW, > + max_ovf = wi::OVF_OVERFLOW; > + > +/* Get overflow behavior. */ > + bool ovf_undef_inner = TYPE_OVERFLOW_UNDEFINED (inner_type); > + bool ovf_undef_outer = TYPE_OVERFLOW_UNDEFINED (type); > + > +/* Get value range of A. */ > + enum value_range_kind vr0 = get_range_info (@0, &wmin0, &wmax0); > + > +/* If we have a proper range, determine min and max overflow > + of (A + CST1). > + ??? We might also want to handle anti ranges. */ > +if (vr0 == VR_RANGE) > + { > +wi::add (wmin0, cst1, TYPE_SIGN (inner_type), &min_ovf); > +wi::add (wmax0, cst1, TYPE_SIGN (inner_type), &max_ovf); > + } > + > +/* Inner overflow does not matter in this case. */ > +if (ovf_undef_inner) > + { > +min_ovf = wi::OVF_NONE; > +max_ovf = wi::OVF_NONE; > + } > + > +/* Extend CST from INNER_TYPE to TYPE. */ > +cst1 = cst1.from (cst1, TYPE_PRECISION (type), TYPE_SIGN > (inner_type)); > + > +/* Check for overflow of (TYPE)(CST1 + CST2). */ > +wi::overflow_type outer_ovf = wi::OVF_OVERFLOW; > +wide_int cst = wi::add (cst1, wi::to_wide (@2), TYPE_SIGN (type), > +&outer_ovf); > + > +/* We *do* care about an overflow here as we do not want to introduce > + new undefined behavior that was not there before. */ > +if (ovf_undef_outer && outer_ovf) > + { > +/* Set these here to prevent the final conversion below > + to take place instead of introducing a new guard variable. */ > +min_ovf = wi::OVF_OVERFLOW; > +max_ovf = wi::OVF_OVERFLOW; > + } > + } > + (if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE) > +(plus (convert @0) { wide_int_to_tree (type, cst); } > + ) > +#endif > + > +/* ((T)(A)) + CST -> (T)(A + CST) */ But then this is the reverse... (as Marc already noticed). So - what are you really after?
(sorry if I don't remember, testcase(s) are missing from this patch) To me it seems that 1) loses information if A + CST was done in a signed type and we know that overflow doesn't happen because of that. For the reverse transformation we don't. Btw, if you make A == A' + CST' then you get (T)A + CST -> (T)(A' + CST' + CST) which is again trivially handled so why do you need both transforms again? > +#if GIMPLE > + (simplify > + (plus (convert SSA_NAME@0) INTEGER_CST@1) > +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) > + && INTEGRAL_TYPE_P (type) > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)) > + && int_fits_type_p (@1, TREE_TYPE (@0))) > + /* Perform binary operation inside the cast if the constant fits > +and (A + CST)'s range does not overflow. */ > + (with > + { > + wi::overflow_type min_ovf = wi::OVF_OVERFLOW, > + max_ovf = wi::OVF_OVERFLOW; > +tree inner_type = TREE_TYPE (@0); > +
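For readers following along, the source-level shape under discussion boils down to something like the following. This is an illustration of the intended transform, not a testcase from the patch:

/* (T)(A + CST1) + CST2 -> (T)A + (CST1 + CST2), provided the known
   range of a rules out overflow of the inner addition.  */
long
f (int a)
{
  if (a > 100)
    return 0;
  /* Here a <= 100, so a + 1 cannot overflow, and this is ideally
     folded to (long) a + 3.  */
  return (long) (a + 1) + 2;
}

The review question is which direction of the rewrite actually carries new information, since the widening direction can be composed from existing patterns once (T)(A + CST) -> (T)A + CST-in-T is available.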
Re: [PATCH 0/3] Libsanitizer: merge from trunk
On 8/13/19 5:02 PM, Jeff Law wrote: > On 8/13/19 7:07 AM, Martin Liska wrote: >> Hi. >> >> For this year, I decided to make a first merge now and the >> next (much smaller) at the end of October. >> >> The biggest change is rename of many files from .cc to .cpp. >> >> I bootstrapped the patch set on x86_64-linux-gnu and run >> asan/ubsan/tsan tests on x86_64, ppc64le (power8) and >> aarch64. >> >> Libasan SONAME has been already bumped compared to GCC 9. >> >> For other libraries, I don't see a reason for library bumping: >> >> $ abidiff /usr/lib64/libubsan.so.1.0.0 >> ./x86_64-pc-linux-gnu/libsanitizer/ubsan/.libs/libubsan.so.1.0.0 --stat >> Functions changes summary: 0 Removed, 0 Changed, 4 Added functions >> Variables changes summary: 0 Removed, 0 Changed, 0 Added variable >> Function symbols changes summary: 3 Removed, 0 Added function symbols not >> referenced by debug info >> Variable symbols changes summary: 0 Removed, 0 Added variable symbol not >> referenced by debug info >> >> $ abidiff /usr/lib64/libtsan.so.0.0.0 >> ./x86_64-pc-linux-gnu/libsanitizer/tsan/.libs/libtsan.so.0.0.0 --stat >> Functions changes summary: 0 Removed, 0 Changed, 47 Added functions >> Variables changes summary: 0 Removed, 0 Changed, 0 Added variable >> Function symbols changes summary: 1 Removed, 2 Added function symbols not >> referenced by debug info >> Variable symbols changes summary: 0 Removed, 0 Added variable symbol not >> referenced by debug info >> >> Ready to be installed? > ISTM that a sanitizer merge during stage1 should be able to move forward > without ACKs. Similarly for other runtimes where we pull from some > upstream master. Good then. I've just installed the patch and also the refresh of LOCAL_PATCHES. > > I'd be slightly concerned about the function removals, but I don't think > we've really tried to be ABI stable for the sanitizer runtimes. These are fine based on the function names. Martin > > jeff > From 090353a2c70b2cf18add7520e34366e10b7f54f7 Mon Sep 17 00:00:00 2001 From: Martin Liska Date: Wed, 14 Aug 2019 10:48:38 +0200 Subject: [PATCH] Refresh LOCAL_PATCHES libsanitizer/ChangeLog: 2019-08-14 Martin Liska * LOCAL_PATCHES: Refresh based on what was committed. --- libsanitizer/LOCAL_PATCHES | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/libsanitizer/LOCAL_PATCHES b/libsanitizer/LOCAL_PATCHES index f653712fdda..121df67826b 100644 --- a/libsanitizer/LOCAL_PATCHES +++ b/libsanitizer/LOCAL_PATCHES @@ -1,6 +1 @@ -r258525 -r265667 -r265668 -r265669 -r265950 -r270208 +r274427 -- 2.22.0
[committed][AArch64] Rework SVE integer comparisons
The remaining uses of UNSPEC_MERGE_PTRUE were in integer comparison patterns. These aren't actually merging operations but zeroing ones, although there's no practical difference when the predicate is a PTRUE. All comparisons produced by expand are predicated on a PTRUE, although we try to pattern-match a compare-and-AND as a predicated comparison during combine. Like previous patches, this one rearranges things in a way that works better with the ACLE, where the initial predicate might or might not be a PTRUE. The new patterns use UNSPEC_PRED_Z to represent zeroing predication, with an aarch64_sve_ptrue_flag to record whether the predicate is all-true (as for UNSPEC_PTEST). See the block comment in the patch for more details. Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf. Applied as r274429. Richard 2019-08-14 Richard Sandiford gcc/ * config/aarch64/aarch64-protos.h (aarch64_sve_same_pred_for_ptest_p): Declare. * config/aarch64/aarch64.c (aarch64_sve_same_pred_for_ptest_p) (aarch64_sve_emit_int_cmp): New functions. (aarch64_convert_sve_data_to_pred): Use aarch64_sve_emit_int_cmp. (aarch64_sve_cmp_operand_p, aarch64_emit_sve_ptrue_op_cc): Delete. (aarch64_expand_sve_vec_cmp_int): Use aarch64_sve_emit_int_cmp. * config/aarch64/aarch64.md (UNSPEC_MERGE_PTRUE): Delete. (UNSPEC_PRED_Z): New unspec. (set_clobber_cc_nzc): Delete. * config/aarch64/aarch64-sve.md: Add a block comment about UNSPEC_PRED_Z. (*cmp): Rename to... (@aarch64_pred_cmp): ...this, replacing the old pattern with that name. Use UNSPEC_PRED_Z instead of UNSPEC_MERGE_PTRUE. (*cmp_cc): Use UNSPEC_PRED_Z instead of UNSPEC_MERGE_PTRUE. Use aarch64_sve_same_pred_for_ptest_p to check for compatible predicates. (*cmp_ptest): Likewise. (*cmp_and): Match a known-ptrue UNSPEC_PRED_Z instead of UNSPEC_MERGE_PTRUE. Split into the new form of predicated comparisons above. Index: gcc/config/aarch64/aarch64-protos.h === --- gcc/config/aarch64/aarch64-protos.h 2019-08-14 09:15:57.609828019 +0100 +++ gcc/config/aarch64/aarch64-protos.h 2019-08-14 09:47:31.355831192 +0100 @@ -555,6 +555,7 @@ void aarch64_expand_mov_immediate (rtx, rtx aarch64_ptrue_reg (machine_mode); rtx aarch64_pfalse_reg (machine_mode); bool aarch64_sve_pred_dominates_p (rtx *, rtx); +bool aarch64_sve_same_pred_for_ptest_p (rtx *, rtx *); void aarch64_emit_sve_pred_move (rtx, rtx, rtx); void aarch64_expand_sve_mem_move (rtx, rtx, machine_mode); bool aarch64_maybe_expand_sve_subreg_move (rtx, rtx); Index: gcc/config/aarch64/aarch64.c === --- gcc/config/aarch64/aarch64.c2019-08-14 09:45:45.464613673 +0100 +++ gcc/config/aarch64/aarch64.c2019-08-14 09:47:31.355831192 +0100 @@ -2783,6 +2783,48 @@ aarch64_sve_pred_dominates_p (rtx *pred1 || rtx_equal_p (pred1[0], pred2)); } +/* PRED1[0] is a PTEST predicate and PRED1[1] is an aarch64_sve_ptrue_flag + for it. PRED2[0] is the predicate for the instruction whose result + is tested by the PTEST and PRED2[1] is again an aarch64_sve_ptrue_flag + for it. Return true if we can prove that the two predicates are + equivalent for PTEST purposes; that is, if we can replace PRED2[0] + with PRED1[0] without changing behavior.
*/ + +bool +aarch64_sve_same_pred_for_ptest_p (rtx *pred1, rtx *pred2) +{ + machine_mode mode = GET_MODE (pred1[0]); + gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL + && mode == GET_MODE (pred2[0]) + && aarch64_sve_ptrue_flag (pred1[1], SImode) + && aarch64_sve_ptrue_flag (pred2[1], SImode)); + + bool ptrue1_p = (pred1[0] == CONSTM1_RTX (mode) + || INTVAL (pred1[1]) == SVE_KNOWN_PTRUE); + bool ptrue2_p = (pred2[0] == CONSTM1_RTX (mode) + || INTVAL (pred2[1]) == SVE_KNOWN_PTRUE); + return (ptrue1_p && ptrue2_p) || rtx_equal_p (pred1[0], pred2[0]); +} + +/* Emit a comparison CMP between OP0 and OP1, both of which have mode + DATA_MODE, and return the result in a predicate of mode PRED_MODE. + Use TARGET as the target register if nonnull and convenient. */ + +static rtx +aarch64_sve_emit_int_cmp (rtx target, machine_mode pred_mode, rtx_code cmp, + machine_mode data_mode, rtx op1, rtx op2) +{ + insn_code icode = code_for_aarch64_pred_cmp (cmp, data_mode); + expand_operand ops[5]; + create_output_operand (&ops[0], target, pred_mode); + create_input_operand (&ops[1], CONSTM1_RTX (pred_mode), pred_mode); + create_integer_operand (&ops[2], SVE_KNOWN_PTRUE); + create_input_operand (&ops[3], op1, data_mode); + create_input_operand (&ops[4], op2, data_mode); + expand_insn (icode, 5, ops); + return ops[0].value; +} + /* Use a comparison to
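At the ACLE level, the reason the PTRUE requirement has to go is visible in code like the following rough sketch (not part of the patch; it assumes the standard arm_sve.h name svcmplt_s32):

#include <arm_sve.h>

/* The governing predicate PG is whatever the caller passes in.  With
   UNSPEC_PRED_Z the comparison pattern can take PG directly, with the
   ptrue flag recording that PG is *not* known to be all-true; the old
   UNSPEC_MERGE_PTRUE representation had no way to express this.  */
svbool_t
cmp (svbool_t pg, svint32_t a, svint32_t b)
{
  return svcmplt_s32 (pg, a, b);
}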
[committed][AArch64] Handle more SVE predicate constants
This patch handles more predicate constants by using TRN1, TRN2 and EOR. For now, only one operation is allowed before we fall back to loading from memory or doing an integer move and a compare. The EOR support includes the important special case of an inverted predicate. The real motivating case for this is the ACLE svdupq function, which allows a repeating 16-bit predicate to be built from individual scalar booleans. It's not easy to test properly before that support is merged. Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf. Applied as r274434. Richard 2019-08-14 Richard Sandiford gcc/ * config/aarch64/aarch64.c (aarch64_expand_sve_const_pred_eor) (aarch64_expand_sve_const_pred_trn): New functions. (aarch64_expand_sve_const_pred_1): Add a recurse_p parameter and use the above functions when the parameter is true. (aarch64_expand_sve_const_pred): Update call accordingly. * config/aarch64/aarch64-sve.md (*aarch64_sve_): Rename to... (@aarch64_sve_): ...this. gcc/testsuite/ * gcc.target/aarch64/sve/peel_ind_1.c: Look for an inverted .B VL1. * gcc.target/aarch64/sve/peel_ind_2.c: Likewise .S VL7. Index: gcc/config/aarch64/aarch64.c === --- gcc/config/aarch64/aarch64.c2019-08-14 09:50:03.682705602 +0100 +++ gcc/config/aarch64/aarch64.c2019-08-14 09:52:02.893827778 +0100 @@ -3751,13 +3751,163 @@ aarch64_sve_move_pred_via_while (rtx tar return target; } +static rtx +aarch64_expand_sve_const_pred_1 (rtx, rtx_vector_builder &, bool); + +/* BUILDER is a constant predicate in which the index of every set bit + is a multiple of ELT_SIZE (which is <= 8). Try to load the constant + by inverting every element at a multiple of ELT_SIZE and EORing the + result with an ELT_SIZE PTRUE. + + Return a register that contains the constant on success, otherwise + return null. Use TARGET as the register if it is nonnull and + convenient. */ + +static rtx +aarch64_expand_sve_const_pred_eor (rtx target, rtx_vector_builder &builder, + unsigned int elt_size) +{ + /* Invert every element at a multiple of ELT_SIZE, keeping the + other bits zero. */ + rtx_vector_builder inv_builder (VNx16BImode, builder.npatterns (), + builder.nelts_per_pattern ()); + for (unsigned int i = 0; i < builder.encoded_nelts (); ++i) +if ((i & (elt_size - 1)) == 0 && INTVAL (builder.elt (i)) == 0) + inv_builder.quick_push (const1_rtx); +else + inv_builder.quick_push (const0_rtx); + inv_builder.finalize (); + + /* See if we can load the constant cheaply. */ + rtx inv = aarch64_expand_sve_const_pred_1 (NULL_RTX, inv_builder, false); + if (!inv) +return NULL_RTX; + + /* EOR the result with an ELT_SIZE PTRUE. */ + rtx mask = aarch64_ptrue_all (elt_size); + mask = force_reg (VNx16BImode, mask); + target = aarch64_target_reg (target, VNx16BImode); + emit_insn (gen_aarch64_pred_z (XOR, VNx16BImode, target, mask, inv, mask)); + return target; +} + +/* BUILDER is a constant predicate in which the index of every set bit + is a multiple of ELT_SIZE (which is <= 8). Try to load the constant + using a TRN1 of size PERMUTE_SIZE, which is >= ELT_SIZE. Return the + register on success, otherwise return null. Use TARGET as the register + if nonnull and convenient. */ + +static rtx +aarch64_expand_sve_const_pred_trn (rtx target, rtx_vector_builder &builder, + unsigned int elt_size, + unsigned int permute_size) +{ + /* We're going to split the constant into two new constants A and B, + with element I of BUILDER going into A if (I & PERMUTE_SIZE) == 0 + and into B otherwise. E.g. 
for PERMUTE_SIZE == 4 && ELT_SIZE == 1: + + A: { 0, 1, 2, 3, _, _, _, _, 8, 9, 10, 11, _, _, _, _ } + B: { 4, 5, 6, 7, _, _, _, _, 12, 13, 14, 15, _, _, _, _ } + + where _ indicates elements that will be discarded by the permute. + + First calculate the ELT_SIZEs for A and B. */ + unsigned int a_elt_size = GET_MODE_SIZE (DImode); + unsigned int b_elt_size = GET_MODE_SIZE (DImode); + for (unsigned int i = 0; i < builder.encoded_nelts (); i += elt_size) +if (INTVAL (builder.elt (i)) != 0) + { + if (i & permute_size) + b_elt_size |= i - permute_size; + else + a_elt_size |= i; + } + a_elt_size &= -a_elt_size; + b_elt_size &= -b_elt_size; + + /* Now construct the vectors themselves. */ + rtx_vector_builder a_builder (VNx16BImode, builder.npatterns (), + builder.nelts_per_pattern ()); + rtx_vector_builder b_builder (VNx16BImode, builder.npatterns (), + builder.nelts_per_pattern ()); + unsigned int nelts = builder.encoded_nelts (); + for (unsigned int i = 0; i < nelts; ++i) +
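The EOR case is easy to sanity-check by hand. Below is a throwaway sketch (an editorial illustration, not part of the patch) of the identity being exploited, with a 16-lane predicate modelled as one bit per byte lane:

#include <cassert>
#include <cstdint>

int
main ()
{
  /* For ELT_SIZE == 2 (.H) the PTRUE has a bit at every even lane.  */
  uint16_t ptrue_h = 0x5555;
  /* A constant with set bits only at .H lane boundaries...  */
  uint16_t want = 0x4105;
  /* ...inverted at every .H element...  */
  uint16_t inv = want ^ ptrue_h;
  /* ...is recovered by EORing with the .H PTRUE.  The win comes when
     INV itself is cheap to construct, e.g. for an inverted predicate.  */
  assert ((inv ^ ptrue_h) == want);
  return 0;
}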
[committed][AArch64] Use SVE ADR to optimise shift-add sequences
This patch uses SVE ADR to optimise shift-and-add and uxtw-and-add sequences. Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf. Applied as r274436. Richard 2019-08-14 Richard Sandiford gcc/ * config/aarch64/predicates.md (const_1_to_3_operand): New predicate. * config/aarch64/aarch64-sve.md (*aarch64_adr_uxtw) (*aarch64_adr_shift, *aarch64_adr_shift_uxtw): New patterns. gcc/testsuite/ * gcc.target/aarch64/sve/adr_1.c: New test. * gcc.target/aarch64/sve/adr_1_run.c: Likewise. * gcc.target/aarch64/sve/adr_2.c: Likewise. * gcc.target/aarch64/sve/adr_2_run.c: Likewise. * gcc.target/aarch64/sve/adr_3.c: Likewise. * gcc.target/aarch64/sve/adr_3_run.c: Likewise. * gcc.target/aarch64/sve/adr_4.c: Likewise. * gcc.target/aarch64/sve/adr_4_run.c: Likewise. * gcc.target/aarch64/sve/adr_5.c: Likewise. * gcc.target/aarch64/sve/adr_5_run.c: Likewise. -- Index: gcc/config/aarch64/predicates.md === --- gcc/config/aarch64/predicates.md2019-08-14 09:15:57.617827961 +0100 +++ gcc/config/aarch64/predicates.md2019-08-14 09:56:55.323680943 +0100 @@ -39,6 +39,13 @@ (define_predicate "const0_operand" (and (match_code "const_int") (match_test "op == CONST0_RTX (mode)"))) +(define_predicate "const_1_to_3_operand" + (match_code "const_int,const_vector") +{ + op = unwrap_const_vec_duplicate (op); + return CONST_INT_P (op) && IN_RANGE (INTVAL (op), 1, 3); +}) + (define_special_predicate "subreg_lowpart_operator" (and (match_code "subreg") (match_test "subreg_lowpart_p (op)"))) @@ -595,6 +602,11 @@ (define_predicate "aarch64_sve_inc_dec_i (and (match_code "const,const_vector") (match_test "aarch64_sve_inc_dec_immediate_p (op)"))) +(define_predicate "aarch64_sve_uxtw_immediate" + (and (match_code "const_vector") + (match_test "GET_MODE_UNIT_BITSIZE (GET_MODE (op)) > 32") + (match_test "aarch64_const_vec_all_same_int_p (op, 0x)"))) + (define_predicate "aarch64_sve_logical_immediate" (and (match_code "const,const_vector") (match_test "aarch64_sve_bitmask_immediate_p (op)"))) Index: gcc/config/aarch64/aarch64-sve.md === --- gcc/config/aarch64/aarch64-sve.md 2019-08-14 09:54:30.808741952 +0100 +++ gcc/config/aarch64/aarch64-sve.md 2019-08-14 09:56:55.323680943 +0100 @@ -61,6 +61,7 @@ ;; [INT] General binary arithmetic corresponding to rtx codes ;; [INT] Addition ;; [INT] Subtraction +;; [INT] Take address ;; [INT] Absolute difference ;; [INT] Multiplication ;; [INT] Highpart multiplication @@ -1672,6 +1673,65 @@ (define_insn "sub3" ;; Merging forms are handled through SVE_INT_BINARY. ;; - +;; [INT] Take address +;; - +;; Includes: +;; - ADR +;; - + +;; Unshifted ADR, with the offset being zero-extended from the low 32 bits. +(define_insn "*aarch64_adr_uxtw" + [(set (match_operand:VNx2DI 0 "register_operand" "=w") + (plus:VNx2DI + (and:VNx2DI + (match_operand:VNx2DI 2 "register_operand" "w") + (match_operand:VNx2DI 3 "aarch64_sve_uxtw_immediate")) + (match_operand:VNx2DI 1 "register_operand" "w")))] + "TARGET_SVE" + "adr\t%0.d, [%1.d, %2.d, uxtw]" +) + +;; ADR with a nonzero shift. 
+(define_insn_and_rewrite "*aarch64_adr_shift" + [(set (match_operand:SVE_SDI 0 "register_operand" "=w") + (plus:SVE_SDI + (unspec:SVE_SDI + [(match_operand 4) +(ashift:SVE_SDI + (match_operand:SVE_SDI 2 "register_operand" "w") + (match_operand:SVE_SDI 3 "const_1_to_3_operand"))] + UNSPEC_PRED_X) + (match_operand:SVE_SDI 1 "register_operand" "w")))] + "TARGET_SVE" + "adr\t%0., [%1., %2., lsl %3]" + "&& !CONSTANT_P (operands[4])" + { +operands[4] = CONSTM1_RTX (mode); + } +) + +;; Same, but with the index being zero-extended from the low 32 bits. +(define_insn_and_rewrite "*aarch64_adr_shift_uxtw" + [(set (match_operand:VNx2DI 0 "register_operand" "=w") + (plus:VNx2DI + (unspec:VNx2DI + [(match_operand 5) +(ashift:VNx2DI + (and:VNx2DI +(match_operand:VNx2DI 2 "register_operand" "w") +(match_operand:VNx2DI 4 "aarch64_sve_uxtw_immediate")) + (match_operand:VNx2DI 3 "const_1_to_3_operand"))] + UNSPEC_PRED_X) + (match_operand:VNx2DI 1 "register_operand" "w")))] + "TARGET_SVE" + "adr\t%0.d, [%1.d, %2.d, uxtw %3]" + "&& !CONSTANT_P (operands[5])" +
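As a hedged example, scalar source of roughly the following shape exposes the shift-and-add form once vectorized; whether ADR is actually chosen over a separate LSL and ADD depends on the usual cost decisions:

#include <cstdint>

void
f (uint64_t *__restrict r, uint64_t *__restrict a,
   uint64_t *__restrict b, int n)
{
  for (int i = 0; i < n; ++i)
    r[i] = a[i] + (b[i] << 2);  /* shift amount in the ADR range 1-3 */
}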
[committed][AArch64] Add support for SVE CLS and CLZ
This patch adds support for unpredicated SVE CLS and CLZ. A later patch will add support for predicated unary integer arithmetic. Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf. Applied as r274437. Richard 2019-08-14 Richard Sandiford gcc/ * config/aarch64/iterators.md (SVE_INT_UNARY): Add clrsb and clz. (optab, sve_int_op): Handle them. * config/aarch64/aarch64-sve.md: Expand comment. gcc/testsuite/ * gcc.target/aarch64/vect-clz.c: Force SVE off. * gcc.target/aarch64/sve/clrsb_1.c: New test. * gcc.target/aarch64/sve/clrsb_1_run.c: Likewise. * gcc.target/aarch64/sve/clz_1.c: Likewise. * gcc.target/aarch64/sve/clz_1_run.c: Likewise. Index: gcc/config/aarch64/iterators.md === --- gcc/config/aarch64/iterators.md 2019-08-14 09:39:44.323282457 +0100 +++ gcc/config/aarch64/iterators.md 2019-08-14 10:00:45.485990851 +0100 @@ -1276,7 +1276,7 @@ (define_code_iterator UCOMPARISONS [ltu (define_code_iterator FAC_COMPARISONS [lt le ge gt]) ;; SVE integer unary operations. -(define_code_iterator SVE_INT_UNARY [abs neg not popcount]) +(define_code_iterator SVE_INT_UNARY [abs neg not clrsb clz popcount]) ;; SVE integer binary operations. (define_code_iterator SVE_INT_BINARY [plus minus mult smax umax smin umin @@ -1307,6 +1307,8 @@ (define_code_attr optab [(ashift "ashl") (unsigned_fix "fixuns") (float "float") (unsigned_float "floatuns") +(clrsb "clrsb") +(clz "clz") (popcount "popcount") (and "and") (ior "ior") @@ -1474,6 +1476,8 @@ (define_code_attr sve_int_op [(plus "add (ior "orr") (xor "eor") (not "not") + (clrsb "cls") + (clz "clz") (popcount "cnt")]) (define_code_attr sve_int_op_rev [(plus "add") Index: gcc/config/aarch64/aarch64-sve.md === --- gcc/config/aarch64/aarch64-sve.md 2019-08-14 09:58:35.914942337 +0100 +++ gcc/config/aarch64/aarch64-sve.md 2019-08-14 10:00:45.485990851 +0100 @@ -1422,6 +1422,8 @@ (define_expand "vec_extract" ;; - ;; Includes: ;; - ABS +;; - CLS (= clrsb) +;; - CLZ ;; - CNT (= popcount) ;; - NEG ;; - NOT Index: gcc/testsuite/gcc.target/aarch64/vect-clz.c === --- gcc/testsuite/gcc.target/aarch64/vect-clz.c 2019-03-08 18:14:30.068993639 + +++ gcc/testsuite/gcc.target/aarch64/vect-clz.c 2019-08-14 10:00:45.485990851 +0100 @@ -1,6 +1,8 @@ /* { dg-do run } */ /* { dg-options "-O3 -save-temps -fno-inline -fno-vect-cost-model" } */ +#pragma GCC target "+nosve" + extern void abort (); void Index: gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c === --- /dev/null 2019-07-30 08:53:31.317691683 +0100 +++ gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c 2019-08-14 10:00:45.485990851 +0100 @@ -0,0 +1,22 @@ +/* { dg-do assemble { target aarch64_asm_sve_ok } } */ +/* { dg-options "-O2 -ftree-vectorize --save-temps" } */ + +#include + +void __attribute__ ((noinline, noclone)) +clrsb_32 (unsigned int *restrict dst, uint32_t *restrict src, int size) +{ + for (int i = 0; i < size; ++i) +dst[i] = __builtin_clrsb (src[i]); +} + +void __attribute__ ((noinline, noclone)) +clrsb_64 (unsigned int *restrict dst, uint64_t *restrict src, int size) +{ + for (int i = 0; i < size; ++i) +dst[i] = __builtin_clrsbll (src[i]); +} + +/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */ +/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */ Index: gcc/testsuite/gcc.target/aarch64/sve/clrsb_1_run.c === --- /dev/null 2019-07-30 08:53:31.317691683 +0100 +++ gcc/testsuite/gcc.target/aarch64/sve/clrsb_1_run.c 2019-08-14 
10:00:45.485990851 +0100 @@ -0,0 +1,50 @@ +/* { dg-do run { target aarch64_sve_hw } } */ +/* { dg-options "-O2 -ftree-vectorize" } */ + +#include "clrsb_1.c" + +extern void abort (void) __attribute__ ((noreturn)); + +unsigned int data[] = { + 0xff80, 24, + 0x, 31, + 0x, 31, + 0x8000, 0, + 0x7fff, 0, + 0x03ff, 21, + 0x1fff, 2, + 0x, 15, + 0x, 15 +}; + +int __attribute__ ((optimize (1))) +main (void) +{ + unsigned int count = sizeof (data) / sizeof (data[0]) / 2; + + uint32_t i
[committed][AArch64] Add support for SVE CNOT
This patch adds support for predicated and unpredicated CNOT (logical NOT on integers). In RTL terms, this is a select between 1 and 0 in which the predicate is fed by a comparison with zero. Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf. Applied as r274438. Richard 2019-08-14 Richard Sandiford gcc/ * config/aarch64/predicates.md (aarch64_simd_imm_one): New predicate. * config/aarch64/aarch64-sve.md (*cnot): New pattern. (*cond_cnot_2, *cond_cnot_any): Likewise. gcc/testsuite/ * gcc.target/aarch64/sve/cnot_1.c: New test. * gcc.target/aarch64/sve/cond_cnot_1.c: Likewise. * gcc.target/aarch64/sve/cond_cnot_1_run.c: Likewise. * gcc.target/aarch64/sve/cond_cnot_2.c: Likewise. * gcc.target/aarch64/sve/cond_cnot_2_run.c: Likewise. * gcc.target/aarch64/sve/cond_cnot_3.c: Likewise. * gcc.target/aarch64/sve/cond_cnot_3_run.c: Likewise. Index: gcc/config/aarch64/predicates.md === --- gcc/config/aarch64/predicates.md2019-08-14 09:58:35.914942337 +0100 +++ gcc/config/aarch64/predicates.md2019-08-14 10:04:58.948129300 +0100 @@ -460,6 +460,10 @@ (define_predicate "aarch64_simd_imm_zero (and (match_code "const,const_vector") (match_test "op == CONST0_RTX (GET_MODE (op))"))) +(define_predicate "aarch64_simd_imm_one" + (and (match_code "const_vector") + (match_test "op == CONST1_RTX (GET_MODE (op))"))) + (define_predicate "aarch64_simd_or_scalar_imm_zero" (and (match_code "const_int,const_double,const,const_vector") (match_test "op == CONST0_RTX (GET_MODE (op))"))) Index: gcc/config/aarch64/aarch64-sve.md === --- gcc/config/aarch64/aarch64-sve.md 2019-08-14 10:02:44.165119259 +0100 +++ gcc/config/aarch64/aarch64-sve.md 2019-08-14 10:04:58.948129300 +0100 @@ -54,6 +54,7 @@ ;; ;; == Unary arithmetic ;; [INT] General unary arithmetic corresponding to rtx codes +;; [INT] Logical inverse ;; [FP] General unary arithmetic corresponding to unspecs ;; [PRED] Inverse @@ -1455,6 +1456,95 @@ (define_insn "*2" ) ;; - +;; [INT] Logical inverse +;; - + +;; Predicated logical inverse. +(define_insn "*cnot" + [(set (match_operand:SVE_I 0 "register_operand" "=w") + (unspec:SVE_I + [(unspec: +[(match_operand: 1 "register_operand" "Upl") + (match_operand:SI 5 "aarch64_sve_ptrue_flag") + (eq: + (match_operand:SVE_I 2 "register_operand" "w") + (match_operand:SVE_I 3 "aarch64_simd_imm_zero"))] +UNSPEC_PRED_Z) + (match_operand:SVE_I 4 "aarch64_simd_imm_one") + (match_dup 3)] + UNSPEC_SEL))] + "TARGET_SVE" + "cnot\t%0., %1/m, %2." +) + +;; Predicated logical inverse, merging with the first input. +(define_insn_and_rewrite "*cond_cnot_2" + [(set (match_operand:SVE_I 0 "register_operand" "=w, ?&w") + (unspec:SVE_I + [(match_operand: 1 "register_operand" "Upl, Upl") + ;; Logical inverse of operand 2 (as above). + (unspec:SVE_I +[(unspec: + [(match_operand 5) +(const_int SVE_KNOWN_PTRUE) +(eq: + (match_operand:SVE_I 2 "register_operand" "0, w") + (match_operand:SVE_I 3 "aarch64_simd_imm_zero"))] + UNSPEC_PRED_Z) + (match_operand:SVE_I 4 "aarch64_simd_imm_one") + (match_dup 3)] +UNSPEC_SEL) + (match_dup 2)] + UNSPEC_SEL))] + "TARGET_SVE" + "@ + cnot\t%0., %1/m, %0. + movprfx\t%0, %2\;cnot\t%0., %1/m, %2." + "&& !CONSTANT_P (operands[5])" + { +operands[5] = CONSTM1_RTX (mode); + } + [(set_attr "movprfx" "*,yes")] +) + +;; Predicated logical inverse, merging with an independent value. +;; +;; The earlyclobber isn't needed for the first alternative, but omitting +;; it would only help the case in which operands 2 and 6 are the same, +;; which is handled above rather than here. 
Marking all the alternatives +;; as earlyclobber helps to make the instruction more regular to the +;; register allocator. +(define_insn_and_rewrite "*cond_cnot_any" + [(set (match_operand:SVE_I 0 "register_operand" "=&w, ?&w, ?&w") + (unspec:SVE_I + [(match_operand: 1 "register_operand" "Upl, Upl, Upl") + ;; Logical inverse of operand 2 (as above). + (unspec:SVE_I +[(unspec: + [(match_operand 5) +(const_int SVE_KNOWN_PTRUE) +(eq: + (match_operand:SVE_I 2 "register_operand" "w, w, w") + (match_operand:SVE_I 3 "aarch64_simd_imm_zero"))] + UNSPEC_PRED_Z) + (m
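A minimal sketch of the C-level idiom these patterns target (my illustration, not one of the new tests):

void
f (int *__restrict r, int *__restrict a, int n)
{
  for (int i = 0; i < n; ++i)
    r[i] = !a[i];  /* select between 1 and 0 on (a[i] == 0), i.e. CNOT */
}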
Re: [PATCH][RFC][x86] Fix PR91154, add SImode smax, allow SImode add in SSE regs
On Tue, 13 Aug 2019, Jeff Law wrote: > On 8/9/19 7:00 AM, Richard Biener wrote: > > > > It fixes the slowdown observed in 416.gamess and 464.h264ref. > > > > Bootstrapped on x86_64-unknown-linux-gnu, testing still in progress. > > > > CCing Jeff who "knows RTL". > What specifically do you want me to look at? I'm not really familiar > with the STV stuff, but can certainly take a peek. Below is the updated patch with the already approved and committed parts taken out. It is now mostly mechanical apart from the make_vector_copies and convert_reg changes which move existing "patterns" under appropriate conditionals and add handling of the case where the scalar mode fits in a single GPR (previously it was -m32 DImode only, now it handles -m32/-m64 SImode and DImode). I'm redoing bootstrap / regtest on x86_64-unknown-linux-gnu now just to be safe. OK? I do expect we need to work on the compile-time issue I placed ??? comments on and more generally try to avoid using DF so much. Thanks, Richard. 2019-08-13 Richard Biener PR target/91154 * config/i386/i386-features.h (scalar_chain::scalar_chain): Add mode arguments. (scalar_chain::smode): New member. (scalar_chain::vmode): Likewise. (dimode_scalar_chain): Rename to... (general_scalar_chain): ... this. (general_scalar_chain::general_scalar_chain): Take mode arguments. (timode_scalar_chain::timode_scalar_chain): Initialize scalar_chain base with TImode and V1TImode. * config/i386/i386-features.c (scalar_chain::scalar_chain): Adjust. (general_scalar_chain::vector_const_cost): Adjust for SImode chains. (general_scalar_chain::compute_convert_gain): Likewise. Add {S,U}{MIN,MAX} support. (general_scalar_chain::replace_with_subreg): Use vmode/smode. (general_scalar_chain::make_vector_copies): Likewise. Handle non-DImode chains appropriately. (general_scalar_chain::convert_reg): Likewise. (general_scalar_chain::convert_op): Likewise. (general_scalar_chain::convert_insn): Likewise. Add fatal_insn_not_found if the result is not recognized. (convertible_comparison_p): Pass in the scalar mode and use that. (general_scalar_to_vector_candidate_p): Likewise. Rename from dimode_scalar_to_vector_candidate_p. Add {S,U}{MIN,MAX} support. (scalar_to_vector_candidate_p): Remove by inlining into single caller. (general_remove_non_convertible_regs): Rename from dimode_remove_non_convertible_regs. (remove_non_convertible_regs): Remove by inlining into single caller. (convert_scalars_to_vector): Handle SImode and DImode chains in addition to TImode chains. * config/i386/i386.md (3): New expander. (*3_1): New insn-and-split. (*di3_doubleword): Likewise. * gcc.target/i386/pr91154.c: New testcase. * gcc.target/i386/minmax-3.c: Likewise. * gcc.target/i386/minmax-4.c: Likewise. * gcc.target/i386/minmax-5.c: Likewise. * gcc.target/i386/minmax-6.c: Likewise. * gcc.target/i386/minmax-1.c: Add -mno-stv. * gcc.target/i386/minmax-2.c: Likewise. Index: gcc/config/i386/i386-features.c === --- gcc/config/i386/i386-features.c (revision 274422) +++ gcc/config/i386/i386-features.c (working copy) @@ -276,8 +276,11 @@ unsigned scalar_chain::max_id = 0; /* Initialize new chain. */ -scalar_chain::scalar_chain () +scalar_chain::scalar_chain (enum machine_mode smode_, enum machine_mode vmode_) { + smode = smode_; + vmode = vmode_; + chain_id = ++max_id; if (dump_file) @@ -319,7 +322,7 @@ scalar_chain::add_to_queue (unsigned ins conversion.
*/ void -dimode_scalar_chain::mark_dual_mode_def (df_ref def) +general_scalar_chain::mark_dual_mode_def (df_ref def) { gcc_assert (DF_REF_REG_DEF_P (def)); @@ -409,6 +412,9 @@ scalar_chain::add_insn (bitmap candidate && !HARD_REGISTER_P (SET_DEST (def_set))) bitmap_set_bit (defs, REGNO (SET_DEST (def_set))); + /* ??? The following is quadratic since analyze_register_chain + iterates over all refs to look for dual-mode regs. Instead this + should be done separately for all regs mentioned in the chain once. */ df_ref ref; df_ref def; for (ref = DF_INSN_UID_DEFS (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref)) @@ -469,19 +475,21 @@ scalar_chain::build (bitmap candidates, instead of using a scalar one. */ int -dimode_scalar_chain::vector_const_cost (rtx exp) +general_scalar_chain::vector_const_cost (rtx exp) { gcc_assert (CONST_INT_P (exp)); - if (standard_sse_constant_p (exp, V2DImode)) -return COSTS_N_INSNS (1); - return ix86_cost->sse_load[1]; + if (standard_sse_constant_p (exp, vmode)) +return ix86_cost->sse_op; + /* We have separate costs for SImode and DImode, us
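For readers not following the PR, the new minmax-* tests exercise SImode shapes of roughly this form (my paraphrase, not the exact testcase text); with STV the max can be kept in an SSE register as PMAXSD rather than a compare/cmov sequence, when the chain's cost estimate says so:

int
smax (int a, int b)
{
  return a > b ? a : b;
}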
[committed][AArch64] Add support for SVE [SU]{MAX,MIN} immediate
This patch adds support for the immediate forms of SVE SMAX, SMIN, UMAX and UMIN. SMAX and SMIN take the same range as MUL, so the patch basically just moves and generalises the existing MUL patterns. Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf. Applied as r274439. Richard 2019-08-14 Richard Sandiford gcc/ * config/aarch64/constraints.md (vsb): New constraint. (vsm): Generalize description. * config/aarch64/iterators.md (SVE_INT_BINARY_IMM): New code iterator. (sve_imm_con): Handle smax, smin, umax and umin. (sve_imm_prefix): New code attribute. * config/aarch64/predicates.md (aarch64_sve_vsb_immediate) (aarch64_sve_vsb_operand): New predicates. (aarch64_sve_mul_immediate): Rename to... (aarch64_sve_vsm_immediate): ...this. (aarch64_sve_mul_operand): Rename to... (aarch64_sve_vsm_operand): ...this. * config/aarch64/aarch64-sve.md (mul3): Generalize to... (3): ...this. (*mul3, *post_ra_mul3): Generalize to... (*3) (*post_ra_3): ...these and add movprfx support for the immediate alternatives. (3, *3): Delete in favor of the above. (*3): Fix incorrect predicate for operand 3. gcc/testsuite/ * gcc.target/aarch64/sve/smax_1.c: New test. * gcc.target/aarch64/sve/smin_1.c: Likewise. * gcc.target/aarch64/sve/umax_1.c: Likewise. * gcc.target/aarch64/sve/umin_1.c: Likewise. Index: gcc/config/aarch64/constraints.md === --- gcc/config/aarch64/constraints.md 2019-08-13 11:39:54.753376024 +0100 +++ gcc/config/aarch64/constraints.md 2019-08-14 10:08:03.446774020 +0100 @@ -388,6 +388,12 @@ (define_constraint "vsa" arithmetic instructions." (match_operand 0 "aarch64_sve_arith_immediate")) +(define_constraint "vsb" + "@internal + A constraint that matches an immediate operand valid for SVE UMAX + and UMIN operations." + (match_operand 0 "aarch64_sve_vsb_immediate")) + (define_constraint "vsc" "@internal A constraint that matches a signed immediate operand valid for SVE @@ -420,9 +426,9 @@ (define_constraint "vsl" (define_constraint "vsm" "@internal - A constraint that matches an immediate operand valid for SVE MUL - operations." - (match_operand 0 "aarch64_sve_mul_immediate")) + A constraint that matches an immediate operand valid for SVE MUL, + SMAX and SMIN operations." + (match_operand 0 "aarch64_sve_vsm_immediate")) (define_constraint "vsA" "@internal Index: gcc/config/aarch64/iterators.md === --- gcc/config/aarch64/iterators.md 2019-08-14 10:02:44.165119259 +0100 +++ gcc/config/aarch64/iterators.md 2019-08-14 10:08:03.446774020 +0100 @@ -1285,6 +1285,9 @@ (define_code_iterator SVE_INT_BINARY [pl ;; SVE integer binary division operations. (define_code_iterator SVE_INT_BINARY_SD [div udiv]) +;; SVE integer binary operations that have an immediate form. +(define_code_iterator SVE_INT_BINARY_IMM [mult smax smin umax umin]) + ;; SVE floating-point operations with an unpredicated all-register form. (define_code_iterator SVE_UNPRED_FP_BINARY [plus minus mult]) @@ -1499,7 +1502,12 @@ (define_code_attr sve_fp_op [(plus "fadd (mult "fmul")]) ;; The SVE immediate constraint to use for an rtl code. -(define_code_attr sve_imm_con [(eq "vsc") +(define_code_attr sve_imm_con [(mult "vsm") + (smax "vsm") + (smin "vsm") + (umax "vsb") + (umin "vsb") + (eq "vsc") (ne "vsc") (lt "vsc") (ge "vsc") @@ -1510,6 +1518,13 @@ (define_code_attr sve_imm_con [(eq "vsc" (geu "vsd") (gtu "vsd")]) +;; The prefix letter to use when printing an immediate operand. +(define_code_attr sve_imm_prefix [(mult "") + (smax "") + (smin "") + (umax "D") + (umin "D")]) + ;; --- ;; Int Iterators. 
;; --- Index: gcc/config/aarch64/predicates.md === --- gcc/config/aarch64/predicates.md2019-08-14 10:06:06.331634340 +0100 +++ gcc/config/aarch64/predicates.md2019-08-14 10:08:03.446774020 +0100 @@ -615,7 +615,15 @@ (define_predicate "aarch64_sve_logical_i (and (match_code "const,const_vector") (match_test "aarch64_sve_bitmask_immediate_p (op)"))) -(define_predicate "aarch64_sve_mul
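A hedged example of the source shape this enables, assuming the vectorizer picks the immediate alternative (the immediate shares MUL's signed 8-bit range, so 50 qualifies):

void
f (int *x, int n)
{
  for (int i = 0; i < n; ++i)
    x[i] = x[i] > 50 ? x[i] : 50;  /* smax on the .s elements with #50 */
}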
[committed][AArch64] Add support for SVE F{MAX,MIN}NM immediate
This patch uses the immediate forms of FMAXNM and FMINNM for unconditional arithmetic. The same rules apply to FMAX and FMIN, but we only generate those via the ACLE. Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf. Applied as r274440. Richard 2019-08-14 Richard Sandiford gcc/ * config/aarch64/predicates.md (aarch64_sve_float_maxmin_immediate) (aarch64_sve_float_maxmin_operand): New predicates. * config/aarch64/constraints.md (vsB): New constraint. (vsM): Fix typo. * config/aarch64/iterators.md (sve_pred_fp_rhs2_operand): Use aarch64_sve_float_maxmin_operand for UNSPEC_COND_FMAXNM and UNSPEC_COND_FMINNM. * config/aarch64/aarch64-sve.md (3): Use aarch64_sve_float_maxmin_operand for operand 2. (*3): Likewise. Add alternatives for the constant forms. gcc/testsuite/ * gcc.target/aarch64/sve/fmaxnm_1.c: New test. * gcc.target/aarch64/sve/fminnm_1.c: Likewise. Index: gcc/config/aarch64/predicates.md === --- gcc/config/aarch64/predicates.md2019-08-14 10:10:02.497900721 +0100 +++ gcc/config/aarch64/predicates.md2019-08-14 10:12:12.864944397 +0100 @@ -655,6 +655,11 @@ (define_predicate "aarch64_sve_float_mul (and (match_code "const,const_vector") (match_test "aarch64_sve_float_mul_immediate_p (op)"))) +(define_predicate "aarch64_sve_float_maxmin_immediate" + (and (match_code "const_vector") + (ior (match_test "op == CONST0_RTX (GET_MODE (op))") + (match_test "op == CONST1_RTX (GET_MODE (op))" + (define_predicate "aarch64_sve_arith_operand" (ior (match_operand 0 "register_operand") (match_operand 0 "aarch64_sve_arith_immediate"))) @@ -708,6 +713,10 @@ (define_predicate "aarch64_sve_float_mul (ior (match_operand 0 "register_operand") (match_operand 0 "aarch64_sve_float_mul_immediate"))) +(define_predicate "aarch64_sve_float_maxmin_operand" + (ior (match_operand 0 "register_operand") + (match_operand 0 "aarch64_sve_float_maxmin_immediate"))) + (define_predicate "aarch64_sve_vec_perm_operand" (ior (match_operand 0 "register_operand") (match_operand 0 "aarch64_constant_vector_operand"))) Index: gcc/config/aarch64/constraints.md === --- gcc/config/aarch64/constraints.md 2019-08-14 10:10:02.497900721 +0100 +++ gcc/config/aarch64/constraints.md 2019-08-14 10:12:12.864944397 +0100 @@ -436,9 +436,16 @@ (define_constraint "vsA" and FSUB operations." (match_operand 0 "aarch64_sve_float_arith_immediate")) +;; "B" for "bound". +(define_constraint "vsB" + "@internal + A constraint that matches an immediate operand valid for SVE FMAX + and FMIN operations." + (match_operand 0 "aarch64_sve_float_maxmin_immediate")) + (define_constraint "vsM" "@internal - A constraint that matches an imediate operand valid for SVE FMUL + A constraint that matches an immediate operand valid for SVE FMUL operations." 
(match_operand 0 "aarch64_sve_float_mul_immediate")) Index: gcc/config/aarch64/iterators.md === --- gcc/config/aarch64/iterators.md 2019-08-14 10:10:02.497900721 +0100 +++ gcc/config/aarch64/iterators.md 2019-08-14 10:12:12.864944397 +0100 @@ -2075,7 +2075,7 @@ (define_int_attr sve_pred_fp_rhs1_operan (define_int_attr sve_pred_fp_rhs2_operand [(UNSPEC_COND_FADD "aarch64_sve_float_arith_with_sub_operand") (UNSPEC_COND_FDIV "register_operand") - (UNSPEC_COND_FMAXNM "register_operand") - (UNSPEC_COND_FMINNM "register_operand") + (UNSPEC_COND_FMAXNM "aarch64_sve_float_maxmin_operand") + (UNSPEC_COND_FMINNM "aarch64_sve_float_maxmin_operand") (UNSPEC_COND_FMUL "aarch64_sve_float_mul_operand") (UNSPEC_COND_FSUB "register_operand")]) Index: gcc/config/aarch64/aarch64-sve.md === --- gcc/config/aarch64/aarch64-sve.md 2019-08-14 10:10:02.497900721 +0100 +++ gcc/config/aarch64/aarch64-sve.md 2019-08-14 10:12:12.864944397 +0100 @@ -2604,7 +2604,7 @@ (define_expand "3" [(match_dup 3) (const_int SVE_RELAXED_GP) (match_operand:SVE_F 1 "register_operand") - (match_operand:SVE_F 2 "register_operand")] + (match_operand:SVE_F 2 "aarch64_sve_float_maxmin_operand")] SVE_COND_FP_MAXMIN_PUBLIC))] "TARGET_SVE" { @@ -2614,18 +2614,20 @@ (define_expand "3" ;; Predicated floating-point maximum/minimum. (define_insn "*3" - [(set (match_operand:SVE_F 0 "register_operand" "=w, ?&w") + [(set (match_operand:SVE_F 0 "register_operand" "=w, w, ?&w, ?&w") (unspec:SVE_F - [(match_operand: 1 "register_operand" "Upl, Upl") + [(match_operand: 1 "register_operand" "Upl, Upl, Upl, Upl") (match_operand:SI 4 "aarch64_sve_gp_strictness") -
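Since aarch64_sve_float_maxmin_immediate only accepts 0.0 and 1.0, the typical shape that benefits is a clamp against one of those two constants. A rough illustration, assuming __builtin_fmax vectorizes to the FMAXNM form:

void
f (double *x, int n)
{
  for (int i = 0; i < n; ++i)
    x[i] = __builtin_fmax (x[i], 1.0);  /* fmaxnm on .d elements with #1.0 */
}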
[committed][AArch64] Make more use of SVE conditional constant moves
This patch extends the SVE UNSPEC_SEL patterns so that they can use: (1) MOV /M of a duplicated integer constant (2) MOV /M of a duplicated floating-point constant bitcast to an integer, accepting the same constants as (1) (3) FMOV /M of a duplicated floating-point constant (4) MOV /Z of a duplicated integer constant (5) MOV /Z of a duplicated floating-point constant bitcast to an integer, accepting the same constants as (4) (6) MOVPRFXed FMOV /M of a duplicated floating-point constant We already handled (4) with a special pattern; the rest are new. Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf. Applied as r274441. Richard 2019-08-14 Richard Sandiford Kugan Vivekanandarajah gcc/ * config/aarch64/aarch64.c (aarch64_bit_representation): New function. (aarch64_print_vector_float_operand): Also handle 8-bit floats. (aarch64_print_operand): Add support for %I. (aarch64_sve_dup_immediate_p): Handle scalars as well as vectors. Bitcast floating-point constants to the corresponding integer constant. (aarch64_float_const_representable_p): Handle vectors as well as scalars. (aarch64_expand_sve_vcond): Make sure that the operands are valid for the new vcond_mask_ expander. * config/aarch64/predicates.md (aarch64_sve_dup_immediate): Also test aarch64_float_const_representable_p. (aarch64_sve_reg_or_dup_imm): New predicate. * config/aarch64/aarch64-sve.md (vec_extract): Use gen_vcond_mask_ instead of gen_aarch64_sve_dup_const. (vcond_mask_): Turn into a define_expand that accepts aarch64_sve_reg_or_dup_imm and aarch64_simd_reg_or_zero for operands 1 and 2 respectively. Force operand 2 into a register if operand 1 is a register. Fold old define_insn... (aarch64_sve_dup_const): ...and this define_insn... (*vcond_mask_): ...into this new pattern. Handle floating-point constants that can be moved as integers. Add alternatives for MOV /M and FMOV /M. (vcond, vcondu) (vcond): Accept nonmemory_operand for operands 1 and 2 respectively. * config/aarch64/constraints.md (Ufc): Handle vectors as well as scalars. (vss): New constraint. gcc/testsuite/ * gcc.target/aarch64/sve/vcond_18.c: New test. * gcc.target/aarch64/sve/vcond_18_run.c: Likewise. * gcc.target/aarch64/sve/vcond_19.c: Likewise. * gcc.target/aarch64/sve/vcond_19_run.c: Likewise. * gcc.target/aarch64/sve/vcond_20.c: Likewise. * gcc.target/aarch64/sve/vcond_20_run.c: Likewise. Index: gcc/config/aarch64/aarch64.c === --- gcc/config/aarch64/aarch64.c2019-08-14 09:54:30.816741891 +0100 +++ gcc/config/aarch64/aarch64.c2019-08-14 10:16:30.671052843 +0100 @@ -1482,6 +1482,16 @@ aarch64_dbx_register_number (unsigned re return DWARF_FRAME_REGISTERS; } +/* If X is a CONST_DOUBLE, return its bit representation as a constant + integer, otherwise return X unmodified. */ +static rtx +aarch64_bit_representation (rtx x) +{ + if (CONST_DOUBLE_P (x)) +x = gen_lowpart (int_mode_for_mode (GET_MODE (x)).require (), x); + return x; +} + /* Return true if MODE is any of the Advanced SIMD structure modes. */ static bool aarch64_advsimd_struct_mode_p (machine_mode mode) @@ -8275,7 +8285,8 @@ aarch64_print_vector_float_operand (FILE if (negate) r = real_value_negate (&r); - /* We only handle the SVE single-bit immediates here. */ + /* Handle the SVE single-bit immediates specially, since they have a + fixed form in the assembly syntax. 
*/ if (real_equal (&r, &dconst0)) asm_fprintf (f, "0.0"); else if (real_equal (&r, &dconst1)) @@ -8283,7 +8294,13 @@ aarch64_print_vector_float_operand (FILE else if (real_equal (&r, &dconsthalf)) asm_fprintf (f, "0.5"); else -return false; +{ + const int buf_size = 20; + char float_buf[buf_size] = {'\0'}; + real_to_decimal_for_mode (float_buf, &r, buf_size, buf_size, + 1, GET_MODE (elt)); + asm_fprintf (f, "%s", float_buf); +} return true; } @@ -8312,6 +8329,11 @@ sizetochar (int size) and print it as an unsigned integer, in decimal. 'e': Print the sign/zero-extend size as a character 8->b, 16->h, 32->w. + 'I': If the operand is a duplicated vector constant, + replace it with the duplicated scalar. If the + operand is then a floating-point constant, replace + it with the integer bit representation. Print the + transformed constant as a signed decimal number. 'p': Prints N such that 2^N == X (X must be power of 2 and
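A sketch of source that can now use case (1), a MOV /M of a duplicated integer constant (my illustration; whether the vectorizer emits exactly this depends on costs, and the constant must be in the MOV /M immediate range):

void
f (long *__restrict r, long *__restrict a, long *__restrict b, int n)
{
  for (int i = 0; i < n; ++i)
    r[i] = a[i] > 0 ? 99 : b[i];  /* #99 merged into a copy of b under p */
}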
[committed][AArch64] Use SVE MOV /M of scalars
This patch uses MOV /M to optimise selects between a duplicated scalar variable and a vector. Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf. Applied as r274442. Richard 2019-08-14 Richard Sandiford Kugan Vivekanandarajah gcc/ * config/aarch64/aarch64-sve.md (*aarch64_sel_dup): New pattern. gcc/testsuite/ * g++.target/aarch64/sve/dup_sel_1.C: New test. * g++.target/aarch64/sve/dup_sel_2.C: Likewise. * g++.target/aarch64/sve/dup_sel_3.C: Likewise. * g++.target/aarch64/sve/dup_sel_4.C: Likewise. * g++.target/aarch64/sve/dup_sel_5.C: Likewise. * g++.target/aarch64/sve/dup_sel_6.C: Likewise. Index: gcc/config/aarch64/aarch64-sve.md === --- gcc/config/aarch64/aarch64-sve.md 2019-08-14 10:18:10.634319267 +0100 +++ gcc/config/aarch64/aarch64-sve.md 2019-08-14 10:20:21.241360707 +0100 @@ -3070,6 +3070,29 @@ (define_insn "*vcond_mask_" [(set_attr "movprfx" "*,*,*,*,yes,yes,yes")] ) +;; Optimize selects between a duplicated scalar variable and another vector, +;; the latter of which can be a zero constant or a variable. Treat duplicates +;; of GPRs as being more expensive than duplicates of FPRs, since they +;; involve a cross-file move. +(define_insn "*aarch64_sel_dup" + [(set (match_operand:SVE_ALL 0 "register_operand" "=?w, w, ??w, ?&w, ??&w, ?&w") + (unspec:SVE_ALL + [(match_operand: 3 "register_operand" "Upa, Upa, Upl, Upl, Upl, Upl") + (vec_duplicate:SVE_ALL +(match_operand: 1 "register_operand" "r, w, r, w, r, w")) + (match_operand:SVE_ALL 2 "aarch64_simd_reg_or_zero" "0, 0, Dz, Dz, w, w")] + UNSPEC_SEL))] + "TARGET_SVE" + "@ + mov\t%0., %3/m, %1 + mov\t%0., %3/m, %1 + movprfx\t%0., %3/z, %0.\;mov\t%0., %3/m, %1 + movprfx\t%0., %3/z, %0.\;mov\t%0., %3/m, %1 + movprfx\t%0, %2\;mov\t%0., %3/m, %1 + movprfx\t%0, %2\;mov\t%0., %3/m, %1" + [(set_attr "movprfx" "*,*,yes,yes,yes,yes")] +) + ;; - ;; [INT,FP] Compare and select ;; - Index: gcc/testsuite/g++.target/aarch64/sve/dup_sel_1.C === --- /dev/null 2019-07-30 08:53:31.317691683 +0100 +++ gcc/testsuite/g++.target/aarch64/sve/dup_sel_1.C2019-08-14 10:20:21.245360681 +0100 @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -msve-vector-bits=256" } */ + +#include + +typedef int32_t vnx4si __attribute__((vector_size(32))); + +void +foo (int32_t val) +{ + register vnx4si x asm ("z0"); + register vnx4si y asm ("z0"); + asm volatile ("" : "=w" (y)); + val += 1; + vnx4si z = { val, val, val, val, val, val, val, val }; + x = (vnx4si) { -1, 0, 0, -1, 0, -1, 0, -1 } ? z : y; + asm volatile ("" :: "w" (x)); +} + +/* { dg-final { scan-assembler {\tmov\tz0\.s, p[0-7]/m, w[0-9]+\n} } } */ +/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */ Index: gcc/testsuite/g++.target/aarch64/sve/dup_sel_2.C === --- /dev/null 2019-07-30 08:53:31.317691683 +0100 +++ gcc/testsuite/g++.target/aarch64/sve/dup_sel_2.C2019-08-14 10:20:21.245360681 +0100 @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -msve-vector-bits=256" } */ + +#include + +typedef int32_t vnx4si __attribute__((vector_size(32))); + +void +foo (int32_t val) +{ + register vnx4si x asm ("z0"); + register vnx4si y asm ("z1"); + asm volatile ("" : "=w" (y)); + val += 1; + vnx4si z = { val, val, val, val, val, val, val, val }; + x = (vnx4si) { -1, 0, 0, -1, 0, -1, 0, -1 } ? 
z : y; + asm volatile ("" :: "w" (x)); +} + +/* { dg-final { scan-assembler {\tmovprfx\tz0, z1\n\tmov\tz0\.s, p[0-7]/m, w[0-9]+\n} } } */ Index: gcc/testsuite/g++.target/aarch64/sve/dup_sel_3.C === --- /dev/null 2019-07-30 08:53:31.317691683 +0100 +++ gcc/testsuite/g++.target/aarch64/sve/dup_sel_3.C2019-08-14 10:20:21.245360681 +0100 @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -msve-vector-bits=256" } */ + +#include + +typedef int32_t vnx4si __attribute__((vector_size(32))); +typedef float vnx4sf __attribute__((vector_size(32))); + +void +foo (float val) +{ + register vnx4sf x asm ("z0"); + register vnx4sf y asm ("z0"); + asm volatile ("" : "=w" (y)); + vnx4sf z = { val, val, val, val, val, val, val, val }; + x = (vnx4si) { -1, 0, 0, -1, 0, -1, 0, -1 } ? z : y; + asm volatile ("" :: "w" (x)); +} + +/* { dg-final { scan-assembler {\tmov\tz0\.s, p[0-7]/m, s[0-9]+\n} } } */ +/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */ Index: gcc/testsuite/g++.target/aarch64/sve/dup_sel_4.C === --- /dev/null 2019-07-30 08:53:31.3
Re: [PATCH][RFC][x86] Fix PR91154, add SImode smax, allow SImode add in SSE regs
On Wed, Aug 14, 2019 at 11:08 AM Richard Biener wrote: > > On Tue, 13 Aug 2019, Jeff Law wrote: > > > On 8/9/19 7:00 AM, Richard Biener wrote: > > > > > > It fixes the slowdown observed in 416.gamess and 464.h264ref. > > > > > > Bootstrapped on x86_64-unknown-linux-gnu, testing still in progress. > > > > > > CCing Jeff who "knows RTL". > > What specifically do you want me to look at? I'm not really familiar > > with the STV stuff, but can certainly take a peek. > > Below is the updated patch with the already approved and committed > parts taken out. It is not mostly mechanical apart from the > make_vector_copies and convert_reg changes which move existing > "patterns" under appropriate conditionals and adds handling of the > case where the scalar mode fits in a single GPR (previously it > was -m32 DImode only, now it handles -m32/-m64 SImode and DImode). > > I'm redoing bootstrap / regtest on x86_64-unknown-linux-gnu now just > to be safe. > > OK? > > I do expect we need to work on the compile-time issue I placed ??? > comments on and more generally try to avoid using DF so much. > > Thanks, > Richard. > > 2019-08-13 Richard Biener > > PR target/91154 > * config/i386/i386-features.h (scalar_chain::scalar_chain): Add > mode arguments. > (scalar_chain::smode): New member. > (scalar_chain::vmode): Likewise. > (dimode_scalar_chain): Rename to... > (general_scalar_chain): ... this. > (general_scalar_chain::general_scalar_chain): Take mode arguments. > (timode_scalar_chain::timode_scalar_chain): Initialize scalar_chain > base with TImode and V1TImode. > * config/i386/i386-features.c (scalar_chain::scalar_chain): Adjust. > (general_scalar_chain::vector_const_cost): Adjust for SImode > chains. > (general_scalar_chain::compute_convert_gain): Likewise. Add > {S,U}{MIN,MAX} support. > (general_scalar_chain::replace_with_subreg): Use vmode/smode. > (general_scalar_chain::make_vector_copies): Likewise. Handle > non-DImode chains appropriately. > (general_scalar_chain::convert_reg): Likewise. > (general_scalar_chain::convert_op): Likewise. > (general_scalar_chain::convert_insn): Likewise. Add > fatal_insn_not_found if the result is not recognized. > (convertible_comparison_p): Pass in the scalar mode and use that. > (general_scalar_to_vector_candidate_p): Likewise. Rename from > dimode_scalar_to_vector_candidate_p. Add {S,U}{MIN,MAX} support. > (scalar_to_vector_candidate_p): Remove by inlining into single > caller. > (general_remove_non_convertible_regs): Rename from > dimode_remove_non_convertible_regs. > (remove_non_convertible_regs): Remove by inlining into single caller. > (convert_scalars_to_vector): Handle SImode and DImode chains > in addition to TImode chains. > * config/i386/i386.md (3): New expander. > (*3_1): New insn-and-split. > (*di3_doubleword): Likewise. > > * gcc.target/i386/pr91154.c: New testcase. > * gcc.target/i386/minmax-3.c: Likewise. > * gcc.target/i386/minmax-4.c: Likewise. > * gcc.target/i386/minmax-5.c: Likewise. > * gcc.target/i386/minmax-6.c: Likewise. > * gcc.target/i386/minmax-1.c: Add -mno-stv. > * gcc.target/i386/minmax-2.c: Likewise. OK. Thanks, Uros. > Index: gcc/config/i386/i386-features.c > === > --- gcc/config/i386/i386-features.c (revision 274422) > +++ gcc/config/i386/i386-features.c (working copy) > @@ -276,8 +276,11 @@ unsigned scalar_chain::max_id = 0; > > /* Initialize new chain. 
*/ > > -scalar_chain::scalar_chain () > +scalar_chain::scalar_chain (enum machine_mode smode_, enum machine_mode > vmode_) > { > + smode = smode_; > + vmode = vmode_; > + >chain_id = ++max_id; > > if (dump_file) > @@ -319,7 +322,7 @@ scalar_chain::add_to_queue (unsigned ins > conversion. */ > > void > -dimode_scalar_chain::mark_dual_mode_def (df_ref def) > +general_scalar_chain::mark_dual_mode_def (df_ref def) > { >gcc_assert (DF_REF_REG_DEF_P (def)); > > @@ -409,6 +412,9 @@ scalar_chain::add_insn (bitmap candidate >&& !HARD_REGISTER_P (SET_DEST (def_set))) > bitmap_set_bit (defs, REGNO (SET_DEST (def_set))); > > + /* ??? The following is quadratic since analyze_register_chain > + iterates over all refs to look for dual-mode regs. Instead this > + should be done separately for all regs mentioned in the chain once. */ >df_ref ref; >df_ref def; >for (ref = DF_INSN_UID_DEFS (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref)) > @@ -469,19 +475,21 @@ scalar_chain::build (bitmap candidates, > instead of using a scalar one. */ > > int > -dimode_scalar_chain::vector_const_cost (rtx exp) > +general_scalar_chain::vector_const_cost (rtx
[committed][AArch64] Add support for SVE absolute comparisons
This patch adds support for floating-point absolute comparisons FACLT and FACLE (aliased as FACGT and FACGE with swapped operands). Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf. Applied as r274443. Richard 2019-08-14 Richard Sandiford gcc/ * config/aarch64/iterators.md (SVE_COND_FP_ABS_CMP): New iterator. * config/aarch64/aarch64-sve.md (*aarch64_pred_fac): New pattern. gcc/testsuite/ * gcc.target/aarch64/sve/vcond_21.c: New test. * gcc.target/aarch64/sve/vcond_21_run.c: Likewise. Index: gcc/config/aarch64/iterators.md === --- gcc/config/aarch64/iterators.md 2019-08-14 10:14:27.899953691 +0100 +++ gcc/config/aarch64/iterators.md 2019-08-14 10:24:53.211364279 +0100 @@ -1709,6 +1709,11 @@ (define_int_iterator SVE_COND_FP_CMP_I0 UNSPEC_COND_FCMLT UNSPEC_COND_FCMNE]) +(define_int_iterator SVE_COND_FP_ABS_CMP [UNSPEC_COND_FCMGE + UNSPEC_COND_FCMGT + UNSPEC_COND_FCMLE + UNSPEC_COND_FCMLT]) + (define_int_iterator FCADD [UNSPEC_FCADD90 UNSPEC_FCADD270]) Index: gcc/config/aarch64/aarch64-sve.md === --- gcc/config/aarch64/aarch64-sve.md 2019-08-14 10:22:19.524492496 +0100 +++ gcc/config/aarch64/aarch64-sve.md 2019-08-14 10:24:53.211364279 +0100 @@ -94,7 +94,8 @@ ;; [INT,FP] Compare and select ;; [INT] Comparisons ;; [INT] While tests -;; [FP] Comparisons +;; [FP] Direct comparisons +;; [FP] Absolute comparisons ;; [PRED] Test bits ;; ;; == Reductions @@ -3364,7 +3365,7 @@ (define_insn_and_rewrite "*while_ult_and ) ;; - +;; [FP] Absolute comparisons +;; - +;; Includes: +;; - FACGE +;; - FACGT +;; - FACLE +;; - FACLT +;; - + +;; Predicated floating-point absolute comparisons. +(define_insn_and_rewrite "*aarch64_pred_fac" + [(set (match_operand: 0 "register_operand" "=Upa") + (unspec: + [(match_operand: 1 "register_operand" "Upl") + (match_operand:SI 4 "aarch64_sve_ptrue_flag") + (unspec:SVE_F +[(match_operand 5) + (match_operand:SI 6 "aarch64_sve_gp_strictness") + (match_operand:SVE_F 2 "register_operand" "w")] +UNSPEC_COND_FABS) + (unspec:SVE_F +[(match_operand 7) + (match_operand:SI 8 "aarch64_sve_gp_strictness") + (match_operand:SVE_F 3 "register_operand" "w")] +UNSPEC_COND_FABS)] + SVE_COND_FP_ABS_CMP))] + "TARGET_SVE + && aarch64_sve_pred_dominates_p (&operands[5], operands[1]) + && aarch64_sve_pred_dominates_p (&operands[7], operands[1])" + "fac\t%0., %1/z, %2., %3." + "&& (!rtx_equal_p (operands[1], operands[5]) + || !rtx_equal_p (operands[1], operands[7]))" + { +operands[5] = copy_rtx (operands[1]); +operands[7] = copy_rtx (operands[1]); + } +) + +;; - ;; [PRED] Test bits ;; - ;; Includes: Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_21.c === --- /dev/null 2019-07-30 08:53:31.317691683 +0100 +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_21.c 2019-08-14 10:24:53.211364279 +0100 @@ -0,0 +1,34 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize" } */ + +#define DEF_LOOP(TYPE, ABS, NAME, OP) \ + void \ + test_##TYPE##_##NAME (TYPE *restrict r, \ + TYPE *restrict a, \ + TYPE *restrict b, int n)\ + {\ +for (int i = 0; i < n; ++i)\ + r[i] = ABS (a[i]) OP ABS (b[i]) ? 1.0 : 0.0; \ + } + +#define TEST_TYPE(T, TYPE, ABS)\ + T (TYPE, ABS, lt, <) \ + T (TYPE, ABS, le, <=)\ + T (TYPE, ABS, ge, >=)\ + T (TYPE, ABS, gt, >) + +#define TEST_ALL(T)\ + TEST_TYPE (T, _Float16, __builtin_fabsf16) \ + TEST_TYPE (T, float, __builtin_fabsf)\ + TEST_TYPE (T, double, __builtin_fabs) + +TEST_ALL (DEF_LOOP) + +/* { dg-final { scan-assembler-times {\tfac[lg]t\tp[0-9]+\.h, p[0-7]/z, z[0-9]+\.h, z[0-9]+\.h\n} 2 } } */ +/* { dg-fin
Re: [PATCH 1/3] Perform fold when propagating.
On Tue, Aug 13, 2019 at 1:24 PM Robin Dapp wrote: > > > May I suggest to add a parameter to the substitute-and-fold engine > > so we can do the folding on all stmts only when enabled and enable > > it just for VRP? That also avoids the testsuite noise. > > Would something along these lines do? > > diff --git a/gcc/tree-ssa-propagate.c b/gcc/tree-ssa-propagate.c > index 7a8f1e037b0..6c0d743b823 100644 > --- a/gcc/tree-ssa-propagate.c > +++ b/gcc/tree-ssa-propagate.c > @@ -814,7 +814,6 @@ ssa_propagation_engine::ssa_propagate (void) >ssa_prop_fini (); > } > > - > /* Return true if STMT is of the form 'mem_ref = RHS', where 'mem_ref' > is a non-volatile pointer dereference, a structure reference or a > reference to a single _DECL. Ignore volatile memory references > @@ -1064,11 +1063,10 @@ > substitute_and_fold_dom_walker::before_dom_children (basic_block bb) >/* Replace real uses in the statement. */ >did_replace |= substitute_and_fold_engine->replace_uses_in (stmt); > > - if (did_replace) > - gimple_set_modified (stmt, true); > - > - if (fold_stmt (&i, follow_single_use_edges)) > + /* If we made a replacement, fold the statement. */ > + if (did_replace || > substitute_and_fold_engine->should_fold_all_stmts ()) > { > + fold_stmt (&i, follow_single_use_edges); > did_replace = true; > stmt = gsi_stmt (i); > gimple_set_modified (stmt, true); > diff --git a/gcc/tree-ssa-propagate.h b/gcc/tree-ssa-propagate.h > index 81b635e0787..939680f487c 100644 > --- a/gcc/tree-ssa-propagate.h > +++ b/gcc/tree-ssa-propagate.h > @@ -107,6 +107,13 @@ class substitute_and_fold_engine >bool substitute_and_fold (basic_block = NULL); >bool replace_uses_in (gimple *); >bool replace_phi_args_in (gphi *); > + > + /* Users like VRP can overwrite this when they want to perform > + folding for every propagation. */ > + virtual bool should_fold_all_stmts (void) > +{ > + return false; > +} Since this is constant for a single invocation I'd either add a flag param to substitute_and_fold or a bool class member initialized at construction time. Also do if ((did_replace || fold_all_stmts) && fold_stmt (...)) { } to avoid extra work when folding does nothing. Otherwise yes, this would work. > }; > > #endif /* _TREE_SSA_PROPAGATE_H */ > diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c > index e2850682da2..8c8fa6f2bec 100644 > --- a/gcc/tree-vrp.c > +++ b/gcc/tree-vrp.c > @@ -6271,6 +6271,9 @@ class vrp_folder : public substitute_and_fold_engine > { return vr_values->simplify_stmt_using_ranges (gsi); } > tree op_with_constant_singleton_value_range (tree op) > { return vr_values->op_with_constant_singleton_value_range (op); } > + > + /* Enable aggressive folding in every propagation. */ > + bool should_fold_all_stmts (void) { return true; } > }; > > /* If the statement pointed by SI has a predicate whose value can be > > > > I think it's also only necessary to fold a stmt when an (indirect) use > > after substitution has either been folded or has (new) SSA name > > info (range/known-bits) set? > > Where would this need to be changed? It was just a random thought, doing this would need to keep track of "changed" SSA names (in a bitmap?) and before folding checking all uses on the stmt if they are "changed". The overhead for this may be higher than the folding savings we get. The "changed" would also need to trickle down some distance so patterns with deeper nested subexpressions would be tried. Richard. > > Regards > Robin >
Re: [PATCH][testsuite] Fix PR91419
On Tue, 13 Aug 2019, Hans-Peter Nilsson wrote: > > From: Richard Biener > > Date: Tue, 13 Aug 2019 09:50:34 +0200 > > > 2019-08-13 Richard Biener > > > > PR testsuite/91419 > > * lib/target-supports.exp (natural_alignment_32): Amend target > > list based on BIGGEST_ALIGNMENT. > > (natural_alignment_64): Targets not natural_alignment_32 cannot > > be natural_alignment_64. > > * gcc.dg/tree-ssa/pr91091-2.c: XFAIL for !natural_alignment_32. > > * gcc.dg/tree-ssa/ssa-fre-77.c: Likewise. > > * gcc.dg/tree-ssa/ssa-fre-61.c: Require natural_alignment_32. > > LGTM, thanks. (Not tested myself but my cris-elf autotester will pick it up.) Committed. Thanks, Richard.
[Ada] Illegal selection of first object in a task type's body not detected
The compiler was improperly allowing selection of an object declared within a task body when the prefix was of the task type, specifically in the case where the object was the very first declared in the body (selections of later body declarations were being flagged). The flag Is_Private_Op was only set at the point of the first "private" declaration of the type in cases where the first declaration's name didn't match the selector. Tested on x86_64-pc-linux-gnu, committed on trunk 2019-08-14 Gary Dismukes gcc/ada/ * sem_ch4.adb (Analyze_Selected_Component): In the case where the prefix is of a concurrent type, and the selected entity matching the selector is the first private declaration of the type (such as the first local variable in a task's body), set Is_Private_Op. gcc/testsuite/ * gnat.dg/task5.adb: New testcase.--- gcc/ada/sem_ch4.adb +++ gcc/ada/sem_ch4.adb @@ -4994,7 +4994,15 @@ package body Sem_Ch4 is if Comp = First_Private_Entity (Type_To_Use) then if Etype (Sel) /= Any_Type then - -- We have a candiate + -- If the first private entity's name matches, then treat + -- it as a private op: needed for the error check for + -- illegal selection of private entities further below. + + if Chars (Comp) = Chars (Sel) then +Is_Private_Op := True; + end if; + + -- We have a candidate, so exit the loop exit; --- /dev/null new file mode 100644 +++ gcc/testsuite/gnat.dg/task5.adb @@ -0,0 +1,26 @@ +procedure Task5 is + + task type T is + entry E (V1, V2 : Integer); + end T; + + T_Obj : T; + + task body T is + V1 : Integer; + V2 : Integer; + V3 : Integer; + begin + accept E (V1, V2 : Integer) do + T.V1 := V1; + T.V2 := V2; + + T_Obj.V1 := V1; -- { dg-error "invalid reference to private operation of some object of type \"T\"" } + T_Obj.V2 := V2; -- { dg-error "invalid reference to private operation of some object of type \"T\"" } + T_Obj.V3 := V3; -- { dg-error "invalid reference to private operation of some object of type \"T\"" } + end E; + end T; + +begin + null; +end Task5;
[Ada] Fix failing assertions on SPARK elaboration
Checking of SPARK elaboration rules may lead to assertion failures on a compiler built with assertions. Now fixed. There is no impact on compilation. Tested on x86_64-pc-linux-gnu, committed on trunk 2019-08-14 Yannick Moy gcc/ada/ * sem_disp.adb (Check_Dispatching_Operation): Update assertion for the separate declarations created in GNATprove mode. * sem_disp.ads (Is_Overriding_Subprogram): Update comment. * sem_elab.adb (SPARK_Processor): Fix test for checking of overriding primitives.--- gcc/ada/sem_disp.adb +++ gcc/ada/sem_disp.adb @@ -1149,6 +1149,10 @@ package body Sem_Disp is -- overridden primitives. The wrappers include checks on these -- modified conditions. (AI12-113). + -- 5. Declarations built for subprograms without separate spec which + -- are eligible for inlining in GNATprove (inside + -- Sem_Ch6.Analyze_Subprogram_Body_Helper). + if Present (Old_Subp) and then Present (Overridden_Operation (Subp)) and then Is_Dispatching_Operation (Old_Subp) @@ -1168,7 +1172,9 @@ package body Sem_Disp is or else Get_TSS_Name (Subp) = TSS_Stream_Read or else Get_TSS_Name (Subp) = TSS_Stream_Write - or else Present (Contract (Overridden_Operation (Subp; + or else Present (Contract (Overridden_Operation (Subp))) + + or else GNATprove_Mode); Check_Controlling_Formals (Tagged_Type, Subp); Override_Dispatching_Operation (Tagged_Type, Old_Subp, Subp); --- gcc/ada/sem_disp.ads +++ gcc/ada/sem_disp.ads @@ -151,7 +151,8 @@ package Sem_Disp is -- Returns True if E is a null procedure that is an interface primitive function Is_Overriding_Subprogram (E : Entity_Id) return Boolean; - -- Returns True if E is an overriding subprogram + -- Returns True if E is an overriding subprogram and False otherwise, in + -- particular for an inherited subprogram. function Is_Tag_Indeterminate (N : Node_Id) return Boolean; -- Returns true if the expression N is tag-indeterminate. An expression --- gcc/ada/sem_elab.adb +++ gcc/ada/sem_elab.adb @@ -49,6 +49,7 @@ with Sem_Aux; use Sem_Aux; with Sem_Cat; use Sem_Cat; with Sem_Ch7; use Sem_Ch7; with Sem_Ch8; use Sem_Ch8; +with Sem_Disp; use Sem_Disp; with Sem_Prag; use Sem_Prag; with Sem_Util; use Sem_Util; with Sinfo;use Sinfo; @@ -15233,9 +15234,12 @@ package body Sem_Elab is begin -- Nothing to do for predefined primitives because they are -- artifacts of tagged type expansion and cannot override source --- primitives. +-- primitives. Nothing to do as well for inherited primitives as +-- the check concerns overridding ones. -if Is_Predefined_Dispatching_Operation (Prim) then +if Is_Predefined_Dispatching_Operation (Prim) + or else not Is_Overriding_Subprogram (Prim) +then return; end if;
[Ada] Crash on precondition involving quantified expression
This patch fixes a compiler abort on a precondition whose condition includes a quantified expression. Tested on x86_64-pc-linux-gnu, committed on trunk 2019-08-14 Ed Schonberg gcc/ada/ * sem_util.adb (New_Copy_Tree, Visit_Entity): A quantified expression includes the implicit declaration of the loop parameter. When a quantified expression is copied during expansion, for example when building the precondition code from the generated pragma, a new loop parameter must be created for the new tree, to prevent duplicate declarations for the same symbol. gcc/testsuite/ * gnat.dg/predicate12.adb, gnat.dg/predicate12.ads: New testcase.--- gcc/ada/sem_util.adb +++ gcc/ada/sem_util.adb @@ -20799,16 +20799,27 @@ package body Sem_Util is -- this restriction leads to a performance penalty. -- ??? this list is flaky, and may hide dormant bugs + -- Should functions be included??? + + -- Loop parameters appear within quantified expressions and contain + -- an entity declaration that must be replaced when the expander is + -- active if the expression has been preanalyzed or analyzed. elsif not Ekind_In (Id, E_Block, E_Constant, E_Label, + E_Loop_Parameter, E_Procedure, E_Variable) and then not Is_Type (Id) then return; + elsif Ekind (Id) = E_Loop_Parameter + and then No (Etype (Condition (Parent (Parent (Id) + then +return; + -- Nothing to do when the entity was already visited elsif NCT_Tables_In_Use @@ -21081,7 +21092,14 @@ package body Sem_Util is begin pragma Assert (Nkind (N) not in N_Entity); - if Nkind (N) = N_Expression_With_Actions then + -- If the node is a quantified expression and expander is active, + -- it contains an implicit declaration that may require a new entity + -- when the condition has already been (pre)analyzed. + + if Nkind (N) = N_Expression_With_Actions + or else + (Nkind (N) = N_Quantified_Expression and then Expander_Active) + then EWA_Level := EWA_Level + 1; elsif EWA_Level > 0 @@ -21225,6 +21243,12 @@ package body Sem_Util is --* Semantic fields of nodes such as First_Real_Statement must be -- updated to reference the proper replicated nodes. + -- Finally, quantified expressions contain an implicit delaration for + -- the bound variable. Given that quantified expressions appearing + -- in contracts are copied to create pragmas and eventually checking + -- procedures, a new bound variable must be created for each copy, to + -- prevent multiple declarations of the same symbol. + -- To meet all these demands, routine New_Copy_Tree is split into two -- phases. --- /dev/null new file mode 100644 +++ gcc/testsuite/gnat.dg/predicate12.adb @@ -0,0 +1,6 @@ +-- { dg-do compile } +-- { dg-options "-gnata" } + +package body Predicate12 is + procedure Dummy is null; +end Predicate12; --- /dev/null new file mode 100644 +++ gcc/testsuite/gnat.dg/predicate12.ads @@ -0,0 +1,42 @@ +package Predicate12 is + + subtype Index_Type is Positive range 1 .. 100; + type Array_Type is array(Index_Type) of Integer; + + type Search_Engine is interface; + + procedure Search + (S : in Search_Engine; + Search_Item : in Integer; + Items : in Array_Type; + Found : out Boolean; + Result : out Index_Type) is abstract + with + Pre'Class => + (for all J in Items'Range => + (for all K in J + 1 .. 
Items'Last => Items(J) <= Items(K))), + Post'Class => + (if Found then Search_Item = Items(Result) + else (for all J in Items'Range => Items(J) /= Search_Item)); + + type Binary_Search_Engine is new Search_Engine with null record; + + procedure Search + (S : in Binary_Search_Engine; + Search_Item : in Integer; + Items : in Array_Type; + Found : out Boolean; + Result : out Index_Type) is null; + + type Forward_Search_Engine is new Search_Engine with null record; + + procedure Search + (S : in Forward_Search_Engine; + Search_Item : in Integer; + Items : in Array_Type; + Found : out Boolean; + Result : out Index_Type) is null; + + procedure Dummy; + +end Predicate12;
[Ada] Fix discrepancy in mechanism tracking private and full views
This fixes a discrepancy in the mechanism tracking the private and full views of entities when entering and leaving scopes. This mechanism records private entities that are dependent on other private entities, so that the exchange done on entering and leaving scopes can be propagated. The propagation is done recursively on entering child units, but it was not done recursively on leaving them, which would leave the dependency chains in an uncertain state in this case. That's mostly visible when inlining across units is enabled for code involving a lot of generic units. Tested on x86_64-pc-linux-gnu, committed on trunk 2019-08-14 Eric Botcazou gcc/ada/ * sem_ch7.adb (Install_Private_Declarations) <Swap_Private_Dependents>: Do not rely solely on the Is_Child_Unit flag on the unit to recurse. (Uninstall_Declarations) <Swap_Private_Dependents>: New function. Use it to recurse on the private dependent entities for child units. gcc/testsuite/ * gnat.dg/inline18.adb, gnat.dg/inline18.ads, gnat.dg/inline18_gen1-inner_g.ads, gnat.dg/inline18_gen1.adb, gnat.dg/inline18_gen1.ads, gnat.dg/inline18_gen2.adb, gnat.dg/inline18_gen2.ads, gnat.dg/inline18_gen3.adb, gnat.dg/inline18_gen3.ads, gnat.dg/inline18_pkg1.adb, gnat.dg/inline18_pkg1.ads, gnat.dg/inline18_pkg2-child.ads, gnat.dg/inline18_pkg2.ads: New testcase.--- gcc/ada/sem_ch7.adb +++ gcc/ada/sem_ch7.adb @@ -2261,13 +2261,14 @@ package body Sem_Ch7 is procedure Swap_Private_Dependents (Priv_Deps : Elist_Id); -- When the full view of a private type is made available, we do the -- same for its private dependents under proper visibility conditions. - -- When compiling a grandchild unit this needs to be done recursively. + -- When compiling a child unit this needs to be done recursively. - -- Swap_Private_Dependents -- - procedure Swap_Private_Dependents (Priv_Deps : Elist_Id) is + Cunit : Entity_Id; Deps : Elist_Id; Priv : Entity_Id; Priv_Elmt : Elmt_Id; @@ -2285,6 +2286,7 @@ package body Sem_Ch7 is if Present (Full_View (Priv)) and then Is_Visible_Dependent (Priv) then if Is_Private_Type (Priv) then + Cunit := Cunit_Entity (Current_Sem_Unit); Deps := Private_Dependents (Priv); Is_Priv := True; else @@ -2312,11 +2314,14 @@ package body Sem_Ch7 is Set_Is_Potentially_Use_Visible (Priv, Is_Potentially_Use_Visible (Node (Priv_Elmt))); - -- Within a child unit, recurse, except in generic child unit, - -- which (unfortunately) handle private_dependents separately. + -- Recurse for child units, except in generic child units, + -- which unfortunately handle private_dependents separately. + -- Note that the current unit may not have been analyzed, + -- for example a package body, so we cannot rely solely on + -- the Is_Child_Unit flag, but that's only an optimization. if Is_Priv - and then Is_Child_Unit (Cunit_Entity (Current_Sem_Unit)) + and then (No (Etype (Cunit)) or else Is_Child_Unit (Cunit)) and then not Is_Empty_Elmt_List (Deps) and then not Inside_A_Generic then @@ -2701,13 +2706,16 @@ package body Sem_Ch7 is Decl : constant Node_Id := Unit_Declaration_Node (P); Id: Entity_Id; Full : Entity_Id; - Priv_Elmt : Elmt_Id; - Priv_Sub : Entity_Id; procedure Preserve_Full_Attributes (Priv : Entity_Id; Full : Entity_Id); -- Copy to the private declaration the attributes of the full view that -- need to be available for the partial view also. + procedure Swap_Private_Dependents (Priv_Deps : Elist_Id); + -- When the full view of a private type is made unavailable, we do the + -- same for its private dependents under proper visibility conditions. + -- When compiling a child unit this needs to be done recursively. 
+ function Type_In_Use (T : Entity_Id) return Boolean; -- Check whether type or base type appear in an active use_type clause @@ -2826,6 +2834,66 @@ package body Sem_Ch7 is end if; end Preserve_Full_Attributes; + - + -- Swap_Private_Dependents -- + - + + procedure Swap_Private_Dependents (Priv_Deps : Elist_Id) is + Cunit : Entity_Id; + Deps : Elist_Id; + Priv : Entity_Id; + Priv_Elmt : Elmt_Id; + Is_Priv : Boolean; + + begin + Priv_Elmt := First_Elmt (Priv_Deps); + while Present (Priv_Elmt) loop +
[Ada] Spurious error in discriminated aggregate
This patch fixes a bug in which a spurious error is given on an aggregate of a type derived from a subtype with a constrained discriminant. Tested on x86_64-pc-linux-gnu, committed on trunk 2019-08-14 Bob Duff gcc/ada/ * exp_aggr.adb (Init_Hidden_Discriminants): Avoid processing the wrong discriminant, which could be of the wrong type. gcc/testsuite/ * gnat.dg/discr57.adb: New testcase.--- gcc/ada/exp_aggr.adb +++ gcc/ada/exp_aggr.adb @@ -2689,8 +2689,10 @@ package body Exp_Aggr is Discr_Constr := First_Elmt (Stored_Constraint (Full_View (Base_Typ))); +-- Otherwise, no discriminant to process + else - Discr_Constr := First_Elmt (Stored_Constraint (Typ)); + Discr_Constr := No_Elmt; end if; while Present (Discr) and then Present (Discr_Constr) loop --- /dev/null new file mode 100644 +++ gcc/testsuite/gnat.dg/discr57.adb @@ -0,0 +1,17 @@ +-- { dg-do compile } + +procedure Discr57 is + + type T1(Scalar : Boolean) is abstract tagged null record; + + subtype S1 is T1 (Scalar => False); + + type T2(Lower_Bound : Natural) is new + S1 with null record; + + Obj : constant T2 := + (Lower_Bound => 123); + +begin + null; +end Discr57;
[Ada] Expose part of ownership checking for use in GNATprove
GNATprove needs to be able to call a subset of the ownership legality rules from marking. This is provided by a new function Sem_SPARK.Is_Legal. There is no impact on compilation. Tested on x86_64-pc-linux-gnu, committed on trunk 2019-08-14 Yannick Moy gcc/ada/ * sem_spark.adb, sem_spark.ads (Is_Legal): New function exposed for use in GNATprove, to test legality rules not related to permissions. (Check_Declaration_Legality): Extract the part of Check_Declaration that checks rules not related to permissions. (Check_Declaration): Call the new Check_Declaration_Legality. (Check_Type_Legality): Rename of Check_Type. Introduce parameters to force or not checking, and update a flag detecting illegalities. (Check_Node): Ignore attribute references in statement position.--- gcc/ada/sem_spark.adb +++ gcc/ada/sem_spark.adb @@ -637,6 +637,14 @@ package body Sem_SPARK is procedure Check_Declaration (Decl : Node_Id); + procedure Check_Declaration_Legality + (Decl : Node_Id; + Force : Boolean; + Legal : in out Boolean); + -- Check the legality of declaration Decl regarding rules not related to + -- permissions. Update Legal to False if a rule is violated. Issue an + -- error message if Force is True and Emit_Messages returns True. + procedure Check_Expression (Expr : Node_Id; Mode : Extended_Checking_Mode); pragma Precondition (Nkind_In (Expr, N_Index_Or_Discriminant_Constraint, N_Range_Constraint, @@ -686,7 +694,10 @@ package body Sem_SPARK is procedure Check_Statement (Stmt : Node_Id); - procedure Check_Type (Typ : Entity_Id); + procedure Check_Type_Legality + (Typ : Entity_Id; + Force : Boolean; + Legal : in out Boolean); -- Check that type Typ is either not deep, or that it is an observing or -- owning type according to SPARK RM 3.10 @@ -1138,11 +1149,12 @@ package body Sem_SPARK is Expr_Root : Entity_Id; Perm: Perm_Kind; Status : Error_Status; + Dummy : Boolean := True; -- Start of processing for Check_Assignment begin - Check_Type (Target_Typ); + Check_Type_Legality (Target_Typ, Force => True, Legal => Dummy); if Is_Anonymous_Access_Type (Target_Typ) then Check_Source_Of_Borrow_Or_Observe (Expr, Status); @@ -1410,11 +1422,18 @@ package body Sem_SPARK is Target : constant Entity_Id := Defining_Identifier (Decl); Target_Typ : constant Node_Id := Etype (Target); Expr : Node_Id; + Dummy : Boolean := True; begin + -- Start with legality rules not related to permissions + + Check_Declaration_Legality (Decl, Force => True, Legal => Dummy); + + -- Now check permission-related legality rules + case N_Declaration'(Nkind (Decl)) is when N_Full_Type_Declaration => -Check_Type (Target); +null; -- ??? What about component declarations with defaults. 
@@ -1424,7 +1443,105 @@ package body Sem_SPARK is when N_Object_Declaration => Expr := Expression (Decl); -Check_Type (Target_Typ); +if Present (Expr) then + Check_Assignment (Target => Target, + Expr => Expr); +end if; + +if Is_Deep (Target_Typ) then + declare + Tree : constant Perm_Tree_Access := +new Perm_Tree_Wrapper' + (Tree => + (Kind=> Entire_Object, + Is_Node_Deep=> True, + Explanation => Decl, + Permission => Read_Write, + Children_Permission => Read_Write)); + begin + Set (Current_Perm_Env, Target, Tree); + end; +end if; + + when N_Iterator_Specification => +null; + + when N_Loop_Parameter_Specification => +null; + + -- Checking should not be called directly on these nodes + + when N_Function_Specification +| N_Entry_Declaration +| N_Procedure_Specification +| N_Component_Declaration + => +raise Program_Error; + + -- Ignored constructs for pointer checking + + when N_Formal_Object_Declaration +| N_Formal_Type_Declaration +| N_Incomplete_Type_Declaration +| N_Private_Extension_Declaration +| N_Private_Type_Declaration +| N_Protected_Type_Declaration + => +null; + + -- The following nodes are rewritten by semantic analysis + + when N_Expression_Function => +raise Program_Error; + end case; + end Check_Declaratio
[Ada] Equality for nonabstract type derived from interface treated as abstract
The compiler was creating an abstract function for the equality operation of a (nonlimited) interface type, and that could result in errors on generic instantiations that are passed nonabstract types derived from the interface type along with the derived type's inherited equality operation (complaining about an abstract subprogram being passed to a nonabstract formal). The "=" operation of an interface is supposed to be nonabstract (a direct consequence of the rule in RM 4.5.2(6-7)), so we now create an expression function rather than an abstract function. The function returns False, but the result is unimportant since a function of an abstract type can never actually be invoked (its arguments must generally be class-wide, since there can be no objects of the type, and calling it will dispatch). Tested on x86_64-pc-linux-gnu, committed on trunk 2019-08-14 Gary Dismukes gcc/ada/ * exp_ch3.adb (Predef_Spec_Or_Body): For an equality operation of an interface type, create an expression function (that returns False) rather than declaring an abstract function. * freeze.adb (Check_Inherited_Conditions): Set Needs_Wrapper to False unconditionally at the start of the loop creating wrappers for inherited operations. gcc/testsuite/ * gnat.dg/equal11.adb, gnat.dg/equal11_interface.ads, gnat.dg/equal11_record.adb, gnat.dg/equal11_record.ads: New testcase.--- gcc/ada/exp_ch3.adb +++ gcc/ada/exp_ch3.adb @@ -10313,8 +10313,24 @@ package body Exp_Ch3 is Result_Definition=> New_Occurrence_Of (Ret_Type, Loc)); end if; + -- Declare an abstract subprogram for primitive subprograms of an + -- interface type (except for "="). + if Is_Interface (Tag_Typ) then - return Make_Abstract_Subprogram_Declaration (Loc, Spec); + if Name /= Name_Op_Eq then +return Make_Abstract_Subprogram_Declaration (Loc, Spec); + + -- The equality function (if any) for an interface type is defined + -- to be nonabstract, so we create an expression function for it that + -- always returns False. Note that the function can never actually be + -- invoked because interface types are abstract, so there aren't any + -- objects of such types (and their equality operation will always + -- dispatch). + + else +return Make_Expression_Function + (Loc, Spec, New_Occurrence_Of (Standard_False, Loc)); + end if; -- If body case, return empty subprogram body. Note that this is ill- -- formed, because there is not even a null statement, and certainly not --- gcc/ada/freeze.adb +++ gcc/ada/freeze.adb @@ -1526,11 +1526,11 @@ package body Freeze is -- so that LSP can be verified/enforced. 
Op_Node := First_Elmt (Prim_Ops); - Needs_Wrapper := False; while Present (Op_Node) loop - Decls := Empty_List; - Prim := Node (Op_Node); + Decls := Empty_List; + Prim := Node (Op_Node); + Needs_Wrapper := False; if not Comes_From_Source (Prim) and then Present (Alias (Prim)) then Par_Prim := Alias (Prim); @@ -1601,8 +1601,6 @@ package body Freeze is (Par_R, New_List (New_Decl, New_Body)); end if; end; - -Needs_Wrapper := False; end if; Next_Elmt (Op_Node); --- /dev/null new file mode 100644 +++ gcc/testsuite/gnat.dg/equal11.adb @@ -0,0 +1,37 @@ +-- { dg-do run } + +with Equal11_Record; + +procedure Equal11 is + + use Equal11_Record; + + R : My_Record_Type; + L : My_Record_Type_List_Pck.List; +begin + -- Single record + R.F := 42; + R.Put; + if Put_Result /= 42 then +raise Program_Error; + end if; + + -- List of records + L.Append ((F => 3)); + L.Append ((F => 2)); + L.Append ((F => 1)); + + declare +Expected : constant array (Positive range <>) of Integer := + (3, 2, 1); +I : Positive := 1; + begin +for LR of L loop + LR.Put; + if Put_Result /= Expected (I) then +raise Program_Error; + end if; + I := I + 1; +end loop; + end; +end Equal11; --- /dev/null new file mode 100644 +++ gcc/testsuite/gnat.dg/equal11_interface.ads @@ -0,0 +1,7 @@ +package Equal11_Interface is + + type My_Interface_Type is interface; + + procedure Put (R : in My_Interface_Type) is abstract; + +end Equal11_Interface; --- /dev/null new file mode 100644 +++ gcc/testsuite/gnat.dg/equal11_record.adb @@ -0,0 +1,10 @@ +with Ada.Text_IO; + +package body Equal11_Record is + + procedure Put (R : in My_Record_Type) is + begin +Put_Result := R.F; + end Put; + +end Equal11_Record; --- /dev/null new file mode 100644 +++ gcc/testsuite/gnat.dg/equal11_record.ads @@ -0,0 +1,21 @@ +with Ada.Containers.Doubly_Linked_Lists; +with Equal11_Interface; + +package Equal11_Record is + + use Eq
[Ada] Strengthen Locked flag
This patch strengthens the Locked flag, by Asserting that it is False on operations that might cause reallocation. No change in behavior (except in the presence of compiler bugs), so no test. Tested on x86_64-pc-linux-gnu, committed on trunk 2019-08-14 Bob Duff gcc/ada/ * table.adb: Assert that the table is not locked when increasing Last, even if it doesn't cause reallocation. In other words, assert that on operations that MIGHT cause reallocation. * table.ads: Fix comment accordingly.--- gcc/ada/table.adb +++ gcc/ada/table.adb @@ -80,6 +80,7 @@ package body Table is procedure Append (New_Val : Table_Component_Type) is begin + pragma Assert (not Locked); Set_Item (Table_Index_Type (Last_Val + 1), New_Val); end Append; @@ -120,6 +121,7 @@ package body Table is procedure Increment_Last is begin + pragma Assert (not Locked); Last_Val := Last_Val + 1; if Last_Val > Max then @@ -384,6 +386,8 @@ package body Table is procedure Set_Last (New_Val : Table_Index_Type) is begin + pragma Assert (Int (New_Val) <= Last_Val or else not Locked); + if Int (New_Val) < Last_Val then Last_Val := Int (New_Val); --- gcc/ada/table.ads +++ gcc/ada/table.ads @@ -130,14 +130,15 @@ package Table is -- First .. Last. Locked : Boolean := False; - -- Table expansion is permitted only if this switch is set to False. A - -- client may set Locked to True, in which case any attempt to expand - -- the table will cause an assertion failure. Note that while a table - -- is locked, its address in memory remains fixed and unchanging. This - -- feature is used to control table expansion during Gigi processing. - -- Gigi assumes that tables other than the Uint and Ureal tables do - -- not move during processing, which means that they cannot be expanded. - -- The Locked flag is used to enforce this restriction. + -- Increasing the value of Last is permitted only if this switch is set + -- to False. A client may set Locked to True, in which case any attempt + -- to increase the value of Last (which might expand the table) will + -- cause an assertion failure. Note that while a table is locked, its + -- address in memory remains fixed and unchanging. This feature is used + -- to control table expansion during Gigi processing. Gigi assumes that + -- tables other than the Uint and Ureal tables do not move during + -- processing, which means that they cannot be expanded. The Locked + -- flag is used to enforce this restriction. procedure Init; -- This procedure allocates a new table of size Initial (freeing any
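For concreteness, here is a hedged sketch of what the stricter assertion now catches. The instantiation and procedure below are invented, and Table is a GNAT-internal generic, so this is illustrative rather than a standalone program:

   package Int_Table is new Table
     (Table_Component_Type => Integer,
      Table_Index_Type     => Natural,
      Table_Low_Bound      => 1,
      Table_Initial        => 8,
      Table_Increment      => 100,
      Table_Name           => "Int_Table");

   procedure Demo is
   begin
      Int_Table.Append (1);      --  fine: the table is not locked
      Int_Table.Locked := True;
      Int_Table.Append (2);      --  now fails the assertion, even when no
                                 --  reallocation would actually occur
   end Demo;

Previously only the reallocation itself asserted, so a locked table could still grow within its current capacity; with this change any increase of Last is rejected.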
[Ada] Crash on quantified expression in disabled assertion
The defining identifier of a quantified expression may be the freeze point of its type. If the quantified expression appears in an assertion that is disabled, the freeze node for that type may appear in a tree that will be discarded when the enclosing pragma is elaborated. To ensure that the freeze node is reachable for subsequent uses, we must generate its freeze node explicitly when the quantified expression is analyzed. Tested on x86_64-pc-linux-gnu, committed on trunk 2019-08-14 Ed Schonberg gcc/ada/ * exp_ch4.adb (Expand_N_Quantified_Expression): Freeze explicitly the type of the loop parameter. gcc/testsuite/ * gnat.dg/assert2.adb, gnat.dg/assert2.ads: New testcase.--- gcc/ada/exp_ch4.adb +++ gcc/ada/exp_ch4.adb @@ -10337,8 +10337,30 @@ package body Exp_Ch4 is Flag : Entity_Id; Scheme: Node_Id; Stmts : List_Id; + Var : Entity_Id; begin + -- Ensure that the bound variable is properly frozen. We must do + -- this before expansion because the expression is about to be + -- converted into a loop, and resulting freeze nodes may end up + -- in the wrong place in the tree. + + if Present (Iter_Spec) then + Var := Defining_Identifier (Iter_Spec); + else + Var := Defining_Identifier (Loop_Spec); + end if; + + declare + P : Node_Id := Parent (N); + begin + while Nkind (P) in N_Subexpr loop +P := Parent (P); + end loop; + + Freeze_Before (P, Etype (Var)); + end; + -- Create the declaration of the flag which tracks the status of the -- quantified expression. Generate: --- /dev/null new file mode 100644 +++ gcc/testsuite/gnat.dg/assert2.adb @@ -0,0 +1,5 @@ +-- { dg-do compile } + +package body Assert2 is + procedure Dummy is null; +end Assert2; --- /dev/null new file mode 100644 +++ gcc/testsuite/gnat.dg/assert2.ads @@ -0,0 +1,15 @@ +package Assert2 +with SPARK_Mode +is + type Living is new Integer; + function Is_Martian (Unused : Living) return Boolean is (False); + + function Is_Green (Unused : Living) return Boolean is (True); + + pragma Assert + (for all M in Living => (if Is_Martian (M) then Is_Green (M))); + pragma Assert + (for all M in Living => (if Is_Martian (M) then not Is_Green (M))); + + procedure Dummy; +end Assert2;
[Ada] Warn about unknown condition in Compile_Time_Warning
The compiler now warns if the condition in a pragma Compile_Time_Warning or Compile_Time_Error does not have a compile-time-known value. The warning is not given for pragmas in a generic template, but is given for pragmas in an instance. The -gnatw_c and -gnatw_C switches turn the warning on and off. The default is on. Tested on x86_64-pc-linux-gnu, committed on trunk 2019-08-14 Bob Duff gcc/ada/ * sem_prag.ads, sem_prag.adb (Process_Compile_Time_Warning_Or_Error): In parameterless version, improve detection of whether we are in a generic unit to cover the case of an instance within a generic unit. (Process_Compile_Time_Warning_Or_Error): Rename the two-parameter version to be Validate_Compile_Time_Warning_Or_Error, and do not export it. Issue a warning if the condition is not known at compile time. The key point is that the warning must be given only for pragmas deferred to the back end, because the back end discovers additional values that are known at compile time. Previous changes in this ticket have enabled this by deferring to the back end without checking for special cases such as 'Size. (Validate_Compile_Time_Warning_Or_Error): Rename to be Defer_Compile_Time_Warning_Error_To_BE. * warnsw.ads, warnsw.adb (Warn_On_Unknown_Compile_Time_Warning): Add new switches -gnatw_c and -gnatw_C to control the above warning. * doc/gnat_ugn/building_executable_programs_with_gnat.rst: Document new switches. * gnat_ugn.texi: Regenerate. gcc/testsuite/ * gnat.dg/warn27.adb: New testcase.
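Since the testcase travels in the gzipped attachment, here is a hedged sketch (with invented names, not the attached warn27.adb) of the kind of condition the new warning is about:

   procedure Warn_Demo (X : Integer) is
      pragma Compile_Time_Warning (X > 0, "X is positive");
      --  The condition depends on a parameter, so it does not have a
      --  compile-time-known value: with -gnatw_c (the default) the
      --  compiler now warns that the condition cannot be evaluated.

      pragma Compile_Time_Warning (Integer'Size < 32, "small Integer");
      --  Compile-time-known condition: no new warning here.
   begin
      null;
   end Warn_Demo;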
[Ada] Fix spurious ownership error in GNATprove
Like Is_Path_Expression, function Is_Subpath_Expression should consider the possibility that the subpath is a type conversion or type qualification over the actual subpath node. This avoids spurious ownership errors in GNATprove. There is no impact on compilation. Tested on x86_64-pc-linux-gnu, committed on trunk 2019-08-14 Yannick Moy gcc/ada/ * sem_spark.adb (Is_Subpath_Expression): Take into account conversion and qualification.--- gcc/ada/sem_spark.adb +++ gcc/ada/sem_spark.adb @@ -4266,6 +4266,12 @@ package body Sem_SPARK is is begin return Is_Path_Expression (Expr, Is_Traversal) + +or else (Nkind_In (Expr, N_Qualified_Expression, + N_Type_Conversion, + N_Unchecked_Type_Conversion) + and then Is_Subpath_Expression (Expression (Expr))) + or else (Nkind (Expr) = N_Attribute_Reference and then (Get_Attribute_Id (Attribute_Name (Expr)) = @@ -4276,7 +4282,8 @@ package body Sem_SPARK is or else Get_Attribute_Id (Attribute_Name (Expr)) = Attribute_Image)) - or else Nkind (Expr) = N_Op_Concat; + +or else Nkind (Expr) = N_Op_Concat; end Is_Subpath_Expression; ---
[Ada] Alignment may be specified as zero
An Alignment clause or an aspect_specification for Alignment may be specified as 0, which is treated the same as 1. Tested on x86_64-pc-linux-gnu, committed on trunk 2019-08-14 Bob Duff gcc/ada/ * sem_ch13.adb (Get_Alignment_Value): Return 1 for Alignment 0, and do not give an error. * doc/gnat_rm/representation_clauses_and_pragmas.rst: Update the corresponding documentation. * gnat_rm.texi: Regenerate. gcc/testsuite/ * gnat.dg/alignment15.adb: New testcase.--- gcc/ada/doc/gnat_rm/representation_clauses_and_pragmas.rst +++ gcc/ada/doc/gnat_rm/representation_clauses_and_pragmas.rst @@ -30,9 +30,11 @@ Alignment Clauses .. index:: Alignment Clause -GNAT requires that all alignment clauses specify a power of 2, and all -default alignments are always a power of 2. The default alignment -values are as follows: +GNAT requires that all alignment clauses specify 0 or a power of 2, and +all default alignments are always a power of 2. Specifying 0 is the +same as specifying 1. + +The default alignment values are as follows: * *Elementary Types*. @@ -610,23 +612,23 @@ alignment of the type (this is true for all types). In some cases the end record; -On a typical 32-bit architecture, the X component will occupy four bytes -and the Y component will occupy one byte, for a total of 5 bytes. As a -result ``R'Value_Size`` will be 40 (bits) since this is the minimum size -required to store a value of this type. For example, it is permissible -to have a component of type R in an array whose component size is -specified to be 40 bits. - -However, ``R'Object_Size`` will be 64 (bits). The difference is due to -the alignment requirement for objects of the record type. The X -component will require four-byte alignment because that is what type -Integer requires, whereas the Y component, a Character, will only -require 1-byte alignment. Since the alignment required for X is the -greatest of all the components' alignments, that is the alignment -required for the enclosing record type, i.e., 4 bytes or 32 bits. As -indicated above, the actual object size must be rounded up so that it is -a multiple of the alignment value. Therefore, 40 bits rounded up to the -next multiple of 32 yields 64 bits. +On a typical 32-bit architecture, the X component will occupy four bytes +and the Y component will occupy one byte, for a total of 5 bytes. As a +result ``R'Value_Size`` will be 40 (bits) since this is the minimum size +required to store a value of this type. For example, it is permissible +to have a component of type R in an array whose component size is +specified to be 40 bits. + +However, ``R'Object_Size`` will be 64 (bits). The difference is due to +the alignment requirement for objects of the record type. The X +component will require four-byte alignment because that is what type +Integer requires, whereas the Y component, a Character, will only +require 1-byte alignment. Since the alignment required for X is the +greatest of all the components' alignments, that is the alignment +required for the enclosing record type, i.e., 4 bytes or 32 bits. As +indicated above, the actual object size must be rounded up so that it is +a multiple of the alignment value. Therefore, 40 bits rounded up to the +next multiple of 32 yields 64 bits. For all other types, the ``Object_Size`` and ``Value_Size`` are the same (and equivalent to the RM attribute ``Size``). 
--- gcc/ada/gnat_rm.texi +++ gcc/ada/gnat_rm.texi @@ -21,7 +21,7 @@ @copying @quotation -GNAT Reference Manual , Jul 31, 2019 +GNAT Reference Manual , Aug 01, 2019 AdaCore @@ -18369,9 +18369,11 @@ and this section describes the additional capabilities provided. @geindex Alignment Clause -GNAT requires that all alignment clauses specify a power of 2, and all -default alignments are always a power of 2. The default alignment -values are as follows: +GNAT requires that all alignment clauses specify 0 or a power of 2, and +all default alignments are always a power of 2. Specifying 0 is the +same as specifying 1. + +The default alignment values are as follows: @itemize * --- gcc/ada/sem_ch13.adb +++ gcc/ada/sem_ch13.adb @@ -11509,7 +11509,7 @@ package body Sem_Ch13 is if Align = No_Uint then return No_Uint; - elsif Align <= 0 then + elsif Align < 0 then -- This error is suppressed in ASIS mode to allow for different ASIS -- back ends or ASIS-based tools to query the illegal clause. @@ -11520,6 +11520,11 @@ package body Sem_Ch13 is return No_Uint; + -- If Alignment is specified to be 0, we treat it the same as 1 + + elsif Align = 0 then + return Uint_1; + else for J in Int range 0 .. 64 loop declare --- /dev/null new file mode 100644 +++ gcc/testsuite/gnat.dg/alignment15.adb @@ -0,0 +1,17 @@ +-- { dg-compile } + +procedure Alignment15 is + type T0 is record + X
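As a minimal sketch of the relaxed rule (an invented example, separate from the attached alignment15.adb):

   procedure Align_Demo is
      type Byte_Pair is record
         A, B : Character;
      end record;
      for Byte_Pair'Alignment use 0;   --  now accepted and treated as 1

      type Word is new Integer with Alignment => 0;   --  aspect form, ditto
   begin
      null;
   end Align_Demo;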
[Ada] Incorrect error on inline protected function
This patch fixes a bug where if a protected function has a pragma Inline, and has no local variables, and the body consists of a single extended_return_statement, and the result type is an indefinite composite subtype, and inlining is enabled, the compiler gives an error, even though the program is legal. Tested on x86_64-pc-linux-gnu, committed on trunk 2019-08-14 Bob Duff gcc/ada/ * inline.adb (Check_And_Split_Unconstrained_Function): Ignore protected functions to get rid of spurious error. The transformation done by this procedure triggers legality errors in the generated code in this case. gcc/testsuite/ * gnat.dg/inline19.adb, gnat.dg/inline19.ads: New testcase.--- gcc/ada/inline.adb +++ gcc/ada/inline.adb @@ -2041,6 +2041,8 @@ package body Inline is Original_Body : Node_Id; Body_To_Analyze : Node_Id; + -- Start of processing for Build_Body_To_Inline + begin pragma Assert (Current_Scope = Spec_Id); @@ -2448,6 +2450,18 @@ package body Inline is elsif Present (Body_To_Inline (Decl)) then return; + -- Do not generate a body to inline for protected functions, because the + -- transformation generates a call to a protected procedure, causing + -- spurious errors. We don't inline protected operations anyway, so + -- this is no loss. We might as well ignore intrinsics and foreign + -- conventions as well -- just allow Ada conventions. + + elsif not (Convention (Spec_Id) = Convention_Ada +or else Convention (Spec_Id) = Convention_Ada_Pass_By_Copy +or else Convention (Spec_Id) = Convention_Ada_Pass_By_Reference) + then + return; + -- Check excluded declarations elsif Present (Declarations (N)) --- /dev/null new file mode 100644 +++ gcc/testsuite/gnat.dg/inline19.adb @@ -0,0 +1,17 @@ +-- { dg-do compile } +-- { dg-options "-O2" } + +package body Inline19 is + + S : String := "Hello"; + + protected body P is + function F return String is + begin + return Result : constant String := S do +null; + end return; + end F; + end P; + +end Inline19; --- /dev/null new file mode 100644 +++ gcc/testsuite/gnat.dg/inline19.ads @@ -0,0 +1,8 @@ +package Inline19 is + + protected P is + function F return String; + pragma Inline (F); + end P; + +end Inline19;
[Ada] Check SPARK restriction on Old/Loop_Entry with pointers
SPARK RM rule 3.10(14) restricts the use of Old and Loop_Entry attributes on prefixes of an owning or observing type (i.e. a type with access inside). There is no impact on compilation. Tested on x86_64-pc-linux-gnu, committed on trunk 2019-08-14 Yannick Moy gcc/ada/ * sem_spark.adb (Check_Old_Loop_Entry): New procedure to check correct use of Old and Loop_Entry. (Check_Node): Check subprogram contracts. (Check_Pragma): Check Loop_Variant. (Check_Safe_Pointers): Apply checking to library-level subprogram declarations as well, in order to check their contract. --- gcc/ada/sem_spark.adb +++ gcc/ada/sem_spark.adb @@ -663,6 +663,9 @@ package body Sem_SPARK is procedure Check_Node (N : Node_Id); -- Main traversal procedure to check safe pointer usage + procedure Check_Old_Loop_Entry (N : Node_Id); + -- Check SPARK RM 3.10(14) regarding 'Old and 'Loop_Entry + procedure Check_Package_Body (Pack : Node_Id); procedure Check_Package_Spec (Pack : Node_Id); @@ -2583,6 +2586,43 @@ package body Sem_SPARK is procedure Check_Node (N : Node_Id) is + + procedure Check_Subprogram_Contract (N : Node_Id); + -- Check the postcondition-like contracts for use of 'Old + + --- + -- Check_Subprogram_Contract -- + --- + + procedure Check_Subprogram_Contract (N : Node_Id) is + begin + if Nkind (N) = N_Subprogram_Declaration + or else Acts_As_Spec (N) + then +declare + E: constant Entity_Id := Unique_Defining_Entity (N); + Post : constant Node_Id := + Get_Pragma (E, Pragma_Postcondition); + Cases: constant Node_Id := + Get_Pragma (E, Pragma_Contract_Cases); +begin + Check_Old_Loop_Entry (Post); + Check_Old_Loop_Entry (Cases); +end; + + elsif Nkind (N) = N_Subprogram_Body then +declare + E: constant Entity_Id := Defining_Entity (N); + Ref_Post : constant Node_Id := + Get_Pragma (E, Pragma_Refined_Post); +begin + Check_Old_Loop_Entry (Ref_Post); +end; + end if; + end Check_Subprogram_Contract; + + -- Start of processing for Check_Node + begin case Nkind (N) is when N_Declaration => @@ -2602,14 +2642,17 @@ package body Sem_SPARK is Check_Package_Body (N); end if; - when N_Subprogram_Body -| N_Entry_Body -| N_Task_Body - => + when N_Subprogram_Body => if not Is_Generic_Unit (Unique_Defining_Entity (N)) then + Check_Subprogram_Contract (N); Check_Callable_Body (N); end if; + when N_Entry_Body +| N_Task_Body + => +Check_Callable_Body (N); + when N_Protected_Body => Check_List (Declarations (N)); @@ -2622,6 +2665,9 @@ package body Sem_SPARK is when N_Pragma => Check_Pragma (N); + when N_Subprogram_Declaration => +Check_Subprogram_Contract (N); + -- Ignored constructs for pointer checking when N_Abstract_Subprogram_Declaration @@ -2655,7 +2701,6 @@ package body Sem_SPARK is | N_Procedure_Instantiation | N_Raise_xxx_Error | N_Record_Representation_Clause -| N_Subprogram_Declaration | N_Subprogram_Renaming_Declaration | N_Task_Type_Declaration | N_Use_Package_Clause @@ -2677,6 +2722,65 @@ package body Sem_SPARK is end case; end Check_Node; + -- + -- Check_Old_Loop_Entry -- + -- + + procedure Check_Old_Loop_Entry (N : Node_Id) is + + function Check_Attribute (N : Node_Id) return Traverse_Result; + + - + -- Check_Attribute -- + - + + function Check_Attribute (N : Node_Id) return Traverse_Result is + Attr_Id : Attribute_Id; + Aname : Name_Id; + Pref: Node_Id; + + begin + if Nkind (N) = N_Attribute_Reference then +Attr_Id := Get_Attribute_Id (Attribute_Name (N)); +Aname := Attribute_Name (N); + +if Attr_Id = Attribute_Old + or else Attr_Id = Attribute_Loop_Entry +then + Pref := Prefix (N); + + if Is_Deep (Etype (Pref)) then + if 
Nkind (Pref) /= N_Function_Call then + if Emit_Messages then +Error_Msg_Name_1 := Aname; +Error_Msg_N + ("prefix of % attribute must be a function call " +
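To make the restriction concrete, here is a hedged sketch (invented names) of what SPARK RM 3.10(14) rules out:

   package Own_Demo with SPARK_Mode is
      type Int_Acc is access Integer;

      procedure Incr (P : in out Int_Acc)
        with Post => P.all = P.all'Old + 1;
      --  OK: the prefix P.all has type Integer, which is not deep

      --  A postcondition such as "Post => P'Old.all = 0" would now be
      --  flagged: the prefix P has an owning type and is not a function
      --  call, so 'Old would have to copy owned memory at entry.
   end Own_Demo;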
[Ada] Improve performance of Containers.Functional_Base
This patch modifies the implementation of Functional_Base to damp the cost of its subprograms at runtime in specific cases. Instead of copying the entire underlying array to create a new container, containers can share the same Array_Base attribute. Performance on common use cases of formal and functional containers is improved with this patch. Tested on x86_64-pc-linux-gnu, committed on trunk 2019-08-14 Joffrey Huguet gcc/ada/ * libgnat/a-cofuba.ads: Add a Length attribute to type Container. Add a type Array_Base which replaces the previous Elements attribute of Container. (Content_Init): New subprogram. It is used to initialize the Base attribute of Container. * libgnat/a-cofuba.adb (Resize): New subprogram. It is used to resize the underlying array of a container if necessary. (=, <=, Find, Get, Intersection, Length, Num_Overlaps, Set, Union): Update to match changes in type declarations. (Add): Modify body to damp the time and space cost in a specific case. (Content_Init): New subprogram. It is used to initialize the Base attribute of Container. (Remove): Modify body to damp the time and space cost in a specific case.--- gcc/ada/libgnat/a-cofuba.adb +++ gcc/ada/libgnat/a-cofuba.adb @@ -30,6 +30,7 @@ -- pragma Ada_2012; +with Ada.Unchecked_Deallocation; package body Ada.Containers.Functional_Base with SPARK_Mode => Off is @@ -47,18 +48,22 @@ package body Ada.Containers.Functional_Base with SPARK_Mode => Off is -- Search a container C for an element equal to E.all, returning the -- position in the underlying array. + procedure Resize (Base : Array_Base_Access); + -- Resize the underlying array if needed so that it can contain one more + -- element. + - -- "=" -- - function "=" (C1 : Container; C2 : Container) return Boolean is begin - if C1.Elements'Length /= C2.Elements'Length then + if C1.Length /= C2.Length then return False; end if; - for I in C1.Elements'Range loop - if C1.Elements (I).all /= C2.Elements (I).all then + for I in 1 .. C1.Length loop + if C1.Base.Elements (I).all /= C2.Base.Elements (I).all then return False; end if; end loop; @@ -72,8 +77,8 @@ package body Ada.Containers.Functional_Base with SPARK_Mode => Off is function "<=" (C1 : Container; C2 : Container) return Boolean is begin - for I in C1.Elements'Range loop - if Find (C2, C1.Elements (I)) = 0 then + for I in 1 .. C1.Length loop + if Find (C2, C1.Base.Elements (I)) = 0 then return False; end if; end loop; @@ -90,31 +95,58 @@ package body Ada.Containers.Functional_Base with SPARK_Mode => Off is I : Index_Type; E : Element_Type) return Container is - A : constant Element_Array_Access := -new Element_Array'(1 .. C.Elements'Last + 1 => <>); - P : Count_Type := 0; - begin - for J in 1 .. C.Elements'Last + 1 loop - if J /= To_Count (I) then -P := P + 1; -A (J) := C.Elements (P); - else -A (J) := new Element_Type'(E); - end if; - end loop; - - return Container'(Elements => A); + if To_Count (I) = C.Length + 1 and then C.Length = C.Base.Max_Length then + Resize (C.Base); + C.Base.Max_Length := C.Base.Max_Length + 1; + C.Base.Elements (C.Base.Max_Length) := new Element_Type'(E); + + return Container'(Length => C.Base.Max_Length, Base => C.Base); + else + declare +A : constant Array_Base_Access := Content_Init (C.Length); +P : Count_Type := 0; + begin +A.Max_Length := C.Length + 1; +for J in 1 .. 
C.Length + 1 loop + if J /= To_Count (I) then + P := P + 1; + A.Elements (J) := C.Base.Elements (P); + else + A.Elements (J) := new Element_Type'(E); + end if; +end loop; + +return Container'(Length => A.Max_Length, + Base => A); + end; + end if; end Add; + -- + -- Content_Init -- + -- + + function Content_Init (L : Count_Type := 0) return Array_Base_Access + is + Max_Init : constant Count_Type := 100; + Size : constant Count_Type := +(if L < Count_Type'Last - Max_Init then L + Max_Init + else Count_Type'Last); + Elements : constant Element_Array_Access := +new Element_Array'(1 .. Size => <>); + begin + return new Array_Base'(Max_Length => 0, Elements => Elements); + end Content_Init; + -- -- Find -- -- function Find (C :
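As a rough user-level illustration of the payoff, assuming the API of Ada.Containers.Functional_Vectors (which is built on Functional_Base):

   with Ada.Containers.Functional_Vectors;

   procedure Seq_Demo is
      package Int_Seqs is new Ada.Containers.Functional_Vectors
        (Index_Type => Positive, Element_Type => Integer);
      use Int_Seqs;

      S : Sequence;
   begin
      for I in 1 .. 1_000 loop
         --  Each Add appends at the end, so the new container can share
         --  the previous one's base array instead of copying I elements.
         S := Add (S, I);
      end loop;
   end Seq_Demo;

Building a sequence this way was previously quadratic in the number of elements; with the shared Array_Base it is amortized linear.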
[Ada] Fix internal error on inlined subprogram instance
This fixes a long-standing oddity in the procedure analyzing the instantiation of a generic subprogram, which would set the Is_Generic_Instance flag on the enclosing package generated for the instantiation but only to reset it a few lines below. Now this flag is relied upon by the machinery which computes the set of public entities to be exposed by a package. Tested on x86_64-pc-linux-gnu, committed on trunk 2019-08-14 Eric Botcazou gcc/ada/ * sem_ch12.adb (Analyze_Instance_And_Renamings): Do not reset the Is_Generic_Instance flag previously set on the package generated for the instantiation of a generic subprogram. gcc/testsuite/ * gnat.dg/generic_inst11.adb, gnat.dg/generic_inst11_pkg.adb, gnat.dg/generic_inst11_pkg.ads: New testcase.--- gcc/ada/sem_ch12.adb +++ gcc/ada/sem_ch12.adb @@ -5264,10 +5264,6 @@ package body Sem_Ch12 is Analyze (Pack_Decl); Check_Formal_Packages (Pack_Id); - Set_Is_Generic_Instance (Pack_Id, False); - - -- Why do we clear Is_Generic_Instance??? We set it 20 lines - -- above??? -- Body of the enclosing package is supplied when instantiating the -- subprogram body, after semantic analysis is completed. --- /dev/null new file mode 100644 +++ gcc/testsuite/gnat.dg/generic_inst11.adb @@ -0,0 +1,9 @@ +-- { dg-do compile } +-- { dg-options "-O -gnatn" } + +with Generic_Inst11_Pkg; + +procedure Generic_Inst11 is +begin + Generic_Inst11_Pkg.Proc; +end; --- /dev/null new file mode 100644 +++ gcc/testsuite/gnat.dg/generic_inst11_pkg.adb @@ -0,0 +1,21 @@ +with System; + +package body Generic_Inst11_Pkg is + + Data : Integer; + + generic + Reg_Address : System.Address; + procedure Inner_G with Inline; + + procedure Inner_G is + Reg : Integer with Address => Reg_Address; + begin + null; + end; + + procedure My_Inner_G is new Inner_G (Data'Address); + + procedure Proc renames My_Inner_G; + +end Generic_Inst11_Pkg; --- /dev/null new file mode 100644 +++ gcc/testsuite/gnat.dg/generic_inst11_pkg.ads @@ -0,0 +1,5 @@ +package Generic_Inst11_Pkg is + + procedure Proc with Inline; + +end Generic_Inst11_Pkg;
[Ada] Compiler speedup with inlining across units
This change is aimed at speeding up the inlining across units done by the Ada compiler when -gnatn is specified and in the presence of units instantiating a lot of generic packages. The current implementation is as follows: when a generic package is being instantiated, the compiler scans its spec for the presence of subprograms with an aspect/pragma Inline and, upon finding one, schedules the instantiation of its body. That's not very efficient because the compiler doesn't know yet if one of those inlined subprograms will eventually be called from the main unit. The new implementation arranges for the compiler to instantiate the body on demand, i.e. when it encounters a call to one of the inlined subprograms. That's still not optimal because, at this point, the compiler has not yet computed whether the call itself is reachable from the main unit (it will do this computation at the very end of the processing, just before sending the inlined units to the code generator), but that's nevertheless net progress. The patch also enhances the -gnatd.j option to make it output the list of instances "inlined" this way. The following package is a simple example:

with Q;
procedure P is
begin
   Q.Proc;
end;

package Q is
   procedure Proc;
   pragma Inline (Proc);
end Q;

with G;
package body Q is
   package My_G is new G (1);

   procedure Proc is
      Val : constant Integer := My_G.Func;
   begin
      if Val /= 1 then
         raise Program_Error;
      end if;
   end;
end Q;

generic
   Value : Integer;
package G is
   function Func return Integer;
   pragma Inline (Func);
end G;

package body G is
   function Func return Integer is
   begin
      return Value;
   end;
end G;

Tested on x86_64-pc-linux-gnu, committed on trunk 2019-08-14 Eric Botcazou gcc/ada/ * einfo.ads (Is_Called): Document new usage on E_Package entities. * einfo.adb (Is_Called): Accept E_Package entities. (Set_Is_Called): Likewise. * exp_ch6.adb (Expand_Call_Helper): Move code dealing with instances for back-end inlining to Add_Inlined_Body. * inline.ads: Remove with clauses for Alloc and Table. (Pending_Instantiations): Move to... * inline.adb: Add with clauses for Alloc, Uintp, Table and GNAT.HTable. (Backend_Instances): New variable. (Pending_Instantiations): ...here. (Called_Pending_Instantiations): New table. (Node_Table_Size): New constant. (Node_Header_Num): New subtype. (Node_Hash): New function. (To_Pending_Instantiations): New hash table. (Add_Inlined_Body): Bail out early for subprograms in the main unit or subunit. Likewise if the Is_Called flag is set. If the subprogram is an instance, invoke Add_Inlined_Instance. Call Set_Is_Called earlier. If the subprogram is within an instance, invoke Add_Inlined_Instance. Also deal with the case where the call itself is within an instance. (Add_Inlined_Instance): New procedure. (Add_Inlined_Subprogram): Remove conditions always fulfilled. (Add_Pending_Instantiation): Move the defence against a ludicrous number of instantiations to here. When back-end inlining is enabled, associate an instantiation with its index in table and mark a few selected kinds of instantiations as always needed. (Initialize): Set Backend_Instances to No_Elist. (Instantiate_Body): New procedure doing the work extracted from... (Instantiate_Bodies): ...here. When back-end inlining is enabled, loop over Called_Pending_Instantiations instead of Pending_Instantiations. (Is_Nested): Minor tweak. (List_Inlining_Info): Also list the contents of Backend_Instances. * sem_ch12.adb (Might_Inline_Subp): Return early if Is_Inlined is set and otherwise set it before returning true. 
(Analyze_Package_Instantiation): Remove the defence against a ludicrous number of instantiations. Invoke Remove_Dead_Instance instead of doing the removal manually if there is a guaranteed ABE.
Re: [PATCH 2/2] Add more entries to the C++ get_std_name_hint array
On 13/08/19 16:08 -0400, Jason Merrill wrote: On 8/13/19 9:36 AM, Jonathan Wakely wrote: This adds some commonly-used C++11/14 names, and some new C++17/20 names. The latter aren't available when using the -std=gnu++14 default, so the fix-it suggesting to use a newer dialect is helpful. * name-lookup.c (get_std_name_hint): Add more entries. Tested x86_64-linux. OK for trunk? OK. I realised as I was about to commit it that cxx17 is the wrong dialect for remove_cvref and remove_cvref_t, so I corrected them to cxx2a before committing it. (I've tried to use remove_cvref_t in C++17 a few times, so this diagnostic should help me!)
[PING] [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
Hi! I'd like to ping for this patch: https://gcc.gnu.org/ml/gcc-patches/2019-08/msg00546.html Thanks Bernd.
[committed][AArch64] Add SVE conditional integer unary patterns
This patch adds patterns to match conditional unary operations on integers. At the moment we rely on combine to merge separate arithmetic and vcond_mask operations, and since the latter doesn't accept zero operands, we miss out on the opportunity to use the movprfx /z alternative. (This alternative is tested by the ACLE patches though.) Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf. Applied as r274476. Richard 2019-08-14 Richard Sandiford Kugan Vivekanandarajah gcc/ * config/aarch64/aarch64-sve.md (*cond__2): New pattern. (*cond__any): Likewise. gcc/testsuite/ * gcc.target/aarch64/sve/cond_unary_1.c: New test. * gcc.target/aarch64/sve/cond_unary_1_run.c: Likewise. * gcc.target/aarch64/sve/cond_unary_2.c: Likewise. * gcc.target/aarch64/sve/cond_unary_2_run.c: Likewise. * gcc.target/aarch64/sve/cond_unary_3.c: Likewise. * gcc.target/aarch64/sve/cond_unary_3_run.c: Likewise. * gcc.target/aarch64/sve/cond_unary_4.c: Likewise. * gcc.target/aarch64/sve/cond_unary_4_run.c: Likewise. Index: gcc/config/aarch64/aarch64-sve.md === --- gcc/config/aarch64/aarch64-sve.md 2019-08-14 10:28:46.145666799 +0100 +++ gcc/config/aarch64/aarch64-sve.md 2019-08-14 11:47:53.151171700 +0100 @@ -1454,6 +1454,45 @@ (define_insn "*2" "\t%0., %1/m, %2." ) +;; Predicated integer unary arithmetic, merging with the first input. +(define_insn "*cond__2" + [(set (match_operand:SVE_I 0 "register_operand" "=w, ?&w") + (unspec:SVE_I + [(match_operand: 1 "register_operand" "Upl, Upl") + (SVE_INT_UNARY:SVE_I +(match_operand:SVE_I 2 "register_operand" "0, w")) + (match_dup 2)] + UNSPEC_SEL))] + "TARGET_SVE" + "@ + \t%0., %1/m, %0. + movprfx\t%0, %2\;\t%0., %1/m, %2." + [(set_attr "movprfx" "*,yes")] +) + +;; Predicated integer unary arithmetic, merging with an independent value. +;; +;; The earlyclobber isn't needed for the first alternative, but omitting +;; it would only help the case in which operands 2 and 3 are the same, +;; which is handled above rather than here. Marking all the alternatives +;; as earlyclobber helps to make the instruction more regular to the +;; register allocator. +(define_insn "*cond__any" + [(set (match_operand:SVE_I 0 "register_operand" "=&w, ?&w, ?&w") + (unspec:SVE_I + [(match_operand: 1 "register_operand" "Upl, Upl, Upl") + (SVE_INT_UNARY:SVE_I +(match_operand:SVE_I 2 "register_operand" "w, w, w")) + (match_operand:SVE_I 3 "aarch64_simd_reg_or_zero" "0, Dz, w")] + UNSPEC_SEL))] + "TARGET_SVE && !rtx_equal_p (operands[2], operands[3])" + "@ + \t%0., %1/m, %2. + movprfx\t%0., %1/z, %2.\;\t%0., %1/m, %2. + movprfx\t%0, %3\;\t%0., %1/m, %2." + [(set_attr "movprfx" "*,yes,yes")] +) + ;; - ;; [INT] Logical inverse ;; - Index: gcc/testsuite/gcc.target/aarch64/sve/cond_unary_1.c === --- /dev/null 2019-07-30 08:53:31.317691683 +0100 +++ gcc/testsuite/gcc.target/aarch64/sve/cond_unary_1.c 2019-08-14 11:47:53.151171700 +0100 @@ -0,0 +1,44 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize" } */ + +#include + +#define abs(A) ((A) < 0 ? -(A) : (A)) +#define neg(A) (-(A)) + +#define DEF_LOOP(TYPE, OP) \ + void __attribute__ ((noipa)) \ + test_##TYPE##_##OP (TYPE *__restrict r, TYPE *__restrict a, \ + TYPE *__restrict pred, int n) \ + {\ +for (int i = 0; i < n; ++i)\ + r[i] = pred[i] ? 
OP (a[i]) : a[i]; \ + } + +#define TEST_TYPE(T, TYPE) \ + T (TYPE, abs) \ + T (TYPE, neg) + +#define TEST_ALL(T) \ + TEST_TYPE (T, int8_t) \ + TEST_TYPE (T, int16_t) \ + TEST_TYPE (T, int32_t) \ + TEST_TYPE (T, int64_t) + +TEST_ALL (DEF_LOOP) + +/* { dg-final { scan-assembler-times {\tabs\tz[0-9]+\.b, p[0-7]/m,} 1 } } */ +/* { dg-final { scan-assembler-times {\tabs\tz[0-9]+\.h, p[0-7]/m,} 1 } } */ +/* { dg-final { scan-assembler-times {\tabs\tz[0-9]+\.s, p[0-7]/m,} 1 } } */ +/* { dg-final { scan-assembler-times {\tabs\tz[0-9]+\.d, p[0-7]/m,} 1 } } */ + +/* { dg-final { scan-assembler-times {\tneg\tz[0-9]+\.b, p[0-7]/m,} 1 } } */ +/* { dg-final { scan-assembler-times {\tneg\tz[0-9]+\.h, p[0-7]/m,} 1 } } */ +/* { dg-final { scan-assembler-times {\tneg\tz[0-9]+\.s, p[0-7]/m,} 1 } } */ +/* { dg-final { scan-assembler-times {\tneg\tz[0-9]+\.d, p[0-7]/m,} 1 } } */ + +/* { dg-final { scan-assembler-not {\tmov\tz} } } */ +/* { dg-final { scan-assembler-not
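As a concrete illustration (a sketch, not one of the committed tests; assumed options -O2 -ftree-vectorize on an SVE target), a loop that merges with an independent value and should therefore map onto the second pattern, using its movprfx alternative:

    #include <stdint.h>

    // Expected shape: movprfx of the fallback vector, then a predicated
    // neg z0.s, p0/m, z1.s under the loop predicate.
    void
    cond_neg_any (int32_t *__restrict r, int32_t *__restrict a,
                  int32_t *__restrict b, int32_t *__restrict pred, int n)
    {
      for (int i = 0; i < n; ++i)
        r[i] = pred[i] ? -a[i] : b[i];
    }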
[committed][AArch64] Add SVE conditional floating-point unary patterns
This patch adds patterns to match conditional unary operations on floating-point modes. At the moment we rely on combine to merge separate arithmetic and vcond_mask operations, and since the latter doesn't accept zero operands, we miss out on the opportunity to use the movprfx /z alternative. (This alternative is tested by the ACLE patches though.) Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf. Applied as r274477. Richard 2019-08-14 Richard Sandiford Kugan Vivekanandarajah gcc/ * config/aarch64/aarch64-sve.md (*cond__2): New pattern. (*cond__any): Likewise. gcc/testsuite/ * gcc.target/aarch64/sve/cond_unary_1.c: Add tests for floating-point types. * gcc.target/aarch64/sve/cond_unary_2.c: Likewise. * gcc.target/aarch64/sve/cond_unary_3.c: Likewise. * gcc.target/aarch64/sve/cond_unary_4.c: Likewise. Index: gcc/config/aarch64/aarch64-sve.md === --- gcc/config/aarch64/aarch64-sve.md 2019-08-14 11:48:45.114792555 +0100 +++ gcc/config/aarch64/aarch64-sve.md 2019-08-14 11:51:07.537753363 +0100 @@ -1624,6 +1624,62 @@ (define_insn "*2" "\t%0., %1/m, %2." ) +;; Predicated floating-point unary arithmetic, merging with the first input. +(define_insn_and_rewrite "*cond__2" + [(set (match_operand:SVE_F 0 "register_operand" "=w, ?&w") + (unspec:SVE_F + [(match_operand: 1 "register_operand" "Upl, Upl") + (unspec:SVE_F +[(match_operand 3) + (match_operand:SI 4 "aarch64_sve_gp_strictness") + (match_operand:SVE_F 2 "register_operand" "0, w")] +SVE_COND_FP_UNARY) + (match_dup 2)] + UNSPEC_SEL))] + "TARGET_SVE && aarch64_sve_pred_dominates_p (&operands[3], operands[1])" + "@ + \t%0., %1/m, %0. + movprfx\t%0, %2\;\t%0., %1/m, %2." + "&& !rtx_equal_p (operands[1], operands[3])" + { +operands[3] = copy_rtx (operands[1]); + } + [(set_attr "movprfx" "*,yes")] +) + +;; Predicated floating-point unary arithmetic, merging with an independent +;; value. +;; +;; The earlyclobber isn't needed for the first alternative, but omitting +;; it would only help the case in which operands 2 and 3 are the same, +;; which is handled above rather than here. Marking all the alternatives +;; as earlyclobber helps to make the instruction more regular to the +;; register allocator. +(define_insn_and_rewrite "*cond__any" + [(set (match_operand:SVE_F 0 "register_operand" "=&w, ?&w, ?&w") + (unspec:SVE_F + [(match_operand: 1 "register_operand" "Upl, Upl, Upl") + (unspec:SVE_F +[(match_operand 4) + (match_operand:SI 5 "aarch64_sve_gp_strictness") + (match_operand:SVE_F 2 "register_operand" "w, w, w")] +SVE_COND_FP_UNARY) + (match_operand:SVE_F 3 "aarch64_simd_reg_or_zero" "0, Dz, w")] + UNSPEC_SEL))] + "TARGET_SVE + && !rtx_equal_p (operands[2], operands[3]) + && aarch64_sve_pred_dominates_p (&operands[4], operands[1])" + "@ + \t%0., %1/m, %2. + movprfx\t%0., %1/z, %2.\;\t%0., %1/m, %2. + movprfx\t%0, %3\;\t%0., %1/m, %2." + "&& !rtx_equal_p (operands[1], operands[4])" + { +operands[4] = copy_rtx (operands[1]); + } + [(set_attr "movprfx" "*,yes,yes")] +) + ;; - ;; [PRED] Inverse ;; - Index: gcc/testsuite/gcc.target/aarch64/sve/cond_unary_1.c === --- gcc/testsuite/gcc.target/aarch64/sve/cond_unary_1.c 2019-08-14 11:48:45.114792555 +0100 +++ gcc/testsuite/gcc.target/aarch64/sve/cond_unary_1.c 2019-08-14 11:51:07.537753363 +0100 @@ -15,15 +15,22 @@ #define DEF_LOOP(TYPE, OP) \ r[i] = pred[i] ? 
OP (a[i]) : a[i]; \ } -#define TEST_TYPE(T, TYPE) \ +#define TEST_INT_TYPE(T, TYPE) \ T (TYPE, abs) \ T (TYPE, neg) +#define TEST_FLOAT_TYPE(T, TYPE, SUFFIX) \ + T (TYPE, __builtin_fabs##SUFFIX) \ + T (TYPE, neg) + #define TEST_ALL(T) \ - TEST_TYPE (T, int8_t) \ - TEST_TYPE (T, int16_t) \ - TEST_TYPE (T, int32_t) \ - TEST_TYPE (T, int64_t) + TEST_INT_TYPE (T, int8_t) \ + TEST_INT_TYPE (T, int16_t) \ + TEST_INT_TYPE (T, int32_t) \ + TEST_INT_TYPE (T, int64_t) \ + TEST_FLOAT_TYPE (T, _Float16, f16) \ + TEST_FLOAT_TYPE (T, float, f) \ + TEST_FLOAT_TYPE (T, double, ) TEST_ALL (DEF_LOOP) @@ -37,6 +44,14 @@ TEST_ALL (DEF_LOOP) /* { dg-final { scan-assembler-times {\tneg\tz[0-9]+\.s, p[0-7]/m,} 1 } } */ /* { dg-final { scan-assembler-times {\tneg\tz[0-9]+\.d, p[0-7]/m,} 1 } } */ +/* { dg-final { scan-assembler-times {\tfabs\tz[0-9]+\.h, p[0-7]/m,} 1 } } */ +/* { dg-final { scan-assembler-times {\tfabs\tz[0-9]+\.s, p[0-7]/m,} 1 } } */ +/* { dg-final { scan-assembler-times
[committed][AArch64] Add SVE conditional conversion patterns
This patch adds patterns to match conditional conversions between integers and like-sized floats. The patterns are actually more general than that, but the other combinations can only be tested via the ACLE. Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf. Applied as r274478. Richard 2019-08-14 Richard Sandiford gcc/ * config/aarch64/aarch64-sve.md (*cond__nontrunc) (*cond__nonextend): New patterns. gcc/testsuite/ * gcc.target/aarch64/sve/cond_convert_1.c: New test. * gcc.target/aarch64/sve/cond_convert_1_run.c: Likewise. * gcc.target/aarch64/sve/cond_convert_2.c: Likewise. * gcc.target/aarch64/sve/cond_convert_2_run.c: Likewise. * gcc.target/aarch64/sve/cond_convert_3.c: Likewise. * gcc.target/aarch64/sve/cond_convert_3_run.c: Likewise. * gcc.target/aarch64/sve/cond_convert_4.c: Likewise. * gcc.target/aarch64/sve/cond_convert_4_run.c: Likewise. * gcc.target/aarch64/sve/cond_convert_5.c: Likewise. * gcc.target/aarch64/sve/cond_convert_5_run.c: Likewise. * gcc.target/aarch64/sve/cond_convert_6.c: Likewise. * gcc.target/aarch64/sve/cond_convert_6_run.c: Likewise. Index: gcc/config/aarch64/aarch64-sve.md === --- gcc/config/aarch64/aarch64-sve.md 2019-08-14 11:53:04.636898923 +0100 +++ gcc/config/aarch64/aarch64-sve.md 2019-08-14 11:55:33.251813494 +0100 @@ -4071,6 +4071,39 @@ (define_insn "*aarch64_sve__trunc "fcvtz\t%0., %1/m, %2." ) +;; Predicated float-to-integer conversion with merging, either to the same +;; width or wider. +;; +;; The first alternative doesn't need the earlyclobber, but the only case +;; it would help is the uninteresting one in which operands 2 and 3 are +;; the same register (despite having different modes). Making all the +;; alternatives earlyclobber makes things more consistent for the +;; register allocator. +(define_insn_and_rewrite "*cond__nontrunc" + [(set (match_operand:SVE_HSDI 0 "register_operand" "=&w, &w, ?&w") + (unspec:SVE_HSDI + [(match_operand: 1 "register_operand" "Upl, Upl, Upl") + (unspec:SVE_HSDI +[(match_operand 4) + (match_operand:SI 5 "aarch64_sve_gp_strictness") + (match_operand:SVE_F 2 "register_operand" "w, w, w")] +SVE_COND_FCVTI) + (match_operand:SVE_HSDI 3 "aarch64_simd_reg_or_zero" "0, Dz, w")] + UNSPEC_SEL))] + "TARGET_SVE + && >= + && aarch64_sve_pred_dominates_p (&operands[4], operands[1])" + "@ + fcvtz\t%0., %1/m, %2. + movprfx\t%0., %1/z, %2.\;fcvtz\t%0., %1/m, %2. + movprfx\t%0, %3\;fcvtz\t%0., %1/m, %2." + "&& !rtx_equal_p (operands[1], operands[4])" + { +operands[4] = copy_rtx (operands[1]); + } + [(set_attr "movprfx" "*,yes,yes")] +) + ;; - ;; [INT<-FP] Packs ;; - @@ -4155,6 +4188,39 @@ (define_insn "aarch64_sve__extend "cvtf\t%0., %1/m, %2." ) +;; Predicated integer-to-float conversion with merging, either to the same +;; width or narrower. +;; +;; The first alternative doesn't need the earlyclobber, but the only case +;; it would help is the uninteresting one in which operands 2 and 3 are +;; the same register (despite having different modes). Making all the +;; alternatives earlyclobber makes things more consistent for the +;; register allocator. 
+(define_insn_and_rewrite "*cond__nonextend" + [(set (match_operand:SVE_F 0 "register_operand" "=&w, &w, ?&w") + (unspec:SVE_F + [(match_operand: 1 "register_operand" "Upl, Upl, Upl") + (unspec:SVE_F +[(match_operand 4) + (match_operand:SI 5 "aarch64_sve_gp_strictness") + (match_operand:SVE_HSDI 2 "register_operand" "w, w, w")] +SVE_COND_ICVTF) + (match_operand:SVE_F 3 "aarch64_simd_reg_or_zero" "0, Dz, w")] + UNSPEC_SEL))] + "TARGET_SVE + && >= + && aarch64_sve_pred_dominates_p (&operands[4], operands[1])" + "@ + cvtf\t%0., %1/m, %2. + movprfx\t%0., %1/z, %2.\;cvtf\t%0., %1/m, %2. + movprfx\t%0, %3\;cvtf\t%0., %1/m, %2." + "&& !rtx_equal_p (operands[1], operands[4])" + { +operands[4] = copy_rtx (operands[1]); + } + [(set_attr "movprfx" "*,yes,yes")] +) + ;; - ;; [FP<-INT] Packs ;; - Index: gcc/testsuite/gcc.target/aarch64/sve/cond_convert_1.c === --- /dev/null 2019-07-30 08:53:31.317691683 +0100 +++ gcc/testsuite/gcc.target/aarch64/sve/cond_convert_1.c 2019-08-14 11:55:33.251813494 +0100 @@ -0,0 +1,37 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize -fno-
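A sketch of the kind of source loop the new float-to-integer pattern is aimed at (not one of the committed tests; assumed options -O2 -ftree-vectorize on an SVE target):

    #include <stdint.h>

    // Conditional same-width conversion merging with another vector;
    // expected to become a predicated fcvtzs after combine.
    void
    cond_fcvt (int32_t *__restrict r, float *__restrict a,
               int32_t *__restrict b, int32_t *__restrict pred, int n)
    {
      for (int i = 0; i < n; ++i)
        r[i] = pred[i] ? (int32_t) a[i] : b[i];
    }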
[committed][AArch64] Use SVE UXT[BHW] as a form of predicated AND
UXTB, UXTH and UXTW are equivalent to predicated ANDs with the constants 0xff, 0xffff and 0xffffffff respectively. This patch uses them in the patterns for IFN_COND_AND. Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf. Applied as r274479. Richard 2019-08-14 Richard Sandiford gcc/ * config/aarch64/aarch64.c (aarch64_print_operand): Allow %e to take the equivalent mask, as well as a bit count. * config/aarch64/predicates.md (aarch64_sve_uxtb_immediate) (aarch64_sve_uxth_immediate, aarch64_sve_uxt_immediate) (aarch64_sve_pred_and_operand): New predicates. * config/aarch64/iterators.md (sve_pred_int_rhs2_operand): New code attribute. * config/aarch64/aarch64-sve.md (cond_): Use it. (*cond_uxt_2, *cond_uxt_any): New patterns. gcc/testsuite/ * gcc.target/aarch64/sve/cond_uxt_1.c: New test. * gcc.target/aarch64/sve/cond_uxt_1_run.c: Likewise. * gcc.target/aarch64/sve/cond_uxt_2.c: Likewise. * gcc.target/aarch64/sve/cond_uxt_2_run.c: Likewise. * gcc.target/aarch64/sve/cond_uxt_3.c: Likewise. * gcc.target/aarch64/sve/cond_uxt_3_run.c: Likewise. * gcc.target/aarch64/sve/cond_uxt_4.c: Likewise. * gcc.target/aarch64/sve/cond_uxt_4_run.c: Likewise. Index: gcc/config/aarch64/aarch64.c === --- gcc/config/aarch64/aarch64.c 2019-08-14 10:18:10.642319210 +0100 +++ gcc/config/aarch64/aarch64.c 2019-08-14 12:00:03.209840337 +0100 @@ -8328,7 +8328,8 @@ sizetochar (int size) 'D': Take the duplicated element in a vector constant and print it as an unsigned integer, in decimal. 'e': Print the sign/zero-extend size as a character 8->b, - 16->h, 32->w. + 16->h, 32->w. Can also be used for masks: + 0xff->b, 0xffff->h, 0xffffffff->w. 'I': If the operand is a duplicated vector constant, replace it with the duplicated scalar. If the operand is then a floating-point constant, replace @@ -8399,27 +8400,22 @@ aarch64_print_operand (FILE *f, rtx x, i case 'e': { - int n; - - if (!CONST_INT_P (x) - || (n = exact_log2 (INTVAL (x) & ~7)) <= 0) + x = unwrap_const_vec_duplicate (x); + if (!CONST_INT_P (x)) { output_operand_lossage ("invalid operand for '%%%c'", code); return; } - switch (n) + HOST_WIDE_INT val = INTVAL (x); + if ((val & ~7) == 8 || val == 0xff) + fputc ('b', f); + else if ((val & ~7) == 16 || val == 0xffff) + fputc ('h', f); + else if ((val & ~7) == 32 || val == 0xffffffff) + fputc ('w', f); + else { - case 3: - fputc ('b', f); - break; - case 4: - fputc ('h', f); - break; - case 5: - fputc ('w', f); - break; - default: output_operand_lossage ("invalid operand for '%%%c'", code); return; } Index: gcc/config/aarch64/predicates.md === --- gcc/config/aarch64/predicates.md 2019-08-14 10:18:10.642319210 +0100 +++ gcc/config/aarch64/predicates.md 2019-08-14 12:00:03.209840337 +0100 @@ -606,11 +606,26 @@ (define_predicate "aarch64_sve_inc_dec_i (and (match_code "const,const_vector") (match_test "aarch64_sve_inc_dec_immediate_p (op)"))) +(define_predicate "aarch64_sve_uxtb_immediate" + (and (match_code "const_vector") + (match_test "GET_MODE_UNIT_BITSIZE (GET_MODE (op)) > 8") + (match_test "aarch64_const_vec_all_same_int_p (op, 0xff)"))) + +(define_predicate "aarch64_sve_uxth_immediate" + (and (match_code "const_vector") + (match_test "GET_MODE_UNIT_BITSIZE (GET_MODE (op)) > 16") + (match_test "aarch64_const_vec_all_same_int_p (op, 0xffff)"))) + (define_predicate "aarch64_sve_uxtw_immediate" (and (match_code "const_vector") (match_test "GET_MODE_UNIT_BITSIZE (GET_MODE (op)) > 32") (match_test "aarch64_const_vec_all_same_int_p (op, 0xffffffff)"))) +(define_predicate "aarch64_sve_uxt_immediate" + (ior (match_operand 0 "aarch64_sve_uxtb_immediate")
(match_operand 0 "aarch64_sve_uxth_immediate") + (match_operand 0 "aarch64_sve_uxtw_immediate"))) + (define_predicate "aarch64_sve_logical_immediate" (and (match_code "const,const_vector") (match_test "aarch64_sve_bitmask_immediate_p (op)"))) @@ -670,6 +685,10 @@ (define_predicate "aarch64_sve_add_opera (match_operand 0 "aarch64_sve_sub_arith_immediate") (match_operand 0 "aarch64_sve_inc_dec_immediate"))) +(define_predicate "aarch64_sve_pred_and_operand" + (ior (match_operand 0 "register_opera
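In source terms the transformation looks like this (a sketch, assumed options -O2 -ftree-vectorize on an SVE target):

    #include <stdint.h>

    // The conditional AND with 0xff can now be emitted as a predicated
    // UXTB on .s elements; predicated AND has no immediate form, so this
    // avoids materialising the mask in a register.
    void
    cond_uxtb (uint32_t *__restrict r, uint32_t *__restrict a,
               uint32_t *__restrict b, int n)
    {
      for (int i = 0; i < n; ++i)
        r[i] = a[i] < 20 ? b[i] & 0xff : b[i];
    }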
Re: [PATCH 1/2] PR c++/91436 fix C++ dialect for std::make_unique fix-it hint
On 13/08/19 16:07 -0400, Jason Merrill wrote: On 8/13/19 9:32 AM, Jonathan Wakely wrote: * g++.dg/lookup/missing-std-include-6.C: Don't check make_unique in test that runs for C++11. I'm not comfortable removing this test coverage entirely. Doesn't it give a useful diagnostic in C++11 mode as well? It does: mu.cc:3:15: error: 'make_unique' is not a member of 'std' 3 | auto p = std::make_unique(); | ^~~ mu.cc:3:15: note: 'std::make_unique' is only available from C++14 onwards mu.cc:3:27: error: expected primary-expression before 'int' 3 | auto p = std::make_unique(); | ^~~ So we can add it to g++.dg/lookup/missing-std-include-8.C instead, which runs for c++98_only and checks for the "is only available for" cases. Here's a patch doing that. Tested x86_64-linux. OK for trunk? OK for gcc-9-branch and gcc-8-branch too, since PR c++/91436 affects those branches? commit 5ad7b3202e4818f2d6d84e22e7e489b39a65c851 Author: Jonathan Wakely Date: Tue Aug 13 13:25:39 2019 +0100 PR c++/91436 fix C++ dialect for std::make_unique fix-it hint The std::make_unique function wasn't added until C++14, and neither was the std::complex_literals namespace. gcc/cp: PR c++/91436 * name-lookup.c (get_std_name_hint): Fix min_dialect field for complex_literals and make_unique entries. gcc/testsuite: PR c++/91436 * g++.dg/lookup/missing-std-include-5.C: Limit test to C++14 and up. * g++.dg/lookup/missing-std-include-6.C: Don't check make_unique in test that runs for C++11. * g++.dg/lookup/missing-std-include-8.C: Check make_unique here. diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c index d5e491e9072..16c74287bb1 100644 --- a/gcc/cp/name-lookup.c +++ b/gcc/cp/name-lookup.c @@ -5559,7 +5559,7 @@ get_std_name_hint (const char *name) {"bitset", "", cxx11}, /* . */ {"complex", "", cxx98}, -{"complex_literals", "", cxx98}, +{"complex_literals", "", cxx14}, /* . 
*/ {"condition_variable", "", cxx11}, {"condition_variable_any", "", cxx11}, @@ -5632,7 +5632,7 @@ get_std_name_hint (const char *name) {"allocator", "", cxx98}, {"allocator_traits", "", cxx11}, {"make_shared", "", cxx11}, -{"make_unique", "", cxx11}, +{"make_unique", "", cxx14}, {"shared_ptr", "", cxx11}, {"unique_ptr", "", cxx11}, {"weak_ptr", "", cxx11}, diff --git a/gcc/testsuite/g++.dg/lookup/missing-std-include-5.C b/gcc/testsuite/g++.dg/lookup/missing-std-include-5.C index fe880a6263b..3ec9abd9316 100644 --- a/gcc/testsuite/g++.dg/lookup/missing-std-include-5.C +++ b/gcc/testsuite/g++.dg/lookup/missing-std-include-5.C @@ -1,2 +1,3 @@ +// { dg-do compile { target c++14 } } using namespace std::complex_literals; // { dg-error "" } // { dg-message "#include " "" { target *-*-* } .-1 } diff --git a/gcc/testsuite/g++.dg/lookup/missing-std-include-6.C b/gcc/testsuite/g++.dg/lookup/missing-std-include-6.C index d9eeb4284e8..a8f27473e6d 100644 --- a/gcc/testsuite/g++.dg/lookup/missing-std-include-6.C +++ b/gcc/testsuite/g++.dg/lookup/missing-std-include-6.C @@ -11,15 +11,6 @@ void test_make_shared () // { dg-error "expected primary-expression before '\\)' token" "" { target *-*-* } .-3 } } -template -void test_make_unique () -{ - auto p = std::make_unique(); // { dg-error "'make_unique' is not a member of 'std'" } - // { dg-message "'#include '" "" { target *-*-* } .-1 } - // { dg-error "expected primary-expression before '>' token" "" { target *-*-* } .-2 } - // { dg-error "expected primary-expression before '\\)' token" "" { target *-*-* } .-3 } -} - std::shared_ptr test_shared_ptr; // { dg-error "'shared_ptr' in namespace 'std' does not name a template type" } // { dg-message "'#include '" "" { target *-*-* } .-1 } diff --git a/gcc/testsuite/g++.dg/lookup/missing-std-include-8.C b/gcc/testsuite/g++.dg/lookup/missing-std-include-8.C index 68b208299f2..73532c82968 100644 --- a/gcc/testsuite/g++.dg/lookup/missing-std-include-8.C +++ b/gcc/testsuite/g++.dg/lookup/missing-std-include-8.C @@ -13,6 +13,15 @@ void test_make_shared () // { dg-error "expected primary-expression before '\\)' token" "" { target *-*-* } .-3 } } +template +void test_make_unique () +{ + std::make_unique(); // { dg-error "'make_unique' is not a member of 'std'" } + // { dg-message "'std::make_unique' is only available from C\\+\\+14 onwards" "" { target *-*-* } .-1 } + // { dg-error "expected primary-expression before '>' token" "" { target *-*-* } .-2 } + // { dg-error "expected primary-expression before '\\)' token" "" { target *-*-* } .-3 } +} + void test_array () { std::array a; // { dg-error "'array' is not a member of 'std'" }
[committed][AArch64] Use SVE BIC for conditional arithmetic
This patch uses BIC to pattern-match conditional AND with an inverted third input. It also adds extra tests for AND, ORR and EOR. Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf. Applied as r274480. Richard 2019-08-14 Richard Sandiford Kugan Vivekanandarajah gcc/ * config/aarch64/aarch64-sve.md (*cond_bic_2) (*cond_bic_any): New patterns. gcc/testsuite/ * gcc.target/aarch64/sve/cond_logical_1.c: New test. * gcc.target/aarch64/sve/cond_logical_1_run.c: Likewise. * gcc.target/aarch64/sve/cond_logical_2.c: Likewise. * gcc.target/aarch64/sve/cond_logical_2_run.c: Likewise. * gcc.target/aarch64/sve/cond_logical_3.c: Likewise. * gcc.target/aarch64/sve/cond_logical_3_run.c: Likewise. * gcc.target/aarch64/sve/cond_logical_4.c: Likewise. * gcc.target/aarch64/sve/cond_logical_4_run.c: Likewise. * gcc.target/aarch64/sve/cond_logical_5.c: Likewise. * gcc.target/aarch64/sve/cond_logical_5_run.c: Likewise. Index: gcc/config/aarch64/aarch64-sve.md === --- gcc/config/aarch64/aarch64-sve.md 2019-08-14 12:00:23.761690128 +0100 +++ gcc/config/aarch64/aarch64-sve.md 2019-08-14 12:02:29.540770835 +0100 @@ -2274,6 +2274,50 @@ (define_insn_and_rewrite "*bic3" } ) +;; Predicated integer BIC, merging with the first input. +(define_insn "*cond_bic_2" + [(set (match_operand:SVE_I 0 "register_operand" "=w, ?&w") + (unspec:SVE_I + [(match_operand: 1 "register_operand" "Upl, Upl") + (and:SVE_I +(not:SVE_I (match_operand:SVE_I 3 "register_operand" "w, w")) +(match_operand:SVE_I 2 "register_operand" "0, w")) + (match_dup 2)] + UNSPEC_SEL))] + "TARGET_SVE" + "@ + bic\t%0., %1/m, %0., %3. + movprfx\t%0, %2\;bic\t%0., %1/m, %0., %3." + [(set_attr "movprfx" "*,yes")] +) + +;; Predicated integer BIC, merging with an independent value. +(define_insn_and_rewrite "*cond_bic_any" + [(set (match_operand:SVE_I 0 "register_operand" "=&w, &w, &w, ?&w") + (unspec:SVE_I + [(match_operand: 1 "register_operand" "Upl, Upl, Upl, Upl") + (and:SVE_I +(not:SVE_I (match_operand:SVE_I 3 "register_operand" "w, w, w, w")) +(match_operand:SVE_I 2 "register_operand" "0, w, w, w")) + (match_operand:SVE_I 4 "aarch64_simd_reg_or_zero" "Dz, Dz, 0, w")] + UNSPEC_SEL))] + "TARGET_SVE && !rtx_equal_p (operands[2], operands[4])" + "@ + movprfx\t%0., %1/z, %0.\;bic\t%0., %1/m, %0., %3. + movprfx\t%0., %1/z, %2.\;bic\t%0., %1/m, %0., %3. + movprfx\t%0., %1/m, %2.\;bic\t%0., %1/m, %0., %3. + #" + "&& reload_completed + && register_operand (operands[4], mode) + && !rtx_equal_p (operands[0], operands[4])" + { +emit_insn (gen_vcond_mask_ (operands[0], operands[2], +operands[4], operands[1])); +operands[4] = operands[2] = operands[0]; + } + [(set_attr "movprfx" "yes")] +) + ;; - ;; [INT] Shifts ;; - Index: gcc/testsuite/gcc.target/aarch64/sve/cond_logical_1.c === --- /dev/null 2019-07-30 08:53:31.317691683 +0100 +++ gcc/testsuite/gcc.target/aarch64/sve/cond_logical_1.c 2019-08-14 12:02:29.540770835 +0100 @@ -0,0 +1,62 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize" } */ + +#include + +#define bit_and(A, B) ((A) & (B)) +#define bit_or(A, B) ((A) | (B)) +#define bit_xor(A, B) ((A) ^ (B)) +#define bit_bic(A, B) ((A) & ~(B)) + +#define DEF_LOOP(TYPE, OP) \ + void __attribute__ ((noinline, noclone)) \ + test_##TYPE##_##OP (TYPE *__restrict r, \ + TYPE *__restrict a, \ + TYPE *__restrict b, \ + TYPE *__restrict c, int n)\ + {\ +for (int i = 0; i < n; ++i)\ + r[i] = a[i] < 20 ? 
OP (b[i], c[i]) : b[i]; \ + } + +#define TEST_TYPE(T, TYPE) \ + T (TYPE, bit_and) \ + T (TYPE, bit_or) \ + T (TYPE, bit_xor) \ + T (TYPE, bit_bic) + +#define TEST_ALL(T) \ + TEST_TYPE (T, int8_t) \ + TEST_TYPE (T, uint8_t) \ + TEST_TYPE (T, int16_t) \ + TEST_TYPE (T, uint16_t) \ + TEST_TYPE (T, int32_t) \ + TEST_TYPE (T, uint32_t) \ + TEST_TYPE (T, int64_t) \ + TEST_TYPE (T, uint64_t) + +TEST_ALL (DEF_LOOP) + +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.b, p[0-7]/m,} 2 } } */ +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.h, p[0-7]/m,} 2 } } */ +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.s, p[0-7]/m,} 2 } } */ +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.d
Re: [PATCHv3] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
On Fri, 2 Aug 2019, Bernd Edlinger wrote: > On 8/2/19 3:11 PM, Richard Biener wrote: > > On Tue, 30 Jul 2019, Bernd Edlinger wrote: > > > >> > >> I have no test coverage for the movmisalign optab though, so I > >> rely on your code review for that part. > > > > It looks OK. I tried to make it trigger on the following on > > i?86 with -msse2: > > > > typedef int v4si __attribute__((vector_size (16))); > > > > struct S { v4si v; } __attribute__((packed)); > > > > v4si foo (struct S s) > > { > > return s.v; > > } > > > > Hmm, the entry_parm need to be a MEM_P and an unaligned one. > So the test case could be made to trigger it this way: > > typedef int v4si __attribute__((vector_size (16))); > > struct S { v4si v; } __attribute__((packed)); > > int t; > v4si foo (struct S a, struct S b, struct S c, struct S d, > struct S e, struct S f, struct S g, struct S h, > int i, int j, int k, int l, int m, int n, > int o, struct S s) > { > t = o; > return s.v; > } > > However the code path is still not reached, since targetm.slow_ualigned_access > is always FALSE, which is probably a flaw in my patch. > > So I think, > > + else if (MEM_P (data->entry_parm) > + && GET_MODE_ALIGNMENT (promoted_nominal_mode) > + > MEM_ALIGN (data->entry_parm) > + && targetm.slow_unaligned_access (promoted_nominal_mode, > +MEM_ALIGN (data->entry_parm))) > > should probably better be > > + else if (MEM_P (data->entry_parm) > + && GET_MODE_ALIGNMENT (promoted_nominal_mode) > + > MEM_ALIGN (data->entry_parm) > +&& (((icode = optab_handler (movmisalign_optab, > promoted_nominal_mode)) > + != CODE_FOR_nothing) > +|| targetm.slow_unaligned_access (promoted_nominal_mode, > + MEM_ALIGN (data->entry_parm > > Right? Ah, yes. So it's really the presence of a movmisalign optab makes it a must for unaligned moves and if it is not present then targetm.slow_unaligned_access tells whether we need to use the bitfield extraction/insertion code. > Then the modified test case would use the movmisalign optab. > However nothing changes in the end, since the i386 back-end is used to work > around the middle end not using movmisalign optab when it should do so. Yeah, in the past it would have failed though. I wonder if movmisalign is still needed for x86... > I wonder if I should try to add a gcc_checking_assert to the mov expand > patterns that the memory is properly aligned ? I suppose gen* could add asserts that there is no movmisalign_optab that would match when expanding a mov. Eventually it's enough to guard the mov_optab use in emit_move_insn_1 that way? Or even try movmisalign there... > > > but nowadays x86 seems to be happy with regular moves operating on > > unaligned memory, using unaligned moves where necessary. > > > > (insn 5 2 8 2 (set (reg:V4SI 82 [ _2 ]) > > (mem/c:V4SI (reg/f:SI 16 argp) [2 s.v+0 S16 A32])) "t.c":7:11 1229 > > {movv4si_internal} > > (nil)) > > > > and with GCC 4.8 we ended up with the following expansion which is > > also correct. > > > > (insn 2 4 3 2 (set (subreg:V16QI (reg/v:V4SI 61 [ s ]) 0) > > (unspec:V16QI [ > > (mem/c:V16QI (reg/f:SI 16 argp) [0 s+0 S16 A32]) > > ] UNSPEC_LOADU)) t.c:6 1164 {sse2_loaddqu} > > (nil)) > > > > So it seems it has been too long and I don't remember what is > > special with arm that it doesn't work... it possibly simply > > trusts GET_MODE_ALIGNMENT, never looking at MEM_ALIGN which > > I think is OK-ish? > > > > Yes, that is what Richard said as well. 
> > > Similarly the very same issue should exist on x86_64 which is > > !STRICT_ALIGNMENT, it's just the ABI seems to provide the appropriate > > alignment on the caller side. So the STRICT_ALIGNMENT check is > > a wrong one. > > > > I may be plain wrong here, but I thought that !STRICT_ALIGNMENT targets > just use MEM_ALIGN to select the right instructions. MEM_ALIGN > is always 32-bit align on the DImode memory. The x86_64 vector > instructions > would look at MEM_ALIGN and do the right thing, yes? > >>> > >>> No, they need to use the movmisalign optab and end up with UNSPECs > >>> for example. > >> Ah, thanks, now I see. > >> > It seems to be the definition of STRICT_ALIGNMENT targets that all RTL > instructions need to have MEM_ALIGN >= GET_MODE_ALIGNMENT, so the target > does not even have to look at MEM_ALIGN except in the mov_misalign_optab, > right? > >>> > >>> Yes, I think we never losened that. Note that RTL expansion has to > >>> fix this up for them. Note that strictly speaking SLOW_UNALIGNED_ACCESS > >>> specifies that x86 is strict-align wrt vector modes. > >>> > >> > >> Yes I agree, the code would be incorrect for x86 as well when the > >> movmisalign_op
Re: [PATCH] PR libstdc++/90361 add missing macro definition
On 12/08/19 17:41 +0100, Jonathan Wakely wrote: The src/c++17/string-inst.cc file needs to override the default string ABI so that it still contains the expected symbols even when the library is configured with --with-default-libstdcxx-abi=gcc4-compatible. PR libstdc++/90361 * src/c++17/string-inst.cc: Use _GLIBCXX_USE_CXX11_ABI=1 by default. Tested x86_64-linux, committed to trunk. This documents the bug in the gcc-9 release notes. Committed to CVS. Index: htdocs/gcc-9/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-9/changes.html,v retrieving revision 1.74 diff -u -r1.74 changes.html --- htdocs/gcc-9/changes.html 12 Aug 2019 07:31:04 - 1.74 +++ htdocs/gcc-9/changes.html 14 Aug 2019 11:17:34 - @@ -70,8 +70,18 @@ definition of std::rotate is not used. - The automatic template instantiation at link time (-frepo: https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gcc/C_002b_002b-Dialect-Options.html#index-frepo) has been deprecated and -will be removed in a future release. +The automatic template instantiation at link time +(-frepo: https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gcc/C_002b_002b-Dialect-Options.html#index-frepo) +has been deprecated and will be removed in a future release. + + +The --with-default-libstdcxx-abi=gcc4-compatible configure +option is broken in the 9.1 and 9.2 releases, producing a shared library +with missing symbols +(see bug 90361: https://gcc.gnu.org/PR90361). +As a workaround, configure without that option and build GCC as normal, +then edit the installed headers +to define the _GLIBCXX_USE_CXX11_ABI macro to 0.
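The header edit described above boils down to changing the installed default (the file is bits/c++config.h; its exact install path varies by configuration):

    // A build configured without the broken option defaults this to 1;
    // flipping it to 0 restores the gcc4-compatible default for user code.
    #define _GLIBCXX_USE_CXX11_ABI 0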
[PATCH 0/2] Fix dangling pointer in next_nested.
Hi. The first patch adds verification of cgraph_node::origin/nested/next_nested. The verification can find the issue in the Ada run-time library on x86_64 even without bootstrap. The second patch is the fix, where we need to clean up the field. The patch can bootstrap on x86_64-linux-gnu and survives regression tests. Ready to be installed? Thanks, Martin Martin Liska (2): Add ::verify for cgraph_node::origin/nested/next_nested. Clean next_nested properly. gcc/cgraph.c | 35 +++ 1 file changed, 31 insertions(+), 4 deletions(-) -- 2.22.0
[PATCH 2/2] Clean next_nested properly.
gcc/ChangeLog: 2019-08-14 Martin Liska PR ipa/91438 * cgraph.c (cgraph_node::remove): When setting n->origin = NULL for all nested functions, reset also next_nested. --- gcc/cgraph.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/gcc/cgraph.c b/gcc/cgraph.c index eb38b905879..ea8ab38d806 100644 --- a/gcc/cgraph.c +++ b/gcc/cgraph.c @@ -1767,8 +1767,6 @@ cgraph_node::release_body (bool keep_arguments) void cgraph_node::remove (void) { - cgraph_node *n; - if (symtab->ipa_clones_dump_file && symtab->cloned_nodes.contains (this)) fprintf (symtab->ipa_clones_dump_file, "Callgraph removal;%s;%d;%s;%d;%d\n", asm_name (), order, @@ -1785,8 +1783,13 @@ cgraph_node::remove (void) */ force_output = false; forced_by_abi = false; - for (n = nested; n; n = n->next_nested) + cgraph_node *next = nested; + for (cgraph_node *n = nested; n; n = next) + { +next = n->next_nested; n->origin = NULL; +n->next_nested = NULL; + } nested = NULL; if (origin) { @@ -1840,7 +1843,7 @@ cgraph_node::remove (void) */ if (symtab->state != LTO_STREAMING) { - n = cgraph_node::get (decl); + cgraph_node *n = cgraph_node::get (decl); if (!n || (!n->clones && !n->clone_of && !n->global.inlined_to && ((symtab->global_info_ready || in_lto_p)
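For reference, the list manipulation being fixed, in isolation (a simplified sketch; the real fields live on cgraph_node in gcc/cgraph.h):

    // Each nested function records its parent in `origin' and its next
    // sibling in `next_nested'; the parent heads the list via `nested'.
    struct node
    {
      node *origin;
      node *nested;
      node *next_nested;
    };

    // Unlink every child, caching the successor before clearing it.
    // Resetting next_nested as well as origin is the point of the fix:
    // previously next_nested could be left dangling at a freed sibling.
    static void
    unlink_children (node *parent)
    {
      node *next;
      for (node *n = parent->nested; n; n = next)
        {
          next = n->next_nested;
          n->origin = nullptr;
          n->next_nested = nullptr;
        }
      parent->nested = nullptr;
    }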
[PATCH 1/2] Add ::verify for cgraph_node::origin/nested/next_nested.
gcc/ChangeLog: 2019-08-14 Martin Liska * cgraph.c (cgraph_node::verify_node): Verify origin, nested and next_nested. --- gcc/cgraph.c | 24 1 file changed, 24 insertions(+) diff --git a/gcc/cgraph.c b/gcc/cgraph.c index ed46d81a513..eb38b905879 100644 --- a/gcc/cgraph.c +++ b/gcc/cgraph.c @@ -3464,6 +3464,30 @@ cgraph_node::verify_node (void) e->aux = 0; } } + + if (nested != NULL) +{ + for (cgraph_node *n = nested; n != NULL; n = n->next_nested) + { + if (n->origin == NULL) + { + error ("missing origin for a node in a nested list"); + error_found = true; + } + else if (n->origin != this) + { + error ("origin points to a different parent"); + error_found = true; + break; + } + } +} + if (next_nested != NULL && origin == NULL) +{ + error ("missing origin for a node in a nested list"); + error_found = true; +} + if (error_found) { dump (stderr);
Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
On Thu, 8 Aug 2019, Bernd Edlinger wrote: > On 8/2/19 9:01 PM, Bernd Edlinger wrote: > > On 8/2/19 3:11 PM, Richard Biener wrote: > >> On Tue, 30 Jul 2019, Bernd Edlinger wrote: > >> > >>> > >>> I have no test coverage for the movmisalign optab though, so I > >>> rely on your code review for that part. > >> > >> It looks OK. I tried to make it trigger on the following on > >> i?86 with -msse2: > >> > >> typedef int v4si __attribute__((vector_size (16))); > >> > >> struct S { v4si v; } __attribute__((packed)); > >> > >> v4si foo (struct S s) > >> { > >> return s.v; > >> } > >> > > > > Hmm, the entry_parm need to be a MEM_P and an unaligned one. > > So the test case could be made to trigger it this way: > > > > typedef int v4si __attribute__((vector_size (16))); > > > > struct S { v4si v; } __attribute__((packed)); > > > > int t; > > v4si foo (struct S a, struct S b, struct S c, struct S d, > > struct S e, struct S f, struct S g, struct S h, > > int i, int j, int k, int l, int m, int n, > > int o, struct S s) > > { > > t = o; > > return s.v; > > } > > > > Ah, I realized that there are already a couple of very similar > test cases: gcc.target/i386/pr35767-1.c, gcc.target/i386/pr35767-1d.c, > gcc.target/i386/pr35767-1i.c and gcc.target/i386/pr39445.c, > which also manage to execute the movmisalign code with the latest patch > version. So I thought that it is not necessary to add another one. > > > However the code path is still not reached, since > > targetm.slow_ualigned_access > > is always FALSE, which is probably a flaw in my patch. > > > > So I think, > > > > + else if (MEM_P (data->entry_parm) > > + && GET_MODE_ALIGNMENT (promoted_nominal_mode) > > + > MEM_ALIGN (data->entry_parm) > > + && targetm.slow_unaligned_access (promoted_nominal_mode, > > +MEM_ALIGN (data->entry_parm))) > > > > should probably better be > > > > + else if (MEM_P (data->entry_parm) > > + && GET_MODE_ALIGNMENT (promoted_nominal_mode) > > + > MEM_ALIGN (data->entry_parm) > > +&& (((icode = optab_handler (movmisalign_optab, > > promoted_nominal_mode)) > > + != CODE_FOR_nothing) > > +|| targetm.slow_unaligned_access (promoted_nominal_mode, > > + MEM_ALIGN > > (data->entry_parm > > > > Right? > > > > Then the modified test case would use the movmisalign optab. > > However nothing changes in the end, since the i386 back-end is used to work > > around the middle end not using movmisalign optab when it should do so. > > > > I prefer the second form of the check, as it offers more test coverage, > and is probably more correct than the former. > > Note there are more variations of this misalign check in expr.c, > some are somehow odd, like expansion of MEM_REF and VIEW_CONVERT_EXPR: > > && mode != BLKmode > && align < GET_MODE_ALIGNMENT (mode)) > { > if ((icode = optab_handler (movmisalign_optab, mode)) > != CODE_FOR_nothing) > [...] > else if (targetm.slow_unaligned_access (mode, align)) > temp = extract_bit_field (temp, GET_MODE_BITSIZE (mode), > 0, TYPE_UNSIGNED (TREE_TYPE (exp)), > (modifier == EXPAND_STACK_PARM > ? NULL_RTX : target), > mode, mode, false, alt_rtl); > > I wonder if they are correct this way, why shouldn't we use the movmisalign > optab if it exists, regardless of TARGET_SLOW_UNALIGNED_ACCESSS ? Doesn't the code do exactly this? Prefer movmisalign over extrct_bit_field? > > > I wonder if I should try to add a gcc_checking_assert to the mov > > expand > > patterns that the memory is properly aligned ? > > > > Wow, that was a really exciting bug-hunt with those assertions around... 
:) > >> @@ -3292,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all > >> > >>did_conversion = true; > >> } > >> + else if (MEM_P (data->entry_parm) > >> + && GET_MODE_ALIGNMENT (promoted_nominal_mode) > >> + > MEM_ALIGN (data->entry_parm) > >> > >> we arrive here by-passing > >> > >> else if (need_conversion) > >> { > >> /* We did not have an insn to convert directly, or the sequence > >> generated appeared unsafe. We must first copy the parm to a > >> pseudo reg, and save the conversion until after all > >> parameters have been moved. */ > >> > >> int save_tree_used; > >> rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm)); > >> > >> emit_move_insn (tempreg, validated_mem); > >> > >> but this move instruction is invalid in the same way as the case > >> you fix, no? So wouldn't it be better to do > >> > > > > We could do that, but I s
Re: [PATCH] Add generic support for "noinit" attribute
Sorry for the slow response, I'd missed that there was an updated patch... Christophe Lyon writes: > 2019-07-04 Christophe Lyon > > * lib/target-supports.exp (check_effective_target_noinit): New > proc. > * gcc.c-torture/execute/noinit-attribute.c: New test. Second line should be indented by tabs rather than spaces. > @@ -2224,6 +2234,54 @@ handle_weak_attribute (tree *node, tree name, >return NULL_TREE; > } > > +/* Handle a "noinit" attribute; arguments as in struct > + attribute_spec.handler. Check whether the attribute is allowed > + here and add the attribute to the variable decl tree or otherwise > + issue a diagnostic. This function checks NODE is of the expected > + type and issues diagnostics otherwise using NAME. If it is not of > + the expected type *NO_ADD_ATTRS will be set to true. */ > + > +static tree > +handle_noinit_attribute (tree * node, > + tree name, > + tree args, > + intflags ATTRIBUTE_UNUSED, > + bool *no_add_attrs) > +{ > + const char *message = NULL; > + > + gcc_assert (DECL_P (*node)); > + gcc_assert (args == NULL); > + > + if (TREE_CODE (*node) != VAR_DECL) > +message = G_("%qE attribute only applies to variables"); > + > + /* Check that it's possible for the variable to have a section. */ > + else if ((TREE_STATIC (*node) || DECL_EXTERNAL (*node) || in_lto_p) > +&& DECL_SECTION_NAME (*node)) > +message = G_("%qE attribute cannot be applied to variables " > + "with specific sections"); > + > + if (!targetm.have_switchable_bss_sections) > +message = G_("%qE attribute is specific to ELF targets"); Maybe make this an else if too? Or make the VAR_DECL an else if if you think the ELF one should win. Either way, it seems odd to have the mixture between else if and not. > + if (message) > +{ > + warning (OPT_Wattributes, message, name); > + *no_add_attrs = true; > +} > + else > + /* If this var is thought to be common, then change this. Common > + variables are assigned to sections before the backend has a > + chance to process them. Do this only if the attribute is > + valid. */ Comment should be indented two spaces more. > +if (DECL_COMMON (*node)) > + DECL_COMMON (*node) = 0; > + > + return NULL_TREE; > +} > + > + > /* Handle a "noplt" attribute; arguments as in > struct attribute_spec.handler. */ > > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi > index f2619e1..f1af1dc 100644 > --- a/gcc/doc/extend.texi > +++ b/gcc/doc/extend.texi > @@ -7129,6 +7129,14 @@ The @code{visibility} attribute is described in > The @code{weak} attribute is described in > @ref{Common Function Attributes}. > > +@item noinit > +@cindex @code{noinit} variable attribute > +Any data with the @code{noinit} attribute will not be initialized by > +the C runtime startup code, or the program loader. Not initializing > +data in this way can reduce program startup times. Specific to ELF > +targets, this attribute relies on the linker to place such data in the > +right location. Maybe: This attribute is specific to ELF targets and relies on the linker to place such data in the right location. > diff --git a/gcc/testsuite/gcc.c-torture/execute/noinit-attribute.c > b/gcc/testsuite/gcc.c-torture/execute/noinit-attribute.c > new file mode 100644 > index 000..ffcf8c6 > --- /dev/null > +++ b/gcc/testsuite/gcc.c-torture/execute/noinit-attribute.c > @@ -0,0 +1,59 @@ > +/* { dg-do run } */ > +/* { dg-require-effective-target noinit */ > +/* { dg-options "-O2" } */ > + > +/* This test checks that noinit data is handled correctly. 
*/ > + > +extern void _start (void) __attribute__ ((noreturn)); > +extern void abort (void) __attribute__ ((noreturn)); > +extern void exit (int) __attribute__ ((noreturn)); > + > +int var_common; > +int var_zero = 0; > +int var_one = 1; > +int __attribute__((noinit)) var_noinit; > +int var_init = 2; > + > +int __attribute__((noinit)) func(); /* { dg-warning "attribute only applies > to variables" } */ > +int __attribute__((section ("mysection"), noinit)) var_section1; /* { > dg-warning "because it conflicts with attribute" } */ > +int __attribute__((noinit, section ("mysection"))) var_section2; /* { > dg-warning "because it conflicts with attribute" } */ > + > + > +int > +main (void) > +{ > + /* Make sure that the C startup code has correctly initialized the > ordinary variables. */ > + if (var_common != 0) > +abort (); > + > + /* Initialized variables are not re-initialized during startup, so > + check their original values only during the first run of this > + test. */ > + if (var_init == 2) > +if (var_zero != 0 || var_one != 1) > + abort (); > + > + switch (var_init) > +{ > +case 2: > + /* First time through - change all the values. */ > + var_common = var_zero = var_one = var_noinit = var_init = 3
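In user terms the attribute under review behaves like this (a sketch; the linker script must provide a .noinit section for it to take effect):

    // Keeps whatever the RAM held before startup, so it can carry state
    // across a warm reset on embedded ELF targets.
    int boot_count __attribute__ ((noinit));

    // Ordinary globals are still zeroed or initialised by the startup code.
    int zeroed_as_usual;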
Add IFN_COND functions for shifting
This patch adds support for IFN_COND shifts left and shifts right. This is mostly mechanical, but since we try to handle conditional operations in the same way as unconditional operations in match.pd, we need to support IFN_COND shifts by scalars as well as vectors. E.g.: IFN_COND_SHL (cond, a, { 1, 1, ... }, fallback) and: IFN_COND_SHL (cond, a, 1, fallback) are the same operation, with: (for shiftrotate (lrotate rrotate lshift rshift) ... /* Prefer vector1 << scalar to vector1 << vector2 if vector2 is uniform. */ (for vec (VECTOR_CST CONSTRUCTOR) (simplify (shiftrotate @0 vec@1) (with { tree tem = uniform_vector_p (@1); } (if (tem) (shiftrotate @0 { tem; })) preferring the latter. The patch copes with this by extending create_convert_operand_from to handle scalar-to-vector conversions. Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf and x86_64-linux-gnu. OK for the generic bits? Richard 2019-08-14 Richard Sandiford Prathamesh Kulkarni gcc/ * internal-fn.def (IFN_COND_SHL, IFN_COND_SHR): New internal functions. * internal-fn.c (FOR_EACH_CODE_MAPPING): Handle shifts. * match.pd (UNCOND_BINARY, COND_BINARY): Likewise. * optabs.def (cond_ashl_optab, cond_ashr_optab, cond_lshr_optab): New optabs. * optabs.h (create_convert_operand_from): Expand comment. * optabs.c (maybe_legitimize_operand): Allow implicit broadcasts when mapping scalar rtxes to vector operands. * config/aarch64/iterators.md (SVE_INT_BINARY): Add ashift, ashiftrt and lshiftrt. (sve_int_op, sve_int_op_rev, sve_pred_int_rhs2_operand): Handle them. * config/aarch64/aarch64-sve.md (*cond__2_const) (*cond__any_const): New patterns. gcc/testsuite/ * gcc.target/aarch64/sve/cond_shift_1.c: New test. * gcc.target/aarch64/sve/cond_shift_1_run.c: Likewise. * gcc.target/aarch64/sve/cond_shift_2.c: Likewise. * gcc.target/aarch64/sve/cond_shift_2_run.c: Likewise. * gcc.target/aarch64/sve/cond_shift_3.c: Likewise. * gcc.target/aarch64/sve/cond_shift_3_run.c: Likewise. * gcc.target/aarch64/sve/cond_shift_4.c: Likewise. * gcc.target/aarch64/sve/cond_shift_4_run.c: Likewise. * gcc.target/aarch64/sve/cond_shift_5.c: Likewise. * gcc.target/aarch64/sve/cond_shift_5_run.c: Likewise. * gcc.target/aarch64/sve/cond_shift_6.c: Likewise. * gcc.target/aarch64/sve/cond_shift_6_run.c: Likewise. * gcc.target/aarch64/sve/cond_shift_7.c: Likewise. * gcc.target/aarch64/sve/cond_shift_7_run.c: Likewise. * gcc.target/aarch64/sve/cond_shift_8.c: Likewise. * gcc.target/aarch64/sve/cond_shift_8_run.c: Likewise. * gcc.target/aarch64/sve/cond_shift_9.c: Likewise. * gcc.target/aarch64/sve/cond_shift_9_run.c: Likewise. 
Index: gcc/internal-fn.def === --- gcc/internal-fn.def 2019-06-18 09:35:54.921869466 +0100 +++ gcc/internal-fn.def 2019-08-14 13:22:08.625843346 +0100 @@ -167,6 +167,10 @@ DEF_INTERNAL_OPTAB_FN (COND_IOR, ECF_CON cond_ior, cond_binary) DEF_INTERNAL_OPTAB_FN (COND_XOR, ECF_CONST | ECF_NOTHROW, cond_xor, cond_binary) +DEF_INTERNAL_OPTAB_FN (COND_SHL, ECF_CONST | ECF_NOTHROW, + cond_ashl, cond_binary) +DEF_INTERNAL_SIGNED_OPTAB_FN (COND_SHR, ECF_CONST | ECF_NOTHROW, first, + cond_ashr, cond_lshr, cond_binary) DEF_INTERNAL_OPTAB_FN (COND_FMA, ECF_CONST, cond_fma, cond_ternary) DEF_INTERNAL_OPTAB_FN (COND_FMS, ECF_CONST, cond_fms, cond_ternary) Index: gcc/internal-fn.c === --- gcc/internal-fn.c 2019-07-10 19:41:21.623936245 +0100 +++ gcc/internal-fn.c 2019-08-14 13:22:08.625843346 +0100 @@ -3286,7 +3286,9 @@ #define FOR_EACH_CODE_MAPPING(T) \ T (MAX_EXPR, IFN_COND_MAX) \ T (BIT_AND_EXPR, IFN_COND_AND) \ T (BIT_IOR_EXPR, IFN_COND_IOR) \ - T (BIT_XOR_EXPR, IFN_COND_XOR) + T (BIT_XOR_EXPR, IFN_COND_XOR) \ + T (LSHIFT_EXPR, IFN_COND_SHL) \ + T (RSHIFT_EXPR, IFN_COND_SHR) /* Return a function that only performs CODE when a certain condition is met and that uses a given fallback value otherwise. For example, if CODE is Index: gcc/match.pd === --- gcc/match.pd2019-07-29 09:39:48.690173827 +0100 +++ gcc/match.pd2019-08-14 13:22:08.625843346 +0100 @@ -83,12 +83,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) plus minus mult trunc_div trunc_mod rdiv min max - bit_and bit_ior bit_xor) + bit_and bit_ior bit_xor + lshift rshift) (define_operator_list COND_BINARY IFN_COND_ADD IFN_COND_SUB IFN_COND_MUL IFN_COND_DIV IFN_COND_MOD IFN_COND_RDIV IFN_COND_MIN IFN_COND_MAX - IFN_COND_AND IFN_CO
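A sketch of the motivating source shape (assumed options -O2 -ftree-vectorize on an SVE target): the shift amount below is a uniform scalar, so after the match.pd canonicalisation quoted above the conditional form must accept a scalar second operand, which is what the create_convert_operand_from change provides.

    #include <stdint.h>

    // Expected to become IFN_COND_SHL with a scalar shift amount,
    // broadcast to a vector when the optab is expanded.
    void
    cond_shl (int32_t *__restrict r, int32_t *__restrict a,
              int32_t *__restrict pred, int n)
    {
      for (int i = 0; i < n; ++i)
        r[i] = pred[i] ? a[i] << 4 : a[i];
    }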
Re: [PATCH 5/9] Come up with an abstraction.
On Mon, Aug 12, 2019 at 3:56 PM Martin Liška wrote: > > On 8/12/19 2:43 PM, Richard Biener wrote: > > On Mon, Aug 12, 2019 at 1:49 PM Martin Liška wrote: > >> > >> On 8/12/19 1:40 PM, Richard Biener wrote: > >>> On Mon, Aug 12, 2019 at 1:19 PM Martin Liška wrote: > > On 8/8/19 5:55 PM, Michael Matz wrote: > > Hi, > > > > On Mon, 10 Jun 2019, Martin Liska wrote: > > > >> 2019-07-24 Martin Liska > >> > >> * fold-const.c (operand_equal_p): Rename to ... > >> (operand_compare::operand_equal_p): ... this. > >> (add_expr): Rename to ... > >> (operand_compare::hash_operand): ... this. > >> (operand_compare::operand_equal_valueize): Likewise. > >> (operand_compare::hash_operand_valueize): Likewise. > >> * fold-const.h (operand_equal_p): Set default > >> value for last argument. > >> (class operand_compare): New. > > > > Hmpf. A class without any data? That doesn't sound like a good design. > > Yes, the base class (current operand_equal_p) does not have a data. > But the ICF derive class has a data and e.g. > func_checker::operand_equal_valueize > will use m_label_bb_map.get (t1). Which are member data of class > func_checker. > > > You seem to need it only to have the possibility of virtual functions, > > i.e. fancy callbacks. AFAICS you only have one derived class, i.e. a > > simple distinction of two cases. What do you think about encoding the > > additional new (ICF) case in the (existing) 'flags' argument to > > operand_equal_p (and in case the ICF flag is set simply call the > > "callback" directly)? > > That's possible. I can add two more callbacks to the operand_equal_p > function > (hash_operand_valueize and operand_equal_valueize). > > Is Richi also supporting this approach? > >>> > >>> I still see no value in the abstraction since you invoke none of the > >>> (virtual) methods from the base class operand_equal_p. > >> > >> I call operand_equal_valueize (and hash_operand) from operand_equal_p. > >> These are then used in IPA ICF (patch 6/9). > > > > Ugh. I see you call that after > > > > if (TREE_CODE (arg0) != TREE_CODE (arg1)) > > { > > ... > > } > > else > > return false; > > } > > > > and also after > > > > /* Check equality of integer constants before bailing out due to > > precision differences. */ > > if (TREE_CODE (arg0) == INTEGER_CST && TREE_CODE (arg1) == INTEGER_CST) > > > > which means for arg0 == SSA_NAME and arg1 == INTEGER_CST you return false > > instead of valueizing arg0 to the possibly same or same "lose" value > > and returning true. > > Yes. ICF does not allow to have anything where TREE_CODEs do not match. > > > > > Also > > > > + int val = operand_equal_valueize (arg0, arg1, flags); > > + if (val == 1) > > +return 1; > > + if (val == 0) > > +return 0; > > > > suggests that you pass in arbirtrary trees for "valueization" but it > > isn't actually > > valueization that is performed but instead it should do an alternate > > comparison > > of arg0 and arg1 with valueization. Why's this done this way instead of > > sth like > > > > if (TREE_CODE (arg0) == SSA_NAME) > >arg0 = operand_equal_valueize (arg0, flags); > > if (TREE_CODE (arg1) == SSA_NAME) > >arg1 = operand_equal_valueize (arg1, flags); > > Because I want to be given a pair of trees about which the function > operand_equal_valueize returns match/no-match/dunno. > > > > > and why's this done with virtual functions rather than a callback that we > > can > > cheaply check for NULLness in the default implementation? > > I can transform it into a hook. But as mentioned I'll need two hooks. 
> > > > > So - what does ICF want to make "equal" that isn't equal normally and how's > > that "valueization"? > > E.g. for a FUNCTION_DECL, ICF always return true because it can only calls > the operand_equal_p after callgraph is compared. Similarly for LABEL_DECLs, > we have a map (m_label_bb_map). Please take a look at patch 6/9 in this > series. Hmm, ok, so you basically replace recursive calls to operand_equal_p with operand_equal_valueize (t1, t2, 0) || operand_equal_p (t1, t2, 0) no? But the same could be achieved by actually making t1 and t2 equal according to operand_equal_p rules via the valueization hook? So replace FUNCTION_DECLs with their prevailing ones, LABEL_DECLs with theirs, etc. As given your abstraction is quite awkward to use, say, from value-numbering which knows how to "valueize" a single tree but doesn't compare things. To make it work for your case you'd valueize not only SSA names but also all DECL_P I guess. After all your operand_equal_valueize only does something for "leafs" but is called for all intermediate expressions as well. Richard. > Thanks, > Martin > > > > > Thanks, > > Richard. > > > >> Martin > >> > >>>
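For concreteness, the interface being debated has roughly this shape (simplified from patch 5/9; not the final form):

    // The base class implements today's operand_equal_p but consults a
    // tri-state hook first; IPA ICF derives from it and answers for
    // FUNCTION_DECLs, LABEL_DECLs etc. out of its own maps.
    class operand_compare
    {
    public:
      virtual int operand_equal_p (const_tree arg0, const_tree arg1,
                                   unsigned int flags);
    protected:
      // 1 = equal, 0 = not equal, -1 = no opinion, fall through to the
      // default structural comparison.
      virtual int operand_equal_valueize (const_tree, const_tree,
                                          unsigned int)
      { return -1; }
    };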
Re: C++ PATCH for c++/91391 - bogus -Wcomma-subscript warning
Ping. On Wed, Aug 07, 2019 at 04:05:53PM -0400, Marek Polacek wrote: > When implementing -Wcomma-subscript I failed to realize that a comma in > a template-argument-list shouldn't be warned about. > > But we can't simply ignore any commas inside < ... > because the following > needs to be caught: > > a[b < c, b > c]; > > This patch from Jakub fixes it by moving the warning to cp_parser_expression > where we can better detect top-level commas (and avoid saving tokens). > > I've extended the patch to revert the cp_parser_skip_to_closing_square_bracket > changes I made in r274121 -- they are no longer needed. > > Apologies for the thinko. > > Bootstrapped/regtested on x86_64-linux, ok for trunk? > > 2019-08-07 Jakub Jelinek > Marek Polacek > > PR c++/91391 - bogus -Wcomma-subscript warning. > * parser.c (cp_parser_postfix_open_square_expression): Don't warn about > a deprecated comma here. Pass warn_comma_subscript down to > cp_parser_expression. > (cp_parser_expression): New bool parameter. Warn about uses of a comma > operator within a subscripting expression. > (cp_parser_skip_to_closing_square_bracket): Revert to pre-r274121 state. > (cp_parser_skip_to_closing_square_bracket_1): Remove. > > * g++.dg/cpp2a/comma5.C: New test. > > diff --git gcc/cp/parser.c gcc/cp/parser.c > index 14b724095c4..eccc3749fd0 100644 > --- gcc/cp/parser.c > +++ gcc/cp/parser.c > @@ -2102,7 +2102,7 @@ static cp_expr cp_parser_assignment_expression > static enum tree_code cp_parser_assignment_operator_opt >(cp_parser *); > static cp_expr cp_parser_expression > - (cp_parser *, cp_id_kind * = NULL, bool = false, bool = false); > + (cp_parser *, cp_id_kind * = NULL, bool = false, bool = false, bool = > false); > static cp_expr cp_parser_constant_expression >(cp_parser *, bool = false, bool * = NULL, bool = false); > static cp_expr cp_parser_builtin_offsetof > @@ -2669,8 +2669,6 @@ static bool cp_parser_init_statement_p >(cp_parser *); > static bool cp_parser_skip_to_closing_square_bracket >(cp_parser *); > -static int cp_parser_skip_to_closing_square_bracket_1 > - (cp_parser *, enum cpp_ttype); > > /* Concept-related syntactic transformations */ > > @@ -7524,33 +7522,9 @@ cp_parser_postfix_open_square_expression (cp_parser > *parser, > index = cp_parser_braced_list (parser, &expr_nonconst_p); > } >else > - { > - /* [depr.comma.subscript]: A comma expression appearing as > - the expr-or-braced-init-list of a subscripting expression > - is deprecated. A parenthesized comma expression is not > - deprecated. */ > - if (warn_comma_subscript) > - { > - /* Save tokens so that we can put them back. */ > - cp_lexer_save_tokens (parser->lexer); > - > - /* Look for ',' that is not nested in () or {}. */ > - if (cp_parser_skip_to_closing_square_bracket_1 (parser, > - CPP_COMMA) == -1) > - { > - auto_diagnostic_group d; > - warning_at (cp_lexer_peek_token (parser->lexer)->location, > - OPT_Wcomma_subscript, > - "top-level comma expression in array subscript " > - "is deprecated"); > - } > - > - /* Roll back the tokens we skipped. */ > - cp_lexer_rollback_tokens (parser->lexer); > - } > - > - index = cp_parser_expression (parser); > - } > + index = cp_parser_expression (parser, NULL, /*cast_p=*/false, > + /*decltype_p=*/false, > + /*warn_comma_p=*/warn_comma_subscript); > } > >parser->greater_than_is_operator_p = saved_greater_than_is_operator_p; > @@ -9932,12 +9906,13 @@ cp_parser_assignment_operator_opt (cp_parser* parser) > CAST_P is true if this expression is the target of a cast. 
> DECLTYPE_P is true if this expression is the immediate operand of > decltype, > except possibly parenthesized or on the RHS of a comma (N3276). > + WARN_COMMA_P is true if a comma should be diagnosed. > > Returns a representation of the expression. */ > > static cp_expr > cp_parser_expression (cp_parser* parser, cp_id_kind * pidk, > - bool cast_p, bool decltype_p) > + bool cast_p, bool decltype_p, bool warn_comma_p) > { >cp_expr expression = NULL_TREE; >location_t loc = UNKNOWN_LOCATION; > @@ -9984,6 +9959,17 @@ cp_parser_expression (cp_parser* parser, cp_id_kind * > pidk, > break; >/* Consume the `,'. */ >loc = cp_lexer_peek_token (parser->lexer)->location; > + if (warn_comma_p) > + { > + /* [depr.comma.subscript]: A comma expression appearing as > + the expr-or-braced-init-list of a subscripting expression > +
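The cases the reworked check has to separate look like this (a sketch, assuming a dialect in which -Wcomma-subscript is active):

    template<typename T, typename U> int f ();

    void
    g (int *a, int b, int c)
    {
      a[b, c];            // warned: top-level comma expression
      a[(b, c)];          // not warned: parenthesized comma expression
      a[f<int, int> ()];  // not warned: comma in a template-argument-list
      a[b < c, b > c];    // warned: comparisons joined by a real comma
    }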
Re: [PATCH] Add generic support for "noinit" attribute
On Wed, 14 Aug 2019 at 14:14, Richard Sandiford wrote: > > Sorry for the slow response, I'd missed that there was an updated patch... > > Christophe Lyon writes: > > 2019-07-04 Christophe Lyon > > > > * lib/target-supports.exp (check_effective_target_noinit): New > > proc. > > * gcc.c-torture/execute/noinit-attribute.c: New test. > > Second line should be indented by tabs rather than spaces. > > > @@ -2224,6 +2234,54 @@ handle_weak_attribute (tree *node, tree name, > >return NULL_TREE; > > } > > > > +/* Handle a "noinit" attribute; arguments as in struct > > + attribute_spec.handler. Check whether the attribute is allowed > > + here and add the attribute to the variable decl tree or otherwise > > + issue a diagnostic. This function checks NODE is of the expected > > + type and issues diagnostics otherwise using NAME. If it is not of > > + the expected type *NO_ADD_ATTRS will be set to true. */ > > + > > +static tree > > +handle_noinit_attribute (tree * node, > > + tree name, > > + tree args, > > + intflags ATTRIBUTE_UNUSED, > > + bool *no_add_attrs) > > +{ > > + const char *message = NULL; > > + > > + gcc_assert (DECL_P (*node)); > > + gcc_assert (args == NULL); > > + > > + if (TREE_CODE (*node) != VAR_DECL) > > +message = G_("%qE attribute only applies to variables"); > > + > > + /* Check that it's possible for the variable to have a section. */ > > + else if ((TREE_STATIC (*node) || DECL_EXTERNAL (*node) || in_lto_p) > > +&& DECL_SECTION_NAME (*node)) > > +message = G_("%qE attribute cannot be applied to variables " > > + "with specific sections"); > > + > > + if (!targetm.have_switchable_bss_sections) > > +message = G_("%qE attribute is specific to ELF targets"); > > Maybe make this an else if too? Or make the VAR_DECL an else if > if you think the ELF one should win. Either way, it seems odd to > have the mixture between else if and not. > Right, I changed this into an else if. > > + if (message) > > +{ > > + warning (OPT_Wattributes, message, name); > > + *no_add_attrs = true; > > +} > > + else > > + /* If this var is thought to be common, then change this. Common > > + variables are assigned to sections before the backend has a > > + chance to process them. Do this only if the attribute is > > + valid. */ > > Comment should be indented two spaces more. > > > +if (DECL_COMMON (*node)) > > + DECL_COMMON (*node) = 0; > > + > > + return NULL_TREE; > > +} > > + > > + > > /* Handle a "noplt" attribute; arguments as in > > struct attribute_spec.handler. */ > > > > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi > > index f2619e1..f1af1dc 100644 > > --- a/gcc/doc/extend.texi > > +++ b/gcc/doc/extend.texi > > @@ -7129,6 +7129,14 @@ The @code{visibility} attribute is described in > > The @code{weak} attribute is described in > > @ref{Common Function Attributes}. > > > > +@item noinit > > +@cindex @code{noinit} variable attribute > > +Any data with the @code{noinit} attribute will not be initialized by > > +the C runtime startup code, or the program loader. Not initializing > > +data in this way can reduce program startup times. Specific to ELF > > +targets, this attribute relies on the linker to place such data in the > > +right location. > > Maybe: > >This attribute is specific to ELF targets and relies on the linker to >place such data in the right location. 
> Thanks, I thought I had chosen a nice turn of phrase :-) > > diff --git a/gcc/testsuite/gcc.c-torture/execute/noinit-attribute.c > > b/gcc/testsuite/gcc.c-torture/execute/noinit-attribute.c > > new file mode 100644 > > index 000..ffcf8c6 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.c-torture/execute/noinit-attribute.c > > @@ -0,0 +1,59 @@ > > +/* { dg-do run } */ > > +/* { dg-require-effective-target noinit */ > > +/* { dg-options "-O2" } */ > > + > > +/* This test checks that noinit data is handled correctly. */ > > + > > +extern void _start (void) __attribute__ ((noreturn)); > > +extern void abort (void) __attribute__ ((noreturn)); > > +extern void exit (int) __attribute__ ((noreturn)); > > + > > +int var_common; > > +int var_zero = 0; > > +int var_one = 1; > > +int __attribute__((noinit)) var_noinit; > > +int var_init = 2; > > + > > +int __attribute__((noinit)) func(); /* { dg-warning "attribute only > > applies to variables" } */ > > +int __attribute__((section ("mysection"), noinit)) var_section1; /* { > > dg-warning "because it conflicts with attribute" } */ > > +int __attribute__((noinit, section ("mysection"))) var_section2; /* { > > dg-warning "because it conflicts with attribute" } */ > > + > > + > > +int > > +main (void) > > +{ > > + /* Make sure that the C startup code has correctly initialized the > > ordinary variables. */ > > + if (var_common != 0) > > +abort (); > > + > > + /* Initialized
Re: [PATCH 5/9] Come up with an abstraction.
On 8/14/19 3:04 PM, Richard Biener wrote: > On Mon, Aug 12, 2019 at 3:56 PM Martin Liška wrote: >> >> On 8/12/19 2:43 PM, Richard Biener wrote: >>> On Mon, Aug 12, 2019 at 1:49 PM Martin Liška wrote: On 8/12/19 1:40 PM, Richard Biener wrote: > On Mon, Aug 12, 2019 at 1:19 PM Martin Liška wrote: >> >> On 8/8/19 5:55 PM, Michael Matz wrote: >>> Hi, >>> >>> On Mon, 10 Jun 2019, Martin Liska wrote: >>> 2019-07-24 Martin Liska * fold-const.c (operand_equal_p): Rename to ... (operand_compare::operand_equal_p): ... this. (add_expr): Rename to ... (operand_compare::hash_operand): ... this. (operand_compare::operand_equal_valueize): Likewise. (operand_compare::hash_operand_valueize): Likewise. * fold-const.h (operand_equal_p): Set default value for last argument. (class operand_compare): New. >>> >>> Hmpf. A class without any data? That doesn't sound like a good design. >> >> Yes, the base class (current operand_equal_p) does not have a data. >> But the ICF derive class has a data and e.g. >> func_checker::operand_equal_valueize >> will use m_label_bb_map.get (t1). Which are member data of class >> func_checker. >> >>> You seem to need it only to have the possibility of virtual functions, >>> i.e. fancy callbacks. AFAICS you only have one derived class, i.e. a >>> simple distinction of two cases. What do you think about encoding the >>> additional new (ICF) case in the (existing) 'flags' argument to >>> operand_equal_p (and in case the ICF flag is set simply call the >>> "callback" directly)? >> >> That's possible. I can add two more callbacks to the operand_equal_p >> function >> (hash_operand_valueize and operand_equal_valueize). >> >> Is Richi also supporting this approach? > > I still see no value in the abstraction since you invoke none of the > (virtual) methods from the base class operand_equal_p. I call operand_equal_valueize (and hash_operand) from operand_equal_p. These are then used in IPA ICF (patch 6/9). >>> >>> Ugh. I see you call that after >>> >>> if (TREE_CODE (arg0) != TREE_CODE (arg1)) >>> { >>> ... >>> } >>> else >>> return false; >>> } >>> >>> and also after >>> >>> /* Check equality of integer constants before bailing out due to >>> precision differences. */ >>> if (TREE_CODE (arg0) == INTEGER_CST && TREE_CODE (arg1) == INTEGER_CST) >>> >>> which means for arg0 == SSA_NAME and arg1 == INTEGER_CST you return false >>> instead of valueizing arg0 to the possibly same or same "lose" value >>> and returning true. >> >> Yes. ICF does not allow to have anything where TREE_CODEs do not match. >> >>> >>> Also >>> >>> + int val = operand_equal_valueize (arg0, arg1, flags); >>> + if (val == 1) >>> +return 1; >>> + if (val == 0) >>> +return 0; >>> >>> suggests that you pass in arbirtrary trees for "valueization" but it >>> isn't actually >>> valueization that is performed but instead it should do an alternate >>> comparison >>> of arg0 and arg1 with valueization. Why's this done this way instead of >>> sth like >>> >>> if (TREE_CODE (arg0) == SSA_NAME) >>>arg0 = operand_equal_valueize (arg0, flags); >>> if (TREE_CODE (arg1) == SSA_NAME) >>>arg1 = operand_equal_valueize (arg1, flags); >> >> Because I want to be given a pair of trees about which the function >> operand_equal_valueize returns match/no-match/dunno. >> >>> >>> and why's this done with virtual functions rather than a callback that we >>> can >>> cheaply check for NULLness in the default implementation? >> >> I can transform it into a hook. But as mentioned I'll need two hooks. 
>> >>> >>> So - what does ICF want to make "equal" that isn't equal normally and how's >>> that "valueization"? >> >> E.g. for a FUNCTION_DECL, ICF always return true because it can only calls >> the operand_equal_p after callgraph is compared. Similarly for LABEL_DECLs, >> we have a map (m_label_bb_map). Please take a look at patch 6/9 in this >> series. > > Hmm, ok, so you basically replace recursive calls to operand_equal_p with > > operand_equal_valueize (t1, t2, 0) > || operand_equal_p (t1, t2, 0) > > no? This is not going to work .. > But the same could be achieved by actually making t1 and t2 equal > according to operand_equal_p rules via the valueization hook? So replace > FUNCTION_DECLs with their prevailing ones, LABEL_DECLs with theirs, etc. > > As given your abstraction is quite awkward to use, say, from value-numbering > which knows how to "valueize" a single tree but doesn't compare things. > > To make it work for your case you'd valueize not only SSA names but also > all DECL_P I guess. After all your operand_equal_valueize only does > something for "leafs" but is called for
[PATCH] Make GIMPLE forwprop DCE dead stmts
The following patch makes forwprop DCE the stmts that become dead because of propagation of copies and constants. For this to work we actually have to do that reliably rather than relying on fold_stmt doing this for us. This hits fortran/trans-intrinsic.c in a way that we do "interesting" jump threading exposing a bogus uninit warning. I'll open a PR for this with an (unreduced) testcase after committing. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. I've done this when seeing the number of copyprop passes we have and knowing the expense of the SSA propagation machinery so eventually forwprop (in a cheaper mode, not folding all stmts) could replace copyprop. Richard. 2019-08-14 Richard Biener * tree-ssa-forwprop.c (pass_forwprop::execute): Fully propagate lattice, DCE stmts that became dead because of that. fortran/ * trans-intrinsic.c (gfc_conv_intrinsic_findloc): Initialize forward_branch to avoid bogus uninitialized warning. * gcc.dg/tree-ssa/forwprop-31.c: Adjust. Index: gcc/tree-ssa-forwprop.c === --- gcc/tree-ssa-forwprop.c (revision 274422) +++ gcc/tree-ssa-forwprop.c (working copy) @@ -2299,13 +2299,14 @@ pass_forwprop::execute (function *fun) int postorder_num = pre_and_rev_post_order_compute_fn (cfun, NULL, postorder, false); auto_vec to_fixup; + auto_vec to_remove; to_purge = BITMAP_ALLOC (NULL); for (int i = 0; i < postorder_num; ++i) { gimple_stmt_iterator gsi; basic_block bb = BASIC_BLOCK_FOR_FN (fun, postorder[i]); - /* Propagate into PHIs and record degenerate ones in the lattice. */ + /* Record degenerate PHIs in the lattice. */ for (gphi_iterator si = gsi_start_phis (bb); !gsi_end_p (si); gsi_next (&si)) { @@ -2321,17 +2322,20 @@ pass_forwprop::execute (function *fun) FOR_EACH_PHI_ARG (use_p, phi, it, SSA_OP_USE) { tree use = USE_FROM_PTR (use_p); - tree tem = fwprop_ssa_val (use); if (! first) - first = tem; - else if (! operand_equal_p (first, tem, 0)) - all_same = false; - if (tem != use - && may_propagate_copy (use, tem)) - propagate_value (use_p, tem); + first = use; + else if (! operand_equal_p (first, use, 0)) + { + all_same = false; + break; + } } if (all_same) - fwprop_set_lattice_val (res, first); + { + if (may_propagate_copy (res, first)) + to_remove.safe_push (phi); + fwprop_set_lattice_val (res, first); + } } /* Apply forward propagation to all stmts in the basic-block. @@ -2648,148 +2652,227 @@ pass_forwprop::execute (function *fun) /* Combine stmts with the stmts defining their operands. Note we update GSI within the loop as necessary. */ - for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);) + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) { gimple *stmt = gsi_stmt (gsi); - gimple *orig_stmt = stmt; - bool changed = false; - bool was_noreturn = (is_gimple_call (stmt) - && gimple_call_noreturn_p (stmt)); /* Mark stmt as potentially needing revisiting. */ gimple_set_plf (stmt, GF_PLF_1, false); - if (fold_stmt (&gsi, fwprop_ssa_val)) - { - changed = true; - stmt = gsi_stmt (gsi); - if (maybe_clean_or_replace_eh_stmt (orig_stmt, stmt)) - bitmap_set_bit (to_purge, bb->index); - if (!was_noreturn - && is_gimple_call (stmt) && gimple_call_noreturn_p (stmt)) - to_fixup.safe_push (stmt); - /* Cleanup the CFG if we simplified a condition to -true or false. 
*/ - if (gcond *cond = dyn_cast (stmt)) - if (gimple_cond_true_p (cond) - || gimple_cond_false_p (cond)) - cfg_changed = true; - update_stmt (stmt); - } - - switch (gimple_code (stmt)) - { - case GIMPLE_ASSIGN: - { - tree rhs1 = gimple_assign_rhs1 (stmt); - enum tree_code code = gimple_assign_rhs_code (stmt); + /* Substitute from our lattice. We need to do so only once. */ + bool substituted_p = false; + use_operand_p usep; + ssa_op_iter iter; + FOR_EACH_SSA_USE_OPERAND (usep, stmt, iter, SSA_OP_USE) + { + tree use = USE_FROM_PTR (usep); + tree val = fwprop_ssa_val (use); + if (val && val != use && may_propagate_cop
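The removal side of this scheme can be sketched compactly. The following is a minimal sketch only, not the committed code: it assumes the stmts queued in to_remove above and the standard GIMPLE helpers (gimple_get_lhs, has_zero_uses, gsi_for_stmt, remove_phi_node, gsi_remove, release_defs).

static void
dce_propagated_stmts (auto_vec<gimple *> &to_remove)
{
  while (!to_remove.is_empty ())
    {
      gimple *stmt = to_remove.pop ();
      tree lhs = gimple_get_lhs (stmt);
      /* Only stmts whose result was fully propagated are dead.  */
      if (!lhs || TREE_CODE (lhs) != SSA_NAME || !has_zero_uses (lhs))
	continue;
      gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
      if (gimple_code (stmt) == GIMPLE_PHI)
	remove_phi_node (&gsi, true);
      else
	{
	  gsi_remove (&gsi, true);
	  release_defs (stmt);
	}
    }
}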
Re: types for VR_VARYING
On 8/13/19 8:39 PM, Aldy Hernandez wrote: Yes, it was 2X. I noticed that Richi made some changes to the lattice handling for VARYING while the discussion was on-going. I missed these, and had failed to adapt the patch for it. I would appreciate a final review of the attached patch, especially the vr-values.c changes, which I have modified to play nice with current trunk. I also noticed that Andrew's patch was setting num_vr_values to num_ssa_names + num_ssa_names / 10. I think he meant num_vr_values + num_vr_values / 10. Please verify the current incantation makes sense. No, I meant num_ssa_names. We are resizing the vector because num_vr_values is out of date (and smaller than num_ssa_names is now), so we need to resize the vector to be at least the number of ssa-names... and I added 10% just in case we aren't done adding new ones. If num_vr_values is 100, and we've added 200 ssa-names, num_ssa_names would now be 300. If you resize based on num_vr_values, you could still go off the end of the vector. Andrew
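A quick arithmetic restatement of the failure mode, using the numbers from the example above (illustrative only):

  unsigned num_vr_values = 100;  /* size when the vector was allocated */
  unsigned num_ssa_names = 300;  /* 200 names have been added since then */
  /* Growing from the stale counter still under-allocates: */
  unsigned wrong = num_vr_values + num_vr_values / 10;  /* 110 < 300 */
  /* Growing from the live counter covers every name, with slack: */
  unsigned right = num_ssa_names + num_ssa_names / 10;  /* 330 >= 300 */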
Re: [PATCH 5/9] Come up with an abstraction.
On Wed, Aug 14, 2019 at 3:19 PM Martin Liška wrote: > > On 8/14/19 3:04 PM, Richard Biener wrote: > > On Mon, Aug 12, 2019 at 3:56 PM Martin Liška wrote: > >> > >> On 8/12/19 2:43 PM, Richard Biener wrote: > >>> On Mon, Aug 12, 2019 at 1:49 PM Martin Liška wrote: > > On 8/12/19 1:40 PM, Richard Biener wrote: > > On Mon, Aug 12, 2019 at 1:19 PM Martin Liška wrote: > >> > >> On 8/8/19 5:55 PM, Michael Matz wrote: > >>> Hi, > >>> > >>> On Mon, 10 Jun 2019, Martin Liska wrote: > >>> > 2019-07-24 Martin Liska > > * fold-const.c (operand_equal_p): Rename to ... > (operand_compare::operand_equal_p): ... this. > (add_expr): Rename to ... > (operand_compare::hash_operand): ... this. > (operand_compare::operand_equal_valueize): Likewise. > (operand_compare::hash_operand_valueize): Likewise. > * fold-const.h (operand_equal_p): Set default > value for last argument. > (class operand_compare): New. > >>> > >>> Hmpf. A class without any data? That doesn't sound like a good > >>> design. > >> > >> Yes, the base class (current operand_equal_p) does not have a data. > >> But the ICF derive class has a data and e.g. > >> func_checker::operand_equal_valueize > >> will use m_label_bb_map.get (t1). Which are member data of class > >> func_checker. > >> > >>> You seem to need it only to have the possibility of virtual functions, > >>> i.e. fancy callbacks. AFAICS you only have one derived class, i.e. a > >>> simple distinction of two cases. What do you think about encoding the > >>> additional new (ICF) case in the (existing) 'flags' argument to > >>> operand_equal_p (and in case the ICF flag is set simply call the > >>> "callback" directly)? > >> > >> That's possible. I can add two more callbacks to the operand_equal_p > >> function > >> (hash_operand_valueize and operand_equal_valueize). > >> > >> Is Richi also supporting this approach? > > > > I still see no value in the abstraction since you invoke none of the > > (virtual) methods from the base class operand_equal_p. > > I call operand_equal_valueize (and hash_operand) from operand_equal_p. > These are then used in IPA ICF (patch 6/9). > >>> > >>> Ugh. I see you call that after > >>> > >>> if (TREE_CODE (arg0) != TREE_CODE (arg1)) > >>> { > >>> ... > >>> } > >>> else > >>> return false; > >>> } > >>> > >>> and also after > >>> > >>> /* Check equality of integer constants before bailing out due to > >>> precision differences. */ > >>> if (TREE_CODE (arg0) == INTEGER_CST && TREE_CODE (arg1) == INTEGER_CST) > >>> > >>> which means for arg0 == SSA_NAME and arg1 == INTEGER_CST you return false > >>> instead of valueizing arg0 to the possibly same or same "lose" value > >>> and returning true. > >> > >> Yes. ICF does not allow to have anything where TREE_CODEs do not match. > >> > >>> > >>> Also > >>> > >>> + int val = operand_equal_valueize (arg0, arg1, flags); > >>> + if (val == 1) > >>> +return 1; > >>> + if (val == 0) > >>> +return 0; > >>> > >>> suggests that you pass in arbirtrary trees for "valueization" but it > >>> isn't actually > >>> valueization that is performed but instead it should do an alternate > >>> comparison > >>> of arg0 and arg1 with valueization. Why's this done this way instead of > >>> sth like > >>> > >>> if (TREE_CODE (arg0) == SSA_NAME) > >>>arg0 = operand_equal_valueize (arg0, flags); > >>> if (TREE_CODE (arg1) == SSA_NAME) > >>>arg1 = operand_equal_valueize (arg1, flags); > >> > >> Because I want to be given a pair of trees about which the function > >> operand_equal_valueize returns match/no-match/dunno. 
> >> > >>> > >>> and why's this done with virtual functions rather than a callback that we > >>> can > >>> cheaply check for NULLness in the default implementation? > >> > >> I can transform it into a hook. But as mentioned I'll need two hooks. > >> > >>> > >>> So - what does ICF want to make "equal" that isn't equal normally and > >>> how's > >>> that "valueization"? > >> > >> E.g. for a FUNCTION_DECL, ICF always returns true because it can only call > >> operand_equal_p after the callgraph is compared. Similarly for LABEL_DECLs, > >> we have a map (m_label_bb_map). Please take a look at patch 6/9 in this > >> series. > > > > Hmm, ok, so you basically replace recursive calls to operand_equal_p with _recursive calls_ > > > > operand_equal_valueize (t1, t2, 0) > > || operand_equal_p (t1, t2, 0) > > > > no? > > This is not going to work .. I wonder if class base { virtual operand_equal_p (tree a, tree b, int f); }; base::operand_equal_p (tree a, tree b, int f) { as-is now, recursing to virtual operand_equal_p } class deriv : public base { virtual operand_equa
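Spelling out the sketch above in compilable form (a sketch under assumptions: signatures simplified, GCC's internal tree type assumed, real bodies elided):

class base
{
public:
  /* Behaves like today's fold-const.c operand_equal_p, except every
     recursive comparison goes through this virtual entry point.  */
  virtual bool
  operand_equal_p (tree a, tree b, int flags)
  {
    /* ... the current body would go here, with its recursive calls
       dispatching through the vtable; placeholder for the sketch: */
    return a == b && flags == 0;
  }
};

class deriv : public base
{
public:
  /* ICF: handle leaf decls itself (FUNCTION_DECLs already matched via
     the callgraph, LABEL_DECLs via m_label_bb_map), defer the rest.  */
  virtual bool
  operand_equal_p (tree a, tree b, int flags)
  {
    /* ... ICF-specific leaf cases first ... */
    return base::operand_equal_p (a, b, flags);
  }
};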
Re: types for VR_VARYING
On 8/14/19 9:50 AM, Andrew MacLeod wrote: On 8/13/19 8:39 PM, Aldy Hernandez wrote: Yes, it was 2X. I noticed that Richi made some changes to the lattice handling for VARYING while the discussion was on-going. I missed these, and had failed to adapt the patch for it. I would appreciate a final review of the attached patch, especially the vr-values.c changes, which I have modified to play nice with current trunk. I also noticed that Andrew's patch was setting num_vr_values to num_ssa_names + num_ssa_names / 10. I think he meant num_vr_values + num_vr_values / 10. Please verify the current incantation makes sense. No, I meant num_ssa_names. We are resizing the vector because num_vr_values is out of date (and smaller than num_ssa_names is now), so we need to resize the vector to be at least the number of ssa-names... and I added 10% just in case we aren't done adding new ones. If num_vr_values is 100, and we've added 200 ssa-names, num_ssa_names would now be 300. If you resize based on num_vr_values, you could still go off the end of the vector. OK, I've changed the resize to allocate 2X as well. So now we'll have: + unsigned int old_sz = num_vr_values; + num_vr_values = num_ssa_names * 2; + vr_value = XRESIZEVEC (value_range *, vr_value, num_vr_values); etc And the original allocation will also be 2X. Aldy
Re: [PATCH 1/2] PR c++/91436 fix C++ dialect for std::make_unique fix-it hint
On Wed, Aug 14, 2019 at 7:02 AM Jonathan Wakely wrote: > > On 13/08/19 16:07 -0400, Jason Merrill wrote: > >On 8/13/19 9:32 AM, Jonathan Wakely wrote: > >> * g++.dg/lookup/missing-std-include-6.C: Don't check make_unique in > >> test that runs for C++11. > > > >I'm not comfortable removing this test coverage entirely. Doesn't it > >give a useful diagnostic in C++11 mode as well? > > It does: > > mu.cc:3:15: error: 'make_unique' is not a member of 'std' > 3 | auto p = std::make_unique<int>(); > | ^~~ > mu.cc:3:15: note: 'std::make_unique' is only available from C++14 onwards > mu.cc:3:27: error: expected primary-expression before 'int' > 3 | auto p = std::make_unique<int>(); > | ^~~ > > So we can add it to g++.dg/lookup/missing-std-include-8.C instead, > which runs for c++98_only and checks for the "is only available for" > cases. Here's a patch doing that. > > Tested x86_64-linux. > > OK for trunk? > > OK for gcc-9-branch and gcc-8-branch too, since PR c++/91436 affects > those branches? OK. Jason
Re: [PATCH 1/2] PR c++/91436 fix C++ dialect for std::make_unique fix-it hint
On Wed, 2019-08-14 at 12:02 +0100, Jonathan Wakely wrote: > On 13/08/19 16:07 -0400, Jason Merrill wrote: > > On 8/13/19 9:32 AM, Jonathan Wakely wrote: > > > * g++.dg/lookup/missing-std-include-6.C: Don't check > > > make_unique in > > > test that runs for C++11. > > > > I'm not comfortable removing this test coverage entirely. Doesn't > > it > > give a useful diagnostic in C++11 mode as well? > > It does: > > mu.cc:3:15: error: 'make_unique' is not a member of 'std' > 3 | auto p = std::make_unique<int>(); > | ^~~ > mu.cc:3:15: note: 'std::make_unique' is only available from C++14 > onwards > mu.cc:3:27: error: expected primary-expression before 'int' > 3 | auto p = std::make_unique<int>(); > | ^~~ > > So we can add it to g++.dg/lookup/missing-std-include-8.C instead, > which runs for c++98_only and checks for the "is only available for" > cases. Here's a patch doing that. FWIW this eliminates the testing that when we do have C++14 onwards, that including <memory> is suggested. Maybe we need a C++14-onwards missing-std-include-* test, and to move the existing test there? (and to add the new test for before-C++-14) > Tested x86_64-linux. > > OK for trunk? > > OK for gcc-9-branch and gcc-8-branch too, since PR c++/91436 affects > those branches? >
Re: C++ PATCH for c++/91391 - bogus -Wcomma-subscript warning
On 8/14/19 9:15 AM, Marek Polacek wrote: Ping. On Wed, Aug 07, 2019 at 04:05:53PM -0400, Marek Polacek wrote: When implementing -Wcomma-subscript I failed to realize that a comma in a template-argument-list shouldn't be warned about. But we can't simply ignore any commas inside < ... > because the following needs to be caught: a[b < c, b > c]; This patch from Jakub fixes it by moving the warning to cp_parser_expression where we can better detect top-level commas (and avoid saving tokens). I've extended the patch to revert the cp_parser_skip_to_closing_square_bracket changes I made in r274121 -- they are no longer needed. Apologies for the thinko. Bootstrapped/regtested on x86_64-linux, ok for trunk? OK. Jason
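For readers following along, the a[b < c, b > c] case from the mail can be made concrete (illustrative example, not taken from the patch):

  int a[10];
  int b = 1, c = 2;
  /* '<' and '>' here are comparisons, not template brackets, so the
     ',' really is a comma operator inside a subscript: C++20 deprecates
     it, and -Wcomma-subscript must warn.  The index is (b > c), i.e. 0.  */
  int x = a[b < c, b > c];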
Re: [PATCH 1/2] PR c++/91436 fix C++ dialect for std::make_unique fix-it hint
On 8/14/19 10:39 AM, David Malcolm wrote: On Wed, 2019-08-14 at 12:02 +0100, Jonathan Wakely wrote: On 13/08/19 16:07 -0400, Jason Merrill wrote: On 8/13/19 9:32 AM, Jonathan Wakely wrote: * g++.dg/lookup/missing-std-include-6.C: Don't check make_unique in test that runs for C++11. I'm not comfortable removing this test coverage entirely. Doesn't it give a useful diagnostic in C++11 mode as well? It does: mu.cc:3:15: error: 'make_unique' is not a member of 'std' 3 | auto p = std::make_unique<int>(); | ^~~ mu.cc:3:15: note: 'std::make_unique' is only available from C++14 onwards mu.cc:3:27: error: expected primary-expression before 'int' 3 | auto p = std::make_unique<int>(); | ^~~ So we can add it to g++.dg/lookup/missing-std-include-8.C instead, which runs for c++98_only and checks for the "is only available for" cases. Here's a patch doing that. FWIW this eliminates the testing that when we do have C++14 onwards, that including <memory> is suggested. Maybe we need a C++14-onwards missing-std-include-* test, and to move the existing test there? (and to add the new test for before-C++-14) We can also check for different messages in different std modes, i.e. { dg-message "one" "" { target c++11_down } .-1 } { dg-message "two" "" { target c++14 } .-2 } Jason
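Concretely, such a combined test might look like this (a hypothetical sketch; the diagnostic strings are paraphrased from the output quoted above, not copied from the patch):

  auto p = std::make_unique<int>(); // { dg-error "'make_unique' is not a member of 'std'" }
  // { dg-message "only available from C\\+\\+14 onwards" "" { target c++11_down } .-1 }
  // { dg-message "did you forget to '#include <memory>'" "" { target c++14 } .-2 }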
[SVE] PR86753
Hi, The attached patch tries to fix PR86753. For the following test: void f1 (int *restrict x, int *restrict y, int *restrict z) { for (int i = 0; i < 100; ++i) x[i] = y[i] ? z[i] : 10; } vect dump shows: vect_cst__42 = { 0, ... }; vect_cst__48 = { 0, ... }; vect__4.7_41 = .MASK_LOAD (vectp_y.5_38, 4B, loop_mask_40); _4 = *_3; _5 = z_12(D) + _2; mask__35.8_43 = vect__4.7_41 != vect_cst__42; _35 = _4 != 0; vec_mask_and_46 = mask__35.8_43 & loop_mask_40; vect_iftmp.11_47 = .MASK_LOAD (vectp_z.9_44, 4B, vec_mask_and_46); iftmp.0_13 = 0; vect_iftmp.12_50 = VEC_COND_EXPR <vect__4.7_41 != vect_cst__48, vect_iftmp.11_47, vect_cst__49>; and the following code-gen: L2: ld1w z0.s, p2/z, [x1, x3, lsl 2] cmpne p1.s, p3/z, z0.s, #0 cmpne p0.s, p2/z, z0.s, #0 ld1w z0.s, p0/z, [x2, x3, lsl 2] sel z0.s, p1, z0.s, z1.s We could reuse vec_mask_and_46 in the vec_cond_expr since the conditions vect__4.7_41 != vect_cst__48 and vect__4.7_41 != vect_cst__42 are equivalent, and vect_iftmp.11_47 depends on vect__4.7_41 != vect_cst__48. I suppose in general for vec_cond_expr <C, T, E>, if T comes from a masked load which is conditional on C, then we could reuse the mask used in the load in the vec_cond_expr? The patch maintains a hash_map cond_to_vec_mask from <cond, loop_mask> to vec_mask (with loop predicate applied). In prepare_load_store_mask, we record <cond, loop_mask> -> vec_mask & loop_mask, and in vectorizable_condition, we check if <cond, loop_mask> exists in cond_to_vec_mask and, if found, the corresponding vec_mask is used as the 1st operand of the vec_cond_expr. <cond, loop_mask> is represented with cond_vmask_key, and the patch adds tree_cond_ops to represent the condition operator and operands coming either from a cond_expr or a gimple comparison stmt. If the stmt is not a comparison, it returns <NE_EXPR, lhs, 0> and inserts that into cond_to_vec_mask. With the patch, the redundant p1 is eliminated and sel uses p0 for the above test. For the following test: void f2 (int *restrict x, int *restrict y, int *restrict z, int fallback) { for (int i = 0; i < 100; ++i) x[i] = y[i] ? z[i] : fallback; } the input to the vectorizer has operands swapped in the cond_expr: _36 = _4 != 0; iftmp.0_14 = .MASK_LOAD (_5, 32B, _36); iftmp.0_8 = _4 == 0 ? fallback_12(D) : iftmp.0_14; So we need to check for the inverted condition in cond_to_vec_mask, and swap the operands. Does the patch look OK so far? One major issue remaining with the patch is value numbering. Currently, it does value numbering for the entire function using sccvn during the start of the vect pass, which is too expensive since we only need block-based VN. I am looking into that. Thanks, Prathamesh diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index b0cbbac0cb5..bf54f80dd8b 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -8608,6 +8608,7 @@ vect_transform_loop (loop_vec_info loop_vinfo) { basic_block bb = bbs[i]; stmt_vec_info stmt_info; + loop_vinfo->cond_to_vec_mask = new cond_vmask_map_type (8); for (gphi_iterator si = gsi_start_phis (bb); !gsi_end_p (si); gsi_next (&si)) @@ -8717,6 +8718,9 @@ vect_transform_loop (loop_vec_info loop_vinfo) } } } + + delete loop_vinfo->cond_to_vec_mask; + loop_vinfo->cond_to_vec_mask = 0; } /* BBs in loop */ /* The vectorization factor is always > 1, so if we use an IV increment of 1.
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index 1e2dfe5d22d..862206b3256 100644 --- a/gcc/tree-vect-stmts.c +++ b/gcc/tree-vect-stmts.c @@ -1989,17 +1989,31 @@ check_load_store_masking (loop_vec_info loop_vinfo, tree vectype, static tree prepare_load_store_mask (tree mask_type, tree loop_mask, tree vec_mask, - gimple_stmt_iterator *gsi) + gimple_stmt_iterator *gsi, tree mask, + cond_vmask_map_type *cond_to_vec_mask) { gcc_assert (useless_type_conversion_p (mask_type, TREE_TYPE (vec_mask))); if (!loop_mask) return vec_mask; gcc_assert (TREE_TYPE (loop_mask) == mask_type); + + tree *slot = 0; + if (cond_to_vec_mask) +{ + cond_vmask_key cond (mask, loop_mask); + slot = &cond_to_vec_mask->get_or_insert (cond); + if (*slot) + return *slot; +} + tree and_res = make_temp_ssa_name (mask_type, NULL, "vec_mask_and"); gimple *and_stmt = gimple_build_assign (and_res, BIT_AND_EXPR, vec_mask, loop_mask); gsi_insert_before (gsi, and_stmt, GSI_SAME_STMT); + + if (slot) +*slot = and_res; return and_res; } @@ -3514,8 +3528,10 @@ vectorizable_call (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi, gcc_assert (ncopies == 1); tree mask = vect_get_loop_mask (gsi, masks, vec_num, vectype_out, i); + tree scalar_mask = gimple_call_arg (gsi_stmt (*gsi), mask_opno); vargs[mask_opno] = prepare_load_store_mask - (TREE_TYPE (mask), mask, vargs[mask_opno], gsi); + (TREE_TYPE (mask), mask, vargs[mask_opno], gsi, + scalar_mask, vinfo->cond_to_vec_mask); } gcall *call; @@ -3564,9 +3580,11 @@ vectorizable_call (stmt_vec_info stmt_info, gimple_stmt_ite
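The caching step of the patch can be abstracted into a short sketch (simplified: the real code keys a cond_vmask_map_type on cond_vmask_key pairs of <cond, loop_mask>, not on a bare tree; GCC's internal hash_map and GIMPLE APIs are assumed):

static tree
get_or_make_vec_mask (hash_map<tree, tree> &cache, tree cond,
		      tree loop_mask, tree vec_mask, tree mask_type,
		      gimple_stmt_iterator *gsi)
{
  bool existed;
  tree &slot = cache.get_or_insert (cond, &existed);
  /* Reuse the mask already computed for this condition: this is what
     lets the VEC_COND_EXPR pick up the MASK_LOAD's vec_mask_and_N.  */
  if (existed)
    return slot;
  tree and_res = make_temp_ssa_name (mask_type, NULL, "vec_mask_and");
  gimple *and_stmt = gimple_build_assign (and_res, BIT_AND_EXPR,
					  vec_mask, loop_mask);
  gsi_insert_before (gsi, and_stmt, GSI_SAME_STMT);
  slot = and_res;
  return and_res;
}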
Re: [PATCH 2/3] C++20 constexpr lib part 2/3 - swappish functions.
On 8/13/19 7:14 AM, Jonathan Wakely wrote: On 01/08/19 13:16 -0400, Ed Smith-Rowland via libstdc++ wrote: Greetings, Here is a patch for C++20 p0879 - Constexpr for swap and swap related functions. This essentially constexprifies the rest of <algorithm>. Built and tested with C++20 (and pre-c++20) on x86_64-linux. Ok? Regards, Ed Smith-Rowland 2019-08-01  Edward Smith-Rowland <3dw...@verizon.net> Implement C++20 p0879 - Constexpr for swap and swap related functions. * include/bits/algorithmfwd.h (__cpp_lib_constexpr_swap_algorithms): New macro. (iter_swap, make_heap, next_permutation, partial_sort_copy, There should be a newline after "New macro." and before the next parenthesized list of identifiers. The parenthesized lists should not span multiple lines, so close and reopen the parens, i.e. Implement C++20 p0879 - Constexpr for swap and swap related functions. * include/bits/algorithmfwd.h (__cpp_lib_constexpr_swap_algorithms): New macro. (iter_swap, make_heap, next_permutation, partial_sort_copy, pop_heap) (prev_permutation, push_heap, reverse, rotate, sort_heap, swap) (swap_ranges, nth_element, partial_sort, sort): Add constexpr. @@ -193,6 +193,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #if __cplusplus > 201703L # define __cpp_lib_constexpr_algorithms 201711L +# define __cpp_lib_constexpr_swap_algorithms 201712L Should this value be 201806L? Indeed. The new macro also needs to be added to <version>. Done. Is this OK after it passes testing? Ed 2019-08-14  Edward Smith-Rowland <3dw...@verizon.net> Implement C++20 p0879 - Constexpr for swap and swap related functions. * include/bits/algorithmfwd.h (__cpp_lib_constexpr_swap_algorithms): New macro. * include/std/version: Ditto. (iter_swap, make_heap, next_permutation, partial_sort_copy, pop_heap) (prev_permutation, push_heap, reverse, rotate, sort_heap, swap) (swap_ranges, nth_element, partial_sort, sort): Add constexpr. * include/bits/move.h (swap): Add constexpr. * include/bits/stl_algo.h (__move_median_to_first, __reverse, reverse) (__gcd, __rotate, rotate, __partition, __heap_select) (__partial_sort_copy, partial_sort_copy, __unguarded_partition) (__unguarded_partition_pivot, __partial_sort, __introsort_loop, __sort) (__introselect, __chunk_insertion_sort, next_permutation) (prev_permutation, partition, partial_sort, nth_element, sort) (__iter_swap::iter_swap, iter_swap, swap_ranges): Add constexpr. * include/bits/stl_algobase.h (__iter_swap::iter_swap, iter_swap) (swap_ranges): Add constexpr. * include/bits/stl_heap.h (__push_heap, push_heap, __adjust_heap, __pop_heap, pop_heap, __make_heap, make_heap, __sort_heap, sort_heap): Add constexpr. * include/std/type_traits (swap): Add constexpr. * testsuite/25_algorithms/headers/algorithm/synopsis.cc: Add constexpr. * testsuite/25_algorithms/iter_swap/constexpr.cc: New test. * testsuite/25_algorithms/make_heap/constexpr.cc: New test. * testsuite/25_algorithms/next_permutation/constexpr.cc: New test. * testsuite/25_algorithms/nth_element/constexpr.cc: New test. * testsuite/25_algorithms/partial_sort/constexpr.cc: New test. * testsuite/25_algorithms/partial_sort_copy/constexpr.cc: New test. * testsuite/25_algorithms/partition/constexpr.cc: New test. * testsuite/25_algorithms/pop_heap/constexpr.cc: New test. * testsuite/25_algorithms/prev_permutation/constexpr.cc: New test. * testsuite/25_algorithms/push_heap/constexpr.cc: New test. * testsuite/25_algorithms/reverse/constexpr.cc: New test. * testsuite/25_algorithms/rotate/constexpr.cc: New test. * testsuite/25_algorithms/sort/constexpr.cc: New test.
* testsuite/25_algorithms/sort_heap/constexpr.cc: New test. * testsuite/25_algorithms/swap/constexpr.cc: New test. * testsuite/25_algorithms/swap_ranges/constexpr.cc: New test. Index: include/bits/algorithmfwd.h === --- include/bits/algorithmfwd.h (revision 274411) +++ include/bits/algorithmfwd.h (working copy) @@ -193,6 +193,7 @@ #if __cplusplus > 201703L # define __cpp_lib_constexpr_algorithms 201711L +# define __cpp_lib_constexpr_swap_algorithms 201806L #endif #if __cplusplus >= 201103L @@ -377,6 +378,7 @@ #endif template +_GLIBCXX20_CONSTEXPR void iter_swap(_FIter1, _FIter2); @@ -391,10 +393,12 @@ lower_bound(_FIter, _FIter, const _Tp&, _Compare); template +_GLIBCXX20_CONSTEXPR void make_heap(_RAIter, _RAIter); template +_GLIBCXX20_CONSTEXPR void make_heap(_RAIter, _RAIter, _Compare); @@ -478,10 +482,12 @@ // mismatch template +_GLIBCXX2
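For reference, the new constexpr.cc tests presumably follow the usual libstdc++ pattern of exercising the algorithm inside a constant expression; a minimal sketch of the assumed shape (not copied from the patch; needs -std=gnu++2a):

  #include <utility>

  constexpr bool
  test_swap ()
  {
    int a = 1, b = 2;
    std::swap (a, b); // usable in constant evaluation as of P0879
    return a == 2 && b == 1;
  }
  static_assert (test_swap (), "constexpr std::swap");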
Re: [PATCH 1/2] PR c++/91436 fix C++ dialect for std::make_unique fix-it hint
On 14/08/19 10:39 -0400, David Malcolm wrote: On Wed, 2019-08-14 at 12:02 +0100, Jonathan Wakely wrote: On 13/08/19 16:07 -0400, Jason Merrill wrote: > On 8/13/19 9:32 AM, Jonathan Wakely wrote: > > * g++.dg/lookup/missing-std-include-6.C: Don't check > > make_unique in > > test that runs for C++11. > > I'm not comfortable removing this test coverage entirely. Doesn't > it > give a useful diagnostic in C++11 mode as well? It does: mu.cc:3:15: error: 'make_unique' is not a member of 'std' 3 | auto p = std::make_unique<int>(); | ^~~ mu.cc:3:15: note: 'std::make_unique' is only available from C++14 onwards mu.cc:3:27: error: expected primary-expression before 'int' 3 | auto p = std::make_unique<int>(); | ^~~ So we can add it to g++.dg/lookup/missing-std-include-8.C instead, which runs for c++98_only and checks for the "is only available for" cases. Here's a patch doing that. FWIW this eliminates the testing that when we do have C++14 onwards, that including <memory> is suggested. Do we really care? Are we testing that *every* entry in the array gives the right answer for both missing-header and bad-std-option, or are we just testing a subset of them to be sure the logic works as expected? Because if we're testing every entry then: 1) we're missing LOTS of tests, and 2) we're just as likely to test the wrong thing and not actually catch bugs (as was already happening for both make_unique and complex_literals). Maybe we need a C++14-onwards missing-std-include-* test, and to move the existing test there? (and to add the new test for before-C++-14) We could, but is it worth it?
RE: [PATCH] Add generic support for "noinit" attribute
Hi Christoph, The noinit testcase is currently failing on x86_64. Is the test supposed to be running there? Thanks, Tamar -Original Message- From: gcc-patches-ow...@gcc.gnu.org On Behalf Of Christophe Lyon Sent: Wednesday, August 14, 2019 2:18 PM To: Christophe Lyon ; Martin Sebor ; gcc Patches ; Richard Earnshaw ; ni...@redhat.com; Jozef Lawrynowicz ; Richard Sandiford Subject: Re: [PATCH] Add generic support for "noinit" attribute On Wed, 14 Aug 2019 at 14:14, Richard Sandiford wrote: > > Sorry for the slow response, I'd missed that there was an updated patch... > > Christophe Lyon writes: > > 2019-07-04 Christophe Lyon > > > > * lib/target-supports.exp (check_effective_target_noinit): New > > proc. > > * gcc.c-torture/execute/noinit-attribute.c: New test. > > Second line should be indented by tabs rather than spaces. > > > @@ -2224,6 +2234,54 @@ handle_weak_attribute (tree *node, tree name, > >return NULL_TREE; > > } > > > > +/* Handle a "noinit" attribute; arguments as in struct > > + attribute_spec.handler. Check whether the attribute is allowed > > + here and add the attribute to the variable decl tree or otherwise > > + issue a diagnostic. This function checks NODE is of the expected > > + type and issues diagnostics otherwise using NAME. If it is not of > > + the expected type *NO_ADD_ATTRS will be set to true. */ > > + > > +static tree > > +handle_noinit_attribute (tree * node, > > + tree name, > > + tree args, > > + intflags ATTRIBUTE_UNUSED, > > + bool *no_add_attrs) > > +{ > > + const char *message = NULL; > > + > > + gcc_assert (DECL_P (*node)); > > + gcc_assert (args == NULL); > > + > > + if (TREE_CODE (*node) != VAR_DECL) > > +message = G_("%qE attribute only applies to variables"); > > + > > + /* Check that it's possible for the variable to have a section. > > + */ else if ((TREE_STATIC (*node) || DECL_EXTERNAL (*node) || in_lto_p) > > +&& DECL_SECTION_NAME (*node)) > > +message = G_("%qE attribute cannot be applied to variables " > > + "with specific sections"); > > + > > + if (!targetm.have_switchable_bss_sections) > > +message = G_("%qE attribute is specific to ELF targets"); > > Maybe make this an else if too? Or make the VAR_DECL an else if if > you think the ELF one should win. Either way, it seems odd to have > the mixture between else if and not. > Right, I changed this into an else if. > > + if (message) > > +{ > > + warning (OPT_Wattributes, message, name); > > + *no_add_attrs = true; > > +} > > + else > > + /* If this var is thought to be common, then change this. Common > > + variables are assigned to sections before the backend has a > > + chance to process them. Do this only if the attribute is > > + valid. */ > > Comment should be indented two spaces more. > > > +if (DECL_COMMON (*node)) > > + DECL_COMMON (*node) = 0; > > + > > + return NULL_TREE; > > +} > > + > > + > > /* Handle a "noplt" attribute; arguments as in > > struct attribute_spec.handler. */ > > > > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index > > f2619e1..f1af1dc 100644 > > --- a/gcc/doc/extend.texi > > +++ b/gcc/doc/extend.texi > > @@ -7129,6 +7129,14 @@ The @code{visibility} attribute is described > > in The @code{weak} attribute is described in @ref{Common Function > > Attributes}. > > > > +@item noinit > > +@cindex @code{noinit} variable attribute Any data with the > > +@code{noinit} attribute will not be initialized by the C runtime > > +startup code, or the program loader. Not initializing data in this > > +way can reduce program startup times. 
Specific to ELF targets, > > +this attribute relies on the linker to place such data in the right > > +location. > > Maybe: > >This attribute is specific to ELF targets and relies on the linker to >place such data in the right location. > Thanks, I thought I had chosen a nice turn of phrase :-) > > diff --git a/gcc/testsuite/gcc.c-torture/execute/noinit-attribute.c > > b/gcc/testsuite/gcc.c-torture/execute/noinit-attribute.c > > new file mode 100644 > > index 000..ffcf8c6 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.c-torture/execute/noinit-attribute.c > > @@ -0,0 +1,59 @@ > > +/* { dg-do run } */ > > +/* { dg-require-effective-target noinit */ > > +/* { dg-options "-O2" } */ > > + > > +/* This test checks that noinit data is handled correctly. */ > > + > > +extern void _start (void) __attribute__ ((noreturn)); extern void > > +abort (void) __attribute__ ((noreturn)); extern void exit (int) > > +__attribute__ ((noreturn)); > > + > > +int var_common; > > +int var_zero = 0; > > +int var_one = 1; > > +int __attribute__((noinit)) var_noinit; int var_init = 2; > > + > > +int __attribute__((noinit)) func(); /* { dg-warning "attribute only > > +applies to variables" } */ int __attribute__((sectio
Re: [PATCH 2/3] C++20 constexpr lib part 2/3 - swappish functions.
On 14/08/19 11:06 -0400, Ed Smith-Rowland wrote: Is this OK after it passes testing? Ed 2019-08-14  Edward Smith-Rowland <3dw...@verizon.net> Implement C++20 p0879 - Constexpr for swap and swap related functions. * include/bits/algorithmfwd.h (__cpp_lib_constexpr_swap_algorithms): New macro. * include/std/version: Ditto. It looks like this line was inserted in the wrong place, as the lines that follow it are not part of <version>. The entry for include/std/version should be after include/std/type_traits. (iter_swap, make_heap, next_permutation, partial_sort_copy, pop_heap) (prev_permutation, push_heap, reverse, rotate, sort_heap, swap) (swap_ranges, nth_element, partial_sort, sort): Add constexpr. * include/bits/move.h (swap): Add constexpr. * include/bits/stl_algo.h (__move_median_to_first, __reverse, reverse) (__gcd, __rotate, rotate, __partition, __heap_select) (__partial_sort_copy, partial_sort_copy, __unguarded_partition) (__unguarded_partition_pivot, __partial_sort, __introsort_loop, __sort) (__introselect, __chunk_insertion_sort, next_permutation) (prev_permutation, partition, partial_sort, nth_element, sort) (__iter_swap::iter_swap, iter_swap, swap_ranges): Add constexpr. * include/bits/stl_algobase.h (__iter_swap::iter_swap, iter_swap) (swap_ranges): Add constexpr. * include/bits/stl_heap.h (__push_heap, push_heap, __adjust_heap, __pop_heap, pop_heap, __make_heap, make_heap, __sort_heap, sort_heap): Add constexpr. * include/std/type_traits (swap): Add constexpr. i.e. here. OK for trunk with that change if testing passes. Thanks!
Re: [PATCH] fold more string comparison with known result (PR 90879)
On 8/12/19 7:40 AM, Michael Matz wrote: Hi, On Fri, 9 Aug 2019, Martin Sebor wrote: The solution introduced in C99 is a flexible array. C++ compilers usually support it as well. Those that don't are likely to support the zero-length array (even Visual C++ does). If there's a chance that some don't support either do you really think it's safe to assume they will do something sane with the [1] hack? As the [1] "hack" is the traditional pre-C99 (and C++) idiom to implement flexible trailing char arrays, yes, I do expect all existing (and not any more existing) compilers to do the obvious and sane thing with it. IOW: it's more portable in practice than our documented zero-length extension. And that's what matters for the things compiled by the host compiler. Without requiring C99 (which would be a different discussion) and a non-existing C++ standard we can't write this code (in this form) in a standard conforming way, no matter what we wish for. Hence it seems prudent to use the most portable variant of all the non-standard ways, the trailing [1] array. There are a few reasons why these legacy C idioms should be replaced with better/newer/safer alternatives. First, with two C revisions since C99 and with support for superior alternatives widely available, pre-C99 idioms have less and less relevance. Second, since most of GCC requires a C++98 compiler to compile, ancient C code needs to adjust to the more strict C++ requirements. As C++ evolves, dependencies on legacy extensions like this one make it increasingly difficult to upgrade to newer revisions of the standard. C++ 11 already requires compilers to reject undefined behavior in constexpr contexts, including accesses to arrays outside of their bounds. Once GCC adopts C++ 11 it won't be able to make use of constexpr with code that relies on the hack. Third, the safest and most secure approach to dealing with past- the-end accesses is to diagnose and prevent them. Accommodating code that disregards the array bounds compromises this goal. This is evident from the gaps in _FORTIFY_SOURCE and -Wstringop-overflow that other compilers like Clang and ICC don't suffer from(*). It's in everyone's best interest to proactively drive them to extinction and replace them by safer alternatives that let compilers distinguish the intentional accesses from accidental ones. It not only makes it easier to find bugs but also emit more efficient object code. Martin PS Unlike GCC, both Clang and ICC diagnose past-the-end accesses to trailing arrays with more than one element. They do recognize the struct hack even in C++ and, outside constexpr contexts, avoid diagnosing past-the-end accesses to trailing one-element arrays. This isn't so much an issue today because neither allows statically initializing struct objects with such arrays to more elements than the bound specifies. But it will likely change when the C++ proposal for constexpr functions to use new expressions is adopted (P0784R1).
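For readers less familiar with the idioms being debated, the three trailing-array variants look like this (illustrative only):

  #include <stdlib.h>

  struct hack { int len; char data[1]; }; /* pre-C99 "struct hack" */
  struct zlen { int len; char data[0]; }; /* GNU zero-length extension */
  struct flex { int len; char data[]; };  /* C99 flexible array member */

  /* All three are over-allocated the same way; what differs is the
     declared bound, and therefore what diagnostics and hardening such
     as -Wstringop-overflow and _FORTIFY_SOURCE can reason about.  */
  struct flex *p = (struct flex *) malloc (sizeof (struct flex) + 8);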
Re: [PATCH 0/3] Libsanitizer: merge from trunk
On 8/14/19 2:50 AM, Martin Liška wrote: > On 8/13/19 5:02 PM, Jeff Law wrote: >> On 8/13/19 7:07 AM, Martin Liska wrote: >>> Hi. >>> >>> For this year, I decided to make a first merge now and the >>> next (much smaller) at the end of October. >>> >>> The biggest change is rename of many files from .cc to .cpp. >>> >>> I bootstrapped the patch set on x86_64-linux-gnu and run >>> asan/ubsan/tsan tests on x86_64, ppc64le (power8) and >>> aarch64. >>> >>> Libasan SONAME has been already bumped compared to GCC 9. >>> >>> For other libraries, I don't see a reason for library bumping: >>> >>> $ abidiff /usr/lib64/libubsan.so.1.0.0 >>> ./x86_64-pc-linux-gnu/libsanitizer/ubsan/.libs/libubsan.so.1.0.0 --stat >>> Functions changes summary: 0 Removed, 0 Changed, 4 Added functions >>> Variables changes summary: 0 Removed, 0 Changed, 0 Added variable >>> Function symbols changes summary: 3 Removed, 0 Added function symbols not >>> referenced by debug info >>> Variable symbols changes summary: 0 Removed, 0 Added variable symbol not >>> referenced by debug info >>> >>> $ abidiff /usr/lib64/libtsan.so.0.0.0 >>> ./x86_64-pc-linux-gnu/libsanitizer/tsan/.libs/libtsan.so.0.0.0 --stat >>> Functions changes summary: 0 Removed, 0 Changed, 47 Added functions >>> Variables changes summary: 0 Removed, 0 Changed, 0 Added variable >>> Function symbols changes summary: 1 Removed, 2 Added function symbols not >>> referenced by debug info >>> Variable symbols changes summary: 0 Removed, 0 Added variable symbol not >>> referenced by debug info >>> >>> Ready to be installed? >> ISTM that a sanitizer merge during stage1 should be able to move forward >> without ACKs. Similarly for other runtimes where we pull from some >> upstream master. > > Good then. I've just installed the patch and also the refresh of > LOCAL_PATCHES. Sounds good. My tester will spin them on a variety of platforms over the next couple days. I won't be at all surprised if the MIPS bits are still flakey. Jeff
Re: [PATCH] fix and improve strlen conditional handling of merged stores (PR 91183, 91294, 91315)
On 8/12/19 1:57 PM, Jeff Law wrote: On 8/9/19 5:42 PM, Martin Sebor wrote: @@ -3408,7 +3457,13 @@ static bool } gimple *stmt = SSA_NAME_DEF_STMT (exp); - if (gimple_code (stmt) != GIMPLE_PHI) + if (gimple_assign_single_p (stmt)) + { + tree rhs = gimple_assign_rhs1 (stmt); + return count_nonzero_bytes (rhs, offset, nbytes, lenrange, nulterm, + allnul, allnonnul, snlim); + } + else if (gimple_code (stmt) != GIMPLE_PHI) return false; What cases are you handling here? Are there any cases where a single operand expression on the RHS affects the result. For example, if we've got a NOP_EXPR which zero extends RHS? Does that change the nonzero bytes in a way that is significant? I'm not opposed to handling single operand objects here, I'm just concerned that we're being very lenient in just stripping away the operator and looking at the underlying object. I remember adding the code because of a test failure but not the specifics anymore. No tests fail with it removed so it may not be needed. As you know, I've been juggling a few enhancements in this area and copying code between them as I need it so it's possible that I copied too much, or that some other change has obviated it, or also that the test failed somewhere else and I forgot to copy the test along with the code I'll remove it until it's needed. Let's pull it for now. If we come across the need again, we can obviously revisit with a testcase. @@ -3795,7 +3824,14 @@ handle_store (gimple_stmt_iterator *gsi) } else si->nonzero_chars = build_int_cst (size_type_node, offset); - si->full_string_p = full_string_p; + + /* Set FULL_STRING_P only if the length of the strings being + written is the same, and clear it if the strings have + different lengths. In the latter case the length stored + in si->NONZERO_CHARS becomes the lower bound. + FIXME: Handle the upper bound of the length if possible. */ + si->full_string_p = full_string_p && lenrange[0] == lenrange[1]; So there seems to be a disconnect between the comment and the code. The comment indicates you care about the lengths of the two strings being the same. But what you're really comparing when the lenrange[0] == lenrange[1] test is that the min and max of RHS are the same. The comment tries to make clear that all the arrays on the RHS of the assignment must have the same length in order to set FULL_STRING_P. Like here where LENRANGE = { 4, 4, 4 }: void f (char *s) { if (__builtin_strlen (s) != 2) return; *(int*)a = i ? 0x : 0x; } but not here where LENRANGE = { 1, 4, 4 }: *(int*)a = i < 0 ? 0x : i ? 0x0022 : 0x3300; If the bounds of the range of lengths of all the strings on the RHS are the same they're all the same length. I'm open to phrasing it better. Oh, I think I see what I was missing. In the case where RHS is a conditional (or perhaps a SSA_NAME which was set from a PHI) LENRANGE will have the min/max/# bytes for the RHS was a whole, not just a single component of the RHS. It generally looks reasonable, so I think we just need to reach a conclusion on the gimple_assign_single_p cases we're trying to handle and the possible mismatch between the comment and the code. Do you want me to post another revision with the gimple_assign_single_p test removed? I think remove that hunk, bootstrap, test, commit and post for archival purposes. I do not think another round of review is necessary. Done in r274486 (also attached). 
Martin Index: gcc/tree-ssa-strlen.c === --- gcc/tree-ssa-strlen.c (revision 274485) +++ gcc/tree-ssa-strlen.c (revision 274486) @@ -1195,14 +1195,13 @@ adjust_last_stmt (strinfo *si, gimple *stmt, bool to constants. */ tree -set_strlen_range (tree lhs, wide_int max, tree bound /* = NULL_TREE */) +set_strlen_range (tree lhs, wide_int min, wide_int max, + tree bound /* = NULL_TREE */) { if (TREE_CODE (lhs) != SSA_NAME || !INTEGRAL_TYPE_P (TREE_TYPE (lhs))) return NULL_TREE; - wide_int min = wi::zero (max.get_precision ()); - if (bound) { /* For strnlen, adjust MIN and MAX as necessary. If the bound @@ -1312,7 +1311,8 @@ maybe_set_strlen_range (tree lhs, tree src, tree b } } - return set_strlen_range (lhs, max, bound); + wide_int min = wi::zero (max.get_precision ()); + return set_strlen_range (lhs, min, max, bound); } /* Handle a strlen call. If strlen of the argument is known, replace @@ -1434,6 +1434,12 @@ handle_builtin_strlen (gimple_stmt_iterator *gsi) tree adj = fold_build2_loc (loc, MINUS_EXPR, TREE_TYPE (lhs), lhs, old); adjust_related_strinfos (loc, si, adj); + /* Use the constant minimim length as the lower bound + of the non-constant length. */ + wide_int min =
Re: [PATCH] fix and improve strlen conditional handling of merged stores (PR 91183, 91294, 91315)
Do you want me to post another revision with the gimple_assign_single_p test removed? I think remove that hunk, bootstrap, test, commit and post for archival purposes. I do not think another round of review is necessary. Done in r274486 (also attached). I should add that the early store merging on loosely aligned targets makes the strlen tests prone to failures on strictly aligned targets (or even ILP32 targets). The handle_store function can now deal with all sorts of MEM_REF assignments that result from the store merging, but because the handle_builtin_memcpy function lacks the same support, the tests that expect the assignments to be folded fail when they are not merged. For example, the strlen call below is folded on i386: const char a4[32] = "0123"; const char b4[32] = "3210"; void f (int i) { char a[32]; memcpy (a, i ? a4 + 1 : b4, 8); // copy just 8 bytes if (strlen (a) < 3) abort (); } but the equivalent call below is not: void g (int i) { char a[32]; memcpy (a, i ? a4 + 1 : b4, 16); // copy 16 bytes if (strlen (a) < 3) abort (); } This pattern may not be very common in the wild but having the pass behave consistently without these target dependencies would be helpful in avoiding these test failures. I will try to remember to extend the same enhancement as in handle_store to handle_builtin_memcpy (ideally by factoring the code out of handle_store into a helper and letting both functions call it to do the folding). Martin
Re: [SVE] PR86753
On Wed, Aug 14, 2019 at 5:06 PM Prathamesh Kulkarni wrote: > > Hi, > The attached patch tries to fix PR86753. > > For following test: > void > f1 (int *restrict x, int *restrict y, int *restrict z) > { > for (int i = 0; i < 100; ++i) > x[i] = y[i] ? z[i] : 10; > } > > vect dump shows: > vect_cst__42 = { 0, ... }; > vect_cst__48 = { 0, ... }; > > vect__4.7_41 = .MASK_LOAD (vectp_y.5_38, 4B, loop_mask_40); > _4 = *_3; > _5 = z_12(D) + _2; > mask__35.8_43 = vect__4.7_41 != vect_cst__42; > _35 = _4 != 0; > vec_mask_and_46 = mask__35.8_43 & loop_mask_40; > vect_iftmp.11_47 = .MASK_LOAD (vectp_z.9_44, 4B, vec_mask_and_46); > iftmp.0_13 = 0; > vect_iftmp.12_50 = VEC_COND_EXPR vect_iftmp.11_47, vect_cst__49>; > > and following code-gen: > L2: > ld1wz0.s, p2/z, [x1, x3, lsl 2] > cmpne p1.s, p3/z, z0.s, #0 > cmpne p0.s, p2/z, z0.s, #0 > ld1wz0.s, p0/z, [x2, x3, lsl 2] > sel z0.s, p1, z0.s, z1.s > > We could reuse vec_mask_and_46 in vec_cond_expr since the conditions > vect__4.7_41 != vect_cst__48 and vect__4.7_41 != vect_cst__42 > are equivalent, and vect_iftmp.11_47 depends on vect__4.7_41 != vect_cst__48. > > I suppose in general for vec_cond_expr if T comes from masked load, > which is conditional on C, then we could reuse the mask used in load, > in vec_cond_expr ? > > The patch maintains a hash_map cond_to_vec_mask > from vec_mask (with loop predicate applied). > In prepare_load_store_mask, we record -> vec_mask & > loop_mask, > and in vectorizable_condition, we check if exists in > cond_to_vec_mask > and if found, the corresponding vec_mask is used as 1st operand of > vec_cond_expr. > > is represented with cond_vmask_key, and the patch > adds tree_cond_ops to represent condition operator and operands coming > either from cond_expr > or a gimple comparison stmt. If the stmt is not comparison, it returns > and inserts that into cond_to_vec_mask. > > With patch, the redundant p1 is eliminated and sel uses p0 for above test. > > For following test: > void > f2 (int *restrict x, int *restrict y, int *restrict z, int fallback) > { > for (int i = 0; i < 100; ++i) > x[i] = y[i] ? z[i] : fallback; > } > > input to vectorizer has operands swapped in cond_expr: > _36 = _4 != 0; > iftmp.0_14 = .MASK_LOAD (_5, 32B, _36); > iftmp.0_8 = _4 == 0 ? fallback_12(D) : iftmp.0_14; > > So we need to check for inverted condition in cond_to_vec_mask, > and swap the operands. > Does the patch look OK so far ? > > One major issue remaining with the patch is value numbering. > Currently, it does value numbering for entire function using sccvn > during start of vect pass, which is too expensive since we only need > block based VN. I am looking into that. Why do you need it at all? We run VN on the if-converted loop bodies btw. Richard. > > Thanks, > Prathamesh
Re: [SVE] PR86753
On Wed, Aug 14, 2019 at 6:49 PM Richard Biener wrote: > > On Wed, Aug 14, 2019 at 5:06 PM Prathamesh Kulkarni > wrote: > > > > Hi, > > The attached patch tries to fix PR86753. > > > > For following test: > > void > > f1 (int *restrict x, int *restrict y, int *restrict z) > > { > > for (int i = 0; i < 100; ++i) > > x[i] = y[i] ? z[i] : 10; > > } > > > > vect dump shows: > > vect_cst__42 = { 0, ... }; > > vect_cst__48 = { 0, ... }; > > > > vect__4.7_41 = .MASK_LOAD (vectp_y.5_38, 4B, loop_mask_40); > > _4 = *_3; > > _5 = z_12(D) + _2; > > mask__35.8_43 = vect__4.7_41 != vect_cst__42; > > _35 = _4 != 0; > > vec_mask_and_46 = mask__35.8_43 & loop_mask_40; > > vect_iftmp.11_47 = .MASK_LOAD (vectp_z.9_44, 4B, vec_mask_and_46); > > iftmp.0_13 = 0; > > vect_iftmp.12_50 = VEC_COND_EXPR > vect_iftmp.11_47, vect_cst__49>; > > > > and following code-gen: > > L2: > > ld1wz0.s, p2/z, [x1, x3, lsl 2] > > cmpne p1.s, p3/z, z0.s, #0 > > cmpne p0.s, p2/z, z0.s, #0 > > ld1wz0.s, p0/z, [x2, x3, lsl 2] > > sel z0.s, p1, z0.s, z1.s > > > > We could reuse vec_mask_and_46 in vec_cond_expr since the conditions > > vect__4.7_41 != vect_cst__48 and vect__4.7_41 != vect_cst__42 > > are equivalent, and vect_iftmp.11_47 depends on vect__4.7_41 != > > vect_cst__48. > > > > I suppose in general for vec_cond_expr if T comes from masked > > load, > > which is conditional on C, then we could reuse the mask used in load, > > in vec_cond_expr ? > > > > The patch maintains a hash_map cond_to_vec_mask > > from vec_mask (with loop predicate applied). > > In prepare_load_store_mask, we record -> vec_mask & > > loop_mask, > > and in vectorizable_condition, we check if exists in > > cond_to_vec_mask > > and if found, the corresponding vec_mask is used as 1st operand of > > vec_cond_expr. > > > > is represented with cond_vmask_key, and the patch > > adds tree_cond_ops to represent condition operator and operands coming > > either from cond_expr > > or a gimple comparison stmt. If the stmt is not comparison, it returns > > and inserts that into cond_to_vec_mask. > > > > With patch, the redundant p1 is eliminated and sel uses p0 for above test. > > > > For following test: > > void > > f2 (int *restrict x, int *restrict y, int *restrict z, int fallback) > > { > > for (int i = 0; i < 100; ++i) > > x[i] = y[i] ? z[i] : fallback; > > } > > > > input to vectorizer has operands swapped in cond_expr: > > _36 = _4 != 0; > > iftmp.0_14 = .MASK_LOAD (_5, 32B, _36); > > iftmp.0_8 = _4 == 0 ? fallback_12(D) : iftmp.0_14; > > > > So we need to check for inverted condition in cond_to_vec_mask, > > and swap the operands. > > Does the patch look OK so far ? > > > > One major issue remaining with the patch is value numbering. > > Currently, it does value numbering for entire function using sccvn > > during start of vect pass, which is too expensive since we only need > > block based VN. I am looking into that. > > Why do you need it at all? We run VN on the if-converted loop bodies btw. Also I can't trivially see the equality of the masks and probably so can't VN. Is it that we just don't bother to apply loop_mask to VEC_COND but there's no harm if we do? Richard. > Richard. > > > > > Thanks, > > Prathamesh
Re: types for VR_VARYING
On 8/14/19 8:15 AM, Aldy Hernandez wrote: > > > On 8/14/19 9:50 AM, Andrew MacLeod wrote: >> On 8/13/19 8:39 PM, Aldy Hernandez wrote: >>> >>> >>> Yes, it was 2X. >>> >>> I noticed that Richi made some changes to the lattice handling for >>> VARYING while the discussion was on-going. I missed these, and had >>> failed to adapt the patch for it. I would appreciate a final review >>> of the attached patch, especially the vr-values.c changes, which I >>> have modified to play nice with current trunk. >>> >>> I also noticed that Andrew's patch was setting num_vr_values to >>> num_ssa_names + num_ssa_names / 10. I think he meant num_vr_values + >>> num_vr_values / 10. Please verify the current incantation makes sense. >>> >> no, I meant num_ssa_names. We are resizing the vector because >> num_vr_values is out of date (and smaller than num_ssa_names is now), >> so we need to resize the vector to be at least the number of >> ssa-names... and I added 10% just in case we arent done adding new ones. >> >> >> if num_vr_values is 100, and we've added 200 ssa-names, num_ssa_names >> would now be 300. if you resize based on num_vr_values, you could >> still go off the end of the vector. > > OK, I've changed the resize to allocate 2X as well. So now we'll have: > > + unsigned int old_sz = num_vr_values; > + num_vr_values = num_ssa_names * 2; > + vr_value = XRESIZEVEC (value_range *, vr_value, num_vr_values); > etc > > And the original allocation will also be 2X. I don't think we want the resize to be 2X, we've tried to get away from those kinds of growth patterns. The 10% from Andrew's patch seems like a better choice for the resize. jeff
Re: [PATCH] Make GIMPLE forwprop DCE dead stmts
On 8/14/19 7:36 AM, Richard Biener wrote:
>
> The following patch makes forwprop DCE the stmts that become dead
> because of propagation of copies and constants.  For this to work
> we actually have to do that reliably rather than relying on
> fold_stmt doing this for us.
>
> This hits fortran/trans-intrinsic.c in a way that we do "interesting"
> jump threading exposing a bogus uninit warning.  I'll open a PR
> for this with an (unreduced) testcase after committing.

Feel free to mark it as a regression, if for no other reason than that it
guarantees I'll look at it during stage3/stage4.  I can adjust the marker
at that time based on what I find.

jeff
Re: [PATCH 1/2] Add ::verify for cgraph_node::origin/nested/next_nested.
On 8/14/19 5:15 AM, Martin Liska wrote:
>
> gcc/ChangeLog:
>
> 2019-08-14  Martin Liska
>
> 	* cgraph.c (cgraph_node::verify_node): Verify origin, nested
> 	and next_nested.
> ---
>  gcc/cgraph.c | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
>
OK.

Jeff
Re: [PATCH 2/2] Clean next_nested properly.
On 8/14/19 5:17 AM, Martin Liska wrote:
>
> gcc/ChangeLog:
>
> 2019-08-14  Martin Liska
>
> 	PR ipa/91438
> 	* cgraph.c (cgraph_node::remove): When setting
> 	n->origin = NULL for all nested functions, reset
> 	also next_nested.
> ---
>  gcc/cgraph.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
>
OK

jeff
Re: [PATCH] Add generic support for "noinit" attribute
On Wed, 14 Aug 2019 at 17:59, Tamar Christina wrote:
>
> Hi Christoph,
>
> The noinit testcase is currently failing on x86_64.
>
> Is the test supposed to be running there?
>

No, there's an effective-target to skip it.  But I notice a typo:

  +/* { dg-require-effective-target noinit */

(missing closing brace)

Could it explain why it's failing on x86_64 ?

> Thanks,
> Tamar
>
> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org  On Behalf Of Christophe Lyon
> Sent: Wednesday, August 14, 2019 2:18 PM
> To: Christophe Lyon ; Martin Sebor ; gcc Patches ;
> Richard Earnshaw ; ni...@redhat.com; Jozef Lawrynowicz ;
> Richard Sandiford
> Subject: Re: [PATCH] Add generic support for "noinit" attribute
>
> On Wed, 14 Aug 2019 at 14:14, Richard Sandiford wrote:
> >
> > Sorry for the slow response, I'd missed that there was an updated patch...
> >
> > Christophe Lyon writes:
> > > 2019-07-04  Christophe Lyon
> > >
> > >	* lib/target-supports.exp (check_effective_target_noinit): New
> > >	proc.
> > >	* gcc.c-torture/execute/noinit-attribute.c: New test.
> >
> > Second line should be indented by tabs rather than spaces.
> >
> > > @@ -2224,6 +2234,54 @@ handle_weak_attribute (tree *node, tree name,
> > >    return NULL_TREE;
> > >  }
> > >
> > > +/* Handle a "noinit" attribute; arguments as in struct
> > > +   attribute_spec.handler.  Check whether the attribute is allowed
> > > +   here and add the attribute to the variable decl tree or otherwise
> > > +   issue a diagnostic.  This function checks NODE is of the expected
> > > +   type and issues diagnostics otherwise using NAME.  If it is not of
> > > +   the expected type *NO_ADD_ATTRS will be set to true.  */
> > > +
> > > +static tree
> > > +handle_noinit_attribute (tree * node,
> > > +			 tree name,
> > > +			 tree args,
> > > +			 int flags ATTRIBUTE_UNUSED,
> > > +			 bool *no_add_attrs)
> > > +{
> > > +  const char *message = NULL;
> > > +
> > > +  gcc_assert (DECL_P (*node));
> > > +  gcc_assert (args == NULL);
> > > +
> > > +  if (TREE_CODE (*node) != VAR_DECL)
> > > +    message = G_("%qE attribute only applies to variables");
> > > +
> > > +  /* Check that it's possible for the variable to have a section.  */
> > > +  else if ((TREE_STATIC (*node) || DECL_EXTERNAL (*node) || in_lto_p)
> > > +	   && DECL_SECTION_NAME (*node))
> > > +    message = G_("%qE attribute cannot be applied to variables "
> > > +		 "with specific sections");
> > > +
> > > +  if (!targetm.have_switchable_bss_sections)
> > > +    message = G_("%qE attribute is specific to ELF targets");
> >
> > Maybe make this an else if too?  Or make the VAR_DECL an else if if
> > you think the ELF one should win.  Either way, it seems odd to have
> > the mixture between else if and not.
>
> Right, I changed this into an else if.
>
> > > +  if (message)
> > > +    {
> > > +      warning (OPT_Wattributes, message, name);
> > > +      *no_add_attrs = true;
> > > +    }
> > > +  else
> > > +    /* If this var is thought to be common, then change this.  Common
> > > +       variables are assigned to sections before the backend has a
> > > +       chance to process them.  Do this only if the attribute is
> > > +       valid.  */
> >
> > Comment should be indented two spaces more.
> >
> > > +    if (DECL_COMMON (*node))
> > > +      DECL_COMMON (*node) = 0;
> > > +
> > > +  return NULL_TREE;
> > > +}
> > > +
> > > +
> > >  /* Handle a "noplt" attribute; arguments as in
> > >     struct attribute_spec.handler.  */
> > >
> > > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > > index f2619e1..f1af1dc 100644
> > > --- a/gcc/doc/extend.texi
> > > +++ b/gcc/doc/extend.texi
> > > @@ -7129,6 +7129,14 @@ The @code{visibility} attribute is described in
> > >  The @code{weak} attribute is described in
> > >  @ref{Common Function Attributes}.
> > >
> > > +@item noinit
> > > +@cindex @code{noinit} variable attribute
> > > +Any data with the @code{noinit} attribute will not be initialized by
> > > +the C runtime startup code, or the program loader.  Not initializing
> > > +data in this way can reduce program startup times.  Specific to ELF
> > > +targets, this attribute relies on the linker to place such data in
> > > +the right location.
> >
> > Maybe:
> >
> >   This attribute is specific to ELF targets and relies on the linker to
> >   place such data in the right location.
>
> Thanks, I thought I had chosen a nice turn of phrase :-)
>
> > > diff --git a/gcc/testsuite/gcc.c-torture/execute/noinit-attribute.c
> > > b/gcc/testsuite/gcc.c-torture/execute/noinit-attribute.c
> > > new file mode 100644
> > > index 000..ffcf8c6
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.c-torture/execute/noinit-attribute.c
> > > @@ -0,0 +1,59 @@
> > > +/* { dg-do run } */
> > > +/* { dg-require-effective-target noinit */
> > > +/* { dg-options "-O2" } */
> > > +
> > > +/* This test checks that noinit dat
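As a concrete illustration of the semantics documented in the extend.texi
hunk above, a minimal use of the attribute would look like the sketch below.
The variable name is hypothetical, and whether the value actually survives a
reset depends on the target's linker script and runtime honoring the
attribute.

  #include <stdint.h>

  /* Placed by the linker in a non-initialized section: neither zeroed nor
     copied from a data image at startup.  The value is indeterminate on
     first power-up, so real code must validate it before trusting it.  */
  __attribute__ ((noinit)) uint32_t boot_count;

  int
  main (void)
  {
    boot_count++;  /* e.g. counts warm resets on a supporting target */
    return 0;
  }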
Re: [PATCH] Automatics in equivalence statements
On 8/14/19 2:45 AM, Mark Eggleston wrote:
> I now have commit access.
>
> gcc/fortran
>
> Jeff Law
> Mark Eggleston
>
> 	* gfortran.h: Add gfc_check_conflict declaration.
> 	* symbol.c (check_conflict): Rename to gfc_check_conflict and
> 	remove static.
> 	* symbol.c (gfc_check_conflict): Remove automatic in equivalence
> 	conflict check.
> 	* symbol.c (save_symbol): Add check for in equivalence to stop the
> 	save attribute being added.
> 	* trans-common.c (build_equiv_decl): Add is_auto parameter and
> 	add !is_auto to condition where TREE_STATIC (decl) is set.
> 	* trans-common.c (build_equiv_decl): Add local variable is_auto,
> 	set it true if an automatic attribute is encountered in the
> 	variable list.  Call build_equiv_decl with is_auto as an
> 	additional parameter.
> 	* trans-common.c (accumulate_equivalence_attributes): New
> 	subroutine.
> 	* trans-common.c (find_equivalence): New local variable
> 	dummy_symbol, accumulate equivalence attributes from each symbol
> 	then check for conflicts.
>
> gcc/testsuite
>
> Mark Eggleston
>
> 	* gfortran.dg/auto_in_equiv_1.f90: New test.
> 	* gfortran.dg/auto_in_equiv_2.f90: New test.
> 	* gfortran.dg/auto_in_equiv_3.f90: New test.
>
> OK to commit?
>
> How do I know that I have approval to commit?

Yes, this is OK to commit.  Steve acked it in a private message to me.

Normally you'll get an ACK/OK on the public list.  But private ACKs or
ACKs on IRC also count as approval :-)

jeff
Re: Rewrite some jump.c routines to use flags
On Fri, 12 Jul 2019, Richard Sandiford wrote: > At least AIUI, __builtin_isunordered etc. don't raise an exception even > for signalling NaNs. __builtin_isunordered should raise "invalid" for signaling NaNs. (isunordered is the IEEE 754 operation compareQuietUnordered, and IEEE 754 specifies for comparisons that "Invalid operation is the only exception that a comparison predicate can signal. All predicates signal the invalid operation exception on signaling NaN operands. The predicates named Quiet shall not signal any exception, unless an operand is a signaling NaN. The predicates named Signaling shall signal the invalid operation exception on quiet NaN operands.".) Note that __builtin_isunordered (x, x) is thus not the same as __builtin_isnan (x), because isnan binds to isNaN and isNaN is a non-computational operation for which IEEE 754 specifies "Implementations shall provide the following non-computational operations for all supported arithmetic formats and should provide them for all supported interchange formats. They are never exceptional, even for signaling NaNs.". -- Joseph S. Myers jos...@codesourcery.com
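The distinction Joseph draws can be probed with a small fenv experiment.
This is a sketch, not a conformance test: whether FE_INVALID is actually
raised depends on the target, the libm, and the optimization level (GCC may
fold the comparison at compile time, and it does not fully implement
FENV_ACCESS), so compile at -O0 on a target with signaling-NaN support.

  #include <fenv.h>
  #include <stdio.h>

  int
  main (void)
  {
    volatile double snan = __builtin_nans ("");

    /* compareQuietUnordered: per IEEE 754, all comparison predicates
       signal "invalid" on signaling NaN operands.  */
    feclearexcept (FE_INVALID);
    int unord = __builtin_isunordered (snan, snan);
    printf ("isunordered=%d raised_invalid=%d\n",
            unord, fetestexcept (FE_INVALID) != 0);

    /* isNaN is a non-computational operation: never exceptional,
       even for signaling NaNs.  */
    feclearexcept (FE_INVALID);
    int is_nan = __builtin_isnan (snan);
    printf ("isnan=%d raised_invalid=%d\n",
            is_nan, fetestexcept (FE_INVALID) != 0);
    return 0;
  }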
[PATCH] Simplify and generalize rust-demangle's unescaping logic.
Previously, rust-demangle.c was special-casing a fixed number of '$uXY$'
escapes, but 'XY' can technically be any hex value, representing some
Unicode codepoint.

This patch adds more general support for '$u...$' escapes, similar to
https://github.com/alexcrichton/rustc-demangle/pull/29, but only for the
ASCII subset.  More complete Unicode support may come at a later time, but
right now I want to keep it simple.

Escapes that decode to ASCII control codes are considered invalid, as the
Rust compiler should never emit them, and to avoid any undesirable effects
from accidentally outputting a control code.

Additionally, the switch statements, which had one case for each
alphanumeric character, were replaced with if-else chains.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

2019-08-14  Eduard-Mihai Burtescu

libiberty/ChangeLog:
	* rust-demangle.c (unescape): Remove.
	(parse_lower_hex_nibble): New function.
	(parse_legacy_escape): New function.
	(is_prefixed_hash): Use parse_lower_hex_nibble.
	(looks_like_rust): Use parse_legacy_escape.
	(rust_demangle_sym): Use parse_legacy_escape.
	* testsuite/rust-demangle-expected: Add 'llv$u6d$' test.

diff --git a/libiberty/rust-demangle.c b/libiberty/rust-demangle.c
index 2302db45b6f..da591902db1 100644
--- a/libiberty/rust-demangle.c
+++ b/libiberty/rust-demangle.c
@@ -50,7 +50,7 @@ extern void *memset(void *s, int c, size_t n);
 
 #include "rust-demangle.h"
 
-/* Mangled Rust symbols look like this:
+/* Mangled (legacy) Rust symbols look like this:
      _$LT$std..sys..fd..FileDesc$u20$as$u20$core..ops..Drop$GT$::drop::hc68340e1baa4987a
 
    The original symbol is:
@@ -74,16 +74,7 @@ extern void *memset(void *s, int c, size_t n);
       ">" => $GT$
       "(" => $LP$
       ")" => $RP$
-      " " => $u20$
-      "\"" => $u22$
-      "'" => $u27$
-      "+" => $u2b$
-      ";" => $u3b$
-      "[" => $u5b$
-      "]" => $u5d$
-      "{" => $u7b$
-      "}" => $u7d$
-      "~" => $u7e$
+      "\u{XY}" => $uXY$
 
    A double ".." means "::" and a single "." means "-".
@@ -95,7 +86,8 @@ static const size_t hash_len = 16;
 
 static int is_prefixed_hash (const char *start);
 static int looks_like_rust (const char *sym, size_t len);
-static int unescape (const char **in, char **out, const char *seq, char value);
+static int parse_lower_hex_nibble (char nibble);
+static char parse_legacy_escape (const char **in);
 
 /* INPUT: sym: symbol that has been through C++ (gnu v3) demangling
@@ -149,7 +141,7 @@ is_prefixed_hash (const char *str)
   const char *end;
   char seen[16];
   size_t i;
-  int count;
+  int count, nibble;
 
   if (strncmp (str, hash_prefix, hash_prefix_len))
     return 0;
@@ -157,12 +149,12 @@ is_prefixed_hash (const char *str)
   memset (seen, 0, sizeof(seen));
   for (end = str + hash_len; str < end; str++)
-    if (*str >= '0' && *str <= '9')
-      seen[*str - '0'] = 1;
-    else if (*str >= 'a' && *str <= 'f')
-      seen[*str - 'a' + 10] = 1;
-    else
-      return 0;
+    {
+      nibble = parse_lower_hex_nibble (*str);
+      if (nibble < 0)
+        return 0;
+      seen[nibble] = 1;
+    }
 
   /* Count how many distinct digits seen */
   count = 0;
@@ -179,57 +171,17 @@ looks_like_rust (const char *str, size_t len)
   const char *end = str + len;
 
   while (str < end)
-    switch (*str)
-      {
-      case '$':
-	if (!strncmp (str, "$C$", 3))
-	  str += 3;
-	else if (!strncmp (str, "$SP$", 4)
-		 || !strncmp (str, "$BP$", 4)
-		 || !strncmp (str, "$RF$", 4)
-		 || !strncmp (str, "$LT$", 4)
-		 || !strncmp (str, "$GT$", 4)
-		 || !strncmp (str, "$LP$", 4)
-		 || !strncmp (str, "$RP$", 4))
-	  str += 4;
-	else if (!strncmp (str, "$u20$", 5)
-		 || !strncmp (str, "$u22$", 5)
-		 || !strncmp (str, "$u27$", 5)
-		 || !strncmp (str, "$u2b$", 5)
-		 || !strncmp (str, "$u3b$", 5)
-		 || !strncmp (str, "$u5b$", 5)
-		 || !strncmp (str, "$u5d$", 5)
-		 || !strncmp (str, "$u7b$", 5)
-		 || !strncmp (str, "$u7d$", 5)
-		 || !strncmp (str, "$u7e$", 5))
-	  str += 5;
-	else
-	  return 0;
-	break;
-      case '.':
-	/* Do not allow three or more consecutive dots */
-	if (!strncmp (str, "...", 3))
-	  return 0;
-	/* Fall through */
-      case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
-      case 'g': case 'h': case 'i': case 'j': case 'k': case 'l':
-      case 'm': case 'n': case 'o': case 'p': case 'q': case 'r':
-      case 's': case 't': case 'u': case 'v': case 'w': case 'x':
-      case 'y': case 'z':
-      case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
-      case 'G': case 'H': case 'I': case 'J': case 'K': case 'L':
-      case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R':
-      case 'S': c
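Since the hunk containing the new helper's body is cut off above, here is a
plausible reading of parse_lower_hex_nibble, inferred from its uses in
is_prefixed_hash (a negative return means "not a lowercase hex digit"); the
actual committed body may differ in detail.

  static int
  parse_lower_hex_nibble (char nibble)
  {
    /* Map '0'..'9' to 0..9 and 'a'..'f' to 10..15; anything else,
       including uppercase hex, is rejected.  */
    if ('0' <= nibble && nibble <= '9')
      return nibble - '0';
    if ('a' <= nibble && nibble <= 'f')
      return 0xa + (nibble - 'a');
    return -1;
  }

With the generalized '$u...$' handling, an escape such as 'llv$u6d$' decodes
0x6d to 'm', giving 'llvm', which matches the new testsuite entry in the
ChangeLog above.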
Re: types for VR_VARYING
On 8/13/19 6:39 PM, Aldy Hernandez wrote:
>
> On 8/12/19 7:46 PM, Jeff Law wrote:
>> On 8/12/19 12:43 PM, Aldy Hernandez wrote:
>>> This is a fresh re-post of:
>>>
>>> https://gcc.gnu.org/ml/gcc-patches/2019-07/msg6.html
>>>
>>> Andrew gave me some feedback a week ago, and I obviously don't remember
>>> what it was because I was about to leave on PTO.  However, I do remember
>>> I addressed his concerns before getting drunk on rum in tropical islands.
>>>
>> FWIW found a great coffee infused rum while in Kauai last week.  I'm not
>> a coffee fan, but it was wonderful.  The one bottle we brought back
>> isn't going to last until Cauldron and I don't think I can get a special
>> order filled before I leave :(
>
> You must bring some to Cauldron before we believe you. :)

That's the problem.  The nearest place I can get it is in Vegas and
there's no distributor in Montreal.  I can special order it in our state
run stores, but it won't be here in time.  Of course, I don't mind if you
don't believe me.  More for me in that case...

>> Is the supports_type_p stuff there to placate the calls from ipa-cp?  I
>> can live with it in the short term, but it really feels like there
>> should be something in the ipa-cp client that avoids this silliness.
>
> I am not happy with this either, but there are various places where
> statements that are !stmt_interesting_for_vrp() are still setting a
> range of VARYING, which is then being ignored at a later time.
>
> For example, vrp_initialize:
>
>   if (!stmt_interesting_for_vrp (phi))
>     {
>       tree lhs = PHI_RESULT (phi);
>       set_def_to_varying (lhs);
>       prop_set_simulate_again (phi, false);
>     }
>
> Also in evrp_range_analyzer::record_ranges_from_stmt(), where, if the
> statement is interesting for VRP but extract_range_from_stmt() does not
> produce a useful range, we also set a varying for a range we will never
> use.  Similarly for a statement that is not interesting in this hunk.

Ugh.  One could perhaps argue that setting any kind of range in these
circumstances is silly.  But I suspect it's necessary due to the
optimistic handling of VR_UNDEFINED in value_range_base::union_helper.
It's all coming back to me now...

> Then there is vrp_prop::visit_stmt() where we also set VARYING for types
> that VRP will never handle:
>
>   case IFN_ADD_OVERFLOW:
>   case IFN_SUB_OVERFLOW:
>   case IFN_MUL_OVERFLOW:
>   case IFN_ATOMIC_COMPARE_EXCHANGE:
>     /* These internal calls return _Complex integer type,
>        which VRP does not track, but the immediate uses
>        thereof might be interesting.  */
>     if (lhs && TREE_CODE (lhs) == SSA_NAME)
>       {
>         imm_use_iterator iter;
>         use_operand_p use_p;
>         enum ssa_prop_result res = SSA_PROP_VARYING;
>
>         set_def_to_varying (lhs);
>
> I've adjusted the patch so that set_def_to_varying will set the range to
> VR_UNDEFINED if !supports_type_p.  This is a fail safe, as we can't
> really do anything with a nonsensical range.  I just don't want to leave
> the range in an indeterminate state.

I think VR_UNDEFINED is unsafe due to value_range_base::union_helper.
And that's more general than this patch.  VR_UNDEFINED is _not_ a safe
range to set something to if we can't handle it.  We have to use
VR_VARYING.  Why?  See the beginning of value_range_base::union_helper:

  /* VR0 has the resulting range if VR1 is undefined or VR0 is varying.  */
  if (vr1->undefined_p ()
      || vr0->varying_p ())
    return *vr0;

  /* VR1 has the resulting range if VR0 is undefined or VR1 is varying.  */
  if (vr0->undefined_p ()
      || vr1->varying_p ())
    return *vr1;

This can get called for something like

  a = <cond> ? name1 : name2;

If name1 was set to VR_UNDEFINED thinking that VR_UNDEFINED was a safe
value for something we can't handle, then we'll incorrectly return the
range for name2.

VR_UNDEFINED can only be used for the ranges of objects we haven't
processed.  If we can't produce a range for an object because the
statement is something we don't handle or just doesn't produce anything
useful, then the right result is VR_VARYING.

This may be worth commenting at the definition site for VR_*.

> I also noticed that Andrew's patch was setting num_vr_values to
> num_ssa_names + num_ssa_names / 10.  I think he meant num_vr_values +
> num_vr_values / 10.  Please verify the current incantation makes sense.

Going to assume this will be adjusted per the other messages in this
thread.

> diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
> index 39ea22f0554..663dd6e2398 100644
> --- a/gcc/tree-ssa-threadedge.c
> +++ b/gcc/tree-ssa-threadedge.c
> @@ -182,8 +182,10 @@ record_temporary_equivalences_from_phis (edge e,
>  	new_vr->deep_copy (vr_values->get_value_range (src));
>        else if (TREE_CODE (src) == INTEGER_CST)
>  	new_vr->set (src);
> +      else if (value_range_base::supports_
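Jeff's point can be modeled in a few lines.  The sketch below mirrors the
union_helper rules quoted above, but it is only a model: the enum and
function mimic GCC's names without being the real value_range API.

  #include <stdio.h>

  enum vr_kind { VR_UNDEFINED, VR_RANGE, VR_VARYING };

  static enum vr_kind
  union_kind (enum vr_kind vr0, enum vr_kind vr1)
  {
    /* VR0 has the resulting range if VR1 is undefined or VR0 is varying.  */
    if (vr1 == VR_UNDEFINED || vr0 == VR_VARYING)
      return vr0;
    /* VR1 has the resulting range if VR0 is undefined or VR1 is varying.  */
    if (vr0 == VR_UNDEFINED || vr1 == VR_VARYING)
      return vr1;
    return VR_RANGE;  /* Both are real ranges; merge them.  */
  }

  int
  main (void)
  {
    /* name1 "unhandled" as UNDEFINED: the PHI result silently claims
       name2's range, dropping name1's unknown contribution.  */
    printf ("%d\n", union_kind (VR_UNDEFINED, VR_RANGE));  /* 1: wrong */

    /* name1 "unhandled" as VARYING: the PHI result is correctly VARYING.  */
    printf ("%d\n", union_kind (VR_VARYING, VR_RANGE));    /* 2: safe  */
    return 0;
  }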
[COMMITTED] Set memory alignment in expand_builtin_init_descriptor
Committed as r274487 with approval in
https://gcc.gnu.org/ml/gcc-patches/2019-08/msg00974.html

Index: gcc/builtins.c
===================================================================
--- gcc/builtins.c	(revision 274486)
+++ gcc/builtins.c	(revision 274487)
@@ -5756,6 +5756,7 @@ expand_builtin_init_descriptor (tree exp)
   r_descr = expand_normal (t_descr);
   m_descr = gen_rtx_MEM (BLKmode, r_descr);
   MEM_NOTRAP_P (m_descr) = 1;
+  set_mem_align (m_descr, GET_MODE_ALIGNMENT (ptr_mode));
 
   r_func = expand_normal (t_func);
   r_chain = expand_normal (t_chain);

Index: gcc/ChangeLog
===================================================================
--- gcc/ChangeLog	(revision 274486)
+++ gcc/ChangeLog	(revision 274487)
@@ -1,3 +1,7 @@
+2019-08-14  Bernd Edlinger
+
+	* builtins.c (expand_builtin_init_descriptor): Set memory alignment.
+
 2019-08-14  Martin Sebor
 
 	PR tree-optimization/91294

Thanks
Bernd.
Re: [PATCH 1/2] PR c++/91436 fix C++ dialect for std::make_unique fix-it hint
On Wed, 2019-08-14 at 16:53 +0100, Jonathan Wakely wrote:
> On 14/08/19 10:39 -0400, David Malcolm wrote:
> > On Wed, 2019-08-14 at 12:02 +0100, Jonathan Wakely wrote:
> > > On 13/08/19 16:07 -0400, Jason Merrill wrote:
> > > > On 8/13/19 9:32 AM, Jonathan Wakely wrote:
> > > > > 	* g++.dg/lookup/missing-std-include-6.C: Don't check
> > > > > 	make_unique in test that runs for C++11.
> > > >
> > > > I'm not comfortable removing this test coverage entirely.  Doesn't
> > > > it give a useful diagnostic in C++11 mode as well?
> > >
> > > It does:
> > >
> > > mu.cc:3:15: error: 'make_unique' is not a member of 'std'
> > >     3 |  auto p = std::make_unique<int>();
> > >       |               ^~~~~~~~~~~
> > > mu.cc:3:15: note: 'std::make_unique' is only available from C++14 onwards
> > > mu.cc:3:27: error: expected primary-expression before 'int'
> > >     3 |  auto p = std::make_unique<int>();
> > >       |                           ^~~
> > >
> > > So we can add it to g++.dg/lookup/missing-std-include-8.C instead,
> > > which runs for c++98_only and checks for the "is only available for"
> > > cases.  Here's a patch doing that.
> >
> > FWIW this eliminates the testing that when we do have C++14 onwards,
> > including <memory> is suggested.
> > Do we really care?
>
> Are we testing that *every* entry in the array gives the right answer
> for both missing-header and bad-std-option, or are we just testing a
> subset of them to be sure the logic works as expected?
>
> Because if we're testing every entry then:
>
> 1) we're missing LOTS of tests, and
>
> 2) we're just as likely to test the wrong thing and not actually catch
>    bugs (as was already happening for both make_unique and
>    complex_literals).
>
> > Maybe we need a C++14-onwards missing-std-include-* test, and to move
> > the existing test there?  (and to add the new test for before-C++-14)
>
> We could, but is it worth it?

Fair enough.

Dave
Re: enforce canonicalization of value_range's
On 8/13/19 6:51 PM, Aldy Hernandez wrote:
>> Presumably this was better than moving the implementation earlier.
>
> Actually, it was for ease of review.  I made some changes to the
> function, and I didn't want the reviewer to miss them because I had
> moved the function wholesale.  I can move the function earlier, after we
> agree on the changes (see below).

Either works for me.  I think there was an informal effort to avoid these
kinds of forward decls eons ago because our inliner sucked, but in the IPA
world order in the source file really shouldn't matter.

>> If we weren't on a path to kill VRP I'd probably suggest a distinct
>> effort to constify this code.  Some of the changes were a bit confusing
>> when it looked like we'd dropped a call to set the range of an object.
>> But those were just local copies, so setting the type/min/max directly
>> was actually fine.  constification would make this a bit clearer.  But
>> again, I don't think it's worth the effort given the long term
>> trajectory for tree-vrp.c.
>
> I shouldn't be introducing any new confusion.  Did I add any new methods
> that should've been const that aren't?  I can't see any.  I'm happy to
> fix anything I introduced.

IIRC we had an incoming range object passed by value, which we locally
modified and called the setter.  I spotted the dropped call to the setter
and was going to call it out as possibly broken.  But in investigating
further I realized the object was passed by value, so dropping the setter
wasn't really a problem.

The funny thing was we were doing this on source operands rather than the
destination operand.  Arguably the ranges for the source operands should
be constant, which would have flagged that code as fishy from its
inception, and I'm sure the code would have been restructured
appropriately and would have avoided the confusion.

So in summary, you didn't break anything.  It was a safe change you made,
but it wasn't immediately obvious it was safe.  If we had a constified
codebase the intent of the code would have been more obvious.

>> So where does the handle_pointers stuff matter?  I'm a bit surprised we
>> have to do anything special for them.
>
> I've learned to touch as little of VRP as is necessary, as changing
> anything to be more consistent breaks things in unexpected ways ;-).
>
> In this particular case, TYPE_MIN_VALUE and TYPE_MAX_VALUE are not
> defined for pointers, and I didn't want to change the meaning of
> vrp_val_{min,max} throughout.  I was trying to minimize the changes to
> existing behavior.  If it bothers you too much, we could remove it as a
> follow up when we are sure there are no expected side-effects from the
> rest of the patch.

I don't mind exploring this as a follow-up.  I guess that a min/max
doesn't really have significant meaning for pointers.  I think rather
than digging too deep into this, let's table it for now.  I think the
time to revisit will be as we work through removal of tree-vrp at some
point in the future.

>> OK.  I don't expect the answers to the minor questions above will
>> ultimately change anything.
>
> I could appreciate a final nod before I commit.  And even then, I will
> wait until the other patch is approved and commit them simultaneously.
> They are a bit intertwined.

I'm nodding :-)

jeff