Re: [PATCH] Add missing avx512fintrin.h intrinsics (PR target/89602)
On Thu, Mar 7, 2019 at 4:09 PM Jakub Jelinek wrote: > > On Thu, Mar 07, 2019 at 08:11:53AM +0100, Uros Bizjak wrote: > > > +(define_insn "*avx512f_load_mask" > > > + [(set (match_operand: 0 "register_operand" "=v") > > > + (vec_merge: > > > + (vec_merge: > > > + (vec_duplicate: > > > + (match_operand:MODEF 1 "memory_operand" "m")) > > > + (match_operand: 2 "nonimm_or_0_operand" "0C") > > > + (match_operand:QI 3 "nonmemory_operand" "Yk")) > > > > Is there a reason to have nonmemory_operand predicate here instead of > > register_operand? > > Thanks for catching that up, that was from my earlier attempt to have > Yk,n constraints and deal with that during output. For store it was > possible, for others there were some cases it couldn't handle but further > testing revealed that the combiner already handles most of the constant > mask cases right. > > Here is updated patch, I've changed this in two spots. It even improves the > constant 1 case (the only one that is still not optimized as much as it > should): > f4: > - movzbl .LC0(%rip), %eax > + movl$1, %eax > kmovw %eax, %k1 > vmovsd (%rsi), %xmm0{%k1}{z} > ret > Tested so far with make check-gcc RUNTESTFLAGS=i386.exp=avx512f-vmovs*.c > and compiling/eyeballing differences on the short testcase I've posted > in the description with also the u, -> 1, and u, -> 0, changes, appart > from the above f4 no differences. > > Ok for trunk if it passes another full bootstrap/regtest? > > 2019-03-07 Jakub Jelinek > > PR target/89602 > * config/i386/sse.md (avx512f_mov_mask, > *avx512f_load_mask, avx512f_store_mask): New define_insns. > (avx512f_load_mask): New define_expand. > * config/i386/i386-builtin.def (__builtin_ia32_loadsd_mask, > __builtin_ia32_loadss_mask, __builtin_ia32_storesd_mask, > __builtin_ia32_storess_mask, __builtin_ia32_movesd_mask, > __builtin_ia32_movess_mask): New builtins. > * config/i386/avx512fintrin.h (_mm_mask_load_ss, _mm_maskz_load_ss, > _mm_mask_load_sd, _mm_maskz_load_sd, _mm_mask_move_ss, > _mm_maskz_move_ss, _mm_mask_move_sd, _mm_maskz_move_sd, > _mm_mask_store_ss, _mm_mask_store_sd): New intrinsics. > This caused: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89630 This looks very strange since this patch only touched backend. -- H.J.
V5 [PATCH] Optimize vector constructor
On Thu, Mar 7, 2019 at 9:51 AM H.J. Lu wrote: > > On Wed, Mar 6, 2019 at 8:33 PM Richard Biener > wrote: > > > > On Wed, Mar 6, 2019 at 8:46 AM H.J. Lu wrote: > > > > > > On Tue, Mar 5, 2019 at 1:46 AM H.J. Lu wrote: > > > > > > > > On Mon, Mar 04, 2019 at 12:55:04PM +0100, Richard Biener wrote: > > > > > On Sun, Mar 3, 2019 at 10:13 PM H.J. Lu wrote: > > > > > > > > > > > > On Sun, Mar 03, 2019 at 06:40:09AM -0800, Andrew Pinski wrote: > > > > > > > ) > > > > > > > ,On Sun, Mar 3, 2019 at 6:32 AM H.J. Lu > > > > > > > wrote: > > > > > > > > > > > > > > > > For vector init constructor: > > > > > > > > > > > > > > > > --- > > > > > > > > typedef float __v4sf __attribute__ ((__vector_size__ (16))); > > > > > > > > > > > > > > > > __v4sf > > > > > > > > foo (__v4sf x, float f) > > > > > > > > { > > > > > > > > __v4sf y = { f, x[1], x[2], x[3] }; > > > > > > > > return y; > > > > > > > > } > > > > > > > > --- > > > > > > > > > > > > > > > > we can optimize vector init constructor with vector copy or > > > > > > > > permute > > > > > > > > followed by a single scalar insert: > > > > > > > > > and you want to advance to the _1 = BIT_INSERT_EXPR here. The > > > > > easiest way > > > > > is to emit a new stmt for _2 = copy ...; and do the set_rhs with the > > > > > BIT_INSERT_EXPR. > > > > > > > > Thanks for BIT_INSERT_EXPR suggestion. I am testing this patch. > > > > > > > > > > > > H.J. > > > > --- > > > > We can optimize vector constructor with vector copy or permute followed > > > > by a single scalar insert: > > > > > > > > __v4sf y; > > > > __v4sf D.1930; > > > > float _1; > > > > float _2; > > > > float _3; > > > > > > > >: > > > > _1 = BIT_FIELD_REF ; > > > > _2 = BIT_FIELD_REF ; > > > > _3 = BIT_FIELD_REF ; > > > > y_6 = {f_5(D), _3, _2, _1}; > > > > return y_6; > > > > > > > > with > > > > > > > > __v4sf y; > > > > __v4sf D.1930; > > > > float _1; > > > > float _2; > > > > float _3; > > > > vector(4) float _8; > > > > > > > >: > > > > _1 = BIT_FIELD_REF ; > > > > _2 = BIT_FIELD_REF ; > > > > _3 = BIT_FIELD_REF ; > > > > _8 = x_9(D); > > > > y_6 = BIT_INSERT_EXPR ; > > > > return y_6; > > > > > > > > gcc/ > > > > > > > > PR tree-optimization/88828 > > > > * tree-ssa-forwprop.c (simplify_vector_constructor): Optimize > > > > vector init constructor with vector copy or permute followed > > > > by a single scalar insert. > > > > > > > > gcc/testsuite/ > > > > > > > > PR tree-optimization/88828 > > > > * gcc.target/i386/pr88828-1a.c: New test. > > > > * gcc.target/i386/pr88828-2b.c: Likewise. > > > > * gcc.target/i386/pr88828-2.c: Likewise. > > > > * gcc.target/i386/pr88828-3a.c: Likewise. > > > > * gcc.target/i386/pr88828-3b.c: Likewise. > > > > * gcc.target/i386/pr88828-3c.c: Likewise. > > > > * gcc.target/i386/pr88828-3d.c: Likewise. > > > > * gcc.target/i386/pr88828-4a.c: Likewise. > > > > * gcc.target/i386/pr88828-4b.c: Likewise. > > > > * gcc.target/i386/pr88828-5a.c: Likewise. > > > > * gcc.target/i386/pr88828-5b.c: Likewise. > > > > * gcc.target/i386/pr88828-6a.c: Likewise. > > > > * gcc.target/i386/pr88828-6b.c: Likewise. > > > > > > Here is the updated patch with run-time tests. > > > > - if (TREE_CODE (elt->value) != SSA_NAME) > > + if (TREE_CODE (ce->value) != SSA_NAME) > > return false; > > > > hmm, so it doesn't allow { 0, v[1], v[2], v[3] }? I think the single > > scalar value can be a constant as well. > > Fixed. > > >if (!def_stmt) > > - return false; > > + { > > + if (gimple_nop_p (SSA_NAME_DEF_STMT (ce->value))) > > > > if (SSA_NAME_IS_DEFAULT_DEF (ce->value)) > > > > + { > > > > also you seem to disallow > > > > { i + 1, v[1], v[2], v[3] } > > Fixed by > > if (code != BIT_FIELD_REF) > { > /* Only allow one scalar insert. */ > if (nscalars != 0) > return false; > > nscalars = 1; > insert = true; > scalar_idx = i; > sel.quick_push (i); > scalar_element = ce->value; > continue; > } > > > because get_prop_source_stmt will return the definition computing > > i + 1 in this case and your code will be skipped? > > > > I think you can simplify the code by treating scalar_element != NULL > > as nscalars == 1 and eliding nscalars. > > It works only if > > TYPE_VECTOR_SUBPARTS (type).to_constant () == (nscalars + nvectors) > > We need to check both nscalars and nvectors. Elide nscalar > check doesn't help much here. > > > - if (conv_code == ERROR_MARK) > > + if (conv_code == ERROR_MARK && !insert) > > gimple_assign_set_rhs_with_ops (gsi, VEC_PERM_EXPR, orig[0], > > orig[1], op2); > >else > > @@ -2148,10 +2198,25 @@ simplify_vector_constructor (gimple_stmt_ite
Re: [PATCH] bring netbsd/arm support up to speed. eabi, etc.
Ping. Link for possible convenience :-) https://gcc.gnu.org/ml/gcc-patches/2019-02/msg01899.html
[PATCH, i386]: Add _mm_loadu_si64 and _mm_storeu_si64 intrinsics.
Attached patch adds missing _mm_loadu_si64 and _mm_storeu_si64 intrinsics. 2019-03-08 Uroš Bizjak PR target/68924 PR target/78782 PR target/87558 * config/i386/emmintrin.h (_mm_loadu_si64): New intrinsic. (_mm_storeu_si64): Ditto. testsuite/ChangeLog: 2019-03-08 Uroš Bizjak PR target/68924 PR target/78782 PR target/87558 * gcc.target/i386/pr78782.c: New test. * gcc.target/i386/pr87558.c: Ditto. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. HJ, are these changes OK? Uros. From 3441130f8c6115cada8fe0e235deb2860d4f0088 Mon Sep 17 00:00:00 2001 From: Uros Bizjak Date: Fri, 8 Mar 2019 10:55:36 +0100 Subject: [PATCH] x86: Add _mm_loadu_si64 and _mm_storeu_si64 intrinsics. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit 2019-03-08 Uroš Bizjak PR target/68924 PR target/78782 PR target/87558 * config/i386/emmintrin.h (_mm_loadu_si64): New intrinsic. (_mm_storeu_si64): Ditto. testsuite/ChangeLog: 2019-03-08 Uroš Bizjak PR target/68924 PR target/78782 PR target/87558 * gcc.target/i386/pr78782.c: New test. * gcc.target/i386/pr87558.c: Ditto. --- gcc/config/i386/emmintrin.h | 12 gcc/testsuite/gcc.target/i386/pr78782.c | 9 + gcc/testsuite/gcc.target/i386/pr87558.c | 9 + 3 files changed, 30 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr78782.c create mode 100644 gcc/testsuite/gcc.target/i386/pr87558.c diff --git a/gcc/config/i386/emmintrin.h b/gcc/config/i386/emmintrin.h index d9bc3f7f28f6..f9e7b33b0ddf 100644 --- a/gcc/config/i386/emmintrin.h +++ b/gcc/config/i386/emmintrin.h @@ -709,6 +709,12 @@ _mm_loadl_epi64 (__m128i_u const *__P) return _mm_set_epi64 ((__m64)0LL, *(__m64_u *)__P); } +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_loadu_si64 (void const *__P) +{ + return _mm_loadl_epi64 ((__m128i_u *)__P); +} + extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_store_si128 (__m128i *__P, __m128i __B) { @@ -727,6 +733,12 @@ _mm_storel_epi64 (__m128i_u *__P, __m128i __B) *(__m64_u *)__P = (__m64) ((__v2di)__B)[0]; } +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_storeu_si64 (void *__P, __m128i __B) +{ + _mm_storel_epi64 ((__m128i_u *)__P, __B); +} + extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_movepi64_pi64 (__m128i __B) { diff --git a/gcc/testsuite/gcc.target/i386/pr78782.c b/gcc/testsuite/gcc.target/i386/pr78782.c new file mode 100644 index ..e91d953dba92 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr78782.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -msse2" } */ + +#include + +__m128i foo (unsigned char *p) +{ + return _mm_loadu_si64 ((void *)p); +} diff --git a/gcc/testsuite/gcc.target/i386/pr87558.c b/gcc/testsuite/gcc.target/i386/pr87558.c new file mode 100644 index ..c210507d04ff --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr87558.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -msse2" } */ + +#include + +void foo (unsigned char *p, __m128i x) +{ + _mm_storeu_si64 ((void *)p, x); +} -- 2.20.1
Re: [PATCH] Fix up diagnostics in gimple-ssa-warn-alloca.c
On Thu, 7 Mar 2019, Jakub Jelinek wrote: > Hi! > > When looking at the diagnostics PRs, I've noticed that several diagnostic > calls > in gimple-ssa-warn-alloca.c use G_(...) uselessly, it is only needed if the > argument is not a string literal. > > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, > verified the messages are unmodified in gcc.pot, ok for trunk? OK. Richard. > 2019-03-07 Jakub Jelinek > > * gimple-ssa-warn-alloca.c (pass_walloca::execute): Don't wrap > warning_at or inform messages in G_() if there is no ?:. > > --- gcc/gimple-ssa-warn-alloca.c.jj 2019-01-01 12:37:18.193957952 +0100 > +++ gcc/gimple-ssa-warn-alloca.c 2019-03-07 16:43:30.308166042 +0100 > @@ -528,7 +528,7 @@ pass_walloca::execute (function *fun) > } > else if (warn_alloca) > { > - warning_at (loc, OPT_Walloca, G_("use of %")); > + warning_at (loc, OPT_Walloca, "use of %"); > continue; > } > else if (warn_alloca_limit < 0) > @@ -571,8 +571,8 @@ pass_walloca::execute (function *fun) > && t.limit != 0) > { > print_decu (t.limit, buff); > - inform (loc, G_("limit is %wu bytes, but argument " > - "may be as large as %s"), > + inform (loc, "limit is %wu bytes, but argument " > + "may be as large as %s", > is_vla ? warn_vla_limit : adjusted_alloca_limit, > buff); > } > @@ -588,7 +588,7 @@ pass_walloca::execute (function *fun) > && t.limit != 0) > { > print_decu (t.limit, buff); > - inform (loc, G_("limit is %wu bytes, but argument is %s"), > + inform (loc, "limit is %wu bytes, but argument is %s", > is_vla ? warn_vla_limit : adjusted_alloca_limit, > buff); > } > @@ -606,7 +606,7 @@ pass_walloca::execute (function *fun) > break; > case ALLOCA_IN_LOOP: > gcc_assert (!is_vla); > - warning_at (loc, wcode, G_("use of % within a loop")); > + warning_at (loc, wcode, "use of % within a loop"); > break; > case ALLOCA_CAST_FROM_SIGNED: > gcc_assert (invalid_casted_type != NULL_TREE); > > Jakub > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [PATCH] Only set TREE_NO_WARNING after warning if warning returned true (PR tree-optimization/89550)
On Thu, 7 Mar 2019, Jakub Jelinek wrote: > Hi! > > While looking at why we didn't warn before Richi's commit, I've discovered > is that the change is that in a different pass we used to call c_strlen too, > but at that point with a location from system headers, so no warning has > been printed, but TREE_NO_WARNING has been set anyway and later on when > c_strlen was called again with a location not from system header, we > wouldn't warn because TREE_NO_WARNING is set. > > Apparently there are quite a few spots that have similar bug (and quite a > few spots that do it right). > > The following patch fixes what I found. Bootstrapped/regtested on > x86_64-linux and i686-linux, ok for trunk? OK. Thanks, Richard. > 2019-03-07 Jakub Jelinek > > PR tree-optimization/89550 > * builtins.c (c_strlen): Only set TREE_NO_WARNING if warning_at > returned true. Formatting fixes. > (expand_builtin_strnlen): Formatting fixes. > * tree-vrp.c (vrp_prop::check_mem_ref): Only set TREE_NO_WARNING > if warning_at returned true. > * tree-cfg.c (pass_warn_function_return::execute): Likewise. > c-family/ > * c-common.c (c_common_truthvalue_conversion): Only set > TREE_NO_WARNING if warning_at returned true. > * c-warn.c (overflow_warning, warn_logical_operator): Likewise. > c/ > * c-decl.c (finish_function): Only set TREE_NO_WARNING if warning_at > returned true. > (c_write_global_declarations_1): Only set TREE_NO_WARNING if pedwarn > or warning returned true. > cp/ > * semantics.c (maybe_convert_cond): Only set TREE_NO_WARNING if > warning_at returned true. > * decl2.c (c_parse_final_cleanups): Likewise. > * typeck.c (convert_for_assignment): Likewise. > * decl.c (finish_function): Likewise. > > --- gcc/builtins.c.jj 2019-03-05 17:21:42.536993265 +0100 > +++ gcc/builtins.c2019-03-07 12:01:23.114479128 +0100 > @@ -760,15 +760,13 @@ c_strlen (tree src, int only_value, c_st > runtime. */ >if (eltoff < 0 || eltoff >= maxelts) > { > - /* Suppress multiple warnings for propagated constant strings. */ > + /* Suppress multiple warnings for propagated constant strings. */ >if (only_value != 2 > - && !TREE_NO_WARNING (src)) > -{ > - warning_at (loc, OPT_Warray_bounds, > - "offset %qwi outside bounds of constant string", > - eltoff); > - TREE_NO_WARNING (src) = 1; > -} > + && !TREE_NO_WARNING (src) > + && warning_at (loc, OPT_Warray_bounds, > + "offset %qwi outside bounds of constant string", > + eltoff)) > + TREE_NO_WARNING (src) = 1; >return NULL_TREE; > } > > @@ -3099,7 +3097,7 @@ expand_builtin_strnlen (tree exp, rtx ta >"%K%qD specified bound %E " >"exceeds maximum object size %E", >exp, func, bound, maxobjsize)) > - TREE_NO_WARNING (exp) = true; > + TREE_NO_WARNING (exp) = true; > >bool exact = true; >if (!len || TREE_CODE (len) != INTEGER_CST) > @@ -3158,7 +3156,7 @@ expand_builtin_strnlen (tree exp, rtx ta >"%K%qD specified bound [%wu, %wu] " >"exceeds maximum object size %E", >exp, func, min.to_uhwi (), max.to_uhwi (), maxobjsize)) > - TREE_NO_WARNING (exp) = true; > +TREE_NO_WARNING (exp) = true; > >bool exact = true; >if (!len || TREE_CODE (len) != INTEGER_CST) > --- gcc/tree-vrp.c.jj 2019-02-01 09:43:40.264757408 +0100 > +++ gcc/tree-vrp.c2019-03-07 12:05:05.959845856 +0100 > @@ -4749,7 +4749,8 @@ vrp_prop::check_mem_ref (location_t loca >if (warned && DECL_P (arg)) > inform (DECL_SOURCE_LOCATION (arg), "while referencing %qD", arg); > > - TREE_NO_WARNING (ref) = 1; > + if (warned) > + TREE_NO_WARNING (ref) = 1; >return; > } > > @@ -4762,11 +4763,10 @@ vrp_prop::check_mem_ref (location_t loca > { >HOST_WIDE_INT tmpidx = extrema[i].to_shwi () / eltsize.to_shwi (); > > - warning_at (location, OPT_Warray_bounds, > - "intermediate array offset %wi is outside array bounds " > - "of %qT", > - tmpidx, reftype); > - TREE_NO_WARNING (ref) = 1; > + if (warning_at (location, OPT_Warray_bounds, > + "intermediate array offset %wi is outside array bounds " > + "of %qT", tmpidx, reftype)) > + TREE_NO_WARNING (ref) = 1; > } > } > > --- gcc/tree-cfg.c.jj 2019-02-22 23:02:47.768118220 +0100 > +++ gcc/tree-cfg.c2019-03-07 12:02:52.662019138 +0100 > @@ -9328,9 +9328,9 @@ pass_warn_function_return::execute (func > location = gimple_location (last); > if (LOCATION_LOCUS (location) == UNKNOWN_LOCATION) > location = fun->function_end_locus; > - warning_
Re: [PATCH] Fix some cases of "whatever " on one line and " something" on the next one in diagnostics (PR other/80058)
On Thu, 7 Mar 2019, Jakub Jelinek wrote: > Hi! > > The following patch fixes a couple of cases where two or more spaces > are introduced in a middle of diagnostic message because we've split source > line and left a space both at the end of one line and at the start of next > one. > > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for > trunk? OK. Richard. > 2019-03-07 Jakub Jelinek > > PR other/80058 > * lra-constraints.c (process_alt_operands): Avoid one space before > " at the end of line and another after " on another line in a string > literal. > * attribs.c (handle_dll_attribute): Likewise. > * config/avr/avr-devices.c (avr_texinfo): Likewise. > cp/ > * parser.c (cp_parser_template_declaration_after_parameters): Avoid > one space before " at the end of line and another after " on another > line in a string literal. > fortran/ > * arith.c (gfc_complex2complex): Avoid two spaces in the middle of > diagnostics. > * resolve.c (resolve_allocate_expr): Likewise. > > --- gcc/lra-constraints.c.jj 2019-02-20 22:14:42.288643681 +0100 > +++ gcc/lra-constraints.c 2019-03-07 16:58:59.121039574 +0100 > @@ -2681,7 +2681,7 @@ process_alt_operands (int only_alternati > if (lra_dump_file != NULL) > fprintf (lra_dump_file, >"alt=%d: reload pseudo for op %d " > - " cannot hold the mode value -- refuse\n", > + "cannot hold the mode value -- refuse\n", >nalt, nop); > goto fail; > } > --- gcc/attribs.c.jj 2019-03-05 14:38:14.447414660 +0100 > +++ gcc/attribs.c 2019-03-07 16:57:36.600383417 +0100 > @@ -1664,7 +1664,7 @@ handle_dll_attribute (tree * pnode, tree > && DECL_DECLARED_INLINE_P (node)) > { > warning (OPT_Wattributes, "inline function %q+D declared as " > - " dllimport: attribute ignored", node); > + "dllimport: attribute ignored", node); > *no_add_attrs = true; > } >/* Like MS, treat definition of dllimported variables and > --- gcc/config/avr/avr-devices.c.jj 2019-01-01 12:37:28.987780853 +0100 > +++ gcc/config/avr/avr-devices.c 2019-03-07 17:07:31.992688170 +0100 > @@ -76,7 +76,7 @@ avr_texinfo[] = > "the @code{MOVW} instruction." }, >{ ARCH_AVR3, > "``Classic'' devices with 16@tie{}KiB up to 64@tie{}KiB of " > -" program memory." }, > +"program memory." }, >{ ARCH_AVR31, > "``Classic'' devices with 128@tie{}KiB of program memory." }, >{ ARCH_AVR35, > --- gcc/cp/parser.c.jj2019-03-07 10:06:49.0 +0100 > +++ gcc/cp/parser.c 2019-03-07 17:02:12.514890165 +0100 > @@ -27865,7 +27865,7 @@ cp_parser_template_declaration_after_par > if (cxx_dialect > cxx17) > error ("literal operator template %qD has invalid parameter list;" > " Expected non-type template parameter pack " > -" or single non-type parameter of class type", > +"or single non-type parameter of class type", > decl); > else > error ("literal operator template %qD has invalid parameter list." > --- gcc/fortran/arith.c.jj2019-02-25 10:12:55.454061762 +0100 > +++ gcc/fortran/arith.c 2019-03-07 17:06:09.267034995 +0100 > @@ -2472,7 +2472,7 @@ gfc_complex2complex (gfc_expr *src, int >int w = warn_conversion ? OPT_Wconversion : OPT_Wconversion_extra; > >gfc_warning_now (w, "Change of value in conversion from " > -" %qs to %qs at %L", > +"%qs to %qs at %L", > gfc_typename (&src->ts), gfc_typename (&result->ts), > &src->where); >did_warn = true; > --- gcc/fortran/resolve.c.jj 2019-03-04 10:22:33.985168769 +0100 > +++ gcc/fortran/resolve.c 2019-03-07 17:06:38.781554480 +0100 > @@ -7798,7 +7798,7 @@ resolve_allocate_expr (gfc_expr *e, gfc_ > if (mpz_cmp_si (ar->start[i]->value.integer, 1) < 0) > { > gfc_error ("Upper cobound is less than lower cobound " > -" of 1 at %L", &ar->start[i]->where); > +"of 1 at %L", &ar->start[i]->where); > goto failure; > } > } > > Jakub > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [PATCH, i386]: Add _mm_loadu_si64 and _mm_storeu_si64 intrinsics.
Looks good to me. Thanks. On Fri, Mar 8, 2019, 6:04 PM Uros Bizjak wrote: > Attached patch adds missing _mm_loadu_si64 and _mm_storeu_si64 intrinsics. > > 2019-03-08 Uroš Bizjak > > PR target/68924 > PR target/78782 > PR target/87558 > * config/i386/emmintrin.h (_mm_loadu_si64): New intrinsic. > (_mm_storeu_si64): Ditto. > > testsuite/ChangeLog: > > 2019-03-08 Uroš Bizjak > > PR target/68924 > PR target/78782 > PR target/87558 > * gcc.target/i386/pr78782.c: New test. > * gcc.target/i386/pr87558.c: Ditto. > > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. > > HJ, are these changes OK? > > Uros. >
Re: [C++ PATCH] Add -fconstexpr-loop-nest-limit= option (PR c++/87481)
On Thu, Mar 7, 2019 at 8:22 PM Jakub Jelinek wrote: > > Hi! > > We have -fconstexpr-loop-limit= option to have an upper bound for constexpr > evaluation of a single loop. Even with that limit in place, if we have > several nested loops during constexpr evaluation, even when each could have > a few hundred iterations, the whole loop nest could take years to evaluate. > And the loops can be split across several constexpr function calls even. > > The following patch adds another, slightly larger, limit on total number of > iterations in the whole loop nest. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Shouldn't we somehow limit the number of stmt evaluations instead? I can imagine using constexpr templates you can do "loops" easily without actually writing loops ... Richard. > 2019-03-07 Jakub Jelinek > > PR c++/87481 > * doc/invoke.texi (-fconstexpr-loop-nest-limit=): Document. > (-fconstexpr-loop-limit=): Slightly adjust wording. > > * c.opt (-fconstexpr-loop-nest-limit=): New option. > (-fconstexpr-loop-limit=): Slightly adjust description. > > * constexpr.c (cxx_eval_loop_expr): Count also number of iterations > of all nested loops and punt if it is above constexpr_loop_nest_limit. > > * g++.dg/cpp1y/constexpr-87481.C: New test. > > --- gcc/doc/invoke.texi.jj 2019-03-02 09:03:25.822755663 +0100 > +++ gcc/doc/invoke.texi 2019-03-07 14:25:34.881197149 +0100 > @@ -210,7 +210,7 @@ in the following sections. > @gccoptlist{-fabi-version=@var{n} -fno-access-control @gol > -faligned-new=@var{n} -fargs-in-order=@var{n} -fchar8_t -fcheck-new @gol > -fconstexpr-depth=@var{n} -fconstexpr-loop-limit=@var{n} @gol > --fno-elide-constructors @gol > +-fconstexpr-loop-nest-limit=@var{n} -fno-elide-constructors @gol > -fno-enforce-eh-specs @gol > -fno-gnu-keywords @gol > -fno-implicit-templates @gol > @@ -2522,10 +2522,19 @@ is 512. > > @item -fconstexpr-loop-limit=@var{n} > @opindex fconstexpr-loop-limit > -Set the maximum number of iterations for a loop in C++14 constexpr functions > -to @var{n}. A limit is needed to detect infinite loops during > +Set the maximum number of iterations for a single loop in C++14 constexpr > +functions to @var{n}. A limit is needed to detect infinite loops during > constant expression evaluation. The default is 262144 (1<<18). > > +@item -fconstexpr-loop-nest-limit=@var{n} > +@opindex fconstexpr-loop-nest-limit > +Set the maximum number of iterations for all loops in a loop nest in C++14 > +constexpr functions to @var{n}. Even when number of iterations of a single > +loop is limited with the above limit, if there are several nested loops and > +each of them has many iterations but still smaller than the above limit, > +the resulting evaluation of the loop nest might take too long. > +The default is 1048576 (1<<20). > + > @item -fdeduce-init-list > @opindex fdeduce-init-list > Enable deduction of a template type parameter as > --- gcc/c-family/c.opt.jj 2019-02-25 23:56:58.149055806 +0100 > +++ gcc/c-family/c.opt 2019-03-07 14:25:07.782639790 +0100 > @@ -1414,7 +1414,11 @@ C++ ObjC++ Joined RejectNegative UIntege > > fconstexpr-loop-limit= > C++ ObjC++ Joined RejectNegative UInteger Var(constexpr_loop_limit) > Init(262144) > --fconstexpr-loop-limit=Specify maximum constexpr loop > iteration count. > +-fconstexpr-loop-limit=Specify maximum constexpr loop > iteration count of a single loop. > + > +fconstexpr-loop-nest-limit= > +C++ ObjC++ Joined RejectNegative UInteger Var(constexpr_loop_nest_limit) > Init(1048576) > +-fconstexpr-loop-nest-limit= Specify maximum constexpr loop > iteration count for all nested loops. > > fdebug-cpp > C ObjC C++ ObjC++ > --- gcc/cp/constexpr.c.jj 2019-03-06 19:45:40.360751756 +0100 > +++ gcc/cp/constexpr.c 2019-03-07 14:03:00.040329107 +0100 > @@ -4190,8 +4190,20 @@ cxx_eval_loop_expr (const constexpr_ctx > default: >gcc_unreachable (); > } > + > + /* True on entry to cxx_eval_loop_expr if called indirectly from another > + cxx_eval_loop_expr. */ > + static bool constexpr_loop_nested; > + > + /* Number of evaluated loop iterations in the whole current loop nest. */ > + static int constexpr_loop_nest_count; > + > + bool save_constexpr_loop_nested = constexpr_loop_nested; > + constexpr_loop_nested = true; > + >hash_set save_exprs; >new_ctx.save_exprs = &save_exprs; > + >do > { >if (count != -1) > @@ -4248,6 +4260,24 @@ cxx_eval_loop_expr (const constexpr_ctx > *non_constant_p = true; > break; > } > + > + /* In nested loops, don't count the first iteration, as it has been > +counted in the parent loop already. */ > + if (count > (save_constexpr_loop_nested ? 1 : 0)) > + { > + if (++constexpr_loop_nest_count >= constexpr_loop_nest_limit) > + { > + if (!ctx->quiet) > +
Re: [PATCH] Two diagnostics fixes for ipa-devirt.c (PR ipa/80000, PR target/85665)
On Thu, Mar 7, 2019 at 8:32 PM Jakub Jelinek wrote: > > Hi! > > The following patch fixes two diagnostics issues in ipa-devirt.c, one > is trailing space in one warning, the other is lack of articles in another > warning, both reported by the translation team. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. RIchard. > 2019-03-07 Jakub Jelinek > > PR ipa/8 > * ipa-devirt.c (compare_virtual_tables): Remove two trailing spaces > from diagnostics. Formatting fixes. > > PR target/85665 > * ipa-devirt.c (odr_types_equivalent_p): Fix grammar in > warn_odr diagnostics. > > --- gcc/ipa-devirt.c.jj 2019-01-10 11:43:14.380377810 +0100 > +++ gcc/ipa-devirt.c2019-03-07 16:21:14.641937211 +0100 > @@ -842,17 +842,16 @@ compare_virtual_tables (varpool_node *pr > { > class_type->odr_violated = true; > auto_diagnostic_group d; > - if (warning_at (DECL_SOURCE_LOCATION > - (TYPE_NAME (DECL_CONTEXT (vtable->decl))), > - OPT_Wodr, > + tree ctx = TYPE_NAME (DECL_CONTEXT (vtable->decl)); > + if (warning_at (DECL_SOURCE_LOCATION (ctx), OPT_Wodr, > "virtual table of type %qD violates " > - "one definition rule ", > + "one definition rule", > DECL_CONTEXT (vtable->decl))) > { > - inform (DECL_SOURCE_LOCATION > - (TYPE_NAME (DECL_CONTEXT (prevailing->decl))), > - "the conflicting type defined in another > translation " > - "unit has virtual table of different size"); > + ctx = TYPE_NAME (DECL_CONTEXT (prevailing->decl)); > + inform (DECL_SOURCE_LOCATION (ctx), > + "the conflicting type defined in another > translation" > + " unit has virtual table of different size"); > } > } > return; > @@ -1607,7 +1606,8 @@ odr_types_equivalent_p (tree t1, tree t2 > if (DECL_BIT_FIELD (f1) != DECL_BIT_FIELD (f2)) > { > warn_odr (t1, t2, f1, f2, warn, warned, > - G_("one field is bitfield while other is not")); > + G_("one field is a bitfield while the other " > +"is not")); > return false; > } > else > > Jakub
Re: [PATCH] Fix a config/s390/s390.c diagnostics bug (PR target/79846)
On Thu, Mar 7, 2019 at 8:53 PM Jakub Jelinek wrote: > > Hi! > > As mentioned in the PR, using HOST_WIDE_INT_PRINT_* in the middle of > translatable message is highly undesirable, we end up with: > #: config/s390/s390.c:737 > #, gcc-internal-format > msgid "constant argument %d for builtin %qF is out of range (0.." > msgstr "" > > #: config/s390/s390.c:754 > #, gcc-internal-format > msgid "constant argument %d for builtin %qF is out of range (" > msgstr "" > in gcc.pot that way and nothing is translated. > > The following patch should fix that by using proper %wu/%wd. > Tested by building a cross-compiler to s390x-linux, ok for trunk? OK. Richard. > 2019-03-07 Jakub Jelinek > > PR target/79846 > * config/s390/s390.c (s390_const_operand_ok): Use %wu instead of > HOST_WIDE_INT_PRINT_UNSIGNED and %wd instead of > HOST_WIDE_INT_PRINT_DEC. Formatting fixes. > > --- gcc/config/s390/s390.c.jj 2019-02-18 20:48:32.873728534 +0100 > +++ gcc/config/s390/s390.c 2019-03-07 18:13:44.757949114 +0100 > @@ -734,10 +734,9 @@ s390_const_operand_ok (tree arg, int arg >if (!tree_fits_uhwi_p (arg) > || tree_to_uhwi (arg) > (HOST_WIDE_INT_1U << bitwidth) - 1) > { > - error("constant argument %d for builtin %qF is out of range (0.." > - HOST_WIDE_INT_PRINT_UNSIGNED ")", > - argnum, decl, > - (HOST_WIDE_INT_1U << bitwidth) - 1); > + error ("constant argument %d for builtin %qF is out of range " > +"(0..%wu)", argnum, decl, > +(HOST_WIDE_INT_1U << bitwidth) - 1); > return false; > } > } > @@ -751,12 +750,10 @@ s390_const_operand_ok (tree arg, int arg > || tree_to_shwi (arg) < -(HOST_WIDE_INT_1 << (bitwidth - 1)) > || tree_to_shwi (arg) > ((HOST_WIDE_INT_1 << (bitwidth - 1)) - 1)) > { > - error("constant argument %d for builtin %qF is out of range (" > - HOST_WIDE_INT_PRINT_DEC ".." > - HOST_WIDE_INT_PRINT_DEC ")", > - argnum, decl, > - -(HOST_WIDE_INT_1 << (bitwidth - 1)), > - (HOST_WIDE_INT_1 << (bitwidth - 1)) - 1); > + error ("constant argument %d for builtin %qF is out of range " > +"(%wd..%wd)", argnum, decl, > +-(HOST_WIDE_INT_1 << (bitwidth - 1)), > +(HOST_WIDE_INT_1 << (bitwidth - 1)) - 1); > return false; > } > } > > Jakub
Re: [PATCH] rs6000: Fix lost ud chains in swap optimization
On Fri, Mar 8, 2019 at 1:34 AM Bill Schmidt wrote: > > Hi, > > We recently discovered a problem in swap optimization where the du- and > ud-chains > were getting corrupted after a preliminary modification phase and prior to the > main body of the pass. The fix for this is to rebuild the chains between > phases. It looks expensive - is it possible to keep them up-to-date instead? > Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions. > I've not included a test case because the problem tends to get lost in > reduction, > and may shift over time anyway. Is this okay for trunk, and eventual backport > to 8 and 7? > > Thanks! > > Bill > > > 2019-03-07 Bill Schmidt > > * config/rs6000/rs6000-p8swap.c (rs6000_analyze_swaps): Rebuild > ud- and du-chains between phases. > > > Index: gcc/config/rs6000/rs6000-p8swap.c > === > --- gcc/config/rs6000/rs6000-p8swap.c (revision 269471) > +++ gcc/config/rs6000/rs6000-p8swap.c (working copy) > @@ -2316,7 +2316,14 @@ rs6000_analyze_swaps (function *fun) > >/* Pre-pass to recombine lvx and stvx patterns so we don't lose info. */ >recombine_lvx_stvx_patterns (fun); > + > + /* Rebuild ud- and du-chains. */ > + df_remove_problem (df_chain); >df_process_deferred_rescans (); > + df_set_flags (DF_RD_PRUNE_DEAD_DEFS); > + df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN); > + df_analyze (); > + df_set_flags (DF_DEFER_INSN_RESCAN); > >/* Allocate structure to represent webs of insns. */ >insn_entry = XCNEWVEC (swap_web_entry, get_max_uid ()); >
Re: [PATCH] PR88497 - Extend reassoc for vector bit_field_ref
On Fri, Mar 8, 2019 at 2:40 AM Kewen.Lin wrote: > > Hi, > > As PR88497 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88497), > when we meet some code pattern like: > >V1[0] + V1[1] + ... + V1[k] + V2[0] + ... + V2[k] + ... Vn[k] >// V1...Vn of VECTOR_TYPE > > We can teach reassoc to transform it to: > >Vs = (V1 + V2 + ... + Vn) >Vs[0] + Vs[1] + ... + Vs[k] > > It saves addition and bit_field_ref operations and exposes more > opportunities for downstream passes, I notice that even on one > target doesn't support vector type and vector type gets expanded > in veclower, it's still better to have it, since the generated > sequence is more friendly for widening_mul. (If one more time > DCE after forwprop, it should be the same. ) > > Bootstrapped/regtested on powerpc64le-linux-gnu, ok for trunk? Hmm. You are basically vectorizing the reduction partially here (note basic-block vectorization doesn't handle reductions yet). So I'm not sure whether we should eventually just change op sort order to sort v1[1] after v2[0] to sort v1[0] and v2[0] together. That would be also maybe an intermediate step that makes implementing the vectorization part "easier"? I also imagine that the "vectorization" would be profitable even if there's just two elements of the vectors summed and the real vectors are larger. Note that the code you transform contains no vector operations (just element extracts) and thus you need to check for target support before creating vector add. This is for GCC 10. Also it doesn't fix PR88497, does it? Richard. > Thanks in advance! > > > gcc/ChangeLog > > 2019-03-08 Kewen Lin > > PR target/88497 > * tree-ssa-reassoc.c (reassociate_bb): Swap the positions of > GIMPLE_BINARY_RHS check and gimple_visited_p check, call new > function undistribute_bitref_for_vector. > (undistribute_bitref_for_vector): New function. > (cleanup_vinfo_map): Likewise. > (unsigned_cmp): Likewise. > > gcc/testsuite/ChangeLog > > 2019-03-08 Kewen Lin > > * gcc.dg/tree-ssa/pr88497.c: New test. > > --- > gcc/testsuite/gcc.dg/tree-ssa/pr88497.c | 18 +++ > gcc/tree-ssa-reassoc.c | 274 > +++- > 2 files changed, 287 insertions(+), 5 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497.c > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497.c > b/gcc/testsuite/gcc.dg/tree-ssa/pr88497.c > new file mode 100644 > index 000..4d9ac67 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497.c > @@ -0,0 +1,18 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -ffast-math -fdump-tree-reassoc1" } */ > +typedef double v2df __attribute__ ((vector_size (16))); > +double > +test (double accumulator, v2df arg1[], v2df arg2[]) > +{ > + v2df temp; > + temp = arg1[0] * arg2[0]; > + accumulator += temp[0] + temp[1]; > + temp = arg1[1] * arg2[1]; > + accumulator += temp[0] + temp[1]; > + temp = arg1[2] * arg2[2]; > + accumulator += temp[0] + temp[1]; > + temp = arg1[3] * arg2[3]; > + accumulator += temp[0] + temp[1]; > + return accumulator; > +} > +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 2 "reassoc1" } } */ > diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c > index e1c4dfe..fc0e297 100644 > --- a/gcc/tree-ssa-reassoc.c > +++ b/gcc/tree-ssa-reassoc.c > @@ -1772,6 +1772,263 @@ undistribute_ops_list (enum tree_code opcode, >return changed; > } > > +/* Hold the information of one specific VECTOR_TYPE SSA_NAME. > +- offsets: for different BIT_FIELD_REF offsets accessing same VECTOR. > +- ops_indexes: the index of vec ops* for each relavant BIT_FIELD_REF. */ > +struct v_info > +{ > + auto_vec offsets; > + auto_vec ops_indexes; > +}; > + > +typedef struct v_info *v_info_ptr; > + > +/* Comparison function for qsort on unsigned BIT_FIELD_REF offsets. */ > +static int > +unsigned_cmp (const void *p_i, const void *p_j) > +{ > + if (*(const unsigned *) p_i >= *(const unsigned *) p_j) > +return 1; > + else > +return -1; > +} > + > +/* Cleanup hash map for VECTOR information. */ > +static void > +cleanup_vinfo_map (hash_map &info_map) > +{ > + for (hash_map::iterator it = info_map.begin (); > + it != info_map.end (); ++it) > +{ > + v_info_ptr info = (*it).second; > + delete info; > + (*it).second = NULL; > +} > +} > + > +/* Perform un-distribution of BIT_FIELD_REF on VECTOR_TYPE. > + V1[0] + V1[1] + ... + V1[k] + V2[0] + V2[1] + ... + V2[k] + ... Vn[k] > + is transformed to > + Vs = (V1 + V2 + ... + Vn) > + Vs[0] + Vs[1] + ... + Vs[k] > + > + The basic steps are listed below: > + > +1) Check the addition chain *OPS by looking those summands coming from > + VECTOR bit_field_ref on VECTOR type. Put the information into > + v_info_map for each satisfied summand, using VECTOR SSA_NAME as key. > + > +2) For each key (VECTOR SSA_NAME), validate all its BIT_FIELD_REFs ar
Re: V5 [PATCH] Optimize vector constructor
On Fri, Mar 8, 2019 at 9:49 AM H.J. Lu wrote: > > On Thu, Mar 7, 2019 at 9:51 AM H.J. Lu wrote: > > > > On Wed, Mar 6, 2019 at 8:33 PM Richard Biener > > wrote: > > > > > > On Wed, Mar 6, 2019 at 8:46 AM H.J. Lu wrote: > > > > > > > > On Tue, Mar 5, 2019 at 1:46 AM H.J. Lu wrote: > > > > > > > > > > On Mon, Mar 04, 2019 at 12:55:04PM +0100, Richard Biener wrote: > > > > > > On Sun, Mar 3, 2019 at 10:13 PM H.J. Lu wrote: > > > > > > > > > > > > > > On Sun, Mar 03, 2019 at 06:40:09AM -0800, Andrew Pinski wrote: > > > > > > > > ) > > > > > > > > ,On Sun, Mar 3, 2019 at 6:32 AM H.J. Lu > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > For vector init constructor: > > > > > > > > > > > > > > > > > > --- > > > > > > > > > typedef float __v4sf __attribute__ ((__vector_size__ (16))); > > > > > > > > > > > > > > > > > > __v4sf > > > > > > > > > foo (__v4sf x, float f) > > > > > > > > > { > > > > > > > > > __v4sf y = { f, x[1], x[2], x[3] }; > > > > > > > > > return y; > > > > > > > > > } > > > > > > > > > --- > > > > > > > > > > > > > > > > > > we can optimize vector init constructor with vector copy or > > > > > > > > > permute > > > > > > > > > followed by a single scalar insert: > > > > > > > > > > > and you want to advance to the _1 = BIT_INSERT_EXPR here. The > > > > > > easiest way > > > > > > is to emit a new stmt for _2 = copy ...; and do the set_rhs with the > > > > > > BIT_INSERT_EXPR. > > > > > > > > > > Thanks for BIT_INSERT_EXPR suggestion. I am testing this patch. > > > > > > > > > > > > > > > H.J. > > > > > --- > > > > > We can optimize vector constructor with vector copy or permute > > > > > followed > > > > > by a single scalar insert: > > > > > > > > > > __v4sf y; > > > > > __v4sf D.1930; > > > > > float _1; > > > > > float _2; > > > > > float _3; > > > > > > > > > >: > > > > > _1 = BIT_FIELD_REF ; > > > > > _2 = BIT_FIELD_REF ; > > > > > _3 = BIT_FIELD_REF ; > > > > > y_6 = {f_5(D), _3, _2, _1}; > > > > > return y_6; > > > > > > > > > > with > > > > > > > > > > __v4sf y; > > > > > __v4sf D.1930; > > > > > float _1; > > > > > float _2; > > > > > float _3; > > > > > vector(4) float _8; > > > > > > > > > >: > > > > > _1 = BIT_FIELD_REF ; > > > > > _2 = BIT_FIELD_REF ; > > > > > _3 = BIT_FIELD_REF ; > > > > > _8 = x_9(D); > > > > > y_6 = BIT_INSERT_EXPR ; > > > > > return y_6; > > > > > > > > > > gcc/ > > > > > > > > > > PR tree-optimization/88828 > > > > > * tree-ssa-forwprop.c (simplify_vector_constructor): Optimize > > > > > vector init constructor with vector copy or permute followed > > > > > by a single scalar insert. > > > > > > > > > > gcc/testsuite/ > > > > > > > > > > PR tree-optimization/88828 > > > > > * gcc.target/i386/pr88828-1a.c: New test. > > > > > * gcc.target/i386/pr88828-2b.c: Likewise. > > > > > * gcc.target/i386/pr88828-2.c: Likewise. > > > > > * gcc.target/i386/pr88828-3a.c: Likewise. > > > > > * gcc.target/i386/pr88828-3b.c: Likewise. > > > > > * gcc.target/i386/pr88828-3c.c: Likewise. > > > > > * gcc.target/i386/pr88828-3d.c: Likewise. > > > > > * gcc.target/i386/pr88828-4a.c: Likewise. > > > > > * gcc.target/i386/pr88828-4b.c: Likewise. > > > > > * gcc.target/i386/pr88828-5a.c: Likewise. > > > > > * gcc.target/i386/pr88828-5b.c: Likewise. > > > > > * gcc.target/i386/pr88828-6a.c: Likewise. > > > > > * gcc.target/i386/pr88828-6b.c: Likewise. > > > > > > > > Here is the updated patch with run-time tests. > > > > > > - if (TREE_CODE (elt->value) != SSA_NAME) > > > + if (TREE_CODE (ce->value) != SSA_NAME) > > > return false; > > > > > > hmm, so it doesn't allow { 0, v[1], v[2], v[3] }? I think the single > > > scalar value can be a constant as well. > > > > Fixed. > > > > >if (!def_stmt) > > > - return false; > > > + { > > > + if (gimple_nop_p (SSA_NAME_DEF_STMT (ce->value))) > > > > > > if (SSA_NAME_IS_DEFAULT_DEF (ce->value)) > > > > > > + { > > > > > > also you seem to disallow > > > > > > { i + 1, v[1], v[2], v[3] } > > > > Fixed by > > > > if (code != BIT_FIELD_REF) > > { > > /* Only allow one scalar insert. */ > > if (nscalars != 0) > > return false; > > > > nscalars = 1; > > insert = true; > > scalar_idx = i; > > sel.quick_push (i); > > scalar_element = ce->value; > > continue; > > } > > > > > because get_prop_source_stmt will return the definition computing > > > i + 1 in this case and your code will be skipped? > > > > > > I think you can simplify the code by treating scalar_element != NULL > > > as nscalars == 1 and eliding nscalars. > > > > It works only if > > > > TYPE_VECTOR_SUBPARTS (type).to_constant () == (nscalars + nvectors) > > > > We need to check both nscalars and nvecto
[PATCH] Restrict IPA split (PR go/63560).
Hi. This is quite old hunk that Honza forgot to install: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63560#c1 Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Ready to be installed? Or should I wait for next stage1? Thanks, Martin gcc/ChangeLog: 2019-03-08 Jan Hubicka PR go/63560 * ipa-split.c (execute_split_functions): Do not split 'noinline' or 'section' function. --- gcc/ipa-split.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/gcc/ipa-split.c b/gcc/ipa-split.c index cfd03abb07d..5eaf8257f98 100644 --- a/gcc/ipa-split.c +++ b/gcc/ipa-split.c @@ -104,6 +104,7 @@ along with GCC; see the file COPYING3. If not see #include "gimple-pretty-print.h" #include "ipa-fnsummary.h" #include "cfgloop.h" +#include "attribs.h" /* Per basic block info. */ @@ -1751,6 +1752,20 @@ execute_split_functions (void) return 0; } + if (lookup_attribute ("noinline", DECL_ATTRIBUTES (current_function_decl))) +{ + if (dump_file) + fprintf (dump_file, "Not splitting: function is noinline.\n"); + return 0; +} + if (lookup_attribute ("section", DECL_ATTRIBUTES (current_function_decl))) +{ + if (dump_file) + fprintf (dump_file, "Not splitting: function is in user defined " + "section.\n"); + return 0; +} + /* We enforce splitting after loop headers when profile info is not available. */ if (profile_status_for_fn (cfun) != PROFILE_READ)
Re: [C++ PATCH] Add -fconstexpr-loop-nest-limit= option (PR c++/87481)
On Fri, Mar 08, 2019 at 11:36:36AM +0100, Richard Biener wrote: > Shouldn't we somehow limit the number of stmt evaluations instead? > I can imagine using constexpr templates you can do "loops" easily > without actually writing loops ... Maybe. The question is what exactly should we count. We could count only in the cxx_eval_statement_list loop individual non-debug statements, but that would miss statements that aren't in STATEMENT_LISTs. Or we could count number of cxx_eval_constant_expression calls (perhaps after the initial if (jump_target && *jump_target) skipping) within a single cxx_eval_outermost_constant_expr, but we couldn't call that number of statements even when it would be much more accurate on how long actually something will take to evaluate, because one can have a single statement with hundreds of thousands of operations in it. So -fconstexpr-ops-limit= ? Probably should use Host_Wide_Int in that case rather than UInteger though, so users can allow more than 4G ops. Looking at what clang++ does, they have -fconstexpr-steps= limit which is declared to count statements with a quite low default 1048576 and it rejects already constexpr unsigned foo () { unsigned int r = 0; for (int i = 0; i < 65536; i++) for (int j = 0; j < 14; j++) r += (i + j); return r; } constexpr auto x = foo (); (but not 13 instead of 14). 13 * 65536 is 851968, so that would be count of r += (i + j); statements, plus the inner for as whole probably 65536 times and with 14 * 65536 + 65536 we are still below the limit, so it must count something else, but e.g. can't count j++ already. > > 2019-03-07 Jakub Jelinek > > > > PR c++/87481 > > * doc/invoke.texi (-fconstexpr-loop-nest-limit=): Document. > > (-fconstexpr-loop-limit=): Slightly adjust wording. > > > > * c.opt (-fconstexpr-loop-nest-limit=): New option. > > (-fconstexpr-loop-limit=): Slightly adjust description. > > > > * constexpr.c (cxx_eval_loop_expr): Count also number of iterations > > of all nested loops and punt if it is above > > constexpr_loop_nest_limit. > > > > * g++.dg/cpp1y/constexpr-87481.C: New test. Jakub
Re: [C++ PATCH] Add -fconstexpr-loop-nest-limit= option (PR c++/87481)
On Fri, Mar 08, 2019 at 12:25:31PM +0100, Jakub Jelinek wrote: > Maybe. The question is what exactly should we count. We could count only > in the cxx_eval_statement_list loop individual non-debug statements, > but that would miss statements that aren't in STATEMENT_LISTs. > Or we could count number of cxx_eval_constant_expression calls > (perhaps after the initial if (jump_target && *jump_target) skipping) > within a single cxx_eval_outermost_constant_expr, but we couldn't call that > number of statements even when it would be much more accurate on how long > actually something will take to evaluate, because one can have a single > statement with hundreds of thousands of operations in it. > So -fconstexpr-ops-limit= ? Probably should use Host_Wide_Int in that case > rather than UInteger though, so users can allow more than 4G ops. Or, if we wanted to count statements, perhaps count just specific subset of tree codes in cxx_eval_statement_list, like *_STMT, RETURN_EXPR, SWITCH_EXPR, anything else that must be a statement rather than some subexpression and is usually not wrapped with EXPR_STMT? Or count both statements and ops and have two limits. With the ops limit I'm afraid even if we have quite high default, it might trigger if people use huge multi-megabyte constexpr arrays or similar. Jakub
Re: [PATCH] Restrict IPA split (PR go/63560).
> Hi. > > This is quite old hunk that Honza forgot to install: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63560#c1 > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > Ready to be installed? Or should I wait for next stage1? > > Thanks, > Martin > > gcc/ChangeLog: > > 2019-03-08 Jan Hubicka > > PR go/63560 > * ipa-split.c (execute_split_functions): Do not split > 'noinline' or 'section' function. It is OK now. It only makes us more conservative and moving code out of user sections may lead to surprises. Honza
[PATCH] Fix -Wenum-compare-switch warning in i386.c.
A small patch that deals with: gcc/config/i386/i386.c:39427:11:Semantic Issue: comparison of two values with different enumeration types in switch statement ('enum built_in_function' and 'ix86_builtins'): -Wenum-compare-switch Is it fine to install it? Thanks, Martin gcc/ChangeLog: 2019-03-08 Martin Liska * config/i386/i386.c (ix86_builtin_reciprocal): Cast DECL_FUNCTION_CODE into ix86_builtins enum before the switch statement. --- gcc/config/i386/i386.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 2d6a993238b..f170180304b 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -39424,7 +39424,9 @@ use_rsqrt_p () static tree ix86_builtin_reciprocal (tree fndecl) { - switch (DECL_FUNCTION_CODE (fndecl)) + enum ix86_builtins fn_code += (enum ix86_builtins) DECL_FUNCTION_CODE (fndecl); + switch (fn_code) { /* Vectorized version of sqrt to rsqrt conversion. */ case IX86_BUILTIN_SQRTPS_NR:
Re: [PATCH] rs6000: Fix lost ud chains in swap optimization
On 3/8/19 4:40 AM, Richard Biener wrote: > On Fri, Mar 8, 2019 at 1:34 AM Bill Schmidt wrote: >> Hi, >> >> We recently discovered a problem in swap optimization where the du- and >> ud-chains >> were getting corrupted after a preliminary modification phase and prior to >> the >> main body of the pass. The fix for this is to rebuild the chains between >> phases. > It looks expensive - is it possible to keep them up-to-date instead? That's what I've been doing up till now. There appears to be a problem with rescanning and the DF_DU_CHAIN and DF_UD_CHAIN dataflow problems. Whether I use df_process_deferred_rescans or df_insn_rescan_all, I see a problem where now and then a use-def chain gets lost. As I looked into it further, I found this comment on df_insn_rescan_all: /* Rescan all of the insns in the function. Note that the artificial uses and defs are not touched. This function will destroy def-use or use-def chains. */ So if that's accurate, it appears that the rescan machinery is not compatible with du-/ud-chains, which is consistent with my experience. This is the second time we've encountered this issue. It doesn't come up often, but when it does, it's a real head-scratcher to debug. If an expert in the df code thinks the above comment is erroneous, then there is at least a subtle bug somewhere in the df code that would be good to fix. But seeing the comment, I felt warned off completely from rescan. This at least allows us to fix the problem that the OpenCV community found. Thanks, Bill > >> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions. >> I've not included a test case because the problem tends to get lost in >> reduction, >> and may shift over time anyway. Is this okay for trunk, and eventual >> backport >> to 8 and 7? >> >> Thanks! >> >> Bill >> >> >> 2019-03-07 Bill Schmidt >> >> * config/rs6000/rs6000-p8swap.c (rs6000_analyze_swaps): Rebuild >> ud- and du-chains between phases. >> >> >> Index: gcc/config/rs6000/rs6000-p8swap.c >> === >> --- gcc/config/rs6000/rs6000-p8swap.c (revision 269471) >> +++ gcc/config/rs6000/rs6000-p8swap.c (working copy) >> @@ -2316,7 +2316,14 @@ rs6000_analyze_swaps (function *fun) >> >>/* Pre-pass to recombine lvx and stvx patterns so we don't lose info. */ >>recombine_lvx_stvx_patterns (fun); >> + >> + /* Rebuild ud- and du-chains. */ >> + df_remove_problem (df_chain); >>df_process_deferred_rescans (); >> + df_set_flags (DF_RD_PRUNE_DEAD_DEFS); >> + df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN); >> + df_analyze (); >> + df_set_flags (DF_DEFER_INSN_RESCAN); >> >>/* Allocate structure to represent webs of insns. */ >>insn_entry = XCNEWVEC (swap_web_entry, get_max_uid ()); >>
Re: [PATCH] x86: Disable jump tables when retpolines are used (PR target/86952).
> Hi. > > Thanks to Intel guys, we've done some re-measurement in PR86952 > about usage of jump tables when retpolines are used. > Numbers prove that disabling of JT should be the best for now. > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > > Ready to be installed? OK, thanks! I wonder if there is some threshold for extremely large jumptables where branchy sequence will loose, but I think it is better to disable them than what we have right now. Honza
Re: [C++ PATCH] PR c++/88123 - lambda and using-directive.
On 3/7/19 9:54 PM, Jason Merrill wrote: For named function calls in a template, the result of unqualified lookup is safed in CALL_EXPR_FN. But for operator expressions, no unqualified lookup is performed until we know whether the operands have class type. So when we see in a lambda a use of an operator that might be overloaded, we need to do that lookup then and save it away somewhere. One possibility would be in the expression, but we can't really add extra conditional operands to standard tree codes. I mostly implemented another approach using a new WITH_LOOKUP_EXPR code, but teaching everywhere how to handle a new tree code is always complicated. Then it occurred to me that we could associate the lookups with the function, which is both simpler and smaller. So this patch stores any operator bindings needed by a lambda function in an internal attribute on the lambda call operator. Tested x86_64-pc-linux-gnu, applying to trunk. Nathan, does slipping these bindings into the sk_function_parms binding level this way make sense to you? oo sneaky, I like it. Perhaps a comment explaining the surprising scope these are being pushed into? nathan -- Nathan Sidwell
Re: [PATCH] x86: Disable jump tables when retpolines are used (PR target/86952).
On 3/8/19 1:44 PM, Jan Hubicka wrote: >> Hi. >> >> Thanks to Intel guys, we've done some re-measurement in PR86952 >> about usage of jump tables when retpolines are used. >> Numbers prove that disabling of JT should be the best for now. >> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests. >> >> Ready to be installed? > > OK, thanks! > I wonder if there is some threshold for extremely large jumptables where > branchy sequence will loose, but I think it is better to disable them > than what we have right now. I tested switch statements up to 4096 and it was still slower ;) Martin > > Honza >
Re: [C++ PATCH] Add -fconstexpr-loop-nest-limit= option (PR c++/87481)
On Fri, Mar 8, 2019 at 12:45 PM Jakub Jelinek wrote: > > On Fri, Mar 08, 2019 at 12:25:31PM +0100, Jakub Jelinek wrote: > > Maybe. The question is what exactly should we count. We could count only > > in the cxx_eval_statement_list loop individual non-debug statements, > > but that would miss statements that aren't in STATEMENT_LISTs. > > Or we could count number of cxx_eval_constant_expression calls > > (perhaps after the initial if (jump_target && *jump_target) skipping) > > within a single cxx_eval_outermost_constant_expr, but we couldn't call that > > number of statements even when it would be much more accurate on how long > > actually something will take to evaluate, because one can have a single > > statement with hundreds of thousands of operations in it. > > So -fconstexpr-ops-limit= ? Probably should use Host_Wide_Int in that case > > rather than UInteger though, so users can allow more than 4G ops. > > Or, if we wanted to count statements, perhaps count just specific subset > of tree codes in cxx_eval_statement_list, like *_STMT, RETURN_EXPR, > SWITCH_EXPR, anything else that must be a statement rather than some > subexpression and is usually not wrapped with EXPR_STMT? > > Or count both statements and ops and have two limits. > > With the ops limit I'm afraid even if we have quite high default, it might > trigger if people use huge multi-megabyte constexpr arrays or similar. Well, I think having only a single limit is desirable which means we have to count something resembling the overall work done. We don't have to document that it matches "statements" (whatever that exactly would be). I don't see multi-megabyte constexpr arrays as an issue - do you think we'd reject them already when parsing them? We could always decide to elide counting things we didn't do any real work on (like a literal '1' initializer). Richard. > Jakub
Re: [PATCH] rs6000: Fix lost ud chains in swap optimization
On Fri, Mar 8, 2019 at 1:35 PM Bill Schmidt wrote: > > On 3/8/19 4:40 AM, Richard Biener wrote: > > On Fri, Mar 8, 2019 at 1:34 AM Bill Schmidt wrote: > > Hi, > > We recently discovered a problem in swap optimization where the du- and > ud-chains > were getting corrupted after a preliminary modification phase and prior to the > main body of the pass. The fix for this is to rebuild the chains between > phases. > > It looks expensive - is it possible to keep them up-to-date instead? > > That's what I've been doing up till now. There appears to be a problem with > rescanning and the DF_DU_CHAIN and DF_UD_CHAIN dataflow problems. Whether I > use df_process_deferred_rescans or df_insn_rescan_all, I see a problem where > now and then a use-def chain gets lost. > > As I looked into it further, I found this comment on df_insn_rescan_all: > > /* Rescan all of the insns in the function. Note that the artificial >uses and defs are not touched. This function will destroy def-use >or use-def chains. */ > > So if that's accurate, it appears that the rescan machinery is not compatible > with du-/ud-chains, which is consistent with my experience. This is the > second time we've encountered this issue. It doesn't come up often, but > when it does, it's a real head-scratcher to debug. > > If an expert in the df code thinks the above comment is erroneous, then there > is at least a subtle bug somewhere in the df code that would be good to fix. > But seeing the comment, I felt warned off completely from rescan. This at > least allows us to fix the problem that the OpenCV community found. OK, I see - I'm not very familiar with DF. > Thanks, > Bill > > Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions. > I've not included a test case because the problem tends to get lost in > reduction, > and may shift over time anyway. Is this okay for trunk, and eventual backport > to 8 and 7? > > Thanks! > > Bill > > > 2019-03-07 Bill Schmidt > > * config/rs6000/rs6000-p8swap.c (rs6000_analyze_swaps): Rebuild > ud- and du-chains between phases. > > > Index: gcc/config/rs6000/rs6000-p8swap.c > === > --- gcc/config/rs6000/rs6000-p8swap.c (revision 269471) > +++ gcc/config/rs6000/rs6000-p8swap.c (working copy) > @@ -2316,7 +2316,14 @@ rs6000_analyze_swaps (function *fun) > >/* Pre-pass to recombine lvx and stvx patterns so we don't lose info. */ >recombine_lvx_stvx_patterns (fun); > + > + /* Rebuild ud- and du-chains. */ > + df_remove_problem (df_chain); >df_process_deferred_rescans (); > + df_set_flags (DF_RD_PRUNE_DEAD_DEFS); > + df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN); > + df_analyze (); > + df_set_flags (DF_DEFER_INSN_RESCAN); > >/* Allocate structure to represent webs of insns. */ >insn_entry = XCNEWVEC (swap_web_entry, get_max_uid ()); > >
[PATCH] Add fixed underlying type to enum path::format
* include/bits/fs_path.h (path::format): Add fixed underlying type. Tested powerpc64le-linux, committed to trunk. commit 9bb37d0b7196d4a97d65ff597af884e2fa4ebb52 Author: Jonathan Wakely Date: Thu Mar 7 15:00:56 2019 + Add fixed underlying type to enum path::format * include/bits/fs_path.h (path::format): Add fixed underlying type. diff --git a/libstdc++-v3/include/bits/fs_path.h b/libstdc++-v3/include/bits/fs_path.h index 077045e6c78..96033f68c36 100644 --- a/libstdc++-v3/include/bits/fs_path.h +++ b/libstdc++-v3/include/bits/fs_path.h @@ -166,7 +166,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 #endif typedef std::basic_string string_type; -enum format { native_format, generic_format, auto_format }; +enum format : unsigned char { native_format, generic_format, auto_format }; // constructors and destructor
[PATCH] Fix text of hyperlink in manual
* doc/xml/manual/using.xml: Use link element instead of xref. * doc/html/*: Regenerate. Committed to trunk. commit 756ccc4dcdcc318d811b67e027202251d7dac57f Author: Jonathan Wakely Date: Fri Mar 8 13:38:31 2019 + Fix text of hyperlink in manual * doc/xml/manual/using.xml: Use link element instead of xref. * doc/html/*: Regenerate. diff --git a/libstdc++-v3/doc/xml/manual/using.xml b/libstdc++-v3/doc/xml/manual/using.xml index 2d44a739406..7647e9b8dad 100644 --- a/libstdc++-v3/doc/xml/manual/using.xml +++ b/libstdc++-v3/doc/xml/manual/using.xml @@ -1192,8 +1192,8 @@ g++ -Winvalid-pch -I. -include stdc++.h -H -g -O2 hello.cc -o test.exe enabled for std::vector> and only when std::allocator is derived from -new_allocator -or malloc_allocator. The annotations +new_allocator +or malloc_allocator. The annotations must be present on all vector operations or none, so this macro must be defined to the same value for all translation units that create, destroy or modify vectors.
Re: [PATCH] x86: Disable jump tables when retpolines are used (PR target/86952).
> On 3/8/19 1:44 PM, Jan Hubicka wrote: > >> Hi. > >> > >> Thanks to Intel guys, we've done some re-measurement in PR86952 > >> about usage of jump tables when retpolines are used. > >> Numbers prove that disabling of JT should be the best for now. > >> > >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > >> > >> Ready to be installed? > > > > OK, thanks! > > I wonder if there is some threshold for extremely large jumptables where > > branchy sequence will loose, but I think it is better to disable them > > than what we have right now. > > I tested switch statements up to 4096 and it was still slower ;) Well, we have switch statements with this many cases in insn-attrtab/latencytab tables. I suppose what kind of code you compile with retpolines, but I would expect even bigger switch statements to appear in real code (large DFA automatons and such) Honza > > Martin > > > > > Honza > > >
[PATCH] Add tests for resolved PR (PR c/85870).
Hi. I'm going to install Jakub's test-case for a PR that's fixed on trunk. Martin gcc/testsuite/ChangeLog: 2019-03-08 Jakub Jelinek PR c/85870 * gcc.dg/lto/pr85870_0.c: New test. * gcc.dg/lto/pr85870_1.c: New test. --- gcc/testsuite/gcc.dg/lto/pr85870_0.c | 34 gcc/testsuite/gcc.dg/lto/pr85870_1.c | 27 ++ 2 files changed, 61 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/lto/pr85870_0.c create mode 100644 gcc/testsuite/gcc.dg/lto/pr85870_1.c diff --git a/gcc/testsuite/gcc.dg/lto/pr85870_0.c b/gcc/testsuite/gcc.dg/lto/pr85870_0.c new file mode 100644 index 000..4b59f607ebd --- /dev/null +++ b/gcc/testsuite/gcc.dg/lto/pr85870_0.c @@ -0,0 +1,34 @@ +/* PR c/85870 */ +/* { dg-lto-do link } */ +/* { dg-lto-options { { -flto -O2 } } } */ + +typedef struct abc_s { + char a1; + short a2; + unsigned int a3; + unsigned int a4; +} abc; + +typedef struct xyz_s { + unsigned x1; + unsigned x2; + abc *x3; +} xyz; + +extern xyz XYZ[3]; +static const abc Arr1[]={ + {0,0,0xdeadbeaf,0xbeefdead} , +#line 1040 +{0,0,0xdeadbeaf,0xbeefdead} }; + +void init_xyz_0() { + XYZ[0].x1=975753; + XYZ[0].x2=1024; + XYZ[0].x3=(abc *)Arr1; + +} + +int +main () +{ +} diff --git a/gcc/testsuite/gcc.dg/lto/pr85870_1.c b/gcc/testsuite/gcc.dg/lto/pr85870_1.c new file mode 100644 index 000..cd1cd310164 --- /dev/null +++ b/gcc/testsuite/gcc.dg/lto/pr85870_1.c @@ -0,0 +1,27 @@ +typedef struct abc_s { + char a1; + short a2; + unsigned int a3; + unsigned int a4; +} abc; + + +typedef struct xyz_s { + unsigned int x1; + unsigned int x2; + abc *x3; +} xyz; + + +extern xyz XYZ[3]; +static const abc Arr2[]={ + {0,0,0xbeafdead,0xdeadbeef} , +#line 1048594 + {0,0,0xbeafdead,0xdeadbeef} }; + +void init_xyz_1() { + XYZ[1].x1=425753; + XYZ[1].x2=1048576; + XYZ[1].x3=(abc *)Arr2; + +}
Re: [PATCH] x86: Disable jump tables when retpolines are used (PR target/86952).
On 3/8/19 2:58 PM, Jan Hubicka wrote: >> On 3/8/19 1:44 PM, Jan Hubicka wrote: Hi. Thanks to Intel guys, we've done some re-measurement in PR86952 about usage of jump tables when retpolines are used. Numbers prove that disabling of JT should be the best for now. Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Ready to be installed? >>> >>> OK, thanks! >>> I wonder if there is some threshold for extremely large jumptables where >>> branchy sequence will loose, but I think it is better to disable them >>> than what we have right now. >> >> I tested switch statements up to 4096 and it was still slower ;) > > Well, we have switch statements with this many cases in > insn-attrtab/latencytab tables. I suppose what kind of code you compile > with retpolines, but I would expect even bigger switch statements to > appear in real code (large DFA automatons and such) Question is whether you'll meet such huge switch statements in Linux kernel? Martin > > Honza >> >> Martin >> >>> >>> Honza >>> >>
Re: [PATCH] x86: Disable jump tables when retpolines are used (PR target/86952).
On Fri, Mar 08, 2019 at 02:50:26PM +0100, Martin Liška wrote: > On 3/8/19 1:44 PM, Jan Hubicka wrote: > >> Hi. > >> > >> Thanks to Intel guys, we've done some re-measurement in PR86952 > >> about usage of jump tables when retpolines are used. > >> Numbers prove that disabling of JT should be the best for now. > >> > >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > >> > >> Ready to be installed? > > > > OK, thanks! > > I wonder if there is some threshold for extremely large jumptables where > > branchy sequence will loose, but I think it is better to disable them > > than what we have right now. > > I tested switch statements up to 4096 and it was still slower ;) Try one with 1000 of entries ;) and also compare code size and data segment size. Jakub
[PATCH] Cleanup TODO handling for CFG clenaup & SSA update
There's an old comment /* When cleanup_tree_cfg merges consecutive blocks, it may perform some simplistic propagation when removing single valued PHI nodes. This propagation may, in turn, cause the SSA form to become out-of-date (see PR 22037). So, even if the parent pass had not scheduled an SSA update, we may still need to do one. */ if (!(flags & TODO_update_ssa_any) && need_ssa_update_p (cfun)) flags |= TODO_update_ssa; which is from times we've had multiple virtual operands. After those went away we could still run into this for example when propagating a non-const function address into an indirect call through a const function type. This has been fixed as well (we retain the const-ness of the call). Thus the above is no longer necessary and we can simplify the code. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. I'm not really nervous about this change but if you think it should wait for GCC 10 speak up. Richard. 2019-03-08 Richard Biener * passes.c (execute_function_todo): Remove dead code. Index: gcc/passes.c === --- gcc/passes.c(revision 269302) +++ gcc/passes.c(working copy) @@ -1924,26 +1924,12 @@ execute_function_todo (function *fn, voi push_cfun (fn); - /* Always cleanup the CFG before trying to update SSA. */ + /* If we need to cleanup the CFG let it perform a needed SSA update. */ if (flags & TODO_cleanup_cfg) -{ - cleanup_tree_cfg (flags & TODO_update_ssa_any); - - /* When cleanup_tree_cfg merges consecutive blocks, it may -perform some simplistic propagation when removing single -valued PHI nodes. This propagation may, in turn, cause the -SSA form to become out-of-date (see PR 22037). So, even -if the parent pass had not scheduled an SSA update, we may -still need to do one. */ - if (!(flags & TODO_update_ssa_any) && need_ssa_update_p (cfun)) - flags |= TODO_update_ssa; -} - - if (flags & TODO_update_ssa_any) -{ - unsigned update_flags = flags & TODO_update_ssa_any; - update_ssa (update_flags); -} +cleanup_tree_cfg (flags & TODO_update_ssa_any); + else if (flags & TODO_update_ssa_any) +update_ssa (flags & TODO_update_ssa_any); + gcc_assert (!need_ssa_update_p (fn)); if (flag_tree_pta && (flags & TODO_rebuild_alias)) compute_may_aliases ();
Re: [PATCH] Cleanup TODO handling for CFG clenaup & SSA update
On 3/8/19 7:23 AM, Richard Biener wrote: > > There's an old comment > > /* When cleanup_tree_cfg merges consecutive blocks, it may > perform some simplistic propagation when removing single > valued PHI nodes. This propagation may, in turn, cause the > SSA form to become out-of-date (see PR 22037). So, even > if the parent pass had not scheduled an SSA update, we may > still need to do one. */ > if (!(flags & TODO_update_ssa_any) && need_ssa_update_p (cfun)) > flags |= TODO_update_ssa; > > which is from times we've had multiple virtual operands. After > those went away we could still run into this for example when > propagating a non-const function address into an indirect call > through a const function type. This has been fixed as well > (we retain the const-ness of the call). Thus the above is > no longer necessary and we can simplify the code. > > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. > > I'm not really nervous about this change but if you think it > should wait for GCC 10 speak up. > > Richard. > > 2019-03-08 Richard Biener > > * passes.c (execute_function_todo): Remove dead code. What's driving the desire to change this for gcc-9? I think it's a fine cleanup for gcc-10, but it's not clear to me we want to push it into gcc-9. jeff
Re: [C++ PATCH] PR c++/88123 - lambda and using-directive.
On Fri, Mar 8, 2019 at 8:04 AM Nathan Sidwell wrote: > > On 3/7/19 9:54 PM, Jason Merrill wrote: > > For named function calls in a template, the result of unqualified lookup is > > safed in CALL_EXPR_FN. But for operator expressions, no unqualified lookup > > is performed until we know whether the operands have class type. So when we > > see in a lambda a use of an operator that might be overloaded, we need to do > > that lookup then and save it away somewhere. One possibility would be in > > the expression, but we can't really add extra conditional operands to > > standard tree codes. I mostly implemented another approach using a new > > WITH_LOOKUP_EXPR code, but teaching everywhere how to handle a new tree code > > is always complicated. Then it occurred to me that we could associate the > > lookups with the function, which is both simpler and smaller. So this patch > > stores any operator bindings needed by a lambda function in an internal > > attribute on the lambda call operator. > > > > Tested x86_64-pc-linux-gnu, applying to trunk. Nathan, does slipping these > > bindings into the sk_function_parms binding level this way make sense to > > you? > > oo sneaky, I like it. Perhaps a comment explaining the surprising scope > these are being pushed into? The comment for push_operator_bindings already says "push the bindings we saved away in maybe_save_op_lookup into the function parameter binding level", are you thinking of something else/more detailed? Jason
Re: [PATCH] Cleanup TODO handling for CFG clenaup & SSA update
On March 8, 2019 3:52:36 PM GMT+01:00, Jeff Law wrote: >On 3/8/19 7:23 AM, Richard Biener wrote: >> >> There's an old comment >> >> /* When cleanup_tree_cfg merges consecutive blocks, it may >> perform some simplistic propagation when removing single >> valued PHI nodes. This propagation may, in turn, cause the >> SSA form to become out-of-date (see PR 22037). So, even >> if the parent pass had not scheduled an SSA update, we may >> still need to do one. */ >> if (!(flags & TODO_update_ssa_any) && need_ssa_update_p (cfun)) >> flags |= TODO_update_ssa; >> >> which is from times we've had multiple virtual operands. After >> those went away we could still run into this for example when >> propagating a non-const function address into an indirect call >> through a const function type. This has been fixed as well >> (we retain the const-ness of the call). Thus the above is >> no longer necessary and we can simplify the code. >> >> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. >> >> I'm not really nervous about this change but if you think it >> should wait for GCC 10 speak up. >> >> Richard. >> >> 2019-03-08 Richard Biener >> >> * passes.c (execute_function_todo): Remove dead code. >What's driving the desire to change this for gcc-9? I think it's a >fine >cleanup for gcc-10, but it's not clear to me we want to push it into >gcc-9. Just that I came along this with the previous related CFG cleanup fix and got the time to test it. Queued for GCC 10 instead. Richard. >jeff
Re: [C++ PATCH] PR c++/88123 - lambda and using-directive.
On 3/8/19 10:30 AM, Jason Merrill wrote: The comment for push_operator_bindings already says "push the bindings we saved away in maybe_save_op_lookup into the function parameter binding level", are you thinking of something else/more detailed? good enough, thanks -- Nathan Sidwell
Re: [PATCH] PR88497 - Extend reassoc for vector bit_field_ref
Hi Kewen, Just one quick note: On Fri, Mar 08, 2019 at 09:40:30AM +0800, Kewen.Lin wrote: > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497.c > @@ -0,0 +1,18 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -ffast-math -fdump-tree-reassoc1" } */ > +typedef double v2df __attribute__ ((vector_size (16))); > +double > +test (double accumulator, v2df arg1[], v2df arg2[]) > +{ > + v2df temp; > + temp = arg1[0] * arg2[0]; > + accumulator += temp[0] + temp[1]; > + temp = arg1[1] * arg2[1]; > + accumulator += temp[0] + temp[1]; > + temp = arg1[2] * arg2[2]; > + accumulator += temp[0] + temp[1]; > + temp = arg1[3] * arg2[3]; > + accumulator += temp[0] + temp[1]; > + return accumulator; > +} > +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 2 "reassoc1" } } */ Please write a word or two about what that test is testing? This helps a lot when a test starts failing. Segher
Re: [PATCH] rs6000: Fix lost ud chains in swap optimization
Hi Bill, On Thu, Mar 07, 2019 at 06:33:48PM -0600, Bill Schmidt wrote: > We recently discovered a problem in swap optimization where the du- and > ud-chains > were getting corrupted after a preliminary modification phase and prior to the > main body of the pass. The fix for this is to rebuild the chains between > phases. > > Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions. > I've not included a test case because the problem tends to get lost in > reduction, > and may shift over time anyway. Is this okay for trunk, and eventual backport > to 8 and 7? Like Richard I think this is expensive. But I don't think you can do better currently, so, okay for trunk and backports. Thanks! Segher
[PATCH, rs6000] PR87532: Bad Results from vec_extract(unsigned char, foo) dependent upon function inline
This problem report, though initially motivated by differences in behavior between constant and non-constant selector arguments, uncovered a number of inconsistencies in the implementation of vec_extract. This patch provides several fixes to make handling of constant selector expressions the same as the handling of non-constant selector expressions. In the process of testing, it was observed that certain existing regression tests were looking for the wrong instructions to be emitted and those tests have been updated. This has bootstrapped and tested without regressions on powerpc64le-unknown-linux (both P8 and P9) and on powerpc-linux (P7 big-endian, with both -m32 and -m64 target options). Is this ok for trunk? gcc/ChangeLog: 2019-03-08 Kelvin Nilsen PR target/87532 * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin): When handling vec-extract, use modular arithmetic to allow constant selectors greater than vector length. * config/rs6000/rs6000.c (rs6000_expand_vector_extract): Allow V1TImode vectors to have constant selector values greater than 0. Use modular arithmetic to compute vector index. (rs6000_split_vec_extract_var): Use modular arithmetic to compute index for in-memory vectors. Correct code generation for in-register vectors. (altivec_expand_vec_ext_builtin): Use modular arithmetic to compute index. gcc/testsuite/ChangeLog: 2019-03-08 Kelvin Nilsen PR target/87532 * gcc.target/powerpc/vsx-builtin-10a.c: New test. * gcc.target/powerpc/vsx-builtin-20a.c: New test. * gcc.target/powerpc/vsx-builtin-11b.c: New test. * gcc.target/powerpc/vsx-builtin-9b.c: New test. * gcc.target/powerpc/vsx-builtin-12a.c: New test. * gcc.target/powerpc/vsx-builtin-13b.c: New test. * gcc.target/powerpc/vsx-builtin-14a.c: New test. * gcc.target/powerpc/vec-extract-v16qiu-v2a.c: New test. * gcc.target/powerpc/vsx-builtin-15b.c: New test. * gcc.target/powerpc/vsx-builtin-16a.c: New test. * gcc.target/powerpc/vsx-builtin-17b.c: New test. * gcc.target/powerpc/vsx-builtin-18a.c: New test. * gcc.target/powerpc/pr87532-mc.c: New test. * gcc.target/powerpc/vsx-builtin-19b.c: New test. * gcc.target/powerpc/vsx-builtin-10b.c: New test. * gcc.target/powerpc/vsx-builtin-11a.c: New test. * gcc.target/powerpc/vsx-builtin-9a.c: New test. * gcc.target/powerpc/vsx-builtin-20b.c: New test. * gcc.target/powerpc/vsx-builtin-12b.c: New test. * gcc.target/powerpc/vsx-builtin-13a.c: New test. * gcc.target/powerpc/vsx-builtin-14b.c: New test. * gcc.target/powerpc/vsx-builtin-15a.c: New test. * gcc.target/powerpc/vec-extract-v16qiu-v2b.c: New test. * gcc.target/powerpc/pr87532.c: New test. * gcc.target/powerpc/vsx-builtin-16b.c: New test. * gcc.target/powerpc/vec-extract-v16qiu-v2.h: New test. * gcc.target/powerpc/vsx-builtin-17a.c: New test. * gcc.target/powerpc/vsx-builtin-18b.c: New test. * gcc.target/powerpc/vsx-builtin-19a.c: New test. * gcc.target/powerpc/fold-vec-extract-char.p8.c: Modify expected instruction selection. * gcc.target/powerpc/fold-vec-extract-short.p8.c: Likewise. * gcc.target/powerpc/fold-vec-extract-int.p8.c: Likewise. Index: gcc/testsuite/gcc.target/powerpc/vsx-builtin-10a.c === --- gcc/testsuite/gcc.target/powerpc/vsx-builtin-10a.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/vsx-builtin-10a.c (revision 0) @@ -0,0 +1,157 @@ +/* { dg-do run { target { powerpc*-*-* } } } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ +/* { dg-require-effective-target dfp_hw } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { } } */ +/* { dg-options "-maltivec" } */ + +/* This test should run the same on any target that supports altivec/dfp + instructions. Intentionally not specifying cpu in order to test + all code generation paths. */ + +#include + +extern void abort (void); + +#define CONST0 (0) +#define CONST1 (1) +#define CONST2 (2) +#define CONST3 (3) +#define CONST4 (4) +#define CONST5 (5) +#define CONST6 (6) +#define CONST7 (7) + + +/* Test that indices > length of vector are applied modulo the vector + length. */ + +/* Test for vector residing in register. */ +short s3 (vector short v) +{ + return __builtin_vec_ext_v8hi (v, 3); +} + +short s7 (vector short v) +{ + return __builtin_vec_ext_v8hi (v, 7); +} + +short s21 (vector short v) +{ + return __builtin_vec_ext_v8hi (v, 21); +} + +short s30 (vector short v) +{ + return __builtin_vec_ext_v8hi (v, 30); +} + +/* Test for vector residing in memory. */ +short ms3 (vector short *vp) +{ + return __builtin_v
Re: [PATCH 6/7 v2] i386.c: make "sorry" message more amenable to translation (PR target/79926)
On Fri, Mar 10, 2017 at 09:21:53PM -0500, David Malcolm wrote: > Martin pointed out that the wording of these messages could be improved > by adding an article, and by adding quotes to "no_caller_saved_registers" > > Here's a revised version that makes those changes (in addition to > the ones you suggested). > > Successfully bootstrapped®rtested on x86_64-pc-linux-gnu. > > OK for trunk? > > gcc/ChangeLog: > PR target/79926 > * config/i386/i386.c (ix86_set_current_function): Make "sorry" > messages more amenable to translation, and improve wording. > > gcc/testsuite/ChangeLog: > PR target/79926 > * gcc.target/i386/interrupt-387-err-1.c: Update expected message. > * gcc.target/i386/interrupt-387-err-2.c: Likewise. > * gcc.target/i386/interrupt-bnd-err-1.c: Likewise. > * gcc.target/i386/interrupt-bnd-err-2.c: Likewise. > * gcc.target/i386/interrupt-mmx-err-1.c: Likewise. > * gcc.target/i386/interrupt-mmx-err-2.c: Likewise. Sorry for the delay, I can't find this mail in my mailbox (grabbed it from the archives). This is ok for trunk. > --- > gcc/config/i386/i386.c | 13 - > gcc/testsuite/gcc.target/i386/interrupt-387-err-1.c | 4 ++-- > gcc/testsuite/gcc.target/i386/interrupt-387-err-2.c | 2 +- > gcc/testsuite/gcc.target/i386/interrupt-bnd-err-1.c | 4 ++-- > gcc/testsuite/gcc.target/i386/interrupt-bnd-err-2.c | 2 +- > gcc/testsuite/gcc.target/i386/interrupt-mmx-err-1.c | 4 ++-- > gcc/testsuite/gcc.target/i386/interrupt-mmx-err-2.c | 2 +- > 7 files changed, 17 insertions(+), 14 deletions(-) > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index e705a3e..9fbf8d0 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -7271,12 +7271,15 @@ ix86_set_current_function (tree fndecl) >if (isa != NULL) > { > if (cfun->machine->func_type != TYPE_NORMAL) > - sorry ("%s instructions aren't allowed in %s service routine", > -isa, (cfun->machine->func_type == TYPE_EXCEPTION > - ? "exception" : "interrupt")); > + sorry (cfun->machine->func_type == TYPE_EXCEPTION > +? G_("%s instructions aren%'t allowed in an" > + " exception service routine") > +: G_("%s instructions aren%'t allowed in an" > + " interrupt service routine"), > +isa); > else > - sorry ("%s instructions aren't allowed in function with " > -"no_caller_saved_registers attribute", isa); > + sorry ("%s instructions aren%'t allowed in a function with " > +"the % attribute", isa); > /* Don't issue the same error twice. */ > cfun->machine->func_type = TYPE_NORMAL; > cfun->machine->no_caller_saved_registers = false; > diff --git a/gcc/testsuite/gcc.target/i386/interrupt-387-err-1.c > b/gcc/testsuite/gcc.target/i386/interrupt-387-err-1.c > index 3fbdc88..8561a3c 100644 > --- a/gcc/testsuite/gcc.target/i386/interrupt-387-err-1.c > +++ b/gcc/testsuite/gcc.target/i386/interrupt-387-err-1.c > @@ -6,11 +6,11 @@ typedef unsigned int uword_t __attribute__ ((mode > (__word__))); > void > __attribute__((interrupt)) > fn1 (void *frame, uword_t error) > -{ /* { dg-message "80387 instructions aren't allowed in exception service > routine" } */ > +{ /* { dg-message "80387 instructions aren't allowed in an exception service > routine" } */ > } > > void > __attribute__((interrupt)) > fn2 (void *frame) > -{ /* { dg-message "80387 instructions aren't allowed in interrupt service > routine" } */ > +{ /* { dg-message "80387 instructions aren't allowed in an interrupt service > routine" } */ > } > diff --git a/gcc/testsuite/gcc.target/i386/interrupt-387-err-2.c > b/gcc/testsuite/gcc.target/i386/interrupt-387-err-2.c > index 3203d64..9810f18 100644 > --- a/gcc/testsuite/gcc.target/i386/interrupt-387-err-2.c > +++ b/gcc/testsuite/gcc.target/i386/interrupt-387-err-2.c > @@ -4,5 +4,5 @@ > void > __attribute__((no_caller_saved_registers)) > fn1 (void) > -{ /* { dg-message "80387 instructions aren't allowed in function with > no_caller_saved_registers attribute" } */ > +{ /* { dg-message "80387 instructions aren't allowed in a function with the > 'no_caller_saved_registers' attribute" } */ > } > diff --git a/gcc/testsuite/gcc.target/i386/interrupt-bnd-err-1.c > b/gcc/testsuite/gcc.target/i386/interrupt-bnd-err-1.c > index e07688e..1126fca 100644 > --- a/gcc/testsuite/gcc.target/i386/interrupt-bnd-err-1.c > +++ b/gcc/testsuite/gcc.target/i386/interrupt-bnd-err-1.c > @@ -6,11 +6,11 @@ typedef unsigned int uword_t __attribute__ ((mode > (__word__))); > void > __attribute__((interrupt)) > fn1 (void *frame) > -{ /* { dg-message "MPX instructions aren't allowed in interrupt service > routine" } */ > +{ /* { dg-message "MPX instructions aren't allowed in an interrupt service > ro
[GCC, Arm, committed] Fix availability of FP16-FP64 conversion instructions
Hi, vcvtb.f16.f64 and vcvtb.f64.f16 were being made available even for FPUs that do not support double precision. This patch fixes that. Regression tested for arm-none-eabi. Committed in r269499. Cheers, Andre gcc/ChangeLog: 2019-03-08 Andre Vieira * config/arm/arm.h (TARGET_FP16_TO_DOUBLE): Add TARGET_VFP_DOUBLE requirement. gcc/testsuite/ChangeLog: 2019-03-08 Andre Vieira * gcc.target/arm/f16_f64_conv_no_dp.c: New test. >From 870d88c1d2cf9f1e11ab85b4048739c0c98e9a06 Mon Sep 17 00:00:00 2001 From: Andre Vieira Date: Fri, 8 Mar 2019 16:11:10 + Subject: [PATCH] [GCC, Arm] Fix availability of FP16-FP64 conversion instructions vcvtb.f16.f64 and vcvtb.f64.f16 were being made available even for FPUs that do not support double precision. This patch fixes that. --- gcc/config/arm/arm.h | 2 +- gcc/testsuite/gcc.target/arm/f16_f64_conv_no_dp.c | 15 +++ 2 files changed, 16 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/arm/f16_f64_conv_no_dp.c diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index 103d390dd17273e947bbd4d605be2c1ef70fb137..7adafead0f20832467b873f53d298a2e1a25ab0a 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -195,7 +195,7 @@ extern tree arm_fp16_type_node; /* FPU supports converting between HFmode and DFmode in a single hardware step. */ #define TARGET_FP16_TO_DOUBLE \ - (TARGET_HARD_FLOAT && (TARGET_FP16 && TARGET_VFP5)) + (TARGET_HARD_FLOAT && TARGET_FP16 && TARGET_VFP5 && TARGET_VFP_DOUBLE) /* FPU supports fused-multiply-add operations. */ #define TARGET_FMA (bitmap_bit_p (arm_active_target.isa, isa_bit_vfpv4)) diff --git a/gcc/testsuite/gcc.target/arm/f16_f64_conv_no_dp.c b/gcc/testsuite/gcc.target/arm/f16_f64_conv_no_dp.c new file mode 100644 index ..99b62a8ffd5de5c61c08980a067d8fc08cb24b11 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/f16_f64_conv_no_dp.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-skip-if "do not override fpu" { *-*-* } { "-mfpu=*" } { "-mfpu=fpv5-sp-d16" } } */ +/* { dg-skip-if "do not disable fpu" { *-*-* } { "-mfloat-abi=soft" } { * } } */ +/* { dg-skip-if "do not override fp16-format" { *-*-* } { "-mfp16-format=*" } { "-mfp16-format=ieee" } } */ +/* { dg-options "-O1 -mfpu=fpv5-sp-d16 -mfloat-abi=hard -mfp16-format=ieee" } */ + +__fp16 foo (double a) +{ + return a; +} + +double bar (__fp16 a) +{ + return a; +} -- 2.17.1
Re: [patch, fortran] Fix PR 66089, ICE (plus wrong code) in dependency handling
Am 08.03.19 um 08:04 schrieb Bernhard Reutner-Fischer: Please change call abort to stop N in the test? Done. Anything else? OK for trunk? Regards Thomas
[committed] Fix POLY_INT_CST/CONST_POLY_INT typo (PR 89631)
Tested on aarch64-linux-gnu (with SVE) and x86_64-linux-gnu. Applied as obvious. Richard 2019-03-08 Richard Sandiford gcc/ PR debug/89631 * dwarf2cfi.c (dwarf2out_frame_debug_expr): Use CONST_POLY_INT instead of POLY_INT_CST. -- Index: gcc/dwarf2cfi.c === --- gcc/dwarf2cfi.c 2019-03-08 18:15:36.700740351 + +++ gcc/dwarf2cfi.c 2019-03-08 18:17:35.892287413 + @@ -1778,7 +1778,7 @@ dwarf2out_frame_debug_expr (rtx expr) /* Rule 6 */ case CONST_INT: - case POLY_INT_CST: + case CONST_POLY_INT: cur_trace->cfa_temp.reg = dwf_regno (dest); cur_trace->cfa_temp.offset = rtx_to_poly_int64 (src); break;
[committed] Add testcase for already fixed PR c++/82075
Hi! This got fixed with r257057 aka PR84031 fix, but the committed testcase doesn't resemble anything close to this one. Tested on x86_64-linux, committed to trunk as obvious. 2019-03-08 Jakub Jelinek PR c++/82075 * g++.dg/cpp1z/decomp49.C: New test. --- gcc/testsuite/g++.dg/cpp1z/decomp49.C.jj2019-03-08 19:25:32.680069446 +0100 +++ gcc/testsuite/g++.dg/cpp1z/decomp49.C 2019-03-08 19:26:06.192521962 +0100 @@ -0,0 +1,14 @@ +// PR c++/82075 +// { dg-do run { target c++11 } } +// { dg-options "" } + +struct B { }; +struct D : B { int i; }; + +int +main () +{ + auto [i] = D{}; // { dg-warning "only available with" "" { target c++14_down } } + if (i != 0) +__builtin_abort (); +} Jakub
Re: [PATCH, rs6000] PR87532: Bad Results from vec_extract(unsigned char, foo) dependent upon function inline
Hi Kelvin, On Fri, Mar 08, 2019 at 10:59:35AM -0600, Kelvin Nilsen wrote: > This problem report, though initially motivated by differences in behavior > between constant and non-constant selector arguments, uncovered a number of > inconsistencies in the implementation of vec_extract. > > This patch provides several fixes to make handling of constant selector > expressions the same as the handling of non-constant selector expressions. > In the process of testing, it was observed that certain existing regression > tests were looking for the wrong instructions to be emitted and those tests > have been updated. > > This has bootstrapped and tested without regressions on > powerpc64le-unknown-linux (both P8 and P9) and on powerpc-linux (P7 > big-endian, with both -m32 and -m64 target options). Thanks for the patch. I have lots of comments/questions, mostly about the testcases. Sorry :-/ The actual compiler code part looks good though. > --- gcc/testsuite/gcc.target/powerpc/vsx-builtin-10a.c(revision 0) > +++ gcc/testsuite/gcc.target/powerpc/vsx-builtin-10a.c(revision 0) > @@ -0,0 +1,157 @@ > +/* { dg-do run { target { powerpc*-*-* } } } */ > +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ Is there any reason to think this testcase will fail on Darwin? I mean, it requires VSX, and that should skip the testcase already on Darwin? > +/* { dg-require-effective-target dfp_hw } */ Why is that? I don't see any dfp used here? > +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { } } > */ You do not use -mcpu=, so why this dg-skip-if? And please set -mdejagnu-cpu= instead. > --- gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p8.c > (revision 269370) > +++ gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p8.c > (working copy) > @@ -18,7 +18,7 @@ > /* { dg-final { scan-assembler-times {\mxxpermdi\M} 3 { target lp64 } } } */ > /* { dg-final { scan-assembler-times {\mvslo\M} 3 { target lp64 } } } */ > /* { dg-final { scan-assembler-times {\mmfvsrd\M} 6 { target lp64 } } } */ > -/* { dg-final { scan-assembler-times {\msradi\M} 3 { target lp64 } } } */ > +/* { dg-final { scan-assembler-times {\msrdi\M} 3 { target lp64 } } } */ Maybe allow both? So {\msra?di\M}? Or does sradi make no sense here? > --- gcc/testsuite/gcc.target/powerpc/vsx-builtin-14a.c(revision 0) > +++ gcc/testsuite/gcc.target/powerpc/vsx-builtin-14a.c(revision 0) > @@ -0,0 +1,128 @@ > +/* { dg-do run { target { powerpc*-*-* } } } */ > +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ > +/* { dg-require-effective-target dfp_hw } */ > +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { } } > */ > +/* { dg-options "-maltivec" } */ > + > +/* This test should run the same on any target that supports altivec/dfp > + instructions. Intentionally not specifying cpu in order to test > + all code generation paths. */ I don't see where dfp comes in? For server CPUs it is p6 and up, the same as AltiVec, but that is coincidental. > --- gcc/testsuite/gcc.target/powerpc/pr87532-mc.c (revision 0) > +++ gcc/testsuite/gcc.target/powerpc/pr87532-mc.c (revision 0) > @@ -0,0 +1,260 @@ > +/* { dg-do run { target { powerpc*-*-* && lp64 } } } */ Check for int128, instead? Or is there another reason to want lp64? > + __asm__ (" # %x0" : "+wa" (a)); "wa" requires VSX. > --- gcc/testsuite/gcc.target/powerpc/vsx-builtin-14b.c(revision 0) > +++ gcc/testsuite/gcc.target/powerpc/vsx-builtin-14b.c(revision 0) > @@ -0,0 +1,128 @@ > +/* { dg-do run { target { powerpc*-*-* } } } */ Btw, you can just say { dg-do run }: everything in gcc.target/powerpc is only run for powerpc*-*-* targets. > --- gcc/testsuite/gcc.target/powerpc/vec-extract-v16qiu-v2b.c (revision 0) > +++ gcc/testsuite/gcc.target/powerpc/vec-extract-v16qiu-v2b.c (revision 0) > @@ -0,0 +1,16 @@ > +/* { dg-do run { target { powerpc*-*-linux* } } } */ > +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ > +/* { dg-require-effective-target vsx_hw } */ > +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { } } > */ > +/* { dg-options "-O2 -mvsx -save-temps -dp -g" } */ I think that is debugging code left over? -dp -g should be harmless, but -save-temps is littering ;-) > --- gcc/testsuite/gcc.target/powerpc/vec-extract-v16qiu-v2.h (revision 0) > +++ gcc/testsuite/gcc.target/powerpc/vec-extract-v16qiu-v2.h (revision 0) > +static vector TYPE > +deoptimize (vector TYPE a) > +{ > + __asm__ (" # %x0" : "+wa" (a)); > + return a; > +} Is this only ever compiled with VSX? If not, use "v" instead? > --- gcc/config/rs6000/rs6000-c.c (revision 269370) > +++ gcc/config/rs6000/rs6000-c.c (working copy) > @@ -6568,9 +6568,15 @@ > /* If the second argument is an integer constant, if the value is in >the expected range, generate the built-in code if we can. We need >64-bit and direct move to extract the
Re: [PR 85762, 87008, 85459] Relax MEM_REF check in contains_vce_or_bfcref_p
On March 8, 2019 5:57:35 PM GMT+01:00, Martin Jambor wrote: >Hi, > >BTW, we have dropped the gcc-patches from the CC quite some emails ago >:-/ > >On Fri, Mar 08 2019, Richard Biener wrote: >> yOn Thu, 7 Mar 2019, Martin Jambor wrote: >> >>> Hello again, >>> >>> On Thu, Mar 07 2019, Martin Jambor wrote: >>> > Hi, >>> > >>> > On Thu, Mar 07 2019, Richard Biener wrote: >>> >> >>> >> So possibly we should restrict total scalarization to the case >>> >> where all accesses are layout compatible (that is, >>> >> types_compatible_p)? >>> >> >>> > >>> > That would be the patch below (currently undergoing testing). It >is >>> > slightly more strict than mine but on the three testcases in >question, >>> > it gives the same results too. People memcpying their aggregates >>> > around, even when not doing anything silly like storing stuff in >>> > padding, will find their code slower but arguably closer to what >they >>> > wanted. >>> >>> So, testing revealed that (unlike my previous version) this breaks >two >>> guality tests where we depend on total scalarization to convey to >the >>> debugger what is in bits of an aggregate. >>> >>> In gcc.dg/guality/pr54970.c, the array a is no longer totally >scalarized >>> because it is memcpied into and gdb no longer knows what constant >lurks >>> in a[0], which is otherwise unused. Probably not a big deal. >>> >>> In gcc.dg/guality/pr59776.c, where the ain type and function are: >>> >>> struct S { float f, g; }; >>> >>> __attribute__((noinline, noclone)) void >>> foo (struct S *p) >>> { >>>struct S s1, s2; /* { dg-final { >gdb-test pr59776.c:17 "s1.f" "5.0" } } */ >>>s1 = *p; /* { dg-final { >gdb-test pr59776.c:17 "s1.g" "6.0" } } */ >>>s2 = s1; /* { dg-final { >gdb-test pr59776.c:17 "s2.f" "0.0" } } */ >>>*(int *) &s2.f = 0; /* { dg-final { >gdb-test pr59776.c:17 "s2.g" "6.0" } } */ >>>asm volatile (NOP : : : "memory");/* { dg-final { >gdb-test pr59776.c:20 "s1.f" "5.0" } } */ >>>asm volatile (NOP : : : "memory");/* { dg-final { >gdb-test pr59776.c:20 "s1.g" "6.0" } } */ >>>s2 = s1; /* { dg-final { >gdb-test pr59776.c:20 "s2.f" "5.0" } } */ >>>asm volatile (NOP : : : "memory");/* { dg-final { >gdb-test pr59776.c:20 "s2.g" "6.0" } } */ >>>asm volatile (NOP : : : "memory"); >>> } >>> >>> the zeroing through integer type caused that s2 variable is no >longer >>> totally scalarized and thus somehow opaque for the debugger. I >don't >>> think this particular test is a big concern either, but such scalar >type >>> casted accesses were exactly the reason why I initially decided >against >>> completely disabling total scalarization when I see a >types-incompatible >>> access. >>> >>> In any event, although I would still slightly prefer my patch from >>> yesterday (or at least tolerating scalar type-incompatible >accesses), I >>> am also happy with the one from today, provided I have permission to >>> XFAIL the guality tests. >> >> I guess I'd play safe at this point and do less adjustments. The >> second patch looks nicer, with >> >> static bool >> -contains_vce_or_bfcref_p (const_tree ref) >> +contains_vce_or_bfcref_p (const_tree ref, bool *type_changing_p = >NULL) > >OK > >> >> (a default for type_chagning_p) there would be even less hunks. Is >> it functionally equivalent to the first variant when you change >> the types_compatible_p back to the TYPE_MAIN_VARIANT compare? >> That is, is this the only real difference? > >I probably don't understand the question because I'd say that the >second >guality failure shows there is a difference between being careful we >just don't loose any data in padding and disabling total scalarization >altogether. > >To give a non-guality example, the following will regress with the >yesterday's variant - compared to current trunk: > > struct S {unsigned a,b;}; > > void foo (struct S *in, struct S *out) > { >struct S s; >s = *in; >*(int *)&s.b = -1; >*out = s; > } > >Currently s is totally scalarized away, but if we "restrict total >scalarization to the case where all accesses are layout compatible," it >of course isn't and s survives all the way to optimized dump, whereas >currently we eliminate it even in early SRA. > >Even the assembly output changes from: > > movl(%rdi), %eax > movl%eax, (%rsi) > movl$-1, 4(%rsi) > ret > >to a bit unexpected: > > movabsq $-4294967296, %rax > orq (%rdi), %rax > movq%rax, (%rsi) > ret > >So I guess now I am even more convinced about what I wrote yesterday, >i.e. that if we should at least allow the scalar incompatible types. >The reason why this is safe is that when a scalar access is not a leaf >of an access tree, scalarization of that portion of the aggregate fails >(and thus
[PATCH] Fix various description text issues in *.opt files (PR target/79645)
Hi! The PR complained about lack of terminating dot at the end of one option description, but I've found many others. The patch ensures that normal option descriptions start with a capital letter (unless it starts with a keyword, number, or - with tab in the middle of description, etc.) and ends with a dot, for the Enum descriptions that it starts with a capital letter and ends with a colon and I've tried to verify the Warn etc. messages don't start with a capital letter (unless it would be written that way in the middle of sentence) and don't end with a dot or period. Tested on x86_64-linux, ok for trunk? 2019-03-08 Jakub Jelinek PR target/79645 * common.opt (fdiagnostics-show-labels, fdiagnostics-show-line-numbers, fdiagnostics-format=, fdiagnostics-minimum-margin-width=, fgnat-encodings=, gas-loc-support, gas-locview-support, ginline-points, ginternal-reset-location-views): Terminate description text with a dot. * config/microblaze/microblaze.opt (mxl-prefetch): Likewise. * config/mcore/mcore.opt (m210, m340): Likewise. * config/epiphany/epiphany.opt (mprefer-short-insn-regs, mcmove, mnops=): Start description text with a capital letter. * config/arc/arc.opt (msize-level=): Likewise. * config/sh/sh.opt (minline-ic_invalidate): Likewise. * config/rs6000/sysv4.opt (mno-toc, mtoc, mno-traceback, mshlib, mnewlib): Likewise. * config/ft32/ft32.opt (msim): Likewise. (mft32b, mcompress): Likewise. Terminate description text with a dot. (mnodiv, mnopm): Terminate description text with a dot. * config/c6x/c6x.opt (c6x_sdata): Terminate Enum description with a colon. * config/i386/i386.opt (prefer_vector_width, instrument_return): Likewise. * config/rx/rx.opt (nofpu): Remove trailing spaces from description text. lto/ * lang.opt: Terminate description text with a dot. --- gcc/common.opt.jj 2019-02-25 23:57:00.971009713 +0100 +++ gcc/common.opt 2019-03-08 20:47:17.258098651 +0100 @@ -1251,11 +1251,11 @@ Show the source line with a caret indica fdiagnostics-show-labels Common Var(flag_diagnostics_show_labels) Init(1) -Show labels annotating ranges of source code when showing source +Show labels annotating ranges of source code when showing source. fdiagnostics-show-line-numbers Common Var(flag_diagnostics_show_line_numbers) Init(1) -Show line numbers in the left margin when showing source +Show line numbers in the left margin when showing source. fdiagnostics-color Common Alias(fdiagnostics-color=,always,never) @@ -1283,7 +1283,7 @@ Enum(diagnostic_color_rule) String(auto) fdiagnostics-format= Common Joined RejectNegative Enum(diagnostics_output_format) --fdiagnostics-format=[text|json] Select output format +-fdiagnostics-format=[text|json] Select output format. ; Required for these enum values. SourceInclude @@ -1312,7 +1312,7 @@ Amend appropriate diagnostic messages wi fdiagnostics-minimum-margin-width= Common Joined UInteger Var(diagnostics_minimum_margin_width) Init(6) -Set minimum width of left margin of source code when showing source +Set minimum width of left margin of source code when showing source. fdisable- Common Joined RejectNegative Var(common_deferred_options) Defer @@ -1521,7 +1521,7 @@ Enum(dwarf_gnat_encodings) String(minima fgnat-encodings= Common Enum(dwarf_gnat_encodings) Joined RejectNegative Report Undocumented Var(gnat_encodings) --fgnat-encodings=[all|gdb|minimal] Select the balance between GNAT encodings and standard DWARF emitted in the debug information +-fgnat-encodings=[all|gdb|minimal] Select the balance between GNAT encodings and standard DWARF emitted in the debug information. ; This option is not documented yet as its semantics will change. fgraphite @@ -2969,11 +2969,11 @@ Generate debug information in default fo gas-loc-support Common Driver Var(dwarf2out_as_loc_support) Init(2) -Assume assembler support for (DWARF2+) .loc directives +Assume assembler support for (DWARF2+) .loc directives. gas-locview-support Common Driver Var(dwarf2out_as_locview_support) Init(2) -Assume assembler support for view in (DWARF2+) .loc directives +Assume assembler support for view in (DWARF2+) .loc directives. gcoff Common Driver Deprecated @@ -3009,11 +3009,11 @@ Generate debug information in default ex ginline-points Common Driver Var(debug_inline_points) Init(2) -Generate extended entry point information for inlined functions +Generate extended entry point information for inlined functions. ginternal-reset-location-views Common Driver Var(debug_internal_reset_location_views) Init(2) -Compute locview reset points based on insn length estimates +Compute locview reset points based on insn length estimates. gno- RejectNegative Joined Undocumented --- gcc/config/epiphany/epiphany.opt.jj 2019-01-01 12:37:23.560869895 +0100 +++
[Patch, aarch64] PR 89628 - fix register allocation in SIMD functions
This is a patch to fix the register allocation in SIMD functions. In normal functions registers V16 to V31 are all caller saved. In SIMD functions V16 to V23 are callee saved and V24 to V31 are caller saved. This means that SIMD functions should use V24 to V31 before V16 to V23 in order to avoid some saves and restores. My fix for this is to define REG_ALLOC_ORDER. Right now it is not defined on Aarch64, so I just defined it as all the registers in order except for putting V24 to V31 before V16 to V23. This fixes the register allocation in SIMD functions. It also changes the register allocation order in regular functions but since all the registers (V16 to V31) are caller saved in that case, it doesn't matter. We could use ADJUST_REG_ALLOC_ORDER to only affect SIMD functions but I did not see any reason to do that. Tested on ThunderX2 Aarch64 with no regressions. OK to checkin? 2018-03-08 Steve Ellcey PR target/89628 * config/aarch64/aarch64.h (V16-V31): Update comment. (REG_ALLOC_ORDER): New macro. diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 7bd3bf5..d3723ff 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -328,7 +328,9 @@ extern unsigned aarch64_architecture_version; ZR zero register, encoded as X/R31 elsewhere 32 x 128-bit floating-point/vector registers - V16-V31 Caller-saved (temporary) registers + V24-V31 Caller-saved (temporary) registers + V16-V23 Caller-saved (temporary) registers in most functions; + Callee-saved in SIMD functions. V8-V15 Callee-saved registers V0-V7 Parameter/result registers @@ -431,6 +433,26 @@ extern unsigned aarch64_architecture_version; V_ALIASES(28), V_ALIASES(29), V_ALIASES(30), V_ALIASES(31) \ } +/* This is the default order for everything except V16-V23. These + are caller saved in most functions but callee saved in SIMD + functions so we want to use V24-V31 which are always + caller saved before using these registers. */ + +#define REG_ALLOC_ORDER \ + {\ + 0, 1, 2, 3,4, 5, 6, 7, /* R0 - R7 */ \ + 8, 9, 10, 11, 12, 13, 14, 15, /* R8 - R15 */ \ +16, 17, 18, 19, 20, 21, 22, 23, /* R16 - R23 */ \ +24, 25, 26, 27, 28, 29, 30, 31, /* R24 - R30, SP */ \ +32, 33, 34, 35, 36, 37, 38, 39, /* V0 - V7 */ \ +40, 41, 42, 43, 44, 45, 46, 47, /* V8 - V15 */ \ +56, 57, 58, 59, 60, 61, 62, 63, /* V24 - V31 */ \ +48, 49, 50, 51, 52, 53, 54, 55, /* V16 - V23 */ \ +64, 65, 66, 57, /* SFP, AP, CC, VG */ \ +68, 69, 70, 71, 72, 73, 74, 75, /* P0 - P7 */ \ +76, 77, 78, 79, 80, 81, 82, 83, /* P8 - P15 */ \ + } + #define EPILOGUE_USES(REGNO) (aarch64_epilogue_uses (REGNO)) /* EXIT_IGNORE_STACK should be nonzero if, when returning from a function,
Re: [PATCH] Add tests for resolved PR (PR c/85870).
On Fri, Mar 08, 2019 at 03:03:07PM +0100, Martin Liška wrote: > I'm going to install Jakub's test-case for a PR that's fixed > on trunk. This fails on i686-linux regtest when linker plugin is disabled (I use that in my setup, because my ld is 64-bit and 32-bit plugins don't really work very well with 64-bit linker). The problem is that XYZ is extern in both sources, no idea why it actually works with linker plugin. That said, dropping extern from one of the two doesn't really work, the ICE when the fix is reverted goes away. But with the following change the ICE is still in there when r268789 and r268981 is reverted (both in this i686-linux setup and normal x86_64-linux with linker plugin), and with vanilla cc1/lto1 works. So, I've committed this to trunk as obvious. 2019-03-08 Jakub Jelinek PR c/85870 * gcc.dg/lto/pr85870_0.c: Add dg-extra-ld-options with -r -nostdlib -flinker-output=nolto-rel. --- gcc/testsuite/gcc.dg/lto/pr85870_0.c.jj 2019-03-08 21:43:52.032655832 +0100 +++ gcc/testsuite/gcc.dg/lto/pr85870_0.c2019-03-08 23:25:15.150784557 +0100 @@ -1,6 +1,7 @@ /* PR c/85870 */ /* { dg-lto-do link } */ /* { dg-lto-options { { -flto -O2 } } } */ +/* { dg-extra-ld-options { -r -nostdlib -flinker-output=nolto-rel } } */ typedef struct abc_s { char a1; Jakub
libgo patch committed: Use more largefile functions
This libgo patch, based on one by Rainer Orth, uses more largefile functions. With this patch we consistently call __go_openat for openat and we use fstatat64, creat64, sendfile64, and getdents64 where needed. This is for PR 89447. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 269443) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -959260238817af3205fb9907dd92319291e6a893 +3106ec19626d75d8275be16c86421132548fa13e The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: libgo/go/internal/syscall/unix/at.go === --- libgo/go/internal/syscall/unix/at.go(revision 269196) +++ libgo/go/internal/syscall/unix/at.go(working copy) @@ -13,12 +13,9 @@ import ( //extern unlinkat func unlinkat(int32, *byte, int32) int32 -//extern openat +//extern __go_openat func openat(int32, *byte, int32, syscall.Mode_t) int32 -//extern fstatat -func fstatat(int32, *byte, *syscall.Stat_t, int32) int32 - func Unlinkat(dirfd int, path string, flags int) error { var p *byte p, err := syscall.BytePtrFromString(path) Index: libgo/go/internal/syscall/unix/at_largefile.go === --- libgo/go/internal/syscall/unix/at_largefile.go (nonexistent) +++ libgo/go/internal/syscall/unix/at_largefile.go (working copy) @@ -0,0 +1,14 @@ +// Copyright 2019 The Go Authors. All rights reserved. +// Use of this source code is governed by a BSD-style +// license that can be found in the LICENSE file. + +// +build aix hurd linux solaris,386 solaris,sparc + +package unix + +import ( + "syscall" +) + +//extern fstatat64 +func fstatat(int32, *byte, *syscall.Stat_t, int32) int32 Index: libgo/go/internal/syscall/unix/at_regfile.go === --- libgo/go/internal/syscall/unix/at_regfile.go(nonexistent) +++ libgo/go/internal/syscall/unix/at_regfile.go(working copy) @@ -0,0 +1,18 @@ +// Copyright 2019 The Go Authors. All rights reserved. +// Use of this source code is governed by a BSD-style +// license that can be found in the LICENSE file. + +// +build !aix +// +build !hurd +// +build !linux +// +build !solaris !386 +// +build !solaris !sparc + +package unix + +import ( + "syscall" +) + +//extern fstatat +func fstatat(int32, *byte, *syscall.Stat_t, int32) int32 Index: libgo/go/syscall/libcall_bsd.go === --- libgo/go/syscall/libcall_bsd.go (revision 269196) +++ libgo/go/syscall/libcall_bsd.go (working copy) @@ -13,8 +13,6 @@ import ( "unsafe" ) -//sys sendfile(outfd int, infd int, offset *Offset_t, count int) (written int, err error) -//sendfile(outfd _C_int, infd _C_int, offset *Offset_t, count Size_t) Ssize_t func Sendfile(outfd int, infd int, offset *int64, count int) (written int, err error) { if race.Enabled { race.ReleaseMerge(unsafe.Pointer(&ioSync)) Index: libgo/go/syscall/libcall_bsd_largefile.go === --- libgo/go/syscall/libcall_bsd_largefile.go (nonexistent) +++ libgo/go/syscall/libcall_bsd_largefile.go (working copy) @@ -0,0 +1,10 @@ +// Copyright 2019 The Go Authors. All rights reserved. +// Use of this source code is governed by a BSD-style +// license that can be found in the LICENSE file. + +// +build solaris,386 solaris,sparc + +package syscall + +//sys sendfile(outfd int, infd int, offset *Offset_t, count int) (written int, err error) +//sendfile64(outfd _C_int, infd _C_int, offset *Offset_t, count Size_t) Ssize_t Index: libgo/go/syscall/libcall_bsd_regfile.go === --- libgo/go/syscall/libcall_bsd_regfile.go (nonexistent) +++ libgo/go/syscall/libcall_bsd_regfile.go (working copy) @@ -0,0 +1,10 @@ +// Copyright 2019 The Go Authors. All rights reserved. +// Use of this source code is governed by a BSD-style +// license that can be found in the LICENSE file. + +// +build darwin dragonfly freebsd netbsd openbsd solaris,amd64 solaris,sparc64 + +package syscall + +//sys sendfile(outfd int, infd int, offset *Offset_t, count int) (written int, err error) +//sendfile(outfd _C_int, infd _C_int, offset *Offset_t, count Size_t) Ssize_t Index: libgo/go/syscall/libcall_posix.go === --- libgo/go/syscall/libcall_posix.go (revision 269196) +++ libgo/go/syscall/libcall_posix.go (working copy) @@ -184,9 +184,6 @@ func FDZero(set *FdSet) { //sys Close(fd int) (err error) //close(fd _C_int) _C_int -//sys Creat(path string, mode ui
Re: [PATCH] Fix -gdwarf-5 -gsplit-dwarf ICEs (PR debug/89498)
On Mar 6, 2019, Jakub Jelinek wrote: > On Tue, Mar 05, 2019 at 04:06:59PM -0500, Jason Merrill wrote: >> > Assuming output_view_list_offset is correct, the following patch adjusts >> > size_of_die/value_format accordingly. >> >> I would guess that omitting the handling from output_view_list_offset was an >> oversight in the view work. Alex, which is right? :) I guess both are :-) I don't recall having dealt with this particular combination of flags. Indeed, I'm pretty sure I wrote the bulk of the locview output support before DWARFv5 support was there, and AFAICT I failed to pick up the more compact DWARFv5 representation as I updated the patchset, proposed the locview extension for DWARFv6 in line with DWARFv5 locviews, and implemented it as the incompatv5 extension for v5 loclists with locviews that Jakub noticed. > In order to use DW_FORM_loclistx .uleb128 for the DW_AT_GNU_locviews, > we'd need to emit .long .LVUS0-.Ldebug_loc0 etc. next to the > .LLST0-.Ldebug_loc0 in the section offsets table at .Ldebug_loc0, > but I'd expect debug info consumers to assume that anything listed > in the offset table is actually a valid DW_LLE_* sequence, which is not the > case for the .LVUS* labels. *nod* > Note the > .long .LVUS0-.Ldebug_loc0 # DW_AT_GNU_locviews > looks weird, at least with the patch it is given DW_FORM_sec_offset, but > because it is a difference between .LVUS0-.Ldebug_loc0, it is actually not a > sec offset. Shall it use a different form in that case? Ugh. That's more fallout from the concurrent implementations of v5 and LVU, ISTM. My bad for not catching, no doubt, since LVU landed later; I don't mean to excuse it, just to explain how it came about. I'm afraid I hadn't thought of it to the point of saying what would be correct, let alone more desirable. We're in non-strict dwarf territory, trying to extend something that was not designed to be extended. Unfortunately, the extensions designed for v4 do not work for v5, and the extensions proposed for v6 don't either, so we're in a bit of a limbo, so I guess anything that makes sense and doesn't break unsuspecting consumers will do. I'm thinking the most reasonable way forward in this non-standard minefield is to document that the section base address used for DW_AT_GNU_locviews is the same base address used for the corresponding loclistx table. That would make it trivially correct for v4-, and not-so-trivially correct, but mostly sensible IMHO, for v5. I'd argue there are even backward-compatibility reasons to document it that way, since it is not hard to figure out that this was (by inaction) the intent in the, erhm, reference implementation, even though, as you noticed, it didn't work for split v5. Of course, since it was broken, it wouldn't be unreasonable to disregard backward compatibility altogether and go for something entirely different; I don't know of any other implementations that might have been based on this accidental implementation detail, and I'd assume that if anyone even noticed the problem while collecting data from the reference implementation, we'd have got a bug report about it shortly thereafter. Another possibility of encoding that comes to mind would be to output an offset between the LLST and the LVUS labels as DW_AT_GNU_locviews, but since it's not a compile-time constant, we won't want to make it uleb128. It would still be resolved to a constant and avoid relocations, as preferred by split-dwarf, but it wouldn't save space or relocations WRT what we'd get after your corrective patch. Yet another, more far-out possibilities, are to introduce a list of locview offsets right after the end of the loclist offsets, or even a separate section with a separate section header for them, so that we could use uleb128 compiler-computed indices to reference locview extensions to loclists. That would be a significant amount of work for little to no benefit AFAICT, and only on the short term. In the long term, I expect/hope some way to represent locviews to be adopted in standard DWARF. -- Alexandre Oliva, freedom fighter https://FSFLA.org/blogs/lxo Be the change, be Free! FSF Latin America board member GNU Toolchain EngineerFree Software Evangelist Hay que enGNUrecerse, pero sin perder la terGNUra jamás-GNUChe
Re: [PATCH] Fix -gdwarf-5 -gsplit-dwarf ICEs (PR debug/89498)
On Sat, Mar 09, 2019 at 03:12:28AM -0300, Alexandre Oliva wrote: > Ugh. That's more fallout from the concurrent implementations of v5 and > LVU, ISTM. My bad for not catching, no doubt, since LVU landed later; I > don't mean to excuse it, just to explain how it came about. > > > I'm afraid I hadn't thought of it to the point of saying what would be > correct, let alone more desirable. We're in non-strict dwarf territory, > trying to extend something that was not designed to be extended. > Unfortunately, the extensions designed for v4 do not work for v5, and > the extensions proposed for v6 don't either, so we're in a bit of a > limbo, so I guess anything that makes sense and doesn't break > unsuspecting consumers will do. > > > I'm thinking the most reasonable way forward in this non-standard > minefield is to document that the section base address used for > DW_AT_GNU_locviews is the same base address used for the corresponding > loclistx table. That would make it trivially correct for v4-, and > not-so-trivially correct, but mostly sensible IMHO, for v5. > > I'd argue there are even backward-compatibility reasons to document it > that way, since it is not hard to figure out that this was (by inaction) > the intent in the, erhm, reference implementation, even though, as you > noticed, it didn't work for split v5. Of course, since it was broken, > it wouldn't be unreasonable to disregard backward compatibility > altogether and go for something entirely different; I don't know of any > other implementations that might have been based on this accidental > implementation detail, and I'd assume that if anyone even noticed the > problem while collecting data from the reference implementation, we'd > have got a bug report about it shortly thereafter. > > > Another possibility of encoding that comes to mind would be to output an > offset between the LLST and the LVUS labels as DW_AT_GNU_locviews, but > since it's not a compile-time constant, we won't want to make it > uleb128. It would still be resolved to a constant and avoid > relocations, as preferred by split-dwarf, but it wouldn't save space or > relocations WRT what we'd get after your corrective patch. > > Yet another, more far-out possibilities, are to introduce a list of > locview offsets right after the end of the loclist offsets, or even a > separate section with a separate section header for them, so that we > could use uleb128 compiler-computed indices to reference locview > extensions to loclists. That would be a significant amount of work for > little to no benefit AFAICT, and only on the short term. In the long > term, I expect/hope some way to represent locviews to be adopted in > standard DWARF. Whatever we choose, IMHO: 1) we can't introduce new DW_FORM_* 2) we must avoid changing anything on how it was represented for the non-split case, we've already shipped GCC 8.[123] with it 3) the consumer must be able to figure out easily according to clear rules how to interpret the attribute and find the views. So, either it needs to use different form from what is used normally if it needs to be interpreted differently, or it needs to use a different attribute, and from that and possibly version number in .debug_info header it needs to figure out if it should use absolute or relative offset to .debug_loc, or absolute or relative offset to .debug_loclist, in split or non-split case etc. We need to look at what we emit with -gdwarf-{2,3,4,5} {,-gsplit-dwarf}. If we added offsets to LVU entries to the table at the start of .debug_loclists, they shouldn't be counted towards the limit and guess some locviews unaware consumers could be upset, because the uleb128s would be above that limit. If we include them in the limit, then even more consumers would be upset because they don't find valid DW_LLE_* at those spots when they just try to dump the .debug_loclists section without accessing .debug_info. Jakub
[PATCH] Fix RTL thread_jump for doloop branches (PR rtl-optimization/89634)
Hi! The following testcase used to be miscompiled by RTL jump threading on s390x-linux with -march=zEC12 at -O2 before r269302 which made it latent. The problem is we had: (jump_insn 26 25 87 4 (parallel [ (set (pc) (if_then_else (eq (reg/v:DI 5 %r5 [orig:74 e ] [74]) (const_int 1 [0x1])) (label_ref 43) (pc))) (clobber (reg:CC 33 %cc)) ]) "rh1686696.c":16:7 1263 {*cmp_and_br_signed_di} (int_list:REG_BR_PROB 118111604 (nil)) -> 43) as the sole non-note non-code_label insn in basic block 4, and (jump_insn 68 67 134 12 (parallel [ (set (pc) (if_then_else (ne (reg/v:DI 5 %r5 [orig:74 e ] [74]) (const_int 1 [0x1])) (label_ref:DI 23) (pc))) (set (reg/v:DI 5 %r5 [orig:74 e ] [74]) (plus:DI (reg/v:DI 5 %r5 [orig:74 e ] [74]) (const_int -1 [0x]))) (clobber (scratch:DI)) (clobber (reg:CC 33 %cc)) ]) "rh1686696.c":13:3 1922 {doloop_di} (int_list:REG_BR_PROB 904381916 (nil)) -> 23) as the sole non-note non-code_label insn in basic block 12, the doloop conditional jump branching to the bb with the above jump_insn 26. thread_jump sees one condition being %r5 == 1 and the other %r5 != 1, performs the cselib assisted checks if the b sequence is really a noop (but that starts with the original complete bb as a base and only does nonequal processing in the e->src bb) and decided to change insn 68 to jump after the jump_insn 26 because the comparison of %r5 against 1 has been done already. It didn't take into account that %r5 has been modified in the doloop_di insn though, so the conditions are comparing different values. I have thought about different approaches, below is the simplest one, just punt if modified_in_p. Another possibility would be to handle the BB_END (e->src) insn in the second rather than first loop and do the nonequal handling in there, but I couldn't figure out what kind of insns it would handle better than this more simple approach. It wouldn't handle hypothetical condjump instruction which compares one register and modifies another one, as it would add that register into nonequal and unless that register has been unconditionally set to something else, would keep it in nonequal and bail out. Bootstrapped/regtested on {x86_64,i686}-linux (where it didn't check much because those targets don't have doloops) and on r269253 trunk (i.e. before the r269302 change) on {powerpc64le,s390x}-linux (which both have doloop insns). Ok for trunk? 2019-03-09 Jakub Jelinek PR rtl-optimization/89634 * cfgcleanup.c (thread_jump): Punt if registers mentioned in cond1 are modified in BB_END (e->src) instruction. * gcc.c-torture/execute/pr89634.c: New test. --- gcc/cfgcleanup.c.jj 2019-01-22 10:11:59.652429658 +0100 +++ gcc/cfgcleanup.c2019-03-08 17:47:11.997954352 +0100 @@ -308,6 +308,11 @@ thread_jump (edge e, basic_block b) || !rtx_equal_p (XEXP (cond1, 1), XEXP (cond2, 1))) return NULL; + /* Punt if BB_END (e->src) is doloop-like conditional jump that modifies + the registers used in cond1. */ + if (modified_in_p (cond1, BB_END (e->src))) +return NULL; + /* Short circuit cases where block B contains some side effects, as we can't safely bypass it. */ for (insn = NEXT_INSN (BB_HEAD (b)); insn != NEXT_INSN (BB_END (b)); --- gcc/testsuite/gcc.c-torture/execute/pr89634.c.jj2019-03-08 17:48:08.161045920 +0100 +++ gcc/testsuite/gcc.c-torture/execute/pr89634.c 2019-03-08 17:47:52.151304876 +0100 @@ -0,0 +1,40 @@ +/* PR rtl-optimization/89634 */ + +static unsigned long * +foo (unsigned long *x) +{ + return x + (1 + *x); +} + +__attribute__((noipa)) unsigned long +bar (unsigned long *x) +{ + unsigned long c, d = 1, e, *f, g, h = 0, i; + for (e = *x - 1; e > 0; e--) +{ + f = foo (x + 1); + for (i = 1; i < e; i++) + f = foo (f); + c = *f; + if (c == 2) + d *= 2; + else + { + i = (c - 1) / 2 - 1; + g = (2 * i + 1) * (d + 1) + (2 * d + 1); + if (g > h) + h = g; + d *= c; + } +} + return h; +} + +int +main () +{ + unsigned long a[18] = { 4, 2, -200, 200, 2, -400, 400, 3, -600, 0, 600, 5, -100, -66, 0, 66, 100, __LONG_MAX__ / 8 + 1 }; + if (bar (a) != 17) +__builtin_abort (); + return 0; +} Jakub